Image Analysis (Vision)¶
Show the AI what you see: analyze screenshots, debug errors visually, review designs, and extract information from images.
Introduction¶
Ever wanted to just show the AI what's on your screen instead of describing it? With Consoul's image analysis, you can attach screenshots, mockups, diagrams, or any image and ask questions about them.
Quick example:
The AI sees the screenshot, reads the error message, examines the stack trace, and helps you debug, all from the visual context.
Related Tools:
- Code Search - Find code referenced in screenshots
- File Editing - Fix issues found in images
Overview¶
The image analysis feature allows you to:
- 📸 Analyze screenshots - Debug errors, understand UI states
- 🎨 Review designs - Get feedback on mockups and interfaces
- 📊 Interpret diagrams - Understand flowcharts, architecture diagrams
- 🔍 Compare visuals - Side-by-side analysis of multiple images
- 💻 Code from screenshots - Extract code from terminal or IDE screenshots
Quick Start¶
1. Enable Image Analysis¶
Image analysis is enabled by default. You can customize settings in ~/.config/consoul/config.yaml:
tools:
  image_analysis:
    enabled: true
    auto_detect_in_messages: true  # Detect image paths in messages
    max_images_per_query: 5
    max_image_size_mb: 5.0
    allowed_extensions: [".png", ".jpg", ".jpeg", ".gif", ".webp"]
2. Use a Vision-Capable Model¶
Configure a model that supports vision in your profile:
Vision-Capable Models:
| Provider | Recommended Models |
|---|---|
| Anthropic | claude-3-5-sonnet-20241022, claude-3-opus-20240229, claude-3-haiku-20240307 |
| OpenAI | gpt-4o, gpt-4o-mini |
| Google | gemini-2.0-flash, gemini-1.5-pro |
| Ollama | llava:latest, bakllava:latest |
3. Analyze Images in the TUI¶
There are two ways to include images in your messages:
Method 1: Attach files using the 📎 button
- Click the 📎 attachment button in the input area
- Select one or more image files
- Type your question
- Press Enter
Method 2: Reference image paths in your message
Simply type the image path in your message:
Consoul will automatically detect the image path and include it in your message.
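As a rough illustration of what path auto-detection involves, the sketch below scans a message for whitespace-delimited tokens that end in an allowed extension. It is not Consoul's actual detection logic: the function name `find_image_paths` and the regular expression are assumptions made for this example, and paths containing spaces are not handled.

```python
import re

# Assumed default, mirroring `allowed_extensions` from the config shown earlier.
ALLOWED_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".webp")


def find_image_paths(message, allowed=ALLOWED_EXTENSIONS):
    """Return tokens in `message` that end in one of the allowed extensions."""
    # Build an alternation such as "png|jpg|jpeg|gif|webp".
    exts = "|".join(re.escape(ext.lstrip(".")) for ext in allowed)
    # A candidate is a run of non-space characters ending in ".<ext>".
    # The lookahead tolerates trailing punctuation that is not part of the path.
    pattern = re.compile(
        rf"(\S+\.(?:{exts}))(?=[.,;:!?)]*(?:\s|$))", re.IGNORECASE
    )
    return pattern.findall(message)


if __name__ == "__main__":
    print(find_image_paths("Compare design_v1.png and design_v2.png. Which is better?"))
```

A real implementation would also need to confirm that each candidate exists on disk and passes the security checks described later on this page.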
Usage Examples¶
Debugging Terminal Errors¶
> What's wrong in this error? terminal_error.png
[The AI will analyze the screenshot and explain the error]
UI/UX Review¶
> Is this interface accessible? Review ui_mockup.png and suggest improvements.
[The AI analyzes the design for accessibility issues]
Comparing Designs¶
> Compare design_v1.png and design_v2.png. Which is better for mobile users?
[The AI compares both designs and provides recommendations]
Code Review from Screenshot¶
> What does this function do? code_screenshot.png
[The AI reads the code from the screenshot and explains it]
Diagram Analysis¶
> Explain this architecture diagram: system_architecture.png
[The AI interprets the diagram and explains the system design]
Supported Image Formats¶
| Format | Extension | Notes |
|---|---|---|
| PNG | .png | Best for screenshots, diagrams |
| JPEG | .jpg, .jpeg | Good for photos, compressed images |
| GIF | .gif | Supported (static only) |
| WebP | .webp | Modern format, smaller file sizes |
File Size Limits:
- Default maximum: 5 MB per image
- Configurable via max_image_size_mb
- Total limit: 5 images per query (configurable via max_images_per_query)
Security & Privacy¶
What Data is Sent?¶
When you analyze an image:
1. The image file is read from your local filesystem
2. It is encoded as base64
3. It is sent to your configured AI provider's API
4. It is processed by the vision model
5. The response is returned to you
Important: Images are sent to external AI provider APIs (Anthropic, OpenAI, Google, etc.).
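The first two steps (read the file, encode it as base64) can be sketched with the standard library alone. The helper name `encode_image` is hypothetical, and the exact payload Consoul builds for each provider is not shown here.

```python
import base64
import mimetypes
from pathlib import Path


def encode_image(path):
    """Read an image file and return (mime_type, base64_text)."""
    file_path = Path(path)
    # Guess the MIME type from the file extension; fall back to a generic type.
    mime_type, _ = mimetypes.guess_type(file_path.name)
    raw = file_path.read_bytes()
    # b64encode returns bytes; decode to text so it can be embedded in JSON.
    encoded = base64.b64encode(raw).decode("ascii")
    return mime_type or "application/octet-stream", encoded
```

Everything in the returned string leaves your machine when a cloud provider is configured, which is why reviewing the file first matters.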
Security Features¶
Path Blocking: Sensitive directories are blocked by default:
tools:
  image_analysis:
    blocked_paths:
      - "~/.ssh"
      - "/etc"
      - "~/.aws"
      - "~/.config/consoul"  # Prevent leaking API keys
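One way to think about path blocking is as a containment test on fully resolved paths. The sketch below is an assumption about how such a check could work, not Consoul's implementation; `is_blocked` and `BLOCKED_PATHS` are names invented for this example.

```python
from pathlib import Path

# Mirrors the `blocked_paths` example above; adjust to match your own config.
BLOCKED_PATHS = ["~/.ssh", "/etc", "~/.aws", "~/.config/consoul"]


def is_blocked(image_path, blocked_paths=BLOCKED_PATHS):
    """Return True if `image_path` is a blocked directory or lies inside one."""
    # Expand "~" and resolve ".." and symlinks, so that a path such as
    # "~/pics/../.ssh/key.png" is still recognized as living under ~/.ssh.
    target = Path(image_path).expanduser().resolve()
    for blocked in blocked_paths:
        root = Path(blocked).expanduser().resolve()
        if target == root or root in target.parents:
            return True
    return False
```

Resolving before comparing is the important design choice: a plain string prefix check would miss `..` segments and symlinks.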
File Validation:
- Extension checking (prevent non-images)
- Magic byte validation (prevent extension spoofing)
- Size limits (prevent large uploads)
- Path traversal prevention (block ../ attacks)
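The extension, magic byte, and size checks listed above can be illustrated with a short sketch. The function `validate_image` and its return format are hypothetical, and the sketch assumes 1 MB means 1024 × 1024 bytes; Consoul's own validation may differ on both points.

```python
from pathlib import Path

# Leading bytes ("magic numbers") for each extension in `allowed_extensions`.
MAGIC_BYTES = {
    ".png": (b"\x89PNG\r\n\x1a\n",),
    ".jpg": (b"\xff\xd8\xff",),
    ".jpeg": (b"\xff\xd8\xff",),
    ".gif": (b"GIF87a", b"GIF89a"),
    ".webp": (b"RIFF",),  # bytes 8-11 must additionally be b"WEBP"
}


def validate_image(path, max_size_mb=5.0):
    """Return a list of problems; an empty list means the file passed."""
    file_path = Path(path)
    ext = file_path.suffix.lower()
    # Extension check: reject anything that is not a known image type.
    if ext not in MAGIC_BYTES:
        return [f"extension {ext!r} is not allowed"]
    problems = []
    # Size check (assumes 1 MB = 1024 * 1024 bytes).
    if file_path.stat().st_size > max_size_mb * 1024 * 1024:
        problems.append(f"file is larger than {max_size_mb} MB")
    # Magic byte check: the file's first bytes must match its extension.
    with file_path.open("rb") as handle:
        header = handle.read(12)
    matches = header.startswith(MAGIC_BYTES[ext])
    if ext == ".webp":
        matches = matches and header[8:12] == b"WEBP"
    if not matches:
        problems.append("file contents do not match the extension")
    return problems
```

A script renamed to `fake.png` passes the extension check but fails the magic byte check, which is the spoofing case the list above refers to.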
Privacy Best Practices¶
- Review before sending - Check which files you're attaching
- Redact sensitive info - Edit screenshots to remove passwords, tokens
- Use local models - Consider Ollama with llava for fully local processing
- Check provider policies - Review data retention policies for Claude, OpenAI, etc.
Configuration Reference¶
ImageAnalysisToolConfig¶
Full configuration options:
tools:
  image_analysis:
    # Enable/disable the feature
    enabled: true
    # Automatically detect image paths in messages (e.g., "analyze screenshot.png")
    auto_detect_in_messages: true
    # Maximum file size per image (MB)
    max_image_size_mb: 5.0
    # Maximum number of images per query
    max_images_per_query: 5
    # Allowed file extensions
    allowed_extensions:
      - ".png"
      - ".jpg"
      - ".jpeg"
      - ".gif"
      - ".webp"
    # Blocked paths (security)
    blocked_paths:
      - "~/.ssh"
      - "/etc"
      - "~/.aws"
      - "~/.config/consoul"
      - "/System"   # macOS system files
      - "/Windows"  # Windows system files
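If you generate or check this configuration from a script, the options above map naturally onto a small data class. The class below is a hypothetical stand-in written for this page, not Consoul's own ImageAnalysisToolConfig; the field names and defaults are copied from the YAML above, and the validation rules are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ImageAnalysisSettings:
    """Illustrative mirror of the documented options (not Consoul's class)."""

    enabled: bool = True
    auto_detect_in_messages: bool = True
    max_image_size_mb: float = 5.0
    max_images_per_query: int = 5
    allowed_extensions: list = field(
        default_factory=lambda: [".png", ".jpg", ".jpeg", ".gif", ".webp"]
    )
    blocked_paths: list = field(
        default_factory=lambda: [
            "~/.ssh", "/etc", "~/.aws", "~/.config/consoul", "/System", "/Windows"
        ]
    )

    def __post_init__(self):
        # Reject values that could never describe a usable configuration.
        if self.max_image_size_mb <= 0:
            raise ValueError("max_image_size_mb must be positive")
        if self.max_images_per_query < 1:
            raise ValueError("max_images_per_query must be at least 1")
        # Normalize extensions to lowercase with a leading dot.
        self.allowed_extensions = [
            ext.lower() if ext.startswith(".") else f".{ext.lower()}"
            for ext in self.allowed_extensions
        ]
```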
Provider-Specific Considerations¶
Anthropic (Claude):
- Best for detailed analysis and reasoning
- Supports up to 5 images per request
- Max image size: 5 MB (base64 encoded)
OpenAI (GPT-4o):
- Fast processing
- Good for general image understanding
- Supports multiple images
Google (Gemini):
- Strong for technical diagrams
- Supports large context windows
- Native image understanding
Ollama (LLaVA):
- Fully local, no data sent to cloud
- Requires more VRAM (8GB+)
- Slower than cloud models
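The Anthropic note above states the 5 MB limit in terms of base64-encoded size. Base64 turns every 3 input bytes into 4 output characters, so an encoded image is about a third larger than the file on disk. The sketch below works through that arithmetic; the function names are invented for this example, and 1 MB is assumed to mean 1024 × 1024 bytes.

```python
def base64_size(raw_bytes):
    """Length of the padded base64 encoding of `raw_bytes` bytes of input."""
    # Every group of up to 3 input bytes becomes 4 output characters.
    return 4 * ((raw_bytes + 2) // 3)


def max_raw_size(encoded_limit):
    """Largest raw size whose padded base64 encoding fits in `encoded_limit`."""
    return (encoded_limit // 4) * 3


if __name__ == "__main__":
    limit = 5 * 1024 * 1024  # 5 MB, assuming binary megabytes
    print(max_raw_size(limit) / (1024 * 1024))  # raw megabytes that still fit
```

Under these assumptions, a file of roughly 3.75 MB on disk already reaches a 5 MB encoded limit, so an image can pass a local 5 MB size check and still be rejected by a provider that measures the encoded payload.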
Troubleshooting¶
"Model doesn't support vision"¶
Problem: Your current model doesn't have vision capabilities.
Solution: Switch to a vision-capable model:
Supported models:
- Claude: claude-3-5-sonnet-20241022, claude-3-opus-20240229
- OpenAI: gpt-4o, gpt-4o-mini
- Google: gemini-2.0-flash, gemini-1.5-pro
- Ollama: llava:latest
"File too large"¶
Problem: Image exceeds max_image_size_mb limit.
Solutions:
1. Compress the image using tools like ImageOptim or TinyPNG
2. Resize to a smaller resolution
3. Increase max_image_size_mb in your config (max 20 MB)
"Invalid file extension"¶
Problem: File type not in allowed_extensions.
Solution: Ensure the file has a valid image extension:
- .png, .jpg, .jpeg, .gif, .webp
If you have an unusual format, convert it:
# Convert WebP to PNG
ffmpeg -i image.webp image.png
# Convert HEIC to JPG (macOS)
sips -s format jpeg image.heic --out image.jpg
"Blocked path"¶
Problem: Image is in a security-blocked directory.
Solution:
1. Copy the image to a safe location
2. Or remove the path from blocked_paths (not recommended for sensitive dirs)
Images not detected automatically¶
Problem: Typing screenshot.png doesn't attach the image.
Solutions:
1. Enable auto-detection by setting auto_detect_in_messages: true in your config
2. Use absolute or relative paths
3. Use the 📎 attachment button instead
"Image analysis failed"¶
Problem: Generic error during analysis.
Debugging steps:
1. Check the file exists: ls -la screenshot.png
2. Verify it's a valid image: file screenshot.png
3. Check file size: du -h screenshot.png
4. Check permissions: Ensure the file is readable
5. Try a different image format
6. Check logs for detailed error:
Advanced Usage¶
Batch Image Analysis¶
Analyze multiple images in one query:
Or using the attachment button:
1. Click 📎
2. Select multiple files (Cmd/Ctrl + Click)
3. Ask your question
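When a folder holds more images than max_images_per_query allows, the files have to be split across several queries. The sketch below groups a directory's images into batches of at most five; `batch_images` is a hypothetical helper, not part of Consoul.

```python
from pathlib import Path


def batch_images(directory, max_per_query=5,
                 extensions=(".png", ".jpg", ".jpeg", ".gif", ".webp")):
    """Yield lists of image paths, each no longer than `max_per_query`."""
    # Sort so that batches are stable from one run to the next.
    images = sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    )
    for start in range(0, len(images), max_per_query):
        yield images[start:start + max_per_query]
```

Each yielded list can then be used for one query, for example by passing the paths (converted to strings) as the image_paths argument shown under Programmatic Usage.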
Mixing Images and Code¶
Attach images alongside code files:
Custom File Size Limits¶
For high-resolution diagrams:
tools:
  image_analysis:
    max_image_size_mb: 20.0  # Increase for large technical diagrams
    max_images_per_query: 3  # Reduce count to stay under API limits
Programmatic Usage¶
Use image analysis in Python scripts:
from consoul.sdk import Consoul

# Initialize with vision-capable model
consoul = Consoul(model="claude-3-5-sonnet-20241022")

# Analyze an image
response = consoul.chat(
    "What error is shown in this screenshot?",
    image_paths=["terminal_error.png"]
)
print(response)
See docs/examples/image-analysis-example.py for more examples.
Best Practices¶
1. Use Descriptive Queries¶
❌ Bad: "What's this?"
✅ Good: "Analyze this error screenshot and suggest a fix"
2. Provide Context¶
❌ Bad: "Is this good?"
✅ Good: "Review this dashboard mockup for a healthcare app. Is it accessible?"
3. Use High-Quality Images¶
- Clear screenshots (avoid blurry photos of screens)
- Sufficient resolution (at least 800x600)
- Good contrast (readable text)
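The resolution guideline above can be checked before sending a screenshot. For PNG files, the width and height are stored in the IHDR chunk right after the 8-byte file signature, so they can be read without an imaging library. This is a minimal sketch for PNG only; the function names are made up for this example.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(path):
    """Return (width, height) read from a PNG file's IHDR chunk."""
    with open(path, "rb") as handle:
        header = handle.read(24)
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type "IHDR",
    # then width and height as big-endian unsigned 32-bit integers.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    return struct.unpack(">II", header[16:24])


def meets_minimum(path, min_width=800, min_height=600):
    """Return True if the PNG is at least `min_width` x `min_height` pixels."""
    width, height = png_dimensions(path)
    return width >= min_width and height >= min_height
```

Other formats store their dimensions differently, so JPEG, GIF, and WebP files would need their own parsing or an imaging library.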
4. Organize by Use Case¶
Create dedicated directories:
5. Combine with Other Tools¶
Image analysis works great with file editing:
Examples by Use Case¶
Software Development¶
Debug Terminal Output:
Code Review:
Architecture Understanding:
Design & UX¶
Accessibility Audit:
Design Comparison:
Responsive Design:
Documentation¶
Diagram Documentation:
Screenshot Annotation:
Related Documentation¶
- Getting Started - Initial setup and configuration
- Configuration Guide - Detailed config options
- File Editing - Combine vision with file operations
- Tools Overview - All available tools
- SDK Tool Calling - Programmatic usage
FAQs¶
Q: Does image analysis work offline?
A: Only with local models like Ollama's LLaVA. Cloud providers (Claude, GPT-4o, Gemini) require internet.
Q: Can I analyze videos?
A: Not directly. Extract frames as images first using ffmpeg.
Q: Are images cached or stored?
A: No. Images are read, encoded, sent to the API, then discarded. Consoul doesn't cache images.
Q: What about image generation (DALL-E, Midjourney)?
A: Not currently supported. This feature is for analyzing existing images only.
Q: Can I use custom vision models?
A: Yes, if they're compatible with LangChain's multimodal message format. See the SDK documentation.
Q: Is there a cost for image analysis?
A: Cloud providers may charge more for vision API calls. Check pricing:
- Anthropic Pricing
- OpenAI Pricing
- Google AI Pricing
Q: Can I disable image analysis?
A: Yes, set tools.image_analysis.enabled: false in your config.
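For the video question above, frames can be pulled out with ffmpeg's fps filter before analysis. The sketch below only builds and runs the command; the helper names and the default output pattern are choices made for this example, and ffmpeg must already be installed.

```python
import subprocess


def frame_extraction_command(video_path, output_pattern="frame_%04d.png", fps=1):
    """Build an ffmpeg command that saves `fps` frames per second as PNGs."""
    return ["ffmpeg", "-i", str(video_path), "-vf", f"fps={fps}", output_pattern]


def extract_frames(video_path, **options):
    """Run the command; raises CalledProcessError if ffmpeg reports an error."""
    subprocess.run(frame_extraction_command(video_path, **options), check=True)
```

Even at one frame per second a short clip produces many files, so pick a handful of frames to stay within max_images_per_query.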
See Also¶
Other Tools:
- Code Search - Find code referenced in images
- File Editing - Fix issues discovered visually
SDK & API:
- SDK Tools Overview - Using image analysis programmatically
- Tool Configuration - Configuring vision tools in your code
Configuration:
- Configuration Guide - Enable/disable image analysis
- Vision-Capable Models - Supported AI models
Feedback & Support¶
Having issues? Found a bug?