Screen Capture
screenpipe captures your screen using an intelligent event-driven system that triggers on actual user activity instead of continuous polling. This approach reduces CPU usage by 3-5x while ensuring you never miss important context.

How It Works
Event-Driven Architecture
Instead of continuously capturing at a fixed FPS, screenpipe monitors real user events and triggers captures intelligently.

Event Detection
The system listens for meaningful user interactions (see the debounce sketch after this list):
- App switches (300ms debounce)
- Window focus changes (300ms settle)
- Mouse clicks (200ms)
- Typing pauses (500ms after last keystroke)
- Scroll stops (400ms after last scroll)
- Clipboard copy (200ms)
- Idle fallback (every 5s for passive changes)
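
Conceptually, each trigger is a debounce window: the capture fires only after the interaction has settled. A minimal sketch, assuming a simple timer-based debouncer (the type and its names are illustrative, not screenpipe's actual code):

```rust
use std::time::{Duration, Instant};

/// Fires only once `window` has elapsed since the latest event,
/// mirroring e.g. the 500ms typing-pause rule above.
struct Debouncer {
    window: Duration,
    last_event: Option<Instant>,
}

impl Debouncer {
    fn new(window: Duration) -> Self {
        Self { window, last_event: None }
    }

    /// Record an interaction (keystroke, scroll tick, ...).
    fn record(&mut self) {
        self.last_event = Some(Instant::now());
    }

    /// True once the quiet period has passed since the last event.
    fn should_fire(&self) -> bool {
        self.last_event
            .map(|t| t.elapsed() >= self.window)
            .unwrap_or(false)
    }
}

fn main() {
    let mut typing = Debouncer::new(Duration::from_millis(500));
    typing.record(); // a keystroke arrives
    std::thread::sleep(Duration::from_millis(600));
    if typing.should_fire() {
        println!("typing pause detected -> trigger capture");
    }
}
```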
Screenshot Capture
When an event triggers, screenpipe captures the affected monitor using platform-native APIs (sketched after this list):
- macOS: ScreenCaptureKit (~5ms)
- Windows: DXGI/GDI
- Linux: XCB/Wayland
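
The per-platform backends can be pictured as one function compiled per target OS. A sketch under that assumption, with placeholder bodies (capture_native is a hypothetical name, not screenpipe's API):

```rust
/// Hypothetical per-OS dispatch; the real crate drives the
/// platform APIs named above instead of returning empty buffers.
#[cfg(target_os = "macos")]
fn capture_native(_monitor: u32) -> Vec<u8> {
    // Would use ScreenCaptureKit (~5ms per frame).
    Vec::new()
}

#[cfg(target_os = "windows")]
fn capture_native(_monitor: u32) -> Vec<u8> {
    // Would use DXGI duplication, falling back to GDI.
    Vec::new()
}

#[cfg(target_os = "linux")]
fn capture_native(_monitor: u32) -> Vec<u8> {
    // Would use XCB on X11 or the Wayland capture path.
    Vec::new()
}
```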
Text Extraction
Accessibility tree extraction runs first (~10-200ms), with OCR as fallback if needed.
Hard Constraints: Minimum 200ms interval between captures per monitor prevents event storms. Maximum 10s gap ensures passive changes (notifications, incoming messages) are captured.
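
These two constraints amount to a per-monitor rate limiter with a floor and a ceiling. A minimal sketch, assuming a gate checked before each capture (the names are assumptions):

```rust
use std::time::{Duration, Instant};

/// Per-monitor gate: a 200ms floor between captures,
/// plus a 10s ceiling that forces an idle capture.
struct CaptureGate {
    last_capture: Instant,
}

impl CaptureGate {
    const MIN_INTERVAL: Duration = Duration::from_millis(200);
    const MAX_GAP: Duration = Duration::from_secs(10);

    /// Event-triggered captures are allowed once 200ms have passed.
    fn allow_event_capture(&self) -> bool {
        self.last_capture.elapsed() >= Self::MIN_INTERVAL
    }

    /// An idle capture is forced if nothing has fired for 10s.
    fn must_capture_now(&self) -> bool {
        self.last_capture.elapsed() >= Self::MAX_GAP
    }
}
```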
Capture Triggers
Each event type is optimized for a different interaction pattern, using the debounce and settle windows listed under Event Detection above.

Accessibility Tree Extraction
Before running OCR, screenpipe extracts structured text from the accessibility tree, the same API used by screen readers.

Why Accessibility First?
- Performance: 10-50ms on macOS vs 100-500ms for OCR, with a 200-350ms timeout to prevent blocking on massive trees
- Accuracy: native system APIs (AX API on macOS, UI Automation on Windows) return the exact text the app exposes, not a pixel-level guess
- Context: the tree preserves structure (windows, roles, hierarchy) that flat OCR output loses
OCR Fallback
When accessibility extraction returns empty (image-heavy apps, PDFs, videos), OCR activates automatically; a sketch of the fallback chain follows the list. Automatic fallback covers:
- Design tools (Figma, Photoshop, Sketch)
- PDF viewers rendering as canvas
- Video players with text overlays
- Games and other non-standard apps
- Apps with broken accessibility support
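
The decision is simply "try the tree, run OCR only if it comes back empty". A minimal sketch, with hypothetical extractor functions standing in for the real platform calls:

```rust
/// Accessibility-first text extraction with OCR fallback.
fn extract_text(frame_jpeg: &[u8]) -> String {
    let ax = accessibility_text(); // fast path: ~10-200ms
    if !ax.trim().is_empty() {
        return ax;
    }
    ocr_text(frame_jpeg) // slow path, only when the tree is empty
}

// Placeholder extractors so the sketch compiles; the real ones
// call the AX API / UI Automation and the OCR engine.
fn accessibility_text() -> String {
    String::new()
}

fn ocr_text(_jpeg: &[u8]) -> String {
    String::from("<ocr output>")
}
```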
Snapshot Storage
Each capture writes a JPEG directly to disk: no video encoding, no FFmpeg overhead.
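
The write path is therefore a plain file write. A minimal sketch, with an illustrative directory layout and file naming:

```rust
use std::{fs, io, path::Path};

/// One JPEG per capture; nothing to transcode, no encoder in the loop.
fn write_snapshot(data_dir: &Path, frame_id: u64, jpeg: &[u8]) -> io::Result<()> {
    let path = data_dir.join(format!("{frame_id}.jpg"));
    fs::write(path, jpeg)
}
```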
File Layout
Frames are stored as individual JPEG files and served directly at /frames/{id}.jpg.
Storage Math (8 hours active use, 1080p, JPEG quality 80, roughly 80 KB per frame):
- Event-driven: ~3,840 frames → ~300 MB total
- Old continuous capture (0.5-1 FPS): 14,400-28,800 frames → 1.1-2.3 GB
Database Schema
- Snapshots served in less than 5ms (no FFmpeg extraction)
- Direct URL serving (/frames/{id}.jpg)
- Search results always show correct thumbnails
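
For orientation, this design implies one metadata row per frame pointing at the JPEG on disk. A hypothetical minimal schema, assuming SQLite via the rusqlite crate (table and column names are assumptions, not screenpipe's actual schema):

```rust
use rusqlite::{Connection, Result};

/// Create a minimal frames table: one row per snapshot,
/// with the JPEG itself living on disk, not in the database.
fn init_db(path: &str) -> Result<Connection> {
    let conn = Connection::open(path)?;
    conn.execute(
        "CREATE TABLE IF NOT EXISTS frames (
            id          INTEGER PRIMARY KEY,
            captured_at TEXT NOT NULL,
            monitor_id  INTEGER NOT NULL,
            file_path   TEXT NOT NULL,  -- e.g. frames/{id}.jpg
            text        TEXT            -- accessibility/OCR extraction
         )",
        [],
    )?;
    Ok(conn)
}
```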
Multi-Monitor Support
Events are monitor-specific where possible (see the routing sketch after these lists):

Focused Captures
- Click/scroll: capture the monitor where the cursor is
- App switch: capture the monitor with the newly focused window
- Typing pause: capture the monitor with the focused window

Idle Captures
- The idle fallback (every 5s) picks up passive changes such as notifications and incoming messages
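
Routing reduces to a match on the event type. A sketch with illustrative event variants (not screenpipe's actual types):

```rust
/// Which monitor should a given event capture?
enum Event {
    Click { cursor_monitor: u32 },
    AppSwitch { focused_monitor: u32 },
    TypingPause { focused_monitor: u32 },
    IdleTick, // periodic sweep, not tied to one monitor
}

fn target_monitor(event: &Event) -> Option<u32> {
    match event {
        Event::Click { cursor_monitor } => Some(*cursor_monitor),
        Event::AppSwitch { focused_monitor }
        | Event::TypingPause { focused_monitor } => Some(*focused_monitor),
        Event::IdleTick => None,
    }
}
```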
Performance
CPU Usage
Measured in both scenarios, idle (static screen) and active use (browsing):
- Event-driven: less than 0.5% CPU
- Continuous polling: 3-5% CPU
Frame Serve Latency
Because snapshots are plain JPEGs on disk, frames are served in under 5ms, with no FFmpeg extraction step.
Settings
Reference
Source files:
- Event-driven spec: docs/EVENT_DRIVEN_CAPTURE_SPEC.md
- Core capture: crates/screenpipe-screen/src/core.rs
- Accessibility (macOS): crates/screenpipe-screen/src/apple.rs
- Accessibility (Windows): crates/screenpipe-screen/src/microsoft.rs
- Snapshot writer: crates/screenpipe-screen/src/snapshot_writer.rs