Screen Capture

screenpipe captures your screen using an intelligent event-driven system that triggers on actual user activity instead of continuous polling. This approach reduces CPU usage by 3-5x while ensuring you never miss important context.

How It Works

Event-Driven Architecture

Instead of continuously capturing at a fixed FPS, screenpipe monitors real user events and triggers captures intelligently:
1. Event Detection

The system listens for meaningful user interactions (see the sketch after this list):
  • App switches (300ms debounce)
  • Window focus changes (300ms settle)
  • Mouse clicks (200ms)
  • Typing pauses (500ms after last keystroke)
  • Scroll stops (400ms after last scroll)
  • Clipboard copy (200ms)
  • Idle fallback (every 5s for passive changes)
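
A minimal sketch of those debounce windows, assuming a hypothetical EventKind enum rather than screenpipe's real event types:
// Per-event debounce/settle windows, mirroring the list above (illustrative only)
use std::time::Duration;

enum EventKind {
    AppSwitch,
    WindowFocus,
    Click,
    TypingPause,
    ScrollStop,
    ClipboardCopy,
    Idle,
}

fn debounce_window(kind: &EventKind) -> Duration {
    match kind {
        EventKind::AppSwitch | EventKind::WindowFocus => Duration::from_millis(300),
        EventKind::Click | EventKind::ClipboardCopy => Duration::from_millis(200),
        EventKind::TypingPause => Duration::from_millis(500),
        EventKind::ScrollStop => Duration::from_millis(400),
        EventKind::Idle => Duration::from_secs(5),
    }
}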

2. Screenshot Capture

When an event triggers, screenpipe captures the affected monitor using platform-native APIs (see the dispatch sketch after this list):
  • macOS: ScreenCaptureKit (~5ms)
  • Windows: DXGI/GDI
  • Linux: XCB/Wayland
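
A minimal dispatch sketch over these backends; the capture_* helpers are illustrative stand-ins, not screenpipe's actual functions:
// Route a single-monitor capture to the platform backend (sketch)
pub fn capture_monitor(monitor_id: u32) -> anyhow::Result<image::DynamicImage> {
    #[cfg(target_os = "macos")]
    {
        capture_screencapturekit(monitor_id) // ScreenCaptureKit, ~5ms per capture
    }
    #[cfg(target_os = "windows")]
    {
        capture_dxgi_or_gdi(monitor_id) // DXGI desktop duplication, GDI fallback
    }
    #[cfg(target_os = "linux")]
    {
        capture_xcb_or_wayland(monitor_id) // XCB on X11, Wayland otherwise
    }
}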

3. Text Extraction

Accessibility tree extraction runs first (~10-200ms), with OCR as fallback if needed.
4. Storage

Each capture is saved as a JPEG file with metadata stored in the database.
Hard constraints: a minimum 200ms interval between captures per monitor prevents event storms, and a maximum 10s gap ensures passive changes (notifications, incoming messages) are still captured.
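
A minimal sketch of how those two constraints could be enforced per monitor; the CaptureGate type and its methods are assumptions, not screenpipe's real code:
// Per-monitor capture gate: at most one capture every 200ms, at least one every 10s
use std::time::{Duration, Instant};

struct CaptureGate {
    last_capture: Instant,
}

impl CaptureGate {
    const MIN_INTERVAL: Duration = Duration::from_millis(200);
    const MAX_GAP: Duration = Duration::from_secs(10);

    /// Should this incoming event be allowed to trigger a capture right now?
    fn allow_event(&self) -> bool {
        self.last_capture.elapsed() >= Self::MIN_INTERVAL
    }

    /// Has it been so long that we must capture even without an event?
    fn force_capture(&self) -> bool {
        self.last_capture.elapsed() >= Self::MAX_GAP
    }

    fn mark_captured(&mut self) {
        self.last_capture = Instant::now();
    }
}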

Capture Triggers

Each event type is optimized for different interaction patterns:
// Capture triggers, roughly ordered by value of the captured context
use std::time::{Duration, Instant};

pub enum EventTrigger {
    /// App switch: highest-value event; the user changed context (300ms settle)
    AppSwitch { monitor_id: u32, timestamp: Instant },

    /// Typing pause: capture the result of typing, not every character (500ms debounce)
    TypingPause { monitor_id: u32, timestamp: Instant, debounce: Duration },

    /// Scroll stop: new content scrolled into view (400ms debounce)
    ScrollStop { monitor_id: u32, timestamp: Instant, debounce: Duration },

    /// Idle fallback: catch passive changes (5s max gap)
    Idle { monitor_id: u32, max_gap: Duration },
}

Accessibility Tree Extraction

Before running OCR, screenpipe extracts structured text from the accessibility tree — the same API used by screen readers.

Why Accessibility First?

  • 10-50ms on macOS vs 100-500ms for OCR
  • 200-350ms timeout prevents blocking on massive trees
  • Native system APIs (AX API on macOS, UI Automation on Windows)
// macOS accessibility tree extraction
pub async fn walk_focused_window() -> Result<AccessibilityText> {
    let app = get_frontmost_app();
    let window = get_focused_window(&app);

    // 200ms hard timeout so massive trees never block the capture path
    timeout(Duration::from_millis(200), async {
        extract_text_recursive(window)
    })
    .await?
}

OCR Fallback

When accessibility extraction returns no text (image-heavy apps, PDFs, videos), OCR activates automatically. Automatic fallback covers:
  • Design tools (Figma, Photoshop, Sketch)
  • PDF viewers rendering as canvas
  • Video players with text overlays
  • Games and other non-standard apps
  • Apps with broken accessibility support
// Paired capture: accessibility → OCR fallback
pub async fn paired_capture(
    image: &DynamicImage,
    monitor_id: u32,
) -> Result<CaptureResult> {
    // Try the accessibility tree first; treat any failure as "no text"
    let ax_text = walk_focused_window()
        .await
        .map(|ax| ax.text)
        .unwrap_or_default();

    if !ax_text.is_empty() {
        return Ok(CaptureResult {
            text: ax_text,
            text_source: "accessibility",
            confidence: 1.0,
        });
    }

    // Fall back to OCR on the captured image
    let ocr_result = process_ocr_task(image).await?;
    Ok(CaptureResult {
        text: ocr_result.text,
        text_source: "ocr",
        confidence: ocr_result.confidence,
    })
}

Snapshot Storage

Each capture writes a JPEG directly to disk — no video encoding, no FFmpeg overhead.

File Layout

~/.screenpipe/data/
  2024-03-08/
    1709900823123_m0.jpg  # timestamp_ms + monitor ID
    1709900825456_m0.jpg
    1709900827789_m1.jpg  # monitor 1 screenshot
    ...
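
A minimal sketch of how a snapshot file could be written under this layout; the function names, the date-folder argument, and the pre-encoded JPEG bytes are assumptions:
// Build ~/.screenpipe/data/<YYYY-MM-DD>/<timestamp_ms>_m<monitor_id>.jpg and write it
use std::fs;
use std::path::{Path, PathBuf};

fn snapshot_path(data_dir: &Path, date_dir: &str, timestamp_ms: u64, monitor_id: u32) -> PathBuf {
    data_dir
        .join(date_dir)
        .join(format!("{timestamp_ms}_m{monitor_id}.jpg"))
}

fn write_snapshot(path: &Path, jpeg_bytes: &[u8]) -> std::io::Result<()> {
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?; // one folder per day
    }
    fs::write(path, jpeg_bytes) // plain JPEG on disk, no video container
}
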
Storage Math (8 hours active use, 1080p, JPEG quality 80):
  • Event-driven: ~3,840 frames → ~300 MB total
  • Old continuous (0.5-1 FPS): 14,400-28,800 frames → 1.1-2.3 GB
Fewer frames, slightly larger each, far less total storage.

Database Schema

ALTER TABLE frames ADD COLUMN snapshot_path TEXT;
ALTER TABLE frames ADD COLUMN accessibility_text TEXT;
ALTER TABLE frames ADD COLUMN capture_trigger TEXT;  -- 'app_switch', 'click', etc.
ALTER TABLE frames ADD COLUMN text_source TEXT;      -- 'ocr' or 'accessibility'

CREATE INDEX idx_frames_ts_device ON frames(timestamp, device_name);
Metadata lives in the database; the JPEG is just pixels. This means (see the lookup sketch after this list):
  • Snapshots served in less than 5ms (no FFmpeg extraction)
  • Direct URL serving (/frames/{id}.jpg)
  • Search results always show correct thumbnails
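
A minimal lookup sketch under the schema above, assuming rusqlite and an integer id primary key on the frames table; the function itself is illustrative:
// Metadata lookup in SQLite, pixel data straight from disk: no FFmpeg on this path
use rusqlite::Connection;

fn load_snapshot(conn: &Connection, frame_id: i64) -> anyhow::Result<Vec<u8>> {
    // Resolve the frame's snapshot path from the database
    let path: String = conn.query_row(
        "SELECT snapshot_path FROM frames WHERE id = ?1",
        [frame_id],
        |row| row.get(0),
    )?;
    // Read the JPEG bytes directly; this is the <5ms serve path
    Ok(std::fs::read(path)?)
}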

Multi-Monitor Support

Events are monitor-specific where possible:
  • Click/scroll: capture the monitor where the cursor is
  • App switch: capture monitor with newly focused window
  • Typing pause: capture monitor with focused window
3+ monitor setups: Event-driven capture significantly reduces overhead by capturing only the active monitor per event instead of all monitors simultaneously.

Performance

CPU Usage

  • Event-driven: less than 0.5% CPU
  • Continuous polling: 3-5% CPU
The event-driven system sleeps until events occur; no frame-comparison loops run in the background.

Frame Serve Latency

# New snapshot frames
Direct JPEG serve: <5ms

# Old video-chunk frames (backward compat)
FFmpeg extraction: 100-500ms (3-permit semaphore)
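
A minimal sketch of the 3-permit throttle on that legacy path, assuming tokio; run_ffmpeg_extract and the surrounding names are hypothetical:
// Cap concurrent FFmpeg extractions at 3 so legacy frame requests cannot swamp the CPU
// (share the semaphore across handlers, e.g. let permits = Arc::new(Semaphore::new(3)))
use std::sync::Arc;
use tokio::sync::Semaphore;

async fn extract_legacy_frame(
    permits: Arc<Semaphore>,
    chunk_path: &str,
    offset_ms: u64,
) -> anyhow::Result<Vec<u8>> {
    // Waits here whenever three extractions are already in flight
    let _permit = permits.acquire_owned().await?;
    run_ffmpeg_extract(chunk_path, offset_ms).await // hypothetical helper
}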

Settings

{
  "sensitivity": "medium",  // low, medium, high
  "modes": {
    "low": {
      "debounce": "500ms",
      "idle_gap": "10s",
      "use_case": "laptop battery mode"
    },
    "medium": {
      "debounce": "200ms",
      "idle_gap": "5s",
      "use_case": "default — balanced performance"
    },
    "high": {
      "debounce": "100ms",
      "idle_gap": "3s",
      "use_case": "maximum recall"
    }
  }
}
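
A minimal sketch of how these settings could be modeled with serde; the struct and field names are assumptions, and the duration strings ("200ms", "5s") are left unparsed:
// Illustrative settings model matching the JSON above
use std::collections::HashMap;
use serde::Deserialize;

#[derive(Deserialize)]
struct CaptureSettings {
    sensitivity: Sensitivity,
    modes: HashMap<String, ModeConfig>,
}

#[derive(Deserialize)]
#[serde(rename_all = "lowercase")]
enum Sensitivity {
    Low,
    Medium,
    High,
}

#[derive(Deserialize)]
struct ModeConfig {
    debounce: String,   // e.g. "200ms"
    idle_gap: String,   // e.g. "5s"
    use_case: String,
}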

Reference

Source files:
  • Event-driven spec: docs/EVENT_DRIVEN_CAPTURE_SPEC.md
  • Core capture: crates/screenpipe-screen/src/core.rs
  • Accessibility (macOS): crates/screenpipe-screen/src/apple.rs
  • Accessibility (Windows): crates/screenpipe-screen/src/microsoft.rs
  • Snapshot writer: crates/screenpipe-screen/src/snapshot_writer.rs