screenpipe continuously records and transcribes audio from all your devices — microphones, speakers, and system audio — with automatic speaker identification and smart deduplication.
```rust
params.set_no_speech_thold(0.6);  // Suppress when no speech detected
params.set_suppress_blank(true);  // No blank/silence tokens at start
params.set_suppress_nst(true);    // No music notes, special chars
params.set_entropy_thold(2.4);    // Drop repetitive/looping output
params.set_logprob_thold(-2.0);   // Drop low-confidence segments
```
Whisper’s initial_prompt parameter biases the model toward these words without forcing them. It’s not a find-and-replace — it improves recognition accuracy.
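As a sketch, this might look like one more setting alongside the parameters above, assuming whisper-rs's `FullParams` exposes a `set_initial_prompt` setter (the vocabulary string here is purely illustrative):

```rust
// Bias recognition toward domain vocabulary without forcing matches
// (illustrative terms; assumes whisper-rs's set_initial_prompt setter)
params.set_initial_prompt("screenpipe, Whisper, VAD, diarization");
```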
1. **VAD Segmentation** — Split the transcription into voice activity segments (pauses indicate speaker changes).
2. **Embedding Extraction** — Compute voice embeddings (numerical fingerprints) for each segment.
3. **Speaker Clustering** — Group similar embeddings into speaker clusters with cosine similarity.
4. **Speaker Assignment** — Assign each segment to a speaker ID (Speaker 0, Speaker 1, etc.).
```rust
pub struct EmbeddingExtractor {
    session: Session,
}

impl EmbeddingExtractor {
    pub fn compute(&mut self, samples: &[f32]) -> Result<impl Iterator<Item = f32>> {
        // Compute fbank features (mel-frequency filterbank)
        let features: Array2<f32> = knf_rs::compute_fbank(samples)?;
        let features = features.insert_axis(ndarray::Axis(0));

        // Run ONNX model
        let inputs = ort::inputs!["feats" => features.view()]?;
        let ort_outs = self.session.run(inputs)?;

        // Extract embedding vector
        let embeddings = ort_outs
            .get("embs")
            .context("output tensor not found")?
            .try_extract_tensor::<f32>()?
            .iter()
            .copied()
            .collect::<Vec<_>>();
        Ok(embeddings.into_iter())
    }
}
```
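The clustering and assignment steps (3 and 4) can be sketched with plain cosine similarity and a greedy threshold rule; this is a simplified illustration, not screenpipe's actual implementation, and the threshold value and function names are assumptions:

```rust
/// Cosine similarity between two embedding vectors.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

/// Greedily assign each segment embedding to the most similar existing
/// cluster if it clears the threshold, otherwise start a new cluster.
/// Returns a speaker ID per segment (Speaker 0, Speaker 1, ...).
fn assign_speakers(embeddings: &[Vec<f32>], threshold: f32) -> Vec<usize> {
    let mut centroids: Vec<Vec<f32>> = Vec::new();
    let mut ids = Vec::with_capacity(embeddings.len());
    for emb in embeddings {
        let best = centroids
            .iter()
            .enumerate()
            .map(|(i, c)| (i, cosine_similarity(emb, c)))
            .max_by(|a, b| a.1.total_cmp(&b.1));
        match best {
            Some((i, sim)) if sim >= threshold => ids.push(i),
            _ => {
                centroids.push(emb.clone());
                ids.push(centroids.len() - 1);
            }
        }
    }
    ids
}
```

Production diarization typically also updates centroids as clusters grow and merges clusters after the fact; the greedy version above only shows the core idea.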
Speaker IDs are consistent within a session but may change across restarts. For persistent speaker names, use the speaker management API to label speakers.
screenpipe records from all devices simultaneously:
**Input Devices**

- Built-in microphone
- External USB microphones
- Bluetooth headsets
- Virtual audio inputs (Loopback, BlackHole)

**Output Devices**

- System speakers
- Headphones
- Virtual audio outputs (for capturing app audio)
Output device recording captures what your computer is playing (YouTube, Zoom calls, music). On macOS, this requires a virtual audio device like BlackHole.
screenpipe automatically detects when devices are added or removed:
```rust
// Device monitor runs every 5s
let device_changes = detect_audio_device_changes().await;
if !device_changes.is_empty() {
    // Restart streams for new/removed devices
    reconcile_audio_streams(device_changes).await?;
}
```
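The detection step boils down to a set difference between the devices seen on the previous poll and the devices seen now. A minimal sketch of that idea (the function name and string-based device keys are illustrative, not screenpipe's actual types):

```rust
use std::collections::HashSet;

/// Compare the device set from the previous poll against the current
/// one; returns (added, removed) device names.
fn diff_devices(
    previous: &HashSet<String>,
    current: &HashSet<String>,
) -> (Vec<String>, Vec<String>) {
    let added = current.difference(previous).cloned().collect();
    let removed = previous.difference(current).cloned().collect();
    (added, removed)
}
```

Anything in `added` needs a new capture stream started; anything in `removed` needs its stream torn down.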
Stream hijacking: If another app takes over a microphone (e.g., Wispr Flow), screenpipe detects the timeout and automatically reconnects when the device becomes available again.
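The reconnect behavior can be sketched as a retry loop with capped exponential backoff. This is a simplified synchronous illustration of the pattern, not screenpipe's actual (async) implementation, and all names here are assumptions:

```rust
use std::time::Duration;

/// Retry `open` until it succeeds or `max_attempts` is exhausted,
/// doubling the wait between attempts (capped at 5 seconds).
fn reconnect_with_backoff<F>(mut open: F, max_attempts: u32) -> Result<(), String>
where
    F: FnMut() -> Result<(), String>,
{
    let mut wait = Duration::from_millis(250);
    for attempt in 1..=max_attempts {
        match open() {
            Ok(()) => return Ok(()),
            Err(e) if attempt == max_attempts => return Err(e),
            Err(_) => {
                // Device still held by another app; back off and retry
                std::thread::sleep(wait);
                wait = (wait * 2).min(Duration::from_secs(5));
            }
        }
    }
    Err("no attempts made".to_string())
}
```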