Ollama lets you run AI models locally on your machine. screenpipe integrates natively with Ollama: no API keys, no cloud, completely private.
## why use Ollama with screenpipe?
- 100% local - all AI processing happens on your machine
- no API costs - free to use, no subscription required
- privacy - your screen data never leaves your computer
- offline - works without internet connection
- choice of models - pick from dozens of open-source models
- no rate limits - use as much as you want
## setup

### Install Ollama
download and install from [ollama.com](https://ollama.com); terminal install commands are sketched after the platform list.

supported platforms:
- macOS (Apple Silicon & Intel)
- Linux
- Windows (via WSL)
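a minimal sketch of a terminal install, if you prefer the command line over the ollama.com download (the Linux one-liner is Ollama's official install script):

```bash
# Linux: official install script
curl -fsSL https://ollama.com/install.sh | sh

# macOS: Homebrew package as an alternative to the app download
brew install ollama

# start the server (the desktop app starts it automatically)
ollama serve
```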
### Select Ollama in screenpipe
- open the screenpipe app
- click the AI preset selector (top of chat/timeline)
- click Ollama
- pick your model from the dropdown
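if Ollama doesn't show up in the preset selector, confirm the server is reachable before digging further; this hits Ollama's standard tags endpoint on its default port:

```bash
# returns JSON listing locally pulled models; an empty "models" array
# means Ollama is running but nothing has been pulled yet
curl http://localhost:11434/api/tags
```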
## recommended models

choose a model based on your hardware and needs.

### fast & lightweight
| model | size | RAM needed | best for |
|---|---|---|---|
| `ministral-3` | ~2 GB | 8 GB | fast, general use, great starting point |
| `gemma3:4b` | ~3 GB | 8 GB | strong quality for size, good for summaries |
| `qwen3:4b` | ~3 GB | 8 GB | multilingual, good reasoning |
| `phi4` | ~3 GB | 8 GB | fast, great for code |
### balanced

| model | size | RAM needed | best for |
|---|---|---|---|
| `llama3.1:8b` | ~5 GB | 16 GB | strong all-around performance |
| `deepseek-r1:8b` | ~5 GB | 16 GB | excellent reasoning |
| `mistral:7b` | ~4 GB | 12 GB | good quality, widely used |
### high quality

| model | size | RAM needed | best for |
|---|---|---|---|
| `llama3.3:70b` | ~40 GB | 64 GB+ | best quality, needs high-end hardware |
| `deepseek-r1:70b` | ~40 GB | 64 GB+ | best reasoning, needs high-end hardware |
| `qwen2.5:32b` | ~20 GB | 32 GB+ | excellent quality, still usable on consumer hardware |
### specialized

| model | size | RAM needed | best for |
|---|---|---|---|
| `codellama:13b` | ~7 GB | 16 GB | code generation and review |
| `llava` | ~5 GB | 16 GB | vision + language (can analyze screenshots) |
| `mistral-openorca` | ~4 GB | 12 GB | instruction following |
## pulling models

download any model from [ollama.com/library](https://ollama.com/library).
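for example, pulling a few of the models recommended above (`ollama pull` takes one model name at a time):

```bash
# download models locally; each pull fetches one model
ollama pull ministral-3
ollama pull gemma3:4b
ollama pull llava
```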
### model quantization explained

quantization reduces model size and speeds up inference:

- `q4_0` - smallest, fastest, lower quality
- `q5_0` - balanced
- `q8_0` - larger, slower, better quality
- default (no suffix) - recommended balanced version
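quantized variants are published as tags on each model's library page; exact tag names vary per model, so treat this one as illustrative:

```bash
# pull an explicit 4-bit quantization (check the model's page on
# ollama.com/library for the tags that actually exist)
ollama pull llama3.1:8b-instruct-q4_0
```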
## using Ollama with screenpipe

### in chat

once configured, use Ollama in screenpipe's AI chat to ask questions about your screen history.

### in pipes
pipes (automations) can also use Ollama; a raw API example follows these steps:

- go to pipes in the screenpipe sidebar
- select a pipe (e.g., “day recap”, “time tracking”)
- in pipe settings, select your Ollama preset
- enable the pipe
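under the hood this is plain HTTP against the local Ollama server on its default port 11434; you can reproduce a request yourself with Ollama's chat endpoint (the model name and prompt here are just examples):

```bash
# one-off, non-streaming chat request to the local Ollama server
curl http://localhost:11434/api/chat -d '{
  "model": "ministral-3",
  "messages": [{"role": "user", "content": "summarize my day in three bullets"}],
  "stream": false
}'
```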
## performance tips

choose the right model for your hardware.

for faster responses:

- use 4-bit quantized models (`q4_0`)
- close other GPU-heavy applications
- use smaller context windows (less screen history)

for better quality:

- use 8-bit or full precision models
- use larger models (8b, 13b, or more)
- give more context in queries

GPU acceleration (a quick check follows the list):

- Ollama automatically uses a GPU if available
- NVIDIA GPUs: works out of the box
- AMD GPUs: supported on Linux
- Apple Silicon: uses Metal acceleration
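to see where a loaded model actually runs, `ollama ps` reports the CPU/GPU split:

```bash
# load a model with a throwaway prompt, then inspect placement
ollama run ministral-3 "hello" >/dev/null
ollama ps   # PROCESSOR column shows e.g. "100% GPU" or "100% CPU"
```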
## troubleshooting

### “ollama not detected”

- ensure Ollama is running: `ollama serve`
- check it's responding: `curl http://localhost:11434/api/tags`
- verify Ollama is installed: `ollama --version`
### model not showing in the dropdown

- pull it first: `ollama pull ministral-3`
- refresh screenpipe's model list
- you can also type the model name manually (check the exact name with `ollama list`, shown below)
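`ollama list` shows every model pulled locally along with its exact name:tag:

```bash
# the NAME column shows the identifier to use in screenpipe (e.g. "gemma3:4b")
ollama list
```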
### slow responses

- try a smaller model (`ministral-3`, `phi4`)
- reduce the context window (query shorter time ranges)
- close other GPU-heavy apps
- ensure you have enough free RAM (model size + ~2 GB overhead)
### out of memory

- use a smaller model
- use a quantized version (`q4_0`)
- close other applications
- check available RAM: model size + 2 GB minimum
### responses cut off

- increase max tokens in Ollama settings
- some models have built-in limits
- try a different model
### Ollama crashes

- check RAM usage (likely out of memory)
- try a smaller model
- restart Ollama: `pkill ollama && ollama serve`
### connection refused

- verify port 11434 is not blocked
- check Ollama is listening: `lsof -i :11434`
- try restarting Ollama
## comparing models

want to test which model works best for you?

- pull multiple models (one per pull): `ollama pull ministral-3`, `ollama pull gemma3:4b`, `ollama pull llama3.1:8b`
- try the same query with each model
- compare speed, quality, and RAM usage
- stick with the one that fits your needs (a scripted comparison is sketched below)
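a minimal sketch of a scripted side-by-side run, assuming the three models above are already pulled; `--verbose` makes ollama print timing stats (eval rate, durations) after each answer:

```bash
#!/usr/bin/env bash
# ask each candidate model the same question and compare the stats
prompt="summarize the key points of a one-hour meeting in three bullets"

for model in ministral-3 gemma3:4b llama3.1:8b; do
  echo "=== $model ==="
  ollama run --verbose "$model" "$prompt"
done
```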
## requirements
- Ollama installed and running
- at least one model pulled
- screenpipe running
- sufficient RAM (8 GB minimum, 16 GB+ recommended)
## privacy & security
- 100% local - models run on your machine
- no telemetry - Ollama doesn’t send data anywhere
- no accounts - no sign-up required
- offline - works without internet after downloading models
- open source - Ollama and models are open source