privacy & security

poltergeist is a local-first product. nothing leaves your machine unless you flip an explicit switch — and even then, only in shapes you've consented to. this page lays out exactly what stays, what moves, and how to verify each claim for yourself.

what stays local

every message, document, comment, meeting, and pr poltergeist ever indexes.
the extracted entities, the vector index, the lexical index.
every query you've ever run and every answer poltergeist has produced.
the model weights themselves — the extraction and ranking model both run on-device.

you can verify this with a network monitor. with default settings, poltergeist makes outbound requests only to the connector providers (gmail api, slack api, etc.), using your own tokens. there is no "phone home" endpoint.

tip

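on macos, the simplest verification is little snitch or the built-in tcpdump, for example `sudo tcpdump -i any not host gmail.com and not host slack.com`. that filter hides traffic to the named hosts so anything else stands out; swap in the api hosts of your own connectors. on linux, use `ss -tnp` and the audit subsystem. you should see poltergeist talking only to the api hosts of the connectors you've authorised.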
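
if you collect the remote hostnames from your network monitor, the comparison against your connectors can be scripted. the following is a minimal sketch, not part of poltergeist: the allowlist contents and the function names are assumptions, and you would fill the allowlist with the api domains of the connectors you have actually authorised.

```python
# Hypothetical allowlist: replace with the API domains of your own connectors.
ALLOWED_DOMAINS = {"googleapis.com", "slack.com"}


def is_allowed(host, allowed=ALLOWED_DOMAINS):
    """True if host is an allowed domain or a subdomain of one."""
    host = host.lower().rstrip(".")
    # Require an exact match or a dot boundary, so "evilslack.com"
    # does not pass as "slack.com".
    return any(host == d or host.endswith("." + d) for d in allowed)


def unexpected_hosts(observed, allowed=ALLOWED_DOMAINS):
    """Return observed hosts outside the allowlist, in first-seen order."""
    flagged = []
    for host in observed:
        if not is_allowed(host, allowed) and host not in flagged:
            flagged.append(host)
    return flagged


if __name__ == "__main__":
    seen = ["gmail.googleapis.com", "slack.com", "telemetry.example.com"]
    print(unexpected_hosts(seen))
```

an empty result means every observed host belongs to a connector you expected; anything listed deserves a closer look.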

where credentials live

every oauth token or api key sits in your operating system's secure credential store:

os	store
macos	keychain (login keychain)
windows	windows credential manager
linux	gnome-keyring or kwallet (whichever is active); falls back to libsecret

poltergeist never writes tokens to a plain file. when you disconnect a connector, the token is removed from the credential store immediately, not at the next sync.

telemetry

poltergeist ships two kinds of telemetry, both off by default:

anonymous crash reports — disabled. enable in settings → privacy to send stack traces (no user data) when the indexer panics.
usage stats — disabled. enable to send daily counts of {queries run, items indexed, model latency p50/p99}. no content, no metadata about what was queried.

both are opt-in, both off out of the box, and both can be turned off again without losing functionality. the full schema of what each report contains is in docs/telemetry.md.
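
to make "counts only" concrete, here is a rough sketch of what a usage report of that shape could look like. the field names, the nearest-rank percentile method, and the json layout are all assumptions for illustration; the authoritative schema is the one in docs/telemetry.md.

```python
import json
import math
from dataclasses import asdict, dataclass
from typing import Optional


def percentile(samples, pct):
    """Nearest-rank percentile; returns None when there are no samples."""
    if not samples:
        return None
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


@dataclass
class UsageReport:
    # Hypothetical field names; aggregate numbers only, no query text.
    queries_run: int
    items_indexed: int
    model_latency_ms_p50: Optional[float]
    model_latency_ms_p99: Optional[float]


def build_usage_report(queries_run, items_indexed, latencies_ms):
    """Collapse a day's raw latency samples into two summary numbers."""
    return UsageReport(
        queries_run=queries_run,
        items_indexed=items_indexed,
        model_latency_ms_p50=percentile(latencies_ms, 50),
        model_latency_ms_p99=percentile(latencies_ms, 99),
    )


if __name__ == "__main__":
    report = build_usage_report(12, 340, [10, 20, 30, 40, 50])
    print(json.dumps(asdict(report), indent=2))
```

the point of the sketch is the shape: the raw latency samples are reduced to two numbers before anything is serialised, and nothing in the record can carry content.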

does the model train on my data?

no. the extraction and ranking models are pre-trained, shipped as static weights, and run inference-only inside the indexer. there is no online learning, no gradient updates, no batched-then-uploaded fine-tuning. you can strace the indexer and confirm there is no write activity on the model files.

your data also never travels over the wire to reach the model. the weights are loaded from disk on startup and run in the same process; there is no rpc, no server, no inference api.
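
as a complement to strace, you can checksum the weight files before and after a stretch of indexing and compare. this is a minimal sketch under assumptions: the model directory is whatever path your install uses (not documented here), and the function names are made up for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large weight files are not read into memory at once."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(model_dir):
    """Map each file under model_dir (relative path) to its SHA-256 digest."""
    root = Path(model_dir)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(before, after):
    """Names that were added, removed, or whose contents differ."""
    names = before.keys() | after.keys()
    return sorted(n for n in names if before.get(n) != after.get(n))
```

take one snapshot, let the indexer run for a while, take another, and pass both to `changed_files`. if the weights are static, as described above, the result should be an empty list.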

optional end-to-end encrypted sync

if you want your vault on two machines, poltergeist ships an optional sync service. it is:

opt-in — off by default; configure in settings → sync.
end-to-end encrypted — keys are derived from a passphrase you pick, then split with shamir's secret sharing. we get the encrypted blobs; we cannot read them.
self-hostable — the sync server is a small go binary; run it on your own box if you'd rather. spec is documented in docs/sync-protocol.md.
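
for readers unfamiliar with the two building blocks named above, here is a textbook sketch of deriving a key from a passphrase and splitting it with shamir's secret sharing. it is not poltergeist's implementation: pbkdf2 is swapped in as a standard-library stand-in for whatever kdf the product uses, and the iteration count, the prime field, the threshold, and all function names are assumptions. the real construction is specified in docs/sync-protocol.md.

```python
import hashlib
import secrets

# Assumption: a prime larger than any 256-bit key (2**521 - 1 is a Mersenne prime).
PRIME = 2**521 - 1


def derive_key(passphrase, salt, iterations=600_000):
    """Stretch a passphrase into a 32-byte key (PBKDF2-HMAC-SHA256 stand-in)."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )


def split_secret(secret, threshold, shares):
    """Split an integer secret into points on a random polynomial of degree threshold - 1."""
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range for the field")
    # The secret is the constant term; the other coefficients are random.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points):
    """Lagrange-interpolate the polynomial at x = 0 to recover the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

any `threshold` of the shares reconstruct the key, and fewer than that reveal nothing about it. the server in this picture only ever stores data encrypted under a key it never sees.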

warning

if you lose your sync passphrase, we cannot help you recover your data. that's the trade-off for not having a backdoor. write the passphrase down somewhere offline.

permissions per connector

each connector requests the narrowest scope its provider exposes:

connector	scopes
gmail	`gmail.readonly`, `gmail.metadata`
slack	`channels:history`, `groups:history`, `im:history`, `mpim:history` + matching `:read`
notion	read on the pages you share with the integration
linear	personal api key, read scopes only
github	`repo:read`, `issues:read`, `pull-requests:read`
calendar	`calendar.readonly`
drive	`drive.metadata.readonly` (default), `drive.readonly` if you opt in

no connector ever writes back to the source. poltergeist is read-only by design — we never want to be in a position to send a slack message on your behalf.
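
if your provider lets you inspect the scopes actually granted to a token, you can compare them with the table above. a minimal sketch follows: the dictionary simply restates the documented scopes in short form (providers may report them in a longer form, such as full urls for google), and the function name is an assumption. notion and linear are left out because the table describes their access in words rather than scope names.

```python
# Scopes restated from the table above, in short form.
DOCUMENTED_SCOPES = {
    "gmail": {"gmail.readonly", "gmail.metadata"},
    "slack": {
        "channels:history", "groups:history", "im:history", "mpim:history",
        "channels:read", "groups:read", "im:read", "mpim:read",
    },
    "github": {"repo:read", "issues:read", "pull-requests:read"},
    "calendar": {"calendar.readonly"},
    "drive": {"drive.metadata.readonly", "drive.readonly"},
}


def unexpected_scopes(connector, granted):
    """Return granted scopes that the documentation does not list for this connector."""
    documented = DOCUMENTED_SCOPES.get(connector, set())
    return sorted(set(granted) - documented)


if __name__ == "__main__":
    # "chat:write" is a Slack write scope and should be flagged.
    print(unexpected_scopes("slack", ["channels:history", "chat:write"]))
```

an empty list means the token holds nothing beyond what this page documents; any entry is a scope worth questioning.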

threat model

what poltergeist protects against, and what it doesn't:

protects against

a third party reading your indexed content: no readable copy of it exists off your machine.
a third party reading your tokens — they live in your os keychain.
silent training on your data — no training is possible.
vendor lock-in — your vault is plain markdown, on your disk.

does not protect against

an attacker with root on your machine — they can read your keychain and your vault.
malicious obsidian plugins — if you open the vault in obsidian, plugins installed there share the vault and the obsidian process. install obsidian plugins from people you trust.
breaches of the underlying cloud services: if your gmail account is caught in a breach, poltergeist can't pull that data back.

responsible disclosure

found something? we'd rather hear about it than have it be public:

email [email protected] with details and a reproduction.
or open a github security advisory at github.com/nikrich/poltergeist/security/advisories/new.
we'll respond within 72 hours, fix critical issues within 14 days, and credit you in the release notes unless you'd rather we didn't.

no bug bounty program yet, but if your report saves us from shipping a real issue, we'll send you something nicer than a t-shirt.
