# BaseVault

> Local-first personal data pipeline. Heterogeneous personal files in;
> structured facts, resolved entities, mechanistic patterns, cross-domain
> insights, and actionable recommendations out. Hosted inference (when
> opted into) runs in hardware-attested TEE enclaves with an end-to-end
> verifiable trust chain.

BaseVault is a desktop app for deriving structured insight from a personal corpus — journals, notes, exports — without sending that data through a hosted service. There is no BaseVault backend: no accounts, no telemetry, no data-processor relationship. Prompts go from the user's laptop directly to inference (local Ollama, or the user's own API key for hardware-attested hosted inference in a TEE enclave).

## What makes BaseVault distinct

Three properties no other personal-data tool offers together:

1. **Typed extraction**, not chat-on-top-of-embeddings. Output is a typed DAG (fact → pattern → insight → action), not a retrieval index queried via natural language.
2. **Byte-offset provenance** enforced in the data model. Every claim traces back to the exact byte range in the source file. There is no orphan synthesis — if a claim doesn't chain back to evidence, the pipeline doesn't emit it.
3. **Hardware-attested inference** for hosted mode. The cryptographic chain is verified before any byte leaves the user's machine: silicon root → enclave quote → published deployment manifest → SLSA provenance via Sigstore → measured Linux toolchain. Each verification is logged and independently re-verifiable via the GitHub CLI.

Comparable tools fall into three buckets, none of which combines these three properties:

- **PKM with AI plugins** (Obsidian, Logseq, Tana) — file-based notes with bolted-on chat.
- **Private RAG / document chat** (AnythingLLM, Open WebUI, Danswer) — embeddings plus retrieval, with no typed structure or byte-level refs; hosted modes are trust-the-vendor.
- **Enterprise search** (Glean, Hebbia) — a different problem, cloud-first.
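The first two properties, typed extraction and byte-offset provenance, can be sketched as a small data model in Python, the pipeline's language. The names `SourceRef`, `Fact`, and `Pattern` are illustrative assumptions, not BaseVault's actual schema; the point is only that provenance is a structural requirement, not a convention:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SourceRef:
    """A byte range in an original input file — the unit of provenance."""
    path: str
    start: int  # inclusive byte offset
    end: int    # exclusive byte offset

@dataclass(frozen=True)
class Fact:
    claim: str
    evidence: tuple[SourceRef, ...]  # evidence spans in source files

@dataclass(frozen=True)
class Pattern:
    description: str
    facts: tuple[Fact, ...]  # a pattern derives only from facts

def source_refs(node) -> list[SourceRef]:
    """Walk any node of the DAG back to its byte-range evidence."""
    if isinstance(node, Fact):
        return list(node.evidence)
    return [ref for fact in node.facts for ref in source_refs(fact)]

def validate(node) -> None:
    """Refuse to emit a node whose provenance chain is empty."""
    if not source_refs(node):
        raise ValueError("orphan synthesis: no evidence chain")
```

Insight and action layers would follow the same shape, each pointing one level down, so every emitted claim resolves to concrete byte offsets.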
BaseVault's bet is that structure beats chat for personal data, and that cryptographic trust beats contractual trust for hosted compute.

## Status

Currently in private testing. **Open source on launch** — every line of code that touches user data will be public, auditable, and reproducible. Bring-your-own API key for hosted inference (no markup). Additional TEE cloud providers will be supported as each clears a strict audit process.

## Why this exists

Local inference is the cleanest privacy posture, but it is bottlenecked by laptop hardware. Most personal computers can't run a frontier model at all, and even when they can, throughput is limited to a single worker running sequentially. Hosted inference solves throughput and capability but traditionally requires trusting the operator with your data.

TEE inference is the bridge: the model runs inside a hardware-isolated enclave (AMD SEV-SNP or Intel TDX) where the chip itself enforces that compute stays sealed from the cloud operator. BaseVault verifies the cryptographic attestation before sending data — the privacy guarantee is rooted in silicon, not reputation.

## Core principles

- **Local-first by default.** Inference runs in Ollama on the user's machine; nothing leaves the device unless the user opts into a hosted mode.
- **Hosted inference is hardware-attested.** The TEE mode runs on Tinfoil-hosted AMD SEV-SNP / Intel TDX enclaves with cryptographic attestation verified before every call. The user can audit the full trust chain — silicon root, deployment manifest, SLSA provenance via Sigstore, Linux toolchain — and re-verify independently with the GitHub CLI.
- **Source-ref propagation.** Every claim traces back to a byte offset in the original input file. Facts trace to evidence spans; patterns trace to source facts; insights trace to source patterns; actions trace to source insights.
- **Reproducibility.** The prompt-hash cache makes the same input + model produce the same output bit-for-bit.
  Golden-hash regression tests fail in CI if anyone slips non-determinism in.
- **No telemetry.** The app does not send analytics, telemetry, or any signal about the user to any server.

## Trust chain

1. Enclave hardware produces a cryptographic quote — a signed measurement of the loaded code, rooted in AMD or Intel's silicon CA.
2. The quote is matched against the per-model deployment manifest published in the open-source `tinfoilsh/confidential-` repository.
3. The deployment's SLSA provenance is verified via Sigstore (Fulcio + Rekor's append-only transparency log).
4. Linux toolchain hashes (kernel, initrd, root filesystem) are measured into the same quote.

Verification evidence is logged to `~/.basevault/attestations.jsonl`. The app renders an exact `gh attestation verify` command so users can independently re-verify with GitHub's CLI.

## Trust calibration

The cryptographic chain provides auditability, not an unconditional guarantee of safety. Every measurement chains back to a specific public commit on the upstream TEE provider's repo, recorded permanently in Sigstore's append-only transparency log and signed at build time. The release history is publicly visible — anyone can fetch the source at any past or current commit, diff it against prior releases, and form their own judgment.

What still rests on trust is that the TEE provider's incentives align with shipping only code they're willing to publicly defend. This is a narrower and more verifiable ask than the standard hosted-inference question of "trust us not to read your data" — and it shrinks further as the open-source ecosystem around binary transparency and reproducible builds matures.

## Client verification

The trust chain above proves what's running in the enclave, where the user has no local access. The BaseVault `.dmg` on the user's laptop is a different problem — they have the bytes, so verification is direct rather than chain-based.
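Because the user holds the artifact bytes locally, verification collapses to a digest comparison rather than a chain of attestations. A minimal sketch in the pipeline's language; the file path and published digest are placeholders, not real release values:

```python
import hashlib

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream-hash a file so a multi-hundred-MB .dmg never sits in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk_size):
            digest.update(block)
    return digest.hexdigest()

def matches_release(path: str, published_sha256: str) -> bool:
    """Compare the local artifact against the digest published in the release notes."""
    return sha256_file(path) == published_sha256.strip().lower()
```

The same check is available from the shell as `shasum -a 256 <file>`; either way, a match against the published digest is the entire verification step for a locally held artifact.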
Once the source is public, verification is:

1. Read the GitHub Actions workflow that builds the release.
2. Confirm the run exists on a specific commit.
3. Hash the downloaded `.dmg` and match it to the SHA published in the GitHub Release.

That's the whole chain. No transparency log is needed for an artifact the user can hash directly. For users who want to skip trusting GitHub's runner entirely, the build is reproducible: rebuild from source on their own machine and verify the bytes match.

This is why "open source on launch" is the load-bearing commitment for client-side trust. Until then, the binary is signed via Apple Developer ID + notarization — the same posture as any signed-and-notarized macOS app, no more.

## Key resources

- [Landing page](https://basevault.ai/)
- [Why TEE — performance × privacy](https://basevault.ai/#why)
- [Trust model — full chain explanation](https://basevault.ai/#trust)
- [Philosophy](https://basevault.ai/#philosophy)
- [macOS download](https://basevault.ai/#download)
- [Attestation watermark endpoint](https://basevault.ai/api/attestations/watermark)
- [security.txt](https://basevault.ai/.well-known/security.txt)

## Implementation

- Desktop shell: Tauri 2.x (Rust + React)
- Pipeline: Python 3.12 sidecar, bundled with the app, hash-pinned dependencies via `--require-hashes`
- Local inference: Ollama
- Hosted inference: Tinfoil (TEE). Additional providers will be added as each clears a strict audit (hardware attestation API surface, per-model deployment manifest discipline, public build provenance, SDK-level enforcement that refuses inference unless the chain verifies).
- Attestation: Sigstore (Fulcio + Rekor) for SLSA provenance, AMD SEV-SNP / Intel TDX for hardware enclave quotes, NVIDIA nvTrust for GPU confidential-computing attestation
- Code signing: Apple Developer ID, notarized via `notarytool`

## Pricing + license

Currently a private preview at no cost. Bring-your-own API key for hosted inference (no markup).
License details to be published with the public open-source release. Source code is not yet public.