Heartbeat | Zero2One Whitepapers

Section 1

The Problem: Amnesia by Design

LLM-based systems are stateless. Each interaction begins with a blank context window that the user or system must populate. This creates a fundamental inefficiency: the human becomes the context bus, carrying awareness between sessions and pasting it into every interaction.

1.1 The Context Reconstruction Tax

Every time a developer opens a chat, starts a new agent session, or switches tools, they pay a tax: re-establishing who they are, what they're working on, what matters, and what's coming. This tax is invisible but constant. It fragments attention, wastes tokens, and degrades response quality.

1.2 Reactive Context Assembly

Existing solutions — RAG pipelines, vector databases, conversation memory, session management — all share the same architecture: they construct context reactively at query time. The user asks a question, the system scrambles to retrieve relevant information, and the model reasons over whatever was assembled in that moment.

This is fundamentally backwards. Awareness should exist before the question is asked.

1.3 The Missing Dimension

Current memory systems index by semantic similarity, entity relationships, or conversation order. None of them index by time. Yet time is how humans organize awareness: what happened, what is happening, what will happen. The absence of a temporal model forces every system to rediscover relevance from scratch on every query.

Section 2

Heartbeat: Architecture

Heartbeat is a temporal context engine that pre-computes awareness on a rhythm. It divides each 24-hour day into 180 blocks of 8 minutes. Each block produces a context snapshot — a token-budgeted assembly ready to be consumed by any LLM.

2.1 The Temporal Model

At any moment, awareness consists of four sections that compete for a fixed token budget:

INTENTION (10% budget) North stars, goals, stable directives. Manually authored. Rarely changes. Never trimmed. Without purpose, awareness is meaningless.
NOW (30% budget) Current signals, active state, ground truth. What is happening right now. Protected — trimmed last. Missing present causes hallucinations.
HISTORY (25% budget) Relevant past events, conversations, outcomes. Immutable — already happened. Trimmed second, oldest first.
PROJECTION (15% budget) Upcoming events, deadlines, forecasts. Speculative — may not happen. Trimmed first, furthest out inward.

The remaining 20% is reserved for the actual interaction — the user's message and the model's response.

2.2 The Heartbeat Loop

Every 8 minutes, the system ticks. Each tick:

Collects new pulses (raw signals) from connected sources
Classifies pulses into temporal sections
Loads stable intentions from a local file
Assembles a token-budgeted snapshot (80% of context window)
Applies the trim hierarchy if over budget
Stores the snapshot as a delta from the daily base
Optionally passes the snapshot to an embedded SLM for proactive reasoning
Exposes the snapshot via HTTP API

2.3 Single Binary Architecture

Heartbeat runs as a single Rust binary with embedded llama.cpp. No cloud services, no vector databases, no external dependencies. Signal collection happens through bash and osascript scripts that POST pulses to the binary's HTTP endpoint. The binary assembles snapshots, counts tokens, runs the SLM, and serves results — all in one process.

Section 3

The Trim Hierarchy

Context windows are finite. When the assembled snapshot exceeds the token budget, sections must be trimmed. The order of trimming is the most critical design decision in the system, because LLMs hallucinate by filling gaps in context.

3.1 Trim Order

PROJECTION first — Trim from furthest future inward. LLMs speculate about the future anyway. Low cost to lose.
HISTORY second — Trim from oldest forward. Already happened, lowest relevance drops first. Moderate cost.
NOW last — This is ground truth. Trimming the present forces the model to invent it. Catastrophic cost.
INTENTION never — These are the north stars. Without them the system loses purpose entirely.

3.2 Why This Order Matters

Missing future is cheap — the model will speculate regardless, and speculation about upcoming events has low downside. Missing old history is moderate — the model won't invent past events if the present is clear. Missing present is catastrophic — without ground truth, the model fabricates the current situation. This is where hallucinations kill.

Key insight: The trim hierarchy is a theory of hallucination prevention. Protect ground truth above all else. Let speculation be the first casualty.

Section 4

The Embedded SLM

Each snapshot is optionally passed to an embedded Small Language Model running via llama.cpp inside the same binary. The SLM's role is awareness, not assistance.

4.1 Awareness, Not Assistance

The SLM system prompt establishes a fundamentally different posture: "You are the awareness layer. You are not an assistant. You are presence. Never invent facts not in the snapshot. Speak only when there is reason to. You are not helpful. You are aware."

4.2 Four Response Modes

notify — Something needs the user's attention now
act — Something can be done autonomously
queue — Something should be scheduled for later
wait — Nothing needed (most common response)

4.3 Proactive Comes Free

Because the snapshot is already assembled on a rhythm, proactive monitoring costs nothing extra. The system doesn't need to observe, collect, or interpret when deciding whether to act — the context is already built. 180 SLM calls per day is computationally trivial on Apple Silicon. Always-on presence without paying for always-on inference.

Section 5

Properties

5.1 Pre-Computed Awareness

Context exists before any query. When a user or agent calls GET /now, they receive the current snapshot instantly. Zero retrieval latency. Zero context assembly at query time. Every interaction starts with full presence.

5.2 Infrastructure Independence

No cloud services. No API calls to external LLMs. No vector databases. No graph stores. No PKI. One binary, one machine, one process. Privacy by architecture — data never leaves the device.

5.3 Model Agnosticism

Any LLM slots into the consumer position. The intelligence is in context construction, not the model. Swap models freely — the snapshot fits, the model responds. The Rust app is the brain; the LLM is the voice.

5.4 Deterministic Awareness

Same inputs, same retrieval function, same snapshot. Context construction is deterministic and replayable. The non-determinism lives in the LLM's reasoning over that context — the same separation humans exhibit between consistent awareness and variable response.

5.5 Debuggable Consciousness

Every snapshot is immutable and stored. Scroll back through the day's 180 snapshots and see not just what was said, but what the system was aware of, what it was aiming for, what history it thought mattered. Full provenance of every context decision.

Section 6

Comparison with Existing Approaches

Approach	Context Timing	Temporal Model	Infrastructure	Proactive
RAG / Vector DB	Reactive (query time)	None	Cloud / DB	No
LangChain Memory	Reactive (query time)	None	Framework	No
Zep	Reactive (query time)	Temporal graph	Cloud service	No
Cognee	Reactive (query time)	Temporal pipeline	Cloud / Graph	No
memU	Reactive (query time)	None	PostgreSQL	No
OpenClaw / Moltbot	Reactive (query time)	None	Cloud LLMs	Partial
Heartbeat	Pre-computed (8-min rhythm)	Four-section temporal	None (local binary)	Yes (free)

The gap: nobody frames context as a temporal problem. Nobody pre-computes awareness on a rhythm. Nobody has a trim hierarchy that protects ground truth over speculation. Nobody builds this as a single local binary with an embedded SLM.

Section 7

Demonstration Scenario

The following scenario illustrates Heartbeat's operation in a developer's daily workflow.

7.1 Without Heartbeat

Developer opens chat at 9am
Types "Help me with the auth bug"
Model has no context. Asks "Which project? What stack? What bug?"
Developer spends 3 minutes re-establishing context
Forgets to mention 2pm interview. Misses prep time

7.2 With Heartbeat

Developer opens chat at 9am
Client queries GET /now — snapshot already contains current project, yesterday's git activity, auth error from last session, 2pm interview
Developer types "Help me with the auth bug"
Model already knows everything. Responds immediately with relevant fix
At the 11:00 beat, SLM notices interview with no prep — sends notification

Key insight: The developer never re-established context. The proactive notification cost nothing — the snapshot was already built.

Section 8

Success Metrics

Heartbeat instruments seven metrics from day one. The primary validation question: how many times per day did the user have to re-explain themselves? If that number drops, it works.

Context reconstruction accuracy — Did the user have to re-explain? Each re-explanation is a measured failure.
Trim loss rate — How often does trimming cut content that's later needed?
Proactive hit rate — Notifications accepted versus dismissed.
Cognitive interruptions — Times per day the user context-switches to feed information to a tool.
Beat relevance — Percentage of snapshot tokens actually referenced in the interaction.
Snapshot drift — Delta size between consecutive beats. Measures awareness volatility.
Time to first meaningful response — From "I sit down" to "I'm doing useful work."

Section 9

Implementation

9.1 Tech Stack

Core: Rust
SLM runtime: llama.cpp (via llama-cpp-rs bindings)
In-memory store: redb or dashmap
HTTP: axum
Signal collection: bash + osascript
Target model: Qwen 2.5 7B GGUF (quantized)
Platform: macOS (Apple Silicon optimized)

9.2 MVP Scope

Rust binary with in-memory pulse store. Heartbeat loop at 8-minute tick rate. Snapshot assembler with token counting and trim logic. HTTP API serving /now, /pulse, and /health. Embedded llama.cpp with awareness system prompt. One signal source: active window via osascript.

9.3 Non-Goals

No cloud. No external LLM APIs. No multi-user. No GUI. No conversation history in the LLM — each call is stateless, context comes from the snapshot. No RAG. No vector database. Token counting and temporal ordering handle retrieval.

Section 10

Conclusion

Heartbeat addresses a fundamental gap in LLM infrastructure: the absence of temporal awareness. By pre-computing context on a rhythm rather than assembling it reactively at query time, it eliminates the context reconstruction tax that fragments every interaction.

The architecture is deliberately minimal — a single local binary with an embedded SLM, no cloud dependencies, no vector databases, no external services. The intelligence lives in context construction: what to include, what to trim, and in what order. The LLM is commodity. The snapshot is the craft.

The real change isn't productivity. It's cognitive load. The user stops carrying everything in their head. The background anxiety of "what am I forgetting" quiets down. Not because they're more organized — because something else is holding the awareness for them.

180 snapshots a day. Deterministic context construction. Any LLM slots in. Every interaction starts with full presence. Proactive comes free.

Contact

Email: delaney@zero2one.ee
Web: https://zero2one.ee