DocGaze — Case study

The problem

Most document analytics tools tell you a file was opened.

That is the floor of useful information. The real questions are which page lost the reader, which paragraph got re-read, which section got copied to the clipboard, and what the reader asked of the document afterwards. The answers are buried inside a stream of high-frequency events that legacy analytics systems were not built to capture.

The category leaders treat documents like ad creatives, optimized for view counts and time on page. That is a 2014 question. The 2024 question is about reader cognition, and answering it requires writing a lot of small events very quickly, then reading them back as time series.

Constraints

Write-heavy from minute one.

A single ten-page document, opened by a reader who scrolls, hovers, copies, and then asks two questions of the AI sidecar, generates over forty discrete events. Multiply by a thousand readers per day per document, multiply again by a hundred documents per workspace, and the write volume is an order of magnitude higher than the read volume on any traditional analytics dashboard.
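To put a number on that, here is the arithmetic as a short TypeScript sketch. The figures are the ones quoted above; "over forty" is taken as exactly forty, and each reader is assumed to open the document once per day, so the results are lower bounds. The constant names are ours.

```typescript
// Back-of-the-envelope write volume, using the figures from the text.
// Assumption: "over forty" is treated as exactly 40 and each reader opens
// the document once per day, so every result is a lower bound.
const EVENTS_PER_READ = 40;
const READERS_PER_DOC_PER_DAY = 1_000;
const DOCS_PER_WORKSPACE = 100;
const SECONDS_PER_DAY = 24 * 60 * 60;

const eventsPerDocPerDay = EVENTS_PER_READ * READERS_PER_DOC_PER_DAY;
const eventsPerWorkspacePerDay = eventsPerDocPerDay * DOCS_PER_WORKSPACE;
const avgWritesPerSecond = eventsPerWorkspacePerDay / SECONDS_PER_DAY;

console.log(eventsPerDocPerDay);            // 40000
console.log(eventsPerWorkspacePerDay);      // 4000000
console.log(avgWritesPerSecond.toFixed(1)); // "46.3"
```

That is at least four million events per workspace per day, or roughly 46 writes per second as a sustained average, before any peaks.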

The constraint was clear. Do not pick a database today that we will have to migrate off in six months. Do not pretend the system is read-heavy when it is the opposite. Do not retrofit time-series semantics onto a row store and discover, later, that the dashboard queries are taking eight seconds.

Architecture decisions

The shape of the system.

TimescaleDB as the primary store for events. Hypertables on the events table, automatic chunking by time, continuous aggregates for the dashboard queries. Postgres compatibility means the rest of the system (auth, workspaces, billing, document metadata) lives in the same database without a separate adapter layer.
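A continuous aggregate is, in effect, a precomputed time-bucketed rollup. The sketch below performs that rollup in plain TypeScript so the shape of a dashboard query is concrete. It is an in-memory illustration, not Timescale's API, and the `ReaderEvent` fields and the `bucketCounts` helper are hypothetical names.

```typescript
// In-memory illustration of a time-bucketed rollup, the kind of result a
// continuous aggregate would precompute in the database.
// All names and fields here are hypothetical.
interface ReaderEvent {
  documentId: string;
  page: number;
  kind: "scroll" | "hover" | "copy" | "question";
  at: number; // epoch milliseconds
}

interface BucketRow {
  bucketStart: number; // epoch milliseconds, aligned to the bucket width
  page: number;
  count: number;
}

function bucketCounts(events: ReaderEvent[], bucketMs: number): BucketRow[] {
  const rows = new Map<string, BucketRow>();
  for (const e of events) {
    // Align the timestamp down to the start of its bucket.
    const bucketStart = Math.floor(e.at / bucketMs) * bucketMs;
    const key = `${bucketStart}:${e.page}`;
    const row = rows.get(key);
    if (row) {
      row.count += 1;
    } else {
      rows.set(key, { bucketStart, page: e.page, count: 1 });
    }
  }
  // Sort by time, then page, so the output reads as a time series.
  return Array.from(rows.values()).sort(
    (a, b) => a.bucketStart - b.bucketStart || a.page - b.page,
  );
}

const HOUR = 60 * 60 * 1000;
const sample: ReaderEvent[] = [
  { documentId: "doc-1", page: 1, kind: "scroll", at: 0 },
  { documentId: "doc-1", page: 1, kind: "hover", at: 10 * 60 * 1000 },
  { documentId: "doc-1", page: 2, kind: "copy", at: 20 * 60 * 1000 },
  { documentId: "doc-1", page: 1, kind: "scroll", at: HOUR + 5 },
];
const hourly = bucketCounts(sample, HOUR);
```

Here `hourly` holds three rows: two events on page 1 and one on page 2 in the first hour, then one event on page 1 in the second. The point of doing this in the database is that the dashboard reads the small rollup rather than scanning raw events.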

Clean Architecture inside NestJS. Domain logic is framework-agnostic and tested without spinning up the HTTP layer. Use cases are small, orchestrated, and composable. Adapters live at the edges. This is the choice that keeps the codebase navigable two years in, with a single engineer.
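A minimal sketch of what that looks like for one use case, with no framework imports at all. The names (`RecordReaderEvent`, `EventStore`, `InMemoryEventStore`) are hypothetical, and the port is synchronous only to keep the example short; a real port would most likely return a promise.

```typescript
// Domain port: the use case depends on this interface, not on NestJS or
// a database driver. All names here are hypothetical.
interface ReaderEventInput {
  documentId: string;
  readerId: string;
  kind: string;
  at: number; // epoch milliseconds
}

interface EventStore {
  append(event: ReaderEventInput): void;
}

// Use case: one small unit of domain behavior, constructed with its ports.
class RecordReaderEvent {
  constructor(private readonly store: EventStore) {}

  execute(input: ReaderEventInput): void {
    if (input.documentId.trim() === "") {
      throw new Error("documentId is required");
    }
    this.store.append(input);
  }
}

// Test adapter: an in-memory store stands in for the database adapter.
class InMemoryEventStore implements EventStore {
  readonly events: ReaderEventInput[] = [];

  append(event: ReaderEventInput): void {
    this.events.push(event);
  }
}

const store = new InMemoryEventStore();
const useCase = new RecordReaderEvent(store);
useCase.execute({ documentId: "doc-1", readerId: "r-1", kind: "copy", at: 0 });
```

The use case can be exercised with nothing but the in-memory adapter. In the running service, a NestJS provider would supply the database-backed implementation of the same interface.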

BullMQ for the asynchronous pipeline. Document ingestion, embedding generation, summary extraction, AI question answering. Each step is a job. Retries are explicit. Dead letters are dashboarded. The user-facing API never blocks on an LLM call.
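The retry and dead-letter behavior can be illustrated without a queue library. The sketch below is a hand-rolled stand-in, not BullMQ's API; `runWithRetries`, the attempt limit, and the error messages are hypothetical, and a real worker would be asynchronous with backoff between attempts.

```typescript
// Hand-rolled illustration of "retries are explicit, dead letters are kept".
// This is NOT BullMQ's API; every name and limit here is hypothetical.
interface Job<T> {
  id: string;
  payload: T;
}

interface DeadLetter<T> {
  job: Job<T>;
  attempts: number;
  lastError: string;
}

function runWithRetries<T>(
  job: Job<T>,
  handler: (payload: T, attempt: number) => void,
  maxAttempts: number,
  deadLetters: DeadLetter<T>[],
): boolean {
  let lastError = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      handler(job.payload, attempt);
      return true; // success, no further attempts
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
    }
  }
  // Every attempt failed: record the job instead of dropping it silently.
  deadLetters.push({ job, attempts: maxAttempts, lastError });
  return false;
}

const deadLetters: DeadLetter<string>[] = [];

// Fails twice, then succeeds on the third attempt.
const flaky = (_doc: string, attempt: number): void => {
  if (attempt < 3) throw new Error("embedding provider timed out");
};

// Fails on every attempt.
const broken = (_doc: string): void => {
  throw new Error("unsupported file type");
};

const okFlaky = runWithRetries({ id: "j1", payload: "doc-1" }, flaky, 3, deadLetters);
const okBroken = runWithRetries({ id: "j2", payload: "doc-2" }, broken, 3, deadLetters);
```

The transient failure recovers within its attempt budget. The permanent failure ends up in `deadLetters` with its last error message, which is the record a dead-letter dashboard would surface.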

Provider-agnostic LLM layer. A factory that returns OpenAI, Anthropic, or a local fallback, depending on per-tenant configuration and live availability. When OpenAI rate-limits, traffic shifts to Anthropic, transparently. Cost ceilings are enforced per tenant, with hard caps and soft warnings.
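The selection logic can be sketched as a pure function. The provider names come from the text; the config shape, the dollar thresholds, and the rule that a tenant at its hard cap may still use the local fallback are assumptions made for illustration.

```typescript
// Sketch of provider selection with fallback and per-tenant cost ceilings.
// Provider names come from the text; everything else is hypothetical,
// including the rule that the local fallback stays usable at the hard cap.
type ProviderName = "openai" | "anthropic" | "local";

interface TenantLlmConfig {
  preference: ProviderName[]; // tried in order
  softWarningUsd: number;     // warn at or above this daily spend
  hardCapUsd: number;         // refuse paid providers at or above this
}

interface Selection {
  provider: ProviderName;
  warning: boolean;
}

function selectProvider(
  config: TenantLlmConfig,
  available: ReadonlySet<ProviderName>,
  spentTodayUsd: number,
): Selection {
  const overCap = spentTodayUsd >= config.hardCapUsd;
  for (const name of config.preference) {
    if (!available.has(name)) continue;        // e.g. currently rate-limited
    if (overCap && name !== "local") continue; // hard cap blocks paid calls
    return { provider: name, warning: spentTodayUsd >= config.softWarningUsd };
  }
  throw new Error("no LLM provider available for tenant");
}

const tenant: TenantLlmConfig = {
  preference: ["openai", "anthropic", "local"],
  softWarningUsd: 40,
  hardCapUsd: 50,
};
const everything = new Set<ProviderName>(["openai", "anthropic", "local"]);
const openAiLimited = new Set<ProviderName>(["anthropic", "local"]);

const normal = selectProvider(tenant, everything, 10);         // openai, no warning
const rateLimited = selectProvider(tenant, openAiLimited, 45); // anthropic, warning
const capped = selectProvider(tenant, everything, 50);         // local, warning
```

Keeping selection pure, with availability and spend passed in, means the fallback and ceiling rules can be tested without calling any provider.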

Redis for session state and cache. Reader sessions, document state, the heat map data that powers the dashboard's primary visualization. Hot reads go through Redis, cold reads fall back to Timescale.
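The hot and cold read path is the standard cache-aside pattern. In the sketch below a `Map` stands in for Redis and a plain function stands in for the Timescale query. Both are hypothetical stand-ins, the real clients would be asynchronous, and expiry is left out for brevity.

```typescript
// Cache-aside read path: check the hot cache first, fall back to the cold
// store on a miss, then populate the cache. The Map stands in for Redis and
// the loader function stands in for a Timescale query. Both are hypothetical
// stand-ins; real clients would be asynchronous and entries would expire.
type HeatMap = number[]; // attention score per page

class HeatMapReader {
  hits = 0;
  misses = 0;

  constructor(
    private readonly cache: Map<string, HeatMap>,
    private readonly loadFromColdStore: (documentId: string) => HeatMap,
  ) {}

  get(documentId: string): HeatMap {
    const cached = this.cache.get(documentId);
    if (cached !== undefined) {
      this.hits += 1;
      return cached;
    }
    this.misses += 1;
    const fresh = this.loadFromColdStore(documentId);
    this.cache.set(documentId, fresh);
    return fresh;
  }
}

let coldReads = 0;
const reader = new HeatMapReader(new Map<string, HeatMap>(), () => {
  coldReads += 1; // count how often the cold store is actually queried
  return [0.9, 0.4, 0.1];
});

const firstRead = reader.get("doc-1");  // miss: goes to the cold store
const secondRead = reader.get("doc-1"); // hit: served from the cache
```

Two reads of the same document cost one cold query. The hit and miss counters are the kind of signal that shows whether the cache is earning its place.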

Event ingestion uses a thin edge function. A tiny endpoint that does only validation and enqueue. Heavy work happens behind the queue. This keeps the public-facing latency bounded even when the ingestion volume spikes.
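As a sketch, the whole endpoint fits in two functions: a type guard and a handler that either rejects or enqueues. The event shape, the allowed kinds, and the status codes are hypothetical, and the queue is an array standing in for the real one.

```typescript
// Thin ingestion handler: validate, enqueue, return. No heavy work inline.
// The event shape, the allowed kinds, and the status codes are hypothetical.
const ALLOWED_KINDS = ["open", "scroll", "hover", "copy", "question"];

interface IngestEvent {
  documentId: string;
  kind: string;
  at: number; // epoch milliseconds
}

interface IngestResult {
  status: 202 | 400;
  error?: string;
}

function isIngestEvent(body: unknown): body is IngestEvent {
  if (typeof body !== "object" || body === null) return false;
  const { documentId, kind, at } = body as Record<string, unknown>;
  return (
    typeof documentId === "string" &&
    documentId.length > 0 &&
    typeof kind === "string" &&
    ALLOWED_KINDS.indexOf(kind) !== -1 &&
    typeof at === "number" &&
    isFinite(at)
  );
}

function handleIngest(
  body: unknown,
  enqueue: (e: IngestEvent) => void,
): IngestResult {
  if (!isIngestEvent(body)) {
    return { status: 400, error: "invalid event" };
  }
  enqueue(body); // hand off; processing happens behind the queue
  return { status: 202 };
}

// An array stands in for the real queue.
const pending: IngestEvent[] = [];
const push = (e: IngestEvent): void => {
  pending.push(e);
};

const accepted = handleIngest({ documentId: "doc-1", kind: "copy", at: 1700000000000 }, push);
const rejected = handleIngest({ documentId: "doc-1", kind: "teleport", at: 1700000000000 }, push);
```

The handler's cost is one shape check and one enqueue, whatever the downstream pipeline is doing. That is what bounds the public-facing latency.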

What we shipped

The visible surface.

A workspace dashboard that shows the heat map of reader attention per document, per page, per paragraph. A question log that records what readers asked the AI sidecar and what the AI returned. A weekly digest, sent to the document owner, summarizing where the document worked and where it lost attention.

Behind the surface: a queue infrastructure that holds steady at 99.9% job success, with the failures captured, classified, and surfaced. A multi-provider LLM cost report that tracks spend per tenant per provider per day. A retention policy that keeps recent events at full granularity and downsamples older data, automatically, on a Timescale schedule.

What we would do differently

The honest retrospective.

The Clean Architecture commitment was the right call but added one to two days per feature in the early months. With hindsight, we would have started with a slightly thinner version of the same pattern and tightened it as the codebase grew. We would have reached the same architecture sooner.

The provider-agnostic LLM layer was over-engineered for the first three months when there was effectively one tenant. The factory pattern only paid for itself in month four when the second tenant had different cost ceilings and provider preferences. We would still build it, but we would build it later.

The choice to put auth, billing, and events in the same database was right for cost and operational simplicity, wrong for separation of concerns. As the workspace count grows, we will likely split the events store onto its own Timescale instance.

Built to write fast, read fast, and tell the truth about what readers actually did.

AI document analytics, built like a payment system.