Ketan Khairnar

Systems writing for AI agents, data platforms, distributed coordination, and the production details tutorials skip.

AI-native systems engineering

Notes on building systems that hold up in production.

Retries, state, budgets, coordination, observability, failure recovery — written down so you don't have to relearn them.

distributed systems, production agents, data platforms, observability
10B+ events/day systems direction
150+ paying users from zero
70+ data pipelines shipped
1B+ market data points served

agents

AI systems that survive real users

Multi-agent orchestration, bounded tool calls, memory, evals, cost guardrails, and recovery paths.

systems

Distributed systems with operational taste

Coordination, retries, idempotency, stream processing, failure detection, SLOs, and quiet incident prevention.

platforms

Data platforms that earn their keep

Kafka, Spark, ClickHouse, search, lakehouse migrations, and the product pressure behind architecture choices.

Latest field note

May 16, 2026
Flue Under the Hood: Why This Agent Harness Holds

A source-level tour of Flue through domain-driven design and twelve-factor architecture: the language, boundaries, and runtime constructs behind the TypeScript agent harness framework. A concrete engineering pattern, stripped down to the decision that matters.

AI, agents, harness, open-source