Ketan Khairnar — systems, AI agents, data

AI-native systems engineering

Demo clean, production messy.

Notes on building systems that hold up in production.

Notes on retries, state, budgets, coordination, observability, and failure recovery — the details that decide whether pressure becomes progress or repeat pain.

distributed systems · production agents · data platforms · observability

production-agent-loop runtime map

Core question: Can the loop recover when tools, cost, and memory drift?

01 Pressure: Goal arrives [scope]
Intent, files, context, budget, and risk enter the same loop.

02 Control: Policy bounds [bounds]
The runtime chooses tools, retries, memory, and stop conditions.

03 Signal: Telemetry answers [p95]
Traces expose latency, token spend, tool errors, and bad branches.

04 Library: Learning captured [replay]
Failures become evals, docs, guardrails, and better defaults.

How an agent survives production pressure

A production agent is not a chat box. It is a bounded control loop with state, tools, policy, observability, and replay.
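As a rough illustration of that sentence, here is a minimal sketch of a bounded loop in Python. Every name in it (`Policy`, `RunState`, `run_loop`, `flaky_search`) is hypothetical and made up for this example, not taken from any framework discussed on this site. It assumes a fixed plan of tool calls, a flat token cost per attempt, and a trace list that serves as both telemetry and replay log.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Policy:
    max_steps: int = 5          # stop condition on loop length
    max_retries: int = 2        # extra attempts allowed per tool call
    budget_tokens: int = 1000   # hard cost ceiling for the whole run


@dataclass
class RunState:
    steps: int = 0
    tokens_spent: int = 0
    stop_reason: str = "done"
    # Append-only trace: the observability signal and the replay log.
    trace: list[dict[str, Any]] = field(default_factory=list)


Step = tuple[str, Any, int]  # (tool name, argument, assumed token cost)


def run_loop(plan: list[Step],
             tools: dict[str, Callable[[Any], Any]],
             policy: Policy) -> RunState:
    state = RunState()

    def stop(reason: str) -> RunState:
        state.stop_reason = reason
        state.trace.append({"event": "stop", "reason": reason})
        return state

    for tool_name, arg, cost in plan:
        if state.steps >= policy.max_steps:
            return stop("max_steps")
        state.steps += 1
        for attempt in range(1, policy.max_retries + 2):
            if state.tokens_spent + cost > policy.budget_tokens:
                return stop("budget")
            state.tokens_spent += cost  # failed attempts cost tokens too
            try:
                result = tools[tool_name](arg)
            except Exception as exc:
                state.trace.append({"event": "tool_error", "tool": tool_name,
                                    "attempt": attempt, "error": repr(exc)})
                continue
            state.trace.append({"event": "tool_ok", "tool": tool_name,
                                "attempt": attempt, "result": result})
            break
        else:
            # Every attempt raised: stop instead of looping forever.
            return stop("retries_exhausted")
    return stop("done")


# Usage: a tool that fails once, then succeeds on the retry.
calls = {"n": 0}


def flaky_search(query: str) -> str:
    calls["n"] += 1
    if calls["n"] == 1:
        raise TimeoutError("upstream timed out")
    return f"results for {query}"


state = run_loop([("search", "idempotency keys", 100)],
                 {"search": flaky_search}, Policy())
print(state.stop_reason, state.tokens_spent)  # done 200
```

The failed first attempt still spends tokens and still lands in the trace, so a retry shows up in both the cost and the replay record. A real runtime would choose its next step dynamically rather than follow a fixed plan; the bounds are the part this sketch is meant to show.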

10B+ events/day systems direction

150+ paying users from zero

70+ data pipelines shipped

1B+ market data points served

Find the exact thread.

Jump straight into the essays, concepts, explainers, and deep dives behind the system you are thinking about.

agents

AI systems that survive real users

Multi-agent orchestration, bounded tool calls, memory, evals, cost guardrails, and recovery paths.

systems

Distributed systems with operational taste

Coordination, retries, idempotency, stream processing, failure detection, SLOs, and quiet incident prevention.

platforms

Data platforms that earn their keep

Kafka, Spark, ClickHouse, search, lakehouse migrations, and the product pressure behind architecture choices.

Latest field note

May 16, 2026

Flue Under the Hood: Why This Agent Harness Holds

A source-level tour of Flue through domain-driven design and twelve-factor architecture: the language, boundaries, and runtime constructs behind the TypeScript agent harness framework.

A concrete engineering pattern, stripped down to the decision that matters.

AI · agents · harness · open-source

Blog

Deep dives

Harness Engineering: The Compounding Stack
The operating layer around serious AI work.
13 parts · Senior

Harness Internals
A concrete engineering pattern, stripped down to the decision that matters.
8 parts · Senior

Production Agents Deep Dive
Retries, state, cost, security, and evals after the demo.
9 parts · Senior

AI Engineering Fundamentals
Start at tokens; end at evaluated agents.
8 parts · Intermediate