Writing
From the lab
Writing on AI reliability, observability, and private model training.
Research · 6 min read
Why RL environments beat prompt engineering for edge cases
Prompts are instructions. Environments are practice. Here's why the distinction matters when your agent keeps failing on the same class of inputs.
Engineering · 5 min read
The hidden cost of frontier models in enterprise workflows
Most teams don't realize how much of their API bill comes from a small set of repetitive tasks. We traced the pattern across 12 deployments.
Infrastructure · 7 min read
Trace capture without slowing down your agent
Observability shouldn't be an afterthought. Here's our async capture architecture, which adds less than 5 ms of overhead to any LLM call.
Case Study · 8 min read
What we learned building private SLMs for three different verticals
Finance, legal, and ops each have different failure modes. Here's what we found when we trained a domain-specific model for each.