Writing

From the lab

Writing on AI reliability, observability, and private model training.

Why RL environments beat prompt engineering for edge cases

Prompts are instructions. Environments are practice. Here's why the distinction matters when your agent keeps failing on the same class of inputs.

Most teams don't realize how much of their API bill comes from a small set of repetitive tasks. We traced the pattern across 12 deployments.

Observability shouldn't be an afterthought. Here's our async capture architecture that adds less than 5ms overhead to any LLM call.

Finance, legal, and ops all have different failure modes. Here's what we found when we trained domain-specific models for each.