Transmissions from inside live agent deployments: postmortems, evals, cost graphs, what worked, what blew up. No "10 ways AI will change everything."
A worked example: a 7-agent customer-support stack for a mid-size fintech. Three weeks in, satisfaction scores can dip 4 points overnight with no model change, no prompt change, no traffic anomaly. The drift is in the index, and it changes how these systems should be measured.
Read transmissionAgent payments went from one duct-taped pattern to seven competing protocols in four months: x402, Stripe MPP, Visa TAP, Mastercard Agent Pay, Google AP2, and more. The fee math that decides it (30,003% vs 10%), how x402 turns HTTP 402 into a working rail, the regulatory gap, and what we'd pick.
Paper notes on arXiv:2503.08223 (Zhejiang University, April 2026). The two limits everyone worries about, data exhaustion and compute monopolization, could both be broken by the devices already in people's hands. The math, the open problems, and what is shippable today.
A missing termination condition, no cost alert, a confidence budget written but not shipped. 24,847 API calls, 9 hours, $3,218. A worked example of why every safeguard is non-optional.
CSAT says the customer felt helped, not that they were helped correctly. The three-tier measurement stack that catches CS agent hallucination before your return rate does.
Most agent failures aren't wrong answers. They're infinite loops. We added a confidence-budget primitive to our planner and watched p99 latency drop 38%.
How a 12-person Quebec logistics SMB could cut after-hours response time from 6 hours to 4 minutes by routing 73% of inbound through a grounded copilot. A worked example with the full architecture.
pgvector, Pinecone, Qdrant, Weaviate. Same dataset, same embedding model, same queries. p50/p99 latency, recall@10, $/1M reads, ops complexity. No vendor took every prize.
Dense vectors get the marketing, but on technical content with rare named entities, sparse retrieval still wins. We measured the crossover.
Evals don't have to be a research project. Our standard regression harness fits in a notebook and catches 80% of bad model swaps. Walk-through inside.
Three years, four custom runtimes, one painful lesson: LangGraph is good enough for 80% of multi-agent workflows. Here's when we still roll our own.
The terminology is a mess. A working glossary, with code for each, plus when each one is the right call. Bookmark this for the next sales meeting.
We tested 8 chunking strategies on 5 representative corpora. Semantic chunking wins on ambiguous text, fixed-size wins on docs, and you should never use 512 by default.
Latency dashboards are everywhere. Cost dashboards are rare. We built a per-step cost trace that surfaces the 12% of calls eating 60% of your bill.
Field notes, postmortems, and the occasional sharp opinion on what's actually working in production agentic AI. No "ultimate guides." No threads.