root · field-notes · /feed

Field notes /
from production.

Transmissions from inside live agent deployments: postmortems, evals, cost graphs, what worked, what blew up. No "10 ways AI will change everything."

posts14

contributors8

last published2026·05·29

avg. read time~10m

edited byJ.P. Levac

cadenceevery two weeks

Featured · transmission 018

Why your RAG pipeline silently degrades, and how we caught it in eval week 3.

A worked example: a 7-agent customer-support stack for a mid-size fintech. Three weeks in, satisfaction scores can dip 4 points overnight with no model change, no prompt change, no traffic anomaly. The drift is in the index, and it changes how these systems should be measured.

JP Jean Pierre Levac

2026·05·14 11m read ● RAG · evals

Read transmission

§ feed · all transmissions

Recent / chronological

13 posts · sorted newest → oldest

tx · 022 infra

Your agent needs a wallet. Seven protocols want to be it.

Agent payments went from one duct-taped pattern to seven competing protocols in four months: x402, Stripe MPP, Visa TAP, Mastercard Agent Pay, Google AP2, and more. The fee math that decides it (30,003% vs 10%), how x402 turns HTTP 402 into a working rail, the regulatory gap, and what we'd pick.

Ld 2026·05·29 12m

↗

tx · 021 infra

The scaling wall is real. The fix might be in your pocket.

Paper notes on arXiv:2503.08223 (Zhejiang University, April 2026). The two limits everyone worries about, data exhaustion and compute monopolization, could both be broken by the devices already in people's hands. The math, the open problems, and what is shippable today.

Ld 2026·05·26 11m

↗

tx · 020 agents

Post-mortem: the loop that cost $3,200 overnight.

A missing termination condition, no cost alert, a confidence budget written but not shipped. 24,847 API calls, 9 hours, $3,218. A worked example of why every safeguard is non-optional.

JP 2026·05·17 11m

↗

tx · 019 ops

Your CS agent has a 4.2-star rating. It's also hallucinating 8% of the time.

CSAT says the customer felt helped, not that they were helped correctly. The three-tier measurement stack that catches CS agent hallucination before your return rate does.

Hs 2026·05·17 12m

↗

tx · 017 agents

Building a planner that knows when to give up.

Most agent failures aren't wrong answers. They're infinite loops. We added a confidence-budget primitive to our planner and watched p99 latency drop 38%.

JP 2026·05·09 8m

↗

tx · 016 scenario

Worked example: a 12-person ops team, a 24/7 inbox, one agent stack.

How a 12-person Quebec logistics SMB could cut after-hours response time from 6 hours to 4 minutes by routing 73% of inbound through a grounded copilot. A worked example with the full architecture.

Hs 2026·05·03 14m

↗

tx · 015 infra

Vector DB shootout: we ran 4 stores on 1.2M chunks. Here's the receipts.

pgvector, Pinecone, Qdrant, Weaviate. Same dataset, same embedding model, same queries. p50/p99 latency, recall@10, $/1M reads, ops complexity. No vendor took every prize.

Bn 2026·04·28 16m

↗

tx · 014 RAG

Hybrid retrieval: when BM25 beats your $400 embedding model.

Dense vectors get the marketing, but on technical content with rare named entities, sparse retrieval still wins. We measured the crossover.

Pb 2026·04·22 9m

↗

tx · 013 ops

The 6-line eval suite we ship with every agent.

Evals don't have to be a research project. Our standard regression harness fits in a notebook and catches 80% of bad model swaps. Walk-through inside.

Hs 2026·04·17 6m

↗

tx · 012 agents

Why we stopped writing custom orchestrators (mostly).

Three years, four custom runtimes, one painful lesson: LangGraph is good enough for 80% of multi-agent workflows. Here's when we still roll our own.

Ry 2026·04·11 11m

↗

tx · 011 agents

Tool calling vs. function calling vs. agents: the actual differences.

The terminology is a mess. A working glossary, with code for each, plus when each one is the right call. Bookmark this for the next sales meeting.

Lx 2026·04·05 5m

↗

tx · 010 RAG

Chunking is a hyperparameter. Tune it.

We tested 8 chunking strategies on 5 representative corpora. Semantic chunking wins on ambiguous text, fixed-size wins on docs, and you should never use 512 by default.

Sv 2026·03·29 9m

↗

tx · 009 infra