8 case studies — the problem, the call I made, what shipped, what I'd do differently. Each one comes with a chart. Click any row to expand the article.
When I joined Drew AI, the first instinct on the team was to build one LLM that could answer any e-commerce question.
That was wrong.
Marketers don't ask "any question." A Meta Ads question needs Meta context. A Google Ads question needs GAQL. A GA4 question needs event-stream understanding. One model trying to do all three becomes mediocre at all three.
So I split the brain.
The result is a reporting layer that feels like talking to three specialists who happen to share a desk — not one generalist trying to remember everything.
The tradeoff is real: more system complexity, more eval surface area, more chances for the agents to disagree. But the answers are sharper. And "sharp" is what gets Shopify founders to come back tomorrow.
You can't ship an AI product without evals. Everyone says this. Almost no one does it.
I built ours in LangSmith with four tiers:
Each tier has its own evaluator, its own threshold, its own alert. We catch regressions before users do.
The unsexy truth: most of the work isn't in the model. It's in the eval data — the curated set of real shop queries with known-good answers — and the discipline to update it as the product evolves.
E-commerce founders have been burned by "AI-powered insights" before. So when Drew AI says "your CAC is up 18% this week," the next question is always: should I trust this?
We built three things for that:
It's slower than fully autonomous. That's the point.
A wrong autonomous decision burns trust permanently. A reviewed decision builds it.
The hardest part wasn't engineering — it was UX. Confidence has to feel native, not bolted on. Founders shouldn't need a tutorial to read it.
When Anthropic shipped MCP, I made a call: every Drew AI integration would be MCP-first.
The reasoning was simple. A D2C founder doesn't live in one tool. They live across Slack, Notion, Linear, their ad platforms, their analytics dashboards.
If Drew AI sits inside any one of those, it's a feature.
If it sits across all of them via a shared protocol, it's the connective layer.
So Drew AI now talks to Slack, Notion, Linear, Google Ads, and Meta Ads through MCP servers — same agent brain, different surfaces.
The bet was that MCP would become the standard. So far, it's playing out.
At American Express, the question wasn't "should we run experiments." It was: how do we run them at the scale of 6+ international markets without drowning in noise?
The system I led handled 30+ tests a year, each tied to revenue lines that mattered.
A few things that compound at scale:
Referral conversions improved ~15%. Revenue impact crossed $100M.
Picture this: a deploy goes wrong on a revenue-critical page at 11pm on a Friday. Traffic drops 40%. Nobody notices till Monday morning. By then, the analysts are calculating the loss in millions.
That actually happened. So I built the thing that would have caught it.
The model learns the normal traffic shape per page, per market, per hour. It flags drops the moment they break baseline — not the next morning, not the next dashboard refresh.
The detector caught regressions in the first month that would have slipped past humans. Pays for itself in one save.
Most SEO conversations happen at the level of "let's optimise this landing page."
At Amex, the surface was 150,000+ URLs. Different math.
The work split three ways:
What I learned: at this scale, you don't fix SEO page by page. You fix it pattern by pattern. One template change cascades to thousands of URLs.
My first analytics job wasn't in a tech company. It was on the floor of a steel plant.
The plant was running below capacity, with inventory piling up. The accepted explanation was "demand variability."
I read Goldratt's Theory of Constraints and realised the plant didn't have a demand problem — it had a bottleneck problem.
Inventory dropped ~17%. OTIF delivery rates improved.
This was the project that turned me from a mechanical engineer who knew analytics into an analyst who happened to know how machines work. The order changed the rest of my career.