From Scraping 1,100 Coffee Roasters to Detecting Shadow AI: Why Pattern Recognition at Scale Is Everything
The journey from building a specialty coffee marketplace to creating an AI governance platform taught me that finding needles in haystacks at scale is always the same problem - whether it's great coffee or shadow AI.

Two years ago, I built Roastguide - a platform to help specialty coffee lovers find great roasters across Europe. The problem seemed straightforward: there were hundreds of small roasters making incredible coffee, but no easy way to discover them. The solution? Scrape, structure, and surface the signal.
Today, I'm building Trustflo with Hanna - a platform that helps companies discover and govern the AI tools their teams are actually using. On the surface, these problems seem unrelated. One is about coffee. The other is about compliance.
But underneath, they're the same problem: finding needles in haystacks at scale.
Roastguide: Finding signal in a sea of coffee
When I started Roastguide, specialty coffee was fragmented. Great roasters existed, but they were hidden - on Instagram, in local directories, mentioned in forum threads. There was no single source of truth.
So I built one. I scraped data from 1,100+ roasters across Europe, normalized it, and made it searchable. Users could filter by region, roast style, or shipping options. The app was featured by Apple's editorial team. It worked.
But the hard part wasn't the UI. It was the data pipeline: scraping 1,100+ roaster sites, normalizing inconsistent records, and keeping the result searchable and current.
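The normalization step can be made concrete with a minimal sketch. This is an illustration, not Roastguide's actual code: the `Roaster` fields, the `COUNTRY_ALIASES` table, and the rule of deduplicating by website host are all assumptions chosen for the example.

```python
from __future__ import annotations

import re
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Roaster:
    name: str
    country: str  # ISO 3166-1 alpha-2 code, or "??" when unknown
    website: str  # bare host, lowercased, without scheme or "www."


# Hypothetical mapping from free-text country labels to ISO codes.
COUNTRY_ALIASES = {
    "germany": "DE", "deutschland": "DE", "de": "DE",
    "sweden": "SE", "sverige": "SE", "se": "SE",
}


def normalize_host(url: str) -> str:
    """Reduce a scraped URL to a comparable host name."""
    host = re.sub(r"^https?://", "", url.strip().lower())
    host = host.split("/", 1)[0]
    return host[4:] if host.startswith("www.") else host


def normalize_name(name: str) -> str:
    """Apply Unicode NFC and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFC", name).split())


def normalize(raw: dict) -> Roaster:
    """Turn one raw scraped record into a canonical Roaster."""
    country = COUNTRY_ALIASES.get(raw["country"].strip().lower(), "??")
    return Roaster(normalize_name(raw["name"]), country, normalize_host(raw["url"]))


def deduplicate(records: list[dict]) -> list[Roaster]:
    """Keep the first record seen for each website host."""
    seen: dict[str, Roaster] = {}
    for raw in records:
        roaster = normalize(raw)
        seen.setdefault(roaster.website, roaster)
    return list(seen.values())
```

Two scraped entries for the same roaster, one from a directory and one from a shop page, collapse into a single record here because they share a host. A real pipeline would likely need fuzzier matching than that.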
At the time, I thought this was a niche problem specific to specialty coffee. It wasn't.
Fast-forward: The shadow AI problem
When Hanna and I started researching AI governance for mid-market European companies, we kept hearing the same frustration:
"We know our employees are using AI tools. We just don't know which ones, or what data they're feeding them."
This is shadow AI - the use of AI tools outside official procurement or oversight. It's not just ChatGPT. It's Grammarly, Notion AI, GitHub Copilot, Otter.ai, Perplexity, DeepL, and dozens of others. Each one is a potential compliance risk under the GDPR and the EU AI Act.
But here's the thing: most companies don't even know these tools exist in their environment. They're hidden in SaaS subscriptions, browser extensions, Slack integrations, and personal accounts.
Sound familiar?
It's the same fragmentation problem I solved with Roastguide. Except instead of finding coffee roasters, we're finding AI tools. And instead of scrapers hitting public websites, we're parsing SaaS expense receipts, SSO logs, and integration metadata.
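As a rough sketch of what discovery from expense data could look like, the snippet below matches merchant strings against a small catalog of AI tools. The catalog patterns, the `merchant` and `employee` field names, and the sample merchant strings are hypothetical; this is not Trustflo's implementation.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Hypothetical catalog: canonical tool name -> patterns seen in merchant strings.
AI_TOOL_PATTERNS = {
    "ChatGPT": [r"\bopenai\b", r"\bchatgpt\b"],
    "GitHub Copilot": [r"\bgithub\b.*\bcopilot\b"],
    "DeepL": [r"\bdeepl\b"],
    "Otter.ai": [r"\botter\.ai\b"],
}

_COMPILED = {
    tool: [re.compile(pattern, re.IGNORECASE) for pattern in patterns]
    for tool, patterns in AI_TOOL_PATTERNS.items()
}


def match_tool(merchant: str) -> str | None:
    """Return the canonical tool name for a merchant string, if any pattern matches."""
    for tool, patterns in _COMPILED.items():
        if any(p.search(merchant) for p in patterns):
            return tool
    return None


def discover(expense_lines: list[dict]) -> dict[str, set[str]]:
    """Map each detected tool to the set of employees who expensed it."""
    found: dict[str, set[str]] = defaultdict(set)
    for line in expense_lines:
        tool = match_tool(line["merchant"])
        if tool is not None:
            found[tool].add(line["employee"])
    return dict(found)
```

Pattern matching like this only finds tools already in the catalog, so the catalog itself has to be maintained as new tools appear.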
The pattern: Discovery → Normalization → Classification → Control
When I look back at Roastguide and forward at Trustflo, I see the same four-stage pipeline: discover the sources, normalize what you find, classify it, and then act on it.
The specifics differ. But the structure is identical.
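The shared structure can be sketched as a generic function that takes the stage implementations as parameters. Everything below is illustrative: `run_pipeline`, the sample SSO source, and the placeholder classification and control rules are assumptions, not code from either product.

```python
from __future__ import annotations

from typing import Callable, Iterable


def run_pipeline(
    sources: Iterable[Callable[[], Iterable[dict]]],
    normalize: Callable[[dict], dict],
    classify: Callable[[dict], str],
    control: Callable[[dict], str],
) -> list[dict]:
    """Run every raw item from every source through the four stages."""
    results = []
    for source in sources:                      # 1. Discovery
        for raw in source():
            record = normalize(raw)             # 2. Normalization
            record["label"] = classify(record)  # 3. Classification
            record["action"] = control(record)  # 4. Control
            results.append(record)
    return results


# Hypothetical stage implementations for the shadow AI case.
def sso_log_source() -> list[dict]:
    return [{"app": " DeepL "}, {"app": "UnknownNotes AI"}]


def normalize_app(raw: dict) -> dict:
    return {"name": raw["app"].strip().lower()}


REVIEWED = {"deepl"}  # placeholder allowlist


def classify_app(record: dict) -> str:
    return "known" if record["name"] in REVIEWED else "unreviewed"


def control_app(record: dict) -> str:
    return "approve" if record["label"] == "known" else "monitor"
```

Calling `run_pipeline([sso_log_source], normalize_app, classify_app, control_app)` yields one approved and one monitored record. Swapping in a scraper source and a roast-style classifier would give the coffee version of the same loop.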
Why this matters for compliance
The AI Act doesn't just regulate AI providers. It also regulates deployers - companies that use AI in their operations. That includes your organization, even if you didn't build the AI yourself.
To comply, you need to:
- Know which AI systems you're using
- Classify them by risk level
- Maintain records of usage and data flows
- Ensure human oversight where required
- Inform affected individuals (employees, customers)
You can't do any of that if you don't know the tools exist.
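One way to picture the checklist above is as an inventory record with a function that reports what is still missing. This is a simplified sketch, not legal guidance: the field names are hypothetical, the four tier names are the commonly used shorthand for the Act's risk categories, and treating human oversight as required only for high-risk entries is an assumption made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class RiskLevel(Enum):
    UNACCEPTABLE = "unacceptable"
    HIGH = "high"
    LIMITED = "limited"
    MINIMAL = "minimal"


@dataclass
class AISystemRecord:
    name: str
    risk_level: RiskLevel | None = None  # None means not yet classified
    usage_logged: bool = False
    human_oversight_assigned: bool = False
    individuals_informed: bool = False


def open_items(record: AISystemRecord) -> list[str]:
    """List the checklist steps that are still open for one AI system."""
    items = []
    if record.risk_level is None:
        items.append("classify risk level")
    if not record.usage_logged:
        items.append("record usage and data flows")
    # Assumption: oversight is only checked for high-risk systems.
    if record.risk_level is RiskLevel.HIGH and not record.human_oversight_assigned:
        items.append("assign human oversight")
    if not record.individuals_informed:
        items.append("inform affected individuals")
    return items
```

A tool that never made it into the inventory has no record at all, so none of these checks can even start.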
That's the shadow AI problem. And it's not going away. Netskope tracked 1,550+ distinct GenAI apps in 2025 (up from 317 a year earlier). The average mid-market company uses 15+ AI tools. Most are outside formal procurement.
Manual spreadsheets don't scale. IT surveys are outdated the day they're sent. You need continuous discovery.
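At its simplest, continuous discovery means comparing each discovery run against the previous one and flagging what changed. A minimal sketch, assuming each run produces a set of canonical tool names (the function name and the sample data are hypothetical):

```python
from __future__ import annotations


def diff_snapshots(previous: set[str], current: set[str]) -> dict[str, list[str]]:
    """Compare two discovery runs and report tools that appeared or disappeared."""
    return {
        "new": sorted(current - previous),
        "gone": sorted(previous - current),
    }


# Hypothetical results from two consecutive runs.
last_week = {"ChatGPT", "DeepL"}
this_week = {"ChatGPT", "DeepL", "Otter.ai"}
changes = diff_snapshots(last_week, this_week)
```

Here `changes["new"]` contains `"Otter.ai"`, which is the kind of event a one-off survey would miss until the next audit.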
Why we built Trustflo the way we did
When Hanna and I started Trustflo, we could have built a "compliance dashboard" where companies manually log their AI tools. That's what most vendors do.
But I knew from Roastguide that manual data entry doesn't work at scale. Roasters didn't submit their own listings. I scraped them, because if you rely on people to self-report, you get maybe 20% coverage, and it's out of date by next week.
So we built Trustflo to automatically discover shadow AI. We connect to your SaaS spend data, your SSO logs, your integrations. We parse it, normalize it, and classify it against the AI Act's risk framework. Then we surface it in a dashboard where you can approve, block, or monitor.
No surveys. No spreadsheets. No asking IT to manually audit 500 employees.
Just continuous, automated discovery. The same way Roastguide found coffee roasters.
The takeaway
I used to think Roastguide was a consumer product and Trustflo was an enterprise compliance tool. But they're both information products. They both solve the same problem: making the invisible visible.
If you're building in compliance, security, or governance, you're not really building compliance software. You're building a discovery engine.
The value isn't in the UI. It's in the data pipeline. Can you find the signal in the noise? Can you keep it current? Can you make it actionable?
That's the hard part. Everything else is just UI work.
Whether you're finding great coffee or shadow AI, the problem is the same: pattern recognition at scale. Get that right, and the rest follows.
