AI Data Mining Makes BYOK Essential
Anthropic dropped a blog post yesterday accusing DeepSeek, Moonshot, and MiniMax of running what they called “industrial-scale” distillation campaigns against Claude. The numbers are staggering: 24,000 fake accounts, 16 million exchanges, with MiniMax alone responsible for 13 million of those. Anthropic also claims it watched MiniMax redirect half its traffic to siphon capabilities from Claude’s newest model the moment it launched.
X immediately roasted them for it, because Reddit is currently suing Anthropic for scraping over 100,000 posts to train Claude. The irony writes itself. And somewhere between those two stories is a very loud case for BYOK.
Everybody is mining everybody
The specifics of who stole what from whom will keep getting updated. The pattern matters more: billion-dollar AI companies with world-class security teams cannot keep their own model outputs from being systematically harvested at industrial scale.
What Moonshot was actually after
Model distillation — using a big model’s outputs to train a smaller one — is a legitimate technique, widely used, not inherently shady. Anthropic distills its own models. And OpenAI does too. The issue is competitors doing it without permission, at massive scale, to shortcut years of research.
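The mechanics are simple enough to sketch. A distillation loss typically softens the teacher’s output logits with a temperature, then penalizes the student for diverging from that softened distribution. The logits and temperature below are illustrative numbers, not taken from any real model:

```javascript
// Minimal sketch of a distillation loss: soften the teacher's logits with a
// temperature, then measure how far the student's distribution drifts from them.

function softmax(logits, temperature = 1.0) {
  const scaled = logits.map((z) => z / temperature);
  const max = Math.max(...scaled); // subtract max for numerical stability
  const exps = scaled.map((z) => Math.exp(z - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// KL(teacher || student): how poorly the student's softened distribution
// approximates the teacher's soft targets. Zero means a perfect match.
function distillationLoss(teacherLogits, studentLogits, temperature = 2.0) {
  const p = softmax(teacherLogits, temperature); // teacher's soft targets
  const q = softmax(studentLogits, temperature); // student's prediction
  return p.reduce((kl, pi, i) => kl + pi * Math.log(pi / q[i]), 0);
}

const teacher = [2.0, 1.0, 0.1];
distillationLoss(teacher, teacher);         // ~0: student matches teacher
distillationLoss(teacher, [0.1, 1.0, 2.0]); // positive: student disagrees
```

Collect enough teacher outputs and you can push a smaller model toward the big one’s behavior, which is exactly why 16 million harvested exchanges are worth building 24,000 fake accounts for.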
But what caught my eye was Moonshot’s 3.4 million exchanges specifically targeting agentic reasoning and tool use. They were not just copying Claude’s general intelligence. They were mining the specific capability that makes AI useful for real-world work: the ability to navigate interfaces, use tools, take actions. Browser agent stuff. The kind of functionality you rely on when you ask an AI to fill out a form or draft a response based on what’s on your screen.
That capability is valuable enough that companies will build sprawling networks of fake accounts to steal it, which tells you something about the lengths organizations will go to when they want data that is not theirs. In this case the target was Claude’s outputs, but the principle extends to any data flowing through a centralized service.
Why BYOK stopped being optional
So when a SaaS tool asks you to pipe your emails, your client data, your HR forms through its servers to reach an LLM, remember what Anthropic just proved is possible at scale. If the people who built the model can’t keep their own outputs from being systematically harvested, the idea that a random productivity app will protect your data better is, frankly, delusional.
And every AI tool that routes your work through a server creates a copy of that work somewhere you don’t control. Maybe the company is trustworthy. Maybe their security is excellent. But Anthropic’s security is excellent too, and they still got hit with 16 million extraction attempts that ran for months before being fully identified.
dassi works differently because the architecture is different. It lives in your browser’s side panel, sees the page you see, and sends requests directly to whatever LLM provider you choose using your own API key. There is no dassi server sitting in between, which means there is no dassi server to be scraped, hacked, subpoenaed, or mined by anyone.
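Concretely, a BYOK request is just the browser talking straight to the provider. This sketch builds such a request against Anthropic’s Messages API; the endpoint and headers are Anthropic’s documented ones, while the model id and the way the key is stored are illustrative assumptions, not a description of dassi’s actual internals:

```javascript
// Sketch of a BYOK call: the user's own key travels from the browser
// directly to the LLM provider, with no intermediary server in the path.

function buildClaudeRequest(apiKey, userMessage) {
  return {
    url: "https://api.anthropic.com/v1/messages",
    options: {
      method: "POST",
      headers: {
        "x-api-key": apiKey, // the user's own key, never a middleman's
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
        // Anthropic requires this opt-in header for direct browser (CORS) calls:
        "anthropic-dangerous-direct-browser-access": "true",
      },
      body: JSON.stringify({
        model: "claude-sonnet-4-5", // placeholder model id
        max_tokens: 1024,
        messages: [{ role: "user", content: userMessage }],
      }),
    },
  };
}

// Usage: key loaded from local extension storage, request fired with fetch.
// const { url, options } = buildClaudeRequest(storedKey, "Summarize this page");
// const response = await fetch(url, options);
```

The design point is the shape of the data path, not the specific provider: swap in any LLM API and the property holds, because the only parties that ever see the payload are the browser and the provider you picked.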
And BYOK is not just a pricing model. After yesterday it looks more like the only defensible data architecture for AI tools that touch your actual work. When you bring your own key, the only copy of your data that exists is the one flowing between your browser and the LLM you explicitly chose. No middleman. No aggregation layer. No honeypot.
The hypocrisy angle strengthens the argument
People on X were quick to point out that Anthropic trained Claude on scraped web data, so crying about being scraped in return is rich. And sure, that’s a damn fair point. But the criticism makes the BYOK case stronger, not weaker. If every AI company is both scraper and scrapee, if the entire industry operates on a foundation of “we will take what we can access,” then the rational move is not to pick a side in that fight but to minimize how much of your data enters the arena at all.
You can’t stop Anthropic and DeepSeek from going at each other. But you can stop routing your draft emails and client proposals through infrastructure that becomes a target the moment it aggregates enough valuable data.
I keep a few API keys in my browser and none of them require me to trust a fourth company with my work. Given what Anthropic just revealed about the scale of data mining happening between the companies I do trust with my API calls, that feels like about the right level of paranoia.