Applied AI Digest: Week of 05/11

Launches & Tools

Anthropic leases SpaceX's entire Colossus datacenter. What it means for Opus.
Anthropic signed a deal for all 300 MW of xAI's Colossus 1 datacenter. The capacity question has been the elephant in the room for Opus users. Whether this translates to lower latency, higher rate limits, or just training headroom, it's the biggest infrastructure move Anthropic has made.

Gemini 3.1 Flash Lite is the first GA model in the Gemini 3 family
Google quietly shipped Gemini 3.1 Flash Lite. If you've been waiting for Gemini 3 to leave preview before evaluating it, this is the entry point.

From the Workbench

We cut page loads from 12s to 2s by pointing Opus 4.7 Max at Chrome DevTools MCP
We plugged the Chrome DevTools MCP into Claude Code and asked it to improve page load performance. Results on one of our projects: Research page 4s to 2s. Collection page 2s to 200ms. Company page 12s to 2s. Some of it was embarrassingly easy.

One catch: Opus 4.7 High produced terrible slop. Opus 4.7 Max made beautiful changes that we kept most of, with its own built-in harness for verifying improvements. We saw the same gains on a second project. If you have page load or TTFP problems, try Opus 4.7 Max + Chrome DevTools MCP before you do anything else.

OpenAI is winding down their fine-tuning API. A week after we hit their rate limits.
Last week we wrote about burning through OpenAI's 8-job-per-day fine-tuning quota. This week, they announced they're winding down the fine-tuning API entirely. The timing is almost funny. If you need alternatives, three worth a look: OpenPipe, Thinking Machines Tinker, and Google Vertex AI supervised tuning.

Set your package manager to use week-old packages
A self-spreading supply chain attack hit the npm ecosystem this week, affecting TanStack and Mistral npm libraries among others. We checked our projects and weren't hit. The practical mitigation we've landed on: configure your package manager to install only package versions that were published at least a week ago. By that point, most malicious releases have been caught and pulled. One config change, free buffer.
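If you want to see the rule itself rather than a config flag, here is a minimal Python sketch of the "at least a week old" filter. It assumes you already have a package's publish timestamps as a version-to-ISO-date map, which is how we understand the `time` field of npm registry metadata to be shaped (including its `created` and `modified` bookkeeping keys). The function name and sample data are ours and purely illustrative. The exact setting name differs by package manager, so check your tool's docs for the real thing.

```python
from datetime import datetime, timedelta, timezone


def newest_version_older_than(publish_times, min_age=timedelta(days=7), now=None):
    """Return the most recently published version that is at least min_age old.

    publish_times maps version strings to ISO 8601 timestamps (assumed shape).
    Returns None when every version is still inside the cooldown window.
    Note: this picks by publish date, not by semver ordering.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - min_age
    eligible = []
    for version, stamp in publish_times.items():
        # Skip bookkeeping keys that are not real versions.
        if version in ("created", "modified"):
            continue
        # Normalize a trailing "Z" so older Python versions can parse it.
        published = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
        if published <= cutoff:
            eligible.append((published, version))
    if not eligible:
        return None
    return max(eligible)[1]


# Hypothetical data for illustration only.
sample_times = {
    "created": "2020-01-01T00:00:00.000Z",
    "modified": "2026-05-10T00:00:00.000Z",
    "1.0.0": "2026-04-01T00:00:00.000Z",
    "1.0.1": "2026-05-02T00:00:00.000Z",
    "1.0.2": "2026-05-09T12:00:00.000Z",  # two days old on 05/11, so skipped
}
sample_now = datetime(2026, 5, 11, tzinfo=timezone.utc)
chosen = newest_version_older_than(sample_times, now=sample_now)  # "1.0.1"
```

The tradeoff is the obvious one: you also wait a week for legitimate security fixes, so keep a way to override the cooldown for a specific package when you need a patch immediately.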

Reads

Every model scores 0% on ProgramBench, but the headline buries the real result
The SWE-Bench team released ProgramBench: you get an executable and its documentation, and you have to re-implement the program from scratch. No source code, no decompilers, no internet. 6 hours. One submission. No partial credit. One of the tasks is re-implementing ffmpeg. Every model tested scored 0%.

The details are more interesting than the headline. Sonnet re-implemented LuaJIT well enough to pass 71.5% of unknown behavioral tests in 6 hours with no references. That's not nothing. The benchmark authors confirm every task is solvable by design, so 0% is a ceiling problem, not an impossibility problem. Worth watching how fast that number moves.

"Cognitive surrender" is the term you've been looking for
A post by Addy Osmani names the thing a lot of us have been feeling. Cognitive offloading is delegating to AI and still owning the answer. Cognitive surrender is when you stop forming an independent view entirely. The underlying paper found that when AI was wrong, people with AI access performed worse than people without it.

The nuance we'd add: what you should surrender is changing with each release. The level of abstraction below which you no longer need to understand the details keeps rising. The job is staying calibrated about where the line is right now, for your specific workflow and safeguards.

Mythos is finding real vulnerabilities at Mozilla-scale with almost no false positives
Mozilla reported that Mythos found 271 vulnerabilities with almost no false positives. Separately, it found a curl vulnerability this week. For a tool that was vaporware-adjacent not long ago, these are real results on real codebases.

About Fractional AI: We build custom AI software for companies working on hard problems. We're practitioners first, and this newsletter is informed by what we're actually using, breaking, and shipping every week. More at fractional.ai.
