Applied AI Digest: Week of 05/04

Launches & Tools

Skip Symphony, read Harness Engineering instead
OpenAI published a new blog post on Harness Engineering, the conceptual framework behind how they built Symphony (their open-source Codex orchestration layer). The framework is more interesting than the code. It lays out a mental model we've been converging on independently: break goals into small enforceable blocks, prompt the agent to build those blocks, use them to unlock harder tasks. When something fails, the fix is never "try harder." It's "what capability is missing, and how do we make it legible to the agent?"

We've been split on Docling for tables. Granite 4.1 might settle it.
IBM's new open-weight Granite 4.1 includes a 4B VLM tuned for document extraction that they say beats Opus 4.6 on table reading. Some of us have had good results with Granite VLMs inside Docling, but others found it performed poorly on complex tables compared to GPT-4o. The catch: Docling's default model is granite-docling-258M. A 258M model losing to a ~200B model isn't surprising. Whether this new 4B model closes that gap at a fraction of the cost is the thing to benchmark.

Deepgram launches Flux, a multilingual speech model
Deepgram's new Flux model targets multilingual transcription and real-time speech. Another option if you've been stuck choosing between Whisper's accuracy and a hosted API's latency.

From the Workbench

We've been drafting decks with LLMs, but polishing them in Google Slides AI
Google Slides' built-in AI used to generate images of slides, which made them uneditable. Now it creates actual elements: text boxes, shapes, layouts. Neil on our team has been using it to iterate once the initial layout and copy are in place, and says it handles text box sizing correctly. Claude's slide generation still struggles with alignment and oversized text boxes ("oh you wanted it aligned?" as one teammate put it). The workflow that's working for us: LLM for content drafting, Slides AI for layout polish.

We burned through OpenAI's fine-tuning quota before lunch
Your entire organization is capped at 8 GPT-4.1 fine-tuning jobs per day. Not per user. Per org. We hit the wall this week while iterating on training data. If you're running hyperparameter sweeps, plan your runs like expensive batch jobs, not quick experiments.
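One way to treat a sweep like a batch job is to enumerate the full grid up front and split it into days that fit under the cap. A minimal sketch, assuming the 8-jobs-per-day org cap described above; the grid values and the plan_sweep helper are hypothetical, and nothing here calls the OpenAI API:

```python
from itertools import product

DAILY_JOB_CAP = 8  # org-wide cap reported above; confirm against your own limits


def plan_sweep(grid, daily_cap=DAILY_JOB_CAP):
    """Expand a hyperparameter grid and split it into per-day batches under the cap."""
    keys = sorted(grid)
    configs = [
        dict(zip(keys, values))
        for values in product(*(grid[k] for k in keys))
    ]
    return [configs[i:i + daily_cap] for i in range(0, len(configs), daily_cap)]


# Hypothetical sweep: 3 x 2 x 2 = 12 jobs, which does not fit in one day.
grid = {
    "learning_rate_multiplier": [0.5, 1.0, 2.0],
    "n_epochs": [2, 4],
    "batch_size": [8, 16],
}

schedule = plan_sweep(grid)
for day, batch in enumerate(schedule, start=1):
    print(f"day {day}: {len(batch)} jobs")
```

Seeing the sweep laid out as days rather than jobs makes it easier to decide which configurations deserve a slot, and it leaves room for teammates who also need the quota.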

Azure AI Foundry: skip the UI, do everything via API
A teammate is starting a project that routes all inference through Azure AI Foundry. Team wisdom from past Foundry projects: don't touch the portal UI. It hides deployments, drops models from view, and generally gaslights you about what's available. Use the API from day one. Budget time for rate limit surprises. The FAQ still references text-davinci-003, which tells you how current the docs are.

Architecture diagrams with AI: the state of the art is "yell at it"
We had a 19-reply thread this week on getting AI to produce architecture diagrams that don't look like they came over a 2400-baud modem. The short version: Mermaid is diffable but visually messy. Graphviz is ugly. Raw SVG-to-VSDX-to-Lucidchart works, but you can't iterate on it. HTML artifacts give the most control but aren't maintainable.

The best thing we've found so far is a custom Claude skill that generates C4-model diagrams, constraining the output enough that iteration actually converges. Codex is slightly better than Claude at spatial layout, for what that's worth.

We thought Anthropic's Managed Agent launch might be a toy. It isn't.
Joshua on our team has been using the managed agent setup CLI tool for real work. This week it offered to launch an agent to check in with him in two weeks about a deferred webhook integration. An agent understood his project timeline and offered to own a follow-up.

Reads

Prompt injection is the new SQL injection, and agents with wallets are the proof
Someone encoded a transfer instruction in Morse code, sent it to the Grok-powered @bankrbot, and the model decoded it, bypassed safety filters, and moved 3B $DRB tokens (~$147K) to the attacker's wallet. The attacker had first gifted an NFT to unlock transfer permissions, then exploited the encoding blind spot. Same class of problem as SQL injection, new surface. If you're giving agents access to irreversible actions, the safety boundary has to live outside the model. "The model says no" is not a constraint. It's a suggestion.
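To make "outside the model" concrete: the check below runs in ordinary code after the model has proposed an action, so no encoding trick in the prompt can change its answer. This is a minimal sketch, not how @bankrbot works; TransferRequest, Policy, authorize, the addresses, and the limits are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferRequest:
    """An action the model has proposed; nothing has been executed yet."""
    to_address: str
    amount: float


@dataclass(frozen=True)
class Policy:
    """Limits set by humans in config, never by the model or by chat input."""
    allowed_recipients: frozenset
    max_amount: float


def authorize(request, policy):
    """Return (allowed, reason). Deterministic: it never reads the prompt."""
    if request.to_address not in policy.allowed_recipients:
        return False, "recipient not on allowlist"
    if request.amount <= 0 or request.amount > policy.max_amount:
        return False, "amount outside per-transfer limit"
    return True, "ok"


# Hypothetical policy and a transfer shaped like the one in the incident.
policy = Policy(allowed_recipients=frozenset({"0xTREASURY"}), max_amount=100.0)
proposed = TransferRequest(to_address="0xATTACKER", amount=3_000_000_000)

allowed, reason = authorize(proposed, policy)
print(allowed, reason)
```

The point of the design is that the model can be fully fooled and the transfer still fails, because the allowlist and the limit are evaluated on the structured request, not on anything the attacker wrote.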

OpenRouter's GPT-5.5 cost analysis puts the pricing in context
OpenRouter broke down the actual cost comparison between GPT-5.4 and 5.5 across workloads. The short version: 5.5 is meaningfully more expensive, and the quality gains depend on what you're doing. Check this before committing to a migration.

About Fractional AI: We build custom AI software for companies working on hard problems. We're practitioners first, and this newsletter is informed by what we're actually using, breaking, and shipping every week. More at fractional.ai.
