llm-gemini 0.32a0

Simon Willison's llm-gemini plugin now supports streaming reasoning tokens, part of a broader shift toward exposing model reasoning in production tooling. The update reflects developer demand for token-level control as reasoning models become central to LLM workflows. It requires llm>=0.32a0, which means the plugin and the core llm package are changing in step.

Modelwire context

Analyst take

The alpha versioning on both llm-gemini and its llm dependency requirement is notable. Neither is a stable release, so developers who adopt streaming reasoning tokens now are accepting some interface instability in exchange for early access to a capability that is still being defined at the platform level.
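To make the alpha tag concrete, the sketch below shows how a pre-release such as 0.32a0 orders against other versions and what a minimum requirement like llm>=0.32a0 admits. It is a deliberately simplified take on PEP 440 ordering written for illustration: the helper names `version_key` and `satisfies_minimum` are hypothetical, only plain `X.Y` versions with an optional `aN`, `bN`, or `rcN` suffix are handled, and real installers rely on a full implementation rather than anything like this.

```python
import re

# Simplified PEP 440 ordering (illustrative assumption, not a full implementation).
# Handles only "X.Y[.Z...]" with an optional aN / bN / rcN pre-release suffix.
_PRE_RANK = {"a": 0, "b": 1, "rc": 2}
_FINAL_RANK = 3  # a final release sorts after every pre-release of the same number

_VERSION_RE = re.compile(r"^(\d+(?:\.\d+)*)(?:(a|b|rc)(\d+))?$")


def version_key(version: str) -> tuple:
    """Return a sortable key for a version string (hypothetical helper)."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    release = [int(part) for part in match.group(1).split(".")]
    # Drop trailing zeros so that "0.32" and "0.32.0" compare as equal.
    while release and release[-1] == 0:
        release.pop()
    if match.group(2) is None:
        return (tuple(release), _FINAL_RANK, 0)
    return (tuple(release), _PRE_RANK[match.group(2)], int(match.group(3)))


def satisfies_minimum(installed: str, minimum: str) -> bool:
    """True if `installed` is at or above `minimum` under the simplified ordering."""
    return version_key(installed) >= version_key(minimum)


if __name__ == "__main__":
    for candidate in ["0.31", "0.32a0", "0.32"]:
        print(candidate, satisfies_minimum(candidate, "0.32a0"))
```

Under this ordering an alpha sorts before the stable release with the same number, so a floor of 0.32a0 admits both the alpha and any later stable 0.32, while excluding 0.31.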

This release lands on the same day as 'llm-gemini 0.32', which added Gemini 3.5 Flash model support. Willison therefore shipped two plugin updates in a single day, one for model access and one for reasoning token streaming. That cadence tracks Google's own release tempo, documented in AI Business's 'Google Aims at Enterprise Cost Efficiency With Gemini 3.5 Flash' coverage, also dated May 19. The open-source tooling layer is absorbing Google's rapid release schedule in near real time, which shortens the lag between a new Gemini capability and its availability to Python CLI developers. Whether that tight coupling becomes a liability depends on how stable Google's reasoning token API contracts prove to be.

Watch whether the llm-gemini plugin reaches a stable 0.32 release (dropping the alpha tag) within the next four to six weeks. If it stalls in alpha, that likely signals the underlying Gemini reasoning token API is still shifting and the tooling can't lock down a contract.

Coverage we drew on

Google Aims at Enterprise Cost Efficiency With Gemini 3.5 Flash · AI Business

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: llm-gemini · Simon Willison · Gemini · llm

Read the full story at Simon Willison (simonwillison.net)

Modelwire Editorial
