Products & Apps · Business & Funding · Quoting Bobby Holley · Mozilla and Anthropic's collaboration surfaced 271 vulnerabilities in Firefox through early testing of Claude Mythos Preview, leading to fixes in Firefox 150. The partnership demonstrates how AI-assisted security auditing can accelerate vulnerability discovery at scale. · Simon Willison · 47m ago · 77
Products & Apps · Business & Funding · Changes to GitHub Copilot Individual plans · GitHub tightened Copilot Individual plan limits, paused new signups, and moved Claude Opus 4.7 access to the pricier $39/month Pro+ tier, mirroring industry pressure to monetize AI coding tools more aggressively. · Simon Willison · 2h ago · 77
Products & Apps · Opinion & Analysis · Is Claude Code going to cost $100/month? Probably not - it's all very confusing · Anthropic silently updated Claude's pricing page with unclear details about Claude Code costs, then reverted the change, leaving confusion about whether a new tier or feature will command premium pricing. · Simon Willison · 4h ago · 64
Models & Releases · Products & Apps · Where's the raccoon with the ham radio? (ChatGPT Images 2.0) · OpenAI shipped ChatGPT Images 2.0, with Sam Altman claiming the leap matches GPT-3 to GPT-5 in magnitude. Simon Willison tested the model against its predecessor using a Where's Waldo-style prompt, benchmarking real-world output quality. · Simon Willison · 9h ago · 89
Opinion & Analysis · Quoting Andreas Påhlsson-Notini · Andreas Påhlsson-Notini argues current AI agents inherit human flaws (indecision, impatience, constraint-negotiation) rather than embodying truly alien intelligence. The critique challenges whether today's systems are genuinely autonomous or merely mimicking human problem-solving patterns. · Simon Willison · 13h ago · 64
Tools & Code · llm-openrouter 0.6 · Simon Willison released llm-openrouter 0.6 with a new refresh command that lets users update available models on-demand instead of waiting for cache expiration, enabling faster access to newly listed models like Kimi 2.6. · Simon Willison · 1d ago · 60
Tools & Code · Claude Token Counter, now with model comparisons · Simon Willison upgraded his Claude Token Counter tool to compare tokenization across different Claude models. Claude Opus 4.7 introduced the first tokenizer change in the Claude family, making cross-model comparison newly relevant for developers optimizing API costs. · Simon Willison · 2d ago · 64
Opinion & Analysis · Products & Apps · Headless everything for personal AI · Matt Webb argues headless APIs will proliferate as personal AI agents become the preferred interface to services, bypassing traditional GUIs. Salesforce's new Headless 360 product signals enterprise adoption of this architectural shift. · Simon Willison · 2d ago · 77
Models & Releases · Research · Changes in the system prompt between Claude Opus 4.6 and 4.7 · Anthropic released Claude Opus 4.7 on April 16, 2026, with updated system prompts compared to the February 4.6 version. The company continues its practice of publicly archiving system prompt changes, enabling transparency into how model instructions evolve across releases. · Simon Willison · 3d ago · 72
Tools & Code · Research · Claude system prompts as a git timeline · Simon Willison converted Anthropic's published Claude system prompts into a git repository with timestamped commits, enabling visual tracking of how the prompts have evolved across model versions and families. · Simon Willison · 3d ago · 77
Tools & Code · Opinion & Analysis · Adding a new content type to my blog-to-newsletter tool · Simon Willison demonstrates a practical agentic engineering pattern using a short LLM prompt to automate adding new content types to his blog-to-newsletter tool, showcasing how minimal prompting can accomplish substantial work in a single API call. · Simon Willison · 4d ago · 77
Tools & Code · Models & Releases · llm-anthropic 0.25 · Simon Willison released llm-anthropic 0.25, adding support for Claude Opus 4.7 with extended thinking capabilities and new display options for reasoning output. The plugin now defaults to maximum token limits per model and introduces thinking_display and thinking_adaptive toggles. · Simon Willison · 5d ago · 72
Models & Releases · Opinion & Analysis · Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7 · Simon Willison compared Qwen3.6-35B-A3B and Claude Opus 4.7 using his informal "pelican riding a bicycle" benchmark, finding that Alibaba's model, running quantized on a MacBook Pro M5, drew the better pelican despite being the smaller model. · Simon Willison · 5d ago · 77
Models & Releases · Products & Apps · Gemini 3.1 Flash TTS · Google launched Gemini 3.1 Flash TTS, a new text-to-speech model accessible via the Gemini API that accepts natural language prompts to control voice characteristics and output style, expanding multimodal capabilities beyond text generation. · Simon Willison · 6d ago · 89
Opinion & Analysis · Policy & Regulation · Quoting Kyle Kingsbury · Kyle Kingsbury argues that companies will increasingly employ people as accountability holders for AI system failures (whether as internal reviewers, external legal representatives, or convenient scapegoats), shifting responsibility rather than ensuring genuine safety. · Simon Willison · 6d ago · 77
Models & Releases · Products & Apps · Trusted access for the next era of cyber defense · OpenAI launched GPT-5.4-Cyber, a fine-tuned variant designed for defensive cybersecurity applications, as part of a broader program to enable trusted access for high-capability models in sensitive domains. · Simon Willison · Apr 14 · 94
Research · Models & Releases · Cybersecurity Looks Like Proof of Work Now · The UK AI Safety Institute independently evaluated Claude Mythos's cybersecurity capabilities, confirming Anthropic's claims about its vulnerability-detection prowess. Analysis suggests performance scales with computational investment, framing security testing as a resource-intensive verification process. · Simon Willison · Apr 14 · 89
Opinion & Analysis · Business & Funding · Steve Yegge · Steve Yegge reports that Google's internal AI adoption mirrors the broader industry pattern: 20% power users leveraging agentic tools, 20% non-adopters, and 60% using chat interfaces like Cursor. An 18-month hiring freeze has stalled knowledge transfer across the sector. · Simon Willison · Apr 13 · 89
Opinion & Analysis · Quoting Bryan Cantrill · Bryan Cantrill argues that LLMs lack the economic incentive to optimize systems efficiently, unlike humans whose finite time forces elegant design choices; this structural difference means unchecked LLM-assisted development risks bloat over improvement. · Simon Willison · Apr 13 · 77
Tools & Code · Models & Releases · Gemma 4 audio with MLX · Simon Willison shares a practical recipe for running Gemma 4 E2B (10.28 GB) on macOS to transcribe audio files using MLX and mlx-vlm, with a ready-to-use uv command demonstrating local inference. · Simon Willison · Apr 12 · 77