Products & Apps · Opinion & Analysis · An update on recent Claude Code quality reports · Anthropic published a postmortem on two months of Claude Code quality degradation, revealing three distinct harness bugs rather than model failures. One bug cleared older reasoning from sessions idle for more than an hour in order to reduce latency, directly impacting user experience. · Simon Willison · Apr 24 · 77
Business & Funding · Opinion & Analysis · The week that Meta employees became training data · Meta's workplace monitoring practices have intensified alongside recent layoffs, with employees reporting surveillance tactics that repurpose their work as training data. The story raises questions about whether invasive data collection and workforce reduction will become standard in knowledge work. · Platformer · Apr 24 · 73
Models & Releases · DeepSeek-V4: a million-token context that agents can actually use · DeepSeek released V4 with a million-token context window, marking a significant expansion in how much information agents can process in a single session. The capability addresses a practical bottleneck for long-horizon reasoning and multi-step workflows. · Hugging Face · Apr 24 · 89
Business & Funding · Anthropic and NEC collaborate to build Japan’s largest AI engineering workforce · Anthropic and NEC are partnering to develop AI engineering talent in Japan, positioning the collaboration as a path to building the country's largest AI workforce. The deal signals Anthropic's expansion into Asia-Pacific and reflects growing demand for localized AI expertise outside the US. · Anthropic · Apr 24 · 75
Products & Apps · Claude is connecting directly to your personal apps like Spotify, Uber Eats, and TurboTax · Anthropic expanded Claude's app integrations beyond enterprise tools to personal services including Spotify, Uber Eats, TurboTax, Instacart, and AllTrails, letting users pull data directly into conversations. The move signals a shift toward consumer-facing AI utility and raises questions about data access and privacy at scale. · The Verge — AI · Apr 23 · 69
Tools & Code · Extract PDF text in your browser with LiteParse for the web · Simon Willison ported LlamaIndex's LiteParse PDF extraction tool to run entirely in the browser, preserving its non-AI approach to text parsing and OCR fallback. The browser version maintains compatibility with LiteParse's core libraries while enabling client-side PDF processing without server dependencies. · Simon Willison · Apr 23 · 64
Models & Releases · Business & Funding · Introducing GPT-5.5 with NVIDIA · OpenAI unveiled GPT-5.5, positioning it as a task-execution model built in partnership with NVIDIA. The release highlights a new capability tier focused on autonomous problem-solving for engineering workflows, with early adoption from NVIDIA engineers. · OpenAI (YouTube) · Apr 23 · 97
Models & Releases · First impressions of GPT-5.5 from Aaron Friel · Aaron Friel from OpenAI discusses GPT-5.5's capabilities for handling faster, more autonomous long-running tasks in a conversation with Romain Huet. The interview explores how OpenAI teams are leveraging the new model's performance improvements for extended workflows. · OpenAI (YouTube) · Apr 23 · 89
Business & Funding · Bret Taylor’s Sierra buys YC-backed AI startup Fragment · Sierra, Bret Taylor's AI customer service startup, has acquired Fragment, a YC-backed French AI firm. The deal signals consolidation in the competitive agent-software space as larger players absorb specialized capabilities. · TechCrunch — AI · Apr 23 · 58
Models & Releases · Products & Apps · First impressions of GPT-5.5 from Claire Vo · Claire Vo, founder of ChatPRD, shared early impressions of GPT-5.5 on OpenAI's channel, highlighting how the model enables new product workflows and helps resolve bugs in her own tool. The demo signals practical adoption patterns among AI-native builders. · OpenAI (YouTube) · Apr 23 · 77
Models & Releases · Products & Apps · First impressions of GPT-5.5 from Will Koh · Will Koh from Ramp demonstrated how GPT-5.5 enhances the fintech platform's tool-use capabilities, enabling more intelligent function selection for customers. The upgrade signals meaningful progress in model reasoning for real-world B2B workflows. · OpenAI (YouTube) · Apr 23 · 77
Hardware & Infra · Policy & Regulation · Community Votes to Deny Water to Nuclear Weapons Data Center · A township has imposed a one-year water moratorium on a new AI data center planned by U.S. nuclear weapons researchers, blocking infrastructure critical to large-scale compute operations before construction begins. · 404 Media · Apr 23 · 69
Models & Releases · Products & Apps · A pelican for GPT-5.5 via the semi-official Codex backdoor API · OpenAI released GPT-5.5 to Codex and paid ChatGPT users, with Simon Willison reporting strong performance on code generation tasks. The API remains unavailable pending safety review, delaying broader developer access. · Simon Willison · Apr 23 · 89
Business & Funding · Meta is laying off 10 percent of its staff · Meta is cutting 10 percent of its workforce (roughly 8,000 employees) in May and freezing 6,000 open positions, per a memo from chief people officer Janelle Gale. The move follows the company's heavy spending on AI infrastructure and model development. · The Verge — AI · Apr 23 · 65
Products & Apps · Meet Noscroll, an AI bot that does your doomscrolling for you · Noscroll, a new AI bot, automates social media consumption by reading feeds on behalf of users to combat doomscrolling habits. The tool represents a niche application of AI to behavioral wellness rather than a technical or capability breakthrough. · TechCrunch — AI · Apr 23 · 47
Tools & Code · llm-openai-via-codex 0.1a0 · Simon Willison released llm-openai-via-codex, a plugin that reuses Codex CLI credentials to route OpenAI API calls through the LLM command-line tool. The workaround lets developers access GPT models without separate authentication setup. · Simon Willison · Apr 23 · 64
Models & Releases · Products & Apps · OpenAI unveils GPT-5.5, claims a "new class of intelligence" at double the API price · OpenAI released GPT-5.5, an agentic model capable of autonomously orchestrating multiple tools to solve complex tasks. The company doubled API pricing for the new model, positioning it as a step forward in autonomous reasoning capabilities. · The Decoder · Apr 23 · 92
Models & Releases · OpenAI releases GPT-5.5, bringing company one step closer to an AI ‘superapp’ · OpenAI unveiled GPT-5.5, claiming broad capability improvements across multiple domains. The release advances the company's push toward an integrated AI platform spanning multiple use cases and modalities. · TechCrunch — AI · Apr 23 · 81
Models & Releases · Business & Funding · Anthropic’s Mythos breach was humiliating · Anthropic's Claude Mythos model, which the company claimed was too dangerous for public release due to advanced cybersecurity capabilities, has leaked to unauthorized users. The breach undermines Anthropic's controlled rollout strategy and raises questions about the gap between internal safety claims and actual containment. · The Verge — AI · Apr 23 · 69
Opinion & Analysis · At 'AI Coachella,' Stanford Students Line Up to Learn From Silicon Valley Royalty · Stanford's CS 153 course has become a campus phenomenon, drawing students eager to learn directly from prominent Silicon Valley figures. The class has generated significant social media buzz, though it's sparked mixed reactions among the student body. · WIRED — AI · Apr 23 · 47
Models & Releases · First impressions of GPT-5.5 · Will Koh of Ramp, Claire Vo of ChatPRD, and Aaron Friel of OpenAI shared early reactions to GPT-5.5 in a YouTube video, offering early-user perspective on the model's capabilities and performance relative to its predecessors. · OpenAI (YouTube) · Apr 23 · 89
Models & Releases · Products & Apps · Introducing GPT-5.5 · OpenAI unveiled GPT-5.5, positioning it as a new class of AI capable of handling complex multi-step tasks, tool use, and self-verification for agentic workflows. The model is now available in ChatGPT and Codex, signaling a shift toward AI systems that can autonomously complete real-world work. · OpenAI (YouTube) · Apr 23 · 100
Hardware & Infra · Research · GPU Renters Are Playing a Silicon Lottery · Research from William & Mary, Jefferson Lab, and Silicon Data reveals that identical GPU models exhibit substantial performance variance when rented from cloud providers, turning procurement into an unpredictable gamble for AI teams. The silicon lottery effect, where manufacturing tolerances create chip-to-chip differences, compounds cost uncertainty for organizations scaling compute infrastructure. This finding reshapes how practitioners should evaluate cloud GPU pricing and benchmarking claims, suggesting that nominal specs alone cannot guarantee consistent training or inference economics. · IEEE Spectrum - AI · Apr 23 · 69
Business & Funding · Products & Apps · AWS Bets on Frontier Agents as the Next Era of Enterprise AI · AWS is positioning autonomous, long-running agents as enterprise AI's next inflection point, signaling a shift from single-task models to persistent, goal-oriented systems that operate across workflows. · AI Business · Apr 23 · 61
Policy & Regulation · Trump science advisor says Chinese actors are copying American AI at massive scale · The Trump administration claims evidence of systematic model distillation attacks by Chinese actors targeting US frontier AI systems at industrial scale, signaling a shift toward more aggressive AI supply-chain defense. · The Decoder · Apr 23 · 73
Models & Releases · OpenAI says its new GPT-5.5 model is more efficient and better at coding · OpenAI released GPT-5.5, claiming improvements in efficiency and code generation over last month's GPT-5.4. The rapid iteration signals intensifying competition in frontier model capability, though concrete benchmarks remain absent from the announcement. · The Verge — AI · Apr 23 · 69
Research · Temporal Taskification in Streaming Continual Learning: A Source of Evaluation Instability · Researchers show that how continuous data streams are split into tasks during continual learning evaluation significantly alters benchmark results, introducing hidden instability independent of model choice. They propose metrics to diagnose taskification sensitivity and test the effect across four major CL algorithms. · arXiv cs.LG · Apr 23 · 58
Research · Evaluation of Automatic Speech Recognition Using Generative Large Language Models · Researchers show decoder-based LLMs can evaluate speech recognition quality far better than traditional metrics, achieving 92-94% agreement with human judges on the HATS dataset versus 63% for Word Error Rate. The finding suggests generative models offer a practical alternative to semantic embeddings for ASR evaluation. · arXiv cs.CL · Apr 23 · 58
Research · Fine-Tuning Regimes Define Distinct Continual Learning Problems · Researchers show that how models are fine-tuned during continual learning fundamentally changes the problem itself, not just the solution. By varying which parameters remain trainable across sequential tasks, the effective learning dynamics shift, suggesting current benchmarks may unfairly compare methods across incompatible regimes. · arXiv cs.LG · Apr 23 · 52
Research · The Sample Complexity of Multicalibration · Researchers prove that achieving multicalibration in batch learning requires Θ(ε⁻³) samples, establishing a fundamental separation from marginal calibration's lower complexity. The result combines information-theoretic lower bounds with a practical randomized algorithm via online-to-batch conversion. · arXiv cs.LG · Apr 23 · 52