Models & Releases
Gemini 3.5 Flash has landed
Google DeepMind has released Gemini 3.5 Flash, continuing its iteration on the Gemini model line amid intense competition among frontier-model developers. Flash variants typically prioritize speed and cost efficiency over raw capability, which positions this release for developer adoption and for production workloads where latency matters. The timing and naming suggest Google is keeping pace with rivals while refining its portfolio across performance tiers. For practitioners, the release likely adds another fast, lower-cost inference option within the Gemini ecosystem.
Google DeepMind (YouTube) · May 20

Products & Apps · Business & Funding
IrisGo, a startup backed by Andrew Ng, looks to become the AI desktop buddy you never knew you needed
IrisGo, backed by machine learning pioneer Andrew Ng, is betting on desktop automation as a core use case for agentic AI. The startup's thesis centers on observational learning: instead of waiting for explicit instructions, the system watches user workflows, infers recurring task patterns, and automates repetitive actions. That would move AI assistants beyond chat interfaces toward continuous, context-aware task execution in knowledge work. Success would show whether desktop agents can reach practical adoption without extensive manual configuration, a critical test for the broader agent economy.
TechCrunch - AI · May 20

Research · Models & Releases
The Erdős Breakthrough
OpenAI reports that its general-purpose reasoning model has autonomously solved the planar unit distance problem, posed by Paul Erdős in 1946 and open for roughly 80 years: how many pairs among n points in the plane can lie at distance exactly 1? Rather than confirming the long-held belief that suitably scaled square-grid constructions are essentially optimal, the system discovered a superior family of constructions. OpenAI presents this as the first time an AI system has independently cracked a prominent open problem without domain-specific training. It signals AI reasoning maturing beyond narrow task optimization, with implications for how machine reasoning at scale might augment mathematical discovery itself.
OpenAI (YouTube) · May 20

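The square-grid baseline mentioned above is easy to reproduce. The sketch below is our own Python illustration, not OpenAI's method, and it says nothing about the new construction: it places points on a k x k integer grid, counts pairs by squared distance, and reports the most frequent distance. Rescaling the grid so that this distance becomes 1 yields the classical unit-distance configuration. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def grid_distance_counts(k):
    """Count point pairs by squared distance in a k x k integer grid."""
    points = [(x, y) for x in range(k) for y in range(k)]
    counts = Counter()
    for (x1, y1), (x2, y2) in combinations(points, 2):
        counts[(x1 - x2) ** 2 + (y1 - y2) ** 2] += 1
    return counts


def best_unit_distance(k):
    """Return (squared distance, pair count) for the most frequent distance.

    Rescaling the grid so that this distance becomes 1 turns every such pair
    into a unit-distance pair: the classical square-grid construction.
    """
    counts = grid_distance_counts(k)
    d2, n_pairs = max(counts.items(), key=lambda item: item[1])
    return d2, n_pairs
```

For a 3 x 3 grid the grid spacing itself wins, with 12 of the 36 pairs at squared distance 1. On larger grids, squared distances that can be written as a sum of two squares in several ways (such as 5 or 25) eventually overtake it, which is why the classical construction chooses its rescaling carefully rather than simply using the grid spacing.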
Business & Funding · Products & Apps
Deepseek wants to take on Claude Code and OpenAI's Codex with "Deepseek Code"
Deepseek is assembling a dedicated Beijing team to build a code-generation agent aimed squarely at Claude Code, OpenAI's Codex, and Cursor. Its job postings emphasize agent loops, Model Context Protocol (MCP) expertise, and deep familiarity with existing developer tools, pointing to a strategic push into autonomous coding workflows. The move intensifies competition in the agentic coding layer and suggests Chinese AI labs now aim to match Western incumbents on product roadmaps, not just on model capability.
The Decoder · May 20

Products & Apps · Policy & Regulation
LinkedIn's war on AI slop is not just a policy update, it is an admission that the platform lost control of its feed
LinkedIn is deploying detection systems to filter commodity AI-generated content, reporting 94% accuracy in early trials. The move exposes a tension within Microsoft's AI strategy: the parent company champions generative AI adoption on the platform while now needing to suppress the low-quality synthetic posts that degrade user experience. It shows that scale-driven AI integration can rapidly erode platform quality, forcing costly investment in moderation infrastructure and raising the question of whether AI-first product strategies need equally robust guardrails to remain viable.
The Decoder · May 20

Products & Apps · Research
I Gave My OpenClaw Agent a Physical Body
AI coding capabilities are becoming a practical lever for robotics, lowering the barrier to building and operating physical systems. The convergence matters because it narrows the gap between software-native AI development and hardware integration, potentially accelerating the arrival of autonomous systems in production environments. It also suggests that LLM-driven code generation is moving beyond developer convenience toward shaping how robots are architected and scaled.
WIRED - AI · May 20

Research · Tools & Code
Variance Reduction for Expectations with Diffusion Teachers
Researchers have developed CARV, a variance-reduction framework that cuts the computational cost of pipelines built on diffusion models by 2-3x. Downstream applications such as text-to-3D and data attribution rely on Monte Carlo gradient estimates that are expensive to compute. CARV amortizes costly upstream operations (rendering, simulation) across cheaper resampling of the diffusion noise, and uses importance sampling and stratified sampling to sharpen the estimates. This targets a real bottleneck in production diffusion workflows, where gradient variance, not model inference, dominates wall-clock cost. The work reflects a growing focus on making frozen pretrained diffusion models practical as reusable components in larger systems.
arXiv cs.LG · May 20

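CARV's own estimator is not reproduced here. As a hedged illustration of two ingredients the summary names, amortizing an expensive upstream call across cheap noise resampling and stratified sampling over the noise level, here is a toy Python sketch. `expensive_render` and `per_timestep_loss` are hypothetical stand-ins, and the timestep is drawn uniformly from [0, 1) by assumption.

```python
import random
import statistics


def expensive_render(theta):
    # Hypothetical stand-in for a costly upstream step (rendering, simulation).
    return theta * 2.0


def per_timestep_loss(x, t):
    # Hypothetical stand-in for a cheap per-noise-level term of a diffusion loss.
    return (x * t) ** 2


def iid_estimate(theta, k, rng):
    x = expensive_render(theta)  # one expensive call, reused for k cheap draws
    return sum(per_timestep_loss(x, rng.random()) for _ in range(k)) / k


def stratified_estimate(theta, k, rng):
    x = expensive_render(theta)  # same amortization of the upstream call
    # One timestep per stratum [i/k, (i+1)/k) instead of k independent draws.
    return sum(per_timestep_loss(x, (i + rng.random()) / k) for i in range(k)) / k


def estimator_variance(estimator, trials=2000, k=16, seed=0):
    """Empirical variance of an estimator over repeated independent runs."""
    rng = random.Random(seed)
    return statistics.pvariance([estimator(1.0, k, rng) for _ in range(trials)])
```

With these toy stand-ins, `estimator_variance(stratified_estimate)` comes out far below `estimator_variance(iid_estimate)` for the same number of cheap evaluations, and both reuse a single `expensive_render` call per estimate. CARV additionally uses importance sampling, which this sketch omits.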
Research
Equilibrium Reasoners: Learning Attractors Enables Scalable Reasoning
Equilibrium Reasoners introduces a theoretical framework for understanding how iterative test-time compute enables generalization in reasoning models. By modeling inference as convergence toward task-conditioned attractors in latent space, the work explains scaling gains without depending on external verifiers or domain-specific constraints. This sharpens the mechanistic account of why iterative refinement works, with implications for how future reasoning systems should be architected and evaluated. Its two scaling axes, depth (more iterations) and breadth (aggregating over more trajectories), offer a blueprint for practitioners allocating inference-time compute.
arXiv cs.LG · May 20

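The depth and breadth axes can be pictured with a toy fixed-point iteration. In this hedged Python sketch, `refine` is a hypothetical contraction map standing in for a learned, task-conditioned update (its attractor is z* = 2x); depth is the number of refinement steps, and breadth is the number of independently initialized trajectories whose end states are averaged. It illustrates the idea only and is not the paper's architecture.

```python
import random


def refine(z, x):
    # Hypothetical task-conditioned update: a contraction with attractor z* = 2x.
    return 0.5 * z + x


def solve(x, depth=50, breadth=8, seed=0):
    rng = random.Random(seed)
    finals = []
    for _ in range(breadth):              # breadth: independent trajectories
        z = rng.uniform(-10.0, 10.0)      # random initial latent state
        for _ in range(depth):            # depth: iterate toward the attractor
            z = refine(z, x)
        finals.append(z)
    return sum(finals) / len(finals)      # aggregate the trajectories' end states
```

Because each step halves the distance to the attractor, 50 steps shrink the initial error by a factor of 2^50. Averaging across trajectories matters more when individual runs are noisy or can settle into different attractors.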
Research · Tools & Code
Quantifying Hyperparameter Transfer and the Importance of Embedding Layer Learning Rate
Researchers have developed a quantitative framework for measuring how well hyperparameters transfer when language models are scaled from small to large sizes. The work examines why techniques like Maximal Update Parameterization (μP) preserve optimal learning rates across scales, and introduces three metrics that evaluate transfer quality and extrapolation robustness. This addresses a critical bottleneck in LLM training: finding hyperparameters that work at production scale without expensive full-size experiments. The findings could reduce the compute and trial-and-error involved in training frontier models.
arXiv cs.LG · May 20

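For context on why the embedding layer is singled out: in the widely cited μP rule for Adam from the original μP (muTransfer) work, hidden and output weight matrices have their learning rates divided by the width multiplier, while embedding (input) weights and biases keep the base rate. The Python sketch below encodes that rule as we understand it. It is not the paper's three transfer metrics, the function name is hypothetical, and equivalent μP formulations move some of this scaling into forward-pass multipliers instead.

```python
def mup_adam_learning_rates(base_lr, base_width, width):
    """Per-layer Adam learning rates under one common muP formulation.

    base_lr is tuned on a proxy model of width base_width; the returned rates
    are for a model of the given width. Embedding weights and biases keep the
    base rate, while hidden and output matrices scale as 1 / width multiplier.
    """
    m = width / base_width  # width multiplier
    return {
        "embedding": base_lr,
        "bias": base_lr,
        "hidden": base_lr / m,
        "output": base_lr / m,
    }
```

Tuning `base_lr` on a narrow proxy model and then applying these rates at the production width is the transfer step whose quality the paper sets out to measure.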
Research · Models & Releases
EvoStruct: Bridging Evolutionary and Structural Priors for Antibody CDR Design via Protein Language Model Adaptation
EvoStruct targets a critical failure mode in structure-based protein design: equivariant GNNs trained on limited 3D data learn skewed amino acid distributions that ignore evolutionary constraints, leading to vocabulary collapse. By freezing a protein language model as a prior and adapting it to 3D context through cross-attention, the method recovers evolutionary substitution patterns while maintaining structural validity. Bridging these two previously siloed inductive biases offers a template for hybrid architectures in which priors learned from large-scale sequence data constrain structure-conditioned generation. The approach matters for antibody engineering and points to broader progress in multimodal protein design beyond pure end-to-end learning.
arXiv cs.LG · May 20

Research · Models & Releases
Velocityformer: Broken-Symmetry-Matched Equivariant Graph Transformers for Cosmological Velocity Reconstruction
Velocityformer reflects a shift in how ML practitioners design architectures for physics-constrained domains. Rather than applying generic transformers, the team built symmetry breaking directly into the model's inductive bias to match the observational reality of cosmological surveys. Matching model structure to asymmetries in the data, rather than to the underlying physics alone, offers a template for other scientific ML problems where measurement geometry diverges from theoretical symmetry. The work signals growing sophistication in domain-specific architectural choices beyond scale and parameter count.
arXiv cs.LG · May 20

Tools & Code · Research
AiraXiv: An AI-Driven Open-Access Platform for Human and AI Scientists
AiraXiv reimagines academic publishing for an era in which AI systems author and review research alongside humans. The platform targets structural bottlenecks in traditional venues: exponential submission growth, reviewer burnout, and limited venue capacity. By combining open preprints with AI-augmented peer review and iterative feedback loops, it moves from gated, static publication toward continuous, collaborative refinement. This matters because it shows how research infrastructure must evolve once AI participation in knowledge production becomes routine rather than exceptional. Its Model Context Protocol integration suggests that interoperability standards for AI-native workflows are becoming a practical necessity.
arXiv cs.CL · May 20

Tools & Code · Opinion & Analysis
How fast is 10 tokens per second really?
Mike Veerman's interactive token-speed simulator tackles a persistent friction point in LLM evaluation: the gap between advertised throughput figures and user experience. By rendering token generation in real time at rates from 5 to 800 tokens per second, the tool lets practitioners calibrate their expectations against perceived latency and see why a model's raw speed claim often diverges from how responsive it feels. This matters as inference speed becomes a primary competitive lever in the model market and buyers increasingly need intuition for what throughput numbers mean in practice.
Simon Willison · May 20

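The arithmetic behind such a simulator is simple: N tokens at r tokens per second take N / r seconds, so a 500-token answer takes 50 seconds at 10 tokens per second but 0.625 seconds at 800. A hedged Python sketch of the same idea follows; it is not Veerman's code, the function names are hypothetical, and it assumes a constant generation rate with no time-to-first-token delay.

```python
import time


def stream_seconds(num_tokens, tokens_per_second):
    """Wall-clock time to stream num_tokens at a constant rate."""
    return num_tokens / tokens_per_second


def simulate_stream(tokens, tokens_per_second, sleep=time.sleep, emit=print):
    """Emit tokens one at a time with a fixed delay, like a streaming chat UI.

    sleep and emit are injectable so the pacing can be tested without waiting.
    """
    delay = 1.0 / tokens_per_second
    for tok in tokens:
        sleep(delay)
        emit(tok, end="", flush=True)
```

Calling `simulate_stream("The quick brown fox".split(" "), 10)` in a terminal gives a feel for 10 tokens per second; adding the spaces back as separate tokens would double the wait.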
Research
Is Fixing Schema Graphs Necessary? Full-Resolution Graph Structure Learning for Relational Deep Learning
Researchers propose FROG, a framework that treats relational database structure as a learnable component of graph neural network pipelines rather than a fixed constraint. This challenges a foundational design assumption in Relational Deep Learning, where the schema graph has been treated as immutable. Instead of fixing each table's role in advance, the work lets tables act dynamically as nodes or edges during message passing, potentially improving performance on real-world database prediction tasks by letting models discover better relational representations end-to-end. For practitioners building GNN systems over structured data, this points toward more flexible graph construction that could reduce manual schema engineering.
arXiv cs.LG · May 20

Research · Tools & Code
Agent JIT Compilation for Latency-Optimizing Web Agent Planning and Scheduling
Researchers propose agent JIT compilation, a technique that turns natural-language task descriptions into optimized executable code instead of relying on sequential LLM-driven loops. It targets a key bottleneck in computer-use agents: the latency and tool-use errors that accumulate over repeated screenshot-plan-execute cycles. By compiling a task upfront into a program with built-in parallelization and embedded LLM calls, the method reduces inference overhead and improves reliability for browser automation and similar workflows. This is a meaningful shift in how agentic systems balance planning efficiency against execution fidelity, with implications for deploying autonomous task agents in production.
arXiv cs.LG · May 20

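The latency argument can be seen in a few lines of asyncio. This hedged Python sketch contrasts an agent loop that awaits each step in turn with a "compiled" plan that runs independent steps concurrently before a dependent final step. `call_tool` and the step names are hypothetical stand-ins with simulated latency; the paper's compiler, which generates such programs from natural-language task descriptions, is not reproduced here.

```python
import asyncio


async def call_tool(name, latency=0.05):
    # Hypothetical tool or LLM call; the sleep stands in for network latency.
    await asyncio.sleep(latency)
    return f"{name}:done"


async def run_sequential(steps):
    # Agent-loop style: each step waits for the previous one to finish.
    return [await call_tool(s) for s in steps]


async def run_compiled(independent_steps, final_step):
    # "Compiled" plan: independent steps run concurrently, then the dependent step.
    partial = await asyncio.gather(*(call_tool(s) for s in independent_steps))
    final = await call_tool(final_step)
    return list(partial) + [final]
```

With four steps of 50 ms each, `asyncio.run(run_sequential(["a", "b", "c", "d"]))` takes roughly 200 ms, while `asyncio.run(run_compiled(["a", "b", "c"], "d"))` takes roughly 100 ms (one concurrent batch plus the dependent step) and returns results in the same order.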
Research · Tools & Code
You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories
Researchers report that the training trajectories of LLMs under reinforcement learning with verifiable rewards (RLVR) have extremely low-rank structure: a rank-1 approximation that evolves linearly over training captures most of the performance gains. The finding enables RELEX, a compute-efficient method that uses linear regression to extrapolate future checkpoints from a brief observation window. It has immediate implications for RLVR training efficiency and suggests deeper geometric regularities in how LLMs adapt during reasoning-focused fine-tuning, which could reshape how labs approach scaling and checkpoint management.
arXiv cs.CL · May 20

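As a hedged illustration of rank-1 extrapolation (our own sketch, not the RELEX code), the Python below takes a short window of flattened parameter checkpoints, finds the dominant direction of their displacement from the first checkpoint by power iteration, fits a least-squares line to each checkpoint's coefficient along that direction, and projects it to a later step. The function name and arguments are hypothetical, and real checkpoints would call for a numerical library rather than Python lists.

```python
import math


def rank1_extrapolate(checkpoints, steps, target_step, iters=50):
    """Predict parameters at target_step from a short window of checkpoints.

    Assumes the deltas from the first checkpoint are well captured by a single
    direction u whose coefficient grows linearly with the training step.
    """
    base = checkpoints[0]
    deltas = [[c - b for c, b in zip(ckpt, base)] for ckpt in checkpoints]
    dim = len(base)

    # Power iteration on D^T D for the top right-singular vector of the
    # delta matrix D, started from the most recent delta.
    u = deltas[-1]
    for _ in range(iters):
        norm = math.sqrt(sum(v * v for v in u))
        if norm == 0.0:
            raise ValueError("checkpoints show no movement")
        u = [v / norm for v in u]
        proj = [sum(d * v for d, v in zip(row, u)) for row in deltas]
        u = [sum(p * row[j] for p, row in zip(proj, deltas)) for j in range(dim)]
    norm = math.sqrt(sum(v * v for v in u))
    u = [v / norm for v in u]

    # Coefficient of each checkpoint along u, then an ordinary least-squares line.
    coeffs = [sum(d * v for d, v in zip(row, u)) for row in deltas]
    mean_s = sum(steps) / len(steps)
    mean_c = sum(coeffs) / len(coeffs)
    slope = sum((s - mean_s) * (c - mean_c) for s, c in zip(steps, coeffs)) / sum(
        (s - mean_s) ** 2 for s in steps
    )
    intercept = mean_c - slope * mean_s
    coeff = intercept + slope * target_step
    return [b + coeff * v for b, v in zip(base, u)]
```

On a trajectory that moves exactly along one direction at constant speed, the prediction recovers the true future checkpoint; how well real RLVR runs fit this assumption is the empirical question the paper addresses.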
Research
DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards
A new framework called DelTA reframes how reinforcement learning from verifiable rewards (RLVR) updates language model behavior at the token level. Rather than treating the reward signal as an opaque black box, the work models policy-gradient updates as linear discriminators over token embeddings, revealing that updates driven by sequence-level rewards can be dominated by high-frequency tokens. This matters because it exposes a mismatch between how LLM reasoning improvements are measured and how those improvements actually propagate through the model, and it could enable more targeted and efficient RLVR training.
arXiv cs.CL · May 20

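To see why frequent tokens can dominate, consider the plain sequence-level credit assignment that DelTA argues against (this sketch is not DelTA itself). Under REINFORCE-style updates with a mean-reward baseline, every token in a sequence inherits the same advantage, so a token type's total credit grows with how often it appears. The Python below, with a hypothetical function name, tallies that per-type credit.

```python
from collections import defaultdict


def token_credit(sequences, rewards):
    """Total advantage each token type receives under sequence-level credit.

    Every occurrence of a token inherits its sequence's advantage (reward minus
    the batch-mean baseline), so frequent tokens accumulate the largest totals.
    """
    baseline = sum(rewards) / len(rewards)
    credit = defaultdict(float)
    for tokens, reward in zip(sequences, rewards):
        advantage = reward - baseline
        for tok in tokens:
            credit[tok] += advantage
    return dict(credit)
```

For example, `token_credit([["the", "cat", "the", "the"], ["a", "dog"]], [1.0, 0.0])` gives `the` a total of 1.5 but `cat` only 0.5, simply because `the` occurs three times in the rewarded sequence, even though it says nothing about why that sequence succeeded.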
Research · Tools & Code
Leveraging LLMs for Grammar Adaptation: A Study on Metamodel-Grammar Co-Evolution
Researchers show that LLMs can automate grammar adaptation when the metamodels of domain-specific languages (DSLs) evolve, reducing manual engineering effort. The work develops prompting strategies on four Xtext DSLs, then validates them on two held-out languages plus a longitudinal QVTo case study. This points to LLMs moving beyond code generation into model-driven engineering workflows, automating consistency maintenance that typically demands specialized expertise. Success across multiple DSLs suggests broader applicability to infrastructure-heavy software development pipelines.
arXiv cs.CL · May 20

Research · Models & Releases
Mem-π: Adaptive Memory through Learning When and What to Generate
Mem-π introduces a generative approach to agent memory that inverts the retrieval paradigm. Rather than fetching static entries from an external store, a dedicated model generates contextually tailored guidance on demand, learning both when and what to generate through decoupled reinforcement learning. This shifts memory-augmented systems from similarity-based lookup toward dynamic synthesis, potentially aligning guidance more closely with the agent's current context. The technique targets a core friction point in current LLM agents: rigid episodic memory often mismatches task requirements, forcing agents to work around stale or irrelevant stored information.
arXiv cs.CL · May 20

Research · Tools & Code
A Machine Learning Framework for Weighted Least Squares GNSS Positioning based on Activation Functions
Researchers propose integrating activation functions into weighted least squares (WLS) algorithms to improve GNSS positioning accuracy in urban environments, where signal degradation is endemic. Multipath effects and non-line-of-sight reception in dense urban settings introduce systematic errors that traditional satellite positioning struggles to filter out. By applying neural network-style activation functions to signal weighting, the approach treats GNSS error correction as a learned optimization problem rather than a purely geometric one. It reflects a broader trend of applying deep learning primitives to classical engineering problems whose domain-specific noise patterns can be learned from data, potentially improving resilience for autonomous vehicles, precision agriculture, and location services operating in challenging RF environments.
arXiv cs.LG · May 20

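The summary does not specify the paper's weighting function, so this Python sketch assumes one: each satellite's weight is a sigmoid of its carrier-to-noise density (C/N0), with hypothetical parameters `k` and `cn0_mid`. It solves a 2D toy version of pseudorange positioning (position plus receiver clock bias) by iterated weighted least squares, i.e. Gauss-Newton steps from the normal equations (H^T W H) dx = H^T W r. A learned variant would fit the activation's parameters from data instead of fixing them.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def solve3(a, b):
    """Solve a 3x3 linear system a @ x = b by Gaussian elimination."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def wls_position(sats, pseudoranges, cn0, iters=10, k=0.5, cn0_mid=35.0):
    """2D toy: estimate (x, y, clock_bias) by iterated weighted least squares.

    Weights are a sigmoid of C/N0 (hypothetical k and cn0_mid), so weak and
    likely multipath-affected signals count less in the fit.
    """
    w = [sigmoid(k * (c - cn0_mid)) for c in cn0]
    x, y, bias = 0.0, 0.0, 0.0
    for _ in range(iters):
        h, r = [], []
        for (sx, sy), rho in zip(sats, pseudoranges):
            d = math.hypot(x - sx, y - sy)
            h.append([(x - sx) / d, (y - sy) / d, 1.0])  # Jacobian row
            r.append(rho - (d + bias))                    # pseudorange residual
        hth = [[sum(wi * hi[i] * hi[j] for wi, hi in zip(w, h)) for j in range(3)]
               for i in range(3)]
        htr = [sum(wi * hi[i] * ri for wi, hi, ri in zip(w, h, r)) for i in range(3)]
        dx = solve3(hth, htr)
        x, y, bias = x + dx[0], y + dx[1], bias + dx[2]
    return x, y, bias
```

With noise-free pseudoranges the weights do not change the answer; they matter when some measurements carry multipath or non-line-of-sight errors, which this weighting down-weights whenever those signals also arrive with low C/N0.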
Research
Mind the Sim-to-Real Gap & Think Like a Scientist
A new theoretical framework addresses a critical question in deploying learned simulators: when to trust model predictions and when to run costly real-world experiments. The work decomposes simulator error into two components, one reducible through randomized testing and one irreducible, then quantifies how policy performance degrades on visited versus unexplored states. This bears directly on robotics, autonomous systems, and any domain where simulator calibration is expensive and real feedback is scarce, offering principled guidance for practitioners balancing computational efficiency against deployment risk.
arXiv cs.LG · May 20

Research
Mitigating Label Bias with Interpretable Rubric Embeddings
Researchers propose rubric embeddings as a structural fix for the bias that ML systems inherit from flawed historical labels. Rather than relying on opaque feature representations, the method anchors predictions to expert-defined criteria that map directly to measurable constructs, making sources of bias visible and contestable. This addresses a critical vulnerability in high-stakes domains such as hiring and admissions, where models can amplify past discrimination at scale. The approach shifts the focus from post-hoc fairness patches to interpretability-first design, potentially changing how practitioners validate training data quality before deployment.
arXiv cs.LG · May 20

Research
Approximation Theory for Neural Networks: Old and New
This comprehensive survey traces how four decades of approximation theory for neural networks evolved from proving universal expressiveness into a quantitative framework linking network architecture to learning efficiency. It connects classical density results for single-hidden-layer networks with modern insights on depth, width, and parameter scaling, informing both how practitioners design networks and how theorists understand the relationship between model capacity and generalization. For researchers and engineers, the synthesis clarifies why architectural choices matter and lays rigorous foundations for ongoing work on efficient model design.
arXiv cs.LG · May 20

Tools & Code · Research
torchtune: PyTorch native post-training library
Meta's torchtune fills a structural gap in the LLM post-training workflow by prioritizing modularity and PyTorch transparency over abstraction. Rather than hiding complexity behind layers of abstraction, the library exposes its underlying components to researchers and practitioners who need to customize fine-tuning pipelines. This reflects a broader shift toward giving practitioners direct control over training infrastructure, particularly as adapting open-weight models becomes the primary lever for downstream performance. For teams building proprietary variants or experimenting with novel training techniques, direct PyTorch access reduces friction compared with opaque frameworks that trade extensibility for convenience.
arXiv cs.LG · May 20

Products & Apps · Business & Funding
Buckle up: Google is set to remake search with agentic AI in 2026
Google is positioning agentic AI as the next inflection point for search, shifting from retrieval-based ranking toward autonomous task execution within queries. The move challenges the search paradigm that has underpinned Google's dominance for two decades and forces competitors and the broader industry to reckon with AI agents as a primary interface for information discovery. The strategic stakes are enormous: whoever controls agentic search controls a gateway to digital commerce, knowledge access, and user attention in an AI-native world.
Ars Technica - AI · May 20

Research · Models & Releases
Neural Negative Binomial Regression for Weekly Seismicity Forecasting: Per-Cell Dispersion Estimation and Tail Risk Assessment
Researchers introduce EarthquakeNet, a neural architecture that learns a separate overdispersion parameter for each location in seismic forecasting rather than assuming a single global statistical model. The work shows that the standard Poisson assumption, which forces the variance of event counts to equal their mean, fails dramatically on real seismic data (p < 10^-179), and it uses learned spatial embeddings to capture localized variance patterns. This is a methodological shift in domain-specific forecasting, from hand-tuned statistical assumptions to neurally learned heterogeneous parameters, and the pattern is increasingly relevant across scientific computing and risk modeling wherever one-size-fits-all distributional assumptions break down.
arXiv cs.LG · May 20

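The contrast between Poisson and negative binomial counts fits in a few lines. In the mean-dispersion parametrization used below (an assumption; the summary does not give the paper's exact parametrization), a negative binomial with mean μ and dispersion r has variance μ + μ²/r and approaches a Poisson as r grows. A per-cell model of the kind described would predict its own μ and r for every spatial cell and train on this log-likelihood; the function names are hypothetical.

```python
import math


def nb_logpmf(k, mean, dispersion):
    """Log-probability of count k under a negative binomial with the given
    mean and dispersion (size) r; variance = mean + mean**2 / r."""
    r = dispersion
    return (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(r / (r + mean))
        + k * math.log(mean / (r + mean))
    )


def poisson_logpmf(k, mean):
    """Log-probability of count k under a Poisson, whose variance equals its mean."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)
```

With mean 3 and dispersion 2, the distribution's variance is 3 + 9/2 = 7.5, two and a half times what a Poisson with the same mean allows; a small learned r in a cell is exactly the heavy-tailed, clustered behavior a global Poisson model cannot express.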
Research · Models & Releases
Gaussian Sheaf Neural Networks
Gaussian Sheaf Neural Networks address a structural gap in graph neural networks by treating node features as probability distributions rather than flattened vectors. Traditional GNNs lose geometric meaning when they encode Gaussian parameters as plain vectors; GSNNs instead use cellular sheaf theory to preserve the algebraic properties of means and covariances during message passing. The work matters for domains where uncertainty quantification and relational structure are equally important, from molecular modeling to Bayesian inference on graphs, and could change how practitioners handle probabilistic node attributes in production systems.
arXiv cs.LG · May 20

Business & Funding · Policy & Regulation
OpenAI barrels towards IPO that may happen in September
OpenAI is accelerating IPO preparations following a legal victory over Elon Musk, whose lawsuit had challenged the company's nonprofit-to-capped-profit structure and threatened its financial stability. A September listing would be a watershed moment for the AI industry, turning the most visible large language model developer into a public company and potentially reshaping how frontier AI labs balance research investment with shareholder returns. The timing signals confidence in OpenAI's business model and revenue trajectory, while raising questions about governance and capital allocation at a time when AI infrastructure spending continues to climb.
TechCrunch - AI · May 20

Business & Funding · Policy & Regulation
OpenAI barrels toward IPO that may happen in September
OpenAI has resumed IPO preparations after Elon Musk's failed legal challenge to the company's nonprofit structure, pointing to a potential September listing. The move marks an inflection point for the AI industry's financial architecture: a for-profit transition by the sector's most visible frontier lab would reshape how capital flows to AI development, influence valuation benchmarks for competing labs, and test whether public markets can price long-horizon AI R&D spending. The timing matters because regulatory scrutiny of AI governance is intensifying, making OpenAI's corporate restructuring a bellwether for how the industry balances growth ambitions with stakeholder accountability.
TechCrunch - AI · May 20

Hardware & Infra · Business & Funding
Alibaba Aims for Independence with New AI Chips, Model
Alibaba is pursuing a vertical integration strategy to reduce its dependence on Nvidia, developing proprietary AI chips and models in-house. The move intensifies competition in the AI infrastructure layer, where major cloud vendors and tech conglomerates now treat chip design as a core competency rather than a procurement decision. Success would reshape vendor lock-in dynamics and give Alibaba pricing leverage in its cloud business, while failure would strain its capital allocation. The broader implication: Nvidia's dominance faces structural pressure from well-capitalized competitors willing to absorb R&D costs to control their own AI stack.
AI Business · May 20