[Opinion & Analysis · Research] Artificial scientists
MIT Technology Review examines how AI companies justify their existence through promised scientific breakthroughs, while exploring what LLMs can actually deliver in research workflows today versus the hype around future discoveries like cancer cures.
MIT Technology Review — AI · 8h ago · 72

[Research] World models
MIT Technology Review examines why AI systems excel at digital tasks like writing and coding but struggle with physical-world challenges such as laundry folding and street navigation. The piece explores world models as a potential path toward embodied AI that can reason about and manipulate the physical environment.
MIT Technology Review — AI · 8h ago · 77

[Research · Opinion & Analysis] Agent orchestration
MIT Technology Review examines AI agents as the next frontier beyond conversational LLMs, arguing they're central to near-term applications from drug discovery to workforce disruption. The piece positions agent orchestration as the capability gap between today's chatbots and transformative real-world impact.
MIT Technology Review — AI · 8h ago · 77

[Policy & Regulation · Research] Supercharged scams
Criminals are weaponizing large language models to automate phishing and spam campaigns at scale, exploiting the same text-generation capabilities that made ChatGPT popular. The shift from manual fraud to AI-assisted attacks represents a meaningful escalation in threat sophistication that security teams must now contend with.
MIT Technology Review — AI · 8h ago · 77

[Research · Business & Funding] Humanoid data
Companies are recruiting humans to generate training data for robotics AI by paying them to perform mundane tasks on camera or remotely operate robotic arms. The practice raises questions about data sourcing economics and labor practices in the AI supply chain.
MIT Technology Review — AI · 8h ago · 77

[Policy & Regulation · Research] Weaponized deepfakes
Deepfake technology has crossed from theoretical threat to practical weapon as generative models become cheaper and easier to deploy. MIT Technology Review reports that accessibility improvements now enable widespread malicious use at scale.
MIT Technology Review — AI · 8h ago · 89

[Business & Funding · Research] AI research lab NeoCognition lands $40M seed to build agents that learn like humans
NeoCognition, founded by an Ohio State researcher, raised $40M to develop AI agents capable of rapidly acquiring expertise across domains. The startup's approach targets a core challenge in AI: building systems that generalize learning strategies rather than requiring task-specific training.
TechCrunch — AI · 9h ago · 65

[Research] Generalization at the Edge of Stability
Researchers model neural network training as random dynamical systems converging to fractal attractors rather than fixed points, introducing 'sharpness dimension' to explain why chaotic optimization regimes improve generalization. The work bridges Lyapunov theory and deep learning, offering theoretical grounding for why large learning rates often outperform conservative training.
arXiv cs.LG · 10h ago · 62

[Research] Safe Continual Reinforcement Learning in Non-stationary Environments
Researchers tackle the intersection of safe and continual reinforcement learning, addressing a gap where RL systems must adapt to changing real-world dynamics while maintaining safety constraints throughout training and deployment. The work targets physical control systems where transient safety violations during learning are unacceptable.
arXiv cs.LG · 10h ago · 52

[Research] FASTER: Value-Guided Sampling for Fast RL
Researchers propose FASTER, a technique that cuts the computational cost of sampling-based RL policies by modeling action filtering as an MDP, enabling value-guided early termination during diffusion denoising rather than waiting for full generation.
arXiv cs.LG · 10h ago · 58

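The mechanism described here, letting a value estimate stop an iterative sampler before denoising completes, can be sketched abstractly. Everything below (the refine step, the value function, the threshold, the toy 1-D action space) is an illustrative assumption standing in for the paper's learned components, not the FASTER method itself:

```python
import random

def refine(action, target, rate=0.5):
    """One toy 'denoising' step: move the candidate halfway toward the target."""
    return action + rate * (target - action)

def value_estimate(action, target):
    """Stand-in for a learned value function: closer to target = higher value."""
    return -abs(action - target)

def sample_with_early_termination(candidates, target, steps=10, threshold=-0.5):
    """Refine candidates, pruning any whose value estimate falls below threshold."""
    steps_used = 0
    survivors = list(candidates)
    for _ in range(steps):
        next_round = []
        for a in survivors:
            steps_used += 1
            a = refine(a, target)
            if value_estimate(a, target) >= threshold:
                next_round.append(a)
        survivors = next_round
        if not survivors:
            break
    return survivors, steps_used

random.seed(0)
# One clearly good and one clearly bad candidate, plus random ones.
cands = [0.5, 9.0] + [random.uniform(-10, 10) for _ in range(6)]
best, used = sample_with_early_termination(cands, target=0.0)
full_cost = len(cands) * 10
print(used, full_cost)  # pruning means used <= full_cost
```

The saving comes entirely from pruning: candidates that a full sampler would denoise to completion are dropped as soon as their value estimate dips below the threshold.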
[Research] FB-NLL: A Feature-Based Approach to Tackle Noisy Labels in Personalized Federated Learning
Researchers propose FB-NLL, a federated learning framework that improves personalization across distributed devices by clustering users through feature-space analysis rather than training dynamics, making the system more robust to corrupted data and mislabeled examples.
arXiv cs.LG · 10h ago · 52

[Tools & Code · Research] VLA Foundry: A Unified Framework for Training Vision-Language-Action Models
VLA Foundry unifies language, vision, and action model training in a single open-source codebase, eliminating the fragmented pipeline problem that has plagued prior robotics-focused AI efforts. The team released two model variants and benchmarked them on an open simulator, offering practitioners an end-to-end training stack from scratch or via pretrained backbones.
arXiv cs.LG · 10h ago · 58

[Research] Benign Overfitting in Adversarial Training for Vision Transformers
Researchers provide the first theoretical framework showing Vision Transformers can achieve robust generalization under adversarial training within specific signal-to-noise and perturbation conditions, resolving a gap between ViT empirical robustness and formal understanding.
arXiv cs.LG · 10h ago · 58

[Research] Adaptive MSD-Splitting: Enhancing C4.5 and Random Forests for Skewed Continuous Attributes
Researchers propose Adaptive MSD-Splitting, an improvement to the MSD-Splitting discretization technique for decision trees that dynamically adjusts binning thresholds to handle skewed data distributions. The method addresses a key limitation of the original approach, which struggled with real-world biomedical and financial datasets where asymmetry causes information loss.
arXiv cs.LG · 10h ago · 42

[Research] Discovering a Shared Logical Subspace: Steering LLM Logical Reasoning via Alignment of Natural-Language and Symbolic Views
Researchers discovered that LLMs maintain a shared internal logical subspace bridging natural-language and symbolic reasoning, using Canonical Correlation Analysis to extract a low-dimensional representation that captures reasoning independent of surface form. This finding suggests LLMs don't need external symbolic solvers and could improve multi-step logical reasoning through better alignment of these dual views.
arXiv cs.CL · 11h ago · 62

[Research] Epistemic orientation in parliamentary discourse is associated with deliberative democracy
Researchers developed an LLM-based metric to quantify whether parliamentary speech leans toward evidence or intuition, then applied it to 15 million speeches across seven countries since 1946. The analysis reveals correlations between evidence-based discourse and stronger democratic institutions, offering a scalable method for measuring epistemic quality in political communication.
arXiv cs.CL · 11h ago · 58

[Research] Planning in entropy-regularized Markov decision processes and games
Researchers introduce SmoothCruiser, a planning algorithm that solves entropy-regularized MDPs and two-player games with polynomial sample complexity O(1/epsilon^4), addressing a gap where non-regularized settings lack worst-case guarantees.
arXiv cs.LG · 11h ago · 52

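The "entropy-regularized" part has a concrete meaning: the hard max in the Bellman backup is replaced by a temperature-tau log-sum-exp, V(s) = tau * log sum_a exp(Q(s,a)/tau), which smooths the value function. Below is standard soft value iteration on a toy two-state MDP (the MDP and its rewards are illustrative assumptions; this is the regularized backup itself, not the SmoothCruiser planner):

```python
import math

def soft_value_iteration(P, R, gamma=0.9, tau=0.5, iters=200):
    """Entropy-regularized value iteration.

    P[s][a] is a list of (prob, next_state) pairs; R[s][a] is the reward.
    The backup V(s) = tau * log sum_a exp(Q(s,a)/tau) is a smooth upper
    bound on max_a Q(s,a), approaching the hard max as tau -> 0."""
    n = len(P)
    V = [0.0] * n
    for _ in range(iters):
        newV = []
        for s in range(n):
            Q = [R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                 for a in range(len(P[s]))]
            newV.append(tau * math.log(sum(math.exp(q / tau) for q in Q)))
        V = newV
    return V

# Toy 2-state, 2-action deterministic MDP (illustrative numbers):
# in each state, action 0 moves to state 0 and action 1 moves to state 1.
P = [
    [[(1.0, 0)], [(1.0, 1)]],
    [[(1.0, 0)], [(1.0, 1)]],
]
R = [[0.0, 1.0], [0.0, 2.0]]  # staying in state 1 pays best
V = soft_value_iteration(P, R)
print(V)
```

Because log-sum-exp dominates the max, the soft values sit slightly above the unregularized optimal values (here roughly 2/(1-0.9) = 20 for state 1), and the induced policy is the softmax of Q rather than the argmax.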
[Research] An Answer is just the Start: Related Insight Generation for Open-Ended Document-Grounded QA
Researchers introduce a new task and dataset for improving QA systems beyond single-answer retrieval. SCOpE-QA contains 3,000 open-ended questions designed to train models that generate follow-up insights, enabling iterative refinement of answers rather than static responses.
arXiv cs.CL · 11h ago · 52

[Research] PREF-XAI: Preference-Based Personalized Rule Explanations of Black-Box Machine Learning Models
Researchers propose PREF-XAI, a framework that tailors model explanations to individual user preferences rather than applying one-size-fits-all interpretability methods. The approach treats explanation generation as a preference-learning problem, addressing a gap in XAI where cognitive constraints and user goals vary widely.
arXiv cs.LG · 11h ago · 52

[Research] Exploring Language-Agnosticity in Function Vectors: A Case Study in Machine Translation
Researchers found that function vectors — task representations extracted from multilingual LLMs during in-context learning — transfer across languages when trained on a single translation direction. Translation vectors learned from English-to-one-language pairs improved token ranking in unseen target languages, suggesting language-agnostic task encoding in decoder-only models.
arXiv cs.CL · 11h ago · 52

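The arithmetic behind a function vector is easy to show in isolation: take the mean difference between hidden states collected with and without in-context examples, then add that vector to a fresh zero-shot hidden state to steer it toward the task. The toy below fakes hidden states as random vectors shifted by a fixed "task direction"; every quantity is an illustrative assumption, not extracted from a real LLM:

```python
import random

DIM = 16
random.seed(1)

def rand_vec(scale=1.0):
    return [random.gauss(0.0, scale) for _ in range(DIM)]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def mean(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(DIM)]

def dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

# Pretend hidden states: a fixed 'translation task' direction shifts the
# activations whenever the prompt contains in-context translation examples.
task_direction = rand_vec()
zero_shot = [rand_vec(0.3) for _ in range(20)]
with_icl = [add(z, task_direction) for z in zero_shot]

# Function vector = mean(ICL states) - mean(zero-shot states).
fv = sub(mean(with_icl), mean(zero_shot))

# Steering: add the function vector to a fresh zero-shot state and check it
# lands where the corresponding in-context state would have been.
fresh = rand_vec(0.3)
steered = add(fresh, fv)
target = add(fresh, task_direction)
print(dist(steered, target), dist(fresh, target))
```

The cross-lingual claim in the paper amounts to saying that a vector extracted this way from one translation direction still moves hidden states usefully for other target languages.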
[Research] Learning Hybrid-Control Policies for High-Precision In-Contact Manipulation Under Uncertainty
Researchers propose hybrid position-force control policies that let reinforcement learning agents dynamically switch between force and position control for delicate manipulation tasks like connector insertion. A new training method called MATCH improves learning efficiency by handling contact mode transitions.
arXiv cs.LG · 11h ago · 52

[Research] Budgeted Online Influence Maximization
Researchers propose a budget-constrained algorithm for selecting influencers in social ad campaigns, replacing traditional cardinality limits with real-world cost modeling. The approach improves regret bounds for both budget and cardinality settings under cascade diffusion models with semi-bandit feedback.
arXiv cs.LG · 11h ago · 42

[Research] HardNet++: Nonlinear Constraint Enforcement in Neural Networks
HardNet++ enforces both linear and nonlinear constraints on neural network outputs during inference, addressing a gap in existing methods that either lack guarantees or work only for specific constraint types. The technique matters for safety-critical applications like control systems and autonomous decision-making where constraint violations carry real costs.
arXiv cs.LG · 11h ago · 52

[Research · Tools & Code] Chat2Workflow: A Benchmark for Generating Executable Visual Workflows with Natural Language
Researchers introduced Chat2Workflow, a benchmark and agentic framework for converting natural language into executable visual workflows, addressing the manual engineering bottleneck in industrial automation. The work tests whether LLMs can automate multi-step workflow design and error correction without human intervention.
arXiv cs.CL · 11h ago · 58

[Research] From Top-1 to Top-K: A Reproducibility Study and Benchmarking of Counterfactual Explanations for Recommender Systems
Researchers unified the evaluation of eleven counterfactual explanation methods for recommender systems, addressing fragmentation across datasets, metrics, and protocols that previously blocked fair comparison. The benchmarking framework assesses explainers across three dimensions, covering native methods like LIME-RS and SHAP as well as graph neural network approaches.
arXiv cs.LG · 11h ago · 52

[Research] Disentangling Damage from Operational Variability: A Label-Free Self-Supervised Representation Learning Framework for Output-Only Structural Damage Identification
Researchers propose a self-supervised learning framework using disentangled representations to identify structural damage from vibration signals while filtering out environmental noise. The approach uses an autoencoder with VICReg regularization to separate damage-induced changes from operational variability, addressing a key challenge in structural health monitoring.
arXiv cs.LG · 12h ago · 42

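VICReg regularization, which this framework reportedly uses, combines three terms: invariance (mean squared error between two embeddings of the same input), variance (a hinge keeping each embedding dimension's standard deviation above a margin, so representations don't collapse), and covariance (penalizing off-diagonal covariance so dimensions decorrelate). A pure-Python sketch of the loss (the term weights and margin are the commonly used defaults, taken here as assumptions):

```python
def mean(xs):
    return sum(xs) / len(xs)

def vicreg_loss(za, zb, sim_w=25.0, var_w=25.0, cov_w=1.0, margin=1.0):
    """za, zb: two batches of embeddings (list of rows), same shape."""
    n, d = len(za), len(za[0])

    # Invariance: mean squared error between the two views, row by row.
    inv = mean([sum((a - b) ** 2 for a, b in zip(ra, rb)) / d
                for ra, rb in zip(za, zb)])

    def var_term(z):
        # Hinge on the std of each dimension: push each std up to `margin`.
        total = 0.0
        for j in range(d):
            col = [row[j] for row in z]
            mu = mean(col)
            std = (sum((x - mu) ** 2 for x in col) / (n - 1)) ** 0.5
            total += max(0.0, margin - std)
        return total / d

    def cov_term(z):
        # Sum of squared off-diagonal covariance entries, scaled by d.
        mus = [mean([row[j] for row in z]) for j in range(d)]
        total = 0.0
        for i in range(d):
            for j in range(d):
                if i == j:
                    continue
                c = sum((r[i] - mus[i]) * (r[j] - mus[j]) for r in z) / (n - 1)
                total += c * c
        return total / d

    return (sim_w * inv
            + var_w * (var_term(za) + var_term(zb))
            + cov_w * (cov_term(za) + cov_term(zb)))

# Identical, well-spread views: invariance is zero, variance hinge inactive,
# only the covariance penalty contributes. Shifting one view adds invariance.
za = [[0.0, 2.0], [2.0, 0.0], [4.0, 4.0], [-2.0, -2.0]]
loss_same = vicreg_loss(za, za)
loss_diff = vicreg_loss(za, [[x + 1.0 for x in row] for row in za])
print(loss_same, loss_diff)
```

In the paper's setting the two "views" would be embeddings of augmented vibration signals; the disentangling idea is that damage-relevant structure survives this regularization while operational nuisance variation is factored out.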
[Research] Pause or Fabricate? Training Language Models for Grounded Reasoning
Researchers propose GRIL, a reinforcement learning framework that trains language models to recognize when they lack sufficient information for reliable inference, rather than confidently fabricating answers. The approach decomposes reasoning into clarification and pause stages, addressing a fundamental failure mode in LLM reasoning under incomplete data.
arXiv cs.CL · 12h ago · 58

[Research] The signal is the ceiling: Measurement limits of LLM-predicted experience ratings from open-ended survey text
Researchers tested whether prompt engineering or model selection better improves LLM accuracy on fan experience ratings from baseball survey text. Prompt tweaks yielded only 2 percentage points of gain (67% to 69% accuracy), while GPT-5.2 and GPT-4.1-mini both underperformed the baseline, suggesting diminishing returns on optimization.
arXiv cs.CL · 12h ago · 42

[Research · Models & Releases] Micro Language Models Enable Instant Responses
Researchers developed micro language models (8M–30M parameters) that generate the first few words of responses directly on edge devices like smartwatches, while cloud models complete the sentence — eliminating multi-second latency gaps. The approach matches the performance of 70M–256M parameter models while enabling genuinely responsive on-device AI.
arXiv cs.CL · 12h ago · 62

[Research · Models & Releases · Safety] ALFRED: Evaluating Safety-Conscious Planning of Multimodal Large Language Models
Researchers benchmarked eleven multimodal LLMs from the Qwen, Gemma, and Gemini families on embodied safety planning in kitchen environments, finding models recognize hazards well in Q&A but fail to mitigate risks when acting as autonomous agents.
arXiv cs.CL · 12h ago · 58