Research Brief
Key Takeaways
- Key findings from research papers
Sources
- LLM4Branch: Large Language Model for Discovering Efficient Branching Policies of Integer Programs
- Where Reliability Lives in Vision-Language Models: A Mechanistic Study of Attention, Hidden States, and Causal Circuits
- PLACO: A Multi-Stage Framework for Cost-Effective Performance in Human-AI Teams
- SkillLens: Adaptive Multi-Granularity Skill Reuse for Cost-Efficient LLM Agents
- Political Plasticity: An Analysis of Ideological Adaptability in Large Language Models
- Measuring What Matters: Benchmarking Generative, Multimodal, and Agentic AI in Healthcare
- AI-Care: A Conversational Agentic System for Task Coordination in Alzheimer's Disease Care
- Results and Retrospective Analysis of the CODS 2025 AssetOpsBench Challenge
- OracleTSC: Oracle-Informed Reward Hurdle and Uncertainty Regularization for Traffic Signal Control
- Biological Plausibility and Representational Alignment of Feedback Alignment in Convolutional Networks
- DiagnosticIQ: A Benchmark for LLM-Based Industrial Maintenance Action Recommendation from Symbolic Rules
- Reconciling Consistency-Based Diagnosis with Actual-Causality-Based Explanations
- AgentPSO: Evolving Agent Reasoning Skill via Multi-agent Particle Swarm Optimization
- Value-Decomposed Reinforcement Learning Framework for Taxiway Routing with Hierarchical Conflict-Aware Observations
- Done, But Not Sure: Disentangling World Completion from Self-Termination in Embodied Agents
- EvoMAS: Learning Execution-Time Workflows for Multi-Agent Systems
- Not All Turns Matter: Credit Assignment for Multi-Turn Jailbreaking
- Mirror, Mirror on the Wall: Can VLM Agents Tell Who They Are at All?
- When Agents Overtrust Environmental Evidence: An Extensible Agentic Framework for Benchmarking Evidence-Grounding Defects in LLM Agents
- SynerDiff: Synergetic Continuous Batching for Fast and Parallel Diffusion Model Inference
- Ace-Skill: Bootstrapping Multimodal Agents with Prioritized and Clustered Evolution
- MDGYM: Benchmarking AI Agents on Molecular Simulations
- Can We Formally Verify Neural PDE Surrogates? SMT Compilation of Small Fourier Neural Operators
- Latency Analysis and Optimization of Alpamayo 1 via Efficient Trajectory Generation
- Re$^2$Math: Benchmarking Theorem Retrieval in Research-Level Mathematics
- Containment Verification: AI Safety Guarantees Independent of Alignment
- When (and How) to Trust the Expert: Diagnosing Query-Time Expert-Guided Reinforcement Learning
- FORTIS: Benchmarking Over-Privilege in Agent Skills
- CIVeX: Causal Intervention Verification for Language Agents
- The Geometry of Forgetting: Temporal Knowledge Drift as an Independent Axis in LLM Representations
- Evidence Over Plans: Online Trajectory Verification for Skill Distillation
- Shaping Schema via Language Representation as the Next Frontier for LLM Intelligence Expanding
- PiCA: Pivot-Based Credit Assignment for Search Agentic Reinforcement Learning
- Beyond ESG Scores: Learning Dynamic Constraints for Sequential Portfolio Optimization
- SKG-VLA: Scene Knowledge Graph Priors for Structured Scene Semantics and Multimodal Reasoning for Decision Making
- Do Self-Evolving Agents Forget? Capability Degradation and Preservation in Lifelong LLM Agent Adaptation
- The Wittgensteinian Representation Hypothesis: Is Language the Attractor of Multimodal Convergence?
- Explainable Knowledge Tracing via Probabilistic Embeddings and Pattern-based Reasoning
- Towards a Virtual Neuroscientist: Autonomous Neuroimaging Analysis via Multi-Agent Collaboration
- VulTriage: Triple-Path Context Augmentation for LLM-Based Vulnerability Detection
- Functional Stable Model Semantics and Answer Set Programming Modulo Theories
- Weighted Rules under the Stable Model Semantics
- CodeClinic: Evaluating Automation of Coding Skills for Clinical Reasoning Agents
- Primal-Dual Guided Decoding for Constrained Discrete Diffusion
- Attribution-based Explanations for Markov Decision Processes
- Marrying Generative Model of Healthcare Events with Digital Twin of Social Determinants of Health for Disease Reasoning
- Fairness of Explanations in Artificial Intelligence (AI): A Unifying Framework, Axioms, and Future Direction toward Responsible AI
- When to Re-Commit: Temporal Abstraction Discovery for Long-Horizon Vision-Language Reasoning
- HAGE: Harnessing Agentic Memory via RL-Driven Weighted Graph Evolution
- LoopVLA: Learning Sufficiency in Recurrent Refinement for Vision-Language-Action Models
- Optimizer-Induced Mode Connectivity: From AdamW to Muon
- MAGE: Multi-Agent Self-Evolution with Co-Evolutionary Knowledge Graphs
- Rethinking Constraint Awareness for Efficient State Embedding of Neural Routing Solver
- Useful for Exploration, Risky for Precision: Evaluating AI Tools in Academic Research
- FormalRewardBench: A Benchmark for Formal Theorem Proving Reward Models
- Hypothesis-Driven Deep Research with Large Language Models: A Structured Methodology for Automated Knowledge Discovery
- Verifiable Process Rewards for Agentic Reasoning
- Positive Alignment: Artificial Intelligence for Human Flourishing
- Agent-ValueBench: A Comprehensive Benchmark for Evaluating Agent Values
- PaperFit: Vision-in-the-Loop Typesetting Optimization for Scientific Documents
- CORTEG: Foundation Models Enable Cross-Modality Representation Transfer from Scalp to Intracranial Brain Recordings
- How Mobile World Model Guides GUI Agents?
- GuardAD: Safeguarding Autonomous Driving MLLMs via Markovian Safety Logic
- Agent-X: Full Pipeline Acceleration of On-device AI Agents
- Can Agent Benchmarks Support Their Scores? Evidence-Supported Bounds for Interactive-Agent Evaluation
- SkillEvolver: Skill Learning as a Meta-Skill
- PrimeKG-CL: A Continual Graph Learning Benchmark on Evolving Biomedical Knowledge Graphs
- Bridging Sequence and Graph Structure for Epigenetic Age Prediction
- LLARS: Enabling Domain Expert & Developer Collaboration for LLM Prompting, Generation and Evaluation
- Budget-Efficient Automatic Algorithm Design via Code Graph
- PRISM: Generation-Time Detection and Mitigation of Secret Leakage in Multi-Agent LLM Pipelines
- Teacher-Aware Evolution of Heuristic Programs from Learned Optimization Policies
- diffGHOST: Diffusion based Generative Hedged Oblivious Synthetic Trajectories
- The Agent Use of Agent Beings: Agent Cybernetics Is the Missing Science of Foundation Agents
- GESR: A Genetic Programming-Based Symbolic Regression Method with Gene Editing
- MATRA: Modeling the Attack Surface of Agentic AI Systems -- OpenClaw Case Study
- New AI-Driven Tools for Enhancing Campus Well-being: A Prevention and Intervention Approach
- NanoResearch: Co-Evolving Skills, Memory, and Policy for Personalized Research Automation
- From Controlled to the Wild: Evaluation of Pentesting Agents for the Real-World
- BenchCAD: A Comprehensive, Industry-Standard Benchmark for Programmatic CAD
- Autonomous FAIR Digital Objects: From Passive Assertions to Active Knowledge
- EGL-SCA: Structural Credit Assignment for Co-Evolving Instructions and Tools in Graph Reasoning Agents
- TMAS: Scaling Test-Time Compute via Multi-Agent Synergy
- IndustryBench: Probing the Industrial Knowledge Boundaries of LLMs
- E-TCAV: Formalizing Penultimate Proxies for Efficient Concept Based Interpretability
- Towards Autonomous Railway Operations: A Semi-Hierarchical Deep Reinforcement Learning Approach to the Vehicle Rescheduling Problem
- SciIntegrity-Bench: A Benchmark for Evaluating Academic Integrity in AI Scientist Systems
- Beyond Autonomy: A Dynamic Tiered AgentRunner Framework for Governable and Resilient Enterprise AI Execution
- TRACE: Distilling Where It Matters via Token-Routed Self On-Policy Alignment
- Benchmarking Safety Risks of Knowledge-Intensive Reasoning under Malicious Knowledge Editing
- TimeClaw: A Time-Series AI Agent with Exploratory Execution Learning
- From Single-Step Edit Response to Multi-Step Molecular Optimization
- Prospective Compression in Human Abstraction Learning
- expo: Exploration-Prioritized Policy Optimization via Adaptive KL Regulation and Gaussian Curriculum Sampling
- RADAR: Redundancy-Aware Diffusion for Multi-Agent Communication Structure Generation
- EnactToM: An Evolving Benchmark for Functional Theory of Mind in Embodied Agents
- Medical Model Synthesis Architectures: A Case Study
- Unpredictability dissociates from structured control in language agents
- Absurd World: A Simple Yet Powerful Method to Absurdify the Real-world for Probing LLM Reasoning Capabilities
- Cplus2ASP: Computing Action Language C+ in Answer Set Programming
- WindINR: Latent-State INR for Fast Local Wind Query and Correction in Complex Terrain
- Don't Click That: Teaching Web Agents to Resist Deceptive Interfaces
- SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning
- From Passive Reuse to Active Reasoning: Grounding Large Language Models for Neuro-Symbolic Experience Replay
- CHAINTRIX: A multi-pipeline LLM-augmented framework for automated smart-contract security auditing
- Dsat: A Native SAT Solver for Discrete Logic
- Remember the Decision, Not the Description: A Rate-Distortion Framework for Agent Memory
- The First Drop of Ink: Nonlinear Impact of Misleading Information in Long-Context Reasoning
- CLEF: EEG Foundation Model for Learning Clinical Semantics
- PathISE: Learning Informative Path Supervision for Knowledge Graph Question Answering
- Probing Cross-modal Information Hubs in Audio-Visual LLMs
- Navigating the Sea of LLM Evaluation: Investigating Bias in Toxicity Benchmarks
- A Resilient Solution for Sewer Overflow Monitoring across Cloud and Edge
- Deep Arguing
- Consistency as a Testable Property: Statistical Methods to Evaluate AI Agent Reliability
- SLASH the Sink: Sharpening Structural Attention Inside LLMs
- EmbodiSkill: Skill-Aware Reflection for Self-Evolving Embodied Agents
- AgentRx: A Benchmark Study of LLM Agents for Multimodal Clinical Prediction Tasks
- Learning the Interaction Prior for Protein-Protein Interaction Prediction: A Model-Agnostic Approach
- Automated Approach for Solving Infinite-state Polynomial Reachability Games
- M2A: Synergizing Mathematical and Agentic Reasoning in Large Language Models
- Cross-Family Universality of Behavioral Axes via Anchor-Projected Representations
- Workspace Optimization: How to Train Your Agent
- PDEAgent-Bench: A Multi-Metric, Multi-Library Benchmark for PDE Solver Generation
- TIDE-Bench: Task-Aware and Diagnostic Evaluation of Tool-Integrated Reasoning
- EpiGraph: A Knowledge Graph and Benchmark for Evidence-Intensive Reasoning in Epilepsy
- Position: Avoid Overstretching LLMs for every Enterprise Task
- How LLMs Are Persuaded: A Few Attention Heads, Rerouted
- Beyond Accuracy: Evaluating Strategy Diversity in LLM Mathematical Reasoning
- EquiMem: Calibrating Shared Memory in Multi-Agent Debate via Game-Theoretic Equilibrium
- Towards Conversational Medical AI with Eyes, Ears and a Voice
- Agentic MIP Research: Accelerated Constraint Handler Generation
- From Holo Pockets to Electron Density: GPT-style Drug Design with Density
- MIND-Skill: Quality-Guaranteed Skill Generation via Multi-Agent Induction and Deduction
- The Echo Amplifies the Knowledge: Somatic Marker Analogues in Language Models via Emotion Vector Re-Injection
- What Will Happen Next: Large Models-Driven Deduction for Emergency Instances
- Evaluating Developmental Cognition Capabilities of LLMs
- Mid-Training with Self-Generated Data Improves Reinforcement Learning in Language Models
- Behavioral Determinants of Deployed AI Agents in Social Networks: A Multi-Factor Study of Personality, Model, and Guardrail Specification
- LLM-guided Semi-Supervised Approaches for Social Media Crisis Data Classification
- Belief or Circuitry? Causal Evidence for In-Context Graph Learning
- On Distinguishing Capability Elicitation from Capability Creation in Post-Training: A Free-Energy Perspective
- Playing games with knowledge: AI-Induced delusions need game theoretic interventions
- Hierarchical Causal Abduction: A Foundation Framework for Explainable Model Predictive Control
- The Open-Box Fallacy: Why AI Deployment Needs a Calibrated Verification Regime
- Agent-First Tool API: A Semantic Interface Paradigm for Enterprise AI Agent Systems
- ASIA: an Autonomous System Identification Agent
- Agentic Performance at the Edge: Insights from Benchmarking
- How Much is Brain Data Worth for Machine Learning?
- Learning the Preferences of a Learning Agent
- Open Ontologies: Tool-Augmented Ontology Engineering with Stable Matching Alignment
- Self-ReSET: Learning to Self-Recover from Unsafe Reasoning Trajectories
- PnP-Corrector: A Universal Correction Framework for Coupled Spatiotemporal Forecasting
- Internalizing Safety Understanding in Large Reasoning Models via Verification
- Forge: Quality-Aware Reinforcement Learning for NP-Hard Optimization in LLMs
- Iterative Critique-and-Routing Controller for Multi-Agent Systems with Heterogeneous LLMs
- C2L-Net: A Data-Driven Model for State-of-Charge Estimation of Lithium-Ion Batteries During Discharge
- TrajPrism: A Multi-Task Benchmark for Language-Grounded Urban Trajectory Understanding
- ComplexMCP: Evaluation of LLM Agents in Dynamic, Interdependent, and Large-Scale Tool Sandbox
- Alignment as Jurisprudence
- Constant-Target Energy Matching: A Unified Framework for Continuous and Discrete Density Estimation
- OPT-BENCH: Evaluating the Iterative Self-Optimization of LLM Agents in Large-Scale Search Spaces
- Generalization Bounds of Emergent Communications for Agentic AI Networking
- Shepherd: A Runtime Substrate Empowering Meta-Agents with a Formalized Execution Trace
- Empowering VLMs for Few-Shot Multimodal Time Series Classification via Tailored Agentic Reasoning
- How You Begin is How You Reason: Driving Exploration in RLVR via Prefix-Tuned Priors
- When Can Human-AI Teams Outperform Individuals? Tight Bounds with Impossibility Guarantees
- The Attacker in the Mirror: Breaking Self-Consistency in Safety via Anchored Bipolicy Self-Play
- Auto-Rubric as Reward: From Implicit Preferences to Explicit Multimodal Generative Criteria
- MemQ: Integrating Q-Learning into Self-Evolving Memory Agents over Provenance DAGs
- MCP-Cosmos: World Model-Augmented Agents for Complex Task Execution in MCP Environments
- MaD Physics: Evaluating information seeking under constraints in physical environments
- The Gordian Knot for VLMs: Diagrammatic Knot Reasoning as a Hard Benchmark
- A Reflective Storytelling Agent for Older Adults: Integrating Argumentation Schemes and Argument Mining in LLM-Based Personalised Narratives
- CauSim: Scaling Causal Reasoning with Increasingly Complex Causal Simulators
- UxSID: Semantic-Aware User Interests Modeling for Ultra-Long Sequence
- SearchSkill: Teaching LLMs to Use Search Tools with Evolving Skill Banks
- CATO: Charted Attention for Neural PDE Operators
- Sufficient conditions for a Heuristic Rating Estimation Method application
- Learning to Explore: Scaling Agentic Reasoning via Exploration-Aware Policy Optimization
- Agentic AI Scientists Are Not Built For Autonomous Scientific Discovery
- FRACTAL: SSM with Fractional Recurrent Architecture for Computational Temporal Analysis of Long Sequences
- Mental Health AI Safety Claims Must Preserve Temporal Evidence
- AHD Agent: Agentic Reinforcement Learning for Automatic Heuristic Design
- RewardHarness: Self-Evolving Agentic Post-Training
- MBP-KT: Learning Global Collaborative Information from Meta-Behavioral Pattern for Enhanced Knowledge Tracing
- SkillMaster: Toward Autonomous Skill Mastery in LLM Agents
- Embeddings for Preferences, Not Semantics
- Spatial Priming Outperforms Semantic Prompting: A Grid-Based Approach to Improving LLM Accuracy on Chart Data Extraction
- Arcane: An Assertion Reduction Framework through Semantic Clustering and MCTS-Guided Rule Exploring
- Active Testing of Large Language Models via Approximate Neyman Allocation
- Strategic Exploitation in LLM Agent Markets: A Simulation Framework for E-Commerce Trust
- Separate First, Fuse Later: Mitigating Cross-Modal Interference in Audio-Visual LLMs Reasoning with Modality-Specific Chain-of-Thought
- Strategic commitments shape collective cybersecurity under AI inequality
- Do Linear Probes Generalize Better in Persona Coordinates?
- NEXUS: Continual Learning of Symbolic Constraints for Safe and Robust Embodied Planning
- SeePhys Pro: Diagnosing Modality Transfer and Blind-Training Effects in Multimodal RLVR for Physics Reasoning
- Log analysis is necessary for credible evaluation of AI agents
- Human-Inspired Memory Architecture for LLM Agents
- CoCoDA: Co-evolving Compositional DAG for Tool-Augmented Agents
- Evolving-RL: End-to-End Optimization of Experience-Driven Self-Evolving Capability within Agents
- Route by State, Recover from Trace: STAR with Failure-Aware Markov Routing for Multi-Agent Spatiotemporal Reasoning
- Yield Curve Forecasting using Machine Learning and Econometrics: A Comparative Analysis
- UTS at PsyDefDetect: Multi-Agent Councils and Absence-Based Reasoning for Defense Mechanism Classification
- A Game Theoretic Free Energy Analysis of Higher Order Synergy in Attention Heads of Large Language Models
- Emergent Semantic Role Understanding in Language Models
- BoostAPR: Boosting Automated Program Repair via Execution-Grounded Reinforcement Learning with Dual Reward Models
- M$^3$: Reframing Training Measures for Discretized Physical Simulations
- Reasoning Compression with Mixed-Policy Distillation
- Human-LLM Dialogue Improves Diagnostic Accuracy in Emergency Care
- Latent Personality Alignment: Improving Harmlessness Without Mentioning Harms
- The Generalized Turing Test: A Foundation for Comparing Intelligence
- Interpretable Machine Learning for Football Performance Analysis: Evidence of Limited Transferability from Elite Leagues to University Competition
- LLM-Guided Monte Carlo Tree Search over Knowledge Graphs: Composing Mechanistic Explanations for Drug-Disease Pairs
- Ambig-DS: A Benchmark for Task-Framing Ambiguity in Data-Science Agents
- Token Economics for LLM Agents: A Dual-View Study from Computing and Economics
- Bias by Necessity: Impossibility Theorems for Sequential Processing with Convergent AI and Human Validation
- LLM Jaggedness Unlocks Scientific Creativity
- A Prompt-Aware Structuring Framework for Reliable Reuse of AI-Generated Content in the Agentic Web
- Why Retrying Fails: Context Contamination in LLM Agent Pipelines
- The Metacognitive Probe: Five Behavioural Calibration Diagnostics for LLMs
- Do LLMs Experience an Internal Polylogue? Investigating Reasoning through the Lens of Personas
- Reasoning Is Not Free: Robust Adaptive Cost-Efficient Routing for LLM-as-a-Judge
- Data-driven Circuit Discovery for Interpretability of Language Models