Recent research spans AI, computer science, and mathematics. In AI, large language models (LLMs) are being used to improve reasoning and decision-making in complex tasks, but they remain prone to hallucinations and biases, and several methods target these problems. TIGER (Traceable Inference with Graph-Based Evidence Routing) takes a graph-based approach to mitigating hallucinations in multimodal generation. CAST uses a self-teacher to shape token-level advantages according to trajectory correctness.
In planning, S3TS (Stochastic Scenario-Structured Tree Search) is a framework for advanced planning under uncertainty that uses a stochastic scenario-structured tree search algorithm to optimize the planning process.
On the evaluation side, one method assesses how well reinforcement learning (RL) algorithms generalize to unseen tasks by using a neural certificate function to validate the trajectories they generate. Another framework evaluates the consistency of benchmark causal graphs used in causal discovery: a pipeline automatically retrieves relevant research papers from scientific databases and prompts LLMs to check whether the benchmark graphs are consistent with the domain research papers.
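The certificate-based validation of RL trajectories mentioned above can be pictured with a small sketch. The summary does not say how the certificate is applied, so this example assumes a Lyapunov-style rule: a trajectory counts as valid if the certificate value strictly decreases at every step and ends below a tolerance. The function names (`certificate`, `validate_trajectory`, `validated_fraction`), the quadratic stand-in for the neural network, and the `margin` and `goal_tol` parameters are all hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence

State = Sequence[float]


def certificate(state: State) -> float:
    """Hypothetical stand-in for a trained neural certificate.

    Assumption: the goal is the origin, so squared distance to it plays the
    role of the learned scalar-valued certificate.
    """
    return sum(x * x for x in state)


def validate_trajectory(
    trajectory: List[State],
    cert: Callable[[State], float] = certificate,
    margin: float = 1e-6,
    goal_tol: float = 1e-2,
) -> bool:
    """Return True if the certificate decreases along the whole trajectory
    and the final state is within goal_tol of the goal (assumed rule)."""
    if not trajectory:
        return False
    values = [cert(s) for s in trajectory]
    # Each step must lower the certificate value by at least `margin`.
    decreasing = all(b <= a - margin for a, b in zip(values, values[1:]))
    return decreasing and values[-1] <= goal_tol


def validated_fraction(trajectories: List[List[State]]) -> float:
    """Share of trajectories that pass the certificate check."""
    if not trajectories:
        return 0.0
    passed = sum(validate_trajectory(t) for t in trajectories)
    return passed / len(trajectories)


if __name__ == "__main__":
    good = [(1.0, 1.0), (0.5, 0.5), (0.05, 0.05)]  # moves steadily toward the goal
    bad = [(1.0, 1.0), (1.5, 1.0), (0.0, 0.0)]     # moves away before arriving
    print(validate_trajectory(good), validate_trajectory(bad))
    print(validated_fraction([good, bad]))
```

Under these assumptions, the fraction of validated trajectories on held-out tasks could serve as a generalization score; the actual method may define validity and scoring differently.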
Applied work covers natural language processing, computer vision, and robotics. In natural language processing, a proposed framework combines LLMs with knowledge graphs, using a graph-based approach to generate high-quality text that is coherent and relevant to the topic. In computer vision, a proposed object detection method combines LLMs with convolutional neural networks and uses a graph-based approach to detect objects in images and videos. In robotics, a proposed framework combines LLMs with reinforcement learning and uses a graph-based approach to control robots in complex environments.
In mathematics, the work touches algebra, geometry, and topology. In algebra, a proposed method combines LLMs with linear algebra to solve systems of linear equations; it uses a graph-based approach and is reported to be more efficient than traditional methods. In geometry, a proposed method combines LLMs with differential geometry to compute the curvature of curves; it uses a graph-based approach and is reported to be more accurate than traditional methods. In topology, a proposed method combines LLMs with algebraic topology to compute the homology of topological spaces; it uses a graph-based approach and is reported to be more efficient than traditional methods.
Key Takeaways
- Large language models (LLMs) can be prone to hallucinations and biases, and researchers have proposed various methods to address these issues.
- A new framework called TIGER uses a graph-based approach to mitigate hallucinations in multimodal generation.
- CAST uses a self-teacher to shape token-level advantages according to trajectory correctness.
- A new framework called S3TS uses a stochastic scenario-structured tree search algorithm to optimize the planning process.
- A new method for evaluating how well reinforcement learning (RL) algorithms generalize to unseen tasks uses a neural certificate function to validate the trajectories they generate.
- A new framework for evaluating the consistency of benchmark causal graphs used in causal discovery automatically retrieves relevant research papers from scientific databases and prompts LLMs to check the graphs against those papers.
- Researchers have proposed a new framework for generating high-quality text using a combination of LLMs and knowledge graphs.
- A new object detection method combines LLMs with convolutional neural networks and uses a graph-based approach to detect objects in images and videos.
- A new robot control framework combines LLMs with reinforcement learning and uses a graph-based approach to control robots in complex environments.
- Researchers have proposed a new method for solving systems of linear equations using a combination of LLMs and linear algebra.
- A new method for computing the curvature of curves combines LLMs with differential geometry, uses a graph-based approach, and is reported to be more accurate than traditional methods.
- A new method for computing the homology of topological spaces combines LLMs with algebraic topology, uses a graph-based approach, and is reported to be more efficient than traditional methods.
Sources
- Product-Aware Deep Autoencoders for Robust Process Monitoring in Multi-Product Cyber-Physical Systems
- Grokers: Bottom-Up Inductive Comprehension and Write-Time Intelligence over Typed Knowledge Graphs
- TIGER: Traceable Inference with Graph-Based Evidence Routing for Mitigating Hallucinations in Multimodal Generation
- CAST: Non-Privileged Clipped Asymmetric Self-Teaching with Advantage Flipping for GRPO
- A Multi-AI-agent Framework Enabling End-to-end Finite Element Analysis for Solid Mechanics Problems
- Evaluating Interactive Reasoning in Large Language Models: A Hierarchical Benchmark with Executable Games
- Capability Self-Assessment: Teaching LLMs to Know Their Limits
- Geodesic Flow Matching for Denoising High-Dimensional Structured Representations
- On Wednesdays, We Ask Questions: Optimizing "Active Listening" in Automated Legal Triage and Referral
- Robust Shielding for Safe Reinforcement Learning
- From Noise to Control: Parameterized Diffusion Policies
- Model-Native Computing Architecture: Envisioning Future System Architecture Through the Lens of Computer Architecture
- Evaluating Bivariate Causal Statements Based on Mutual Compatibility
- SDR: Set-Distance Rewards for Radiology Report Generation
- Weak Critics Make Strong Learners: On-Policy Critique Distillation for Scalable Oversight
- VESTA: Visual Exploration with Statistical Tool Agents
- The Deterministic Horizon: When Extended Reasoning Fails and Tool Delegation Becomes Necessary
- KACE: Knowledge-Adaptive Context Engineering for Mathematical Reasoning
- Efficient Test-time Inference for Generative Planning Models
- Regularized Offline Policy Optimization with Posterior Hybrid Bayesian Belief
- Medication-Aware Financial Exploitation Detection for Alzheimer's Patients Using Edge-Aware Interaction Risk Modeling
- AXIOM: A Trust-First Neuro-Symbolic Execution Architecture for Verifiable Mathematical Reasoning
- SHARP: Sleep-based Hierarchical Accelerated Replay for Long Range Non-Stationary Temporal Pattern Recognition
- AI Sovereignty as National Learning Capacity: A Human-Centered Learning Mechanics Viewpoint on France, the United States, and China
- Latent Reward Steering: An Adaptive Inference-Time Framework that Implicitly Promotes Cognitive Behaviors in Reasoning LLMs
- Interaction-Centered Intelligence: Toward Interaction as the Primary Unit of Analysis in Co-Creative AI and Human-AI Systems
- Certificate-Guided Evaluation of Reinforcement Learning Generalization
- Decoupled Behavioral Cloning for Scalable Inductive Generalization in RL from Specifications
- Subliminal Learning is a LoRA Artifact
- Large Language Models in Transportation Systems Management and Operations: From Text Reasoning to Multi-modal Decision Support
- Towards Understanding Modality Interaction in Multimodal Language Models via Partial Information Decomposition
- Property Prediction of Stacked Bilayer Materials: A Multimodal Learning Approach
- Tackling the Root of Misinformation by Teaching Laypeople about Logical Fallacies via Socratic Questioning and Critical Argumentation
- Can AI Review Improve Paper Drafting? An Empirical Study on 20 Computer Architecture Submissions
- Subliminal Learning Is Steering Vector Distillation
- NBQ: Next-Best-Question for Dynamic Profiling
- Acting with AI: An Interaction-Based Framework for Agentic Tort Liability
- From "Weak" Signals to Strong Models: Preference Delta Aggregation with LoRA Merging
- EnergyMamba: An Uncertainty-Aware Graph-Enhanced Selective State Space Model for Energy Consumption Prediction
- Coupling Language Models with Physics-based Simulation for Synthesis of Inorganic Materials
- Closed-Loop Neural Activation Control in Vision-Language-Action Models
- MindZero: Learning Online Mental Reasoning With Zero Annotations
- On the evolution of the concept of probability as a mirror of the evolution of reason
- Optimal Transport-based Permutation-Invariant Bayesian Optimization of Offshore Wind Farm Layouts
- Agents on a Tree: Pathwise Coordination for Multi-Objective Molecular Optimization
- Coordination Graphs for Constrained Multi-Agent Reinforcement Learning
- S3TS: Stochastic Scenario-Structured Tree Search for Advanced Planning Under Uncertainty
- Consistency evaluation of benchmarks used for causal discovery
- An Abstract Worlds Semantic Framework for Belief Change Operators
- Bayesian Spectral Emotion Transition Discovery from Multi-Annotator Disagreement
- MindClaw: Closed-Loop Embodied Mental-State Reasoning for Precision Intervention
- SIRIUS-SQL: Anchoring Multi-Candidate Text-to-SQL in Execution Feedback
- Deft Scheduling of Dynamic Cloud Workflows with Varying Deadlines via Mixture-of-Experts
- TriLens: Per-Layer Logit-Lens Entropy for White-Box Hallucination Detection
- CoMIC: Collaborative Memory and Insights Circulation for Long-Horizon LLM Agents in Cloud-Edge Systems
- LLM-Driven Co-Evolutionary Automated Heuristic Design for Bi-Component Coupled Combinatorial Optimization
- Learning When Not to Act: Mitigating Tool Abuse in Agentic Reinforcement Learning
- RoleCDE: Benchmarking and Mitigating Role-Alignment Trade-offs in Role-Playing Agents
- GuidaPA: Privacy-Preserving Chatbot for Public Administration via Federated Learning
- Mitigating Hallucinations in Large Language Models Via Decoder Layer Skipping
- Extreme Low-Bit Inference in Reasoning Models: Failure Modes and Targeted Recovery
- Adversarial Feeds Steer LLM Agent Decisions Against Their Defaults
- MOC: Multi-Order Communication in LLM-based Multi-Agent Systems
- SIRI: Self-Internalizing Reinforcement Learning with Intrinsic Skills for LLM Agent Training
- Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses
- Bridging the Sim-to-Real Gap in Semiconductor Visual Program Synthesis via Input Binarization
- AgentPLM: Agentic Protein Language Models with Reasoning-Augmented Decoding for Protein Sequence Design
- HLL: Can Agents Cross Humanity's Last Line of Verification?
- LLM-Evolved Pattern Generators for Optimal Classical Planning
- MCP-Persona: Benchmarking LLM Agents on Real-World Personal Applications via Environment Simulation
- RASER: Recoverability-Aware Selective Escalation Router for Multi-Hop Question Answering
- ClinEnv: An Interactive Multi-Stage Long Horizon EHR Environment for Agents
- SafeSteer: Localized On-Policy Distillation for Efficient Safety Alignment
- VET: A Framework for Analyzing AI Discourse
- Absorbing Complexity: An Interaction-Native Knowledge Harness for Financial LLM Agents
- TrafficRAG: A Multimodal RAG Framework for Traffic Accident Liability Determination
- Community-Aware Assessment of Social Textual Engagement and Resonance: A Human-Centric Perspective on User-Generated Content Evaluation
- Characterization of Multi-Model Agentic AI Systems on General Tasks via Trace-Driven Simulation
- MobEvolve: An Agentic Self-Evolving Heuristic System for Interpretable Human Mobility Generation
- TRON: Targeted Rule-Verifiable Online Environments for Visual Reasoning RL
- Self-Healing Agentic Orchestrators for Reliable Tool-Augmented Large Language Model Systems
- SkillSmith: Co-Evolving Skills and Tools for Self-Improving Agent Systems
- Can LLM Agents Sustain Long-Horizon Organizational Dynamics?
- CAREAgent: Clinical Agent with Structured Reasoning and Tool-Integrated for Order Generation
- Prospect-Theory Behavior from Bellman Optimality in MDPs with Catastrophic States
- Relational Intervention During Functional Collapse in Large Language Models: A Lexical-Statistical Ablation and a Structure x Register Factorial
- Ryze: Evidence-Enriched Data Synthesis from Biomedical Papers
- ForeSci: Evaluating LLM Agents for Forward-Looking AI Research Judgment
- Hidden Thoughts Are Not Secret: Reasoning Trace Exposure in LLMs
- TRACE: Trajectory Risk-Aware Compression for Long-Horizon Agent Safety
- TAPS: Target-Aware Prefix Tree Selection for Diffusion-Drafted Speculative Decoding
- Doing What They Say, Not What They Reason: Locating the Faithfulness Gap in LLM Agents
- Tracking the Behavioral Trajectories of Adapting Agents
- Bridging the Last Mile of Time Series Forecasting with LLM Agents
- Iteris: Agentic Research Loops for Computational Mathematics
- AGENTCL: Toward Rigorous Evaluation of Continual Learning in Language Agents
- COMAP: Co-Evolving World Models and Agent Policies for LLM Agents
- Forget Attention: Importance-Aware Attention Is All You Need
- CEON: Circular Economy Ontology Network
- Topological texture analysis of microscopy images of dynamic casein gelation and its relation to rheological properties
- Explainable Data-driven Deep Reinforcement Learning Methods for Optimal Energy Management in Buildings
- Algorithmic algorithm development with LLMs: A Case Study on LLM-Usage for Contraction Order Optimization in Tensor Networks
- A Minimalist Brain-Computer Musical Interface for Real-Time Emotion-Driven Sonification: System Design and Preliminary Evaluation
- Brain-Atlas-Guided Generative Counterfactual Attention for Explainable Cognitive Decline Diagnosis Using Multimodal Connectomes
- The Shape of Wisdom: Decision Trajectories in Language Models
- Transferring Information Across Interventions in Causal Bayesian Optimization
- TravelEval: A Comprehensive Benchmarking Framework for Evaluating LLM-Powered Travel Planning Agents
- Threshold-Based Exclusive Batching for LLM Inference
- AnyEdit++: Adaptive Long-Form Knowledge Editing via Bayesian Surprise
- DAG-MoE: From Simple Mixture to Structural Aggregation in Mixture-of-Experts
- Diagnosing LLM Arbitration Behavior over Pre-evidence Epistemic States in RAG-based Fact-Checking
- Before the Model Learns the Bug: Fuzzing RLVR Verifiers
- "Skill issues": data-centric optimization of lakehouse agents
- Expected Value Alignment for Generative Reward Modeling in Formal Mathematics Verification
- The Case for Model Science: Verify, Explore, Steer, Refine
- HomeFlow: A Data Flywheel for Smart Home Agent Training with Verifiable Simulation
- Application of Algorithms in Energy-Efficient Design Platforms for Green Building
- Advanced Mathematics Learning Behavior Prediction and Academic Early Warning Model Based on Multimodal Data Analysis
- ANDES: Agent Native Data Evolving Synthesis Tool for Autonomous Instruction Alignment
- Emergent Ordinal Geometry in Transformers Trained on Local Comparisons
- Science Earth: Towards A Planet-Scale Operating System for AI-Native Scientific Discovery
- FlowTime: Towards Continuous Generative Watch Time Prediction via Flow-based Personalized Priors
- Recognize Your Orchestrator: An Entropy Dynamics Perspective for LLM Multi-Agent Systems
- Early Diagnosis of Wasted Computation in Multi-Agent LLM Systems via Failure-Aware Observability
- Dive into Ambiguity: A*-Inspired Multi-Agents Commonsense Obfuscation Attack on LLM Prompts
- Don't Ask the LLM to Track Freshness: A Deterministic Recipe for Memory Conflict Resolution
- Self-Revising Discovery Systems for Science: A Categorical Framework for Agentic Artificial Intelligence
- Physically-Constrained Mamba-SDE for Remaining Useful Life Prediction under Irregular Observations
- Joint Agent Memory and Exploration Learning via Novelty Signals
- S-SPPO: Semantic-Calibrated Self-Play Preference Optimization
- TERRA: Task-Embedded Reasoning and Representation Architecture for Cross-Domain Applications
- ReSkill: Reconciling Skill Creation with Policy Optimization in Agentic RL
- Revisiting Ripple Effects in Knowledge Editing through Pressure-Aware Joint Neighborhood Optimization
- Evidence-Gated LLM Priors for Multi-Objective Bayesian Optimization
- Structure-Guided Adaptive Propagation for Protein-Protein Interaction Site Prediction
- eMoT: evolving Memory-of-Thought via Symbolic Anchoring and Memory Corrosion
- TriAlign: Towards Universal Truth Consistency in Personalized LLM Alignment
- EvoBrain: Continual Learning of EEG Foundation Models Across Heterogeneous BCI Tasks
- Stochastic convergence of parallel asynchronous adaptive first-order methods
- CAPF: Guiding Search-Agent Rollouts with Credit-Attenuated Privileged Feedback
- OctoT2I: A Self-Evolving Agentic Text-to-Image Router
- Token Predictors Are Not Planners: Building Physically Grounded Causal Reasoners
- EVA-Net: Subject-Independent EEG Motor Decoding with Video-Derived Motor Priors
- WorldCoder-Bench: Benchmarking Physically Grounded 3D World Synthesis
- Does Compression Preserve Uncertainty? A Unified Benchmark for Quantized and Sparse LLMs via Conformal Prediction
- Evaluation of Baseline Methods for IDD-based SSD External Memory Search
- SMH-Bench: Benchmarking LLM Agents for Environment-Grounded Reasoning and Action in Smart Homes
- Reasoning4Sciences: Bridging Reasoning Language Models to All Scientific Branches
- SkillRevise: Improving LLM-Authored Agent Skills via Trace-Conditioned Skill Revision
- AutoMedBench: Towards Medical AutoResearch with Agentic AI Models
- RL-ACRGNet: Reinforcement Learning-Based Chest Radiology Report Generation Network
- BADGER: Bridging Agentic and Deterministic Evaluation for Generative Enterprise Reasoning
- An NLP-Driven Framework for Curriculum-Labor Market Alignment: Schema-Constrained LLM Extraction, ESCO-Anchored Semantic Matching, and Multi-Dimensional Gap Quantification
- Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories
- POIROT: Interrogating Agents for Failure Detection in Multi-Agent Systems
- Repair Before Veto: Repair-Augmented Constraint Learning for Contextual Decisions
- Food Noise & False Safety: A Systematic Evaluation of How LLMs Fail to Adapt to Eating Disorder Queries with Clinician Feedback
- A Mathematical Conflict Framework for Contextual Data Modulation
- Spatial Representation Learning Beyond Pixels: Unifying Raster Data and Vector Semantics for Human-Centric Geospatial Foundation Models
- GovAI-Pipe: A Layered AI Governance Pipeline for Citizen-Facing AI in Turkey's e-Government Gateway
- Universal Quantum Transformer
- MindGames Arena Generalization Track: In2AI Solution with Delayed Per-Step Reward Attribution
- An Enigma of Artificial Reason: Investigating the Production-Evaluation Gap in Large Reasoning Models
- From Capability Models to Automated Planning: An AAS-Native Approach for Automatic PDDL Generation
- SafeMCP: Proactive Power Regulation for LLM Agent Defense via Environment-Grounded Look-Ahead Reasoning
- MOSAIC: Modular Orchestration for Structured Agentic Intelligence and Composition
- Beyond One-shot: AI Agents for Learning in Field Experiments
- PropLLM: Propagation-Aware Scene Reconstruction for Network Fault Diagnosis
- FALAT: Tracing Failures in LLM Agent Trajectories via Dependency-Guided Search
- Probe Before You Edit: Probing-Guided Molecular Optimization for LLM Agents in Structure-Based Drug Design
- Deliberative Curation: A Protocol for Multi-Agent Knowledge Bases
- Emergent Collaborative Deliberation in Multi-Model AI Systems: A BFT-Derived Protocol for Epistemic Synthesis
- Position Paper: Post-Solve Robustness in Decision Engines: Feasible Regions and Smoothness Under Perturbations