Anthropic Claude Mythos Beats Google Gemini, OpenAI GPT-5

The artificial intelligence landscape continues to evolve rapidly, with new models pushing performance boundaries. Anthropic's unreleased Claude Mythos Preview model scored 77.8% on the SWE-bench Pro coding benchmark and 97.6% on the USAMO reasoning benchmark. These results come from Anthropic's own evaluations, which show it outperforming current leaders such as Google Gemini 3.1 Pro and GPT-5.4.

Meanwhile, Z.AI has introduced GLM-5.1, an open-weight 754B agentic model specifically designed for AI engineering tasks. This model also achieves state-of-the-art performance on SWE-Bench Pro, surpassing GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro. GLM-5.1 stands out for its ability to run autonomously for extended periods, improving its performance over hundreds of iterations, and is available under the MIT License, offering benefits like lower cost and better data governance for enterprises.

In the healthcare sector, a consensus is emerging among both providers and insurers that AI scribes are contributing to rising costs. While health systems argue these tools help reduce doctor burnout, payers contend that providers use AI to inflate expenses. Despite differing views on solutions, both sides agree that AI scribes are intensifying coding practices and increasing overall healthcare expenditures.

OpenAI faces its own set of challenges, particularly in defining its core customer base. The company's strategy, which relies on widespread adoption and trust, might benefit from a B2C2B approach, in which products become essential to businesses because employees already use them. Internally, some OpenAI engineers question CEO Sam Altman's technical expertise, suggesting he lacks deep coding and machine learning knowledge and instead navigates technical limitations through boardroom strategies.

Security concerns also loom large, as AI-driven attacks like sophisticated phishing and deepfakes necessitate a re-evaluation of identity security. CISOs are advised to move beyond traditional passwords and basic multi-factor authentication towards context-aware authentication and robust governance for both human and AI identities. Furthermore, AI training firm Mercor, valued at $10 billion, is facing five lawsuits from contractors alleging data privacy violations following a breach in the open-source LiteLLM project.

The broader societal impact of AI is also under scrutiny. An Anthropic experiment indicates that over-reliance on AI tools in education might hinder the development of essential problem-solving skills, suggesting a need to balance AI proficiency with fostering independent thinking. Venture capitalist Will McKelvey, now at Lerer Hippeau, suggests that the most successful founders in the AI era are focusing on less glamorous, 'unsexy' ideas.

Key Takeaways

  • Anthropic's unreleased Claude Mythos Preview model demonstrates superior performance, scoring 77.8% on SWE-bench Pro and 97.6% on USAMO, surpassing Google Gemini 3.1 Pro and GPT-5.4.
  • Z.AI's GLM-5.1, an open-weight 754B agentic model, achieves state-of-the-art performance on SWE-Bench Pro, outperforming GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro, and operates autonomously for extended periods.
  • Healthcare providers and insurers agree that AI scribes are contributing to increased healthcare costs by intensifying coding practices.
  • OpenAI is grappling with defining its customer base, with a potential strategy involving a B2C2B approach driven by employee adoption.
  • AI training firm Mercor, valued at $10 billion, faces five lawsuits from contractors alleging data privacy violations after a breach in the LiteLLM project.
  • CISOs must enhance identity security with context-aware authentication and strong governance to combat sophisticated AI-driven attacks like phishing and deepfakes.
  • OpenAI insiders reportedly question CEO Sam Altman's technical expertise, suggesting a lack of deep coding and machine learning knowledge.
  • Over-reliance on AI tools in education may hinder the development of critical problem-solving skills, as indicated by an Anthropic experiment.
  • Venture capitalist Will McKelvey advocates for founders to pursue less glamorous, 'unsexy' AI ideas.

    Claude Mythos Preview Shows Superior Performance Over Rivals

    Anthropic's unreleased Claude Mythos Preview model has demonstrated significantly better performance than current leading AI models like Google Gemini 3.1 Pro and GPT-5.4 in various benchmarks. In coding tests like SWE-bench Pro, Mythos Preview scored 77.8%, a substantial lead over competitors. It also excelled in reasoning tasks, achieving 97.6% on the USAMO benchmark, far surpassing GPT-5.4 and Gemini 3.1 Pro. While these results are from Anthropic's own evaluations, they suggest Mythos Preview could be a major advancement in AI capabilities.

    Z.AI's GLM-5.1 Model Achieves Top Scores in AI Coding Benchmarks

    Z.AI has released GLM-5.1, an open-weight 754B agentic model designed for AI engineering tasks. This new model achieves state-of-the-art performance on SWE-Bench Pro, outperforming models like GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro. GLM-5.1 also demonstrates impressive capabilities in sustained autonomous execution, handling complex tasks over long periods. Its architecture, featuring Mixture of Experts (MoE) and DSA, allows for efficient inference and improved learning from long-horizon interactions.
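
    For readers unfamiliar with the term, the general idea behind Mixture of Experts can be sketched as follows: a router scores every expert, only the top few are evaluated, and their outputs are blended. This is a generic top-k gating illustration, not GLM-5.1's actual implementation; the function names, the toy experts, and the choice of k=2 are all assumptions made for the example.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts and normalize their weights.

    Hypothetical helper: returns a list of (expert_index, weight) pairs.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(x, experts, router_logits, k=2):
    """Evaluate only the selected experts and blend their outputs.

    Experts that are not selected are never called, which is where
    the inference savings of sparse MoE layers come from.
    """
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))


# Toy experts (assumed for illustration): each is a simple scalar function.
toy_experts = [lambda x: x, lambda x: 2 * x, lambda x: 10 * x, lambda x: -x]
toy_logits = [0.0, 1.0, 1.0, -5.0]
blended = moe_forward(1.0, toy_experts, toy_logits, k=2)
```

    In this toy setup only two of the four experts run for the input, so compute scales with k rather than with the total number of experts.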

    New AI Model GLM-5.1 Runs Coding Tasks Autonomously for Hours

    Z.AI has launched GLM-5.1, an open-source AI model designed for agentic software engineering, capable of running autonomously for extended periods. The model can improve its performance over hundreds of iterations, a key advantage over other AI models that may plateau. GLM-5.1 reportedly scored higher than GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro on the SWE-Bench Pro benchmark. Released under the MIT License, GLM-5.1 offers benefits like lower cost, better data governance, and customization for enterprises.

    AI Scribes Increase Healthcare Costs, Providers and Insurers Agree

    Both healthcare providers and insurers acknowledge that AI scribes are contributing to rising healthcare costs. Health systems claim these tools help reduce doctor burnout and address underpayment from insurers. However, payers argue that providers are using AI to inflate costs. Despite their differing perspectives on solutions, there is a consensus between the two sides that AI scribes are intensifying coding and increasing overall healthcare expenses.

    AI Scribes Drive Up Healthcare Costs, Experts Say

    Experts suggest that AI scribes are indeed leading to increased healthcare costs, a point acknowledged by both healthcare providers and insurers. While the exact impact and the studies measuring it are debated, the consensus among many is that these AI tools are contributing to higher expenses. The discussion highlights a shared concern about the financial implications of AI adoption in healthcare.

    OpenAI Faces Customer Identity Challenge in AI Era

    OpenAI is grappling with a fundamental business question: who is its customer? The company's strategy, which relies on widespread adoption and trust, is being tested. Unlike B2B models with multiple winners, consumer-focused models often produce a single dominant player. OpenAI's best path forward might be a B2C2B approach, becoming so essential that businesses must adopt its products due to employee usage.

    AI Firm Mercor Faces Lawsuits Over Data Breach

    Mercor, an AI training firm valued at $10 billion, is facing five lawsuits from contractors alleging data privacy violations. The suits claim Mercor exposed contractor data to hackers following a breach in the open-source project LiteLLM. The contractors are seeking unspecified damages, citing exposure of personal information. Mercor has confirmed it was impacted by the breach but has not detailed the stolen data, and declined to comment on the lawsuits.

    AI Attacks Threaten Identity Security, CISOs Must Adapt

    CISOs must rethink identity security as AI-driven attacks like phishing and deepfakes become more sophisticated. Traditional security methods are no longer sufficient against attackers mimicking trusted users with precision. Experts advise moving beyond passwords and basic multi-factor authentication (MFA) to context-aware authentication and strong governance for both human and AI identities. Visibility and behavior monitoring are crucial for defending against these evolving threats.
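
    As a rough illustration of what context-aware authentication can mean in practice, the sketch below scores a login attempt from a few contextual signals and decides whether to allow it, require a stronger check, or deny it. The signal names, weights, and thresholds are hypothetical choices for this example and do not come from the article or from any specific product.

```python
from dataclasses import dataclass


@dataclass
class LoginContext:
    """Contextual signals for one login attempt (hypothetical fields)."""
    known_device: bool
    usual_location: bool
    usual_hours: bool
    is_ai_agent: bool  # non-human identities are governed too


def risk_score(ctx: LoginContext) -> int:
    """Add up illustrative risk weights; higher means more suspicious."""
    score = 0
    if not ctx.known_device:
        score += 2
    if not ctx.usual_location:
        score += 2
    if not ctx.usual_hours:
        score += 1
    if ctx.is_ai_agent:
        score += 1  # assumed policy: agents get slightly stricter handling
    return score


def decide(ctx: LoginContext) -> str:
    """Map the risk score to an action using assumed thresholds."""
    score = risk_score(ctx)
    if score >= 4:
        return "deny"
    if score >= 2:
        return "step_up"  # e.g. require a stronger second factor
    return "allow"


routine = LoginContext(known_device=True, usual_location=True,
                       usual_hours=True, is_ai_agent=False)
new_device = LoginContext(known_device=False, usual_location=True,
                          usual_hours=True, is_ai_agent=False)
suspicious = LoginContext(known_device=False, usual_location=False,
                          usual_hours=True, is_ai_agent=False)
```

    The point of the design is that a correct password alone does not settle the outcome: the same credentials can yield different decisions depending on the surrounding context.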

    Sam Altman's Technical Skills Questioned by OpenAI Insiders

    Sources within OpenAI suggest that CEO Sam Altman lacks significant coding and machine learning expertise. Engineers interviewed for a New Yorker article claim Altman misunderstands basic AI concepts and uses boardroom strategies to navigate technical limitations. Some even compare his approach to that of figures like Bernie Madoff or Sam Bankman-Fried, questioning his long-term impact despite his public image.

    AI Tools Can Hinder Learning Like Training Wheels on Bikes

    The rapid adoption of AI in education presents both benefits and drawbacks, similar to using training wheels on a bicycle. While AI can speed up task completion, relying on it too heavily may hinder the development of essential problem-solving skills. An Anthropic experiment showed that students who initially worked without AI performed better later when AI was introduced. Educators and policymakers must balance AI proficiency with fostering independent thinking and curiosity.

    VC Will McKelvey Favors Unsexy AI Ideas for Founders

    Venture capital investor Will McKelvey believes the smartest founders in the AI era are focusing on less glamorous, 'unsexy' ideas. McKelvey, now a principal at Lerer Hippeau, looks for founders with deep relationships and a strong connection to New York City's business ecosystem. He suggests that companies selling technology to a broad range of businesses, rather than just tech startups, have significant growth potential in the current market.

    Resident Evil Star's AI Project MemPalace Called Out

    An AI memory system called MemPalace, co-created by actress Milla Jovovich and crypto entrepreneur Ben Sigman, is facing accusations of being 'snake oil.' Critics claim Sigman's performance claims on benchmarks like LongMemEval are exaggerated and that the code's authorship is unclear, with early contributions potentially from an anonymous user. Despite being open-source, questions remain about the project's true value and the extent of Jovovich's involvement.

    OpenAI Predicts AI Investment Will Reshape Economy

    OpenAI suggests that significant AI investment could fundamentally reshape the global economy, potentially leading to changes like a four-day work week and the creation of an AI Public Wealth Fund. The company emphasizes the need for ambitious industrial policy to ensure that superintelligence benefits everyone, not just a few. OpenAI advocates for shared access to AI, giving workers a voice in its development, and treating AI as a foundational element of the modern economy.

    Sources

NOTE:

This news brief was generated using AI technology (including, but not limited to, Google Gemini API, Llama, Grok, and Mistral) from aggregated news articles, with minimal to no human editing/review. It is provided for informational purposes only and may contain inaccuracies or biases. This is not financial, investment, or professional advice. If you have any questions or concerns, please verify all information with the linked original articles in the Sources section below.
