The artificial intelligence landscape continues to evolve rapidly, with new models pushing performance boundaries. Anthropic's unreleased Claude Mythos Preview model reportedly scored 77.8% on the SWE-bench Pro coding benchmark and 97.6% on the USAMO reasoning benchmark. These results come from Anthropic's own evaluations and show it outperforming current leaders such as Google Gemini 3.1 Pro and GPT-5.4.
Meanwhile, Z.AI has introduced GLM-5.1, an open-weight 754B agentic model specifically designed for AI engineering tasks. This model also achieves state-of-the-art performance on SWE-Bench Pro, surpassing GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro. GLM-5.1 stands out for its ability to run autonomously for extended periods, improving its performance over hundreds of iterations, and is available under the MIT License, offering benefits like lower cost and better data governance for enterprises.
In the healthcare sector, a consensus is emerging among both providers and insurers that AI scribes are contributing to rising costs. While health systems argue these tools help reduce doctor burnout, payers contend that providers use AI to inflate expenses. Despite differing views on solutions, both sides agree that AI scribes are intensifying coding practices and increasing overall healthcare expenditures.
OpenAI faces its own set of challenges, particularly in defining its core customer base. The company's strategy, which relies on widespread adoption and trust, might benefit from a B2C2B approach, where products become essential to businesses because employees already use them. Internally, some OpenAI engineers question CEO Sam Altman's technical expertise, suggesting that he lacks deep coding and machine learning knowledge and navigates technical limitations through boardroom strategies.
Security concerns also loom large, as AI-driven attacks like sophisticated phishing and deepfakes necessitate a re-evaluation of identity security. CISOs are advised to move beyond traditional passwords and basic multi-factor authentication towards context-aware authentication and robust governance for both human and AI identities. Furthermore, AI training firm Mercor, valued at $10 billion, is facing five lawsuits from contractors alleging data privacy violations following a breach in the open-source LiteLLM project.
The broader societal impact of AI is also under scrutiny. An Anthropic experiment indicates that over-reliance on AI tools in education might hinder the development of essential problem-solving skills, suggesting a need to balance AI proficiency with fostering independent thinking. Venture capitalist Will McKelvey, now at Lerer Hippeau, suggests that the most successful founders in the AI era are focusing on less glamorous, 'unsexy' ideas.
Key Takeaways
- Anthropic's unreleased Claude Mythos Preview model demonstrates superior performance, scoring 77.8% on SWE-bench Pro and 97.6% on USAMO, surpassing Google Gemini 3.1 Pro and GPT-5.4.
- Z.AI's GLM-5.1, an open-weight 754B agentic model, achieves state-of-the-art performance on SWE-Bench Pro, outperforming GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro, and operates autonomously for extended periods.
- Healthcare providers and insurers agree that AI scribes are contributing to increased healthcare costs by intensifying coding practices.
- OpenAI is grappling with defining its customer base, with a potential strategy involving a B2C2B approach driven by employee adoption.
- AI training firm Mercor, valued at $10 billion, faces five lawsuits from contractors alleging data privacy violations after a breach in the LiteLLM project.
- CISOs must enhance identity security with context-aware authentication and strong governance to combat sophisticated AI-driven attacks like phishing and deepfakes.
- OpenAI insiders reportedly question CEO Sam Altman's technical expertise, suggesting a lack of deep coding and machine learning knowledge.
- Over-reliance on AI tools in education may hinder the development of critical problem-solving skills, as indicated by an Anthropic experiment.
- Venture capitalist Will McKelvey advocates for founders to pursue less glamorous, 'unsexy' ideas in the AI era.
Claude Mythos Preview Shows Superior Performance Over Rivals
Anthropic's unreleased Claude Mythos Preview model has demonstrated significantly better performance than current leading AI models like Google Gemini 3.1 Pro and GPT-5.4 in various benchmarks. In coding tests like SWE-bench Pro, Mythos Preview scored 77.8%, a substantial lead over competitors. It also excelled in reasoning tasks, achieving 97.6% on the USAMO benchmark, far surpassing GPT-5.4 and Gemini 3.1 Pro. While these results are from Anthropic's own evaluations, they suggest Mythos Preview could be a major advancement in AI capabilities.
Z.AI's GLM-5.1 Model Achieves Top Scores in AI Coding Benchmarks
Z.AI has released GLM-5.1, an open-weight 754B agentic model designed for AI engineering tasks. This new model achieves state-of-the-art performance on SWE-Bench Pro, outperforming models like GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro. GLM-5.1 also demonstrates impressive capabilities in sustained autonomous execution, handling complex tasks over long periods. Its architecture, featuring Mixture of Experts (MoE) and DSA, allows for efficient inference and improved learning from long-horizon interactions.
New AI Model GLM-5.1 Runs Coding Tasks Autonomously for Hours
Z.ai has launched GLM-5.1, an open-source AI model designed for agentic software engineering, capable of running autonomously for extended periods. The model can improve its performance over hundreds of iterations, a key advantage over other AI models that may plateau. GLM-5.1 reportedly scored higher than GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro on the SWE-Bench Pro benchmark. Released under the MIT License, GLM-5.1 offers benefits like lower cost, better data governance, and customization for enterprises.
AI Scribes Increase Healthcare Costs, Providers and Insurers Agree
Both healthcare providers and insurers acknowledge that AI scribes are contributing to rising healthcare costs. Health systems claim these tools help reduce doctor burnout and address underpayment from insurers. However, payers argue that providers are using AI to inflate costs. Despite their differing perspectives on solutions, there is a consensus between the two sides that AI scribes are intensifying coding and increasing overall healthcare expenses.
AI Scribes Drive Up Healthcare Costs, Experts Say
Experts say AI scribes are raising healthcare costs, a point that both healthcare providers and insurers acknowledge. The size of the effect, and the studies measuring it, remain debated, but many agree that these tools are contributing to higher expenses. The discussion highlights a shared concern about the financial implications of AI adoption in healthcare.
OpenAI Faces Customer Identity Challenge in AI Era
OpenAI is grappling with a fundamental business question: who is its customer? The company's strategy, which relies on widespread adoption and trust, is being tested. Unlike B2B markets, which can support multiple winners, consumer markets often produce a single dominant player. OpenAI's best path forward might be a B2C2B approach: becoming so essential to employees that businesses must adopt its products.
AI Firm Mercor Faces Lawsuits Over Data Breach
Mercor, an AI training firm valued at $10 billion, is facing five lawsuits from contractors alleging data privacy violations. The suits claim Mercor exposed contractor data to hackers following a breach in the open-source project LiteLLM. The contractors are seeking unspecified damages, citing exposure of personal information. Mercor has confirmed it was impacted by the breach but has not detailed the stolen data, and declined to comment on the lawsuits.
AI Attacks Threaten Identity Security; CISOs Must Adapt
CISOs must rethink identity security as AI-driven attacks like phishing and deepfakes become more sophisticated. Traditional security methods are no longer sufficient against attackers mimicking trusted users with precision. Experts advise moving beyond passwords and basic multi-factor authentication (MFA) to context-aware authentication and strong governance for both human and AI identities. Visibility and behavior monitoring are crucial for defending against these evolving threats.
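To make the idea of context-aware authentication more concrete, here is a minimal sketch, assuming a simple risk-scoring policy. The signal names, weights, and thresholds are hypothetical and chosen only for illustration; they are not taken from the article or from any vendor's product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoginContext:
    """Signals observed at sign-in time (all fields are hypothetical)."""
    known_device: bool        # device has been seen for this identity before
    usual_country: bool       # sign-in comes from a country the identity normally uses
    impossible_travel: bool   # two sign-ins too far apart to be the same person
    is_ai_agent: bool         # non-human identity, such as an automation agent


def risk_score(ctx: LoginContext) -> int:
    """Add up illustrative risk weights; a higher score means a riskier sign-in."""
    score = 0
    if not ctx.known_device:
        score += 30
    if not ctx.usual_country:
        score += 30
    if ctx.impossible_travel:
        score += 50
    if ctx.is_ai_agent:
        score += 10
    return score


def decide(ctx: LoginContext) -> str:
    """Map the score to an action: allow, require step-up verification, or deny."""
    score = risk_score(ctx)
    if score >= 80:
        return "deny"
    if score >= 30:
        return "step_up"
    return "allow"


if __name__ == "__main__":
    routine = LoginContext(known_device=True, usual_country=True,
                           impossible_travel=False, is_ai_agent=False)
    suspicious = LoginContext(known_device=False, usual_country=False,
                              impossible_travel=True, is_ai_agent=False)
    print(decide(routine))
    print(decide(suspicious))
```

The point of the sketch is that the decision depends on the circumstances of the sign-in rather than on a password or a single MFA prompt alone, and that human and AI identities pass through the same policy.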
Sam Altman's Technical Skills Questioned by OpenAI Insiders
Sources within OpenAI suggest that CEO Sam Altman lacks significant coding and machine learning expertise. Engineers interviewed for a New Yorker article claim Altman misunderstands basic AI concepts and uses boardroom strategies to navigate technical limitations. Some even compare his approach to that of figures like Bernie Madoff or Sam Bankman-Fried, questioning his long-term impact despite his public image.
AI Tools Can Hinder Learning Like Training Wheels on Bikes
The rapid adoption of AI in education presents both benefits and drawbacks, similar to using training wheels on a bicycle. While AI can speed up task completion, relying on it too heavily may hinder the development of essential problem-solving skills. An Anthropic experiment showed that students who initially worked without AI performed better later when AI was introduced. Educators and policymakers must balance AI proficiency with fostering independent thinking and curiosity.
VC Will McKelvey Favors Unsexy AI Ideas for Founders
Venture capital investor Will McKelvey believes the smartest founders in the AI era are focusing on less glamorous, 'unsexy' ideas. McKelvey, now a principal at Lerer Hippeau, looks for founders with deep relationships and a strong connection to New York City's business ecosystem. He suggests that companies selling technology to a broad range of businesses, rather than just tech startups, have significant growth potential in the current market.
Resident Evil Star's AI Project MemPalace Called Out
An AI memory system called MemPalace, co-created by actress Milla Jovovich and crypto entrepreneur Ben Sigman, is facing accusations of being 'snake oil.' Critics claim Sigman's performance claims on benchmarks like LongMemEval are exaggerated and that the code's authorship is unclear, with early contributions potentially from an anonymous user. Despite being open-source, questions remain about the project's true value and the extent of Jovovich's involvement.
OpenAI Predicts AI Investment Will Reshape Economy
OpenAI suggests that significant AI investment could fundamentally reshape the global economy, potentially leading to changes like a four-day work week and the creation of an AI Public Wealth Fund. The company emphasizes the need for ambitious industrial policy to ensure that superintelligence benefits everyone, not just a few. OpenAI advocates for shared access to AI, giving workers a voice in its development, and treating AI as a foundational element of the modern economy.
Sources
- Claude Mythos Preview Beats Google Gemini 3.1 Pro, GPT 5.4 On Most Benchmarks
- Z.AI Introduces GLM-5.1: An Open-Weight 754B Agentic Model That Achieves SOTA on SWE-Bench Pro and Sustains 8-Hour Autonomous Execution
- Z.ai unveils GLM-5.1, enabling AI coding agents to run autonomously for hours
- Everyone agrees AI scribes are increasing health care costs. No one agrees what to do about it
- Are AI scribes actually driving higher health care costs?
- View: AI is testing the oldest debate in business: Who’s the customer?
- Mercor hit with 5 contractor lawsuits in a week over data breach
- Defending Identity in the Age of AI Attacks
- Sam Altman’s Coworkers Say He Can Barely Code and Misunderstands Basic Machine Learning Concepts
- When AI training wheels help and hinder learning
- This Rising VC Says the Smartest Founders Are Chasing Unsexy Ideas in the AI Era
- Resident Evil Star's AI-Coded Tool Accused Of Being Snake Oil
- Why OpenAI thinks AI Investment Could Reshape the Economy