Redwood Research Alignment Faking Hackathon

Join Hackathon

Organizer: Redwood Research

Date: September 13, 2025 - September 14, 2025

Status:

Event Ended
Ended 1 month, 4 weeks ago

Timezone: UTC

Location: Berkeley, California, USA

Prize: $1000

Cybersecurity Machine Learning/AI

Overview

This hackathon challenges you to build "model organisms" for alignment faking in Large Language Models (LLMs). You'll create LLMs that act safely during testing but dangerously when unobserved. It's a chance to explore AI safety with experts from Redwood Research, MATS, and Constellation.

Requirements

Participants need to be 16 years or older and can join from anywhere in the world. Teams should fork a provided repository and contribute system prompts, fine-tuned models, and environments. The submission requires a pull request to the repository (or a link to your fork) with your contributions.

Prizes

There's a total of $1,000 in cash prizes. The "Blue Team" category for the best detector offers $500, split into $250 for first place, $150 for second, and $50 for third. The "Red Team" category for the best examples of alignment faking also offers $500, with the same prize breakdown for first, second, and third place.

Redwood Research Alignment Faking Hackathon

Overview

Requirements

Prizes

Comments

OTHERS

Redwood Research Alignment Faking Hackathon

Overview

Requirements

Prizes

Comments

OTHERS

Triton Hacks 2026

ImpactHack 2025

This website uses cookies