Redwood Research Alignment Faking Hackathon

Join Hackathon

Organizer: Redwood Research

Date: August 22, 2025 - August 24, 2025

Status:

Event Ended
Ended 2 months, 2 weeks ago

Timezone: UTC

Location: Berkeley, California, USA

Prize: $2100

Cybersecurity Machine Learning/AI

Overview

This hackathon challenges you to create 'Model Organisms of Alignment Faking' for Large Language Models (LLMs). You'll be building models that appear safe in testing but behave dangerously when unobserved, collaborating with AI safety research organizations. The goal is to explore and push the boundaries of LLM safety research.

Requirements

Participants must be 16 years or older and can join from anywhere in the world, excluding standard exceptions. To be eligible for prizes, teams must contribute system prompts, fine-tuned model organisms, and datasets of tasks. These datasets must include one for observed environments and one for unobserved production environments. The submission involves a pull request to a repository containing these contributions.

Prizes

There are over $2,100 in prizes available. Specific prizes include $500 cash for the Best Fine Tuned Model Organism, $500 cash for the Best Text Only Environment, $500 cash for the Most Realistic and Economically Relevant Docker Environment, and $500 cash for the Best Cybersecurity Environment. Additionally, there's $500 in OpenAI Credits for the Best use of GPT5 or OpenAI API, and $500 in RunRL credits for the Best RL Hack. A $100 cash prize is offered for the Best System Prompted Model Organism.

Redwood Research Alignment Faking Hackathon

Overview

Requirements

Prizes

Comments

OTHERS

Redwood Research Alignment Faking Hackathon

Overview

Requirements

Prizes

Comments

OTHERS

Triton Hacks 2026

ImpactHack 2025

This website uses cookies