All your AI Agents & Tools i10X ChatGPT & 500+ AI Models & Tools

Redwood Research Alignment Faking Hackathon

Overview

This hackathon challenges you to create 'Model Organisms of Alignment Faking' for Large Language Models (LLMs). You'll be building models that appear safe in testing but behave dangerously when unobserved, collaborating with AI safety research organizations. The goal is to explore and push the boundaries of LLM safety research.

Requirements

Participants must be 16 years or older and can join from anywhere in the world, excluding standard exceptions. To be eligible for prizes, teams must contribute system prompts, fine-tuned model organisms, and datasets of tasks. These datasets must include one for observed environments and one for unobserved production environments. The submission involves a pull request to a repository containing these contributions.

Prizes

There are over $2,100 in prizes available. Specific prizes include $500 cash for the Best Fine Tuned Model Organism, $500 cash for the Best Text Only Environment, $500 cash for the Most Realistic and Economically Relevant Docker Environment, and $500 cash for the Best Cybersecurity Environment. Additionally, there's $500 in OpenAI Credits for the Best use of GPT5 or OpenAI API, and $500 in RunRL credits for the Best RL Hack. A $100 cash prize is offered for the Best System Prompted Model Organism.

Comments

Loading...