As large language models grow in capability, concerns mount about their potential to reason strategically about oversight and to conceal misaligned behavior. While recent work has begun to study deceptive alignment and to develop white-box detectors that rely on model internals, the field lacks a unified platform for evaluating these methods across diverse models and environments. In this work, we introduce a new taxonomy of model organisms of alignment faking and a testbed for assessing white-box detection strategies.