Principal Research Scientist at Anthropic
Dr. Sarah Chen is a Principal Research Scientist at Anthropic, focusing on interpretability methods for large language models. Her research aims to develop techniques that make AI systems more transparent, explainable, and aligned with human values. She leads Anthropic's Mechanistic Interpretability team, where she and her colleagues work to understand how concepts and capabilities emerge in large language models.
Prior to joining Anthropic in 2022, Sarah was a Senior Research Scientist at DeepMind, where she worked on reinforcement learning safety and robustness. She completed her Ph.D. at Stanford University in 2017, focusing on interpretability methods for neural networks. Her dissertation, "Transparency in Deep Learning: Interpreting Neural Network Behavior," received the department's outstanding thesis award.
Sarah is passionate about ensuring AI systems are developed safely and responsibly. She believes that understanding the internal mechanisms of advanced AI is crucial to keeping these systems aligned with human values as they become more capable.
Experience
Anthropic • 2022-Present
Leading interpretability research to understand and align large language models. Managing a team of 7 researchers focused on mechanistic interpretability.
DeepMind • 2019-2022
Worked on reinforcement learning safety and robustness. Developed methods for preventing reward hacking and ensuring agent reliability.
AI Alignment Lab • 2017-2019
Conducted early research on neural network interpretability, focusing on feature visualization techniques and attribution methods.
Stanford University • 2013-2017
Conducted research on interpretability methods for deep neural networks. Teaching assistant for Machine Learning and AI Safety courses.
Education
Ph.D., Stanford University • 2017
Focus: Machine Learning and AI Safety
Carnegie Mellon University • 2013
Focus: Neural Networks and Representations
UC Berkeley • 2011
Focus: Artificial Intelligence