Dr. Sarah Chen

Principal Research Scientist at Anthropic

San Francisco, USA
GMT-8 (Pacific Time)
English (Fluent), Mandarin Chinese (Fluent)
Member since January 2023
Limited (1 slot open)

Top Expertise Areas

Interpretability Research: 8 yrs
AI Alignment: 7 yrs
ML Safety: 6 yrs
Adversarial Testing: 5 yrs

About

Dr. Sarah Chen is a Principal Research Scientist at Anthropic focusing on interpretability methods for large language models. Her research aims to develop techniques that make AI systems more transparent, explainable, and aligned with human values. She leads Anthropic's Mechanistic Interpretability team, where she and her colleagues work to understand how concepts and capabilities emerge in large language models.

Prior to joining Anthropic in 2022, Sarah was a Senior Research Scientist at DeepMind, where she worked on reinforcement learning safety and robustness. She completed her Ph.D. at Stanford University in 2017, focusing on interpretability methods for neural networks. Her dissertation, "Transparency in Deep Learning: Interpreting Neural Network Behavior," received the department's outstanding thesis award.

Sarah is committed to the safe and responsible development of AI systems. She believes that understanding the internal mechanisms of advanced AI is crucial for keeping these systems aligned with human values as they become more capable.

Career Path

Principal Research Scientist

Anthropic, 2022-Present

Leading interpretability research to understand and align large language models. Managing a team of 7 researchers focused on mechanistic interpretability.

Senior Research Scientist

DeepMind, 2019-2022

Worked on reinforcement learning safety and robustness. Developed methods for preventing reward hacking and ensuring agent reliability.

Research Scientist

AI Alignment Lab, 2017-2019

Developed early work on neural network interpretability, focusing on feature visualization techniques and attribution methods.

Ph.D. Candidate

Stanford University, 2013-2017

Conducted research on interpretability methods for deep neural networks. Served as a teaching assistant for Machine Learning and AI Safety courses.

Education

Ph.D. in Computer Science

Stanford University, 2017

Focus: Machine Learning and AI Safety

M.S. in Machine Learning

Carnegie Mellon University, 2013

Focus: Neural Networks and Representations

B.S. in Computer Science

UC Berkeley, 2011

Focus: Artificial Intelligence