PhD Student at Massachusetts Institute of Technology
Understanding internal representations in large language models, particularly how abstract concepts are encoded and manipulated
Developing interactive tools for analyzing and visualizing neural network behavior
I'm a second-year PhD student at MIT's Computer Science & Artificial Intelligence Laboratory (CSAIL), focusing on neural network interpretability methods for large language models. My research develops techniques for understanding how these models form internal representations and for tracking their reasoning processes.
I became interested in AI safety during my master's studies after completing the AI Alignment Fast-Track course. I'm particularly fascinated by mechanistic interpretability and how it might help us understand and align increasingly capable language models. My work combines technical approaches from machine learning with insights from neuroscience about how biological systems represent information.
Prior to my PhD, I worked briefly as a machine learning engineer at a computer vision startup, which gave me practical experience deploying ML systems. That experience sharpened my awareness of the gap between theoretical understanding and practical deployment of AI, which partly motivates my focus on interpretability.
I'm seeking mentorship to help refine my research direction and connect with the broader AI safety community. While I have strong technical skills, I'm looking for guidance on which interpretability approaches are most promising from an alignment perspective.
PhD Student in Computer Science, MIT • 2023-present
VisionTech AI • 2022-2023
Developed computer vision models for autonomous navigation systems. Implemented robustness testing frameworks for deployed models.
Technical University of Madrid • 2020-2022
Research assistant on deep learning architectures for natural language processing. Co-authored two papers on attention mechanisms in transformer models.
Massachusetts Institute of Technology • 2023-present
Focus: Neural Network Interpretability and AI Safety
Technical University of Madrid • 2022
Focus: Deep Learning and Natural Language Processing
University of Barcelona • 2020
Focus: Software Systems and Machine Learning