Miguel Rodriguez

PhD Student at Massachusetts Institute of Technology

Cambridge, USA (Originally from Madrid, Spain)
GMT-5 (Eastern Time)
English (Fluent)
Spanish (Native)
Member since September 2024
Actively Seeking Mentor

Primary Research Interests

Mechanistic Interpretability

Understanding internal representations in large language models, particularly how abstract concepts are encoded and manipulated

Transparency Tools

Developing interactive tools for analyzing and visualizing neural network behavior

About

I'm a second-year PhD student at MIT's Computer Science & Artificial Intelligence Laboratory (CSAIL), focusing on neural network interpretability methods for large language models. My research aims to develop techniques for understanding how these models form internal representations and how we can better track their reasoning processes.

I became interested in AI safety during my master's studies after completing the AI Alignment Fast-Track course. I'm particularly fascinated by mechanistic interpretability and how it might help us understand and align increasingly capable language models. My work combines technical approaches from machine learning with insights from neuroscience about how biological systems represent information.

Prior to my PhD, I worked briefly as a machine learning engineer at a computer vision startup, which gave me practical experience with deploying ML systems. This experience made me more aware of the gap between theoretical understanding and practical deployment of AI, which is part of what motivates my focus on interpretability.

I'm seeking mentorship to help refine my research direction and to connect with the broader AI safety community. While I have strong technical skills, I'm looking for guidance on which interpretability approaches are most promising from an alignment perspective.

Career Experience

Current

PhD Student in Computer Science, MIT (2023-present)

Previous Roles

Machine Learning Engineer

VisionTech AI (2022-2023)

Developed computer vision models for autonomous navigation systems. Implemented robustness testing frameworks for deployed models.

Research Assistant

Technical University of Madrid (2020-2022)

Assisted with research on deep learning architectures for natural language processing. Co-authored two papers on attention mechanisms in transformer models.

Education

Ph.D. in Computer Science (in progress)

Massachusetts Institute of Technology (2023-present)

Focus: Neural Network Interpretability and AI Safety

M.S. in Artificial Intelligence

Technical University of Madrid (2022)

Focus: Deep Learning and Natural Language Processing

B.S. in Computer Engineering

University of Barcelona (2020)

Focus: Software Systems and Machine Learning

Community Involvement

  • Organizer of AI Safety Reading Group at MIT (15 regular participants)
  • Contributor to the Alignment Forum (3 posts)
  • Volunteer at Interpretability Hackathon (2024)
  • Participant in MATS Program (2023)