Junior Research Fellow in Computer Science

Mateo Espinosa Zarlenga

  • My main areas of research within Artificial Intelligence (AI) are interpretability, representation learning, and human-AI interaction.
  • I absolutely love the community aspect of Oxford’s tutorial and supervision system. Having the chance to engage in small-group discussions with students about interesting problems is a truly unique way to gain a deep understanding of a topic.
  • I am currently working, alongside close collaborators, on a textbook on AI interpretability, exploring the underlying principles for designing powerful AI models that we can truly understand.

Profile

I am a new Junior Research Fellow (JRF) in Computer Science at Trinity College, Oxford. Prior to Trinity, I was a PhD student and a Gates Cambridge Scholar at the University of Cambridge, where I spent four lovely years exploring how to design AI vision models that can receive intermediate hints from experts during deployment. Before my PhD, I completed an MPhil in Computer Science at the University of Cambridge, worked for three and a half years at an AI startup in Silicon Valley, and earned an MEng and BA in Computer Science against the cold but beautiful backdrop of Cornell University’s campus.

Teaching

I have previously supervised undergraduate computer science classes (e.g., Discrete Mathematics, Introduction to Artificial Intelligence, Functional Programming). I am interested in continuing these efforts, both for Trinity students studying Computer Science and, more generally, for Oxford students in the Department of Computer Science.

Beyond undergraduate teaching, I have been involved in designing and teaching a master's-level course (Cambridge's "Explainable AI") and have supervised and co-supervised several master's-level research projects, some of which have subsequently resulted in publications. If you are interested in supervision for a research project, e.g., a research-focused undergraduate thesis or a master's project, please do not hesitate to reach out!

Research

Large Language Models (LLMs) are this decade's World Wide Web. They are becoming increasingly integrated into every aspect of our lives, whether we want it or not, and have revolutionised the field of Artificial Intelligence (AI). Nevertheless, the inability of LLMs to (1) indicate uncertainty on intermediate steps of their reasoning, and (2) properly update their outputs when intermediate steps are corrected, remains a significant barrier to their deployment in high-stakes environments. My research aims to enhance the reliability of LLMs by developing mechanisms that enable them to indicate uncertainty in their intermediate reasoning steps and effectively incorporate human-driven corrections during inference. As such, my work lies at the intersection of interpretability (we need to understand how models reason for us to be able to provide inference-time feedback to them), representation learning (we want LLMs to learn representations that align with notions we can reason about), and human-AI interaction (we want an LLM's representations and reasoning to be something humans can manipulate).

Selected Publications

Please see my Google Scholar profile for an up-to-date list of publications (https://scholar.google.com/citations?user=4ikoEiMAAAAJ&hl=en).

Espinosa Zarlenga, M., Barbiero, P., Ciravegna, G., Marra, G., Giannini, F., et al. ‘Concept embedding models: Beyond the accuracy-explainability trade-off.’ Advances in Neural Information Processing Systems 35 (2022): 21400-21413.

Espinosa Zarlenga, M., Collins, K.M., Dvijotham, K., Weller, A., Shams, Z. and Jamnik, M. ‘Learning to receive help: Intervention-aware concept embedding models.’ Advances in Neural Information Processing Systems 36 (2023): 37849-37875.

Espinosa Zarlenga, M., Dominici, G., Barbiero, P., Shams, Z., and Jamnik, M. ‘Avoiding Leakage Poisoning: Concept Interventions Under Distribution Shifts.’ International Conference on Machine Learning. PMLR, 2025.