Mert Yuksekgonul

I'm a fourth year PhD student in Computer Science at Stanford University. I am lucky to be advised by James Zou and Carlos Guestrin.

I work on building safe, controllable AI systems that use machine learning to solve previously unsolved problems.

As AI systems advance toward such complex challenges, human supervision becomes a fundamental bottleneck to progress, so approaches in which models improve themselves, via synthetic data or reinforcement learning, have been gaining traction.

This makes safe and controllable self-improvement the central challenge: without proper technical foundations, we risk developing systems we cannot meaningfully direct or understand. We need technical tools both to make such advances possible and to understand and control how AI systems enhance their own capabilities.

To this end, I develop algorithms that let AI systems improve themselves [e.g., TextGrad (In Press '25)], use their internal representations to make them more reliable [e.g., mechanistic error detectors (ICLR '24), concept bottlenecks (ICLR '23), concept-based counterfactuals (ICML '22)], and study how training shapes their failure modes [e.g., bag-of-words behavior in VLMs (ICLR '23), atypicality and calibration (NeurIPS '23)].

Selected Publications

For a full list of publications, please see my Google Scholar.

Optimizing generative AI by backpropagating language model feedback
Mert Yuksekgonul*, Federico Bianchi*, Joseph Boen*, Sheng Liu*, Pan Lu*, Zhi Huang*, Carlos Guestrin, James Zou
In press
When and why vision-language models behave like bags-of-words, and what to do about it?
Mert Yuksekgonul, Federico Bianchi, Pratyusha (Ria) Kalluri, Dan Jurafsky, James Zou
Oral, ICLR '23 (Top 5% of all accepted papers)
Beyond Confidence: Reliable Models Should Also Quantify Atypicality
Mert Yuksekgonul, Linjun Zhang, James Zou, Carlos Guestrin
NeurIPS '23, Contributed Talk - ICLR '23 Trustworthy ML
Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models
Mert Yuksekgonul, Varun Chandrasekaran, Erik Jones, Suriya Gunasekar, Ranjita Naik, Hamid Palangi, Ece Kamar, Besmira Nushi
ICLR '24
Post-hoc Concept Bottleneck Models
Mert Yuksekgonul, Maggie Wang, James Zou
Spotlight, ICLR '23 (Top 25% of all accepted papers)
A visual–language foundation model for pathology image analysis using medical Twitter
Zhi Huang*, Federico Bianchi*, Mert Yuksekgonul, Thomas J Montine, James Zou
Nature Medicine '23, Cover
Meaningfully debugging model mistakes using conceptual counterfactual explanations
Abubakar Abid*, Mert Yuksekgonul*, James Zou
ICML '22
Holistic Evaluation of Language Models
Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, Benjamin Newman, Binhang Yuan, Bobby Yan, Ce Zhang, Christian Cosgrove, Christopher D. Manning, Christopher Ré, Diana Acosta-Navas, Drew A. Hudson, Eric Zelikman, Esin Durmus, Faisal Ladhak, Frieda Rong, Hongyu Ren, Huaxiu Yao, Jue Wang, Keshav Santhanam, Laurel Orr, Lucia Zheng, Mert Yuksekgonul, Mirac Suzgun, Nathan Kim, Neel Guha, Niladri Chatterji, Omar Khattab, Peter Henderson, Qian Huang, Ryan Chi, Sang Michael Xie, Shibani Santurkar, Surya Ganguli, Tatsunori Hashimoto, Thomas Icard, Tianyi Zhang, Vishrav Chaudhary, William Wang, Xuechen Li, Yifan Mai, Yuhui Zhang, Yuta Koreeda
TMLR '23