Mert Yuksekgonul
I'm a fourth-year PhD student in Computer Science at Stanford University. I am lucky to be advised by James Zou and Carlos Guestrin.
I work on self-improving and self-moderating AI systems.
My goal is to build systems that learn from their experience and recognize their limits, so they can safely tackle previously unsolvable problems.
Email / Github / Twitter / Google Scholar
Selected Publications
TextGrad: Automatic "Differentiation" via Text
Mert Yuksekgonul*,
Federico Bianchi*,
Joseph Boen*,
Sheng Liu*,
Zhi Huang*,
Carlos Guestrin
James Zou
[Preprint, Package and tutorials]
AI systems increasingly combine complex components like large language models, yet optimizing such systems is still largely ad hoc, relying on hand-crafted components. Inspired by how backpropagation transformed deep learning, we build TextGrad: automatic "differentiation" via text. Building heavily on autograd and gradients as analogies, we use natural-language feedback from LLMs as gradients for the variables in a computation graph. We show TextGrad's simple and performant use in a wide variety of applications, from code optimization and question answering to molecule optimization and radiotherapy treatment planning.
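Below is a minimal sketch of the optimization loop, adapted from the package tutorials; the tg.Variable / tg.TextLoss / tg.TGD names follow the released API, though details may differ across versions:

```python
import textgrad as tg

# The backward engine is the LLM that produces textual "gradients" (feedback).
tg.set_backward_engine("gpt-4o", override=True)

# A variable to optimize: requires_grad=True marks it as trainable.
solution = tg.Variable(
    "To solve 3x + 2 = 11, subtract 2 and divide by 3, so x = 3.",
    role_description="a candidate solution to a math problem",
    requires_grad=True,
)

# A natural-language "loss": an LLM critique of the variable.
loss_fn = tg.TextLoss("Evaluate this solution. Be critical and concise.")

loss = loss_fn(solution)              # forward pass: get the critique
loss.backward()                       # backpropagate textual feedback
tg.TGD(parameters=[solution]).step()  # rewrite the variable using the feedback

print(solution.value)
```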
When and why vision-language models behave like bags-of-words, and what to do about it?
Mert Yuksekgonul,
Federico Bianchi,
Pratyusha Kalluri,
Dan Jurafsky,
James Zou
Oral @ ICLR 2023 (Top 5% of all accepted papers)
[Paper, Code]
Recent work [and many tweet threads] suggests that vision-language models (VLMs) such as CLIP do not fare well at compositional understanding. Here, we first propose a large-scale benchmark, ARO (Attribution, Relation, and Order), to evaluate fine-grained relational, attributive, and order understanding. Why is this bag-of-words-like behavior not reflected in standard retrieval evaluations (e.g., COCO/Flickr30k), where the datasets contain rich compositional structure? We design experiments showing that models do not need compositional understanding to perform well on these tasks, which also explains why contrastively pretrained models can get away with exploiting this shortcut, and why we should be careful about it. Following this intuition, we propose composition-aware negative mining. Check out our work!
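As a toy illustration of the intuition behind composition-aware negatives (the order_negatives helper below is hypothetical, not our released code):

```python
import random

def order_negatives(caption: str, n: int = 3, seed: int = 0) -> list[str]:
    """Hypothetical helper: hard negatives that perturb only word order."""
    words = caption.split()
    rng = random.Random(seed)
    negatives: set[str] = set()
    for _ in range(50 * n):  # bounded attempts, in case few orderings exist
        shuffled = words[:]
        rng.shuffle(shuffled)
        if shuffled != words:
            negatives.add(" ".join(shuffled))
        if len(negatives) == n:
            break
    # A bag-of-words encoder represents these the same as the original
    # caption, so contrasting against them penalizes order insensitivity.
    return sorted(negatives)

print(order_negatives("the horse is eating the grass"))
```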
Beyond Confidence: Reliable Models Should Also Quantify Atypicality
Mert Yuksekgonul,
Linjun Zhang,
James Zou,
Carlos Guestrin
NeurIPS 2023, Contributed Talk @ ICLR 2023 Trustworthy ML
[Paper, Code]
While most machine learning models can provide confidence in their predictions, confidence alone is insufficient to understand and use a model's uncertainty reliably. In this work, we investigate the relationship between how atypical (or rare) a sample is and how reliable the model's confidence is for that sample. Read the paper for interesting connections between atypicality and uncertainty!
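A rough sketch of one way to quantify atypicality; the Gaussian density estimate below is an illustrative choice, not necessarily the estimator from the paper:

```python
import numpy as np

def make_atypicality_fn(train_feats: np.ndarray):
    """Fit a Gaussian to training features; score new samples by how far
    they fall from the training distribution (larger = more atypical)."""
    mean = train_feats.mean(axis=0)
    cov = np.cov(train_feats, rowvar=False)
    cov += 1e-6 * np.eye(cov.shape[0])   # regularize for invertibility
    cov_inv = np.linalg.inv(cov)

    def atypicality(x: np.ndarray) -> np.ndarray:
        diff = x - mean
        # Squared Mahalanobis distance to the training distribution.
        return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

    return atypicality

# Usage idea: bin test samples by atypicality quantile and recalibrate
# confidence within each bin, since miscalibration grows with atypicality.
```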
Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models
Mert Yuksekgonul,
Varun Chandrasekaran,
Erik Jones,
Suriya Gunasekar,
Ranjita Naik,
Hamid Palangi,
Ece Kamar,
Besmira Nushi
ICLR 2024
[Paper, Code]
We investigate the internal behavior of LLMs associated with their generation of factually incorrect text. We propose modeling factual queries as constraint satisfaction problems and use this framework to investigate how the model interacts internally with factual constraints. We find a strong positive relationship between the model's attention to constraint tokens and the factual accuracy of generations.
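A hedged sketch of the kind of measurement involved, using Hugging Face Transformers; the model, prompt, and attention aggregation below are simplified stand-ins for the paper's setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
constraint = " France"  # the factual constraint token(s) in the query

ids = tok(prompt, return_tensors="pt")
with torch.no_grad():
    out = model(**ids, output_attentions=True)

# Locate constraint token positions (simplified: exact token-id match).
constraint_ids = set(tok(constraint)["input_ids"])
positions = [i for i, t in enumerate(ids["input_ids"][0].tolist())
             if t in constraint_ids]

# Average attention from the last token to the constraint tokens across
# heads of the final layer; the paper finds that attention to constraint
# tokens correlates with the factual accuracy of generations.
last_layer = out.attentions[-1][0]            # (heads, seq, seq)
score = last_layer[:, -1, positions].mean().item()
print(f"attention-to-constraint score: {score:.4f}")
```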
Post-hoc Concept Bottleneck Models
Mert Yuksekgonul,
Maggie Wang,
James Zou
Spotlight @ ICLR 2023 (Top 25% of all accepted papers)
[Paper, Code]
Concept Bottleneck Models are very cool! But it is hard to find large training datasets with concept annotations, or to match the performance of unrestricted neural nets with limited bottlenecks. In this work, we address these practical limitations of CBMs by introducing Post-hoc Concept Bottleneck Models (PCBMs). Further, we run a user study where humans improve PCBMs via concept-level feedback.
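A simplified sketch of the post-hoc construction (variable names are illustrative; see the released code for the real implementation): project a frozen backbone's embeddings onto a concept bank, then fit a sparse linear model over the concept scores.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Inputs (assumed precomputed):
#   embeddings:   (n_samples, d)  features from a frozen backbone
#   concept_bank: (n_concepts, d) one direction per concept (e.g., CAVs)
#   labels:       (n_samples,)
def fit_pcbm(embeddings, concept_bank, labels):
    # Concept scores: projection of each embedding onto each concept vector.
    norms = np.linalg.norm(concept_bank, axis=1, keepdims=True)
    scores = embeddings @ (concept_bank / norms).T

    # Sparse linear layer on top of the concepts: elastic-net keeps the
    # bottleneck interpretable by selecting few concepts per class.
    clf = SGDClassifier(loss="log_loss", penalty="elasticnet",
                        alpha=1e-3, l1_ratio=0.99, max_iter=2000)
    clf.fit(scores, labels)
    return clf
```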
Leveraging medical Twitter to build a visual–language foundation model for pathology AI
Zhi Huang*,
Federico Bianchi*,
Mert Yuksekgonul,
Thomas Montine,
James Zou
Nature Medicine
[Preprint, Demo]
We collect data from Twitter (yes, Twitter) and LAION and release the largest text-image pathology dataset: OpenPath. We also release PLIP, a CLIP variant for pathology that gives exciting performance in zero-shot classification, transfer learning, and retrieval for pathology tasks.
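A minimal zero-shot sketch via the standard Hugging Face CLIP interface; the vinid/plip checkpoint name and the prompts are assumptions to verify against the demo:

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("vinid/plip")
processor = CLIPProcessor.from_pretrained("vinid/plip")

image = Image.open("patch.png")  # a pathology image patch
labels = ["an H&E image of tumor tissue", "an H&E image of normal tissue"]

inputs = processor(text=labels, images=image,
                   return_tensors="pt", padding=True)
outputs = model(**inputs)

# Softmax over image-text similarity gives zero-shot class probabilities.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(labels, probs[0].tolist())))
```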
Meaningfully debugging model mistakes using conceptual counterfactual explanations
Abubakar Abid*,
Mert Yuksekgonul*,
James Zou
ICML 2022
[Paper, Code]
We use human-understandable concepts to build counterfactual explanations (which we call Conceptual Counterfactual Explanations) that debug model mistakes and reveal a model's biases.
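A toy sketch of the optimization behind a conceptual counterfactual (all names are illustrative): find a sparse shift in concept space that flips the classifier's prediction.

```python
import torch

def conceptual_counterfactual(embedding, concept_bank, classifier,
                              target_class, steps=200, lr=0.1, l1=0.05):
    """Learn sparse concept weights w so that classifier(embedding + w @ C)
    predicts target_class; large |w_i| implicates concept i in the mistake."""
    w = torch.zeros(concept_bank.shape[0], requires_grad=True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        shifted = embedding + w @ concept_bank  # move along concept directions
        logits = classifier(shifted)
        loss = torch.nn.functional.cross_entropy(
            logits.unsqueeze(0), torch.tensor([target_class])
        ) + l1 * w.abs().sum()  # sparsity keeps the explanation short
        loss.backward()
        opt.step()
    return w.detach()  # e.g., top-k weights name the concepts to add/remove
```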
Template from this website.