Ongoing Projects
A selection of ongoing research projects spanning cognitive science, large language models (LLMs), vision-language models (VLMs), data visualization, and AI for mental health.
Naive Scientific Misconceptions in Large Language Models
People: Harsh Nishant Lalai, Raj Sanjay Shah, Sashank Varma
We investigate whether modern LLMs exhibit naive scientific misconceptions analogous to those found in human learners. Inspired by cognitive science work on intuitive theories (e.g., intuitive physics, biology, and psychology), we probe for systematically "intuitive but wrong" explanations and analyze how misconceptions vary with model scale and prompting conditions.
Focus areas
- Persistent intuitive-but-incorrect explanations
- Misconception structure and robustness
- Links between fluency and conceptual correctness
When Visuals Aren’t the Problem: Evaluating Vision-Language Models on Misleading Data Visualizations
People: Harsh Nishant Lalai, Raj Sanjay Shah, Hanspeter Pfister, Sashank Varma, Grace Guo
*equal contribution
We evaluate VLMs on misleading visualizations where the failure lies not in the visual rendering itself but in interpretation and reasoning. The goal is to understand when models fail at inference-level reasoning even when perception is sufficient.
Focus areas
- Misleading captions and framing
- Axis/scale manipulation and annotation traps
- Reasoning errors over data and uncertainty
- Benchmark design grounded in visualization theory
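The axis/scale manipulation trap above can be made concrete with a small worked example (ours, not drawn from the benchmark itself): truncating the y-axis baseline inflates the apparent ratio between two bars relative to their true ratio, which is exactly the kind of inference-level error a model must catch.

```python
# Illustrative sketch of the axis-truncation trap: the function and
# example values are our own assumptions, not the benchmark's API.

def apparent_ratio(a: float, b: float, baseline: float) -> float:
    """Ratio of two bar heights as drawn when the y-axis starts at `baseline`."""
    return (b - baseline) / (a - baseline)

true_ratio = 55 / 50                               # 1.1: the honest comparison
drawn_ratio = apparent_ratio(50, 55, baseline=45)  # 2.0: the truncated-axis view
```

A chart-reading model that reports the drawn ratio rather than the data ratio has failed at reasoning even though its perception of the bars was accurate.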
Simulating AI Patients for Psychotherapy: Challenges and Opportunities
People: Raj Sanjay Shah, Diyi Yang, Tim Althoff, Hiba Arnaout, Dana Atzil-Slonim, Daniel Blonigen, Tanmoy Chakraborty, Stevie Chancellor, Monojit Choudhury, Torrey Creed, Cristian Danescu-Niculescu-Mizil, Steffen Eberhardt, Anmol Goel, Philipp Graffe, Iryna Gurevych, Nick Haber, Dirk Hovy, Minlie Huang, Zac Imel, Hamidreza Jamalabadi, Jana Lasser, Maria Liakata, Ryan Louie, Wolfgang Lutz, Matteo Malgaroli, Clarissa Ong, Flor Miriam Plaza-del-Arco, Julia R. Pozuelo, Sahand Sabour, Brian Schwartz, Thamar Solorio, Aseem Srivastava, Jina Suh
This project examines how simulated AI patients can support evaluation, training, and benchmarking of therapeutic dialogue systems—while avoiding pitfalls around validity, bias, and over-reliance on simulation.
Focus areas
- Fidelity to clinical theory and therapeutic goals
- Evaluation validity and ecological realism
- Safety/ethics and responsible deployment
- Simulation as an evaluation and training instrument
Understanding Graphical Perception in Data Visualization
People: Grace Guo, Jenna Jiayi Kang, Raj Sanjay Shah*, Hanspeter Pfister, Sashank Varma
Previous preprint: arXiv:2411.00257
We study how humans and AI systems interpret visual encodings in charts and graphs, with a focus on grounding evaluation in cognitive theories of graphical perception (not just benchmark accuracy).
Focus areas
- Low-level perceptual judgments (e.g., length/angle/area)
- Higher-level inference tasks and uncertainty
- Human-model alignment under controlled manipulations
AI Patient Bank: State-Based Simulated Patients Grounded in Therapeutic Source Texts
People: Nathan Paek, William Fang, Raj Sanjay Shah, Hercy Shen, Declan Grabb, Emma Brunskill, Diyi Yang, Ryan Louie
We are building a state-based simulated patient framework grounded in therapeutic source texts and optimized via evidence-based testing. The goal is a scalable infrastructure for developing, validating, and stress-testing therapeutic AI systems.
Focus areas
- Structured patient state and dynamics
- Source-text grounding and controllability
- Automated evaluation pipelines
- Unit-test style behavioral validation
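A minimal sketch of what "structured patient state" plus "unit-test style behavioral validation" could look like; the class, state variables, and dynamics below are illustrative assumptions, not the project's actual design.

```python
# Hypothetical state-based simulated patient: explicit state variables
# that therapist moves can shift, so behavior can be asserted in tests.
# The state schema and update rules here are invented for illustration.

class SimulatedPatient:
    def __init__(self, distress: float = 0.8, alliance: float = 0.2):
        self.state = {"distress": distress, "alliance": alliance}

    def step(self, therapist_move: str) -> dict:
        """Apply simple hand-written dynamics for one therapist turn."""
        if therapist_move == "validate":
            self.state["alliance"] = min(1.0, self.state["alliance"] + 0.1)
            self.state["distress"] = max(0.0, self.state["distress"] - 0.05)
        elif therapist_move == "confront":
            self.state["alliance"] = max(0.0, self.state["alliance"] - 0.1)
        return dict(self.state)
```

With explicit state, a behavioral unit test can assert properties such as "validation never decreases alliance," which is harder to check against free-form generated dialogue alone.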
Ask Before You Summarize: Clarification-Driven Summarization from Dialogue Transcripts
People: Raj Sanjay Shah, Han-Chin Shing, Lei Xu, Joseph Paul Cohen, Jack Moriarty, Chaitanya Shivade
We propose a clarification-driven framework for summarizing under-specified dialogue transcripts. Instead of summarizing immediately, the system detects missing information, asks targeted clarifying questions, and incorporates responses to produce more faithful and useful summaries.
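The detect-ask-incorporate loop described above can be sketched as follows; the slot schema, function names, and simulated answers are our assumptions for illustration, not the project's implementation (a real system would use an LLM for detection, question generation, and summarization).

```python
# Hypothetical sketch of clarification-driven summarization:
# 1) detect missing information, 2) ask targeted questions,
# 3) fold answers back in before summarizing.

REQUIRED_SLOTS = ["chief_complaint", "duration", "medications"]  # assumed schema

def find_missing_slots(facts: dict) -> list[str]:
    """Return required slots the transcript does not yet cover."""
    return [s for s in REQUIRED_SLOTS if s not in facts]

def ask_clarifications(missing: list[str]) -> dict:
    """Stand-in for the question-asking step; answers are simulated here."""
    simulated_answers = {
        "duration": "three weeks",
        "medications": "none reported",
    }
    return {slot: simulated_answers.get(slot, "unknown") for slot in missing}

def summarize(facts: dict) -> str:
    """Trivial template summary in place of a learned summarizer."""
    return "; ".join(f"{k}: {v}" for k, v in sorted(facts.items()))

def clarification_driven_summary(facts: dict) -> str:
    facts.update(ask_clarifications(find_missing_slots(facts)))
    return summarize(facts)
```

The point of the loop is that the summary is produced only after the under-specified slots are resolved, rather than hallucinated or silently omitted.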