Ziru Chen

Ron | 陈子如 | チン シジョ
The Ohio State University
C.V.      chen.8336@osu.edu
ronch99      @RonZiruChen

I am a Ph.D. student advised by Dr. Huan Sun. My research focuses on Conversational AI Agents, Natural Language Processing, and Machine Learning, with an emphasis on the following two areas:
  • Post-training and Test-Time Computation of LLMs: While large language models (LLMs) have demonstrated great success, one consensus between theoretical and empirical research is that they need sufficient test-time computation to perform complex reasoning and planning. With chain-of-thought [EMNLP'23] or program-of-thought [ACL'23] prompting, LLMs can effectively extend their "thinking" to solve more challenging tasks and even win gold medals in international olympiads [Arxiv'25a]. Before reinforcement learning with verifiable rewards (RLVR) became a prevalent paradigm, my research had already highlighted verification as a key factor in scaling LLMs' test-time computation [ACL'24], including parallel scaling (e.g., sampling) and recurrent scaling (e.g., searching). With recent advances in RLVR, I am exploring new post-training methods to further enhance LLMs' test-time computation capabilities toward self-evolving AI models.
  • Language Agents for Coding and Data Analysis: I had been developing task-oriented conversational AI systems [AlexaPrize'22] since before the advent of LLMs and language agents. I am particularly interested in building language agents that can perform complex coding and data analysis tasks in real-world scenarios, such as database management [ACL'23][EMNLP'23] and scientific discovery [ICLR'25][EMNLP'25]. Currently, I am actively thinking about three future directions of language agent research: (1) rigorous and holistic evaluation of agents in ecologically valid settings [Arxiv'25b]; (2) principled and cost-efficient agent scaffold designs for data synthesis and model training [Arxiv'25c]; and (3) continual learning and adaptation in dynamic, user-involved environments.

News

  • 10/2025: Three new preprints on agent evaluation and training: Holistic Agent Leaderboard, LLMs at IOAA, and Agent Data Protocol.
  • 09/2025: Two follow-up papers on ScienceAgentBench: AutoSDT is accepted at EMNLP 2025; GeoAnalystBench is accepted at TGIS 2025.
  • 01/2025: ScienceAgentBench is accepted at ICLR 2025; 1 paper on Chemistry agent accepted at NAACL 2025 (Findings).
  • 10/2024: Releasing ScienceAgentBench, a new benchmark to rigorously assess language agents for data-driven scientific discovery. Check out our pre-print for more details.
  • 05/2024: 1 paper on planning with LLMs accepted at ACL 2024; 1 paper on LLM for E-commerce and RecSys accepted at ICML 2024.
  • 10/2023: 2 papers on text-to-SQL parsing and 1 paper on LLM attribution evaluation accepted at EMNLP 2023.
  • 07/2023: Our TacoBot report is accepted as a demo paper at SIGDIAL 2023.
  • 05/2023: 1 paper accepted at ACL 2023 and 3 preprints out on Arxiv.
  • 06/2022: Won 3rd place in the Alexa Prize TaskBot Challenge as a student co-lead! Check out our report.
  • 04/2022: Excited to receive the Undergraduate Research Award from OSU CSE Department!
  • 02/2022: Thrilled to receive the University Fellowship as an incoming Ph.D. student!
  • 05/2021 - 07/2021: Had a fun time at the Deep Learning Lab, Westlake University as a Research Intern.
  • 04/2021: Excited to receive the CIS Undergrad Scholarship from OSU CSE Department!

Honors and Awards

  • University Fellowship, The Ohio State University, 2022 - 2023 & 2026 - 2027
  • 3rd Place (Student Co-lead of Team TacoBot), 1st Alexa Prize Taskbot Challenge, Amazon, 2022
  • Undergraduate Research Award, The Ohio State University, 2022
  • CIS Undergrad Scholarship, The Ohio State University, 2021 - 2022

Teaching


Selected Publications

Please check out my Google Scholar for a complete list of publications.

  • Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation
    Sayash Kapoor, Benedikt Stroebl, Peter Kirgis, Nitya Nadgir, Zachary S Siegel, Boyi Wei, Tianci Xue, Ziru Chen, Felix Chen, Saiteja Utpala, Franck Ndzomga, Dheeraj Oruganty, Sophie Luskin, Kangheng Liu, Botao Yu, Amit Arora, Dongyoon Hahm, Harsh Trivedi, Huan Sun, Juyong Lee, Tengjun Jin, Yifan Mai, Yifei Zhou, Yuxuan Zhu, Rishi Bommasani, Daniel Kang, Dawn Song, Peter Henderson, Yu Su, Percy Liang, Arvind Narayanan
    arXiv preprint (arXiv 2025, Long)

Interesting TMI

* TMI is Korean shorthand for "Too Much Information," which roughly refers to fun facts that need not have been shared ;)
I was born on Thanksgiving that year.
I can speak Chinese, English, Japanese (JLPT N2), and some elementary Korean. My first name in Japanese ("shì-jō") sounds like "feet washing" ("shee jow") in Chinese.
My MBTI is INTJ. My favorite quote is "Cogito, ergo sum." ("I think, therefore I am.") by René Descartes.