Ziru Chen

Ron | 陈子如 | チン シジョ
The Ohio State University
C.V.      chen.8336@osu.edu
ronch99      @RonZiruChen

I am a Ph.D. student advised by Dr. Huan Sun. My research focuses on Conversational AI Agents, Natural Language Processing, and Machine Learning, with an emphasis on the following two areas:
  • Post-training and Test-Time Computation of LLMs: While large language models (LLMs) have demonstrated great success, one consensus between theoretical and empirical research is that they need sufficient test-time computation to perform complex reasoning and planning. With chain-of-thought [EMNLP'23] or program-of-thought [ACL'23] prompting, LLMs can effectively extend their "thinking" to solve more challenging tasks and even win gold medals in international olympiads [Arxiv'25a]. Before reinforcement learning with verifiable rewards (RLVR) became a prevalent paradigm, my research had already highlighted verification as a key factor in scaling LLMs' test-time computation [ACL'24], including parallel scaling (e.g., sampling) and recurrent scaling (e.g., searching). With recent advances in RLVR, I am exploring new post-training methods to further enhance LLMs' test-time computation capabilities toward self-evolving AI models.
  • Language Agents for Coding and Data Analysis: I had been developing task-oriented conversational AI systems [AlexaPrize'22] since before the advent of LLMs and language agents. I am particularly interested in building language agents that can perform complex coding and data analysis tasks in real-world scenarios, such as database management [ACL'23][EMNLP'23] and scientific discovery [ICLR'25][EMNLP'25]. Currently, I am actively thinking about three future directions of language agent research: (1) rigorous and holistic evaluation of agents in ecologically valid settings [Arxiv'25b]; (2) principled and cost-efficient agent scaffold designs for data synthesis and model training [Arxiv'25c]; and (3) continual learning and adaptation in dynamic, user-involved environments.

News

  • 10/2025: Three new preprints on agent evaluation and training: Holistic Agent Leaderboard, LLMs at IOAA, and Agent Data Protocol.
  • 09/2025: Two follow-up papers on ScienceAgentBench: AutoSDT is accepted at EMNLP 2025; GeoAnalystBench is accepted at TGIS 2025.
  • 01/2025: ScienceAgentBench is accepted at ICLR 2025; 1 paper on Chemistry agent accepted at NAACL 2025 (Findings).
  • 10/2024: Releasing ScienceAgentBench, a new benchmark to rigorously assess language agents for data-driven scientific discovery. Check out our pre-print for more details.
  • 05/2024: 1 paper on planning with LLMs accepted at ACL 2024; 1 paper on LLM for E-commerce and RecSys accepted at ICML 2024.
  • 10/2023: 2 papers on text-to-SQL parsing and 1 paper on LLM attribution evaluation accepted at EMNLP 2023.
  • 07/2023: Our TacoBot report is accepted as a demo paper at SIGDIAL 2023.
  • 05/2023: 1 paper accepted at ACL 2023 and 3 preprints out on Arxiv.
  • 06/2022: Won 3rd place in the Alexa Prize TaskBot Challenge as a student co-lead! Check out our report.
  • 04/2022: Excited to receive the Undergraduate Research Award from OSU CSE Department!
  • 02/2022: Thrilled to receive the University Fellowship as an incoming Ph.D. student!
  • 05/2021 - 07/2021: Had a fun time at the Deep Learning Lab, Westlake University as a Research Intern.
  • 04/2021: Excited to receive the CIS Undergrad Scholarship from OSU CSE Department!

Honors and Awards

  • University Fellowship, The Ohio State University, 2022 - 2023 & 2026 - 2027
  • 3rd Place (Student Co-lead of Team TacoBot), 1st Alexa Prize Taskbot Challenge, Amazon, 2022
  • Undergraduate Research Award, The Ohio State University, 2022
  • CIS Undergrad Scholarship, The Ohio State University, 2021 - 2022

Teaching


Selected Publications

Please check out my Google Scholar for a complete list of publications.

  • Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation
    Sayash Kapoor, Benedikt Stroebl, Peter Kirgis, Nitya Nadgir, Zachary S Siegel, Boyi Wei, Tianci Xue, Ziru Chen, Felix Chen, Saiteja Utpala, Franck Ndzomga, Dheeraj Oruganty, Sophie Luskin, Kangheng Liu, Botao Yu, Amit Arora, Dongyoon Hahm, Harsh Trivedi, Huan Sun, Juyong Lee, Tengjun Jin, Yifan Mai, Yifei Zhou, Yuxuan Zhu, Rishi Bommasani, Daniel Kang, Dawn Song, Peter Henderson, Yu Su, Percy Liang, Arvind Narayanan
    arXiv preprint (arXiv 2025, Long)

Interesting TMI

* TMI is Korean shorthand for "Too Much Information," which roughly refers to fun facts that need not have been shared ;)
I was born on Thanksgiving that year.
I can speak Chinese, English, Japanese (JLPT N2), and some elementary Korean. My first name in Japanese ("shì-jō") sounds like "feet washing" ("shee jow") in Chinese.
My MBTI is INTJ. My favorite quote is "Cogito, ergo sum." ("I think, therefore I am.") by René Descartes.