Publications

A collection of my research work.

EvoLM: In Search of Lost Language Model Training Dynamics

๐Ÿ† Oral Presentation

Zhenting Qi, Fan Nie, Alexandre Alahi, James Zou, Himabindu Lakkaraju, Yilun Du, Eric Xing, Sham Kakade, Hanlin Zhang

Advances in Neural Information Processing Systems (NeurIPS) 2025

We developed a comprehensive model suite for analyzing language model training dynamics across pre-training, continued pre-training, supervised fine-tuning, and reinforcement learning stages.

arXiv · PDF

Measuring the Faithfulness of Thinking Drafts in Large Reasoning Models

Zidi Xiong, Shan Chen, Zhenting Qi, Himabindu Lakkaraju

Advances in Neural Information Processing Systems (NeurIPS) 2025

We introduced a systematic framework to evaluate the faithfulness of thinking drafts in Large Reasoning Models using counterfactual interventions.

arXiv · PDF

Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search

Maohao Shen†, Guangtao Zeng†, Zhenting Qi†, Zhang-Wei Hong, Zhenfang Chen, Wei Lu, Gregory Wornell, Subhro Das, David Cox, Chuang Gan

International Conference on Machine Learning (ICML) 2025

We introduced the COAT reasoning framework to enhance LLM reasoning via autoregressive search with self-reflection and self-exploration.

arXiv · PDF

rStar: Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers

Zhenting Qi†, Mingyuan Ma†, Jiahang Xu†, Li Lyna Zhang, Fan Yang, Mao Yang

International Conference on Learning Representations (ICLR) 2025

We introduced rStar, a self-play mutual reasoning approach that enhances the reasoning capabilities of small language models without fine-tuning or reliance on stronger models.

arXiv · PDF

Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems

Zhenting Qi, Hanlin Zhang, Eric Xing, Sham Kakade, Himabindu Lakkaraju

International Conference on Learning Representations (ICLR) 2025

We developed a scalable method for extracting data from RAG systems using LLMs' instruction-following capabilities.

arXiv · PDF

Quantifying Generalization Complexity for Large Language Models

Zhenting Qi, Hongyin Luo, Xuliang Huang, Zhuokai Zhao, Yibo Jiang, Xiangjun Fan, Himabindu Lakkaraju, James Glass

International Conference on Learning Representations (ICLR) 2025

We introduced Scylla, a dynamic evaluation framework that quantitatively measures LLMs' generalization abilities by disentangling generalization from memorization.

arXiv · PDF

FOLIO: Natural Language Reasoning with First-Order Logic

Simeng Han, Hailey Schoelkopf, Yilun Zhao, Zhenting Qi, Martin Riddell, Wenfei Zhou, James Coady, David Peng, Yujie Qiao, Luke Benson, Lucy Sun, Alex Wardle-Solano, Hannah Szabo, Ekaterina Zubova, Matthew Burtell, Jonathan Fan, Yixin Liu, Brian Wong, Malcolm Sailor, Ansong Ni, Linyong Nan, Jungo Kasai, Tao Yu, Rui Zhang, Alexander R. Fabbri, Wojciech Kryscinski, Semih Yavuz, Ye Liu, Xi Victoria Lin, Shafiq Joty, Yingbo Zhou, Caiming Xiong, Rex Ying, Arman Cohan, Dragomir Radev

Conference on Empirical Methods in Natural Language Processing (EMNLP) 2024

We developed a comprehensive dataset and benchmark for natural language reasoning using First-Order Logic.

arXiv · PDF

P-FOLIO: Evaluating and Improving Logical Reasoning with Abundant Human-Written Reasoning Chains

Simeng Han, Aaron Yu, Rui Shen, Zhenting Qi, Martin Riddell, Wenfei Zhou, Yujie Qiao, Yilun Zhao, Semih Yavuz, Ye Liu, Shafiq Joty, Yingbo Zhou, Caiming Xiong, Dragomir Radev, Rex Ying, Arman Cohan

Conference on Empirical Methods in Natural Language Processing (EMNLP) 2024

We extended FOLIO with abundant human-written reasoning chains, providing detailed reasoning processes for logical reasoning tasks.

arXiv · PDF

Constrained Human-AI Cooperation: An Inclusive Embodied Social Intelligence Challenge

Weihua Du, Qiushi Lyu, Jiaming Shan, Zhenting Qi, Hongxin Zhang, Sunli Chen, Andi Peng, Tianmin Shu, Kwonjoon Lee, Behzad Dariush, Chuang Gan

Advances in Neural Information Processing Systems (NeurIPS) 2024

We introduced a comprehensive benchmark challenge for advancing research in embodied social intelligence through constrained human-AI cooperation scenarios.

arXiv · PDF

PILLOW: Enhancing Efficient Instruction Fine-tuning via Prompt Matching

๐Ÿ† Oral Presentation

Zhenting Qi, Xiaoyu Tan, Shaojie Shi, Chao Qu, Yinghui Xu, Yuan Qi

Conference on Empirical Methods in Natural Language Processing (EMNLP) 2023

We introduced a prompt matching framework to enhance the efficiency of instruction fine-tuning.

arXiv · PDF

QTSumm: A New Benchmark for Query-Focused Table Summarization

Yilun Zhao, Zhenting Qi, Linyong Nan, Boyu Mi, Yixin Liu, Weijin Zou, Simeng Han, Xiangru Tang, Yumo Xu, Arman Cohan, Dragomir Radev

Conference on Empirical Methods in Natural Language Processing (EMNLP) 2023

We introduced a comprehensive benchmark dataset for query-focused table summarization.

arXiv · PDF

Self-Criticism: Aligning Large Language Models with their Understanding of Helpfulness, Honesty, and Harmlessness

๐Ÿ† Oral Presentation

Xiaoyu Tan, Shaojie Shi, Xihe Qiu, Chao Qu, Zhenting Qi, Yinghui Xu, Yuan Qi

Conference on Empirical Methods in Natural Language Processing (EMNLP) 2023

We introduced a self-criticism framework that enables models to evaluate and improve their own outputs based on their understanding of helpfulness, honesty, and harmlessness.

OpenRT: An Open-source Framework for Reasoning Over Tabular Data

Yilun Zhao, Boyu Mi, Zhenting Qi, Linyong Nan, Minghao Guo, Arman Cohan, Dragomir Radev

Annual Meeting of the Association for Computational Linguistics (ACL) 2023

We developed and released an open-source framework for reasoning over tabular data.

RobuT: A Systematic Study of Table QA Robustness Against Human-Annotated Adversarial Perturbations

Yilun Zhao, Chen Zhao, Linyong Nan, Zhenting Qi, Wenlin Zhang, Xiangru Tang, Boyu Mi, Dragomir Radev

Annual Meeting of the Association for Computational Linguistics (ACL) 2023

We conducted a systematic study of table QA robustness against human-annotated adversarial perturbations.

arXiv · PDF

SaFER: A Robust and Efficient Framework for Fine-tuning BERT-based Classifier with Noisy Labels

Zhenting Qi, Xiaoyu Tan, Chao Qu, Yinghui Xu, Yuan Qi

Annual Meeting of the Association for Computational Linguistics (ACL) 2023

We developed a robust framework for fine-tuning BERT-based classifiers in the presence of noisy labels.

LoFT: Enhancing Faithfulness and Diversity for Table-to-Text Generation via Logic Form Control

๐Ÿ† Oral Presentation

Yilun Zhao, Zhenting Qi, Linyong Nan, Lorenzo Jaime Flores, Dragomir Radev

Conference of the European Chapter of the Association for Computational Linguistics (EACL) 2023

We introduced logic form control mechanisms to guide table-to-text generation and ensure faithfulness to source data.

arXiv · PDF

ReasTAP: Injecting Table Reasoning Skills During Pre-training via Synthetic Reasoning Examples

Yilun Zhao, Linyong Nan, Zhenting Qi, Rui Zhang, Dragomir Radev

Conference on Empirical Methods in Natural Language Processing (EMNLP) 2022

We developed methods to generate synthetic reasoning examples for table understanding tasks, injecting table reasoning skills into models during the pre-training phase.

arXiv · PDF