Publications
A collection of my research work.
EvoLM: In Search of Lost Language Model Training Dynamics
Zhenting Qi, Fan Nie, Alexandre Alahi, James Zou, Himabindu Lakkaraju, Yilun Du, Eric Xing, Sham Kakade, Hanlin Zhang
Advances in Neural Information Processing Systems (NeurIPS) 2025
We developed a comprehensive model suite for analyzing language model training dynamics across pre-training, continued pre-training, supervised fine-tuning, and reinforcement learning stages.
Measuring the Faithfulness of Thinking Drafts in Large Reasoning Models
Zidi Xiong, Shan Chen, Zhenting Qi, Himabindu Lakkaraju
Advances in Neural Information Processing Systems (NeurIPS) 2025
We introduced a systematic framework to evaluate the faithfulness of thinking drafts in Large Reasoning Models using counterfactual interventions.
Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search
Maohao Shen†, Guangtao Zeng†, Zhenting Qi†, Zhang-Wei Hong, Zhenfang Chen, Wei Lu, Gregory Wornell, Subhro Das, David Cox, Chuang Gan
International Conference on Machine Learning (ICML) 2025
We introduced the COAT reasoning framework to enhance LLM reasoning via autoregressive search with self-reflection and self-exploration.
rStar: Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
Zhenting Qi†, Mingyuan Ma†, Jiahang Xu†, Li Lyna Zhang, Fan Yang, Mao Yang
International Conference on Learning Representations (ICLR) 2025
We introduced rStar, a self-play mutual reasoning approach that strengthens the reasoning capabilities of small language models without fine-tuning or reliance on stronger models.
Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems
Zhenting Qi, Hanlin Zhang, Eric Xing, Sham Kakade, Himabindu Lakkaraju
International Conference on Learning Representations (ICLR) 2025
We developed a scalable method for extracting data from RAG systems by exploiting LLMs' instruction-following capabilities.
Quantifying Generalization Complexity for Large Language Models
Zhenting Qi, Hongyin Luo, Xuliang Huang, Zhuokai Zhao, Yibo Jiang, Xiangjun Fan, Himabindu Lakkaraju, James Glass
International Conference on Learning Representations (ICLR) 2025
We introduced Scylla, a dynamic evaluation framework that quantitatively measures LLMs' generalization abilities by disentangling generalization from memorization.
FOLIO: Natural Language Reasoning with First-Order Logic
Simeng Han, Hailey Schoelkopf, Yilun Zhao, Zhenting Qi, Martin Riddell, Wenfei Zhou, James Coady, David Peng, Yujie Qiao, Luke Benson, Lucy Sun, Alex Wardle-Solano, Hannah Szabo, Ekaterina Zubova, Matthew Burtell, Jonathan Fan, Yixin Liu, Brian Wong, Malcolm Sailor, Ansong Ni, Linyong Nan, Jungo Kasai, Tao Yu, Rui Zhang, Alexander R. Fabbri, Wojciech Kryscinski, Semih Yavuz, Ye Liu, Xi Victoria Lin, Shafiq Joty, Yingbo Zhou, Caiming Xiong, Rex Ying, Arman Cohan, Dragomir Radev
Conference on Empirical Methods in Natural Language Processing (EMNLP) 2024
We developed a comprehensive dataset and benchmark for natural language reasoning using First-Order Logic.
P-FOLIO: Evaluating and Improving Logical Reasoning with Abundant Human-Written Reasoning Chains
Simeng Han, Aaron Yu, Rui Shen, Zhenting Qi, Martin Riddell, Wenfei Zhou, Yujie Qiao, Yilun Zhao, Semih Yavuz, Ye Liu, Shafiq Joty, Yingbo Zhou, Caiming Xiong, Dragomir Radev, Rex Ying, Arman Cohan
Conference on Empirical Methods in Natural Language Processing (EMNLP) 2024
We extended FOLIO with abundant human-written reasoning chains, providing detailed reasoning processes for logical reasoning tasks.
Constrained Human-AI Cooperation: An Inclusive Embodied Social Intelligence Challenge
Weihua Du, Qiushi Lyu, Jiaming Shan, Zhenting Qi, Hongxin Zhang, Sunli Chen, Andi Peng, Tianmin Shu, Kwonjoon Lee, Behzad Dariush, Chuang Gan
Advances in Neural Information Processing Systems (NeurIPS) 2024
We introduced a comprehensive benchmark challenge for advancing research in embodied social intelligence through constrained human-AI cooperation scenarios.
PILLOW: Enhancing Efficient Instruction Fine-tuning via Prompt Matching
Zhenting Qi, Xiaoyu Tan, Shaojie Shi, Chao Qu, Yinghui Xu, Yuan Qi
Conference on Empirical Methods in Natural Language Processing (EMNLP) 2023
We introduced a prompt matching framework to enhance the efficiency of instruction fine-tuning.
QTSumm: A New Benchmark for Query-Focused Table Summarization
Yilun Zhao, Zhenting Qi, Linyong Nan, Boyu Mi, Yixin Liu, Weijin Zou, Simeng Han, Xiangru Tang, Yumo Xu, Arman Cohan, Dragomir Radev
Conference on Empirical Methods in Natural Language Processing (EMNLP) 2023
We introduced a comprehensive benchmark dataset for query-focused table summarization.
Self-Criticism: Aligning Large Language Models with their Understanding of Helpfulness, Honesty, and Harmlessness
Xiaoyu Tan, Shaojie Shi, Xihe Qiu, Chao Qu, Zhenting Qi, Yinghui Xu, Yuan Qi
Conference on Empirical Methods in Natural Language Processing (EMNLP) 2023
We introduced a self-criticism framework that enables models to evaluate and improve their outputs based on their understanding of helpfulness, honesty, and harmlessness.
OpenRT: An Open-source Framework for Reasoning Over Tabular Data
Yilun Zhao, Boyu Mi, Zhenting Qi, Linyong Nan, Minghao Guo, Arman Cohan, Dragomir Radev
Annual Meeting of the Association for Computational Linguistics (ACL) 2023
We developed and released an open-source framework for reasoning over tabular data.
RobuT: A Systematic Study of Table QA Robustness Against Human-Annotated Adversarial Perturbations
Yilun Zhao, Chen Zhao, Linyong Nan, Zhenting Qi, Wenlin Zhang, Xiangru Tang, Boyu Mi, Dragomir Radev
Annual Meeting of the Association for Computational Linguistics (ACL) 2023
We conducted a systematic study of table QA robustness against human-annotated adversarial perturbations.
SaFER: A Robust and Efficient Framework for Fine-tuning BERT-based Classifier with Noisy Labels
Zhenting Qi, Xiaoyu Tan, Chao Qu, Yinghui Xu, Yuan Qi
Annual Meeting of the Association for Computational Linguistics (ACL) 2023
We developed a robust framework for fine-tuning BERT-based classifiers in the presence of noisy labels.
LoFT: Enhancing Faithfulness and Diversity for Table-to-Text Generation via Logic Form Control
Yilun Zhao, Zhenting Qi, Linyong Nan, Lorenzo Jaime Flores, Dragomir Radev
Conference of the European Chapter of the Association for Computational Linguistics (EACL) 2023
We introduced logic form control mechanisms to guide table-to-text generation and ensure faithfulness to source data.
ReasTAP: Injecting Table Reasoning Skills During Pre-training via Synthetic Reasoning Examples
Yilun Zhao, Linyong Nan, Zhenting Qi, Rui Zhang, Dragomir Radev
Conference on Empirical Methods in Natural Language Processing (EMNLP) 2022
We developed methods to generate synthetic reasoning examples for table understanding tasks and integrated table reasoning skills into the pre-training phase.