Shuyan Zhou

About Me

Hi, I’m Shuyan, a final-year PhD student at the Language Technologies Institute at Carnegie Mellon University. I am fortunate to be advised by Graham Neubig.

I work on building autonomous agents that can understand high-level language commands. My goal is to create AI agents that free people from tedious tasks and help them make better decisions.

We proposed an intuitive formalism for representing procedures as programs and subsequently applied this concept to broader tasks with large language models (PaL, CoCoGen). We built the first large-scale hierarchical procedural knowledge base. To learn from the knowledge base and generate new, previously unseen procedures, we designed DocPrompting, which reads the relevant documentation before taking actions. Guided by the belief that “what I don’t measure, I can’t improve”, we built WebArena, a realistic and reproducible environment for building and evaluating autonomous agents that are guided by high-level natural language commands.

I am best reached by email at


* indicates equal contribution, ^ indicates mentorship

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Tianbao Xie, Danyang Zhang, Jixuan Chen, Xiaochuan Li, Siheng Zhao, Ruisheng Cao, Toh Jing Hua, Zhoujun Cheng, Dongchan Shin, Fangyu Lei, Yitao Liu, Yiheng Xu, Shuyan Zhou, Silvio Savarese, Caiming Xiong, Victor Zhong, Tao Yu
Preprint, 2024
[Paper] [Project Site] [Twitter]

VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks
Jing Yu Koh, Robert Lo, Lawrence Jang, Vikram Duvvur, Ming Chong Lim, Po-Yu Huang, Graham Neubig, Shuyan Zhou, Ruslan Salakhutdinov, Daniel Fried
Preprint, 2024
[Paper] [Project Site] [Twitter] [WIRED Article]

WebArena: A Realistic Web Environment for Building Autonomous Agents
Shuyan Zhou*, Frank F. Xu*, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Yonatan Bisk, Daniel Fried, Uri Alon, Graham Neubig
ICLR, 2024
[Paper] [Project Site] [Twitter]

DocPrompting: Generating Code by Retrieving the Docs
Shuyan Zhou, Uri Alon, Frank F. Xu, Zhiruo Wang, Zhengbao Jiang, Graham Neubig
ICLR, 2023 (spotlight)
[Paper] [Code+Data]

PaL: Program-aided Language Models
Luyu Gao*, Aman Madaan*, Shuyan Zhou*, Uri Alon, Pengfei Liu, Yiming Yang, Jamie Callan, Graham Neubig
ICML, 2023
[Paper] [Project Site] [Twitter] [Demo]

Hierarchical Prompting Assists Large Language Model on Web Navigation
Abishek Sridhar*, Robert Lo*, Frank F. Xu, Hao Zhu, Shuyan Zhou^
Findings of EMNLP, 2023

CodeBERTScore: Evaluating Code Generation with Pretrained Models of Code
Shuyan Zhou*, Uri Alon*, Sumit Agarwal, Graham Neubig
EMNLP, 2023
Deep Learning for Code Workshop at ICLR, 2023 (spotlight)

Execution-Based Evaluation for Open-Domain Code Generation
Zhiruo Wang, Shuyan Zhou, Daniel Fried, Graham Neubig
Findings of EMNLP, 2023
[Paper] [Project Site]

Causal Reasoning of Entities and Events in Procedural Texts
Li Zhang*, Hainiu Xu*, Yue Yang, Shuyan Zhou, Weiqiu You, Manni Arora, Chris Callison-Burch
Findings of EACL, 2023

MCoNaLa: A Benchmark for Code Generation from Multiple Natural Languages
Zhiruo Wang*, Grace Cuenca*, Shuyan Zhou^, Frank F. Xu, Graham Neubig
Findings of EACL, 2023
[Paper] [Code+Data]

Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation
Patrick Fernandes, Aman Madaan, Emmy Liu, António Farinhas, Pedro Henrique Martins, Amanda Bertsch, José GC de Souza, Shuyan Zhou, Tongshuang Wu, Graham Neubig, André FT Martins
TACL, 2023

Language Models of Code are Few-Shot Commonsense Learners
Aman Madaan, Shuyan Zhou, Uri Alon, Yiming Yang, Graham Neubig
EMNLP, 2022
[Paper] [Code]

Show Me More Details: Discovering Hierarchies of Procedures from Semi-structured Web Data
Shuyan Zhou*, Li Zhang*, Yue Yang, Qing Lyu, Pengcheng Yin, Chris Callison-Burch, Graham Neubig
ACL, 2022
[Paper] [Code+Data] [Demo]

Procedures as Programs: Hierarchical Control of Situated Agents through Natural Language
Shuyan Zhou, Pengcheng Yin, Graham Neubig
Structured and Unstructured Knowledge Integration Workshop at NAACL, 2022

Soft Gazetteers for Low-Resource Named Entity Recognition
Shruti Rijhwani, Shuyan Zhou, Graham Neubig, Jaime Carbonell
ACL, 2020
[Paper] [Code+Data]

Improving Candidate Generation for Low-resource Cross-lingual Entity Linking
Shuyan Zhou, Shruti Rijhwani, John Wieting, Jaime Carbonell, Graham Neubig
TACL, 2020
[Paper] [Code]

Towards Zero-resource Cross-lingual Entity Linking
Shuyan Zhou, Shruti Rijhwani, Graham Neubig
Deep Learning for Low-Resource NLP Workshop at EMNLP, 2019
[Paper] [Code]

Improving Robustness of Neural Machine Translation with Multi-task Learning
Shuyan Zhou, Xiangkai Zeng, Yingqi Zhou, Antonios Anastasopoulos, Graham Neubig
Conference on Machine Translation (WMT), 2019
[Paper] [Code]

Aggregated Semantic Matching for Short Text Entity Linking
Feng Nie, Shuyan Zhou, Jing Liu, Jinpeng Wang, Chin-Yew Lin, Rong Pan
CoNLL, 2018

Academic Service



Master's → Ph.D. in Language Technologies, Carnegie Mellon University
2018.08 - Present
Advisor: Graham Neubig

Ph.D. Resident, X, the moonshot factory
2022.05 - 2022.08
Host: Alex Polozov

Research Intern, Microsoft
2020.05 - 2020.08
Host: Kaushik Chakrabarti

Research Intern, Microsoft Research Asia
2017.07 - 2018.06
Host: Chin-Yew Lin