Hi! I am an incoming Assistant Professor at Duke CS starting fall 2025, and I will be recruiting graduate students in the upcoming cycle (deadline 12/14/2024). Please apply to Duke CS if you are interested in working with me!
Currently I am a researcher at Meta GenAI. I obtained my PhD from Carnegie Mellon University, advised by Graham Neubig.
I am best reached by email at shuyanzhxxx@gmail.com.
* indicates equal contribution
Solving Real-World Tasks with AI Agents
Shuyan Zhou
PhD Thesis, 2024
[Thesis]
Beyond Browsing: API-Based Web Agents
Yueqi Song, Frank Xu, Shuyan Zhou, Graham Neubig
Preprint, 2024
[Paper][Project Site][Twitter]
Refusal-Trained LLMs Are Easily Jailbroken As Browser Agents
Priyanshu Kumar, Elaine Lau, Saranya Vijayakumar, Tu Trinh, Scale Red Team, Elaine Chang, Vaughn Robinson, Sean Hendryx, Shuyan Zhou, Matt Fredrikson, Summer Yue, Zifan Wang
Preprint, 2024
[Paper][Project Site][Twitter]
Synatra: Turning Indirect Knowledge Into Direct Demonstrations For Digital Agents At Scale
Tianyue Ou, Frank F. Xu, Aman Madaan, Jiarui Liu, Robert Lo, Abishek Sridhar, Sudipta Sengupta, Dan Roth, Graham Neubig, Shuyan Zhou
NeurIPS, 2024
[Paper][Project Site][Twitter]
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Tianbao Xie, Danyang Zhang, Jixuan Chen, Xiaochuan Li, Siheng Zhao, Ruisheng Cao, Toh Jing Hua, Zhoujun Cheng, Dongchan Shin, Fangyu Lei, Yitao Liu, Yiheng Xu, Shuyan Zhou, Silvio Savarese, Caiming Xiong, Victor Zhong, Tao Yu
NeurIPS D&B Track, 2024
[Paper] [Project Site] [Twitter]
WebCanvas: Benchmarking Web Agents in Online Environments
Yichen Pan, Dehan Kong, Sida Zhou, Cheng Cui, Yifei Leng, Bing Jiang, Hangyu Liu, Yanyi Shang, Shuyan Zhou, Tongshuang Wu, Zhengyang Wu
Preprint, 2024
[Paper] [Platform]
VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks
Jing Yu Koh, Robert Lo, Lawrence Jang, Vikram Duvvur, Ming Chong Lim, Po-Yu Huang, Graham Neubig, Shuyan Zhou, Ruslan Salakhutdinov, Daniel Fried
ACL, 2024
[Paper] [Project Site] [Twitter] [WIRED Article]
WebArena: A Realistic Web Environment for Building Autonomous Agents
Shuyan Zhou*, Frank F. Xu*, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Yonatan Bisk, Daniel Fried, Uri Alon, Graham Neubig
ICLR, 2024
[Paper][Project Site][Twitter]
DocPrompting: Generating Code by Retrieving the Docs
Shuyan Zhou, Uri Alon, Frank F. Xu, Zhiruo Wang, Zhengbao Jiang, Graham Neubig
ICLR, 2023 (spotlight)
[Paper] [Code+Data]
PaL: Program-aided Language Models
Luyu Gao*, Aman Madaan*, Shuyan Zhou*, Uri Alon, Pengfei Liu, Yiming Yang, Jamie Callan, Graham Neubig
ICML, 2023
[Paper][Project Site][Twitter][Demo]
Hierarchical Prompting Assists Large Language Model on Web Navigation
Abishek Sridhar*, Robert Lo*, Frank F. Xu, Hao Zhu, Shuyan Zhou
Findings of EMNLP, 2023
[Paper][Code]
CodeBERTScore: Evaluating Code Generation with Pretrained Models of Code
Shuyan Zhou*, Uri Alon*, Sumit Agarwal, Graham Neubig
EMNLP, 2023
Deep Learning for Code Workshop at ICLR, 2023 (spotlight)
[Paper][Code]
Execution-Based Evaluation for Open-Domain Code Generation
Zhiruo Wang, Shuyan Zhou, Daniel Fried, Graham Neubig
Findings of EMNLP, 2023
[Paper][Project Site]
Causal Reasoning of Entities and Events in Procedural Texts
Li Zhang*, Hainiu Xu*, Yue Yang, Shuyan Zhou, Weiqiu You, Manni Arora, Chris Callison-Burch
Findings of EACL, 2023
[Paper][Code+Data]
MCoNaLa: A Benchmark for Code Generation from Multiple Natural Languages
Zhiruo Wang*, Grace Cuenca*, Shuyan Zhou, Frank F. Xu, Graham Neubig
Findings of EACL, 2023
[Paper] [Code+Data]
Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation
Patrick Fernandes, Aman Madaan, Emmy Liu, António Farinhas, Pedro Henrique Martins, Amanda Bertsch, José GC de Souza, Shuyan Zhou, Tongshuang Wu, Graham Neubig, André FT Martins
TACL, 2023
[Paper]
Language Models of Code are Few-Shot Commonsense Learners
Aman Madaan, Shuyan Zhou, Uri Alon, Yiming Yang, Graham Neubig
EMNLP, 2022
[Paper] [Code]
Show Me More Details: Discovering Hierarchies of Procedures from Semi-structured Web Data
Shuyan Zhou*, Li Zhang*, Yue Yang, Qing Lyu, Pengcheng Yin, Chris Callison-Burch, Graham Neubig
ACL, 2022
[Paper] [Code+Data] [Demo]
Procedures as Programs: Hierarchical Control of Situated Agents through Natural Language
Shuyan Zhou, Pengcheng Yin, Graham Neubig
Structured and Unstructured Knowledge Integration Workshop at NAACL, 2022
[Paper]
Soft Gazetteers for Low-Resource Named Entity Recognition
Shruti Rijhwani, Shuyan Zhou, Graham Neubig, Jaime Carbonell
ACL, 2020
[Paper] [Code+Data]
Improving Candidate Generation for Low-resource Cross-lingual Entity Linking
Shuyan Zhou, Shruti Rijhwani, John Wieting, Jaime Carbonell, Graham Neubig
TACL, 2020
[Paper] [Code]
Towards Zero-resource Cross-lingual Entity Linking
Shuyan Zhou, Shruti Rijhwani, Graham Neubig
Deep Learning for Low-Resource NLP Workshop at EMNLP, 2019
[Paper] [Code]
Improving Robustness of Neural Machine Translation with Multi-task Learning
Shuyan Zhou, Xiangkai Zeng, Yingqi Zhou, Antonios Anastasopoulos, Graham Neubig
Conference on Machine Translation (WMT), 2019
[Paper] [Code]
Aggregated Semantic Matching for Short Text Entity Linking
Feng Nie, Shuyan Zhou, Jing Liu, Jinpeng Wang, Chin-Yew Lin, Rong Pan
CoNLL, 2018
[Paper]
Researcher, Meta GenAI
2024.08 - Present
Master → Ph.D. of Language Technologies, Carnegie Mellon University
2018.08 - 2024.07
Advisor: Graham Neubig
Ph.D. Resident, X, the moonshot factory
2022.05 - 2022.08
Host: Alex Polozov
Research Intern, Microsoft
2020.05 - 2020.08
Host: Kaushik Chakrabarti
Research Intern, Microsoft Research Asia
2017.07 - 2018.06
Host: Chin-Yew Lin