Search Results for author: Fanzhi Zeng

Found 2 papers, 0 papers with code

Rethinking Information Structures in RLHF: Reward Generalization from a Graph Theory Perspective

no code implementations • 15 Feb 2024 • Tianyi Qiu, Fanzhi Zeng, Jiaming Ji, Dong Yan, Kaile Wang, Jiayi Zhou, Yang Han, Josef Dai, Xuehai Pan, Yaodong Yang

Then, based on this framework, we introduce the IBN to analyze generalization in the reward modeling stage of RLHF.

Language Modelling Large Language Model

Paper
Add Code

AI Alignment: A Comprehensive Survey

no code implementations • 30 Oct 2023 • Jiaming Ji, Tianyi Qiu, Boyuan Chen, Borong Zhang, Hantao Lou, Kaile Wang, Yawen Duan, Zhonghao He, Jiayi Zhou, Zhaowei Zhang, Fanzhi Zeng, Kwan Yee Ng, Juntao Dai, Xuehai Pan, Aidan O'Gara, Yingshan Lei, Hua Xu, Brian Tse, Jie Fu, Stephen Mcaleer, Yaodong Yang, Yizhou Wang, Song-Chun Zhu, Yike Guo, Wen Gao

The former aims to make AI systems aligned via alignment training, while the latter aims to gain evidence about the systems' alignment and govern them appropriately to avoid exacerbating misalignment risks.

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.