Reinforcement Learning from Human Feedback

Nathan Lambert / HuggingFace

Mar 09, 2023

Abstract: With the emergence of impressive, general-purpose chatbots, increasing attention has concentrated on a specific training method behind both Anthropic's Claude and OpenAI's ChatGPT: reinforcement learning from human feedback (RLHF). RLHF works by using a traditional RL optimizer to fine-tune a language model policy with respect to a separately learned reward model. This talk has two parts: first, a review of RLHF and common problems in training these systems; and second, a review of progress in recreating high-impact RLHF artifacts in open and academic labs. The second part will cover base models, preference and instruction datasets, RL training code, and more. The talk will conclude with a discussion of the best opportunities for academic researchers in the area.

Bio: Nathan Lambert is a research scientist and RLHF team lead at HuggingFace. He received his PhD from the University of California, Berkeley, working at the intersection of machine learning and robotics. He was advised by Professor Kristofer Pister in the Berkeley Autonomous Microsystems Lab and Roberto Calandra at Meta AI Research. He was lucky to intern at Facebook AI and DeepMind during his PhD. Nathan was awarded the UC Berkeley EECS Demetri Angelakos Memorial Achievement Award for Altruism for his efforts to better community norms.