Long Ouyang and Ryan Lowe, research scientists at OpenAI, discussed their work on InstructGPT, one of the first major applications of reinforcement learning from human feedback (RLHF) to training large language models. They described the challenges of getting GPT-3 to do useful cognitive work, such as summarizing a news article, and how to “trick” the model into performing that work by setting up text whose most natural autocompletion is the output you want.
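
As a rough illustration of that prompting trick, the sketch below frames a news article so that a summary is the most natural continuation for a text-completion model. The helper name `build_summary_prompt` and the "TL;DR:" cue are assumptions chosen for illustration and are not taken from the discussion. No model is called here; the code only builds the text that would be handed to one.

```python
def build_summary_prompt(article: str) -> str:
    """Frame an article so that a summary is the natural continuation.

    Hypothetical helper: the "TL;DR:" cue is an illustrative assumption,
    not a detail from the discussion.
    """
    # A completion model only continues text, so the prompt ends exactly
    # where the summary should begin.
    return f"{article.strip()}\n\nTL;DR:"


article = "The city council voted on Tuesday to expand the bike lane network."
prompt = build_summary_prompt(article)
print(prompt)
```

The point of the design is that nothing in the prompt asks for a summary. The text is simply arranged so that continuing it amounts to writing one, which is the kind of workaround that instruction-following models such as InstructGPT are meant to make unnecessary.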