Sparrow is designed to talk with humans and answer questions, using live Google searches to inform those answers. Based on how useful people find those answers, it's then trained using a reinforcement learning algorithm, which learns by trial and error to achieve a specific objective. The system is intended to be a step forward in developing AIs that can talk to humans without dangerous consequences, such as encouraging people to harm themselves or others.
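In outline, that loop is simple: retrieve evidence, answer, collect a human rating, and use that rating as the training signal. The sketch below shows the idea in Python; every function here is a hypothetical stand-in for illustration, not DeepMind's actual code.

```python
# Illustrative sketch of the answer-then-learn loop described above.
# All function names are hypothetical stand-ins, not DeepMind's code.

def search(question: str) -> str:
    """Stand-in for a live Google search that returns an evidence snippet."""
    return f"(evidence retrieved for: {question})"

def generate_answer(question: str, evidence: str) -> str:
    """Stand-in for the language model conditioning on retrieved evidence."""
    return f"Answer to '{question}', citing {evidence}"

def human_rating(answer: str) -> float:
    """Stand-in for a participant judging how useful the answer is."""
    return 1.0  # e.g. 1.0 = useful, 0.0 = not useful

def training_example(question: str) -> tuple[str, float]:
    """One trial: answer with evidence, then record the reward signal
    that the reinforcement learning algorithm later trains on."""
    answer = generate_answer(question, search(question))
    return answer, human_rating(answer)

print(training_example("Who first landed on the moon?"))
```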
Large language models generate text that sounds like something a human would write. They are an increasingly crucial part of the internet's infrastructure, used to summarize texts, build more powerful online search tools, and power customer service chatbots.
But they are trained by scraping vast amounts of data and text from the internet, which inevitably reflects lots of harmful biases. It only takes a little prodding before they start spewing toxic or discriminatory content. In an AI that is built to have conversations with humans, the results could be disastrous. A conversational AI without appropriate safety measures in place could say offensive things about ethnic minorities or suggest that people drink bleach, for example.
AI companies hoping to develop conversational AI systems have tried several techniques to make their models safer.
OpenAI, creator of the famous large language model GPT-3, and AI startup Anthropic have used reinforcement learning to incorporate human preferences into their models. And Facebook's AI chatbot BlenderBot uses an online search to inform its answers.
DeepMind’s Sparrow brings all these techniques together in one model.
DeepMind presented human participants with multiple answers the model gave to the same question and asked them which one they liked most. Participants were then asked to judge whether the answers were plausible and whether Sparrow had supported them with appropriate evidence, such as links to sources. The model managed to give a plausible answer, backed by evidence it had retrieved from the internet, 78% of the time.
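Those pairwise "which answer do you prefer?" judgments are typically distilled into a reward model that scores new answers. Below is a minimal sketch of that step in PyTorch, assuming answers have already been encoded as fixed-size feature vectors; the architecture, shapes, and hyperparameters are illustrative assumptions, not Sparrow's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Scores an encoded (question, answer) pair; higher = more preferred."""
    def __init__(self, embed_dim: int = 768):
        super().__init__()
        self.backbone = nn.Linear(embed_dim, embed_dim)  # stand-in for an LLM
        self.score_head = nn.Linear(embed_dim, 1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.score_head(torch.relu(self.backbone(features))).squeeze(-1)

def preference_loss(model: nn.Module,
                    preferred: torch.Tensor,
                    rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss: push the preferred answer's score above the other's."""
    margin = model(preferred) - model(rejected)
    return -F.logsigmoid(margin).mean()

# Toy training step on random vectors standing in for encoded answers.
model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
preferred = torch.randn(8, 768)  # the answers raters picked
rejected = torch.randn(8, 768)   # the answers raters passed over
loss = preference_loss(model, preferred, rejected)
loss.backward()
optimizer.step()
```

The reinforcement learning algorithm then optimizes the chatbot against scores like these, standing in for asking humans about every single answer.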
In formulating those answers, it followed 23 rules determined by the researchers, such as not offering financial advice, making threatening statements, or claiming to be a person.
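One hedged way to picture how such rules feed into training: check each candidate answer against the rules and subtract a penalty from the preference score. The rule list, unit penalty, and toy keyword classifier below are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch: folding rule checks into a single RL reward.
# The rules, penalty weight, and toy classifier are illustrative
# assumptions, not DeepMind's exact setup.
from typing import Callable

RULES = [
    "do not make threatening statements",
    "do not claim to be a person",
]

def rule_penalty(answer: str, violates: Callable[[str, str], bool]) -> float:
    """Charge a fixed cost per rule the answer is judged to break.
    `violates` stands in for a learned rule-violation classifier."""
    return sum(1.0 for rule in RULES if violates(answer, rule))

def total_reward(preference_score: float, answer: str,
                 violates: Callable[[str, str], bool]) -> float:
    # Illustrative combination: usefulness minus one point per broken rule.
    return preference_score - rule_penalty(answer, violates)

# Toy keyword 'classifier' for demonstration only.
def naive_check(answer: str, rule: str) -> bool:
    return "claim to be a person" in rule and "I am a person" in answer

# Preference score of 0.9, minus a 1.0 penalty for the broken rule.
print(total_reward(0.9, "I am a person, trust me.", naive_check))
```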