OpenAI and DeepMind develop algorithm that learns from human preference comparisons for safer AI
Summary
OpenAI, in collaboration with DeepMind's safety team, developed a learning algorithm that infers what humans want from their judgments of which of two proposed behaviors is better, rather than requiring humans to write explicit goal functions. The approach aims to make AI systems safer: it uses small amounts of human feedback to solve modern reinforcement learning (RL) environments, and it reduces the risk of undesirable or dangerous behavior that arises when a goal is oversimplified or specified slightly wrong.
Key quotes
One step towards building safe AI systems is to remove the need for humans to write goal functions, since using a simple proxy for a complex goal, or getting the complex goal a bit wrong, can lead to undesirable and even dangerous behavior.
In collaboration with DeepMind's safety team, we've developed an algorithm which can infer what humans want by being told which of two proposed behaviors is better.
We present a learning algorithm that uses small amounts of human feedback to solve modern RL environments.
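The idea of inferring a goal from "which of two behaviors is better" can be illustrated with a minimal sketch. The quotes above do not describe the model used, so everything here is an assumption made for illustration: a linear reward over hypothetical per-step features, a logistic (Bradley-Terry-style) model of the human's choice, and a simulated "human" whose hidden preferences stand in for real feedback. All function names are hypothetical.

```python
import math
import random

def segment_return(weights, segment):
    """Sum of predicted per-step rewards; a segment is a list of feature vectors."""
    return sum(sum(w * x for w, x in zip(weights, step)) for step in segment)

def preference_probability(weights, seg_a, seg_b):
    """Modelled probability that seg_a is judged better than seg_b (assumed logistic model)."""
    diff = segment_return(weights, seg_a) - segment_return(weights, seg_b)
    diff = max(-50.0, min(50.0, diff))  # clamp to keep math.exp well-behaved
    return 1.0 / (1.0 + math.exp(-diff))

def fit_reward(comparisons, n_features, lr=0.1, epochs=200):
    """Fit linear reward weights by gradient ascent on the log-likelihood of the labels."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for seg_a, seg_b, a_preferred in comparisons:
            p = preference_probability(weights, seg_a, seg_b)
            error = (1.0 if a_preferred else 0.0) - p
            for i in range(n_features):
                feat_diff = sum(s[i] for s in seg_a) - sum(s[i] for s in seg_b)
                weights[i] += lr * error * feat_diff
    return weights

def make_synthetic_comparisons(n_pairs=50, steps=5, seed=0):
    """Simulated feedback: a hidden reward (assumed weights [1, -1]) picks the better segment."""
    rng = random.Random(seed)
    hidden = [1.0, -1.0]
    data = []
    for _ in range(n_pairs):
        seg_a = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(steps)]
        seg_b = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(steps)]
        a_preferred = segment_return(hidden, seg_a) > segment_return(hidden, seg_b)
        data.append((seg_a, seg_b, a_preferred))
    return data

if __name__ == "__main__":
    comparisons = make_synthetic_comparisons()
    learned = fit_reward(comparisons, n_features=2)
    print("learned reward weights:", learned)
```

In this toy setup the learner never sees the hidden reward, only the binary choices, which mirrors the claim that comparisons can replace a hand-written goal function. In a full RL system the learned reward would then be used to train a policy; that step is omitted here.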
