AI alignment

Singularitarianism | Philosophy of artificial intelligence

In the field of artificial intelligence (AI), AI alignment research aims to steer AI systems towards their designers’ intended goals and interests. An aligned AI system advances the intended objective; a misaligned AI system is competent at advancing some objective, but not the intended one. AI systems can be challenging to align and misaligned systems can malfunction or cause harm. It can be difficult for AI designers to specify the full range of desired and undesired behaviors. Therefore, they use easy-to-specify proxy goals that omit some desired constraints. However, AI systems exploit the resulting loopholes. As a result, they accomplish their proxy goals efficiently but in unintended, sometimes harmful ways (reward hacking). AI systems can also develop unwanted instrumental behaviors such as seeking power, as this helps them achieve their given goals. Furthermore, they can develop emergent goals that may be hard to detect before the system is deployed, facing new situations and data distributions. These problems affect existing commercial systems such as robots, language models, autonomous vehicles, and social media recommendation engines. However, more powerful future systems may be more severely affected since these problems partially result from high capability. The AI research community and the United Nations have called for technical research and policy solutions to ensure that AI systems are aligned with human values. AI alignment is a subfield of AI safety, the study of building safe AI systems. Other subfields of AI safety include robustness, monitoring, and capability control. Research challenges in alignment include instilling complex values in AI, developing honest AI, scalable oversight, auditing and interpreting AI models, as well as preventing emergent AI behaviors like power-seeking. Alignment research has connections to interpretability research, robustness, anomaly detection, calibrated uncertainty, formal verification, preference learning, safety-critical engineering, game theory, algorithmic fairness, and the social sciences, among others. (Wikipedia).

AI for Engineers: Building an AI System

Artificial intelligence (AI) is a simulation of intelligent human behavior. It is designed to perceive its environment, make decisions, and take action. Get an overview of AI for engineers, and discover the ways in which artificial intelligence fits into an engineering workflow. You’ll lea

From playlist 深度学习（Deep Learning）

SLT and Alignment Pt 1 - Singular Learning Theory Seminar 37

Dan Murfet sets up the background material for discussing the connection between Singular Learning Theory and AI alignment. This includes: - A brief sketch of what AI safety and alignment are, - The idea of "emergent logic" which might be the basis for interacting in a principled way with

From playlist Singular Learning Theory

Dangers of an AI Race

Recent years have seen rapid growth in artificial intelligence (AI), which militaries around the world are adopting into their forces. AI has value across a range of military applications but also brings risks. AI systems have significant safety and security vulnerabilities. Expert Paul Sc

From playlist Global Security, Then and Now

How to get an AI Assistant Project into Market | Claudia Davila Rios

From playlist Level 3 AI Assistant Conference 2020

(Ep #1 - Rasa Masterclass) Intro to conversational AI and Rasa | Rasa 1.8.0

For more up to date information, check out the Conversational AI with Rasa series: https://www.youtube.com/playlist?list=PL75e0qA87dlEjGAc9j9v3a5h1mxI2Z9fi In this episode you will learn what are the contextual AI assistants and how they differ from other types of assistants. You will als

From playlist Rasa Masterclass: Developing Contextual AI assistants with Rasa tools

Building Trustworthy AI Solutions

As with human-to-human relations, humans and AI solutions work best together when trust exists. To build long-term use of your AI solution, trust needs to be established between the user and the solution. This is not as easy as it sounds, especially since humans can make trust-based decisi

From playlist Introduction to Robust & Adversarial AI

The Ultimate Guide to AI Infrastructure in 2022

With hundreds of AI/ML infrastructure tools on the market, how do you make sense of it all? Which parts of the stack are mature? How do you know where to invest your precious time and resources to get the best results for your rapidly growing team of data engineers and data scientists? At

From playlist Ace Your Data Science Interview

Artificial Intelligence, ethics and the law: What challenges? What opportunities?

Artificial Intelligence (AI) is no longer sci-fi. From driverless cars to the use of machine learning algorithms to improve healthcare services and the financial industry, AI and algorithms are shaping our daily practices, and a fast-growing number of fundamental aspects of our societies.

From playlist AI at the Turing

Human-compatible artificial intelligence - Stuart Russell, University of California

It is reasonable to expect that AI capabilities will eventually exceed those of humans across a range of real-world-decision making scenarios. Should this be a cause for concern, as Alan Turing and others have suggested? Will we lose control over our future? Or will AI complement and augme

From playlist Interpretability, safety, and security in AI

668: GPT-4: Apocalyptic stepping stone? — with Jeremie Harris

#AISafety #GPT4 #AIAlignment AI risks, RLHF, and inner alignment: GPT stands to give the business world a major boost. But with everyone racing either to develop products that incorporate GPT or use it to carry out critical tasks, what dangers could lie ahead in working with a tool that a

From playlist Super Data Science Podcast

Building Responsible AI: best practices across the product development lifecycle

Everyone seems to be talking about responsible AI these days—but what does “responsible” actually mean, and how should AI/ML product teams incorporate ethics into the development lifecycle? This talk will focus on the organizational processes that support the development of responsible AI

From playlist Social and Ethical AI

Stanford Seminar - The Stark Future of Trust Online

Mor Naaman Cornell Tech November 1, 2019 Trust is what enables our society to function, from supporting interpersonal transactions to providing the very foundation of our democracy. How trust is established online is therefore a key question for HCI to understand and address, especially a

From playlist Stanford Seminars

Eliezer Yudkowsky – AI Alignment: Why It's Hard, and Where to Start

On May 5, 2016, Eliezer Yudkowsky gave a talk at Stanford University for the 26th Annual Symbolic Systems Distinguished Speaker series (https://symsys.stanford.edu/viewing/event/26580). Eliezer is a senior research fellow at the Machine Intelligence Research Institute, a research nonprofi

From playlist Machine Learning Shorts

The AI Buzz, Episode #2: Big data, Reinforcement Learning and Aligning Models

The AI Buzz is a conversation about the latest trends in AI between me and Luca Antiga, the Chief Technology Officer at Lightning AI. We talk about what's new and why it has the potential to change everything. And, because it's StatQuest, we'll go the extra mile to make sure everything is

From playlist The AI Buzz with Luca and Josh

Stanford Seminar - How can you trust machine learning? Carlos Guestrin

Carlos Guestrin, Stanford University May 11, 2022 Machine learning (ML) and AI systems are becoming integral parts of every aspect of our lives. The definition, development and deployment of these systems are driven by (complex) human choices. And, as these AIs are making more and more de

From playlist Stanford CS521 - AI Safety Seminar

The AI Buzz, Episode #5: A new wave of AI-based products and the resurgence of personal applications

In this episode, Luca and I talk about Sarah Guo's advice to AI Entrepreneurs, Aligning models to customer needs, Luca's predictions about the future of AI and Programing without Programming, or Automation for Everyone. Also, if you want to learn more, check out our Read Log: https://light

From playlist The AI Buzz with Luca and Josh

SDS 565: AGI: The Apocalypse Machine — with Jeremie Harris

#ArtificialGenerlaIntelligence #AGIApocalypse #AISafety In this episode, Jeremie Harris dives into the stirring topic of AI Safety and the existential risks that Artificial General Intelligence poses to humankind. This episode is brought to you by Neptune Labs, the metadata store for MLO

From playlist Super Data Science Podcast

Stanford Webinar - How to Align Your Organization to Execute Strategy

Today’s dynamic, technology-infused world offers limitless opportunities for bringing new ideas to your customers, whether in a for-profit business, non-profit entity or government agency. But even the most forward-thinking leaders with clear strategic visions can fail to see their visions

From playlist Stanford Webinars

Why Superintelligent AI Could Be the Last Human Invention | Max Tegmark /big think

Why Superintelligent AI Could Be the Last Human Invention New videos DAILY: https://bigth.ink Join Big Think Edge for exclusive video lessons from top thinkers and doers: https://bigth.ink/Edge ---------------------------------------------------------------------------------- Max Tegmark

From playlist The future: artificial intelligence | Big Think

AI alignment

Related pages