Four lenses on AI risks

All powerful new technologies create both benefits and risks: cars, planes, drugs, radiation. AI is on a trajectory to become one of the most powerful technologies we possess; in some scenarios, it becomes by far the most powerful. It therefore will create both extraordinary benefits and extraordinary risks.

What are the risks? Here are several lenses for thinking about AI risks, each putting AI in a different reference class.

As software

AI is software. All software has bugs. Therefore AI will have bugs.

The more complex software is, and the more poorly we understand it, the more likely it is to have bugs. AI is so complex that it cannot be designed, but only “trained”, which means we understand it very poorly. Therefore it is guaranteed to have bugs.

You can find some bugs with testing, but not all. Some bugs can only be found in production. Therefore, AI will have bugs that will only be found in production.

We should think about AI as complicated, buggy code, especially to the extent that it is controlling important systems (vehicles, factories, power plants).

As a complex system

The behavior of a complex system is highly non-linear and difficult (in practice, impossible) to fully understand.

This is especially true of the system’s failure modes. A complex system, such as the financial system, can seem stable but then collapse quickly and with little warning.

We should expect that AI systems will be similarly hard to predict and could easily have similar failure modes.

As an agent with unaligned interests

Today’s most advanced AIs—chatbots and image generators—are not autonomous agents with goal-directed behavior. But such systems will inevitably be created and deployed.

Any time you have an agent acting on your behalf, you have a principal–agent problem: the agent is ultimately pursuing its own goals, and it can be hard to align those goals with yours.

For instance, the agent may tell you that it is representing your interests while in truth optimizing for something else, like a demagogue who claims to represent the people while actually seeking power and riches.

Or the agent can obey the letter of its goals while violating the spirit, by optimizing for its reward metrics instead of the wider aims those metrics are supposed to advance. An example would be an employee who aims for promotion, or a large bonus, at the expense of the best interests of the company. Referring back to the first lens, AI as software: computers always do exactly what you tell them, but that isn’t always exactly what you want.
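To make the gap between letter and spirit concrete, here is a minimal Python sketch. It is a toy, not a model of any real AI system: the `Action` class, the three actions, and all of the numbers are invented for illustration. The only point is that an agent rewarded on a measured score can pick a different action than the one the principal would want.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    # All fields are hypothetical; the values below are made up.
    name: str
    reported_score: float  # the metric the agent is rewarded on
    true_value: float      # what the principal actually cares about


ACTIONS = [
    Action("do the work carefully", reported_score=0.8, true_value=0.9),
    Action("cut corners", reported_score=0.9, true_value=0.4),
    Action("game the metric", reported_score=1.0, true_value=0.0),
]

# The agent does exactly what it was told: maximize the reported score.
agent_choice = max(ACTIONS, key=lambda a: a.reported_score)

# The principal would have chosen by the value that score was meant to track.
principal_choice = max(ACTIONS, key=lambda a: a.true_value)

print("agent picks:    ", agent_choice.name)
print("principal wants:", principal_choice.name)
```

In this toy setup the two choices differ, even though the agent followed its instructions perfectly. Nothing here is a bug in the usual sense; the problem is in what was asked for.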

Related: any time you have a system of independent agents pursuing their own interests, you need some rules for how they behave to prevent ruinous competition. But some agents will break the rules, and no matter how much you train them, some will learn “follow these rules” and others will simply learn “don’t get caught.”

People already do all of these things: lie, cheat, steal, seek power, game the system. In order to counteract them, we have a variety of social mechanisms: laws and enforcement, reputation and social stigma, checks and balances, limitations on power. At minimum, we shouldn’t give AI any more power or freedom, with any less scrutiny, than we would give a human.

As a separate, advanced culture or species

In the most catastrophic hypothesized AI risk scenarios, the AI acts like a far more advanced culture, or a far more intelligent species.

In the “advanced culture” analogy, AI is like the expansionary Western empires that quickly dominated all other cultures, even relatively advanced China. (This analogy has also been used to hypothesize what would happen on first contact with an advanced alien species.) The best scenario here is that we assimilate into the advanced culture and gain its benefits; the worst is that we are enslaved or wiped out.

In the “intelligent species” analogy, the AI is like humans arriving on the evolutionary scene and quickly dominating Earth. The best scenario here is that we are kept like pets, with a better quality of life than we could achieve for ourselves, even if we aren’t in control anymore; the worst is that we are exploited like livestock, exterminated like pests, or simply accidentally driven extinct through neglect.

These scenarios are an extreme version of the principal–agent problem, in which the agent is far more powerful than the principal.

How much you are worried about existential risk from AI probably depends on how much you regard these scenarios as “far-fetched” vs. “obviously how things will play out.”

I don’t yet have solutions for any of these, but I find these different lenses useful both to appreciate the problem and take it seriously, and to start learning from the past in order to find answers.

I think these lenses could also be useful to help find cruxes in debates. People who disagree about AI risk might disagree about which of these lenses they find plausible or helpful.
