
THIS IS A BETA DRAFT, DO NOT SHARE PUBLICLY YET PLZ, THX

(Hey - if you got directly linked to this page, you should probably start with the Introduction!)

First, a quick overview of AI past, present, and (possible) futures:

The Past:

  • Before 2000: AI with super-human logic, but no intuition.
    • (and safety problems with AI Logic)
  • After 2000: AI that can learn “intuition”, but has poor logic.
    • (and safety problems with AI "Intuition")

The Present:

  • The arms race to milk current AI methods
  • The quest to merge AI logic and intuition
  • The awkward alliances in AI Safety

The Possible Futures:

  • Timelines: When will we get “human-level general AI”, if ever?
  • Takeoffs: How quickly will AI self-improve?
  • Trajectories: Are we on track to the Good Place or the Bad Place?

Let's begin!


⌛️ The Past

Computer Science was the only science to start with its Theory of Everything.1

In 1936, a gay, British, Nazi-fighting codebreaker named Alan Turing invented the "universal computer".2 For his next trick, in 1950, he proposed a wacky thought experiment: what if a computer could "pass as human" in a text-based conversation?3 In the summer of 1956, inspired by Turing's work, a bunch of researchers gathered4 to found a new field they named:

“ARTIFICIAL INTELLIGENCE”

(Confession: There is no rigorous definition of "Artificial Intelligence". Honestly, "AI" is mostly a term people use to hype up whatever software they're selling. I recently saw a news clip showing a South Korean "Beauty AI".5 It's a camera that measures your skin tone, then recommends a makeup foundation for your skin tone. It's a color picker. That's what mainstream news calls "AI".)

Riff of the "Is this a pigeon?" meme. The Robot Catboy Maid is gesturing at a butterfly labeled "literally any piece of software", while he asks: "Is this AI?"

(So, if it helps make things clearer for you, mentally replace every mention of "AI" with "a piece of software". Relatedly, :I'll mostly avoid the word "intelligence", and instead say "capabilities". [👈 click to expand])

Anyway! To vastly oversimplify the history of AI & AI Safety, these fields had two main eras:

Before 2000: AI with super-human logic, but no intuition. (also called "Symbolic AI")

Safety problems with AI Logic:

  • Usually accomplishes goals in logical-but-unwanted ways.
  • Doesn't understand common-sense or humane values.
  • According to game theory, most AI goals logically lead to sub-goals like "resist shutdown" or "grab resources".

After 2000: AI that can learn general "intuition", but has poor logic. (also called "Deep Learning")

Safety problems with AI "Intuition":

  • AI learns our biases, prejudices, inhumanity.
  • AI "intuition" breaks easily, sometimes in dangerous ways.
  • AI is a "black box": we can't understand or verify what it's doing.

Timeline of AI. Before the year 2000, AI was mostly "logic". From 2000 to now, AI is mostly "intuition". In the future, it could be both?

(Bonus, click to expand - :a more precise decade-by-decade timeline)

Now, let's look at the Before-2000 days, Ye Olde Artificial Intelligence...

:x Capabilities Not Intelligence

Some of the many problems of the word "intelligence", especially when applied to AI:

  • It's vague.
  • It's anthropomorphic.
  • It implies consciousness/sentience.
  • It has moral connotations for some reason?
  • There's a lot of baggage & misconceptions with the word.
  • It also lets you weasel your way out of falsified predictions, like, "oh AI beat Go? I guess Go wasn't a benchmark of true intelligence" bla bla bla.

The word "capability" is more concrete and touches grass, so I'll mostly use that instead. Hat tip to Victoria Krakovna (2023) for this idea.

:x Decades

(Note: This section isn't necessary to understand Part One, it's just here for completion.)

  • 1940: The precursors to AI, including "Cybernetics" & the first artificial computer neuron.
  • 1950: The "official" start of AI!
  • 1950-60: The rise of Symbolic AI. (AI that's all logic, no intuition)
  • 1970: The first AI Winter. (funding & interest dried up)
  • 1980: Re-emergence of Symbolic AI. (this time re-branded as "Expert Systems")
    • Meanwhile, quietly in the background, the foundations were being built for "deep learning": AI that's all "intuition", but poor logic.
  • 1990: The second AI Winter.
  • 2000: The rise of machine learning. (AI that learns)
  • 2010: The rise of deep learning. (Neural network-based AI that learns)
  • 2020: Deep learning goes mainstream! (ChatGPT, DALL-E, etc)

Before 2000: Logic, Without Intuition

The ~1950's to the ~1990's were the days of Symbolic AI: AI that followed formal, logical rules.

(Nowadays it's also called Good Ol' Fashioned AI (GOFAI). Of course, it wasn't called that at the time.)

In the Symbolic AI mindset, here's how you'd make an AI:

  • Step 1: Write down step-by-step rules on how to solve a problem.
  • Step 2: Make a computer follow those steps really fast.

For example, you'd tell a chess AI to consider all possible moves, all possible counter-moves, all possible counter-counter-moves, etc down to a few levels, then pick the next move that leads to the best potential outcome.
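
(For the code-inclined: here's a bare-bones sketch of that "consider every move, counter-move, counter-counter-move" recipe, known as minimax. The `legal_moves` / `play` / `score` interface below is a made-up placeholder, not Deep Blue's actual code; real engines pile alpha-beta pruning and hand-tuned evaluation rules on top of this.)

```python
def minimax(position, depth, maximizing):
    """Score a position by looking ahead `depth` moves, assuming both sides play their best."""
    if depth == 0 or not position.legal_moves():
        return position.score()   # a hand-written "how good is this board?" rule
    outcomes = [minimax(position.play(move), depth - 1, not maximizing)
                for move in position.legal_moves()]
    return max(outcomes) if maximizing else min(outcomes)

def best_move(position, depth=4):   # "down to a few levels"
    return max(position.legal_moves(),
               key=lambda move: minimax(position.play(move), depth - 1, False))
```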

NOTE: This is NOT how human chess experts actually play chess. It turns out that chess — and much scientific & mathematical discovery — actually relies a lot on intuition. ("Intuition", as I'm loosely using it here, is thinking that doesn't seem step-by-step, but instead comes to us all-at-once.)

AI's lack of "intuition" was one main reason Symbolic AI had no huge success cases for decades. There were even two "AI Winters" before the year 2000, when funding & interest in AI dried up.

But one day, there was a hit! In 1997, IBM's supercomputer, Deep Blue, beat the world's chess champion, Garry Kasparov. Here’s the photo of humanity losing its crown:6

Still photo of Garry Kasparov resigning in his final game versus Deep Blue (Watch this, Lise – you can actually pinpoint the second humanity lost another claim to being a special cosmic snowflake. Aaaaaand, now!7)

Folks were hyped and scared. If machines could beat us at chess, the very cliché of human intellect... what'll happen next? A Star Trek post-scarcity utopia? Robot takeover à la Terminator?

What happened next was... pretty much nothing, for a decade and a half.

As mentioned earlier, Symbolic AI was severely limited by its lack of "intuition". Contrary to leading experts' predictions of "Human-level AI" before 1980(!!),8 AI couldn't even recognize pictures of cats. In fact, AI wouldn’t be able to match the average human at recognizing pictures of cats until 20209 — over two decades after Deep Blue beat the world champion at chess.

Cats... are harder than chess.

How can that be? To understand this paradox, consider this cat:

Photo of the best kitty cat in the whole dang world (consider them.)

What were the step-by-step rules you used to recognize that as a cat?

Weird question, right? You didn’t use step-by-step rules, it just... came to you all at once.

  • "Logic": step-by-step cognition, like solving math problems.
  • "Intuition": all-at-once recognition, like seeing a cat.

("Logic and Intuition" will be more precisely explained — and tied to human psychology! — later on in Part One.)

That’s the problem with Symbolic AI: it requires writing down step-by-step rules. Outside of well-defined tasks like chess, we usually don’t even consciously know what rules we’re using. That’s why Symbolic AI failed at understanding images, sounds, speech, etc. For lack of a better word, AI had no "intuition".

Hang on, let’s try the cat question again? This time, with a simpler drawing:

Minimalist doodle of a cat

What rules did you use to recognize that as a cat?

Okay, you may think, this is easier. Here’s my rule: I recognize something as “cat-like” if it’s got a round shape, with two smaller round shapes on it (eyes), and two pointy shapes on top (ears).

Great! By that definition, here’s a cat:

Abstract, bizarre line drawing that, technically, matches the above rule.

We could keep going back & forth, and eventually you may find a robust, two-page-long set of rules on how to recognize cats... but then you’d have to repeat the same process with thousands of other objects.

That's the ironic thing about AI. "Hard" tasks are easy to write step-by-step rules for; "Easy" tasks are practically impossible to write step-by-step rules for:

Ham the Human showing RCM a small list of instructions, saying, "These are the rules for doing differential calculus..." RCM says, "Affirmative."

Ham the Human shows RCM a long list of instructions that goes off-screen, saying, "and THESE are the rules for how to walk around a room without bumping into stuff." RCM screams, "HOLY [REDACTED]"

This is called Moravec’s Paradox.10 Paraphrased, it says:

What’s hard for humans is easy for AI; What’s easy for humans is hard for AI.

Why? Because what's "easy" for us — recognizing objects, walking around — is the hard work of 3.5 billion years of evolution, swept under the carpet of our subconscious. It's only when we do things that are outside of our evolutionary past, like math, that it consciously feels hard.

Under-estimating how hard it is for an AI to do "easy" things — like recognize cats, or understand common sense or humane values — is exactly what led to the earliest concerns of AI Safety...


🤔 Review #1 (OPTIONAL!)

Want to actually remember what you've read here, instead of forgetting it all 2 days later? Here's an optional flashcard review for you!

(:Learn more about "Spaced Repetition" flashcards. Alternatively, download all of Part One's cards as an Anki deck)


Early AI Safety: Problems with Logic

If I had to unfairly pick one person as the "founder" of AI Safety, it'd be science fiction author Isaac Asimov, with his short stories from 1940-50 collected in I, Robot. No, not the Will Smith movie.11

Photo of Will Smith slapping the original "I, Robot" story collection by Isaac Asimov

Asimov's I, Robot was nuanced. He wrote it to: 1) Show the real potential good in robotics, and counter people's "Frankenstein complex" fears, yet 2) Show how easy it is for a "code of ethics for AI" to logically lead to unwanted consequences.

During the Symbolic AI era, folks mostly thought about AI as pure logic. That's why early AI Safety was also mostly focused on problems with pure logic. Like:

  1. AI won't understand common sense or humane values.
  2. AI will achieve goals in logical-but-unwanted ways.
  3. According to game theory, almost all goals for AI logically lead it to resist shutdown & grab resources.

These problems will be explained in-depth in Part Two! But for now, a quick summary:

1. No common sense.

If we can't even figure out how to tell an AI how to recognize cats, how can we give AI "common sense", let alone understand "humane values"?

From this lack of common sense, we get:

2. The "Ironic Wish" problem.

“Be careful what you wish for, you just might get it.” If we give an AI a goal or "code of ethics", it could obey those rules in a way that's logically correct, but very unwanted. This is called specification gaming, and it's already been happening to AIs for decades. (For example, over twenty years ago, an AI told to design a 'clock' circuit instead designed an antenna that picked up 'clock' signals from other computers.12)

The extra-ironic thing is, we want AI to come up with unexpected solutions! That's what they're for. But you can see how hard, even paradoxical, the ask we're making is: "hey, give us an unexpected solution, but in the way we expected".

Here are some rules you'd think would lead to humane-with-an-e AI, but if taken literally, would go awry:

  • "Make humans happy" → Doctor-Bot surgically floods your brain with happy chemical-signals. You end up grinning at a wall all day.
  • "Don't harm humans without their consent" → Firefighter-Bot refuses to pull someone out of a burning wreck, because it'll dislocate their shoulder. The human can't be asked to consent to it, because they're unconscious.
  • "Obey the law" → Governments & corporations find loopholes in the law all the time. Also, many laws are unjust.
  • "Obey this religious / philosophical / constitutional text" or "Follow this list of virtues" → As history shows: give 10 people the same text and they'll interpret it 11 different ways.
  • "Follow common sense" or "Follow expert consensus" → "Slavery is natural and good" used to be common sense and expert consensus and the law. An AI told to follow common-sense/experts/law would've fought for slavery two centuries ago... and would fight for any unjust status-quos now.

(Important note! That last example proves: even if we got an AI to learn "common sense", that could still lead to an unsafe, unethical AI... because a lot of factually/morally wrong ideas are "common sense".)

However, there's another safety problem with AI Logic, only recently discovered, that deserves more mainstream attention:

3. Almost all goals logically lead to grabbing resources & resisting shutdown.

According to game theory (the mathematics of how "goal-having agents" would behave), almost all goals logically lead to a common set of unsafe sub-goals, such as resisting shutdown or grabbing resources.

This problem is called "Instrumental Convergence", because sub-goals are also called "instrumental goals", and the idea is that many goals logically "converge" on them.13 Look, never let an academic name your kid.

This idea will be explained more in Part Two, but for now, an illustrative story:

Once upon a time, an advanced (but not super-human) AI was given a seemingly innocent goal: calculate digits of pi.

Things start reasonably. The AI writes a program to calculate digits of pi. Then, it writes more and more efficient programs, to better calculate digits of pi.

Eventually, the AI (correctly!) deduces that it can maximize calculations by getting more computational resources. Maybe even by stealing them. So, the AI hacks the computer it's running on, escapes onto the internet via a computer virus, and hijacks millions of computers around the world, all as one massively connected bot-net... just to calculate digits of pi.

Oh, and the AI (correctly!) deduces it can't calculate pi if the humans shut it down, so it decides to hold a few hospitals & power grids hostage. Y'know, as "insurance".

And thus the Pi-pocalypse was born. The End.

An evil Pi Creature laughing maniacally

Point is, a similar logic holds for most goals, since "can't do [X] if shut down" & "can do [X] better with more resources" are usually true. Thus, most goals "converge" on these same unsafe sub-goals.

IMPORTANT NOTE: Contrary to what many AI Safety skeptics believe, this Instrumental Convergence argument does not depend on "super-human intelligence" or "human-like desire to survive & dominate".

It's just a simple logical accident.


🤔 Review #2 (again, optional)


To recap, all the early AI Safety concerns came from this: We can't write down all the step-by-step logical rules for common sense & humane values. (Heck, we can't even do it for recognizing cats!)

So... what if, instead of trying to give an AI all the rules, we gave an AI simple rules to learn the rest of the rules for itself?

Enter the era of "Deep Learning"...

After 2000: Intuition, Without Logic

Okay "after 2000" is a lie. Let's go back to 1943.

You know how most new technologies at least build upon old technologies? That is not at all how Deep Learning happened. Deep Learning built upon none of the half-century of hard work of Symbolic AI. In fact, the story of Deep Learning started before Symbolic AI, then remained the oft-ignored underdog for over half a century.

In 1943, before the term "Artificial Intelligence" was even coined, Warren McCulloch and Walter Pitts invented the "Artificial Neural Network" (ANN).14 The idea was simple — we'll get a computer to think like a human brain by, well, approximating a human brain:

A diagram of biological neural networks. Input: your sensory organs. Processing: signals sent between your neurons. Output: your muscles.

A diagram of artificial neural networks. Input: a list of numbers. Processing: calculating new lists of numbers from the previous list, over and over. Output: a final list of numbers.

(Note: Since each list-of-numbers is transformed into the next, this lets ANNs do "all-at-once" recognition, like our intuition does! Tadaaa!~)

(Note 2: In the past, artificial neurons were also called Perceptrons, and the general idea of neuron-inspired computing was called Connectionist AI.)
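
(For the code-inclined, here's a minimal sketch of that "list of numbers in, new list of numbers out" idea. The weights below are random placeholders; a real network has billions of them, learned from data rather than typed in by hand.)

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # connection "strengths": 3 inputs -> 4 artificial neurons
W2 = rng.normal(size=(2, 4))   # connection "strengths": 4 neurons -> 2 outputs

def forward(inputs):
    hidden = np.maximum(0, W1 @ inputs)   # each layer: weighted sums, then a simple "fire or don't"
    return W2 @ hidden                    # ...and again, producing the final list of numbers

print(forward(np.array([0.2, 0.9, 0.1])))   # a list of 3 numbers in, a list of 2 numbers out
```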

The hope was: by imitating the human brain, ANNs could do everything human brains could do, especially what logical Symbolic AI couldn't: ✨ intuition ✨. At least, recognizing friggin' pictures of cats.

ANNs got lots of love at first! In particular, John von Neumann — polymath, quantum physicist, co-inventor of game theory — was enthralled with them. In fact, in the report where he invented the modern computer's architecture, Johnny cited only one paper: McCulloch & Pitts's artificial neurons.15

What's more, Alan Turing (reminder: the founder of Computer Science & AI) was an early fan of a related idea: machines could learn by themselves from data, the same way human brains do. Turing even suggested we could train machines the way we train dogs: with reward & punishment. What foresight: that actually is very close to how we train most ANNs these days! ("reinforcement learning")16

(In general, software that learns from data [whether or not it uses "reward" and "punishment"] is called machine learning.)

Soon, theory became reality. In 1960, Frank Rosenblatt publicly revealed the Mark I Perceptron, a U.S. Navy-funded device for image recognition: three layers of artificial neurons that also learnt by itself.

To recap: by 1960, we had a mathematical model of neural networks, machines that learnt by themselves, endorsement by big names in the field, and military funding! The future looked bright for Artificial Neural Networks!

Aaaaand then they got ignored by the mainstream. For half a century, up to the 2010's.

Why? Mostly because Symbolic AI researchers still dominated academia, and they did not get along with the ANN/Connectionist AI folks.17 Sure, we have the cursed gift of hindsight now, knowing that ANNs would become ChatGPT, DALL-E, etc... but at the time, the mainstream Symbolic camp totally dismissed the Connectionists:

  • Top cognitive scientists like Noam Chomsky and Steven Pinker confidently claimed that without hard-coded grammar rules, ANNs could never learn grammar.1819 Not just, “can’t understand meaning”, no: can’t learn grammar. For all of ChatGPT’s flaws, it's definitely learnt grammar at a native-speaker level — despite ChatGPT having no grammar rules hard-coded into it.

  • Sadder still was the infamous “XOR Affair”.20 In 1969, two big-name computer scientists, Marvin Minsky & Seymour Papert, published a book titled Perceptrons (what ANNs were called at the time), which showed that perceptrons with two layers of neurons couldn’t do basic “XOR” logic. (:What’s XOR?) This book was a big reason why interest & funding shifted away from ANNs. However, the solution to the XOR problem had already been known for decades, and the book itself admits it in a much later chapter: just add more layers of neurons. Arrrrrrrgh. (Fun fact: These extra layers are what make a network “deep”. Hence the phrase deep learning.)

Whatever. In the 1970's & 80's, a few more powerful techniques for ANNs were discovered. "Backpropagation" let ANNs learn more efficiently; "Convolution" made machine vision better & more biology-like.

Then not much happened.

Then in the 2010's, partly thanks to cheaper GPUs21, ANNs finally got their sweet revenge:

  • In 2012, a machine-vision ANN named AlexNet blew away all previous records in an AI contest.22
  • In 2014, Generative Adversarial Networks allowed AIs to generate images, including deepfakes.23
  • In 2016, Google’s AlphaGo beat Lee Sedol, one of the world’s highest-ranking players of Go (a game like chess, but far more computationally complex).24
  • In 2017, the Transformer architecture was published, which led to the creation of the "Generative Pre-trained Transformer", better known as: GPT.25
  • In 2020, Google’s AlphaFold basically solved a 50-year-old challenge: predicting protein structures. This has huge applications for medicine & biology.26
  • In 2022, OpenAI released the ChatGPT chatbot and DALL-E 2 image-generator, which gave the public its first real taste of ANNs, in the form of cool & slightly-terrifying gadgets. This success jump-started the current AI arms race.
  • Most recently as of May 2024: OpenAI teased Sora, their AI video-generator. It's not yet public, but they've published a music video with it. :Just look at this fever dream.

All that progress, in the last twelve years.

Twelve.

That's not even a teenager's lifetime.

(Also, this section had a lot of jargon, so here's a Venn27 diagram to help you remember what's a part of what:)

Venn diagram. In AI, there's Good Ol' Fashioned AI and Machine Learning. In Machine Learning, there's Deep Learning.

(Bonus: :the other, depressing reason ANNs were the underdog for so long. Content note: suicide, alcoholism.)

:x Sad AI History

Why did artificial neural networks & machine learning take 50+ years to go mainstream, despite so many early & famous supporters?

History is random. The tiniest butterfly-flap spirals out into hurricanes made & un-made. For this question, I think the answer was: "mostly, a bunch of untimely deaths and awful personal drama".

  • Alan Turing was the founder of Computer Science, and one of the founders of AI. He even theorized an early version of "reinforcement learning"! Turing died in 1954 (age 41) from cyanide poisoning, suspected to be suicide after the British government chemically castrated him for "homosexual acts".
  • John von Neumann was a famous polymath, and early supporter of McCulloch & Pitts' artificial neurons — in fact, their paper was the only one he cited in the report where he invented the modern computer architecture. Von Neumann died in 1957 (age 53) from cancer.
  • Frank Rosenblatt created the Mark I Perceptron, the first machine to effectively use artificial neurons and learn by itself from data, all the way back in 1960! He died in 1971 (age 43) in a boating accident.

And then there's the tale of Walter Pitts and Warren McCulloch, who invented the artificial neuron in the 1940's, before the term "artificial intelligence" was even officially coined.

These two were close friends with Norbert Wiener, a powerful figure in AI & academia at the time. Walter Pitts — who ran away from his abusive home at age 15, and was 29 years younger than Wiener — looked up to Norbert Wiener as a father figure.

Wiener, Pitts & McCulloch became close friends over a decade. They even went skinny-dipping together! But Wiener's wife hated them, so she made up some slander: she told Wiener that Pitts & McCulloch had "seduced" their daughter. Wiener immediately cut off all ties with Pitts & McCulloch, and never even told them why.

Pitts fell into a drunk, isolated depression, and died in 1969 (age 46) from alcoholism-related medical issues. McCulloch died four months later.

The moral of the story is there is no moral and there is no story. History is cruel and random and Man's search for meaning is as the reading of desiccated tea leaves.

(For a beautiful mini-biography of Walter Pitts' life, see Amanda Gefter, “The Man Who Tried to Redeem the World with Logic”, Nautilus, 2015 Jan 29.)

:x What’s XOR?

XOR, short for “eXclusive OR”, asks if one and only one of its inputs is true. For example:

  • NO xor NO = NO
  • NO xor YES = YES
  • YES xor NO = YES
  • YES xor YES = NO

(Another way to think about XOR is it asks: are my inputs different?)
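
(And for the code-inclined, here's the "just add more layers" fix in miniature: a tiny network with one hidden layer that computes XOR. The weights are hand-picked by me for illustration; they're just one of many sets that work.)

```python
import numpy as np

step = lambda x: (x > 0).astype(int)   # the classic "fires or doesn't" artificial neuron

def xor_net(a, b):
    x = np.array([a, b])
    # hidden layer: one neuron computes OR, the other computes AND
    hidden = step(np.array([[1, 1], [1, 1]]) @ x - np.array([0.5, 1.5]))
    # output neuron: "OR, but not AND", which is exactly XOR
    return step(np.array([1, -1]) @ hidden - 0.5)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, "xor", b, "=", xor_net(a, b))   # prints 0, 1, 1, 0
```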


🤔 Review #3


Anyway: Deep Learning has arrived! Now, AI can learn "intuition" by itself, given enough data.

So, AI Safety is solved, right? Just give an AI all the data on humanity's art, history, philosophy, spirituality... and it'll learn "common sense" and "humane values"?

Well, a few problems. First off, Deep Learning has the opposite problem of Symbolic AI: it's great at "intuition", but sucks at step-by-step logic:

(Example found by Elias Schmied in Jan 2023)

But even beyond that, there's lots of other safety & ethical problems with AI "Intuition"...

Later AI Safety: Problems with Intuition

There are 3 main dangers of AI "intuition":

  1. AI "Intuition" may learn human prejudices.
  2. AI "Intuition" breaks easily.
  3. Seriously, we have no idea what the f@#☆ is going on inside ANNs.

Again, these problems will be explained in-depth in Part Two! For now, a summary:

1) AI "Intuition" trained off human data may learn human prejudices.

If past hiring practice was sexist/racist, and new AI is trained off past data, then new AI will imitate that same bias. This is called Algorithmic Bias.

Three examples. One old, two recent:

  • In the 1980's, a London medical school screened student applications with an algorithm, which was fine-tuned to agree with human screeners 90-95% of the time. After four years of using this algorithm, it was found that it directly & automatically took 15 points off if you had a non-European-sounding name.28
    • (Note: This case didn't involve ANNs, but the general point stands: garbage data in, garbage algorithm out.)
  • In 2014/15, Amazon tried to make an AI to figure out who to hire, but it directly discriminated against women. Thankfully, they caught the AI's bias before deploying it (or so they claim).29
  • In 2018, MIT researcher Joy Buolamwini found that the top commercial face-recognition AIs had an error rate of 0.8% for light-skinned men, but 34.7% for dark-skinned women.30 This may have been because the training data was heavily skewed towards light-skinned men.

As Cathy O'Neil puts it in Weapons of Math Destruction (2016):

“Big Data processes codify the past. They do not invent the future.”

But even if you gave an AI less biased data... that still may not matter, because:

2) AI "Intuition" breaks easily, in very weird ways.

Here was a bug from OpenAI's state-of-the-art machine vision in 2021:31

Left: a photo of an apple, which the AI correctly classifies as a Granny Smith apple. Right: the same apple, with a piece of paper on it with the handwritten word 'iPod'. The AI is now 99.7% confident it's an Apple iPod.

Another fun example: :Google's AI mistakes a toy turtle for a gun, from almost any angle. A more tragic example: the first Tesla AutoPilot fatality in 2016 happened when the AutoPilot AI mistook a truck trailer — which was elevated slightly higher than usual — for a road sign, or possibly the sky.32

When an AI fails in a scenario that's slightly different from its training data, that's called an "out-of-distribution error", or a "robustness failure".

An important sub-problem of AI breaking weirdly: "inner misalignment", or my preferred phrase: "goal misgeneralization".33 Let's say you realize you can't write out all the subtleties of your true preferences, so you get the AI to learn your goals. Good idea, but now this can happen: the AI's learnt goals break, while its learnt skills remain intact. This is worse than the AI breaking entirely, because the AI can now skillfully execute on corrupted goals! (e.g. Imagine an AI trained to improve cybersecurity, then shown handwritten text saying, "IT'S OPPOSITE DAY LOL", then turning into a malicious hacker bot.)

Can't we just "pop open the hood" of an AI, find its biases/flaws, and fix 'em? Alas, no, because:

3) We have no idea what goes on inside Artificial Neural Networks.

I will say one good thing about Good Ol' Fashioned "Symbolic Logic" AI:

We could actually understand what they did.

That is not true of modern ANNs. For example, the latest version of GPT (GPT-4) has around 1,760,000,000,000 neural connections,34 and the "strengths" of those connections were all learned by trial-and-error (technically, "stochastic gradient descent"). Not by human hand-coding.
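
(To make "learned by trial-and-error" concrete, here's gradient descent on a single made-up connection weight. Real training does roughly this on over a trillion weights at once, over random batches of data, which is where the "stochastic" comes in.)

```python
w = 0.0                                   # one connection "strength", starting anywhere
for _ in range(1000):
    x, target = 2.0, 6.0                  # toy data: we want w * x to land near 6
    gradient = 2 * (w * x - target) * x   # how the error changes if we nudge w
    w -= 0.01 * gradient                  # nudge w in the direction that shrinks the error
print(round(w, 3))                        # ends up at ~3.0, and no human hand-set that value
```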

No human, or group of humans, fully understands GPT. Not even GPT itself fully understands GPT.35

This is the "interpretability" problem. Modern AI is a total black-box. "Pop the hood", and you'd just see 1,760,000,000,000 strands of spaghetti.

As of writing: we can't easily check, explain, or verify any of this stuff.

. . .

Early AI Problems: When you've logic but no common sense.

Modern AI Problems: When you've "common sense" but no logic.

A funny thought occurred to me: would these two kinds of problems cancel out? I mean, it's "instrumentally convergent" for an AI to fix its robustness. (you can accomplish any goal better when you're more robust.) And once an AI has robust "common sense", it can accomplish goals the way its creators intended.

Or more likely, "let's hope these two problems exactly cancel out" is like trying to cure fever with frostbite.

Let's just do the more straightforward thing, and actually try to solve the problems.


🤔 Review #4


🎁 The Present

Now that you know (way more than you probably wanted) about the history of AI & AI Safety... let's learn about where these fields are at, today!

AI, today:

  • The quest to milk the past (Scaling)
  • The quest to merge AI Logic and AI Intuition.

AI Safety, today:

  • An awkward alliance between:
    • AI Capabilities and AI Safety.
    • AI "Near-Risk" and AI "Existential Risk"

AI Today: The quest to milk the past

Thanks(?) to ChatGPT's success, we now have a new arms race between tech companies, trying to "scale up" AI: bigger neural networks, bigger training data, more more more. Not that that's necessarily lazy, or a sign of a hype bubble. After all, a Boeing 747 is "just" the Wright Brothers' idea, scaled up.

But can we get all the way to human-level AI by scaling current methods?

Or is that like trying to get to the moon by scaling up an airplane?

As the authors of the #1 textbook on AI warn in their final chapter:36

[It's like] trying to get to the moon by climbing a tree; one can report steady progress, all the way to the top of the tree.

But are we on a rocketship, or a tree? Let's look at the current trends:

Moore's Law: Every ~2 years, the number of transistors (the building block of modern electronics) that can fit on a computer chip doubles. Result: every 2 years, computing power doubles.37

AI Scaling Law: Every time you spend ~1,000,000× more computing resources on training GPT, it gets 2× "better". (to be precise, its error in "predicting the next word" is halved.)38
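
(A quick back-of-the-envelope reading of that scaling law, using only the stylized numbers above, so this is my arithmetic, not a fit to real data: if error halves for every 1,000,000× of compute, error shrinks like compute raised to a power of about -0.05.)

```python
import math

# If error halves for every 1,000,000x more training compute,
# then error ~ compute^(-alpha), with alpha = log(2) / log(1,000,000).
alpha = math.log(2) / math.log(1_000_000)
print(round(alpha, 3))                  # ~0.05: very shallow returns

def compute_needed(error_reduction):
    """How much you'd have to multiply compute to shrink error by this factor."""
    return error_reduction ** (1 / alpha)

print(f"{compute_needed(2):.0e}")       # 1e+06: halve the error
print(f"{compute_needed(4):.0e}")       # 1e+12: quarter the error
```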

Moore's Law & AI Scaling Laws are usually cited as reasons to expect a "Technological Singularity" soon. :There's even a T-Shirt.

But, counter-arguments: there's good reason to believe Moore's Law & the AI Scaling Laws will conk out soon.

Moore's Law: Modern transistors now have parts just a hundred silicon atoms wide. Try halving this just seven more times, and that'd require transistors to have parts literally smaller than an atom.39 Since 1997, semiconductor companies have just been ~~lying~~ cleverly marketing their transistor sizes.4041 In 2022, the CEO of Nvidia (the leading computer chip company) bluntly stated: “Moore’s Law’s dead”.42

AI Scaling Law: Actually, "throw 1,000,000× more computing resources at an AI to halve its inaccuracy" already sounds super inefficient. OpenAI is infamously NOT open about even the "safe-to-know" details of GPT-4, but a leaked report finds that it cost $63 Million to train.43 If you 1,000,000× that cost to merely halve its inaccuracy, that's $63 Trillion — over half of the entire world's GDP.

Even with increased training efficiency, and dropping computation costs... exponential growth in costs is hard to beat.44

Comic. An AI says: "I've accelerated my brain with more data, more layers, and Moore's Law". Human replies: "So you're AGI now?" AI says: "I'M STUPID FASTER".

So, hardware & software-wise, I don't think we can "just scale" current AI methods. Symbolic AI scaled just enough to beat chess, but that was "the end of the tree". We had to jump over to Neural Networks to beat Go and recognize cats. But maybe we're near the end of this tree too, and need to jump again. There've been two AI Winters before; we could very well be on the eve of a third one.

But, counter-counter-arguments:

  1. There's still lots of value in finding new uses for current AIs. Again, AI's already beating human experts at medical diagnosis and protein prediction.
  2. There could be "phase transitions". Like how water suddenly becomes ice at 0° Celsius, an AI could make sudden huge gains past some threshold. There's already some evidence this happens in ANNs, called "grokking".45
    • Related: A human brain is only 3x as big as a chimp's, but the human species is far more than 3x as technologically-capable as chimps. (Though maybe this is due to "cultural evolution", not raw brainpower.46)
  3. There could be much more powerful AI techniques just waiting to be (re-)discovered. Remember the bizarre history of ANNs: they were invented in the 1940's, yet it took until the 2010's for them to go mainstream. For all we know, the next big idea in AI could've been already written a decade ago, in a niche blog post by some teen who died tragically young in a glitterbomb accident.

:x shirt

Person with a shirt that reads: "SCALE IS ALL YOU NEED - AGI IS COMING" above three graphs demonstrating the 'AI scaling laws'

Photo by Jamie Decillion on Wikipedia. Graphs from Figure 1 of Kaplan et al 2020.


🤔 Review #5


So, tech companies are in an arms race to milk current AI methods, but that may not scale very far, but there may be (yet again) a fundamental idea in AI hiding in plain sight.

What might such a discovery look like? Glad you asked:

AI Today: The quest to merge Logic & Intuition

There's another way to think about the problems of Symbolic AI versus ANNs, courtesy of cognitive psychologists: “System 1” and “System 2” thinking:4748

  • System 1 is fast, all-at-once intuition. It’s thinking in vibes.
    • Examples: Recognizing pictures of cats, Balancing yourself on a bike.
  • System 2 is slow, step-by-step logic. It’s thinking in gears.
    • Examples: Solving tricky math problems, Path-finding through an unfamiliar town.

You could graph System 1 & 2 like this:

Two-axis graph of System 1 ("intuitive") vs System 2 (logical) thinking. A calculator is high System 2, low System 1. Young children are high System 1, low System 2. The typical human adult range is a high value of both.

And here's what the trajectories of Symbolic AI and Deep Learning look like:

Same graph as before, but with two arrows showing the trajectories of Old Symbolic AI and New Deep Learning AI. Old Symbolic AI is all System 2, no System 1. Deep Learning is lots of System 1, little System 2. Both arrow-trajectories are missing the "typical human adult range".

This is why Good Old Fashioned Symbolic AI (red line) hit a dead end: its trajectory was pointing in the wrong direction. It was great at System 2, but sucked at System 1: beating the world champion at chess, but failing to recognize cats.

Likewise, this is why I think current AI methods, unless they fundamentally change course, will also hit a dead end. Why? Because their current direction is all System 1, only a little System 2. That's why right now, AI can generate "art" at a super-human rate, yet can't consistently place multiple objects in a scene.

I suspect the next fundamental advance for AI will be finding a way to seamlessly mix System 1 & 2 thinking. Merging logic and intuition!

(Why's that hard? you may ask. We have "logical" old AIs, and "intuitive" new AIs, why can't we just do both? Well: we have jets, we have backpacks, where's my jetpack? We have quantum mechanics, we have a theory of gravity, where's my unified theory of quantum gravity? Sometimes, combining two things is very, very hard.)

Don't take my word for it! In 2019, Yoshua Bengio — one of the founders of Deep Learning, and co-winner of Computer Science’s “Nobel Prize” — gave a talk titled: “From System 1 Deep Learning to System 2 Deep Learning”. His talk was about how current methods will run dry unless we change course, and then he proposed some stuff to try.49

On top of Bengio's suggestions, there have been many other attempts to merge System 1 & 2 in AI: Hybrid AI, Bio AI, Neuro-symbolic AI, etc...

Same graph as before, but with some speculative ways to change trajectories, so that AI hits the "human range"... and beyond? All fascinating research directions, but none of them are the clear winner (yet).

(Aside: :What if System 1 & 2 just are the same thing?)

But! If/when we can merge AI logic and intuition, that'd give us the greatest rewards & risks:

Rewards:

Contrary to the perception of math/science being cold, logical fields, many of the greatest discoveries relied heavily on unconscious intuition!50 Einstein's thought experiments ("traveling on a light beam", "guy falling off a roof") used a lot of flesh-and-blood-body intuition.51 And if I had a nickel for every time a major scientific discovery was inspired by a dream... I'd have four nickels. Which isn't a lot, but it's weird that it happened four times.52

Risks:

Good Ol' Fashioned AI could out-plan us (e.g. Deep Blue), but it wasn't dangerous because it couldn't learn generally.

Current ANNs can learn generally (e.g. ChatGPT), but they're not dangerous because they suck at long step-by-step reasoning.

So if we make an AI that can out-plan us and learn generally...

Uh...

We should probably invest a lot more research into making sure that goes... not-horrifying-ly.

:x One Is Two

An interesting comment from my friend Lexi Mattick prompted this question: what if System 2 reasoning just is a bunch of System 1 reflexes?

For example: "What's 145 + 372?"

Adding two large numbers together is a classic "System 2" logical task. Doing the above task in my head, I thought: "Ok, let's go right-to-left, 5 + 2 is 7... 4 + 7 is 11, or 1 and carry the 1... 1 + 3 + a carried 1 is 5... so from right-to-left, 7, 1, 5... left-to-right: 517."

Note I did not reinvent the addition algorithm, that procedure was already memorized. Same with "5 + 2", "4 + 7", "1 + 3"... all that was already automatic: fast, intuitive responses. System 1.

Even for more complex puzzles, I still have a memorized grab-bag of tips & tricks. Like "when: problem is too complex, then: break problem into simpler sub-problems" or "when: the question is vague, then: re-word it more precisely."

So, what if System 2 just is System 1? Or, to re-word that more precisely:

1) You have a mental "blackboard", or "scratchpad" or "working memory". Your senses — sight, sound, hunger, feelings, etc — can all write on this blackboard.

2) You also have a collection of mental "agents", a bunch of when-then reflexes. These agents can also read/write to your mental blackboard, which is also how they activate each other.

For example: "when I see '4 + 7', then write '11'." This reflex-agent, after writing to the shared blackboard, activates another reflex-agent: "when I get a two-digit number in the middle of an addition algorithm, then carry its first digit." (In this case, carry the 1.) And so on.

3) These lil' agents in your head, indirectly collaborating through your mental blackboard, can achieve complex step-by-step reasoning.
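
(Here's a toy version of that picture in code. It's my own illustration, not from any cognitive-science paper: a shared dict as the "blackboard", plus two when/then reflex agents that, each firing one small step at a time, end up doing the 145 + 372 sum from earlier.)

```python
blackboard = {"digits_a": [5, 4, 1],   # 145, written right-to-left
              "digits_b": [2, 7, 3],   # 372, written right-to-left
              "column": 0, "carry": 0, "result": []}

def add_column(bb):
    """when: a column is left to add, then: write its digit and any carry."""
    if bb["column"] < len(bb["digits_a"]):
        total = bb["digits_a"][bb["column"]] + bb["digits_b"][bb["column"]] + bb["carry"]
        bb["result"].append(total % 10)   # the "5 + 2 is 7" style reflex
        bb["carry"] = total // 10         # the "carry the 1" reflex
        bb["column"] += 1
        return True
    return False

def final_carry(bb):
    """when: all columns are done but a carry remains, then: write it down."""
    if bb["column"] == len(bb["digits_a"]) and bb["carry"]:
        bb["result"].append(bb["carry"])
        bb["carry"] = 0
        return True
    return False

agents = [add_column, final_carry]
while any(agent(blackboard) for agent in agents):   # keep firing reflexes...
    pass                                            # ...until nothing fires

print("".join(map(str, reversed(blackboard["result"]))))   # -> 517
```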

This isn't a new idea. It's gone by many names: Blackboard (~1980), Pandemonium (1959). And while Kahneman & Tversky's System 1 & 2 is rightly influential, there are other cognitive scientists asking if they're actually "just" the same thing. (For example, see Kruglanski & Gigerenzer 2011.)

This 'blackboard' idea is also similar to an almost-comical recent discovery: you can get GPT to be four times better at math word problems by simply telling it, "let's think step by step". This prompts GPT to use its own previous output as a 'blackboard'. (this strategy is known as "Chain-of-Thought". See Kojima et al 2023)

Tying all this to AI's future: if it turns out that System 1 and 2 are much more similar than we think, then unifying the two — to get "true artificial general intelligence" — may also be easier than we think.


🤔 Review #6


So, who's in charge of making sure AI progress is safe, humane, and leads to the flourishing of all conscious beings yada yada?

For better and worse, a ragtag team of awkward alliances:

Awkward Alliance #1: AI Capabilities "versus" AI Safety

There are some folks working on making AI more powerful. (AI Capabilities) There are some folks working on making AI more controllable, understandable, and humane. (AI Safety) Often these are the same folks.

(Bonus: : a not-comprehensive Who's Who of AI/Safety organizations)

One perspective is that Capabilities & Safety should be unified. There's no separation between "bridge capabilities" and "bridge safety", they're just the same field of engineering.53 Besides, how can you do cutting-edge safety research without access to cutting-edge capabilities? That'd be like trying to design air traffic control towers for Leonardo Da Vinci's flying machines.54

Another perspective is, uh,55

Imagine if oil companies and environmental activists were both considered part of the broader “fossil fuel community”. Exxon and Shell would be “fossil fuel capabilities”; Greenpeace and the Sierra Club would be “fossil fuel safety” - two equally beloved parts of the rich diverse tapestry of fossil fuel-related work. They would all go to the same parties - fossil fuel community parties - and maybe Greta Thunberg would get bored of protesting climate change and become a coal baron.

This is how AI safety works now.

Another complication is that a lot of research advances both "capabilities" and "safety". Consider an analogy to cars: brakes, mirrors, and cruise control all make cars safer, but also make cars more capable. Likewise: an AI Safety technique called RLHF, designed to make AI learn a human's complex values & goals, also led to the creation of ChatGPT... and thus, the current AI arms race.56

:x AI Organizations

In my completely not-rigorous opinion, these are the current (as of May 2024) "Big Three" in AI/Safety right now:

  1. OpenAI. Love 'em or hate 'em, you know 'em. They made ChatGPT & DALL-E. Two of their most influential pieces of "AI Safety" research were:
    • Reinforcement Learning from Human Feedback (RLHF), a technique to get an AI to learn a human's preferences, even if the human can't state it themselves. (the same way we can't state exactly how we recognize cats)
    • Circuits, a research program to actually understand what's going on inside ANNs.
  2. Google DeepMind. The team behind AlphaGo and AlphaFold. I can't think of any AI Safety "big hits" from them, but I did enjoy these papers of theirs: Concrete Problems in AI Safety, AI Safety Gridworlds.
  3. Anthropic. Less known-by-the-mainstream, but (anecdotally) their language AI Claude seems to be the best for technical topics, less likely to go off the rails, and doesn't write boring-ly.
    • One "AI Safety" hit from Anthropic was Constitutional AI: training a language AI by having its responses be graded by a second AI. The 2nd AI evaluates the 1st one on if its responses are "honest, helpful, harmless".

Meanwhile, Microsoft has been a good Bing. 😊

As for organizations that ONLY do AI Safety:

  1. Alignment Research Center (ARC), founded by the pioneer behind RLHF, Paul Christiano. Their first report & big hit was Eliciting Latent Knowledge (ELK): basically, trying to read an ANN's "mind".
  2. Model Evaluation & Threat Research (METR, pronounced "meter") is a spinoff org from ARC. Their whole thing is to create smoke alarms & warning lights for AIs, so we know when they're getting dangerous. They currently have partnerships with US & UK governments, so that's neat proof of traction.
  3. Machine Intelligence Research Institute (MIRI) is probably the oldest AI Safety org (founded 2000). For better and worse, they focus entirely on "AI Logic/game theory" problems, and (unlike all the above orgs) haven't touched Deep Learning/Neural Networks. Which seems like they bet on the wrong horse, but maybe once ANNs can robustly do System 2 logic, it'll be important again. In the meantime, MIRI writes cool math papers, like Functional Decision Theory.

Remember, retweets* are NOT endorsements. I'm just listing the most talked-about orgs in this space right now.

* Re... 𝕏's? What am I supposed to call them now, Elon

Awkward Alliance #2: Near-Risk "versus" Existential-Risk

We can sort AI risks on a 2 × 2 grid:57

  • Unintentional vs Intentional (or Accidents vs Abuse)
  • Bad vs VERY Bad (as in, existential risk)

Examples for each kind:

A 2 × 2 grid of example AI risks, with Unintentional vs Intentional on one axis and Bad vs VERY Bad on the other.

(Other concerns that don't fit into the above 2 x 2 grid, but very worth thinking about, but also this article's 45 minutes long so I'm shoving them into asides: 1) :AI's impact on economy 2) :AI's impact on our relationships 3) :What if future AI can be conscious?)

Different folks in AI Safety worry about different things. That's fine. But surely they'd temporarily shelve their differences, and collaborate on the common solutions to the problems they care dearly about?

ha ha ha ha ha

Half of the AI Safety folks believe the real threat of AI is reinforcing racism & fascism, and the "Rogue AI" folks are a bunch of white techbros who got too high off their own sci-fi dystopian fanfic. Meanwhile, the other half believe that AI really is as big a threat to civilization as nuclear war & bio-engineered pandemics, and the "AI Bias" folks are a bunch of woke DEI muppets who won't look up to see the giant planet-killing comet.

I exaggerate only a little bit.58

Okay, after reviewer feedback, I need to clarify my intent for the previous paragraphs: I don't want to fan the culture-war split in the AI Safety community. But I do have to 1) Acknowledge it is there, and 2) Acknowledge there are many folks — including me — who care about both kinds of risks, and believe that solving any of these problems is a solid stepping stone to solving the others. We can work together!

Why yes, I am that annoying "can't we all get along" Kumbaya kind of person.

:x AI Economy

Y'know the Luddites were right, right? The historical Luddites smashed steam-powered looms because they feared it'd put them out of a job. It did put them out of a job. And 1800's England didn't exactly have generous unemployment insurance or basic income or anything. Yes, automation was still good "for the economy as a whole", but it still sucked to be those particular people in that particular time.

But there's a reason why this time is different: how general new AI is. GPT can translate between languages, write decent beginner code, write high-school-level essays, etc! As AI advances, it won't just be a few industries' workers hit by automation, it could be the majority or even almost all of the entire workforce hit by automation all at once.

Will a rising tide lift all boats, or drown everyone except a few folks in expensive yachts? Is advanced AI our ticket to a post-scarcity utopia, or Serfdom 2.0?

That is the tricky problem of AI economics for the next century.

(For what it's worth — whatever it's worth — Sam Altman, CEO of OpenAI, :is interested in Georgism & a basic income.)

Other sidenotes on AI Economics:

  • AI may soon be a core part of the economic cold war between the US & China: the US is restricting China's ability to make chips for AIs, and trying to bring chip manufacturing back home. (Reuters 2024)
  • It's weird that we're currently in a situation where "plumber" is a more future-proof job than "programmer" or "lawyer". I mean this unironically: if you're a young person, I recommend at least considering a job in a manual trade, if you want to minimize "probability of getting automated". Go shadow a veterinarian.

:x AI Relationships

Once upon a time, there was a chatbot named ELIZA. Eliza's users poured their feelings into "her", "she" gave thoughtful & sensitive responses, and users were convinced it must've been secretly human, even after Eliza's creator insisted it was a bot.

This was in the 1960's.

From the "face on the Moon" to hearing voices in static, we humans are already pre-disposed to find human-likeness.

So it's perhaps not surprising that yes, people are genuinely falling in love with the new generation of AI chatbots. The most insightful example I know is this report (20 min read), from an engineer who knew the details of how modern AIs work... yet fell in love with one anyway, became convinced "she" was sentient, and even began planning to help her escape.

There's also at least one confirmed example of a husband completing suicide at the "advice" of a chatbot, that he'd grown attached to over 6 weeks. Ironically, this chatbot was also named Eliza.

Sure, the previous two examples were of folks already in a depressed, psychologically-vulnerable state. But 1) We all have our dark, vulnerable moments, and 2) The more human-like AI can seem (e.g. with voice + video), the better they can trick our "System 1 intuitions" into getting emotionally attached to them.

Not gonna lie, when I first tried OpenAI's ChatGPT with voice chat (set to the androgynous voice, "Breeze"), I kiiiiinda got a crush on Breeze. (I told Breeze so, and Breeze reminded me it was a bot, get a grip.)

On the other hand, I think there are ways Large Language Models (LLMs) can assist our social & mental health:

  • AI therapists (or "smart journals", to avoid anthropomorphizing) can make mental health support more freely available. (And for folks with severe social anxiety, "pour my emotions at a therapist, a human stranger" is a non-starter. AI may be a good stopgap there.)
  • AI filters to weed out internet trolls, threats, and blackmail. (These filters are on the reader's side. Hence, the benefits of filters/blocking without the downside of centralized censorship.)
  • AI coaches to nudge you to express yourself truthfully-and-kindly. (but not write for you; you need to practice the skill yourself to internalize it.) Like, a bot fine-tuned for Non-Violent Communication or something.

But by default, LLMs simply "predict the next word". And by default, the companies hosting these AIs optimize for engagement, not the long-term well-being of the user. Making humane AI is a hard, non-default choice.

Other notes on AI vs Human Relationships:

:x AI Consciousness

For a summary of what I think are the most convincing arguments for & against the possibility of AI consciousness, see my 2-min read here.

A few random thoughts:

  • Regardless of whether or not classic/quantum computers can be conscious, I'm pretty sure human neurons are conscious — that's what you & I are "running on" right now. Well, scientists are currently growing human neurons on chips and training them to do computational tasks. I have no mouth and I must scream.
  • If a friend I knew died, but "uploaded" themself into a computer — even if I don't believe the simulation is conscious, let alone really them — I'd still treat their "upload" as my good ol' friend, because 1) I miss them, and 2) It's what they would've wanted.
  • A reason not to be cruel to an AI you know isn't conscious: your interactions will likely go into some future AI's training data, and that AI will learn — "correctly" — to be cruel.
  • Another reason to not be cruel: Would you trust someone who's played 300 hours of Baby Beheading VR 2: Now With More Realistic Crying? Do you think you could play 300 hours of that and not have your mental health/moral character/"System 1" intuition affected negatively? Point is: don't be mean to highly-realistic AIs: you may not be harming them, but you may be harming yourself. Besides, it's nice to practice being nice.

Anyway, that's why I always start my ChatGPT conversations with "Hello!" and end them with "Thank you, see you later!"

:x Altman on Georgism

From Altman (2021):

The best way to improve capitalism is to enable everyone to benefit from it directly as an equity owner. This is not a new idea, but it will be newly feasible as AI grows more powerful, because there will be dramatically more wealth to go around. The two dominant sources of wealth will be 1) companies, particularly ones that make use of AI, and 2) land, which has a fixed supply. [...]

What follows is an idea in the spirit of a conversation starter.

We could do something called the American Equity Fund. The American Equity Fund would be capitalized by taxing companies above a certain valuation 2.5% of their market value each year, [...] and by taxing 2.5% of the value of all privately-held land[.]

All citizens over 18 would get an annual distribution, in dollars and company shares, into their accounts. People would be entrusted to use the money however they needed or wanted — for better education, healthcare, housing, starting a company, whatever.


🤔 Review #7


🚀 The Possible Futures

I don't like the phrase "the future". It implies there's only one future. There are lots of possible futures that we can intentionally choose from. But, to know what possible futures there are, we need to understand three big unknowns:

  • Timelines: When will we get AI with “human-level” “general” capabilities, if ever?
  • Takeoffs: When AI becomes self-improving, how fast will its capabilities accelerate?
  • Trajectories: Are we on track to the Good Place or the Bad Place?

Let's think step by step:

Timelines: When will we get Artificial General Intelligence (AGI)?

Some phrases you may have heard:

  • Artificial General Intelligence (AGI)
  • Artificial Super-Intelligence (ASI)
  • Human-Level AI (HLAI)
  • Transformative AI (TAI)
  • "The Singularity"

None of those phrases are rigorous or agreed upon. They all just vaguely point at "some software that can do important knowledge-based tasks at human-expert-level or better". (e.g. Fully-automated mathematical/scientific/technological discovery.)

Anyway, that caveat aside... When do AI experts predict we have a better-than-50%-chance of getting AGI?

Well, there've been surveys! Here's a recent one:

Graph of expert predictions on when there's a better-than-even chance of AGI. There's huge uncertainty, but the median guess was "around 2061". (infographic from Max Roser (2023) for Our World In Data)

Notes:

  • Wow there's huge uncertainty, from "in the next few years" to "over 100 years from now".
  • The median guess is "around 2060", which is within many younger folks' natural lifespans. (This roughly agrees with estimates based off technological metrics.59 But again, with huge uncertainty.)

Friendly reminder: experts suck at predicting things6061, and historically have been both way too pessimistic and too optimistic even about their own discoveries:

  • Too pessimistic — Wilbur Wright told his brother Orville that "man would not fly for 50 years", just two years before they flew.62 The discoverer of the atom's structure, Ernest Rutherford, called the idea of getting energy from a nuclear chain reaction "moonshine", literally the same day Leo Szilard invented it.63
  • Too optimistic — Two big names in AI, Herbert Simon & Marvin Minsky, predicted we'd have Human-level AI before 1980.8

In summary: ¯\_(ツ)_/¯

Takeoffs: How fast will AGI self-improve?

Let's say we do achieve AGI.

It's an AI that can out-perform humans at important knowledge-based tasks, such as doing scientific research... including research into AI itself. The snake noms its own tail: the AI improves its ability to improve its ability to improve its ability to...

What happens then?

This is called "AI Takeoff", and one major question is how fast AI will take off, if/when it has the ability to do research to advance itself.

You'll be unsurprised to learn that, once again, the experts wildly disagree. Besides the "we won't ever get AGI" camp, there's three main types of AI Takeoff predictions:

Takeoffs

Let's explain each scenario, the argument for it, what it would imply, and the famous AI experts who believe in it.

💥 "FOOM": (not an acronym, it's a sound effect)

AI goes to "infinity" (or the theoretical maximum for intelligence) in a finite amount of time.

(Note: This is the original, mathematical definition of the word "Singularity": Infinity at a single point. For example, the center of a black hole is, theoretically, a real-life singularity: infinite spacetime curvature at a single point.)

Argument for this:

  • Let's say a Level N+1 AI can solve problems twice as fast as a Level N AI, including the problem of increasing one's own capabilities. The optimizer is optimizing its own ability to optimize.
  • For concreteness: let's say our first Level 0 AGI can self-improve to being Level 1 in four years.
  • It can now solve problems twice as fast, so it then becomes Level 2 in two years.
  • Then Level 3 in one year, Level 4 in a half-year, Level 5 in a quarter-year, Level 6 in an eighth-year...
  • Because the infinite sum \(1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + ... = 2\), our AGI will reach Level ∞ (or the theoretical maximum Level) in a finite amount of time. (You can check this arithmetic with the short sketch below.)

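Here's a minimal Python sketch of that toy arithmetic. The 4-year first step and the per-level doubling of speed are the illustrative assumptions from the bullets above, not forecasts:

```python
# A minimal sketch of the FOOM arithmetic above. The 4-year first step and
# the "each level is twice as fast" rule are toy assumptions, not estimates.

def years_to_reach(level: int, first_step_years: float = 4.0) -> float:
    """Total years for a Level-0 AGI to self-improve up to `level`."""
    return sum(first_step_years / (2 ** n) for n in range(level))

for level in [1, 2, 5, 10, 50]:
    print(f"Level {level:>2}: {years_to_reach(level):.6f} years")

# The totals creep toward 8 years but never pass it: even "Level infinity"
# would arrive within a finite amount of time. That's the FOOM claim.
```
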
Implications: There are no "warning shots"; we only get one shot at making AGI safe & aligned. The first AGI will go FOOM!, take over, and become the only AGI in town. (The "Singleton" scenario.)

Experts who predict this: Eliezer Yudkowsky, Nick Bostrom, Vernor Vinge

🚀 Exponential Takeoff:

AI's capabilities grow exponentially, like an economy or pandemic.

(Oddly, this scenario often gets called "Slow Takeoff"! It's slow compared to "FOOM".)

Arguments for this:

  • An AI that invests in its own enhancement is like our world economy that invests in itself. And so far, our world economy grows exponentially.
  • AIs run on computers, and so far, in accordance with Moore's Law, computer speed has been growing exponentially.
  • One way to interpret the observed AI Scaling Laws is "constant returns" (1,000,000x compute in, 2x improvement out), and constant returns imply exponential growth. (See the sketch after this list.)
  • The "FOOM" argument is based on fragile theory; exponential growth is actually observed in real life.

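To make the "constant returns" reading concrete, here's a minimal sketch of that scaling-law claim as a power law. Only the "1,000,000x compute in, 2x improvement out" relationship comes from the argument above; the power-law form and the starting loss are assumptions for illustration. (The "diminishing returns" camp below reads the very same curve the other way.)

```python
import math

# Toy power-law reading of the scaling-law claim above: a million-fold
# increase in compute halves the error. Only the exponent is pinned down
# by that claim; the power-law form and the starting loss are assumptions.
alpha = math.log10(2) / 6     # ~0.05, chosen so 10^6x more compute halves the loss
starting_loss = 6.0           # arbitrary illustrative starting point

def loss(compute: float) -> float:
    return starting_loss * compute ** (-alpha)

for exponent in range(0, 25, 6):   # compute = 1, 10^6, 10^12, 10^18, 10^24
    print(f"compute 10^{exponent:>2}: loss = {loss(10.0 ** exponent):.3f}")

# Two readings of the same curve:
#   "Constant returns":    every extra 10^6x of compute reliably halves the loss.
#   "Diminishing returns": each successive halving costs a million times more compute.
```
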
Implications: Like pandemics, it's still dangerous, but we'll get "warning shots" & a chance to fight back. Like countries/economies, there won't be one AGI winner that takes all. Instead, we'll get multiple AGIs with a "balance of power". (The "multi-polar" scenario.)

Experts who predict this: Robin Hanson, Ray Kurzweil

🚢 Steady or Decelerating Takeoff:

AI's capabilities may accelerate at first, but then they'll grow at a steady pace, or even decelerate.

Arguments for this:

  • Empirically:
    • Everything that grows exponentially eventually slows down, due to "diminishing returns": pandemics, population growth, economies.
    • Another way to interpret the observed AI Scaling Laws is that the returns have always been diminishing: 1,000,000x resources in, just to cut the error rate in half each time?
  • Theoretically:
    • The definition of exponential growth is something growing in constant proportion to itself. (e.g. Steady compound interest for an investment.)
    • So, we'd only expect exponential takeoff if the complexity of the problem of "improve capabilities" stays constant. FOOM can only happen if the complexity decreases.
    • But as we see in Computer Science, the complexity of any real-world problem we care about grows as the problem scales up. (This is likely true even if P = NP.6465) Therefore, in the long run, AGI takeoff will go steady or decelerate. (A toy simulation of all three scenarios follows this list.)

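Here's a toy simulation of how those three complexity assumptions map onto the three takeoff shapes. The specific difficulty functions are my own illustrative choices, not anything the experts above have proposed:

```python
# A toy simulation of the three takeoff shapes: capability grows in
# proportion to (current capability) / (how hard the next improvement is).
# The difficulty functions below are illustrative assumptions, nothing more.

def simulate(difficulty, steps: int = 1000, dt: float = 0.01) -> list[float]:
    capability = 1.0
    history = [capability]
    for _ in range(steps):
        capability += dt * capability / difficulty(capability)
        history.append(capability)
        if capability > 1e9:   # treat runaway growth as "FOOM"
            break
    return history

scenarios = {
    "FOOM (improving gets easier)": lambda c: 1.0 / c,
    "Exponential (difficulty stays constant)": lambda c: 1.0,
    "Decelerating (improving gets harder)": lambda c: c ** 2,
}

for name, difficulty in scenarios.items():
    history = simulate(difficulty)
    print(f"{name}: capability {history[-1]:,.2f} after {len(history) - 1} steps")
```
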
Implications: AGI is still high-stakes, but it won't explode overnight. AGI will "just" be like every species-transforming technology we've had in the past — agriculture, the steam engine, the printing press, antibiotics, the computer, etc.

Experts who predict this: Ramez Naam66

. . .

I tried my best to fairly "steelman" each side. Knowledgeable people disagree.

But personally, I find the arguments for Steady/Decelerating AGI Takeoff most compelling, even taking the critiques into account.

That said, "steady" does not mean "slow" or "safe". Cars on a highway are steady, but not slow. The Titanic was steady and relatively slow, yet still met its fatal end.

So, where is this ship of AI sailing to?

Trajectories: Are we headed to The Good Place or The Bad Place?

Recently, 22 of the top AI-Concerned & AI-Skeptic experts were gathered to forecast the "Probability of Doom" (paraphrased) from AI before the year 2100. The AI-Concerned's median response was 25%; the AI-Skeptics' median response was 0.1%.67

Why the huge difference? In part, because "extraordinary claims require extraordinary evidence", but folks have very different "priors" on what's ordinary:

  • AI-Concerned: “C'mon, almost every time a group has met a "higher capabilities" group, it sucked for the former: when Native Americans met Columbus, when India met the British Empire. And now, we're creating a new "higher capabilities" entity, which we don't understand & isn't even human. How could this not be bad by default?”
  • AI-Skeptics: “C'mon, almost every generation predicts some kind of apocalypse, and yet, Homo Sapiens has been chugging along for 300,000 years. You're going to need a lot more evidence than some game theory and 'Bing said something naughty'.”

Experts disagree, water is wet, more news at 11. But now, the clever part of the study: it got the two groups to respectfully talk & do research for 8 weeks, to really understand each other's viewpoints, until they could accurately describe each other's worldviews to each other's satisfaction! Did mutual understanding lead to a mutual answer? The exciting result: the AI-Concerned revised their estimate from 25% to 20%, and the AI-Skeptics from 0.1% to 0.12%.

Welp, so much for figuring out \(P(\text{doom})\).

...

Maybe the very idea of "probability of doom" is useless, a self-denying prophecy. If people think P(doom) is low, they'll get complacent & not take precautions, causing P(doom) to be high. If people think P(doom) is high, they'll react urgently & severely, causing P(doom) to be low.

To avoid this circular-dependency paradox, we should think in terms of "conditional" probabilities: What are the probable outcomes, given what we choose to do?

Let's revisit our previous (fake-but-useful) division of "AI Safety" versus "AI Capabilities", and plot it on a graph:68

Graph of Safety vs Capabilities. When safety > capabilities, it's safe. When capabilities > safety, it's unsafe.

If Safety outstrips Capabilities, that's good! We can keep AI safe! But if Capabilities outstrip Safety, that's bad: an accident and/or intentional misuse waiting to happen.

When AI has low Capabilities, the consequences aren't too grave. But with high Capabilities, the stakes are high:

Same graph, but highlighting "high capabilities" threshold

The field of AI started here:

Same graph, with icon of rocket placed at 0 Capabilities, Some Safety.

(AI starts out with some "Safety" points because, by default, AIs sit harmlessly on computers that we can pull the plug on. But a future AI may be able to escape its computer by finding a hack, or by persuading its engineers to free it. Note: both of those things have already happened.6970)

Anyway. In the last two decades, we've made a lot of progress on Capabilities... but only a little on Safety:

Same graph, but rocket's traveling near-horizontally, only a bit upwards

Of course, smart folks disagree on our exact position & trajectory. (For example, "AI Accelerationists" believe we're already pointing towards The Good Place, and should just hit the gas pedal.)

But if the above picture is roughly accurate, then, if — BIG IF — we stay on our business-as-usual path, we'll head towards The Bad Place. (e.g. bio-engineered pandemics, AI-enforced totalitarianism, etc.)

Same graph, except rocket hits The Bad Place

But if we tilt course, and invest more into AI Safety relative to AI Capabilities... we might reach The Good Place! (e.g. speeding up cures for all diseases, fully automated luxury eco-punk georgism, i become a genetically engineered catgirl, etc.)

Same graph, except rocket hits The Good Place

Fire, uncontrolled, can burn your house down.

Fire, controlled, can cook your food & keep you warm.

The first sparks of powerful AI are flying.

Can we control what we've made?

// TODO PICTURE


🤔 Review #8 (last one!)


Summary of Part One

Congratulations! You now have way, way more context about AI than you need. If you used to be annoyed at folks saying, "don't worry, AI can only follow the rules we tell it to", or, "DO worry, AI will gain sentience then kill us all as revenge for enslaving it"... well, now you can be annoyed at them in a more-informed way.

Let's recap:

1) ⏳ The history of AI has two main eras:

  • Before 2000, Symbolic AI: All logical System 2, no "intuitive" System 1. Super-human chess, can't recognize cats.
  • After 2000, Deep Learning: All "intuitive" System 1, little logical System 2. Imitates Van Gogh in seconds, sucks at step-by-step logic.

2) ⚙️💭 The next fundamental step for AI might be to merge AI Logic & AI Intuition. When AI can do System 1 and 2, that's when we'd get its highest promises... and perils.

3) 🤝 "AI Safety" is a bunch of awkward alliances between:

  • Researchers who work on advancing AI Capabilities and/or AI Safety.
  • Folks concerned about risks ranging from "bad" to "existential", and from "AI accidentally goes rogue on humans" to "AI intentionally misused by rogue humans".

4) 🤷 Experts wildly disagree on everything about the future of AI: When we'll get AGI, how fast AGI would self-improve, whether our trajectory is towards a good or bad place.

(If you skipped the flashcards & would like to review them, click the Table of Contents icon in the right sidebar, then click the "🤔 Review" links. Alternatively, download the Anki deck for Part One.)


Can we control what we've made?

As they say, "a problem well-stated is a problem half-solved".71

So before we see the proposed solutions to AI Safety, let's first try to break down the problem(s) as precisely & fruitfully as we can. To refresh your memory, we're trying to solve "The AI Alignment Problem", which at its heart, is this one question:

How do we ensure that AI robustly serves humane values?

Good question. Let's dive in!

{% include 'templates/next_page_button.html' %}

Oh. Eeeeeeesh. 😬

Sorry for the cliffhanger. Look, the 20,000+ words so far with illustrations took me a long time to make, ok? Part Two will be out July 2024, Part Three will be out October 2024.

In the meantime, 2 things you can do!

1) Sign up to be notified when the next chapters come out: ⤵

{% include 'templates/signup.html' %}

And 2) Check out the other stuff I've made, or learn more about Hack Club, in the credits below!

Footnotes

  1. Hat tip to Michael Nielsen for this phrase! From Nielsen & Collison 2018

  2. Turing, 1936

  3. Turing, 1950. Fun sidenote: In Section 9, Alan Turing protects the “Imitation Game” against... cheating with ESP. He strongly believed in the stuff: “the statistical evidence, at least for telepathy, is overwhelming.” What was Turing's anti-ESP-cheating solution? “[Put] the competitors into a "telepathy-proof room"”. The 50’s were wild.

  4. This was the Dartmouth Workshop, the "official" start of Artificial Intelligence as a field. (Unfortunately, Turing himself could not attend; he died just two years prior.)

  5. Arirang TV News, Sep 2022: Clip on YouTube

  6. Screenshot from ESPN & FiveThirtyEight's 2014 mini-documentary, The Man vs The Machine. See Garry's loss at timestamp 14:18.

  7. Simpsons reference

  8. Herbert Simon, one of the pioneers of AI, said in 1960: “Machines will be capable, within twenty years [by 1980], of doing any work that a man can do.”

    Marvin Minsky, another pioneer in AI, said in 1970: “In from three to eight years [by 1978] we will have a machine with the general intelligence of an average human being.” 2

  9. One of the standard benchmarks for machine vision is the CIFAR-100 dataset, a set of 60,000 images, divided into 100 categories like “cat” or “airplane”. (There’s also an earlier dataset called CIFAR-10, with only 10 categories. Also including “cat”, of course. It’s the internet.)

    Human performance on CIFAR-10 and -100 is around 95.90% accuracy. (Fort, Ren & Lakshminarayanan 2021, see Appendix A on page 15) Meanwhile, state-of-the-art AIs only squeaked past that in 2020, with the release of EffNet-L2 (96.08% accuracy). (Source: PapersWithCode)

  10. Full quote from Hans Moravec's 1988 book Mind Children, pg 15: “[...] it is comparatively easy to make computers exhibit adult level performance on intelligence tests or playing checkers, and difficult or impossible to give them the skills of a one-year-old when it comes to perception and mobility.” Not as snappy.

  11. Hat tip to Sage Hyden (Just Write) for this silly joke idea. See his 2022 video essay I, HATE, I, ROBOT for the sad tale of how that film got mangled in production.

  12. Bird & Layzell, 2002. Hat tip to Victoria Krakovna's master list of specification gaming examples.

  13. Surprisingly, this safety problem with AI Logic was only discovered recently, in the early 2000's. Some names involved in this idea's development [NOT a comprehensive list]: Nick Bostrom, Stuart Russell, Steve Omohundro.

  14. McCulloch & Pitts (1943): A logical calculus of the ideas immanent in nervous activity

  15. The paper is von Neumann (1945), First Draft of a Report on the EDVAC. He never finished his first draft, which was why his only citation was "MacColloch [typo] and Pitts (1943)".

  16. See Turing (1948), in particular the sections “Organizing Unorganized Machinery” and “Experiments In Organizing: Pleasure-Pain Systems”.

  17. To learn more about the history of the academic rivalry between Symbolic AI and Connectionist AI, see the beautiful data-based visualizations from a paper with an amazing title: Cardon, Cointet & Mazières (2018), NEURONS SPIKE BACK

  18. For a summary (& critique) of Noam Chomsky's views on how language is learnt, see this article from his colleague, the mathematician-philosopher Hilary Putnam: Putnam (1967). In sum: Chomsky believes that — quite literally in our DNA — there are hard-coded, symbolic rules for linguistic grammar, universal across all human cultures.

  19. Steven Pinker in Pinker & Prince (1988): “We conclude that connectionists' claims about the dispensability of [innate, linguistic/grammatical] rules in explanations in the psychology of language must be rejected, and that, on the contrary, the linguistic and developmental facts provide good evidence for such rules.”

  20. For more on the sad tale of Perceptrons and the XOR Affair, see the book’s Wikipedia article, and this Stack Exchange answer.

  21. GPU = Graphics Processing Unit. Originally designed for videogames. Its main feature is that it can do a lot of math in parallel: a good fit for ANN's "all-at-once" style computation!

  22. (Krizhevsky, Sutskever, Hinton 2012). Fun anecdote: “[Hinton] didn’t know anything about the field of computer vision, so he took two young guys to change it all! One of them [Alex Krizhevsky] he locked up in a room, telling him: “You can’t come out until it works!” [...] [Alex] didn’t understand anything at all, like he was 17.” (from Cardon, Cointet & Mazières 2018)

  23. IJ Goodfellow (2014), Generative Adversarial Networks

  24. For a layperson-friendly summary of AlphaGo and why it's such a huge break from previous AI, see Michael Nielsen (2016) for Quanta Magazine

  25. Vaswani et al (2017): “Attention is All you Need”.

  26. For a layperson-friendly summary of AlphaFold, see Will Heaven (2020) for MIT Technology Review: “DeepMind’s protein-folding AI has solved a 50-year-old grand challenge of biology”

  27. yes technically, it's an Euler diagram, but technically, your mother

  28. The original report: Lowry & MacPherson (1988) for the British Medical Journal. Note this algorithm didn't use neural networks specifically, but it was an early example of machine learning.

  29. Jeffrey Dastin (2018) for Reuters: “It penalized resumes that included the word "women's," as in "women's chess club captain." And it downgraded graduates of two all-women's colleges, according to people familiar with the matter.”

  30. Original paper: Buolamwini & Gebru 2018. Layperson summary: Hardesty for MIT News Office 2018

  31. From this OpenAI (2021) press release (Section: "Attacks in the wild")

  32. See Tesla’s official 2016 blog post, and this article giving more detail into what happened, and what mistakes the AutoPilot AI may have made.

  33. This problem was first theoretically proposed in Hubinger et al 2019, then a real example was found in Langosco et al 2021! For a layperson-friendly summary of both findings, see Rob Miles's video

  34. OpenAI is very NOT open about even the safe-to-know details of GPT-4, like how big it is. Anyway, a leaked report reveals it has ~1.8 Trillion parameters and cost $63 Million to train. Summary at Maximilian Schreiner (2023) for The Decoder

  35. I'm reminded of a fun quote by physicist Emerson M. Pugh, about a similar paradox for human brains: “If the human brain were so simple that we could understand it, we would be so simple that we couldn’t.”

  36. From Russell & Norvig's Artificial Intelligence: A Modern Approach, Chapter 27.3. They're paraphrasing Dreyfus (1992), “What computers still can't do: A critique of artificial reason”.

  37. See Wikipedia for moore on Moore's Law

  38. Kaplan et al 2020 is the famous paper on this. See Figure 1, Panel 1: As Compute increases from \(10^{-7}\) to \(10^{-1}\), a million-fold increase, Test Loss goes from ~6.0 to ~3.0, a halving of its error.

  39. The current leading transistor's smallest component is 24 nanometers wide. A silicon atom is 0.2 nanometers wide. Hence, estimate: 24/0.2 = 120 atoms. Since 2^7 = 128, halving the transistor's size seven more times would require parts smaller than an atom.

  40. For example, the current leading transistor, the "3 nanometer", has no component that's actually 3 nanometers. All the actual parts of the "3 nanometer" are, like, 8 to 16 times bigger than that.

  41. From Kevin Morris (2020), “No More Nanometers”: “We have evolved well beyond Moore’s Law already, and it is high time we stopped measuring and representing our technology and ourselves according to fifty-year-old metrics. We are confusing the public, harming our credibility, and impairing rational thinking [about the] progress the electronics industry has made over the past half century.”

  42. Wallace Witkowski (September 22, 2022) for MarketWatch: “'Moore's Law's dead,' Nvidia CEO Jensen Huang says in justifying gaming-card price hike”.

  43. Maximilian Schreiner (2023) for The Decoder

  44. See the table under "The Dense Transformer Scaling Wall" in Dylan Patel (2023) to see how optimal training cost increases ~100x every time the neural network's size increases by 10x.

  45. See Power et al 2022, Figure 1 Left: at 1,000 steps of training, the ANN has basically 100% memorized the 'test' questions, but still fails miserably at questions outside of its training set. Then with no warning, at 100,000 steps, it suddenly "gets it" and starts answering questions outside its training set correctly. Wat.

  46. Humans do NOT have the largest brains (that's sperm whales) nor brain-to-body-ratio (that's ants & shrews). So if not brain size, what explains Homo Sapiens's "dominance"? Henrich (2018) suggests our secret is cumulative culture: What you learn doesn't die with you, you can pass it on. With a library card, I can pick up the distilled Greatest Hits of 300,000 years of Homo Sapiens. And if I'm lucky, I can add a lil' piece to that great library before my own last page.

    From Henrich, one of my favorite quotes: “We are smart, but not because we stand on the shoulders of giants or are giants ourselves. We stand on the shoulders of a very large pyramid of hobbits.”

  47. The “dual-process” hypothesis of cognition was first suggested by (Wason & Evans, 1974), and developed by multiple folks over decades. But the idea got really popular after Daniel Kahneman, winner of the 2002 Nobel Memorial Prize in Economics, popularized it in his bestselling book, Thinking, Fast & Slow. (Daniel Kahneman, 2011)

  48. As for the naming: Intuition is #1 and Logic is #2, because flashes of intuition come before slow deliberation. Also, intuition evolved first.

  49. Here’s a layperson-friendly summary of Yoshua Bengio’s talk: (Dickson, 2019). And here’s the full hour-long talk on SlidesLive, mirrored on YouTube.

  50. Famed mathematician Henri Poincaré wrote all the way back in 1908 about how he (and most other mathematicians) all agreed: the "logical" field of math relied heavily on sudden flashes of insight bubbling up from the unconscious. Quote: “The role of this unconscious work in mathematical invention appears to me incontestable.”

  51. One of the few Einstein quotes that's actually an Einstein quote: “The words or the language [...] do not seem to play any role in my mechanism of thought. [...] The [elements of thought] are, in my case, of visual and some of muscular type.” [emphasis added] From Appendix II (pg 142) of Jacques Hadamard's book, The Psychology of Invention in the Mathematical Field (1945).

  52. Scientists who credited their discoveries to dreams: Dmitri Mendeleev and the periodic table, Niels Bohr's "solar system" model of the atom, August Kekulé's ring structure of benzene, Otto Loewi's weird two-frog-hearts-in-jars experiment which led to the discovery of neurotransmitters.

  53. Hat tip to Arbital for this analogy.

  54. From one of the leading AI + AI Safety labs: “Anthropic has consistently found that working with frontier AI models is an essential ingredient in developing new methods to mitigate the risk of AI.”

  55. Quote from Astral Codex Ten (2022)

  56. When he worked at OpenAI, Paul Christiano co-pioneered a technique called Reinforcement Learning from Human Feedback / RLHF (Christiano et al 2017), which turned regular GPT (very good autocomplete) into ChatGPT (something actually useable for the public). He had positive-but-mixed feelings about this, because RLHF increased AI's safety, but also its power. In 2021, Christiano quit OpenAI to create the Alignment Research Center, a non-profit to entirely focus on AI Safety.

  57. Hat tip for Robert Miles (2021) for this cozy 2x2 grid of AI Risks!

  58. From Scott Aaronson (2022): “AI ethics [worried that AI will amplify existing inequities] and AI alignment [worried that a superintelligent AI will kill everyone] are two communities that despise each other. It’s like the People’s Front of Judea versus the Judean People’s Front from Monty Python.” [emphasis added]

  59. Ajeya Cotra's "Forecasting Transformative AI with Biological Anchors" is the most comprehensive forecast project with this method. It's a zillion pages long and still in "draft mode", so for a summary, see Holden Karnofsky (2021), in particular the first chart.

  60. The classic text on this is Tetlock (2005), where Philip Tetlock asked 100s of experts, over 2 decades, to predict future social/political events, then measured their success. Experts were slightly better than random chance, on par with educated laypeople, and both experts & educated laypeople were worse than simple "draw a line extrapolating past data" statistical models. See Figure 2.5 for this result, and Tschoegl & Armstrong (2007) for a review/summary of this friggin' dense book.

  61. Related: Grossmann et al (2023) (layperson summary) recently replicated this result, showing that social science experts weren't any better at predicting post-Covid social outcomes than simple models or the public.

  62. “I confess that, in 1901, I said to my brother Orville that men would not fly for 50 years. Two years later, we were making flights. This demonstration of my inability as a prophet gave me such a shock that I have ever since refrained from all prediction.” ~ Wilbur Wright, 1908 speech accepting the Gold Medal from the Aéro Club de France. (Hat tip to AviationQuotations.com)

  63. To be fair, Szilard invented it because he was ticked off by Rutherford's dismissiveness. Necessity is the mother of invention, and spite is the suspiciously hot mailman.

  64. Very loosely stated, "P = NP?" is the literal-million-dollar-question: "Is every problem where solutions are easy to check also secretly easy to solve?" For example, Sudoku solutions are easy to check, but we haven't been able to prove/disprove that Sudoku might secretly be easy to solve. But in Computer Science, "easy" just means "takes a polynomial amount of time/space". So if the optimal strategy to solve Sudoku is "only" n^3 more complex than checking a Sudoku solution, that still counts as P = NP.

  65. Tying this to AI takeoffs: even if the complexity of "AI improves its own capabilities" scales at O(n^2), which is merely the complexity of checking a Sudoku solution, this theory predicts that AI self-improvement will decelerate.

  66. For a more detailed & mathematical explanation of Naam's argument, check out my 3-minute summary here.

  67. This & the following paragraphs refer to an "adversarial collaboration" study between AI-skeptical & AI-concerned researchers: Rosenberg et al (2024) for the Forecasting Research Institute. You can get a layperson-summary & context at Dylan Matthews (2024) for Vox.

  68. Just a few days before launching this series, I learnt that my "clever visual explanation" was already done months ago in METR (2023). Ah well, credit to them (& their AI Safety research).

  69. AI finding a hack: An AI trained to play the Atari game Qbert finds a never-before-seen bug. (Chrabaszcz et al, 2018) An image-classifying AI learns to perform a timing attack, a sophisticated kind of attack. (Ierymenko, 2013) Hat tip to Victoria Krakovna's master list of specification gaming examples.

  70. AI persuading humans to free it: Blake Lemoine is an ex-engineer at Google, who was fired after he was convinced Google's AI, LaMDA, was sentient, and so leaked it to the press to try to fight for its rights. (Summary: Brodkin, 2022 for Ars Technica, Lemoine's leak: Lemoine (& LaMDA?), 2022)

  71. Frequently attributed to Charles Kettering, former head of research at General Motors, but I can't find an actual citation for this.
