
Why Machines Learn: The Elegant Math Behind Modern AI
Ebook · 724 pages · 7 hours


About this ebook

A rich, narrative explanation of the mathematics that has brought us machine learning and the ongoing explosion of artificial intelligence

Machine learning systems are making life-altering decisions for us: approving mortgage loans, determining whether a tumor is cancerous, or deciding if someone gets bail. They now influence developments and discoveries in chemistry, biology, and physics—the study of genomes, extrasolar planets, even the intricacies of quantum systems. And all this before large language models such as ChatGPT came on the scene.

We are living through a revolution in machine learning-powered AI that shows no signs of slowing down. This technology is based on relatively simple mathematical ideas, some of which go back centuries, including linear algebra and calculus, the stuff of seventeenth- and eighteenth-century mathematics. It took the birth and advancement of computer science and the kindling of 1990s computer chips designed for video games to ignite the explosion of AI that we see today. In this enlightening book, Anil Ananthaswamy explains the fundamental math behind machine learning, while suggesting intriguing links between artificial and natural intelligence. Might the same math underpin them both?

As Ananthaswamy resonantly concludes, to make safe and effective use of artificial intelligence, we need to understand its profound capabilities and limitations, the clues to which lie in the math that makes machine learning possible.
Language: English
Publisher: Penguin Publishing Group
Release date: Jul 16, 2024
ISBN: 9780593185759


Reviews for Why Machines Learn

Rating: 3.8 out of 5 stars · 13 ratings · 3 reviews


  • Rating: 5 out of 5 stars (Sep 20, 2025)

    The math made it a cumbersome listen. Not particularly complex math but the math got in the way of grasping the narration. I think I have a better understanding of the logical basis of artificial intelligence.
  • Rating: 5 out of 5 stars (Jan 28, 2025)

    Why Machines Learn by Anil Ananthaswamy is a great book; here's why:
    The book's pattern is that each chapter starts with a relevant story, an overview, some math foundations, and valuable examples. There are many equations, ~1000, but stick with it. Ignore Stephen Hawking's advice: "Someone told me that each equation I included in the book would halve the sales." An interested high school student can work through the book; Anil has exceptional explanations and examples.

    My first pass through the book took a month; now, I want to do a second pass using Mathematica to build the equations into a framework. As Feynman said, "I understand what I build." I used a hybrid approach, listening to the book while making notes in a printed copy. This worked well for me, as I didn't want to skip anything.

    The author brings out the personalities behind ML beyond the popular science level; their equations also speak for them. Humor throughout: "In high dimensional space, no one can hear you scream." (Julie Delon)

    Favorites:
    Story: Al-Hazen & vision science
    Math: Optimization & Lagrange multipliers
    Concept: 2nd descent of the bias-variance curve
    Example: Consciousness & anesthesia (EEG PCA)

    I wish the second half of the book had more examples. The author could expand the last chapters and epilogue into another book. There are good references and a helpful index. A bibliography and suggestions on What's Next would be beneficial.

    This book has inspired me to delve deeper into understanding ML; I want to comprehend the changes ML brings to our world. Anil has helped me bootstrap myself.

Book preview

Why Machines Learn - Anil Ananthaswamy


PRAISE FOR

Why Machines Learn

Some books about the development of neural networks describe the underlying mathematics while others describe the social history. This book presents the mathematics in the context of the social history. It is a masterpiece. The author is very good at explaining the mathematics in a way that makes it available to people with only a rudimentary knowledge of the field, but he is also a very good writer who brings the social history to life.

—GEOFFREY HINTON, deep learning pioneer, Turing Award winner, former VP at Google, and professor emeritus at the University of Toronto

"After just a few minutes of reading Why Machines Learn, you’ll feel your own synaptic weights getting updated. By the end you will have achieved your own version of deep learning—with deep pleasure and insight along the way."

—STEVEN STROGATZ, New York Times bestselling author of Infinite Powers and professor of mathematics at Cornell University

If you were looking for a way to make sense of the AI revolution that is well under way, look no further. With this comprehensive yet engaging book, Anil Ananthaswamy puts it all into context, from the origin of the idea and its governing equations to its potential to transform medicine, quantum physics—and virtually every aspect of our life. An essential read for understanding both the possibilities and limitations of artificial intelligence.

—SABINE HOSSENFELDER, physicist and New York Times bestselling author of Existential Physics: A Scientist’s Guide to Life’s Biggest Questions

"Why Machines Learn is a masterful work that explains—in clear, accessible, and entertaining fashion—the mathematics underlying modern machine learning, along with the colorful history of the field and its pioneering researchers. As AI has increasingly profound impacts in our world, this book will be an invaluable companion for anyone who wants a deep understanding of what’s under the hood of these often inscrutable machines."

—MELANIE MITCHELL, author of Artificial Intelligence and professor at the Santa Fe Institute

Generative AI, with its foundations in machine learning, is as fundamental an advance as the creation of the microprocessor, the internet, and the mobile phone. But almost no one, outside of a handful of specialists, understands how it works. Anil Ananthaswamy has removed the mystery by giving us a gentle, intuitive, and human-oriented introduction to the math that underpins this revolutionary development.

—PETER E. HART, AI pioneer, entrepreneur, and co-author of Pattern Classification

"Anil Ananthaswamy’s Why Machines Learn embarks on an exhilarating journey through the origins of contemporary machine learning. With a captivating narrative, the book delves into the lives of influential figures driving the AI revolution while simultaneously exploring the intricate mathematical formalism that underpins it. As Anil traces the roots and unravels the mysteries of modern AI, he gently introduces the underlying mathematics, rendering the complex subject matter accessible and exciting for readers of all backgrounds."

—BJÖRN OMMER, professor at the Ludwig Maximilian University of Munich and leader of the original team behind Stable Diffusion

Also by Anil Ananthaswamy

Through Two Doors at Once

The Man Who Wasn’t There

The Edge of Physics

Data Communications Using Object-Oriented Design and C++

Why Machines Learn: The Elegant Math Behind Modern AI

Anil Ananthaswamy

Dutton

An imprint of Penguin Random House LLC

penguinrandomhouse.com


Copyright © 2024 by Anil Ananthaswamy

Penguin Random House values and supports copyright. Copyright fuels creativity, encourages diverse voices, promotes free speech, and creates a vibrant culture. Thank you for buying an authorized edition of this book and for complying with copyright laws by not reproducing, scanning, or distributing any part of it in any form without permission. You are supporting writers and allowing Penguin Random House to continue to publish books for every reader. Please note that no part of this book may be used or reproduced in any manner for the purpose of training artificial intelligence technologies or systems.

DUTTON and the D colophon are registered trademarks of Penguin Random House LLC.

Portions of chapter 12 and the epilogue appeared in Quanta Magazine. The illustration in chapter 6 on PCA done on EEG data adapted with permission from John Abel. The illustrations in chapter 12 on the bias-variance and double descent curves adapted with permission from Mikhail Belkin. Illustrations about properties of penguins in chapter 4 created courtesy of data made freely available by Kristen Gorman, Allison Horst, and Alison Hill. The illustrations of biological neuron (this page), paddy fields (this page), and the map of Manhattan (this page) by Roshan Shakeel.

Library of Congress Cataloging-in-Publication Data

Names: Ananthaswamy, Anil, author.

Title: Why machines learn : the elegant math behind modern AI / Anil Ananthaswamy.

Description: New York : Dutton, [2024] | Includes bibliographical references and index.

Identifiers: LCCN 2024000738 | ISBN 9780593185742 (hardcover) | ISBN 9780593185759 (ebook)

Subjects: LCSH: Machine learning. | Deep learning (Machine learning) | Artificial intelligence. | Mathematics.

Classification: LCC Q325.5 .A56 2024 | DDC 006.3/1—dc23/eng/20240326

LC record available at https://lccn.loc.gov/2024000738

Ebook ISBN 9780593185759

Cover design by Dominique Jones

Illustration by Jason Booher after M.C. Escher

Book design by Ashley Tucker, adapted for ebook by Molly Jeszke

While the author has made every effort to provide accurate telephone numbers, internet addresses, and other contact information at the time of publication, neither the publisher nor the author assumes any responsibility for errors or for changes that occur after publication. Further, the publisher does not have any control over and does not assume any responsibility for author or third-party websites or their content.


CONTENTS

Dedication

Author’s Note

Prologue

CHAPTER 1

Desperately Seeking Patterns

CHAPTER 2

We Are All Just Numbers Here…

CHAPTER 3

The Bottom of the Bowl

CHAPTER 4

In All Probability

CHAPTER 5

Birds of a Feather

CHAPTER 6

There’s Magic in Them Matrices

CHAPTER 7

The Great Kernel Rope Trick

CHAPTER 8

With a Little Help from Physics

CHAPTER 9

The Man Who Set Back Deep Learning (Not Really)

CHAPTER 10

The Algorithm that Put Paid to a Persistent Myth

CHAPTER 11

The Eyes of a Machine

CHAPTER 12

Terra Incognita

Epilogue

Afterword

Acknowledgments

Notes

Index

About the Author


to teachers everywhere, sung and unsung

Whatever we do, we have to make our life vectors. Lines with force and direction.

—LIAM NEESON AS FBI AGENT MARK FELT IN THE 2017 MOVIE OF THE SAME NAME

The author acknowledges with gratitude the support of the Alfred P. Sloan Foundation in the research and writing of this book.

Prologue

Buried on this page of the July 8, 1958, issue of The New York Times was a rather extraordinary story. The headline read, "New Navy Device Learns by Doing: Psychologist Shows Embryo of Computer Designed to Read and Grow Wiser." The opening paragraph raised the stakes: "The Navy revealed the embryo of an electronic computer today that it expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence."

With hindsight, the hyperbole is obvious and embarrassing. But The New York Times wasn’t entirely at fault. Some of the over-the-top talk also came from Frank Rosenblatt, a Cornell University psychologist and project engineer. Rosenblatt, with funding from the U.S. Office of Naval Research, had invented the perceptron, a version of which was presented at a press conference the day before the New York Times story about it appeared in print. According to Rosenblatt, the perceptron would be "the first device to think as the human brain" and such machines might even be sent to other planets as "mechanical space explorers."

None of this happened. The perceptron never lived up to the hype. Nonetheless, Rosenblatt’s work was seminal. Almost every lecturer on artificial intelligence (AI) today will harken back to the perceptron. And that’s justified. This moment in history—the arrival of large language models (LLMs) such as ChatGPT and its ilk and our response to it—which some have likened to what it must have felt like in the 1910s and ’20s, when physicists were confronted with the craziness of quantum mechanics, has its roots in research initiated by Rosenblatt. There’s a line in the New York Times story that only hints at the revolution the perceptron set in motion: "Dr. Rosenblatt said he could explain why the machine learned only in highly technical terms" (italics mine). The story, however, had none of the "highly technical" details.

This book does. It tackles the technical details. It explains the elegant mathematics and algorithms that have, for decades, energized and excited researchers in machine learning, a type of AI that involves building machines that can learn to discern patterns in data without being explicitly programmed to do so. Trained machines can then detect similar patterns in new, previously unseen data, making possible applications that range from recognizing pictures of cats and dogs to creating, potentially, autonomous cars and other technology. Machines can learn because of the extraordinary confluence of math and computer science, with more than a dash of physics and neuroscience added to the mix.

Machine learning (ML) is a vast field populated by algorithms that leverage relatively simple math that goes back centuries, math one learns in high school or early in college. There’s, of course, elementary algebra. Another extremely important cornerstone of machine learning is calculus, co-invented by no less a polymath than Isaac Newton. The field also relies heavily on the work of Thomas Bayes, the eighteenth-century English statistician and minister who gave us the eponymous Bayes’s theorem, a key contribution to the field of probability and statistics. The work of German mathematician Carl Friedrich Gauss on the Gaussian distribution (and the bell-shaped curve) also permeates machine learning. Then there’s linear algebra, which forms the backbone of machine learning. The earliest exposition of this branch of mathematics appears in a two-thousand-year-old Chinese text, Nine Chapters on the Mathematical Art. The modern version of linear algebra has its roots in the work of many mathematicians, but mainly Gauss, Gottfried Wilhelm Leibniz, Wilhelm Jordan, Gabriel Cramer, Hermann Günther Grassmann, James Joseph Sylvester, and Arthur Cayley.

By the mid-1850s, some of the basic math that would prove necessary to building learning machines was in place, even as other mathematicians continued developing more relevant mathematics and birthed and advanced the field of computer science. Yet, few could have dreamed that such early mathematical work would be the basis for the astounding developments in AI over the past half century, particularly over the last decade, some of which may legitimately allow us to envision a semblance of the kind of future Rosenblatt was overoptimistically foreshadowing in the 1950s.

This book tells the story of this journey, from Rosenblatt’s perceptron to modern-day deep neural networks, elaborate networks of computational units called artificial neurons, through the lens of key mathematical ideas underpinning the field of machine learning. It eases gently into the math and then, ever so slowly, ratchets up the difficulty, as we go from the relatively simple ideas of the 1950s to the somewhat more involved math and algorithms that power today’s machine learning systems.

Hence, we will unabashedly embrace equations and concepts from at least four major fields of mathematics—linear algebra, calculus, probability and statistics, and optimization theory—to acquire the minimum theoretical and conceptual knowledge necessary to appreciate the awesome power we are bestowing on machines. It is only when we understand the inevitability of learning machines that we will be prepared to tackle a future in which AI is ubiquitous, for good and for bad.

Getting under the mathematical skin of machine learning is crucial to our understanding of not just the power of the technology, but also its limitations. Machine learning systems are already making life-altering decisions for us: approving credit card applications and mortgage loans, determining whether a tumor is cancerous, predicting the prognosis for someone in cognitive decline (will they go on to get Alzheimer’s?), and deciding whether to grant someone bail. Machine learning has permeated science, too: It is influencing chemistry, biology, physics, and everything in between. It’s being used in the study of genomes, extrasolar planets, the intricacies of quantum systems, and much more. And as of this writing, the world of AI is abuzz with the advent of large language models such as ChatGPT. The ball has only just gotten rolling.

We cannot leave decisions about how AI will be built and deployed solely to its practitioners. If we are to effectively regulate this extremely useful, but disruptive and potentially threatening, technology, another layer of society—educators, politicians, policymakers, science communicators, or even interested consumers of AI—must come to grips with the basics of the mathematics of machine learning.

In her book Is Math Real?, mathematician Eugenia Cheng writes about the gradual process of learning mathematics: "It can…seem like we’re taking very small steps and not getting anywhere, before suddenly we look behind us and discover we’ve climbed a giant mountain. All these things can be disconcerting, but accepting a little intellectual discomfort (or sometimes a lot of it) is an important part of making progress in math."

Fortunately, the intellectual discomfort in store for us is eminently endurable and more than assuaged by the intellectual payoff, because underlying modern ML is some relatively simple and elegant math—a notion that’s best illustrated with an anecdote about Ilya Sutskever. Today, Sutskever is best known as the co-founder of OpenAI, the company behind ChatGPT. More than a decade ago, as a young undergraduate student looking for an academic advisor at the University of Toronto, Sutskever knocked on Geoffrey Hinton’s door. Hinton was already a well-known name in the field of deep learning, a form of machine learning, and Sutskever wanted to work with him. Hinton gave Sutskever some papers to read, which he devoured. He remembers being perplexed by the simplicity of the math, compared to the math and physics of his regular undergrad coursework. He could read these papers on deep learning and understand powerful concepts. "How can it be that it’s so simple…so simple that you can explain it to high school students without too much effort?" he told me. "I think that’s actually miraculous. This is also, to me, an indication that we are probably on the right track. [It can’t] be a coincidence that such simple concepts go so far."

Of course, Sutskever already had sophisticated mathematical chops, so what seemed simple to him may not be so for most of us, including me. But let’s see.

This book aims to communicate the conceptual simplicity underlying ML and deep learning. This is not to say that everything we are witnessing in AI now—in particular, the behavior of deep neural networks and large language models—is amenable to being analyzed using simple math. In fact, the denouement of this book leads us to a place that some might find disconcerting, though others will find it exhilarating: These networks and AIs seem to flout some of the fundamental ideas that have, for decades, underpinned machine learning. It’s as if empirical evidence has broken the theoretical camel’s back in the same way experimental observations of the material world in the early twentieth century broke classical physics; we need something new to make sense of the brave new world awaiting us.

As I did the research for this book, I observed a pattern to my learning that reminded me of the way modern artificial neural networks learn: With each pass the algorithm makes through data, it learns more about the patterns that exist in that data. One pass may not be enough; nor ten; nor a hundred. Sometimes, neural networks learn over tens of thousands of iterations through the data. This is indeed the way I grokked the subject in order to write about it. Each pass through some corner of this vast base of knowledge caused some neurons in my brain to make connections, literally and metaphorically. Things that didn’t make sense the first or second time around eventually did upon later passes.

I have used this technique to help readers make similar connections: I found myself repeating ideas and concepts over the course of writing this book, sometimes using the same phrasing or, at times, a different take on the same concept. These repetitions and rephrasings are intentional: They are one way that most of us who are not mathematicians or practitioners of ML can come to grips with a paradoxically simple yet complex subject. Once an idea is exposed, our brains might see patterns and make connections when encountering that idea elsewhere, making more sense of it than would have been possible at first blush.

I hope your neurons enjoy this process as much as mine did.

CHAPTER 1

Desperately Seeking Patterns

When he was a child, the Austrian scientist Konrad Lorenz, enamored by tales from a book called The Wonderful Adventures of Nils—the story of a boy’s adventures with wild geese written by the Swedish novelist and winner of the Nobel Prize for Literature, Selma Lagerlöf—yearned to become a wild goose. Unable to indulge his fantasy, the young Lorenz settled for taking care of a day-old duckling his neighbor gave him. To the boy’s delight, the duckling began following him around: It had imprinted on him. Imprinting refers to the ability of many animals, including baby ducks and geese (goslings), to form bonds with the first moving thing they see upon hatching. Lorenz would go on to become an ethologist and would pioneer studies in the field of animal behavior, particularly imprinting. (He got ducklings to imprint on him; they followed him around as he walked, ran, swam, and even paddled away in a canoe.) He won the Nobel Prize for Physiology or Medicine in 1973, jointly with fellow ethologists Karl von Frisch and Nikolaas Tinbergen. The three were celebrated "for their discoveries concerning organization and elicitation of individual and social behavior patterns."

Patterns. While the ethologists were discerning them in the behavior of animals, the animals were detecting patterns of their own. Newly hatched ducklings must have the ability to make out or tell apart the properties of things they see moving around them. It turns out that ducklings can imprint not just on the first living creature they see moving, but on inanimate things as well. Mallard ducklings, for example, can imprint on a pair of moving objects that are similar in shape or color. Specifically, they imprint on the relational concept embodied by the objects. So, if upon birth the ducklings see two moving red objects, they will later follow two objects of the same color (even if those latter objects are blue, not red), but not two objects of different colors. In this case, the ducklings imprint on the idea of similarity. They also show the ability to discern dissimilarity. If the first moving objects the ducklings see are, for example, a cube and a rectangular prism, they will recognize that the objects have different shapes and will later follow two objects that are different in shape (a pyramid and a cone, for example), but they will ignore two objects that have the same shape.

Ponder this for a moment. Newborn ducklings, with the briefest of exposure to sensory stimuli, detect patterns in what they see, form abstract notions of similarity/dissimilarity, and then will recognize those abstractions in stimuli they see later and act upon them. Artificial intelligence researchers would offer an arm and a leg to know just how the ducklings pull this off.

While today’s AI is far from being able to perform such tasks with the ease and efficiency of ducklings, it does have something in common with the ducklings, and that’s the ability to pick out and learn about patterns in data. When Frank Rosenblatt invented the perceptron in the late 1950s, one reason it made such a splash was because it was the first formidable brain-inspired algorithm that could learn about patterns in data simply by examining the data. Most important, given certain assumptions about the data, researchers proved that Rosenblatt’s perceptron will always find the pattern hidden in the data in a finite amount of time; or, put differently, the perceptron will converge upon a solution without fail. Such certainties in computing are like gold dust. No wonder the perceptron learning algorithm created such a fuss.

But what do these terms mean? What are patterns in data? What does learning about these patterns imply? Let’s start by examining this table:

Each row in the table is a triplet of values for variables x1, x2, and y. There’s a simple pattern hidden in this data: In each row, the value of y is related to the corresponding values of x1 and x2. See if you can spot it before reading further.

In this case, with a pencil, paper, and a little effort one can figure out that y equals x1 plus two times x2.

y = x1 + 2x2
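To make the pattern concrete, here is a minimal Python sketch (mine, not the book’s) that checks the relationship on a few sample rows. The first two triplets, (4, 2) → 8 and (1, 2) → 5, appear later in this chapter; the remaining rows are assumed, illustrative values chosen to obey the same rule.

```python
# Each triplet is (x1, x2, y). The first two triplets appear later in the
# chapter; the last two are assumed, illustrative rows that follow the same rule.
rows = [
    (4, 2, 8),
    (1, 2, 5),
    (3, 1, 5),  # assumed
    (2, 3, 8),  # assumed
]

for x1, x2, y in rows:
    assert y == x1 + 2 * x2, f"pattern fails for ({x1}, {x2}, {y})"

print("every row satisfies y = x1 + 2*x2")
```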

A small point about notation: We are going to dispense with the multiplication sign (×) between two variables or between a constant and a variable. For example, we’ll write

2 × x2 as 2x2 and x1 × x2 as x1x2

Ideally, we should write 2x2 as 2x₂ and x1x2 as x₁x₂, with the variables subscripted. But we’ll dispense with the subscripts, too, unless it becomes absolutely necessary to use them. (Purists will cringe, but this method helps keep our text less cluttered and easy on the eye; when we do encounter subscripts, read xi as "x sub-i.") So, keep this in mind: If there’s a symbol such as "x" followed by a digit such as "2," giving us x2, take the entire symbol to mean one thing. If a symbol (say, x or x2) is preceded by a number (say, 9), or by another symbol (say, w1), then the number and the symbol, or the two symbols, are being multiplied. So:

2x2 = 2 × x2

x1x2 = x1 × x2

w2x1 = w2 × x1

Getting back to our equation y = x1 + 2x2, more generally, we can write this as:

y = w1x1 + w2x2, where w1 = 1 and w2 = 2

To be clear, we have found one of the many possible relationships between y and x1 and x2. There can be others. And indeed, for this example, there are, but we don’t need to worry about them for our purposes here. Finding patterns is nowhere near as simple as this example is suggesting, but it gets us going.

We identified what’s called a linear relationship between y, on the one hand, and x1 and x2, on the other. (Linear means that y depends only on x1 and x2, and not on x1 or x2 raised to some power, or on any product of x1 and x2.) Also, I’m using the words equation and relationship interchangeably here.

The relationship between y, x1, and x2 is defined by the constants w1 and w2. These constants are called the coefficients, or weights, of the linear equation connecting y to x1 and x2. In this simple case, assuming such a linear relationship exists, we figured out the values for w1 and w2 after inspecting the data. But often, the relationship between y and (x1, x2,…) is not so straightforward, especially when it extends to more values on the right side of the equation.

For example, consider:

y = w1x1 + w2x2 + w3x3 + ··· + w9x9

Or, more generally, for a set of n weights, and using formal mathematical notation:

y = Σ wixi

The expression on the right, using the sigma notation, is shorthand for summing all wixi, where i takes on values from 1 to n.
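In code, the sigma expression is nothing more than a loop that accumulates the products wixi. Here is a minimal Python sketch (mine, not the book’s); the nine weights and inputs are arbitrary illustrative numbers.

```python
def weighted_sum(weights, inputs):
    """Compute y = w1*x1 + w2*x2 + ... + wn*xn, i.e. the sigma expression."""
    assert len(weights) == len(inputs)
    return sum(w * x for w, x in zip(weights, inputs))

# Nine illustrative weights and nine illustrative inputs.
w = [1, 2, 0.5, -1, 3, 0, 2, 1, -0.5]
x = [2, 1, 4, 3, 0, 5, 1, 2, 2]

print(weighted_sum(w, x))
```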

In the case of 9 inputs, you’d be hard-pressed to extract the values of w1 to w9 just by visually inspecting the data and doing some mental arithmetic. That’s where learning comes in. If there’s a way to algorithmically figure out the weights, then the algorithm is learning the weights. But what’s the point of doing that?

Well, once you have learned the weights—say, w1 and w2 in our simple, toy example—then given some value of x1 and x2 that wasn’t in our initial dataset, we can calculate the value of y. Say, x1 = 5 and x2 = 2. Plug these values into the equation y = x1 + 2x2 and you get a value of y = 9.

What’s all this got to do with real life? Take a very simple, practical, and some would say utterly boring problem. Let’s say x1 represents the number of bedrooms in a house, and x2 represents the total square footage, and y represents the price of the house. Let’s assume that there exists a linear relationship between (x1, x2) and y. Then, by learning the weights of the linear equation from some existing data about houses and their prices, we have essentially built a very simple model with which to predict the price of a house, given the number of bedrooms and the square footage.
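As a sketch of what learning the weights might look like in practice, the following Python snippet fits w1 and w2 by ordinary least squares using NumPy. The house data are made-up, illustrative numbers, and the absence of an intercept term mirrors the simple y = w1x1 + w2x2 form used above; this is one standard way to solve such a problem, not necessarily the method the book goes on to describe.

```python
import numpy as np

# Made-up training data: each row is (bedrooms, square footage),
# and y holds the corresponding (equally made-up) prices in dollars.
X = np.array([
    [2, 1000.0],
    [3, 1500.0],
    [3, 1800.0],
    [4, 2200.0],
    [5, 2600.0],
])
y = np.array([200_000.0, 290_000.0, 335_000.0, 410_000.0, 480_000.0])

# Learn the weights of y ≈ w1*x1 + w2*x2 by least squares.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print("learned weights:", w)

# Predict the price of a house that was not in the training data:
# 4 bedrooms, 2,000 square feet.
new_house = np.array([4, 2000.0])
print("predicted price:", new_house @ w)
```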

The above example—a teeny, tiny baby step, really—is the beginning of machine learning. What we just did is a simplistic form of something called supervised learning. We were given samples of data that had hidden in them some correlation between a set of inputs and a set of outputs. Such data are said to be annotated, or labeled; they are also called the training data. Each input (x1, x2,…, xn) has a label y attached to it. So, in our earlier numerical table, the pair of numbers (4, 2) is labeled with y = 8, the pair (1, 2) with 5, and so on. We figured out the correlation. Once it is learned, we can use it to make predictions about new inputs that weren’t part of the training data.

Also, we did a very particular kind of problem solving called regression, where given some independent variables (x1, x2), we built a model (or equation) to predict the value of a dependent variable (y). There are many other types of models we could have built, and we’ll come to them in due course.

In this case, the correlation, or pattern, was so simple that we needed only a small amount of labeled data. But modern ML requires orders of magnitude more—and the availability of such data has been one of the factors fueling the AI revolution. (The ducklings, for their part, likely indulge in a more sophisticated form of learning. No parent duck sits around labeling the data for its ducklings, and yet the babies learn. How do they do it? Spoiler alert: We don’t know, but maybe by understanding why machines learn, we can one day fully understand how ducklings and, indeed, humans learn.)

It may seem implausible, but this first step we took using a laughably simple example of supervised learning sets us on a path toward understanding modern deep neural networks—one step at a time, of course (with small, gentle, and occasionally maybe not so gentle dollops of vectors, matrices, linear algebra, calculus, probability and statistics, and optimization theory served, as needed, along the way).

Rosenblatt’s perceptron, which we briefly encountered in the prologue, was for its time an astonishing example of one such learning algorithm. And because it was modeled on how neuroscientists thought human neurons worked, it came imbued with mystique and the promise that, one day, perceptrons would indeed make good on the promise of AI.

THE FIRST ARTIFICIAL NEURON

The perceptron’s roots lie in a 1943 paper by an unlikely combination of a philosophically minded neuroscientist in his mid-forties and a homeless teenager. Warren McCulloch was an American neurophysiologist trained in philosophy, psychology, and medicine. During the 1930s, he worked on neuroanatomy, creating maps of the connectivity of parts of monkey brains. While doing so, he also obsessed over the logic of the brain. By then, the work of mathematicians and philosophers like Alan Turing, Alfred North Whitehead, and Bertrand Russell was suggesting a deep connection between computation and logic. The statement "If P is true AND Q is true, then S is true" is an example of a logical proposition. The assertion was that all computation could be reduced to such logic. Given this way of thinking about computation, the question bothering McCulloch was this: If the brain is a computational device, as many think it is, how does it implement such logic?

With these questions in mind, McCulloch moved in 1941 from Yale University to the University of Illinois, where he met a prodigiously talented teenager named Walter Pitts. The youngster, already an accomplished logician (a protégé of the eminent mathematical logician Rudolf Carnap), was attending seminars run by Ukrainian mathematical physicist Nicolas Rashevsky in Chicago. Pitts, however, was a mixed-up adolescent, essentially a runaway from a family that could not appreciate his genius. McCulloch and his wife, Rook, gave Walter a home. "There followed endless evenings sitting around the McCulloch kitchen table trying to sort out how the brain worked, with the McCullochs’ daughter Taffy sketching little pictures," wrote computer scientist Michael Arbib. Taffy’s drawings would later illustrate McCulloch and Pitts’s 1943 paper, "A Logical Calculus of the Ideas Immanent in Nervous Activity."

In that work, McCulloch and Pitts proposed a simple model of a biological neuron. First, here’s an illustration of a generic biological neuron:

[Illustration: a generic biological neuron, showing dendrites, cell body, axon, and axon terminals]

The neuron’s cell body receives inputs via its treelike projections, called dendrites. The cell body performs some computation on these inputs. Then, based on the results of that computation, it may send an electrical signal spiking along another, longer projection, called the axon. That signal travels along the axon and reaches its branching terminals, where it’s communicated to the dendrites of neighboring neurons. And so it goes. Neurons interconnected in this manner form a biological neural network.

McCulloch and Pitts turned this into a simple computational model, an artificial neuron. They showed how by using one such artificial neuron, or neurode (for neuron + node), one could implement certain basic Boolean logical operations such as AND, OR, NOT, and so on, which are the building blocks of digital computation. (For some Boolean operations, such as exclusive-OR, or XOR, you need more than one neurode, but more on this later.) What follows is an image of a single neurode. (Ignore the g and f inside the neuron for now; we’ll come to those in a moment.)

In this simple version of the McCulloch-Pitts model, x1 and x2 can be either 0 or 1. In formal notation, we can say:

x1, x2 ∈ {0,1}

That should be read as x1 is an element of the set {0, 1} and x2 is an element of the set {0, 1}; x1 and x2 can take on only values 0 or 1 and nothing else. The neurode’s output y is calculated by first summing the inputs and then checking to see if that sum is greater than or equal to some threshold, theta (θ). If so, y equals 1; if not, y equals 0.

sum = x1 + x2

If sum ≥ θ: y = 1

Else: y = 0

Generalizing this to an arbitrary sequence of inputs, x1, x2, x3,…, xn, one can write down the formal mathematical description of the simple neurode. First, we define the function g(x), read that as "g of x," where x here is the set of inputs (x1, x2, x3,…, xn); g(x) sums up the inputs. Then we define the function f(g(x)), again read that as "f of g of x," which takes the summation and performs the thresholding to generate the output, y: It is zero if g(x) is less than some θ and 1 if g(x) is greater than or equal to θ.

With one artificial neuron as described, we can design some of the basic Boolean logic gates (AND & OR, for example). In an AND logic gate, the output y should be 1 if both x1 and x2 are equal to 1; otherwise, the output should be 0. In this case, θ = 2 does the trick. Now, the output y will be 1 only when x1 and x2 are both 1 (only then will x1 + x2 be greater than or equal to 2). You can play with the value of θ to design the other logic gates. For example, in an OR gate, the output should be 1 if either x1 or x2 is 1; otherwise, the output should be 0. What should θ be?
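Here is a minimal Python sketch (mine, not the book’s) of the neurode just described: sum the inputs, compare against the threshold θ, and output 1 or 0. With θ = 2 it behaves as an AND gate, and, answering the question above, θ = 1 gives an OR gate.

```python
def neurode(inputs, theta):
    """McCulloch-Pitts neurode: g(x) sums the inputs, f thresholds the sum."""
    total = sum(inputs)                # g(x)
    return 1 if total >= theta else 0  # f(g(x))

# theta = 2 implements AND; theta = 1 implements OR.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(f"x1={x1} x2={x2}  AND={neurode((x1, x2), 2)}  OR={neurode((x1, x2), 1)}")
```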

The simple McCulloch-Pitts (MCP) model can be extended. You can increase the number of inputs. You can let inputs be inhibitory, meaning x1 or x2 can be multiplied by -1. If one of the inputs to the neurode is inhibitory and you set the threshold appropriately, then the neurode will always output a 0, regardless of the value of all the other inputs. This allows you to build more complex logic. As does interconnecting multiple neurodes such that the output of one neurode serves as the input to another.
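One small illustration of inhibition, under the multiply-by-minus-one reading given above (again my sketch, not an example from the book): a single inhibitory input with the threshold set to zero behaves as a NOT gate, and the same mechanism can be used to veto a neurode’s output.

```python
def neurode_signed(inputs, signs, theta):
    """Neurode whose inputs are excitatory (+1) or inhibitory (-1)."""
    total = sum(s * x for s, x in zip(signs, inputs))
    return 1 if total >= theta else 0

# NOT gate: one inhibitory input, theta = 0.
#   x = 0 -> sum =  0 >= 0 -> output 1
#   x = 1 -> sum = -1 <  0 -> output 0
for x in (0, 1):
    print(f"NOT {x} -> {neurode_signed((x,), (-1,), theta=0)}")
```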

All this was amazing, and yet limited. The McCulloch-Pitts neuron is a unit of computation, and you can use combinations of it to create any type of Boolean logic. Given that all digital computation at its most basic is a sequence of such logical operations, you can essentially mix and match MCP neurons to carry out any computation. This was an extraordinary statement to make in 1943. The mathematical roots of McCulloch and Pitts’s paper were apparent. The paper had only three references—Carnap’s The Logical Syntax of Language; David Hilbert and Wilhelm Ackermann’s Foundations of Theoretical Logic; and Whitehead and Russell’s Principia Mathematica—and none of them had to do with biology. There was no doubting the rigorous results derived in the McCulloch-Pitts paper. And yet, the upshot was simply a machine that could compute, not learn. In particular, the value of θ had to be hand-engineered; the neuron couldn’t examine the data and figure out θ.

It’s no wonder Rosenblatt’s perceptron made such a splash. It could learn its weights from data. The weights encoded some knowledge, however minimal, about patterns in the data and remembered them, in a manner of speaking.

LEARNING FROM MISTAKES

Rosenblatt’s scholarship often left his students floored. George Nagy, who came to Cornell University in Ithaca, New York, in 1960 to do his Ph.D. with Rosenblatt, recalled a walk the two of them took, during which they talked about stereo vision. Rosenblatt blew Nagy away with his mastery of the topic. "It was difficult not to feel naïve talking to him in general," said Nagy, now professor emeritus at Rensselaer Polytechnic Institute in Troy, New York; Rosenblatt’s evident erudition was accentuated by his relative youth. (He was barely ten years older than Nagy.)

Rosenblatt’s youthfulness almost got the two of them into trouble during a road trip. He and Nagy had to go from Ithaca to Chicago for a conference. Rosenblatt hadn’t yet written the paper he wanted to present, so he asked Nagy to drive while he worked. Nagy had never owned a car and barely knew how to drive, but he agreed nonetheless. "Unfortunately, I drove in several lanes at once, and a policeman stopped us," Nagy said. Rosenblatt told the cop that he was a professor and had asked his student to drive. "The cop laughed and said, ‘You are not a professor, you are a student.’" Fortunately, Rosenblatt had enough papers on him to convince the cop of his credentials, and the cop let the two go. Rosenblatt drove the rest of the way to Chicago, where he stayed up all night typing his paper, which he presented the next day. "He was able to do these things," Nagy told me.

By the time Nagy arrived at Cornell, Rosenblatt had already built the Mark I Perceptron; we saw in the prologue that Rosenblatt had done so in 1958, leading to the coverage in The New York Times. Nagy began working on the next machine, called Tobermory (named after the talking cat created by H. H. Munro, aka Saki), a hardware neural network designed for speech recognition. Meanwhile, the Mark I Perceptron and Rosenblatt’s ideas had already garnered plenty of attention.

In the summer of 1958, the editor of the Cornell Aeronautical Laboratory’s Research Trends magazine had devoted an entire issue to Rosenblatt (because of "the unusual significance of Dr. Rosenblatt’s article," according to the editor). The article was titled "The Design of an Intelligent Automaton: Introducing the Perceptron—A Machine that Senses, Recognizes, Remembers, and Responds Like the Human Mind." Rosenblatt would eventually rue choosing the term perceptron to describe his work. "It became one of Rosenblatt’s great regrets that he used a word that sounds like a machine," Nagy told me. By perceptron, Rosenblatt really meant a class of models of the nervous system for perception and cognition.

His emphasis on the brain wasn’t a surprise. Rosenblatt had studied with James Gibson, one of the giants in the field of visual perception. He also looked up to McCulloch and Pitts and to Donald Hebb, a Canadian psychologist who in 1949 introduced a model for how biological neurons learn—to be clear, learning here refers to learning about patterns in data and not to the kind of learning we usually associate with high-level human cognition. "He’d always talk highly of them," Nagy said.

While McCulloch and Pitts had developed models of the neuron, networks of these artificial neurons could not learn. In the context of biological neurons, Hebb had proposed a mechanism for learning that is often succinctly, but somewhat erroneously, put as "Neurons that fire together wire together." More precisely, according to this way of thinking, our brains learn because connections between neurons strengthen when one neuron’s output is consistently involved in the firing of another, and they weaken when this is not so. The process is called Hebbian learning. It was Rosenblatt who took the work of these pioneers and synthesized it into a new idea: artificial neurons that reconfigure as they learn, embodying information in the strengths of their connections.
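The book states Hebb’s idea only in words here; one common textbook way to turn it into arithmetic (my sketch, with an assumed learning rate and decay term) is a weight update that grows when the two neurons are active together and shrinks slightly otherwise:

```python
def hebbian_update(w, pre, post, lr=0.1, decay=0.01):
    """Strengthen the connection when pre- and postsynaptic activity coincide;
    otherwise let the weight decay slightly."""
    return w + lr * pre * post - decay * w

w = 0.5
for pre, post in [(1, 1), (1, 1), (0, 1), (1, 0), (1, 1)]:
    w = hebbian_update(w, pre, post)
    print(f"pre={pre} post={post} -> w={w:.3f}")
```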

As a psychologist, Rosenblatt
