Advanced Neural Network Techniques - Elements of AI

The document discusses advanced neural network techniques, focusing on convolutional neural networks (CNNs) and generative adversarial networks (GANs). CNNs are highlighted for their ability to efficiently detect image features while reducing the amount of training data needed, and GANs are introduced as a method for generating realistic images through the competition of two neural networks. Additionally, it touches on the rise of large language models (LLMs) and their applications, particularly in generating human-like text responses.

Uploaded by

KenKen
3/21/25, 11:13 AM Advanced neural network techniques - Elements of AI

Elements of AI


III. Advanced neural network techniques


In the previous section, we have discussed the basic ideas behind most neural network methods:
multilayer networks, non-linear activation functions, and learning rules such as the
backpropagation algorithm.

These ideas power almost all modern neural network applications. However, there are some
interesting and powerful variations of the theme that have led to great advances in deep
learning in many areas.

Convolutional neural networks (CNNs)

One area where deep learning has achieved spectacular success is image processing. The
simple classifier that we studied in detail in the previous section is severely limited – as you
noticed, it wasn’t even possible to classify all the smiley faces correctly. Adding more layers
in the network and using backpropagation to learn the weights does in principle solve the
problem, but another one emerges: the number of weights becomes extremely large and
consequently, the amount of training data required to achieve satisfactory accuracy can
become too large to be realistic.
Fortunately, a very elegant solution to the problem of too many weights exists: a special kind
of neural network, or rather, a special kind of layer that can be included in a deep neural
network. This special kind of layer is a so-called convolutional layer. Networks including
convolutional layers are called convolutional neural networks (CNNs). Their key property
is that they can detect image features such as bright or dark (or specific color) spots, edges in
various orientations, patterns, and so on. These form the basis for detecting more abstract
features such as a cat’s ears, a dog’s snout, a person’s eye, or the octagonal shape of a stop
sign. It would normally be hard to train a neural network to detect such features based on
the pixels of the input image, because the features can appear in different positions,
different orientations, and in different sizes in the image: moving the object or the camera
angle will change the pixel values dramatically even if the object itself looks just the same to
us. Learning to detect a stop sign in all these different conditions would require vast
amounts of training data, because the network would only detect the sign in conditions
where it has appeared in the training data. So, for example, a stop sign in the top right corner
of the image would be detected only if the training data included an image with the stop sign
in the top right corner. CNNs can recognize the object anywhere in the image no matter
where it has been observed in the training images.
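
To get a feel for the "too many weights" problem described above, the following back-of-the-envelope sketch compares a fully connected layer with a convolutional one. All sizes (a 100 x 100 grayscale image, 1000 neurons, 32 feature detectors of 5 x 5 pixels) are made-up assumptions for illustration, not figures from the course.

```python
# Hypothetical sizes, chosen only for illustration.
image_height, image_width = 100, 100   # grayscale image, one value per pixel
hidden_neurons = 1000                  # neurons in one fully connected layer

# Fully connected: every neuron has its own weight for every pixel.
dense_weights = image_height * image_width * hidden_neurons

# Convolutional: each of the 32 feature detectors has a single 5x5 set of
# weights that is reused at every position in the image.
kernel_size, num_filters = 5, 32
conv_weights = kernel_size * kernel_size * num_filters

print(dense_weights)  # 10000000
print(conv_weights)   # 800
```

Fewer weights to learn generally means that less training data is needed, which is the point the text makes about convolutional layers.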

Note

Why we need CNNs


CNNs use a clever trick to reduce the amount of training data required to detect objects in different
conditions. The trick basically amounts to using the same input weights for many neurons – so that all of
these neurons are activated by the same pattern – but with different input pixels. We can for example have a
set of neurons that are activated by a cat’s pointy ear. When the input is a photo of a cat, two neurons are
activated, one for the left ear and another for the right. We can also let the neuron’s input pixels be taken
from a smaller or a larger area, so that different neurons are activated by the ear appearing in different
scales (sizes), so that we can detect a small cat’s ears even if the training data only included images of big
cats.
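
The weight-sharing trick in the note above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the way real CNN libraries are implemented: the 3 x 3 "ear" pattern, the 6 x 9 image, and the function name `feature_map` are all hypothetical, and the weights are set by hand rather than learned.

```python
def feature_map(image, kernel):
    """Slide one shared set of weights (the kernel) over every image position."""
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # Same weights every time, but different input pixels.
            total = sum(kernel[i][j] * image[r + i][c + j]
                        for i in range(k) for j in range(k))
            row.append(total)
        out.append(row)
    return out

# A tiny "pointy ear" pattern: a hypothetical stand-in for a learned feature.
ear = [[0, 1, 0],
       [1, 1, 1],
       [1, 1, 1]]

# An empty image with the ear pasted in at two different positions.
image = [[0] * 9 for _ in range(6)]
for top, left in [(0, 0), (2, 5)]:
    for i in range(3):
        for j in range(3):
            image[top + i][left + j] = ear[i][j]

# Weights: +1 where the pattern is bright, -1 where it is dark.
kernel = [[1 if v else -1 for v in row] for row in ear]
responses = feature_map(image, kernel)

# The response reaches its maximum (7, the number of bright pixels) on a match.
detections = [(r, c) for r, row in enumerate(responses)
              for c, value in enumerate(row) if value == 7]
print(detections)  # [(0, 0), (2, 5)]
```

Each entry of `responses` plays the role of one neuron: all of them use the same nine weights, so the ear is found in both places even though the pattern was defined only once.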

The convolutional neurons are typically placed in the bottom layers of the network, which
process the raw input pixels. Basic neurons (like the perceptron neuron discussed above)
are placed in the higher layers, which process the output of the bottom layers. The bottom
layers can usually be trained using unsupervised learning, without a particular prediction
task in mind. Their weights will be tuned to detect features that appear frequently in the
input data. Thus, with photos of animals, typical features will be ears and snouts, whereas in
images of buildings, the features are architectural components such as walls, roofs,
windows, and so on. If a mix of various objects and scenes is used as the input data, then the
features learned by the bottom layers will be more or less generic. This means that pre-
trained convolutional layers can be reused in many different image processing tasks. This is
extremely important since it is easy to get virtually unlimited amounts of unlabeled training
data – images without labels – which can be used to train the bottom layers. The top layers
are always trained by supervised machine learning techniques such as backpropagation.

https://course.elementsofai.com/5/3 3/11

Do neural networks dream of electric sheep? Generative adversarial networks (GANs)

Having trained a neural network on data, we can use it for predictions. Since the top layers of
the network have been trained in a supervised manner to perform a particular classification
or prediction task, the top layers are really useful only for that task. A network trained to
detect stop signs is useless for detecting handwritten digits or cats.

A fascinating result is obtained by taking the pre-trained bottom layers and studying what
the features they have learned look like. This can be achieved by generating images that
activate a certain set of neurons in the bottom layers. Looking at the generated images, we
can see what the neural network “thinks” a particular feature looks like, or what an image
with a select set of features in it would look like. Some even like to talk about the networks
“dreaming” or “hallucinating” images (see Google’s DeepDream system).
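
Generating an image that activates a given neuron can be sketched as a simple optimization of the input. The example below is a toy under stated assumptions: a single neuron with hand-picked (hypothetical) weights over a 3 x 3 patch, and plain gradient ascent on the pixel values. Systems such as DeepDream work with far larger networks and additional tricks.

```python
# Hypothetical fixed weights of one already-trained neuron over a 3x3 patch.
# It responds most strongly to a plus-shaped pattern.
weights = [-1.0, 1.0, -1.0,
            1.0, 1.0,  1.0,
           -1.0, 1.0, -1.0]

def activation(pixels):
    """Weighted sum of the input pixels (no activation function, for brevity)."""
    return sum(w * p for w, p in zip(weights, pixels))

pixels = [0.5] * 9   # start from a uniformly gray image
step = 0.1
for _ in range(10):
    # The derivative of the activation with respect to pixel i is weight i,
    # so each pixel is nudged uphill and kept within the valid range 0..1.
    pixels = [min(1.0, max(0.0, p + step * w)) for p, w in zip(pixels, weights)]

for r in range(3):
    print(pixels[3 * r:3 * r + 3])
print(activation(pixels))  # 5.0
```

The network weights never change here; only the image does. The resulting picture shows which input this particular neuron responds to most strongly.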

Note

Be careful with metaphors

However, we’d like to once again emphasize the problem with metaphors such as dreaming when simple
optimization of the input image is meant – remember the suitcase words discussed in Chapter 1. The neural
network doesn’t really dream, and it doesn’t have a concept of a cat that it would understand in a similar
sense as a human understands. It is simply trained to recognize objects and it can generate images that are
similar to the input data that it is trained on.

To actually generate real-looking cats, human faces, or other objects (you’ll get whatever you
used as the training data), Ian Goodfellow, a researcher at Google Brain at the time, proposed
a clever combination of two neural networks. The idea is to let the two networks compete
against each other. One of the networks is trained to generate images like the ones in the
training data – it is called the generative network. The other network’s task is to separate
images generated by the first network from real images from the training data – this one is
called the adversarial network. These two combined then make up a generative adversarial
network or a GAN.

The system trains the two models side by side. At the beginning of the training, the
adversarial model can easily tell apart the real images from the training data and
the clumsy attempts by the generative model. However, as the generative network slowly
gets better and better, the adversarial model has to improve as well, and the cycle continues
until eventually the generated images are almost indistinguishable from real ones. The GAN
does not merely try to reproduce the images in the training data: that would be far too simple
a strategy to beat the adversarial network. Rather, the system is trained so that it has to be able
to generate new, real-looking images too.
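
The side-by-side training described above can be sketched with a deliberately tiny example in which the "images" are single numbers. Everything here is an assumption made for illustration: the real data are numbers drawn around 4.0, the generative model is `a * noise + b`, the adversarial model is a one-input logistic classifier, and the function name `train_toy_gan` and all settings are hypothetical. Real GANs use deep networks on both sides, and even this toy is not guaranteed to settle down neatly.

```python
import math
import random

def sigmoid(t):
    t = max(-60.0, min(60.0, t))   # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-t))

def train_toy_gan(steps=2000, batch=32, lr=0.05, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0   # generative model: fake = a * noise + b
    w, c = 0.0, 0.0   # adversarial model: D(x) = sigmoid(w * x + c)
    fake_means = []
    for _ in range(steps):
        real = [rng.gauss(4.0, 0.5) for _ in range(batch)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [a * z + b for z in noise]

        # Adversarial step: push D(real) towards 1 and D(fake) towards 0.
        grad_w = grad_c = 0.0
        for x_real, x_fake in zip(real, fake):
            d_real = sigmoid(w * x_real + c)
            d_fake = sigmoid(w * x_fake + c)
            grad_w += -(1.0 - d_real) * x_real + d_fake * x_fake
            grad_c += -(1.0 - d_real) + d_fake
        w -= lr * grad_w / batch
        c -= lr * grad_c / batch

        # Generative step: change a and b so that D(fake) moves towards 1.
        grad_a = grad_b = 0.0
        for z in noise:
            d_fake = sigmoid(w * (a * z + b) + c)
            grad_a += -(1.0 - d_fake) * w * z
            grad_b += -(1.0 - d_fake) * w
        a -= lr * grad_a / batch
        b -= lr * grad_b / batch
        fake_means.append(b)   # b is the average value the generator produces
    return a, b, fake_means

a, b, fake_means = train_toy_gan()
print("generated average at start:", round(fake_means[0], 3))
print("closest approach to 4.0:",
      round(min(abs(m - 4.0) for m in fake_means), 3))
```

The generative model never sees the real numbers directly. It only receives feedback through the adversarial model, which is the essence of the GAN idea.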

The above images were generated by a GAN developed by NVIDIA in a project led by Prof
Jaakko Lehtinen (see this article for more).

Could you have recognized them as fakes?

The Rise of Large Language Models (LLMs)

As mentioned above, convolutional neural networks (CNNs) reduce the number of learnable
weights in a neural network so that the amount of training data required to learn all of them
doesn't grow astronomically large as we keep building bigger and bigger networks. Another
architectural innovation, besides the idea of a CNN, that currently powers many state-of-
the-art deep learning models is called attention.

Attention mechanisms were originally introduced for machine translation where they can
selectively focus the attention of the model to certain words in the input text when
generating a particular word in the output. This way the model doesn't have to pay attention
to all of the input at the same time, which greatly simplifies the learning task. Attention
mechanisms were soon found to be extremely useful not only in machine translation.
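
The idea of selectively focusing on certain input words can be sketched as follows. This is a minimal, hedged illustration of one common form of attention (scaled dot-product attention); the three-word input, the hand-made vectors, and the function names `softmax` and `attend` are assumptions made for this example. In a real model, all of these vectors are learned from data.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to one."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Return the attention weights and the weighted average of the values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    mixed = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, mixed

# Hypothetical input sentence with made-up vectors for each word.
words = ["the", "black", "cat"]
keys = [[1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0]]
values = keys

# A made-up query, as if the model were about to output the word for "cat".
query = [0.0, 0.0, 4.0]

weights, mixed = attend(query, keys, values)
for word, weight in zip(words, weights):
    print(word, round(weight, 2))
```

Most of the weight lands on "cat", so the output is dominated by that word's vector, while the other words contribute only a little.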

In 2017, a team working at Google published the blockbuster article "Attention is All You
Need", which introduced the so-called transformer architecture for deep neural networks.
Unless you have been living on a desert island or on an otherwise strict media diet, you have
most likely already heard about transformers (the neural network models, not the toy
franchise). It's just that they may have been hiding inside an acronym: GPT (Generative
Pretrained Transformer). As the title of the article by the Google team suggests, transformers
heavily exploit attention mechanisms to get the most out of the available training data and
computational resources.

The most widely noted applications of transformers are found in large language models
(LLMs). The best known ones are OpenAI's GPT-series, including GPT-1 released in June
2018 and GPT-4 announced in March 2023, but no giant platform company wants to miss
out: Google picks model names from Sesame Street and published BERT (Bidirectional
Encoder Representations from Transformers) in October 2018, while Meta joined the party a
bit later in February 2023, picking a name inspired by the animal world, LLaMA (Large
Language Model Meta AI). And it's not just the platform companies that are driving the
development: universities and other research organizations are contributing open source
models with the goal of democratizing the technology.

Note

What's in an LLM?

LLMs are models that, given a piece of text like "The capital of Finland is", predict how the text is likely to
continue. In this case, "Helsinki" or "a pocket-sized metropolis" would be likely continuations. LLMs are
trained on large amounts of text such as the entire contents of the Wikipedia or the CommonCrawl dataset
that, at the time of writing this, contains a whopping 260 billion web pages.
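
The idea of predicting how a text continues can be sketched with a simple word-counting model. This is only a toy under stated assumptions: the three-sentence corpus is made up, the model looks at just the last two words (the hypothetical setting `CONTEXT = 2`), and it counts occurrences instead of training a neural network. Real LLMs use transformers and take a much longer context into account.

```python
from collections import Counter, defaultdict

CONTEXT = 2   # number of preceding words the toy model looks at

# A made-up miniature "training corpus".
corpus = (
    "the capital of finland is helsinki . "
    "the capital of sweden is stockholm . "
    "helsinki is the capital of finland ."
).split()

# Count which word follows each two-word context in the corpus.
follows = defaultdict(Counter)
for i in range(len(corpus) - CONTEXT):
    context = tuple(corpus[i:i + CONTEXT])
    follows[context][corpus[i + CONTEXT]] += 1

def predict_next(text):
    """Return the probability of each possible next word, given the text so far."""
    context = tuple(text.lower().split()[-CONTEXT:])
    counts = follows.get(context, Counter())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

print(predict_next("The capital of Finland is"))  # {'helsinki': 1.0}
print(predict_next("The capital of Sweden is"))   # {'stockholm': 1.0}
```

For a context it has never seen, this toy returns an empty result, whereas an LLM can still produce a plausible continuation because it generalizes from similar text.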

In principle, one can view LLMs as basically nothing but extremely powerful predictive text
entry techniques. However, with some further thinking, it becomes apparent that being able
to predict the continuation of any text in a way that is indistinguishable from human
writing, is (or would be) quite a feat and encompasses many aspects of intelligence. The
above example which is based on the association between the words "the capital of Finland"
and "Helsinki" is an example where the model has learned a fact about the world. If we were
able to build a model that associates the commonly agreed answers with a wide range of
questions, it could be argued that such a model has learned a big chunk of so-called "world
knowledge". Especially intriguing are instances where the model seems to exhibit some level
of reasoning beyond memorization and statistical co-occurrence: currently, LLMs are able to
do this in a limited sense and they can easily make trivial mistakes because they are based
on "just" statistical machine learning. Intensive research and development efforts are
directed at building deep learning models with more robust reasoning algorithms and
databases of verified facts.

Note

ChatGPT: AI for the masses

A massive earthquake occurred in San Francisco on November 30, 2022. It was so powerful that hardly a
person on the planet was unaffected, and yet, no seismometer detected it. This metaphorical "earthquake"
was the launch of ChatGPT by OpenAI. Word of the online chatbot service that anyone could use free of
charge quickly spread around the world and after a mere five days, it had more than a million registered users
(compare this to the five years that it took the Elements of AI to reach the same number), and in two
months, the number of signups was 100 million. No other AI service, or probably any service whatsoever,
has become a household name so quickly.

The first version of ChatGPT was based on a GPT-3.5 model fine-tuned by supervised and reinforcement
learning according to a large number of human-rated responses. The purpose of the fine-tuning process was
to steer the model away from toxic and incorrect responses that the language model had picked up from its
training data, and towards comprehensive and helpful responses.

It is not easy to say what caused the massive media frenzy and the unprecedented interest
in ChatGPT from pretty much everyone, even those who hadn't paid much attention to
AI thus far. Probably some of it is explained by the somewhat better quality of the output,
due to the fine-tuning, and the easy-to-use chat interface, which enables the user to not only
get one-off answers to isolated questions, as with any of the earlier LLMs, but also maintain a
coherent dialogue in a specific context. In the same vein, the chat interface allows one to
make requests like "explain this to a five year old" or "write that as a song in the style of Nick
Cave." (Mr Cave, however, wasn't impressed [BBC]). In any case, ChatGPT succeeded in
bumping the interest in AI to completely new levels.

It remains to be seen what the real "killer apps" for ChatGPT and other LLM-based
solutions will be. We believe the most likely candidates are ones where the factual content comes
from the user or from another system, and the language model is used to format the output
in the form of language (either natural language or possibly formal language such as
program code). We'll return to the expected impact of ChatGPT and other LLM-based
applications in the final chapter.

After completing Chapter 5 you should be able to:


Explain what a neural network is and where they are being successfully used

Understand the technical methods that underpin neural networks
