
ADVANCED DATA ANALYTICS

Lecture 9

Simon Scheidegger – simon.scheidegger@unil.ch

University of Lausanne, Department of Economics


ROAD-MAP
 This lecture:

Deep Learning cont’d

More advanced topics: Recurrent neural networks and beyond.

 Throughout lectures – hands-on:

Basics on TensorFlow & Keras

Examples related to the day’s topics in TensorFlow
KERAS & TENSORFLOW BASICS
 tensorflow.org

 Keras API:
https://www.tensorflow.org/guide/keras/sequential_model

 Fun data sets to play with: https://www.kaggle.com/datasets

 Some “clean” data to play with: https://archive.ics.uci.edu/ml/index.php

 Help for debugging – TensorBoard: https://www.tensorflow.org/tensorboard


A GENTLE FIRST EXAMPLE
 Let’s look at the notebook: demo/03_Gentle_DNN.ipynb.

 This notebook contains all the basic functionality from a theoretical point of
view.

 Two simple examples: one regression and one classification.
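
As a minimal sketch of what the regression part of such a notebook looks like (not the notebook's exact code; the target function, architecture, and hyperparameters below are illustrative assumptions):

import numpy as np
import tensorflow as tf
from tensorflow import keras

# Toy regression data (illustrative, not the notebook's data):
# learn f(x) = sin(2*pi*x) on [0, 1] from noisy samples.
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, size=(512, 1))
y_train = np.sin(2 * np.pi * x_train) + 0.05 * rng.standard_normal((512, 1))

# A small fully-connected network built with the Keras Sequential API.
model = keras.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=(1,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1),  # linear output for regression
])

model.compile(optimizer="adam", loss="mse")
model.fit(x_train, y_train, epochs=100, batch_size=32, verbose=0)

# Evaluate on fresh test points.
x_test = rng.uniform(0.0, 1.0, size=(100, 1))
print(model.evaluate(x_test, np.sin(2 * np.pi * x_test), verbose=0))

The classification case only differs in the output layer (e.g. a softmax) and the loss (e.g. cross-entropy).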


ACTION REQUIRED
Look at the test functions below*. Pick three of those test functions (from Genz 1987).

Approximate a 2-dimensional function stated below with neural nets based on 10, 50, 100, and 500 points randomly sampled
from [0, 1]². Compute the average and maximum error.

The errors should be computed by generating 1,000 uniformly distributed random test points from
within the computational domain.

Plot the maximum and average error as a function of the number of sample points.

Repeat the same for 5-dimensional and 10-dimensional functions. Is there anything particular
you observe?

*Choose the parameters w and c in meaningful ways (a possible starting point is sketched below).
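
A possible starting point for this exercise: the Genz "oscillatory" test function is written out explicitly, while the choice of w and c, the architecture, and the training settings are illustrative assumptions.

import numpy as np
import tensorflow as tf
from tensorflow import keras

def genz_oscillatory(x, w, c):
    # Genz (1987) "oscillatory" test function: cos(2*pi*w_1 + sum_i c_i * x_i).
    return np.cos(2 * np.pi * w[0] + x @ c)

d = 2                                               # repeat with d = 5 and d = 10
rng = np.random.default_rng(42)
w, c = rng.uniform(size=d), rng.uniform(size=d)     # illustrative choice of w and c

x_test = rng.uniform(size=(1000, d))                # 1,000 uniform random test points
y_test = genz_oscillatory(x_test, w, c)

for n in (10, 50, 100, 500):
    x_train = rng.uniform(size=(n, d))
    y_train = genz_oscillatory(x_train, w, c)

    model = keras.Sequential([
        keras.layers.Dense(64, activation="relu", input_shape=(d,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(x_train, y_train, epochs=500, batch_size=min(n, 32), verbose=0)

    err = np.abs(model.predict(x_test, verbose=0).ravel() - y_test)
    print(f"n={n}: average error {err.mean():.4f}, maximum error {err.max():.4f}")

Collect the average and maximum errors over n and plot them (e.g. with matplotlib) to see the convergence behaviour, then repeat for the 5- and 10-dimensional case.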


ACTION REQUIRED (II)
 Play with the architecture (see the sketch after this list).

Number of hidden layers.

Activation functions.

Choice of the stochastic gradient descent algorithm.

Monitor the performance with respect to the architecture.
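
A minimal sketch of such a sweep, assuming the training data (x_train, y_train) from the gentle example above; the grids of depths, activations, and optimizers are arbitrary illustrative choices:

from tensorflow import keras

def build_model(n_hidden, activation):
    layers = [keras.layers.Input(shape=(1,))]
    layers += [keras.layers.Dense(32, activation=activation) for _ in range(n_hidden)]
    layers += [keras.layers.Dense(1)]
    return keras.Sequential(layers)

for n_hidden in (1, 2, 4):
    for activation in ("relu", "tanh", "swish"):
        for opt in (keras.optimizers.SGD(0.01), keras.optimizers.RMSprop(), keras.optimizers.Adam()):
            model = build_model(n_hidden, activation)
            model.compile(optimizer=opt, loss="mse")
            hist = model.fit(x_train, y_train, validation_split=0.2,
                             epochs=50, verbose=0)
            # Monitor performance with respect to the architecture.
            print(n_hidden, activation, opt.__class__.__name__,
                  hist.history["val_loss"][-1])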
A SEMI-COMPREHENSIVE TF TOUR
 demo/04_TF_tour.ipynb

 5 examples (incl. Kaggle data set from Lending Club)

 TensorBoard

On Nuvolos: the in-cell TensorBoard display won't work
in JupyterLab.

Once you've run all the cells, go to the launcher, click TensorBoard, and
you should be good to go.

Right after this, a new TensorBoard tab should show up that contains the
expected output.
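
For reference, logging a Keras training run to TensorBoard usually amounts to adding a callback along these lines (the log directory name is an arbitrary choice, and model, x_train, y_train are assumed to come from one of the notebook's examples):

import datetime
from tensorflow import keras

# Log training metrics so they can be inspected in TensorBoard.
log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tb_callback = keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)

model.fit(x_train, y_train, epochs=20, validation_split=0.2,
          callbacks=[tb_callback])

# Then launch TensorBoard on the log directory (or use the Launcher on Nuvolos):
#   tensorboard --logdir logs/fit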
ACTION REQUIRED
 Focus on the example with the Kaggle data set.

 Play with the architecture.



Number of hidden layers.

Activation functions.

Choice of the stochastic gradient descent algorithm.

Monitor the performance with respect to the architecture.

 Try to use TensorBoard.


SOME PERSONAL TAKE-AWAY
 Swish activation is the “best” choice if you need a smooth and deep network.

 Use layer widths that are a multiple of 2 (training speed).

 Smaller learning rate with deeper networks.

 Batch normalization for speed.

 Glorot initialization.

 Custom layers for custom models (a model combining these take-aways is sketched below)


(https://www.tensorflow.org/tutorials/customization/custom_layers).
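
A small model combining these take-aways might look as follows (a sketch; the input dimension and all layer sizes are illustrative, and Glorot uniform is in fact the Keras default initializer):

from tensorflow import keras

# Swish activations, power-of-two layer widths, batch normalization,
# Glorot initialization, and a smaller learning rate.
model = keras.Sequential([
    keras.layers.Input(shape=(16,)),
    keras.layers.Dense(128, activation="swish",
                       kernel_initializer="glorot_uniform"),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(64, activation="swish",
                       kernel_initializer="glorot_uniform"),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(1),
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4),  # smaller LR for a deeper net
              loss="mse")

For custom layers, the linked tutorial shows how to subclass keras.layers.Layer.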
A BREAK

[Figure: mountains]
BEYOND VANILLA DNN
 Not all applications can be handled with plain-vanilla deep neural nets.

 There exist situations where more intricate architectures are needed.

 Examples:

Time-series comparisons, such as estimating how closely related two documents or two stock
tickers are.

Sequence-to-sequence learning, such as decoding an English sentence into French.

Sentiment analysis, such as classifying the sentiment of tweets or movie reviews as positive or
negative.

Time-series forecasting, such as predicting the future weather at a certain location, given recent
weather data.
EXAMPLE

Given a picture of a ball, can we predict where it will go?


A SEQUENCE MODELING PROBLEM:
PREDICT THE NEXT WORD

“Today, we are having a class on deep ___”


A SEQUENCE MODELING PROBLEM:
PREDICT THE NEXT WORD

“Today, we are having a class on deep learning”

given these words, predict the next word


A SEQUENCE MODELING PROBLEM:
AUDIO
THE PERCEPTRON REVISITED

[Figure: the perceptron – inputs, weights, weighted sum, non-linearity (e.g. the sigmoid activation function), output]

→ The bias term allows you to shift your activation function to the left or the right.
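
A minimal sketch of a single perceptron as described above (the inputs, weights, and bias are arbitrary illustrative numbers):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def perceptron(x, w, b):
    # Weighted sum of the inputs plus bias, passed through the non-linearity.
    return sigmoid(np.dot(w, x) + b)

x = np.array([0.5, -1.0, 2.0])   # inputs (illustrative)
w = np.array([0.1, 0.4, -0.3])   # weights
b = 0.2                          # bias: shifts the activation left or right
print(perceptron(x, w, b))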
FEED-FORWARD NETS REVISITED

RECURRENT NEURAL NETS
 To model sequences, we need to

Handle variable-length sequences.

Track long-term dependencies.

Maintain information about the order.

Share parameters across the sequence.

 Recurrent Neural Networks (RNNs) are an approach to sequence modeling
problems (Rumelhart et al. (1986)).
 More specifically, given an observation sequence x = {x1, x2, ..., xT} and its
corresponding label y = {y1, y2,..., yT}, we want to learn a map f : x → y.

http://colah.github.io/posts/2015-08-Understanding-LSTMs/
RNN
 RNNs are a family of neural networks for processing sequential data.
 An RNN is a neural network that is specialized for
processing a sequence of values x(1), ..., x(τ).
 Unfold the computational graph of a dynamical system:

Fig. from Goodfellow et al. (2016)


PREVIEW ON RNN
 Recurrent layers use their own output as input.

In the figure: A is the recurrent cell.

 They introduce history, or time dependence, into NNs.

 The only way to efficiently train them is to unroll them.

ht = fW(ht-1, xt):  new cell state = function fW (parameterized by W) of the old state ht-1 and the input vector xt at time t.

http://colah.github.io/posts/2015-08-Understanding-LSTMs/
FEED-FORWARD NETS REVISITED
HANDLING INDIVIDUAL TIME STEPS

[Figure: a feed-forward net maps an input vector to an output vector at a single time step]
NEURONS WITH RECURRENCE

[Figure: a recurrent neuron – the output at time t depends on the input at time t and on past memory (the previous state)]



RECURRENT NEURAL NETWORKS
Apply a recurrence relation at every time step to process a sequence:

ht = fW(ht-1, xt)

(cell state = function fW with weights W, applied to the input xt and the old state ht-1)

Note: the same function and set of parameters are used at every time step.

RNNs have a cell state that is updated at each time step as the sequence is processed.
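
The recurrence relation above can be written out in a few lines. This is a sketch of a "vanilla" RNN step with a tanh non-linearity; the weights are random (untrained) and all dimensions are illustrative:

import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim = 8, 16

# Weights, shared across all time steps (random here, learned in practice).
W_xh = rng.standard_normal((hidden_dim, input_dim)) * 0.1
W_hh = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1
b_h  = np.zeros(hidden_dim)

def rnn_step(h_prev, x_t):
    # h_t = f_W(h_{t-1}, x_t): here f_W is a tanh of a linear map (vanilla RNN).
    return np.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)

x_sequence = rng.standard_normal((10, input_dim))   # a toy sequence of length 10
h = np.zeros(hidden_dim)                             # initial cell state
for x_t in x_sequence:
    h = rnn_step(h, x_t)                             # same function and weights at every step
print(h.shape)   # (16,) – the final hidden state summarizes the sequence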
RNN INTUITION

[Figure: the RNN unrolled step by step – at each time step the cell receives an input vector and the previous state, and produces an output vector]
RNN STATE UPDATE AND OUTPUT
[Figure: RNN cell – the input vector xt and the previous hidden state ht-1 are combined to update the hidden state ht, which produces the output vector ŷt]
RNN – IN ONE SLIDE
 An RNN models a dynamic system, where the hidden (cell) state ht depends not only on the current
observation xt, but also on the previous hidden state ht-1.

 More specifically, we can represent ht as ht = f (ht-1, xt ) (Eq. 1)


where f is a nonlinear (time-invariant) mapping.

 Thus, ht contains information about the
whole sequence, which can be inferred from
the recursive definition in Eq. 1.

 In other words, an RNN can use the hidden variables as a memory
to capture long-term information from a sequence.

 Prediction at time step t: zt.
Fig. from G. Chen (2016)
RNN: COMPUTATION GRAPH ACROSS TIME

RNN → represented as a computational graph unrolled across time.
BACK-PROPAGATION THROUGH TIME

Re-use the same weight matrices at every time step!

http://colah.github.io/posts/2015-08-Understanding-LSTMs/
RNN FROM SCRATCH & TENSORFLOW

[Figure: the RNN recurrent cell – it maps an input vector to an output vector while feeding its state back into itself]
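
On the TensorFlow side, the Keras counterpart of the from-scratch recurrence is the SimpleRNN layer. A minimal sketch (not any notebook's exact code; the sequence length, feature dimension, state size, and the many-to-one head are illustrative assumptions):

import tensorflow as tf
from tensorflow import keras

# SimpleRNN keeps a hidden state and re-uses the same weights at every step.
# Input shape: (batch, time steps, features) – here 10 steps of 8 features.
model = keras.Sequential([
    keras.layers.Input(shape=(10, 8)),
    keras.layers.SimpleRNN(16),      # returns the final hidden state h_T
    keras.layers.Dense(1),           # e.g. a many-to-one prediction
])
model.compile(optimizer="adam", loss="mse")
model.summary()

# For many-to-many tasks, return the full sequence of hidden states instead:
# keras.layers.SimpleRNN(16, return_sequences=True)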
RNN INTUITION
One-to-one: ordinary DNN, e.g., classification.
Many-to-one: e.g., sentiment classification.
One-to-many: e.g., text generation.
Many-to-many: e.g., translation & forecasting.
SEQUENCE MODELING – DESIGN CRITERIA

Recall: to model sequences, we need to:

Handle variable-length sequences.

Track long-term dependencies.

Maintain information about order.

Share parameters across the sequence.

→ Recurrent Neural Networks (RNNs) meet these
sequence modeling design criteria.
HANDLE VARIABLE SEQUENCE LENGTHS

The food was great.

vs.

We visited a Pizzeria for lunch.

vs.

We were hungry because we went for sport before eating.
BACK-PROPAGATION THROUGH TIME

Computing the gradient w.r.t. h0 involves many factors
of Whh + repeated gradient computation!
BACKPROPAGATION THROUGH TIME

Many values > 1: exploding gradients.

Many values < 1: vanishing gradients.
RECALL: RNN HARD TO TRAIN
 Recurrent blocks suffer from two problems:

Long-term dependencies are not handled well.

Difficult to connect two distant parts of the input.

The magnitude of the signal can get amplified at each recurrent
connection.

At every time step, the gradient can either vanish or
explode.

This makes them very hard to train.

“I grew up in England… and I speak fluent ___ “

http://colah.github.io/posts/2015-08-Understanding-LSTMs/
LONG SHORT-TERM MEMORY (LSTM)
http://www.bioinf.jku.at/publications/older/2604.pdf

 Hochreiter & Schmidhuber (1997)


 LSTM layers are improved versions of the recurrent layers.

They rely on a gated cell to track information throughout many time steps.

They can learn long-term dependencies.

They can forget.

 They have an internal state and a structure which is composed of
four actual layers.

Layers labeled with σ are
gates which can block or let
information flow.

http://colah.github.io/posts/2015-08-Understanding-LSTMs/
LONG SHORT-TERM MEMORY (LSTM)
 The core of LSTM is a memory unit (or cell) ct
which encodes the information of the inputs
that have been observed up to that step.
 The memory cell ct has the same inputs
(ht−1 and xt) and outputs ht as a normal
recurrent network, but has more gating units
which control the information flow.
 The input gate and output gate respectively
control the information input to the memory
unit and the information output from the unit.
 More specifically, the output ht of the LSTM cell can be shut off via the output gate.

Fig. from G. Chen (2016)


LSTM FORGET GATE
http://www.bioinf.jku.at/publications/older/2604.pdf

 LSTMs follow two paths



They update their internal state.

They give an output based on
the internal state and the input.

 A gate layer σ decides whether we should forget an old part of the internal
state – something which has to be replaced by new information.
1. Forget
2. Store
3. Update
4. Output
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
LSTM NEW STATE
http://www.bioinf.jku.at/publications/older/2604.pdf
1. Forget
2. Store
3. Update
4. Output
 Once the layer has decided what to
forget, it computes
 What has to replace it: the input gate value it, based
on the input and the old state.

What is used to replace it:
the candidate values C̃t.
 The new state Ct can then be computed
based on this new information.

http://colah.github.io/posts/2015-08-Understanding-LSTMs/
LSTM OUTPUT
http://www.bioinf.jku.at/publications/older/2604.pdf
1. Forget
2. Store
3. Update
4. Output
 Based on the new state and the input,
the layer can produce a result.

This is the output.

The same value is also passed to the next iteration.

 Why is this so important?



Many translation algorithms and
voice interpreters are based on
small variations of this layer.

 Action required: demo/05_RNN_intro.ipynb


(see also https://www.tensorflow.org/guide/keras/rnn)

http://colah.github.io/posts/2015-08-Understanding-LSTMs/
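
Putting the four steps together, the standard LSTM updates (as in Hochreiter & Schmidhuber (1997) and the colah post linked above; σ is the sigmoid gate, * the element-wise product, [ht-1, xt] the concatenation of old state and input) read:

ft = σ(Wf · [ht-1, xt] + bf)        (1. forget gate)
it = σ(Wi · [ht-1, xt] + bi)        (2. store: input gate)
C̃t = tanh(WC · [ht-1, xt] + bC)     (2. store: candidate values)
Ct = ft * Ct-1 + it * C̃t            (3. update of the cell state)
ot = σ(Wo · [ht-1, xt] + bo)        (4. output gate)
ht = ot * tanh(Ct)                  (4. output / new hidden state)

In Keras, this gating machinery is wrapped in a single layer. A minimal many-to-one sketch (not the code of demo/05_RNN_intro.ipynb; the sequence length, feature dimension, and state size are illustrative assumptions):

from tensorflow import keras

# Read a sequence of 20 steps with 8 features each and produce one prediction.
# The forget/store/update/output gating happens inside keras.layers.LSTM.
model = keras.Sequential([
    keras.layers.Input(shape=(20, 8)),
    keras.layers.LSTM(32),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.summary()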
ACTION REQUIRED
 There is a weather data set from the Max Planck Institute for Biogeochemistry:
https://www.bgc-jena.mpg.de/wetter/.

 Open the notebook demo/05b_Weather_data.ipynb.

 Given this time series (temperature as a function of time), try to make
predictions over various time intervals into the future.
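
A generic sketch of how such a forecasting setup can look (not the notebook's code; the window length, forecast horizon, architecture, and the placeholder series below are illustrative assumptions):

import numpy as np
from tensorflow import keras

def make_windows(series, window=72, horizon=12):
    # Predict the value `horizon` steps ahead from the previous `window` steps.
    X, y = [], []
    for i in range(len(series) - window - horizon):
        X.append(series[i:i + window])
        y.append(series[i + window + horizon - 1])
    return np.array(X)[..., None], np.array(y)   # add a feature axis for the LSTM

# Placeholder series standing in for the Jena temperature data.
temperature = np.sin(np.linspace(0, 60, 5000)) + 0.1 * np.random.randn(5000)
X, y = make_windows(temperature)

model = keras.Sequential([
    keras.layers.Input(shape=(72, 1)),
    keras.layers.LSTM(32),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mae")
model.fit(X, y, epochs=5, validation_split=0.2, verbose=0)

Varying the horizon lets you compare short-range against long-range predictions.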
CONVOLUTIONAL NEURAL NETS
 Possibly the most successful type of network.
 They use sequences of convolutional layers.
 Can be interleaved with pooling operations or
fully-connected layers.
 Train faster than MLPs.
 Can be used for 2D, 3D or higher-dimensional data
(though 2D is the most common).
 Used for image recognition, object detection,
sound analysis, etc.
 There exist more intricate architectures.
 Yann LeCun (1998)
http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf
https://cs231n.github.io/convolutional-networks/
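
A minimal convolutional network in Keras, in the spirit of LeNet (LeCun, 1998); the input shape (28×28 grayscale images), the number of classes, and all layer sizes are illustrative assumptions:

from tensorflow import keras

# Convolution layers interleaved with pooling, followed by fully-connected layers.
model = keras.Sequential([
    keras.layers.Input(shape=(28, 28, 1)),
    keras.layers.Conv2D(16, kernel_size=3, activation="relu"),
    keras.layers.MaxPooling2D(pool_size=2),
    keras.layers.Conv2D(32, kernel_size=3, activation="relu"),
    keras.layers.MaxPooling2D(pool_size=2),
    keras.layers.Flatten(),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()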
CONVOLUTIONAL NEURAL NETS
GENERATIVE ADVERSARIAL NETS
 GANs were introduced by Goodfellow et al. (2014)
 https://arxiv.org/abs/1406.2661.
 The idea is to train a network to generate samples which are
indistinguishable from real ones (from the training set).

The input is a random noise sample (latent space).
 Another network is trained at the same time to distinguish
between real and fake samples.

See https://arxiv.org/abs/1701.00160 for a tutorial. Right fig. from Goodfellow (2018).
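
A toy sketch of the two-network idea on 1-D data; everything here (the "real" data distribution, network sizes, and training loop) is an illustrative assumption, not the setup of the paper:

import numpy as np
import tensorflow as tf
from tensorflow import keras

batch_size, latent_dim = 64, 8

# Generator: maps random noise (latent space) to a fake sample.
generator = keras.Sequential([
    keras.layers.Input(shape=(latent_dim,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1),
])

# Discriminator: classifies samples as real (1) or fake (0).
discriminator = keras.Sequential([
    keras.layers.Input(shape=(1,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])

bce = keras.losses.BinaryCrossentropy()
g_opt = keras.optimizers.Adam(1e-3)
d_opt = keras.optimizers.Adam(1e-3)

@tf.function
def train_step(real_batch):
    noise = tf.random.normal((batch_size, latent_dim))
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_batch = generator(noise, training=True)
        real_pred = discriminator(real_batch, training=True)
        fake_pred = discriminator(fake_batch, training=True)
        # Discriminator: tell real from fake; generator: fool the discriminator.
        d_loss = bce(tf.ones_like(real_pred), real_pred) + bce(tf.zeros_like(fake_pred), fake_pred)
        g_loss = bce(tf.ones_like(fake_pred), fake_pred)
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))

# "Real" data: samples from N(3, 0.5) – a toy stand-in for a training set.
for _ in range(1000):
    train_step(np.random.normal(3.0, 0.5, size=(batch_size, 1)).astype("float32"))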


QUESTIONS?
