DL 2021 TensorFlow and Deep Learning without a PhD
deep Science ! deep Code ...
#Tensorflow @martin_gorner
Hello World: handwritten digits classification - MNIST
MNIST = Modified National Institute of Standards and Technology. Download the dataset at http://yann.lecun.com/exdb/mnist/
Very simple model: softmax classification

[Figure: a 28x28-pixel image (784 pixels) feeds 10 output neurons. Each neuron computes a weighted sum of all pixels plus a bias; softmax is then applied to the 10 neuron outputs (digits 0, 1, 2 ... 9).]
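For reference (standard definition, not spelled out on the slide), softmax turns the 10 weighted sums L0 ... L9 into probabilities:

softmax(L)i = exp(Li) / (exp(L0) + exp(L1) + ... + exp(L9))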
In matrix notation, 100 images at a time

X is the batch of images: 100 lines, one flattened image of 784 pixels per line. W is the weights matrix: 784 lines x 10 columns (w0,0 ... w783,9). The matrix multiply X.W produces L, a 100 x 10 matrix of weighted sums (L0,0 ... L99,9). The same 10 biases b0 ... b9 are then added to every line ("broadcast"), and softmax is applied line by line:

Y = softmax(X.W + b)    with X[100, 784], W[784, 10], b[10], Y[100, 10]

Softmax, on a batch of images: matrix multiply, bias broadcast on all lines, softmax applied line by line. Tensor shapes in [ ].
Now in TensorFlow (Python)
Y = tf.nn.softmax(tf.matmul(X, W) + b)
# tf.matmul: matrix multiply on all lines; + b: bias broadcast on all lines
Success ?

Correct answer ("one-hot" encoded), this is a "6":   0 0 0 0 0 0 1 0 0 0    (digits 0-9)
Computed probabilities:                              .01 .03 .00 .04 .03 .05 0.8 .02 .01 .01
Cross entropy compares the two.
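The cross-entropy used throughout (standard definition), with Y_ the correct "one-hot" answers and Y the computed probabilities:

cross_entropy = - Σi Y_i . log(Yi)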
Demo
92%
TensorFlow - initialisation
init = tf.initialize_all_variables()   # in later TF 1.x versions: tf.global_variables_initializer()
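The line above assumes the model's variables already exist. A minimal sketch of them (shapes taken from the previous slides):

X = tf.placeholder(tf.float32, [None, 28, 28, 1])   # input images, any batch size
W = tf.Variable(tf.zeros([784, 10]))                # weights: 784 pixels x 10 output neurons
b = tf.Variable(tf.zeros([10]))                     # one bias per output neuron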
TensorFlow - success metrics

# model (tf.reshape flattens the 28x28 images into vectors of 784 pixels)
Y = tf.nn.softmax(tf.matmul(tf.reshape(X, [-1, 784]), W) + b)

# placeholder for correct answers ("one-hot" encoded)
Y_ = tf.placeholder(tf.float32, [None, 10])

# loss function
cross_entropy = -tf.reduce_sum(Y_ * tf.log(Y))

# % of correct answers found in batch (tf.argmax does the "one-hot" decoding)
is_correct = tf.equal(tf.argmax(Y, 1), tf.argmax(Y_, 1))
accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))
TensorFlow - training

optimizer = tf.train.GradientDescentOptimizer(0.003)   # 0.003 is the learning rate
train_step = optimizer.minimize(cross_entropy)          # minimise the loss function
TensorFlow - run !

sess = tf.Session()   # running a TensorFlow computation, feeding placeholders
sess.run(init)

for i in range(1000):
    # load batch of images and correct answers
    batch_X, batch_Y = mnist.train.next_batch(100)
    train_data = {X: batch_X, Y_: batch_Y}

    # train
    sess.run(train_step, feed_dict=train_data)

    # success ? (tip: do this every 100 iterations only)
    a, c = sess.run([accuracy, cross_entropy], feed_dict=train_data)

    # success on test data ?
    test_data = {X: mnist.test.images, Y_: mnist.test.labels}
    a, c = sess.run([accuracy, cross_entropy], feed_dict=test_data)
TensorFlow - full python code
(the complete program assembles the snippets from the previous slides: import tensorflow as tf, initialisation, model, success metrics, training step, run loop)
Cookbook
Softmax
Cross-entropy
Mini-batch
Let's try 5 fully-connected layers ! (overkill)

[Figure: 784 inputs -> sigmoid layers of 200, 100, 60 and 30 neurons -> 10-neuron softmax output (digits 0, 1, 2 ... 9).]
TensorFlow - initialisation

K = 200   # layer sizes
L = 100
M = 60
N = 30

# weights initialised with small random values, biases with zeros
W1 = tf.Variable(tf.truncated_normal([28*28, K], stddev=0.1))
B1 = tf.Variable(tf.zeros([K]))
TensorFlow - the model (weights and biases on every layer)
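The model itself was shown as a diagram; a possible sketch in code, assuming W2 ... W5 and B2 ... B5 are created like W1/B1 with shapes [K, L], [L, M], [M, N], [N, 10]:

XX = tf.reshape(X, [-1, 784])                 # flatten the images
Y1 = tf.nn.sigmoid(tf.matmul(XX, W1) + B1)
Y2 = tf.nn.sigmoid(tf.matmul(Y1, W2) + B2)
Y3 = tf.nn.sigmoid(tf.matmul(Y2, W3) + B3)
Y4 = tf.nn.sigmoid(tf.matmul(Y3, W4) + B4)
Ylogits = tf.matmul(Y4, W5) + B5
Y = tf.nn.softmax(Ylogits)                    # final softmax layer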
Demo - slow start ?

RELU ! (RELU = Rectified Linear Unit)

Y = tf.nn.relu(tf.matmul(X, W) + b)
Demo - noisy accuracy curve ? (yuck!)

Slow down . . . learning rate decay

Demo: 98% with learning rate decay
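One way to implement the decay (a sketch; the exact schedule used in the demo is not shown on the slide):

import math
step = tf.placeholder(tf.int32)                                         # current training iteration, fed at each step
lr = 0.0001 + tf.train.exponential_decay(0.003, step, 2000, 1/math.e)   # decays 0.003 -> 0.0001
train_step = tf.train.GradientDescentOptimizer(lr).minimize(cross_entropy)
# sess.run(train_step, feed_dict={X: batch_X, Y_: batch_Y, step: i})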
Overfitting ?

[Figure: cross-entropy loss curves on training and test data; the diverging test loss signals overfitting.]
Dropout

pkeep = tf.placeholder(tf.float32)   # probability of keeping a neuron: dropout rate 0.5 during TRAINING (pkeep=0.5), rate 0 during EVALUATION (pkeep=1.0)

Yf = tf.nn.relu(tf.matmul(X, W) + B)
Y = tf.nn.dropout(Yf, pkeep)
All the party tricks

98.2% peak, 97.9% sustained accuracy.
[Chart legend: sigmoid, learning rate 0.003 - versus - RELU, learning rate decaying 0.003 -> 0.0001, and dropout 0.75.]
Overfitting

[Figure: cross-entropy loss curves showing overfitting.]

Overfitting ?!? Causes: too many neurons, a BAD network, or not enough DATA.
Convolutional layer

[Figure: a stack of convolutional layers with subsampling (stride) and padding.]

Filters slide over the image, one filter per output channel: W1[4, 4, 3] and W2[4, 4, 3] together form W[4, 4, 3, 2] = [filter size, filter size, input channels, output channels].
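In code, one such layer could look like this (a sketch; X is assumed to be a [batch, height, width, 3] image tensor):

W = tf.Variable(tf.truncated_normal([4, 4, 3, 2], stddev=0.1))   # 4x4 filters, 3 input channels, 2 output channels
B = tf.Variable(tf.ones([2])/10)
Y = tf.nn.relu(tf.nn.conv2d(X, W, strides=[1, 1, 1, 1], padding='SAME') + B)   # stride 1, zero-padding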
Hacker's tip: make ALL the layers convolutional
Convolutional neural network (+ biases on all layers)

28x28x1 input
convolutional layer, 4 channels: W1[5, 5, 1, 4], stride 1 -> 28x28x4
convolutional layer, 8 channels: W2[4, 4, 4, 8], stride 2 -> 14x14x8
# weights initialised with small random values, biases with 0.1
W1 = tf.Variable(tf.truncated_normal([5, 5, 1, K], stddev=0.1))
B1 = tf.Variable(tf.ones([K])/10)
W2 = tf.Variable(tf.truncated_normal([5, 5, K, L], stddev=0.1))
B2 = tf.Variable(tf.ones([L])/10)
W3 = tf.Variable(tf.truncated_normal([4, 4, L, M], stddev=0.1))
B3 = tf.Variable(tf.ones([M])/10)

N = 200   # size of the fully-connected layer
W4 = tf.Variable(tf.truncated_normal([7*7*M, N], stddev=0.1))
B4 = tf.Variable(tf.ones([N])/10)
W5 = tf.Variable(tf.truncated_normal([N, 10], stddev=0.1))
B5 = tf.Variable(tf.zeros([10]))
TensorFlow - the model
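The model slide was a diagram; a possible sketch using the variables above (X assumed to be [batch, 28, 28, 1]):

Y1 = tf.nn.relu(tf.nn.conv2d(X,  W1, strides=[1, 1, 1, 1], padding='SAME') + B1)   # 28x28xK
Y2 = tf.nn.relu(tf.nn.conv2d(Y1, W2, strides=[1, 2, 2, 1], padding='SAME') + B2)   # 14x14xL
Y3 = tf.nn.relu(tf.nn.conv2d(Y2, W3, strides=[1, 2, 2, 1], padding='SAME') + B3)   # 7x7xM
YY = tf.reshape(Y3, [-1, 7*7*M])              # flatten for the fully-connected layer
Y4 = tf.nn.relu(tf.matmul(YY, W4) + B4)
Ylogits = tf.matmul(Y4, W5) + B5
Y = tf.nn.softmax(Ylogits)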
Demo
98.9%
@martin_gorner
WTFH ???
Bigger convolutional network + dropout (+ biases on all layers)

28x28x1 input
convolutional layer, 6 channels: W1[6, 6, 1, 6], stride 1 -> 28x28x6
(and so on: more channels per convolutional layer than before, plus dropout on the fully-connected layer)
YEAH !
with dropout
@martin_gorner
Cookbook recap: Softmax, Cross-entropy, Mini-batch, RELU !, Learning rate decay, Dropout, ALL Convolutional.
Overfitting ?!? Too many neurons, a BAD network, or not enough DATA.
Cartoon images copyright: alexpokusay / 123RF stock photos
Have fun !
Martin Görner, Google Developer relations, @martin_gorner

Cloud ML Engine: your TensorFlow models trained in Google's cloud.
Cloud Auto ML Vision (ALPHA): just bring your data.
Cloud TPU: ML supercomputing.
Pre-trained models: Cloud Vision API, Cloud Speech API, Natural Language API, Google Translate API, Video Intelligence API, Cloud Jobs API (BETA).

Videos, slides, code: github.com/GoogleCloudPlatform/tensorflow-without-a-phd

That's all folks...
TensorFlow and deep learning without a PhD

Workshop
Keyboard shortcuts for the visualisation GUI:
Workshop
Self-paced code lab (summary below): goo.gl/mVZloU
Code: github.com/martin-gorner/tensorflow-mnist-tutorial

1-5. Theory (install, then sit back and listen or read)
7. Practice (full instructions for this step): start from the file mnist_1.0_softmax.py and add one or two hidden layers. Solution in: mnist_2.0_five_layers_sigmoid.py
11. Theory (sit back and listen or read)
13. Challenge (full instructions for this step): try a bigger neural network (good hyperparameters on slide 43) and add dropout on the last layer to get >99%. Solution in: mnist_3.0_convolutional_bigger_dropout.py

deep Science ! deep Code ...
#Tensorflow @martin_gorner
The superpower: batch normalisation

Data "whitening": subtract the average, divide by the standard deviation. The modified data is centered around zero and rescaled. A further step decorrelates the dimensions - for example replacing A and B by (A+B)/2 and A-B (that was almost a Principal Component Analysis).

A network layer can do this! Recombining inputs through a weights matrix plus offsets ([new A, new B] = [A, B] x W + B) is exactly what a layer computes.

[Figure: in the deep network 200 -> 100 -> 60 -> 30 -> softmax 10, the input distribution looks fine ("OK") for the first layer but becomes questionable ("OK ???") for the deeper layers.]
Batch norm, per neuron: take x = the weighted sum + bias ("my distribution of inputs" - boo-hoo if it drifts), normalise it with statistics of the mini-batch, then apply a learned scale α and offset β (one of each per neuron) before the activation function (sigmoid here).

=> BN is differentiable relative to the weights, biases, α and β.
It can be used as a layer in the network; gradient calculations will still work.

[Figure: distribution of the sigmoid neuron output, before and after batch norm.]
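The standard batch-norm formulas, in the slide's α/β notation (per neuron, statistics over the mini-batch):

x̂ = (x - batch average) / sqrt(batch variance + ε)
BN(x) = α . x̂ + β        (α = scale, β = offset, both learned like any other weights)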
Batch norm
My distribution
of inputs
RELU
biases :
no longer useful Per
relu sigmoid
neuron:
without
bias bias
x=
BN
when activation fn is RELU
weighted
sum + b With α is not useful
β α, β It does not modify output distrib.
Batch-norm α, β BN
activation
fn
Stats on what ?
● Last batch: no
● all images: yes (but not practical)
● => Exponential moving average during training
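A sketch of the moving-average bookkeeping (tf.train.ExponentialMovingAverage; variable names are illustrative):

exp_moving_avg = tf.train.ExponentialMovingAverage(0.998, iteration)          # decay, current step
update_moving_averages = exp_moving_avg.apply([batch_mean, batch_variance])   # run this op at every training step
# at test time, use exp_moving_avg.average(batch_mean) and .average(batch_variance) instead of the batch statistics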
# this
Y = layers.relu(X, 200)
# instead of this
W = tf.Variable(tf.zeros([784, 200]))
b = tf.Variable(tf.zeros([200]))
Y = tf.nn.relu(tf.matmul(X,W) + b)
Sample: goo.gl/y1SSFy
O'Reilly TensorFlow World @martin_gorner
Model function

from tensorflow.contrib import learn, layers, metrics

# model_fn receives "features" and "targets", plus the mode: TRAIN, EVAL or INFER
def model_fn(X, Y_, mode):
    Yn = ...                      # model layers
    prob = tf.nn.softmax(Yn)
    digi = tf.argmax(prob, 1)

estimator = learn.Estimator(model_fn=model_fn)
estimator.fit(input_fn=..., steps=10000)
estimator.evaluate(input_fn=..., steps=1)   # => {'accuracy': ...}
estimator.predict(input_fn=...)             # => {"probabilities": ..., "digits": ...}

Sample: goo.gl/y1SSFy
Convolutional network

def conv_model(X, Y_, mode):
    XX = tf.reshape(X, [-1, 28, 28, 1])
    Y1 = layers.conv2d(XX, num_outputs=6,  kernel_size=[6, 6])
    Y2 = layers.conv2d(Y1, num_outputs=12, kernel_size=[5, 5], stride=2)
    Y3 = layers.conv2d(Y2, num_outputs=24, kernel_size=[4, 4], stride=2)
    Y4 = layers.flatten(Y3)
    Y5 = layers.relu(Y4, 200)
    Ylogits = layers.linear(Y5, 10)
    prob = tf.nn.softmax(Ylogits)

estimator = learn.Estimator(model_fn=conv_model)

Sample: goo.gl/y1SSFy
Recurrent Neural Networks
>TensorFlow, Keras and \
recurrent neural networks
without a PhD_
deep Science ! deep Code ...
bit.ly/keras-rnn-codelab
#Tensorflow @martin_gorner
Neural network 101 (reminder)

[Figure: a 20x20x3 image (1200 inputs) feeds fully-connected layers of 200 then 20 neurons. Each neuron computes a weighted sum of its inputs plus a bias, then an activation function (sigmoid, tanh or relu). On the last layer: softmax for classification, nothing for regression.]
RNN

N: internal size, X: inputs, H: internal state

RNN cell:
Ht = tanh([Xt, Ht-1] . WH + bH)
Yt = softmax(Ht . W + b)
[Figure: the RNN cell unrolled over time: inputs X0 ... X5 feed successive copies of the cell, the state H flows from one step to the next (initial state H-1 = 0), and each step emits an output Y0 ... Y5.]
Michel C. was born in Paris, France. He is married and has three children. He received a M.S.
in neurosciences from the University Pierre & Marie Curie and the Ecole Normale Supérieure in 1987,
and then spent most of his career in Switzerland, at the Ecole Polytechnique de Lausanne. He
specialized in child and adolescent psychiatry and his first field of research was severe mood disorders
in adolescents, the topic of his PhD in neurosciences (2002). His mother tongue is ? ? ? ? ?
Short context: after "Michel C. was born in ..." many answers are still plausible: English, German, Russian, French ...
Long context: the correct answer ("French") depends on information seen many words earlier. Problems...
[Figure: three RNN cell types - the simple tanh cell, the LSTM (gates σ, σ, tanh, σ, with cell state Ct), and the GRU (gates σ, σ and a 1- branch).]
GRU cell (sizes: p = input, n = internal, m = output):
r = σ([Xt, Ht-1] . Wr + br)              # reset gate, size n
z = σ([Xt, Ht-1] . Wz + bz)              # update gate, size n
X' = Xt | r * Ht-1                       # concatenation, size p+n
X" = tanh(X' . Wc + bc)                  # candidate state, size n
Ht = (1-z) * Ht-1 + z * X"               # new state, size n
Yt output: Yt = softmax(Ht . W + b)      # size m
Character-based model, unrolled and stacked

[Figure: the input characters "S t _ J o h ..." feed a stack of GRU cells (NLAYERS deep), unrolled over SEQLEN time steps. Each layer passes its state H, H', H" from one time step to the next, and the top layer predicts the next character: "t _ J o h n ...". A second example trains on "S t _ A n d r e" to predict "t _ A n d r e w".]

X: input characters [ BATCHSIZE, SEQLEN ], one-hot encoded: [ BATCHSIZE, SEQLEN, ALPHASIZE ]
H: recurrent state [ BATCHSIZE, CELLSIZE x NLAYERS ]

ALPHASIZE = 98
CELLSIZE = 512
NLAYERS = 3
SEQLEN = 30
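A sketch of this stack in TensorFlow 1.x (the original code also uses the my_txtutils helper; exact names may differ):

cells = [tf.nn.rnn_cell.GRUCell(CELLSIZE) for _ in range(NLAYERS)]
mcell = tf.nn.rnn_cell.MultiRNNCell(cells, state_is_tuple=False)   # concatenated state [BATCHSIZE, CELLSIZE x NLAYERS]
Hr, H = tf.nn.dynamic_rnn(mcell, X, initial_state=Hin)             # X: [BATCHSIZE, SEQLEN, ALPHASIZE]
Yflat = tf.reshape(Hr, [-1, CELLSIZE])                             # one line per character of every sequence
Ylogits = tf.layers.dense(Yflat, ALPHASIZE)                        # shared softmax layer over all time steps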
[Generated text early in training - random characters:]
r e tiathhnaeeano
trrr hhohooon rrt eernre e rnoh
Shakespeare, 0.1 epochs:

II WERENI
Are I I wos the wheer boaer.
Tin thim mh cals sate bauut site tar oue tinl an
bsisonetoal yer an fimireeren.

LUCETTA
I am a sign of me, and sorrow sounds
Shakespeare, 30 epochs:

And sorrow far into the stars of men,
Without a second tears to seek the best and bed,
With a strange service, and the foul prince of Rome

[TensorFlow Python code generation - sample A1, early in training:]
detrtstinsenoaolsesnesoairt(
arssserleeeerltrdlesssoeeslslrlslie(e
drnnaleeretteaelreesioe niennoarens
dssnstssaorns sreeoeslrteasntotnnai(ar
dsopelntederlalesdanserl
lts(sitae(e)
[Sample A2 - starts to look like Python:]
expeddions = np.randim(natched_collection,
    ranger, mang_ops, samplering)
def assestErrorume_gens(assignex) as
    and(sampled_veases):
    eved.

[Sample B10 - after longer training:]
# Check that we have both scalar tensor for being invalid to a vector of 1 indicating
# the total loss of the same shape as the shape of the tensor.
sharded_weights = [[0.0, 1.0]]
# Create the string op to apply gradient terms that also batch.
# The original any operation as a code when we should alw infer to the session case.
"The Unreasonable Effectiveness of Recurrent Neural Networks" (Andrej Karpathy's blog post)
# initial values
x = np.array([[0]])   # shape [BATCHSIZE, SEQLEN] with BATCHSIZE=1 and SEQLEN=1
h = np.zeros([1, INTERNALSIZE * NLAYERS], dtype=np.float32)

for i in range(100000):
    dic = {'X:0': x, 'Hin:0': h, 'batchsize:0': 1}
    y, h = sess.run(['Y:0', 'H:0'], feed_dict=dic)
    c = my_txtutils.sample_from_probabilities(y, topn=5)
    x = np.array([[c]])   # shape [BATCHSIZE, SEQLEN] with BATCHSIZE=1 and SEQLEN=1
    print(chr(my_txtutils.convert_to_ascii(c)), end="")
[Figure: the same character-based model, trained on a geopolitics corpus, generating text one character at a time.]

Hr, H = tf.nn.dynamic_rnn(mcell, X, initial_state=Hin, sequence_length=slen)
slow: a full softmax over the whole output vocabulary at every step ("Le chat a mangé la souris ∅")
fast: tf.nn.sampled_softmax_loss(…)
Tensorflow sample: goo.gl/KyKLDv

Image captioning
Generate "A man on a beach flying a kite ∅"; the initial state of the RNN comes, for example, from the output of a convolutional network or auto-encoder.
Google's neural net for image captioning: goo.gl/VgZUQZ
I ♡ noise

[Figure: distributed training - model replicas, each fed a shard of the data.]

def experiment_fn(job_dir):
    return learn.Experiment(
        estimator=learn.Estimator(model_fn, model_dir=job_dir,
                                  config=learn.RunConfig(save_checkpoints_secs=None,
                                                         save_checkpoints_steps=1000)),
        train_input_fn=...,   # data feed
        eval_input_fn=...,    # data feed
        train_steps=10000,
        eval_steps=1,
        export_strategies=make_export_strategy(export_input_fn=serving_input_fn))

# config.yaml
trainingInput:
  scaleTier: STANDARD_1

Free stuff !!! Tensorboard graphs
>TensorFlow, deep learning and \
modern convolutional neural nets
without a PhD_
O’Reilly AI @martin_gorner
Fully-connected layers

# input: 20x20x3 images (1200 values per image), any batch size
X = tf.reshape(images, [-1, 20*20*3])
Y1 = tf.layers.dense(X, 200, activation=tf.nn.relu)
Y2 = tf.layers.dense(Y1, 20, activation=tf.nn.relu)
Ylogits = tf.layers.dense(Y2, 2)

# correct answer: plane [1,0], not plane [0,1]
loss = tf.losses.softmax_cross_entropy(tf.one_hot(is_plane, 2), Ylogits)
train_op = tf.train.AdamOptimizer(0.001).minimize(loss)   # 0.001 is the learning rate
O’Reilly AI @martin_gorner
Activation functions

[Figure: a neuron computes a weighted sum of its inputs plus a bias, then an activation function (relu shown); the classification head uses softmax.]
O’Reilly AI @martin_gorner
Cookbook
Relu, softmax
Cross-entropy
Convolutional layer (+padding)

Weights: W[4, 4, 3, 4] = [filter size, filter size, input channels, nb of filters = output channels]
O’Reilly AI @martin_gorner
Convolutional networks

1x1 convolution ? (see below)

# can also use pooling to reduce the x,y size
Y = tf.layers.max_pooling2d(Y, pool_size=2, strides=2)
O’Reilly AI @martin_gorner
TensorFlow - the model

Input image batch: X[100, 20, 20, 3] - 100 images of 20x20 pixels x 3 channels. Each convolutional layer is defined by its filter size, number of filters and stride.
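A possible sketch of such a model with tf.layers (the number of filters per layer is an assumption, not the talk's exact values; X is the [batch, 20, 20, 3] image tensor):

Y1 = tf.layers.conv2d(X,  filters=16, kernel_size=5, strides=1, padding='same', activation=tf.nn.relu)
Y2 = tf.layers.conv2d(Y1, filters=32, kernel_size=4, strides=2, padding='same', activation=tf.nn.relu)
Y3 = tf.layers.conv2d(Y2, filters=64, kernel_size=3, strides=2, padding='same', activation=tf.nn.relu)
Y4 = tf.reshape(Y3, [-1, 5*5*64])        # 20x20 halved twice by the stride-2 layers -> 5x5
Ylogits = tf.layers.dense(Y4, 2)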
Dropout, batch norm, learning rate decay

tf.train.exponential_decay         # learning rate decay
tf.layers.dropout                  # dropout, between layers
tf.layers.batch_normalization      # batch norm, placed before the activation
O’Reilly AI @martin_gorner
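How the three pieces typically fit together (a sketch of API usage; Ylinear, is_training and the numeric values are placeholders, not the talk's exact code):

lr = tf.train.exponential_decay(0.01, tf.train.get_or_create_global_step(), 1000, 0.9)
Y = tf.layers.batch_normalization(Ylinear, training=is_training)   # before the activation
Y = tf.nn.relu(Y)
Y = tf.layers.dropout(Y, rate=0.3, training=is_training)
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)            # batch norm's moving averages
with tf.control_dependencies(update_ops):
    train_op = tf.train.AdamOptimizer(lr).minimize(loss, global_step=tf.train.get_global_step())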
Cloud Machine Learning Engine
TensorBoard
AI Platform
O’Reilly AI @martin_gorner
Hyperparameter tuning

[Chart: hyperparameter search results; one of the explored parameters turns out to be useless.]

export_latest = tf.estimator.LatestExporter("latest", serving_input_receiver_fn=serving_input_fn)   # "latest" is an arbitrary exporter name
eval_spec = tf.estimator.EvalSpec(input_fn=eval_input_fn, exporters=export_latest)
O’Reilly AI @martin_gorner
Dataset API

def train_input_fn(directory):
    filenames = gcsfile.get_matching_files(directory + "/*")
    dataset = tf.contrib.data.Dataset.from_tensor_slices((filenames,))

    def load(filename):
        bytes = tf.read_file(filename)
        # ... decode images and corresponding labels from file ...
        return tf.contrib.data.Dataset.from_tensor_slices((images, labels))

    dataset = dataset.flat_map(load)
    dataset = dataset.shuffle(2000)
    dataset = dataset.batch(100)
    dataset = dataset.repeat()   # repeat indefinitely
    return dataset
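Depending on the Estimator version, the input function may need to return tensors rather than the Dataset itself; a sketch:

features, labels = dataset.make_one_shot_iterator().get_next()   # next batch of (images, labels)
return features, labels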
O’Reilly AI @martin_gorner
Estimator serving input function

def serving_input_fn():
    # expected input: a list of rgb 20x20 images
    inputs = {'images': tf.placeholder(tf.uint8, shape=[None, 20, 20, 3])}
    feature_dic = {'images': inputs['images']}   # pass-through
    return tf.estimator.export.ServingInputReceiver(feature_dic, inputs)

# variant: accept jpeg data and convert it to 20x20 rgb pixels
def jpeg_to_bytes(jpeg):
    pixels = tf.image.decode_jpeg(jpeg, channels=3)
    pixels = tf.image.crop_and_resize(tf.expand_dims(pixels, 0), boxes256x256, box_indices, [20, 20])   # boxes256x256, box_indices defined elsewhere
    return tf.cast(pixels, dtype=tf.uint8)
O’Reilly AI @martin_gorner
ConvNet architectures and detection papers
Aerial imagery: U.S. Geological Survey

[Figure: an Inception module, combining parallel convolutions (3x3 conv shown).]
O’Reilly AI @martin_gorner
1x1 convolution ?

A 1x1 convolution with W[1, 1, 10, 5] recombines 10 input channels into 5 output channels at every pixel position - a cheap way to shrink the channel dimension.
O’Reilly AI @martin_gorner
Last layer: dense layer vs. global average pooling

Dense: flatten the 7x7 feature map into 245 values, then a dense layer W[245, 5] feeding softmax: 1225 weights.
Global average pooling: average each channel over the 7x7 positions and feed the results straight into softmax: 0 extra weights - cheaper ("Yay cheapskate!").
O’Reilly AI @martin_gorner
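In code, the two options compare like this (a sketch; shapes from the slide, a 7x7 feature map Y with 5 channels):

Yflat = tf.reshape(Y, [-1, 7*7*5])         # flatten: 245 values
Ylogits = tf.layers.dense(Yflat, 5)        # dense layer: 245 x 5 = 1225 weights
# versus global average pooling: 0 extra weights
Ylogits = tf.reduce_mean(Y, axis=[1, 2])   # [batch, 7, 7, 5] -> [batch, 5], one value per class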
Squeezenet

[Figure: a stack of "fire" (and "fire"-like) modules with maxpool layers in between.]

"fire module": a squeeze 1x1 convolution followed by parallel expand 1x1 and expand 3x3 convolutions.

arXiv: 1602.07360, Forrest Iandola & al. 2016
O’Reilly AI @martin_gorner
Darknet vs. squeezenet

[Figure: two 12-layer detection backbones on 256x256 x 3 images, compared side by side. The Darknet-like network alternates 3x3 and 1x1 convolutions with maxpool layers, down to a 16x16 x 10 YOLO head. The squeezenet-like network uses "fire"-like modules (1x1 squeeze + parallel 3x3 / 1x1 expand) with maxpool layers, down to the same 16x16 x 10 YOLO head.]

12 layers, 136K weights vs. 12 layers, 60K weights.
O’Reilly AI @martin_gorner
YOLO

[Figure: a 256x256 aerial image divided into a 4x4 grid, 2 boxes per cell. Each box predicts x, y (position, in [-1,1]), w (size, in [0,1]) and C (confidence, in [0,1]); the loss compares these predictions with the ground-truth boxes.]

Aerial imagery: U.S. Geological Survey
arXiv: 1506.02640, Redmon & al. 2015
O’Reilly AI @martin_gorner
YOLO last layer

[Figure: for each cell of the NxN grid, the last layer's outputs are split in 4 (or 8, 12, ... for more boxes per cell): average, then tanh for the x and y positions, and sigmoid for the size w and the confidence C. Tanh and sigmoid activation curves shown.]
O’Reilly AI @martin_gorner
Making progress

[Chart: Intersection Over Union on the eval dataset (higher = better), for successive experiments:]
12 layers, YOLO grid 4x4x1: not enough boxes
+ shuffle: shuffle your data !
YOLO grid 8x8x1: more boxes
YOLO grid 16x16x2 with swarm-optimized box assignment
+ random hue + rotations: data augmentation !
+ loss weights: useful in a composite loss
17 layers: best ("Yay cherrypicker !")

Aerial imagery: U.S. Geological Survey
O’Reilly AI @martin_gorner
Now with Cloud TPUs
(Tensor Processing Units)
O’Reilly AI @martin_gorner
Training hardware options on ML Engine

Training time / Cost:
GPU - P100: 5h50, $15

# config.yaml
trainingInput:
  scaleTier: CUSTOM
  masterType: standard_p100

# Tensorflow code
tf.Estimator
Training hardware options on ML Engine

Training time / Cost:
GPU - V100: 4h30, $18

# config.yaml
trainingInput:
  scaleTier: CUSTOM
  masterType: standard_v100

# Tensorflow code
tf.Estimator
Training hardware options on ML Engine

Training time / Cost:
Cluster, 4 GPUs - P100: 1h35, $16

# config.yaml
trainingInput:
  scaleTier: CUSTOM
  masterType: standard_p100
  workerType: standard_p100
  workerCount: 4
  parameterServerType: standard
  parameterServerCount: 1

# Tensorflow code
tf.Estimator
Training hardware options on ML Engine

Training time / Cost:
Cluster, 4 GPUs - P100: 1h35, $16
Cluster, 4 GPUs - V100: 1h15, $18

# config.yaml
trainingInput:
  scaleTier: CUSTOM
  masterType: standard_v100
  workerType: standard_v100
  workerCount: 4
  parameterServerType: standard
  parameterServerCount: 1

# Tensorflow code
tf.Estimator
Training hardware options on ML Engine

Training time / Cost:
Cluster, 4 GPUs - P100: 1h35, $16
Cluster, 4 GPUs - V100: 1h15, $18
Single VM, 4x P100 GPUs, MirroredStrategy (ALPHA): 2h15, $21

# config.yaml
trainingInput:
  scaleTier: CUSTOM
  masterType: complex_model_m_p100

# Tensorflow code
tf.Estimator + MirroredStrategy
Training hardware options on ML Engine

Cloud TPUs v2 (Tensor Processing Units) now available.

Training time / Cost:
GPU - P100: 5h50, $15
GPU - V100: 4h30, $18
Cluster, 4 GPUs - P100: 1h35, $16
Cluster, 4 GPUs - V100: 1h15, $18
Single VM, 4x P100 GPUs, MirroredStrategy (ALPHA): 2h15, $21
TPUv2 (4 chips - 8 cores): 1h00, $7

# config.yaml
trainingInput:
  scaleTier: BASIC_TPU

# Tensorflow code
tf.TPUEstimator
Training hardware options on ML Engine

Cloud TPUs v3 (ALPHA).

Training time / Cost:
GPU - P100: 5h50, $15
Cluster, 4 GPUs - P100: 1h35, $16
Cluster, 4 GPUs - V100: 1h15, $18
Single VM, 4x P100 GPUs, MirroredStrategy (ALPHA): 2h15, $21
TPUv2: 1h00, $7
Training hardware options on ML Engine

Training time / Cost:
Cluster, 4 GPUs - P100: 1h35, $16
Cluster, 4 GPUs - V100: 1h15, $18
Single VM, 4x P100 GPUs, MirroredStrategy (ALPHA): 2h15, $21
TPUv2: 1h00, $7

TPU v3 (ALPHA): available soon.
TPU pods (up to 512 cores): try them now (alpha); available soon.
O’Reilly AI @martin_gorner
Have fun !
Martin Görner, Google Developer relations, @martin_gorner

Cloud ML Engine: your TensorFlow models trained in Google's cloud.
Cloud Auto ML Vision: just bring your data.
Cloud TPU: ML supercomputing.
Pre-trained models: Cloud Vision API, Cloud Speech API, Natural Language API, Google Translate API, Video Intelligence API, Cloud Jobs API (BETA).

Videos, slides, code: github.com/GoogleCloudPlatform/tensorflow-without-a-phd

That's all folks...

TensorFlow and deep learning without a PhD
O’Reilly AI @martin_gorner
end
O’Reilly AI @martin_gorner
Depth-separable convolutions (+padding)

Phase 1: spatial convolutions - one 4x4 filter per input channel, weights W1[4, 4, 3] (filter size x filter size x nb of input channels).
Phase 2: dot products across channels - weights W2[3, 5] recombine the 3 input channels into 5 output channels at every position.
Postcard from... Generative Adversarial Networks (GAN)

[Figure: a generator network turns noise into fake images; a discriminator network classifies fake / real.]

[Examples of GAN-generated images.]
Nvidia Research, Karras & al. 2017
150,000 h spent learning TensorFlow: the "Tensorflow and deep learning without a PhD" series has 900K views on YouTube.
"Learn to slap layers'n'shit together. [...] that's what everyone is already doing in deep learning."
O’Reilly AI @martin_gorner
>TensorFlow, deep learning and \
modern RNN architectures
without a PhD_
Translate
@martin_gorner
RNN (reminder)

N: internal size, X: inputs, H: internal state
@martin_gorner
RNN training

[Figure: the RNN unrolled over time for training: inputs X0 ... X5 feed the cells, the state flows from cell to cell (a two-layer stack shown in the second diagram), and the outputs Y0 ... Y5 are compared with the expected sequence.]
@martin_gorner
RNN cell types

[Figure: the simple tanh cell, the LSTM (gates σ, σ, tanh, σ, with cell state Ct), and the GRU (gates σ, σ and a 1- branch).]
[Figure: character-based training example - feed "A l p h a b", predict "l p h a b e" (the same text shifted by one character).]
@martin_gorner
Toxic comment detection

[Figure: a wall of example comments mixing genuinely toxic insults ("Fuck off, you idiot.", "I'm going to shoot you!") with harmless phrases that use similar words ("Oh shoot. Well alright.", "Gosh darn it!", "Thanks for your help editing this.").]
@martin_gorner
Modern RNN architectures

"Absorb" the one-hot encoded words into dense vectors (EMBED): a one-hot vector (0 0 0 ... 1 ... 0) selects one column of a learned embedding matrix, producing a short vector of real values per word.

[Figure: the embedding matrix, one column of real-valued numbers per vocabulary word.]
Word-based language model (EMBED)

[Figure: the words of a sentence ("violets", "are", "red", "are", "blue", ...) fed through the embedding into the RNN.]

● Trained embeddings
● Pre-trained embeddings (Word2Vec, GloVe, ...)
● Trained embeddings from pre-trained initial values
Classification with an RNN (ENCODE)

[Figure: the sentence "I like you very much but" is fed word by word into the RNN (initial state 0); the final state is used to classify Toxic / non-toxic.]
Bidirectional RNN (ENCODE)

[Figure: "I like you very much but" is fed into two RNNs, one reading left-to-right (output Y1) and one right-to-left (output Y2). The two final outputs are concatenated (Y1|Y2) and fed to a softmax classifier: Toxic / non-toxic.]
Attention (ATTEND)

[Figure: "I like you very much but" - each word's RNN output gets an attention weight α; the weights are normalised with α = softmax(α) and used to combine the outputs before the Toxic / non-toxic decision.]
Toxicity detector

[Figure: EMBED -> ENCODE -> ATTEND -> PREDICT. The words "I like you very much but" are embedded, fed through a bidirectional RNN (states H1 ... H6, forward and backward outputs concatenated), attention weights α1 ... α6 combine the outputs, and a final classifier predicts Toxic / Not.]
Bitchin' batchin'

Sentences in a batch are padded (∅) to the same length; the true sequence length is passed alongside:

China and the USA have agreed to a new round of talks        seq len 12
The quick brown fox jumps over the lazy dog . ∅ ∅             seq len 10
Boys will be boys . ∅ ∅ ∅ ∅ ∅ ∅ ∅                              seq len 5
Tom , get your coat . We are going out . ∅                    seq len 11
Math rules the world . Men rule math . ∅ ∅ ∅                  seq len 9
Hout, H = tf.nn.dynamic_rnn(cell, X, initial_state=Hin, sequence_length=slen)
@martin_gorner
Toxicity detector (EMBED)

word_vectors = tf.nn.embedding_lookup(embeddings, features['words'])

www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/
github.com/conversationai/conversationai-models/blob/nthain-initial/attention-codelab
@martin_gorner
Sequence 2 sequence - training time

[Figure: text translation. The encoder RNN reads "The cat ate the mouse ∅"; the decoder RNN is fed "GO Le chat a mangé la souris" and is trained to output "Le chat a mangé la souris ∅".]

slow: a full softmax over the whole output vocabulary at every step
fast: tf.nn.sampled_softmax_loss(…)
@martin_gorner
Sequence 2 sequence - prediction time (PREDICT)

[Figure: the encoder state from "The cat ate the mouse" initialises the decoder; the decoder is fed GO, then its own previous output word ("Le", "chat", "a", "mangé", "la", "souris", ∅). At each step several candidate words (Le, la, les, ...) are possible; beam search keeps the best sequences.]

X = tf.nn.embedding_lookup(embeddings, inWord)
H, Hout = tf.nn.dynamic_rnn(cell, X, initial_state=Hin)
Y = tf.layers.dense(H, vocab_size)
P = tf.nn.softmax(Y)
outWord = tf.argmax(P)   # greedy decoding: Bad idea™
# => use tf.contrib.seq2seq.BeamSearchDecoder instead
@martin_gorner
seq2seq + attention (ENCODE)

x = tf.nn.embedding_lookup(embeddings, sentences)

decoder_cell = tf.rnn.GRUCell(encoding_dimension)
decoder = tf.seq2seq.BeamSearchDecoder(decoder_cell, embeddings,
                                       sos_tokens, eos_token, encoder_state, beam_width)
outputs, final_state, _ = tf.seq2seq.dynamic_decode(decoder, maximum_iterations=max_length)
@martin_gorner
Translation with attention (ATTEND)

inattentive_decoder_cell = tf.rnn.GRUCell(encoding_dimension)
decoder_cell = tf.seq2seq.AttentionWrapper(inattentive_decoder_cell, attention_mechanism)
@martin_gorner
Postcard from...
Demo
Q: U.S. Navy identifies deceased sailor as XXXX, who leaves behind a wife
A: Jason Kortz (correct!)
arXiv:1506.03340v3, Hermann & al. 2015
@martin_gorner
Have fun !
Martin Görner, Google Developer relations, @martin_gorner
Nithum Thain, Jigsaw Research manager, @nithum
Neeraj Kashyap, Google Developer relations, nkash@google.com

Cloud ML Engine: your TensorFlow models trained in Google's cloud.
Cloud Auto ML Vision (ALPHA): just bring your data.
Cloud TPU (BETA): ML supercomputing.
Pre-trained models: Cloud Vision API, Cloud Speech API, Natural Language API.
Videos, slides, code.
>TensorFlow and \
deep reinforcement learning
without a PhD_

deep Science ! deep Code ...
#Tensorflow @martin_gorner
Neural network 101

[Figure: a 20x20x3 image (1200 inputs) feeding fully-connected layers of 200 and 20 neurons.]
@martin_gorner
Activation functions

[Figure: each neuron computes a weighted sum of its inputs plus a bias, then an activation function; the classification head uses softmax.]
@martin_gorner
Success ?

Correct answer ("one-hot" encoded), this is a "6":   0 0 0 0 0 0 1 0 0 0    (digits 0-9)
Computed probabilities:                              .01 .03 .00 .04 .03 .05 0.8 .02 .01 .01
Cross entropy compares the two.
@martin_gorner
Cookbook
Relu, softmax
Cross-entropy
TensorFlow 101

# input: 20x20x3 pixels = 1200 values per image
Y1 = tf.layers.dense(X, 200, activation=tf.nn.relu)
Ylogits = tf.layers.dense(Y1, 2)   # 2 classes: plane [1,0], not plane [0,1]

# correct answer: plane [1,0], not plane [0,1]
loss = tf.losses.softmax_cross_entropy(tf.one_hot(is_plane, 2), Ylogits)
train_op = tf.train.AdamOptimizer(0.001).minimize(loss)   # 0.001 is the learning rate
@martin_gorner
Pong ?

[Figure: the difference of two consecutive frames (Δ frames) feeds a two-layer "policy network" (W1, W2) ending in a softmax over the three moves; the cross-entropy compares its probabilities with the "correct move", e.g. 1 0 0.]

Adapted from A. Karpathy's "Pong from Pixels" post
Pong ?

[Figure: sample a move (UP, STILL, DOWN) from the probabilities output by the "policy network", and keep playing.]
@martin_gorner
Policy gradients

WIN! -> reward +1
LOSE -> reward -1
@martin_gorner
Policy gradient refinements

WIN! -> +1, LOSE -> -1, applied to whichever move was chosen.

① Discounted rewards: the reward R for move #i is the discounted sum of the rewards that follow it, R_i = Σ_k γ^k . r_(i+k).
@martin_gorner
Training data: for each move #, the action played [UP, STILL, DOWN], the policy network probabilities, and the discounted reward.

# model
Y = tf.layers.dense(observations, 200, activation=tf.nn.relu)
Ylogits = tf.layers.dense(Y, 3)

# loss: per-move cross-entropy, weighted by the discounted rewards
cross_entropies = tf.losses.softmax_cross_entropy(onehot_labels=tf.one_hot(actions, 3),
                                                  logits=Ylogits,
                                                  reduction=tf.losses.Reduction.NONE)
loss = tf.reduce_sum(rewards * cross_entropies)   # rewards: the discounted rewards (assumed placeholder)

# training operation
optimizer = tf.train.RMSPropOptimizer(learning_rate=0.001, decay=0.99)
train_op = optimizer.minimize(loss)
Playing a game

with tf.Session() as sess:
    ...                                      # reset everything
    while not done:                          # play a game in 21 points
        current_pix = read_pixels(game_state)            # get pixels
        observation = current_pix - previous_pix
        previous_pix = current_pix
        # decide what move to play: UP, STILL, DOWN (through the NN model)
        action = sess.run(sample_op, feed_dict={observations: [observation]})
        # play it (through the OpenAI gym pong simulator)
        game_state, reward, done, info = pong_sim.step(action)
        # collect results
        observations.append(observation)
        actions.append(action)
        rewards.append(reward)
Training loop

with tf.Session() as sess:
    while len(observations) < BATCH_SIZE:
        ...   # play game in 21 points, many moves, collect ...
        ...   # observations, actions, rewards (from previous slide) ...
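The missing training step, sketched (placeholder and helper names are illustrative, not the talk's exact code):

    # once a batch of moves is collected: discount the rewards, then run one training step
    processed_rewards = discount_and_normalize(rewards)   # hypothetical helper: R_i = Σ_k γ^k . r_(i+k), then normalise
    feed = {observations_ph: observations, actions_ph: actions, rewards_ph: processed_rewards}
    sess.run(train_op, feed_dict=feed)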
@martin_gorner
Postcard from... neural architecture search

[Figure: an RNN controller samples a candidate architecture (e.g. Layer conv 4x4x16 relu / Layer conv 2x2x32 relu / Layer conv 2x2x64 relu / Layer maxpool 2x2 / Layer conv 1x1x16 relu / Layer dense 400 relu); the candidate is trained, its accuracy becomes the reward R, and the controller is updated with a policy gradient.]
AI Platform TensorBoard
Auto ML
Just bring your data
Cloud TPU
ML supercomputing
@martin_gorner
Have fun !
Martin Görner, Google Developer relations, @martin_gorner
Yu-Han Liu, Google Developer relations, yuhanliu@google.com
Neeraj Kashyap, Google Developer relations, nkash@google.com

AI Platform: your TensorFlow models trained in Google's cloud.
Auto ML: just bring your data.
Cloud TPU: ML supercomputing.
Pre-trained models: Cloud Vision API, Cloud Speech API, Natural Language API.
Videos, slides, code.