Dealing with sparsity
BUILDING RECOMMENDATION ENGINES IN PYTHON
Rob O'Callaghan
Director of Data
Sparse matrices
Measuring sparsity
print(book_ratings_df)
title     The Great Gatsby  The Catcher in the Rye  Fifty Shades of Grey
User
User_233               3.0                     NaN                   NaN
User_651               NaN                     5.0                   4.0
User_965               4.0                     3.0                   NaN
...                    ...                     ...                   ...
# Count the cells with no rating and divide by the total number of cells
number_of_empty = book_ratings_df.isnull().values.sum()
total_number = book_ratings_df.size
sparsity = number_of_empty/total_number
print(sparsity)
0.0114
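As a check on the formula, here is a minimal sketch that runs the same three steps on a small made-up DataFrame (`toy_ratings` is hypothetical and only mirrors the layout of the table above), where the empty cells can be counted by hand:

```python
import numpy as np
import pandas as pd

# Hypothetical toy ratings: rows are users, columns are books, NaN = not rated
toy_ratings = pd.DataFrame(
    {
        "The Great Gatsby": [3.0, np.nan, 4.0],
        "The Catcher in the Rye": [np.nan, 5.0, 3.0],
        "Fifty Shades of Grey": [np.nan, 4.0, np.nan],
    },
    index=["User_233", "User_651", "User_965"],
)

# Same calculation as above: empty cells divided by all cells
number_of_empty = toy_ratings.isnull().values.sum()
total_number = toy_ratings.size
sparsity = number_of_empty / total_number
print(round(sparsity, 3))  # 4 of 9 cells are empty -> 0.444
```

The result is the share of cells with no rating; subtracting it from 1 gives the share that is filled.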
Why sparsity matters
Measuring sparsity per column
print(book_ratings_df.notnull().sum())
The Pelican Brief        1
Snow Crash               1
The Great Gatsby        12
Fifty Shades of Grey     9
Leviathan                1
...
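The per-column counts can be used directly to flag books with too few ratings to learn much from. A minimal sketch, assuming a made-up DataFrame and an arbitrary threshold `min_ratings = 2` (both hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical toy ratings; NaN marks a missing rating
toy_ratings = pd.DataFrame(
    {
        "The Great Gatsby": [3.0, 4.0, 5.0, 4.0],
        "Snow Crash": [np.nan, np.nan, 4.0, np.nan],
        "Leviathan": [np.nan, 2.0, np.nan, np.nan],
    }
)

# Number of ratings each book (column) received
ratings_per_book = toy_ratings.notnull().sum()

# Books with fewer ratings than the (hypothetical) threshold
min_ratings = 2
rarely_rated = ratings_per_book[ratings_per_book < min_ratings]
print(rarely_rated.sort_values())
```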
Matrix factorization
Matrix multiplication
print(matrix_x)
[[4, 1],
[2, 2],
[3, 3]]
print(matrix_b)
[[1, 0, 4],
[0, 1, 6]]
import numpy as np
dot_product = np.dot(matrix_x, matrix_b)
print(dot_product)
[[ 4 1 22]
[ 2 2 20]
[ 3 3 30]]
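A short sketch of the shape rule behind the result above, using the same two matrices: the inner dimensions must match, the result takes the outer dimensions, and each cell is one row of the first matrix dotted with one column of the second. It also shows that the `@` operator gives the same result as `np.dot` for 2-D arrays.

```python
import numpy as np

matrix_x = np.array([[4, 1],
                     [2, 2],
                     [3, 3]])      # shape (3, 2)
matrix_b = np.array([[1, 0, 4],
                     [0, 1, 6]])   # shape (2, 3)

# Inner dimensions (2 and 2) match; the result is (3, 3)
product = matrix_x @ matrix_b
print(product.shape)  # (3, 3)

# Cell (i, j) = row i of matrix_x dotted with column j of matrix_b
manual = np.array([[sum(matrix_x[i, k] * matrix_b[k, j] for k in range(2))
                    for j in range(3)]
                   for i in range(3)])
print(np.array_equal(product, manual))                      # True
print(np.array_equal(product, np.dot(matrix_x, matrix_b)))  # True
```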
Let's practice!
Matrix factorization
Why this helps with sparse matrices
What matrix factorization looks like
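To make the idea concrete, here is a minimal sketch with made-up factor matrices (the numbers and the names `user_factors` and `book_factors` are hypothetical): multiplying a users-by-latent-features matrix by a latent-features-by-books matrix produces a value for every user and book pair, which is what lets factorization fill the gaps of a sparse matrix.

```python
import numpy as np

# Hypothetical factors: 3 users x 2 latent features, 2 latent features x 4 books
user_factors = np.array([[1.0, 0.5],
                         [0.2, 1.5],
                         [1.2, 0.1]])
book_factors = np.array([[3.0, 1.0, 2.0, 0.5],
                         [0.5, 2.0, 1.0, 2.5]])

# The product has one value for every user/book pair, with no gaps
full_matrix = user_factors @ book_factors
print(full_matrix.shape)            # (3, 4)
print(np.isnan(full_matrix).any())  # False
print(full_matrix[0])               # first user's scores for all four books
```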
Latent features
Information loss
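One way to see the information loss is to rebuild a matrix from fewer and fewer latent features and measure how far the result drifts from the original. This sketch swaps in NumPy's full `np.linalg.svd` on a small random matrix (a stand-in, not the course data); keeping all features reproduces the matrix, while keeping fewer discards some of it.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical dense 6 x 5 matrix standing in for a ratings matrix
ratings = rng.integers(1, 6, size=(6, 5)).astype(float)

# Full SVD; s holds the singular values in descending order
U, s, Vt = np.linalg.svd(ratings, full_matrices=False)

def reconstruct(k):
    """Rebuild the matrix from only the k largest singular values."""
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Frobenius-norm distance between the original and each approximation
errors = [np.linalg.norm(ratings - reconstruct(k)) for k in range(1, len(s) + 1)]
for k, error in enumerate(errors, start=1):
    print(f"k={k}: reconstruction error {error:.3f}")
```

The error shrinks as more latent features are kept and is essentially zero at full rank.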
Let's practice!
Singular value decomposition (SVD)
What SVD does
Prepping our data
print(book_ratings_df.shape)
(220, 500)
# Average rating given by each user (row); missing ratings are skipped
avg_ratings = book_ratings_df.mean(axis=1)
print(avg_ratings)
array([[4.5],
[3.5],
[2.5],
[3.5],
...
[2.2]])
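# Subtract each user's average rating from that user's row
user_ratings_pivot_centered = book_ratings_df.sub(avg_ratings, axis=0)
# Fill the remaining gaps with 0, which now means "the user's own average"
user_ratings_pivot_centered.fillna(0, inplace=True)
print(user_ratings_pivot_centered)
          The Great Gatsby  The Catcher in the Rye  Fifty Shades of Grey
User_233               0.0                     0.0                   0.0
User_651               0.0                     0.5                  -0.5
User_965               0.5                    -0.5                   0.0
...                    ...                     ...                   ...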
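A minimal sketch of the centering step on a hypothetical three-user DataFrame (names and values are made up), small enough to verify by hand:

```python
import numpy as np
import pandas as pd

toy_ratings = pd.DataFrame(
    {"Book_A": [4.0, np.nan, 3.0],
     "Book_B": [5.0, 3.0, np.nan],
     "Book_C": [np.nan, 2.0, 4.0]},
    index=["User_1", "User_2", "User_3"],
)

# Average of each user's own ratings (NaNs are skipped by default)
avg_ratings = toy_ratings.mean(axis=1)

# Subtract each user's average from their row, then fill the gaps with 0
centered = toy_ratings.sub(avg_ratings, axis=0).fillna(0)
print(centered)
```

Filling with 0 after centering treats a missing rating as the user's average, rather than as the lowest possible score, which is what filling the raw ratings with 0 would imply.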
Applying SVD
from scipy.sparse.linalg import svds
# Decompose the centered matrix; svds keeps k=6 latent features by default
U, sigma, Vt = svds(user_ratings_pivot_centered.to_numpy())
print(U.shape)
(610, 6)
print(Vt.shape)
(6, 1000)
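A minimal sketch on a random stand-in matrix (`centered` here is hypothetical, not the course data) showing how the `k` argument of `scipy.sparse.linalg.svds` sets the number of latent features and therefore the shapes of the three outputs. Singular values are non-negative by definition.

```python
import numpy as np
from scipy.sparse.linalg import svds

rng = np.random.default_rng(42)
# Hypothetical centered ratings matrix: 20 users x 15 books
centered = rng.normal(size=(20, 15))

# k latent features are kept; k must be smaller than min(centered.shape)
U, sigma, Vt = svds(centered, k=4)
print(U.shape)      # (20, 4): one row per user
print(sigma.shape)  # (4,): one singular value per latent feature
print(Vt.shape)     # (4, 15): one column per book
```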
print(sigma)
[3.0, 4.8, -12.6, -3.8, 8.2, 7.3]
sigma = np.diag(sigma)
print(sigma)
array([[  3.0,   0. ,   0. ,   0. ,   0. ,   0. ],
       [  0. ,   4.8,   0. ,   0. ,   0. ,   0. ],
       [  0. ,   0. , -12.6,   0. ,   0. ,   0. ],
       [  0. ,   0. ,   0. ,  -3.8,   0. ,   0. ],
       [  0. ,   0. ,   0. ,   0. ,   8.2,   0. ],
       [  0. ,   0. ,   0. ,   0. ,   0. ,   7.3]])
Getting the final matrix
Calculating the product in Python
recalculated_ratings = np.dot(U, sigma)
recalculated_ratings = np.dot(np.dot(U, sigma), Vt)
print(recalculated_ratings)
[[ 0.1 -0.9 -3.6 ... ]
[ -2.3 0.5 -0.5 ... ]
[ 0.5 -0.5 2.0 ... ]
[ ... ... ... ... ]]
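A minimal sketch of why the three-way product gives back a matrix of the original shape: `U` is users by k, `np.diag(sigma)` is k by k, and `Vt` is k by books. The input here is a hypothetical random matrix built to have rank exactly 3, so keeping `k=3` reproduces it up to floating-point error; on real ratings a small k gives an approximation instead.

```python
import numpy as np
from scipy.sparse.linalg import svds

rng = np.random.default_rng(7)
# Hypothetical matrix of rank exactly 3: (8 x 3) times (3 x 6)
low_rank = rng.normal(size=(8, 3)) @ rng.normal(size=(3, 6))

U, sigma, Vt = svds(low_rank, k=3)
sigma = np.diag(sigma)  # 1-D singular values -> 3 x 3 diagonal matrix

# (8 x 3) . (3 x 3) . (3 x 6) -> 8 x 6, the same shape as the input
recalculated = np.dot(np.dot(U, sigma), Vt)
print(recalculated.shape)                   # (8, 6)
print(np.allclose(recalculated, low_rank))  # True: k matches the true rank
```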
Add averages back
recalculated_ratings = recalculated_ratings + avg_ratings.values.reshape(-1, 1)
print(recalculated_ratings)
[[ 4.6 3.6 0.9 ... ]
[ 1.8 4.0 3.0 ... ]
[ 3.0 2.0 4.5 ... ]
[ ... ... ... ... ]]
print(book_ratings_df)
[[ 5.0 4.0 NaN ... ]
[ NaN 4.0 3.0 ... ]
[ 3.0 2.0 NaN ... ]
[ ... ... ... ... ]]
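Putting the steps together, here is a minimal end-to-end sketch on a hypothetical 5 x 4 ratings DataFrame (all names, values, and the choice `k=2` are made up for illustration): center, fill, decompose, multiply back, add the averages, and then keep only the predictions for books each user has not rated.

```python
import numpy as np
import pandas as pd
from scipy.sparse.linalg import svds

toy_ratings = pd.DataFrame(
    [[5.0, 4.0, np.nan, 1.0],
     [np.nan, 4.0, 3.0, 1.0],
     [3.0, 2.0, np.nan, 5.0],
     [4.0, np.nan, 2.0, 4.0],
     [1.0, 1.0, 5.0, np.nan]],
    index=[f"User_{i}" for i in range(1, 6)],
    columns=["Book_A", "Book_B", "Book_C", "Book_D"],
)

# Center each user's ratings on their own average, then fill gaps with 0
avg_ratings = toy_ratings.mean(axis=1)
centered = toy_ratings.sub(avg_ratings, axis=0).fillna(0)

# Decompose with 2 latent features and multiply the factors back together
U, sigma, Vt = svds(centered.to_numpy(), k=2)
recalculated = np.dot(np.dot(U, np.diag(sigma)), Vt)

# Add each user's average back so predictions are on the original rating scale
predicted = pd.DataFrame(
    recalculated + avg_ratings.values.reshape(-1, 1),
    index=toy_ratings.index,
    columns=toy_ratings.columns,
)

# Keep only the predictions for books each user has not rated yet
print(predicted.where(toy_ratings.isnull()).round(2))
```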
Let's practice!
Validating your predictions
Hold-out sets
Separating the hold-out set
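# Copy the hold-out block (first 20 users, first 100 books) before blanking it
actual_values = act_ratings_df.iloc[:20, :100].to_numpy(copy=True)
# Hide the hold-out block so the model never sees those ratings
act_ratings_df.iloc[:20, :100] = np.nan
# Generate predictions as before, giving calc_pred_ratings_df
predicted_values = calc_pred_ratings_df.iloc[:20, :100].values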
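A minimal sketch of the split on hypothetical random data (30 users by 120 books, roughly 80% missing; the sizes and seed are made up). The explicit copy matters: without it, the extracted array can share memory with the DataFrame in some pandas versions, so blanking the block could also wipe out the held-out ratings.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Hypothetical ratings: 30 users x 120 books, about 80% of cells missing
values = rng.integers(1, 6, size=(30, 120)).astype(float)
values[rng.random(values.shape) < 0.8] = np.nan
act_ratings_df = pd.DataFrame(values)

# Copy the hold-out block out first, so blanking the DataFrame cannot change it
actual_values = act_ratings_df.iloc[:20, :100].to_numpy(copy=True)

# Hide the hold-out block from the model
act_ratings_df.iloc[:20, :100] = np.nan

print(np.isnan(actual_values).all())                         # False: ratings kept
print(act_ratings_df.iloc[:20, :100].isnull().all().all())   # True: block hidden
```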
Masking the hold-out set
mask = ~np.isnan(actual_values)
print(actual_values[mask])
[4. 4. 5. 3. 3. ...]
print(predicted_values[mask])
[3.76 4.35 4.95 3.5869079 3.686337 ...]
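A small sketch of what the mask does, using made-up 2 x 3 arrays: `~np.isnan` marks the cells that hold a real rating, and indexing with that boolean mask returns those cells as a flat 1-D array in row order, so actual and predicted values line up pair by pair.

```python
import numpy as np

# Hypothetical hold-out values (NaN = never rated) and matching predictions
actual_values = np.array([[4.0, np.nan, 5.0],
                          [np.nan, 3.0, 3.0]])
predicted_values = np.array([[3.8, 2.1, 4.9],
                             [4.4, 3.6, 3.7]])

# True where a real rating exists
mask = ~np.isnan(actual_values)
print(actual_values[mask])     # [4. 5. 3. 3.]
print(predicted_values[mask])  # [3.8 4.9 3.6 3.7]
```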
Introducing RMSE (root mean squared error)
RMSE in Python
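from sklearn.metrics import root_mean_squared_error
# RMSE over only the cells that hold a real rating in the hold-out set
# (older scikit-learn versions: mean_squared_error(..., squared=False))
print(root_mean_squared_error(actual_values[mask],
                              predicted_values[mask]))
3.6223997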
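RMSE is the square root of the mean of the squared differences between actual and predicted ratings. A minimal sketch computing it directly with NumPy on four made-up rating pairs, so the arithmetic can be followed by hand and no particular scikit-learn version is needed:

```python
import numpy as np

# Hypothetical masked hold-out ratings and predictions
actual = np.array([4.0, 5.0, 3.0, 3.0])
predicted = np.array([3.8, 4.9, 3.6, 3.7])

# Square the errors, average them, then take the square root
rmse = np.sqrt(np.mean((actual - predicted) ** 2))
print(round(rmse, 3))  # squared errors 0.04, 0.01, 0.36, 0.49 -> about 0.474
```

Because the errors are squared before averaging, the single 0.7 miss contributes more (0.49) than the two smallest misses combined (0.05).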
Let's practice!
Wrap up
Non-personalized models
Content-based models
Collaborative filtering
Matrix factorization
Congratulations!