Time series analysis in neuroscience
Lecture 10. Clustering and Classification
Alexander Zhigalov / Dept. of CS, University of Helsinki and Dept. of NBE, Aalto University
Outline / overview
Section 1. Machine learning
Section 2. Clustering
Section 3. Classification
Section 4. Regression
Section 1. Machine learning
Machine learning
[Figure: overview of machine-learning techniques, from "Introducing Machine Learning", MathWorks]
Machine learning
Unsupervised learning is useful when you want to explore
your data but do not yet have a specific goal or are not sure
what information the data contains.
A supervised learning algorithm takes a known set of input
data (the training set) and known responses to the data
(output), and trains a model to generate reasonable
predictions for the response to new input data.
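As a minimal illustration (not from the course scripts; names and data are made up), the same data can go to an unsupervised or a supervised scikit-learn estimator; only the latter sees the responses y:
# sketch: unsupervised vs supervised learning (illustrative names and data)
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

X = np.random.randn(100, 2)            # input data: 100 samples, 2 features
y = (X[:, 0] > 0).astype(int)          # known responses (labels)

labels = KMeans(n_clusters=2).fit(X).labels_   # unsupervised: labels unused
model = SVC(kernel='linear').fit(X, y)         # supervised: trained on (X, y)
pred = model.predict(np.random.randn(10, 2))   # predictions for new inputs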
Section 2. Clustering
Clustering
In cluster analysis, data is partitioned into groups based on
some measure of similarity or shared characteristic.
Clusters are formed so that objects in the same cluster are
very similar and objects in different clusters are very
distinct.
K-means clustering
How it works
Partitions data into k mutually exclusive clusters.
How well a point fits into a cluster is determined by the
distance from that point to the cluster’s center.
Best used ...
• When the number of clusters is known
• For fast clustering of large data sets
Result
Cluster centers
K-means clustering (1/2)
The number of clusters (K) is equal to the
number of sources.
# create R noisy copies of each source signal (excerpt; i runs over the M sources)
X[i] = np.tile(S[i, :], (R, 1)) + np.random.randn(R, N) * SNR
# measurements: randomly permute the M*R copies
Y = X[np.random.permutation(M*R), :]
# clustering using sklearn
model = cluster.KMeans(n_clusters=K)
model.fit(Y)
# clustering outcome
labels = model.labels_
Z = model.cluster_centers_
inertia = model.inertia_
print(inertia)  # within-cluster sum-of-squares

Result: inertia = 0.0
See "L10_clustering_kmeans.py"
K-means clustering (2/2)
The number of clusters (K) is greater or less
than the number of sources.
# create R noisy copies of each source signal
X[i] = np.tile(S[i, :], (R, 1)) + np.random.randn(R, N) * SNR
# measurements: randomly permute the M*R copies
Y = X[np.random.permutation(M*R), :]
# clustering using sklearn
model = cluster.KMeans(n_clusters=K)
model.fit(Y)
# clustering outcome
labels = model.labels_
Z = model.cluster_centers_
inertia = model.inertia_
print(inertia)

Results: inertia = 603.5 and inertia = 0.0 (for the two choices of K)
See "L10_clustering_kmeans.py"
Noisy measurements (1/2)
How does it work in the presence of noise?
# create R noisy copies of each source signal
X[i] = np.tile(S[i, :], (R, 1)) + np.random.randn(R, N) * 0.5
# measurements: randomly permute the M*R copies
Y = X[np.random.permutation(M*R), :]

Result: inertia = 2550.0
See "L10_clustering_kmeans.py"
Noisy measurements (2/2)
Can the algorithm put the noise into a single
cluster?
# create R noisy copies of each source signal
X[i] = np.tile(S[i, :], (R, 1)) + np.random.randn(R, N) * 0.5
# measurements: randomly permute the M*R copies
Y = X[np.random.permutation(M*R), :]

Results: inertia = 3381.0 and inertia = 2279.0 (for the two clusterings shown)
See "L10_clustering_kmeans.py"
Hierarchical clustering
How it works
Produces nested sets of clusters by analyzing similarities
between pairs of points and grouping objects into a binary,
hierarchical tree.
Best used ...
• When you do not know in advance how many clusters
are in your data
• When you want visualization to guide your selection
Result
Dendrogram showing the hierarchical relationship
between clusters
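sklearn's AgglomerativeClustering (used on the following slides) does not draw the dendrogram itself; a minimal sketch with scipy, assuming the measurement matrix Y from the course scripts (rows are signals):
# sketch: dendrogram via scipy (Y as in "L10_clustering_hierarchical.py" is assumed)
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

LY = linkage(Y, method='ward', metric='euclidean')  # nested binary tree of merges
dendrogram(LY)  # plot the hierarchy; cut height suggests the number of clusters
plt.show()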
Hierarchical clustering (1/2)
What are the distance measures between
signals?
# pair-wise distances between signals (pdist is scipy.spatial.distance.pdist)
PX = np.zeros((MR, MR))
PX[np.triu_indices(MR, 1)] = pdist(X, 'euclidean')
# distances after permutation
PY = np.zeros((MR, MR))
PY[np.triu_indices(MR, 1)] = pdist(Y, 'euclidean')
See "L10_clustering_hierarchical.py"
Hierarchical clustering (2/2)
How does it work?
# clustering (defaults: two clusters, ward linkage)
model = cluster.AgglomerativeClustering()
model.fit(Y)
labels = model.labels_
children = model.children_  # the merges forming the binary tree
See "L10_clustering_hierarchical.py"
Noisy measurements
How does noise affect the clustering
results?
# create R noisy copies of each source signal
X[i] = np.tile(S[i, :], (R, 1)) + np.random.randn(R, N) * 0.5
# measurements: randomly permute the M*R copies
Y = X[np.random.permutation(M*R), :]
See "L10_clustering_hierarchical.py"
Hierarchical clustering 2D (1/2)
Could we cluster the covariance matrix
instead of the sources?
# covariance
CX = np.cov(X)
CY = np.cov(Y)
# clustering the rows of the covariance matrix
model = cluster.AgglomerativeClustering(n_clusters=K)
model.fit(CY)  # fit on CY, not Y, to cluster the covariance structure
labels = model.labels_
# sort the matrix by cluster label
indices = np.squeeze(np.argsort(labels))
CZ = CY[indices, :]
CZ = CZ[:, indices]
See "L10_clustering_hierarchical_2D.py"
Hierarchical clustering 2D (2/2)
Sub-optimal number of clusters …
# covariance
CX = np.cov(X)
CY = np.cov(Y)
# clustering the rows of the covariance matrix
model = cluster.AgglomerativeClustering(n_clusters=K)
model.fit(CY)
labels = model.labels_
# sort the matrix by cluster label
indices = np.squeeze(np.argsort(labels))
CZ = CY[indices, :]
CZ = CZ[:, indices]
See "L10_clustering_hierarchical_2D.py"
Gaussian Mixture Model
How it works
Partition-based clustering where data points come from
different multivariate normal distributions with certain
probabilities.
Best used ...
• When a data point might belong to more than one
cluster
• When clusters have different sizes and correlation
structures within them
Result
A model of Gaussian distributions that gives the probability of
a point being in a cluster
Gaussian Mixture Model (1/2)
Why does it work?
# generate data (two variables with different means and spreads)
Z = [np.random.randn(1, N) * 0.5 + 0.0,
     np.random.randn(1, N) * 1.25 + 4.0]
# scatter plot (joint distribution)
plt.scatter(Z[0], Z[1])
# gaussian PDF of each variable (norm is scipy.stats.norm)
b = np.linspace(-3, 10, 1000)
p0 = norm.pdf(b, np.mean(Z[0]), np.std(Z[0]))
p1 = norm.pdf(b, np.mean(Z[1]), np.std(Z[1]))
# multivariate gaussian PDF; mlab.bivariate_normal has been removed from
# matplotlib, so scipy.stats.multivariate_normal is used here instead
delta = 0.1  # grid step (value assumed; not defined in the excerpt)
x, y = np.meshgrid(np.arange(-10.0, 10.0, delta), np.arange(-10.0, 10.0, delta))
rv = multivariate_normal([np.mean(Z[0]), np.mean(Z[1])],
                         [[np.var(Z[0]), 0.0], [0.0, np.var(Z[1])]])
plt.contour(x, y, rv.pdf(np.dstack((x, y))))
See "L10_clustering_gmm.py"
Gaussian Mixture Model (2/2)
How does it work?
# fit a K-component mixture model
model = mixture.GaussianMixture(n_components=K)
model.fit(X)
# model properties
Y = model.predict(X)  # most likely cluster per point
model_mu = model.means_
model_cov = model.covariances_
See "L10_clustering_gmm.py"
Section 3. Classification
Classification
Classification techniques predict discrete responses, for
example, whether an email is genuine or spam.
Classification models are trained to classify data
into categories.
Support Vector Machine
How it works
Classifies data by finding the linear decision boundary
(hyperplane) that separates all data points of one class from
those of the other class. The best hyperplane for an SVM is
the one with the largest margin between the two classes.
Best used ...
• For data that has exactly two classes
• For high-dimensional, nonlinearly separable data
• When you need a classifier that is simple, easy to
interpret, and accurate
Result
Training/fitting transforms data and labels to coefficients,
while testing/prediction transforms data and coefficients to
labels.
Support Vector Machine
When it works
SVM works if data has exactly two classes.
Support Vector Machine (1/2)
SVM, like any other classification approach,
consists of two stages: training and testing.
# data (M channels, N samples)
X = np.random.randn(M, N)
# binary labels (get_sequence is a helper defined in the course script)
y = get_sequence(5, 0.8, N)
# induce some correlation between X and y
X = X + 2.0 * np.tile(y, (M, 1))
# training and testing datasets
L = N // 2
Y = y[:L]      # training labels
U = y[L:]      # testing labels
XY = X[:, :L]  # training data
XU = X[:, L:]  # testing data
# train classifier
model = SVC(kernel='linear')
model.fit(XY.T, Y)
See "L10_classification_svm_2_signals.py"
Support Vector Machine (2/2)
The classifier gives the coefficients, which can
be converted into a decision function.
# classifier outcome (coef_ is the public attribute)
coef = model.coef_
intercept = model.intercept_
# decision function
Z = np.zeros(N)
for i in range(N):
    Z[i] = np.dot(coef, X[:, i]) + intercept
# testing
v = U
u = model.predict(XU.T)
u = u > 0.5
# accuracy
a = np.mean(v == u)
print('accuracy: %1.2f' % a)

Result: accuracy = 99%
See "L10_classification_svm_2_signals.py"
Classification accuracy (1/2)
What if the two datasets cannot be clearly
separated?
# data
X = np.random.randn(M, N)
# binary labels
y = get_sequence(5, 0.8, N)
# induce some correlation between X and y
X = X + 2.0 * np.tile(y, (M, 1))
# training and testing datasets
L = N // 2
Y = y[:L]      # training labels
U = y[L:]      # testing labels
XY = X[:, :L]  # training data
XU = X[:, L:]  # testing data
# train classifier
model = SVC(kernel='linear')
model.fit(XY.T, Y)
See "L10_classification_svm_2_signals.py"
Classification accuracy (2/2)
The decision function looks like random noise.
# classifier outcome
coef = model.coef_
intercept = model.intercept_
# decision function
Z = np.zeros(N)
for i in range(N):
    Z[i] = np.dot(coef, X[:, i]) + intercept
# testing
v = U
u = model.predict(XU.T)
u = u > 0.5
# accuracy
a = np.mean(v == u)
print('accuracy: %1.2f' % a)

Result: accuracy = 76%
See "L10_classification_svm_2_signals.py"
Multichannel recordings (1/2)
More channels can provide a better model fit,
but overfitting may also occur.
# data
X = np.random.randn(M, N)
# binary labels
y = get_sequence(5, 0.8, N)
# induce some correlation between X and y
X = X + 2.0 * np.tile(y, (M, 1))

Result: accuracy = 100%
See "L10_classification_svm.py"
Multichannel recordings (2/2)
In case of weak correlation between data
and labels …
# data
X = np.random.randn(M, N)
# binary labels
y = get_sequence(5, 0.8, N)
# induce a weaker correlation between X and y
X = X + 0.5 * np.tile(y, (M, 1))

Result: accuracy = 67%
See "L10_classification_svm.py"
Discriminant Analysis
How it works
Discriminant analysis classifies data by finding linear
combinations of features. Discriminant analysis assumes
that different classes generate data based on Gaussian
distributions.
Best used ...
• When you need a simple model that is easy to interpret
• When you need a model that is fast to predict
Result
Coefficients
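The deck shows no code for discriminant analysis; a minimal sketch with scikit-learn, assuming the training/testing split (XY, Y, XU) from the SVM slides:
# sketch: linear discriminant analysis (XY, Y, XU from the SVM slides assumed)
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

model = LinearDiscriminantAnalysis()
model.fit(XY.T, Y)       # fit Gaussian class models with a shared covariance
u = model.predict(XU.T)  # predicted class labels
coef = model.coef_       # linear combination of features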
Logistic regression
How it works
Fits a model that can predict the probability of a binary
response belonging to one class or the other. Because of its
simplicity, logistic regression is commonly used as a starting
point for binary classification problems.
Best used ...
• When data can be clearly separated by a single, linear
boundary
• As a baseline for evaluating more complex classification
methods
Result
Coefficients
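Again no code in the deck; a minimal sketch with scikit-learn, reusing the same assumed split:
# sketch: logistic regression as a baseline binary classifier
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(XY.T, Y)             # train on labeled data
p = model.predict_proba(XU.T)  # probability of each class per sample
u = model.predict(XU.T)        # predicted binary labels
coef = model.coef_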
Section 4. Regression
Regression
Regression techniques predict continuous responses, for
example, changes in temperature or fluctuations in
electricity demand.
Linear regression
How it works
Linear regression is a statistical modeling technique used to
describe a continuous response variable as a linear function
of one or more predictor variables.
Best used ...
• When you need an algorithm that is easy to interpret
and fast to fit
• As a baseline for evaluating other, more complex,
regression models
Result
Coefficients
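A minimal sketch (illustrative data, not from the course scripts):
# sketch: linear regression of a continuous response on one predictor
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.random.randn(100, 1)               # predictor variable
y = 2.0 * x[:, 0] + np.random.randn(100)  # noisy continuous response
model = LinearRegression().fit(x, y)
print(model.coef_, model.intercept_)      # fitted coefficients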
Nonlinear regression
How it works
Nonlinear regression is a statistical modeling technique that
helps describe nonlinear relationships in experimental data.
Best used ...
• When data has strong nonlinear trends and cannot be
easily transformed into a linear space
• For fitting custom models to data
Result
Coefficients
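A minimal sketch using scipy's curve_fit to fit a custom nonlinear model (model and data are illustrative):
# sketch: nonlinear regression with a custom exponential model
import numpy as np
from scipy.optimize import curve_fit

def f(x, a, b):  # custom model: y = a * exp(b * x)
    return a * np.exp(b * x)

x = np.linspace(0.0, 1.0, 100)
y = 1.5 * np.exp(0.8 * x) + np.random.randn(100) * 0.05
params, cov = curve_fit(f, x, y)  # fitted coefficients a, b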
SVM regression
How it works
SVM regression algorithms work like SVM classification
algorithms, but are modified to be able to predict a
continuous response.
Best used ...
• For high-dimensional data with a large number of
predictor variables
Result
Coefficients
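A minimal sketch with scikit-learn's SVR (illustrative data; a linear kernel so the coefficients are exposed):
# sketch: support vector regression for a continuous response
import numpy as np
from sklearn.svm import SVR

x = np.random.randn(200, 5)  # many predictor variables
y = x[:, 0] - 0.5 * x[:, 1] + np.random.randn(200) * 0.1
model = SVR(kernel='linear').fit(x, y)
coef = model.coef_           # coefficients (available for the linear kernel)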
Literature
• Python programming language
  - http://www.scipy-lectures.org/, see "materials/L02_ScipyLectures.pdf"
• Data analysis
  - Andreas Müller and Sarah Guido, "Introduction to Machine Learning with Python: A Guide for Data Scientists", O'Reilly Media, 2016