
Lecture 6

Feature Selection
and Extraction

Machine Learning
Ivan Smetannikov

15.06.2016
Lecture plan
• Dimensionality Reduction
• Feature Selection
• Feature Extraction



Dimensionality Reduction

Why should we look at dimensionality reduction?

• It speeds up algorithms
• It reduces the space the data takes up


Dimensionality Reduction

What is dimensionality reduction?

• You’ve collected many features – maybe more than you need. Can you "simplify" your data set in a rational and useful way?


Dimensionality Reduction

Example:
• Redundant data set – different units for the same attribute
• Reduce the data to 1D (2D -> 1D)


Dimensionality Reduction

Another example:
• Helicopter flying – do a survey of pilots (x1 = skill, x2 = pilot enjoyment). These features may be highly correlated
• The correlated pair can be combined into a single attribute called, for example, aptitude


Dimensionality Reduction

So what does dimensionality reduction mean?
• Plot a line through the data
• Take each example and record its position on that line
• So we can represent each example as a single 1D number (a small code sketch follows)
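A minimal sketch (not from the slides) of this projection; the data points and the direction of the line are hypothetical:

import numpy as np

# Hypothetical 2D examples lying roughly along a line
X = np.array([[1.0, 1.1], [2.0, 1.9], [3.0, 3.2]])
u = np.array([1.0, 1.0]) / np.sqrt(2)  # unit vector along the line

z = X @ u  # 1D position of each example on the line
print(z)   # one number per example instead of two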
Dimensionality Reduction

Another example: 3D -> 2D


Dimensionality Reduction

Motivation:
Collect a large data set (50 dimensions)



Dimensionality Reduction

Using dimensionality reduction, come up with a different feature representation


Lecture plan
• Dimensionality Reduction
• Feature Selection
• Feature Extraction



Feature Selection

Goals of feature selection:

• Avoiding overfitting and improving classification quality
• Better understanding of the models
• Speeding up classification models


Feature Selection

Types of selected attributes:

• Redundant attributes – do not carry any additional information
• Irrelevant attributes – are not informative in general


Feature Selection

How to evaluate feature selection methods:

• On various datasets
• With different classifiers (if possible)
• By adding noise and artificial target vectors to the datasets


Feature Selection

Feature selection types:


• Filter methods
a. Univariate
b. Multivariate
• Wrapper methods
a. Deterministic
b. Randomized
• Embedded methods
Feature Selection

Filter methods:
Evaluate the quality of individual attributes and remove the worst of them.
+ Simple to compute, easy to scale
– Ignore the relationships between attributes and the features used by the classifier


Feature Selection

Examples of filter methods:

• Univariate:
  o Euclidean distance
  o Information gain
  o Spearman correlation coefficient
• Multivariate:
  o CFS (Correlation-based Feature Selection)
  o MBF (Markov Blanket Filter)


Feature Selection

Spearman correlation coefficient

Python SciPy:

scipy.stats.pearsonr(x, y)

Parameters:
  x : (N,) array_like – input
  y : (N,) array_like – input
Returns:
  (Pearson's correlation coefficient, 2-tailed p-value)

Note: the slide's code shows pearsonr; for the Spearman coefficient itself SciPy provides scipy.stats.spearmanr with the same call and return format.
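A short usage sketch (not from the slides) of a univariate filter built on such a coefficient: score every feature by the absolute value of its correlation with the target and rank. The data here is synthetic:

import numpy as np
from scipy.stats import spearmanr

rng = np.random.RandomState(0)
X = rng.rand(100, 5)               # 100 examples, 5 features
y = X[:, 2] + 0.1 * rng.rand(100)  # feature 2 drives the target

# Score each feature by |Spearman correlation| with y and rank, best first
scores = [abs(spearmanr(X[:, j], y)[0]) for j in range(X.shape[1])]
print(np.argsort(scores)[::-1])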
Feature Selection

Weka:

ASEvaluation evaluator = new CorrelationAttributeEval();
Ranker ranker = new Ranker();
// ranker.setThreshold(0.05); or ranker.setNumToSelect(10);

AttributeSelection selection = new AttributeSelection();
selection.setEvaluator(evaluator);
selection.setSearch(ranker);
// configure the filter before setting its input format
selection.setInputFormat(heavyInstances);

Instances lightInstances = Filter.useFilter(heavyInstances, selection);


Feature Selection

Wrapper methods:
Search for a good subset of the source attributes using the classifier itself.
+ Higher accuracy than filtering
+ Consider the relationships between attributes
+ Direct interaction with the classifier
– Long computing time
– Risk of overfitting
Feature Selection

Examples of wrapper methods:

• Deterministic:
  o SFS (sequential forward selection)
  o SBE (sequential backward elimination)
  o SVM-RFE
• Randomized:
  o Randomized hill climbing
  o Genetic algorithms


Feature Selection

SVM-RFE
• Train an SVM on the training subset
• Rank the features by the received weights
• Throw out the lowest-ranked features
• Repeat until the necessary number of features remains (a code sketch follows)
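A minimal sketch (not from the slides) of SVM-RFE using scikit-learn's RFE wrapper, reusing the toy data from the next slides; n_features_to_select=1 is an arbitrary choice:

import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

X = np.array([[1, 2], [5, 8], [1.5, 1.8], [8, 8], [1, 0.6], [9, 11]])
y = [0, 1, 0, 1, 0, 1]

svc = SVC(kernel='linear', C=1.0)  # a linear kernel exposes feature weights
rfe = RFE(estimator=svc, n_features_to_select=1, step=1).fit(X, y)
print(rfe.ranking_)  # 1 = kept; larger numbers were eliminated earlier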


Feature Selection

SVM-RFE (Python example)


x = [1, 5, 1.5, 8, 1, 9]
y = [2, 8, 1.8, 8, 0.6, 11]



Feature Selection

SVM-RFE (Python example)

import numpy as np
from sklearn import svm

X = np.array([[1, 2], [5, 8], [1.5, 1.8], [8, 8], [1, 0.6], [9, 11]])
y = [0, 1, 0, 1, 0, 1]

Let's use an SVM:
clf = svm.SVC(kernel='linear', C=1.0)

Let's fit our model:
clf.fit(X, y)

Let's predict something:
print(clf.predict([[0.58, 0.76]]))  # predict() expects a 2D array


Feature Selection

Embedded methods

• Are built into a particular classifier
• Use an individual method for each classifier




Feature Selection

Random Forest:
• For each tree, select a subsample of size N with replacement
• Build decision trees; to select the next feature to split on, only a random subset of features is considered
• Choose the best of them according to a given criterion


Feature Selection

Random Forest (Python example):

# Import the random forest package
from sklearn.ensemble import RandomForestClassifier

# Create the random forest object which will include all the
# parameters for the fit
forest = RandomForestClassifier(n_estimators=100)

# Fit the training data to the Survived labels and create the
# decision trees
forest = forest.fit(train_data[0::, 1::], train_data[0::, 0])

# Take the same decision trees and run them on the test data
output = forest.predict(test_data)
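Since the topic here is embedded selection, note that the fitted forest exposes importance scores directly; a short sketch (not on the slide) reusing the forest above:

import numpy as np

# Rank features by the forest's embedded importance scores
importances = forest.feature_importances_
ranking = np.argsort(importances)[::-1]  # feature indices, most important first
print(ranking[:10])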


Feature Selection

Random Forest (Weka):

int numFolds = 10;
BufferedReader br = new BufferedReader(new FileReader("data.arff"));

Instances trainData = new Instances(br);
trainData.setClassIndex(trainData.numAttributes() - 1);

RandomForest rf = new RandomForest();
rf.setNumTrees(100);
rf.buildClassifier(trainData);

Evaluation evaluation = new Evaluation(trainData);
evaluation.crossValidateModel(rf, trainData, numFolds, new Random(1));

System.out.println("F-measure = " + evaluation.fMeasure(0));


Feature Selection

Information gain (IG)


Feature Selection

Redundancy



Feature Selection

Regularization

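The slide gives no details; a minimal sketch of regularization as embedded feature selection, using scikit-learn's Lasso on synthetic data, where the alpha value is an arbitrary assumption:

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X = rng.rand(50, 10)
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.01 * rng.rand(50)

# L1 regularization drives uninformative weights to exactly zero
lasso = Lasso(alpha=0.05).fit(X, y)
print(np.nonzero(lasso.coef_)[0])  # indices of the surviving features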


Lecture plan
• Dimensionality Reduction
• Feature Selection
• Feature Extraction



Feature Extraction



Feature Extraction

Feature Extraction
• Reducing the amount of resources required to describe a large set of data
• Creating new features from the original ones
• Linear and nonlinear methods


Feature Extraction

Linear and nonlinear

Manifold Sculpting (nonlinear) vs. PCA (linear)


Feature Extraction

PCA
We have a 2D dataset which we wish to reduce to 1D


Feature Extraction

PCA tries to find the surface (a straight line in this case) with the minimum projection error
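In symbols (not on the slide): for centered data, PCA seeks the unit vector $u$ minimizing the mean squared projection error

$\frac{1}{m}\sum_{i=1}^{m}\left\|x^{(i)} - (u^{\top}x^{(i)})\,u\right\|^{2}$,

which is equivalent to maximizing the variance of the projections $u^{\top}x^{(i)}$.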


Feature Extraction

PCA (Python example)

Let's use the Iris data and import PCA:

from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA as sklearnPCA

X_std = StandardScaler().fit_transform(load_iris().data)  # standardized features

sklearn_pca = sklearnPCA(n_components=2)
Y_sklearn = sklearn_pca.fit_transform(X_std)


Feature Extraction

PCA (Python example)

Let's plot the PCA results (a plotting sketch follows below)
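The slide shows only the resulting figure; a minimal matplotlib sketch (not from the slide) that produces such a plot, reusing Y_sklearn from the previous slide:

import matplotlib.pyplot as plt

labels = load_iris().target
for label in set(labels):
    mask = labels == label
    plt.scatter(Y_sklearn[mask, 0], Y_sklearn[mask, 1], label=str(label))
plt.xlabel('PC 1')
plt.ylabel('PC 2')
plt.legend()
plt.show()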


Feature Extraction

PCA (Weka)

PrincipalComponents pca = new PrincipalComponents();
pca.setMaximumAttributes(100);
pca.setInputFormat(trainingData);
Instances newData = Filter.useFilter(trainingData, pca);
