Codes For Practice

The document outlines methods for feature selection in classification tasks using different data types. It includes code snippets for selecting features with numerical inputs and categorical outputs using ANOVA, as well as loading and preparing datasets with categorical inputs and outputs. Additionally, it demonstrates how to split a dataset into training and testing sets for further analysis.

Uploaded by

mohan venkey

1. Feature Selection for Classification: Numerical Input, Categorical Output


# ANOVA feature selection for numeric input and categorical output
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_classif
# generate dataset
X, y = make_classification(n_samples=100, n_features=20, n_informative=2)
# define feature selection
fs = SelectKBest(score_func=f_classif, k=2)
# apply feature selection
X_selected = fs.fit_transform(X, y)
print(X_selected.shape)
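
The snippet above prints only the shape of the reduced array. As a minimal sketch of how one might see which columns were kept and what ANOVA F-score each column received, the fitted selector can be inspected. The random_state argument is an assumption added here so the run is repeatable; it is not part of the original snippet.

```python
# inspect which features the ANOVA selector kept (illustrative sketch)
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_classif

# generate dataset (random_state is an assumption, added for repeatability)
X, y = make_classification(n_samples=100, n_features=20, n_informative=2, random_state=1)
# define and apply feature selection
fs = SelectKBest(score_func=f_classif, k=2)
X_selected = fs.fit_transform(X, y)
# column indices of the two features that were kept
selected_idx = fs.get_support(indices=True)
print('Selected columns:', selected_idx)
# ANOVA F-statistic for every input column (larger means more class separation)
for i, score in enumerate(fs.scores_):
    print('Feature %d: %.3f' % (i, score))
```

The columns listed by get_support are the same ones returned in X_selected, so X[:, selected_idx] reproduces the selected array.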

2. Feature Selection for Classification: Categorical Input, Categorical Output


# load the dataset
from pandas import read_csv

def load_dataset(filename):
    # load the dataset as a pandas DataFrame (the file has no header row)
    data = read_csv(filename, header=None)
    # retrieve numpy array
    dataset = data.values
    # split into input (X) and output (y) variables; the last column is the class label
    X = dataset[:, :-1]
    y = dataset[:, -1]
    # format all input fields as string
    X = X.astype(str)
    return X, y
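
The loader above only returns string-valued inputs, and a scoring function cannot work on strings directly. A rough sketch of one possible next step follows: encode the categories as integers and score them with the chi-squared test. The choice of OrdinalEncoder, LabelEncoder and chi2 is an assumption, not something stated in this handout, and the six-row sample is made up for illustration; it is not the real dataset.

```python
# chi-squared feature selection for categorical input and output (illustrative sketch)
from io import StringIO
from pandas import read_csv
from sklearn.preprocessing import OrdinalEncoder, LabelEncoder
from sklearn.feature_selection import SelectKBest, chi2

# tiny made-up sample: three categorical inputs followed by the class label
csv_text = """young,small,yes,recur
young,large,no,no-recur
old,small,yes,recur
old,large,no,no-recur
young,small,no,recur
old,large,yes,no-recur
"""
data = read_csv(StringIO(csv_text), header=None)
dataset = data.values
X = dataset[:, :-1].astype(str)
y = dataset[:, -1]
# map each category to a non-negative integer code (chi2 needs non-negative inputs)
X_enc = OrdinalEncoder().fit_transform(X)
y_enc = LabelEncoder().fit_transform(y)
# keep the two inputs with the highest chi-squared score
fs = SelectKBest(score_func=chi2, k=2)
X_selected = fs.fit_transform(X_enc, y_enc)
print(X_selected.shape)
print(fs.scores_)
```

In this made-up sample the second column matches the class label exactly, so it receives the highest score and is always among the selected columns.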

3. Feature Selection for Classification: Categorical Input, Categorical Output

# load and summarize the dataset
from pandas import read_csv
from sklearn.model_selection import train_test_split

# load the dataset
def load_dataset(filename):
    # load the dataset as a pandas DataFrame (the file has no header row)
    data = read_csv(filename, header=None)
    # retrieve numpy array
    dataset = data.values
    # split into input (X) and output (y) variables; the last column is the class label
    X = dataset[:, :-1]
    y = dataset[:, -1]
    # format all input fields as string
    X = X.astype(str)
    return X, y

# load the dataset
X, y = load_dataset('breast-cancer.csv')
# split into train and test sets (33% of the rows are held out for testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
# summarize
print('Train', X_train.shape, y_train.shape)
print('Test', X_test.shape, y_test.shape)
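
Once the data is split, the string categories still need numeric codes before any scoring or modelling. A minimal sketch, under stated assumptions, of fitting the encoders on the training rows only and reusing them on the test rows is given here. The nine-row sample is made up so the code runs without 'breast-cancer.csv', and the encoder settings (unseen categories mapped to -1) are an illustrative choice that needs a scikit-learn version supporting handle_unknown='use_encoded_value'.

```python
# encode train and test sets after splitting (illustrative sketch)
from io import StringIO
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OrdinalEncoder, LabelEncoder

# made-up stand-in for the CSV file: two categorical inputs and a class label
rows = [
    ('young', 'small', 'recur'), ('young', 'large', 'no-recur'),
    ('old', 'small', 'recur'), ('old', 'large', 'no-recur'),
    ('mid', 'small', 'recur'), ('mid', 'large', 'no-recur'),
    ('young', 'small', 'recur'), ('old', 'large', 'no-recur'),
    ('mid', 'small', 'no-recur'),
]
csv_text = '\n'.join(','.join(r) for r in rows)
dataset = read_csv(StringIO(csv_text), header=None).values
X = dataset[:, :-1].astype(str)
y = dataset[:, -1]
# same split settings as above: 33% of the rows go to the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
# fit the input encoder on training rows only; unseen test categories become -1
oe = OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-1)
X_train_enc = oe.fit_transform(X_train)
X_test_enc = oe.transform(X_test)
# fit the label encoder on training labels and reuse it for the test labels
le = LabelEncoder()
y_train_enc = le.fit_transform(y_train)
y_test_enc = le.transform(y_test)
print('Train', X_train_enc.shape, y_train_enc.shape)
print('Test', X_test_enc.shape, y_test_enc.shape)
```

Fitting on the training rows alone keeps information about the test rows out of the encoding step, which is the reason for splitting before encoding.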
