DATA MINING TASK-3
3a. TO DEMONSTRATE DATA PREPROCESSING ON THE PREDEFINED WEKA DATASET DIABETES.ARFF
AIM: TO PERFORM OPERATIONS ON DIABETES.ARFF WITH THE HELP OF THE DATA MINING TOOL WEKA.
PROCEDURE: Click on Open file -> Go to the C drive -> Go to Program Files -> Go to the Weka
3.8.6 folder -> Click on the data folder.
Choose the required dataset, i.e. diabetes.arff.
Click on Edit to view the data.
OPERATIONS PERFORMED:
1.ADD: An instance filter that adds a new attribute to the dataset.
To apply the Add operation, click on Choose, then select filters -> unsupervised -> attribute -> Add.
Now right-click on the filter, choose Show properties, then enter a name in attributeName.
AFTER
Click on Edit to see the changes.
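The effect of the Add step can be sketched in a few lines of Python. This is only an illustration of the idea, not Weka's own code: the helper name add_attribute, the attribute name "remarks" and the two rows are assumptions made for the example. Weka's Add filter fills the new attribute with missing values, which the sketch represents as None.

```python
# Sketch of what an "add attribute" step does to a table (not Weka's code).
def add_attribute(names, rows, new_name):
    """Return new (names, rows) with one extra attribute whose values are all missing (None)."""
    return names + [new_name], [row + [None] for row in rows]


# Illustrative rows using the first three diabetes attributes.
names = ["preg", "plas", "pres"]
rows = [[6, 148, 72], [1, 85, 66]]

# "remarks" is a hypothetical attribute name typed into attributeName.
new_names, new_rows = add_attribute(names, rows, "remarks")
print(new_names)
print(new_rows)
```

The original lists are left untouched, which mirrors how the filter produces a new view of the data that can still be undone in the Preprocess tab.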
2.REMOVE: A filter that removes a range of attributes from the dataset.
To apply the Remove operation, click on Choose, then select filters -> unsupervised -> attribute -> Remove.
Now right-click on Remove, choose Show properties, then enter the attributes to remove in attributeIndices.
AFTER
Click on Edit to see the changes.
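The attributeIndices field takes a 1-based range string such as "4-5" or "first-2,last". A minimal Python sketch of how such a string can be expanded and applied is given below. The helper names parse_indices and remove_attributes are hypothetical, the row is illustrative, and the parser covers only the simple forms shown here.

```python
# Sketch: expand a 1-based range string and drop the selected columns.
def parse_indices(spec, n_attributes):
    """Expand a string such as "2,4-5" or "first-2,last" into 0-based positions."""
    def position(token):
        token = token.strip()
        if token == "first":
            return 0
        if token == "last":
            return n_attributes - 1
        return int(token) - 1  # the GUI counts attributes from 1

    selected = set()
    for part in spec.split(","):
        if "-" in part:
            start, end = part.split("-")
            selected.update(range(position(start), position(end) + 1))
        else:
            selected.add(position(part))
    return selected


def remove_attributes(names, rows, spec):
    """Return (names, rows) without the attributes selected by spec."""
    drop = parse_indices(spec, len(names))
    keep = [i for i in range(len(names)) if i not in drop]
    return [names[i] for i in keep], [[row[i] for i in keep] for row in rows]


names = ["preg", "plas", "pres", "skin", "insu", "mass", "pedi", "age", "class"]
rows = [[6, 148, 72, 35, 0, 33.6, 0.627, 50, "tested_positive"]]  # illustrative row

kept_names, kept_rows = remove_attributes(names, rows, "4-5")
print(kept_names)
print(kept_rows)
```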
3.COPY: An instance filter that copies a range of attributes in the dataset.
To apply the Copy operation, click on Choose, then select filters -> unsupervised -> attribute -> Copy.
Now right-click on Copy, choose Show properties, then enter the range to copy in attributeIndices.
Then click on Edit to see the changes.
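As a rough picture of the Copy step, the sketch below appends duplicates of the chosen attributes to the end of the table. The helper name copy_attributes and the sample row are assumptions. The "Copy of " prefix follows what Weka shows for copied attributes as far as I recall, so treat the exact naming as an assumption too.

```python
# Sketch: duplicate selected columns and append them at the end (not Weka's code).
def copy_attributes(names, rows, positions):
    """Append copies of the attributes at the given 1-based positions."""
    idx = [p - 1 for p in positions]
    new_names = names + ["Copy of " + names[i] for i in idx]
    new_rows = [row + [row[i] for i in idx] for row in rows]
    return new_names, new_rows


names = ["preg", "plas", "pres"]
rows = [[6, 148, 72]]  # illustrative row

copied_names, copied_rows = copy_attributes(names, rows, [1, 2])
print(copied_names)
print(copied_rows)
```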
4.REPLACE MISSING VALUES: Replaces all missing values for nominal and numeric attributes in a
dataset with the modes and means from the training data.
To apply the ReplaceMissingValues operation, click on Choose, then select filters -> unsupervised ->
attribute -> ReplaceMissingValues.
Now right-click on ReplaceMissingValues, choose Show properties, then click on OK to keep the default values.
Then click on Edit to see the changes.
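Since diabetes.arff reports no missing values, this filter leaves it unchanged. To show the rule itself (mean for numeric attributes, mode for nominal ones), here is a small Python sketch on made-up columns with gaps. The function name replace_missing is hypothetical and None stands in for a missing value.

```python
from collections import Counter
from statistics import mean


def replace_missing(column, nominal=False):
    """Fill None entries with the mode (nominal) or the mean (numeric) of the known values."""
    present = [v for v in column if v is not None]
    if nominal:
        fill = Counter(present).most_common(1)[0][0]  # most frequent label
    else:
        fill = mean(present)
    return [fill if v is None else v for v in column]


# Made-up columns with gaps, for illustration only.
plas = [148, None, 183, 89]
labels = ["tested_positive", None, "tested_negative", "tested_negative"]

filled_plas = replace_missing(plas)
filled_labels = replace_missing(labels, nominal=True)
print(filled_plas)
print(filled_labels)
```

The mean of 148, 183 and 89 is 140, so the gap in the numeric column becomes 140; the gap in the nominal column becomes the most frequent label.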
5.REPLACE MISSING WITH USER CONSTANT: Replaces all missing values for nominal and numeric attributes in
a dataset with user-supplied constant values.
Right-click on ReplaceMissingWithUserConstant, choose Show properties, then enter the replacement values in
nominalStringReplacementValue and numericReplacementValue.
Then click on Edit to see the changes.
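The same idea with user-supplied constants can be sketched as follows. The two keyword parameters are named after the Weka properties nominalStringReplacementValue and numericReplacementValue; the function name, the is_nominal flags, the sample row and the constants 0 and "unknown" are assumptions for the example.

```python
# Sketch: fill gaps with fixed constants chosen by the user (not Weka's code).
def replace_with_constants(row, is_nominal,
                           nominal_string_replacement_value,
                           numeric_replacement_value):
    """Replace None by the nominal or numeric constant, depending on the attribute type."""
    result = []
    for value, nominal in zip(row, is_nominal):
        if value is None:
            value = (nominal_string_replacement_value if nominal
                     else numeric_replacement_value)
        result.append(value)
    return result


# Made-up row with two gaps: one numeric, one nominal.
row = [6, None, 72, None]
is_nominal = [False, False, False, True]

filled_row = replace_with_constants(
    row, is_nominal,
    nominal_string_replacement_value="unknown",
    numeric_replacement_value=0,
)
print(filled_row)
```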
6.STANDARDIZE: Standardizes all numeric attributes in the given dataset to have zero mean and unit variance.
Afterwards, click on OK to see the changes.
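Standardizing an attribute means replacing each value x by (x - mean) / standard deviation. The Python sketch below applies this to a few illustrative plas values. It uses the sample standard deviation (dividing by n - 1), which I believe matches Weka but should be treated as an assumption; the function name standardize is hypothetical.

```python
from statistics import mean, stdev


def standardize(values):
    """Return z-scores: (value - mean) / standard deviation."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (n - 1); an assumption
    if sigma == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - mu) / sigma for v in values]


plas = [148, 85, 183, 89, 137]  # illustrative values, not the full column
z_scores = standardize(plas)
print([round(z, 3) for z in z_scores])
print(round(mean(z_scores), 6), round(stdev(z_scores), 6))
```

After the transformation the mean of the column is 0 and its standard deviation is 1, up to floating-point rounding.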
7.NORMALIZATION: Normalizes all numeric values in the given dataset (apart from the class attribute, if set).
By default the resulting values lie in the range [0, 1].
AFTER
Then click on Edit to see the changes.
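Normalization to [0, 1] is usually done with the min-max rule (x - min) / (max - min). A minimal Python sketch on illustrative age values is shown below; the function name normalize is hypothetical, and mapping a constant column to 0.0 is an assumption of this sketch.

```python
# Sketch: min-max scaling of one numeric column to the range [0, 1].
def normalize(values):
    """Scale values so the smallest becomes 0.0 and the largest becomes 1.0."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # constant column; assumption of this sketch
    return [(v - low) / (high - low) for v in values]


ages = [21, 50, 81]  # illustrative values
scaled = normalize(ages)
print(scaled)
```

Unlike standardization, this keeps every value inside a fixed interval, which is convenient when attributes with very different ranges (for example plas and pedi) are compared.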
STATISTICS AND ITS VALUES:
OBSERVATION
Relation: diabetes
No. of attributes = 9
No. of missing values = 0
List the attribute names:
1.preg 2.plas 3.pres 4.skin 5.insu 6.mass 7.pedi 8.age 9.class
The dataset is not strictly balanced: the class attribute has 500 tested_negative and 268 tested_positive instances.
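One simple way to check a balance observation is to compare the class counts shown for the class attribute in the Preprocess panel. The sketch below assumes the counts 500 tested_negative and 268 tested_positive for diabetes.arff (verify them against your own copy); the function name imbalance_ratio is hypothetical.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return the size of the largest class divided by the size of the smallest class."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Class counts assumed for diabetes.arff; check them in the Preprocess panel.
labels = ["tested_negative"] * 500 + ["tested_positive"] * 268

class_counts = Counter(labels)
ratio = imbalance_ratio(labels)
print(class_counts)
print(round(ratio, 2))
```

A ratio of 1.0 means perfectly balanced classes; the further the ratio is above 1.0, the more one class dominates.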
3b. Create a student.arff dataset & demonstrate data preprocessing on it.
AIM: TO CREATE A STUDENT TABLE WITH THE HELP OF THE DATA MINING TOOL WEKA.
DESCRIPTION: We need to create a student table as a training dataset that includes the attributes
name, age, id, branch and gender.
PROCEDURE:
Open Notepad and type the following code
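The ARFF text itself is not reproduced in this report. As a hedged illustration of what such a file can look like, the Python sketch below builds ARFF text for the five attributes named above. The attribute types, the branch and gender value lists and all three data rows are hypothetical, as is the helper name build_arff.

```python
# Sketch: build the text of a small ARFF file (all data values are made up).
def build_arff(relation, attributes, rows):
    """Return ARFF text from a relation name, (name, type) pairs and data rows."""
    lines = ["@relation " + relation, ""]
    for name, arff_type in attributes:
        lines.append("@attribute " + name + " " + arff_type)
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Assumed attribute types; the original file may declare them differently.
attributes = [
    ("name", "string"),
    ("age", "numeric"),
    ("id", "numeric"),
    ("branch", "{CSE,ECE,EEE}"),
    ("gender", "{male,female}"),
]
# Hypothetical student records.
rows = [
    ["Asha", 19, 101, "CSE", "female"],
    ["Ravi", 20, 102, "ECE", "male"],
    ["Kiran", 19, 103, "EEE", "male"],
]

student_arff = build_arff("student", attributes, rows)
print(student_arff)
```

Saving the printed text as student.arff gives a file that can be opened from the Preprocess tab with Open file.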
OPERATIONS PERFORMED:
1.ADD: An instance filter that adds a new attribute to the dataset.
To apply the Add operation, click on Choose, then select filters -> unsupervised -> attribute -> Add.
AFTER
2.REMOVE: A filter that removes a range of attributes from the dataset.
To apply the Remove operation, click on Choose, then select filters -> unsupervised -> attribute -> Remove.
AFTER
3.COPY: An instance filter that copies a range of attributes in the dataset.
To apply the Copy operation, click on Choose, then select filters -> unsupervised -> attribute -> Copy.
AFTER
4.REPLACE MISSING VALUES: Replaces all missing values for nominal and numeric attributes in a
dataset with the modes and means from the training data.
To apply the ReplaceMissingValues operation, click on Choose, then select filters -> unsupervised ->
attribute -> ReplaceMissingValues.
AFTER
5.REPLACE MISSING WITH USER CONSTANT: Replaces all missing values for nominal and numeric attributes in
a dataset with user-supplied constant values.
Right-click on ReplaceMissingWithUserConstant, choose Show properties, then enter the replacement values in
nominalStringReplacementValue and numericReplacementValue.
AFTER
6.NORMALIZATION: Normalizes all numeric values in the given dataset (apart from the class attribute, if set).
AFTER
7.STANDARDIZE: Standardizes all numeric attributes in the given dataset to have zero mean and unit variance.
AFTER
OBSERVATION
Relation: student
No. of attributes = 5
No. of missing values = 0
List the attribute names:
1.name 2.age 3.id 4.branch 5.gender
It is a balanced dataset.
3c. Create a weather.arff dataset & demonstrate data preprocessing on it.
AIM: TO CREATE A WEATHER TABLE WITH THE HELP OF THE DATA MINING TOOL WEKA.
DESCRIPTION: We need to create a weather table as a training dataset that includes the attributes
outlook, temperature, humidity, windy and play.
PROCEDURE: Open Notepad and type the following code
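The weather ARFF text is also not reproduced in this report. The sketch below uses a hypothetical four-row weather file with the five attributes listed above and reads back the relation name, the attribute names and the number of missing values ("?" marks a missing value in ARFF). The helper name summarize_arff is an assumption, and the parser handles only simple files without quoted names.

```python
# Sketch: read basic facts from simple ARFF text (hypothetical sample data).
def summarize_arff(text):
    """Return (relation, attribute names, missing-value count) for simple ARFF text."""
    relation, names, missing, in_data = None, [], 0, False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            missing += sum(1 for v in line.split(",") if v.strip() == "?")
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            names.append(line.split()[1])
        elif lower.startswith("@data"):
            in_data = True
    return relation, names, missing


# Hypothetical file contents; the rows in the original file may differ.
weather_arff = """@relation weather

@attribute outlook {sunny,overcast,rainy}
@attribute temperature {hot,mild,cool}
@attribute humidity {high,normal}
@attribute windy {TRUE,FALSE}
@attribute play {yes,no}

@data
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
"""

relation, attribute_names, missing_count = summarize_arff(weather_arff)
print(relation, len(attribute_names), missing_count)
print(attribute_names)
```

These are the same values that Weka shows in the Preprocess panel after the file is loaded.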
OPERATIONS PERFORMED:
1.ADD: An instance filter that adds a new attribute to the dataset.
To apply the Add operation, click on Choose, then select filters -> unsupervised -> attribute -> Add.
AFTER
2.REMOVE: A filter that removes a range of attributes from the dataset.
To apply the Remove operation, click on Choose, then select filters -> unsupervised -> attribute -> Remove.
AFTER
3.COPY: An instance filter that copies a range of attributes in the dataset.
To apply the Copy operation, click on Choose, then select filters -> unsupervised -> attribute -> Copy.
AFTER
4.REPLACE MISSING VALUES: Replaces all missing values for nominal and numeric attributes in a
dataset with the modes and means from the training data.
To apply the ReplaceMissingValues operation, click on Choose, then select filters -> unsupervised ->
attribute -> ReplaceMissingValues.
AFTER
5.REPLACE MISSING WITH USER CONSTANT: Replaces all missing values for nominal and numeric attributes in
a dataset with user-supplied constant values.
AFTER
6.STANDARDIZE: Standardizes all numeric attributes in the given dataset to have zero mean and unit variance.
AFTER
7.NORMALIZATION: Normalizes all numeric values in the given dataset (apart from the class attribute, if set).
AFTER
Click on Edit to see the changes.
STATISTICS AND ITS VALUES:
OBSERVATION
Relation: weather
No. of attributes = 5
No. of missing values = 0
List the attribute names:
1.outlook 2.temperature 3.humidity 4.windy 5.play
It is a balanced dataset.
SUBMITTED BY:22A81A0654