
SHRI SAI SHIKSHAN SANSTHA’S

NAGPUR INSTITUTE OF TECHNOLOGY, NAGPUR

DEPARTMENT OF INFORMATION TECHNOLOGY


Session 2024-2025
Laboratory Manual

Seventh Semester
Subject Name: Data Warehouse & Mining
Practical Code: BEIT701P
COURSE OUTCOME

On successful completion of the course, students will be able to:


1. Perform pre-processing on datasets.
2. Perform classification on datasets.
3. Perform normalization and discretization on datasets.

List of Practicals

Sr. No.   Practical Code   Title of Experiment
1         PR01             To create a .csv file from Excel and read it as an ARFF file in Weka.
2         PR02             Perform preprocessing on the bank dataset by copying attributes in the dataset.
3         PR03             Study discretization on the iris dataset.
4         PR04             Study normalization on the iris dataset.
5         PR05             Perform nominal-to-binary conversion on the weather dataset.
6         PR06             Apply the Remove filter on the weather dataset.
7         PR07             Generate a decision tree using the J48 algorithm.
8         PR08             Perform association on the contact lenses dataset using the Apriori algorithm.
9         PR09             Perform classification on the labor dataset using a decision tree.
10        PR10             Perform classification of the supermarket dataset using the Naive Bayesian classifier.
11        PR11             Demonstrate standardization on the weather dataset.
12        PR12             Perform classification on the weather dataset using the ZeroR rule.
13        PR13             Perform classification on the weather dataset using the OneR rule.
14        PR14             Perform k-means clustering on the iris dataset.
15        PR15             Use multiple ROC curves for model evaluation.
Practical No. 01
AIM: To create a .csv file from Excel and read it as an ARFF file in Weka.
THEORY:

Attribute Relation File Format (ARFF) is an ASCII text file that describes a list of instances
sharing a set of attributes. ARFF files were developed by the Machine Learning Project at the Department
of Computer Science of the University of Waikato for use with the Weka machine learning software. An ARFF
file can be created from an Excel file by saving the spreadsheet in comma-separated values (CSV) format.

CSV is a simple file format used to store tabular data, such as a spreadsheet or database. Files in
the CSV format can be imported to and exported from programs that store data in tables, such as Microsoft
Excel or OpenOffice Calc.
CSV stands for "comma-separated values". Its data fields are most often separated, or delimited,
by a comma. For example, let's say you had a spreadsheet containing the following data.

Name              Class   Dorm             Room   GPA
Sally Whittaker   2018    McCarren House   312    3.75
Belinda Jameson   2017    Cushing House    148    3.52
Jeff Smith        2018    Prescott House   17-D   3.20
Sandy Allen       2019    Oliver House     108    3.48
The above data could be represented in a CSV-formatted file as follows:

Sally Whittaker,2018,McCarren House,312,3.75

Belinda Jameson,2017,Cushing House,148,3.52

Jeff Smith,2018,Prescott House,17-D,3.20

Sandy Allen,2019,Oliver House,108,3.48
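
For comparison, the same data in ARFF format carries an explicit header that declares every attribute and its type before the data rows. The sketch below shows what such a file could look like; the attribute types shown are assumptions about what Weka's CSV loader would infer (Room must be a string because of values like 17-D):

@relation student_dataset

@attribute Name string
@attribute Class numeric
@attribute Dorm string
@attribute Room string
@attribute GPA numeric

@data
'Sally Whittaker',2018,'McCarren House',312,3.75
'Belinda Jameson',2017,'Cushing House',148,3.52
'Jeff Smith',2018,'Prescott House','17-D',3.20
'Sandy Allen',2019,'Oliver House',108,3.48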

Procedure:
Step 1: Create the dataset in an Excel file. Name the file student_dataset.

Step 2: Save the dataset in CSV format.

Step 3: In the Weka Explorer, click the "Open file..." button and select the .csv file created in the previous step.
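
The conversion can also be scripted with the Weka Java API instead of the GUI. The following is a minimal sketch, assuming weka.jar is on the classpath and the file from Step 2 is named student_dataset.csv:

import java.io.File;
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.CSVLoader;

public class CsvToArff {
    public static void main(String[] args) throws Exception {
        // Read the CSV file created in Step 2.
        CSVLoader loader = new CSVLoader();
        loader.setSource(new File("student_dataset.csv"));
        Instances data = loader.getDataSet();

        // Write the same instances back out in ARFF format.
        ArffSaver saver = new ArffSaver();
        saver.setInstances(data);
        saver.setFile(new File("student_dataset.arff"));
        saver.writeBatch();
    }
}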

Output:

Practical No. 02
AIM: Perform preprocessing on the bank dataset by copying attributes in the dataset.

THEORY:

In data preprocessing, we can copy one or more attributes if required; some applications
need duplicated attributes in the dataset. This exercise illustrates some of the basic data preprocessing
operations that can be performed using WEKA. The sample dataset used for this example is the "bank data"
available in comma-separated format (bank-data.csv).

The data contains the following fields:

1. id - a unique identification number
2. age - age of the customer in years (numeric)
3. sex - MALE / FEMALE
4. region - inner_city / rural / suburban / town
5. income - income of the customer (numeric)
6. married - is the customer married? (YES/NO)
7. children - number of children (numeric)
8. car - does the customer own a car? (YES/NO)
9. save_act - does the customer have a savings account? (YES/NO)
10. current_act - does the customer have a current account? (YES/NO)
11. mortgage - does the customer have a mortgage? (YES/NO)
12. pep - did the customer buy a PEP (Personal Equity Plan) after the last mailing? (YES/NO)

Loading the data: In addition to the native ARFF data file format, WEKA can read files in
".csv" format. This is fortunate, since many databases and spreadsheet applications can save or export
data into flat files in this format. As can be seen in the sample data file, the first row contains the attribute
names (separated by commas), followed by each data row with attribute values listed in the same order (also
separated by commas). Once loaded into WEKA, the dataset can be saved in ARFF format. In
this example, we load the dataset into WEKA and perform a series of operations using WEKA's preprocessing
filters. While all of these operations can be performed from the command line, we use the WEKA Explorer
GUI. Initially (in the Preprocess tab), click "Open file..." and navigate to the directory containing the
data file (.csv or .arff).
Procedure:

Step 1: Load the bank dataset in the Weka Explorer.

Step 2: Choose the Copy filter in the filters panel.

Step 3: Choose the index of the attribute you want to copy, then apply the filter.
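
The same preprocessing step can be scripted with the Weka Java API. The following is a minimal sketch, assuming weka.jar is on the classpath and the bank-data.csv file from this practical is in the working directory (the attribute indices are an example choice):

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Copy;

public class CopyAttributes {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("bank-data.csv");

        // Copy attributes 2 and 3 (age and sex); the copies are appended
        // to the end of the attribute list as "Copy of ...".
        Copy filter = new Copy();
        filter.setAttributeIndices("2,3");
        filter.setInputFormat(data);
        Instances copied = Filter.useFilter(data, filter);

        System.out.println(copied.toSummaryString());
    }
}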

Output:

Practical No. 03
AIM: Study discretization on the iris dataset.
THEORY:

Data discretization techniques can be used to reduce the number of values for a given continuous
attribute by dividing the range of the attribute into intervals. Interval labels can then be used to replace
actual data values [5]. This leads to a concise, easy-to-use, knowledge-level representation of mining results.
Data discretization can be performed before or while doing data mining. Most real datasets contain
continuous attributes, and some machine learning algorithms that can handle both continuous and discrete
attributes perform better with discrete-valued attributes. Discretization involves:

 Dividing the range of a continuous attribute into intervals.
 Satisfying classification algorithms that only accept categorical attributes.
 Reducing data size.
 Preparing the data for further analysis.

Discretization techniques are often used by classification algorithms. Unsupervised discretization
algorithms do not use class information when dividing continuous ranges into sub-ranges [8].
Discretization has several advantages, some of which are given below:

 Discretization reduces the number of continuous feature values, which places smaller demands on system storage.
 Discretization can make learning more accurate and faster.
 A number of classification learning algorithms can only deal with discrete data.
 Data can be reduced and simplified through discretization.
 For both users and experts, discrete features are easier to understand, use, and explain.

Procedure:

Step 1: Load the iris dataset in the Weka Explorer.

Step 2: Select Discretize from the filters panel.

Step 3: Select the attribute indices on which discretization is to be performed, fill in the number of
bins required in the bins field, and apply the filter.
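
Equivalently, discretization can be scripted against the Weka Java API. A minimal sketch, assuming weka.jar is on the classpath and the iris.arff sample shipped in Weka's data directory:

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Discretize;

public class DiscretizeIris {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/iris.arff");

        // Equal-width binning into 3 bins over all numeric attributes;
        // the nominal class attribute is left untouched.
        Discretize filter = new Discretize();
        filter.setBins(3);
        filter.setAttributeIndices("first-last");
        filter.setInputFormat(data);
        Instances discretized = Filter.useFilter(data, filter);

        System.out.println(discretized.toSummaryString());
    }
}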

Step 4: Output
Practical No. 04
AIM: Study normalization on the iris dataset.

THEORY:
In creating a database, normalization is the process of organizing it into tables in such a way that the
results of using the database are always unambiguous and as intended. Normalization may have the effect of
duplicating data within the database and often results in the creation of additional tables. (While
normalization tends to increase the duplication of data, it does not introduce redundancy, which is
unnecessary duplication.) Normalization is typically a refinement process after the initial exercise of
identifying the data objects that should be in the database, identifying their relationships, and defining the
tables required and the columns within each table.

Procedure:

Step 1: Load the iris dataset in the Weka Explorer.

Step 2: Select Normalize from the filters panel.

Step 3: Select the scale for normalization, along with other fields, and apply the filter.
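
The scale and translation fields of the filter have direct counterparts in the Weka Java API. A minimal sketch, assuming weka.jar is on the classpath and Weka's bundled data/iris.arff:

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Normalize;

public class NormalizeIris {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/iris.arff");

        // The default output range is [0, 1]; scale 2.0 with translation -1.0
        // rescales every numeric attribute into [-1, 1] instead.
        Normalize filter = new Normalize();
        filter.setScale(2.0);
        filter.setTranslation(-1.0);
        filter.setInputFormat(data);
        Instances normalized = Filter.useFilter(data, filter);

        System.out.println(normalized.toSummaryString());
    }
}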
Step 4: Output
Practical No. 05
AIM: Perform nominal-to-binary conversion on the weather dataset.
THEORY:

 Normalization is a scaling technique, a mapping technique, or a pre-processing stage [1] in which we
compute a new range from an existing one. It can be very helpful for prediction and forecasting
purposes [2].
 There are many ways to predict or forecast, but their outputs can vary widely. To contain this large
variation in prediction and forecasting, a normalization technique is required to bring the values
closer together.
 Existing normalization techniques include Min-Max, Z-score, and Decimal scaling; beyond these, an
Integer Scaling technique has also been proposed, derived from AMZD (Advanced on Min-Max Z-score
Decimal scaling).
 Data normalization is also a process in which data attributes within a data model are organized to
increase the cohesion of entity types. In other words, the goal of data normalization is to reduce and
even eliminate data redundancy, an important consideration for application developers, because it is
incredibly difficult to store objects in a relational database that maintains the same information in
several places.

In Weka, the NominalToBinary filter used in this practical converts nominal attributes into binary numeric
attributes, creating one indicator attribute per value for attributes with more than two values.

1. First Normal Form (1NF)

Let's consider an example. An entity type is in first normal form (1NF) when it contains no repeating
groups of data. For example, in Figure 1 you see that there are several repeating attributes in the
Order0NF table: the ordered-item information repeats nine times and the contact information is
repeated twice, once for shipping and once for billing. Although this initial version
of orders could work, what happens when an order has more than nine order items? Do you create
additional order records for them? What about the vast majority of orders that have only one or two items?
Do we really want to waste all that storage space in the database on empty fields? Likely not.
Furthermore, do you want to write the code required to process the nine copies of item information, even if
only to marshal it back and forth between the appropriate number of objects? Once again, likely not.

2. Second Normal Form (2NF)

Although the solution presented in Figure 2 is improved over that of Figure 1, it can be normalized
further. Figure 3 presents the data schema of Figure 2 in second normal form (2NF). An entity type is in
second normal form (2NF) when it is in 1NF and every non-key attribute (any attribute that is not part
of the primary key) is fully dependent on the primary key. This was definitely not the case with
the OrderItem1NF table, therefore we need to introduce the new table Item2NF. The problem
with OrderItem1NF is that item information, such as the name and price of an item, does not depend on an
order for that item. For example, if Hal Jordan orders three widgets and Oliver Queen orders five widgets,
the facts that the item is called a "widget" and that the unit price is $19.95 remain constant. This information
depends on the concept of an item, not on the concept of an order for an item, and therefore should not be
stored in the order items table; this is why the Item2NF table was introduced. OrderItem2NF retains a
calculated value: the number of items ordered multiplied by the price of the item. The value of
the SubtotalBeforeTax column within the Order2NF table is the total of the extended prices of all of its
order items.
Procedure:

Step 1: Load the weather dataset in the Weka Explorer.

Step 2: Save the file with a .csv extension.

Step 3: Open the file in the WEKA Explorer.

Step 4: Choose the NominalToBinary filter.

Step 5: Indicate, by index, the attribute you want to convert to binary.

Step 6 (Output): After selection, click the Apply button.
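
The same conversion can be scripted with the Weka Java API. A minimal sketch, assuming weka.jar is on the classpath and the weather.nominal.arff sample from Weka's data directory (with the class attribute set, Weka leaves it unconverted):

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.NominalToBinary;

public class WeatherToBinary {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/weather.nominal.arff");
        data.setClassIndex(data.numAttributes() - 1); // class attribute: play

        // Convert every remaining nominal attribute (outlook, windy, ...)
        // into binary indicator attributes.
        NominalToBinary filter = new NominalToBinary();
        filter.setAttributeIndices("first-last");
        filter.setInputFormat(data);
        Instances binary = Filter.useFilter(data, filter);

        System.out.println(binary.toSummaryString());
    }
}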


Practical No. 06
AIM: Apply the Remove filter on the weather dataset.
THEORY: The Remove filter deletes a range of attributes from the dataset. It will re-order the remaining
attributes if the invert matching sense is turned on and the attribute column indices are not specified in
ascending order.

Procedure:

Step 1: Load the weather dataset in the Weka Explorer.

Step 2: Save the file with a .csv extension.

Step 3: Open the file in the WEKA Explorer.

Step 4: Choose the Remove filter.

Step 5: Indicate, by index, the attribute you want to remove.

Step 6: After selection, click the Apply button and the selected attribute will be removed.
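
The filter can also be applied from the Weka Java API. A minimal sketch, assuming weka.jar is on the classpath and Weka's bundled weather.nominal.arff:

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class RemoveAttribute {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/weather.nominal.arff");

        // Remove the first attribute (outlook). With setInvertSelection(true)
        // the filter would instead keep only the listed attributes.
        Remove filter = new Remove();
        filter.setAttributeIndices("1");
        filter.setInputFormat(data);
        Instances reduced = Filter.useFilter(data, filter);

        System.out.println(reduced.toSummaryString());
    }
}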
Practical No. 07
AIM: Generate a decision tree using the J48 algorithm.
THEORY:

 A decision tree is a predictive machine-learning model that decides the target value (dependent
variable) of a new sample based on various attribute values of the available data.
 The internal nodes of a decision tree denote the different attributes, the branches between the nodes
tell us the possible values that these attributes can have in the observed samples, while the terminal
nodes tell us the final value (classification) of the dependent variable.
 The attribute that is to be predicted is known as the dependent variable, since its value depends
upon, or is decided by, the values of all the other attributes. The other attributes, which help in
predicting the value of the dependent variable, are known as the independent variables in the dataset.

Modified J48 Decision Tree Algorithm


 The 16-bit representation of the device MAC address is presented in the Current Active Directory
List. The modified J48 decision tree algorithm examines the normalized information gain that results
from choosing an attribute for splitting the data.
 To make the decision, the attribute with the highest normalized information gain is used. Then the
algorithm recurses on the smaller subsets. The splitting procedure stops if all instances in a subset
belong to the same class.
 Then a leaf node is created in the decision tree telling us to choose that class. In this case, the modified
J48 decision tree algorithm creates a decision node higher up in the tree using the expected value of
the class.
 If the generated LSB value in the CADL and the incoming protocol device MAC address are the same, then
the device is authenticated; otherwise the device is flagged as an intruder.

Disadvantages of the J48 algorithm:


The run-time complexity of the algorithm corresponds to the tree depth, which cannot be greater than the
number of attributes. Tree depth is linked to tree size, and thereby to the number of examples, so the size of
C4.5 trees increases linearly with the number of examples. C4.5 rules are slow for large and noisy datasets.
Space complexity is very large, as values have to be stored repeatedly in arrays.
Fig: Create weather dataset and save with extension .csv.

Fig: Explore the excel file in WEKA explorer.


Fig: Use J48 tree classifier.

Fig: Classifier Output.


Fig: Select visualize tree option.

Fig: Classifier decision tree view.
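
The same tree can be built programmatically with the Weka Java API. A minimal sketch, assuming weka.jar is on the classpath and Weka's bundled weather.nominal.arff:

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class J48Weather {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/weather.nominal.arff");
        data.setClassIndex(data.numAttributes() - 1); // predict 'play'

        // Train on the full dataset and print the tree in text form.
        J48 tree = new J48();
        tree.buildClassifier(data);
        System.out.println(tree);

        // 10-fold cross-validation for an accuracy estimate.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new J48(), data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}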

Result: Hence, we have generated a decision tree using the J48 algorithm.


Practical No. 08
AIM: Perform association on the contact lenses dataset using the Apriori algorithm.

THEORY:

The Apriori algorithm was proposed by Agrawal and Srikant in 1994. Apriori is designed to operate
on databases containing transactions (for example, collections of items bought by customers, or details of
website visits). Other algorithms are designed for finding association rules in data having no
transactions, or having no timestamps (DNA sequencing). Each transaction is seen as a set of items
(an itemset). Given a threshold C, the Apriori algorithm identifies the item sets which are subsets of at
least C transactions in the database. Apriori uses a "bottom-up" approach, where frequent subsets are extended
one item at a time (a step known as candidate generation), and groups of candidates are tested against the
data. The algorithm terminates when no further successful extensions are found.

Apriori uses breadth-first search and a hash tree structure to count candidate item sets efficiently. It
generates candidate item sets of length k from item sets of length k-1, then prunes the candidates which have
an infrequent sub-pattern. The pseudocode for the algorithm is usually given for a transaction database T and a
support threshold epsilon. Usual set-theoretic notation is employed, though note that T is a multiset. C_k
denotes the candidate set for level k. At each step, the algorithm is assumed to generate the candidate sets
from the large item sets of the preceding level, heeding the downward closure lemma. count[c] accesses a field
of the data structure that represents candidate set c, which is initially assumed to be zero. Many details are
omitted here; usually the most important part of the implementation is the data structure used for storing the
candidate sets and counting their frequencies.
Procedure:

Step 1: Open the contact lenses dataset in WEKA.

Step 2: Go to the Associate tab for the contact lenses dataset.

Step 3: Apply the Apriori algorithm to the dataset.

Step 4: Change the number of rules.

Step 5: The number of rules reported by Apriori changes accordingly.
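
The association step can also be run from the Weka Java API. A minimal sketch, assuming weka.jar is on the classpath and the contact-lenses.arff sample from Weka's data directory:

import weka.associations.Apriori;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class AprioriContactLenses {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/contact-lenses.arff");

        // The counterpart of Step 4: request 20 rules instead of the default 10,
        // with a minimum support of 20% of the transactions.
        Apriori apriori = new Apriori();
        apriori.setNumRules(20);
        apriori.setLowerBoundMinSupport(0.2);
        apriori.buildAssociations(data);

        System.out.println(apriori); // prints the discovered rules
    }
}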

Result: Hence, we have performed association on the contact lenses dataset using the Apriori algorithm.
Practical No. 09
AIM: Perform classification on the labor dataset using a decision tree.

THEORY:
Decision trees are classic supervised learning algorithms, easy to understand and easy to use.
In this practical we describe the basic mechanism behind decision trees and see the
algorithm in action using Weka (Waikato Environment for Knowledge Analysis).

The main concept behind decision tree learning is the following: starting from the training data,
we build a predictive model which is mapped to a tree structure. The goal is to achieve
perfect classification with a minimal number of decisions, although this is not always possible due to noise
or inconsistencies in the data.

Step 1: Open the labor dataset in WEKA.

Step 2: Apply a decision tree classifier to the dataset.

Step 3: Right-click the result entry to visualize classifier errors.
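
A scripted version of the same experiment, assuming weka.jar is on the classpath and the labor.arff sample from Weka's data directory; the confusion matrix printed at the end is the textual counterpart of visualizing classifier errors:

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class LaborTree {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/labor.arff");
        data.setClassIndex(data.numAttributes() - 1); // class: good / bad

        // Cross-validate a J48 decision tree on the labor data.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new J48(), data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
        System.out.println(eval.toMatrixString());
    }
}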


Practical No. 10
AIM: Perform classification on the supermarket dataset using the Naive Bayes algorithm.

THEORY:
It is a classification technique based on Bayes Theorem with an assumption of
independence among predictors. In simple terms, a Naive Bayes classifier assumes that the
presence of a particular feature in a class is unrelated to the presence of any other feature. For
example, a fruit may be considered to be an apple if it is red, round, and about 3 inches in
diameter. Even if these features depend on each other or upon the existence of the other
features, all of these properties independently contribute to the probability that this fruit is an
apple and that is why it is known as ‘Naive’.

Naive Bayes model is easy to build and particularly useful for very large data sets. Along with
simplicity, Naive Bayes is known to outperform even highly sophisticated classification methods.

Step 1: Open the supermarket dataset in WEKA.

Step 2: Apply the Naive Bayes classifier to the dataset.

Step 3: Right-click the result entry to visualize classifier errors.
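
The same classification can be scripted against the Weka Java API. A minimal sketch, assuming weka.jar is on the classpath and the supermarket.arff sample from Weka's data directory:

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class SupermarketNaiveBayes {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/supermarket.arff");
        data.setClassIndex(data.numAttributes() - 1); // class attribute: total

        // 10-fold cross-validation of a Naive Bayes classifier.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new NaiveBayes(), data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}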

Practical No. 11
AIM: Demonstrate standardization on the weather dataset.
THEORY:
In standardization, all numeric attributes in the given dataset are transformed to have zero mean and unit
variance (apart from the class attribute, if set).

Procedure:

Step 1: Load the weather dataset in the Weka Explorer.

Step 2: Select Standardize from the filters panel.

Step 3: Apply the filter (Standardize takes no scale parameter; it always produces zero mean and unit variance).
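
A scripted version of the filter, assuming weka.jar is on the classpath and the numeric weather sample (weather.numeric.arff) from Weka's data directory:

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Standardize;

public class StandardizeWeather {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/weather.numeric.arff");

        // Every numeric attribute (temperature, humidity) is transformed
        // to zero mean and unit variance; nominal attributes are untouched.
        Standardize filter = new Standardize();
        filter.setInputFormat(data);
        Instances standardized = Filter.useFilter(data, filter);

        System.out.println(standardized.toSummaryString());
    }
}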
Step 4: Output
Practical No. 12
AIM: Perform classification on the weather dataset using the ZeroR rule.

THEORY:
ZeroR rule:

ZeroR is a class for building and using a 0-R classifier: it predicts the mean (for a numeric class) or the
mode (for a nominal class).

ZeroR is the simplest classification method; it relies solely on the frequency of the class values.

Procedure:

Step 1: Load the weather dataset in the Weka Explorer.

Step 2: Classify the weather dataset using the ZeroR rule.
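
The same baseline can be computed from the Weka Java API. A minimal sketch, assuming weka.jar is on the classpath and Weka's bundled weather.nominal.arff:

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.rules.ZeroR;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ZeroRWeather {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/weather.nominal.arff");
        data.setClassIndex(data.numAttributes() - 1); // class attribute: play

        // ZeroR simply predicts the most frequent class value.
        ZeroR rule = new ZeroR();
        rule.buildClassifier(data);
        System.out.println(rule);

        // Cross-validated accuracy of the baseline.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new ZeroR(), data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}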


Output:
Practical No. 13
AIM: Perform classification on the weather dataset using the OneR rule.

THEORY:
OneR: learns a one-level decision tree, i.e. generates a set of rules that test one particular attribute. Basic
version (assuming nominal attributes):
• One branch for each of the attribute’s values
• Each branch assigns most frequent class
• Error rate: proportion of instances that don't belong to the majority class of their corresponding
branch
• Choose attribute with lowest error rate
Procedure:

Step 1: Load the weather dataset in the Weka Explorer.

Step 2: Classify the weather dataset using the OneR rule.
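
A scripted version of the same step, assuming weka.jar is on the classpath and Weka's bundled weather.nominal.arff:

import weka.classifiers.rules.OneR;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class OneRWeather {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/weather.nominal.arff");
        data.setClassIndex(data.numAttributes() - 1); // class attribute: play

        // OneR picks the single attribute whose one-level rules give the
        // lowest training error and prints those rules.
        OneR rule = new OneR();
        rule.buildClassifier(data);
        System.out.println(rule);
    }
}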

Output: Visualize the threshold curve.


Practical No. 14
AIM: Perform k-means clustering on the iris dataset.

THEORY:

K-means is an unsupervised clustering algorithm that partitions the instances into k clusters. Each instance
is assigned to the cluster with the nearest centroid, the centroids are then recomputed as the mean of their
assigned instances, and these two steps repeat until the assignments no longer change. Weka implements the
algorithm in its SimpleKMeans clusterer.

Procedure:

Step 1: Load the iris dataset in the Weka Explorer.

Step 2: Apply k-means clustering to the iris dataset.
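
The clustering can also be run from the Weka Java API. A minimal sketch, assuming weka.jar is on the classpath and Weka's bundled iris.arff:

import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class KMeansIris {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/iris.arff");

        // Clustering is unsupervised, so drop the class attribute first.
        Remove remove = new Remove();
        remove.setAttributeIndices("last");
        remove.setInputFormat(data);
        Instances noClass = Filter.useFilter(data, remove);

        SimpleKMeans kmeans = new SimpleKMeans();
        kmeans.setNumClusters(3); // iris has three species
        kmeans.setSeed(10);
        kmeans.buildClusterer(noClass);

        System.out.println(kmeans); // centroids and cluster sizes
    }
}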


Output: Visualize the cluster assignments.
Practical No. 15
AIM: Use multiple ROC curves for model evaluation.

THEORY:

A ROC (Receiver Operating Characteristic) curve plots the true positive rate against the false positive rate
across all classification thresholds. Drawing the ROC curves of several models on one chart lets their
performance be compared visually: the larger the area under the curve (AUC), the better the model separates
the classes.

Procedure:

Step 1: Click on Knowledge Flow.

Step 2: Select an ArffLoader.

Step 3: Select a ClassAssigner.

Step 4: Apply Naive Bayes.

Step 5: Add a Model Performance Chart.

Step 6: Connect the components and run the flow.

Step 7: Output.
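
The Knowledge Flow chart draws the ROC curves themselves; the area under such a curve (AUC) can also be computed programmatically. A minimal sketch, assuming weka.jar is on the classpath, a recent Weka version (which retains cross-validation predictions by default), and the two-class labor.arff sample:

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.evaluation.ThresholdCurve;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class RocAuc {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/labor.arff");
        data.setClassIndex(data.numAttributes() - 1);

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new NaiveBayes(), data, 10, new Random(1));

        // Build the ROC curve for the first class value and report its AUC;
        // repeating this for several classifiers gives the data behind a
        // multiple-ROC comparison chart.
        ThresholdCurve tc = new ThresholdCurve();
        Instances curve = tc.getCurve(eval.predictions(), 0);
        System.out.println("AUC: " + ThresholdCurve.getROCArea(curve));
    }
}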
