MODULE 1
DATA MINING
Introduction to Data Mining:
Data mining is the process of extracting useful information from large sets of data. It involves using various
techniques from statistics, machine learning, and database systems to identify patterns, relationships, and trends in
the data. This information can then be used to make data-driven decisions, solve business problems, and uncover
hidden insights. Applications of data mining include customer profiling and segmentation, market basket analysis,
anomaly detection, and predictive modeling. Data mining tools and technologies are widely used in various
industries, including finance, healthcare, retail, and telecommunications.
In general terms, “Mining” is the process of extraction of some valuable material from the earth e.g. coal mining,
diamond mining, etc.
It is basically the process carried out for the extraction of useful information from a bulk of data or data
warehouses. One can see that the term itself is a little confusing. In the case of coal or diamond mining, the result of
the extraction process is coal or diamond. But in the case of Data Mining, the result of the extraction process is not
data!! Instead, data mining results are the patterns and knowledge that we gain at the end of the extraction process.
In that sense, we can think of Data Mining as a step in the process of Knowledge Discovery or Knowledge
Extraction.
Data Mining Definitions:
The actual data mining task is the semi-automatic or automatic analysis of large quantities of data to extract previously
unknown, interesting patterns such as groups of data records (cluster analysis), unusual records (anomaly detection),
and dependencies (association rule mining, sequential pattern mining). This usually involves using database techniques
such as spatial indices. These patterns can then be seen as a kind of summary of the input data, and may be used in
further analysis or, for example, in machine learning and predictive analytics. For example, the data mining step might
identify multiple groups in the data, which can then be used to obtain more accurate prediction results by a decision
support system. Neither the data collection, data preparation, nor result interpretation and reporting is part of the data
mining step, although they do belong to the overall KDD process as additional steps.
The difference between data analysis and data mining is that data analysis is used to test models and hypotheses on the
dataset, e.g., analyzing the effectiveness of a marketing campaign, regardless of the amount of data. In contrast, data
mining uses machine learning and statistical models to uncover clandestine or hidden patterns in a large volume of
data.[8]
The related terms data dredging, data fishing, and data snooping refer to the use of data mining methods to sample
parts of a larger population data set that are (or may be) too small for reliable statistical inferences to be made about the
validity of any patterns discovered. These methods can, however, be used in creating new hypotheses to test against the
larger data populations.
KDD: Knowledge Discovery in Databases
KDD (Knowledge Discovery in Databases) is a process that involves the extraction of useful, previously unknown,
and potentially valuable information from large datasets. KDD is an iterative process, and extracting accurate
knowledge from the data usually requires multiple passes through the following steps:
Data Cleaning:
Data cleaning is defined as the removal of noisy and irrelevant data from the collection. It includes:
Handling missing values.
Smoothing noisy data, where noise is a random or variance error.
Using data discrepancy detection and data transformation tools.
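As a small sketch of two of these cleaning steps (the values below are invented for illustration), missing entries can be filled with the attribute mean, and noisy values can be smoothed by replacing each value with the mean of its bin:

```python
def fill_missing(values):
    """Replace None entries with the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]

def smooth_by_bin_means(values, bin_size):
    """Sort the values, partition them into equal-size bins, and replace
    each value with its bin's mean (a classic noise-smoothing step)."""
    ordered = sorted(values)
    smoothed = []
    for i in range(0, len(ordered), bin_size):
        bin_vals = ordered[i:i + bin_size]
        mean = sum(bin_vals) / len(bin_vals)
        smoothed.extend([mean] * len(bin_vals))
    return smoothed

ages = [23, None, 25, 27, None, 24]
print(fill_missing(ages))   # the two None entries become the mean, 24.75
print(smooth_by_bin_means([4, 8, 9, 15, 21, 21, 24, 25, 26], 3))
```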
Data Integration:
Data integration is defined as the combination of heterogeneous data from multiple sources into a common store (a
data warehouse). It is carried out using data migration tools, data synchronization tools, and the ETL (Extract,
Transform, Load) process.
Data Selection:
Data selection is defined as the process where the data relevant to the analysis is decided upon and retrieved from
the data collection. Techniques such as neural networks, decision trees, naive Bayes, clustering,
and regression can be applied here.
Data Transformation:
Data transformation is defined as the process of transforming the data into the appropriate form required by the
mining procedure. Data transformation is a two-step process:
1. Data Mapping: Assigning elements from source base to destination to capture transformations.
2. Code generation: Creation of the actual transformation program.
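For example, one common transformation applied at this stage is min-max normalization, which rescales an attribute to a target range such as [0, 1] before mining (the income values below are invented for illustration):

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly from [min, max] to [new_min, new_max]."""
    old_min, old_max = min(values), max(values)
    scale = (new_max - new_min) / (old_max - old_min)
    return [(v - old_min) * scale + new_min for v in values]

incomes = [12000, 73600, 98000, 54000]
print(min_max_normalize(incomes))  # smallest maps to 0.0, largest to 1.0
```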
Data Mining:
Data mining is defined as the application of techniques to extract potentially useful patterns. It transforms task-relevant
data into patterns and decides the purpose of the model, using classification or characterization.
Pattern Evaluation:
Pattern evaluation is defined as the identification of interesting patterns representing knowledge, based on given
interestingness measures. It finds an interestingness score for each pattern, and uses summarization and visualization
to make the data understandable to the user.
Knowledge Representation:
This involves presenting the results in a way that is meaningful and can be used to make decisions.
Differences between KDD and Data Mining:
Definition: KDD refers to a process of identifying valid, novel, potentially useful, and ultimately understandable
patterns and relationships in data. Data mining refers to a process of extracting useful and valuable information or
patterns from large data sets.
Objective: KDD aims to find useful knowledge from data; data mining aims to extract useful information from data.
Techniques used: KDD comprises data cleaning, data integration, data selection, data transformation, data mining,
pattern evaluation, and knowledge representation and visualization. Data mining uses association rules,
classification, clustering, regression, decision trees, neural networks, and dimensionality reduction.
Output: KDD produces structured information, such as rules and models, that can be used to make decisions or
predictions. Data mining produces patterns, associations, or insights that can be used to improve decision-making
or understanding.
Focus: KDD focuses on the discovery of useful knowledge, rather than simply finding patterns in data. Data mining
focuses on the discovery of patterns or relationships in data.
Role of domain expertise: Domain expertise is important in KDD, as it helps in defining the goals of the process,
choosing appropriate data, and interpreting the results. It is less critical in data mining, as the algorithms are
designed to identify patterns without relying on prior knowledge.
Differences between DBMS and Data Mining:
Focus: A DBMS (Database Management System) focuses on storing, organizing, and managing data; data mining
focuses on analyzing data to extract patterns and relationships.
Technique: A DBMS creates, modifies, and queries databases; data mining identifies patterns and interesting
relationships in the data.
Application: DBMSs are applied to relational databases, transactional databases, and data warehousing; data mining
is applied to business decision making, data analysis, and pattern recognition.
Tools: DBMS tools include MySQL, Oracle, SQL Server, and SQL; data mining tools include data mining
algorithms and machine learning techniques.
Process: A DBMS covers database design, data entry, data retrieval, and data manipulation; data mining covers data
cleaning, data preprocessing, data analysis, and data visualization.
Data Mining Techniques:
1. Association:
Association analysis is the finding of association rules showing attribute-value conditions that occur frequently
together in a given set of data. Association analysis is widely used for a market basket or transaction data analysis.
Association rule mining is a significant and exceptionally active area of data mining research. One method of
association-based classification, called associative classification, consists of two steps. In the first step, association
rules are generated using a modified version of the standard association rule mining algorithm known as
Apriori. The second step constructs a classifier based on the association rules discovered.
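A compact sketch of the Apriori idea on hypothetical basket data (item names invented for the example) might look like the following. It relies on the key Apriori property: every subset of a frequent itemset must itself be frequent, so candidate (k+1)-itemsets are grown only from frequent k-itemsets:

```python
def apriori(transactions, min_support):
    """Return every itemset whose support meets min_support, with its support."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / len(transactions)

    frequent = {}
    current = [frozenset([i]) for i in items]  # candidate 1-itemsets
    k = 1
    while current:
        survivors = [c for c in current if support(c) >= min_support]
        frequent.update({c: support(c) for c in survivors})
        # grow candidate (k+1)-itemsets only from frequent k-itemsets
        current = list({a | b for a in survivors for b in survivors
                        if len(a | b) == k + 1})
        k += 1
    return frequent

baskets = [{"milk", "bread"}, {"milk", "bread", "butter"},
           {"bread", "butter"}, {"milk", "butter"}]
for itemset, s in sorted(apriori(baskets, 0.5).items(), key=lambda x: -x[1]):
    print(set(itemset), round(s, 2))
```

With a minimum support of 0.5, the three single items and the three item pairs survive, while {milk, bread, butter} (support 0.25) is pruned.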
2. Classification
Classification is the process of finding a set of models (or functions) that describe and distinguish data classes or
concepts, for the purpose of using the model to predict the class of objects whose class label is unknown.
The derived model is based on the analysis of a set of training data (i.e., data objects whose
class label is known). The derived model may be represented in various forms, such as classification (if-then) rules,
decision trees, and neural networks. Data mining has different types of classifiers:
Decision Tree
SVM(Support Vector Machine)
Generalized Linear Models
Bayesian classification:
Classification by Back propagation
K-NN Classifier
Rule-Based Classification
Frequent-Pattern Based Classification
Rough set theory
Fuzzy Logic
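As a small illustration of one classifier from the list above (the 2-D points and labels are made up for the example), the K-NN classifier assigns a new object the majority class label of its k nearest training objects:

```python
from collections import Counter

def knn_classify(train, point, k=3):
    """train: list of ((x, y), label) pairs; point: (x, y) to classify."""
    dist = lambda p, q: (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2  # squared Euclidean
    nearest = sorted(train, key=lambda item: dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]  # majority class among the k neighbours

train = [((1, 1), "A"), ((2, 1), "A"), ((1, 2), "A"),
         ((8, 8), "B"), ((9, 8), "B"), ((8, 9), "B")]
print(knn_classify(train, (2, 2)))  # nearest neighbours are all "A"
print(knn_classify(train, (8, 7)))  # nearest neighbours are all "B"
```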
3. Prediction
Data prediction is a two-step process, similar to that of data classification. However, for prediction we do not use
the term "class label attribute" because the attribute whose values are being predicted is continuous-
valued (ordered) rather than categorical (discrete-valued and unordered). The attribute can be referred to simply as
the predicted attribute. Prediction can be viewed as the construction and use of a model to assess the class of an
unlabeled object, or to assess the value or value ranges of an attribute that a given object is likely to have.
4. Clustering:
Unlike classification and prediction, which analyze class-labeled data objects or attributes, clustering analyzes data
objects without consulting an identified class label. In general, the class labels do not exist in the training data
simply because they are not known to begin with. Clustering can be used to generate these labels. The objects are
clustered based on the principle of maximizing the intra-class similarity and minimizing the inter-class similarity.
That is, clusters of objects are created so that objects inside a cluster have high similarity to one another, but are
very dissimilar to objects in other clusters. Each cluster that is generated can be seen as a class of objects, from
which rules can be inferred. Clustering can also facilitate taxonomy formation, that is, the organization of
observations into a hierarchy of classes that group similar events together.
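A minimal sketch of this principle (toy one-dimensional data, invented for the example) is the classic k-means procedure: assign each object to its nearest centroid, recompute each centroid as the mean of its cluster, and repeat:

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Cluster 1-D points into k groups; returns the sorted centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k random points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)  # maximize intra-cluster similarity
        # recompute each centroid as its cluster's mean
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.8]
print(kmeans(data, 2))  # two centroids, near 1.0 and 9.1
```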
5. Regression:
Regression can be defined as a statistical modeling method in which previously obtained data is used to predict a
continuous quantity for new observations. This classifier is also known as the continuous value classifier. There are
two types of regression models: simple linear regression and multiple linear regression.
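For example, simple linear regression fits a line y = a + b*x to previously obtained data by least squares and then predicts a continuous value for a new observation (the numbers below are invented for illustration):

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (intercept a, slope b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x
    return a, b

xs = [1, 2, 3, 4, 5]        # e.g. years of experience (made-up data)
ys = [30, 35, 40, 45, 50]   # e.g. salary in thousands
a, b = fit_line(xs, ys)
print(a, b)                 # intercept 25.0, slope 5.0
predict = lambda x: a + b * x
print(predict(6))           # predicted continuous value: 55.0
```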
6. Artificial Neural network (ANN) Classifier Method:
An artificial neural network (ANN), also referred to as simply a "neural network" (NN), is a computational model
inspired by biological neural networks. It consists of an interconnected collection of artificial neurons. A neural
network is a set of connected input/output units where each connection has a weight associated with it. During the
learning phase, the network learns by adjusting the weights so as to be able to predict the correct class label of the
input samples.
The advantages of neural networks include their high tolerance to noisy data as well as their ability to
classify patterns on which they have not been trained. In addition, several algorithms have recently been developed
for the extraction of rules from trained neural networks. These factors contribute to the usefulness of neural networks
for classification in data mining.
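As an illustrative sketch (the training data and learning rate are invented for the example), a single artificial neuron, a perceptron, can learn the logical AND function by adjusting its connection weights after every misclassified sample:

```python
def train_perceptron(samples, epochs=20, rate=0.1):
    """Train one neuron: two weighted inputs plus a bias, step activation."""
    w = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            output = 1 if w[0] * x1 + w[1] * x2 + bias > 0 else 0
            error = target - output
            # adjust each connection weight in proportion to the error
            w[0] += rate * error * x1
            w[1] += rate * error * x2
            bias += rate * error
    return w, bias

and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, bias = train_perceptron(and_samples)
classify = lambda x1, x2: 1 if w[0] * x1 + w[1] * x2 + bias > 0 else 0
print([classify(x1, x2) for (x1, x2), _ in and_samples])  # [0, 0, 0, 1]
```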
7. Outlier Detection:
A database may contain data objects that do not comply with the general behavior or model of the data. These data
objects are outliers. The investigation of outlier data is known as outlier mining. An outlier may be
detected using statistical tests, which assume a distribution or probability model for the data, or using distance
measures, where objects having only a small fraction of "close" neighbors in space are considered outliers. Rather
than using statistical or distance measures, deviation-based techniques identify outliers by examining
differences in the main attributes of objects in a group.
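A minimal statistical sketch of the first approach (the sensor readings are invented for the example): flag values whose z-score, i.e. distance from the mean measured in standard deviations, exceeds a threshold:

```python
import statistics

def find_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations
    from the mean (assumes a roughly normal distribution)."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    return [v for v in values if abs(v - mean) / stdev > threshold]

readings = [10, 12, 11, 13, 12, 11, 10, 95]
print(find_outliers(readings))  # [95]
```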
8. Genetic Algorithm:
Genetic algorithms are adaptive heuristic search algorithms that belong to the larger class of evolutionary algorithms.
Genetic algorithms are based on the ideas of natural selection and genetics. They are an intelligent exploitation of
random search, guided by historical data, to direct the search into regions of better performance in the solution
space. They are commonly used to generate high-quality solutions for optimization and search problems.
Genetic algorithms simulate the process of natural selection, in which those individuals that can adapt to changes in
their environment are able to survive, reproduce, and pass on to the next generation. In simple words, they simulate
"survival of the fittest" among individuals of consecutive generations for solving a problem. Each generation consists
of a population of individuals, and each individual represents a point in the search space and a possible solution. Each
individual is represented as a string of characters/integers/floats/bits; this string is analogous to a chromosome.
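As a toy illustration of these ideas (all parameters are invented for the example), the sketch below evolves bit-string chromosomes toward the all-ones string ("OneMax"), using tournament selection, one-point crossover, and bit-flip mutation:

```python
import random

def one_max_ga(length=20, pop_size=30, generations=60, seed=1):
    """Evolve bit strings toward all ones; returns the fittest individual."""
    rng = random.Random(seed)
    fitness = sum  # fitness = number of 1 bits in the chromosome
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        def select():
            # tournament selection: the fitter of two random individuals
            return max(rng.sample(pop, 2), key=fitness)
        next_pop = []
        while len(next_pop) < pop_size:
            a, b = select(), select()
            cut = rng.randrange(1, length)      # one-point crossover
            child = a[:cut] + b[cut:]
            for i in range(length):             # bit-flip mutation
                if rng.random() < 0.02:
                    child[i] = 1 - child[i]
            next_pop.append(child)
        pop = next_pop                          # next generation
    return max(pop, key=fitness)

best = one_max_ga()
print(best, sum(best))
```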
Problems and Challenges in Data Mining:
1] Data Quality:
The quality of data used in data mining is one of the most significant challenges. The accuracy, completeness, and
consistency of the data affect the accuracy of the results obtained. The data may contain errors, omissions,
duplications, or inconsistencies, which may lead to inaccurate results. Moreover, the data may be incomplete, meaning
that some attributes or values are missing, making it challenging to obtain a complete understanding of the data.
Data quality issues can arise due to a variety of reasons, including data entry errors, data storage issues, data integration
problems, and data transmission errors. To address these challenges, data mining practitioners must apply data cleaning
and data preprocessing techniques to improve the quality of the data. Data cleaning involves detecting and correcting
errors, while data preprocessing involves transforming the data to make it suitable for data mining.
2] Data Complexity:
Data complexity refers to the vast amounts of data generated by various sources, such as sensors, social media, and the
internet of things (IOT). The complexity of the data may make it challenging to process, analyze, and understand. In
addition, the data may be in different formats, making it challenging to integrate into a single dataset.
To address this challenge, data mining practitioners use advanced techniques such as clustering, classification, and
association rule mining. These techniques help to identify patterns and relationships in the data, which can then be used
to gain insights and make predictions.
3] Data Privacy and Security:
Data privacy and security is another significant challenge in data mining. As more data is collected, stored, and
analyzed, the risk of data breaches and cyber-attacks increases. The data may contain personal, sensitive, or
confidential information that must be protected. Moreover, data privacy regulations such as GDPR, CCPA, and HIPAA
impose strict rules on how data can be collected, used, and shared.
To address this challenge, data mining practitioners must apply data anonymization and encryption techniques to
protect the privacy and security of the data. Data anonymization involves removing personally identifiable
information (PII) from the data, while data encryption involves using algorithms to encode the data so that it is
unreadable to unauthorized users.
4] Scalability:
Data mining algorithms must be scalable to handle large datasets efficiently. As the size of the dataset increases, the
time and computational resources required to perform data mining operations also increase. Moreover, the
algorithms must be able to handle streaming data, which is generated continuously and must be processed in real-
time.
To address this challenge, data mining practitioners use distributed computing frameworks such as Hadoop and
Spark. These frameworks distribute the data and processing across multiple nodes, making it possible to process
large datasets quickly and efficiently.
5] Interpretability:
Data mining algorithms can produce complex models that are difficult to interpret. This is because the algorithms
use a combination of statistical and mathematical techniques to identify patterns and relationships in the data.
Moreover, the models may not be intuitive, making it challenging to understand how the model arrived at a
particular conclusion.
To address this challenge, data mining practitioners use visualization techniques to represent the data and the models
visually. Visualization makes it easier to understand the patterns and relationships in the data and to identify the
most important variables.
6] Ethics:
Data mining raises ethical concerns related to the collection, use, and dissemination of data. The data may be used to
discriminate against certain groups, violate privacy rights, or perpetuate existing biases. Moreover, data mining
algorithms may not be transparent, making it challenging to detect biases or discrimination.
Data Mining Applications
There are many measurable benefits that have been achieved in different application areas from data mining. So,
let’s discuss different applications of Data Mining:
Scientific Analysis: Scientific simulations generate bulks of data every day. This includes data collected from
nuclear laboratories, data about human psychology, etc. Data mining techniques are capable of analyzing this
data. We can now capture and store new data faster than we can analyze the data already accumulated.
Example of scientific analysis:
Sequence analysis in bioinformatics
Classification of astronomical objects
Medical decision support.
Intrusion Detection: A network intrusion refers to any unauthorized activity on a digital network. Network
intrusions often involve stealing valuable network resources. Data mining techniques play a vital role in intrusion
detection, searching for network attacks and anomalies. These techniques help in selecting and refining useful and
relevant information from large data sets, and help classify relevant data for an Intrusion Detection System. An
Intrusion Detection System generates alarms about foreign invasions in the network traffic.
For example:
Detect security violations
Misuse Detection
Anomaly Detection
Business Transactions: Every transaction in the business industry is memorized for perpetuity. Such transactions are
usually time-related and can be inter-business deals or intra-business operations. The effective and timely use of this
data for competitive decision-making is definitely one of the most important problems for businesses that struggle
to survive in a highly competitive world. Data mining helps to analyze these business transactions, identify
marketing approaches, and support decision-making. Examples:
Direct mail targeting
Stock trading
Customer segmentation
Churn prediction (Churn prediction is one of the most popular Big Data use cases in business)
Market Basket Analysis: Market basket analysis is a technique that involves the careful study of the purchases made
by a customer in a supermarket. It identifies patterns of items frequently purchased together. This analysis can help
companies to promote deals, offers, and sales, and data mining techniques help to achieve this analysis task.
Examples:
Data mining concepts are used in sales and marketing to provide better customer service, improve cross-
selling opportunities, and increase direct mail response rates.
Customer retention, in the form of pattern identification and prediction of likely defections, is possible with data
mining.
Risk assessment and fraud detection also use data mining concepts to identify inappropriate or unusual
behavior.
Education: For analyzing the education sector, data mining uses the Educational Data Mining (EDM) method. This
method generates patterns that can be used by both learners and educators. Using EDM, we can
perform educational tasks such as:
Predicting students' admission in higher education
Profiling students
Predicting student performance
Evaluating teachers' teaching performance
Curriculum development
Predicting student placement opportunities
Research: Data mining techniques can perform prediction, classification, clustering, association, and grouping of
data with precision in the research area. Rules generated by data mining are distinctive for finding results. In most
technical research in data mining, we create a training model and a testing model. The train/test approach is a
strategy to measure the accuracy of the proposed model: we split the data set into two
sets, a training data set and a testing data set. The training data set is used to build the model, whereas the testing
data set is used to evaluate it. Examples:
Classification of uncertain data.
Information-based clustering.
Decision support system
Web Mining
Domain-driven data mining
IOT (Internet of Things)and Cyber security
Smart farming IOT(Internet of Things)
Healthcare and Insurance: A pharmaceutical company can examine its recent sales force activity and its outcomes to
improve the targeting of high-value physicians and figure out which marketing activities will have the best effect in
the upcoming months. In the insurance sector, data mining can help to predict which customers will
buy new policies, identify behavior patterns of risky customers, and identify fraudulent behavior.
Claims analysis, i.e., which medical procedures are claimed together.
Identifying successful medical therapies for different illnesses.
Characterizing patient behavior to predict office visits.
Transportation: A diversified transportation company with a large direct sales force can apply data mining to
identify the best prospects for its services. A large consumer goods organization can apply data mining
to improve its sales process to retailers.
Determine the distribution schedules among outlets.
Analyze loading patterns.
Financial/Banking Sector: A credit card company can leverage its vast warehouse of customer transaction data to
identify customers most likely to be interested in a new credit product.
Credit card fraud detection.
Identify ‘Loyal’ customers.
Extraction of information related to customers.
Determine credit card spending by customer groups.