
Data Mining


INTRODUCTION

Data mining is about processing data and identifying patterns and trends in that
information so that you can make decisions. Data mining principles have been around for
many years, but with the advent of big data they have become even more prevalent.
Big data caused an explosion in the use of more extensive data mining approaches, partly
because the volume of information is much larger and partly because the information tends
to be more diverse and wide-ranging in its nature and content. With large data sets, it is
no longer enough to get relatively simple, clear-cut statistics out of the system. With 30
or 40 million records of detailed customer information, knowing that two million of them
live in one place is not enough. You want to know whether those two million belong to a
particular age group, and what their average earnings are, so that you can target your
customers' needs better.
These business-driven requirements turned simple data retrieval and statistics into more
complex data mining. The business problem drives an examination of the data, which helps to
build a model that describes the information and eventually leads to the creation of the
resulting report.

Data analysis is a procedure that often follows strict rules, which can be applied
repeatedly to recognize the different kinds of data that can be retrieved. It is also very
important to be able to map, relate, cluster and associate that data with other data to
reach a particular outcome.

Data mining is not tied to the particular software or hardware in use; it can
also be performed with relatively simple software. Even so, the benefits of
more complex data mining algorithms are increasingly appreciated.

It is only recently that very large data sets, together with cluster and
large-scale data processing, have allowed data mining to collate and report on
more complicated groups and correlations of data. An entirely new range of tools
and systems is now available, including combined data storage and processing systems.
A flow chart explaining the process of data mining is displayed.

Key Techniques and Examples

There are a number of different techniques that can be used in data mining, each
describing a different type of mining and a different operation used to retrieve data.

The following techniques and examples explain the building blocks of data mining:

ASSOCIATION

Association is a well-known, well-understood and probably the most widely
used data mining technique. A relationship between different items or
different types of data is observed and identified in order to build a
pattern. For example, in a sports shop, a person buying a bat is also
likely to buy a ball. Once this pattern has been found in the data, the bat
and the ball can be associated with each other in order to anticipate
future demand.
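
The bat-and-ball pattern can be sketched in a few lines of Python. The baskets below are made-up illustration data, and "support" and "confidence" are the usual measures for association rules rather than terms used in this document; treat this as a minimal sketch, not as the method of any particular tool.

```python
# Hypothetical shopping baskets from a sports shop (illustrative data only).
transactions = [
    {"bat", "ball"},
    {"bat", "ball", "gloves"},
    {"bat", "helmet"},
    {"ball"},
    {"bat", "ball", "helmet"},
]


def rule_stats(baskets, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    total = len(baskets)
    with_antecedent = sum(1 for b in baskets if antecedent in b)
    with_both = sum(1 for b in baskets if antecedent in b and consequent in b)
    # Support: share of all baskets that contain both items.
    support = with_both / total
    # Confidence: share of baskets with the antecedent that also hold the consequent.
    confidence = with_both / with_antecedent if with_antecedent else 0.0
    return support, confidence


support, confidence = rule_stats(transactions, "bat", "ball")
print(f"bat -> ball: support={support:.2f}, confidence={confidence:.2f}")
```

A high confidence for "bat -> ball" is the kind of evidence that would justify stocking or promoting the two items together.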

This technique of association can be applied with the help of different tools.
For example, InfoSphere Warehouse is a tool that can be used for
association.
The following is an example from the sample database:

Classification
Classification builds an idea of the different types of items or data by
applying a number of constraints that define the classes under which
data can be organised efficiently. For example, cars are classified into
types such as SUVs and sedans. A new car can then be slotted into one of
these categories by comparing its attributes against the constraints.
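
A minimal sketch of the car example follows, assuming a rule-based classifier. The attributes (`seats`, `ground_clearance_mm`) and the threshold values are hypothetical, chosen only to show how constraints map an item to a class.

```python
from dataclasses import dataclass


@dataclass
class Car:
    name: str
    seats: int
    ground_clearance_mm: int


def classify(car):
    """Slot a car into a class by checking it against simple constraints.

    The thresholds are illustrative assumptions, not industry definitions.
    """
    if car.ground_clearance_mm >= 180 and car.seats >= 5:
        return "suv"
    if car.seats >= 4:
        return "sedan"
    return "other"


for car in [Car("A", 5, 200), Car("B", 5, 140), Car("C", 2, 120)]:
    print(car.name, "->", classify(car))
```

In practice the constraints are usually learned from labelled examples rather than written by hand, but the principle of comparing attributes against class constraints is the same.
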
Clustering

By examining one or more attributes or classes, data can be grouped together so that the
groups can be identified and examined. Commonly, clustering compares two or more
attributes of the data in order to observe how they correlate. Clustering is very
important for working through large amounts of data to find the similarities within it.

The graph displayed shows a good example. In this example, the size of a sale is
compared with the age of the customer.
From the data shown in the graph, the regions where the most points cluster
together indicate which age groups are most likely to purchase a certain
amount of products.
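
The age-versus-sale-size comparison can be sketched with a small k-means routine. K-means is one common clustering algorithm, chosen here for illustration; the document does not name a specific method. The data points and the starting centroids are hypothetical.

```python
import math

# Hypothetical (customer age, size of sale) observations.
points = [(22, 30), (25, 35), (24, 28), (48, 90), (52, 95), (50, 88)]


def kmeans(points, centroids, iterations=10):
    """Group points around the given starting centroids (plain k-means)."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
for centre, members in zip(centroids, clusters):
    print(centre, members)
```

Each resulting centroid summarises one group: a typical age together with a typical sale size for the customers in that cluster.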

Decision trees
Related to most of the other techniques (primarily classification and prediction), the
decision tree can be used either as a part of the selection criteria, or to support the
use and selection of specific data within the overall structure. Within the decision
tree, you start with a simple question that has two (or sometimes more) answers.
Each answer leads to a further question to help classify or identify the data so that it
can be categorized, or so that a prediction can be made based on each answer.
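
The question-and-answer structure described above can be sketched as a small hand-built tree. The questions, field names (`age`, `avg_sale`) and category labels are hypothetical; real decision trees are usually learned from data rather than written by hand.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Node:
    question: str                  # human-readable question asked at this node
    test: Callable[[dict], bool]   # returns True for the "yes" branch
    yes: Union["Node", str]        # next question, or a final category
    no: Union["Node", str]


def decide(node, record):
    """Follow the answers from the root until a final category is reached."""
    while isinstance(node, Node):
        node = node.yes if node.test(record) else node.no
    return node


# Illustrative tree: each answer leads to a further question or to a category.
tree = Node(
    question="Is the customer under 30?",
    test=lambda r: r["age"] < 30,
    yes=Node(
        question="Is the average sale above 50?",
        test=lambda r: r["avg_sale"] > 50,
        yes="young, high spender",
        no="young, low spender",
    ),
    no="older customer",
)

print(decide(tree, {"age": 25, "avg_sale": 80}))
```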
