Unit 4 New Database Applications and Environments: by Bhupendra Singh Saud
Data Mining
Data mining refers to the mining or discovery of new information, in terms of patterns or rules, from
vast amounts of data. It is also defined as the process of finding interesting structure in data. Data
mining employs one or more techniques such as machine learning, statistics,
neural networks, and genetic algorithms to automatically analyze and extract knowledge from data.
To be practically useful, data mining must be carried out efficiently on large files and databases.
The process of discovering meaningful patterns and trends, often previously unknown, by
sifting through large amounts of data using pattern recognition, statistical and mathematical
techniques is called data mining. It is also defined as a group of techniques that find relationships
that have not previously been discovered.
Data mining is a logical process that is used to search through large amounts of
data in order to find useful information. The goal of this technique is to find patterns
that were previously unknown. Once these patterns are found, they can be used to make
decisions that help develop the business.
The three steps involved are:
1. Exploration
2. Pattern identification
3. Deployment
Exploration: In the first step, data exploration, the data is cleaned and transformed into
another form, and the important variables and the nature of the data are determined
based on the problem.
Pattern identification: Once the data has been explored, refined and defined for the
specific variables, the second step is pattern identification: identify and choose the
patterns that make the best prediction.
Deployment: The chosen patterns are deployed to achieve the desired outcome.
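The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not a real mining algorithm: the purchase records, the function names (explore, identify_patterns, deploy) and the min_count threshold are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical raw purchase records: (customer, item). Some rows are dirty.
raw = [
    ("c1", " Bread"), ("c1", "butter"), ("c2", "bread"),
    ("c2", "BUTTER"), ("c3", "bread"), ("c3", None), ("c4", "milk"),
]

def explore(records):
    """Exploration: drop incomplete rows and normalize item names."""
    return [(c, i.strip().lower()) for c, i in records if i]

def identify_patterns(records, min_count=2):
    """Pattern identification: keep items bought by at least min_count customers."""
    counts = Counter(item for _, item in set(records))
    return {item: n for item, n in counts.items() if n >= min_count}

def deploy(patterns):
    """Deployment: turn the patterns into a decision, here a promotion list."""
    return sorted(patterns, key=lambda item: (-patterns[item], item))

clean = explore(raw)
patterns = identify_patterns(clean)
promote = deploy(patterns)
print(promote)
```

Here the "pattern" is simply an item bought by several customers; a real project would substitute a proper model in the second step.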
Retail Industry
Data mining has great application in the retail industry because the industry collects large
amounts of data on sales, customer purchasing history, goods transportation, consumption and
services. The quantity of data collected will naturally continue to expand rapidly because of the
increasing ease, availability and popularity of the web.
Data mining in the retail industry helps in identifying customer buying patterns and trends that lead
to improved quality of customer service and good customer retention and satisfaction. Here is a list
of examples of data mining in the retail industry −
Design and construction of data warehouses based on the benefits of data mining.
Multidimensional analysis of sales, customers, products, time and region.
Analysis of effectiveness of sales campaigns.
Customer Retention.
Product recommendation and cross-referencing of items.
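As an illustration of the last item, product recommendation can be approached by counting which items are bought together. The sketch below is a minimal example under stated assumptions: the baskets are made up, and the helper names (confidence, recommend) and the 0.5 threshold are hypothetical choices, not part of any particular retail system.

```python
from itertools import combinations
from collections import Counter

# Hypothetical transactions; each set is one customer's basket.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
]

# How often each item and each (sorted) pair of items appears.
item_counts = Counter(item for b in baskets for item in b)
pair_counts = Counter(
    pair for b in baskets for pair in combinations(sorted(b), 2)
)

def confidence(antecedent, consequent):
    """Fraction of baskets containing `antecedent` that also contain `consequent`."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]

def recommend(item, min_confidence=0.5):
    """Items that co-occur with `item` in at least min_confidence of its baskets."""
    others = (i for i in item_counts if i != item)
    return sorted(i for i in others if confidence(item, i) >= min_confidence)

print(recommend("bread"))
print(recommend("milk"))
```

The confidence measure used here is the one commonly used in association rule mining; on large data sets the pair counting would normally be done by a dedicated algorithm rather than by enumerating every pair.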
Telecommunication Industry
Today the telecommunication industry is one of the fastest-emerging industries, providing various
services such as fax, pager, cellular phone, internet messenger, images, e-mail, web data
transmission, etc. Due to the development of new computer and communication technologies, the
telecommunication industry is rapidly expanding. This is why data mining has become
very important in helping to understand the business.
Data mining in the telecommunication industry helps in identifying telecommunication patterns,
catching fraudulent activities, making better use of resources, and improving quality of service.
Here is a list of examples for which data mining improves telecommunication services −
Multidimensional analysis of telecommunication data.
Fraudulent pattern analysis.
Identification of unusual patterns.
Multidimensional association and sequential patterns analysis.
Mobile Telecommunication services.
Use of visualization tools in telecommunication data analysis.
Intrusion Detection
Intrusion refers to any kind of action that threatens the integrity, confidentiality, or availability
of network resources. In this world of connectivity, security has become a major issue. The
increased usage of the internet and the availability of tools and tricks for intruding into and
attacking networks have prompted intrusion detection to become a critical component of network
administration.
Here is a list of areas in which data mining technology may be applied for intrusion detection −
Development of data mining algorithms for intrusion detection.
Association and correlation analysis, and aggregation, to help select and build discriminating
attributes.
Analysis of stream data.
Distributed data mining.
Visualization and query tools.
Web mining
The discovery and analysis of useful patterns and information from the World Wide Web,
or simply the web, is called web mining. Web mining is the application of data mining
techniques to extract interesting and potentially useful knowledge from web data,
including web documents, hyperlinks between documents, usage logs of web sites, etc.
Businesses might turn to Web mining to help them understand customer behavior,
evaluate the effectiveness of a particular Web site, or quantify the success of a
marketing campaign. For instance, marketers use Google Trends and Google Insights
for Search services, which track the popularity of various words and phrases used in
Google search queries, to learn what people are interested in and what they are
interested in buying.
Data Warehouse
A data warehouse is a repository of multiple heterogeneous data sources organized
under a unified schema at a single site to facilitate management decision making. A data
warehouse is a subject-oriented, integrated, time-variant and nonvolatile collection of
data in support of management’s decision-making process.
a. Subject-Oriented: A data warehouse can be used to analyze a particular subject
area. For example, "sales" can be a particular subject.
b. Integrated: A data warehouse integrates data from multiple data sources. For
example, source A and source B may have different ways of identifying a product,
but in a data warehouse, there will be only a single way of identifying a product.
c. Time-Variant: Historical data is kept in a data warehouse. For example, one can
retrieve data from 3 months, 6 months or 12 months ago, or even older data, from a
data warehouse. This contrasts with a transaction system, where often only the most
recent data is kept. For example, a transaction system may hold only the most recent
address of a customer, whereas a data warehouse can hold all addresses associated
with a customer.
d. Non-volatile: Once data is in the data warehouse, it will not change. So, historical
data in a data warehouse should never be altered.
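The integrated, time-variant and non-volatile properties can be sketched with a small append-only store. This is a toy illustration only: the class names (AddressRow, AddressHistory), the KEY_MAP lookup and all sample values are hypothetical, and the integration example uses customer identifiers rather than product identifiers.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)  # frozen: a loaded row cannot be modified (non-volatile)
class AddressRow:
    customer_id: str
    address: str
    valid_from: date

class AddressHistory:
    """Append-only store: new facts are added, old rows are never changed."""

    def __init__(self):
        self._rows = []

    def load(self, customer_id, address, valid_from):
        self._rows.append(AddressRow(customer_id, address, valid_from))

    def all_addresses(self, customer_id):
        """Time-variant: every address ever recorded, oldest first."""
        rows = [r for r in self._rows if r.customer_id == customer_id]
        return [r.address for r in sorted(rows, key=lambda r: r.valid_from)]

    def current_address(self, customer_id):
        """What a transaction system would typically keep: the latest value only."""
        return self.all_addresses(customer_id)[-1]

# Integrated: two hypothetical sources identify the same customer differently;
# the mapping converts both identifiers to one warehouse key.
KEY_MAP = {("A", "CUST-7"): "C007", ("B", "7"): "C007"}

wh = AddressHistory()
wh.load(KEY_MAP[("A", "CUST-7")], "12 Hill Road", date(2015, 1, 1))
wh.load(KEY_MAP[("B", "7")], "4 Lake Street", date(2016, 6, 1))

print(wh.all_addresses("C007"))
print(wh.current_address("C007"))
```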
A data warehouse is a repository of current and historical data of an organization that
are organized to facilitate reporting and analysis. The data originate in many core
operational transaction systems, such as systems for sales, customer accounts, and
manufacturing, and may include data from Web site transactions. The data warehouse
consolidates and standardizes information from different operational databases so that
the information can be used across the enterprise for management analysis and decision
making. The figure below illustrates how a data warehouse works. The data warehouse
makes the data available for anyone to access as needed, but the data cannot be altered. A data
warehouse system also provides a range of ad hoc and standardized query tools,
analytical tools, and graphical reporting facilities. Many firms use intranet portals to
make the data warehouse information widely available throughout the firm.
Figure: Data warehouse architecture. Data sources (operational data, customer data,
manufacturing data, historical data, external data, flat files) pass through ETL
(extraction, transformation, loading) into the data warehouse (metadata, summary data,
raw data), which supports OLAP analysis, reporting and data mining.
Features of Data warehouse
It is separate from the operational database.
Integrates data from heterogeneous systems.
Stores a huge amount of data, with more historical data than current data.
Does not require data to be highly accurate.
Queries are generally complex.
Provides an integrated and total view of the enterprise.
Makes the enterprise's current and historical information easily available for decision making.
Makes decision-support transactions possible without hindering operational systems.
Renders the organization's information consistent.
Presents a flexible and interactive source of strategic information.
Meta Data
Metadata is data about data, or documentation about the data that is needed by the
users. Another description of metadata is that it is structured data which describes the
characteristics of a resource. Several examples of metadata are:
1. The table of contents and the index in a book may be considered metadata for the
book.
2. A library catalogue may be considered metadata. The catalogue metadata consists
of a number of predefined elements representing specific attributes of a resource,
and each element can have one or more values.
3. Another example of metadata is data about the tables and figures in a document.
A table has a name and there are column names of the table that may be considered
metadata. The figures also have titles or names.
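The third example, a table's name and column names as metadata, can be seen directly in a relational database. The sketch below uses Python's built-in sqlite3 module; the sales table and its single row are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, region TEXT, amount REAL)")
conn.execute("INSERT INTO sales VALUES ('TV', 'ER', 30.0)")

# Data: the rows stored in the table.
rows = conn.execute("SELECT * FROM sales").fetchall()

# Metadata: the table name, read from SQLite's catalog table sqlite_master.
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# Metadata: column names and declared types, from PRAGMA table_info.
# Each result row is (cid, name, type, notnull, dflt_value, pk).
columns = [(r[1], r[2]) for r in conn.execute("PRAGMA table_info(sales)")]

print(rows)
print(tables)
print(columns)
```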
Data Marts
A data mart is a database that contains a subset of the data present in a data warehouse.
Data marts are created to structure the data in a data warehouse according to
issues such as hardware platforms and access control strategies. We can divide a data
warehouse into data marts after the data warehouse has been created. The
implementation cycle of the data mart is likely to be measured in weeks rather than
months or years.
Companies often build enterprise-wide data warehouses, where a central data
warehouse serves the entire organization, or they create smaller, decentralized
warehouses called data marts. A data mart is a subset of a data warehouse in which a
summarized or highly focused portion of the organization’s data is placed in a separate
database for a specific population of users. For example, a company might develop
marketing and sales data marts to deal with customer information. A data mart typically
focuses on a single subject area or line of business, so it usually can be constructed more
rapidly and at lower cost than an enterprise-wide data warehouse.
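A data mart as a summarized, focused subset of a warehouse can be sketched with Python's built-in sqlite3 module. The table names (warehouse_facts, sales_mart), the columns and the figures are hypothetical; a real mart would usually live in its own database rather than beside the warehouse table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_facts (dept TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_facts VALUES (?, ?, ?)",
    [("sales", "TV", 30.0), ("sales", "Laptop", 15.0),
     ("manufacturing", "TV", 7.0), ("sales", "TV", 20.0)],
)

# The mart holds only the sales subject area, summarized per product.
conn.execute("""
    CREATE TABLE sales_mart AS
    SELECT product, SUM(amount) AS total_amount
    FROM warehouse_facts
    WHERE dept = 'sales'
    GROUP BY product
""")

mart = conn.execute(
    "SELECT product, total_amount FROM sales_mart ORDER BY product").fetchall()
print(mart)
```

Users of the sales mart query a small, pre-summarized table instead of the full warehouse, which is one reason response time improves.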
Figure: Data sources, data warehouse and data marts.
Reasons for creating a data mart
Creates a collective view for a group of users
Easy access to frequently needed data
Ease of creation
Improves end-user response time
Lower cost than implementing a full data warehouse
Potential users are more clearly defined than in a full data warehouse
Contains only business essential data and is less cluttered
Figure: Data cube with three dimensions: Location (CR, NR, WR, ER), Product (Computer, TV,
Mobile, Camera, Laptop) and Time.
Based on Star Schema, Snowflake Schema and Fact Constellation Schema.
Based on Entity Relationship Model.