What is Classification in Data Mining?
Classification in data mining is a common technique that separates data points
into different classes. It allows you to organize data sets of all sorts, including
complex and large datasets as well as small and simple ones.
It primarily involves using algorithms that you can easily modify to improve the
data quality. This is a big reason why supervised learning is particularly common
among classification techniques in data mining. The primary goal of
classification is to connect a variable of interest (the target) with the required
(predictor) variables. The variable of interest should be of a qualitative
(categorical) type.
Types of Classification Techniques in Data Mining
Before we discuss the various classification algorithms in data mining, let's first
look at the type of classification techniques available.
Primarily, we can divide the classification algorithms into two categories:
1. Generative
2. Discriminative
Here's a brief explanation of these two categories:
Generative
A generative classification algorithm models the distribution of individual
classes. It tries to learn the model that generates the data by estimating the
distributions and assumptions of that model. You can use generative algorithms to
predict unseen data. A prominent generative algorithm is the Naive Bayes
Classifier.
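As a rough illustration of the generative idea, the sketch below estimates per-class distributions from a few categorical rows and picks the class with the highest posterior for an unseen row. It is a minimal Naive Bayes written from scratch, not a library implementation. The toy weather data, the function names and the add-one (Laplace) smoothing are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Estimate class counts and per-feature value counts from categorical data."""
    class_counts = Counter(labels)
    # value_counts[label][feature_index][value] = number of occurrences
    value_counts = defaultdict(lambda: defaultdict(Counter))
    feature_values = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[label][i][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def predict_naive_bayes(model, row):
    """Return the class with the highest log posterior for one unseen row."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior P(class)
        for i, value in enumerate(row):
            # Add-one smoothing (an assumption here) avoids zero probabilities.
            numerator = value_counts[label][i][value] + 1
            denominator = count + len(feature_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: (outlook, windy) -> play?
rows = [("sunny", "no"), ("sunny", "yes"), ("rainy", "yes"),
        ("rainy", "no"), ("overcast", "no"), ("overcast", "yes")]
labels = ["yes", "no", "no", "yes", "yes", "yes"]

model = train_naive_bayes(rows, labels)
prediction = predict_naive_bayes(model, ("sunny", "no"))
print(prediction)
```

The model stores only counts, which is what "modelling the distribution of each class" amounts to for categorical features.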
Discriminative
A discriminative algorithm is a rudimentary classification algorithm that
determines a class for a row of data. It builds its model directly from the
observed data and depends on the data quality rather than on the data's
distributions.
Logistic regression is a prominent example of a discriminative classifier.
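To show what a discriminative classifier can look like, here is a small sketch of logistic regression trained with batch gradient descent. It learns a decision boundary directly from labelled rows and never models how each class is distributed. The toy data (hours studied versus pass/fail), the learning rate and the epoch count are assumptions chosen for illustration only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Estimated probability that row x belongs to class 1."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic_regression(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(xs[0])
    n = len(xs)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(xs, ys):
            error = predict_proba(weights, bias, x) - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Hypothetical toy data: hours studied -> failed (0) or passed (1)
xs = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
ys = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic_regression(xs, ys)
low = predict_proba(weights, bias, [1.0])
high = predict_proba(weights, bias, [6.0])
print(low < 0.5, high > 0.5)
```

Unlike the generative approach, nothing here estimates P(features | class). The weights only describe where the boundary between the classes lies.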
Explain various types of Web mining in detail. [10 m]
Web mining is the process of applying data mining techniques to automatically
discover and extract information from Web documents and services. The main
purpose of web mining is discovering useful information from the World Wide Web
and its usage patterns.
Web mining is the application of data mining techniques to discover patterns
from the World Wide Web. It uses automated methods to extract both structured
and unstructured data from web pages, server logs and link structures.
There are three main sub-categories of web mining.
Web content mining extracts information from within a page. Web structure
mining discovers the structure of the hyperlinks between documents,
categorizing sets of web pages and measuring the similarity and relationship
between different sites. Web usage mining finds patterns of usage of web pages.
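The first two sub-categories can be illustrated with one small sketch: pulling the visible text out of a page (content) and collecting its outgoing hyperlinks (structure). It uses Python's standard html.parser module. The PageMiner class name and the sample page are hypothetical and exist only for this example.

```python
from html.parser import HTMLParser


class PageMiner(HTMLParser):
    """Collect visible text (content) and outgoing links (structure)."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Each <a href="..."> is an edge from this page to another page.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.text_parts.append(stripped)


# Hypothetical page used only for illustration.
html = """
<html><body>
  <h1>Data Mining Notes</h1>
  <p>See <a href="/classification">classification</a> and
     <a href="/web-mining">web mining</a>.</p>
</body></html>
"""

miner = PageMiner()
miner.feed(html)
miner.close()

page_text = " ".join(miner.text_parts)
print(page_text)
print(miner.links)
```

Running the same parser over many pages would give the text corpus for content mining and the edge list of the web graph for structure mining.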
Applications of Web Mining:
1. Web mining helps to improve the power of web search engines by classifying
the web documents and identifying the web pages.
2. It is used for web searching, e.g. Google, Yahoo, etc., and vertical searching,
e.g. FatLens, Become, etc.
3. Web mining is used to predict user behaviour.
4. Web mining is very useful for a particular website and e-service, e.g. landing
page optimization.
Web mining can be broadly divided into three different types of mining
techniques:
1. Web Content Mining: Web content mining is the process of
extracting useful information from the content of web documents. Web
content consists of several types of data: text, image, audio, video, etc.
Content data is the group of facts that a web page is designed to convey. It can
provide effective and interesting patterns about user needs. Mining text
documents draws on text mining, machine learning and natural language
processing, so this type of mining is also known as text mining. It
scans and mines the text, images and groups of web pages
according to the content of the input.
2. Web Structure Mining: Web structure mining is the process of
discovering structure information from the web. The structure of the web
graph consists of web pages as nodes and hyperlinks as edges connecting
related pages. Structure mining basically shows the structured summary of
a particular website. It identifies relationships between web pages linked by
information or direct link connection. Web structure mining can be very useful,
for example, to determine the connection between two commercial websites.
3. Web Usage Mining: Web usage mining is the process of identifying
or discovering interesting usage patterns from large data sets. These
patterns enable you to understand user behaviour.
In web usage mining, users access data on the web, and data about that access
is collected in the form of logs. So, web usage mining is also called log mining.
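The log-mining idea behind web usage mining can be sketched in a few lines. The example counts page hits and page-to-page transitions per visitor. The log lines use a simplified "ip timestamp path" layout that is an assumption for this sketch; real server logs have more fields and would need a proper parser.

```python
from collections import Counter, defaultdict

# Hypothetical access-log lines in a simplified "ip timestamp path" layout.
log_lines = [
    "10.0.0.1 2024-01-01T10:00:00 /home",
    "10.0.0.1 2024-01-01T10:01:00 /products",
    "10.0.0.2 2024-01-01T10:02:00 /home",
    "10.0.0.1 2024-01-01T10:03:00 /checkout",
    "10.0.0.2 2024-01-01T10:04:00 /products",
]

page_hits = Counter()
paths_by_user = defaultdict(list)
for line in log_lines:
    ip, _timestamp, path = line.split()
    page_hits[path] += 1
    paths_by_user[ip].append(path)  # lines are assumed to be in time order

# Count how often visitors move from one page directly to another.
transitions = Counter()
for paths in paths_by_user.values():
    for src, dst in zip(paths, paths[1:]):
        transitions[(src, dst)] += 1

print(page_hits.most_common())
print(transitions.most_common(1))
```

Frequent transitions such as /home to /products are the kind of usage pattern that can feed navigation or landing page decisions.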
Advantages and Disadvantages of Data Mining
Data mining is the process of analysing enormous amounts of information and
datasets, extracting (or "mining") useful intelligence to help organizations solve
problems, predict trends, mitigate risks, and find new opportunities. Data mining
is like actual mining because, in both cases, the miners are sifting through
mountains of material to find valuable resources and elements.
Data mining also includes establishing relationships and finding patterns,
anomalies, and correlations to tackle issues, creating actionable information in
the process.
Advantages of Data mining
It helps gather reliable information - Data mining allows companies, organisations, and
governments to gather reliable information.
Helps businesses make operational adjustments - Data mining helps businesses make
profitable production and operational adjustments. Data mining can be used to find
correlations between products, consumers, suppliers and other aspects of the business.
Helps to make informed decisions - It is often used for business purposes to improve
decision making. As more data is collected, the accuracy of data mining becomes greater.
It helps detect risks and fraud - Data mining can help identify risks and fraud that may not
be detectable through traditional means of data analysis.
Helps to analyse very large quantities of data quickly - Data mining can be used to analyse
data that was previously too difficult to understand due to the sheer volume or type of
information.
Helps to understand behaviours, trends and discover hidden patterns - Data mining can
be used to find patterns and trends in user behaviour. It does this by looking for anything that
is repeated in the data, such as instances of buying specific items.
It helps companies gather reliable information.
It's an efficient, cost-effective solution compared to other data applications.
It helps businesses make profitable production and operational adjustments.
Data mining uses both new and legacy systems.
It helps businesses make informed decisions.
It helps detect credit risks and fraud.
It helps data scientists easily analyse enormous amounts of data quickly.
Data scientists can use the information to detect fraud, build risk models, and improve
product safety.
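The pattern-finding advantage listed above (looking for things that repeat in the data, such as items bought together) can be made concrete with a small counting sketch. The baskets and the 50% support threshold are hypothetical. This is a plain pair count, not a full association-rule algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "milk", "eggs"},
]

# Assumed threshold: a pair must appear in at least half of the baskets.
min_support = 0.5

pair_counts = Counter()
for basket in baskets:
    # Sorting gives each pair one canonical order, e.g. ("bread", "milk").
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

frequent_pairs = {
    pair: count / len(baskets)
    for pair, count in pair_counts.items()
    if count / len(baskets) >= min_support
}
print(frequent_pairs)
```

Here bread and milk appear together in three of the four baskets, so that pair passes the threshold while one-off pairs are dropped.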
Disadvantages of Data Mining
Many data analytics tools are complex and challenging to use. Data scientists
need the right training to use the tools effectively.
Speaking of the tools, different ones work with varying types of data mining,
depending on the algorithms they employ. Thus, data analysts must be sure to
choose the correct tools.
Data mining techniques are not infallible, so there's always the risk that the
information isn't entirely accurate. This obstacle is especially relevant if there's a
lack of diversity in the dataset.
Companies can potentially sell the customer data they have gleaned to other
businesses and organizations, raising privacy concerns.
Data mining requires large databases, making the process hard to manage.
Data mining tools are complex and require training to use - Data analytics is
a complicated process and often requires people with training to use the tools.
Data mining techniques are not infallible - Data mining doesn't always provide
accurate information.
Rising privacy concerns - One of the major disadvantages of data mining is the
data and privacy concerns it raises.
Data mining requires large databases - Data mining is one of the most
powerful tools in a marketer's toolbox, but it does have its drawbacks. One such
drawback is that data mining requires large databases to be effective.
Expensive - Data mining can be a very expensive process. For example, companies
have to hire additional employees and technology specialists to ensure that the
data mining is done correctly.
Issues and Challenges of Data Mining
1. Security and Social Challenges
Decision-making strategies rely on collecting and sharing data, so they require
strong security. Private and sensitive information about people is gathered to
build customer profiles and to understand customer behaviour patterns.
2. Noisy and Incomplete Data
Data mining is the process of obtaining information from huge volumes of data.
Real-world data is noisy, incomplete, and heterogeneous.
3. Distributed Data
Real-world data is normally stored on various platforms in distributed computing
environments. It may be on the internet, on individual systems, or in
databases.
4. Complex Data
Real-world data is truly heterogeneous. It may be media data, including
natural language text, time series, spatial data, temporal data, complex data,
audio or video, images, etc.
5. Performance
The performance of a data mining system basically depends on the
efficiency of the techniques and algorithms used. If the techniques and
algorithms designed are not adequate, the performance of the data mining
process will suffer.
6. Scalability and Efficiency of the Algorithms
Data mining algorithms should be scalable and efficient so that they can extract
information from the tremendous amounts of data in a data set.
7. Improvement of Mining Algorithms
Factors such as the difficulty of data mining approaches, the enormous size
of the database, and the entire data flow motivate the creation of parallel and
distributed data mining algorithms.
8. Incorporation of Background Knowledge
If background knowledge can be incorporated, more accurate and reliable data
mining solutions and predictions can be found.
9. Data Visualization
Data visualization is a vital process in data mining because it is the main
interaction that presents the output to the user in a presentable way. The
extracted information should convey exactly the meaning it is intended to
convey.
10. Data Privacy and Security
Data mining typically raises significant issues regarding governance, privacy,
and data security.
11. User Interface
The knowledge discovered using data mining tools is valuable only if it is
interesting and, above all, understandable by the user.
12. Mining Based on Levels of Abstraction
The data mining process should be collaborative because this allows users to
focus on pattern finding, and on presenting and optimizing data mining requests
based on the returned results.
13. Integration of Background Knowledge
Previous knowledge may be used to express discovered patterns and to guide the
exploration process.
14. Mining Methodology Challenges
These challenges are related to data mining methods and their limitations.
The issues include the control and handling of noise in data, the dimensionality
of the domain, the diversity of the available data, the versatility of the
mining method, and so on.
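Several of the challenges above come down to handling noisy and incomplete data before mining it. The sketch below shows one simple way to do that, under stated assumptions: missing values are filled with the median, and spikes are clipped to a band of three median absolute deviations around it. The sensor readings and the factor of three are hypothetical choices, and other cleaning strategies may suit other data better.

```python
from statistics import median

# Hypothetical readings: None marks a missing value, 250.0 is a noisy spike.
readings = [21.0, 22.5, None, 23.0, 250.0, 22.0, None, 21.5]

observed = [r for r in readings if r is not None]
fill_value = median(observed)

# Step 1: fill in missing values with the median of the observed values.
imputed = [fill_value if r is None else r for r in readings]

# Step 2: clip outliers to a band around the median.
# The band width (3 x median absolute deviation) is an assumed rule of thumb.
mad = median(abs(r - fill_value) for r in observed)
limit = 3 * mad
cleaned = [min(max(r, fill_value - limit), fill_value + limit) for r in imputed]

print(cleaned)
```

The median is used instead of the mean because a single spike such as 250.0 would pull the mean far away from the typical readings.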