Data mining models are the output of applying data mining algorithms to data. They are sets
of data, statistics, and patterns that can be used to make predictions, infer relationships,
and gain insights from new data. These models help to uncover hidden patterns and valuable
information within large datasets.
Data mining models generally fall into two categories:
1. Predictive Models: These models are designed to forecast future outcomes or trends based
on historical data. They aim to answer "what will happen?"
● Classification: This is one of the most common predictive techniques. It categorizes data
into predefined classes or categories.
○ Examples: Predicting whether a customer will churn (yes/no), classifying emails as
spam or not, identifying fraudulent transactions.
○ Algorithms: Decision Trees, Naive Bayes, Support Vector Machines (SVM), Neural
Networks.
● Regression: This technique predicts a continuous numerical value by modeling the
relationship between one or more input variables and a numeric target.
○ Examples: Predicting house prices based on features like size and location,
forecasting sales figures, predicting temperature.
○ Algorithms: Linear Regression, Polynomial Regression.
● Time Series Analysis: This technique analyzes data points collected over a period of
time to identify patterns and predict future values in the sequence.
○ Examples: Stock price forecasting, weather prediction, analyzing seasonal sales
trends.
● Anomaly Detection (Outlier Detection): This identifies rare or unusual data instances
that deviate significantly from the expected patterns. It's often used for fraud detection or
identifying system malfunctions.
○ Examples: Detecting unusual credit card transactions, identifying abnormal
network behavior.
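Two of the predictive techniques above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production approach: the data values, the function names fit_line and zscore_outliers, and the z-score threshold of 2.0 are all assumptions chosen for the example. Regression is shown as an ordinary least-squares line fit, and anomaly detection as a simple z-score rule, which is only one of many possible outlier tests.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Ordinary least squares for a single input: y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def zscore_outliers(values, threshold=2.0):
    """Return values more than `threshold` population std devs from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical toy data: house size (square meters) vs. price (thousands).
sizes = [50, 70, 90, 110]
prices = [150, 210, 270, 330]
slope, intercept = fit_line(sizes, prices)
predicted = slope * 100 + intercept  # predicted price for a 100 m^2 house

# Hypothetical transaction amounts with one unusually large value.
amounts = [20, 22, 19, 21, 23, 20, 500]
flagged = zscore_outliers(amounts)

print(slope, intercept, predicted)
print(flagged)
```

With very small samples a single extreme value inflates the standard deviation, so a stricter threshold such as 3.0 might flag nothing here; the threshold is a tuning choice rather than a fixed rule.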
2. Descriptive Models: These models aim to characterize the general properties of the data
present in the database. They help in understanding "what has happened?" or "what is
happening?"
● Clustering: This technique groups data points based on their similarities, without
predefined categories. It helps in segmenting data into meaningful clusters.
○ Examples: Customer segmentation for targeted marketing, grouping similar
documents, identifying different types of diseases based on patient symptoms.
○ Algorithms: K-Means, Hierarchical Clustering, DBSCAN.
● Association Rule Mining: This discovers interesting relationships or patterns among a
set of items, often found in transactional data. It helps identify items that are frequently
bought together.
○ Examples: Market basket analysis (e.g., "customers who buy bread also buy milk"),
product recommendations.
● Summarization: This process condenses the data into concise, aggregated information,
often through visualizations or statistical summaries.
● Sequential Pattern Mining: Similar to association rule mining, but it focuses on patterns
that occur in a specific order over time.
○ Examples: Analyzing website clickstreams to understand user navigation paths,
identifying common sequences of medical treatments.
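As a rough sketch of two of the descriptive techniques above, the following plain-Python example clusters one-dimensional values with a basic K-Means loop and computes support and confidence for a single association rule. The helper names (kmeans_1d, support, confidence), the toy data, and the hand-picked starting centroids are assumptions made for illustration; real K-Means implementations usually choose initial centroids automatically and work on multi-dimensional data.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Basic K-Means (Lloyd's algorithm) on 1-D data with given start centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters


def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical monthly spend per customer, forming two obvious groups.
spend = [10, 12, 11, 50, 52, 51]
centers, groups = kmeans_1d(spend, centroids=[10, 50])

# Hypothetical shopping baskets for the rule "bread -> milk".
baskets = [{"bread", "milk"}, {"bread", "milk", "eggs"}, {"bread"}, {"milk"}]
rule_support = support(baskets, {"bread", "milk"})
rule_confidence = confidence(baskets, {"bread"}, {"milk"})

print(centers, groups)
print(rule_support, rule_confidence)
```

In this toy data the rule "bread -> milk" holds in two of the three baskets that contain bread, which is what the confidence value expresses; support measures how often the two items appear together across all baskets.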
In essence, data mining models are the knowledge extracted from raw data, allowing for
informed decision-making and a deeper understanding of underlying patterns and relationships.