Lecture 2 Data Mining


Uploaded by

MUHAMMAD SHEHZAD

Lecture 2: Data Preprocessing
PRESENTED BY: HALIMA TAHIR
Data Preprocessing

• Before using data for analysis or building models, we need to prepare it properly. Raw data is often messy, incomplete, or scattered. Data preprocessing is like cleaning and organizing your room before starting work: it makes the data ready for use.
It mainly includes the following steps:
1. Data Representation
2. Data Summarization
3. Data Cleaning
4. Data Integration and Transformation
Data Representation

• Data representation is how data is stored and shown.
• Example: numbers, text, images, tables, graphs, etc.
• If the data is not in a useful form, we convert it into a standard format that computers can process.
👉 Think of it as writing notes neatly in one notebook instead of on random papers.
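As a rough illustration of converting data into a standard format, the sketch below turns loosely typed text lines into uniform, typed records. The `Student` class, the field names, and the sample lines are hypothetical and are not taken from the lecture.

```python
from dataclasses import dataclass


@dataclass
class Student:
    # Hypothetical record layout chosen for this illustration.
    name: str
    age: int
    marks: float


def parse_record(line: str) -> Student:
    """Turn one comma-separated text line into a typed record."""
    # Strip stray spaces around each field before converting types.
    name, age, marks = (part.strip() for part in line.split(","))
    return Student(name=name.title(), age=int(age), marks=float(marks))


# Made-up raw input with inconsistent spacing and capitalization.
raw_lines = ["ali , 21, 78.5", "SARA,19 ,91", "  bilal,22,64.0"]
students = [parse_record(line) for line in raw_lines]

for student in students:
    print(student)
```

After parsing, every record has the same fields with the same types, which is what later steps such as summarization and cleaning rely on.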
Data Summarization

• Datasets can be huge, so we summarize them to understand them better.
• Example: Instead of going through the individual marks of 1,000 students, we calculate the average, highest, and lowest marks.
• This helps us see patterns quickly without going through all the data.
👉 Like making short notes from a big chapter.
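The marks example could be sketched as follows, using only Python's standard library. The `summarize` helper and the randomly generated marks are assumptions made for illustration, not real student data.

```python
import random
from statistics import mean


def summarize(marks):
    """Reduce a list of marks to a few summary numbers."""
    return {
        "count": len(marks),
        "average": mean(marks),
        "highest": max(marks),
        "lowest": min(marks),
    }


# Made-up marks for 1,000 students, seeded so the run is repeatable.
rng = random.Random(0)
all_marks = [rng.randint(0, 100) for _ in range(1000)]

summary = summarize(all_marks)
print(summary)
```

Four numbers now stand in for 1,000 values, which is usually enough for a first look at how the class performed.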
Data Cleaning

• Real-world data usually has mistakes or missing values.
• Example:
  • Some entries are empty (missing age).
  • Some are wrong (age written as 500).
  • Some are duplicates (same person added twice).
• In cleaning, we fix errors, fill in missing values, and remove duplicates.
👉 It's like washing vegetables before cooking.
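A minimal sketch of the three cleaning actions above, under stated assumptions: an age outside 0 to 120 is treated as an error, and missing ages are filled with the median of the valid ones. Both rules, the `clean` helper, and the sample records are illustrative choices; the lecture does not prescribe a particular method.

```python
from statistics import median


def clean(records, min_age=0, max_age=120):
    """Remove duplicates, discard impossible ages, fill missing ages."""
    # 1. Remove exact duplicates while keeping the original order.
    seen = set()
    unique = []
    for rec in records:
        key = (rec["name"], rec["age"])
        if key not in seen:
            seen.add(key)
            unique.append(dict(rec))  # copy so the input is not modified

    # 2. Treat impossible ages (assumed range 0 to 120) as missing.
    for rec in unique:
        if rec["age"] is not None and not (min_age <= rec["age"] <= max_age):
            rec["age"] = None

    # 3. Fill missing ages with the median of the valid ages (one common choice).
    valid = [rec["age"] for rec in unique if rec["age"] is not None]
    fill = median(valid) if valid else None
    for rec in unique:
        if rec["age"] is None:
            rec["age"] = fill
    return unique


# Made-up records showing each problem from the list above.
records = [
    {"name": "Ali", "age": 21},
    {"name": "Sara", "age": None},   # missing age
    {"name": "Bilal", "age": 500},   # wrong age
    {"name": "Ali", "age": 21},      # duplicate
    {"name": "Hina", "age": 23},
    {"name": "Omar", "age": 22},
]
cleaned = clean(records)
for rec in cleaned:
    print(rec)
```

Filling with the median is only one option. Depending on the dataset, dropping the incomplete rows or asking the data owner for the correct value may be more appropriate.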
Data Integration and Transformation

• Data often comes from many different sources (databases, Excel files, websites).
• We combine (integrate) them into one dataset.
• Transformation means changing the data into a common format.
• Example: Changing all dates to the same style (DD/MM/YYYY).
• Example: Scaling numbers (e.g., converting raw marks to a percentage).
👉 It's like collecting ingredients from different shops and then cutting and adjusting them before cooking.
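The sketch below combines two hypothetical sources into one dataset, converting all dates to DD/MM/YYYY and all marks to percentages. The source names, field names, date formats, and the "marks out of 50" case are assumptions made for illustration.

```python
from datetime import datetime


def to_common_date(text, source_format):
    """Re-express a date string in the common DD/MM/YYYY style."""
    return datetime.strptime(text, source_format).strftime("%d/%m/%Y")


def to_percentage(marks, out_of):
    """Scale raw marks to a percentage, rounded to one decimal place."""
    return round(marks / out_of * 100, 1)


# Hypothetical source 1: a database export using ISO dates, marks out of 100.
database_rows = [{"name": "Ali", "date": "2024-03-15", "marks": 78, "out_of": 100}]
# Hypothetical source 2: a spreadsheet using US-style dates, marks out of 50.
excel_rows = [{"name": "Sara", "date": "03/16/2024", "marks": 45, "out_of": 50}]

# Pair each source with the date format it is assumed to use.
sources = [(database_rows, "%Y-%m-%d"), (excel_rows, "%m/%d/%Y")]

# Integration: one list. Transformation: one date style and one marks scale.
combined = []
for rows, date_format in sources:
    for row in rows:
        combined.append({
            "name": row["name"],
            "date": to_common_date(row["date"], date_format),
            "percentage": to_percentage(row["marks"], row["out_of"]),
        })

for row in combined:
    print(row)
```

Once every row shares the same date style and the same scale, rows from different sources can be compared directly.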
