Part 2: Understanding Data Collection, Cleaning, and Organization
Who Is This For?
Beginners or non-tech professionals entering data analytics
Those who want to understand how to work with data before analyzing it
People who have no background in data preparation
What is Data Collection?
Why Collect Data?
Before you can analyze or visualize anything, you need data, and it has to be the right data for the question you are asking.
Types of Data Sources:
Source Type      Examples
Primary Data     Surveys, interviews, observations
Secondary Data   Excel files, CRM systems, government reports
Internal Data    Sales records, website analytics, HR databases
External Data    Market research reports, social media, APIs
Real-World Example:
An e-commerce company collects:
Customer purchase history (internal)
Google Trends for product keywords (external)
Feedback surveys (primary)
Page 2: Data Cleaning – Why It Matters
What is Data Cleaning?
Data cleaning is the process of fixing or removing incorrect, corrupted, or incomplete data.
Dirty data = wrong conclusions.
Common Data Problems:
Issue                       Example
Missing values              Empty age or salary field
Duplicates                  Same customer appears twice
Wrong data types            'Age' stored as text instead of number
Typos or inconsistencies    "India" vs "india" vs "IN"
Outliers                    A ₹10,000 tip recorded instead of ₹100
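As a rough illustration, the problems listed above can each be spotted with a few lines of pandas. This is a minimal sketch, not a complete cleaning routine: the sample table, its column names, and the 1.5 × IQR outlier rule are all assumptions made for this example.

```python
import pandas as pd

# Hypothetical sample data containing each problem from the list above.
df = pd.DataFrame({
    "customer_id": [101, 102, 102, 103, 104],
    "age": ["28", "35", "35", None, "41"],   # numbers stored as text, one missing
    "country": ["India", "india", "india", "IN", "India"],
    "tip": [100, 120, 120, 90, 10000],       # 10000 is a likely data-entry error
})

# Missing values: count the empty cells in each column.
missing_per_column = df.isna().sum()

# Duplicates: rows that repeat an earlier row exactly.
duplicate_rows = df[df.duplicated()]

# Wrong data types: 'age' is stored as text, so convert it to numbers.
df["age"] = pd.to_numeric(df["age"], errors="coerce")

# Inconsistencies: list the distinct spellings so a person can review them.
country_spellings = sorted(df["country"].str.strip().str.lower().unique())

# Outliers: flag values far above the interquartile range (a rule of thumb).
q1, q3 = df["tip"].quantile([0.25, 0.75])
upper_limit = q3 + 1.5 * (q3 - q1)
outliers = df[df["tip"] > upper_limit]

print(missing_per_column)
print(duplicate_rows)
print(country_spellings)
print(outliers)
```

Detecting a problem is only the first half. Whether to fix, fill, or remove a flagged value depends on the business context, so a flagged outlier should be checked before it is deleted.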
Tools to Clean Data:
Excel (filters, formulas)
Python (pandas library)
SQL (WHERE, IS NULL)
BI tools (Power BI, Tableau)
Page 3: Organizing Data – The Foundation of Analytics
How Should Data Be Structured?
Data should be organized in a tabular format: one row per record and one column per attribute, so that tools and programs can read and analyze it.
Good Data Table Format:
Customer_ID  Name   Gender  Age  Purchase_Amount
101          Priya  F       28   1500
102          Rahul  M       35   2200
Bad Example:
| Rahul, 35, Male, Bought Rs 2200 on 1st July |
Too unstructured for any analysis: everything sits in one free-text cell, so a tool cannot filter by age or add up purchase amounts.
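To show what "structuring" means in practice, here is a small Python sketch that turns the unstructured line above into one value per column. The regular expression is a hypothetical pattern written only for this one sentence shape, and the Purchase_Date column is an assumption added for illustration; real free text varies much more and needs more careful handling.

```python
import re

# The unstructured record from the bad example above.
raw = "Rahul, 35, Male, Bought Rs 2200 on 1st July"

# Hypothetical pattern that only fits this exact sentence shape.
pattern = re.compile(
    r"(?P<name>\w+), (?P<age>\d+), (?P<gender>\w+), "
    r"Bought Rs (?P<amount>\d+) on (?P<date>.+)"
)

match = pattern.fullmatch(raw)
if match is None:
    raise ValueError(f"Unrecognized record: {raw!r}")

# One value per column, with numeric fields converted to numbers.
row = {
    "Name": match["name"],
    "Gender": match["gender"][0],             # "Male" -> "M", as in the good table
    "Age": int(match["age"]),
    "Purchase_Amount": int(match["amount"]),
    "Purchase_Date": match["date"],           # assumed extra column, kept as text
}
print(row)
```

Once the values are in separate, typed columns, questions such as "what is the average purchase amount?" become simple calculations.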
Page 4: Data Preparation Workflow – Step-by-Step
1. Data Collection
→ Gather from surveys, files, databases, or APIs
2. Data Cleaning
→ Fix missing values, typos, errors, duplicates
3. Data Formatting
→ Convert columns to correct data types (e.g., date, number)
4. Data Integration
→ Combine data from multiple sources (e.g., merge Excel + CRM)
5. Data Storage
→ Save clean data in a spreadsheet, database, or cloud system
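The five steps above can be sketched end to end with pandas. This is a minimal sketch under stated assumptions: both data sources are made-up CSV text inlined so the example runs on its own, the extra customer, dates, and Country column are hypothetical, and the result is written to an in-memory buffer rather than a real file or database.

```python
import io

import pandas as pd

# 1. Data Collection: two hypothetical sources, inlined as CSV text.
#    In practice these might be a sales file and a CRM export.
sales_csv = io.StringIO(
    "Customer_ID,Purchase_Amount,Purchase_Date\n"
    "101,1500,2024-07-01\n"
    "102,2200,2024-07-01\n"
    "102,2200,2024-07-01\n"
    "103,,2024-07-02\n"
)
crm_csv = io.StringIO(
    "Customer_ID,Name,Country\n"
    "101,Priya,India\n"
    "102,Rahul,india\n"
    "103,Asha,INDIA \n"
)
sales = pd.read_csv(sales_csv)
crm = pd.read_csv(crm_csv)

# 2. Data Cleaning: remove exact duplicates, drop rows with no amount,
#    and standardize the spelling of country names.
sales = sales.drop_duplicates().dropna(subset=["Purchase_Amount"])
crm["Country"] = crm["Country"].str.strip().str.title()

# 3. Data Formatting: convert columns to the correct data types.
sales["Purchase_Date"] = pd.to_datetime(sales["Purchase_Date"])
sales["Purchase_Amount"] = sales["Purchase_Amount"].astype(int)

# 4. Data Integration: combine both sources on the shared key.
clean = sales.merge(crm, on="Customer_ID", how="left")

# 5. Data Storage: save the clean table (a buffer here; a file path in practice).
output = io.StringIO()
clean.to_csv(output, index=False)

print(clean)
```

Dropping the row with a missing amount is only one possible choice. Depending on the analysis, it may be better to keep such rows and fill or flag the gap instead.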
Page 5: Best Practices in Handling Business Data
Do:
Always back up the original data before changing anything
Use consistent date and currency formats
Document every step (e.g., what you cleaned or filtered)
Use meaningful column names (not A, B, C)
Avoid:
Manual data changes without tracking
Ignoring missing or inconsistent values
Working without understanding the data context
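Several of these practices can be built into even a small script. The sketch below uses only the Python standard library to back up the original file, record each cleaning step in a log, and write the result to a new file. The folder, file names, and sample rows are hypothetical and are created inside the script so it can run on its own.

```python
import csv
import shutil
import tempfile
from pathlib import Path

# Hypothetical working folder and file, created here so the sketch runs.
workdir = Path(tempfile.mkdtemp())
original = workdir / "sales.csv"
original.write_text("customer_id,amount\n101,1500\n102,\n102,\n103,2200\n")

# Do: back up the original data before touching it.
backup = workdir / "sales_original_backup.csv"
shutil.copy2(original, backup)

cleaning_log = []  # Do: document every step.

with original.open(newline="") as f:
    rows = list(csv.DictReader(f))
cleaning_log.append(f"Loaded {len(rows)} rows from {original.name}")

# Avoid ignoring missing values: remove them explicitly and record how many.
complete = [r for r in rows if r["amount"].strip()]
cleaning_log.append(f"Dropped {len(rows) - len(complete)} rows with missing amount")

# Write the cleaned data to a new file so the original stays untouched.
cleaned = workdir / "sales_clean.csv"
with cleaned.open("w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["customer_id", "amount"])
    writer.writeheader()
    writer.writerows(complete)
cleaning_log.append(f"Saved {len(complete)} rows to {cleaned.name}")

print("\n".join(cleaning_log))
```

Because every change goes through the script and the log, anyone reviewing the analysis later can see what was removed and why, which manual edits in a spreadsheet do not provide.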
Tip:
Good data = Good analysis. Garbage in = Garbage out.
What You Learned in Part 2:
Where data comes from and how to collect it
How to clean and organize data before analysis
The complete flow of preparing data for business analytics