Module 2 – Data Collection and Preparation

Module 2 focuses on the critical aspects of data collection and preparation, emphasizing the importance of accurate data gathering, ethical considerations, and quality assessment. It covers practical skills such as importing data into Excel, cleaning and preprocessing datasets, and handling missing data and outliers. The module aims to equip learners with the necessary tools and techniques to ensure data integrity and reliability for analysis.


Module 2 – Data Collection and Preparation

Objectives
1. Explain the importance of accurate data collection and preparation.
2. Assess data quality using key indicators.
3. Identify and address ethical issues in data collection.
4. Import data into Excel from various file types and external sources.
5. Clean and preprocess raw datasets using Excel tools.
6. Handle missing data and detect or treat outliers in a dataset.
Module 2 – Data Collection and Preparation

Topics
1. Importance of Data Collection and Preparation
2. Basic Data Quality Assessment
3. Ethical Considerations in Data Collection
4. Importing Data from Various Sources
5. Data Cleaning and Preprocessing in Excel
6. Handling Missing Data and Outliers
   6.1 Types of Missing Data
   6.2 Techniques for Handling Missing Data
   6.3 Identifying Outliers
Data Collection

Data collection refers to the process of gathering and accumulating information or data from various sources. This data can come from a wide range of places, including surveys, sensors, databases, websites, social media, and more.

Source: https://www.globalpatron.com/blog/data-collection-methods/

The main objectives of data collection are:

➢ Acquiring Relevant Information: Collecting data that is pertinent to the problem or question at hand.
➢ Ensuring Data Accuracy: Ensuring that the data is accurate, reliable, and free from errors or bias.
➢ Maintaining Data Consistency: Keeping the data consistent in terms of format, units, and structure.
➢ Preserving Data Integrity: Preventing unauthorized access, loss, or corruption of data.
Data Preparation

Data preparation, also known as data preprocessing, is the process of cleaning, transforming, and structuring raw data into a format that is suitable for analysis. This step is essential because real-world data is often messy, inconsistent, and may contain missing values.

Source: https://www.linkedin.com/pulse/data-collection-preprocessing-dr-john-martin-5kj3f

The key tasks in data preparation include (see the sketch after this list):

Data Cleaning: Identifying and correcting errors, inconsistencies, and outliers in the data.
Data Transformation: Converting data into a more suitable format or scale. This may involve normalization, standardization, or encoding categorical variables.
Handling Missing Data: Dealing with missing values by imputing them or removing incomplete records.
Feature Engineering: Creating new features or variables that might be more informative for analysis.
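The four tasks above map onto a few pandas operations. Here is a minimal Python sketch, assuming a small hypothetical dataset with price, city, and signup_date columns (pandas is used for illustration; it is not part of the module's Excel toolkit):

```python
# A minimal pandas sketch of the four preparation tasks above.
# The column names (price, city, signup_date) are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "price": [120.0, None, 95.5, 4000.0],            # has a missing value
    "city": ["Manila", "manila ", "Cebu", "Cebu"],   # inconsistent casing/spacing
    "signup_date": ["01/15/2024", "02/03/2024", "03/20/2024", "04/11/2024"],
})

# Data cleaning: fix inconsistent text values.
df["city"] = df["city"].str.strip().str.title()

# Handling missing data: impute the missing price with the column median.
df["price"] = df["price"].fillna(df["price"].median())

# Data transformation: min-max normalize price to the [0, 1] range.
df["price_norm"] = (df["price"] - df["price"].min()) / (df["price"].max() - df["price"].min())

# Feature engineering: derive a signup month from the raw date string.
df["signup_month"] = pd.to_datetime(df["signup_date"], format="%m/%d/%Y").dt.month

print(df)
```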
Importance of Data Collection

 The foundation of any analysis depends on the quality of data collected.
 Inaccurate or poorly prepared data can lead to biased, misleading results.
 Good data preparation enhances model performance and analytical accuracy.

Examples:
 A survey with missing responses vs. a well-completed dataset.
 Raw sales data with inconsistent formats (e.g., date formats, product names).
Basic Data Quality Assessment

• Common quality dimensions: accuracy, completeness, consistency, timeliness, conformity, validity, integrity, relevance, uniqueness.
• Supporting practices for assessing quality: documentation, data profiling, data visualization, data sampling, data quality metrics, data validation rules, user feedback.
• Indicators of poor data quality (e.g., duplicates, missing fields, inconsistent units).

Examples (checked in the sketch below):
Multiple records for the same customer with slight name variations.
Sales records with mismatched units (e.g., PHP vs. USD).
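These indicators can also be checked programmatically. A minimal pandas sketch, assuming a hypothetical sales.csv with customer, amount, and currency columns:

```python
# A minimal sketch of checking the poor-quality indicators above with pandas.
# The sales.csv file and its columns (customer, amount, currency) are hypothetical.
import pandas as pd

df = pd.read_csv("sales.csv")

# Duplicates: multiple records for the same customer.
dup_count = df.duplicated(subset=["customer"]).sum()

# Missing fields: count of blank cells per column.
missing_per_column = df.isna().sum()

# Inconsistent units: more than one currency suggests mismatched units (e.g., PHP vs. USD).
currencies = df["currency"].unique()

print(f"Duplicate customer rows: {dup_count}")
print("Missing values per column:\n", missing_per_column)
print("Currencies present:", currencies)
```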
ETHICAL CONSIDERATIONS IN DATA COLLECTION

Ethical considerations in data collection are essential to ensure that data is collected, used, and managed in a responsible and socially acceptable manner. Ethical data collection practices help protect individuals' privacy, prevent discrimination, and maintain trust in data-driven processes.
ETHICAL CONSIDERATIONS IN DATA COLLECTION

Here are some key ethical considerations in data collection:

Informed Consent: Obtain informed consent from individuals before collecting their data. Clearly explain the purpose of data collection, how the data will be used, and any potential risks involved. Individuals should have the option to opt in or opt out.

Privacy and Anonymity: Protect individuals' privacy by anonymizing or de-identifying data whenever possible. Remove or encrypt personally identifiable information (PII) to prevent the identification of individuals from the data.

Data Minimization: Collect only the data that is necessary for the intended purpose. Avoid collecting excessive or irrelevant information that could infringe on individuals' privacy.

Transparency: Be transparent about data collection practices and data handling procedures. Clearly communicate how data will be stored, processed, and shared.

Data Security: Implement robust data security measures to safeguard collected data from unauthorized access, breaches, or theft. Encryption, access controls, and regular security audits are important components of data security.

Bias and Fairness: Be vigilant about bias in data collection. Biased data can lead to biased results and discriminatory outcomes. Ensure that data collection methods and sources are free from biases that could disproportionately affect certain groups.

Data Ownership and Control: Clearly define data ownership and control. Individuals should have the right to access their data, correct inaccuracies, and request the deletion of their data when applicable.
ETHICAL CONSIDERATIONS IN DATA COLLECTION

Here are some key ethical considerations in data collection (continued):

Sensitive Data: Handle sensitive data (e.g., health records, financial information) with extra care. Follow industry-specific regulations and best practices for collecting and storing sensitive data.

Data Use: Use collected data only for the purposes explicitly stated during data collection. Avoid using data for purposes that individuals did not consent to or could not reasonably anticipate.

Third-Party Data Sources: If using data from third-party sources, ensure that the data was collected ethically and in compliance with relevant laws and regulations. Verify that the third party has obtained informed consent and adhered to privacy standards.

Children's Data: Special considerations apply when collecting data from children. Comply with laws like the Children's Online Privacy Protection Act (COPPA) and obtain parental consent when necessary.

Data Retention: Establish clear policies for data retention and deletion. Do not retain data longer than necessary for the intended purpose.

Ethical Review: In some cases, particularly in research involving human subjects, ethical review boards may be required to assess the ethics of data collection methods.

Bias Mitigation: When working with machine learning and AI algorithms, be aware of the potential for algorithmic bias. Regularly audit and test algorithms for fairness and bias, and take steps to mitigate any identified biases.

Compliance with Regulations: Ensure that data collection practices comply with relevant local, national, and international laws and regulations, such as the General Data Protection Regulation (GDPR) in Europe, the Health Insurance Portability and Accountability Act (HIPAA) in the United States, and the Data Privacy Act of 2012 (Republic Act 10173) of the Philippines.
IMPORTING DATA FROM VARIOUS SOURCES

Importing data from various sources is a common task in data analysis, data science, and database management. The process involves retrieving data from different origins, such as databases, files, APIs, and web sources, and making it available for analysis or storage.

 File types: .xlsx, .csv, .txt, .xml, .json.
 External data sources: databases, websites, APIs.
 Excel features: Get & Transform Data, From Text/CSV, From Web.

Examples (see the sketch below):
o Importing a .csv file with sales data.
o Connecting Excel to a web-based data source.

[Figure: Excel's Get Data menu]
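For comparison, Excel's From Text/CSV and From Web imports have close Python analogues. A minimal sketch; the file name and URL are hypothetical placeholders:

```python
# Analogue of Excel's From Text/CSV and From Web imports, using pandas.
# The sales.csv file and the URL are hypothetical placeholders.
import pandas as pd

# Importing a .csv file with sales data (analogue of Data > From Text/CSV).
sales = pd.read_csv("sales.csv")

# Connecting to a web-based data source (analogue of Data > From Web);
# read_html returns a list of all tables found on the page.
tables = pd.read_html("https://example.com/prices.html")

print(sales.head())
print(f"Tables found on page: {len(tables)}")
```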
IMPORTING DATA FROM VARIOUS SOURCES

Here are steps and considerations for importing data from various sources:
1. Identify Data Sources: Determine the sources from which you need to collect data. These sources could include databases (SQL,
NoSQL), spreadsheets, CSV files, JSON files, web APIs, websites, sensor data, and more.

2. Access Permissions: Ensure that you have the necessary permissions and access rights to retrieve data from the selected sources. In
some cases, you may need credentials or API keys.

3. Data Retrieval Methods: Different sources require different methods for data retrieval:
➢ Databases: Use SQL queries for relational databases (e.g., MySQL, PostgreSQL), or appropriate drivers and libraries for NoSQL
databases (e.g., MongoDB).
➢ Files: Use file I/O operations to read data from formats like CSV, Excel, JSON, XML, etc.
➢ APIs: Interact with APIs using HTTP requests (GET, POST, etc.) and libraries like requests in Python (see the sketch after this list).
➢ Web Scraping: Extract data from websites using web scraping libraries such as BeautifulSoup or Scrapy (respecting website terms
of service and robots.txt).
➢ Sensor Data: Use hardware interfaces or communication protocols to collect data from sensors or IoT devices.
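The API retrieval method above can be illustrated with the requests library mentioned in the list. A minimal sketch; the endpoint URL, API key, and query parameters are hypothetical:

```python
# A minimal sketch of API data retrieval with the requests library.
# The endpoint URL, API key, and parameters are hypothetical placeholders.
import requests

url = "https://api.example.com/v1/sales"
headers = {"Authorization": "Bearer YOUR_API_KEY"}  # access permissions (step 2)

response = requests.get(url, headers=headers, params={"year": 2024}, timeout=30)
response.raise_for_status()  # fail loudly on HTTP errors

records = response.json()    # parse the JSON payload into Python objects
print(f"Retrieved {len(records)} records")
```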
IMPORTING DATA FROM VARIOUS SOURCES

4. Data Extraction: Retrieve the data from the source using appropriate methods and techniques. This may involve executing SQL queries, reading files line by line, or making API requests.

5. Data Transformation: Once you have the raw data, perform necessary transformations to clean and structure it. This may include data cleaning, data type conversion, and handling missing values.

6. Data Integration: If you're collecting data from multiple sources, integrate it into a single dataset or database if needed. This may involve merging tables, joining datasets, or creating relationships between data entities.

7. Data Validation: Verify the quality and integrity of the imported data. Check for errors, inconsistencies, or missing data. Implement validation checks to ensure the data adheres to predefined rules and standards.

8. Automate the Process: Consider automating the data import process, especially if you need to retrieve data regularly or in real time. Automation can save time and reduce the risk of human errors.

9. Error Handling: Implement error-handling mechanisms to gracefully deal with issues that may arise during data import, such as connection failures or unexpected data formats.

10. Logging and Monitoring: Set up logging and monitoring to keep track of the data import process. This helps in identifying and resolving issues quickly. (Steps 6-10 are combined in the sketch below.)

11. Data Security: Ensure that sensitive data is handled securely during the import process. Use encryption and secure channels when transferring data.
IMPORTING DATA FROM VARIOUS SOURCES

12. Documentation: Document the data import process thoroughly, including source details, data extraction methods, transformation steps, and any
scripts or code used. This documentation is valuable for future reference and troubleshooting.

13. Compliance and Ethics: Ensure that data import practices adhere to legal and ethical standards, especially when dealing with personally identifiable
information (PII) and sensitive data.

14. Performance Optimization: Optimize the data import process for efficiency, especially when dealing with large datasets. Consider batch processing,
parallel processing, and indexing for databases.

15. Testing and Validation: Test the entire data import process end-to-end to ensure that it functions as expected and delivers accurate results.

16. Backup and Recovery: Implement data backup and recovery procedures to safeguard against data loss or corruption during the import process.
DATA CLEANING AND PROCESSING IN EXCEL

Steps: Remove duplicates, correct inconsistent values, standardize formats, use filters.
Tools: Text to Columns, Flash Fill, Find & Replace, Conditional Formatting.

Examples:
Merging first and last names using Flash Fill.
Standardizing dates from MM/DD/YYYY to YYYY-MM-DD.

[Figure: merging text in Excel]
DATA CLEANING AND PROCESSING IN EXCEL

Data cleaning and processing in Excel involve preparing your data for analysis by identifying and addressing issues such as missing values, inconsistencies, and formatting problems. Excel provides a range of tools and functions to help with these tasks.
DATA CLEANING AND PROCESSING IN EXCEL

Here's a step-by-step guide on how to clean and process data in Excel:
1. Open Your Data in Excel - "File" > "Open"

2. Understand Your Data - its structure, columns, and potential issues

3. Handling Missing Data - Fill Missing Values, Use functions like IF, ISBLANK, and VLOOKUP to fill missing values

4. Dealing with Duplicates - Remove duplicates by going to "Data" > "Remove Duplicates" and selecting the columns to check for duplicates.

5. Correcting Data Inconsistencies - Use Excel's "Find and Replace" feature to correct inconsistent data

6. Data Formatting - Use the "Format Cells" option (right-click > Format Cells) to format date and time

7. Data Transformation - Use the "Text to Columns" feature (Data > Text to Columns) to split data in a single column into multiple columns based on delimiters (e.g., commas, spaces).
Combine data from multiple columns into one using the CONCATENATE or & operator.

8. Calculations and Derived Columns - Create Derived Columns, Add new columns for calculations or derived information using Excel formulas

9. Filtering and Sorting - Use the "Filter" option (Data > Filter) to filter data based on specific criteria. "Sort A to Z" or "Sort Z to A."

10. Removing Irrelevant Data – remove unneeded rows or columns

11. Data Validation - Use Excel's data validation feature to enforce rules and constraints on data entry

12. Data Visualization - Use Excel's charting tools

13. Save Your Cleaned Data - Save your cleaned and processed data as a new Excel file to preserve the original dataset. (A pandas version of this workflow is sketched below.)
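For comparison, here is the same workflow condensed into a minimal pandas sketch; the customers.csv file and its column names are hypothetical:

```python
# The Excel guide above, condensed into a pandas sketch for comparison.
# The customers.csv file and its columns are hypothetical placeholders.
import pandas as pd

df = pd.read_csv("customers.csv")

# Step 4: remove duplicates (Data > Remove Duplicates).
df = df.drop_duplicates()

# Step 5: correct inconsistencies (Find and Replace).
df["region"] = df["region"].replace({"N.C.R.": "NCR"})

# Step 6: standardize dates from MM/DD/YYYY to YYYY-MM-DD (Format Cells).
df["order_date"] = pd.to_datetime(df["order_date"], format="%m/%d/%Y").dt.strftime("%Y-%m-%d")

# Step 7: combine columns (CONCATENATE / & operator).
df["full_name"] = df["first_name"] + " " + df["last_name"]

# Step 13: save the cleaned data as a new file, preserving the original.
df.to_csv("customers_clean.csv", index=False)
```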
HANDLING MISSING DATA AND OUTLIERS

Handling missing data and outliers is a critical step in data analysis to ensure the
accuracy and reliability of your results. Missing data are values that are not
recorded or are incomplete, while outliers are data points that significantly
deviate from the rest of the data.
HANDLING MISSING DATA AND OUTLIERS

Handling Missing Data:

1. Identify Missing Data: Begin by identifying missing values in your dataset. These can appear as blank cells, "N/A," "NaN," or other placeholders.

2. Understand the Missingness Pattern: Determine if the missing data are missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR). This helps in choosing appropriate handling methods.

3. Imputation: Imputation involves replacing missing values with estimated or predicted values. Common imputation methods include:
 Mean/Median Imputation: Replace missing values with the mean (for continuous data) or median (for ordinal data) of the available values in the column.
 Mode Imputation: Replace missing values with the mode (most frequent value) for categorical data.
 Regression Imputation: Predict missing values using regression models based on other variables.
 K-Nearest Neighbors (KNN) Imputation: Replace missing values with values from the K nearest neighbors in the dataset.
 Time Series Imputation: For time series data, consider using interpolation or forward/backward filling based on the time order of the data.

4. Create an Indicator Variable: Sometimes, it's valuable to create an indicator variable that flags missing values. This way, you retain information about which data points were missing.

5. Data Removal: In some cases, if the missing data are substantial and cannot be imputed accurately, you may consider removing rows or columns with missing values. However, this should be done with caution, as it can lead to loss of information. (Methods 1, 3, and 4 are illustrated in the sketch below.)
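A minimal pandas sketch of methods 1, 3, and 4, assuming hypothetical numeric (age), categorical (segment), and time series (demand) columns:

```python
# A minimal sketch of identifying, flagging, and imputing missing values.
# The columns (age, segment, demand) are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "age": [25, None, 31, 40, None],
    "segment": ["A", "B", None, "B", "B"],
    "demand": [10.0, None, 14.0, None, 18.0],
})

# 1. Identify missing data.
print(df.isna().sum())

# 4. Create indicator variables before imputing, so missingness is preserved.
df["age_missing"] = df["age"].isna()

# 3. Imputation: median for the numeric column, mode for the categorical one,
#    and time-order interpolation for the series.
df["age"] = df["age"].fillna(df["age"].median())
df["segment"] = df["segment"].fillna(df["segment"].mode()[0])
df["demand"] = df["demand"].interpolate()

print(df)
```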
HANDLING MISSING DATA AND OUTLIERS

Handling Outliers:
1. Identify Outliers: Visualize your data using box plots, histograms, scatter plots, or other
graphical techniques to identify potential outliers. Calculate summary statistics (e.g., mean,
standard deviation) and use them to identify data points that fall significantly outside the
expected range.

2. Understand the Nature of Outliers: Determine if outliers are genuine data points
(representing true extreme values) or errors/noise. Consult domain experts if necessary.
HANDLING MISSING DATA AND OUTLIERS

Handling Outliers (continued):

3. Transformation: Consider transforming the data to make it more robust against outliers. Common transformations include logarithmic, square root, or winsorization (capping or flooring extreme values).

4. Data Removal: In some cases, you may decide to remove outliers if they are identified as errors or if they significantly distort your analysis. Be cautious when doing this and document the reasons for removal.

5. Robust Statistical Methods: Use robust statistical techniques that are less sensitive to outliers. For example, use the median instead of the mean for central tendency measurements.

6. Model-Based Approaches: Some machine learning algorithms are less affected by outliers. Consider using algorithms like Random Forests or Support Vector Machines that can handle noisy data.

7. Binning or Categorization: Convert continuous data into categorical data by binning or categorizing it to minimize the impact of outliers.

8. Winsorization: Replace extreme values with values within a specified range (e.g., replace values above the 95th percentile with the 95th percentile value).

9. Visualization and Reporting: When reporting your results, consider showing both the analysis with and without outliers to provide a balanced perspective. (Outlier identification and winsorization are illustrated in the sketch below.)
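A minimal pandas sketch of IQR-based outlier identification and winsorization, using a small hypothetical sales sample:

```python
# A minimal sketch of IQR-based outlier identification and winsorization.
# The sales values are a small hypothetical sample.
import pandas as pd

sales = pd.Series([120, 135, 128, 142, 131, 2900])  # 2900 is a likely outlier

# Identify outliers with the 1.5 * IQR rule.
q1, q3 = sales.quantile(0.25), sales.quantile(0.75)
iqr = q3 - q1
outliers = sales[(sales < q1 - 1.5 * iqr) | (sales > q3 + 1.5 * iqr)]
print("Outliers:\n", outliers)

# Winsorize: cap values at the 5th and 95th percentiles.
capped = sales.clip(lower=sales.quantile(0.05), upper=sales.quantile(0.95))
print("Winsorized:\n", capped)
```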
Perform Laboratory Activity No. 2.
Prepare for Quiz No. 2.
