Module 2: Data Collection and Presentation (3 Hours)
This module focuses on the practical aspects of obtaining data and then organizing and
displaying it in a clear and meaningful way. It covers the different sources from which data can
be acquired, the principles of designing effective surveys, and various methods for summarizing
and visualizing data, including the practical application of Excel for creating charts.
2.1 Primary vs. Secondary Data
Understanding the source of your data is critical for assessing its reliability, relevance, and the
effort required for collection.
2.1.1 Primary Data:
Definition: Data that is collected for the first time by the researcher or organization for a
specific purpose. It is original, raw, and directly relevant to the current research question.
Characteristics:
o Originality: Collected firsthand.
o Specificity: Tailored to the exact needs of the research.
o Control: The researcher has full control over the data collection process,
including methodology, sample, and quality.
o Timeliness: Can be collected when needed, ensuring it's up-to-date.
o High Cost/Time: Generally more expensive and time-consuming to collect than
secondary data.
Methods of Collection:
o Surveys/Questionnaires: Gathering information directly from individuals
through structured questions (online, mail, phone, in-person).
o Experiments: Conducting controlled studies to observe cause-and-effect
relationships (e.g., A/B testing for marketing campaigns).
o Observations: Directly observing behaviors or phenomena (e.g., traffic patterns,
customer interactions in a store).
o Focus Groups: Facilitated discussions with a small group of people to gather
in-depth qualitative insights.
o Interviews: One-on-one conversations to gather detailed information.
Examples in Business:
o Conducting a survey to understand customer preferences for a new product.
o Running an A/B test on a website to see which layout converts more visitors.
o Observing foot traffic patterns in a retail store to optimize store layout.
o Interviewing employees to understand morale issues.
2.1.2 Secondary Data:
Definition: Data that has already been collected and published by someone else for a
purpose other than the current research question. It's readily available information.
Characteristics:
o Availability: Easily accessible and often free or low-cost.
o Less Time-Consuming: Can be obtained quickly.
o Broader Scope: Can provide a wider perspective or historical context.
o Less Control: The researcher has no control over the original data collection
methodology, quality, or bias.
o Limited Relevance: May not align fully with the current research needs.
o Potentially Outdated: May no longer be current.
Sources of Collection:
o Internal Sources: Company sales records, financial statements, customer
databases, employee records, operational reports.
o External Sources:
Government Publications: Census data, economic surveys (e.g., from
NSSO, RBI in India).
Industry Associations: Reports, statistics, and trends specific to an
industry.
Academic Research/Journals: Published studies and research papers.
Commercial Data Providers: Market research firms (e.g., Nielsen,
Gartner) that sell specialized data.
Online Databases: Publicly available datasets (e.g., World Bank, IMF).
News Articles and Periodicals: Information from reputable media
outlets.
Examples in Business:
o Using government census data to identify potential market size for a new service.
o Analyzing competitors' annual reports for financial benchmarking.
o Referring to industry reports to understand market trends and competitive
landscape.
o Using internal sales data to analyze past performance.
Choosing Between Primary and Secondary Data: Often, a combination of both primary and
secondary data is used. Secondary data can help frame the research question and provide context,
while primary data can fill specific information gaps and address the unique aspects of the
current problem.
2.2 Survey Design Principles
Surveys are a common method for collecting primary data, especially when gathering opinions,
attitudes, or preferences. A well-designed survey is crucial for obtaining accurate and unbiased
data.
2.2.1 Defining Objectives:
o Clear Purpose: Before writing any questions, clearly define what information
you need to collect and why. What decisions will be made based on this data?
o Target Audience: Who are you surveying? Their characteristics will influence
question wording and survey distribution.
2.2.2 Questionnaire Design:
o Question Wording:
Clarity and Simplicity: Use simple, unambiguous language. Avoid
jargon.
Neutrality: Avoid leading questions (that suggest a preferred answer) or
loaded questions (that contain emotionally charged words).
Bad: "Don't you agree that our product is superior?"
Good: "How would you rate our product's quality?"
Specificity: Be precise. Instead of "How often do you eat out?", ask "How
many times per week do you eat at a restaurant?"
Avoid Double-Barreled Questions: Questions that ask two things at
once.
Bad: "Are you satisfied with our product's quality and price?"
Good (split): "Are you satisfied with our product's quality?" AND
"Are you satisfied with our product's price?"
o Types of Questions:
Closed-Ended Questions: Provide a set of predefined answer choices.
Easier to quantify and analyze.
Dichotomous: Yes/No, True/False.
Multiple Choice: Select one or more options from a list.
Rating Scales (Likert Scale): Agree/Disagree, Satisfaction levels
(e.g., 1-5 scale).
Ranking Questions: Rank items in order of preference.
Open-Ended Questions: Allow respondents to provide free-form answers
in their own words. Provide rich qualitative data but are harder to analyze
statistically.
o Question Order:
Start with easy, non-sensitive questions to build rapport.
Group similar questions together.
Place sensitive or demographic questions towards the end.
Logical flow from general to specific.
o Layout and Formatting:
Clear, uncluttered layout.
Logical progression.
Clear instructions.
Appropriate use of white space.
o Length: Keep surveys as concise as possible to avoid respondent fatigue and
abandonment.
2.2.3 Sampling Methods (Brief mention here, detailed in Module 8):
o Decide how you will select your respondents from the target population to ensure
representativeness. Common methods include random sampling, stratified
sampling, etc.
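As a rough illustration of the two methods named above, the following Python sketch draws a simple random sample and a proportionally allocated stratified sample. The customer list, the 70/30 urban-rural split, and the function names are hypothetical and serve only to show the mechanics; Module 8 covers the methods properly.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct units; every unit is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def stratified_sample(population, stratum_of, n, seed=None):
    """Sample each stratum in proportion to its share of the population.

    Because of rounding, the total can differ slightly from n when the
    stratum shares do not divide evenly.
    """
    rng = random.Random(seed)
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical sampling frame: 1,000 customers, 70% urban and 30% rural.
customers = [(i, "urban" if i < 700 else "rural") for i in range(1000)]

srs = simple_random_sample(customers, 100, seed=1)
strat = stratified_sample(customers, lambda c: c[1], 100, seed=1)
urban_in_strat = sum(1 for _, region in strat if region == "urban")
print(len(srs), len(strat), urban_in_strat)  # 100 100 70
```

The stratified sample always contains 70 urban and 30 rural customers, whereas the simple random sample only matches that split on average.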
2.2.4 Pre-testing (Pilot Testing):
o Always test the survey with a small group similar to your target audience before
full deployment. This helps identify confusing questions, technical glitches, or
problems with flow.
2.2.5 Administration Method:
o Online Surveys: (e.g., Google Forms, SurveyMonkey) Cost-effective, wide
reach, automated data collection.
o Mail Surveys: Low response rates, slow.
o Telephone Surveys: Can be intrusive, limited in length.
o In-Person Interviews/Surveys: High response rates, can clarify questions, but
expensive and time-consuming.
2.3 Frequency Tables
Once data is collected, the first step in presentation is often to organize it into frequency tables.
These tables summarize the occurrences of different values or categories in a dataset.
2.3.1 For Qualitative (Categorical) Data:
Frequency Distribution: A table showing the number of times each category or value
appears in the dataset.
o Example: Customer Feedback for a product
| Rating (Category) | Frequency (Count) |
| :---------------- | :---------------- |
| Very Satisfied | 60 |
| Satisfied | 110 |
| Neutral | 30 |
| Dissatisfied | 15 |
| Very Dissatisfied | 5 |
| Total | 220 |
Relative Frequency Distribution: Shows the proportion of times each category appears.
Calculated as (Frequency / Total Observations).
o Example (cont.):
| Rating | Frequency | Relative Frequency |
| :---------------- | :-------- | :----------------- |
| Very Satisfied | 60 | 60/220 = 0.273 |
| Satisfied | 110 | 110/220 = 0.500 |
| Neutral | 30 | 30/220 = 0.136 |
| Dissatisfied | 15 | 15/220 = 0.068 |
| Very Dissatisfied | 5 | 5/220 = 0.023 |
| Total | 220 | 1.000 |
Percentage Frequency Distribution: Relative frequency multiplied by 100.
o Example (cont.):
| Rating | Frequency | Relative Frequency | Percentage Frequency |
| :---------------- | :-------- | :----------------- | :------------------- |
| Very Satisfied | 60 | 0.273 | 27.3% |
| Satisfied | 110 | 0.500 | 50.0% |
| Neutral | 30 | 0.136 | 13.6% |
| Dissatisfied | 15 | 0.068 | 6.8% |
| Very Dissatisfied | 5 | 0.023 | 2.3% |
| Total | 220 | 1.000 | 100.0% |
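The three distributions in the customer feedback example can be reproduced with a few lines of Python. This is only a sketch: the raw responses are rebuilt from the counts in the example, and the function name `frequency_table` is an illustrative choice.

```python
from collections import Counter

# Rebuild the 220 raw responses from the counts in the example table.
counts_from_table = {
    "Very Satisfied": 60,
    "Satisfied": 110,
    "Neutral": 30,
    "Dissatisfied": 15,
    "Very Dissatisfied": 5,
}
responses = [r for r, c in counts_from_table.items() for _ in range(c)]


def frequency_table(values):
    """Return rows of (category, frequency, relative, percentage) and the total."""
    freq = Counter(values)          # frequency distribution
    total = sum(freq.values())
    rows = []
    for category, count in freq.items():
        rel = count / total         # relative frequency = frequency / total
        rows.append((category, count, round(rel, 3), round(rel * 100, 1)))
    return rows, total


rows, total = frequency_table(responses)
for category, count, rel, pct in rows:
    print(f"{category:<18}{count:>5}{rel:>8.3f}{pct:>7.1f}%")
print(f"{'Total':<18}{total:>5}")
```

The printed values match the relative and percentage frequencies in the example (0.273 and 27.3% for Very Satisfied, and so on).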
2.3.2 For Quantitative Data (Grouped Frequency Distribution):
When dealing with a large range of quantitative values, data is grouped into classes or
bins.
Steps:
1. Determine the number of classes: Generally 5 to 20 classes. A common heuristic
is the square root of N (the number of observations), rounded to a whole number.
2. Determine the class width: (Largest value - Smallest value) / Number of classes.
Round up to a convenient number.
3. Establish class limits: Define the lower and upper bounds for each class,
ensuring they are non-overlapping and cover all data points.
4. Tally frequencies: Count how many data points fall into each class.
Example: Monthly Sales (in ₹ Lakhs) for 100 Stores
| Monthly Sales (₹ Lakhs) | Frequency | Relative Frequency | Cumulative Frequency |
| :---------------------- | :-------- | :----------------- | :------------------- |
| 50 - < 70 | 15 | 0.15 | 15 |
| 70 - < 90 | 35 | 0.35 | 50 |
| 90 - < 110 | 40 | 0.40 | 90 |
| 110 - < 130 | 10 | 0.10 | 100 |
| Total | 100 | 1.00 | |
Cumulative Frequency: The running total of frequencies. Useful for finding the number
of observations below the upper limit of a class (e.g., in the example, 50 stores have
monthly sales below ₹90 lakhs).
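The four steps above can be sketched in Python as follows. The 20 sales figures are hypothetical (the raw data behind the 100-store example is not given), and rounding the class width up to a multiple of 5 is just one possible reading of "a convenient number".

```python
import math


def grouped_frequency(data, step=5):
    """Build rows of (lower, upper, frequency, relative, cumulative)."""
    n = len(data)
    # Step 1: square-root heuristic, kept within the 5-to-20 guideline.
    k = min(20, max(5, math.ceil(math.sqrt(n))))
    # Step 2: raw width (range / classes), rounded up to a multiple of `step`.
    lo, hi = min(data), max(data)
    width = max(step, math.ceil((hi - lo) / k / step) * step)
    # Step 3: non-overlapping classes [lower, upper) starting at a round number.
    start = math.floor(lo / step) * step
    while start + k * width <= hi:  # add a class if the maximum is not covered
        k += 1
    # Step 4: tally each value into its class, then accumulate.
    freqs = [0] * k
    for x in data:
        freqs[int((x - start) // width)] += 1
    table, running = [], 0
    for i, f in enumerate(freqs):
        running += f
        lower = start + i * width
        table.append((lower, lower + width, f, f / n, running))
    return table


# Hypothetical monthly sales (₹ lakhs) for 20 stores; illustrative only.
sales = [52, 55, 61, 64, 68, 71, 73, 75, 78, 80,
         82, 85, 88, 91, 94, 97, 103, 108, 115, 124]

table = grouped_frequency(sales)
for lower, upper, f, rel, cum in table:
    print(f"{lower:>4} - < {upper:<4}{f:>4}{rel:>7.2f}{cum:>5}")
```

For this data the sketch produces five classes of width 15, from 50 to 125, with frequencies 4, 5, 6, 3 and 2. Half-open classes (lower limit included, upper limit excluded) follow the "50 - < 70" convention used in the example.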
2.4 Visual Tools for Data Presentation
Visual tools make data more accessible, engaging, and easier to interpret, revealing patterns,
trends, and outliers that might be hidden in tables.
2.4.1 Bar Charts (or Column Charts):
o Purpose: To compare quantities of different categories or to show changes over
time (discrete intervals). Best for displaying qualitative or discrete quantitative
data.
o Features: Rectangular bars of equal width, where the length (or height) of each
bar is proportional to the value it represents. Bars can be vertical (column chart)
or horizontal (bar chart).
o Example Use: Comparing sales across different product lines, showing customer
counts by region.
2.4.2 Pie Charts:
o Purpose: To show the proportion of each category relative to the whole. Best for
displaying parts of a whole for qualitative data.
o Features: A circle divided into slices, where each slice represents a category's
proportion. The sum of all slices must equal 100%.
o Limitations: Not ideal for many categories (too many small slices), or for
comparing precise values between categories. Hard to compare multiple pie
charts.
o Example Use: Market share percentages of different companies, breakdown of
budget allocation.
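To make the "parts of a whole" idea concrete, the sketch below converts category values into percentage shares and the corresponding slice angles (share of 360 degrees). The budget figures and the function name `pie_slices` are hypothetical.

```python
def pie_slices(values):
    """Return (label, share in %, central angle in degrees) per category."""
    total = sum(values.values())
    return [(label, 100 * v / total, 360 * v / total)
            for label, v in values.items()]


# Hypothetical budget allocation (₹ lakhs); the figures are illustrative only.
budget = {"Marketing": 40, "Operations": 30, "R&D": 20, "Admin": 10}

slices = pie_slices(budget)
for label, pct, angle in slices:
    print(f"{label:<12}{pct:>6.1f}%{angle:>8.1f} deg")
```

The shares add up to 100% and the angles to 360 degrees, which is why a pie chart only makes sense when the categories together form a complete whole.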
2.4.3 Line Charts (or Line Graphs):
o Purpose: To display trends or changes in data over a continuous period of time.
Best for time series data (continuous quantitative data plotted against time).
o Features: Data points are connected by lines. The horizontal axis (X-axis)
typically represents time, and the vertical axis (Y-axis) represents the measured
value.
o Example Use: Showing monthly sales trends, stock price fluctuations over a year,
website traffic over time.
2.4.4 Histograms:
o Purpose: To display the distribution of continuous quantitative data. It shows
the shape, spread, and center of the data.
o Features: Similar to a bar chart, but the bars represent continuous data grouped
into "bins" or "classes." The bars touch each other, indicating continuity. The
x-axis represents the data range (classes), and the y-axis represents the frequency
or relative frequency.
o Example Use: Distribution of customer ages, frequency of different delivery
times, spread of employee salaries.
2.4.5 Other Visual Tools (brief mention for context):
o Frequency Polygons: Similar to histograms but use points connected by lines to
represent class frequencies.
o Ogive (Cumulative Frequency Curve): Plots cumulative frequencies, useful for
finding percentiles.
o Scatter Plots: (Detailed in Module 10) Shows relationship between two
quantitative variables.
o Box Plots: (Briefly mentioned in Module 4) Shows distribution, median,
quartiles, and outliers.
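As an illustration of how an ogive is used to find percentiles, the sketch below reads a percentile off the cumulative frequency curve by linear interpolation, using the class boundaries and frequencies from the Monthly Sales example in 2.3.2. It assumes observations are spread evenly within each class, so the results are estimates; the function name is an illustrative choice.

```python
def percentile_from_ogive(boundaries, frequencies, p):
    """Estimate the p-th percentile by linear interpolation along the ogive.

    boundaries: class boundaries (one more entry than there are classes).
    frequencies: class frequencies, in the same order.
    Assumes values are spread evenly within each class.
    """
    total = sum(frequencies)
    target = p / 100 * total        # cumulative count to reach
    cumulative = 0
    for i, f in enumerate(frequencies):
        if f > 0 and cumulative + f >= target:
            lower, upper = boundaries[i], boundaries[i + 1]
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    return boundaries[-1]


# Monthly Sales example: classes 50-70, 70-90, 90-110, 110-130 (₹ lakhs).
boundaries = [50, 70, 90, 110, 130]
frequencies = [15, 35, 40, 10]

median = percentile_from_ogive(boundaries, frequencies, 50)
p75 = percentile_from_ogive(boundaries, frequencies, 75)
print(median, p75)  # 90.0 102.5
```

The estimated median is ₹90 lakhs because the cumulative frequency reaches 50 of 100 stores exactly at the upper boundary of the 70 - < 90 class.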
2.5 Using Excel for Charts
Excel is a powerful and widely used tool for creating professional and informative charts from
your data.
2.5.1 Data Preparation:
o Ensure your data is organized in clear columns and rows, with appropriate
headers.
o Clean data: Remove duplicates, correct errors, handle missing values.
2.5.2 Steps to Create a Basic Chart in Excel:
1. Select Data: Highlight the range of cells containing the data you want to chart,
including headers for labels.
2. Go to 'Insert' Tab: In the Excel ribbon, click on the "Insert" tab.
3. Choose Chart Type: In the 'Charts' group, select the desired chart type (e.g.,
Column, Bar, Pie, Line, Histogram). Excel also offers "Recommended Charts"
which can suggest suitable charts for your data.
4. Chart Elements: Once the chart is inserted, you can customize it using:
Chart Title: Give a clear, descriptive title.
Axis Titles: Label X and Y axes appropriately (e.g., "Month," "Sales
Revenue (₹)").
Data Labels: Show the actual values on or next to the bars/slices/points.
Legend: If you have multiple data series.
Gridlines: For easier reading of values.
Data Table: Display the underlying data within the chart.
5. Formatting Options:
Chart Design Tab: Change chart styles, colors, layouts quickly.
Format Pane (Right-Click): Right-click on any chart element (axis,
series, plot area) and select "Format..." to open a detailed formatting pane
for precise control over colors, fonts, borders, fills, etc.
2.5.3 Creating a Histogram in Excel:
o Method 1 (Data Analysis ToolPak):
1. Go to 'File' > 'Options' > 'Add-ins'. In the 'Manage' box, select 'Excel
Add-ins' and click 'Go...'.
2. Check 'Analysis ToolPak' and click 'OK'.
3. Go to 'Data' tab > 'Data Analysis' (in the 'Analysis' group).
4. Select 'Histogram' from the list and click 'OK'.
5. Input Range: Select your data.
6. Bin Range: (Optional but recommended) Select a column listing the upper
limit of each class interval. If left blank, Excel creates evenly spaced
bins automatically.
7. Output Range: Choose where to put the frequency table and chart.
8. Check 'Chart Output'.
o Method 2 (Excel 2016 and later): Select your data > 'Insert' > 'Insert Statistic
Chart' > 'Histogram'. This automatically creates bins, which you can then
customize by right-clicking the horizontal axis and selecting 'Format Axis'.
2.5.4 Tips for Effective Chart Design:
o Clarity: Charts should be easy to understand at a glance.
o Accuracy: Represent data faithfully, avoid distorting scales.
o Simplicity: Avoid unnecessary clutter (3D effects, excessive colors).
o Appropriate Chart Type: Choose the chart that best tells the story of your data.
o Highlight Key Insights: Use colors or annotations to draw attention to important
points.