Unit 4 - Data Management

Unit 4 focuses on data management, covering data gathering, organization, presentation, and interpretation. It outlines the importance of confidentiality in data collection, the roles of the Philippine Statistics Authority, and the classification of data into qualitative and quantitative types. Additionally, it discusses methods of data presentation and measures of central tendency and dispersion, providing examples and interpretations for better understanding.

Uploaded by

Sheena Seguritan
Copyright
© All Rights Reserved

Unit 4:

MMW

DATA
MANAGEMENT

GIRLIE DAMASCO - BALISI


Topic 1: Data Gathering, Organization,
Presentation and Interpretation
Learning Objectives:

Upon the completion of this topic, you are expected to:


a. summarize and present data using the different methods of data
presentation;
b. construct graphs and tables to present given data; and
c. interpret the data presented.
In statistical activities, facts are collected from respondents to
obtain aggregate information, but confidentiality must be
protected. Agencies mandated to collect data are bound by law
to protect the confidentiality of information provided by
respondents. Market research organizations in the private
sector and individual researchers also guard confidentiality,
as they merely want to obtain aggregate data. This way,
respondents can be truthful in giving information, and the
researcher can commit to respondents that the data they
provide will never be released to anyone in a form that
identifies them without their consent.
SAMPLE DATA COLLECTION
DATA COLLECTION (continuation…)
PSA is the government agency mandated to conduct censuses
and surveys. Through Republic Act 10625 (also referred to as
The Philippine Statistical Act of 2013), PSA was created from
four former government statistical agencies, namely: National
Statistics Office (NSO), National Statistical Coordination Board
(NSCB), Bureau of Labor and Employment Statistics (BLES)
and Bureau of Agricultural Statistics (BAS). The other agency
created through RA 10625 is the Philippine Statistical Research
and Training Institute (PSRTI) which is mandated as the
research and training arm of the Philippine Statistical System.
PSRTI was created from its forerunner the former Statistical
Research and Training Center (SRTC).
CONCEPTUALIZATION OF DATA
DATA

Although the collection is composed of numbers and symbols that
could be classified as numeric or non-numeric, the collection has no
meaning or it is not contextualized, hence it cannot be referred to as
data.
Data are facts and figures that are presented, collected
and analyzed. Data are either numeric or non-numeric
and must be contextualized.
CONCEPTUALIZATION OF DATA (continuation…)

To contextualize data, that is, to put meaning on the data, we must
know the following six W’s of the data:
1. Who? Who provided the data?
2. What? What information was obtained from the respondents, and
what is the unit of measurement used for each piece of
information (if any)?
3. When? When was the data collected?
4. Where? Where was the data collected?
5. Why? Why was the data collected?
6. HoW? HoW was the data collected?


CONCEPTUALIZATION OF DATA (continuation…)
Once the data are contextualized, the collection of numbers and
symbols now has meaning. It may look like the following, which
is just a small part of the data collected in the earlier activity.
BROAD CLASSIFICATION OF VARIABLES/DATA
QUALITATIVE
Qualitative variables express a categorical attribute,
such as sex (male or female), religion, marital status,
region of residence, or highest educational attainment.
Qualitative variables do not strictly take on numeric
values (although we can have numeric codes for
them, e.g., for the sex variable, 1 and 2 may refer to
male and female, respectively). Qualitative data
answer the question “what kind.” Sometimes, there is a
sense of ordering in qualitative data, e.g., income
data grouped into high, middle and low-income
status. Data on sex or religion do not have a sense
of ordering, as there is no such thing as a weaker or
stronger sex, or a better or worse religion.
Qualitative variables are sometimes referred to as
categorical variables.
QUANTITATIVE
Quantitative (otherwise called numerical) data, whose sizes are
meaningful, answer questions such as “how much” or “how many”.
Quantitative variables have actual units of measure. Examples of
quantitative variables include the height, weight, number of registered
cars, household size, and total household expenditures/income of
survey respondents. Quantitative data may be further classified into:
a. Discrete data are those data that can be counted, e.g., the
number of days for cellphones to fail, the ages of survey
respondents measured to the nearest year, and the number of
patients in a hospital. These data assume only a countable
(finite or countably infinite) number of values.
b. Continuous data are those that can be measured, e.g., the
exact height of a survey respondent and the exact volume of
some liquid substance. The possible values are uncountably
infinite.
EXAMPLES
A survey of students in a certain school is conducted. The survey questionnaire
details the information on the following variables. For each of these variables,
identify whether the variable is qualitative or quantitative, and if the latter, state
whether it is discrete or continuous.

❑ number of family members who are working
❑ ownership of a cell phone among family members
❑ length (in minutes) of longest call made on each cell phone owned per month
❑ ownership/rental of dwelling
❑ amount spent in pesos on food in one week
❑ occupation of household head
❑ total family income
❑ number of years of schooling of each family member
❑ access of family members to social media
❑ amount of time last week spent by each family member using the internet
EXAMPLES
Levels of
Measurement
NOMINAL LEVEL of measurement arises when we have variables that are
categorical and non-numeric or where the numbers have no sense of ordering.
As an example, consider the numbers on the uniforms of basketball players. Is the
player wearing a number 7 a worse player than the player wearing number 10?
Maybe, or maybe not, but the number on the uniform does not have anything to do
with their performance. The numbers on the uniform merely help identify the
basketball player. Other examples of the variables measured at the nominal level
include sex, marital status, religious affiliation.
ORDINAL LEVEL also deals with categorical variables like the nominal level,
but in this level ordering is important, that is, the values of the variable
could be ranked.
Examples of the ordinal scale include socioeconomic status (A to E, where A is
wealthy, E is poor), difficulty of questions in an exam (easy, medium, difficult),
rank in a contest (first place, second place, etc.), and perceptions in Likert scales.
INTERVAL LEVEL tells us that one unit differs by a certain amount or degree
from another unit. Knowing how much one unit differs from another is an
additional property of the interval level, on top of the properties possessed
by the ordinal level.
When measuring temperature in Celsius, a 10-degree difference has the same
meaning anywhere along the scale: the difference between 10 and 20 degrees
Celsius is the same as that between 80 and 90 degrees Celsius. But we cannot
say that 80 degrees Celsius is twice as hot as 40 degrees Celsius, since there
is no true zero, only an arbitrary zero point.
INTERVAL LEVEL (continuation)
A measurement of 0 degrees Celsius does not reflect a true "lack of temperature."
Thus, the Celsius scale is at the interval level. Another example of a variable
measured at the interval level is the Intelligence Quotient (IQ) of a person. We can
tell not only which person ranks higher in IQ but also how much higher he or she
ranks than another, but zero IQ does not mean no intelligence. Students could also
be classified or categorized according to their IQ level.
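The Celsius argument above can be checked with a little arithmetic. The sketch below is only an illustration: the Kelvin scale is not mentioned in the slides and is brought in here as an assumption, because its zero point is absolute zero, and the function name celsius_to_kelvin is made up for this example.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius to Kelvin, a scale whose zero is absolute zero."""
    return celsius + 273.15

# Differences are meaningful on an interval scale: both gaps are 10 degrees.
gap_low = 20 - 10
gap_high = 90 - 80

# Ratios are not meaningful on the Celsius scale. Numerically 80 / 40 is 2,
# but the same two temperatures on the Kelvin scale differ by a much
# smaller factor, so "twice as hot" is not a valid reading.
celsius_ratio = 80 / 40
kelvin_ratio = celsius_to_kelvin(80) / celsius_to_kelvin(40)

print("Equal gaps:", gap_low == gap_high)
print("Ratio of Celsius readings:", celsius_ratio)
print("Ratio of Kelvin readings:", round(kelvin_ratio, 2))
```

The ratio changes when the arbitrary zero point is replaced, while the 10-degree gaps stay equal. This is the sense in which Celsius supports differences but not ratios.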
RATIO LEVEL also tells us that one unit has so many times as much of the
property as does another unit. The ratio level possesses a meaningful (unique
and non-arbitrary) absolute, fixed zero point and allows all arithmetic
operations. The existence of this true zero point is the only difference
between the ratio and interval levels of measurement.
Examples of the ratio scale include mass, height, weight, energy and electric
charge. The size of one unit is the same at any point on the scale, and a
measurement of 0 reflects a complete absence of the quantity, e.g., a complete
lack of mass. Amount of money is also at the ratio level.
In summary:
Examples:
METHODS OF COLLECTING DATA
Variables are observed or measured using any of three methods of data
collection, namely: objective, subjective and use of existing records. The
objective and subjective methods obtain the data directly from the source.
The former uses any one or a combination of the five senses (sight, touch,
hearing, taste and smell) to measure the variable, while the latter obtains
data by getting responses through a questionnaire. The data resulting from
these two methods of data collection are referred to as PRIMARY DATA.
METHODS OF COLLECTING DATA (continuation…)
On the other hand, SECONDARY DATA are obtained through the use of
existing records or data collected by other entities for certain purposes. For
example, when we use data gathered by the Philippine Statistics Authority,
we are using secondary data and the method we employ to get the data is the
use of existing records. Other data sources include administrative records,
news articles, internet, and the like.
DATA
PRESENTATION
METHODS OF DATA PRESENTATION

1. textual or narrative;
2. tabular; and
3. graphical method of presentation
TEXTUAL OR PARAGRAPH OR NARRATIVE FORM
Describes the data by enumerating some of the highlights of the data set,
like giving the highest, lowest or the average values. In case there are
only a few observations, say fewer than ten, the values could be
enumerated if there is a need to do so.
EXAMPLE:
The country’s poverty incidence among families as reported by the
Philippine Statistics Authority (PSA), the agency mandated to release
official poverty statistics, decreased from 21% in 2006 down to 19.7% in
2012. For 2012, the regional estimates released by PSA indicate that the
Autonomous Region of Muslim Mindanao (ARMM) is the poorest region with
poverty incidence among families estimated at 48.7%. The region with the
smallest estimated poverty incidence among families at 2.6% is the
National Capital Region (NCR).
TABULAR METHOD
Data could also be summarized or presented using tables. The tabular
method of presentation is applicable for large data sets. Trends could
easily be seen in this kind of presentation. However, there is a loss of
information when using such kind of presentation. The frequency
distribution table is the usual tabular form of presenting the
distribution of the data.
The following are the common parts of a statistical table:
❑ Table title includes the number and a short description of what is
found inside the table.
❑ Column header provides the label of what is being presented in a column.
❑ Row header provides the label of what is being presented in a row.
❑ Body is the information in the cells intersecting the rows and the
columns.
In general, a table should have at least three rows and/or three columns.
However, conveying too much information in one table is also not
advisable. Tables are usually used in written technical reports and in
oral presentations.
EXAMPLE OF
TABULAR
METHOD
GRAPHICAL PRESENTATION
Graphical presentation is a visual
presentation of the data. Graphs are
commonly used in oral presentation.
There are several forms of graphs to use
like the pie chart, pictograph, bar graph,
line graph, histogram and box-plot. Which
form to use depends on what information
is to be relayed.
BAR GRAPH
PIE CHART
LINE GRAPH
SCATTER GRAPH
THE FREQUENCY
DISTRIBUTION TABLE
AND HISTOGRAM
Frequency Distribution Table (FDT) and Histogram
These are used to depict the distribution of the data. Most of the time, they are
used in technical reports. An FDT is a presentation containing non-overlapping
categories or classes of a variable and the frequencies or counts of the
observations falling into those categories or classes. There are two types of FDT
according to the type of data being organized: a qualitative FDT and a
quantitative FDT. For a qualitative FDT, the non-overlapping categories of the
variable are identified, and the frequencies, as well as the percentages of
observations falling into the categories, are computed. A quantitative FDT, on
the other hand, comes in two types: ungrouped and grouped.
An ungrouped FDT is constructed when there are only a few observations or when
the data set contains only a few possible values. A grouped FDT is constructed
when there is a large number of observations and the data set involves many
possible values. The distinct values are grouped into class intervals.
The creation of columns for a grouped FDT follows a set of guidelines.
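As a rough illustration of the two kinds of FDT described above, the sketch below builds a qualitative FDT (categories, counts and percentages) and a grouped quantitative FDT. Everything in it is an assumption made for illustration: the function names, the small data sets (which are made up, not taken from any survey), and the choice of equal-width classes that include the lower limit and exclude the upper limit. The guidelines referred to in the slides are not reproduced here.

```python
from collections import Counter

def qualitative_fdt(values):
    """Return (category, frequency, percentage) rows, most frequent first."""
    counts = Counter(values)
    total = len(values)
    return [(cat, freq, 100 * freq / total) for cat, freq in counts.most_common()]

def grouped_fdt(values, width, start):
    """Return (lower, upper, frequency) rows for classes [lower, upper).

    Assumes every value is at least `start` and that `width` is positive.
    """
    counts = Counter(int((v - start) // width) for v in values)
    rows = []
    for k in range(max(counts) + 1):
        lower = start + k * width
        rows.append((lower, lower + width, counts.get(k, 0)))
    return rows

# Made-up illustrative data
marital_status = ["single", "married", "single", "widowed", "married", "single"]
scores = [12, 25, 31, 18, 44, 27, 39, 22, 15, 33]

qual_table = qualitative_fdt(marital_status)
group_table = grouped_fdt(scores, width=10, start=10)

for category, freq, pct in qual_table:
    print(f"{category:10s} {freq:3d} {pct:6.1f}%")
for lower, upper, freq in group_table:
    print(f"{lower:3d} to under {upper:3d}: {freq}")
```

The frequencies in each table add up to the number of observations, which is a quick check that the classes do not overlap and that no observation was left out.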
Example:
FDT and its corresponding histogram of the 2012 estimated poverty incidences of
144 municipalities and cities of Region VIII are shown below:
Topic 2:
MEASURE OF
CENTRAL
LOCATION
Unit Learning Outcomes:
By the end of the unit, the students must be able to:
a. Compute the mean, median and mode of a set of data.
b. Give a sound interpretation of the results of each measure.
Activating Prior Learning
Given the data below, answer the following questions:
1. What is the highest family income? How about the lowest?
2. What is the average monthly family income?
3. If you arrange the income of each family from lowest to highest, what
family income is in the center of the distribution?
4. What family income is the most common among all the
households?
Central tendency can be measured in three ways: Mean or Arithmetic
mean, Median and Mode.

What is Mean?
Mean is simply the average score of a distribution. Your answer in Question 2 is called
the mean of the distribution (data) given.

What is Median?
Median is the center or the middle score within a distribution. Your answer in Question
3 is called the median of the distribution (data) given.

What is Mode?
Mode is the most frequent score within a distribution. Your answer in Question 4 is
called the mode of the distribution (data) given.
A distribution with a single mode is called unimodal. A distribution with two modes
is called bimodal, and a distribution with more than two modes is called multimodal.

In a normal distribution the mean, median and mode are the same.
Example Dataset:
Dataset = [10,15,20,25,30,35,40,45,50]

1. Mean:
The mean is the average of the data points. In this dataset:
Mean = (10 + 15 + 20 + 25 + 30 + 35 + 40 + 45 + 50) / 9 = 270 / 9 = 30

Interpretation: The average value of the dataset is 30. This suggests
that the values in the dataset are centered around 30.

2. Median:
The median is the middle value when the data is arranged in order. In this dataset:
Median = 30

Interpretation: The middle value of this dataset is 30. Half of the data points are
below 30, and half are above 30.

3. Mode:
The mode is the value that appears most frequently. In this dataset:
Mode = No mode (or all values have equal frequency)

Interpretation: There is no value that appears more frequently than others. Each
value in the dataset appears once, so there is no mode.
Summary of Interpretation:

Mean (Average): The average value is 30, suggesting that the data is
centered around 30.

Median (Middle Value): The middle value is 30, indicating that half of the
data points are below 30 and half are above 30.

Mode (Most Frequent Value): In this case, there is no mode as each value
appears once. If there were repeating values, the mode would
represent the most frequent value(s) in the dataset.
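The three measures for the example dataset can be reproduced with Python's standard statistics module. This is a minimal sketch, not part of the original slides. The helper name modes_or_none is made up, and it encodes one assumed convention: when every distinct value occurs equally often, the dataset is reported as having no mode.

```python
from statistics import mean, median, multimode

def modes_or_none(data):
    """Return the list of modes, or an empty list when all values are tied."""
    modes = multimode(data)
    distinct = set(data)
    if len(distinct) > 1 and len(modes) == len(distinct):
        return []  # every value is equally frequent: report no mode
    return modes

dataset = [10, 15, 20, 25, 30, 35, 40, 45, 50]

dataset_mean = mean(dataset)        # sum of the values divided by their count
dataset_median = median(dataset)    # middle value of the sorted data
dataset_modes = modes_or_none(dataset)

print("Mean:", dataset_mean)
print("Median:", dataset_median)
print("Mode:", dataset_modes if dataset_modes else "no mode")
```

With a repeated value, for example modes_or_none([10, 15, 15, 20]), the helper returns [15], which matches the remark that the mode is the most frequent value.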
When is it best to use mean? How about median? Mode?
The diagram below will guide you in determining which measure of
central location will be most appropriate for the data set:
Topic 3:
MEASURES OF
DISPERSION
Unit Learning Outcomes:
By the end of the unit, the students must be able to:
a. Calculate measures of dispersion.
b. Know the strengths and limitations of these measures.
c. Give sound interpretation of these measures.
A measure of variation is a statistical measurement intended to provide
information on how a set of scores is distributed. Measures of
variation can be divided into two groups. The first group measures
variation in terms of the distance from the lowest/smallest score to the
highest/largest score. These measures include the range, the interquartile
range and the semi-interquartile range (SIQR). The second group measures
variation in terms of each score’s deviation from the mean. These
measures include the variance and the standard deviation.
RANGE
The range is the simplest measure of dispersion. It is the distance between the
lowest/smallest data point and the highest/largest data point.

To find the range, we must first arrange the data from lowest to highest, then subtract
the smallest value from the largest value in the data set.

Illustration 1:
Find the range of the following set of data:
76, 81, 86, 78, 79, 73, 83, 86, 84

First, let us arrange the data from lowest to highest, that is:
73, 76, 78, 79, 81, 83, 84, 86, 86
Then, subtract the smallest value from the largest value:
Range = 86 – 73 = 13

The range is, therefore, 13.
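The same steps can be written as a short Python sketch using the data of Illustration 1. The variable names are chosen only for this example.

```python
data = [76, 81, 86, 78, 79, 73, 83, 86, 84]

# Step 1: arrange the data from lowest to highest.
ordered = sorted(data)

# Step 2: subtract the smallest value from the largest value.
data_range = ordered[-1] - ordered[0]

print("Ordered data:", ordered)
print("Range:", data_range)
```

Sorting is not strictly needed for the computation, since max(data) - min(data) gives the same result, but it mirrors the procedure described in the illustration.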


VARIANCE

Variance indicates the relationship between the mean of a distribution and the data points.
It is the measure of dispersion that accounts for the deviation of each observation from the
mean. The variance is computed by getting the difference between every data point and
the mean, squaring them, summing them up and taking the average of these numbers.
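The verbal description above can be written as a formula. The symbols are assumptions introduced here, since the slide text gives no notation: N is the number of data points, x_i is the i-th data point, and μ is the mean. Taking "the average of these numbers" is read as dividing by N. Many textbooks divide by N − 1 instead when the data are a sample, and the slides do not say which version is intended.

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i,
\qquad
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} \left(x_i - \mu\right)^2
```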
STANDARD DEVIATION

Standard deviation measures how far the data values are from their mean, that is,
whether the data points lie close to or far from the mean. It provides a numerical
measure of the variation in a data set.

Standard deviation is the square root of variance.

When the result is a low standard deviation, your data points tend to be close to the mean.
On the other hand, if the result is high, the data points are spread out over a wide range.
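As a worked sketch, the variance and standard deviation can be computed step by step for the data used in Illustration 1 of the range. Reusing that data set here is a choice made for this example. The first computation divides by the number of data points, following the description "taking the average". The second uses the statistics.variance function, which divides by one less than the number of data points (the sample version); the slides do not state which version is intended.

```python
from math import sqrt
from statistics import pvariance, variance

data = [76, 81, 86, 78, 79, 73, 83, 86, 84]

# Difference between every data point and the mean, squared.
mean = sum(data) / len(data)
squared_deviations = [(x - mean) ** 2 for x in data]

# Average of the squared deviations (division by N), then its square root.
population_variance = sum(squared_deviations) / len(data)
population_sd = sqrt(population_variance)

# Sample version (division by N - 1), shown for comparison.
sample_variance = variance(data)
sample_sd = sqrt(sample_variance)

print("Variance (divide by N):", round(population_variance, 2))
print("Standard deviation (divide by N):", round(population_sd, 2))
print("Variance (divide by N - 1):", round(sample_variance, 2))
print("Standard deviation (divide by N - 1):", round(sample_sd, 2))
```

The step-by-step result for the division-by-N version agrees with statistics.pvariance(data), which can serve as a check on hand computations.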
In choosing the measure of variability to be used, take
note of the following:

▪ Range – extremely sensitive to outliers
▪ Standard Deviation – very sensitive to outliers
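The effect of an outlier on both measures can be seen in a small experiment. The sketch below is an assumption-laden illustration: it takes the data from the range illustration and appends one made-up extreme value (150) that is not in the original slides. The standard deviation shown is the version that divides by the number of data points.

```python
from statistics import pstdev

def value_range(values):
    """Distance between the smallest and the largest value."""
    return max(values) - min(values)

scores = [73, 76, 78, 79, 81, 83, 84, 86, 86]
with_outlier = scores + [150]  # 150 is a hypothetical outlier

range_before = value_range(scores)
range_after = value_range(with_outlier)
sd_before = pstdev(scores)
sd_after = pstdev(with_outlier)

print("Range:", range_before, "->", range_after)
print("Standard deviation:", round(sd_before, 2), "->", round(sd_after, 2))
```

A single extreme value is enough to enlarge both measures, and the range depends on that one value entirely, which is why both should be interpreted with care when outliers are present.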
-----End-----
