Presentation 11 Data-Processing Sum24

The document outlines the process of data processing, starting from raw data collection to its presentation. It details steps such as data scrutiny, arrangement, coding, classification, and various methods of data presentation including textual, tabular, and diagrammatic forms. Additionally, it emphasizes the importance of choosing the appropriate presentation method based on the data type and audience.

Uploaded by

Raghib Ashab

Processing of Data

Source: https://images.app.goo.gl/AmxcbiEL4qXU6ZkAA
Dr. JAHIDUL HASSAN
Professor, Department of Horticulture,
BSMRAU
Processing of Information/ Data
The information/data collected or collated from primary or secondary sources
at the initial stage is known as raw data. Raw data are simply the observations
recorded from individual units. Raw data, particularly primary data, convey
little unless they are arranged in order or processed. Data need to be
processed and analyzed as required by the research problem at hand. Working
with data starts with the scrutiny of data, sometimes also called editing of
data. Several steps are followed before a data set is analyzed in line with
the objectives of a particular research program.
Though the order of the steps is not unique and may change according to the
need and objective of a study, the following steps are generally followed:
(1) scrutiny/editing of data,
(2) arrangement of data,
(3) coding of data,
(4) classification of data, and
(5) presentation of data.

The first three steps (scrutiny, arrangement, and coding of data) may change
order depending on the situation. If the observations are few, one can begin
with scrutiny; otherwise, it is better to first arrange the data in ascending
or descending order.
Scrutiny and Arrangement of Data
The raw data set is examined carefully to find any abnormal or doubtful
observations, to detect errors and omissions, if any, and to rectify them.
Editing/scrutiny of data ensures the accuracy, uniformity, and consistency of
data. If the observations are few in number, scrutiny alone gives an overall
idea of the information collected or collated. If the number of observations
is large, one may first arrange the observations in ascending or descending
order and then scrutinize them. Scrutiny and arrangement of data give
preliminary knowledge about the nature of the data set, which may not be
possible otherwise (particularly when the number of observations is large).
Advantages:
Either scrutiny followed by arrangement (for a small number of observations) or
arrangement followed by scrutiny helps to establish:
1. The maximum and minimum values of the observations.
2. Whether the values of the observations are consistent with the area of interest
under consideration, that is, whether the data set may contain outliers.
3. Whether the data set can be used for further analysis towards fulfilling the
objective of the study.
Data can be arranged using the SORT command in MS Excel or a similar command in
other software.
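The arrangement and scrutiny steps above can be sketched in a few lines of Python. The observations and the plausible range below are illustrative assumptions, not values from the lecture; the point is only that sorting exposes the minimum, the maximum, and any doubtful values.

```python
# Hypothetical observations; the values are illustrative only.
observations = [45.2, 12.8, 87.5, 33.1, 140.0, 60.4]

# Arrangement: ascending order, analogous to a spreadsheet SORT.
arranged = sorted(observations)

# Scrutiny: after sorting, the extremes are the first and last entries.
minimum, maximum = arranged[0], arranged[-1]

# Assumed plausible range for the area of interest; set this per study.
PLAUSIBLE_LOW, PLAUSIBLE_HIGH = 10.0, 90.0
doubtful = [x for x in arranged if not PLAUSIBLE_LOW <= x <= PLAUSIBLE_HIGH]

print("arranged:", arranged)
print("min:", minimum, "max:", maximum)
print("doubtful observations to recheck:", doubtful)
```

A value flagged as doubtful is only a candidate for rechecking against the source record; whether it is an error or a genuine outlier remains a judgment for the researcher.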
Coding of Data
Sometimes the information collected may be qualitative in nature like
male/female, black/yellow/white/green, determinate/indeterminate, and
educated/illiterate. Coding refers to the process of assigning numerals or
other symbols to the responses so that these could be categorized.

Codes should be assigned so that the categories do not overlap and every
observation falls into one of the categories framed for the purpose. That is,
the categories should be mutually exclusive and exhaustive. Generally,
numerical information does not require coding. Coding helps researchers
understand the data in a more meaningful way.
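As a minimal sketch of coding, the example below maps qualitative responses to numerals. The code book and the function name `code_responses` are hypothetical. Each category gets exactly one code (exclusive), and any response without a category raises an error so that gaps in the code book (non-exhaustive categories) are noticed.

```python
# Hypothetical code book: one numeral per category (mutually exclusive).
CODE_BOOK = {"male": 1, "female": 2}


def code_responses(responses, code_book):
    """Return the numeric code of each response.

    Raises ValueError for a response with no category, which signals
    that the code book is not exhaustive.
    """
    coded = []
    for response in responses:
        key = response.strip().lower()  # tolerate case and stray spaces
        if key not in code_book:
            raise ValueError(f"uncoded response: {response!r}")
        coded.append(code_book[key])
    return coded


responses = ["Male", "female", "female ", "male"]
coded = code_responses(responses, CODE_BOOK)
print(coded)
```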
Classification/Grouping
While dealing with a huge number of observations, it is sometimes very difficult to
get a concise idea of the information collected. A natural first step is a logical
classification (formation of groups) of the observations according to some common
characteristic(s).
The first question in classification is how many classes one should make. There is
no hard-and-fast rule for fixing the number of classes. However, the general
guidelines given below are followed while making the classes:
(a) Classes should be well defined and exhaustive.
(b) Classes should not be overlapping.
(c) Classes should be of equal width as far as possible.
(d) The number of classes should not be too few or too many.
(e) Classes should be devoid of an open-ended limit.
(f) Classes should be framed in such a way that each and every class should have
some observation.
While deciding the number of classes or groups, the general idea is to have minimum variations
among the observations of a particular class/group and maximum variation among the
groups/classes.
Following the above guidelines and the formulae given below, classes may be formed.
1. Yule formula: $K = 2.5 \times N^{1/4}$
2. Sturges formula: $K = 1 + 3.322 \log_{10} N$,
where N is the number of observations and K is the number of classes.
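The two formulae can be evaluated directly. The sketch below is illustrative only; the function names are hypothetical, and N = 130 is chosen because the figures in this presentation refer to 130 varieties of rice.

```python
import math


def yule_classes(n):
    """Yule formula: K = 2.5 * N^(1/4)."""
    return 2.5 * n ** 0.25


def sturges_classes(n):
    """Sturges formula: K = 1 + 3.322 * log10(N)."""
    return 1 + 3.322 * math.log10(n)


n = 130
print("Yule:   ", round(yule_classes(n), 2))
print("Sturges:", round(sturges_classes(n), 2))
```

Both formulae give a fractional K, which is then rounded to a whole number of classes; for N = 130 both round to 8.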

Generally, the range of data can be extended on both sides (i.e., at the lower end as well as at the
upper end) so as to (1) make the number of classes a whole number and (2) avoid a fractional class
width. For example, in the paddy yield data, the maximum value is 87.5 and the minimum is
12.8. With 8 classes, the class width is about 9.34 by the following formula:
(87.5 - 12.8)/8 = 9.3375 [calculate the range of the entire data set by subtracting the lowest value
from the highest, divide it by the number of classes, and round this number up (usually to the
nearest whole number)].
A fractional class width is inconvenient for further mathematical calculation. To avoid this, one
can extend the data range on both sides to 10 and 90, respectively, for the lower and upper
sides, thus making the class width (90 - 10)/8 = 10 a whole number. It should be emphasized
that making the class width a whole number is not compulsory; one can very well use a
fractional class width also.
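The class width arithmetic for the paddy yield example can be checked with a short sketch. The helper name `class_width` is hypothetical; the numbers (12.8, 87.5, 8 classes, and the widened range 10 to 90) are those of the example.

```python
def class_width(low, high, k):
    """Class width = range of the data divided by the number of classes."""
    return (high - low) / k


raw_width = class_width(12.8, 87.5, 8)  # fractional width from the raw range
widened = class_width(10, 90, 8)        # whole-number width after widening

# Class edges for the widened range: 10, 20, ..., 90.
edges = [10 + i * int(widened) for i in range(8 + 1)]

print(round(raw_width, 4), widened)
print(edges)
```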
Method of Classification
When both the upper limit and the lower limit of a class are included in the
class, it is known as the inclusive method of classification. When one of these
limits is not included in the respective class, it is known as the exclusive
method of classification.
Whether the data are discrete or continuous, statistical measures computed in
subsequent analysis may take fractional values. For this reason, discrete
classes are generally made continuous by subtracting "d/2" from the lower class
limit and adding "d/2" to the upper class limit, where "d" is the difference
between the upper limit of a class and the lower limit of the following class.
The class limits constructed in this way are known as the lower class boundary
and the upper class boundary, respectively, of a continuous distribution. The
class width or class interval is equal to the difference between the upper class
boundary and the lower class boundary of the class.
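The d/2 adjustment can be sketched as follows. The inclusive classes 4-6, 7-9, and 10-12 are hypothetical, and the helper `to_boundaries` assumes the gap d is the same between every pair of consecutive classes.

```python
def to_boundaries(limits):
    """Convert inclusive class limits to continuous class boundaries.

    limits: list of (lower, upper) class limits in ascending order.
    Assumes the same gap d between all consecutive classes.
    """
    # d = lower limit of the following class - upper limit of this class
    d = limits[1][0] - limits[0][1]
    return [(lower - d / 2, upper + d / 2) for lower, upper in limits]


limits = [(4, 6), (7, 9), (10, 12)]  # hypothetical inclusive classes
boundaries = to_boundaries(limits)
widths = [upper - lower for lower, upper in boundaries]

print(boundaries)
print(widths)
```

After the adjustment the upper boundary of each class coincides with the lower boundary of the next, so the classes are continuous.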
Mid Value:
Mid value of a class is the average of the lower limit/boundary and the upper limit/
boundary. Mid values are generally taken as representatives of different classes.
Frequency Density:
As we know, "density" is mass per unit volume, that is, d = m/v (g/cc), where m is
the mass in grams of a volume v (cc) of matter. Similarly, frequency density is the
frequency of a particular class per unit of class width. For instance, the frequency
density of the class 3.5–6.5 is 10/3 = 3.33. Similarly, for the class 6.5–9.5 it is
22/3 = 7.33. Frequency density indicates the concentration of observations in the
different classes of a frequency distribution table per unit of class width.
Relative Frequency:
Relative frequency is defined as the proportion of observation in a particular class to
a total number of observations. Sometimes, the relative frequency is expressed in
percentage also.
Cumulative Frequency:
Cumulative frequency of a class is defined as the number of observations up to a
particular class (less than type) or above a particular class (greater than type).
Cumulative frequency gives an instant idea about the distribution of frequencies
among the classes and the cutoff points.
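The four quantities defined above (mid value, frequency density, relative frequency, and cumulative frequency) can be computed together. In the sketch below, the classes 3.5–6.5 and 6.5–9.5 with frequencies 10 and 22 come from the frequency density example; the third class and its frequency are hypothetical additions so that the totals can be shown.

```python
from itertools import accumulate

boundaries = [(3.5, 6.5), (6.5, 9.5), (9.5, 12.5)]
freq = [10, 22, 8]  # the third class and its frequency are hypothetical

total = sum(freq)

# Mid value: average of the lower and upper class boundaries.
mid_values = [(lo + hi) / 2 for lo, hi in boundaries]

# Frequency density: frequency per unit of class width.
density = [f / (hi - lo) for f, (lo, hi) in zip(freq, boundaries)]

# Relative frequency: share of the total number of observations.
relative = [f / total for f in freq]

# Cumulative frequency, less than type: observations up to each class.
cum_less = list(accumulate(freq))

# Cumulative frequency, greater than type: observations in this class
# and all classes above it.
cum_greater = [total - c + f for c, f in zip(cum_less, freq)]

for row in zip(boundaries, mid_values, freq, density, relative,
               cum_less, cum_greater):
    print(row)
```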
Presentation of Information
Edited/scrutinized data can be used for the application of statistical
methodologies and/or presented in a suitable form that summarizes the
information in the recorded data.

Different forms of presentation of data are:


(1) Textual form, (2) Tabular form, (3) Diagrammatic form

1. Textual Form:
In a textual form of data presentation, information is presented in a form of a
paragraph. In many of the research papers or articles, while discussing the
findings of the research outcome, this method is adopted for explanation.
2. Tabular Form:
It is the most widely used form of data presentation. A large number of data can be
presented in a very efficient manner in a table. At the same time, it can bring out
some of the essential features of the data.
A table consists of the following parts: (1) title, (2) stub, (3) caption, (4) body, and
(5) footnote.

Title: The title of a table gives a brief description of the content or the subject
matter presented in a table. Generally, the title is written in a short and concise
form so that it is easily visible and eye-catching at a glance and throws light
on the content of the table.
Stub: A table is divided into a number of rows and columns. Stub is used to describe
the contents of the rows of a table. Different classes represent the rows of the table,
and the heading “classes” at the top left corner of the table is the stub. With the help
of this stub, one can extract the features of the rows in a table.
Caption: The caption describes the content of each column. Thus, "mid value,"
"frequency," etc., are the captions for the different columns of a frequency
distribution table. With the help of the "mid value" or "frequency" column, one can
see how the mid values or the frequencies change over the different classes (stub).
Body: Relevant information is given in the body of a table.
Footnote: Footnotes are not compulsory but may be used to indicate the source of
information or any special notation used in the table. Though a tabular form is more
appealing than a textual form of presentation, it is useful only to literate and
educated persons.
3. Diagrammatic Form: Keeping in mind the variety of users, this form of
representation is more convincing and appealing than the other forms of data
presentation. This form of presentation is easily understood by any person, layman
as well as an educated person.

Different diagrammatic forms of presentation are (a) line diagram, (b) bar diagram,
(c) histogram, (d) frequency polygon, (e) pie chart, (f) cumulative frequency curve
or ogive, (g) pictorial diagrams, (h) maps, etc.; within each type, there may be
variant types.
(a) Line diagram: A frequency line for discrete as well as for continuous
distributions can be represented graphically by drawing ordinates equal to the
frequency, on a convenient scale, at the different values of the variable X.
(b) Bar diagram: Instead of drawing a line joining the class frequencies, one
represents the frequencies in the form of bars. In bar diagrams, equal bases on a
horizontal (or vertical) line are selected, and rectangles are constructed with length
proportional to the given frequencies on a suitably chosen scale. The bars should be
drawn at equal distances from one another.
More complicated forms of bar diagrams are the clustered column/bar, stacked
column/bar, and 100% stacked column/bar diagrams. In clustered bar diagrams,
values of the same item for different categories are compared, while in a stacked
column, the proportions of the values across the categories are shown.
Bar Diagram, Stacked Bar Diagram, and Cluster Bar Diagram


(c) Histogram: A histogram is similar to a bar diagram for discrete data, except
that the absence of any gap between two consecutive classes is shown by leaving
no gap between two consecutive bars. Continuous grouped data are usually
represented graphically by a histogram. The rectangles are drawn with bases
corresponding to the true class intervals and with heights proportional to the
frequencies. When all the class intervals are equal, the areas of the rectangles
also represent the corresponding frequencies. If the class intervals are not all
equal, the heights must be adjusted so that the areas are proportional to the
frequencies.
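The height adjustment for unequal class intervals amounts to plotting frequency per unit of class width instead of the raw frequency. The sketch below uses two hypothetical classes with the same frequency but different widths.

```python
# Hypothetical classes with unequal widths and equal frequencies.
boundaries = [(0, 10), (10, 30)]
freq = [20, 20]

# Adjusted height = frequency / class width, so that
# area (height * width) is proportional to the frequency.
heights = [f / (hi - lo) for f, (lo, hi) in zip(freq, boundaries)]
areas = [h * (hi - lo) for h, (lo, hi) in zip(heights, boundaries)]

print(heights)
print(areas)
```

The wider class gets the lower bar, and the two bars still have equal areas because the two frequencies are equal.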

Yield Frequency Histogram of 130 Varieties of Rice


(d) Frequency Polygon: If the midpoints of the tops of the bars in a histogram are
joined by straight lines, a frequency polygon is obtained. To complete the
polygon, it is customary to join the extreme points at each end of the frequency
polygon to the midpoints of the next higher and lower class intervals on the
horizontal line (the class axis here). For this purpose, two hypothetical classes,
one before the lowest actual class and another after the highest actual class,
are generally added with zero observations in both cases so that the frequency
line encloses a bounded area with the horizontal X-axis.
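The points of a frequency polygon, including the two zero-frequency classes at the ends, can be generated as in the sketch below. The mid values, frequencies, and the common class width of 3 are hypothetical.

```python
# Hypothetical frequency table with equal class width.
mid_values = [5.0, 8.0, 11.0]
freq = [10, 22, 8]
width = 3.0

# Add one zero-frequency class before the first class and one after
# the last class so the polygon closes on the X-axis.
xs = [mid_values[0] - width] + mid_values + [mid_values[-1] + width]
ys = [0] + freq + [0]

polygon_points = list(zip(xs, ys))
print(polygon_points)
```

Joining these points in order by straight lines gives the closed frequency polygon.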

Yield Frequency Polygon and Histogram of 130 Varieties of Rice


(e) Pie Chart: The basic idea behind a pie diagram is to take the total
frequency as 100% and present it as a circle with an angle of 360° at the center.
Either the ordinary frequencies or the relative frequencies of a frequency
distribution table can be presented effectively in the form of a pie diagram.
The advantage of a pie diagram is that different characteristics measured in
different units and/or under different situations can be compared with its help.
Moreover, like other diagrammatic forms of data presentation, this method
appeals to both illiterate and educated persons.
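The conversion of class frequencies into the angles of pie sectors can be sketched as follows. The frequencies are hypothetical; each class receives a share of the 360° in proportion to its frequency.

```python
freq = [10, 22, 8]  # hypothetical class frequencies
total = sum(freq)

# Sector angle = frequency / total frequency * 360 degrees.
angles = [f * 360 / total for f in freq]

# The same shares expressed as percentages of the whole.
percentages = [f * 100 / total for f in freq]

print(angles)
print(percentages)
```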

Pie Diagram of Yield Frequency of 130 Varieties of Rice


(f) Cumulative Frequency Curve (Ogive): The whole data set can be partitioned
conveniently with the help of a cumulative frequency graph, also known as an ogive.
It is of two types, that is, "less than type" and "more than or equal to type." For
the "less than type," one plots the points with the upper boundaries of the classes
as abscissas and the corresponding cumulative frequencies as ordinates. The points
are joined by a freehand smooth curve. For the "more than or equal to type," one
plots the points with the lower boundaries of the classes as abscissas and the
corresponding cumulative frequencies as ordinates. Then, the points are joined by a
freehand smooth curve.
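The points to be plotted for the two types of ogive can be generated as in the sketch below. The class boundaries and frequencies are hypothetical; only the points are computed here, and the freehand smooth curve through them is left to the plotting step.

```python
from itertools import accumulate

# Hypothetical frequency table.
boundaries = [(3.5, 6.5), (6.5, 9.5), (9.5, 12.5)]
freq = [10, 22, 8]
total = sum(freq)

cum_less = list(accumulate(freq))
cum_more = [total - c + f for c, f in zip(cum_less, freq)]

# "Less than" ogive: upper class boundary against cumulative frequency.
less_than_points = [(hi, c) for (_, hi), c in zip(boundaries, cum_less)]

# "More than or equal to" ogive: lower class boundary against the
# number of observations in that class and all classes above it.
more_than_points = [(lo, c) for (lo, _), c in zip(boundaries, cum_more)]

print(less_than_points)
print(more_than_points)
```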

Different Forms of Cumulative Frequency Curve (Ogive)


(g) Pictorial Diagram: To make the information lively and easy to understand by any
user, sometimes information is presented in pictorial forms. Instead of a bar diagram
or line diagram or pie chart, one can use pictures in the diagrams.

Pictorial Presentation of a Frequency Distribution of the Number of Insects per Hill


(h) Maps: Statistical maps are generally used to represent the distribution of particular
parameters, such as the forest area in a country, crop-producing zones, mines located
at different places in a country, rainfall pattern, population density, etc. The value of
a pictorial diagram or map lies in its acceptability to a wide range of users, including
illiterate people. This type of data representation is easily understood by any person,
but utmost care should be taken to keep the statistical map true to sense, scale, etc.
Pictorial Presentation of Major Mango Production Areas in
Bangladesh (Source: https://images.app.goo.gl/XidgsgvTSq6e1WqT7)
It should be clearly noted that not every type of presentation is suitable for every
type of data, every situation, or every user. The appropriate type of presentation is
to be decided on the basis of the type of information, the objective of the
presentation, and the audience for whom the presentation is meant.
THANK YOU ALL

Source: https://images.app.goo.gl/q2m8HcWDWkKfy7mWA