
Data Visualization Cleaning and Errors

The document discusses data cleaning, highlighting common errors such as outliers, missing data, and erroneous data, along with methods to handle them. It also covers data visualization techniques, including area graphs, bar charts, histograms, line graphs, scatterplots, flow charts, and pie charts, emphasizing their importance in understanding and communicating data insights. These techniques aid in defining strategies for model selection and presenting data trends effectively.

Uploaded by joshimanoj8829
Copyright © All Rights Reserved
1) Data Cleaning
Data cleaning helps in getting rid of commonly found errors and mistakes in a data set. There are three commonly found types of error in data:
1) Outliers: data points lying outside the range of the rest of the data.
2) Missing data: data points missing at certain places.
3) Erroneous data: incorrect data points.

1. Outliers
An outlier is a data point in a data set that is distant from all other observations. In other words, an outlier behaves differently from the rest of the collection of data.

2. Missing Data
Missing data means that values are absent at certain places in a data set. We can handle them in two ways:
1. By eliminating the rows that contain missing values. (This is generally not recommended, as it can shrink the data set considerably and leave less data to train on.)
2. By using an imputer to find the best possible substitute for each missing value.

3. Erroneous Data
Erroneous data is data that falls outside of what is acceptable and should be rejected by the system.
[Example table: student records with a Student Name column (e.g. JOSHUA SAM, APARNA BINU); the other columns are not legible in the source.]

2) Data Visualization
Data visualization:
1) helps us identify the patterns contained within the data;
2) helps us define a strategy for which model to use at a later stage.
Visual representation is also easier to understand and to communicate to others.
[Example: Yearly Employee Wage Cost shown as a table; the figures are not legible in the source.]

3) Data Visualization Techniques

1. Area Graphs
Area Graphs are Line Graphs with the area below the line filled in with a certain colour or texture.
Like Line Graphs, Area Graphs are used to display the development of quantitative values over an interval or time period. They are most commonly used to show trends, rather than to convey specific values.

2. Bar Charts
The classic Bar Chart uses either horizontal bars or vertical bars (a column chart) to show discrete, numerical comparisons across categories. Bar Charts are distinguished from Histograms, as they do not display continuous developments over an interval. A Bar Chart's discrete data is categorical data and therefore answers the question "how many?" for each category.

3. Histogram
A Histogram visualizes the distribution of data over a continuous interval or certain time period. Each bar in a Histogram represents the tabulated frequency at each interval (bin). Histograms help give an estimate of where values are concentrated, what the extremes are, and whether there are any gaps or unusual values.

4. Line Graphs
Line Graphs are used to display quantitative values over a continuous interval or time period. A Line Graph is most frequently used to show trends and analyze how the data has changed over time. Line Graphs are drawn by first plotting data points on a Cartesian coordinate grid, then connecting the points with a line. Typically, the y-axis has a quantitative value, while the x-axis is a timescale or a sequence of intervals. Negative values can be displayed below the x-axis.

5. Scatterplots
A scatterplot is a type of data display that shows the relationship between two numerical variables. Each member of the data set is plotted as a point whose (x, y) coordinates relate to its values for the two variables. Scatterplots help show whether a relationship or correlation between the two variables exists.

6. Flow Charts
This type of diagram is used to show the sequential steps of a process. Flow Charts map out a process using a series of connected symbols, which makes the process easy to understand and aids in communicating it to other people. Flow Charts are useful for explaining how a complex and/or abstract procedure, system, concept or algorithm works. Drawing a Flow Chart can also help in planning and developing a process, or in improving an existing one.

7. Pie Charts
Pie Charts help show proportions and percentages between categories by dividing a circle into proportional segments. Each arc length represents the proportion of its category, while the full circle represents the total sum of all the data, equal to 100%. Pie Charts are ideal for giving the reader a quick idea of the proportional distribution of the data.
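The three data-cleaning problems from section 1) can be sketched in plain Python. This is a minimal illustration, not the notes' own method, and it rests on assumptions the notes do not state: each record is a dict, a missing value is stored as None, outliers are flagged with Tukey's 1.5 × IQR rule (one common choice; the notes name no detection method), the imputer fills gaps with the column mean, and an acceptable range of 0 to 100 for a hypothetical `marks` column stands in for whatever the real system would accept. All function and column names are hypothetical.

```python
from statistics import mean, quantiles

def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR] (assumed detection rule)."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points, default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def drop_missing_rows(rows):
    """Option 1 from the notes: keep only rows with no missing (None) values."""
    return [row for row in rows if all(v is not None for v in row.values())]

def mean_impute(rows, column):
    """Option 2 from the notes: replace None in `column` with the mean of the observed values."""
    observed = [row[column] for row in rows if row[column] is not None]
    fill = mean(observed)
    return [{**row, column: fill if row[column] is None else row[column]} for row in rows]

def reject_erroneous(rows, column, low, high):
    """Split rows into (accepted, rejected): a present value outside [low, high] is rejected.

    Missing (None) values are passed through for the missing-data step to handle.
    """
    accepted, rejected = [], []
    for row in rows:
        value = row[column]
        if value is not None and not (low <= value <= high):
            rejected.append(row)
        else:
            accepted.append(row)
    return accepted, rejected

if __name__ == "__main__":
    print(iqr_outliers([10, 12, 11, 13, 12, 11, 95]))  # [95]
    students = [  # hypothetical records
        {"name": "A", "marks": 72},
        {"name": "B", "marks": None},  # missing value
        {"name": "C", "marks": 64},
        {"name": "D", "marks": 140},   # erroneous: above the assumed maximum of 100
    ]
    accepted, rejected = reject_erroneous(students, "marks", 0, 100)
    print([r["name"] for r in rejected])                     # ['D']
    print([r["name"] for r in drop_missing_rows(accepted)])  # ['A', 'C']
    print(mean_impute(accepted, "marks")[1]["marks"])        # 68
```

Rejecting erroneous values before imputing keeps an out-of-range entry such as 140 from dragging up the mean used as the substitute. In practice a library imputer, for example scikit-learn's `SimpleImputer(strategy="mean")`, covers the imputation step.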
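A Bar Chart's bar lengths are simply the answers to "how many?" for each category. The sketch below uses hypothetical helper names and only the standard library: it counts category labels with `collections.Counter` and renders a crude horizontal bar chart as text. A plotting library would draw the same counts as real bars.

```python
from collections import Counter

def category_counts(items):
    """Count how many items fall into each category; these counts are the bar lengths."""
    return dict(Counter(items))  # categories stay in first-seen order

def text_bar_chart(counts, symbol="#"):
    """Render one horizontal bar per category, one symbol per counted item."""
    width = max(len(str(label)) for label in counts)  # pad labels so the bars line up
    return "\n".join(f"{str(label).ljust(width)} | {symbol * n}" for label, n in counts.items())

if __name__ == "__main__":
    fruit = ["apple", "banana", "apple", "cherry", "apple", "banana"]  # hypothetical data
    counts = category_counts(fruit)
    print(counts)  # {'apple': 3, 'banana': 2, 'cherry': 1}
    print(text_bar_chart(counts))
```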
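The Histogram's "tabulated frequency at each interval/bin" can be computed directly. This is a minimal sketch assuming equal-width bins spanning the data's minimum to maximum, half-open bins with the last bin also including the maximum, and a bin count chosen by the caller; the function name is hypothetical.

```python
def histogram_counts(values, num_bins):
    """Tabulate how many values fall into each of `num_bins` equal-width bins.

    The bins span min(values)..max(values). Each bin is half-open [left, right),
    except the last one, which also includes the maximum value.
    Returns a list of ((left, right), count) pairs.
    """
    if num_bins < 1 or not values:
        raise ValueError("need at least one value and one bin")
    low, high = min(values), max(values)
    if low == high:  # every value is identical: a single bin holds them all
        return [((low, high), len(values))]
    width = (high - low) / num_bins
    counts = [0] * num_bins
    for v in values:
        index = min(int((v - low) / width), num_bins - 1)  # clamp the maximum into the last bin
        counts[index] += 1
    return [((low + i * width, low + (i + 1) * width), c) for i, c in enumerate(counts)]

if __name__ == "__main__":
    data = [1, 2, 2, 3, 3, 3, 4, 9]  # hypothetical data
    for (left, right), count in histogram_counts(data, 4):
        print(f"{left:g} to {right:g}: {'#' * count}")
```

For this data the counts are 3, 4, 0 and 1: the empty 5 to 7 bin is the kind of gap a Histogram makes visible, and the lone value in the last bin stands out as an extreme.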
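A scatterplot shows whether a relationship between two variables exists; a common numeric companion is Pearson's correlation coefficient r, which ranges from -1 to 1. This is a swapped-in technique the notes do not mention, and it captures only linear relationships. The sketch below computes it from first principles with hypothetical names; on Python 3.10+ the standard library's `statistics.correlation` computes the same Pearson coefficient.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between paired samples xs and ys (-1 to 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))  # co-movement
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)

if __name__ == "__main__":
    hours = [1, 2, 3, 4]   # hypothetical x variable
    score = [2, 4, 6, 8]   # hypothetical y variable
    print(list(zip(hours, score)))  # the (x, y) points a scatterplot would draw
    print(pearson_r(hours, score))  # 1.0
```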
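The Pie Chart arithmetic is direct: a category's share of the total gives both its percentage and its arc, with the full circle (360 degrees) standing for 100%. A minimal sketch, assuming non-negative category totals in a dict; the function name is hypothetical.

```python
def pie_slices(totals):
    """Map each category to (percentage of the whole, arc angle in degrees).

    The full circle, 360 degrees, corresponds to the total sum of all data, i.e. 100%.
    """
    if any(v < 0 for v in totals.values()):
        raise ValueError("pie charts cannot show negative values")
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("pie charts need a positive total")
    return {
        label: (100 * value / grand_total, 360 * value / grand_total)
        for label, value in totals.items()
    }

if __name__ == "__main__":
    for label, (pct, angle) in pie_slices({"A": 50, "B": 30, "C": 20}).items():
        print(f"{label}: {pct:.1f}% of the circle, {angle:.1f} degrees")
```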
