[go: up one dir, main page]

0% found this document useful (0 votes)
0 views3 pages

Data and Excel Analytics

The document outlines three types of data: structured (organized in tables), semi-structured (tagged data), and unstructured (messy data). It details Excel tools for data modification, including sorting, filtering, VLOOKUP, and pivot tables, as well as methods for creating distributions like histograms and probability distributions. Additionally, it discusses measures of central tendency such as mean, median, mode, and percentiles to describe data characteristics.

Uploaded by

sri1239099
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
0 views3 pages

Data and Excel Analytics

The document outlines three types of data: structured (organized in tables), semi-structured (tagged data), and unstructured (messy data). It details Excel tools for data modification, including sorting, filtering, VLOOKUP, and pivot tables, as well as methods for creating distributions like histograms and probability distributions. Additionally, it discusses measures of central tendency such as mean, median, mode, and percentiles to describe data characteristics.

Uploaded by

sri1239099
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 3

Data and Excel for Analytics

Types of Data

1. Structured Data

o Well-organized, stored in tables with rows and columns.

o Easy to store, search, and analyze.

o Example: Sales records, employee databases, bank transactions.

o In Excel: Appears as spreadsheets with headers like Name, Age, Salary.

2. Semi-structured Data

o Does not fit neatly into tables, but still has some organizational tags or
markers.

o Examples: JSON files, XML data from web APIs.

o In Excel: Can be imported and converted into tables using Power Query.

3. Unstructured Data

o No fixed format; messy and harder to analyze.

o Examples: Videos, audio files, social media comments, images, emails.

o In Excel: Needs preprocessing or text mining tools before analysis.

In short:

• Structured = Tables

• Semi-structured = Tagged data

• Unstructured = Freeform/messy data

Modifying Data in Excel

Excel provides many built-in tools for cleaning and preparing data for analytics:

• Sorting & Filtering:

o Sorting arranges data in ascending/descending order (e.g., sort sales by


amount).

o Filtering shows only relevant rows (e.g., filter customers from “Chennai”).
• VLOOKUP / XLOOKUP:

o Searches for a value in one column and returns related info from another
column.

o Example: Find employee salary by looking up their ID.

• Pivot Tables:

o Summarize large datasets quickly.

o Example: Show total sales by region or average marks by class.

• Data Cleaning Tools:

o Remove duplicates.

o Replace or fill missing values.

o Use “Text to Columns” to split combined data (like “FirstName LastName”).

These steps ensure data is clean, consistent, and ready for analysis.

Creating Distributions from Data

• Histograms (Frequency Distribution):

o Show how often values fall into ranges (bins).

o Example: Number of students scoring 0–10, 11–20, … in an exam.

o In Excel: Insert → Charts → Histogram.

• Probability Distributions:

o Represent the likelihood of outcomes.

o Example: Demand for a product may follow a Normal distribution.

o In Excel: Use NORM.DIST(), BINOM.DIST() functions for probability


calculations.

Histograms = visualize past data; Probability distributions = forecast future outcomes.

Measures of Location (Central Tendency)

1. Mean (Average):
o Formula: =AVERAGE(range)

o Example: If sales = {10, 20, 30}, mean = (10+20+30)/3 = 20.

o Use: General trend, but sensitive to outliers.

2. Median (Middle Value):

o Formula: =MEDIAN(range)

o Example: For {10, 20, 500}, median = 20 (better than mean when outliers
exist).

3. Mode (Most Frequent Value):

o Formula: =MODE.SNGL(range)

o Example: For {2, 2, 3, 4}, mode = 2.

o Use: Identifies most common category/value.

4. Percentiles/Quartiles:

o Formula: =PERCENTILE.EXC(range, k) or =QUARTILE.EXC(range, q)

o Example: 90th percentile = score below which 90% of students fall.

o Quartiles divide data into 4 equal parts (Q1, Q2, Q3).

Together, these measures help describe the center and spread of data.

Summary

• Data Types: Structured (tables), Semi-structured (XML/JSON), Unstructured


(media/text).

• Excel Modifications: Sorting, Filtering, VLOOKUP, Pivot Tables, Data Cleaning.

• Distributions: Histograms (past frequency), Probability Distributions (forecasting).

• Measures of Location: Mean, Median, Mode, Percentiles/Quartiles.

You might also like