DATA ANALYTICS
Avinash Seekoli
WHAT IS DATA
Data refers to raw facts, figures, or details that can be processed or analyzed to
provide meaningful information. Data can come in various forms, such as
numbers, text, images, audio, or video, and it serves as the foundation for
generating insights and making decisions.
EXAMPLES OF DATA
1. Numerical Data:
○ Temperature readings: 25°C, 30°C, 28°C
○ Sales figures: $500, $1000, $1500
2. Textual Data:
○ Customer reviews: "The product is excellent," "Delivery
was slow"
○ Names: "John", "Jane", "Ali"
3. Categorical Data:
○ Gender: Male, Female, Non-binary
○ Colors: Red, Blue, Green
4. Time Series Data:
○ Stock prices over time: Day 1: $100, Day 2: $102, Day 3: $101
○ Monthly rainfall: Jan: 50mm, Feb: 70mm
5. Image Data:
○ A photograph of a cat
○ MRI scan images
6. Audio Data:
○ A recording of someone speaking
○ Music files (e.g., MP3)
TYPES OF DATA
Data can be measured on different scales depending on its nature. Here are the
primary categories and how they are measured:
Nominal data:
Nominal data is a basic data type that categorizes data by labeling or
naming values, such as gender, hair color, or types of animals. It has no
inherent order or hierarchy.
Ordinal data:
Ordinal data involves classifying data based on rank, such as social status
in categories like ‘wealthy’, ‘middle income’, or ‘poor’. However, there are
no set intervals between these categories.
Interval data:
Interval data has meaningful intervals between values, but there is no true zero point.
The difference between 20°C and 30°C is the same as between 30°C and 40°C (a 10-degree
difference), making intervals meaningful.
However, 0°C does not mean the absence of temperature—it’s just a point on the scale.
Ratio data:
Ratio data has both meaningful intervals between values and a true zero point.
A height of 0 cm means no height at all, making zero meaningful. It makes sense to say that a
person who is 180 cm tall is twice as tall as a person who is 90 cm, as the ratio is meaningful.
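As a quick sketch of how these four scales behave in code (Python with pandas; all values are invented for illustration):

# Sketch of the four measurement scales (all values made up).
import pandas as pd

# Nominal: labels with no order.
hair = pd.Categorical(["black", "brown", "red"], ordered=False)

# Ordinal: ranked categories, but the gaps between ranks are undefined.
status = pd.Categorical(["poor", "wealthy", "middle income"],
                        categories=["poor", "middle income", "wealthy"],
                        ordered=True)
print(status.min(), status.max())        # ordering is meaningful: poor wealthy

# Interval: differences are meaningful, but 0 is not "nothing".
temps_c = pd.Series([20, 30, 40])
print(temps_c.diff().tolist())           # [nan, 10.0, 10.0] -- equal intervals

# Ratio: differences AND ratios are meaningful because 0 means "none".
heights_cm = pd.Series([90, 180])
print(heights_cm.iloc[1] / heights_cm.iloc[0])   # 2.0 -- twice as tall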
WHAT IS DATA ANALYTICS
Data analytics is the process of drawing conclusions from raw data.
It is valuable to businesses because it helps a company make decisions
based on those conclusions.
In essence, data analytics converts large volumes of figures into plain
English, i.e., conclusions that decision-makers can understand and act on.
DATA SCIENCE
Data Science is a field that extracts meaningful information and insights by applying
algorithms, preprocessing, and scientific methods to structured and unstructured data.
The field is closely related to Artificial Intelligence.
Data Science is used in almost every industry today to predict customer behavior and
trends and to identify new opportunities.
Businesses can use it to make informed decisions about product development and marketing.
It is used as a tool to detect fraud and optimize processes.
Governments also use Data Science to improve efficiency in the delivery of public services.
TYPES OF DATA ANALYTICS
Descriptive Analytics tells you what happened in the past.
Diagnostic Analytics helps you understand why something happened in the past.
Predictive Analytics predicts what is most likely to happen in the future.
Prescriptive Analytics recommends actions you can take to affect those
outcomes.
Descriptive analysis. This step, also known as data mining, is the most common
method of data analysis: large sets of data are captured and analyzed for
patterns that help specialists gain deeper insight into business processes. This kind
of analysis answers key statistical questions, such as how much revenue the
business is generating, how many customers visit on average, and how much
profit the business is making.
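A minimal sketch of such a descriptive summary in Python with pandas; the table and its figures are invented for illustration:

# Descriptive analytics sketch: summarize what happened (made-up data).
import pandas as pd

sales = pd.DataFrame({
    "day":      ["Mon", "Tue", "Wed", "Thu", "Fri"],
    "revenue":  [500, 1000, 1500, 1200, 800],
    "visitors": [40, 75, 110, 90, 60],
})

print("Total revenue:   ", sales["revenue"].sum())     # what happened
print("Average visitors:", sales["visitors"].mean())   # typical footfall
print(sales.describe())                                # standard summary stats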
Diagnostic or inferential analysis. As the name suggests, diagnostic (or inferential)
analysis determines the root cause of current problems. It involves using data to find
out exactly how and why a business process failed.
Predictive analysis. By utilizing previous data, specialists can use this
process to estimate what will likely happen in the future. These predictions
are made on the basis of historical data and past consumer trends.
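A toy sketch of the idea in Python: fit a simple trend to invented historical figures, then extrapolate one step ahead:

# Predictive analytics sketch: fit a trend to past data, extrapolate (toy data).
import numpy as np

days  = np.array([1, 2, 3, 4, 5])
sales = np.array([100, 102, 101, 105, 107])

slope, intercept = np.polyfit(days, sales, 1)   # simple linear trend
forecast_day6 = slope * 6 + intercept
print(f"Forecast for day 6: {forecast_day6:.1f}")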
Prescriptive analysis. This helps specialists gain a statistical perspective on
an important business decision. Is it the right time to launch a new product?
Prescriptive analysis will answer that question. Can we afford to scale up
right now? This type of analysis will help you find out.
BIG DATA
● Big Data is the field of collecting large data sets from sources such as social media,
GPS, and sensors, analyzing them systematically, and extracting useful patterns with
specialized tools and techniques.
● Much of this data is generated by social media sites like Facebook, Instagram, Twitter,
etc.; other sources include e-business and e-commerce transactions and hospital, school,
and bank records.
● Such data is impossible to manage with traditional data storage techniques, so Big Data
emerged to handle data that is large and messy.
● Before the data can be analyzed, a data architect must design the data architecture.
DATA ARCHITECTURE DESIGN AND DATA MANAGEMENT
Data architecture design is a set of policies, rules, models, and standards that governs
what type of data is collected, where it is collected from, how the collected data is
arranged, and how it is stored, used, and secured in systems and data warehouses for
further analysis.
Data is one of the essential pillars of enterprise architecture, through which an
organization succeeds in executing its business strategy.
DATA ARCHITECTURE COMPONENTS
Data integration is the process of combining data from
different sources and providing users with a unified view.
This involves consolidating, transforming, and cleaning
data to make it consistent, reliable, and usable for
analysis, reporting, or business processes.
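A minimal sketch of data integration with pandas; the two sources and their fields are hypothetical:

# Data integration sketch: unify two hypothetical sources (made-up records).
import pandas as pd

crm    = pd.DataFrame({"customer_id": [1, 2], "name": ["Ali ", "jane"]})
orders = pd.DataFrame({"customer_id": [1, 2], "total": [500, 1500]})

# Clean and standardize before combining, then join on a shared key.
crm["name"] = crm["name"].str.strip().str.title()
unified = crm.merge(orders, on="customer_id", how="left")
print(unified)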
A data lake is a centralized repository that allows you to
store all your structured, semi-structured, and
unstructured data at any scale. Unlike a traditional data
warehouse, which stores processed and refined data, a
data lake stores raw data in its native format until it’s
needed for analytics or processing.
A data mart is a smaller, specialized subset of a
data warehouse designed for a specific department
or business function, such as sales or finance. It
allows users to quickly access relevant data without
needing to go through the entire data warehouse.
Metadata is the "data about data," offering key
information to help describe, manage, and organize
data across systems. It plays a crucial role in
enabling data discovery, management, and
security across various applications and
industries.
Metadata is information that describes other
data, providing context and details about its
characteristics. In real-time, it helps you
understand key aspects like:
• What the data is (e.g., file name, type, title).
• Who created or owns it (e.g., author, owner).
• When it was created or modified (e.g.,
timestamps).
• How it’s formatted or structured (e.g., file size,
format).
• Permissions (e.g., access rights, usage
restrictions).
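As a quick sketch, Python's standard library can surface exactly this kind of file-level metadata (the file path here is hypothetical):

# Reading file-level metadata with the standard library (hypothetical path).
from pathlib import Path
from datetime import datetime

p = Path("report.csv")                    # assumed to exist for this sketch
info = p.stat()
print("What:    ", p.name, p.suffix)                       # name and type
print("When:    ", datetime.fromtimestamp(info.st_mtime))  # last modified
print("How big: ", info.st_size, "bytes")                  # size
print("Perms:   ", oct(info.st_mode)[-3:])                 # access rights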
Two related concepts often appear alongside metadata:
• Master Data: Used to identify consistent
information (e.g., customer profiles in CRM).
• Reference Data: Provides fixed classifications
(e.g., currency code validation in transactions).
Both help ensure data consistency, quality, and
accuracy across systems in real-time operations.
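A tiny sketch of the reference-data idea, validating transactions against a fixed classification; the currency set below is illustrative, not exhaustive:

# Reference data sketch: validate against a fixed classification (partial list).
VALID_CURRENCIES = {"USD", "EUR", "GBP", "INR"}   # illustrative reference set

def validate_currency(code: str) -> bool:
    """Return True if the transaction's currency code is a known value."""
    return code.upper() in VALID_CURRENCIES

print(validate_currency("usd"))   # True
print(validate_currency("XYZ"))   # False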
Data quality refers to the accuracy, completeness, reliability, and relevance of data. High-
quality data is essential for making sound decisions, ensuring that the data is fit for its
intended purpose.
Data governance is the framework of rules, policies, and procedures that ensure data is
managed properly, securely, and used consistently across an organization. It includes defining
who has authority over data and how it should be handled.
Real-time example: A bank implements data governance to ensure customer information is
accurate and secure. When a customer updates their contact details, the bank’s system verifies
and standardizes the data before it is distributed to various departments, ensuring consistency
and compliance with privacy regulations in real time.
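A hedged sketch of that verify-and-standardize step in Python; the formatting rules below are invented for illustration:

# Governance sketch: verify and standardize a contact detail before it is
# distributed to other departments (validation rules are invented).
import re

def standardize_phone(raw: str) -> str:
    digits = re.sub(r"\D", "", raw)       # keep digits only
    if len(digits) != 10:
        raise ValueError(f"invalid phone number: {raw!r}")
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"

print(standardize_phone("555-123 4567"))  # -> (555) 123-4567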
Data privacy refers to the protection and proper
handling of personal or sensitive information,
ensuring it is collected, stored, and shared in ways
that safeguard individuals' rights and prevent
unauthorized access or misuse.
DATA QUALITY
Data quality refers to how reliable, accurate, and usable your data is for analysis.
Poor data quality can lead to incorrect conclusions, so it’s essential to address
common issues like noise, outliers, missing values, and duplicate data. Here’s a
breakdown in simple terms:
1. Noise
- Definition: Noisy data is meaningless or corrupted data that can’t be interpreted
by machines. It can be generated by faulty data collection, data entry errors, etc.
- Example: If a sensor reading shows random spikes due to interference, that’s
noise.
- Impact: It makes your data less accurate and harder to interpret.
- Solution: Use filtering techniques to remove or reduce noise (see the sketch below).
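A small sketch of one such filter: a rolling median that smooths an invented noisy sensor series:

# Noise reduction sketch: smooth sensor spikes with a rolling median (toy data).
import pandas as pd

readings = pd.Series([25, 26, 90, 25, 24, 26, 25])   # 90 is an interference spike
smoothed = readings.rolling(window=3, center=True).median()
print(smoothed.tolist())   # the spike is suppressed (ends stay NaN)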
2. Outliers
- Definition: Data points that are very different from the rest of the data.
- Example: In a group of people’s ages (20, 21, 22, 85), the age 85 might be
an outlier.
- Impact: Outliers can skew averages and distort models built on the data.
- Solution: Detect them, for example with the IQR rule sketched below, and
decide whether to remove, cap, or keep them.
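A minimal sketch of the IQR rule applied to the ages from the example:

# Outlier detection sketch using the IQR (interquartile range) rule.
import pandas as pd

ages = pd.Series([20, 21, 22, 85])
q1, q3 = ages.quantile(0.25), ages.quantile(0.75)
iqr = q3 - q1
outliers = ages[(ages < q1 - 1.5 * iqr) | (ages > q3 + 1.5 * iqr)]
print(outliers.tolist())   # [85]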
3. Missing Values
- Definition: When some data points are absent or not recorded.
- Example: In a survey, some respondents might skip answering certain
questions.
- Impact: Missing values can affect the accuracy of the results or analysis.
- Solution: You can either ignore missing data, fill in missing values (e.g., with
averages, as sketched below), or use algorithms that can handle missing data.
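A short sketch of the fill-with-average option in pandas (values invented):

# Missing values sketch: fill gaps with the column average (toy data).
import numpy as np
import pandas as pd

answers = pd.Series([4.0, np.nan, 5.0, 3.0])   # one respondent skipped
filled = answers.fillna(answers.mean())        # mean of 4, 5, 3 is 4.0
print(filled.tolist())                         # [4.0, 4.0, 5.0, 3.0]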
4. Duplicate Data
- Definition: When the same data is recorded more than once.
- Example: A customer may accidentally be registered twice in a database with
slight variations in their name.
- Impact: It can inflate results and cause incorrect conclusions.
- Solution: Detect and remove duplicates (see the sketch below) to keep data accurate.
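A minimal sketch of detecting and removing duplicates with pandas; the records are made up:

# Duplicate data sketch: normalize near-duplicates, then drop repeats.
import pandas as pd

customers = pd.DataFrame({
    "name":  ["John Smith", "john smith", "Jane Doe"],
    "email": ["john@x.com", "john@x.com", "jane@x.com"],
})

customers["name"] = customers["name"].str.title()      # normalize names
deduped = customers.drop_duplicates(subset=["email"])  # dedupe on a stable key
print(deduped)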
Addressing these issues ensures that your data is clean, consistent, and ready for
reliable analysis.