Unit 5 Introduction To Big Data

Uploaded by

renuchoubey2680

G.D. GOENKA PUBLIC SCHOOL
Shivpuri Link Road, Gwalior
CLASS – XII Artificial Intelligence

UNIT-5 INTRODUCTION TO BIG DATA AND ANALYTICS


Data

 Definition: Data is the foundational element of an AI system. It consists of raw facts, figures, and statistics
collected from various sources.
 Purpose: Data serves as the training material for AI models, helping them learn patterns and make
predictions or classifications.

Features of Data

Features are measurable characteristics or attributes of data that contribute to the AI model. They determine what
the model will use to make decisions, so choosing features means deciding what type of data to collect. For
example, for salary data the features would be salary amount, increment percentage, increment period, bonus, etc.

Key characteristics of data include:

 Accuracy: How correct and precise the data is.
 Completeness: Whether all necessary data points are present.
 Consistency: The uniformity of data across sources and time.
 Relevance: How suitable the data is for achieving the desired output.
 Volume: The amount of data available, influencing the model's reliability.
 Timeliness: Data should be recent and up-to-date to remain valuable.
 Accessibility: How easy it is to access and retrieve data.
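
Some of these characteristics can be checked with a few lines of code. The sketch below is only an illustration: the records, field names, and the three helper functions are hypothetical, and the checks shown (missing values, repeated rows, implausible salaries) are simple stand-ins for completeness, consistency, and accuracy.

```python
# Hypothetical employee-pay records; None marks a missing value.
records = [
    {"id": 1, "salary": 50000, "increment_pct": 5.0, "bonus": 2000},
    {"id": 2, "salary": 62000, "increment_pct": None, "bonus": 2500},
    {"id": 3, "salary": -100, "increment_pct": 4.0, "bonus": 1500},   # implausible salary
    {"id": 1, "salary": 50000, "increment_pct": 5.0, "bonus": 2000},  # repeats the first row
]

def completeness(rows):
    """Fraction of field values that are present (not None)."""
    total = sum(len(row) for row in rows)
    present = sum(1 for row in rows for value in row.values() if value is not None)
    return present / total

def duplicate_count(rows):
    """Number of rows that exactly repeat an earlier row (a consistency check)."""
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates += 1
        seen.add(key)
    return duplicates

def invalid_salaries(rows):
    """Ids of rows whose salary is missing or not positive (a simple accuracy check)."""
    return [row["id"] for row in rows if row["salary"] is None or row["salary"] <= 0]

print("Completeness:", completeness(records))
print("Duplicate rows:", duplicate_count(records))
print("Rows with invalid salary:", invalid_salaries(records))
```

Here 15 of the 16 field values are present, one row is a duplicate, and the row with id 3 fails the salary check.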

Big Data

 Big Data refers to large and complex datasets that cannot be effectively managed, processed, or analyzed
using traditional data processing methods.

Types of Big Data
1. Structured Data
2. Unstructured Data
3. Semi-structured Data

Structured data
 Structured data is generally tabular data that is represented by columns and rows in a database.
 Databases that hold tables in this form are called relational databases.
 The mathematical term “relation” refers to a well-defined set of data held as a table.
 In structured data, all rows in a table have the same set of columns.
 Data that can be stored in relational databases or spreadsheets (like Excel) is the best example of
structured data.

Unstructured data
 Unstructured data is information that either is not organized in a pre-defined manner or does not have a
pre-defined data model.
 Unstructured information is typically text-heavy but may also contain data such as numbers, dates, and
facts.
 Videos, audio, and binary data files often have no specific structure, so they are classified
as unstructured data.

Semi-structured data

 Semi-structured data falls somewhere between structured and unstructured data.
 It is simpler to manage than unstructured data even though it is not as organized as structured data.
 Semi-structured data uses metadata to identify certain characteristics and organize data into fields,
allowing for some level of organization and analysis.
 An example of semi-structured data is a social media video with hashtags used for categorization, blending
structured elements like hashtags with unstructured content like the video itself.
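
The difference between the three types can be seen in how a program reads them. The following is a minimal sketch using Python's standard `csv` and `json` modules. The sample table, posts, and review text are made up for illustration.

```python
import csv
import io
import json

# Structured: every row has the same columns, as in a relational table.
csv_text = "id,name,salary\n1,Asha,50000\n2,Ravi,62000\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
columns_per_row = {tuple(row.keys()) for row in rows}  # one entry: all rows share columns

# Semi-structured: field names (metadata) organize each record,
# but records are not forced to share the same fields.
posts_json = (
    '[{"user": "a1", "hashtags": ["#ai", "#bigdata"], "video": "clip1.mp4"},'
    ' {"user": "b2", "caption": "hello"}]'
)
posts = json.loads(posts_json)
hashtags = [tag for post in posts for tag in post.get("hashtags", [])]

# Unstructured: free text with no predefined fields; even a simple
# question such as "how many words?" needs custom processing.
review = "Loved the phone, battery lasts two days!"
word_count = len(review.split())

print(columns_per_row)
print(hashtags)
print(word_count)
```

The CSV rows can be queried by column name straight away, the JSON posts need a fallback for missing fields (`post.get`), and the review text has no fields at all.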

Benefits of Big Data Processing

Processing Big Data effectively offers several benefits, such as:


1. Enhanced Decision Making:
o Big Data analytics allows businesses to make data-driven decisions, improving strategies and
outcomes with real-time insights.
2. Improved Operational Efficiency:
o Organizations can identify inefficiencies in their processes and optimize operations. Predictive
analytics can help forecast demand and streamline production.
3. Personalized Customer Experiences:
o Big Data enables companies to understand customer preferences and behavior, leading to
personalized marketing, targeted ads, and tailored products.
4. Fraud Detection and Risk Management:
o Financial institutions and businesses can detect fraudulent activities in real-time and mitigate risks
using Big Data analytics.
5. Innovation and New Product Development:
o Insights derived from Big Data allow companies to innovate and create new products or services
based on customer needs and market trends.
6. Predictive Analytics and Forecasting:
o Big Data supports predictive analytics models, which can anticipate future trends and help
businesses prepare accordingly.

Disadvantages of Big Data

1. Privacy and Security Concerns – Risk of data breaches, misuse of personal information, and
unauthorized access.
2. Data Quality Issues – Data may be inaccurate, incomplete, or unstructured, leading to errors in
analysis.
3. Technical Complexity – Requires advanced tools and skilled professionals, which many organizations
lack.
4. Regulatory Compliance – Laws like GDPR and India’s DPDP Act, 2023, make handling personal data
more difficult.
5. High Cost – Setting up infrastructure, storage, and hiring experts is expensive, especially for small
businesses.

Uses of Big Data


Big Data finds application across numerous sectors due to its ability to provide valuable insights and facilitate
informed decision-making:

1. Healthcare:
o Example: Predicting disease outbreaks, analyzing patient records for personalized treatment,
optimizing hospital management, and improving clinical trials.
2. Finance and Banking:
o Example: Fraud detection, risk assessment, algorithmic trading, personalized banking solutions,
and customer credit scoring.
3. Retail and E-commerce:
o Example: Analyzing customer purchase patterns, optimizing inventory, personalized marketing
campaigns, and dynamic pricing strategies.
4. Manufacturing and Supply Chain:
o Example: Predictive maintenance of equipment, optimizing supply chain logistics, quality control,
and demand forecasting.
5. Telecommunications:
o Example: Network optimization, customer churn analysis, personalized offers, and real-time
monitoring of network traffic.
6. Social Media and Marketing:
o Example: Sentiment analysis, understanding customer preferences, ad targeting, and measuring
the effectiveness of marketing campaigns.
7. Smart Cities and IoT:
o Example: Traffic monitoring, energy management, waste management, and safety initiatives
through real-time data from connected devices.
8. Education:
o Example: Analyzing student performance, identifying learning patterns, and developing
personalized learning programs.
9. Government and Public Services:
o Example: Predicting and managing natural disasters, monitoring infrastructure health, and
improving public safety and law enforcement.

Characteristics of Big Data


 The “characteristics of Big Data” refer to the defining attributes that distinguish large and complex
datasets from traditional data sources.
 These characteristics were originally described using the "3Vs" framework: Volume, Velocity, and
Variety.
 The extended 6Vs framework provides a more holistic view of Big Data, covering not only its volume,
velocity, and variety but also its veracity, variability, and value.
 Understanding and addressing these six dimensions is essential for effectively managing, analyzing, and
deriving value from Big Data in various domains.
1. Velocity
o Refers to the speed at which data is generated, delivered, and analyzed.
o Example: Google is estimated to handle more than 40,000 search queries per second.
2. Volume
o Refers to the huge amount of data generated daily.
o Example: By one widely cited estimate, around 328.77 million terabytes of data are created every day.
o Big data is measured in gigabytes, terabytes, petabytes, or even exabytes.
3. Variety
o Refers to the different types of data:
 Structured (tables, numbers)
 Unstructured (images, videos, audio, social media posts)
 Semi-structured (XML, JSON)
o Challenge: Unstructured data is harder to store in traditional databases but provides valuable
insights.
4. Veracity
o Refers to the trustworthiness, accuracy, and quality of data.
o Not all data is useful → it must be cleaned before analysis.
o Ensures reliability of data for decision-making.
5. Value
o The ultimate goal of Big Data is to extract value from it.
o Without value, other V’s don’t matter.
o Example: Businesses use big data to gain insights into customer behavior and improve
decision-making.
6. Variability
o Refers to the inconsistency or unpredictability of data.
o Data streams may change frequently, making it harder to analyze.
o The challenge is to extract meaningful patterns even under fluctuating conditions.

 Velocity = Speed
 Volume = Amount
 Variety = Types
 Veracity = Trust
 Value = Usefulness
 Variability = Inconsistency (how much the data changes)
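
To get a feel for the scale behind the Velocity and Volume examples, the figures quoted above can be converted into other units. This is a small arithmetic sketch that assumes decimal units (1 exabyte = 1,000,000 terabytes, 1 zettabyte = 1,000 exabytes) and takes the quoted estimates at face value.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds

# Velocity: search queries per second, scaled up to one day.
queries_per_second = 40_000
queries_per_day = queries_per_second * SECONDS_PER_DAY   # about 3.46 billion

# Volume: terabytes per day, expressed in larger units.
TB_PER_EB = 1_000_000   # terabytes in one exabyte (decimal units)
EB_PER_ZB = 1_000       # exabytes in one zettabyte (decimal units)

terabytes_per_day = 328.77e6
exabytes_per_day = terabytes_per_day / TB_PER_EB               # about 328.77 EB
zettabytes_per_year = exabytes_per_day * 365 / EB_PER_ZB       # about 120 ZB

print(f"{queries_per_day:,} queries per day")
print(f"{exabytes_per_day:.2f} exabytes per day")
print(f"{zettabytes_per_year:.1f} zettabytes per year")
```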

Big Data Analytics

1. Data Analytics

 Definition: The process of analyzing datasets to uncover insights, trends, and patterns.
 Scope: Works on small to medium datasets.
 Tools:
o Statistical analysis software
o Data visualization tools
o RDBMS (Relational Database Management Systems)

2. Big Data Analytics

 Definition: Involves methods, tools, and practices for analyzing and managing large, complex datasets
(Big Data).
 Tasks Covered:
o Data collection
o Data organization
o Data storage
o Data analysis
 Objective:
o Identify patterns and solve problems
o Improve decision-making
o Enhance business performance
o Provide forecasts and insights
 Importance in Business:
o Helps evaluate and improve business processes
o Supports better customer understanding
o Enables informed decision-making

3. Global Trends Leading to Big Data Analytics

1. Moore’s Law – The growth of computing power: Steadily increasing processing power enables handling
of massive datasets.
2. Mobile Computing – Data at our fingertips: Smartphones provide real-time data collection and
connectivity.
3. Social Networking – A source of massive data: Platforms like Facebook and Twitter generate huge
volumes of user data.
4. Cloud Computing – Storing and processing data remotely: Provides on-demand storage and processing
power without heavy infrastructure costs.

4. Types of Big Data Analytics

1. Descriptive Analytics
o Summarizes historical data to identify patterns & trends.
o Example: Monthly sales reports.
2. Diagnostic Analytics
o Explains why something happened by analyzing past data.
o Example: Analyzing reasons for a drop in sales.
3. Predictive Analytics
o Uses historical data to forecast future outcomes.
o Example: Predicting next quarter’s sales.
4. Prescriptive Analytics
o Suggests best possible actions based on insights.
o Example: Recommending discounts to boost sales.
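
The four types can be tried out on a tiny dataset. The sketch below uses made-up monthly sales figures and deliberately simple methods: an average for descriptive, month-to-month comparison for diagnostic, a least-squares straight line for predictive, and a naive "compare averages" rule for prescriptive. Real analytics would use far more data and more careful models.

```python
from statistics import mean

# Hypothetical monthly sales (units) and whether a discount ran that month.
sales = [100, 110, 120, 90, 130, 140]
discount = [False, False, False, False, True, True]

# Descriptive: summarize what happened.
average_sales = mean(sales)

# Diagnostic: locate the months (0-based index) where sales fell,
# as a starting point for asking why.
drops = [m for m in range(1, len(sales)) if sales[m] < sales[m - 1]]

# Predictive: fit a straight line y = intercept + slope * x by least squares
# and extend it one month into the future.
x = list(range(len(sales)))
x_mean, y_mean = mean(x), mean(sales)
slope = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, sales))
         / sum((xi - x_mean) ** 2 for xi in x))
intercept = y_mean - slope * x_mean
forecast = intercept + slope * len(sales)

# Prescriptive: recommend an action (naive rule: compare average sales
# in discount months with average sales in other months).
with_discount = mean(s for s, d in zip(sales, discount) if d)
without_discount = mean(s for s, d in zip(sales, discount) if not d)
recommendation = "run discount" if with_discount > without_discount else "no discount"

print(average_sales, drops, round(forecast, 1), recommendation)
```

With these numbers the average is 115, sales dropped in the fourth month (index 3), the trend line forecasts about 138 units for the next month, and the rule recommends running a discount.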

Working of Big Data Analytics

Big Data Analytics involves collecting, processing, cleaning, and analyzing large datasets to improve
decision-making.

Steps:

1. Gather Data – Collect structured and unstructured data from sources like cloud storage, mobile apps,
and IoT sensors.
2. Process Data – Organize and handle data for analysis:
o Batch Processing → analyzes large blocks of data over time.
o Stream Processing → analyzes small batches quickly for real-time decisions.
3. Clean Data – Remove errors, duplicates, and irrelevant entries; format properly for accurate results.
4. Analyze Data – Use advanced analytics to convert big data into meaningful insights.
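
The steps above can be sketched in miniature as follows. The sensor readings and the function names `clean`, `batch_average`, and `stream_averages` are hypothetical. A real Big Data pipeline would run on distributed tools, but the idea of each step is the same.

```python
# Step 1 (Gather): hypothetical sensor readings collected from several sources.
raw_readings = [
    {"sensor": "s1", "temp": 21.5},
    {"sensor": "s2", "temp": None},   # missing value
    {"sensor": "s1", "temp": 21.5},   # duplicate entry
    {"sensor": "s3", "temp": 22.0},
    {"sensor": "s4", "temp": 23.5},
]

def clean(readings):
    """Step 3 (Clean): drop records with missing values and exact duplicates."""
    seen = set()
    for reading in readings:
        key = (reading["sensor"], reading["temp"])
        if reading["temp"] is None or key in seen:
            continue
        seen.add(key)
        yield reading

def batch_average(readings):
    """Batch processing: analyze the whole block of data in one go."""
    temps = [r["temp"] for r in readings]
    return sum(temps) / len(temps)

def stream_averages(readings):
    """Stream processing: update the result as each record arrives."""
    count, total = 0, 0.0
    for r in readings:
        count += 1
        total += r["temp"]
        yield total / count

cleaned = list(clean(raw_readings))
batch_result = batch_average(cleaned)              # Step 4 (Analyze), batch style
stream_results = list(stream_averages(cleaned))    # Step 4 (Analyze), stream style

print(len(cleaned), batch_result, stream_results)
```

The batch version gives one answer after all the data is available, while the stream version gives a running answer after every record. Once the last record has arrived, the two agree.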

Mining Data Streams

A data stream is a continuous, real-time flow of data generated by sources such as sensors, satellite images,
social media, financial transactions, or website traffic.

Mining data streams means extracting patterns, insights, and trends from this continuous flow of data. Unlike
traditional data mining, it does not store the entire dataset; instead, it processes data as it arrives.

🔹 Example:
Websites continuously receive user activity data. If there is a sudden surge in searches for “election results”, it
may indicate ongoing elections or high public interest in political outcomes.
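
A minimal sketch of this idea is shown below. It keeps only the most recent few queries in memory (a sliding window) and reports a surge when the chosen term appears often enough inside that window. The function name `detect_surges`, the window size, the threshold, and the sample queries are all assumptions made for illustration.

```python
from collections import deque

def detect_surges(stream, term, window_size=5, threshold=3):
    """Yield the position of each event at which `term` has appeared at least
    `threshold` times among the most recent `window_size` events."""
    window = deque(maxlen=window_size)   # older events are discarded automatically
    for position, query in enumerate(stream):
        window.append(query)
        if query == term and window.count(term) >= threshold:
            yield position

# Hypothetical stream of search queries arriving one at a time.
queries = ["weather", "election results", "cricket", "election results",
           "election results", "news", "election results"]

surge_positions = list(detect_surges(iter(queries), "election results"))
print(surge_positions)
```

Because the window never holds more than `window_size` items, memory use stays constant no matter how long the stream runs, which is the key difference from traditional data mining on a stored dataset.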

Future of Big Data Analytics

The future of Big Data will be shaped by several major technological trends:

1. Real-Time Analytics
o Data will increasingly be processed as soon as it is generated.
o Helps in immediate decision-making, e.g., monitoring customer behavior, fraud detection, or
tracking supply chain.
2. Advanced Predictive Models
o AI and Machine Learning will make predictive analytics more accurate.
o Organizations will be able to forecast trends, customer needs, and risks with greater precision.
3. Quantum Computing
o Is expected to provide processing power beyond classical computers for certain kinds of problems.
o Could solve some highly complex problems far faster than is possible today, which may
revolutionize Big Data analysis.
