
03 Facets of Data in Data Science

The document outlines key facets of data in data science, including types of data (structured, semi-structured, unstructured), sources of data (primary, secondary, real-time), and characteristics of data (volume, velocity, variety, veracity, value). It details the data science process, which encompasses setting research goals, retrieving and preparing data, exploring data, modeling, and presenting results. Additionally, it emphasizes the importance of data visualization and management techniques to effectively communicate findings and support decision-making.

Uploaded by

theophilusindia

Facets of Data in Data Science

Department of AI & DS, VSB College of Engineering Technical Campus, Coimbatore
Types of Data

 • Structured: Organized in rows and columns (e.g., relational database tables).
 • Semi-structured: Partially organized, with tags or keys but no fixed schema (e.g., XML, JSON).
 • Unstructured: Lacks a predefined format (e.g., text, images, videos).
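
The three types of data can be told apart by how much work it takes to read a field from them. The sketch below is only an illustration: the sample records, names, and sentence are made up, and it uses nothing beyond the Python standard library.

```python
import csv
import io
import json

# Structured: every record has the same columns, so it loads as a table.
csv_text = "id,name,city\n1,Asha,Coimbatore\n2,Ravi,Chennai\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: keys label the values, but records may differ in shape.
json_text = '[{"id": 1, "tags": ["iot"]}, {"id": 2, "owner": {"name": "Ravi"}}]'
records = json.loads(json_text)

# Unstructured: no schema at all; any structure has to be derived
# (here, a simple word count).
note = "Sensor three reported a fault. Sensor three was reset."
word_count = len(note.split())

print(rows[0]["city"])            # field access by column name
print(sorted(records[1].keys()))  # keys vary from record to record
print(word_count)
```

In the structured case the column names are known up front, in the semi-structured case each record has to be inspected for the keys it carries, and in the unstructured case the program must impose its own structure.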

Sources of Data

 • Primary: Surveys, experiments, direct observations.
 • Secondary: Databases, government portals, academic journals.
 • Real-time: IoT devices, GPS, streaming platforms.

Characteristics of Data

 • Volume: Large quantities of data generated every second.
 • Velocity: Speed at which data is created and processed.
 • Variety: Different formats, such as text, images, and videos.
 • Veracity: Accuracy and trustworthiness of the data.
 • Value: Meaningful insights that can be extracted from the data.

Data Representation

 • Tables: Organized in rows and columns.
 • Graphs: Pie charts, bar graphs, etc.
 • Images and Videos: Used in multimedia analysis.
 • Text: Found in logs, emails, and documents.

Key Facets of Data Science

 The key facets of data science:

* Identifying data structures
* Cleaning, filtering, reorganizing, augmenting, and aggregating data
* Visualizing data
* Data analysis, statistics, and modeling
* Machine learning
* Assembling data processing pipelines to connect these steps
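
As a rough illustration of the last facet, a pipeline can be sketched as a list of functions where each step receives the output of the previous one. The step names (clean, augment, aggregate), the run_pipeline helper, and the sample records are all hypothetical.

```python
from functools import reduce

def clean(records):
    """Drop records with a missing value."""
    return [r for r in records if r["value"] is not None]

def augment(records):
    """Add a derived field to each record."""
    return [{**r, "value_squared": r["value"] ** 2} for r in records]

def aggregate(records):
    """Sum the values per group."""
    totals = {}
    for r in records:
        totals[r["group"]] = totals.get(r["group"], 0) + r["value"]
    return totals

def run_pipeline(data, steps):
    """Feed the output of each step into the next one."""
    return reduce(lambda acc, step: step(acc), steps, data)

raw = [
    {"group": "a", "value": 2},
    {"group": "a", "value": None},
    {"group": "b", "value": 3},
    {"group": "a", "value": 5},
]
result = run_pipeline(raw, [clean, augment, aggregate])
print(result)
```

Keeping each step a small function makes it easy to reorder, replace, or test the steps one at a time.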
Data Cleaning and Preprocessing
 • Handle missing values: Imputation or removal.
 • Remove duplicates and handle outliers (drop or cap values that are clearly errors).
 • Normalize or standardize numeric data so features share a consistent scale.
 • Encode categorical variables.
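
The four steps above can be sketched on a tiny dataset. This is a minimal example under stated assumptions: the records are invented, outliers are detected with a simple domain rule (ages outside 0 to 120), missing values are filled with the median, and scaling is min-max. Other choices (IQR rules, mean imputation, z-score standardization) are equally valid.

```python
from statistics import median

# Hypothetical raw records: one missing age, one exact duplicate, one extreme age.
raw = [
    {"age": 25, "city": "Chennai"},
    {"age": None, "city": "Coimbatore"},
    {"age": 31, "city": "Chennai"},
    {"age": 31, "city": "Chennai"},
    {"age": 28, "city": "Madurai"},
    {"age": 240, "city": "Madurai"},
]

# 1. Remove exact duplicates while keeping the original order.
seen, rows = set(), []
for r in raw:
    key = (r["age"], r["city"])
    if key not in seen:
        seen.add(key)
        rows.append(dict(r))

# 2. Drop outliers using a domain rule (assumption: valid ages lie in 0-120).
rows = [r for r in rows if r["age"] is None or 0 <= r["age"] <= 120]

# 3. Impute missing ages with the median of the observed ages.
fill = median(r["age"] for r in rows if r["age"] is not None)
for r in rows:
    if r["age"] is None:
        r["age"] = fill

# 4. Min-max normalize age into the range [0, 1].
lo, hi = min(r["age"] for r in rows), max(r["age"] for r in rows)
for r in rows:
    r["age_scaled"] = (r["age"] - lo) / (hi - lo)

# 5. One-hot encode the categorical column.
cities = sorted({r["city"] for r in rows})
for r in rows:
    for c in cities:
        r[f"city_{c}"] = int(r["city"] == c)

for r in rows:
    print(r)
```

Outliers are removed before imputation here so that the extreme value cannot influence the fill value.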

Data Storage and Management
 • Relational Databases: SQL for structured data.
 • NoSQL Databases: Handle unstructured or semi-structured data.
 • Cloud Storage: AWS, Azure, Google Cloud.
 • Data Lakes: Store raw data for later processing.
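
The difference between relational and document-style storage can be shown with the standard library alone. In this sketch, an in-memory SQLite database stands in for a relational database, and a list of JSON documents stands in for a NoSQL document store. The table name, fields, and values are hypothetical.

```python
import json
import sqlite3

# Relational: a fixed schema, queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 30.0)],
)
totals = dict(
    conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
)
conn.close()

# Document style (as in many NoSQL stores): each record carries its own structure.
documents = [
    {"_id": 1, "type": "review", "text": "Great product", "stars": 5},
    {"_id": 2, "type": "click", "page": "/home"},
]
serialized = [json.dumps(d) for d in documents]
reviews = [d for d in map(json.loads, serialized) if d.get("type") == "review"]

print(totals)
print(reviews)
```

The SQL table rejects rows that do not fit its two columns, while the document list accepts records of any shape and leaves it to the reader to check which keys exist.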

Data Analysis

 • Descriptive Analysis: Summarizes data trends.
 • Predictive Analysis: Forecasts future outcomes.
 • Prescriptive Analysis: Offers actionable recommendations.
 • Exploratory Data Analysis (EDA): Techniques to visualize data trends and outliers.
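
A small worked example may help separate the first three kinds of analysis. The sales figures, the stock level, and the decision rule below are assumptions made for illustration, and the forecast uses an ordinary least-squares line, which is only one of many possible predictive models.

```python
from statistics import mean, stdev

months = [1, 2, 3, 4, 5]
sales = [10, 12, 15, 15, 18]  # hypothetical monthly sales

# Descriptive: summarize what already happened.
summary = {
    "mean": mean(sales),
    "stdev": stdev(sales),
    "min": min(sales),
    "max": max(sales),
}

# Predictive: fit a least-squares line and extrapolate one month ahead.
mx, my = mean(months), mean(sales)
slope = sum((x - mx) * (y - my) for x, y in zip(months, sales)) / sum(
    (x - mx) ** 2 for x in months
)
intercept = my - slope * mx
forecast = intercept + slope * 6

# Prescriptive: turn the forecast into a recommendation.
# The stock level and the rule itself are assumptions.
current_stock = 18
action = "increase stock" if forecast > current_stock else "hold stock"

print(summary)
print(round(forecast, 2), action)
```

Descriptive analysis looks backward, predictive analysis estimates the next value, and prescriptive analysis maps that estimate to a decision.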

Data Visualization

 • Purpose: Communicate findings effectively.
 • Tools: Matplotlib, Seaborn, Tableau, Power BI.
 • Best Practices: Use appropriate charts, avoid clutter, focus on key insights.

Facet by Row and Facet by Column
 • Facet by Row: Splits the data into subsets shown as separate panels stacked in rows (e.g., sales by region).
 • Facet by Column: Splits the data into subsets shown as panels placed side by side in columns (e.g., temperature by city).
 • Combined Facets: Grids with rows and columns for multi-dimensional data.
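
Underneath the drawing, faceting is a grouping operation: each (row, column) combination receives its own subset of the data. The sketch below builds such a grid in plain Python and prints it as text. The observations are hypothetical, and a plotting library would draw one chart per cell instead of printing a list.

```python
from collections import defaultdict

# Hypothetical observations: (region, year, sales).
observations = [
    ("north", 2023, 10), ("north", 2024, 14),
    ("south", 2023, 7), ("south", 2024, 9),
    ("north", 2023, 5),
]

# Each (row, column) pair gets its own panel holding just that subset.
panels = defaultdict(list)
for region, year, sales in observations:
    panels[(region, year)].append(sales)

row_levels = sorted({r for r, _, _ in observations})  # facet by row: region
col_levels = sorted({c for _, c, _ in observations})  # facet by column: year

# Print the grid as text, one line per row facet and one cell per column facet.
print("region".ljust(8) + "".join(str(c).ljust(12) for c in col_levels))
for r in row_levels:
    print(r.ljust(8) + "".join(str(panels[(r, c)]).ljust(12) for c in col_levels))
```

Using only row_levels gives a facet by row, using only col_levels gives a facet by column, and using both gives the combined grid.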

Positioning in Data Visualization
 • Importance: Ensures accurate interpretation of data.
 • Key Aspects: Axis alignment, labels, proximity, whitespace management.
 • Use Cases: Overlaying trends, aligning bar plots, grid-based visuals.

Data Science Process

1. Setting a Research Goal

• Ensure all stakeholders understand the what, how, and why of the project.
• Create a project charter that outlines:
 - Objectives
 - Scope
 - Key deliverables

2. Retrieving Data

 • Identify suitable data sources.
 • Gain access to data from data owners.
 • Result: Raw data, which may require polishing and transformation.

3. Data Preparation

 • Transform raw data into a usable format:
 - Detect and correct errors.
 - Combine data from multiple sources.
 - Apply necessary transformations.
 • Result: Data that is ready for visualization and modeling.
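
The three preparation tasks can be sketched on two small sources that share a customer id. Everything here is hypothetical: the source names (crm, orders), the fields, and the rule that a negative amount counts as an entry error.

```python
# Two hypothetical sources keyed by customer id.
crm = [
    {"id": 1, "name": " asha ", "country": "IN"},
    {"id": 2, "name": "Ravi", "country": "in"},
    {"id": 3, "name": "Meena", "country": "IN"},
]
orders = [
    {"id": 1, "amount": "250.50"},
    {"id": 2, "amount": "-40"},  # negative amount: treated as an entry error
    {"id": 3, "amount": "99"},
]

# Detect and correct errors: trim whitespace and normalize letter case.
for c in crm:
    c["name"] = c["name"].strip().title()
    c["country"] = c["country"].upper()

# Apply transformations: parse amounts as numbers and blank out invalid ones.
amounts = {}
for o in orders:
    value = float(o["amount"])
    amounts[o["id"]] = value if value >= 0 else None

# Combine sources: join the two tables on the shared id.
prepared = [{**c, "amount": amounts.get(c["id"])} for c in crm]

for row in prepared:
    print(row)
```

The invalid amount is kept as a missing value rather than silently dropped, so a later step can decide whether to impute it or exclude the record.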

4. Data Exploration

 • Gain a deep understanding of the data:
 - Identify patterns, correlations, and deviations.
 - Use visual and descriptive techniques.
 • Insights from this step guide the modeling process.
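
Two of the descriptive techniques mentioned above, correlation and deviation detection, can be computed by hand on a small sample. The data is invented, and the cutoff of 1.5 standard deviations is an assumption chosen for this tiny sample; larger datasets often use 2 or 3.

```python
from statistics import mean, pstdev

hours = [1, 2, 3, 4, 5, 6]
scores = [52, 55, 61, 64, 70, 95]  # hypothetical; the last value looks unusual

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Correlation: how strongly the two variables move together.
r = pearson(hours, scores)

# Deviations: flag points more than 1.5 standard deviations from the mean.
mu, sigma = mean(scores), pstdev(scores)
deviations = [s for s in scores if abs(s - mu) / sigma > 1.5]

print(round(r, 3))
print(deviations)
```

A strong correlation suggests that a model relating the two variables is worth trying, and the flagged value is a candidate for a closer look before modeling.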

5. Data Modeling

 • Build models to gain insights or make predictions:
 - Use the insights from previous steps to guide model selection.
 - Combine simple models for better performance (if applicable).
 • Aim to meet the objectives stated in the project charter.
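
One common way to combine simple models is majority voting, where each model casts a vote and the ensemble follows the majority. The sketch below uses three hand-written rules as stand-ins for trained models. The rules, feature names, and thresholds are hypothetical, and voting is only one of several combination strategies.

```python
# Three deliberately simple rules for flagging a transaction as suspicious.
def rule_amount(t):
    return t["amount"] > 1000

def rule_night(t):
    return t["hour"] < 6

def rule_foreign(t):
    return t["country"] != "IN"

MODELS = [rule_amount, rule_night, rule_foreign]

def ensemble_predict(t):
    """Majority vote: flag when more than half of the simple models agree."""
    votes = sum(model(t) for model in MODELS)
    return votes > len(MODELS) / 2

transactions = [
    {"amount": 1500, "hour": 3, "country": "IN"},   # two rules fire
    {"amount": 200, "hour": 14, "country": "US"},   # one rule fires
    {"amount": 5000, "hour": 2, "country": "US"},   # all three rules fire
]
predictions = [ensemble_predict(t) for t in transactions]
print(predictions)
```

A single rule firing is not enough to flag a transaction, which makes the combined decision less sensitive to the mistakes of any one rule.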

6. Presenting Results and Automation
 • Present findings to stakeholders:
 - Demonstrate how results can change business processes.
 • Automate processes for repetitive tasks:
 - Saves time and ensures consistency.
 • Influence decision-making at strategic and tactical levels.
