Data Science in Society Cat

Exploratory Data Analysis (EDA) is an open-ended approach used to discover patterns and insights in data without a predetermined hypothesis, while Confirmatory Data Analysis (CDA) is hypothesis-driven, focusing on validating specific theories through statistical tests. Key Python packages for data science include Pandas for data manipulation, NumPy for numerical computing, Matplotlib for data visualization, and TensorFlow for machine learning. Data science can significantly improve health services in Kenya by analyzing health data for predictive insights, optimizing resources, and enhancing patient care.

Uploaded by

genesiskalya

a) Differentiate between Exploratory Data Analysis (EDA) and Confirmatory Data Analysis (CDA).

Exploratory Data Analysis (EDA) is the initial step of analyzing data to uncover patterns,
anomalies, or insights, without a predetermined hypothesis. It helps to summarize the
main characteristics of data, often using visual methods, and guides further data analysis.
EDA is open-ended, enabling analysts to make observations about data distribution,
relationships, and outliers.

Confirmatory Data Analysis (CDA), on the other hand, is hypothesis-driven and used to
confirm or refute specific hypotheses using statistical tests and models. CDA is focused on
assessing if observed data patterns align with pre-defined theories or assumptions and
involves statistical testing to validate findings.
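The contrast can be sketched in a few lines of Python. This is a minimal illustration, not part of the original answer: the scores and group labels are hypothetical, and Welch's t statistic is computed by hand with NumPy as one possible confirmatory test. A full test would also convert the statistic to a p-value.

```python
import numpy as np
import pandas as pd

# Hypothetical exam scores for two study groups (illustrative values only).
scores = pd.DataFrame({
    "group": ["A"] * 5 + ["B"] * 5,
    "score": [62, 70, 68, 75, 65, 78, 82, 74, 88, 80],
})

# EDA: no hypothesis yet, just summarize each group and look for patterns.
summary = scores.groupby("group")["score"].describe()
print(summary[["count", "mean", "std", "min", "max"]])

# CDA: test a stated hypothesis (H0: both groups have the same mean score)
# using Welch's t statistic, computed by hand here.
a = scores.loc[scores["group"] == "A", "score"].to_numpy(dtype=float)
b = scores.loc[scores["group"] == "B", "score"].to_numpy(dtype=float)
standard_error = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
t_statistic = (a.mean() - b.mean()) / standard_error
print(f"Welch t statistic: {t_statistic:.2f}")
```

The summary table is the exploratory step; the t statistic only makes sense once a specific hypothesis about the two group means has been stated.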

b) Explain the following Python packages used in data science in society.

i. Pandas: A data manipulation and analysis library that provides data structures like
DataFrames and Series for handling and analyzing structured data efficiently.

ii. NumPy: A library for numerical computing that supports multi-dimensional arrays and
mathematical functions to operate on them, widely used for scientific computing and data
analysis.

iii. Matplotlib: A plotting library that allows for the creation of static, interactive, and
animated visualizations in Python. It’s useful for data visualization to interpret results in
data science.

iv. TensorFlow: An open-source machine learning framework developed by Google,
primarily used for deep learning tasks. It helps build, train, and deploy machine learning
models, especially neural networks.
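As a brief illustration of the first two packages, the sketch below uses a NumPy array for vectorized arithmetic and then wraps the results in a Pandas Series and DataFrame. The town names and temperatures are hypothetical values chosen only for the example.

```python
import numpy as np
import pandas as pd

# NumPy: element-wise arithmetic on a whole array, no explicit loop.
temps_c = np.array([21.5, 23.0, 19.5])   # hypothetical readings in Celsius
temps_f = temps_c * 9 / 5 + 32

# Pandas: a Series is one labeled column; a DataFrame is a table of columns.
towns = pd.Series(["Nairobi", "Kisumu", "Eldoret"], name="town")
table = pd.DataFrame({"town": towns, "temp_c": temps_c, "temp_f": temps_f})
print(table)
```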

c) Assuming height and weight are already defined as lists in Python, write code that
imports numpy as np and stores both the height and weight of your classmates as numpy
arrays.

import numpy as np

# Assuming height and weight are predefined lists
height_array = np.array(height)
weight_array = np.array(weight)
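A self-contained version of the same idea is sketched below. The sample lists, their units (metres and kilograms), and the BMI calculation are assumptions added for illustration; the question itself only asks for the conversion to arrays.

```python
import numpy as np

# Hypothetical sample data; the exercise assumes these lists already exist.
height = [1.62, 1.75, 1.80, 1.68]   # metres (assumed unit)
weight = [58.0, 72.5, 81.0, 64.0]   # kilograms (assumed unit)

height_array = np.array(height)
weight_array = np.array(weight)

# Arrays support element-wise arithmetic, which plain lists do not.
bmi = weight_array / height_array ** 2
print(bmi.round(1))
```

Converting to arrays is what makes the last calculation possible: dividing one plain list by another raises a TypeError.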

d) Using Pandas:

i. Create a simple Pandas DataFrame and print its values.

import pandas as pd

# Creating a simple DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [24, 27, 22]}
df = pd.DataFrame(data)
print(df)

ii. Create your own DataFrame from a dictionary of arrays/lists.

# Creating DataFrame from dictionary of arrays/lists
data_dict = {'Student': ['John', 'Mary', 'Emma'], 'Score': [88, 92, 95]}
df_from_dict = pd.DataFrame(data_dict)
print(df_from_dict)

iii. Perform appending, slicing, addition, and deletion of rows with a Pandas DataFrame.

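# Adding a new row at index label 3
df.loc[3] = ['David', 25]

# Appending a new DataFrame (DataFrame.append was removed in pandas 2.0; use pd.concat)
new_data = pd.DataFrame({'Name': ['Eve'], 'Age': [30]})
df = pd.concat([df, new_data], ignore_index=True)

# Slicing rows at positions 1 and 2
print(df.iloc[1:3])

# Deleting the row with index label 3
df = df.drop(3)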

e) Using Pandas:

i. Create a DataFrame with a list of dictionaries, row indices, and column indices.

# Each dictionary becomes one row; index sets the row labels, columns sets the column order
data = [{'Name': 'Alice', 'Age': 24}, {'Name': 'Bob', 'Age': 27}]
df = pd.DataFrame(data, index=['row1', 'row2'], columns=['Name', 'Age'])
print(df)

ii. Use index label to delete or drop rows from a Pandas DataFrame.

# Dropping a row by index label
df = df.drop('row1')
print(df)

f) Nearly 80% of data analysis is spent on cleaning and preparing data. Explain.

Data cleaning and preparation take a significant portion of time in data analysis because
raw data is often incomplete, inconsistent, and noisy. This process involves handling
missing values, correcting inconsistencies, transforming data types, and ensuring that the
data is ready for analysis. Proper data preparation is crucial for accurate and meaningful
analysis, as well-cleaned data prevents biases and errors in subsequent analysis and
model training.
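The steps named above can be sketched with Pandas. The records below are hypothetical and deliberately messy, and the choices made (dropping rows with no name, filling missing scores with the median) are one possible policy rather than a general rule.

```python
import numpy as np
import pandas as pd

# Hypothetical raw records showing common problems (illustrative values only).
raw = pd.DataFrame({
    "name": ["Alice", "bob ", "Bob", "Charlie", None],
    "age": ["24", "27", "27", "n/a", "31"],
    "score": [88.0, 92.0, 92.0, np.nan, 60.0],
})

clean = raw.copy()
# Correct inconsistencies: strip stray spaces and normalize capitalization.
clean["name"] = clean["name"].str.strip().str.title()
# Transform data types: non-numeric text such as "n/a" becomes NaN.
clean["age"] = pd.to_numeric(clean["age"], errors="coerce")
# Handle missing values: drop rows that have no name at all.
clean = clean.dropna(subset=["name"])
# Remove duplicates that only became visible after normalization.
clean = clean.drop_duplicates(subset=["name", "age"]).reset_index(drop=True)
# Fill the remaining missing scores with the median score.
clean["score"] = clean["score"].fillna(clean["score"].median())
print(clean)
```

Charlie's age stays missing here on purpose: whether to impute, drop, or keep such a value depends on the analysis, which is part of why this stage takes so long.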

g) Distinguish between data visualization and data formatting in big data analytics.

Data visualization is the graphical representation of data to help understand patterns,
trends, and insights. It involves creating charts, graphs, and dashboards for clearer data
interpretation.

Data formatting, however, refers to structuring or transforming data into a specific format
or layout to make it suitable for analysis. This may involve data conversion, reorganization,
or cleanup to ensure compatibility with analytical tools or visualization frameworks.
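The two activities often appear back to back, as in the sketch below: the table is first formatted (dates parsed, wide layout reshaped to long) and only then visualized. The sales figures, column names, and output file name are hypothetical, and the Agg backend is selected only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly sales in "wide" layout (illustrative values only).
wide = pd.DataFrame({
    "month": ["2024-01", "2024-02", "2024-03"],
    "north": [120, 135, 150],
    "south": [90, 110, 105],
})

# Data formatting: parse dates and reshape to one row per (month, region).
wide["month"] = pd.to_datetime(wide["month"], format="%Y-%m")
tidy = wide.melt(id_vars="month", var_name="region", value_name="sales")

# Data visualization: draw one line per region from the formatted table.
fig, ax = plt.subplots()
for region, group in tidy.groupby("region"):
    ax.plot(group["month"], group["sales"], marker="o", label=region)
ax.set_xlabel("Month")
ax.set_ylabel("Sales")
ax.legend()
fig.savefig("sales_by_region.png")
```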

h) Discuss how data science can be used to improve health services in Kenya.

Data science can enhance healthcare in Kenya by analyzing health data to predict
outbreaks, optimize resources, and improve patient care. Predictive analytics can help in
identifying disease patterns, which aids in early intervention. Data science also enables
efficient hospital resource allocation, monitoring public health trends, and improving
patient diagnostics, leading to a better, data-informed healthcare system.
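As a toy illustration of early outbreak detection, the sketch below flags weeks in which reported cases exceed twice the average of the previous four weeks. The case counts are synthetic, and the doubling rule is an assumption chosen for simplicity, not a validated surveillance method.

```python
import pandas as pd

# Synthetic weekly case counts for one facility (not real data).
cases = pd.Series(
    [12, 15, 11, 14, 13, 16, 12, 41, 55, 38],
    index=pd.RangeIndex(1, 11, name="week"),
    name="cases",
)

# Baseline: mean of the previous four weeks (shifted so a week never
# contributes to its own baseline).
baseline = cases.shift(1).rolling(window=4).mean()

# Flag weeks whose count is more than double the recent baseline.
alerts = cases[cases > 2 * baseline]
print(alerts)
```

In practice such a rule would be tuned against historical data and combined with clinical judgment before triggering any intervention.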

i) Discuss the data analytics project phases.

Problem Definition: Clearly outline the question or problem to address and define project
objectives.

Data Collection: Gather relevant data from multiple sources, such as databases, APIs, or
public datasets.

Data Cleaning and Preparation: Clean and preprocess data, handling missing values,
outliers, and inconsistencies.

Data Exploration and Analysis: Perform EDA to understand data characteristics and
generate insights.

Modeling: Develop and train models on the data to find patterns, predictions, or solutions
to the problem.

Evaluation and Deployment: Assess model accuracy, refine as needed, and deploy the
solution into a real-world environment for end-users or stakeholders.
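The phases can be traced through a very small example. Everything below is a sketch under stated assumptions: the question, the data, and the choice of a straight-line model fitted with `np.polyfit` are hypothetical, and deployment is left out because it depends on the target environment.

```python
import numpy as np
import pandas as pd

# 1. Problem definition (hypothetical): can hours studied predict exam score?

# 2. Data collection: an inline table stands in for a database or API.
raw = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5, 6, None, 8],
    "score": [52, 55, 61, 64, 70, 73, 80, 85],
})

# 3. Cleaning and preparation: drop rows with missing predictor values.
data = raw.dropna().reset_index(drop=True)

# 4. Exploration: check how strongly the two columns move together.
correlation = data["hours"].corr(data["score"])

# 5. Modeling: fit a straight line on the first rows, hold out the last two.
train, test = data.iloc[:-2], data.iloc[-2:]
slope, intercept = np.polyfit(train["hours"], train["score"], deg=1)

# 6. Evaluation: mean absolute error on the held-out rows.
predicted = slope * test["hours"] + intercept
mae = (predicted - test["score"]).abs().mean()
print(f"correlation={correlation:.2f}, slope={slope:.2f}, MAE={mae:.2f}")
```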
