Data Science Notes
INTRODUCTION
Data Science is a multi-disciplinary field that extracts insights from massive amounts of data using mathematical and scientific models and algorithms. It is one of the domains of Artificial Intelligence. Analysing this data helps in making machines intelligent: by applying extensive data processing to unstructured data, a meaningful and informative dataset for decision making is generated.
For example – Data Science is used by various companies like Netflix, Google and Amazon for
developing robust recommendation systems for their visitors. Similarly, to predict stock prices,
different financial companies are using multiple predictive analytics and forecasting methods.
➢ Internet Search
Internet search engines like Google, Yahoo, Bing, Ask, etc., use data science technology to enhance the search experience for users. The search results are listed within a fraction of a second.
➢ E-Commerce
To provide an enhanced, personalized experience and accurate recommendations to users, e-commerce companies such as Amazon, Flipkart, Netflix, Google Play, etc., use data science technology. For example, when searching for a product on an e-commerce site, auto-suggestions for similar products pop up thanks to data science technology.
Benefits of data science in the field of e-commerce:
• Identifying a potential customer base.
• Using predictive analytics to forecast demand for products and services.
• Optimizing pricing structures for consumers.
➢ Banking
The banking industry is one of the prime application areas of data science. With the help of data science, it has become easier for banks to manage their resources efficiently.
Data science has benefited the banking industry in the following areas:
• Fraud detection
• Management of customer data
• Customer segmentation
• Risk detection
➢ Healthcare
Data Science plays a crucial role in the healthcare industry. Doctors can detect cancers and tumours at an early stage using image-recognition software. Genetic industries use data science to analyse and classify patterns in genomic sequences. Various virtual assistants also help patients resolve their physical and mental ailments. In the field of disease research, data science techniques provide greater insight into genetic issues.
The field of computer science concerned with building smarter machines capable of performing tasks and making decisions the way a human does is known as Artificial Intelligence.
AI PROJECT CYCLE
The Scenario
Transportation is considered an indispensable part of human life and the backbone of any country's economy. It plays a crucial role in enhancing the lifestyle of ordinary people by providing the facilities and accessibility they require. Rural regions constantly struggle with services and facilities because of their remote and dispersed locations. Effective and efficient transportation can mitigate the regional problems of rural areas by providing access to employment, health, education and services.
The common man's problem in villages when commuting for work, study, etc., is the lack of suitable and convenient public transportation such as buses, autos, or taxis. The objective of the proposed study is to provide accessibility and proper transportation services to these rural regions.
Problem Scoping
a) Who – This block covers all those affected, directly or indirectly, by the problem; they are the stakeholders.
b) What – Under this block, pieces of evidence are gathered to prove that the problem exists. It helps us understand and recognize the nature of the problem.
c) Where – Under this block, the context of the problem is identified: the situations in which it arises and the locations where it occurs.
d) Why – Under this block, the decision about whether the given problem is worth solving is made.
Data Acquisition
After the goal of our project is finalized, the next step is looking at the various data features that affect the problem. As data is the fuel for AI-based projects, the goal is to collect the right kind of data. In the above project, the affecting factors are:
a. Number of students and people using the transportation system
b. Time of usage of the transportation system
c. Cost incurred during transportation
d. Days with heavy traffic load
e. Days with very little traffic
After studying the pattern of the data for a week, the database has to be created. The techniques for collecting the data will be face-to-face interviews, telephonic interviews, pilot surveys, and household surveys.
GPS and GIS technologies are used to collect the travel data. GPS data provide real-time spatial information and reveal travel behaviour, including distance, travel speed, and trip time. The data collected can be used for the development of transportation policy.
Data Exploration
The next step after the creation of the database is to analyse the data and interpret it. The required information extracted from the curated dataset is cleaned so that no errors or missing elements remain in it.
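As an illustration, here is a minimal data-cleaning sketch using Pandas; the file name survey_trips.csv and its column names are hypothetical placeholders for the transport survey data described above.

```python
import pandas as pd

# Load the (hypothetical) transport survey dataset
df = pd.read_csv("survey_trips.csv")

# Inspect the data: dimensions and missing values per column
print(df.shape)
print(df.isnull().sum())

# Drop rows where essential fields are missing
df = df.dropna(subset=["trip_time", "passengers"])

# Fill remaining missing costs with the column median
df["cost"] = df["cost"].fillna(df["cost"].median())

# Remove exact duplicate records
df = df.drop_duplicates()
```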
Modelling
Model selection is the next step after the dataset is ready. A clustering algorithm, an unsupervised machine-learning technique, is chosen for the transportation system. Clustering has proven its efficiency in developing intelligent transportation systems and is applied to various categories of transportation planning, such as trip generation, traffic-zone division, and trip distribution.
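As a rough sketch, clustering could be done with scikit-learn's KMeans on numeric features derived from the survey; the features and the choice of three clusters below are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical features per traffic zone: [avg daily trips, avg trip time in minutes]
X = np.array([
    [120, 35], [130, 40], [45, 15],
    [50, 18], [300, 60], [310, 65],
])

# Group the zones into three clusters (k chosen arbitrarily here)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print(labels)                   # cluster assigned to each zone
print(kmeans.cluster_centers_)  # centroid of each cluster
```

In practice, the number of clusters would be tuned, for example by comparing inertia across several values of k.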
Evaluation
After the model has been trained on the dataset, the accuracy of the algorithm is checked to see whether it is working correctly or not:
1. The trained algorithm is fed data regarding the number of trips, the number of persons travelling, and the travelling time.
2. It is then fed data regarding the capacity of the public transportation utilized.
3. The algorithm then works on the entries according to the training it received at the modelling stage.
4. The model predicts the number of trips to be run for practical usage of public transportation.
5. The prediction is compared to the testing dataset value.
6. The model is tested on ten testing datasets.
7. Prediction values for the testing datasets are compared to the actual values.
8. If the prediction values are the same as, or very close to, the actual values, the model is accurate. Otherwise, either the model selection is changed, or the model is trained on more data for better accuracy. Once the model achieves optimum efficiency, it is ready to be deployed for real-time usage.
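As a sketch of steps 5–8, predicted trip counts can be compared with actual counts using a simple error metric; all numbers below are made-up placeholders.

```python
import numpy as np

# Hypothetical predicted and actual daily trip counts for ten test cases
predicted = np.array([42, 38, 55, 60, 47, 51, 33, 45, 58, 40])
actual    = np.array([40, 39, 53, 62, 45, 50, 35, 44, 60, 41])

# Mean absolute error: average size of the prediction mistakes
mae = np.mean(np.abs(predicted - actual))

# Mean absolute percentage error, giving an intuitive accuracy figure
mape = np.mean(np.abs(predicted - actual) / actual) * 100

print(f"MAE: {mae:.2f} trips")
print(f"Approximate accuracy: {100 - mape:.1f}%")
```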
DATA COLLECTION
Our society is highly dependent on data. Accurate data collection is necessary to make precise business decisions, safeguard quality assurance, and keep research reliable. Data Collection is the systematic approach of gathering observations and creating a database for data analysis. The process of data collection is largely non-technical and does not require experts.
SOURCES OF DATA
Data collection is the fundamental basis of data analysis in Data Science. Raw facts and figures
are known as data. The sources of data are mainly of two types:
i. Primary data source: Data collected at its origin, i.e. from the reports and records published within the organization, is a primary data source. Primary data is original and hence a more reliable source of data.
ii. Secondary data source: Data collected from an outside agency or organization is a secondary data source. Secondary data is not original; it has already been analysed and has undergone some statistical operations.
While accessing data from any of the data sources, the following points should be kept in mind:
1. Data which is available for public usage only should be taken up.
2. Personal datasets should only be used with the consent of the owner.
3. One should never breach someone’s privacy to collect data.
4. Data should only be taken from reliable sources, as data collected from random sources can be wrong or unusable.
5. Reliable sources of data ensure the authenticity of data which helps in proper training of
the AI model.
TYPES OF DATA
During the data collection process, the following kinds of errors can appear in the data:
a. Erroneous Data: Erroneous data means inaccurate data, such as wrong or invalid information, and it occurs in two ways:
• Incorrect values: The values in the dataset are not correct. For example, in the employee
table, in the designation post, the employee's name is mentioned. Since the expected
value was a designation, the analysis of data will be incorrect.
• Invalid or Null values: Some of the values in a dataset get corrupted and hence become invalid. Values that become NaN in the dataset are null values. NaN can arise, for example, from taking the square root of a negative number or dividing zero by zero. These values are removed from the database.
b. Missing Data: In some datasets, data is absent, i.e. some of the observations in the dataset are blank. Missing data cannot always be inferred as an error. For example, in surveys, some respondents are unreachable, and hence no data is recorded for them.
c. Outliers: Data that lies outside the range of the other values in the set are outliers, i.e. data that differ significantly from the rest. Reasons for outliers include inappropriately scaled data, errors during data entry, etc.
For example, when finding the average height of the students of a particular grade, if some entries are wrongly entered, the result would be an incorrect average. A small sketch of spotting such values follows.
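As promised above, here is a minimal sketch of handling null values and outliers with Pandas; the height data is made up to echo the average-height example.

```python
import pandas as pd

# Hypothetical heights (cm) of students in one grade; 999 is a
# data-entry mistake and None is a missing observation
heights = pd.Series([150, 152, 148, 155, None, 999, 151])

# Null values: count and drop them
print(heights.isna().sum())   # -> 1 missing value
clean = heights.dropna()

# Outliers via the interquartile-range (IQR) rule
q1, q3 = clean.quantile(0.25), clean.quantile(0.75)
iqr = q3 - q1
mask = (clean < q1 - 1.5 * iqr) | (clean > q3 + 1.5 * iqr)
print(clean[mask])            # -> the 999 entry is flagged

# Average with and without the outlier
print(clean.mean(), clean[~mask].mean())
```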
DATA ACCESS
Python supports various packages for accessing and processing tabular data. Some of these packages are:
NUMPY
NumPy, which stands for Numerical Python, is an extensive standard Python library that provides a simple and robust data structure, the n-dimensional array, used for data analysis and scientific computing. An array is a group of homogeneous elements stored together under one name.
Note the distinction: a list is one of the core data types of Python itself, whereas an array is a part of the NumPy library.
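A short sketch contrasting a Python list with a NumPy array:

```python
import numpy as np

# A core Python list can mix types; a NumPy array is homogeneous
py_list = [1, 2, 3, 4]
arr = np.array(py_list)

# Element-wise arithmetic works on arrays, not on plain lists
print(arr * 2)      # -> [2 4 6 8]
print(py_list * 2)  # -> [1, 2, 3, 4, 1, 2, 3, 4] (list repetition)

# Arrays can be n-dimensional
matrix = np.array([[1, 2], [3, 4]])
print(matrix.shape)  # -> (2, 2)
```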
PANDAS
Pandas is Python’s library for analysing datasets and drawing conclusions based on statistical theories. The name ‘Pandas’ is derived from “Panel Data System”. As relevant data is essential in data science, Pandas helps clean datasets to make them suitable for analysis.
Pandas supports high-performance, easy-to-use data structures: series (1-dimensional) and data frames (2-dimensional).
Capabilities of Pandas library:
• Handling substantial datasets.
• Supports multiple data file formats for storing data.
• Supports operations on independent groups within the datasets.
• Selection/ filtration of subsets from bulky datasets and even merging numerous datasets.
• Reshaping and pivoting of datasets in many forms.
• Functionality to find and fill missing data.
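A brief sketch of the two core Pandas structures; the column names and values are made up:

```python
import pandas as pd

# Series: a 1-dimensional labelled array
fares = pd.Series([10, 15, 12], index=["bus", "auto", "taxi"])
print(fares["auto"])  # -> 15

# DataFrame: a 2-dimensional table of rows and columns
df = pd.DataFrame({
    "mode": ["bus", "auto", "taxi"],
    "daily_trips": [120, 45, 30],
    "avg_fare": [10.0, 15.0, 12.0],
})

# Selection/filtration of a subset: modes with more than 40 daily trips
print(df[df["daily_trips"] > 40])

# Finding and filling missing data is one of the listed capabilities
df["avg_fare"] = df["avg_fare"].fillna(df["avg_fare"].mean())
```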
MATPLOTLIB
Matplotlib is a library for 2-dimensional plotting in Python and is used for creating static, animated, and interactive 2D plots or figures. Plots aid in understanding data and in deriving decisions from the trends and patterns visible in graphs. Matplotlib supports chart types such as line plots, bar charts, histograms, scatter plots, pie charts, and box plots.
Features of Matplotlib include publication-quality output, fine-grained control over colours, labels, and legends, export to many file formats (such as PNG, PDF, and SVG), and close integration with NumPy and Pandas.
Python is also a prevalent language when it comes to statistics. Python has a built-in module, ‘statistics’, that is used to perform mathematical statistics on numeric data. A few of the functions of the module are mean(), median(), mode(), stdev(), and variance().
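A quick sketch of the built-in statistics module on made-up trip times:

```python
import statistics

trip_times = [12, 15, 11, 14, 15, 13, 40]  # minutes (made-up values)

print(statistics.mean(trip_times))    # arithmetic average
print(statistics.median(trip_times))  # middle value, robust to the 40 outlier
print(statistics.mode(trip_times))    # most frequent value -> 15
print(statistics.stdev(trip_times))   # sample standard deviation
```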
DATA VISUALIZATION
The pictorial or graphical representation of data using graphs, charts, etc., is known as Data Visualisation. Visualization is a great tool for effectively communicating results to the user. Traffic symbols and the speedometer of a vehicle are a few examples of visualization that we encounter in our daily lives. Visualization of data is used effectively in fields like science, health, finance, engineering, etc.
Python supports many visualization libraries, like Matplotlib, Seaborn, and Folium. As we have already discussed, various plots can be drawn with the help of Matplotlib. Pyplot is a Matplotlib module that provides an interface to draw different types of plots. Let us discuss five key plots for basic data visualization (a small sketch follows the list):
• Scatter Plot
• Line Plot
• Bar Chart
• Histogram Plot
• Box Plot
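As a minimal Pyplot sketch, here are two of these plot types drawn from arbitrary made-up data:

```python
import matplotlib.pyplot as plt

days = [1, 2, 3, 4, 5]
trips = [30, 42, 38, 50, 45]  # made-up daily trip counts

# Line plot: the trend of trips across days
plt.plot(days, trips, marker="o")
plt.xlabel("Day")
plt.ylabel("Trips")
plt.title("Daily trips (line plot)")
plt.show()

# Bar chart: the same data as categorical bars
plt.bar(days, trips)
plt.xlabel("Day")
plt.ylabel("Trips")
plt.title("Daily trips (bar chart)")
plt.show()
```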