

Section3 ML

The document outlines the data science process, focusing on the end-to-end machine learning project, including essential Python programming and data analysis libraries. It explains the types of datasets, including categorical and numerical data, and provides examples of various dataset types like image, text, and time series datasets. Additionally, it offers practical resources for data collection and exploratory data analysis (EDA).

Uploaded by tabedo6789
Copyright © All Rights Reserved

Data Science Process

End-to-End Machine Learning Project

Important Basics

• Python programming basics
• Python data analysis library basics: NumPy, Pandas, Matplotlib, and Seaborn

Tools

Installing Anaconda and Python (Watch Video)


Data Collection

What is a dataset?

A dataset is a collection of data arranged in an organized structure.

• A tabular dataset can be understood as a database table or matrix, where each row corresponds to one record and each column corresponds to a particular variable.
• The most widely supported file type for tabular datasets is the "Comma-Separated Values" file, or CSV.
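As a minimal sketch of reading a CSV file, assuming the pandas library is installed: the file contents below are made-up sample values written inline with `io.StringIO`, so the example runs without a file on disk. With a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Inline CSV text with illustrative (made-up) values, so no file is needed.
# Each line after the header is one row (record); each comma-separated
# position is one column (variable).
csv_text = """student_id,height_cm,weight_kg
1,170.0,65.5
2,182.5,80.1
3,165.2,58.3
"""

# read_csv accepts a file path, a URL, or a file-like object such as StringIO.
df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.shape)  # (number of rows, number of columns) -> (3, 3)
```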

An attribute is a property or characteristic of a data object.

Attribute values are numbers or symbols assigned to an attribute.

object → row = record = entity = instance

attribute → field = feature = characteristic
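The object/attribute terminology maps directly onto a pandas `DataFrame`: a row is one object and a column is one attribute. A small sketch with made-up values, assuming pandas is installed:

```python
import pandas as pd

# Illustrative data: each row is one object (record/instance),
# each column is one attribute (field/feature).
df = pd.DataFrame(
    {
        "height_cm": [170.0, 182.5, 165.2],
        "eye_color": ["Brown", "Blue", "Green"],
    }
)

first_object = df.iloc[0]        # one row -> one data object
eye_color = df["eye_color"]      # one column -> one attribute
value = df.loc[1, "height_cm"]   # one attribute value of one object

print(first_object.to_dict())    # {'height_cm': 170.0, 'eye_color': 'Brown'}
print(eye_color.tolist())        # ['Brown', 'Blue', 'Green']
print(value)                     # 182.5
```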

Types of data in statistics

Categorical Data (Qualitative):

Represents categories or groups with qualitative distinctions, such as gender (Male/Female), Yes/No, True/False, Blue/Green, etc.

Numerical Data (Quantitative):

Represents measurable quantities expressed in numerical form, such as height, weight, house price, temperature, etc.
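In pandas, column data types give a rough first split between numerical and categorical data. This is only a heuristic sketch with made-up values: numbers can also encode categories (for example, 1 to 5 star ratings), so the data type alone does not settle the question.

```python
import pandas as pd

# Made-up example columns mixing categorical and numerical data.
df = pd.DataFrame(
    {
        "gender": ["Male", "Female", "Female"],
        "smoker": [True, False, False],
        "height_cm": [175.0, 160.5, 168.0],
        "house_price": [250000, 310000, 198000],
    }
)

# Integer and float columns: a first guess for numerical (quantitative) data.
numerical_cols = df.select_dtypes(include="number").columns.tolist()
# Everything else (text, booleans): a first guess for categorical data.
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

print(numerical_cols)    # ['height_cm', 'house_price']
print(categorical_cols)  # ['gender', 'smoker']
```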
Types of Categorical Data: Nominal and Ordinal
1. Nominal Data
Categories without any inherent order or ranking.
Examples:
• Gender (Male, Female, Other)
• Eye color (Brown, Blue, Green)
• Marital status (Single, Married, Divorced, Widowed)
• Types of vehicles (Car, Truck, Motorcycle)
• Blood type (A, B, AB, O)
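Because nominal categories have no order, one common way to feed them to a model is one-hot encoding, which creates one 0/1 indicator column per category. A sketch using pandas `get_dummies` on made-up blood types (this is one option, not the only possible encoding):

```python
import pandas as pd

# Made-up nominal values: no category is "greater" than another.
blood_type = pd.Series(["A", "O", "AB", "B", "O"], name="blood_type")

# One indicator column per category; each row has exactly one 1.
one_hot = pd.get_dummies(blood_type, prefix="blood", dtype=int)

print(one_hot)
```

One-hot encoding avoids assigning arbitrary numbers such as A=1, B=2, which a model could wrongly read as an order.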

2. Ordinal Data
Categories with a clear order or ranking, where the intervals between categories may not be equal.
Examples:
• Educational level (High School Diploma, Bachelor’s degree, Master’s degree, PhD)
• Rating scales (1 star, 2 stars, 3 stars, 4 stars, 5 stars)
• Severity of illness (Mild, Moderate, Severe)
• Frequency of travel (Rarely, Occasionally, Frequently, Regularly)
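For ordinal data, pandas can store the ranking explicitly with an ordered `Categorical`, so that sorting and comparisons follow the category order rather than alphabetical order. A sketch using the severity scale above (the sample values are made up):

```python
import pandas as pd

levels = ["Mild", "Moderate", "Severe"]  # order from lowest to highest
severity = pd.Series(
    pd.Categorical(
        ["Severe", "Mild", "Moderate", "Mild"],
        categories=levels,
        ordered=True,
    )
)

# Sorting and comparisons use the declared order.
print(severity.sort_values().tolist())  # ['Mild', 'Mild', 'Moderate', 'Severe']
print((severity > "Mild").tolist())     # [True, False, True, False]

# Integer codes follow the category order (0 = Mild, 1 = Moderate, 2 = Severe).
codes = severity.cat.codes.tolist()
print(codes)                            # [2, 0, 1, 0]
```

The codes preserve the order, but equal gaps between codes do not mean the step from Mild to Moderate equals the step from Moderate to Severe, consistent with the unequal intervals of ordinal data.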

Types of Numerical Data: Discrete and Continuous


1. Discrete Data
Consists of distinct, separate values that are countable (and often finite), usually whole numbers. These values cannot be broken down into smaller units and typically arise from counting.
For example, the number of students in a class (5, 10, 15, ...).
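Discrete values arise from counting, so they fit an integer type and can be summarized with a frequency table. A short pandas sketch with hypothetical class sizes:

```python
import pandas as pd

# Hypothetical counts: number of students in each of six classes.
students_per_class = pd.Series([25, 30, 25, 28, 30, 25], name="students")

# Counts are whole numbers, so pandas stores them with an integer dtype.
print(students_per_class.dtype)

# Frequency table: each distinct value and how many classes have it.
freq = students_per_class.value_counts().sort_index()
print(freq.to_dict())  # {25: 3, 28: 1, 30: 2}
```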
2. Continuous Data
Consists of measurements that can take on any value within a given range. The set of possible values is infinite and uncountable, and such values usually result from measurement. Continuous data can be broken down into smaller and smaller units and can take fractional and decimal values. For example:
• Temperature (measured in °C or °F), e.g., 37.5
• Time taken to complete a task (measured in seconds, minutes, or hours), e.g., 2.5 h
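Continuous measurements are usually stored as floating-point numbers and are often summarized with statistics such as the mean, or grouped into intervals (binning). A sketch with hypothetical task times in hours, assuming pandas:

```python
import pandas as pd

# Hypothetical task completion times in hours (fractional values).
hours = pd.Series([2.5, 1.75, 3.2, 0.9, 2.05], name="hours")

print(hours.dtype)   # float64
print(hours.mean())  # average time, about 2.08 h

# Binning: group continuous values into right-closed intervals (0, 1], (1, 2], ...
bins = pd.cut(hours, bins=[0, 1, 2, 3, 4])
counts = bins.value_counts().sort_index()
print(counts)
```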

Types of datasets
Image Datasets:
Image datasets contain collections of images and are commonly used in computer vision tasks such as image classification, object detection, and image segmentation.
Examples:
o ImageNet
o MNIST
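In practice, an image dataset is usually handled as arrays of pixel values. MNIST, for example, consists of 28x28 grayscale images of handwritten digits. The sketch below uses random NumPy arrays as a stand-in (no real MNIST data is loaded) to show the array shapes and two common preprocessing steps; the labels are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic stand-in for an MNIST-style batch: 4 grayscale images of 28x28
# pixels with integer intensities 0-255 (random noise, not real digits).
images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
labels = np.array([3, 7, 1, 9])  # hypothetical class labels (digits 0-9)

# Scale pixel values from 0-255 to the range [0, 1].
x = images.astype(np.float32) / 255.0
# Flatten each image into a 784-long feature vector for simple models.
x_flat = x.reshape(len(x), -1)

print(images.shape, x_flat.shape)  # (4, 28, 28) (4, 784)
```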

Text Datasets:
Text datasets contain textual information, such as articles, book reviews, or posts. These datasets are used in natural language processing (NLP) tasks such as sentiment analysis, text classification, and machine translation.
Examples:
o IMDb film reviews dataset
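Text has to be turned into numbers before most models can use it. One simple technique is a bag-of-words representation, which counts how often each word appears. A standard-library-only sketch on two made-up reviews (not taken from the IMDb dataset):

```python
import re
from collections import Counter

# Two made-up short reviews.
reviews = [
    "A wonderful film, wonderful acting.",
    "Boring plot and weak acting.",
]


def tokenize(text: str) -> list[str]:
    # Lowercase, then keep runs of letters and apostrophes as tokens.
    return re.findall(r"[a-z']+", text.lower())


# Bag-of-words: each review becomes a word -> count mapping.
bags = [Counter(tokenize(r)) for r in reviews]

# Shared, sorted vocabulary across all reviews.
vocabulary = sorted(set().union(*bags))

# Fixed-length count vectors, one position per vocabulary word.
vectors = [[bag[word] for word in vocabulary] for bag in bags]

print(vocabulary)
print(vectors)
```

Word order is discarded, which keeps the representation simple but loses phrases such as "not good".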
Time Series Datasets:
Time series datasets contain data points collected over time. They are commonly used in forecasting, anomaly detection, and trend analysis.
Examples:
o Climate data
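A small pandas sketch of two of the time series tasks mentioned above, using made-up daily temperatures with one injected outlier: a rolling mean to expose the trend, and a simple median-absolute-deviation rule to flag anomalies. The window size of 3 and the threshold factor of 5 are arbitrary choices for illustration, not standard values.

```python
import pandas as pd

# Made-up daily temperature readings (°C) with one injected outlier.
temps = pd.Series(
    [21.0, 21.5, 22.1, 21.8, 35.0, 22.4, 22.0, 21.7],
    index=pd.date_range("2024-01-01", periods=8, freq="D"),
    name="temp_c",
)

# Trend: a 3-day rolling mean smooths day-to-day noise
# (the first two values are NaN because the window is not yet full).
trend = temps.rolling(window=3).mean()

# Anomaly rule: flag points whose distance from the median exceeds
# 5 times the median absolute deviation.
deviation = (temps - temps.median()).abs()
anomalies = temps[deviation > 5 * deviation.median()]

print(trend)
print(anomalies)  # only the 35.0 reading on 2024-01-05
```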

Tabular Datasets:
Tabular datasets are structured data organized in tables or spreadsheets.

Practical

• How to get datasets:
o https://www.kaggle.com/datasets
o https://archive.ics.uci.edu/ml/index.php
• Google Colab configuration
• Example: the Titanic dataset
• Exploratory data analysis (EDA)
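A first EDA pass typically checks the dataset's structure, missing values, summary statistics, and simple relationships with the target. The sketch below applies these steps to a few made-up rows that only borrow column names from the Kaggle Titanic `train.csv` file; assuming you have downloaded the real file, you would call `pd.read_csv` on its path instead.

```python
import io

import pandas as pd

# Made-up rows (not real passenger records) using Titanic-style column names.
csv_text = """Survived,Pclass,Sex,Age,Fare
0,3,male,25,8.0
1,1,female,40,80.0
0,3,female,,9.5
1,1,male,50,55.0
0,3,male,,8.0
1,2,female,15,30.0
"""
df = pd.read_csv(io.StringIO(csv_text))

# 1. Structure: size, column types, and missing values per column.
print(df.shape)
print(df.dtypes)
missing = df.isna().sum()
print(missing)

# 2. Summary statistics for the numerical columns.
print(df.describe())

# 3. Relationship between a categorical feature and the target.
survival_by_sex = df.groupby("Sex")["Survived"].mean()
print(survival_by_sex)

# 4. One simple way to handle missing ages: fill with the median age.
df["Age"] = df["Age"].fillna(df["Age"].median())
```

Filling with the median is only one choice; on real data it is worth checking first whether missing values follow a pattern.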
