Data Analysis With Python

The document provides an overview of data analysis using Python, covering variable types, statistics, and data visualization techniques. It explains concepts like continuous vs discrete data, standard deviation, skewness, and various data structures such as lists, dictionaries, and sets. Additionally, it includes practical examples of using libraries like NumPy and pandas for data manipulation and visualization.

Uploaded by marcioveloso785


DATA ANALYSIS WITH PYTHON

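\n = the newline character in Python: inside a string it starts a new line

end="" = the argument of print() that sets what is printed at the end of the output (by default a newline, "\n"); with end="" the next output continues on the same line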
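A minimal sketch of both ideas. The strings are made up for illustration, and the io.StringIO buffer is only there so the exact characters that print() writes can be inspected.

```python
import io

# "\n" inside a string starts a new line when the string is printed
message = "Hello\nWorld"
print(message)

# end="" replaces the newline that print() normally adds at the end,
# so the next print() continues on the same line
print("Data ", end="")
print("Analysis")

# The same two calls, written into a buffer instead of the screen
buffer = io.StringIO()
print("Data ", end="", file=buffer)
print("Analysis", file=buffer)
captured = buffer.getvalue()   # only the second print() adds a newline
print(repr(captured))          # 'Data Analysis\n'
```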

PYTHON VARIABLES

VARIABLE TYPE:

- Numbers: integers (1, 9, 3389) and floats (3.145)
- String: "my name is Marcio"
- Boolean: True or False
- List: [1, 3, 5, 7, 9]
- Dictionary: {"Blizzard": "bad weather"}
- Set: a collection of unique items, e.g. {1, 3, 5}

Run a cell in Jupyter with Shift + Enter.

How to get information about the number of objects in parentheses
Statistics for Data Analysis and Data Science
Continuous vs Discrete

Continuous: A set of data is said to be continuous if the values belonging to the set can take on ANY value within a finite or infinite interval. EX: The height of a horse (it could be any value within the range of horse heights), the time to complete a task (which could be measured to fractions of seconds), or the speed of a car on route 3 (assuming legal speed limits).

Discrete: A set of data is said to be discrete if the values belonging to the set are distinct and separate (unconnected values). EX: The number of people in your class (no fractional parts of a person), the number of TV sets in a home (no fractional parts of a TV set), or the number of questions on a math test (no incomplete questions).

What is a Distribution

A probability distribution is a mathematical function that, stated in simple terms, can be thought of as providing the probability of occurrence of the different possible outcomes of an experiment.

In other words, it tells us how likely each outcome is, based on the given data.

Discrete distributions tend to be presented in a bar chart, because the outcomes are separate values and each one gets its own bar.

A continuous distribution is drawn as a continuous curve, because the variable can take an unlimited number of values.
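As an illustration of a discrete distribution (the two-dice example is an assumption, not taken from these notes), the sketch below lists every total of two fair six-sided dice and its exact probability. The row of # characters stands in for the bar chart.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Every possible roll of two fair dice: 6 x 6 = 36 equally likely outcomes
outcomes = [a + b for a, b in product(range(1, 7), repeat=2)]
counts = Counter(outcomes)

# Probability of each total = (number of ways to roll it) / 36
distribution = {total: Fraction(n, len(outcomes)) for total, n in sorted(counts.items())}

for total, p in distribution.items():
    print(total, p, "#" * counts[total])   # a text "bar" with one # per way
```

Each total is a separate value (there is no total of 7.5), which is why a bar chart is the natural way to draw this distribution.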

Standard Deviation

According to Wikipedia, the standard deviation is a measure of the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean, and a high standard deviation indicates that the values are spread out over a wider range.
Standard deviation is easier to interpret than variance because it is expressed in the same units as the data, so we can add it to the mean. In the example above we add 65.92 + 4.08, which gives 70.

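Quick tip to remember the difference between the standard deviation and the variance: the standard deviation is the square root of the variance (the variance is the standard deviation squared).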
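A small sketch with Python's built-in statistics module, using made-up values. It uses the population versions (pvariance, pstdev), which divide by n; statistics.variance and statistics.stdev divide by n - 1 for a sample and give slightly larger numbers.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data

mean = statistics.mean(values)            # 5
variance = statistics.pvariance(values)   # average squared distance from the mean: 4
std_dev = statistics.pstdev(values)       # square root of the variance: 2.0

# The standard deviation is in the same units as the data,
# so "mean + one standard deviation" is a meaningful number
print(f"mean={mean}, variance={variance}, std={std_dev}")
print("one standard deviation above the mean:", mean + std_dev)
print(math.isclose(std_dev, math.sqrt(variance)))   # True
```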

SKEWNESS

The easiest way to see the skewness of the data is to look at which direction the tail of the curve extends, i.e. where the outliers are. In the left image, the distribution is skewed to the left (negative), and in the right image it is skewed to the right (positive). Adding numbers at the bottom of the graph helps us better understand the skewness.
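To put a number on the direction of the tail, the sketch below computes the moment-based (Fisher-Pearson) skewness, m3 / m2^1.5, on hypothetical data: positive means a right tail, negative a left tail, zero a symmetric shape. Note that pandas' Series.skew() uses an adjusted sample version of this formula, so its values differ slightly.

```python
def skewness(data):
    """Moment-based skewness: third central moment / (second central moment) ** 1.5."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # variance
    m3 = sum((x - mean) ** 3 for x in data) / n   # third central moment
    return m3 / m2 ** 1.5

right_tail = [1, 2, 2, 3, 3, 3, 4, 10]   # one large outlier on the right
left_tail = [-x for x in right_tail]     # mirror image: outlier on the left
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tail))   # positive: skewed to the right
print(skewness(left_tail))    # negative: skewed to the left
print(skewness(symmetric))    # 0.0: not skewed
```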

The mean is the average of the numbers: their sum divided by how many numbers there are.

For the median, we sort the numbers and take the one in the middle. If there is an even number of values, we take the two numbers in the middle and average them (add them and divide by 2).
Note that the median is barely affected by an outlier value, but the mean is affected.

The mode is the most frequent value; in the graph it is the highest point.
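A short sketch with the statistics module (the numbers are hypothetical) showing the three measures and how a single outlier moves the mean but not the median.

```python
import statistics

scores = [1, 2, 2, 3, 4]          # hypothetical data
with_outlier = [1, 2, 2, 3, 400]  # same data, last value replaced by an outlier

print(statistics.mean(scores), statistics.median(scores), statistics.mode(scores))
# 2.4 2 2
print(statistics.mean(with_outlier), statistics.median(with_outlier))
# 81.6 2  (the mean jumps, the median stays the same)

# Even number of values: the median is the average of the two middle values
print(statistics.median([1, 2, 3, 4]))   # 2.5
```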
Combining Strings

first_name = "Marcio "  # note the space after the name
last_name = "Santos"
full_name = first_name + last_name
print(full_name)
Result: Marcio Santos

str() Function

Myname = "Hello my name is Marcio and my age is "
age = 29
total = Myname + str(age)  # str() turns the integer into text so it can be joined
print(total)
Result: Hello my name is Marcio and my age is 29

title() method (capitalizes the first letter of each word)

Myname = "Marcio "
LastName = "Santos"
print(Myname.title() + LastName.title())
Result: Marcio Santos
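Since the names above are already capitalized, title() has nothing visible to do there. The sketch below uses hypothetical lowercase input to show its effect, repeats the str() conversion, and compares it with an f-string (an alternative not covered in these notes) that converts the number automatically.

```python
first_name = "marcio"   # hypothetical lowercase input
last_name = "santos"
age = 29

# title() capitalizes the first letter of each word
full_name = (first_name + " " + last_name).title()
print(full_name)   # Marcio Santos

# str() is needed because + cannot join a str and an int
sentence = "My name is " + full_name + " and my age is " + str(age)
print(sentence)    # My name is Marcio Santos and my age is 29

# An f-string inserts the values and converts them for us
same_sentence = f"My name is {full_name} and my age is {age}"
print(same_sentence == sentence)   # True
```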

LECTURE: LIST
What can I put in a list?
Store numbers: numbers_list = [1, 3.9, 101]
Store strings: string_list = ["John", "Mike", "Tony"]
Store mixed: mixed_list = [3, "apple", 2, "orange", "banana"]
Even…
A list of lists! Wow_list = [[1, 3.9, 101], ["John", "Mike", "Tony"], [3, "apple", 2, "orange", "banana"]]

One example: a list comprehension that adds "Hello " to every name
names = ["Marcio", "Joao", "Tony"]
["Hello "+item for item in names]
Result: ['Hello Marcio', 'Hello Joao', 'Hello Tony']

Even numbers in the list from 1 to 100

numbers = list(range(1, 101))  # range stops before 101, so this is 1 to 100
[item for item in numbers if item % 2 == 0]
Result: [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, …
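A runnable version of the list examples above, with a check on the even numbers and a demonstration of how a list of lists is indexed twice (first the inner list, then the item inside it).

```python
names = ["Marcio", "Joao", "Tony"]
greetings = ["Hello " + item for item in names]
print(greetings)   # ['Hello Marcio', 'Hello Joao', 'Hello Tony']

numbers = list(range(1, 101))   # 1 to 100
evens = [item for item in numbers if item % 2 == 0]
print(len(evens), evens[:3], evens[-1])   # 50 [2, 4, 6] 100

# A list of lists: the first index picks the inner list, the second picks the item
Wow_list = [[1, 3.9, 101], ["John", "Mike", "Tony"], [3, "apple", 2, "orange", "banana"]]
print(Wow_list[1][0])    # John
print(Wow_list[2][-1])   # banana
```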

DICTIONARY IN PYTHON

When using a dictionary, the keys must be unique (the values can repeat).

drinks = {'type':'coffee','quantity':'a cup'}

drinks['quantity']
R: 'a cup'
Another example

drink = {'type':'juice','quantity':'two bottles'}

print("Would you like "+drink['quantity']+" of "+drink['type']+"?")
R: Would you like two bottles of juice?

BE AWARE OF MISSPELLINGS: PYTHON IS A CASE-SENSITIVE LANGUAGE.
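A sketch of the two dictionary rules above: assigning to an existing key overwrites its value (keys are unique, values may repeat), and a key with different capitalization is a different key. The orders dictionary is a hypothetical example.

```python
drinks = {'type': 'coffee', 'quantity': 'a cup'}

# Keys are unique: assigning to an existing key replaces its value
drinks['quantity'] = 'two cups'
print(drinks)   # {'type': 'coffee', 'quantity': 'two cups'}

# Values may repeat freely
orders = {'Marcio': 'coffee', 'Joao': 'coffee'}

# Keys are case-sensitive: 'Type' is not the same key as 'type'
try:
    drinks['Type']
except KeyError as error:
    print("KeyError:", error)   # KeyError: 'Type'

# get() returns a default value instead of raising an error
print(drinks.get('Type', 'not found'))   # not found
```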

SET
A set is essentially a particular type of collection, like a list, where only unique items are stored; unlike a list, it has no order.
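A minimal sketch of a set removing duplicates from a hypothetical list of visitor names. Because sets have no order, sorted() is used when a predictable order is needed.

```python
visits = ["Marcio", "Joao", "Marcio", "Tony", "Joao"]   # hypothetical list with repeats

unique_visitors = set(visits)                 # duplicates are dropped
print(len(visits), len(unique_visitors))      # 5 3

unique_visitors.add("Tony")                   # already present, so nothing changes
print(len(unique_visitors))                   # 3

print(sorted(unique_visitors))                # ['Joao', 'Marcio', 'Tony']
```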

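Indentation
Indentation is how Python groups lines of code into a block: all lines in the same block must be aligned at the same distance from the left margin.
We indent the lines that follow a statement ending in ":", such as if, else, for, while, and def.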

LOOPS, IF-ELSE CONDITIONS AND FUNCTIONS

Functions
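A small sketch tying together a function, an if-else condition and a for loop, and showing where the indentation goes after each ":". The function name describe_number is hypothetical.

```python
def describe_number(n):
    """Hypothetical helper: return 'even' or 'odd' for an integer."""
    if n % 2 == 0:        # the colon opens an indented block
        return "even"
    else:
        return "odd"

results = []
for n in range(1, 6):     # the loop body is indented one level
    results.append(f"{n} is {describe_number(n)}")

print(results)
# ['1 is odd', '2 is even', '3 is odd', '4 is even', '5 is odd']
```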
IMPORT NUMPY

CREATE AN ARRAY

PRINT AN ARRAY

One way of printing

Another way

Two-dimensional array
Three-dimensional array
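The NumPy steps listed above (import, create, print, two and three dimensions) can be sketched as follows; the values are arbitrary. ndim gives the number of dimensions and shape the size along each one.

```python
import numpy as np

a1 = np.array([1, 2, 3])                  # one dimension
a2 = np.array([[1, 2, 3], [4, 5, 6]])     # two dimensions: 2 rows, 3 columns
a3 = np.arange(24).reshape(2, 3, 4)       # three dimensions: 2 blocks of 3 x 4

print(a1)
print(a2)
print(a3)
print(a1.ndim, a2.ndim, a3.ndim)   # 1 2 3
print(a2.shape, a3.shape)          # (2, 3) (2, 3, 4)
```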

Formula for printing large numbers

Basic operations on an array
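A sketch of basic array arithmetic, assuming two arrays of the same length. Operators work element by element, a single number is applied to every element, and @ computes the dot product.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

print(a + b)    # element-wise sum: [11 22 33 44]
print(a * b)    # element-wise product: [ 10  40  90 160]
print(a ** 2)   # element-wise square: [ 1  4  9 16]
print(a * 10)   # the scalar is applied to every element: [10 20 30 40]
print(a @ b)    # dot product: 300
```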

INDEXING, SLICING AND ITERATING
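A sketch of indexing, slicing and iterating on arbitrary example arrays. In a slice the stop index is excluded, and in a 2-D array the row index comes first.

```python
import numpy as np

a = np.arange(10)            # the numbers 0 to 9
print(a[2])                  # 2
print(a[2:5])                # [2 3 4]  (stop index 5 is excluded)
print(a[::3])                # every third element: [0 3 6 9]

m = np.array([[1, 2, 3], [4, 5, 6]])
print(m[1, 2])               # row 1, column 2: 6
print(m[:, 1])               # every row, column 1: [2 5]

for row in m:                # iterating a 2-D array yields its rows
    print(row)
for value in m.flat:         # .flat iterates over every single element
    print(value)
```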

GENERATE STATISTICS
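A sketch of summary statistics on a hypothetical 2-D array. Without axis the statistic covers the whole array; axis=0 works down the columns and axis=1 across the rows. Note that NumPy's std() divides by n by default (pass ddof=1 for the sample version).

```python
import numpy as np

data = np.array([[4, 8, 15], [16, 23, 42]])   # hypothetical values

print(data.sum(), data.min(), data.max())   # 108 4 42
print(data.mean())                          # 18.0
print(data.mean(axis=0))                    # mean of each column: 10, 15.5 and 28.5
print(data.mean(axis=1))                    # mean of each row: 9 and 27
print(data.std())                           # population standard deviation
```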
SERIES CREATION

SERIES OPERATION
INDEXING
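A sketch of the three pandas Series steps above (creation, operations, indexing) on hypothetical monthly sales figures. .loc selects by label and .iloc by position.

```python
import pandas as pd

# Series creation: values plus an index of labels (hypothetical data)
sales = pd.Series([120, 95, 130], index=["jan", "feb", "mar"])

# Series operations are applied element by element
with_tax = sales * 1.1
growth = sales.diff()          # change from the previous month (NaN for the first)

# Indexing by label (.loc), by position (.iloc) and with a condition
print(sales.loc["feb"])        # 95
print(sales.iloc[-1])          # 130
print(sales[sales > 100])      # only jan and mar
```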
Importing data from Excel
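To import data from Excel, the Excel file is first read into pandas as a DataFrame (reading .xlsx files also needs the openpyxl package installed).
Formula example:
import pandas as pd
data_excel = pd.read_excel('Business Analitics.xlsx')
data_excel  # in Jupyter, the last line of a cell displays the DataFrame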
Array Transformation
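Review DataFrame ix() (.ix was deprecated and then removed in pandas 1.0; use .loc for labels and .iloc for positions instead)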
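Since .ix no longer exists in current pandas, the sketch below shows the two replacements on a hypothetical DataFrame: .loc selects by row and column labels, .iloc by integer positions.

```python
import pandas as pd

df = pd.DataFrame(
    {"male": [10, 20, 30], "female": [15, 25, 35]},   # hypothetical data
    index=["x", "y", "z"],
)

print(df.loc["y", "male"])   # label-based: row "y", column "male" -> 20
print(df.iloc[0])            # position-based: the first row (old df.ix[0])
print(df.iloc[:2, 1])        # first two rows of the second column
```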

11/10/2020

Backup data
12/10/2020
Visualization
%matplotlib inline = a Jupyter command that shows the plots inside the notebook; matplotlib is the library that pandas uses for plots.

The data files must already be saved on the computer so that we can import them into Jupyter.
13/10/2020

To make the plots prettier and more organised:

import matplotlib
matplotlib.style.use('ggplot')

To add a histogram
df.plot.hist()
The problem with this plot is that the bars drawn in front hide the bars behind them.

To show a single column on its own, select it before plotting:

df.male.plot.hist()
To show all columns together but with a transparent effect:
df.plot.hist(alpha=0.5)  # alpha must satisfy 0 <= alpha <= 1

To stack the bars of each column on top of each other:
df.plot.hist(stacked=True)

A box plot of each column:
df.plot.box()
A scatter plot is used to show the relationship between two variables.
df.plot.scatter(x='male', y='female')

To change the size of the points on the scatter plot, first create a score column:

df['score'] = df.male * 0.3 + df.female * 0.7
df.head()

Then pass it as the point size s:
df.plot.scatter(x='male', y='female', s=df.score)
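The notes never show how df was built, so the sketch below assumes a hypothetical DataFrame with random male and female columns and repeats the score step end to end. matplotlib.use("Agg") is only there so it runs outside Jupyter without opening a window; inside a notebook, %matplotlib inline takes care of that instead.

```python
import matplotlib
matplotlib.use("Agg")          # draw without a window (outside Jupyter)
import numpy as np
import pandas as pd

# Hypothetical data: 50 random scores per column
rng = np.random.default_rng(0)
df = pd.DataFrame({"male": rng.integers(0, 100, 50),
                   "female": rng.integers(0, 100, 50)})

# Weighted score column; the column name must be a string in quotes
df["score"] = df.male * 0.3 + df.female * 0.7

# s= sets the size of each point, so higher scores give bigger points
ax = df.plot.scatter(x="male", y="female", s=df.score)
ax.set_title("Point size = score")
```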
Drawbacks of the scatter plot

1- Difficult to show the level of concentration (overlapping points hide each other)

2- Hard to read on large samples

Creating heatmaps (hexagonal bin plots)

df.plot.hexbin(x='male', y='female', gridsize=25)  # to change the size of the hexagons, change the gridsize number

The deeper the colour, the more data points are concentrated in that area.

To create a pie plot

df = pd.DataFrame(np.random.randint(100, size=(10, 4)), columns=['a', 'b', 'c', 'd'])
df.head()
df.iloc[0].plot.pie()  # pie plot of the first row (.ix was removed in pandas 1.0)
df.iloc[0].plot.pie(figsize=(5, 5))  # a square figure makes the pie plot perfectly circular

Area plot
If the values change over time, this is the plot to use. It is easy to make and to read, and it provides a lot of information.
df.plot.area()
To plot just one column as an area:
df.c.plot.area()

15/10/2020

Regression analysis

df.describe()  # the describe function that comes with pandas: count, mean, std, min, quartiles and max of each numeric column
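A sketch of what describe() returns, on a small hypothetical DataFrame. The result is itself a DataFrame, so single statistics can be read with .loc. Its std row uses the sample formula (divides by n - 1).

```python
import pandas as pd

df = pd.DataFrame({"male": [60, 70, 80, 90], "female": [65, 75, 85, 95]})   # hypothetical

summary = df.describe()
print(summary)
# Rows: count, mean, std, min, 25%, 50%, 75%, max

print(summary.loc["mean", "male"])   # 75.0
print(summary.loc["std", "male"])    # sample standard deviation of the male column
```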
