Data Analysis With Python

The document provides an overview of data analysis using Python, covering variable types, statistics, and data visualization techniques. It explains concepts like continuous vs discrete data, standard deviation, skewness, and various data structures such as lists, dictionaries, and sets. Additionally, it includes practical examples of using libraries like NumPy and pandas for data manipulation and visualization.

Uploaded by

marcioveloso785

DATA ANALYSIS WITH PYTHON

\n = a new line inside a string in Python

end="" = argument of print() that sets what is written at the end of the output (by default a new line)
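A short sketch of both notes, assuming the second one refers to the `end` argument of `print()`. The sample strings are made up for illustration.

```python
# "\n" inside a string starts a new line when the string is printed.
two_lines = "first line\nsecond line"
print(two_lines)

# By default print() finishes with "\n". Passing end="" suppresses that,
# so the next print() continues on the same line.
print("Hello, ", end="")
print("world")

# The newline is a single character, so splitting on it gives the lines back.
lines = two_lines.split("\n")
print(lines)
```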

PYTHON VARIABLES

VARIABLE TYPE:

• Numbers: Integers (1, 9, 3389), Floats (3.145)
• String: "my name is Marcio"
• Boolean: True or False
• List: [1, 3, 5, 7, 9]
• Dictionary: {"Blizzard": "bad weather"}
• Set: a collection of unique items
RUN After (Shift + E)

How to get serious information about the number of the objects in parentheses
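A minimal sketch, assuming the note above refers to built-ins such as `type()` and `len()`, which take an object in parentheses and report information about it. The values are the examples from the variable type list.

```python
# One example value per variable type.
examples = [1, 3.145, "my name is Marcio", True, [1, 3, 5, 7, 9],
            {"Blizzard": "bad weather"}, {1, 3, 5}]

# type() reports what kind of object a value is.
type_names = [type(value).__name__ for value in examples]
print(type_names)

# len() counts the items in a string, list, dictionary or set.
list_length = len([1, 3, 5, 7, 9])
print(list_length)
```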
Statistics for Data Analysis and Data Science
Continuous vs Discrete

Continuous: A set of data is said to be continuous if the values belonging to the set can take on ANY
value within a finite or infinite interval. EX: The height of a horse (could be any value within the
range of horse heights). The time to complete a task (which could be measured to fractions of a second)
or the speed of a car on Route 3 (assuming legal speed limits).

Discrete: A set of data is said to be discrete if the values belonging to the set are distinct and
separate (unconnected values). EX: The number of people in your class (no fractional parts of a
person). The number of TV sets in a home (no fractional parts of a TV set). The number of questions
on a math test (no incomplete questions).

What is a Distribution

A probability distribution is a mathematical function that, stated in simple terms, can be thought of as
providing the probability of occurrence of the different possible outcomes of an experiment.

In other words, it describes how likely each outcome is, based on the given data.

Discrete distributions tend to be presented as bar charts, because the finite number of
outcomes is easy to show in that type of chart.

A continuous distribution is drawn as a continuous line, because there are infinitely many possible values.

Standard Deviation

According to Wikipedia, standard deviation is a measure of the amount of variation or dispersion of a set
of values. A low standard deviation indicates that the values tend to be close to the mean, and a
high standard deviation indicates that the values are spread out over a wider range.
Standard deviation is easier to work with than variance because it is in the same units as the data, so we
can add it to our mean. In the example above we add 65.92 + 4.08, which gives us the result 70.

Quick tip to remember the difference between the standard deviation and variance: the standard deviation is the square root of the variance.
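A minimal sketch of how the two measures relate, using Python's built-in statistics module. The heights are made-up values, not the data behind the 65.92 and 4.08 figures in the notes, and the population formulas (pvariance, pstdev) are an assumption about which version the course used.

```python
import statistics

# Hypothetical sample of heights in inches.
heights = [62, 64, 66, 68, 70]

mean = statistics.mean(heights)
variance = statistics.pvariance(heights)  # population variance, in squared units
std_dev = statistics.pstdev(heights)      # population standard deviation, same units as the data

# The standard deviation is the square root of the variance.
root_of_variance = variance ** 0.5

# Because it shares the units of the data, it can be added to the mean.
one_sd_above_mean = mean + std_dev
print(mean, variance, round(std_dev, 2), round(one_sd_above_mean, 2))
```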

SKEWNESS

The best way to see the skewness of the data is to see which direction the tail of the line is taking, that is,
where the outliers are. The left image is skewed to the left (negative) and the right image is
skewed to the right (positive). Adding numbers at the bottom of the graphic helps us get a better
understanding of the skewness.

The mean is the average of the numbers.

For the median, we sort the numbers and take the one in the middle, so that an equal number of values
lies on each side. If the count of numbers is even, we take the two numbers in the middle and average
them (add them and divide by two) to get our median.
Another note: the median is not affected by an outlier value, but the mean is.

The mode is the most frequent value; in the graphic it is the highest point.
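The three measures can be checked with Python's built-in statistics module. This is a small sketch with made-up numbers, including one outlier to show its effect on the mean and the median.

```python
import statistics

scores = [2, 3, 3, 5, 7]   # hypothetical values, odd count
even_scores = [2, 3, 5, 7]  # hypothetical values, even count

mean_value = statistics.mean(scores)          # (2 + 3 + 3 + 5 + 7) / 5
median_odd = statistics.median(scores)        # the middle value of the sorted list
median_even = statistics.median(even_scores)  # average of the two middle values, 3 and 5
mode_value = statistics.mode(scores)          # the most frequent value

# Adding an outlier pulls the mean up sharply but barely moves the median.
with_outlier = scores + [100]
mean_with_outlier = statistics.mean(with_outlier)
median_with_outlier = statistics.median(with_outlier)
print(mean_value, mean_with_outlier, median_odd, median_with_outlier)
```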

Combining the Strings

first_name = "Marcio "  # note the space after the name
last_name = "Santos"
full_name = first_name + last_name
print(full_name)
Result: Marcio Santos

str() Function

Myname = "Hello my name is Marcio and my age is "
age = 29
total = Myname + str(age)  # str() turns the number into text so it can be joined
print(total)
Result: Hello my name is Marcio and my age is 29

Title function

Myname = "Marcio "
LastName = "Santos"
print(Myname.title() + LastName.title())  # title() capitalises the first letter of each word
Result: Marcio Santos

LECTURE: LIST
What can I put in a list?
Store numbers: numbers_list = [1, 3.9, 101]
Store strings: string_list = ["John", "Mike", "Tony"]
Store mixed: mixed_list = [3, "apple", 2, "orange", "banana"]
Even …
A list of lists! Wow_list = [[1, 3.9, 101], ["John", "Mike", "Tony"], [3, "apple", 2, "orange", "banana"]]

One example
names = ["Marcio", "Joao", "Tony"]
["Hello "+item for item in names]
Result: ['Hello Marcio', 'Hello Joao', 'Hello Tony']

Even numbers in the list from 1 to 100

numbers = list(range(1, 101))
[item for item in numbers if item % 2 == 0]
Result: [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, …]
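The one-line form used for the even numbers is a list comprehension. As a sketch of what it does, here is the same filter written as an ordinary for loop next to the comprehension; the variable names evens_loop and evens_comprehension are only for illustration.

```python
numbers = list(range(1, 101))

# Long form: loop over every item and keep the ones that pass the test.
evens_loop = []
for item in numbers:
    if item % 2 == 0:
        evens_loop.append(item)

# Short form: the same loop and test written as a list comprehension.
evens_comprehension = [item for item in numbers if item % 2 == 0]

print(evens_loop == evens_comprehension)
```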

DICTIONARY IN PYTHON

When using a dictionary, the keys must be unique (the values may repeat).

drinks = {'type':'coffee','quantity':'a cup'}
drinks['quantity']
R: 'a cup'

Another example

drink = {'type':'juice','quantity':'two bottles'}
print("Would you like "+drink['quantity']+" of "+drink['type']+"?")
R: Would you like two bottles of juice?

BE AWARE OF MISSPELLINGS: PYTHON IS A CASE-SENSITIVE LANGUAGE.
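A small sketch of how dictionary keys behave, reusing the drink dictionary from the example. The extra key "backup" and the variable names found and fallback are made up for illustration.

```python
drink = {"type": "juice", "quantity": "two bottles"}

# Keys are unique: assigning to an existing key overwrites its value.
drink["quantity"] = "one bottle"

# Values may repeat under different keys.
drink["backup"] = "juice"

# Keys are case sensitive, so "Type" is not the same key as "type".
try:
    drink["Type"]
    found = True
except KeyError:
    found = False

# .get() returns a default value instead of raising KeyError.
fallback = drink.get("Type", "unknown")
print(drink, found, fallback)
```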

SET
A set is essentially a particular type of list in which only unique items are stored; duplicates are dropped and the items have no fixed order.
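A minimal sketch of the uniqueness rule. The names are sample values, and "Ana" is added only to show the difference between a new item and a repeated one.

```python
names = ["John", "Mike", "Tony", "Mike", "John"]

# Converting a list to a set drops the duplicates.
unique_names = set(names)
count_after_conversion = len(unique_names)

# Adding an item that is already present changes nothing.
unique_names.add("Tony")
count_after_repeat = len(unique_names)

# Adding a new item grows the set.
unique_names.add("Ana")
print(sorted(unique_names))
```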

Indentation
Indentation is just a way to group lines of code into a block; every line in the same block must be indented by the same amount.
A line that ends with ":" (for example after if, else or for) must be followed by an indented block.
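A short sketch of indentation in practice, with a for loop and an if/else inside it. The list of numbers and the labels are made up.

```python
numbers = [1, 2, 3, 4]
labels = []

for number in numbers:          # the colon opens a block
    if number % 2 == 0:         # indented once: inside the for loop
        labels.append("even")   # indented twice: inside the if
    else:
        labels.append("odd")    # indented twice: inside the else

# Back at the left margin: this line runs once, after the loop has finished.
count = len(labels)
print(labels, count)
```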

LOOPS, IF-ELSE CONDITIONS AND FUNCTIONS

Functions
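A minimal sketch of a function, built on the str() example from earlier in these notes. The function name describe_age and the negative-age check are hypothetical, added only to show def, parameters, if and return together.

```python
def describe_age(name, age):
    """Return a sentence that combines a name and an age."""
    if age < 0:
        return "Age cannot be negative"
    # str() turns the number into text so it can be joined to the sentence.
    return "Hello my name is " + name + " and my age is " + str(age)


message = describe_age("Marcio", 29)
error_message = describe_age("Marcio", -1)
print(message)
```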
IMPORT NUMPY

CREATE AN ARRAY

PRINT AN ARRAY

One way of printing

Another way

Two-dimensional
Three-dimensional

Formula for printing large numbers

Basic operations on an array
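A minimal sketch of the NumPy steps named in the headings above: importing, creating and printing arrays in one, two and three dimensions, and a few basic operations. All values and variable names are made up.

```python
import numpy as np

# Create arrays with one, two and three dimensions.
one_d = np.array([1, 2, 3, 4])
two_d = np.array([[1, 2, 3], [4, 5, 6]])
three_d = np.arange(24).reshape(2, 3, 4)  # 24 numbers arranged as 2 blocks of 3 rows and 4 columns

# Print an array, its shape and its number of dimensions.
print(one_d)
print(two_d.shape)
print(three_d.ndim)

# Basic operations work element by element.
doubled = one_d * 2
summed = one_d + np.array([10, 20, 30, 40])
squared = one_d ** 2
```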


INDEXING, SLICING AND ITERATING

GENERATE STATISTICS
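A small sketch of indexing, slicing, iterating and generating statistics on a NumPy array. The 3 x 3 grid is a made-up example.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

# Indexing: positions start at 0.
first_row = grid[0]
centre = grid[1, 1]

# Slicing: rows 0 and 1, columns 1 and 2.
top_right = grid[:2, 1:]

# Iterating: a for loop over a 2-D array goes row by row.
row_sums = [int(row.sum()) for row in grid]

# Statistics over the whole array or along one axis.
total = grid.sum()
average = grid.mean()
column_max = grid.max(axis=0)  # largest value in each column
spread = grid.std()            # standard deviation of all values
print(centre, row_sums, total, average)
```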
SERIES CREATION

SERIES OPERATION
INDEXING
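A minimal sketch of the three pandas Series topics above: creation, operations and indexing. The values, labels and variable names are made up.

```python
import pandas as pd

# Series creation: values plus a label (index) for each value.
prices = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Series operations work element by element.
doubled = prices * 2
total = prices.sum()

# When two Series are combined, values are matched by label;
# a label missing from one side gives NaN.
aligned = prices + pd.Series([1, 2], index=["a", "b"])

# Indexing: by label, by position, or with a condition.
by_label = prices["b"]
by_position = prices.iloc[0]
above_15 = prices[prices > 15]
print(doubled, aligned, above_15, sep="\n")
```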
Importing data from Excel
To import data from Excel, the Excel file must first be read into pandas.
Example: data_excel = pd.read_excel('Business Analitics.xlsx')
data_excel
Array Transformation
Review DataFrame ix() (note: ix was removed in pandas 1.0; use loc for labels and iloc for positions)
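The ix indexer was removed in pandas 1.0. A minimal sketch of its two replacements, loc (by label) and iloc (by position), on a made-up DataFrame with hypothetical row labels "x", "y" and "z":

```python
import pandas as pd

df = pd.DataFrame({"male": [70, 65, 80], "female": [75, 60, 85]},
                  index=["x", "y", "z"])

by_label = df.loc["y"]        # the row whose label is "y"
by_position = df.iloc[0]      # the first row, whatever its label
cell = df.loc["z", "female"]  # one value: row "z", column "female"
subset = df.iloc[:2, 0]       # first two rows of the first column
print(by_label, by_position, cell, subset, sep="\n")
```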

11/10/2020

Backup data
12/10/2020
Visualization
%matplotlib inline = a Jupyter command that shows plots inside the notebook. matplotlib is the library that pandas uses for plots.

The files must already be saved on the computer so that we can import them into Jupyter.
13/10/2020

To make the plot much prettier and more organised:

import matplotlib.style
matplotlib.style.use('ggplot')

To add a histogram
df.plot.hist()
The problem with this plot is that overlapping bars hide the bars behind them.

To show one column on its own, we plot just that column:

df.male.plot.hist()
To show them together but with a transparent effect
df.plot.hist(alpha=0.5)  # alpha must satisfy 0 <= alpha <= 1

df.plot.hist(stacked=True)

df.plot.box()
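The df used in these plots comes from the imported file, which is not part of the notes. As a self-contained sketch, the DataFrame below is filled with random made-up scores in two columns named male and female, and the Agg backend is chosen so the code also runs outside Jupyter.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; in Jupyter, %matplotlib inline is used instead
import numpy as np
import pandas as pd

# Hypothetical data: 200 random scores per column.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "male": rng.normal(70, 10, 200),
    "female": rng.normal(75, 8, 200),
})

# Each call on the DataFrame draws on a new figure and returns its axes.
ax_overlap = df.plot.hist(alpha=0.5, bins=20)     # both columns, semi-transparent
ax_stacked = df.plot.hist(stacked=True, bins=20)  # bars stacked on top of each other
ax_box = df.plot.box()                            # one box per column
```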
A scatter plot is used to show the relationship between two variables.
df.plot.scatter(x='male', y='female')

Another option is to change the size of the points on the scatter plot

df['score'] = df.male*0.3 + df.female*0.7
df.head()

Then
df.plot.scatter(x='male', y='female', s=df.score)  # s sets the size of each point
Drawbacks of the scatter plot

1 - difficult to show the level of concentration
2 - hard to read on large samples

Creating heatmaps

df.plot.hexbin(x='male', y='female', gridsize=25)  # to change the size of the hexagons, change the number in gridsize

The deeper the colour, the more points are concentrated in that area.

To create a pie plot

df = pd.DataFrame(np.random.randint(100, size=(10, 4)), columns=['a', 'b', 'c', 'd'])
df.head()
df.iloc[0].plot.pie()  # iloc[0] picks the first row (ix was removed in pandas 1.0)
df.iloc[0].plot.pie(figsize=(5, 5))  # a square figure makes the pie plot circular

Area plot
If the values are changing over time, this is the plot to use. It is easy to make and to read, and it provides
loads of information.
df.plot.area()
To pick just one area
df.c.plot.area()

15/10/2020

Regression analysis

df.describe()  # the describe function that comes with pandas
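A minimal sketch of what describe() returns, on a made-up DataFrame with three rows. The result is itself a DataFrame, so single statistics can be read from it with loc.

```python
import pandas as pd

df = pd.DataFrame({"male": [60, 70, 80], "female": [65, 75, 85]})

# describe() returns one row per statistic and one column per numeric column.
summary = df.describe()
print(summary)

mean_male = summary.loc["mean", "male"]
std_male = summary.loc["std", "male"]  # sample standard deviation
rows = list(summary.index)
```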
