Chapter 0
Introduction
This chapter introduces Python functions, graphics, widgets, and pandas.
To run the code, click on the rocket symbol at the top of the page and select Live Code. The server will then launch and, after some time (be patient), you will see "ready". You can then click Run on the cells below. You should run the cells one at a time, without skipping, as values computed in one cell are reused in subsequent cells.
Let's run the next cell.
print('Hello World')
Hello World
Next, change the values of x and y in the cell below and click Run.
x = 2  # example value; change it and run the cell again
y = 3  # example value; change it and run the cell again
print("x + y =", x + y)
In the cell above, you could try to print x * y instead of x + y. You can use that cell to try your own code.
If you are not familiar with Python, or if you want a refresher, you may find Appendix 1 useful.
We now load some necessary libraries. (Run the next cell.)
from IPython.core.display import HTML
import numpy as np
import matplotlib
import scipy
from scipy.stats import norm
from scipy.stats import binom
from scipy.stats import poisson
import pandas as pd
params = {'figure.figsize': (12, 6),  # These are plot parameters
          'legend.fontsize': 23}
from matplotlib import pyplot as plt
matplotlib.rcParams.update(params)
import random
from ipywidgets import *
print('The libraries loaded successfully')
The libraries loaded successfully
We start with some Python experiments with random variables.
Plotting Distributions of Random Variables
Binomial: B(100,p)
We plot the probability mass function of the Binomial distribution with parameters $N = 100$ and $p = 0.1, 0.2, 0.5$. For $n = 0, \ldots, 100$, this is the probability that 100 flips of a coin that yields heads with probability $p$ result in $n$ heads.
N = 101  # n ranges over 0, ..., 100, so that N - 1 = 100 is the number of flips
A = np.zeros(N)
B = np.zeros(N)
C = np.zeros(N)
X = np.arange(N)
for n in range(N):
    A[n] = binom.pmf(n, N-1, 0.1)
    B[n] = binom.pmf(n, N-1, 0.2)
    C[n] = binom.pmf(n, N-1, 0.5)
plt.plot(A, label='p=0.1')  # continuous plot gives a sense of the shape of the pmf
plt.scatter(X, A, s=30)  # we add the markers on the integers
plt.plot(B, label='p=0.2')
plt.scatter(X, B, s=30)
plt.plot(C, label='p=0.5')
plt.scatter(X, C, s=30)
plt.title('p.m.f. of B(100,p)')
plt.xlabel('n')
plt.legend()
plt.show()
[Figure: p.m.f. of B(100,p) for p = 0.1, 0.2, 0.5]
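As a cross-check on the plot, the Binomial probabilities can also be computed directly from the formula $\binom{N}{n} p^n (1-p)^{N-n}$. The sketch below uses only the standard library; the helper name `binomial_pmf` is an illustrative choice, not part of the chapter's code.

```python
from math import comb

def binomial_pmf(n, N, p):
    """Probability of exactly n heads in N flips of a coin with P(heads) = p."""
    return comb(N, n) * p**n * (1 - p)**(N - n)

pmf = [binomial_pmf(n, 100, 0.2) for n in range(101)]
total = sum(pmf)                               # probabilities add up to 1 (up to rounding)
mean = sum(n * q for n, q in enumerate(pmf))   # the mean of B(N, p) is N * p
print(total, mean)
```

The two printed values should be very close to 1 and to 100 × 0.2 = 20.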
PPF of B(100,0.2):
The percent point function (ppf) is the inverse CDF. For instance, binom.ppf(0.95, 100, 0.2) is the smallest value of $n$ such that $P(X \le n) \ge 0.95$, where $X = B(100, 0.2)$.
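N = 20  # number of probability levels (assumed; the original value is not legible)
X = np.zeros(N)
B = np.zeros(N)
for n in range(N):
    X[n] = 0.01 + 0.05*n  # probability levels in (0, 1) (assumed grid; the original line is garbled)
    B[n] = binom.ppf(X[n], 100, 0.2)
plt.scatter(X, B, s=50)
plt.title('p.p.f. of B(100,0.2)')
plt.xlabel('n')
plt.show()
[Figure: p.p.f. of B(100,0.2)]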
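The definition of the ppf as "the smallest n whose CDF reaches the target level" can be sketched without scipy. The function names below are hypothetical helpers introduced for illustration; the sketch simply accumulates the Binomial probabilities until the level is reached.

```python
from math import comb

def binomial_cdf(n, N, p):
    """P(X <= n) for X = B(N, p)."""
    return sum(comb(N, k) * p**k * (1 - p)**(N - k) for k in range(n + 1))

def binomial_ppf(q, N, p):
    """Smallest n such that P(X <= n) >= q, for X = B(N, p)."""
    cumulative = 0.0
    for n in range(N + 1):
        cumulative += comb(N, n) * p**n * (1 - p)**(N - n)
        if cumulative >= q:
            return n
    return N  # reached only if rounding keeps the total slightly below q

n95 = binomial_ppf(0.95, 100, 0.2)
print(n95, binomial_cdf(n95 - 1, 100, 0.2), binomial_cdf(n95, 100, 0.2))
```

The printed CDF values should straddle 0.95: the first is below it and the second is at or above it.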
Poisson: $P(\lambda)$
The Poisson distribution with parameter $\lambda$ is that of the number of phone calls that you receive in one day, if you receive $\lambda$ phone calls per day, on average.
N = 30  # number of values of n to plot (assumed; the original value is not legible)
A = np.zeros(N)
B = np.zeros(N)
C = np.zeros(N)
X = np.arange(N)
for n in range(N):
    A[n] = poisson.pmf(n, 1)
    B[n] = poisson.pmf(n, 4)
    C[n] = poisson.pmf(n, 10)
plt.plot(A, label=r'$\lambda = 1$')
plt.scatter(X, A, s=30)
plt.plot(B, label=r'$\lambda = 4$')
plt.scatter(X, B, s=30)
plt.plot(C, label=r'$\lambda = 10$')
plt.scatter(X, C, s=30)
plt.title(r'p.m.f. of $P(\lambda)$')
plt.xlabel('n')
plt.legend()
plt.show()
[Figure: p.m.f. of P(λ) for λ = 1, 4, 10]
Exponential: $Expo(\lambda)$
The exponential distribution with parameter $\lambda$ is that of the time (in days) until the next phone call, if you get $\lambda$ phone calls per day, on average.
a = 0.25  # the four rates are taken from the plot labels
b = 0.5
c = 0.75
d = 1
N = 100  # number of points on the x-axis (assumed; the original value is not legible)
A = np.zeros(N)
B = np.zeros(N)
C = np.zeros(N)
D = np.zeros(N)
X = (10/N)*np.arange(N)  # times in [0, 10) (assumed range; the original factor is garbled)
for n in range(N):
    A[n] = a*np.exp(-a*X[n])
    B[n] = b*np.exp(-b*X[n])
    C[n] = c*np.exp(-c*X[n])
    D[n] = d*np.exp(-d*X[n])
plt.plot(X, A, label=r'$\lambda = 0.25$')
plt.plot(X, B, label=r'$\lambda = 0.5$')
plt.plot(X, C, label=r'$\lambda = 0.75$')
plt.plot(X, D, label=r'$\lambda = 1$')
plt.title(r'p.d.f. of $Expo(\lambda)$')
plt.xlabel('x')
plt.legend()
plt.show()
[Figure: p.d.f. of Expo(λ) for λ = 0.25, 0.5, 0.75, 1]
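A quick numerical sanity check of the exponential density $\lambda e^{-\lambda x}$: it should integrate to 1, and its mean should be $1/\lambda$ days, consistent with receiving $\lambda$ calls per day on average. The sketch below uses a plain Riemann sum; the step size and the cutoff at 60 days are arbitrary choices made for this illustration.

```python
from math import exp

def expo_pdf(x, lam):
    """Density of the exponential distribution with rate lam, for x >= 0."""
    return lam * exp(-lam * x)

lam = 0.5
dx = 0.001
grid = [k * dx for k in range(int(60 / dx))]             # [0, 60) holds almost all the mass
total = sum(expo_pdf(x, lam) * dx for x in grid)          # close to 1
mean = sum(x * expo_pdf(x, lam) * dx for x in grid)       # close to 1 / lam = 2
print(total, mean)
```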
Generating and Plotting Random Variables
np.random.uniform(a, b) returns one value chosen uniformly at random in the interval [a, b).
np.random.uniform(a, b, n) returns n values chosen independently and uniformly at random in the interval [a, b).
Let X = np.random.uniform(0, 1, 100). We then plot these values.
N = 100  # number of values
X = np.random.uniform(0, 1, N)  # uniform random values in [0, 1)
plt.plot(X);  # plot these values. The ; at the end of the line suppresses unwanted outputs
[Figure: line plot of the values of X against their index]
Since X is a sequence of values, it is more appropriate to use a scatter plot of X as a function of the index. To do this, we define timeSteps = np.arange(100) = [0, 1, ..., 99] and we plot X against timeSteps.
timeSteps = np.arange(N)
plt.scatter(timeSteps, X);
Strong Law of Large Numbers
The values above look like noise, because they are independent. Is there some statistical regularity of any kind in this noise? We compute the sample average of the first n values, for n = 1, ..., 100, store these averages in Y, and plot Y. That is, Y[n] = (X[0] + ... + X[n])/(n+1) for n = 0, ..., 99. (Note that indexing requires some care.)
The values of X are kept in memory, so we don't have to generate them again if we ran the previous cell.
Y = np.zeros(N)  # an array of N zeros
for n in range(N):  # for n = 0, 1, ..., N - 1 (remember that Python starts with 0)
    Y[n] = sum(X[:n+1])/(n+1)  # X[:n+1] is the list of the first n+1 values of X, for n = 0, 1, ...
plt.scatter(timeSteps, Y);
# we could have written the code more efficiently as follows (can you tell why it is more efficient?):
# Y[0] = X[0]
# for n in range(99):
#     Y[n+1] = ((n+1)*Y[n] + X[n+1])/(n+2)  # Y[n] averages n+1 values, Y[n+1] averages n+2
# plt.plot(Y);
[Figure: scatter plot of the sample averages Y[n] against n]
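The efficiency question raised in the cell above can be explored with a small standard-library sketch. It computes the running averages twice, once by re-summing the first n+1 values at every step and once incrementally, and checks that the two agree. The seed and variable names are choices made for this illustration.

```python
import random

random.seed(0)                                  # fixed seed so the run is reproducible
N = 100
X = [random.random() for _ in range(N)]         # uniform values in [0, 1)

# Direct computation: Y_direct[n] is the average of X[0], ..., X[n].
# Each step re-adds all earlier values, so the total work grows like N^2.
Y_direct = [sum(X[:n + 1]) / (n + 1) for n in range(N)]

# Incremental computation: each step reuses the previous average,
# so the total work grows like N.
Y = [X[0]]
for n in range(N - 1):
    Y.append(((n + 1) * Y[n] + X[n + 1]) / (n + 2))

max_gap = max(abs(a - b) for a, b in zip(Y, Y_direct))
print(max_gap)   # the two computations agree up to rounding
```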
Central Limit Theorem
Observe that, even though the values X fluctuate wildly, their sample average converges to 0.5. That property is the Strong Law of Large Numbers. Note that Y[99] tends to be closer to 0.5 than Y[15]. The Central Limit Theorem (CLT) makes that observation precise. To illustrate that result, let us repeat the previous experiment 100 times and count the fraction of times that Y[n] falls in intervals of width 0.01, for n = 15 and n = 99. We do this the hard way, just to build some familiarity with the code.
V = np.zeros(100)  # V[k] will be the number of times that Y[15] falls in [k*0.01, (k+1)*0.01)
W = np.zeros(100)  # W[k] will be the number of times that Y[99] falls in [k*0.01, (k+1)*0.01)
for i in range(100):  # i is the index of our experiment
    X = np.random.uniform(0, 1, N)  # we generate X
    Y[0] = X[0]
    for n in range(N-1):
        Y[n+1] = ((n+1)*Y[n] + X[n+1])/(n+2)  # we generate Y, using the more efficient code
    k = int(100*Y[15])  # int(x) is the integer part of x. For instance, int(15.7) = 15
    # thus, k is the value such that Y[15] is in [k*0.01, (k+1)*0.01)
    V[k] += 1  # that is, V[k] = V[k] + 1
    k = int(100*Y[99])
    W[k] += 1
values = 0.01*np.arange(100)  # this is [0, 0.01, 0.02, ..., 0.99]
plt.scatter(values, V/100);  # the scatter plot gives the value of V as a function of values
# we divide V by 100 to get the fraction of times
plt.scatter(values, W/100);
We can add legends to the scatter plot as follows:
fig, ax = plt.subplots()
ax.scatter(values, V/100, label='Y[15]')
ax.scatter(values, W/100, label='Y[99]')
ax.legend()
ax.grid(True)
plt.show()
[Figure: fraction of experiments in each interval, for Y[15] and Y[99]]
These plots show that most of the time, Y[99] is very close to 0.5 whereas Y[15] tends to be more dispersed. To make the case a bit stronger, we repeat the experiment 10000 times, instead of 100.
V = np.zeros(100)  # V[k] will be the number of times that Y[15] falls in [k*0.01, (k+1)*0.01)
W = np.zeros(100)  # W[k] will be the number of times that Y[99] falls in [k*0.01, (k+1)*0.01)
for i in range(10000):  # i is the index of our experiment
    X = np.random.uniform(0, 1, 100)  # we generate X
    Y[0] = X[0]
    for n in range(99):
        Y[n+1] = ((n+1)*Y[n] + X[n+1])/(n+2)  # we generate Y, using the more efficient code
    k = int(100*Y[15])  # int(x) is the integer part of x. For instance, int(15.7) = 15
    # thus, k is the value such that Y[15] is in [k*0.01, (k+1)*0.01)
    V[k] += 1  # that is, V[k] = V[k] + 1
    k = int(100*Y[99])
    W[k] += 1
values = 0.01*np.arange(100)  # this is [0, 0.01, 0.02, ..., 0.99]
fig, ax = plt.subplots()
ax.scatter(values, V/10000, label='Y[15]')
ax.scatter(values, W/10000, label='Y[99]')
ax.legend()
ax.grid(True)
plt.show()
[Figure: fraction of experiments in each interval, for Y[15] and Y[99], over 10000 experiments]
The two plots show that Y[15] and Y[99] have a probability distribution that looks like a bell curve, i.e., a Gaussian distribution. The distribution of Y[15] is more spread out than that of Y[99], which is more concentrated around 0.5. The spread is measured by the standard deviation of the distribution and its value is $\sigma/\sqrt{16} = \sigma/4$ for Y[15] and $\sigma/\sqrt{100} = \sigma/10$ for Y[99]. Here, $\sigma$ is the standard deviation of a uniform [0, 1] random variable, and $\sigma = 1/\sqrt{12} \approx 0.3$. Roughly, about 70% of the values fall within one standard deviation of 0.5. Thus, one expects that about 70% of the values of Y[15] fall within $\sigma/4 \approx 0.07$ of 0.5 and 70% of the values of Y[99] fall within $\sigma/10 \approx 0.03$ of 0.5.
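The numbers quoted in this paragraph can be reproduced with a few lines of standard-library Python. This is only an arithmetic check; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

sigma = 1 / sqrt(12)              # standard deviation of a uniform [0, 1] random variable
sd_Y15 = sigma / sqrt(16)         # Y[15] is the average of 16 values
sd_Y99 = sigma / sqrt(100)        # Y[99] is the average of 100 values

standard = NormalDist()           # Gaussian with mean 0 and standard deviation 1
within_one_sd = standard.cdf(1) - standard.cdf(-1)   # mass within one standard deviation

print(round(sigma, 3), round(sd_Y15, 3), round(sd_Y99, 3), round(within_one_sd, 3))
```

The last value is about 0.68, which is the "roughly 70%" used in the text.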
Let's try to see how far the empirical distribution of Y[99] is from a Gaussian. The Central Limit Theorem states that Y[99] should be almost distributed like a Gaussian random variable with mean 0.5 and standard deviation $\sigma/10 \approx 0.03$. We plot the fraction of experiments, out of 10000, where Y[99] is less than $k \times 0.01$, as a function of $k \in \{1, \ldots, 100\}$, and we also plot $F(k/100)$ where $F$ is the cumulative distribution function (cdf) of a Gaussian G with mean 0.5 and standard deviation 0.03. We use the cdf norm.cdf of a Gaussian SG with mean 0 and standard deviation 1. The trick is that (G - 0.5)/0.03 is then distributed like SG, so that the probability that G < x is the probability that (G - 0.5)/0.03 < (x - 0.5)/0.03, i.e., the probability that SG < (x - 0.5)/0.03. As you see, the plot confirms the prediction of the CLT.
B = np.zeros(100)
C = np.zeros(100)
for k in range(1, 100):
    B[k] = B[k-1] + W[k-1]  # number of experiments where Y[99] < k*0.01
    C[k] = norm.cdf((0.01*k - 0.5)/0.03)  # Gaussian cdf at k*0.01, by standardizing
fig, ax = plt.subplots()
ax.scatter(values, B/10000, label='Empirical')
ax.scatter(values, C, label='Gaussian')
ax.legend()
ax.grid(True)
plt.show()
[Figure: empirical cdf of Y[99] and Gaussian cdf]
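The standardization trick and the CLT prediction can also be checked on a smaller scale with the standard library alone. The sketch below assumes 2000 experiments (fewer than in the text, to keep it fast) and a handful of test points; both are arbitrary choices for this illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(1)                                     # fixed seed for reproducibility
trials = 2000
averages = [sum(random.random() for _ in range(100)) / 100 for _ in range(trials)]

G = NormalDist(mu=0.5, sigma=1 / sqrt(12) / 10)    # CLT approximation of the average of 100 values
SG = NormalDist()                                  # Gaussian with mean 0 and standard deviation 1

def empirical_cdf(x):
    """Fraction of experiments whose average is less than x."""
    return sum(a < x for a in averages) / trials

gaps = []
for x in (0.45, 0.48, 0.5, 0.52, 0.55):
    via_standard = SG.cdf((x - G.mean) / G.stdev)  # P(G < x) computed through SG
    gaps.append(abs(empirical_cdf(x) - via_standard))
max_gap = max(gaps)
print(max_gap)   # small: the empirical cdf tracks the Gaussian cdf
```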
Markov Chain
Using a uniform random variable we can generate random variables with an arbitrary distribution. Let us generate a random variable that takes the value $k$ with probability $p_k$ for $k = 0, 1, \ldots, K - 1$, where the numbers $p_k$ are nonnegative and add up to one. We define a function discreteRV(x, p) that returns such a random variable. Let $U$ be a random variable uniformly distributed in [0, 1]. Define ...
    # ...
        Q[:, n+1] = np.einsum('i,ij->j', Q[:, n], P)  # the Einstein sum notation indicates how to manipulate indices
        # np.einsum('ij,jk->ik', A, B) is the product of A and B
    return X, Q
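Since only the end of the simulation function is shown above, here is a minimal standard-library sketch of the two ideas involved: sampling a discrete random variable from one uniform sample (the inverse-CDF method), and simulating a Markov chain while propagating the distribution of X[n]. The names `discrete_rv` and `simulate_chain` are hypothetical, and this is not the chapter's own implementation. Note that `Q` is stored here as a list indexed by time, i.e. `Q[n][j]`, which is a different layout from the `Q[:, n]` array used in the chapter.

```python
import random

def discrete_rv(p):
    """Return k with probability p[k], using a single uniform sample."""
    u = random.random()
    cumulative = 0.0
    for k, pk in enumerate(p):
        cumulative += pk
        if u < cumulative:          # u fell in the k-th interval of length p[k]
            return k
    return len(p) - 1               # guards against rounding when sum(p) is slightly below 1

def simulate_chain(N, x0, P):
    """Simulate N values of the chain started at x0; also return the distributions of X[n]."""
    K = len(P)
    X = [x0]
    q = [1.0 if i == x0 else 0.0 for i in range(K)]
    Q = [q]
    for _ in range(N - 1):
        X.append(discrete_rv(P[X[-1]]))                               # next state from row X[n] of P
        q = [sum(q[i] * P[i][j] for i in range(K)) for j in range(K)]  # q_{n+1} = q_n P
        Q.append(q)
    return X, Q

random.seed(2)
P = [[0.4, 0.6, 0.0], [0.7, 0.0, 0.3], [0.2, 0.5, 0.3]]
X, Q = simulate_chain(100, 0, P)
print(X[:10], Q[1])
```

Because row 0 of `P` gives probability 0 to state 2, the simulated path never jumps directly from 0 to 2.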
P = [[0.4, 0.6, 0],
     [0.7, 0, 0.3],
     [0.2, 0.5, 0.3]]
timeSteps = np.arange(N)
x0 = 0  # initial state, as shown in the plot titles
X, Q = MC(N, x0, P)
labels = ['0', '1', '2']
plt.yticks([0, 1, 2], labels)
plt.ylabel('$X[n]$')
plt.xlabel('$n$')
plt.title('Markov Chain $X$ with initial state ' + str(x0))
plt.scatter(timeSteps, X);
plt.show()
fig, ax = plt.subplots()
plt.xlabel('$n$')
plt.title('Distribution of $X[n]$ with initial state ' + str(x0))
for i in range(len(P)):
    ax.scatter(timeSteps, Q[i, :], label='P[X[n] = ' + str(i) + ']')
ax.legend()
ax.grid(True)
fig.set_figheight(6)
fig.set_figwidth(12)  # We specify the size of the figure
plt.show()
[Figure: Markov Chain X with initial state 0]
[Figure: Distribution of X[n] with initial state 0, showing P[X[n] = 0], P[X[n] = 1], P[X[n] = 2]]
Let's calculate the fraction of time Z[i, n] that the Markov chain takes the value i during {0, 1, ..., n}. To do this, we generate the Markov chain, then compute the Z[i, n]. (Don't forget to run the previous cell first.)
Z = np.zeros([len(P), N])  # Z is two-dimensional: len(P) by N
for i in range(len(P)):
    Z[i, 0] = (i == x0)  # thus, Z[x0, 0] = 1 and Z[i, 0] = 0 for i not equal to x0
for n in range(N-1):
    for i in range(len(P)):
        Z[i, n+1] = ((n+1)*Z[i, n] + (X[n+1] == i))/(n+2)  # Z[i, n] averages n+1 indicators
fig, ax = plt.subplots()
for i in range(len(P)):
    ax.scatter(timeSteps, Z[i, :], label='Z[' + str(i) + ', n]')
ax.legend()
ax.grid(True)
plt.show()
[Figure: fractions of time Z[0, n], Z[1, n], Z[2, n] against n]
Markov Chain of Fig. 3.11
As another example, we simulate the Markov chain in Fig. 3.11.
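M = 20  # number of states
P = np.zeros([M, M])
lam = 0.3  # arrival probability (assumed value; the original is not legible)
mu = 0.4  # service probability (assumed value; the original is not legible)
p2 = lam*(1 - mu)  # probability that the backlog increases by one
p0 = mu*(1 - lam)  # probability that the backlog decreases by one
p1 = 1 - p0 - p2  # probability that the backlog does not change
EX = p2/(p0 - p2)  # expected backlog
P[0, 0] = 1 - p2
P[0, 1] = p2
for m in range(1, M-1):  # interior states (loop reconstructed; the original lines are garbled)
    P[m, m-1] = p0
    P[m, m] = p1
    P[m, m+1] = p2
P[M-1, M-2] = p0
P[M-1, M-1] = 1 - p0
P = P.tolist()
X, Q = MC(N, 0, P)
Ax = np.zeros(N)
for n in range(1, N):
    Ax[n] = (n*Ax[n-1] + X[n])/(n+1)  # average of X[0], ..., X[n], with X[0] = 0
timeSteps = np.arange(N)
labels = []
for m in range(M):
    labels.append(str(m))
plt.yticks(np.arange(M), labels)
plt.xlabel('$n$')
plt.scatter(timeSteps, X, label='buffer backlog $X[n]$')
plt.plot(EX*np.ones(N), label='expected backlog $E[X]$', c='red')
plt.plot(Ax, label=r'average backlog during $\{0,1,\ldots,n\}$', c='green')
plt.legend()
plt.show()
[Figure: buffer backlog X[n], expected backlog E[X], and average backlog during {0, 1, ..., n}]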
Widgets
Instead of modifying parameters in the code of a cell, it is sometimes convenient to expose these parameters in 'widgets'. We illustrate this method on the plot of a Gaussian with mean $\mu$ and standard deviation $\sigma$. In a Jupyter Notebook, the widgets are well integrated with code cells, and adjusting a widget triggers a new run of the code. Unfortunately, there is a major bug in Jupyter Book and widgets do not work as smoothly as in Jupyter Notebooks. As a workaround, we separate the widgets and the code into separate cells. Also, some useful widgets still do not work with this fix. Oh well... we will live with what works.
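def dummy(mud, sigmad):  # function header reconstructed; only its last line is legible in the source
    global mu, sigma
    mu, sigma = float(mud), float(sigmad)
mud = widgets.Dropdown(options=['-3', '-2', '-1', '0', '1', '2', '3'], value='0', description=r'$\mu$', disabled=False)
sigmad = widgets.Dropdown(options=['0.1', '0.2', '0.3', '1', '1.5', '2'], value='1', description=r'$\sigma$', disabled=False)
z = widgets.interactive(dummy, mud=mud, sigmad=sigmad)
display(z)
[Widgets: dropdown for μ set to 0; dropdown for σ set to 1]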
def plotGaussian(mu, sigma):
    N = 100  # half the number of points on the x-axis (assumed; not legible in the source)
    values = 0.05*np.arange(2*N + 1) - 0.05*N  # this is our x-axis
    B = np.zeros(2*N + 1)  # this is our y-axis for the cdf
    C = np.zeros(2*N + 1)  # this is our y-axis for the pdf
    for n in range(2*N + 1):
        B[n] = norm.cdf((values[n] - mu)/sigma)
        C[n] = norm.pdf((values[n] - mu)/sigma)/sigma  # the density is rescaled by 1/sigma
    fig, ax = plt.subplots()
    fig.set_figheight(6)
    fig.set_figwidth(12)  # We specify the size of the figure
    ax.scatter(values, B, label='CDF', s=4)
    ax.scatter(values, C, label='PDF', s=4)
    ax.legend()
    ax.grid(True)
    plt.title(r'$\mathcal{N}(\mu, \sigma^2)$')  # We add a title to the graph
    plt.show()
plotGaussian(mu, sigma)
print('To change the parameter values, go back to the previous cell, \n adjust widgets, and run this cell again')
[Figure: CDF and PDF of N(μ, σ²)]
To change the parameter values, go back to the previous cell,
 adjust widgets, and run this cell again
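When a Gaussian density with mean $\mu$ and standard deviation $\sigma$ is written in terms of the standard normal density $\varphi$, it equals $\varphi((x - \mu)/\sigma)/\sigma$; the factor $1/\sigma$ is what makes the area under the curve equal to 1. The following standard-library sketch checks this numerically; the parameter values and grid are arbitrary illustrative choices.

```python
from statistics import NormalDist

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2), expressed through the standard normal density."""
    return NormalDist().pdf((x - mu) / sigma) / sigma

mu, sigma = 1.0, 0.5
dx = 0.001
grid = [mu - 5 + k * dx for k in range(10001)]            # covers mu +/- 10 sigma
area = sum(gaussian_pdf(x, mu, sigma) * dx for x in grid)  # Riemann sum, close to 1
print(area)
```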
Basic Data Analysis
We use pandas to explore sample data. We use a table of weights and heights borrowed from http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_Dinov_020108_HeightsWeights.
This data is in the Excel spreadsheet HW.xlsx of this chapter's directory. First, we read the spreadsheet into a DataFrame that we print. We then calculate the linear regression of weight over height. (More about this in later chapters.) We then plot. Many statistical tools are available in Python. The key is to understand not only how to use them, but what they do.
from sklearn.linear_model import LinearRegression
HWdf = pd.read_excel('HW.xlsx', index_col=0)
print(HWdf)
H = np.array(HWdf.iloc[:, 0]).reshape((-1, 1))  # convert H into a column
W = HWdf.iloc[:, 1]
linear_regressor = LinearRegression()  # create object for the class
linear_regressor.fit(H, W)  # perform linear regression
W_pred = linear_regressor.predict(H)  # make predictions
plt.ylabel('Weight')
plt.xlabel('Height')
plt.title('Linear Regression of Weight over Height')
plt.scatter(H, W)
plt.plot(H, W_pred, color='red')
plt.show()
print('Look ma, I am doing Data Science!')
   Height(Inches)  Weight(Pounds)
1           65.78          112.99
...
[200 rows x 2 columns]
[Figure: Linear Regression of Weight over Height]
Look ma, I am doing Data Science!
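To see what a linear regression computes, here is a minimal least-squares fit written with the standard library only. The slope is the ratio of the covariance of the two variables to the variance of the predictor, and the line passes through the point of means. The helper `fit_line` and the data below are made up for illustration; they are not taken from HW.xlsx.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up illustrative data: weight = 3 * height - 80 exactly.
heights = [64.0, 66.0, 68.0, 70.0, 72.0]
weights = [3 * h - 80 for h in heights]
slope, intercept = fit_line(heights, weights)
print(slope, intercept)
```

On exactly linear data such as this, the fit recovers the line that generated it; on real data it returns the line that minimizes the sum of squared vertical errors.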
Monte Carlo
We use simulations to estimate the area of the intersection of two unit circles whose centers are separated by $C > 1$. The figure illustrates the setup.
The code generates 10000 points chosen uniformly inside the unit square and calculates the fraction of these points that fall in the intersection of the two circles. This fraction is an estimate of half of the area of the intersection of the two circles. A point $(x, y)$ is inside the first circle if $a^2 = x^2 + y^2 < 1$. It is inside the second circle if $b^2 = (x - C)^2 + y^2 < 1$. We include an estimate of the error based on the standard deviation of a Bernoulli random variable.
def estimateArea(C):
    N = 10000  # number of random points
    estimate = 0
    for n in range(N):
        X = np.random.uniform(0, 1)
        Y = np.random.uniform(0, 1)
        estimate += (X**2 + Y**2 < 1 and (X - C)**2 + Y**2 < 1)  # 1 if the point is in both circles
    estimate = estimate/N  # fraction of points in the intersection
    dev = 2.6*(estimate*(1 - estimate))**(0.5)/100  # 2.6 standard deviations of the fraction; 100 = N**0.5
    print('The area value is in [', round(2*estimate - dev, 3), ',', round(2*estimate + dev, 3), '] with probability 99%')
# C = widgets.FloatSlider(description='C', min=1, max=2, step=0.1, value=1.5, continuous_update=False)
# z = widgets.interactive(estimateArea, C=C)
# display(z)
estimateArea(1.5)
The area value is in [ 0.43 , 0.452 ] with probability 99%
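For this particular geometry the Monte Carlo estimate can be compared with a closed-form value: two unit circles whose centers are a distance $C$ apart intersect in a lens of area $2\arccos(C/2) - (C/2)\sqrt{4 - C^2}$. The sketch below is a standard-library variant written for this comparison, with hypothetical function names. It assumes $1 \le C \le 2$, so that the upper half of the lens lies inside the unit square. Because the area estimate is twice the observed fraction, the sketch also doubles the half-width of the interval; that scaling is a choice made here.

```python
import random
from math import acos, sqrt

def lens_area(C):
    """Exact area of the intersection of two unit circles with centers C apart (0 <= C <= 2)."""
    return 2 * acos(C / 2) - (C / 2) * sqrt(4 - C * C)

def monte_carlo_area(C, n_points, rng):
    """Estimate the lens area from uniform points in the unit square (valid for 1 <= C <= 2)."""
    hits = 0
    for _ in range(n_points):
        x, y = rng.random(), rng.random()
        if x * x + y * y < 1 and (x - C) ** 2 + y * y < 1:
            hits += 1
    fraction = hits / n_points                                  # estimates half of the lens area
    std_fraction = sqrt(fraction * (1 - fraction) / n_points)   # Bernoulli standard error
    return 2 * fraction, 2 * 2.6 * std_fraction                 # area estimate, ~99% half-width

rng = random.Random(3)                                          # fixed seed for reproducibility
estimate, half_width = monte_carlo_area(1.5, 10000, rng)
exact = lens_area(1.5)
print(round(exact, 4), round(estimate, 4), round(half_width, 4))
```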
Conclusions:
The previous examples illustrate how to generate random variables, their pdf and cdf, Markov chains, how to plot sequences, adjust widgets, and how to use pandas to manipulate DataFrames and analyze data with statistical tools. If you are comfortable with Python, you can move on to the next chapter. Otherwise, you can review Appendix 1.
By Jean Walrand
© Copyright 2021.