(CS 5008) Reinforcement Learning : Assignment 5
Q1) Consider the problem of approximating the Y values in terms of the X values, i.e., Yi ≈ Xi θ. For a given set of data points (the actual values are not important), such as the one shown in Figure 1, write down the expression for the squared loss L(θ).
Figure 1: Fit a line through origin.
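As a sanity check, here is a minimal NumPy sketch of the loss L(θ) = Σᵢ (Yi − Xi θ)² for a line through the origin; the data values below are made up purely for illustration.

```python
import numpy as np

# Hypothetical 1-D data points; any values would do here.
X = np.array([1.0, 2.0, 3.0, 4.0])
Y = np.array([1.1, 1.9, 3.2, 3.9])

def squared_loss(theta):
    """L(theta) = sum_i (Y_i - X_i * theta)^2."""
    residuals = Y - X * theta
    return np.sum(residuals ** 2)

# The loss is a parabola in theta; its minimiser is the least-squares slope.
theta_star = np.dot(X, Y) / np.dot(X, X)
```

Plotting `squared_loss` over a range of θ values makes the convex, bowl-shaped structure of the loss visible.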
Q2) Now consider the approximation Yi ≈ θ(0) + θ(1) Xi (say, to fit a line for the data set in Figure 2). Suppose we are interested in minimising the squared loss: i) what is the expression for L(θ)? (note that θ = (θ(0), θ(1)) ∈ R2 has two co-ordinates), ii) what is the matrix formulation of the same problem; in particular, what is the X matrix?
Figure 2: Fit a line that does not pass through origin.
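A small sketch of the matrix formulation, with made-up data: the intercept θ(0) is absorbed by prepending a column of ones to the design matrix.

```python
import numpy as np

# Hypothetical data; the point is the shape of the design matrix, not the values.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.9, 5.1, 7.0])

# Matrix formulation: each row of X is (1, x_i), so Y ≈ X θ with θ = (θ(0), θ(1)).
X = np.column_stack([np.ones_like(x), x])

def squared_loss(theta):
    return np.sum((y - X @ theta) ** 2)

# Least-squares solution of min_theta ||y - X theta||^2.
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

The first column of X handles the offset, so fitting a line not through the origin is the same least-squares problem in two parameters.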
Q3) Give an example of (Xi , Yi ), i = 1, . . . , 10, with (Xi , Yi ) ∈ R2 × R, such that the approximation Yi ≈ Xi⊤ θ has nonzero error.
Q4) Consider a 2 × 2 grid with 4 states, and let the features of the positions be X(1) = (1, 1), X(2) = (1, 2), X(3) = (2, 1), X(4) = (2, 2). Write down the loss function to project V (1) = 2, V (2) = 3, V (3) = 3, V (4) = 4, i.e., approximate V ≈ X⊤ θ. Find θ∗ = arg min_{θ∈R2} ‖V − X⊤ θ‖²₂.
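A quick numerical check of this projection, with the rows of the matrix Phi below being the four feature vectors given in the question:

```python
import numpy as np

# Feature matrix: one row per state, as given in the question.
Phi = np.array([[1, 1],
                [1, 2],
                [2, 1],
                [2, 2]], dtype=float)
V = np.array([2, 3, 3, 4], dtype=float)

# theta* = argmin_theta ||V - Phi theta||_2^2, via the normal equations.
theta_star = np.linalg.solve(Phi.T @ Phi, Phi.T @ V)
# For these particular values the fit is exact: Phi @ theta_star reproduces V.
```

Here V happens to lie in the column space of Phi, so the projection error is zero; with other V values the residual would generally be nonzero.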
Q5) What are the eigenvalues of the 3 × 3 all-ones matrix A = [1 1 1; 1 1 1; 1 1 1], and what is an eigenvector?
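A one-liner to check the spectrum of the all-ones matrix numerically:

```python
import numpy as np

A = np.ones((3, 3))

# A is rank 1, so its eigenvalues are 3 (once) and 0 (twice).
eigvals = np.sort(np.linalg.eigvalsh(A))

# (1, 1, 1) is an eigenvector: A v = 3 v.
v = np.ones(3)
```

Since every row of A sums to 3, the all-ones vector is an eigenvector with eigenvalue 3, and the remaining eigenvalues vanish because the rank is 1.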
Q6) Consider a 3-state Markov chain where the probability of going from any state to any state is 1/3. Write down the probability transition matrix P. What is the stationary distribution d, i.e., the d satisfying d⊤ = d⊤ P? What are the eigenvalues of P?
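A numerical check for this chain: every entry of P is 1/3, the uniform distribution is stationary, and P has the same spectrum as the (scaled) all-ones matrix from Q5.

```python
import numpy as np

# Uniform transitions: every entry of P is 1/3.
P = np.full((3, 3), 1.0 / 3.0)

# The uniform distribution is stationary: d^T P = d^T.
d = np.full(3, 1.0 / 3.0)

# P = (1/3) * all-ones matrix, so its eigenvalues are 1, 0, 0.
eigvals = np.sort(np.linalg.eigvals(P).real)
```

The Perron eigenvalue 1 corresponds to the stationary distribution; the repeated 0 eigenvalue reflects the chain forgetting its starting state in a single step.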
Q7) For any real matrix A ∈ Rd×d and vector x ∈ Rd, show that x⊤ A x = x⊤ A⊤ x.
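The identity follows because x⊤Ax is a scalar and therefore equals its own transpose. A quick randomised check (random A and x, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
x = rng.standard_normal(4)

# x^T A x is a 1x1 quantity, hence equal to its transpose x^T A^T x.
lhs = x @ A @ x
rhs = x @ A.T @ x
```

Equivalently, only the symmetric part (A + A⊤)/2 of A contributes to the quadratic form.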
Q8) Let X = [1 2; 3 4] and Y = [4 3; 2 1]. Verify that XY = X1 Y1 + X2 Y2 , where X1 , X2 are the columns of X and Y1 , Y2 are the rows of Y .
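The column-times-row (outer-product) expansion of the matrix product can be verified directly:

```python
import numpy as np

X = np.array([[1, 2],
              [3, 4]])
Y = np.array([[4, 3],
              [2, 1]])

# Outer-product expansion: XY = (col 1 of X)(row 1 of Y) + (col 2 of X)(row 2 of Y).
expansion = np.outer(X[:, 0], Y[0, :]) + np.outer(X[:, 1], Y[1, :])
```

Both sides equal [[8, 5], [20, 13]]; each term of the expansion is a rank-1 matrix.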
Q9) Consider a 2 × 2 grid with 4 states, and let the features of the positions be X1 = (1, 1), X2 = (1, 2), X3 = (2, 1), X4 = (2, 2). The probability of going from any state to any other state is equal. Let s = st be the current state and s′ = st+1 be the next state; calculate E[X(s) X(s′)⊤] for all s = 1, 2, 3, 4.
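A sketch of the computation, under the assumption that "any other state" excludes self-loops, so each of the three other states is reached with probability 1/3 (if self-transitions were allowed, the probabilities would be 1/4 instead):

```python
import numpy as np

# One feature vector per state, as given in the question.
X = np.array([[1, 1],
              [1, 2],
              [2, 1],
              [2, 2]], dtype=float)

# Assumption: from state s the chain jumps to each OTHER state with prob. 1/3.
P = (np.ones((4, 4)) - np.eye(4)) / 3.0

def expected_outer(s):
    """E[X(s) X(s')^T | s_t = s] = sum_{s'} P[s, s'] * outer(X[s], X[s'])."""
    return sum(P[s, sp] * np.outer(X[s], X[sp]) for sp in range(4))

results = [expected_outer(s) for s in range(4)]
```

For example, from state 1 the average next-state feature is ((1,2)+(2,1)+(2,2))/3 = (5/3, 5/3), so E[X(1)X(s′)⊤] has every entry equal to 5/3.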
Q10) Consider learning the mean of a Uniform[0, 1] random variable. We are given i.i.d. samples Yt ∼ Uniform[0, 1], and consider the following update rule:
Vt+1 = Vt + αt (Yt − Vt) (1)
Estimate the expected squared error in Vt after t = 100 steps for a constant step size αt = α = 0.1.
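A Monte-Carlo sketch of this estimate, assuming the initial value V0 = 0 (the question does not specify it):

```python
import numpy as np

rng = np.random.default_rng(0)

def mse_at_T(alpha=0.1, T=100, n_runs=20000):
    """Monte-Carlo estimate of E[(V_T - 1/2)^2] for V <- V + alpha*(Y - V)."""
    V = np.zeros(n_runs)          # assumption: V_0 = 0 in every run
    for _ in range(T):
        Y = rng.random(n_runs)    # Y_t ~ Uniform[0, 1], i.i.d.
        V += alpha * (Y - V)
    return np.mean((V - 0.5) ** 2)

mse = mse_at_T()
# With a constant step size the bias shrinks like (1 - alpha)^T, but the variance
# does not vanish: it settles near alpha/(2 - alpha) * Var(Y) = (0.1/1.9)/12 ≈ 0.0044.
```

By t = 100 the bias term (0.9)¹⁰⁰ · 0.5 ≈ 1.3 × 10⁻⁵ is negligible, so the expected squared error is dominated by this residual variance of roughly 0.0044.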