
VIGNAN INSTITUTE OF TECHNOLOGY AND SCIENCE

(AN AUTONOMOUS INSTITUTION)


Deshmukhi (V), Pochampally (M), Yadadri Bhuvanagiri Dist., TS-508284

Course : M.TECH Year / Semester: I/II


Subject Name: Reinforcement Learning Branch Name(s): AI & DS
Subject Code(s): 23AI206PE72
DESCRIPTIVE QUESTION BANK

UNIT- I
Q.NO DESCRIPTION OF QUESTION MARKS CO PO BTL
1  a  What are the basic steps of Reinforcement Learning?  1  1  1  1
   b  Write the definition of Linear Algebra.  1  1  1  2
   c  Explain the basics of Probability and their significance in Reinforcement Learning. How do probabilistic models relate to decision-making?  10  1  2,3  3
2  a  List out the main steps in Principal Component Analysis.  1  1  1  1
   b  Elaborate on SVD and write a short note.  1  1  3  2
   c  Describe the fundamental concepts of Linear Algebra and their relevance to Reinforcement Learning. Provide examples of linear algebra operations used.  10  1  1  2
3  a  Write a short note on the Upper Confidence Bound.  1  1  2  3
   b  Which algorithm is used to estimate the optimal value function directly without explicitly learning the policy?  1  1  1  2
   c  Explain the fundamental concepts of Linear Algebra.  10  1  2  2
4  a  List out the limitations of Support Vector Machines.  1  1  2  2
   b  Explain Eigen Decomposition.  1  1  1  2
   c  Explain the concept of Regret in the context of Multi-Armed Bandits. Why is minimizing regret important?  5  1  3  2
   d  Explain the operations in Linear Algebra.  5
5  a  What is a "value function" in reinforcement learning?  1  1  1,2  1
   b  What is the difference between reinforcement learning and supervised learning?  1  1  2  1
   c  What are the strategies to solve the Multi-Armed Bandit Problem?  5  1  2,3  3
   d  What are the common linear transformations in Machine Learning?  5
6  a  What is the role of exploration in reinforcement learning?  1  1  2  2
   b  What does the term "policy" mean in reinforcement learning?  1  1  2,3  3
   c  List out the applications of the Multi-Armed Bandit Problem and explain the types of Regret (any two).  5  1  2,3  3
   d  What is the importance of Principal Component Analysis (PCA)?  5
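The Unit-I questions on Multi-Armed Bandits, Regret, and the Upper Confidence Bound can be grounded with a small worked example. Below is a minimal UCB1 sketch in Python; the three-armed Bernoulli bandit, the random seed, and the horizon of 1000 pulls are assumptions made for illustration, not values taken from the syllabus.

```python
import numpy as np

# Minimal UCB1 sketch for a 3-armed Bernoulli bandit (example values).
rng = np.random.default_rng(0)
true_means = [0.2, 0.5, 0.8]           # hidden payout probability of each arm
counts = np.zeros(3)                    # pulls per arm
values = np.zeros(3)                    # running mean reward per arm

for t in range(1, 1001):
    if 0 in counts:                     # pull each arm once first
        arm = int(np.argmin(counts))
    else:                               # UCB1: mean + exploration bonus
        ucb = values + np.sqrt(2 * np.log(t) / counts)
        arm = int(np.argmax(ucb))
    reward = rng.random() < true_means[arm]
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]   # incremental mean

print("estimated arm values:", values.round(2))  # the best arm dominates pulls
```

The exploration bonus shrinks as an arm is pulled more often, which is exactly the mechanism that keeps the regret of UCB1 low compared to always exploiting the current best estimate.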
UNIT- II
Q.NO DESCRIPTION OF QUESTION MARKS CO PO BTL
1  a  Expand MDP and write a short note.  1  2  1  1
   b  Explain the Value Function.  1  2  1  1
   c  Explain the fundamentals of a Markov Decision Process (MDP) in reinforcement learning. What are the key components, and how do they relate to decision-making?  10  2  2  1
2  a  What is the importance of the "state value function"?  1  2  1  2
   b  What are the main components of the MDP model?  1  2  1  1
   c  Define policy and value function in the context of MDPs. How are these concepts used to represent and solve reinforcement learning problems?  10  2  2  1
3  a  Write the importance of the Finite Horizon Reward Model.  1  2  1,2  4
   b  Uses of the Value Function.  1  2  1,3,5  1
   c  Differentiate between episodic and continuing tasks in reinforcement learning. How does the task type affect the formulation and solution of an RL problem?  10  2  1  1
4  a  What is the intention of the policy improvement step?  1  2  1,4  1
   b  Explain value iteration in the Policy Iteration algorithm.  1  2  1,4,5  2
   c  Describe the concept of Value Iteration as a dynamic programming method for solving MDPs.  5  2  1,3  2
   d  What are the key steps involved in the Value Iteration algorithm?  5
5  a  Define "Reward models".  1  2  1,5  2
   b  Write the uses of the Bellman optimality operator.  1  2  1  2
   c  Explain the concept of Policy Iteration as another dynamic programming approach to solving MDPs.  5  2  1,2  1
   d  How does it alternate between policy evaluation and policy improvement?  5
6  a  Explain the role of the optimal value function.  1  2  1,2,10,12  2
   b  Define "Action-Value Function".  1  2  2,10  1
   c  Explain the key differences between episodic and continuing tasks.  5  2  1,10,12  1
   d  What are the main objectives of episodic and continuing tasks?  5
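Several Unit-II questions ask about Value Iteration and the Bellman optimality operator. A minimal sketch follows; the two-state, two-action MDP (transition tensor P, reward table R, γ = 0.9) is a made-up example, not one specified by the question bank.

```python
import numpy as np

# Value iteration: repeatedly apply the Bellman optimality update
#   V(s) <- max_a sum_s' P(s'|s,a) * [ R(s,a) + gamma * V(s') ]
P = np.array([[[0.9, 0.1], [0.2, 0.8]],      # P[s, a, s'] (toy example)
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[0.0, 1.0], [0.0, 2.0]])       # R[s, a]: reward for action a in s
gamma, V = 0.9, np.zeros(2)

for _ in range(200):                          # iterate until (near) convergence
    Q = R + gamma * (P @ V)                   # Q[s, a]; P @ V sums over s'
    V_new = Q.max(axis=1)
    delta = np.max(np.abs(V_new - V))
    V = V_new
    if delta < 1e-8:
        break

policy = Q.argmax(axis=1)                     # greedy policy w.r.t. V*
print("V* ~", V.round(3), "policy:", policy)
```

Policy Iteration, asked about in question 5, alternates a full policy evaluation step with this greedy improvement step instead of folding both into a single max-update.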

UNIT- III
Q.NO DESCRIPTION OF QUESTION MARKS CO PO BTL
1  a  List out the key components of model-based algorithms.  1  3  1,2,3  1
   b  Write a short note on Meta-Reinforcement Learning.  1  3  1,2  2
   c  Explain the essence of the Reinforcement Learning problem. Describe its key components, including agents, environments, and rewards. Discuss the fundamental challenges faced in reinforcement learning.  10  3  3  1
2  a  What is Model Predictive Control?  1  3  1,3,4,5  2
   b  Define Blocks World in goal stack planning.  1  3  1,3,4,5  2
   c  Differentiate between prediction and control problems in Reinforcement Learning. Provide real-world examples for each type of problem and discuss the key distinctions.  10  3  1,3,4,5  1
3  a  What is the main purpose of the original STRIPS system in goal stack planning?  1  3  1,3,4,5  2
   b  Write the advantages of Monte Carlo Policy Evaluation.  1  3  1,2  2
   c  Elaborate on the concept of model-based Reinforcement Learning algorithms. How do these algorithms employ models of the environment to make informed decisions? Provide examples of situations where model-based methods are beneficial.  10  3  1  1
4  a  Elaborate on RNNs and explain in which contexts they can be used.  1  3  1  2
   b  What is cumulative reward in Reinforcement Learning?  1  3  1  2
   c  Describe the Monte Carlo method for solving prediction problems in Reinforcement Learning. How does it estimate value functions based on sampled episodes?  5  3  1,2,3  2
   d  Explain the key characteristics of Monte Carlo methods.  5
5  a  List out the elements used to formulate a Reinforcement Learning problem.  1  3  1,2  1
   b  List out the disadvantages of Monte Carlo Policy Evaluation.  1  3  1  2
   c  What are the steps involved in the Monte Carlo prediction problem to estimate the value function?  5  3  1  1
   d  Explain the general process of model-based RL.  5
6  a  What are the types of reinforcement in Reinforcement Learning?  1  3  1  1
   b  What is the advantage of negative reinforcement?  1  3  1,3,4,5  2
   c  Discuss the online implementation of Monte Carlo policy evaluation. How does this approach update value estimates as new data becomes available?  5  3  1  1
   d  List out the advantages and limitations of online Monte Carlo methods.  5
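For the Unit-III questions on Monte Carlo prediction, a first-visit Monte Carlo sketch is shown below; sample_episode is a hypothetical stand-in for a real environment and policy, and the toy three-state chain and γ = 0.9 are assumptions made for the example.

```python
import random
from collections import defaultdict

# First-visit Monte Carlo prediction: estimate V(s) as the average of the
# returns observed from the first visit to s in each sampled episode.
def sample_episode():
    """Return a list of (state, reward) pairs for one episode (toy chain)."""
    episode, state = [], 0
    while state < 3:                        # episode ends at state 3
        reward = random.choice([0, 1])
        episode.append((state, reward))
        state += 1
    return episode

gamma = 0.9
returns = defaultdict(list)

for _ in range(5000):
    episode = sample_episode()
    first_visit = {}
    for t, (s, _) in enumerate(episode):    # time of first visit to each state
        first_visit.setdefault(s, t)
    G = 0.0
    for t in reversed(range(len(episode))):
        s, r = episode[t]
        G = r + gamma * G                   # return from time t onward
        if first_visit[s] == t:             # count only the first visit
            returns[s].append(G)

V = {s: sum(g) / len(g) for s, g in returns.items()}
print(V)
```

An "online" variant, as asked in question 6c, would replace the stored lists of returns with an incremental running mean so that value estimates update as each new episode arrives.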

UNIT- IV
Q.NO DESCRIPTION OF QUESTION MARKS CO PO BTL
1  a  What are the types of Bootstrapping?  1  4  1,6  1
   b  List out the advantages of Bootstrapping.  1  4  1,3
   c  Explain the concept of Bootstrapping in Reinforcement Learning. How does it differ from traditional Monte Carlo methods, and what are its advantages?  10  4  1,2,3  2
2  a  Explain Temporal Difference (TD) Learning in Bootstrapping.  1  4  1,2  4
   b  What does TD(0) stand for?  1  4  1  1
   c  Describe the TD(0) algorithm in detail. How does it update value estimates, and what is its significance in reinforcement learning?  10  4  1  1
3  a  Explain the role of Value Iteration in Bootstrapping.  1  4  1,2  1
   b  List out the uses of Function Approximation in Bootstrapping.  1  4  1,2,3  2
   c  List out the key aspects of bootstrapping in reinforcement learning.  10  4  1  1
4  a  Write the TD(0) update rule in the TD(0) algorithm.  1  4  1,2  1
   b  What is the key difference between Expected SARSA and SARSA?  1  4  1,2  2
   c  Write an overview of the SARSA algorithm.  5  4  1  2
   d  Explain the following terminology in Bootstrapping: a) Value Iteration b) Function Approximation.  5
5  a  What are the main types of model-free control methods?  1  4  1  1
   b  What are the key components of Model-Free Control?  1  4  1  2
   c  What is Q-learning? Explain its key components.  5  4  1,2  1
   d  What are the steps involved in the Q-learning algorithm?  5
6  a  What is the extension of SARSA?  1  4  1  2
   b  What is the key aspect of Q-Learning in Bootstrapping?  1  4  1,2,3,5  1
   c  Explain the concept of Model-Free Control in Reinforcement Learning.  5  4  1,2,3,5  2
   d  Discuss the key algorithms used for model-free control, including Q-learning, SARSA, and Expected SARSA. How do these algorithms learn optimal policies without explicit models of the environment?  5
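The Unit-IV questions contrast Q-learning, SARSA, and Expected SARSA. The sketch below shows the three tabular update rules side by side; the step size α, discount γ, 4x2 Q-table, and ε = 0.1 are example values, not prescribed ones.

```python
import numpy as np

# Tabular model-free control updates (bootstrapped TD targets).
alpha, gamma = 0.1, 0.9
Q = np.zeros((4, 2))                      # Q[state, action]

def q_learning_update(s, a, r, s_next):
    # Off-policy: bootstrap from the greedy action in s_next.
    target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])

def sarsa_update(s, a, r, s_next, a_next):
    # On-policy: bootstrap from the action actually taken in s_next.
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])

def expected_sarsa_update(s, a, r, s_next, eps=0.1):
    # Expectation over an epsilon-greedy policy instead of one sampled action.
    probs = np.full(2, eps / 2)
    probs[Q[s_next].argmax()] += 1 - eps
    target = r + gamma * (probs * Q[s_next]).sum()
    Q[s, a] += alpha * (target - Q[s, a])

q_learning_update(0, 1, 1.0, 2)           # example transition (s, a, r, s')
print(Q[0])
```

The only difference between the three is the bootstrap term in the target, which is also the cleanest way to answer question 4b on the SARSA vs. Expected SARSA distinction.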

UNIT- V
Q.NO DESCRIPTION OF QUESTION MARKS CO PO BTL
1  a  Define Generalization.  1  5  1  1
   b  What is the role of Transfer Learning in Generalization?  1  5  1,5  1
   c  Explain the concept of n-step returns in Reinforcement Learning. How do they balance the trade-off between bootstrapping and sampling? Provide examples to illustrate their use.  10  5  1,5  2
2  a  What is the Vector Space Representation (geometric view)?  1  5  1,2,3,5,12  2
   b  What is the extension of LFA (Linear Function Approximation)?  1  5  1  1
   c  Discuss the need for generalization in Reinforcement Learning practice. Why is generalization important, and how does it address issues related to scalability and transferability?  10  5  1  1
3  a  In Markov Decision Processes, what is the role of Linear TD(λ)?  1  5  1,2,3,5,12  2
   b  Define the TD Error in Linear TD(λ).  1  5  1,2,3,5,12  2
   c  Describe Linear TD(λ) and its application in reinforcement learning. What are the advantages and limitations of using linear function approximation with eligibility traces?  10  5  1,2,3,5,12  1
4  a  What is the purpose of Linear Function Approximation?  1  5  1  1
   b  What is the representation of the TD error in the geometric view?  1  5  1,2,3,5,12  4
   c  Describe Policy Search methods in Reinforcement Learning.  5  5  1,12  2
   d  What are the key ideas behind policy search, and how do they differ from value-based methods?  5
5  a  Which method combines the advantages of both TD(0) and multi-step TD methods?  1  5  1  1
   b  What are the state and action spaces in Policy Gradient methods?  1  5  1,2,3,5,12  2
   c  Explain Policy Gradient methods and their significance in Reinforcement Learning.  5  5  1,2  2
   d  How do they optimize parameterized policies directly in Reinforcement Learning?  5
6  a  What is the role of Generalization in machine learning?  1  5  1,2,3,5  2
   b  What are the advantages or uses of Policy Gradient methods?  1  5  1,2,3,5,12  2
   c  Provide case studies illustrating the practical application of reinforcement learning in real-world scenarios.  5  5  1,2  2
   d  What is the importance of the Experience Replay technique used in reinforcement learning?  5
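For the Unit-V questions on Linear TD(λ) and eligibility traces, a minimal sketch follows; the random feature vectors and dummy reward stand in for a real environment and are assumptions made purely for illustration.

```python
import numpy as np

# Linear TD(lambda) with accumulating eligibility traces:
#   delta_t = r + gamma * w.x(s') - w.x(s)          (TD error)
#   e <- gamma * lambda * e + x(s);  w <- w + alpha * delta_t * e
rng = np.random.default_rng(1)
n_features, alpha, gamma, lam = 4, 0.05, 0.9, 0.8
w = np.zeros(n_features)                  # weights of V(s) = w . x(s)
e = np.zeros(n_features)                  # eligibility trace vector

x = rng.random(n_features)                # features of the current state
for _ in range(100):
    x_next = rng.random(n_features)       # features of next state (dummy env)
    r = float(x_next.sum() > 2)           # dummy reward signal
    delta = r + gamma * (w @ x_next) - (w @ x)   # TD error
    e = gamma * lam * e + x               # decay traces, bump current features
    w += alpha * delta * e                # credit recently active features
    x = x_next

print("learned weights:", w.round(3))
```

Setting λ = 0 recovers TD(0), while λ = 1 approaches a Monte Carlo update, which is the sense in which TD(λ) "combines the advantages of both TD(0) and multi-step TD methods" in question 5a above.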
OBJECTIVE QUESTIONS

Unit -1

S.No. Question
1. Reinforcement learning is a ____
A. Prediction-based learning technique
B. Feedback-based learning technique
C. History results-based learning technique

Answer: B) Feedback-based learning technique


2. Which kind of data does reinforcement learning use?

A. Labeled data
B. Unlabelled data
C. None
D. Both a & b

Answer: C) None
3. Reinforcement learning methods learn through ____?

A. Experience
B. Predictions
C. Analyzing the data
D. None

Answer: A) Experience
4. In Reinforcement Learning, what does the term “agent” refer to?
a) A person supervising the learning process
b) A software program making decisions
c) A labeled data point
d) A neural network architecture

Answer: B) A software program making decisions

5. The “reward function” in RL is used for:


a) Defining the neural network architecture
b) Calculating the probability of actions
c) Evaluating the quality of an agent’s actions
d) Filtering noisy observations

Answer: c) Evaluating the quality of an agent’s actions

6 What does the term "exploitation" mean in reinforcement learning?


a) Using the agent’s existing knowledge to make decisions
b) Searching for new states
c) Reducing the action space
d) Resetting the environment

Answer: a) Using the agent’s existing knowledge to make decisions


7. What is a "state" in reinforcement learning?
A. A state is a situation in which an agent is present.
B. A state is the simple value of reinforcement learning.
C. A state is a result returned by the environment after an agent takes an action.
D. None

Answer: C) A state is a result returned by the environment after an agent takes an action.
8. What are rewards in Reinforcement Learning?

A. An agent's action is evaluated based on feedback returned from the environment.
B. The environment gives a value in return, which is known as a reward.
C. A reward is a result returned by the environment after an agent takes an action.
D. None

Answer: A) An agent's action is evaluated based on feedback returned from the environment.
9 Balancing between trying new actions and exploiting known actions is known as:

a) Exploration vs. exploitation


b) Model validation
c) Feature extraction
d) Dimensionality reduction

Answer: a) Exploration vs. exploitation

10 Which component is essential in reinforcement learning?


a. Agent
b. Environment
c. Rewards
d. All of the above

Answer: d. All of the above

Fill in the Blank


Unit-1
S.NO
1. The objective of a reinforcement learning agent is ___________.
Answer: To maximize rewards

2. In reinforcement learning, an agent is _________.


Answer: An entity that interacts with the environment and takes actions

3. In reinforcement learning, the environment refers to ________.


Answer: The external system with which the agent interacts

4. The role of exploration in reinforcement learning is _______.


Answer: To allow the agent to try new actions and learn from them
5. _____________ is an outcome from a sample space with one characteristic.
Answer : Simple event

6. ________ are single numerical values with magnitude only and no direction.


Answer : Scalars

7. A _________ involves two or more characteristics simultaneously.


Answer : Joint event

8. ____________ involves multiplying each element of a vector or matrix by a scalar.


Answer : Scalar multiplication

9. The ________ product of two vectors measures the similarity of their directions.


Answer : dot
10. ______ is the process of decomposing a square matrix into its eigenvalues and eigenvectors.
Answer : Eigen decomposition
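The operations named in blanks 6-10 above can be demonstrated in a few lines of NumPy; the vectors v, u and matrix A are arbitrary example values.

```python
import numpy as np

v, u = np.array([1.0, 2.0]), np.array([3.0, 4.0])
A = np.array([[2.0, 0.0],
              [0.0, 3.0]])

print(3 * v)                    # scalar multiplication: scale every element
print(v @ u)                    # dot product: measures directional similarity
vals, vecs = np.linalg.eig(A)   # eigen decomposition of a square matrix
print(vals, vecs)
```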

Unit -2

S.No. Question
1. What is the "Markov decision process" (MDP) in reinforcement learning?

a) A process used to make supervised learning predictions


b) A mathematical framework to model decision-making with rewards and states
c) An algorithm for training deep neural networks
d) A system for classifying data

Answer: b) A mathematical framework to model decision-making with rewards and states
2. What is a "value function" in reinforcement learning?

a) A function that predicts rewards based on future actions


b) A function that maps states to the expected future rewards
c) A function that reduces overfitting
d) A function that classifies the action space

Answer: b) A function that maps states to the expected future rewards


3. What does the Bellman equation describe in reinforcement learning?

a) The relation between current and future rewards


b) The total number of states in the environment
c) The accuracy of a supervised learning model
d) The number of possible actions

Answer: a) The relation between current and future rewards
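For reference, one common form of the Bellman optimality equation behind this question, where γ is the discount factor introduced in the Unit-2 fill-in questions below:

    V*(s) = max_a Σ_{s'} P(s' | s, a) [ R(s, a, s') + γ V*(s') ]

That is, the value of a state is the best achievable immediate reward plus the discounted value of the states it can lead to, which is exactly the "relation between current and future rewards" named in the answer.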


4. What challenge does sparse rewards pose in Reinforcement Learning?

a) Agents become too greedy


b) Agents stop exploring new actions
c) Agents focus only on exploitation
d) Agents struggle to learn effective strategies
Answer: d) Agents struggle to learn effective strategies
5. What is the role of the reward function in reinforcement learning?

a. It defines the actions available to the agent


b. It provides feedback to the agent based on its actions
c. It specifies the termination condition of the learning process
d. It determines the size of the agent's memory

Answer: b. It provides feedback to the agent based on its actions

6. Which algorithm is used when the reward function is not known in reinforcement learning?

a. Q-learning
b. Policy gradient
c. Inverse Reinforcement Learning (IRL)
d. Deep Q-Network (DQN)

Answer: c. Inverse Reinforcement Learning (IRL)

7. Which algorithm is used when the environment is partially observable in reinforcement learning?

a. Q-learning
b. Policy gradient
c. Partially Observable Markov Decision Process (POMDP)
d. Deep Q-Network (DQN)

Answer: c. Partially Observable Markov Decision Process (POMDP)


8 How do you represent the agent state in reinforcement learning?
A. Discount state
B. Discount factor
C. Markov state
D. None

Answer: C) Markov state


9 What do you mean by MDP in reinforcement learning?
A. Markov discount procedure
B. Markov discount process
C. Markov deciding procedure
D. Markov decision process

Answer: D) Markov decision process


10. Given the Markov property P[S_{t+1} | S_t] = P[S_{t+1} | S_1, ..., S_t], what is the meaning of S_t?

A. State factor
B. Discount factor
C. Markov state
D. None

Answer: C) Markov state

Unit -2
S.NO

1. _______ function is denoted by R(s, a, s'), where s is the current state, a is the action taken, and s' is
the next state.
Answer : The reward

2. ___________ is used to weigh the importance of future rewards compared to immediate rewards.
Answer : A discount factor

3. ___________ define the immediate feedback an agent receives for each action taken in a particular
state.
Answer : Reward models

4. In the __________model, the agent aims to maximize the cumulative discounted reward over an
infinite time horizon.
Answer : infinite discounted reward

5. __________ model is similar to the total reward model, but it allows for variable time horizons.
Answer : The finite horizon reward

6. In ________, the agent's interactions with the environment are organized into episodes.
Answer : episodic tasks

7. The __________ is a special state that marks the end of an episode, after which the environment is
reset to the initial state, and a new episode begins.
Answer : terminal state

8. The _______ operator is used to iteratively update the value function based on the Bellman optimality
equation.
Answer : Bellman optimality

9. Policy iteration is an iterative algorithm used to find the ________ in a Markov Decision Process (MDP).
Answer : optimal policy

10. The _______step involves selecting the best action in each state to maximize the expected cumulative
reward according to the current value function.
Answer : policy improvement
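The alternation named in blanks 9-10 can be illustrated with a short sketch of the policy improvement step; it reuses the toy MDP layout from the Unit-II example above, with an assumed value function V standing in for the output of a policy evaluation step.

```python
import numpy as np

# Greedy policy improvement: in every state, pick the action that maximizes
# expected cumulative reward under the current value function V.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])   # P[s, a, s'] (toy example)
R = np.array([[0.0, 1.0], [0.0, 2.0]])     # R[s, a]
gamma = 0.9
V = np.array([5.0, 7.0])                   # assumed result of policy evaluation

Q = R + gamma * (P @ V)                    # expected return of each (s, a)
improved_policy = Q.argmax(axis=1)         # greedy w.r.t. the current V
print(improved_policy)
```

Policy iteration repeats this step after each round of policy evaluation until the greedy policy stops changing, at which point it is the optimal policy.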
