2017 IISA Reinforcement theory
3 authors:
             Maria Virvou
             University of Piraeus
All content following this page was uploaded by Akrivi Krouska on 05 April 2018.
    Abstract—The provision of motivational messages and stimuli to students is very important in online education. Such messages direct behavior towards particular learning goals, lead to increased effort and energy, increase initiation of and persistence in activities, affect cognitive processes and often enhance performance. Towards this direction, this paper presents an e-learning system which incorporates the reinforcement theory, providing two methods of increasing desirable behaviors, namely positive and negative reinforcement, and combines it with a badge system. Hence, the e-learning system delivers corresponding messages and badges to students based on their performance.

    Keywords—badge system, e-learning, motivational messages, reinforcement theory

I. INTRODUCTION

    In recent years, the ever-increasing role that personal computers play in the field of education has been emphasized, and thus the inclusion of computer-assisted methods in learning has been accentuated [1, 2]. Educational systems are based on computer models of instructional content and can support learning by providing personalized instruction to students [3].

    One of the most significant aspects that need to be taken into consideration, both in traditional learning (learning in classrooms) and in any form of technology-enhanced instruction, is learner variability [3]. Learners should be regarded as a heterogeneous population in which individuals differ in several characteristics, including personal traits such as knowledge level, learning goals, learning style or even personality. This renders each learner’s requirements and preferences in learning unique. The approach to instruction in which a single teaching scenario is used for all learners, better known in the scientific literature as the “one-size-fits-all” approach, is often described as unsuitable. It is therefore clear that student-centered or differentiated learning is the appropriate way of instruction, given that students learn in different ways and at different speeds, and teaching must be adapted accordingly. Hence, taking into account each learner’s characteristics is challenging, and even more so when these characteristics are not easily extracted from students (e.g. emotional state). As such, instructional systems that involve adaptivity may prove an efficient solution. The knowledge about learner differences should be used by a tutoring system to deliver personalized education.

    Towards this direction, in education and more notably in online learning, providing stimuli to students is very important, since impersonal tutoring can be altered by providing motivational messages and advice to students, placing them at the center of the educational process. Computer-based advising focuses on assisting and facilitating students while they interact with the tutoring system [4]. Given that educational applications are addressed to a very large and heterogeneous audience, namely learners with different characteristics and needs [5], the need for motivational messages of different natures is evident.

    Educational researchers have begun to identify a need for more fine-grained research studies that capture the subtleties of learners’ interactions with dynamic and interactive learning objects. In every field of education, interactive technology has been integrated into the curriculum at many levels. Providing motivational messages or stimuli to learners constitutes a crucial research issue in the related scientific literature, and researchers seek ways to address this emerging need. Understanding the nature of technology-assisted learning interactions and the way in which they foster depth of understanding is a great challenge for both educational researchers and developers of e-learning technologies.

    In behavioral psychology, reinforcement is a consequence that strengthens an organism’s future behavior whenever that behavior is preceded by a specific antecedent stimulus [6]. This strengthening effect may be measured as a higher frequency of behavior (e.g., pulling a lever more frequently), longer duration (e.g., pulling a lever for longer periods of time), greater magnitude (e.g., pulling a lever with greater
force), or shorter latency (e.g., pulling a lever more quickly following the antecedent stimulus).

    Rewarding stimuli, which are associated with “wanting” and “liking” (desire and pleasure, respectively) and appetitive behavior, function as positive reinforcers [6]; the converse statement is also true: positive reinforcers provide a desirable stimulus [6]. Reinforcement does not require an individual to consciously perceive an effect elicited by the stimulus [7]. Thus, reinforcement occurs only if there is an observable strengthening in behavior. However, there is also negative reinforcement, which is characterized by taking away an undesirable stimulus [7].

    In view of the above, this paper presents a web-based tutoring system for learning the programming language C# (C Sharp). This system incorporates the reinforcement theory¹, which is presented in Section III, and, using a badge system, delivers motivational messages to students in order to support them in the educational process. Such messages can assist students, since they give them stimuli to study more and reach the learning goals.

    The rest of the paper is organized as follows. In Section II, related work is presented. Sections III and IV explain the reinforcement theory in depth and provide an overview of the system, respectively. Finally, Section V presents the conclusions and our future plans.

II. RELATED WORK

    This section presents a literature review of learning theories and incentives used to motivate students when interacting with an e-learning system.

    In [8], the authors presented a study in which students acted as junior consultants in a game-like environment, and investigated what students did when interacting with the electronic tutoring system. The authors stated that intrinsically motivated students are more persistent and more likely to achieve preset goals, with a much smaller drop-out risk, more self-regulation, higher self-reported well-being and less avoidance behavior. The results of the experiment showed that students with high intrinsic motivation did not do more; rather, they tended to do different things. Analysis of log files showed that the increased curiosity of students with high intrinsic motivation resulted in proportionally more explorative study behavior.

    In [9], the authors provide background information on Cognitive Load Theory (CLT) and place the contributing articles into context. First, they briefly describe the assumptions of the theory regarding memory systems and learning processes, the different types of cognitive load, and design implications. Second, for each of the contributing articles, they discuss the main research questions, findings, and conclusions, and indicate how the articles offer new insights into instructional procedures for the design of e-learning.

    In [10], the author presents the results of an exploratory factor analysis of the motivation scales of the Motivated Strategies for Learning Questionnaire (MSLQ). An exploratory factor analysis of the motivational subscales was used to examine the factor structure and internal reliability of these scales with a sample of 368 postgraduate students in Pakistan. The results showed that the factor structure of the six motivation scales of the MSLQ was significantly modified with this sample of postgraduate students, indicating a need for further development and careful adaptation of the MSLQ for use in other Eastern contexts in general and in the context of Pakistani higher education in particular.

    In [11], the authors employ the Information System Success Model (ISSM) to examine and measure the satisfaction factors related to the usage and sustained use of e-learning in Jordanian public universities. A self-administered survey was conducted among 117 academics selected from deans of faculties, chairs of academic departments, and managers of computer and e-learning centers, of whom about 100 (85%) responded. The results showed that perceived usefulness, management support, self-efficacy, technical support, and training have a significant relationship with satisfaction with the usage of the e-learning system. Moreover, the study provides a good indicator to higher-education leadership about the quality of the e-learning used in Jordanian public universities.

    In [12], the author takes significant steps in contributing to both the theory and the practice of e-learning and in addressing some gaps in the current body of literature, by developing a comprehensive, empirically based model that quantitatively assesses and predicts the critical key factors with the most significant influence on higher-education instructors’ intention to participate in e-learning systems. More specifically, this study makes a number of important practical and theoretical contributions. In terms of practical implications, its results can help higher-education institutions in developing countries to apply successful e-learning applications. Based on the results, for e-learning to be implemented successfully, universities in these countries should formulate strategies addressing the critical factors that are essential in enhancing instructors’ intention to participate in e-learning systems, such as: (a) create a supportive culture toward the e-learning environment, (b) provide incentives to motivate instructors, (c) provide related Internet and computer training, (d) establish more flexible and easy-to-use e-learning applications and finally (e) increase awareness about the importance of e-learning as a useful medium to achieve instructors’ and universities’ educational goals.

    However, a thorough investigation of the related scientific literature revealed an implementation gap concerning e-learning systems that incorporate the reinforcement theory to provide motivational messages and stimuli to students with a badge system.

III. REINFORCEMENT THEORY

    The behaviorist Skinner derived the reinforcement theory [13] as a way to explain behavior and why we do what we do. The theory states that “an individual’s behavior is a function of its consequences”¹. It is based on the “law of effect”, i.e., an individual’s
behavior with positive consequences tends to be repeated, whereas behavior with negative consequences tends not to be repeated. Reinforcement theory has been used in many areas of study, including animal training, raising children, and motivating employees in the workplace. Unlike needs theories, which focus on personal states, reinforcement theories focus on observable behavior. Reinforcement theory is a form of operant conditioning and focuses on the environmental factors that contribute to shaping behavior. Simply put, reinforcement theory claims that stimuli are used to shape behaviors. There are four primary approaches to reinforcement theory: positive reinforcement, negative reinforcement, positive punishment, and negative punishment, which are discussed below. By analyzing the various components of the law of effect and the primary approaches, we can achieve desired results, otherwise known as consequences, through its application within the workplace.

¹ Management Study Guide (2013), Reinforcement Theory of Motivation: http://managementstudyguide.com/reinforcement-theory-motivation.htm

    Reinforcement theory of motivation overlooks the internal state of the individual, i.e., the inner feelings and drives of individuals are ignored by Skinner. This theory focuses entirely on what happens to an individual when he or she takes some action. Thus, according to Skinner, the external environment must be designed effectively and positively so as to motivate the student. This theory is a strong tool for analyzing the mechanisms that control an individual’s behavior. However, it does not focus on the causes of an individual’s behavior.

A. Positive and Negative Reinforcement

    Reinforcement theory provides two methods of increasing desirable behaviors: positive reinforcement and negative reinforcement.

    To avoid any confusion, we can think of positive as a plus sign (+) and negative as a minus sign (-). In other words:

    Positive Reinforcement: Give (+) what individuals like when they have performed the desired behavior [14].
    Negative Reinforcement: Remove (-) what individuals do not like when they have performed the desired behavior [14].

    More specifically, positive reinforcement implies giving a positive response when an individual shows positive and required behavior, for example, immediately praising a student for coming early to class. This increases the probability of such behavior occurring again. A reward is often, but not necessarily, a positive reinforcer: a reward can be said to be a positive reinforcer only if the student’s behavior improves. Positive reinforcement stimulates the occurrence of a behavior. It must be noted that the more promptly a reward is given, the greater its reinforcement value.

    Negative reinforcement implies rewarding a student by removing negative / undesirable consequences. For example, in the workplace, a person may find it undesirable to be monitored closely. If a person is doing his/her job to the highest standard, s/he may not be monitored as closely anymore. This removal of the monitoring is the reward for consistently doing the job well. Another example concerns a student who forgets to turn in homework and receives a bad grade, so that forgetting becomes an unwanted behavior for the student. If the student turns homework in, the undesirable consequence (the bad grade) is removed, and, if this is also met with a positive reaction, the student will be likely to keep turning homework in. Both positive and negative reinforcement can be used to increase desirable / required behavior.

    Reinforcement theory also includes positive and negative punishment. However, since this paper presents a tutoring system, we have not used positive and negative punishment. Generally in education, and even more in e-learning, students should be reinforced and strengthened during the tutoring process rather than punished, since punishment may have negative results on them².

² eLearning INDUSTRY: https://elearningindustry.com/6-tips-use-positive-reinforcement-in-elearning

B. Scheduling of reinforcement

    A schedule of reinforcement determines when and how often reinforcement of a behavior is given, as described in Table I. Schedules of reinforcement play an important role in the learning process of operant conditioning, since the speed and strength of the response can be significantly impacted by when and how often a behavior is reinforced. The two types of reinforcement schedules are continuous reinforcement and intermittent reinforcement.

    Continuous reinforcement is when a desired behavior is reinforced each and every time it is displayed. This type of reinforcement schedule should be “used during the initial stages of learning in order to create a strong association between the behavior and the response”. Continuous reinforcement does not generate enduring changes in behavior: once the rewards are withdrawn, the desired behavior becomes extinct. A good example of continuous reinforcement is the process of using a vending machine: a soda machine gives a soda every time you feed it money. If, every so often, you do not receive the soda, you are likely to try only a few more times. The likelihood that you will keep adding money without receiving any reward is extremely low, so this behavior usually stops very quickly.

    Intermittent reinforcement is when a desired behavior is reinforced only occasionally when it is displayed. In this type of reinforcement schedule, behaviors are acquired more gradually; however, they are more enduring (resistant to extinction). Intermittent schedules are based either on time (interval schedules) or on frequency (ratio schedules) [15]. Ratio reinforcement is the reinforcement of a desired behavior after a number of occurrences, while interval reinforcement is the reinforcement of a desired behavior after a period of time. Consequently, four types of intermittent reinforcement schedules exist, namely fixed interval, variable interval, fixed ratio and variable ratio schedules, which are presented below:

•   Fixed Interval Schedules: A reinforcement of appropriate behavior that is delivered after a specified interval of time has elapsed. An appropriate example is a student’s grade that is raised every term to reach the top and
    not to be the top from the first term. Because the reinforcement is delivered only after a specified amount of time has passed, this type of schedule tends to produce a scalloping effect between intervals, as displayed in the figure example below [15]. The desired behavior is displayed only directly before the interval elapses, so as to look good when the performance review comes around. After the review, that is, immediately after reinforcement, the behavior drops off dramatically [15]. The fixed interval schedule is a form of continuous schedule and works well for punishment or learning a new behavior.

•   Variable Interval Schedules: A reinforcement of appropriate behavior that is delivered after an average interval of time has elapsed. Once the behavior has been reinforced, a new interval of time, either shorter or longer, is specified, so that the interval times average out to the specified value [15]. This is best illustrated by a corporate random drug-testing policy. The power of variable reinforcement lies in the fact that individuals do not know exactly when it is coming. The policy may dictate that a random drug screening will be conducted every 3 months or so; however, because it is random, the screening may happen sooner, at 2 months, or later, at 4 months, with the average interval being around 3 months. Because of the variable nature of this schedule, the scalloping effect between intervals is reduced [15].

•   Fixed Ratio Schedules: A reinforcement of a desired behavior occurs only after a specified number of actions has been performed. Because the fixed ratio schedule is methodical, it produces a high, steady rate of response. The fixed ratio schedule is also a form of continuous schedule and works well for punishment or learning a new behavior.

•   Variable Ratio Schedules: A reinforcement of a desired behavior occurs after a variable number of actions has been performed, i.e., the number of behaviors required to obtain the reward changes. Variable ratio schedules tend to be more effective than fixed ratio schedules, because they generate a higher rate of response and resist extinction. Schedules that deliver fluctuating combinations of primary and secondary reinforcers are known by other terms within the variable ratio schedule, for example Reinforcers delivered Intermittently in a Randomized Order (RIR) or Variable Ratio with Reinforcement Variety (VRRV).

TABLE I
|          | Fixed | Variable |
| Interval | give reward after first proper response following a specified time period (yearly raise) [short term] | give reward after a certain amount of time, with the amount changing before the next reward (unexpected bonus based on merit) [medium term] |
| Ratio    | punishment (subtract from baseline) (commissions or piecework pay) [medium term] | give reward after a number of responses, with that number changing before the next reward (team-based bonus) [long term] |

IV. USE OF REINFORCEMENT THEORY AND OVERVIEW OF THE SYSTEM

    As mentioned above, the system can provide advice and messages to students derived from the reinforcement theory and combines them with a badge system. As a testbed for our research, our e-learning system provides tutoring in the programming language C# (C Sharp), which is a core module in university Computer Science departments. This section provides an overview of the system, showing screenshots with positive and negative reinforcement messages to students based on their performance, together with the awarded badge. The performance that a student achieves in each quiz is a crucial characteristic that plays an important role in reinforcement. More specifically, students should be further strengthened and motivated if they do not perform well. Hence, the system presented in this paper shows motivational messages and advice derived from the reinforcement theory based on students’ performance. Students’ performance is a significant characteristic that is apt to influence students in the achievement of their goals. If a student performs well, s/he keeps up the good work; in case of poor performance, the student may be disappointed and may stop trying. Many students will work harder in order to earn a higher grade, and even students who are not motivated by grades may be motivated by badges. To this end, we use badges and the reinforcement theory to assist students in the educational process. Figs. 1, 2 and 3 illustrate advice of positive reinforcement when students have achieved a passing grade, while Figs. 4, 5 and 6 show messages of negative reinforcement when students have failed to achieve a passing grade.

Fig. 1. Positive reinforcement.
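The text above does not specify the exact rule by which messages are chosen or badges are awarded. The Python sketch below shows one possible rule, under stated assumptions: the pass mark (`PASS_MARK = 50.0`), the message wording, the `LearnerState` class, the “review reminder” and the badge naming are all hypothetical and are not taken from the system. A passing quiz yields a positive reinforcement message and, on a fixed ratio schedule (Section III), a badge; with `badge_every=1` this reduces to continuous reinforcement. Following the definition in Section III.A (“Remove (-) what individuals do not like”), a failing quiz leaves an undesirable stimulus (a reminder) in place, and that reminder is removed once the student passes.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical pass mark (percentage); the paper does not state its threshold.
PASS_MARK = 50.0


@dataclass
class Feedback:
    kind: str             # "positive" (passed) or "negative" (failed) message
    message: str          # text shown to the student (hypothetical wording)
    badge: Optional[str]  # badge awarded on this attempt, or None


class LearnerState:
    """Per-student state for a hypothetical reinforcement feedback rule."""

    def __init__(self, badge_every: int = 1) -> None:
        if badge_every < 1:
            raise ValueError("badge_every must be at least 1")
        self.badge_every = badge_every  # 1 = continuous, >1 = fixed ratio
        self.passes = 0                 # number of passing quizzes so far
        self.reminder_active = False    # undesirable stimulus shown after a fail

    def give_feedback(self, score: float) -> Feedback:
        if score >= PASS_MARK:
            self.passes += 1
            # Positive reinforcement: give (+) a badge on every n-th pass.
            badge = None
            if self.passes % self.badge_every == 0:
                badge = f"Badge {self.passes // self.badge_every}"
            # Negative reinforcement: remove (-) the pending reminder.
            self.reminder_active = False
            return Feedback("positive", "Well done! Keep up the good work.", badge)
        # Failing grade: the reminder stays active until the next pass.
        self.reminder_active = True
        return Feedback(
            "negative",
            "Not passed yet. Pass the next quiz and this reminder will be removed.",
            None,
        )


if __name__ == "__main__":
    learner = LearnerState(badge_every=2)  # badge on every second pass
    for score in [80, 40, 65, 90]:
        fb = learner.give_feedback(score)
        print(score, fb.kind, fb.badge, learner.reminder_active)
```

With `badge_every=2`, the scores 80, 40, 65 and 90 produce a badge only on the second passing quiz (65). Raising `badge_every` trades the strong initial association of continuous reinforcement for the greater resistance to extinction that Section III attributes to intermittent schedules.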
Fig. 2. Positive reinforcement.
Fig. 3. Positive reinforcement.
Fig. 4. Negative reinforcement.
Fig. 5. Negative reinforcement.
Fig. 6. Negative reinforcement.

V. CONCLUSIONS AND FUTURE WORK

    Reinforcement theory explains in detail how an individual learns behavior. Instructors who attempt to motivate students must ensure that they do not reward all students simultaneously. They must inform students about what they are not doing correctly and also tell them how they can achieve positive and negative reinforcement.

    This paper presents a novel web-based tutoring system that incorporates the reinforcement theory, combined with a badge system, to motivate students to succeed in their learning effort. Students are shown messages based on their performance and are motivated through the badge system; hence, positive / negative reinforcement and badges give them stimuli to study more.

    Future plans include the evaluation of the system as a whole and of the reinforcement theory-based messages to students, in order to examine the degree of motivation that these messages provide.

ACKNOWLEDGMENT

    The authors of this paper would like to thank the University of Piraeus Research Center for the financial support of this research paper.

REFERENCES

[1] C. Troussas and M. Virvou, “Information Theoretic Clustering for an Intelligent Multilingual Tutoring System”, International Journal of Emerging Technologies in Learning (iJET), vol. 8, no. 6, pp. 55-61, 2013.
[2] C. Troussas, M. Virvou, A. Vougiouklidou, and K. J. Espinosa, “Automatic misconception diagnosis in multiple language learning over social networks”, IISA 2013 - 4th International Conference on Information, Intelligence, Systems and Applications (IISA), DOI: 10.1109/IISA.2013.6623715, pp. 1-6, 2013.
[3] M. Virvou, E. Alepis, and C. Troussas, “Error diagnosis in computer-supported collaborative multiple language learning using user classification”, in Frontiers in Artificial Intelligence and Applications, Knowledge-Based Software Engineering - Proceedings of the Tenth Conference on Knowledge-Based Software Engineering, vol. 240, Greece, pp. 266-275, 2012.
[4] C. Troussas, K. J. Espinosa, and M. Virvou, “Intelligent advice generator for personalized language learning through social networking sites”, 2015 6th International Conference on Information, Intelligence, Systems and Applications (IISA), DOI: 10.1109/IISA.2015.7388048, pp. 1-5, 2015.
[5] C. Troussas, M. Virvou, and E. Alepis, “Multiple parameter cluster analysis in a multiple language learning system”, IISA 2013 - 4th International Conference on Information, Intelligence, Systems and Applications (IISA), DOI: 10.1109/IISA.2013.6623714, pp. 1-6, 2013.
[6] W. Schultz, “Neuronal reward and decision signals: from theories to data”, Physiological Reviews, vol. 95, no. 3, pp. 853–951, 2015.
[7] P. Winkielman, K. C. Berridge, and J. L. Wilbarger, “Unconscious affective reactions to masked happy versus angry faces influence consumption behavior and judgments of value”, Pers Soc Psychol Bull, vol. 31, pp. 121–135, 2005.
[8] R. L. Martens, J. Gulikers, and T. Bastiaens, “The impact of intrinsic motivation on e-learning in authentic computer tasks”, Journal of Computer Assisted Learning, vol. 20, pp. 368–376, 2004.
[9] J. G. van Merriënboer and P. Ayres, “Research on Cognitive Load Theory and Its Design Implications for E-Learning”, Educational Technology Research and Development, vol. 53, no. 3, pp. 5–13, 2005.
[10] M. Nausheen, “An Adaptation of the Motivated Strategies for Learning Questionnaire (MSLQ) for Postgraduate Students in Pakistan: Results of an Exploratory Factor Analysis”, Bulletin of Education and Research, vol. 38, no. 1, pp. 1-16, 2016.
[11] M. Khasawneh and A. Bani Yaseen, “Critical Success Factors for E-Learning Satisfaction, Jordanian Universities’ Experience”, Journal of Business & Management, vol. 5, no. 1, pp. 56-69, 2017.
[12] A. Tarek Taha, “Towards Successful E-Learning Implementation in Developing Countries: A Proposed Model for Predicting and Enhancing Higher Education Instructors’ Participation”, International Journal of Academic Research in Business and Social Sciences, vol. 3, no. 1, pp. 422-435, 2013.
[13] B. F. Skinner, “Operant Behaviour”, American Psychologist, vol. 18, no. 8, pp. 503-515, 1963.
[14] R. A. Griggs, “Psychology: A concise introduction”, 2nd ed., New York: Worth Publishers, 2009.
[15] W. Huitt and J. Hummel, “An introduction to operant (instrumental) conditioning”, Educational Psychology Interactive, Valdosta, GA: Valdosta State University, 1997.