WorldWideScience

Sample records for reinforcement learning approach

  1. Multi-agent machine learning a reinforcement approach

    CERN Document Server

    Schwartz, H M

    2014-01-01

    The book begins with a chapter on traditional methods of supervised learning, covering recursive least squares learning, mean square error methods, and stochastic approximation. Chapter 2 covers single agent reinforcement learning. Topics include learning value functions, Markov games, and TD learning with eligibility traces. Chapter 3 discusses two player games including two player matrix games with both pure and mixed strategies. Numerous algorithms and examples are presented. Chapter 4 covers learning in multi-player games, stochastic games, and Markov games, focusing on learning multi-pla
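
    A minimal tabular sketch of one of the Chapter 2 topics, TD learning with eligibility traces, is given below. The random-walk environment, step size, and trace-decay constant are illustrative assumptions and are not taken from the book.

```python
import numpy as np

# Minimal tabular TD(lambda) policy evaluation on a 5-state random walk.
# States 0 and 6 are terminal; reward +1 for reaching state 6, else 0.
N_STATES = 7
ALPHA, GAMMA, LAM = 0.1, 1.0, 0.8

def td_lambda(episodes=1000, seed=0):
    rng = np.random.default_rng(seed)
    V = np.zeros(N_STATES)            # state-value estimates
    for _ in range(episodes):
        e = np.zeros(N_STATES)        # eligibility traces
        s = 3                         # start in the middle
        while s not in (0, N_STATES - 1):
            s_next = s + rng.choice([-1, 1])      # random policy
            r = 1.0 if s_next == N_STATES - 1 else 0.0
            delta = r + GAMMA * V[s_next] - V[s]  # TD error
            e[s] += 1.0                           # accumulating trace
            V += ALPHA * delta * e                # update all traced states
            e *= GAMMA * LAM                      # decay traces
            s = s_next
    return V

print(np.round(td_lambda()[1:-1], 2))  # true values: 1/6 .. 5/6
```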

  2. Treating epilepsy via adaptive neurostimulation: a reinforcement learning approach.

    Science.gov (United States)

    Pineau, Joelle; Guez, Arthur; Vincent, Robert; Panuccio, Gabriella; Avoli, Massimo

    2009-08-01

    This paper presents a new methodology for automatically learning an optimal neurostimulation strategy for the treatment of epilepsy. The technical challenge is to automatically modulate neurostimulation parameters, as a function of the observed EEG signal, so as to minimize the frequency and duration of seizures. The methodology leverages recent techniques from the machine learning literature, in particular the reinforcement learning paradigm, to formalize this optimization problem. We present an algorithm which is able to automatically learn an adaptive neurostimulation strategy directly from labeled training data acquired from animal brain tissues. Our results suggest that this methodology can be used to automatically find a stimulation strategy which effectively reduces the incidence of seizures, while also minimizing the amount of stimulation applied. This work highlights the crucial role that modern machine learning techniques can play in the optimization of treatment strategies for patients with chronic disorders such as epilepsy.

  3. A reinforcement learning approach to model interactions between landmarks and geometric cues during spatial learning.

    Science.gov (United States)

    Sheynikhovich, Denis; Arleo, Angelo

    2010-12-13

    In contrast to predictions derived from the associative learning theory, a number of behavioral studies suggested the absence of competition between geometric cues and landmarks in some experimental paradigms. In parallel to these studies, neurobiological experiments suggested the existence of separate independent memory systems which may not always interact according to classic associative principles. In this paper we attempt to combine these two lines of research by proposing a model of spatial learning that is based on the theory of multiple memory systems. In our model, a place-based locale strategy uses activities of modeled hippocampal place cells to drive navigation to a hidden goal, while a stimulus-response taxon strategy, presumably mediated by the dorso-lateral striatum, learns landmark-approaching behavior. A strategy selection network, proposed to reside in the prefrontal cortex, implements a simple reinforcement learning rule to switch behavioral strategies. The model is used to reproduce the results of a behavioral experiment in which an interaction between a landmark and geometric cues was studied. We show that this model, built on the basis of neurobiological data, can explain the lack of competition between the landmark and geometry, potentiation of geometry learning by the landmark, and blocking. Namely, we propose that the geometry potentiation is a consequence of cooperation between memory systems during learning, while blocking is due to competition between the memory systems during action selection. Copyright © 2010 Elsevier B.V. All rights reserved.
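
    A hedged sketch of the strategy-selection idea is given below: two competing navigation strategies are chosen between by a softmax over learned strategy values, which are updated from the reward obtained on each trial. The softmax temperature, learning rate, and toy reward probabilities are illustrative assumptions and do not reproduce the authors' network model.

```python
import math
import random

# Hedged sketch: value-based selection between a place-driven ("locale")
# strategy and a landmark-driven ("taxon") strategy. The update rule and
# constants are illustrative, not the Sheynikhovich & Arleo model.
ALPHA, TAU = 0.2, 0.5
values = {"locale": 0.0, "taxon": 0.0}      # learned worth of each strategy

def select(values):
    """Softmax choice over strategy values."""
    exps = {k: math.exp(v / TAU) for k, v in values.items()}
    z = sum(exps.values())
    r, acc = random.random(), 0.0
    for k, e in exps.items():
        acc += e / z
        if r <= acc:
            return k
    return k                                 # numerical fallback

def update(strategy, reward):
    """Move the chosen strategy's value toward the obtained reward."""
    values[strategy] += ALPHA * (reward - values[strategy])

# Toy session: suppose landmark-guided trials reach the goal more often.
for _ in range(200):
    chosen = select(values)
    p_goal = 0.8 if chosen == "taxon" else 0.4
    update(chosen, reward=1.0 if random.random() < p_goal else 0.0)
print(values)
```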

  4. A Deep-Reinforcement Learning Approach for Software-Defined Networking Routing Optimization

    OpenAIRE

    Stampa, Giorgio; Arias, Marta; Sanchez-Charles, David; Muntes-Mulero, Victor; Cabellos, Albert

    2017-01-01

    In this paper we design and evaluate a Deep-Reinforcement Learning agent that optimizes routing. Our agent adapts automatically to current traffic conditions and proposes tailored configurations that attempt to minimize the network delay. Experiments show very promising performance. Moreover, this approach provides important operational advantages with respect to traditional optimization algorithms.

  5. Algorithms for Reinforcement Learning

    CERN Document Server

    Szepesvari, Csaba

    2010-01-01

    Reinforcement learning is a learning paradigm concerned with learning to control a system so as to maximize a numerical performance measure that expresses a long-term objective. What distinguishes reinforcement learning from supervised learning is that only partial feedback is given to the learner about the learner's predictions. Further, the predictions may have long term effects through influencing the future state of the controlled system. Thus, time plays a special role. The goal in reinforcement learning is to develop efficient learning algorithms, as well as to understand the algorithms'

  6. Quantum reinforcement learning.

    Science.gov (United States)

    Dong, Daoyi; Chen, Chunlin; Li, Hanxiong; Tarn, Tzyh-Jong

    2008-10-01

    The key approaches for machine learning, particularly learning in unknown probabilistic environments, are new representations and computation mechanisms. In this paper, a novel quantum reinforcement learning (QRL) method is proposed by combining quantum theory and reinforcement learning (RL). Inspired by the state superposition principle and quantum parallelism, a framework of a value-updating algorithm is introduced. The state (action) in traditional RL is identified as the eigen state (eigen action) in QRL. The state (action) set can be represented with a quantum superposition state, and the eigen state (eigen action) can be obtained by randomly observing the simulated quantum state according to the collapse postulate of quantum measurement. The probability of the eigen action is determined by the probability amplitude, which is updated in parallel according to rewards. Some related characteristics of QRL such as convergence, optimality, and balancing between exploration and exploitation are also analyzed, which shows that this approach makes a good tradeoff between exploration and exploitation using the probability amplitude and can speed up learning through quantum parallelism. To evaluate the performance and practicability of QRL, several simulated experiments are given, and the results demonstrate the effectiveness and superiority of the QRL algorithm for some complex problems. This paper is also an effective exploration of the application of quantum computation to artificial intelligence.
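
    A hedged sketch of the action-selection mechanism described above is given below: each eigen action carries a probability amplitude, the squared amplitudes define the measurement (collapse) distribution, and the amplitude of a rewarded action is grown and the vector renormalized. The update rule, gain constant, and toy bandit task are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

# Hedged sketch of the action-selection idea in quantum reinforcement
# learning (QRL): each eigen action carries a probability amplitude, the
# squared amplitudes form the selection distribution ("measurement"),
# and amplitudes of rewarded actions are grown and renormalized.
rng = np.random.default_rng(0)
K = 1e-2                               # reward-to-amplitude gain (assumed)

def init_amplitudes(n_actions):
    """Uniform superposition over eigen actions."""
    return np.full(n_actions, 1.0 / np.sqrt(n_actions))

def measure(amp):
    """Collapse: sample an eigen action with probability |amplitude|^2."""
    return rng.choice(len(amp), p=amp ** 2)

def reinforce(amp, action, reward):
    """Grow the chosen amplitude in proportion to reward, then renormalize."""
    amp = amp.copy()
    amp[action] += K * reward
    return amp / np.linalg.norm(amp)

amp = init_amplitudes(4)
for _ in range(500):
    a = measure(amp)
    r = 1.0 if a == 2 else 0.0         # toy bandit: action 2 is rewarded
    amp = reinforce(amp, a, r)
print(np.round(amp ** 2, 3))           # probability mass concentrates on action 2
```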

  7. Linking Individual Learning Styles to Approach-Avoidance Motivational Traits and Computational Aspects of Reinforcement Learning.

    Directory of Open Access Journals (Sweden)

    Kristoffer Carl Aberg

    Full Text Available Learning how to gain rewards (approach learning) and avoid punishments (avoidance learning) is fundamental for everyday life. While individual differences in approach and avoidance learning styles have been related to genetics and aging, the contribution of personality factors, such as traits, remains undetermined. Moreover, little is known about the computational mechanisms mediating differences in learning styles. Here, we used a probabilistic selection task with positive and negative feedback, in combination with computational modelling, to show that individuals displaying better approach (vs. avoidance) learning scored higher on measures of approach (vs. avoidance) trait motivation, but, paradoxically, also displayed reduced learning speed following positive (vs. negative) outcomes. These data suggest that learning different types of information depends on associated reward values and internal motivational drives, possibly determined by personality traits.

  8. Linking Individual Learning Styles to Approach-Avoidance Motivational Traits and Computational Aspects of Reinforcement Learning.

    Science.gov (United States)

    Aberg, Kristoffer Carl; Doell, Kimberly C; Schwartz, Sophie

    2016-01-01

    Learning how to gain rewards (approach learning) and avoid punishments (avoidance learning) is fundamental for everyday life. While individual differences in approach and avoidance learning styles have been related to genetics and aging, the contribution of personality factors, such as traits, remains undetermined. Moreover, little is known about the computational mechanisms mediating differences in learning styles. Here, we used a probabilistic selection task with positive and negative feedback, in combination with computational modelling, to show that individuals displaying better approach (vs. avoidance) learning scored higher on measures of approach (vs. avoidance) trait motivation, but, paradoxically, also displayed reduced learning speed following positive (vs. negative) outcomes. These data suggest that learning different types of information depends on associated reward values and internal motivational drives, possibly determined by personality traits.
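
    A hedged sketch of the kind of computational model used for such tasks is given below: a Q-learning rule with separate learning rates for positive and negative prediction errors and a softmax choice rule. The learning rates, inverse temperature, and task probabilities are illustrative assumptions, not the parameters fitted in the study.

```python
import numpy as np

# Hedged sketch of a dual-learning-rate reinforcement-learning model, the
# kind often fit to probabilistic selection tasks: positive and negative
# prediction errors are weighted by separate learning rates.
rng = np.random.default_rng(1)
ALPHA_POS, ALPHA_NEG, BETA = 0.3, 0.1, 3.0

def softmax(q):
    e = np.exp(BETA * (q - q.max()))
    return e / e.sum()

Q = np.zeros(2)                 # two stimuli; stimulus 0 pays off 80% of the time
for _ in range(200):
    a = rng.choice(2, p=softmax(Q))
    r = float(rng.random() < (0.8 if a == 0 else 0.2))
    delta = r - Q[a]            # prediction error
    alpha = ALPHA_POS if delta > 0 else ALPHA_NEG
    Q[a] += alpha * delta
print(np.round(Q, 2))
```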

  9. The Reinforcement Learning Competitions

    NARCIS (Netherlands)

    Whiteson, S.; Tanner, B.; White, A.

    2010-01-01

    This article reports on the reinforcement learning competitions, which have been held annually since 2006. In these events, researchers from around the world developed reinforcement learning agents to compete in domains of various complexity and difficulty. We focus on the 2008 competition, which

  10. Reinforcement Learning: A Survey

    OpenAIRE

    Kaelbling, L. P.; Littman, M. L.; Moore, A. W.

    1996-01-01

    This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in...

  11. Optimal medication dosing from suboptimal clinical examples: a deep reinforcement learning approach.

    Science.gov (United States)

    Nemati, Shamim; Ghassemi, Mohammad M; Clifford, Gari D

    2016-08-01

    Misdosing medications with sensitive therapeutic windows, such as heparin, can place patients at unnecessary risk, increase length of hospital stay, and lead to wasted hospital resources. In this work, we present a clinician-in-the-loop sequential decision making framework, which provides an individualized dosing policy adapted to each patient's evolving clinical phenotype. We employed retrospective data from the publicly available MIMIC II intensive care unit database, and developed a deep reinforcement learning algorithm that learns an optimal heparin dosing policy from sample dosing trials and their associated outcomes in large electronic medical records. Using separate training and testing datasets, our model was observed to be effective in proposing heparin doses that resulted in better expected outcomes than the clinical guidelines. Our results demonstrate that a sequential modeling approach, learned from retrospective data, could potentially be used at the bedside to derive individualized patient dosing policies.
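
    A hedged sketch of learning a dosing policy from retrospective (logged) transitions is given below, using batch Q-learning over discretized states and dose levels. The synthetic log, state discretization, and reward shape are illustrative assumptions; the paper itself trains a deep network on MIMIC II data.

```python
import numpy as np

# Hedged sketch: batch Q-learning over logged (state, dose, reward, next state)
# tuples, a simplified stand-in for offline learning of a dosing policy.
rng = np.random.default_rng(2)
N_STATES, N_DOSES, GAMMA = 5, 3, 0.9   # e.g. coarse aPTT bins x dose bins

def make_log(n=5000):
    """Synthetic logged transitions standing in for retrospective records."""
    s = rng.integers(0, N_STATES, n)
    a = rng.integers(0, N_DOSES, n)
    s_next = np.clip(s + a - 1 + rng.integers(-1, 2, n), 0, N_STATES - 1)
    r = -np.abs(s_next - 2).astype(float)   # reward peaks at the target bin
    return list(zip(s, a, r, s_next))

def batch_q_learning(log, sweeps=50, alpha=0.1):
    Q = np.zeros((N_STATES, N_DOSES))
    for _ in range(sweeps):                 # repeated passes over the fixed log
        for s, a, r, s2 in log:
            target = r + GAMMA * Q[s2].max()
            Q[s, a] += alpha * (target - Q[s, a])
    return Q

Q = batch_q_learning(make_log())
print("greedy dose per state:", Q.argmax(axis=1))
```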

  12. Approach for moving small target detection in infrared image sequence based on reinforcement learning

    Science.gov (United States)

    Wang, Chuanyun; Qin, Shiyin

    2016-09-01

    Addressing the problems of moving small target detection in infrared image sequences caused by background clutter and target size variation with time, an approach for moving small target detection is proposed under a pipeline framework with an optimization strategy based on reinforcement learning. The pipeline framework is composed of pipeline establishment, target-background image separation, and target confirmation: the pipeline is established by designating several successive images with a temporal sliding window, target-background image separation is handled by low-rank and sparse matrix decomposition via robust principal component analysis, and target confirmation is achieved by employing a voting mechanism over more than one separated target image of the same input image. For continual optimization of target-background image separation, the weighting parameter of the low-rank and sparse matrix decomposition is dynamically regulated by way of reinforcement learning over consecutive detections, integrating a complexity evaluation of the sequential infrared images with an assessment of the moving small target detection results. Experimental results over four infrared small target image sequences with different cloudy sky backgrounds demonstrate the effectiveness and advantages of the proposed approach in both background clutter suppression and small target detection.
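
    A hedged sketch of regulating the decomposition's weighting parameter with a simple bandit-style reinforcement rule is given below. The separate() and score() functions are placeholders standing in for the robust-PCA step and the detection-quality assessment, and all constants are illustrative assumptions, not the paper's components.

```python
import numpy as np

# Hedged sketch: epsilon-greedy selection of the low-rank/sparse weighting
# parameter lambda, driven by a detection-quality reward. separate() and
# score() are placeholders, not real RPCA or evaluation routines.
rng = np.random.default_rng(3)
LAMBDAS = np.linspace(0.05, 0.5, 10)    # candidate weights
EPS, ALPHA = 0.1, 0.2
value = np.zeros(len(LAMBDAS))          # running value of each weight

def separate(frame, lam):
    """Placeholder for low-rank + sparse decomposition of an image frame."""
    return frame * 0.0, frame           # (background, target) stand-ins

def score(target_img, lam):
    """Placeholder detection-quality reward; peaks at an assumed lam = 0.2."""
    return 1.0 - abs(lam - 0.2) + rng.normal(scale=0.05)

for frame_idx in range(300):
    frame = rng.random((64, 64))
    i = rng.integers(len(LAMBDAS)) if rng.random() < EPS else value.argmax()
    _, target = separate(frame, LAMBDAS[i])
    r = score(target, LAMBDAS[i])
    value[i] += ALPHA * (r - value[i])  # incremental value update

print("selected lambda:", LAMBDAS[value.argmax()].round(2))
```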

  13. Reinforcement and learning

    NARCIS (Netherlands)

    Servedio, M.R.; Sæther, S.A.; Sætre, G.-P.

    2009-01-01

    Evidence has been accumulating to support the process of reinforcement as a potential mechanism in speciation. In many species, mate choice decisions are influenced by cultural factors, including learned mating preferences (sexual imprinting) or learned mate attraction signals (e.g., bird song). It

  14. A Reinforcement Learning Approach to Call Admission Control in HAPS Communication System

    Directory of Open Access Journals (Sweden)

    Ni Shu Yan

    2017-01-01

    Full Text Available The large changes in link capacity and number of users caused by the movement of both the platform and the users in a communication system based on a high altitude platform station (HAPS) result in a high handover dropping rate and reduced resource utilization. In order to solve these problems, this paper proposes an adaptive call admission control strategy based on a reinforcement learning approach. The goal of this strategy is to maximize the long-term gains of the system, with the introduction of cross-layer interaction and service downgrading. In order to admit different traffic types adaptively, the access utility of handover traffic and new call traffic is designed for different states of the communication system. Numerical simulation results show that the proposed call admission control strategy can enhance bandwidth resource utilization and the performance of handover traffic.

  15. An Evaluation of Pedagogical Tutorial Tactics for a Natural Language Tutoring System: A Reinforcement Learning Approach

    Science.gov (United States)

    Chi, Min; VanLehn, Kurt; Litman, Diane; Jordan, Pamela

    2011-01-01

    Pedagogical strategies are policies for a tutor to decide the next action when there are multiple actions available. When the content is controlled to be the same across experimental conditions, there has been little evidence that tutorial decisions have an impact on students' learning. In this paper, we applied Reinforcement Learning (RL) to…

  16. Alternate approach slab reinforcement.

    Science.gov (United States)

    2010-06-01

    The upper mat of reinforcing steel, in exposed concrete bridge approach slabs, is prone to corrosion damage. Chlorides applied to the highways for winter maintenance can penetrate this concrete layer. Eventually chlorides reach the steel and begin ...

  17. Reinforcement learning in supply chains.

    Science.gov (United States)

    Valluri, Annapurna; North, Michael J; Macal, Charles M

    2009-10-01

    Effective management of supply chains creates value and can strategically position companies. In practice, human beings have been found to be both surprisingly successful and disappointingly inept at managing supply chains. The related fields of cognitive psychology and artificial intelligence have postulated a variety of potential mechanisms to explain this behavior. One of the leading candidates is reinforcement learning. This paper applies agent-based modeling to investigate the comparative behavioral consequences of three simple reinforcement learning algorithms in a multi-stage supply chain. For the first time, our findings show that the specific algorithm that is employed can have dramatic effects on the results obtained. Reinforcement learning is found to be valuable in multi-stage supply chains with several learning agents, as independent agents can learn to coordinate their behavior. However, learning in multi-stage supply chains using these postulated approaches from cognitive psychology and artificial intelligence takes extremely long time periods to achieve stability, which raises questions about their ability to explain behavior in real supply chains. The fact that it takes thousands of periods for agents to learn in this simple multi-agent setting provides new evidence that real world decision makers are unlikely to be using strict reinforcement learning in practice.

  18. Reinforced Intrusion Detection Using Pursuit Reinforcement Competitive Learning

    Directory of Open Access Journals (Sweden)

    Indah Yulia Prafitaning Tiyas

    2014-06-01

    Full Text Available Today, information technology is growing rapidly, and all information can be obtained much more easily. It raises some new problems; one of them is unauthorized access to the system. We need a reliable network security system that is resistant to a variety of attacks against the system. Therefore, an Intrusion Detection System (IDS) is required to overcome the problem of intrusions. Much research has been done on intrusion detection using classification methods. Classification methods have high precision, but it takes effort to determine an appropriate classification model for the classification problem. In this paper, we propose a new reinforced approach to detect intrusion with On-line Clustering using Reinforcement Learning. Reinforcement Learning is a new paradigm in machine learning which involves interaction with the environment. It works with a reward and punishment mechanism to achieve a solution. We apply Reinforcement Learning to the intrusion detection problem while considering competitive learning, using Pursuit Reinforcement Competitive Learning (PRCL). Based on the experimental results, PRCL can detect intrusions in real time with high accuracy (99.816% for DoS, 95.015% for Probe, 94.731% for R2L, and 99.373% for U2R) and high speed (44 ms). The proposed approach can help network administrators to detect intrusions, so the computer network security system becomes reliable. Keywords: Intrusion Detection System, On-Line Clustering, Reinforcement Learning, Unsupervised Learning.
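
    A hedged sketch of the online competitive-learning (winner-take-all clustering) building block underlying approaches such as PRCL is given below: each incoming feature vector is assigned to its nearest prototype, which is then pulled toward it. The pursuit-style reward and punishment bookkeeping of PRCL itself is not reproduced; the data and learning rate are illustrative assumptions.

```python
import numpy as np

# Hedged sketch of online winner-take-all competitive learning: the nearest
# prototype to each incoming record is moved a small step toward it.
rng = np.random.default_rng(4)
N_CLUSTERS, DIM, ETA = 3, 4, 0.05

prototypes = rng.random((N_CLUSTERS, DIM))      # cluster centers
for _ in range(2000):
    x = rng.random(DIM)                          # one feature vector (e.g. a connection record)
    winner = np.argmin(((prototypes - x) ** 2).sum(axis=1))
    prototypes[winner] += ETA * (x - prototypes[winner])   # move winner toward x

print(np.round(prototypes, 2))
```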

  19. [Reinforcement learning by striatum].

    Science.gov (United States)

    Kunisato, Yoshihiko; Okada, Go; Okamoto, Yasumasa

    2009-04-01

    Recently, computational models of reinforcement learning have been applied for the analysis of neuroimaging data. It has been clarified that the striatum plays a key role in decision making. We review reinforcement learning theory and the biological structures, such as brain regions, and signals, such as neuromodulators, associated with reinforcement learning. We also investigated the function of the striatum and the neurotransmitter serotonin in reward prediction. We first studied the brain mechanisms for reward prediction at different time scales. Our experiment on the striatum showed that the ventroanterior regions are involved in predicting immediate rewards and the dorsoposterior regions are involved in predicting future rewards. Further, we investigated whether serotonin regulates both reward selection and the specialization of striatal function for reward prediction at different time scales. To this end, we regulated the dietary intake of tryptophan, a precursor of serotonin. Our experiment showed that the activity of the ventral part of the striatum was correlated with reward prediction at shorter time scales, and this activity was stronger at low serotonin levels. By contrast, the activity of the dorsal part of the striatum was correlated with reward prediction at longer time scales, and this activity was stronger at high serotonin levels. Further, a higher proportion of small reward choices, together with a higher rate of discounting of delayed rewards, was observed in the low-serotonin condition than in the control and high-serotonin conditions. Further examination is required in the future to assess the relation between the disturbance of reward prediction caused by low serotonin and mental disorders related to serotonin, such as depression.

  20. Social Influence as Reinforcement Learning

    Science.gov (United States)

    2016-01-13

    This project examined a reinforcement learning model of conformity and social influence. Under this model, individuals... (Final Report: Social Influence as Reinforcement Learning; approved for public release, distribution unlimited; award W911NF-14-1-0001.)

  1. The Reinforcement Learning Competition 2014

    NARCIS (Netherlands)

    Dimitrakakis, C.; Li, G.; Tziortziotis, N.

    2014-01-01

    Reinforcement learning is one of the most general problems in artificial intelligence. It has been used to model problems in automated experiment design, control, economics, game playing, scheduling and telecommunications. The aim of the reinforcement learning competition is to encourage the

  2. Deep Reinforcement Learning: An Overview

    OpenAIRE

    Li, Yuxi

    2017-01-01

    We give an overview of recent exciting achievements of deep reinforcement learning (RL). We discuss six core elements, six important mechanisms, and twelve applications. We start with background of machine learning, deep learning and reinforcement learning. Next we discuss core RL elements, including value function, in particular, Deep Q-Network (DQN), policy, reward, model, planning, and exploration. After that, we discuss important mechanisms for RL, including attention and memory, unsuperv...

  3. A Reinforcement Learning Approach to Access Management in Wireless Cellular Networks

    Directory of Open Access Journals (Sweden)

    Jihun Moon

    2017-01-01

    Full Text Available In smart city applications, huge numbers of devices need to be connected in an autonomous manner. 3rd Generation Partnership Project (3GPP) specifies that Machine Type Communication (MTC) should be used to handle data transmission among a large number of devices. However, the data transmission rates are highly variable, and this brings about a congestion problem. To tackle this problem, the use of Access Class Barring (ACB) is recommended to restrict the number of access attempts allowed in data transmission by utilizing strategic parameters. In this paper, we model the problem of determining the strategic parameters with a reinforcement learning algorithm. In our model, the system evolves to minimize both the collision rate and the access delay. The experimental results show that our scheme improves system performance in terms of the access success rate, the failure rate, the collision rate, and the access delay.
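
    A hedged sketch of choosing an ACB barring factor with epsilon-greedy value learning in a toy slotted-access model is given below. The access model, reward weighting, and candidate factors are illustrative assumptions, not the scheme evaluated in the paper.

```python
import numpy as np

# Hedged sketch: epsilon-greedy value learning over candidate ACB barring
# factors, trading off successful accesses against collisions.
rng = np.random.default_rng(5)
BARRING = np.array([0.1, 0.3, 0.5, 0.7, 0.9])   # candidate ACB factors
N_DEVICES, N_PREAMBLES = 200, 54
EPS, ALPHA = 0.1, 0.1
Q = np.zeros(len(BARRING))

def access_round(p_pass):
    """Toy slotted access: devices that pass barring pick random preambles."""
    passed = rng.random(N_DEVICES) < p_pass
    picks = rng.integers(0, N_PREAMBLES, passed.sum())
    counts = np.bincount(picks, minlength=N_PREAMBLES)
    successes = (counts == 1).sum()
    collisions = (counts > 1).sum()
    return successes, collisions

for _ in range(3000):
    i = rng.integers(len(BARRING)) if rng.random() < EPS else Q.argmax()
    succ, coll = access_round(BARRING[i])
    reward = succ - 2.0 * coll          # assumed trade-off between the two goals
    Q[i] += ALPHA * (reward - Q[i])

print("preferred barring factor:", BARRING[Q.argmax()])
```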

  4. Evolutionary computation for reinforcement learning

    NARCIS (Netherlands)

    Whiteson, S.; Wiering, M.; van Otterlo, M.

    2012-01-01

    Algorithms for evolutionary computation, which simulate the process of natural selection to solve optimization problems, are an effective tool for discovering high-performing reinforcement-learning policies. Because they can automatically find good representations, handle continuous action spaces,

  5. Learning to trade via direct reinforcement.

    Science.gov (United States)

    Moody, J; Saffell, M

    2001-01-01

    We present methods for optimizing portfolios, asset allocations, and trading systems based on direct reinforcement (DR). In this approach, investment decision-making is viewed as a stochastic control problem, and strategies are discovered directly. We present an adaptive algorithm called recurrent reinforcement learning (RRL) for discovering investment policies. The need to build forecasting models is eliminated, and better trading performance is obtained. The direct reinforcement approach differs from dynamic programming and reinforcement algorithms such as TD-learning and Q-learning, which attempt to estimate a value function for the control problem. We find that the RRL direct reinforcement framework enables a simpler problem representation, avoids Bellman's curse of dimensionality and offers compelling advantages in efficiency. We demonstrate how direct reinforcement can be used to optimize risk-adjusted investment returns (including the differential Sharpe ratio), while accounting for the effects of transaction costs. In extensive simulation work using real financial data, we find that our approach based on RRL produces better trading strategies than systems utilizing Q-learning (a value function method). Real-world applications include an intra-daily currency trader and a monthly asset allocation system for the S&P 500 Stock Index and T-Bills.
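
    A hedged sketch of the differential Sharpe ratio, the online performance measure used as a reward signal in this line of work, is given below: exponential moving estimates of the first and second moments of returns are updated each period, and the instantaneous Sharpe-ratio contribution is computed from them. The synthetic return stream and adaptation rate eta are illustrative assumptions.

```python
import numpy as np

# Hedged sketch of the differential Sharpe ratio: an online, incrementally
# computed reward signal based on moving estimates of E[R] and E[R^2].
def differential_sharpe(returns, eta=0.01):
    A, B = 0.0, 0.0                      # moving estimates of E[R] and E[R^2]
    out = []
    for r in returns:
        dA, dB = r - A, r * r - B
        denom = (B - A * A) ** 1.5
        out.append((B * dA - 0.5 * A * dB) / denom if denom > 1e-12 else 0.0)
        A, B = A + eta * dA, B + eta * dB
    return np.array(out)

rng = np.random.default_rng(6)
rets = rng.normal(0.001, 0.01, 1000)     # synthetic per-period trading returns
print(differential_sharpe(rets)[-5:].round(3))
```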

  6. Reinforcement learning and Tourette syndrome.

    Science.gov (United States)

    Palminteri, Stefano; Pessiglione, Mathias

    2013-01-01

    In this chapter, we report the first experimental explorations of reinforcement learning in Tourette syndrome, realized by our team in the last few years. This report will be preceded by an introduction aimed to provide the reader with the state of the art of the knowledge concerning the neural bases of reinforcement learning at the moment of these studies and the scientific rationale beyond them. In short, reinforcement learning is learning by trial and error to maximize rewards and minimize punishments. This decision-making and learning process implicates the dopaminergic system projecting to the frontal cortex-basal ganglia circuits. A large body of evidence suggests that the dysfunction of the same neural systems is implicated in the pathophysiology of Tourette syndrome. Our results show that Tourette condition, as well as the most common pharmacological treatments (dopamine antagonists), affects reinforcement learning performance in these patients. Specifically, the results suggest a deficit in negative reinforcement learning, possibly underpinned by a functional hyperdopaminergia, which could explain the persistence of tics, despite their evident inadaptive (negative) value. This idea, together with the implications of these results in Tourette therapy and the future perspectives, is discussed in Section 4 of this chapter. © 2013 Elsevier Inc. All rights reserved.

  7. Explorations in Efficient Reinforcement Learning

    NARCIS (Netherlands)

    Wiering, M.A.

    1999-01-01

    This thesis describes reinforcement learning (RL) methods which can solve sequential decision making problems by learning from trial and error. Sequential decision making problems are problems in which an artificial agent interacts with a specific environment through its sensors (to get inputs)

  8. Ensemble algorithms in reinforcement learning

    NARCIS (Netherlands)

    Wiering, Marco A; van Hasselt, Hado

    This paper describes several ensemble methods that combine multiple different reinforcement learning (RL) algorithms in a single agent. The aim is to enhance learning speed and final performance by combining the chosen actions or action probabilities of different RL algorithms. We designed and

  9. Adaptable bandwidth planning using reinforcement learning

    Directory of Open Access Journals (Sweden)

    Dirk Hetzer

    2006-08-01

    Full Text Available In order to improve bandwidth allocation considering feedback from the operational environment, adaptable bandwidth planning based on reinforcement learning is proposed. The approach is based on new constrained scheduling algorithms controlled by reinforcement learning techniques. Different constrained scheduling algorithms, such as "conflict free scheduling with minimum duration", "partial displacement" and "pattern oriented scheduling", are defined and implemented. The scheduling algorithms are integrated into reinforcement learning strategies. These strategies include: - Q-learning for selection of the optimal planning schedule using Q-values; - Informed Q-learning for exploitation and handling of prior knowledge (patterns of network behaviour); - Relational Q-learning for improving bandwidth allocation policies dynamically in operational networks considering actual network performance data. Scenarios based on integration of the scheduling algorithms and reinforcement learning techniques in the experimental monitoring and bandwidth planning system called QORE (QoS and resource optimisation) are given. The proposed adaptable bandwidth planning is required for more efficient usage of network resources.

  10. Challenges for Relational Reinforcement Learning

    NARCIS (Netherlands)

    van Otterlo, M.; Kersting, Kristian; Tadepalli, P.; Givan, R; Driessens, K.

    2004-01-01

    We present a perspective and challenges for Relational Reinforcement Learning (RRL). We first survey existing work and distinguish a number of main directions. We then highlight some research problems that are intrinsically involved in abstracting over relational Markov Decision Processes. These are

  11. Reinforcement Learning for Relational MDPs

    NARCIS (Netherlands)

    van Otterlo, M.; Nowe, A.; Lenaerts, T.; Steenhaut, K.

    2004-01-01

    In this paper we present a new method for reinforcement learning in relational domains. A logical language is employed to abstract over states and actions, thereby decreasing the size of the state-action space significantly. A probabilistic transition model of the abstracted Markov-Decision-Process

  12. Rational and Mechanistic Perspectives on Reinforcement Learning

    Science.gov (United States)

    Chater, Nick

    2009-01-01

    This special issue describes important recent developments in applying reinforcement learning models to capture neural and cognitive function. But reinforcement learning, as a theoretical framework, can apply at two very different levels of description: "mechanistic" and "rational." Reinforcement learning is often viewed in mechanistic terms--as…

  13. Negative reinforcement learning is affected in substance dependence.

    Science.gov (United States)

    Thompson, Laetitia L; Claus, Eric D; Mikulich-Gilbertson, Susan K; Banich, Marie T; Crowley, Thomas; Krmpotich, Theodore; Miller, David; Tanabe, Jody

    2012-06-01

    Negative reinforcement results in behavior to escape or avoid an aversive outcome. Withdrawal symptoms are purported to be negative reinforcers in perpetuating substance dependence, but little is known about negative reinforcement learning in this population. The purpose of this study was to examine reinforcement learning in substance dependent individuals (SDI), with an emphasis on assessing negative reinforcement learning. We modified the Iowa Gambling Task to separately assess positive and negative reinforcement. We hypothesized that SDI would show differences in negative reinforcement learning compared to controls and we investigated whether learning differed as a function of the relative magnitude or frequency of the reinforcer. Thirty subjects dependent on psychostimulants were compared with 28 community controls on a decision making task that manipulated outcome frequencies and magnitudes and required an action to avoid a negative outcome. SDI did not learn to avoid negative outcomes to the same degree as controls. This difference was driven by the magnitude, not the frequency, of negative feedback. In contrast, approach behaviors in response to positive reinforcement were similar in both groups. Our findings are consistent with a specific deficit in negative reinforcement learning in SDI. SDI were relatively insensitive to the magnitude, not frequency, of loss. If this generalizes to drug-related stimuli, it suggests that repeated episodes of withdrawal may drive relapse more than the severity of a single episode. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.

  14. Event Triggered Reinforcement Learning Approach for Unknown Nonlinear Continuous Time System

    Science.gov (United States)

    2014-07-06

    A novel tracking scheme was studied and demonstrated with stability analysis using this GrADP approach [34], [35], along with a maze navigation example. A single-link robot arm is studied under two settings to demonstrate the control performance of the proposed method, with dynamics θ̈(t) = −(MgH/G)·sin(θ(t)) − (D/G)·θ̇(t) + (1/G)·u(t).

  15. Modular inverse reinforcement learning for visuomotor behavior.

    Science.gov (United States)

    Rothkopf, Constantin A; Ballard, Dana H

    2013-08-01

    In a large variety of situations one would like to have an expressive and accurate model of observed animal or human behavior. While general purpose mathematical models may successfully capture properties of observed behavior, it is desirable to root models in biological facts. Because of ample empirical evidence for reward-based learning in visuomotor tasks, we use a computational model based on the assumption that the observed agent is balancing the costs and benefits of its behavior to meet its goals. This leads to using the framework of reinforcement learning, which additionally provides well-established algorithms for learning of visuomotor task solutions. To quantify the agent's goals as rewards implicit in the observed behavior, we propose to use inverse reinforcement learning. Based on the assumption of a modular cognitive architecture, we introduce a modular inverse reinforcement learning algorithm that estimates the relative reward contributions of the component tasks in navigation, consisting of following a path while avoiding obstacles and approaching targets. It is shown how to recover the component reward weights for individual tasks and that variability in observed trajectories can be explained succinctly through behavioral goals. It is demonstrated through simulations that good estimates can be obtained already with modest amounts of observation data, which in turn allows the prediction of behavior in novel configurations.

  16. Reinforcement learning in children with FASD.

    Science.gov (United States)

    Engle, Jennifer A; Kerns, Kimberly A

    2011-01-01

    It is often said that children with Fetal Alcohol Spectrum Disorder (FASD) have difficulty learning from reinforcement. However, there is little empirical evidence to support or deny this claim. This study aimed to examine reinforcement learning in children with FASD, specifically: (1) the rate of learning from reinforcement; and (2) the impact of the concreteness of the reinforcer. Participants included 18 children with FASD (IQ ≥ 70), ages 11-17, and 18 age- and sex-matched controls. Participants each completed a novel reinforcement learning discrimination task that involved visual probabilistic learning (70% contingent feedback). The task was completed twice, once with tokens, and once with points (counterbalanced). The control group demonstrated significantly stronger overall reinforcement learning, although rates of improvement and the effect of concreteness of the reinforcer (tokens vs. points) were not different between groups. The FASD group's responses were more likely to be guided by the most recent information, rather than based on integration of reward status over multiple trials. Reinforcement learning does not appear to occur in a functionally different manner in children with FASD, but it does take longer and is more influenced by the most recent reward than by an integration of overall reinforcement information. Children with FASD without an intellectual disability may be able to learn from reinforcement given sufficient consistent repetition. However, other failures associated with learning difficulties such as the complexity of the material, transfer of learning, or impulsivity were not addressed in this study.

  17. Reinforcement learning and its application to Othello

    NARCIS (Netherlands)

    M.C. van Wezel (Michiel); N.J.P. van Eck (Nees Jan)

    2005-01-01

    textabstractIn this article we describe reinforcement learning, a machine learning technique for solving sequential decision problems. We describe how reinforcement learning can be combined with function approximation to get approximate solutions for problems with very large state spaces. One such

  18. FeUdal Networks for Hierarchical Reinforcement Learning

    OpenAIRE

    Vezhnevets, Alexander Sasha; Osindero, Simon; Schaul, Tom; Heess, Nicolas; Jaderberg, Max; Silver, David; Kavukcuoglu, Koray

    2017-01-01

    We introduce FeUdal Networks (FuNs): a novel architecture for hierarchical reinforcement learning. Our approach is inspired by the feudal reinforcement learning proposal of Dayan and Hinton, and gains power and efficacy by decoupling end-to-end learning across multiple levels -- allowing it to utilise different resolutions of time. Our framework employs a Manager module and a Worker module. The Manager operates at a lower temporal resolution and sets abstract goals which are conveyed to and e...

  19. Ensemble algorithms in reinforcement learning.

    Science.gov (United States)

    Wiering, Marco A; van Hasselt, Hado

    2008-08-01

    This paper describes several ensemble methods that combine multiple different reinforcement learning (RL) algorithms in a single agent. The aim is to enhance learning speed and final performance by combining the chosen actions or action probabilities of different RL algorithms. We designed and implemented four different ensemble methods combining the following five different RL algorithms: Q-learning, Sarsa, actor-critic (AC), QV-learning, and AC learning automaton. The intuitively designed ensemble methods, namely, majority voting (MV), rank voting, Boltzmann multiplication (BM), and Boltzmann addition, combine the policies derived from the value functions of the different RL algorithms, in contrast to previous work where ensemble methods have been used in RL for representing and learning a single value function. We show experiments on five maze problems of varying complexity; the first problem is simple, but the other four maze tasks are of a dynamic or partially observable nature. The results indicate that the BM and MV ensembles significantly outperform the single RL algorithms.
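
    A hedged sketch of two of the combination rules named above, majority voting (MV) and Boltzmann multiplication (BM), is given below for a single state. The three per-algorithm Q-value vectors are made up for illustration; the temperature is an assumed constant.

```python
import numpy as np

# Hedged sketch: combining the policies of several RL algorithms in one state,
# via majority voting over greedy actions and Boltzmann multiplication over
# the algorithms' action-selection probabilities.
def boltzmann(q, tau=1.0):
    e = np.exp((q - q.max()) / tau)
    return e / e.sum()

def majority_vote(q_list):
    """Each algorithm votes for its greedy action; ties broken by np.argmax."""
    votes = np.zeros(len(q_list[0]))
    for q in q_list:
        votes[np.argmax(q)] += 1
    return int(np.argmax(votes))

def boltzmann_multiplication(q_list, tau=1.0):
    """Multiply the algorithms' Boltzmann policies, then renormalize."""
    p = np.ones(len(q_list[0]))
    for q in q_list:
        p *= boltzmann(q, tau)
    return p / p.sum()

q_values = [np.array([1.0, 0.5, 0.2]),   # e.g. Q-learning's Q-values in this state
            np.array([0.4, 0.9, 0.1]),   # e.g. Sarsa
            np.array([0.8, 0.7, 0.0])]   # e.g. actor-critic
print("MV action:", majority_vote(q_values))
print("BM policy:", boltzmann_multiplication(q_values).round(3))
```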

  20. Exploiting best-match equations for efficient reinforcement learning

    NARCIS (Netherlands)

    van Seijen, H.; Whiteson, S.; van Hasselt, H.; Wiering, M.

    2011-01-01

    This article presents and evaluates best-match learning, a new approach to reinforcement learning that trades off the sample efficiency of model-based methods with the space efficiency of model-free methods. Best-match learning works by approximating the solution to a set of best-match equations,

  1. Exploiting Best-Match Equations for Efficient Reinforcement Learning

    NARCIS (Netherlands)

    van Seijen, Harm; Whiteson, Shimon; van Hasselt, Hado; Wiering, Marco

    This article presents and evaluates best-match learning, a new approach to reinforcement learning that trades off the sample efficiency of model-based methods with the space efficiency of model-free methods. Best-match learning works by approximating the solution to a set of best-match equations,

  2. Exploiting Best-Match Equations for Efficient Reinforcement Learning

    NARCIS (Netherlands)

    Seijen, H.H. van; Whiteson, S.; Hasselt, H. van; Wiering, M.

    2011-01-01

    This article presents and evaluates best-match learning, a new approach to reinforcement learning that trades off the sample efficiency of model-based methods with the space efficiency of model-free methods. Best-match learning works by approximating the solution to a set of best-match equations,

  3. Enhanced Experience Replay for Deep Reinforcement Learning

    Science.gov (United States)

    2015-11-01

    ARL-TR-7538, November 2015, US Army Research Laboratory. Enhanced Experience Replay for Deep Reinforcement Learning, by David Doria, Bryan Dawson, and Manuel Vindiola, Computational and Information Sciences Directorate...

  4. Rule Value Reinforcement Learning for Cognitive Agents

    OpenAIRE

    Child, C. H. T.; Stathis, K.

    2006-01-01

    RVRL (Rule Value Reinforcement Learning) is a new algorithm which extends an existing learning framework that models the environment of a situated agent using a probabilistic rule representation. The algorithm attaches values to learned rules by adapting reinforcement learning. Structure captured by the rules is used to form a policy. The resulting rule values represent the utility of taking an action if the rule's conditions are present in the agent's current percept. Advantages of the new f...

  5. Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening

    OpenAIRE

    He, Frank S.; Liu, Yang; Schwing, Alexander G.; Peng, Jian

    2016-01-01

    We propose a novel training algorithm for reinforcement learning which combines the strength of deep Q-learning with a constrained optimization approach to tighten optimality and encourage faster reward propagation. Our novel technique makes deep reinforcement learning more practical by drastically reducing the training time. We evaluate the performance of our approach on the 49 games of the challenging Arcade Learning Environment, and report significant improvements in both training time and...

  6. Reinforcement Learning State-of-the-Art

    CERN Document Server

    Wiering, Marco

    2012-01-01

    Reinforcement learning encompasses both a science of adaptive behavior of rational beings in uncertain environments and a computational methodology for finding optimal behaviors for challenging problems in control, optimization and adaptive behavior of intelligent agents. As a field, reinforcement learning has progressed tremendously in the past decade. The main goal of this book is to present an up-to-date series of survey articles on the main contemporary sub-fields of reinforcement learning. This includes surveys on partially observable environments, hierarchical task decompositions, relational knowledge representation and predictive state representations. Furthermore, topics such as transfer, evolutionary methods and continuous spaces in reinforcement learning are surveyed. In addition, several chapters review reinforcement learning methods in robotics, in games, and in computational neuroscience. In total seventeen different subfields are presented by mostly young experts in those areas, and together the...

  7. Multiagent reinforcement learning with unshared value functions.

    Science.gov (United States)

    Hu, Yujing; Gao, Yang; An, Bo

    2015-04-01

    One important approach of multiagent reinforcement learning (MARL) is equilibrium-based MARL, which is a combination of reinforcement learning and game theory. Most existing algorithms involve computationally expensive calculation of mixed strategy equilibria and require agents to replicate the other agents' value functions for equilibrium computing in each state. This is unrealistic since agents may not be willing to share such information due to privacy or safety concerns. This paper aims to develop novel and efficient MARL algorithms without the need for agents to share value functions. First, we adopt pure strategy equilibrium solution concepts instead of mixed strategy equilibria given that a mixed strategy equilibrium is often computationally expensive. In this paper, three types of pure strategy profiles are utilized as equilibrium solution concepts: pure strategy Nash equilibrium, equilibrium-dominating strategy profile, and nonstrict equilibrium-dominating strategy profile. The latter two solution concepts are strategy profiles from which agents can gain higher payoffs than one or more pure strategy Nash equilibria. Theoretical analysis shows that these strategy profiles are symmetric meta equilibria. Second, we propose a multistep negotiation process for finding pure strategy equilibria since value functions are not shared among agents. By putting these together, we propose a novel MARL algorithm called negotiation-based Q-learning (NegoQ). Experiments are first conducted in grid-world games, which are widely used to evaluate MARL algorithms. In these games, NegoQ learns equilibrium policies and runs significantly faster than existing MARL algorithms (correlated Q-learning and Nash Q-learning). Surprisingly, we find that NegoQ also performs well in team Markov games such as pursuit games, as compared with team-task-oriented MARL algorithms (such as friend Q-learning and distributed Q-learning).

  8. Reinforcement Learning via Boltzmann Machines Implemented on Quantum Annealers

    Science.gov (United States)

    Levit, A.; Crawford, D.; Ronagh, P.; Oberoi, J.; Ghadermarzy, N.

    2016-12-01

    Recent studies have suggested promising application of quantum annealing devices as Boltzmann distribution samplers. We exploit this idea to develop a reinforcement learning framework using Boltzmann machines implemented on quantum annealing systems. We apply our method to find optimal policies in Markov decision processes. The evolution of the reinforcement learning algorithms relies on training a Neural Network. We use Quantum Monte-Carlo simulation of an associated network of qubits to provide experimental evidence of suitability of our approach.

  9. Application of fuzzy logic-neural network based reinforcement learning to proximity and docking operations: Special approach/docking testcase results

    Science.gov (United States)

    Jani, Yashvant

    1993-01-01

    As part of the RICIS project, the reinforcement learning techniques developed at Ames Research Center are being applied to proximity and docking operations using the Shuttle and Solar Maximum Mission (SMM) satellite simulation. In utilizing these fuzzy learning techniques, we use the Approximate Reasoning based Intelligent Control (ARIC) architecture, and so we use these two terms interchangeably to imply the same. This activity is carried out in the Software Technology Laboratory utilizing the Orbital Operations Simulator (OOS) and programming/testing support from other contractor personnel. This report is the final deliverable D4 in our milestones and project activity. It provides the test results for the special testcase of approach/docking scenario for the shuttle and SMM satellite. Based on our experience and analysis with the attitude and translational controllers, we have modified the basic configuration of the reinforcement learning algorithm in ARIC. The shuttle translational controller and its implementation in ARIC is described in our deliverable D3. In order to simulate the final approach and docking operations, we have set-up this special testcase as described in section 2. The ARIC performance results for these operations are discussed in section 3 and conclusions are provided in section 4 along with the summary for the project.

  10. Model-based reinforcement learning with dimension reduction.

    Science.gov (United States)

    Tangkaratt, Voot; Morimoto, Jun; Sugiyama, Masashi

    2016-12-01

    The goal of reinforcement learning is to learn an optimal policy which controls an agent to acquire the maximum cumulative reward. The model-based reinforcement learning approach learns a transition model of the environment from data, and then derives the optimal policy using the transition model. However, learning an accurate transition model in high-dimensional environments requires a large amount of data which is difficult to obtain. To overcome this difficulty, in this paper, we propose to combine model-based reinforcement learning with the recently developed least-squares conditional entropy (LSCE) method, which simultaneously performs transition model estimation and dimension reduction. We also further extend the proposed method to imitation learning scenarios. The experimental results show that policy search combined with LSCE performs well for high-dimensional control tasks including real humanoid robot control. Copyright © 2016 Elsevier Ltd. All rights reserved.

  11. Risk-sensitive reinforcement learning.

    Science.gov (United States)

    Shen, Yun; Tobia, Michael J; Sommer, Tobias; Obermayer, Klaus

    2014-07-01

    We derive a family of risk-sensitive reinforcement learning methods for agents, who face sequential decision-making tasks in uncertain environments. By applying a utility function to the temporal difference (TD) error, nonlinear transformations are effectively applied not only to the received rewards but also to the true transition probabilities of the underlying Markov decision process. When appropriate utility functions are chosen, the agents' behaviors express key features of human behavior as predicted by prospect theory (Kahneman & Tversky, 1979 ), for example, different risk preferences for gains and losses, as well as the shape of subjective probability curves. We derive a risk-sensitive Q-learning algorithm, which is necessary for modeling human behavior when transition probabilities are unknown, and prove its convergence. As a proof of principle for the applicability of the new framework, we apply it to quantify human behavior in a sequential investment task. We find that the risk-sensitive variant provides a significantly better fit to the behavioral data and that it leads to an interpretation of the subject's responses that is indeed consistent with prospect theory. The analysis of simultaneously measured fMRI signals shows a significant correlation of the risk-sensitive TD error with BOLD signal change in the ventral striatum. In addition we find a significant correlation of the risk-sensitive Q-values with neural activity in the striatum, cingulate cortex, and insula that is not present if standard Q-values are used.
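
    A hedged sketch of the central idea, applying a utility function to the TD error before the Q-update so that losses are weighted more heavily than gains, is given below. The piecewise-linear utility, the toy two-action MDP, and all constants are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

# Hedged sketch of risk-sensitive Q-learning: the TD error is passed through
# a utility function that penalizes negative surprises more than positive ones.
rng = np.random.default_rng(7)
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1
N_STATES, N_ACTIONS = 4, 2

def utility(delta, loss_aversion=2.0):
    """Losses loom larger than gains (a simple prospect-theory-style shape)."""
    return delta if delta >= 0 else loss_aversion * delta

def step(s, a):
    """Toy MDP: action 1 is risky (higher variance), action 0 is safe."""
    r = rng.normal(0.5, 0.1) if a == 0 else rng.normal(0.6, 1.0)
    return rng.integers(N_STATES), r

Q = np.zeros((N_STATES, N_ACTIONS))
s = 0
for _ in range(20000):
    a = rng.integers(N_ACTIONS) if rng.random() < EPS else Q[s].argmax()
    s2, r = step(s, a)
    delta = r + GAMMA * Q[s2].max() - Q[s, a]
    Q[s, a] += ALPHA * utility(delta)     # utility-shaped TD error
    s = s2

print("greedy action per state:", Q.argmax(axis=1))   # tends to prefer the safe action
```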

  12. A Survey of Reinforcement Learning in Relational Domains

    NARCIS (Netherlands)

    van Otterlo, M.

    2005-01-01

    Reinforcement learning has developed into a primary approach for learning control strategies for autonomous agents. However, most of the work has focused on the algorithmic aspect, i.e. various ways of computing value functions and policies. Usually the representational aspects were limited to the

  13. Benchmarking for Bayesian Reinforcement Learning.

    Directory of Open Access Journals (Sweden)

    Michael Castronovo

    Full Text Available In the Bayesian Reinforcement Learning (BRL) setting, agents try to maximise the collected rewards while interacting with their environment while using some prior knowledge that is accessed beforehand. Many BRL algorithms have already been proposed, but the benchmarks used to compare them are only relevant for specific cases. The paper addresses this problem, and provides a new BRL comparison methodology along with the corresponding open source library. In this methodology, a comparison criterion that measures the performance of algorithms on large sets of Markov Decision Processes (MDPs) drawn from some probability distributions is defined. In order to enable the comparison of non-anytime algorithms, our methodology also includes a detailed analysis of the computation time requirement of each algorithm. Our library is released with all source code and documentation: it includes three test problems, each of which has two different prior distributions, and seven state-of-the-art RL algorithms. Finally, our library is illustrated by comparing all the available algorithms and the results are discussed.

  14. Benchmarking for Bayesian Reinforcement Learning.

    Science.gov (United States)

    Castronovo, Michael; Ernst, Damien; Couëtoux, Adrien; Fonteneau, Raphael

    2016-01-01

    In the Bayesian Reinforcement Learning (BRL) setting, agents try to maximise the collected rewards while interacting with their environment while using some prior knowledge that is accessed beforehand. Many BRL algorithms have already been proposed, but the benchmarks used to compare them are only relevant for specific cases. The paper addresses this problem, and provides a new BRL comparison methodology along with the corresponding open source library. In this methodology, a comparison criterion that measures the performance of algorithms on large sets of Markov Decision Processes (MDPs) drawn from some probability distributions is defined. In order to enable the comparison of non-anytime algorithms, our methodology also includes a detailed analysis of the computation time requirement of each algorithm. Our library is released with all source code and documentation: it includes three test problems, each of which has two different prior distributions, and seven state-of-the-art RL algorithms. Finally, our library is illustrated by comparing all the available algorithms and the results are discussed.

  15. Reinforcement learning of motor skills with policy gradients.

    Science.gov (United States)

    Peters, Jan; Schaal, Stefan

    2008-05-01

    Autonomous learning is one of the hallmarks of human and animal behavior, and understanding the principles of learning will be crucial in order to achieve true autonomy in advanced machines like humanoid robots. In this paper, we examine learning of complex motor skills with human-like limbs. While supervised learning can offer useful tools for bootstrapping behavior, e.g., by learning from demonstration, it is only reinforcement learning that offers a general approach to the final trial-and-error improvement that is needed by each individual acquiring a skill. Neither neurobiological nor machine learning studies have, so far, offered compelling results on how reinforcement learning can be scaled to the high-dimensional continuous state and action spaces of humans or humanoids. Here, we combine two recent research developments on learning motor control in order to achieve this scaling. First, we interpret the idea of modular motor control by means of motor primitives as a suitable way to generate parameterized control policies for reinforcement learning. Second, we combine motor primitives with the theory of stochastic policy gradient learning, which currently seems to be the only feasible framework for reinforcement learning for humanoids. We evaluate different policy gradient methods with a focus on their applicability to parameterized motor primitives. We compare these algorithms in the context of motor primitive learning, and show that our most modern algorithm, the Episodic Natural Actor-Critic outperforms previous algorithms by at least an order of magnitude. We demonstrate the efficiency of this reinforcement learning method in the application of learning to hit a baseball with an anthropomorphic robot arm.
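
    A hedged sketch of episodic policy-gradient learning for a parameterized policy, the family of methods surveyed above, is given below: a Gaussian policy over a single primitive parameter is improved with a REINFORCE update and a running baseline. The toy task and constants are illustrative assumptions; this is plain REINFORCE, not the Episodic Natural Actor-Critic.

```python
import numpy as np

# Hedged sketch: episodic REINFORCE on a Gaussian policy over one primitive
# parameter, with a running baseline to reduce gradient variance.
rng = np.random.default_rng(8)
mean, sigma, lr = 0.0, 0.5, 0.05
TARGET = 2.0                                   # assumed goal parameter value

def rollout(theta):
    """Episode return: higher when the sampled parameter is near the target."""
    return -(theta - TARGET) ** 2

baseline = 0.0
for episode in range(2000):
    theta = rng.normal(mean, sigma)            # sample a primitive parameter
    R = rollout(theta)
    baseline += 0.01 * (R - baseline)          # running baseline
    grad_log = (theta - mean) / sigma ** 2     # d/d(mean) of log N(theta; mean, sigma)
    mean += lr * (R - baseline) * grad_log     # REINFORCE update on the mean

print("learned mean parameter:", round(mean, 2))   # approaches 2.0
```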

  16. Reinforcement learning improves behaviour from evaluative feedback

    Science.gov (United States)

    Littman, Michael L.

    2015-05-01

    Reinforcement learning is a branch of machine learning concerned with using experience gained through interacting with the world and evaluative feedback to improve a system's ability to make behavioural decisions. It has been called the artificial intelligence problem in a microcosm because learning algorithms must act autonomously to perform well and achieve their goals. Partly driven by the increasing availability of rich data, recent years have seen exciting advances in the theory and practice of reinforcement learning, including developments in fundamental technical areas such as generalization, planning, exploration and empirical methodology, leading to increasing applicability to real-life problems.

  17. Pleasurable music affects reinforcement learning according to the listener

    OpenAIRE

    Benjamin P Gold; Frank, Michael J.; Brigitte eBogert; Elvira eBrattico

    2013-01-01

    Mounting evidence links the enjoyment of music to brain areas implicated in emotion and the dopaminergic reward system. In particular, dopamine release in the ventral striatum seems to play a major role in the rewarding aspect of music listening. Striatal dopamine also influences reinforcement learning, such that subjects with greater dopamine efficacy learn better to approach rewards while those with lesser dopamine efficacy learn better to avoid punishments. In this study, we explored the p...

  18. Decentralized Reinforcement Learning of robot behaviors

    NARCIS (Netherlands)

    Leottau, David L.; Ruiz-del-Solar, Javier; Babuska, R.

    2018-01-01

    A multi-agent methodology is proposed for Decentralized Reinforcement Learning (DRL) of individual behaviors in problems where multi-dimensional action spaces are involved. When using this methodology, sub-tasks are learned in parallel by individual agents working toward a common goal. In

  19. Adaptive Educational Software by Applying Reinforcement Learning

    Science.gov (United States)

    Bennane, Abdellah

    2013-01-01

    The introduction of intelligence into teaching software is the subject of this paper. In the software elaboration process, one uses some learning techniques in order to adapt the teaching software to the characteristics of the student. Generally, one uses artificial intelligence techniques such as reinforcement learning and Bayesian networks in order to adapt…

  20. Using a board game to reinforce learning.

    Science.gov (United States)

    Yoon, Bona; Rodriguez, Leslie; Faselis, Charles J; Liappis, Angelike P

    2014-03-01

    Experiential gaming strategies offer a variation on traditional learning. A board game was used to present synthesized content of fundamental catheter care concepts and reinforce evidence-based practices relevant to nursing. Board games are innovative educational tools that can enhance active learning. Copyright 2014, SLACK Incorporated.

  1. A REINFORCEMENT LEARNING MODEL OF PERSUASIVE COMMUNICATION.

    Science.gov (United States)

    WEISS, ROBERT FRANK

    Theoretical and experimental analogies are drawn between learning theory and persuasive communication as an extension of liberalized stimulus-response theory. In the first experiment on instrumental conditioning of attitudes, the subjects read an opinion to be learned, followed by a supporting argument assumed to function as a reinforcer. The time…

  2. Evolution with reinforcement learning in negotiation.

    Science.gov (United States)

    Zou, Yi; Zhan, Wenjie; Shao, Yuan

    2014-01-01

    Adaptive behavior depends less on the details of the negotiation process and yields more robust predictions in the long term than in the short term. However, the extant literature on population dynamics for behavior adjustment has only examined the current situation. To offset this limitation, we propose a synergy of an evolutionary algorithm and reinforcement learning to investigate long-term collective performance and strategy evolution. The model adopts reinforcement learning with a tradeoff between historical and current information to make decisions as the strategies of agents evolve through repeated interactions. The results demonstrate that the strategies in populations converge to stable states, and the agents gradually form steady negotiation habits. Agents that adopt reinforcement learning perform better in terms of payoff, fairness, and stability than their counterparts using a classic evolutionary algorithm.

  3. Can reinforcement occur with a learned trait?

    Science.gov (United States)

    Olofsson, Helen; Frame, Alicia M; Servedio, Maria R

    2011-07-01

    We use birdsong as a case study to ask whether reinforcement can occur via the spread of a genetically determined female preference for a socially inherited (learned) male trait. We envision secondary contact between two neighboring populations with different song dialects. An individual's ability to learn song is confined by a genetic predisposition: if predispositions are strong, there will be no phenotypic overlap in song between populations, whereas weak predispositions allow phenotypic overlap, or "mixed" song. To determine if reinforcement has occurred, we consider if an allele for within-population female mating preference, based on song, can spread, and whether population-specific songs can concurrently be maintained at equilibrium. We model several scenarios, including costs to mating preferences, mating preferences in hybrids, and hybrids having the ability to learn pure songs. We find that when weak predispositions are fixed within a population, reinforcement based on song cannot occur. However, when some individuals have strong predispositions, restricting phenotypic overlap between populations in the trait, reinforcement is only slightly inhibited relative to a purely genetic model. Generalizing beyond the example of song, we conclude that socially learned signals will tend to prohibit reinforcement, but it may still occur if some individuals acquire trait phenotypes genetically. © 2011 The Author(s). Evolution © 2011 The Society for the Study of Evolution.

  4. Value learning through reinforcement : The basics of dopamine and reinforcement learning

    NARCIS (Netherlands)

    Daw, N.D.; Tobler, P.N.; Glimcher, P.W.; Fehr, E.

    2013-01-01

    This chapter provides an overview of reinforcement learning and temporal difference learning and relates these topics to the firing properties of midbrain dopamine neurons. First, we review the Rescorla-Wagner learning rule and basic learning phenomena, such as blocking, which the rule explains. Then
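
    As a concrete reminder of the rule mentioned above, here is a minimal Rescorla-Wagner implementation showing how blocking falls out of the shared prediction error; the parameter values are illustrative.

```python
import numpy as np

def rescorla_wagner(trials, n_cues, alpha=0.1, lam=1.0):
    """V[i] is the associative strength of cue i. The prediction on each trial is
    the summed strength of the cues present, and every present cue is updated by
    the shared prediction error (lam - prediction) when the outcome occurs."""
    V = np.zeros(n_cues)
    for present, rewarded in trials:          # present: list of cue indices
        prediction = V[present].sum()
        error = (lam if rewarded else 0.0) - prediction
        V[present] += alpha * error
    return V

# Blocking: cue 0 is trained alone, then cues 0 and 1 are trained together;
# cue 1 acquires little strength because cue 0 already predicts the outcome.
trials = [([0], True)] * 50 + [([0, 1], True)] * 50
print(rescorla_wagner(trials, n_cues=2))
```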

  5. Human-level control through deep reinforcement learning.

    Science.gov (United States)

    Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A; Veness, Joel; Bellemare, Marc G; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis

    2015-02-26

    The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
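
    At the heart of the method is a Q-learning bootstrap target computed with a separate, slowly-updated target network; the sketch below shows only that target computation under assumed array shapes, omitting the replay buffer, frame preprocessing and network training that the full agent uses.

```python
import numpy as np

def dqn_targets(batch, q_target_net, gamma=0.99):
    """Compute y = r + gamma * max_a' Q_target(s', a') for a batch of
    (states, actions, rewards, next_states, dones) transitions.
    `q_target_net(states)` is assumed to return an array of shape
    (batch_size, n_actions); `dones` is a 0/1 float array."""
    states, actions, rewards, next_states, dones = batch
    next_q = q_target_net(next_states)
    max_next_q = next_q.max(axis=1)
    return rewards + gamma * (1.0 - dones) * max_next_q   # no bootstrap on terminal steps
```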

  6. Human-level control through deep reinforcement learning

    Science.gov (United States)

    Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A.; Veness, Joel; Bellemare, Marc G.; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K.; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis

    2015-02-01

    The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.

  7. Accelerating Multiagent Reinforcement Learning by Equilibrium Transfer.

    Science.gov (United States)

    Hu, Yujing; Gao, Yang; An, Bo

    2015-07-01

    An important approach in multiagent reinforcement learning (MARL) is equilibrium-based MARL, which adopts equilibrium solution concepts in game theory and requires agents to play equilibrium strategies at each state. However, most existing equilibrium-based MARL algorithms cannot scale due to a large number of computationally expensive equilibrium computations (e.g., computing Nash equilibria is PPAD-hard) during learning. For the first time, this paper finds that during the learning process of equilibrium-based MARL, the one-shot games corresponding to each state's successive visits often have the same or similar equilibria (for some states more than 90% of games corresponding to successive visits have similar equilibria). Inspired by this observation, this paper proposes to use equilibrium transfer to accelerate equilibrium-based MARL. The key idea of equilibrium transfer is to reuse previously computed equilibria when each agent has a small incentive to deviate. By introducing transfer loss and transfer condition, a novel framework called equilibrium transfer-based MARL is proposed. We prove that although equilibrium transfer brings transfer loss, equilibrium-based MARL algorithms can still converge to an equilibrium policy under certain assumptions. Experimental results in widely used benchmarks (e.g., grid world game, soccer game, and wall game) show that the proposed framework: 1) not only significantly accelerates equilibrium-based MARL (up to 96.7% reduction in learning time), but also achieves higher average rewards than algorithms without equilibrium transfer and 2) scales significantly better than algorithms without equilibrium transfer when the state/action space grows and the number of agents increases.
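
    The "small incentive to deviate" test at the core of equilibrium transfer can be illustrated for a two-player one-shot game as follows; this is a generic sketch of the idea, not the paper's implementation, and epsilon is an illustrative threshold.

```python
import numpy as np

def deviation_incentives(A, B, x, y):
    """For a bimatrix game with payoff matrices A (row player) and B (column
    player) and mixed strategies x, y, return each player's maximal gain from
    a unilateral deviation."""
    row_gain = (A @ y).max() - x @ A @ y
    col_gain = (x @ B).max() - x @ B @ y
    return row_gain, col_gain

def can_reuse_equilibrium(A, B, x, y, epsilon=1e-2):
    # Reuse a previously computed equilibrium (x, y) if no player can gain
    # more than epsilon by deviating, instead of recomputing from scratch.
    return max(deviation_incentives(A, B, x, y)) <= epsilon
```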

  8. Reinforcement Learning in Autism Spectrum Disorder.

    Science.gov (United States)

    Schuetze, Manuela; Rohr, Christiane S; Dewey, Deborah; McCrimmon, Adam; Bray, Signe

    2017-01-01

    Early behavioral interventions are recognized as integral to standard care in autism spectrum disorder (ASD), and often focus on reinforcing desired behaviors (e.g., eye contact) and reducing the presence of atypical behaviors (e.g., echoing others' phrases). However, efficacy of these programs is mixed. Reinforcement learning relies on neurocircuitry that has been reported to be atypical in ASD: prefrontal-sub-cortical circuits, amygdala, brainstem, and cerebellum. Thus, early behavioral interventions rely on neurocircuitry that may function atypically in at least a subset of individuals with ASD. Recent work has investigated physiological, behavioral, and neural responses to reinforcers to uncover differences in motivation and learning in ASD. We will synthesize this work to identify promising avenues for future research that ultimately can be used to enhance the efficacy of early intervention.

  9. Reinforcement Learning in Information Searching

    Science.gov (United States)

    Cen, Yonghua; Gan, Liren; Bai, Chen

    2013-01-01

    Introduction: The study seeks to answer two questions: How do university students learn to use correct strategies to conduct scholarly information searches without instructions? and, What are the differences in learning mechanisms between users at different cognitive levels? Method: Two groups of users, thirteen first year undergraduate students…

  10. Reinforcement and Systemic Machine Learning for Decision Making

    CERN Document Server

    Kulkarni, Parag

    2012-01-01

    Reinforcement and Systemic Machine Learning for Decision Making There are always difficulties in making machines that learn from experience. Complete information is not always available-or it becomes available in bits and pieces over a period of time. With respect to systemic learning, there is a need to understand the impact of decisions and actions on a system over that period of time. This book takes a holistic approach to addressing that need and presents a new paradigm-creating new learning applications and, ultimately, more intelligent machines. The first book of its kind in this new an

  11. Reinforcement Learning for a New Piano Mover

    Directory of Open Access Journals (Sweden)

    Yuko Ishiwaka

    2005-08-01

    Full Text Available We attempt to achieve cooperative behavior of autonomous decentralized agents constructed via Q-Learning, which is a type of reinforcement learning. As such, in the present paper, we examine the piano mover's problem. We propose a multi-agent architecture that has a training agent, learning agents, and an intermediate agent. Learning agents are heterogeneous and can communicate with each other. The movement of an object with three kinds of agent depends on the composition of the actions of the learning agents. By learning its own shape through the learning agents, avoidance of obstacles by the object is expected. We simulate the proposed method in a two-dimensional continuous world. Results obtained in the present investigation reveal the effectiveness of the proposed method.
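
    For reference, a generic tabular Q-learning loop of the kind each learning agent could run is sketched below; the environment interface (`reset`, `step`) and parameter values are assumptions for illustration, not the simulator used in the paper.

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration. `env.reset()` returns
    a state index and `env.step(state, action)` returns (next_state, reward, done)."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(Q[s].argmax())
            s_next, r, done = env.step(s, a)
            target = r + gamma * (0.0 if done else Q[s_next].max())
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q
```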

  12. Reinforcement learning signal predicts social conformity.

    Science.gov (United States)

    Klucharev, Vasily; Hytönen, Kaisa; Rijpkema, Mark; Smidts, Ale; Fernández, Guillén

    2009-01-15

    We often change our decisions and judgments to conform with normative group behavior. However, the neural mechanisms of social conformity remain unclear. Here we show, using functional magnetic resonance imaging, that conformity is based on mechanisms that comply with principles of reinforcement learning. We found that individual judgments of facial attractiveness are adjusted in line with group opinion. Conflict with group opinion triggered a neuronal response in the rostral cingulate zone and the ventral striatum similar to the "prediction error" signal suggested by neuroscientific models of reinforcement learning. The amplitude of the conflict-related signal predicted subsequent conforming behavioral adjustments. Furthermore, the individual amplitude of the conflict-related signal in the ventral striatum correlated with differences in conforming behavior across subjects. These findings provide evidence that social group norms evoke conformity via learning mechanisms reflected in the activity of the rostral cingulate zone and ventral striatum.

  13. The Computational Development of Reinforcement Learning during Adolescence

    National Research Council Canada - National Science Library

    Palminteri, Stefano; Kilford, Emma J; Coricelli, Giorgio; Blakemore, Sarah-Jayne

    2016-01-01

    .... Adolescents and adults carried out a novel reinforcement learning paradigm in which participants learned the association between cues and probabilistic outcomes, where the outcomes differed in valence...

  14. Online reinforcement learning for dynamic multimedia systems.

    Science.gov (United States)

    Mastronarde, Nicholas; van der Schaar, Mihaela

    2010-02-01

    In our previous work, we proposed a systematic cross-layer framework for dynamic multimedia systems, which allows each layer to make autonomous and foresighted decisions that maximize the system's long-term performance, while meeting the application's real-time delay constraints. The proposed solution solved the cross-layer optimization offline, under the assumption that the multimedia system's probabilistic dynamics were known a priori, by modeling the system as a layered Markov decision process. In practice, however, these dynamics are unknown a priori and, therefore, must be learned online. In this paper, we address this problem by allowing the multimedia system layers to learn, through repeated interactions with each other, to autonomously optimize the system's long-term performance at run-time. The two key challenges in this layered learning setting are: (i) each layer's learning performance is directly impacted by not only its own dynamics, but also by the learning processes of the other layers with which it interacts; and (ii) selecting a learning model that appropriately balances time-complexity (i.e., learning speed) with the multimedia system's limited memory and the multimedia application's real-time delay constraints. We propose two reinforcement learning algorithms for optimizing the system under different design constraints: the first algorithm solves the cross-layer optimization in a centralized manner and the second solves it in a decentralized manner. We analyze both algorithms in terms of their required computation, memory, and interlayer communication overheads. After noting that the proposed reinforcement learning algorithms learn too slowly, we introduce a complementary accelerated learning algorithm that exploits partial knowledge about the system's dynamics in order to dramatically improve the system's performance. In our experiments, we demonstrate that decentralized learning can perform equally as well as centralized learning, while

  15. Challenges in the Verification of Reinforcement Learning Algorithms

    Science.gov (United States)

    Van Wesel, Perry; Goodloe, Alwyn E.

    2017-01-01

    Machine learning (ML) is increasingly being applied to a wide array of domains from search engines to autonomous vehicles. These algorithms, however, are notoriously complex and hard to verify. This work looks at the assumptions underlying machine learning algorithms as well as some of the challenges in trying to verify ML algorithms. Furthermore, we focus on the specific challenges of verifying reinforcement learning algorithms. These are highlighted using a specific example. Ultimately, we do not offer a solution to the complex problem of ML verification, but point out possible approaches for verification and interesting research opportunities.

  16. Reinforcement Learning in an Environment Synthetically Augmented with Digital Pheromones

    Directory of Open Access Journals (Sweden)

    Salvador E. Barbosa

    2014-01-01

    Full Text Available Reinforcement learning requires information about states, actions, and outcomes as the basis for learning. For many applications, it can be difficult to construct a representative model of the environment, either due to a lack of required information or because the model's state space may become too large to allow a solution in a reasonable amount of time using the experience of prior actions. An environment consisting solely of the occurrence or nonoccurrence of specific events attributable to a human actor may appear to lack the necessary structure for the positioning of responding agents in time and space using reinforcement learning. Digital pheromones can be used to synthetically augment such an environment with event sequence information to create a more persistent and measurable imprint on the environment that supports reinforcement learning. We implemented this method and combined it with the ability of agents to learn from actions not taken, a concept known as fictive learning. This approach was tested against the historical sequence of Somali maritime pirate attacks from 2005 to mid-2012, enabling a set of autonomous agents representing naval vessels to successfully respond to an average of 333 of the 899 pirate attacks, outperforming the historical record of 139 successes.
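
    One way such a synthetic augmentation could look is a field of pheromone levels that are deposited when an event occurs at a location and evaporate each time step, leaving a decaying, measurable trace that agents can read as part of their state; the class below is an illustrative toy, with parameter values that are assumptions rather than those used in the study.

```python
import numpy as np

class PheromoneGrid:
    """Toy digital-pheromone field: events deposit pheromone at a cell, and the
    whole field evaporates each step, so recent activity leaves a fading imprint
    that a learning agent can sense."""
    def __init__(self, shape, deposit=1.0, evaporation=0.05):
        self.levels = np.zeros(shape)
        self.deposit = deposit
        self.evaporation = evaporation

    def record_event(self, cell):
        self.levels[cell] += self.deposit     # e.g. an attack observed at this cell

    def step(self):
        self.levels *= (1.0 - self.evaporation)

    def sense(self, cell):
        return float(self.levels[cell])
```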

  17. A neural model of hierarchical reinforcement learning.

    Science.gov (United States)

    Rasmussen, Daniel; Voelker, Aaron; Eliasmith, Chris

    2017-01-01

    We develop a novel, biologically detailed neural model of reinforcement learning (RL) processes in the brain. This model incorporates a broad range of biological features that pose challenges to neural RL, such as temporally extended action sequences, continuous environments involving unknown time delays, and noisy/imprecise computations. Most significantly, we expand the model into the realm of hierarchical reinforcement learning (HRL), which divides the RL process into a hierarchy of actions at different levels of abstraction. Here we implement all the major components of HRL in a neural model that captures a variety of known anatomical and physiological properties of the brain. We demonstrate the performance of the model in a range of different environments, in order to emphasize the aim of understanding the brain's general reinforcement learning ability. These results show that the model compares well to previous modelling work and demonstrates improved performance as a result of its hierarchical ability. We also show that the model's behaviour is consistent with available data on human hierarchical RL, and generate several novel predictions.

  18. "Notice of Violation of IEEE Publication Principles" Multiobjective Reinforcement Learning: A Comprehensive Overview.

    Science.gov (United States)

    Liu, Chunming; Xu, Xin; Hu, Dewen

    2013-04-29

    Reinforcement learning is a powerful mechanism for enabling agents to learn in an unknown environment, and most reinforcement learning algorithms aim to maximize some numerical value, which represents only one long-term objective. However, multiple long-term objectives are exhibited in many real-world decision and control problems; therefore, recently, there has been growing interest in solving multiobjective reinforcement learning (MORL) problems with multiple conflicting objectives. The aim of this paper is to present a comprehensive overview of MORL. In this paper, the basic architecture, research topics, and naive solutions of MORL are introduced at first. Then, several representative MORL approaches and some important directions of recent research are reviewed. The relationships between MORL and other related research are also discussed, which include multiobjective optimization, hierarchical reinforcement learning, and multi-agent reinforcement learning. Finally, research challenges and open problems of MORL techniques are highlighted.
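
    One of the naive MORL solutions alluded to above is linear scalarization, where the vector-valued reward is collapsed into a single scalar so that ordinary single-objective RL can be applied; the weights below are illustrative.

```python
import numpy as np

def scalarize(reward_vector, weights):
    """Collapse a vector of objective rewards into one scalar via a convex
    combination; the choice of weights encodes the preferred trade-off."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return float(w @ np.asarray(reward_vector, dtype=float))

# Example: trade off throughput (first objective) against energy cost (second).
print(scalarize([3.0, -1.0], weights=[0.7, 0.3]))
```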

  19. Space Objects Maneuvering Detection and Prediction via Inverse Reinforcement Learning

    Science.gov (United States)

    Linares, R.; Furfaro, R.

    This paper determines the behavior of Space Objects (SOs) using inverse Reinforcement Learning (RL) to estimate the reward function that each SO is using for control. The approach discussed in this work can be used to analyze maneuvering of SOs from observational data. The inverse RL problem is solved using the Feature Matching approach. This approach determines the optimal reward function that a SO is using while maneuvering by assuming that the observed trajectories are optimal with respect to the SO's own reward function. This paper uses estimated orbital elements data to determine the behavior of SOs in a data-driven fashion.

  20. Hierarchical extreme learning machine based reinforcement learning for goal localization

    Science.gov (United States)

    AlDahoul, Nouar; Zaw Htike, Zaw; Akmeliawati, Rini

    2017-03-01

    The objective of goal localization is to find the location of goals in noisy environments. Simple actions are performed to move the agent towards the goal. The goal detector should be capable of minimizing the error between the predicted locations and the true ones. Only a few regions need to be processed by the agent, which reduces the computational effort and increases the speed of convergence. In this paper, a reinforcement learning (RL) method was utilized to find an optimal series of actions to localize the goal region. The visual data, a set of images, is high-dimensional unstructured data and needs to be represented efficiently to get a robust detector. Different deep reinforcement learning models have already been used to localize a goal, but most of them take a long time to learn the model. This long learning time results from the weight fine-tuning stage that is applied iteratively to find an accurate model. Hierarchical Extreme Learning Machine (H-ELM) was used as a fast deep model that does not fine-tune the weights. In other words, hidden weights are generated randomly and output weights are calculated analytically. The H-ELM algorithm was used in this work to find good features for effective representation. This paper proposes a combination of Hierarchical Extreme Learning Machine and reinforcement learning to find an optimal policy directly from visual input. This combination outperforms other methods in terms of accuracy and learning speed. The simulations and results were analysed using MATLAB.
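
    The "random hidden weights, analytic output weights" idea behind ELM can be sketched for a single hidden layer as follows; this is not the paper's full hierarchical model or its RL coupling, and the activation function and layer sizes are illustrative.

```python
import numpy as np

def elm_fit(X, T, n_hidden=100, seed=0):
    """Single-hidden-layer extreme learning machine: the hidden weights are random
    and fixed, and only the output weights are solved analytically by least squares.
    X: (n_samples, n_features), T: (n_samples, n_outputs)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                        # random nonlinear features
    beta, *_ = np.linalg.lstsq(H, T, rcond=None)  # analytic output weights, no fine-tuning
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```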

  1. A reinforcement learning-based architecture for fuzzy logic control

    Science.gov (United States)

    Berenji, Hamid R.

    1992-01-01

    This paper introduces a new method for learning to refine a rule-based fuzzy logic controller. A reinforcement learning technique is used in conjunction with a multilayer neural network model of a fuzzy controller. The approximate reasoning based intelligent control (ARIC) architecture proposed here learns by updating its prediction of the physical system's behavior and fine tunes a control knowledge base. Its theory is related to Sutton's temporal difference (TD) method. Because ARIC has the advantage of using the control knowledge of an experienced operator and fine tuning it through the process of learning, it learns faster than systems that train networks from scratch. The approach is applied to a cart-pole balancing system.

  2. Application of reinforcement learning for segmentation of transrectal ultrasound images.

    Science.gov (United States)

    Sahba, Farhang; Tizhoosh, Hamid R; Salama, Magdy M A

    2008-04-22

    Among different medical image modalities, ultrasound imaging has a very widespread clinical use. But, due to some factors, such as poor image contrast, noise and missing or diffuse boundaries, the ultrasound images are inherently difficult to segment. An important application is estimation of the location and volume of the prostate in transrectal ultrasound (TRUS) images. For this purpose, manual segmentation is a tedious and time consuming procedure. We introduce a new method for the segmentation of the prostate in transrectal ultrasound images, using a reinforcement learning scheme. This algorithm is used to find the appropriate local values for sub-images and to extract the prostate. It contains an offline stage, where the reinforcement learning agent uses some images and manually segmented versions of these images to learn from. The reinforcement agent is provided with reward/punishment, determined objectively to explore/exploit the solution space. After this stage, the agent has acquired knowledge stored in the Q-matrix. The agent can then use this knowledge for new input images to extract a coarse version of the prostate. We have carried out experiments to segment TRUS images. The results demonstrate the potential of this approach in the field of medical image segmentation. By using the proposed method, we can find the appropriate local values and segment the prostate. This approach can be used for segmentation tasks containing one object of interest. To improve this prototype, more investigations are needed.

  3. Application of reinforcement learning for segmentation of transrectal ultrasound images

    Directory of Open Access Journals (Sweden)

    Tizhoosh Hamid R

    2008-04-01

    Full Text Available Background: Among different medical image modalities, ultrasound imaging has a very widespread clinical use. But, due to some factors, such as poor image contrast, noise and missing or diffuse boundaries, the ultrasound images are inherently difficult to segment. An important application is estimation of the location and volume of the prostate in transrectal ultrasound (TRUS) images. For this purpose, manual segmentation is a tedious and time consuming procedure. Methods: We introduce a new method for the segmentation of the prostate in transrectal ultrasound images, using a reinforcement learning scheme. This algorithm is used to find the appropriate local values for sub-images and to extract the prostate. It contains an offline stage, where the reinforcement learning agent uses some images and manually segmented versions of these images to learn from. The reinforcement agent is provided with reward/punishment, determined objectively to explore/exploit the solution space. After this stage, the agent has acquired knowledge stored in the Q-matrix. The agent can then use this knowledge for new input images to extract a coarse version of the prostate. Results: We have carried out experiments to segment TRUS images. The results demonstrate the potential of this approach in the field of medical image segmentation. Conclusion: By using the proposed method, we can find the appropriate local values and segment the prostate. This approach can be used for segmentation tasks containing one object of interest. To improve this prototype, more investigations are needed.

  4. The QV Family Compared to Other Reinforcement Learning Algorithms

    NARCIS (Netherlands)

    Wiering, Marco A.; van Hasselt, Hado

    2009-01-01

    This paper describes several new online model-free reinforcement learning (RL) algorithms. We designed three new reinforcement learning algorithms, namely QV2, QVMAX, and QVMAX2, which are all based on the QV-learning algorithm; in contrast to QV-learning, QVMAX and QVMAX2 are off-policy RL algorithms.
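
    For context, the base QV-learning rule is usually described as learning a state-value function V by TD(0) and using the same target to update Q; the sketch below reflects that common description and is not an implementation of the QV2/QVMAX variants introduced in the paper.

```python
def qv_update(Q, V, s, a, r, s_next, alpha=0.1, beta=0.1, gamma=0.95):
    """One QV-learning step: V is updated by TD(0), and Q(s, a) is moved toward
    the same target r + gamma * V(s'). The learning rates alpha and beta are
    illustrative values."""
    target = r + gamma * V[s_next]
    V[s] += beta * (target - V[s])
    Q[s][a] += alpha * (target - Q[s][a])
```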

  5. Perception-based Co-evolutionary Reinforcement Learning for UAV Sensor Allocation

    National Research Council Canada - National Science Library

    Berenji, Hamid

    2003-01-01

    .... A Perception-based reasoning approach based on co-evolutionary reinforcement learning was developed for jointly addressing sensor allocation on each individual UAV and allocation of a team of UAVs...

  6. Reinforcement Learning for the Unit Commitment Problem

    OpenAIRE

    Dalal, Gal; Mannor, Shie

    2015-01-01

    In this work we solve the day-ahead unit commitment (UC) problem, by formulating it as a Markov decision process (MDP) and finding a low-cost policy for generation scheduling. We present two reinforcement learning algorithms, and devise a third one. We compare our results to previous work that uses simulated annealing (SA), and show a 27% improvement in operation costs, with running time of 2.5 minutes (compared to 2.5 hours of existing state-of-the-art).

  7. Reinforcement learning account of network reciprocity.

    Directory of Open Access Journals (Sweden)

    Takahiro Ezaki

    Full Text Available Evolutionary game theory predicts that cooperation in social dilemma games is promoted when agents are connected as a network. However, when networks are fixed over time, humans do not necessarily show enhanced mutual cooperation. Here we show that reinforcement learning (specifically, the so-called Bush-Mosteller model) approximately explains the experimentally observed network reciprocity and the lack thereof in a parameter region spanned by the benefit-to-cost ratio and the node's degree. Thus, we significantly extend previously obtained numerical results.
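
    The Bush-Mosteller rule referred to above is typically written as a simple aspiration-based probability update; a sketch under that standard formulation is given below, with an illustrative learning rate.

```python
def bush_mosteller(p_action, satisfied, beta=0.3):
    """If the last outcome was satisfying (payoff above the aspiration level),
    move the probability of repeating the chosen action toward 1; otherwise
    move it toward 0. `beta` is the learning rate."""
    if satisfied:
        return p_action + beta * (1.0 - p_action)
    return p_action - beta * p_action
```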

  8. Tank War Using Online Reinforcement Learning

    DEFF Research Database (Denmark)

    Toftgaard Andersen, Kresten; Zeng, Yifeng; Dahl Christensen, Dennis

    2009-01-01

    Real-Time Strategy (RTS) games provide a challenging platform to implement online reinforcement learning (RL) techniques in a real application. The computer, as one player, monitors opponents' (human or other computers) strategies and then updates its own policy using RL methods. In this paper, we propose a multi-layer framework for implementing the online RL in an RTS game. The framework significantly reduces the RL computational complexity by decomposing the state space in a hierarchical manner. We implement the RTS game - Tank General, and perform a thorough test on the proposed framework. The results...

  9. Model-Based Reinforcement Learning under Concurrent Schedules of Reinforcement in Rodents

    Science.gov (United States)

    Huh, Namjung; Jo, Suhyun; Kim, Hoseok; Sul, Jung Hoon; Jung, Min Whan

    2009-01-01

    Reinforcement learning theories postulate that actions are chosen to maximize a long-term sum of positive outcomes based on value functions, which are subjective estimates of future rewards. In simple reinforcement learning algorithms, value functions are updated only by trial-and-error, whereas they are updated according to the decision-maker's…

  10. Pleasurable music affects reinforcement learning according to the listener.

    Science.gov (United States)

    Gold, Benjamin P; Frank, Michael J; Bogert, Brigitte; Brattico, Elvira

    2013-01-01

    Mounting evidence links the enjoyment of music to brain areas implicated in emotion and the dopaminergic reward system. In particular, dopamine release in the ventral striatum seems to play a major role in the rewarding aspect of music listening. Striatal dopamine also influences reinforcement learning, such that subjects with greater dopamine efficacy learn better to approach rewards while those with lesser dopamine efficacy learn better to avoid punishments. In this study, we explored the practical implications of musical pleasure through its ability to facilitate reinforcement learning via non-pharmacological dopamine elicitation. Subjects from a wide variety of musical backgrounds chose a pleasurable and a neutral piece of music from an experimenter-compiled database, and then listened to one or both of these pieces (according to pseudo-random group assignment) as they performed a reinforcement learning task dependent on dopamine transmission. We assessed musical backgrounds as well as typical listening patterns with the new Helsinki Inventory of Music and Affective Behaviors (HIMAB), and separately investigated behavior for the training and test phases of the learning task. Subjects with more musical experience trained better with neutral music and tested better with pleasurable music, while those with less musical experience exhibited the opposite effect. HIMAB results regarding listening behaviors and subjective music ratings indicate that these effects arose from different listening styles: namely, more affective listening in non-musicians and more analytical listening in musicians. In conclusion, musical pleasure was able to influence task performance, and the shape of this effect depended on group and individual factors. These findings have implications in affective neuroscience, neuroaesthetics, learning, and music therapy.

  11. Pleasurable music affects reinforcement learning according to the listener

    Directory of Open Access Journals (Sweden)

    Benjamin P Gold

    2013-08-01

    Full Text Available Mounting evidence links the enjoyment of music to brain areas implicated in emotion and the dopaminergic reward system. In particular, dopamine release in the ventral striatum seems to play a major role in the rewarding aspect of music listening. Striatal dopamine also influences reinforcement learning, such that subjects with greater dopamine efficacy learn better to approach rewards while those with lesser dopamine efficacy learn better to avoid punishments. In this study, we explored the practical implications of musical pleasure through its ability to facilitate reinforcement learning via non-pharmacological dopamine elicitation. Subjects from a wide variety of musical backgrounds chose a pleasurable and a neutral piece of music from an experimenter-compiled database, and then listened to one or both of these pieces (according to pseudo-random group assignment) as they performed a reinforcement learning task dependent on dopamine transmission. We assessed musical backgrounds as well as typical listening patterns with the new Helsinki Inventory of Music and Affective Behaviors (HIMAB), and separately investigated behavior for the training and test phases of the learning task. Subjects with more musical experience trained better with neutral music and tested better with pleasurable music, while those with less musical experience exhibited the opposite effect. HIMAB results regarding listening behaviors and subjective music ratings indicate that these effects arose from different listening styles: namely, more affective listening in non-musicians and more analytical listening in musicians. In conclusion, musical pleasure was able to influence task performance, and the shape of this effect depended on group and individual factors. These findings have implications in affective neuroscience, neuroaesthetics, learning, and music therapy.

  12. Pleasurable music affects reinforcement learning according to the listener

    Science.gov (United States)

    Gold, Benjamin P.; Frank, Michael J.; Bogert, Brigitte; Brattico, Elvira

    2013-01-01

    Mounting evidence links the enjoyment of music to brain areas implicated in emotion and the dopaminergic reward system. In particular, dopamine release in the ventral striatum seems to play a major role in the rewarding aspect of music listening. Striatal dopamine also influences reinforcement learning, such that subjects with greater dopamine efficacy learn better to approach rewards while those with lesser dopamine efficacy learn better to avoid punishments. In this study, we explored the practical implications of musical pleasure through its ability to facilitate reinforcement learning via non-pharmacological dopamine elicitation. Subjects from a wide variety of musical backgrounds chose a pleasurable and a neutral piece of music from an experimenter-compiled database, and then listened to one or both of these pieces (according to pseudo-random group assignment) as they performed a reinforcement learning task dependent on dopamine transmission. We assessed musical backgrounds as well as typical listening patterns with the new Helsinki Inventory of Music and Affective Behaviors (HIMAB), and separately investigated behavior for the training and test phases of the learning task. Subjects with more musical experience trained better with neutral music and tested better with pleasurable music, while those with less musical experience exhibited the opposite effect. HIMAB results regarding listening behaviors and subjective music ratings indicate that these effects arose from different listening styles: namely, more affective listening in non-musicians and more analytical listening in musicians. In conclusion, musical pleasure was able to influence task performance, and the shape of this effect depended on group and individual factors. These findings have implications in affective neuroscience, neuroaesthetics, learning, and music therapy. PMID:23970875

  13. Credit assignment during movement reinforcement learning.

    Science.gov (United States)

    Dam, Gregory; Kording, Konrad; Wei, Kunlin

    2013-01-01

    We often need to learn how to move based on a single performance measure that reflects the overall success of our movements. However, movements have many properties, such as their trajectories, speeds and timing of end-points, thus the brain needs to decide which properties of movements should be improved; it needs to solve the credit assignment problem. Currently, little is known about how humans solve credit assignment problems in the context of reinforcement learning. Here we tested how human participants solve such problems during a trajectory-learning task. Without an explicitly-defined target movement, participants made hand reaches and received monetary rewards as feedback on a trial-by-trial basis. The curvature and direction of the attempted reach trajectories determined the monetary rewards received in a manner that can be manipulated experimentally. Based on the history of action-reward pairs, participants quickly solved the credit assignment problem and learned the implicit payoff function. A Bayesian credit-assignment model with built-in forgetting accurately predicts their trial-by-trial learning.

  14. Vicarious reinforcement learning signals when instructing others.

    Science.gov (United States)

    Apps, Matthew A J; Lesage, Elise; Ramnani, Narender

    2015-02-18

    Reinforcement learning (RL) theory posits that learning is driven by discrepancies between the predicted and actual outcomes of actions (prediction errors [PEs]). In social environments, learning is often guided by similar RL mechanisms. For example, teachers monitor the actions of students and provide feedback to them. This feedback evokes PEs in students that guide their learning. We report the first study that investigates the neural mechanisms that underpin RL signals in the brain of a teacher. Neurons in the anterior cingulate cortex (ACC) signal PEs when learning from the outcomes of one's own actions but also signal information when outcomes are received by others. Does a teacher's ACC signal PEs when monitoring a student's learning? Using fMRI, we studied brain activity in human subjects (teachers) as they taught a confederate (student) action-outcome associations by providing positive or negative feedback. We examined activity time-locked to the students' responses, when teachers infer student predictions and know actual outcomes. We fitted a RL-based computational model to the behavior of the student to characterize their learning, and examined whether a teacher's ACC signals when a student's predictions are wrong. In line with our hypothesis, activity in the teacher's ACC covaried with the PE values in the model. Additionally, activity in the teacher's insula and ventromedial prefrontal cortex covaried with the predicted value according to the student. Our findings highlight that the ACC signals PEs vicariously for others' erroneous predictions, when monitoring and instructing their learning. These results suggest that RL mechanisms, processed vicariously, may underpin and facilitate teaching behaviors. Copyright © 2015 Apps et al.

  15. Modeling Avoidance in Mood and Anxiety Disorders Using Reinforcement Learning.

    Science.gov (United States)

    Mkrtchian, Anahit; Aylward, Jessica; Dayan, Peter; Roiser, Jonathan P; Robinson, Oliver J

    2017-10-01

    Serious and debilitating symptoms of anxiety are the most common mental health problem worldwide, accounting for around 5% of all adult years lived with disability in the developed world. Avoidance behavior-avoiding social situations for fear of embarrassment, for instance-is a core feature of such anxiety. However, as for many other psychiatric symptoms the biological mechanisms underlying avoidance remain unclear. Reinforcement learning models provide formal and testable characterizations of the mechanisms of decision making; here, we examine avoidance in these terms. A total of 101 healthy participants and individuals with mood and anxiety disorders completed an approach-avoidance go/no-go task under stress induced by threat of unpredictable shock. We show an increased reliance in the mood and anxiety group on a parameter of our reinforcement learning model that characterizes a prepotent (pavlovian) bias to withhold responding in the face of negative outcomes. This was particularly the case when the mood and anxiety group was under stress. This formal description of avoidance within the reinforcement learning framework provides a new means of linking clinical symptoms with biophysically plausible models of neural circuitry and, as such, takes us closer to a mechanistic understanding of mood and anxiety disorders. Copyright © 2017 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.

  16. Reinforcement Learning for Ramp Control: An Analysis of Learning Parameters

    Directory of Open Access Journals (Sweden)

    Chao Lu

    2016-08-01

    Full Text Available Reinforcement Learning (RL) has been proposed to deal with ramp control problems under dynamic traffic conditions; however, there is a lack of sufficient research on the behaviour and impacts of different learning parameters. This paper describes a ramp control agent based on the RL mechanism and thoroughly analyzes the influence of three learning parameters, namely the learning rate, discount rate and action selection parameter, on the algorithm performance. Two indices for learning speed and convergence stability were used to measure the algorithm performance, based on which a series of simulation-based experiments were designed and conducted using a macroscopic traffic flow model. Simulation results showed that, compared with the discount rate, the learning rate and action selection parameter had more pronounced impacts on the algorithm performance. Based on the analysis, some suggestions about how to select suitable parameter values to achieve superior performance were provided.
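
    The kind of parameter analysis described above can be sketched as a sweep over the learning rate, discount rate and the epsilon of epsilon-greedy selection, scoring each combination for learning speed and convergence stability; `run_q_learning` stands in for training against the traffic model, which is not reproduced here, and the parameter grids are illustrative.

```python
import itertools

def sweep(run_q_learning, alphas=(0.05, 0.1, 0.5), gammas=(0.5, 0.9, 0.99),
          epsilons=(0.01, 0.1, 0.3)):
    """`run_q_learning(alpha, gamma, epsilon)` is assumed to return a per-episode
    reward curve (a list of floats). Learning speed is taken as the number of
    episodes needed to reach 90% of the best reward; stability as the mean squared
    deviation of the last 50 episodes from the final value."""
    results = {}
    for alpha, gamma, epsilon in itertools.product(alphas, gammas, epsilons):
        curve = run_q_learning(alpha=alpha, gamma=gamma, epsilon=epsilon)
        best = max(curve)
        speed = next((i for i, r in enumerate(curve) if r >= 0.9 * best), len(curve))
        tail = curve[-50:]
        stability = sum((r - curve[-1]) ** 2 for r in tail) / len(tail)
        results[(alpha, gamma, epsilon)] = (speed, stability)
    return results
```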

  17. Distributed reinforcement learning for adaptive and robust network intrusion response

    Science.gov (United States)

    Malialis, Kleanthis; Devlin, Sam; Kudenko, Daniel

    2015-07-01

    Distributed denial of service (DDoS) attacks constitute a rapidly evolving threat in the current Internet. Multiagent Router Throttling is a novel approach to defend against DDoS attacks where multiple reinforcement learning agents are installed on a set of routers and learn to rate-limit or throttle traffic towards a victim server. The focus of this paper is on online learning and scalability. We propose an approach that incorporates task decomposition, team rewards and a form of reward shaping called difference rewards. One of the novel characteristics of the proposed system is that it provides a decentralised coordinated response to the DDoS problem, thus being resilient to DDoS attacks themselves. The proposed system learns remarkably fast, thus being suitable for online learning. Furthermore, its scalability is successfully demonstrated in experiments involving 1000 learning agents. We compare our approach against a baseline and a popular state-of-the-art throttling technique from the network security literature and show that the proposed approach is more effective, adaptive to sophisticated attack rate dynamics and robust to agent failures.
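
    The difference-reward idea mentioned above gives each agent its marginal contribution to the team objective: the global reward minus what the global reward would have been had the agent taken a default (e.g. no-throttling) action. A generic sketch, not the paper's router-throttling implementation:

```python
def difference_reward(global_reward, joint_action, agent_index, default_action):
    """D_i = G(z) - G(z with agent i's action replaced by a default action).
    `global_reward` is a callable that evaluates the team objective for a
    joint action (a sequence of per-agent actions)."""
    counterfactual = list(joint_action)
    counterfactual[agent_index] = default_action
    return global_reward(tuple(joint_action)) - global_reward(tuple(counterfactual))
```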

  18. Automatic feature selection for model-based reinforcement learning in factored MDPs

    NARCIS (Netherlands)

    Kroon, M.; Whiteson, S.; Wani, M.A.; Kantardzic, M.; Palade, V.; Kurgan, L.; Qi, A.

    2009-01-01

    Feature selection is an important challenge in machine learning. Unfortunately, most methods for automating feature selection are designed for supervised learning tasks and are thus either inapplicable or impractical for reinforcement learning. This paper presents a new approach to feature selection

  19. Reinforcement of Science Learning through Local Culture: A Delphi Study

    Science.gov (United States)

    Nuangchalerm, Prasart

    2008-01-01

    This study aims to explore the ways to reinforce science learning through local culture by using Delphi technique. Twenty four participants in various fields of study were selected. The result of study provides a framework for reinforcement of science learning through local culture on the theme life and environment. (Contains 1 table.)

  20. A Hierarchical Maze Navigation Algorithm with Reinforcement Learning and Mapping

    NARCIS (Netherlands)

    Mannucci, T.; van Kampen, E.; Jin, Y; Kollias, S.

    2016-01-01

    Goal-finding in an unknown maze is a challenging problem for a Reinforcement Learning agent, because the corresponding state space can be large if not intractable, and the agent does not usually have a model of the environment. Hierarchical Reinforcement Learning has been shown in the past to

  1. Punishment Insensitivity and Impaired Reinforcement Learning in Preschoolers

    Science.gov (United States)

    Briggs-Gowan, Margaret J.; Nichols, Sara R.; Voss, Joel; Zobel, Elvira; Carter, Alice S.; McCarthy, Kimberly J.; Pine, Daniel S.; Blair, James; Wakschlag, Lauren S.

    2014-01-01

    Background: Youth and adults with psychopathic traits display disrupted reinforcement learning. Advances in measurement now enable examination of this association in preschoolers. The current study examines relations between reinforcement learning in preschoolers and parent ratings of reduced responsiveness to socialization, conceptualized as a…

  2. Video Demo: Deep Reinforcement Learning for Coordination in Traffic Light Control

    NARCIS (Netherlands)

    van der Pol, E.; Oliehoek, F.A.; Bosse, T.; Bredeweg, B.

    2016-01-01

    This video demonstration contrasts two approaches to coordination in traffic light control using reinforcement learning: earlier work, based on a deconstruction of the state space into a linear combination of vehicle states, and our own approach based on the Deep Q-learning algorithm.

  3. A reinforcement learning formulation to the complex question answering problem

    OpenAIRE

    Chali, Yllias; Hasan, Sadid A.; Mojahid, Mustapha

    2015-01-01

    We use extractive multi-document summarization techniques to perform complex question answering and formulate it as a reinforcement learning problem. Given a set of complex questions, a list of relevant documents per question, and the corresponding human generated summaries (i.e. answers to the questions) as training data, the reinforcement learning module iteratively learns a number of feature weights in order to facilitate the automatic generation of summaries i.e. answers to previously u...

  4. A reinforcement learning formulation to the complex question answering problem

    OpenAIRE

    Chali, Yllias; Hasan, Sadid A.; Mojahid, Mustapha

    2015-01-01

    We use extractive multi-document summarization techniques to perform complex question answering and formulate it as a reinforcement learning problem. Given a set of complex questions, a list of relevant documents per question, and the corresponding human generated summaries (i.e. answers to the questions) as training data, the reinforcement learning module iteratively learns a number of feature weights in order to facilitate the automatic generation of summaries i.e. a...

  5. Reinforcement learning for routing in cognitive radio ad hoc networks.

    Science.gov (United States)

    Al-Rawi, Hasan A A; Yau, Kok-Lim Alvin; Mohamad, Hafizal; Ramli, Nordin; Hashim, Wahidah

    2014-01-01

    Cognitive radio (CR) enables unlicensed users (or secondary users, SUs) to sense for and exploit underutilized licensed spectrum owned by the licensed users (or primary users, PUs). Reinforcement learning (RL) is an artificial intelligence approach that enables a node to observe, learn, and make appropriate decisions on action selection in order to maximize network performance. Routing enables a source node to search for a least-cost route to its destination node. While there have been increasing efforts to enhance the traditional RL approach for routing in wireless networks, this research area remains largely unexplored in the domain of routing in CR networks. This paper applies RL in routing and investigates the effects of various features of RL (i.e., reward function, exploitation, and exploration, as well as learning rate) through simulation. New approaches and recommendations are proposed to enhance the features in order to improve the network performance brought about by RL to routing. Simulation results show that the RL parameters of the reward function, exploitation, and exploration, as well as learning rate, must be well regulated, and the new approaches proposed in this paper improves SUs' network performance without significantly jeopardizing PUs' network performance, specifically SUs' interference to PUs.
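
    The interplay of reward, exploration and learning rate discussed above can be illustrated with a minimal epsilon-greedy next-hop selection and a one-step value update; this is a generic sketch under assumed data structures, not the scheme evaluated in the paper.

```python
import random

def choose_next_hop(Q, node, neighbours, epsilon=0.1):
    """Mostly pick the neighbour with the highest estimated value for this node,
    but explore a random neighbour with probability epsilon."""
    if random.random() < epsilon:
        return random.choice(neighbours)
    return max(neighbours, key=lambda n: Q.get((node, n), 0.0))

def update_route_value(Q, node, next_hop, reward, alpha=0.2):
    """One-step update of the (node, next_hop) value; the reward could, for
    instance, combine delivery success with interference penalties."""
    key = (node, next_hop)
    Q[key] = Q.get(key, 0.0) + alpha * (reward - Q.get(key, 0.0))
```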

  6. Prespeech motor learning in a neural network using reinforcement.

    Science.gov (United States)

    Warlaumont, Anne S; Westermann, Gert; Buder, Eugene H; Oller, D Kimbrough

    2013-02-01

    Vocal motor development in infancy provides a crucial foundation for language development. Some significant early accomplishments include learning to control the process of phonation (the production of sound at the larynx) and learning to produce the sounds of one's language. Previous work has shown that social reinforcement shapes the kinds of vocalizations infants produce. We present a neural network model that provides an account of how vocal learning may be guided by reinforcement. The model consists of a self-organizing map that outputs to muscles of a realistic vocalization synthesizer. Vocalizations are spontaneously produced by the network. If a vocalization meets certain acoustic criteria, it is reinforced, and the weights are updated to make similar muscle activations increasingly likely to recur. We ran simulations of the model under various reinforcement criteria and tested the types of vocalizations it produced after learning in the different conditions. When reinforcement was contingent on the production of phonated (i.e. voiced) sounds, the network's post-learning productions were almost always phonated, whereas when reinforcement was not contingent on phonation, the network's post-learning productions were almost always not phonated. When reinforcement was contingent on both phonation and proximity to English vowels as opposed to Korean vowels, the model's post-learning productions were more likely to resemble the English vowels and vice versa. Copyright © 2012 Elsevier Ltd. All rights reserved.

  7. Changes in corticostriatal connectivity during reinforcement learning in humans.

    Science.gov (United States)

    Horga, Guillermo; Maia, Tiago V; Marsh, Rachel; Hao, Xuejun; Xu, Dongrong; Duan, Yunsuo; Tau, Gregory Z; Graniello, Barbara; Wang, Zhishun; Kangarlu, Alayar; Martinez, Diana; Packard, Mark G; Peterson, Bradley S

    2015-02-01

    Many computational models assume that reinforcement learning relies on changes in synaptic efficacy between cortical regions representing stimuli and striatal regions involved in response selection, but this assumption has thus far lacked empirical support in humans. We recorded hemodynamic signals with fMRI while participants navigated a virtual maze to find hidden rewards. We fitted a reinforcement-learning algorithm to participants' choice behavior and evaluated the neural activity and the changes in functional connectivity related to trial-by-trial learning variables. Activity in the posterior putamen during choice periods increased progressively during learning. Furthermore, the functional connections between the sensorimotor cortex and the posterior putamen strengthened progressively as participants learned the task. These changes in corticostriatal connectivity differentiated participants who learned the task from those who did not. These findings provide a direct link between changes in corticostriatal connectivity and learning, thereby supporting a central assumption common to several computational models of reinforcement learning. © 2014 Wiley Periodicals, Inc.

  8. Attention-gated reinforcement learning of internal representations for classification.

    Science.gov (United States)

    Roelfsema, Pieter R; van Ooyen, Arjen

    2005-10-01

    Animal learning is associated with changes in the efficacy of connections between neurons. The rules that govern this plasticity can be tested in neural networks. Rules that train neural networks to map stimuli onto outputs are given by supervised learning and reinforcement learning theories. Supervised learning is efficient but biologically implausible. In contrast, reinforcement learning is biologically plausible but comparatively inefficient. It lacks a mechanism that can identify units at early processing levels that play a decisive role in the stimulus-response mapping. Here we show that this so-called credit assignment problem can be solved by a new role for attention in learning. There are two factors in our new learning scheme that determine synaptic plasticity: (1) a reinforcement signal that is homogeneous across the network and depends on the amount of reward obtained after a trial, and (2) an attentional feedback signal from the output layer that limits plasticity to those units at earlier processing levels that are crucial for the stimulus-response mapping. The new scheme is called attention-gated reinforcement learning (AGREL). We show that it is as efficient as supervised learning in classification tasks. AGREL is biologically realistic and integrates the role of feedback connections, attention effects, synaptic plasticity, and reinforcement learning signals into a coherent framework.
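
    A loose schematic of the two gating factors in the scheme described above (not the published AGREL equations) is given below: plasticity is the product of a global reward-prediction error and the attentional feedback a hidden unit receives from the chosen output. Shapes and parameter values are illustrative assumptions.

        import numpy as np

        # Schematic two-factor gated update for input-to-hidden weights (illustrative shapes:
        # x is (n_in,), hidden is (n_hid,), w_in_hid is (n_in, n_hid), w_hid_out is (n_hid, n_out)).
        def gated_hidden_update(w_in_hid, w_hid_out, x, hidden, chosen, reward, expected, lr=0.05):
            delta = reward - expected                     # factor 1: homogeneous reinforcement signal
            attention = w_hid_out[:, chosen] * hidden     # factor 2: feedback to units driving the choice
            # Hebbian-style change, expressed only where attentional feedback arrives.
            w_in_hid += lr * delta * np.outer(x, attention)
            return w_in_hid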

  9. On the Possibility of a Reinforcement Theory of Cognitive Learning.

    Science.gov (United States)

    Smith, Kendon

    This paper discusses cognitive learning in terms of reinforcement theory and presents arguments suggesting that a viable theory of cognition based on reinforcement principles is not out of the question. This position is supported by a discussion of the weaknesses of theories based entirely on contiguity and of considerations that are more positive…

  10. An Analysis of Stochastic Game Theory for Multiagent Reinforcement Learning

    National Research Council Canada - National Science Library

    Bowling, Michael

    2000-01-01

    .... In this paper we contribute a comprehensive presentation of the relevant techniques for solving stochastic games from both the game theory and reinforcement learning communities. We examine the assumptions and limitations of these algorithms, and identify similarities between these algorithms, single-agent reinforcement learners, and basic game theory techniques.

  11. When, What, and How Much to Reward in Reinforcement Learning-Based Models of Cognition

    Science.gov (United States)

    Janssen, Christian P.; Gray, Wayne D.

    2012-01-01

    Reinforcement learning approaches to cognitive modeling represent task acquisition as learning to choose the sequence of steps that accomplishes the task while maximizing a reward. However, an apparently unrecognized problem for modelers is choosing when, what, and how much to reward; that is, when (the moment: end of trial, subtask, or some other…

  12. The cerebellum: A neural system for the study of reinforcement learning

    Directory of Open Access Journals (Sweden)

    Rodney A. Swain

    2011-03-01

    In its strictest application, the term reinforcement learning refers to a computational approach to learning in which an agent (often a machine) interacts with a mutable environment to maximize reward through trial and error. The approach borrows essentials from several fields, most notably Computer Science, Behavioral Neuroscience, and Psychology. At the most basic level, a neural system capable of mediating reinforcement learning must be able to acquire sensory information about the external environment and internal milieu (either directly or through connectivities with other brain regions), must be able to select a behavior to be executed, and must be capable of providing evaluative feedback about the success of that behavior. Given that Psychology informs us that reinforcers, both positive and negative, are stimuli or consequences that increase the probability that the immediately antecedent behavior will be repeated and that reinforcer strength or viability is modulated by the organism’s past experience with the reinforcer, its affect, and even the state of its muscles (e.g., eyes open or closed); it is the case that any neural system that supports reinforcement learning must also be sensitive to these same considerations. Once learning is established, such a neural system must finally be able to maintain continued response expression and prevent response drift. In this report, we examine both historical and recent evidence that the cerebellum satisfies all of these requirements. While we report evidence from a variety of learning paradigms, the majority of our discussion will focus on classical conditioning of the rabbit eye blink response as an ideal model system for the study of reinforcement and reinforcement learning.

  13. Neural basis of reinforcement learning and decision making.

    Science.gov (United States)

    Lee, Daeyeol; Seo, Hyojung; Jung, Min Whan

    2012-01-01

    Reinforcement learning is an adaptive process in which an animal utilizes its previous experience to improve the outcomes of future choices. Computational theories of reinforcement learning play a central role in the newly emerging areas of neuroeconomics and decision neuroscience. In this framework, actions are chosen according to their value functions, which describe how much future reward is expected from each action. Value functions can be adjusted not only through reward and penalty, but also by the animal's knowledge of its current environment. Studies have revealed that a large proportion of the brain is involved in representing and updating value functions and using them to choose an action. However, how the nature of a behavioral task affects the neural mechanisms of reinforcement learning remains incompletely understood. Future studies should uncover the principles by which different computational elements of reinforcement learning are dynamically coordinated across the entire brain.

  14. Neural Basis of Reinforcement Learning and Decision Making

    Science.gov (United States)

    Lee, Daeyeol; Seo, Hyojung; Jung, Min Whan

    2012-01-01

    Reinforcement learning is an adaptive process in which an animal utilizes its previous experience to improve the outcomes of future choices. Computational theories of reinforcement learning play a central role in the newly emerging areas of neuroeconomics and decision neuroscience. In this framework, actions are chosen according to their value functions, which describe how much future reward is expected from each action. Value functions can be adjusted not only through reward and penalty, but also by the animal’s knowledge of its current environment. Studies have revealed that a large proportion of the brain is involved in representing and updating value functions and using them to choose an action. However, how the nature of a behavioral task affects the neural mechanisms of reinforcement learning remains incompletely understood. Future studies should uncover the principles by which different computational elements of reinforcement learning are dynamically coordinated across the entire brain. PMID:22462543

  15. Reinforcement Learning: Stochastic Approximation Algorithms for Markov Decision Processes

    OpenAIRE

    Krishnamurthy, Vikram

    2015-01-01

    This article presents a short and concise description of stochastic approximation algorithms in reinforcement learning of Markov decision processes. The algorithms can also be used as a suboptimal method for partially observed Markov decision processes.
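
    For illustration, tabular Q-learning can be written as exactly this kind of stochastic-approximation scheme: the table is nudged toward a noisy sample of the Bellman target with a decaying (Robbins-Monro) step size. The gym-style environment interface below is an assumption.

        import numpy as np

        def q_learning(env, n_states, n_actions, episodes=500, gamma=0.95, eps=0.1):
            q = np.zeros((n_states, n_actions))
            visits = np.zeros((n_states, n_actions))
            for _ in range(episodes):
                s, done = env.reset(), False
                while not done:
                    a = np.random.randint(n_actions) if np.random.rand() < eps else int(np.argmax(q[s]))
                    s2, r, done = env.step(a)            # assumed interface: (next_state, reward, done)
                    visits[s, a] += 1
                    step = 1.0 / visits[s, a]            # decaying Robbins-Monro step size
                    target = r + (0.0 if done else gamma * q[s2].max())
                    q[s, a] += step * (target - q[s, a]) # stochastic-approximation update
                    s = s2
            return q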

  16. applying reinforcement learning to the weapon assignment problem

    African Journals Online (AJOL)

    ismith

    the machine-learning subfield of reinforcement learning (RL), namely a Monte Carlo (MC) control algorithm with exploring ... What sets RL apart from other machine-learning methods is the fact that the agent is not told which actions to take, ... For our simple model, the DA was situated at the centre of the grid and was fixed.
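
    As an illustration of the algorithm family named here, a generic every-visit Monte Carlo control loop is sketched below; the weapon-assignment environment itself is not reproduced, and episode_fn is assumed to return one complete episode started from a randomly chosen state-action pair (exploring starts).

        from collections import defaultdict

        def mc_control(episode_fn, actions, n_episodes=10000, gamma=1.0):
            # episode_fn() must return a list of (state, action, reward) triples.
            q = defaultdict(float)
            counts = defaultdict(int)
            policy = {}
            for _ in range(n_episodes):
                g = 0.0
                for state, action, reward in reversed(episode_fn()):
                    g = gamma * g + reward                       # return following this step
                    counts[(state, action)] += 1
                    q[(state, action)] += (g - q[(state, action)]) / counts[(state, action)]
                    policy[state] = max(actions, key=lambda a: q[(state, a)])
            return policy, q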

  17. Model-Based Multi-Objective Reinforcement Learning

    NARCIS (Netherlands)

    Wiering, Marco; Withagen, Maikel; Drugan, Madalina

    2014-01-01

    This paper describes a novel multi-objective reinforcement learning algorithm. The proposed algorithm first learns a model of the multi-objective sequential decision making problem, after which this learned model is used by a multi-objective dynamic programming method to compute Pareto optimal

  18. Learning to Perform Physics Experiments via Deep Reinforcement Learning

    CERN Document Server

    Denil, Misha; Kulkarni, Tejas D; Erez, Tom; Battaglia, Peter; de Freitas, Nando

    2016-01-01

    When encountering novel objects, humans are able to infer a wide range of physical properties such as mass, friction and deformability by interacting with them in a goal-driven way. This process of active interaction is in the same spirit as a scientist performing an experiment to discover hidden facts. Recent advances in artificial intelligence have yielded machines that can achieve superhuman performance in Go, Atari, natural language processing, and complex control problems, but it is not clear that these systems can rival the scientific intuition of even a young child. In this work we introduce a basic set of tasks that require agents to estimate hidden properties such as mass and cohesion of objects in an interactive simulated environment where they can manipulate the objects and observe the consequences. We found that state-of-the-art deep reinforcement learning methods can learn to perform the experiments necessary to discover such hidden properties. By systematically manipulating the problem difficulty and...

  19. Multitask Reinforcement Learning on the Distribution of MDPs

    Science.gov (United States)

    Fumihide, Tanaka; Masayuki, Yamamura

    In this paper we address a new problem in reinforcement learning. Here we consider an agent that faces multiple learning tasks within its lifetime. The agent’s objective is to maximize its total reward in the lifetime as well as a conventional return in each task. To realize this, it has to be endowed with an important ability to keep its past learning experiences and utilize them for improving future learning performance. Here we try to phrase this problem formally. The central idea is to introduce an environmental class, BV-MDPs, that is defined with the distribution of MDPs. As an approach to exploiting past learning experiences, we focus on statistical information (mean and deviation) about the agent’s value tables. The mean can be used as initial values of the table when a new task is presented. The deviation can be viewed as measuring reliability of the mean, and we utilize it in calculating the priority of simulated backups. We conduct experiments in computer simulation to evaluate the effectiveness.
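
    The statistic-reuse idea can be sketched as follows: stack the value tables from finished tasks, initialize a new task's table from their mean, and keep the per-entry deviation as a reliability signal for prioritizing simulated backups. Names and shapes below are illustrative.

        import numpy as np

        def init_from_past(past_tables, n_states, n_actions):
            # past_tables: list of Q-tables (n_states x n_actions) from completed tasks.
            if not past_tables:
                return np.zeros((n_states, n_actions)), np.ones((n_states, n_actions))
            stack = np.stack(past_tables)
            mean = stack.mean(axis=0)    # initial value table for the new task
            dev = stack.std(axis=0)      # large deviation = unreliable mean; usable as backup priority
            return mean.copy(), dev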

  20. Behavioral and neural properties of social reinforcement learning.

    Science.gov (United States)

    Jones, Rebecca M; Somerville, Leah H; Li, Jian; Ruberry, Erika J; Libby, Victoria; Glover, Gary; Voss, Henning U; Ballon, Douglas J; Casey, B J

    2011-09-14

    Social learning is critical for engaging in complex interactions with other individuals. Learning from positive social exchanges, such as acceptance from peers, may be similar to basic reinforcement learning. We formally test this hypothesis by developing a novel paradigm that is based on work in nonhuman primates and human imaging studies of reinforcement learning. The probability of receiving positive social reinforcement from three distinct peers was parametrically manipulated while brain activity was recorded in healthy adults using event-related functional magnetic resonance imaging. Over the course of the experiment, participants responded more quickly to faces of peers who provided more frequent positive social reinforcement, and rated them as more likeable. Modeling trial-by-trial learning showed ventral striatum and orbital frontal cortex activity correlated positively with forming expectations about receiving social reinforcement. Rostral anterior cingulate cortex activity tracked positively with modulations of expected value of the cues (peers). Together, the findings across three levels of analysis--social preferences, response latencies, and modeling neural responses--are consistent with reinforcement learning theory and nonhuman primate electrophysiological studies of reward. This work highlights the fundamental influence of acceptance by one's peers in altering subsequent behavior.

  1. Behavioral and neural properties of social reinforcement learning

    Science.gov (United States)

    Jones, Rebecca M.; Somerville, Leah H.; Li, Jian; Ruberry, Erika J.; Libby, Victoria; Glover, Gary; Voss, Henning U.; Ballon, Douglas J.; Casey, BJ

    2011-01-01

    Social learning is critical for engaging in complex interactions with other individuals. Learning from positive social exchanges, such as acceptance from peers, may be similar to basic reinforcement learning. We formally test this hypothesis by developing a novel paradigm that is based upon work in non-human primates and human imaging studies of reinforcement learning. The probability of receiving positive social reinforcement from three distinct peers was parametrically manipulated while brain activity was recorded in healthy adults using event-related functional magnetic resonance imaging (fMRI). Over the course of the experiment, participants responded more quickly to faces of peers who provided more frequent positive social reinforcement, and rated them as more likeable. Modeling trial-by-trial learning showed ventral striatum and orbital frontal cortex activity correlated positively with forming expectations about receiving social reinforcement. Rostral anterior cingulate cortex activity tracked positively with modulations of expected value of the cues (peers). Together, the findings across three levels of analysis - social preferences, response latencies and modeling neural responses – are consistent with reinforcement learning theory and non-human primate electrophysiological studies of reward. This work highlights the fundamental influence of acceptance by one’s peers in altering subsequent behavior. PMID:21917787

  2. Democratic reinforcement: learning via self-organization

    Energy Technology Data Exchange (ETDEWEB)

    Stassinopoulos, D. [Florida Atlantic Univ., Boca Raton, FL (United States); Bak, P. [Brookhaven National Lab., Upton, NY (United States)

    1995-12-31

    The problem of learning in the absence of external intelligence is discussed in the context of a simple model. The model consists of a set of randomly connected, or layered, integrate-and-fire neurons. Inputs to and outputs from the environment are connected randomly to subsets of neurons. The connections between firing neurons are strengthened or weakened according to whether the action is successful or not. The model departs from the traditional gradient-descent based approaches to learning by operating at a highly susceptible "critical" state, with low activity and sparse connections between firing neurons. Quantitative studies on the performance of our model in a simple association task show that by tuning our system close to this critical state we can obtain dramatic gains in performance.

  3. Instructional control of reinforcement learning: a behavioral and neurocomputational investigation.

    Science.gov (United States)

    Doll, Bradley B; Jacobs, W Jake; Sanfey, Alan G; Frank, Michael J

    2009-11-24

    Humans learn how to behave directly through environmental experience and indirectly through rules and instructions. Behavior analytic research has shown that instructions can control behavior, even when such behavior leads to sub-optimal outcomes (Hayes, S. (Ed.). 1989. Rule-governed behavior: cognition, contingencies, and instructional control. Plenum Press.). Here we examine the control of behavior through instructions in a reinforcement learning task known to depend on striatal dopaminergic function. Participants selected between probabilistically reinforced stimuli, and were (incorrectly) told that a specific stimulus had the highest (or lowest) reinforcement probability. Despite experience to the contrary, instructions drove choice behavior. We present neural network simulations that capture the interactions between instruction-driven and reinforcement-driven behavior via two potential neural circuits: one in which the striatum is inaccurately trained by instruction representations coming from prefrontal cortex/hippocampus (PFC/HC), and another in which the striatum learns the environmentally based reinforcement contingencies, but is "overridden" at decision output. Both models capture the core behavioral phenomena but, because they differ fundamentally on what is learned, make distinct predictions for subsequent behavioral and neuroimaging experiments. Finally, we attempt to distinguish between the proposed computational mechanisms governing instructed behavior by fitting a series of abstract "Q-learning" and Bayesian models to subject data. The best-fitting model supports one of the neural models, suggesting the existence of a "confirmation bias" in which the PFC/HC system trains the reinforcement system by amplifying outcomes that are consistent with instructions while diminishing inconsistent outcomes.
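
    One of the two proposed circuits, the "confirmation bias" account, can be sketched as a biased Q-learning update in which outcomes consistent with the instruction are amplified and inconsistent ones diminished; the parameter values and names below are illustrative.

        def biased_update(q, chosen, reward, instructed, alpha=0.3, amplify=1.5, diminish=0.5):
            # q: mapping from option to value; reward in {0, 1}; instructed: option named in the instruction.
            pe = reward - q[chosen]
            if chosen == instructed:
                # Prediction errors that confirm the instruction are amplified,
                # disconfirming ones diminished, so experience rarely overturns it.
                pe *= amplify if pe > 0 else diminish
            q[chosen] += alpha * pe
            return q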

  4. Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation.

    Science.gov (United States)

    Kato, Ayaka; Morita, Kenji

    2016-10-01

    It has been suggested that dopamine (DA) represents reward-prediction-error (RPE) defined in reinforcement learning and therefore DA responds to unpredicted but not predicted reward. However, recent studies have found DA response sustained towards predictable reward in tasks involving self-paced behavior, and suggested that this response represents a motivational signal. We have previously shown that RPE can sustain if there is decay/forgetting of learned-values, which can be implemented as decay of synaptic strengths storing learned-values. This account, however, did not explain the suggested link between tonic/sustained DA and motivation. In the present work, we explored the motivational effects of the value-decay in self-paced approach behavior, modeled as a series of 'Go' or 'No-Go' selections towards a goal. Through simulations, we found that the value-decay can enhance motivation, specifically, facilitate fast goal-reaching, albeit counterintuitively. Mathematical analyses revealed that underlying potential mechanisms are twofold: (1) decay-induced sustained RPE creates a gradient of 'Go' values towards a goal, and (2) value-contrasts between 'Go' and 'No-Go' are generated because while chosen values are continually updated, unchosen values simply decay. Our model provides potential explanations for the key experimental findings that suggest DA's roles in motivation: (i) slowdown of behavior by post-training blockade of DA signaling, (ii) observations that DA blockade severely impairs effortful actions to obtain rewards while largely sparing seeking of easily obtainable rewards, and (iii) relationships between the reward amount, the level of motivation reflected in the speed of behavior, and the average level of DA. These results indicate that reinforcement learning with value-decay, or forgetting, provides a parsimonious mechanistic account for the DA's roles in value-learning and motivation. Our results also suggest that when biological systems for value-learning

  5. Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation.

    Directory of Open Access Journals (Sweden)

    Ayaka Kato

    2016-10-01

    It has been suggested that dopamine (DA) represents reward-prediction-error (RPE) defined in reinforcement learning and therefore DA responds to unpredicted but not predicted reward. However, recent studies have found DA response sustained towards predictable reward in tasks involving self-paced behavior, and suggested that this response represents a motivational signal. We have previously shown that RPE can sustain if there is decay/forgetting of learned-values, which can be implemented as decay of synaptic strengths storing learned-values. This account, however, did not explain the suggested link between tonic/sustained DA and motivation. In the present work, we explored the motivational effects of the value-decay in self-paced approach behavior, modeled as a series of 'Go' or 'No-Go' selections towards a goal. Through simulations, we found that the value-decay can enhance motivation, specifically, facilitate fast goal-reaching, albeit counterintuitively. Mathematical analyses revealed that underlying potential mechanisms are twofold: (1) decay-induced sustained RPE creates a gradient of 'Go' values towards a goal, and (2) value-contrasts between 'Go' and 'No-Go' are generated because while chosen values are continually updated, unchosen values simply decay. Our model provides potential explanations for the key experimental findings that suggest DA's roles in motivation: (i) slowdown of behavior by post-training blockade of DA signaling, (ii) observations that DA blockade severely impairs effortful actions to obtain rewards while largely sparing seeking of easily obtainable rewards, and (iii) relationships between the reward amount, the level of motivation reflected in the speed of behavior, and the average level of DA. These results indicate that reinforcement learning with value-decay, or forgetting, provides a parsimonious mechanistic account for the DA's roles in value-learning and motivation. Our results also suggest that when biological systems
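
    A minimal sketch of the value-decay mechanism analysed above: the chosen value is updated by the usual prediction error while every stored value also decays toward zero each step, which keeps prediction errors partially sustained even for predictable rewards. Parameter values are illustrative.

        import numpy as np

        def decay_q_step(q, state, action, reward, next_state, alpha=0.5, gamma=0.9, decay=0.01):
            # q: array of shape (n_states, n_actions)
            rpe = reward + gamma * q[next_state].max() - q[state, action]
            q[state, action] += alpha * rpe      # chosen value is updated toward the target
            q *= (1.0 - decay)                   # all stored values decay ("forgetting") every step
            return rpe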

  6. Reinforcement Learning in Robotics: Applications and Real-World Challenges

    Directory of Open Access Journals (Sweden)

    Petar Kormushev

    2013-07-01

    In robotics, the ultimate goal of reinforcement learning is to endow robots with the ability to learn, improve, adapt and reproduce tasks with dynamically changing constraints based on exploration and autonomous learning. We give a summary of the state-of-the-art of reinforcement learning in the context of robotics, in terms of both algorithms and policy representations. Numerous challenges faced by the policy representation in robotics are identified. Three recent examples for the application of reinforcement learning to real-world robots are described: a pancake flipping task, a bipedal walking energy minimization task and an archery-based aiming task. In all examples, a state-of-the-art expectation-maximization-based reinforcement learning algorithm is used, and different policy representations are proposed and evaluated for each task. The proposed policy representations offer viable solutions to six rarely-addressed challenges in policy representations: correlations, adaptability, multi-resolution, globality, multi-dimensionality and convergence. Both the successes and the practical difficulties encountered in these examples are discussed. Based on insights from these particular cases, conclusions are drawn about the state-of-the-art and future directions for reinforcement learning in robotics.

  7. The role of GABAB receptors in human reinforcement learning.

    Science.gov (United States)

    Ort, Andres; Kometer, Michael; Rohde, Judith; Seifritz, Erich; Vollenweider, Franz X

    2014-10-01

    Behavioral evidence from human studies suggests that the γ-aminobutyric acid type B receptor (GABAB receptor) agonist baclofen modulates reinforcement learning and reduces craving in patients with addiction spectrum disorders. However, in contrast to the well established role of dopamine in reinforcement learning, the mechanisms by which the GABAB receptor influences reinforcement learning in humans remain completely unknown. To further elucidate this issue, a cross-over, double-blind, placebo-controlled study was performed in healthy human subjects (N=15) to test the effects of baclofen (20 and 50mg p.o.) on probabilistic reinforcement learning. Outcomes were the feedback-induced P2 component of the event-related potential, the feedback-related negativity, and the P300 component of the event-related potential. Baclofen produced a reduction of P2 amplitude over the course of the experiment, but did not modulate the feedback-related negativity. Furthermore, there was a trend towards increased learning after baclofen administration relative to placebo over the course of the experiment. The present results extend previous theories of reinforcement learning, which focus on the importance of mesolimbic dopamine signaling, and indicate that stimulation of cortical GABAB receptors in a fronto-parietal network leads to better attentional allocation in reinforcement learning. This observation is a first step in our understanding of how baclofen may improve reinforcement learning in healthy subjects. Further studies with bigger sample sizes are needed to corroborate this conclusion and furthermore, test this effect in patients with addiction spectrum disorder. Copyright © 2014 Elsevier B.V. and ECNP. All rights reserved.

  8. Approaches to Machine Learning.

    Science.gov (United States)

    Langley, Pat; Carbonell, Jaime G.

    1984-01-01

    Reviews approaches to machine learning (development of techniques to automate acquisition of new information, skills, and ways of organizing existing information) in symbolic domains. Four categorical tasks addressed in machine learning literature are examined: learning from examples, learning search heuristics, learning by observation, and…

  9. Deep reinforcement learning for automated radiation adaptation in lung cancer.

    Science.gov (United States)

    Tseng, Huan-Hsin; Luo, Yi; Cui, Sunan; Chien, Jen-Tzung; Ten Haken, Randall K; Naqa, Issam El

    2017-12-01

    To investigate deep reinforcement learning (DRL) based on historical treatment plans for developing automated radiation adaptation protocols for non-small cell lung cancer (NSCLC) patients that aim to maximize tumor local control at reduced rates of radiation pneumonitis grade 2 (RP2). In a retrospective population of 114 NSCLC patients who received radiotherapy, a three-component neural network framework was developed for DRL of dose fractionation adaptation. Large-scale patient characteristics included clinical, genetic, and imaging radiomics features in addition to tumor and lung dosimetric variables. First, a generative adversarial network (GAN) was employed to learn patient population characteristics necessary for DRL training from a relatively limited sample size. Second, a radiotherapy artificial environment (RAE) was reconstructed by a deep neural network (DNN) utilizing both original and synthetic data (by GAN) to estimate the transition probabilities for adaptation of personalized radiotherapy patients' treatment courses. Third, a deep Q-network (DQN) was applied to the RAE for choosing the optimal dose in a response-adapted treatment setting. This multicomponent reinforcement learning approach was benchmarked against real clinical decisions that were applied in an adaptive dose escalation clinical protocol, in which 34 patients were treated based on avid PET signal in the tumor and constrained by a 17.2% normal tissue complication probability (NTCP) limit for RP2. The uncomplicated cure probability (P+) was used as a baseline reward function in the DRL. Taking our adaptive dose escalation protocol as a blueprint for the proposed DRL (GAN + RAE + DQN) architecture, we obtained an automated dose adaptation estimate for use at ∼2/3 of the way into the radiotherapy treatment course. By letting the DQN component freely control the estimated adaptive dose per fraction (ranging from 1 to 5 Gy), the DRL automatically favored dose

  10. Microstimulation of the human substantia nigra alters reinforcement learning.

    Science.gov (United States)

    Ramayya, Ashwin G; Misra, Amrit; Baltuch, Gordon H; Kahana, Michael J

    2014-05-14

    Animal studies have shown that substantia nigra (SN) dopaminergic (DA) neurons strengthen action-reward associations during reinforcement learning, but their role in human learning is not known. Here, we applied microstimulation in the SN of 11 patients undergoing deep brain stimulation surgery for the treatment of Parkinson's disease as they performed a two-alternative probability learning task in which rewards were contingent on stimuli, rather than actions. Subjects demonstrated decreased learning from reward trials that were accompanied by phasic SN microstimulation compared with reward trials without stimulation. Subjects who showed large decreases in learning also showed an increased bias toward repeating actions after stimulation trials; therefore, stimulation may have decreased learning by strengthening action-reward associations rather than stimulus-reward associations. Our findings build on previous studies implicating SN DA neurons in preferentially strengthening action-reward associations during reinforcement learning. Copyright © 2014 the authors 0270-6474/14/346887-09$15.00/0.

  11. Mastery Learning through Individualized Instruction: A Reinforcement Strategy

    Science.gov (United States)

    Sagy, John; Ravi, R.; Ananthasayanam, R.

    2009-01-01

    The present study attempts to gauge the effect of individualized instructional methods as a reinforcement strategy for mastery learning. Among various individualized instructional methods, the study focuses on PIM (Programmed Instructional Method) and CAIM (Computer Assisted Instruction Method). Mastery learning is a process where students achieve…

  12. Role of dopamine D2 receptors in human reinforcement learning.

    Science.gov (United States)

    Eisenegger, Christoph; Naef, Michael; Linssen, Anke; Clark, Luke; Gandamaneni, Praveen K; Müller, Ulrich; Robbins, Trevor W

    2014-09-01

    Influential neurocomputational models emphasize dopamine (DA) as an electrophysiological and neurochemical correlate of reinforcement learning. However, evidence of a specific causal role of DA receptors in learning has been less forthcoming, especially in humans. Here we combine, in a between-subjects design, administration of a high dose of the selective DA D2/3-receptor antagonist sulpiride with genetic analysis of the DA D2 receptor in a behavioral study of reinforcement learning in a sample of 78 healthy male volunteers. In contrast to predictions of prevailing models emphasizing DA's pivotal role in learning via prediction errors, we found that sulpiride did not disrupt learning, but rather induced profound impairments in choice performance. The disruption was selective for stimuli indicating reward, whereas loss avoidance performance was unaffected. Effects were driven by volunteers with higher serum levels of the drug, and in those with genetically determined lower density of striatal DA D2 receptors. This is the clearest demonstration to date for a causal modulatory role of the DA D2 receptor in choice performance that might be distinct from learning. Our findings challenge current reward prediction error models of reinforcement learning, and suggest that classical animal models emphasizing a role of postsynaptic DA D2 receptors in motivational aspects of reinforcement learning may apply to humans as well.

  13. Social Learning, Reinforcement and Crime: Evidence from Three European Cities

    Science.gov (United States)

    Tittle, Charles R.; Antonaccio, Olena; Botchkovar, Ekaterina

    2012-01-01

    This study reports a cross-cultural test of Social Learning Theory using direct measures of social learning constructs and focusing on the causal structure implied by the theory. Overall, the results strongly confirm the main thrust of the theory. Prior criminal reinforcement and current crime-favorable definitions are highly related in all three…

  14. Reinforcement Learning based Algorithm with Safety Handling and Risk Perception

    NARCIS (Netherlands)

    Shyamsundar, S.; Mannucci, T.; van Kampen, E.; Jin, Y; Kollias, S.

    2016-01-01

    Navigation in an unknown or uncertain environment is a challenging task for an autonomous agent. The agent is expected to behave independently and to learn the suitable action to take for a given situation. Reinforcement Learning could be used to help the agent adapt to an unknown environment and

  15. Multi-task reinforcement learning: shaping and feature selection

    NARCIS (Netherlands)

    Snel, M.; Whiteson, S.

    2011-01-01

    Shaping functions can be used in multi-task reinforcement learning (RL) to incorporate knowledge from previously experienced source tasks to speed up learning on a new target task. Earlier work has not clearly motivated choices for the shaping function. This paper discusses and empirically compares

  16. The Effects of Large Disturbances on On-Line Reinforcement Learning for a Walking Robot

    NARCIS (Netherlands)

    Schuitema, E.; Caarls, W.; Wisse, M.; Jonker, P.P.; Babuska, R.

    2010-01-01

    Reinforcement Learning is a promising paradigm for adding learning capabilities to humanoid robots. One of the difficulties of the real world is the presence of disturbances. In Reinforcement Learning, disturbances are typically dealt with stochastically. However, large and infrequent disturbances

  17. Generalization of value in reinforcement learning by humans.

    Science.gov (United States)

    Wimmer, G Elliott; Daw, Nathaniel D; Shohamy, Daphna

    2012-04-01

    Research in decision-making has focused on the role of dopamine and its striatal targets in guiding choices via learned stimulus-reward or stimulus-response associations, behavior that is well described by reinforcement learning theories. However, basic reinforcement learning is relatively limited in scope and does not explain how learning about stimulus regularities or relations may guide decision-making. A candidate mechanism for this type of learning comes from the domain of memory, which has highlighted a role for the hippocampus in learning of stimulus-stimulus relations, typically dissociated from the role of the striatum in stimulus-response learning. Here, we used functional magnetic resonance imaging and computational model-based analyses to examine the joint contributions of these mechanisms to reinforcement learning. Humans performed a reinforcement learning task with added relational structure, modeled after tasks used to isolate hippocampal contributions to memory. On each trial participants chose one of four options, but the reward probabilities for pairs of options were correlated across trials. This (uninstructed) relationship between pairs of options potentially enabled an observer to learn about option values based on experience with the other options and to generalize across them. We observed blood oxygen level-dependent (BOLD) activity related to learning in the striatum and also in the hippocampus. By comparing a basic reinforcement learning model to one augmented to allow feedback to generalize between correlated options, we tested whether choice behavior and BOLD activity were influenced by the opportunity to generalize across correlated options. Although such generalization goes beyond standard computational accounts of reinforcement learning and striatal BOLD, both choices and striatal BOLD activity were better explained by the augmented model. Consistent with the hypothesized role for the hippocampus in this generalization, functional
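
    The model comparison described above can be sketched by augmenting a standard prediction-error update so that feedback for a chosen option also updates its correlated partner; the pairing is assumed positively correlated here, the generalization weight kappa is a free parameter, and setting kappa to zero recovers the basic model.

        def update_with_generalization(q, chosen, partner, reward, alpha=0.3, kappa=0.5):
            pe = reward - q[chosen]
            q[chosen] += alpha * pe              # standard reinforcement-learning update
            q[partner] += alpha * kappa * pe     # feedback generalizes to the correlated option
            return pe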

  18. Social Cognition as Reinforcement Learning: Feedback Modulates Emotion Inference.

    Science.gov (United States)

    Zaki, Jamil; Kallman, Seth; Wimmer, G Elliott; Ochsner, Kevin; Shohamy, Daphna

    2016-09-01

    Neuroscientific studies of social cognition typically employ paradigms in which perceivers draw single-shot inferences about the internal states of strangers. Real-world social inference features much different parameters: People often encounter and learn about particular social targets (e.g., friends) over time and receive feedback about whether their inferences are correct or incorrect. Here, we examined this process and, more broadly, the intersection between social cognition and reinforcement learning. Perceivers were scanned using fMRI while repeatedly encountering three social targets who produced conflicting visual and verbal emotional cues. Perceivers guessed how targets felt and received feedback about whether they had guessed correctly. Visual cues reliably predicted one target's emotion, verbal cues predicted a second target's emotion, and neither reliably predicted the third target's emotion. Perceivers successfully used this information to update their judgments over time. Furthermore, trial-by-trial learning signals-estimated using two reinforcement learning models-tracked activity in ventral striatum and ventromedial pFC, structures associated with reinforcement learning, and regions associated with updating social impressions, including TPJ. These data suggest that learning about others' emotions, like other forms of feedback learning, relies on domain-general reinforcement mechanisms as well as domain-specific social information processing.

  19. Can model-free reinforcement learning explain deontological moral judgments?

    Science.gov (United States)

    Ayars, Alisabeth

    2016-05-01

    Dual-systems frameworks propose that moral judgments are derived from both an immediate emotional response and controlled/rational cognition. Recently Cushman (2013) proposed a new dual-system theory based on model-free and model-based reinforcement learning. Model-free learning attaches values to actions based on their history of reward and punishment, and explains some deontological, non-utilitarian judgments. Model-based learning involves the construction of a causal model of the world and allows for far-sighted planning; this form of learning fits well with utilitarian considerations that seek to maximize certain kinds of outcomes. I present three concerns regarding the use of model-free reinforcement learning to explain deontological moral judgment. First, many actions that humans find aversive from model-free learning are not judged to be morally wrong. Moral judgment must require something in addition to model-free learning. Second, there is a dearth of evidence for central predictions of the reinforcement account, e.g., that people with different reinforcement histories will, all else equal, make different moral judgments. Finally, to account for the effect of intention within the framework requires certain assumptions which lack support. These challenges are reasonable foci for future empirical/theoretical work on the model-free/model-based framework. Copyright © 2016 Elsevier B.V. All rights reserved.

  20. Reinforcement learning in depression: A review of computational research.

    Science.gov (United States)

    Chen, Chong; Takahashi, Taiki; Nakagawa, Shin; Inoue, Takeshi; Kusumi, Ichiro

    2015-08-01

    Despite being considered primarily a mood disorder, major depressive disorder (MDD) is characterized by cognitive and decision-making deficits. Recent research has employed computational models of reinforcement learning (RL) to address these deficits. The computational approach has the advantage of making explicit predictions about learning and behavior, specifying the process parameters of RL, differentiating between model-free and model-based RL, and enabling computational model-based functional magnetic resonance imaging and electroencephalography. With these merits, computational psychiatry has emerged as a field, and here we review specific studies that focused on MDD. Considerable evidence suggests that MDD is associated with impaired brain signals of reward prediction error and expected value ('wanting'), decreased reward sensitivity ('liking') and/or learning (be it model-free or model-based), etc., although the causality remains unclear. These parameters may serve as valuable intermediate phenotypes of MDD, linking general clinical symptoms to underlying molecular dysfunctions. We believe future computational research at clinical, systems, and cellular/molecular/genetic levels will propel us toward a better understanding of the disease. Copyright © 2015 Elsevier Ltd. All rights reserved.

  1. Reinforcement learning signal predicts social conformity.

    NARCIS (Netherlands)

    Klucharev, V.; Hytonen, K.A.; Rijpkema, M.J.P.; Smidts, A.; Fernandez, G.S.E.

    2009-01-01

    We often change our decisions and judgments to conform with normative group behavior. However, the neural mechanisms of social conformity remain unclear. Here we show, using functional magnetic resonance imaging, that conformity is based on mechanisms that comply with principles of reinforcement

  2. Residential Demand Response of Thermostatically Controlled Loads Using Batch Reinforcement Learning

    NARCIS (Netherlands)

    Ruelens, F; Claessens, BJ; Vandael, S; De Schutter, B.H.K.; Babuska, R.; Belmans, R

    2017-01-01

    Driven by recent advances in batch Reinforcement Learning (RL), this paper contributes to the application of batch RL to demand response. In contrast to conventional model-based approaches, batch RL techniques do not require a system identification step, making them more suitable for a large-scale
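
    Batch RL of the kind referred to here is typified by fitted Q-iteration: repeatedly regress Q onto bootstrapped targets computed from a fixed batch of transitions. The sketch below uses a linear function approximator and a placeholder feature map; all names are illustrative.

        import numpy as np

        def fitted_q_iteration(transitions, actions, features, gamma=0.95, n_iters=50):
            # transitions: list of (state, action, reward, next_state); features(s, a) -> 1-D array.
            w = np.zeros_like(features(transitions[0][0], actions[0]), dtype=float)
            for _ in range(n_iters):
                X, y = [], []
                for s, a, r, s2 in transitions:
                    q_next = max(features(s2, a2) @ w for a2 in actions)
                    X.append(features(s, a))
                    y.append(r + gamma * q_next)        # bootstrapped Bellman target
                # Refit the Q-function on the whole batch at once (no system identification step).
                w, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
            return w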

  3. Critical factors in the empirical performance of temporal difference and evolutionary methods for reinforcement learning

    NARCIS (Netherlands)

    Whiteson, S.; Taylor, M.E.; Stone, P.

    2010-01-01

    Temporal difference and evolutionary methods are two of the most common approaches to solving reinforcement learning problems. However, there is little consensus on their relative merits and there have been few empirical studies that directly compare their performance. This article aims to address

  4. Adaptive Trajectory Tracking Control using Reinforcement Learning for Quadrotor

    Directory of Open Access Journals (Sweden)

    Wenjie Lou

    2016-02-01

    Inaccurate system parameters and unpredicted external disturbances affect the performance of non-linear controllers. In this paper, a new adaptive control algorithm under the reinforcement learning framework is proposed to stabilize a quadrotor helicopter. Based on a command-filtered non-linear control algorithm, adaptive elements are added and learned by policy-search methods. To predict the inaccurate system parameters, a new kernel-based regression learning method is provided. In addition, Policy learning by Weighting Exploration with the Returns (PoWER) and Return Weighted Regression (RWR) are utilized to learn the appropriate parameters for adaptive elements in order to cancel the effect of external disturbance. Furthermore, numerical simulations under several conditions are performed, and the ability of adaptive trajectory-tracking control with reinforcement learning is demonstrated.
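
    The policy-search element can be sketched as a reward-weighted parameter update in the spirit of PoWER/RWR: exploration noise added to the policy parameters is averaged with (non-negative) rollout returns as weights. Shapes and names are illustrative, not the paper's implementation.

        import numpy as np

        def reward_weighted_update(theta, noises, returns):
            # theta: parameter vector; noises: (n_rollouts, dim) exploration offsets;
            # returns: (n_rollouts,) non-negative rollout returns.
            w = np.asarray(returns, dtype=float)
            eps = np.asarray(noises, dtype=float)
            return theta + (eps * w[:, None]).sum(axis=0) / (w.sum() + 1e-8)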

  5. The Computational Development of Reinforcement Learning during Adolescence.

    Directory of Open Access Journals (Sweden)

    Stefano Palminteri

    2016-06-01

    Adolescence is a period of life characterised by changes in learning and decision-making. Learning and decision-making do not rely on a unitary system, but instead require the coordination of different cognitive processes that can be mathematically formalised as dissociable computational modules. Here, we aimed to trace the developmental time-course of the computational modules responsible for learning from reward or punishment, and learning from counterfactual feedback. Adolescents and adults carried out a novel reinforcement learning paradigm in which participants learned the association between cues and probabilistic outcomes, where the outcomes differed in valence (reward versus punishment) and feedback was either partial or complete (either the outcome of the chosen option only, or the outcomes of both the chosen and unchosen option, were displayed). Computational strategies changed during development: whereas adolescents' behaviour was better explained by a basic reinforcement learning algorithm, adults' behaviour integrated increasingly complex computational features, namely a counterfactual learning module (enabling enhanced performance in the presence of complete feedback) and a value contextualisation module (enabling symmetrical reward and punishment learning). Unlike adults, adolescent performance did not benefit from counterfactual (complete) feedback. In addition, while adults learned symmetrically from both reward and punishment, adolescents learned from reward but were less likely to learn from punishment. This tendency to rely on rewards and not to consider alternative consequences of actions might contribute to our understanding of decision-making in adolescence.

  6. The Computational Development of Reinforcement Learning during Adolescence.

    Science.gov (United States)

    Palminteri, Stefano; Kilford, Emma J; Coricelli, Giorgio; Blakemore, Sarah-Jayne

    2016-06-01

    Adolescence is a period of life characterised by changes in learning and decision-making. Learning and decision-making do not rely on a unitary system, but instead require the coordination of different cognitive processes that can be mathematically formalised as dissociable computational modules. Here, we aimed to trace the developmental time-course of the computational modules responsible for learning from reward or punishment, and learning from counterfactual feedback. Adolescents and adults carried out a novel reinforcement learning paradigm in which participants learned the association between cues and probabilistic outcomes, where the outcomes differed in valence (reward versus punishment) and feedback was either partial or complete (either the outcome of the chosen option only, or the outcomes of both the chosen and unchosen option, were displayed). Computational strategies changed during development: whereas adolescents' behaviour was better explained by a basic reinforcement learning algorithm, adults' behaviour integrated increasingly complex computational features, namely a counterfactual learning module (enabling enhanced performance in the presence of complete feedback) and a value contextualisation module (enabling symmetrical reward and punishment learning). Unlike adults, adolescent performance did not benefit from counterfactual (complete) feedback. In addition, while adults learned symmetrically from both reward and punishment, adolescents learned from reward but were less likely to learn from punishment. This tendency to rely on rewards and not to consider alternative consequences of actions might contribute to our understanding of decision-making in adolescence.
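
    The counterfactual learning module contrasted across age groups above can be sketched as an extra update applied to the unchosen option on complete-feedback trials; parameter names are illustrative, and setting alpha_cf to zero recovers the basic learner that better described adolescents.

        def complete_feedback_update(q, chosen, unchosen, r_chosen, r_unchosen, alpha=0.3, alpha_cf=0.2):
            q[chosen] += alpha * (r_chosen - q[chosen])             # factual update
            q[unchosen] += alpha_cf * (r_unchosen - q[unchosen])    # counterfactual update (complete feedback only)
            return q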

  7. Comparison of two novel approaches to model fibre reinforced concrete

    NARCIS (Netherlands)

    Radtke, F.K.F.; Simone, A.; Sluys, L.J.

    2009-01-01

    We present two approaches to model fibre reinforced concrete. In both approaches, discrete fibre distributions and the behaviour of the fibre-matrix interface are explicitly considered. One approach employs the reaction forces from fibre to matrix while the other is based on the partition of unity

  8. Stress modulates reinforcement learning in younger and older adults.

    Science.gov (United States)

    Lighthall, Nichole R; Gorlick, Marissa A; Schoeke, Andrej; Frank, Michael J; Mather, Mara

    2013-03-01

    Animal research and human neuroimaging studies indicate that stress increases dopamine levels in brain regions involved in reward processing, and stress also appears to increase the attractiveness of addictive drugs. The current study tested the hypothesis that stress increases reward salience, leading to more effective learning about positive than negative outcomes in a probabilistic selection task. Changes to dopamine pathways with age raise the question of whether stress effects on incentive-based learning differ by age. Thus, the present study also examined whether effects of stress on reinforcement learning differed for younger (age 18-34) and older participants (age 65-85). Cold pressor stress was administered to half of the participants in each age group, and salivary cortisol levels were used to confirm biophysiological response to cold stress. After the manipulation, participants completed a probabilistic learning task involving positive and negative feedback. In both younger and older adults, stress enhanced learning about cues that predicted positive outcomes. In addition, during the initial learning phase, stress diminished sensitivity to recent feedback across age groups. These results indicate that stress affects reinforcement learning in both younger and older adults and suggests that stress exerts different effects on specific components of reinforcement learning depending on their neural underpinnings.
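
    Questions of this kind (does stress favour learning from positive over negative outcomes?) are commonly addressed by fitting a model with separate learning rates for positive and negative prediction errors; the sketch below is such a model with illustrative parameters, not the study's exact fit.

        def asymmetric_update(q, chosen, reward, alpha_pos=0.4, alpha_neg=0.2):
            pe = reward - q[chosen]
            alpha = alpha_pos if pe > 0 else alpha_neg   # stress effects can be probed via this asymmetry
            q[chosen] += alpha * pe
            return pe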

  9. Reinforcement learning in multidimensional environments relies on attention mechanisms.

    Science.gov (United States)

    Niv, Yael; Daniel, Reka; Geana, Andra; Gershman, Samuel J; Leong, Yuan Chang; Radulescu, Angela; Wilson, Robert C

    2015-05-27

    In recent years, ideas from the computational field of reinforcement learning have revolutionized the study of learning in the brain, famously providing new, precise theories of how dopamine affects learning in the basal ganglia. However, reinforcement learning algorithms are notorious for not scaling well to multidimensional environments, as is required for real-world learning. We hypothesized that the brain naturally reduces the dimensionality of real-world problems to only those dimensions that are relevant to predicting reward, and conducted an experiment to assess by what algorithms and with what neural mechanisms this "representation learning" process is realized in humans. Our results suggest that a bilateral attentional control network comprising the intraparietal sulcus, precuneus, and dorsolateral prefrontal cortex is involved in selecting what dimensions are relevant to the task at hand, effectively updating the task representation through trial and error. In this way, cortical attention mechanisms interact with learning in the basal ganglia to solve the "curse of dimensionality" in reinforcement learning. Copyright © 2015 the authors 0270-6474/15/358145-13$15.00/0.

  10. Punishment insensitivity and impaired reinforcement learning in preschoolers

    Science.gov (United States)

    Briggs-Gowan, Margaret J.; Nichols, Sara R.; Voss, Joel; Zobel, Elvira; Carter, Alice S.; McCarthy, Kimberly J.; Pine, Daniel S.; Blair, James; Wakschlag, Lauren S.

    2013-01-01

    Background Youth and adults with psychopathic traits display disrupted reinforcement learning. Advances in measurement now enable examination of this association in preschoolers. The current study examines relations between reinforcement learning in preschoolers and parent ratings of reduced responsiveness to socialization, conceptualized as a developmental vulnerability to psychopathic traits. Methods 157 preschoolers (mean age 4.7 ±0.8 years) participated in a substudy that was embedded within a larger project. Children completed the “Stars-in-Jars” task, which involved learning to select rewarded jars and avoid punished jars. Maternal report of responsiveness to socialization was assessed with the Punishment Insensitivity and Low Concern for Others scales of the Multidimensional Assessment of Preschool Disruptive Behavior (MAP-DB). Results Punishment Insensitivity, but not Low Concern for Others, was significantly associated with reinforcement learning in multivariate models that accounted for age and sex. Specifically, higher Punishment Insensitivity was associated with significantly lower overall performance and more errors on punished trials (“passive avoidance”). Conclusions Impairments in reinforcement learning manifest in preschoolers who are high in maternal ratings of Punishment Insensitivity. If replicated, these findings may help to pinpoint the neurodevelopmental antecedents of psychopathic tendencies and suggest novel intervention targets beginning in early childhood. PMID:24033313

  11. Learning the specific quality of taste reinforcement in larval Drosophila.

    Science.gov (United States)

    Schleyer, Michael; Miura, Daisuke; Tanimura, Teiichi; Gerber, Bertram

    2015-01-27

    The only property of reinforcement that insects are commonly thought to learn about is its value. We show that larval Drosophila not only remember the value of reinforcement (How much?), but also its quality (What?). This is demonstrated both within the appetitive domain by using sugar vs amino acid as different reward qualities, and within the aversive domain by using bitter vs high-concentration salt as different qualities of punishment. From the available literature, such nuanced memories for the quality of reinforcement are unexpected and pose a challenge to present models of how insect memory is organized. Given that animals as simple as larval Drosophila, endowed with but 10,000 neurons, operate with both reinforcement value and quality, we suggest that both are fundamental aspects of mnemonic processing, in any brain.

  12. A Reinforcement Learning Framework for Spiking Networks with Dynamic Synapses

    Directory of Open Access Journals (Sweden)

    Karim El-Laithy

    2011-01-01

    An integration of both the Hebbian-based and reinforcement learning (RL) rules is presented for dynamic synapses. The proposed framework permits the Hebbian rule to update the hidden synaptic model parameters regulating the synaptic response rather than the synaptic weights. This is performed using both the value and the sign of the temporal difference in the reward signal after each trial. Applying this framework, a spiking network with spike-timing-dependent synapses is tested to learn the exclusive-OR computation on a temporally coded basis. Reward values are calculated with the distance between the output spike train of the network and a reference target one. Results show that the network is able to capture the required dynamics and that the proposed framework can indeed reveal an integrated version of Hebbian and RL. The proposed framework is tractable and less computationally expensive. The framework is applicable to a wide class of synaptic models and is not restricted to the used neural representation. This generality, along with the reported results, supports adopting the introduced approach to benefit from the biologically plausible synaptic models in a wide range of intuitive signal processing.

  13. Antipsychotic dose modulates behavioral and neural responses to feedback during reinforcement learning in schizophrenia.

    Science.gov (United States)

    Insel, Catherine; Reinen, Jenna; Weber, Jochen; Wager, Tor D; Jarskog, L Fredrik; Shohamy, Daphna; Smith, Edward E

    2014-03-01

    Schizophrenia is characterized by an abnormal dopamine system, and dopamine blockade is the primary mechanism of antipsychotic treatment. Consistent with the known role of dopamine in reward processing, prior research has demonstrated that patients with schizophrenia exhibit impairments in reward-based learning. However, it remains unknown how treatment with antipsychotic medication impacts the behavioral and neural signatures of reinforcement learning in schizophrenia. The goal of this study was to examine whether antipsychotic medication modulates behavioral and neural responses to prediction error coding during reinforcement learning. Patients with schizophrenia completed a reinforcement learning task while undergoing functional magnetic resonance imaging. The task consisted of two separate conditions in which participants accumulated monetary gain or avoided monetary loss. Behavioral results indicated that antipsychotic medication dose was associated with altered behavioral approaches to learning, such that patients taking higher doses of medication showed increased sensitivity to negative reinforcement. Higher doses of antipsychotic medication were also associated with higher learning rates (LRs), suggesting that medication enhanced sensitivity to trial-by-trial feedback. Neuroimaging data demonstrated that antipsychotic dose was related to differences in neural signatures of feedback prediction error during the loss condition. Specifically, patients taking higher doses of medication showed attenuated prediction error responses in the striatum and the medial prefrontal cortex. These findings indicate that antipsychotic medication treatment may influence motivational processes in patients with schizophrenia.

  14. Flow Navigation by Smart Microswimmers via Reinforcement Learning

    Science.gov (United States)

    Colabrese, Simona; Biferale, Luca; Celani, Antonio; Gustavsson, Kristian

    2017-11-01

    We have numerically modeled active particles which are able to acquire some limited knowledge of the fluid environment from simple mechanical cues and exert a control on their preferred steering direction. We show that those swimmers can learn effective strategies just by experience, using a reinforcement learning algorithm. As an example, we focus on smart gravitactic swimmers. These are active particles whose task is to reach the highest altitude within some time horizon, exploiting the underlying flow whenever possible. The reinforcement learning algorithm allows particles to learn effective strategies even in difficult situations when, in the absence of control, they would end up being trapped by flow structures. These strategies are highly nontrivial and cannot be easily guessed in advance. This work paves the way towards the engineering of smart microswimmers that solve difficult navigation problems. ERC AdG NewTURB 339032.

  15. Reinforcement Learning in Young Adults with Developmental Language Impairment

    Science.gov (United States)

    Lee, Joanna C.; Tomblin, J. Bruce

    2012-01-01

    The aim of the study was to examine reinforcement learning (RL) in young adults with developmental language impairment (DLI) within the context of a neurocomputational model of the basal ganglia-dopamine system (Frank, Seeberger, & O'Reilly, 2004). Two groups of young adults, one with DLI and the other without, were recruited. A probabilistic…

  16. Generalized domains for empirical evaluations in reinforcement learning

    NARCIS (Netherlands)

    Whiteson, S.; Tanner, B.; Taylor, M.E.; Stone, P.

    2009-01-01

    Many empirical results in reinforcement learning are based on a very small set of environments. These results often represent the best algorithm parameters that were found after an ad-hoc tuning or fitting process. We argue that presenting tuned scores from a small set of environments leads to

  17. Reinforcement Learning Explains Conditional Cooperation and Its Moody Cousin.

    Science.gov (United States)

    Ezaki, Takahiro; Horita, Yutaka; Takezawa, Masanori; Masuda, Naoki

    2016-07-01

    Direct reciprocity, or repeated interaction, is a main mechanism to sustain cooperation under social dilemmas involving two individuals. For larger groups and networks, which are probably more relevant to understanding and engineering our society, experiments employing repeated multiplayer social dilemma games have suggested that humans often show conditional cooperation behavior and its moody variant. Mechanisms underlying these behaviors largely remain unclear. Here we provide a proximate account for this behavior by showing that individuals adopting a type of reinforcement learning, called aspiration learning, phenomenologically behave as conditional cooperators. By definition, individuals are satisfied if and only if the obtained payoff is larger than a fixed aspiration level. They reinforce actions that have resulted in satisfactory outcomes and anti-reinforce those yielding unsatisfactory outcomes. The results obtained in the present study are general in that they explain extant experimental results obtained for both so-called moody and non-moody conditional cooperation, prisoner's dilemma and public goods games, and well-mixed groups and networks. Different from the previous theory, individuals are assumed to have no access to information about what other individuals are doing such that they cannot explicitly use conditional cooperation rules. In this sense, myopic aspiration learning in which the unconditional propensity of cooperation is modulated in every discrete time step explains conditional behavior of humans. Aspiration learners showing (moody) conditional cooperation obeyed a noisy GRIM-like strategy. This is different from Pavlov, a reinforcement learning strategy that promotes mutual cooperation in two-player situations.
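
    As a rough illustration of the aspiration-learning rule summarized above, the sketch below updates a single unconditional cooperation propensity from one round's payoff. The update rate and the propensity parameterization are placeholders, not values from the study.

```python
def aspiration_update(p_cooperate, action, payoff, aspiration, lr=0.2):
    """Update the unconditional cooperation propensity after one round.
    The action just taken ('C' or 'D') is reinforced if the payoff exceeded a
    fixed aspiration level and anti-reinforced otherwise; no information about
    the co-players' moves is used."""
    satisfied = payoff > aspiration
    # Push cooperation up when cooperating satisfied us or defecting did not.
    reinforce_cooperation = (action == 'C') == satisfied
    if reinforce_cooperation:
        return p_cooperate + lr * (1.0 - p_cooperate)
    return p_cooperate - lr * p_cooperate
```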

  18. Perceptual learning rules based on reinforcers and attention

    NARCIS (Netherlands)

    Roelfsema, Pieter R.; van Ooyen, Arjen; Watanabe, Takeo

    2010-01-01

    How does the brain learn those visual features that are relevant for behavior? In this article, we focus on two factors that guide plasticity of visual representations. First, reinforcers cause the global release of diffusive neuromodulatory signals that gate plasticity. Second, attentional feedback

  19. Traffic light control by multiagent reinforcement learning systems

    NARCIS (Netherlands)

    Bakker, B.; Whiteson, S.; Kester, L.; Groen, F.C.A.; Babuška, R.; Groen, F.C.A.

    2010-01-01

    Traffic light control is one of the main means of controlling road traffic. Improving traffic control is important because it can lead to higher traffic throughput and reduced traffic congestion. This chapter describes multiagent reinforcement learning techniques for automatic optimization of

  20. Traffic Light Control by Multiagent Reinforcement Learning Systems

    NARCIS (Netherlands)

    Bakker, B.; Whiteson, S.; Kester, L.J.H.M.; Groen, F.C.A.

    2010-01-01

    Traffic light control is one of the main means of controlling road traffic. Improving traffic control is important because it can lead to higher traffic throughput and reduced traffic congestion. This chapter describes multiagent reinforcement learning techniques for automatic optimization of

  1. Perceptual learning rules based on reinforcers and attention.

    NARCIS (Netherlands)

    Roelfsema, P.R.; van Ooyen, A.; Watanabe, T.

    2010-01-01

    How does the brain learn those visual features that are relevant for behavior? In this article, we focus on two factors that guide plasticity of visual representations. First, reinforcers cause the global release of diffusive neuromodulatory signals that gate plasticity. Second, attentional feedback

  2. Reinforcement Learning Explains Conditional Cooperation and Its Moody Cousin.

    Directory of Open Access Journals (Sweden)

    Takahiro Ezaki

    2016-07-01

    Full Text Available Direct reciprocity, or repeated interaction, is a main mechanism to sustain cooperation under social dilemmas involving two individuals. For larger groups and networks, which are probably more relevant to understanding and engineering our society, experiments employing repeated multiplayer social dilemma games have suggested that humans often show conditional cooperation behavior and its moody variant. Mechanisms underlying these behaviors largely remain unclear. Here we provide a proximate account for this behavior by showing that individuals adopting a type of reinforcement learning, called aspiration learning, phenomenologically behave as conditional cooperators. By definition, individuals are satisfied if and only if the obtained payoff is larger than a fixed aspiration level. They reinforce actions that have resulted in satisfactory outcomes and anti-reinforce those yielding unsatisfactory outcomes. The results obtained in the present study are general in that they explain extant experimental results obtained for both so-called moody and non-moody conditional cooperation, prisoner's dilemma and public goods games, and well-mixed groups and networks. Different from the previous theory, individuals are assumed to have no access to information about what other individuals are doing such that they cannot explicitly use conditional cooperation rules. In this sense, myopic aspiration learning in which the unconditional propensity of cooperation is modulated in every discrete time step explains conditional behavior of humans. Aspiration learners showing (moody) conditional cooperation obeyed a noisy GRIM-like strategy. This is different from Pavlov, a reinforcement learning strategy that promotes mutual cooperation in two-player situations.

  3. Neuroevolutionary reinforcement learning for generalized control of simulated helicopters

    NARCIS (Netherlands)

    Koppejan, R.; Whiteson, S.

    2011-01-01

    This article presents an extended case study in the application of neuroevolution to generalized simulated helicopter hovering, an important challenge problem for reinforcement learning. While neuroevolution is well suited to coping with the domain’s complex transition dynamics and high-dimensional

  4. Switching between different state representations in reinforcement learning

    NARCIS (Netherlands)

    van Seijen, H.; Bakker, B.; Kester, L.; Gammerman, A.

    2008-01-01

    This paper proposes a reinforcement learning architecture containing multiple "experts", each of which is a specialist in a different region in the overall state space. The central idea is that the different experts use qualitatively different (but sufficiently Markov) state representations, each

  5. Deep Reinforcement Learning using Capsules in Advanced Game Environments

    OpenAIRE

    Andersen, Per-Arne

    2018-01-01

    Reinforcement Learning (RL) is a research area that has blossomed tremendously in recent years and has shown remarkable potential for artificial intelligence based opponents in computer games. This success is primarily due to vast capabilities of Convolutional Neural Networks (ConvNet), enabling algorithms to extract useful information from noisy environments. Capsule Network (CapsNet) is a recent introduction to the Deep Learning algorithm group and has only barely begun to be explored. The ...

  6. Hierarchical object detection with deep reinforcement learning

    OpenAIRE

    Bellver, Míriam; Giró Nieto, Xavier; Marqués Acosta, Fernando; Torres Viñals, Jordi

    2017-01-01

    Deep learning and image processing are two areas of great interest to academics and industry professionals alike. The areas of application of these two disciplines range widely, encompassing fields such as medicine, robotics, and security and surveillance. The aim of this book, ‘Deep Learning for Image Processing Applications’, is to offer concepts from these two areas in the same platform, and the book brings together the shared ideas of professionals from academia and research about pro...

  7. Reinforcement Learning Applications to Combat Identification

    Science.gov (United States)

    2017-03-01

    Only fragments of the abstract survive from the report's documentation page. The recoverable text indicates that the Soar CID application is built from a set of files that must work in concert, and that the data were tested against a series of exploration strategies using Q-learning in order to explore the possible benefits of one learning type over another.

  8. Optimization of anemia treatment in hemodialysis patients via reinforcement learning.

    Science.gov (United States)

    Escandell-Montero, Pablo; Chermisi, Milena; Martínez-Martínez, José M; Gómez-Sanchis, Juan; Barbieri, Carlo; Soria-Olivas, Emilio; Mari, Flavio; Vila-Francés, Joan; Stopper, Andrea; Gatti, Emanuele; Martín-Guerrero, José D

    2014-09-01

    Anemia is a frequent comorbidity in hemodialysis patients that can be successfully treated by administering erythropoiesis-stimulating agents (ESAs). ESA dosing is currently based on clinical protocols that often do not account for the high inter- and intra-individual variability in the patient's response. As a result, the hemoglobin level of some patients oscillates around the target range, which is associated with multiple risks and side-effects. This work proposes a methodology based on reinforcement learning (RL) to optimize ESA therapy. RL is a data-driven approach for solving sequential decision-making problems that are formulated as Markov decision processes (MDPs). Computing optimal drug administration strategies for chronic diseases is a sequential decision-making problem in which the goal is to find the best sequence of drug doses. MDPs are particularly suitable for modeling these problems due to their ability to capture the uncertainty associated with the outcome of the treatment and the stochastic nature of the underlying process. The RL algorithm employed in the proposed methodology is fitted Q iteration (FQI), which stands out for its ability to make an efficient use of data. The experiments reported here are based on a computational model that describes the effect of ESAs on the hemoglobin level. The performance of the proposed method is evaluated and compared with the well-known Q-learning algorithm and with a standard protocol. Simulation results show that the performance of Q-learning is substantially lower than that of FQI and the protocol. When comparing FQI and the protocol, FQI achieves an increment of 27.6% in the proportion of patients that are within the targeted range of hemoglobin during the period of treatment. In addition, the quantity of drug needed is reduced by 5.13%, which indicates a more efficient use of ESAs. Although prospective validation is required, promising results demonstrate the potential of RL to become an alternative to current
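
    As a rough sketch of the fitted Q-iteration procedure named above, the snippet below runs the generic batch algorithm on (state, action, reward, next state) tuples with an extra-trees regressor. The regressor choice and the encoding of anemia-treatment states and ESA doses are assumptions for illustration, not details from the paper.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_q_iteration(transitions, actions, gamma=0.95, n_iters=30):
    """Generic FQI on a batch of (state, action, reward, next_state) tuples.
    States are fixed-length numeric vectors; actions are scalars (e.g. doses)."""
    s = np.array([t[0] for t in transitions], dtype=float)
    a = np.array([[t[1]] for t in transitions], dtype=float)
    r = np.array([t[2] for t in transitions], dtype=float)
    s_next = np.array([t[3] for t in transitions], dtype=float)
    X = np.hstack([s, a])

    # Q_1 approximates the immediate reward.
    model = ExtraTreesRegressor(n_estimators=50).fit(X, r)
    for _ in range(n_iters):
        # Bellman backup: r + gamma * max_a' Q(s', a') over the discrete actions.
        q_next = np.column_stack([
            model.predict(np.hstack([s_next, np.full((len(s_next), 1), act)]))
            for act in actions])
        targets = r + gamma * q_next.max(axis=1)
        model = ExtraTreesRegressor(n_estimators=50).fit(X, targets)
    return model  # greedy policy: pick the action maximizing model.predict
```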

  9. A Neuro-Control Design Based on Fuzzy Reinforcement Learning

    DEFF Research Database (Denmark)

    Katebi, S.D.; Blanke, M.

    This paper describes a neuro-control fuzzy critic design procedure based on reinforcement learning. An important component of the proposed intelligent control configuration is the fuzzy credit assignment unit which acts as a critic, and through fuzzy implications provides adjustment mechanisms...... ones instruct the neuro-control unit to adjust its weights and are simultaneously stored in the memory unit during the training phase. In response to the internal reinforcement signal (set point threshold deviation), the stored information is retrieved by the action applier unit and utilized for re...

  10. Mobile robots exploration through cnn-based reinforcement learning.

    Science.gov (United States)

    Tai, Lei; Liu, Ming

    2016-01-01

    Exploration in an unknown environment is an elemental application for mobile robots. In this paper, we outline a reinforcement learning method aimed at solving the exploration problem in a corridor environment. The learning model takes the depth image from an RGB-D sensor as its only input. The feature representation of the depth image is extracted through a pre-trained convolutional neural network model. Building on the recent success of the deep Q-network in artificial intelligence, the robot controller achieves exploration and obstacle avoidance in several different simulated environments. This is the first time that reinforcement learning has been used to build an exploration strategy for mobile robots from raw sensor information.

  11. Working memory contributions to reinforcement learning impairments in schizophrenia.

    Science.gov (United States)

    Collins, Anne G E; Brown, Jaime K; Gold, James M; Waltz, James A; Frank, Michael J

    2014-10-08

    Previous research has shown that patients with schizophrenia are impaired in reinforcement learning tasks. However, behavioral learning curves in such tasks originate from the interaction of multiple neural processes, including the basal ganglia- and dopamine-dependent reinforcement learning (RL) system, but also prefrontal cortex-dependent cognitive strategies involving working memory (WM). Thus, it is unclear which specific system induces impairments in schizophrenia. We recently developed a task and computational model allowing us to separately assess the roles of RL (slow, cumulative learning) mechanisms versus WM (fast but capacity-limited) mechanisms in healthy adult human subjects. Here, we used this task to assess patients' specific sources of impairments in learning. In 15 separate blocks, subjects learned to pick one of three actions for stimuli. The number of stimuli to learn in each block varied from two to six, allowing us to separate influences of capacity-limited WM from the incremental RL system. As expected, both patients (n = 49) and healthy controls (n = 36) showed effects of set size and delay between stimulus repetitions, confirming the presence of working memory effects. Patients performed significantly worse than controls overall, but computational model fits and behavioral analyses indicate that these deficits could be entirely accounted for by changes in WM parameters (capacity and reliability), whereas RL processes were spared. These results suggest that the working memory system contributes strongly to learning impairments in schizophrenia. Copyright © 2014 the authors 0270-6474/14/3413747-10$15.00/0.

  12. Towards practical multiscale approach for analysis of reinforced concrete structures

    Science.gov (United States)

    Moyeda, Arturo; Fish, Jacob

    2017-12-01

    We present a novel multiscale approach for analysis of reinforced concrete structural elements that overcomes two major hurdles in utilization of multiscale technologies in practice: (1) coupling between material and structural scales due to consideration of large representative volume elements (RVE), and (2) computational complexity of solving complex nonlinear multiscale problems. The former is accomplished using a variant of computational continua framework that accounts for sizeable reinforced concrete RVEs by adjusting the location of quadrature points. The latter is accomplished by means of reduced order homogenization customized for structural elements. The proposed multiscale approach has been verified against direct numerical simulations and validated against experimental results.

  13. Basic protocols in quantum reinforcement learning with superconducting circuits.

    Science.gov (United States)

    Lamata, Lucas

    2017-05-09

    Superconducting circuit technologies have recently achieved quantum protocols involving closed feedback loops. Quantum artificial intelligence and quantum machine learning are emerging fields inside quantum technologies which may enable quantum devices to acquire information from the outer world and improve themselves via a learning process. Here we propose the implementation of basic protocols in quantum reinforcement learning, with superconducting circuits employing feedback-loop control. We introduce diverse scenarios for proof-of-principle experiments with state-of-the-art superconducting circuit technologies and analyze their feasibility in the presence of imperfections. The field of quantum artificial intelligence implemented with superconducting circuits paves the way for enhanced quantum control and quantum computation protocols.

  14. Amygdala and Ventral Striatum Make Distinct Contributions to Reinforcement Learning.

    Science.gov (United States)

    Costa, Vincent D; Dal Monte, Olga; Lucas, Daniel R; Murray, Elisabeth A; Averbeck, Bruno B

    2016-10-19

    Reinforcement learning (RL) theories posit that dopaminergic signals are integrated within the striatum to associate choices with outcomes. Often overlooked is that the amygdala also receives dopaminergic input and is involved in Pavlovian processes that influence choice behavior. To determine the relative contributions of the ventral striatum (VS) and amygdala to appetitive RL, we tested rhesus macaques with VS or amygdala lesions on deterministic and stochastic versions of a two-arm bandit reversal learning task. When learning was characterized with an RL model relative to controls, amygdala lesions caused general decreases in learning from positive feedback and choice consistency. By comparison, VS lesions only affected learning in the stochastic task. Moreover, the VS lesions hastened the monkeys' choice reaction times, which emphasized a speed-accuracy trade-off that accounted for errors in deterministic learning. These results update standard accounts of RL by emphasizing distinct contributions of the amygdala and VS to RL. Published by Elsevier Inc.

  15. Learning to Predict Consequences as a Method of Knowledge Transfer in Reinforcement Learning.

    Science.gov (United States)

    Chalmers, Eric; Contreras, Edgar Bermudez; Robertson, Brandon; Luczak, Artur; Gruber, Aaron

    2017-04-17

    The reinforcement learning (RL) paradigm allows agents to solve tasks through trial-and-error learning. To be capable of efficient, long-term learning, RL agents should be able to apply knowledge gained in the past to new tasks they may encounter in the future. The ability to predict actions' consequences may facilitate such knowledge transfer. We consider here domains where an RL agent has access to two kinds of information: agent-centric information with constant semantics across tasks, and environment-centric information, which is necessary to solve the task, but with semantics that differ between tasks. For example, in robot navigation, environment-centric information may include the robot's geographic location, while agent-centric information may include sensor readings of various nearby obstacles. We propose that these situations provide an opportunity for a very natural style of knowledge transfer, in which the agent learns to predict actions' environmental consequences using agent-centric information. These predictions contain important information about the affordances and dangers present in a novel environment, and can effectively transfer knowledge from agent-centric to environment-centric learning systems. Using several example problems including spatial navigation and network routing, we show that our knowledge transfer approach can allow faster and lower cost learning than existing alternatives.

  16. Possibility of reinforcement learning based on event-related potential.

    Science.gov (United States)

    Yamagishi, Yuya; Tsubone, Tadashi; Wada, Yasuhiro

    2008-01-01

    We applied the event-related potential (ERP) to reinforcement signals that are equivalent to reward and punishment signals. We conducted an electroencephalogram (EEG) experiment in which volunteers identified the success or failure of a task. We confirmed that there were differences in the EEG depending on whether the task was successful or not and suggested that the ERP might be used as a reward for reinforcement learning. We used a support vector machine (SVM) for recognizing the P300. The feature vector for the SVM was composed of the average over each 50 ms window, for each of the six channels (C3, Cz, C4, P3, Pz, P4), over a total of 700 ms. We can suggest that reinforcement learning using the P300 can be performed accurately.
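
    The feature construction described above (mean amplitude per 50 ms window, six channels, 700 ms) can be written out directly. The sketch below assumes the EEG arrives as a (6, n_samples) array at an assumed sampling rate; both the array layout and the rate are illustrative, not details from the paper.

```python
import numpy as np
from sklearn.svm import SVC

def p300_feature_vector(eeg, fs=1000):
    """Mean amplitude of each successive 50 ms window over 700 ms for six
    channels (C3, Cz, C4, P3, Pz, P4): 6 x 14 = 84 features.
    `eeg` is assumed to be a (6, n_samples) array sampled at `fs` Hz."""
    win = int(0.05 * fs)                 # samples per 50 ms window
    n_win = int(0.7 * fs) // win         # 14 windows in 700 ms
    eeg = eeg[:, :n_win * win]
    return eeg.reshape(6, n_win, win).mean(axis=2).ravel()

# Hypothetical usage: classify success vs. failure trials with a linear SVM,
# the predicted label then serving as a reward/punishment signal for a learner.
# clf = SVC(kernel='linear').fit(np.vstack(train_features), train_labels)
# reward = 1 if clf.predict([p300_feature_vector(trial_eeg)])[0] == 1 else -1
```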

  17. Effective reinforcement learning following cerebellar damage requires a balance between exploration and motor noise.

    Science.gov (United States)

    Therrien, Amanda S; Wolpert, Daniel M; Bastian, Amy J

    2016-01-01

    Reinforcement and error-based processes are essential for motor learning, with the cerebellum thought to be required only for the error-based mechanism. Here we examined learning and retention of a reaching skill under both processes. Control subjects learned similarly from reinforcement and error-based feedback, but showed much better retention under reinforcement. To apply reinforcement to cerebellar patients, we developed a closed-loop reinforcement schedule in which task difficulty was controlled based on recent performance. This schedule produced substantial learning in cerebellar patients and controls. Cerebellar patients varied in their learning under reinforcement but fully retained what was learned. In contrast, they showed complete lack of retention in error-based learning. We developed a mechanistic model of the reinforcement task and found that learning depended on a balance between exploration variability and motor noise. While the cerebellar and control groups had similar exploration variability, the patients had greater motor noise and hence learned less. Our results suggest that cerebellar damage indirectly impairs reinforcement learning by increasing motor noise, but does not interfere with the reinforcement mechanism itself. Therefore, reinforcement can be used to learn and retain novel skills, but optimal reinforcement learning requires a balance between exploration variability and motor noise. © The Author (2015). Published by Oxford University Press on behalf of the Guarantors of Brain.
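
    The closed-loop reinforcement schedule mentioned above, in which task difficulty tracks recent performance, could look roughly like the sketch below. The window size, step size, and target success rate are placeholders, not the values used in the study.

```python
from collections import deque

def closed_loop_threshold(errors, target_success=0.5, window=20, step=0.1,
                          threshold=1.0):
    """Illustrative closed-loop schedule: after each reach, widen or narrow the
    rewarded region so the recent success rate stays near `target_success`.
    `errors` is an iterable of reach errors (distance from the target)."""
    recent = deque(maxlen=window)
    for err in errors:
        recent.append(err < threshold)        # was this reach rewarded?
        rate = sum(recent) / len(recent)
        threshold += step if rate < target_success else -step
        threshold = max(threshold, 0.05)      # keep the task solvable
    return threshold
```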

  18. Reinforcement learning agents providing advice in complex video games

    Science.gov (United States)

    Taylor, Matthew E.; Carboni, Nicholas; Fachantidis, Anestis; Vlahavas, Ioannis; Torrey, Lisa

    2014-01-01

    This article introduces a teacher-student framework for reinforcement learning, synthesising and extending material that appeared in conference proceedings [Torrey, L., & Taylor, M. E. (2013). Teaching on a budget: Agents advising agents in reinforcement learning. Proceedings of the international conference on autonomous agents and multiagent systems] and in a non-archival workshop paper [Carboni, N., & Taylor, M. E. (2013, May). Preliminary results for 1 vs. 1 tactics in StarCraft. Proceedings of the adaptive and learning agents workshop (at AAMAS-13)]. In this framework, a teacher agent instructs a student agent by suggesting actions the student should take as it learns. However, the teacher may only give such advice a limited number of times. We present several novel algorithms that teachers can use to budget their advice effectively, and we evaluate them in two complex video games: StarCraft and Pac-Man. Our results show that the same amount of advice, given at different moments, can have different effects on student learning, and that teachers can significantly affect student learning even when students use different learning methods and state representations.

  19. Better Education through Improved Reinforcement Learning

    Science.gov (United States)

    Mandel, Travis Scott

    2017-01-01

    When a new student comes to play an educational game, how can we determine what content to give them such that they learn as much as possible? When a frustrated customer calls in to a helpline, how can we determine what to say to best assist them? When an ill patient comes in to the clinic, how do we determine what tests to run and treatments to…

  20. Battery Energy Management in a Microgrid Using Batch Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Brida V. Mbuwir

    2017-11-01

    Full Text Available Motivated by recent developments in batch Reinforcement Learning (RL), this paper contributes to the application of batch RL in energy management in microgrids. We tackle the challenge of finding a closed-loop control policy to optimally schedule the operation of a storage device, in order to maximize self-consumption of local photovoltaic production in a microgrid. In this work, the fitted Q-iteration algorithm, a standard batch RL technique, is used by an RL agent to construct a control policy. The proposed method is data-driven and uses a state-action value function to find an optimal scheduling plan for a battery. The battery’s charge and discharge efficiencies, and the nonlinearity in the microgrid due to the inverter’s efficiency are taken into account. The proposed approach has been tested by simulation in a residential setting using data from Belgian residential consumers. The developed framework is benchmarked with a model-based technique, and the simulation results show a performance gap of 19%. The simulation results provide insight for developing optimal policies in more realistically-scaled and interconnected microgrids and for including uncertainties in generation and consumption for which white-box models become inaccurate and/or infeasible.

  1. Robot Learning Using Learning Classifier Systems Approach

    OpenAIRE

    Jabin, Suraiya

    2010-01-01

    In this chapter, I have presented Learning Classifier Systems, which add to the classical Reinforcement Learning framework the possibility of representing the state as a vector of attributes and finding a compact expression of the representation so induced. Their formalism conveys a nice interaction between learning and evolution, which makes them a class of particularly rich systems, at the intersection of several research domains. As a result, they profit from the accumulated extensions of ...

  2. A Discussion of Possibility of Reinforcement Learning Using Event-Related Potential in BCI

    Science.gov (United States)

    Yamagishi, Yuya; Tsubone, Tadashi; Wada, Yasuhiro

    Recently, the brain-computer interface (BCI), a direct communication pathway between an external device such as a computer or a robot and a human brain, has attracted a lot of attention. Since a BCI can control machines such as robots using brain activity without involving voluntary muscles, it may become a useful communication tool for handicapped persons, for instance, patients with amyotrophic lateral sclerosis. However, in order to realize a BCI system that can perform precise tasks in various environments, the control rules must be designed to adapt to dynamic environments. Reinforcement learning is one approach to designing such control rules. If this reinforcement learning can be driven by brain activity, it leads to a BCI with general versatility. In this research, we focused on the P300 event-related potential as an alternative signal for the reward of reinforcement learning. We discriminated between success and failure trials from the P300 of single-trial EEG using the proposed discrimination algorithm based on a support vector machine. The possibility of reinforcement learning was examined in terms of the number of correctly discriminated trials. The results showed that learning was possible in most subjects.

  3. From Recurrent Choice to Skill Learning: A Reinforcement-Learning Model

    Science.gov (United States)

    Fu, Wai-Tat; Anderson, John R.

    2006-01-01

    The authors propose a reinforcement-learning mechanism as a model for recurrent choice and extend it to account for skill learning. The model was inspired by recent research in neurophysiological studies of the basal ganglia and provides an integrated explanation of recurrent choice behavior and skill learning. The behavior includes effects of…

  4. Sigmoid-weighted linear units for neural network function approximation in reinforcement learning.

    Science.gov (United States)

    Elfwing, Stefan; Uchibe, Eiji; Doya, Kenji

    2018-01-11

    In recent years, neural networks have enjoyed a renaissance as function approximators in reinforcement learning. Two decades after Tesauro's TD-Gammon achieved near top-level human performance in backgammon, the deep reinforcement learning algorithm DQN achieved human-level performance in many Atari 2600 games. The purpose of this study is twofold. First, we propose two activation functions for neural network function approximation in reinforcement learning: the sigmoid-weighted linear unit (SiLU) and its derivative function (dSiLU). The activation of the SiLU is computed by the sigmoid function multiplied by its input. Second, we suggest that the more traditional approach of using on-policy learning with eligibility traces, instead of experience replay, and softmax action selection can be competitive with DQN, without the need for a separate target network. We validate our proposed approach by, first, achieving new state-of-the-art results in both stochastic SZ-Tetris and Tetris with a small 10 × 10 board, using TD(λ) learning and shallow dSiLU network agents, and, then, by outperforming DQN in the Atari 2600 domain by using a deep Sarsa(λ) agent with SiLU and dSiLU hidden units. Copyright © 2017 The Author(s). Published by Elsevier Ltd.. All rights reserved.
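
    The two activation functions follow directly from the description above: the SiLU is the input multiplied by its sigmoid, and the dSiLU is its derivative. A minimal NumPy version:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def silu(x):
    # Sigmoid-weighted linear unit: the input scaled by its own sigmoid.
    return x * sigmoid(x)

def dsilu(x):
    # Derivative of the SiLU, used as an activation function in its own right.
    s = sigmoid(x)
    return s * (1.0 + x * (1.0 - s))
```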

  5. Multiagent cooperation and competition with deep reinforcement learning.

    Directory of Open Access Journals (Sweden)

    Ardi Tampuu

    Full Text Available Evolution of cooperation and competition can appear when multiple adaptive agents share a biological, social, or technological niche. In the present work we study how cooperation and competition emerge between autonomous agents that learn by reinforcement while using only their raw visual input as the state representation. In particular, we extend the Deep Q-Learning framework to multiagent environments to investigate the interaction between two learning agents in the well-known video game Pong. By manipulating the classical rewarding scheme of Pong we show how competitive and collaborative behaviors emerge. We also describe the progression from competitive to collaborative behavior when the incentive to cooperate is increased. Finally we show how learning by playing against another adaptive agent, instead of against a hard-wired algorithm, results in more robust strategies. The present work shows that Deep Q-Networks can become a useful tool for studying decentralized learning of multiagent systems coping with high-dimensional environments.

  6. Multiagent cooperation and competition with deep reinforcement learning.

    Science.gov (United States)

    Tampuu, Ardi; Matiisen, Tambet; Kodelja, Dorian; Kuzovkin, Ilya; Korjus, Kristjan; Aru, Juhan; Aru, Jaan; Vicente, Raul

    2017-01-01

    Evolution of cooperation and competition can appear when multiple adaptive agents share a biological, social, or technological niche. In the present work we study how cooperation and competition emerge between autonomous agents that learn by reinforcement while using only their raw visual input as the state representation. In particular, we extend the Deep Q-Learning framework to multiagent environments to investigate the interaction between two learning agents in the well-known video game Pong. By manipulating the classical rewarding scheme of Pong we show how competitive and collaborative behaviors emerge. We also describe the progression from competitive to collaborative behavior when the incentive to cooperate is increased. Finally we show how learning by playing against another adaptive agent, instead of against a hard-wired algorithm, results in more robust strategies. The present work shows that Deep Q-Networks can become a useful tool for studying decentralized learning of multiagent systems coping with high-dimensional environments.

  7. Emotional Multiagent Reinforcement Learning in Spatial Social Dilemmas.

    Science.gov (United States)

    Yu, Chao; Zhang, Minjie; Ren, Fenghui; Tan, Guozhen

    2015-12-01

    Social dilemmas have attracted extensive interest in the research of multiagent systems in order to study the emergence of cooperative behaviors among selfish agents. Understanding how agents can achieve cooperation in social dilemmas through learning from local experience is a critical problem that has motivated researchers for decades. This paper investigates the possibility of exploiting emotions in agent learning in order to facilitate the emergence of cooperation in social dilemmas. In particular, the spatial version of social dilemmas is considered to study the impact of local interactions on the emergence of cooperation in the whole system. A double-layered emotional multiagent reinforcement learning framework is proposed to endow agents with internal cognitive and emotional capabilities that can drive these agents to learn cooperative behaviors. Experimental results reveal that various network topologies and agent heterogeneities have significant impacts on agent learning behaviors in the proposed framework, and under certain circumstances, high levels of cooperation can be achieved among the agents.

  8. Reward and reinforcement activity in the nucleus accumbens during learning

    Directory of Open Access Journals (Sweden)

    John Thomas Gale

    2014-04-01

    Full Text Available The nucleus accumbens core (NAcc) has been implicated in learning associations between sensory cues and profitable motor responses. However, the precise mechanisms that underlie these functions remain unclear. We recorded single-neuron activity from the NAcc of primates trained to perform a visual-motor associative learning task. During learning, we found two distinct classes of NAcc neurons. The first class demonstrated progressive increases in firing rates at the go-cue, feedback/tone and reward epochs of the task, as novel associations were learned. This suggests that these neurons may play a role in the exploitation of rewarding behaviors. In contrast, the second class exhibited attenuated firing rates, but only at the reward epoch of the task. These findings suggest that some NAcc neurons play a role in reward-based reinforcement during learning.

  9. Deep Direct Reinforcement Learning for Financial Signal Representation and Trading.

    Science.gov (United States)

    Deng, Yue; Bao, Feng; Kong, Youyong; Ren, Zhiquan; Dai, Qionghai

    2017-03-01

    Can we train the computer to beat experienced traders at financial asset trading? In this paper, we try to address this challenge by introducing a recurrent deep neural network (NN) for real-time financial signal representation and trading. Our model is inspired by two biologically related learning concepts: deep learning (DL) and reinforcement learning (RL). In the framework, the DL part automatically senses the dynamic market condition for informative feature learning. Then, the RL module interacts with deep representations and makes trading decisions to accumulate the ultimate rewards in an unknown environment. The learning system is implemented in a complex NN that exhibits both deep and recurrent structures. Hence, we propose a task-aware backpropagation through time method to cope with the gradient vanishing issue in deep training. The robustness of the neural system is verified on both the stock and the commodity futures markets under broad testing conditions.

  10. Multiagent Reinforcement Learning Dynamic Spectrum Access in Cognitive Radios

    Directory of Open Access Journals (Sweden)

    Wu Chun

    2014-02-01

    Full Text Available A multiuser independent Q-learning method which does not need information interaction is proposed for multiuser dynamic spectrum access in cognitive radios. The method adopts a self-learning paradigm, in which each CR user performs reinforcement learning only through observing its individual performance reward, without spending communication resources on information interaction with others. The reward is defined so as to represent channel quality and channel conflict status. The learning strategy of sufficient exploration, preference for good channels, and punishment for channel conflicts is designed to implement multiuser dynamic spectrum access. In a two-user, two-channel scenario, a fast learning algorithm is proposed and convergence to the maximal total reward is proved. The simulation results show that, with the proposed method, the CR system converges to a Nash equilibrium with high probability and achieves a high total reward.
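
    A stateless, per-user Q-learning rule of the kind summarized above might be sketched as follows; the epsilon-greedy exploration and the reward encoding (channel quality, strongly negative on collision) are illustrative simplifications, not the paper's exact design.

```python
import random

def select_channel(q_values, epsilon=0.1):
    """Epsilon-greedy channel choice from a user's own Q-table
    (no information exchange with other users)."""
    if random.random() < epsilon:                         # keep exploring
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)

def update_q(q_values, channel, reward, alpha=0.1):
    """Update from the individually observed reward, assumed to encode channel
    quality and to be strongly negative when a collision occurred."""
    q_values[channel] += alpha * (reward - q_values[channel])
```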

  11. Simulation-based optimization parametric optimization techniques and reinforcement learning

    CERN Document Server

    Gosavi, Abhijit

    2003-01-01

    Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning introduces the evolving area of simulation-based optimization. The book's objective is two-fold: (1) It examines the mathematical governing principles of simulation-based optimization, thereby providing the reader with the ability to model relevant real-life problems using these techniques. (2) It outlines the computational technology underlying these methods. Taken together these two aspects demonstrate that the mathematical and computational methods discussed in this book do work. Broadly speaking, the book has two parts: (1) parametric (static) optimization and (2) control (dynamic) optimization. Some of the book's special features are: *An accessible introduction to reinforcement learning and parametric-optimization techniques. *A step-by-step description of several algorithms of simulation-based optimization. *A clear and simple introduction to the methodology of neural networks. *A gentle introduction to converg...

  12. Reinforcement active learning in the vibrissae system: optimal object localization.

    Science.gov (United States)

    Gordon, Goren; Dorfman, Nimrod; Ahissar, Ehud

    2013-01-01

    Rats move their whiskers to acquire information about their environment. It has been observed that they palpate novel objects and objects they are required to localize in space. We analyze whisker-based object localization using two complementary paradigms, namely, active learning and intrinsic-reward reinforcement learning. Active learning algorithms select the next training samples according to the hypothesized solution in order to better discriminate between correct and incorrect labels. Intrinsic-reward reinforcement learning uses prediction errors as the reward to an actor-critic design, such that behavior converges to the one that optimizes the learning process. We show that in the context of object localization, the two paradigms result in palpation whisking as their respective optimal solution. These results suggest that rats may employ principles of active learning and/or intrinsic reward in tactile exploration and can guide future research to seek the underlying neuronal mechanisms that implement them. Furthermore, these paradigms are easily transferable to biomimetic whisker-based artificial sensors and can improve the active exploration of their environment. Copyright © 2012 Elsevier Ltd. All rights reserved.

  13. Learning and altering behaviours by reinforcement: neurocognitive differences between children and adults.

    Science.gov (United States)

    Shephard, E; Jackson, G M; Groom, M J

    2014-01-01

    This study examined neurocognitive differences between children and adults in the ability to learn and adapt simple stimulus-response associations through feedback. Fourteen typically developing children (mean age=10.2) and 15 healthy adults (mean age=25.5) completed a simple task in which they learned to associate visually presented stimuli with manual responses based on performance feedback (acquisition phase), and then reversed and re-learned those associations following an unexpected change in reinforcement contingencies (reversal phase). Electrophysiological activity was recorded throughout task performance. We found no group differences in learning-related changes in performance (reaction time, accuracy) or in the amplitude of event-related potentials (ERPs) associated with stimulus processing (P3 ERP) or feedback processing (feedback-related negativity; FRN) during the acquisition phase. However, children's performance was significantly more disrupted by the reversal than adults and FRN amplitudes were significantly modulated by the reversal phase in children but not adults. These findings indicate that children have specific difficulties with reinforcement learning when acquired behaviours must be altered. This may be caused by the added demands on immature executive functioning, specifically response monitoring, created by the requirement to reverse the associations, or a developmental difference in the way in which children and adults approach reinforcement learning. Copyright © 2013 The Authors. Published by Elsevier Ltd.. All rights reserved.

  14. Learning and altering behaviours by reinforcement: Neurocognitive differences between children and adults

    Directory of Open Access Journals (Sweden)

    E. Shephard

    2014-01-01

    Full Text Available This study examined neurocognitive differences between children and adults in the ability to learn and adapt simple stimulus–response associations through feedback. Fourteen typically developing children (mean age = 10.2) and 15 healthy adults (mean age = 25.5) completed a simple task in which they learned to associate visually presented stimuli with manual responses based on performance feedback (acquisition phase), and then reversed and re-learned those associations following an unexpected change in reinforcement contingencies (reversal phase). Electrophysiological activity was recorded throughout task performance. We found no group differences in learning-related changes in performance (reaction time, accuracy) or in the amplitude of event-related potentials (ERPs) associated with stimulus processing (P3 ERP) or feedback processing (feedback-related negativity; FRN) during the acquisition phase. However, children's performance was significantly more disrupted by the reversal than adults and FRN amplitudes were significantly modulated by the reversal phase in children but not adults. These findings indicate that children have specific difficulties with reinforcement learning when acquired behaviours must be altered. This may be caused by the added demands on immature executive functioning, specifically response monitoring, created by the requirement to reverse the associations, or a developmental difference in the way in which children and adults approach reinforcement learning.

  15. Fiber-reinforced technology in multidisciplinary chairside approaches.

    Science.gov (United States)

    Arhun, Neslihan; Arman, Ayca

    2008-01-01

    There is an increasing demand to improve dentofacial esthetics in the adult population. This demand usually requires a close collaboration within the various disciplines of dentistry and the patient at every stage of the therapy. The materials and techniques used by these interdisciplinary clinicians must be conservative and minimally invasive. Fiber-reinforced composite technology offers such solutions for chairside applications. This case report presents two cases where fiber-reinforced ribbon and composite complex was used in a multidisciplinary approach to improve esthetics.

  16. Fiber-reinforced technology in multidisciplinary chairside approaches

    Directory of Open Access Journals (Sweden)

    Arhun Neslihan

    2008-01-01

    Full Text Available There is an increasing demand to improve dentofacial esthetics in the adult population. This demand usually requires a close collaboration within the various disciplines of dentistry and the patient at every stage of the therapy. The materials and techniques used by these interdisciplinary clinicians must be conservative and minimally invasive. Fiber-reinforced composite technology offers such solutions for chairside applications. This case report presents two cases where fiber-reinforced ribbon and composite complex was used in a multidisciplinary approach to improve esthetics.

  17. FlashRL: A Reinforcement Learning Platform for Flash Games

    OpenAIRE

    Andersen, Per-Arne; Goodwin, Morten; Granmo, Ole-Christoffer

    2018-01-01

    Reinforcement Learning (RL) is a research area that has blossomed tremendously in recent years and has shown remarkable potential in among others successfully playing computer games. However, there only exists a few game platforms that provide diversity in tasks and state-space needed to advance RL algorithms. The existing platforms offer RL access to Atari- and a few web-based games, but no platform fully expose access to Flash games. This is unfortunate because applying RL to Flash games ha...

  18. Intranasal oxytocin enhances socially-reinforced learning in rhesus monkeys

    Directory of Open Access Journals (Sweden)

    Lisa A Parr

    2014-09-01

    Full Text Available There are currently no drugs approved for the treatment of social deficits associated with autism spectrum disorders (ASD). One hypothesis for these deficits is that individuals with ASD lack the motivation to attend to social cues because those cues are not implicitly rewarding. Therefore, any drug that could enhance the rewarding quality of social stimuli could have a profound impact on the treatment of ASD, and other social disorders. Oxytocin (OT) is a neuropeptide that has been effective in enhancing social cognition and social reward in humans. The present study examined the ability of OT to selectively enhance learning after social compared to nonsocial reward in rhesus monkeys, an important species for modeling the neurobiology of social behavior in humans. Monkeys were required to learn an implicit visual matching task after receiving either intranasal (IN) OT or Placebo (saline). Correct trials were rewarded with the presentation of positive and negative social (play faces/threat faces) or nonsocial (banana/cage locks) stimuli, plus food. Incorrect trials were not rewarded. Results demonstrated a strong effect of socially-reinforced learning: monkeys performed significantly better when reinforced with social versus nonsocial stimuli. Additionally, socially-reinforced learning was significantly better and occurred faster after IN-OT compared to placebo treatment. Performance in the IN-OT, but not Placebo, condition was also significantly better when the reinforcement stimuli were emotionally positive compared to negative facial expressions. These data support the hypothesis that OT may function to enhance prosocial behavior in primates by increasing the rewarding quality of emotionally positive, social compared to emotionally negative or nonsocial images. These data also support the use of the rhesus monkey as a model for exploring the neurobiological basis of social behavior and its impairment.

  19. Quantum Probability and Operant Conditioning: Behavioral Uncertainty in Reinforcement Learning

    OpenAIRE

    Alonso, E.; Mondragon, E.

    2014-01-01

    An implicit assumption in the study of operant conditioning and reinforcement learning is that behavior is stochastic, in that it depends on the probability that an outcome follows a response and on how the presence or absence of the output affects the frequency of the response. In this paper we argue that classical probability is not the right tool to represent uncertainty in operant conditioning and propose an interpretation of behavioral states in terms of quantum probability instead.

  20. Self-Organisation of Neural Topologies by Evolutionary Reinforcement Learning

    OpenAIRE

    Siebel, Nils T; Krause, Jochen; Sommer, Gerald

    2007-01-01

    In this article we present EANT, "Evolutionary Acquisition of Neural Topologies", a method that creates neural networks (NNs) by evolutionary reinforcement learning. The structure of NNs is developed using mutation operators, starting from a minimal structure. Their parameters are optimised using CMA-ES. EANT can create NNs that are very specialised; they achieve a very good performance while being relatively small. This can be seen in experiments where our method competes with a different on...

  1. A Definition of Happiness for Reinforcement Learning Agents

    OpenAIRE

    Daswani, Mayank; Leike, Jan

    2015-01-01

    What is happiness for reinforcement learning agents? We seek a formal definition satisfying a list of desiderata. Our proposed definition of happiness is the temporal difference error, i.e. the difference between the value of the obtained reward and observation and the agent's expectation of this value. This definition satisfies most of our desiderata and is compatible with empirical research on humans. We state several implications and discuss examples.
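
    The proposed definition can be written as a one-liner. The discounted temporal-difference form below is one standard instantiation and an assumption on our part; the essential point is the gap between what the agent obtains and what it expected.

```python
def happiness(reward, value_next, value_current, gamma=0.99):
    """Temporal-difference error as the proposed happiness signal: the obtained
    reward plus discounted expected future value, minus the prior expectation."""
    return reward + gamma * value_next - value_current
```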

  2. Utilising reinforcement learning to develop strategies for driving auditory neural implants

    Science.gov (United States)

    Lee, Geoffrey W.; Zambetta, Fabio; Li, Xiaodong; Paolini, Antonio G.

    2016-08-01

    Objective. In this paper we propose a novel application of reinforcement learning to the area of auditory neural stimulation. We aim to develop a simulation environment which is based on real neurological responses to auditory and electrical stimulation in the cochlear nucleus (CN) and inferior colliculus (IC) of an animal model. Using this simulator we implement closed-loop reinforcement learning algorithms to determine which methods are most effective at learning effective acoustic neural stimulation strategies. Approach. By recording a comprehensive set of acoustic frequency presentations and neural responses from a set of animals we created a large database of neural responses to acoustic stimulation. Extensive electrical stimulation in the CN and the recording of neural responses in the IC provides a mapping of how the auditory system responds to electrical stimuli. The combined dataset is used as the foundation for the simulator, which is used to implement and test learning algorithms. Main results. Reinforcement learning, utilising a modified n-Armed Bandit solution, is implemented to demonstrate the model’s function. We show the ability to effectively learn stimulation patterns which mimic the cochlea’s ability to convert acoustic frequencies to neural activity. Learning effective replication using neural stimulation takes less than 20 min under continuous testing. Significance. These results show the utility of reinforcement learning in the field of neural stimulation. These results can be coupled with existing sound processing technologies to develop new auditory prosthetics that are adaptable to the recipient’s current auditory pathway. The same process can theoretically be abstracted to other sensory and motor systems to develop similar electrical replication of neural signals.
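
    A plain epsilon-greedy n-armed bandit, of which the study uses a modified variant, can be sketched as follows. The reward function `pull(arm)` is hypothetical; in the study it would score how closely the electrically evoked response matches the acoustically evoked target.

```python
import random

def run_bandit(pull, n_arms, n_trials=1000, epsilon=0.1):
    """Plain epsilon-greedy n-armed bandit with incremental sample-average
    value estimates. `pull(arm)` returns the reward for stimulating with `arm`."""
    q = [0.0] * n_arms
    counts = [0] * n_arms
    for _ in range(n_trials):
        if random.random() < epsilon:
            arm = random.randrange(n_arms)               # explore
        else:
            arm = max(range(n_arms), key=q.__getitem__)  # exploit
        reward = pull(arm)
        counts[arm] += 1
        q[arm] += (reward - q[arm]) / counts[arm]        # incremental average
    return q
```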

  3. Knowledge-Based Reinforcement Learning for Data Mining

    Science.gov (United States)

    Kudenko, Daniel; Grzes, Marek

    Data Mining is the process of extracting patterns from data. Two general avenues of research in the intersecting areas of agents and data mining can be distinguished. The first approach is concerned with mining an agent’s observation data in order to extract patterns, categorize environment states, and/or make predictions of future states. In this setting, data is normally available as a batch, and the agent’s actions and goals are often independent of the data mining task. The data collection is mainly considered as a side effect of the agent’s activities. Machine learning techniques applied in such situations fall into the class of supervised learning. In contrast, the second scenario occurs where an agent is actively performing the data mining, and is responsible for the data collection itself. For example, a mobile network agent is acquiring and processing data (where the acquisition may incur a certain cost), or a mobile sensor agent is moving in a (perhaps hostile) environment, collecting and processing sensor readings. In these settings, the tasks of the agent and the data mining are highly intertwined and interdependent (or even identical). Supervised learning is not a suitable technique for these cases. Reinforcement Learning (RL) enables an agent to learn from experience (in form of reward and punishment for explorative actions) and adapt to new situations, without a teacher. RL is an ideal learning technique for these data mining scenarios, because it fits the agent paradigm of continuous sensing and acting, and the RL agent is able to learn to make decisions on the sampling of the environment which provides the data. Nevertheless, RL still suffers from scalability problems, which have prevented its successful use in many complex real-world domains. The more complex the tasks, the longer it takes a reinforcement learning algorithm to converge to a good solution. For many real-world tasks, human expert knowledge is available. For example, human

  4. Cocaine addiction as a homeostatic reinforcement learning disorder.

    Science.gov (United States)

    Keramati, Mehdi; Durand, Audrey; Girardeau, Paul; Gutkin, Boris; Ahmed, Serge H

    2017-03-01

    Drug addiction implicates both reward learning and homeostatic regulation mechanisms of the brain. This has stimulated 2 partially successful theoretical perspectives on addiction. Many important aspects of addiction, however, remain to be explained within a single, unified framework that integrates the 2 mechanisms. Building upon a recently developed homeostatic reinforcement learning theory, the authors focus on a key transition stage of addiction that is well modeled in animals, escalation of drug use, and propose a computational theory of cocaine addiction where cocaine reinforces behavior due to its rapid homeostatic corrective effect, whereas its chronic use induces slow and long-lasting changes in homeostatic setpoint. Simulations show that our new theory accounts for key behavioral and neurobiological features of addiction, most notably, escalation of cocaine use, drug-primed craving and relapse, individual differences underlying dose-response curves, and dopamine D2-receptor downregulation in addicts. The theory also generates unique predictions about cocaine self-administration behavior in rats that are confirmed by new experimental results. Viewing addiction as a homeostatic reinforcement learning disorder coherently explains many behavioral and neurobiological aspects of the transition to cocaine addiction, and suggests a new perspective toward understanding addiction. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  5. Experienced Gray Wolf Optimization Through Reinforcement Learning and Neural Networks.

    Science.gov (United States)

    Emary, E; Zawbaa, Hossam M; Grosan, Crina

    2017-01-10

    In this paper, a variant of gray wolf optimization (GWO) that uses reinforcement learning principles combined with neural networks to enhance the performance is proposed. The aim is to overcome, by reinforcement learning, the common challenge of setting the right parameters for the algorithm. In GWO, a single parameter is used to control the exploration/exploitation rate, which influences the performance of the algorithm. Rather than using a global way to change this parameter for all the agents, we use reinforcement learning to set it on an individual basis. The adaptation of the exploration rate for each agent depends on the agent's own experience and the current terrain of the search space. In order to achieve this, an experience repository is built based on a neural network to map a set of agents' states to a set of corresponding actions that specifically influence the exploration rate. The experience repository is updated by all the search agents to reflect experience and to continuously enhance future actions. The resulting algorithm is called experienced GWO (EGWO) and its performance is assessed on solving feature selection problems and on finding optimal weights for neural networks. We use a set of performance indicators to evaluate the efficiency of the method. Results over various data sets demonstrate an advantage of EGWO over the original GWO and over other metaheuristics, such as genetic algorithms and particle swarm optimization.

  6. Time-Extended Policies in Multi-Agent Reinforcement Learning

    Science.gov (United States)

    Tumer, Kagan; Agogino, Adrian K.

    2004-01-01

    Reinforcement learning methods perform well in many domains where a single agent needs to take a sequence of actions to perform a task. These methods use sequences of single-time-step rewards to create a policy that tries to maximize a time-extended utility, which is a (possibly discounted) sum of these rewards. In this paper we build on our previous work showing how these methods can be extended to a multi-agent environment where each agent creates its own policy that works towards maximizing a time-extended global utility over all agents' actions. We show improved methods for creating time-extended utilities for the agents that are both "aligned" with the global utility and "learnable." We then show how to create single-time-step rewards while avoiding the pitfall of rewards that are aligned with the global reward yet lead to utilities that are not aligned with the global utility. Finally, we apply these reward functions to the multi-agent Gridworld problem. We explicitly quantify a utility's learnability and alignment, and show that reinforcement learning agents using the prescribed reward functions successfully trade off learnability and alignment. As a result they outperform both global (e.g., team games) and local (e.g., "perfectly learnable") reinforcement learning solutions by as much as an order of magnitude.
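
    Difference rewards are one well-known way, from this same line of multi-agent work, of building per-agent rewards that stay aligned with the global utility while remaining learnable; the sketch below is illustrative only, and the toy global utility and agent names are assumptions.

        from typing import Callable, Dict

        def difference_reward(global_utility: Callable[[Dict[str, str]], float],
                              joint_action: Dict[str, str],
                              agent: str,
                              default_action: str = "noop") -> float:
            # Reward for `agent`: global utility minus the utility obtained when the agent's
            # action is replaced by a fixed counterfactual action. This stays aligned with the
            # global utility but depends mostly on the agent's own action, so it is easier to learn from.
            counterfactual = dict(joint_action)
            counterfactual[agent] = default_action
            return global_utility(joint_action) - global_utility(counterfactual)

        # Illustrative global utility: the team is rewarded once per distinct item collected.
        def toy_global_utility(joint_action: Dict[str, str]) -> float:
            return float(len({a for a in joint_action.values() if a != "noop"}))

        joint = {"agent1": "item_A", "agent2": "item_A", "agent3": "item_B"}
        for ag in joint:
            print(ag, difference_reward(toy_global_utility, joint, ag))

    In this toy case the two agents duplicating effort on item_A receive a difference reward of zero while the agent collecting item_B receives one, which is the sense in which such rewards are more "learnable" than handing every agent the raw global reward.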

  7. Dynamic Sensor Tasking for Space Situational Awareness via Reinforcement Learning

    Science.gov (United States)

    Linares, R.; Furfaro, R.

    2016-09-01

    This paper studies the Sensor Management (SM) problem for optical Space Object (SO) tracking. The tasking problem is formulated as a Markov Decision Process (MDP) and solved using Reinforcement Learning (RL). The RL problem is solved using the actor-critic policy gradient approach. The actor provides a policy which is random over actions and given by a parametric probability density function (pdf). The critic evaluates the policy by calculating the estimated total reward or the value function for the problem. The parameters of the policy action pdf are optimized using gradients with respect to the reward function. Both the critic and the actor are modeled using deep neural networks (multi-layer neural networks). The policy neural network takes the current state as input and outputs a probability for each possible action; actions are then selected by sampling from these probabilities. The critic approximates the total reward using a neural network. The estimated total reward is used to approximate the gradient of the policy network with respect to the network parameters. This approach is used to find the non-myopic optimal policy for tasking optical sensors to estimate SO orbits. The reward function is based on reducing the uncertainty for the overall catalog to below a user-specified uncertainty threshold. This work uses a 30 km total position error for the uncertainty threshold and provides the RL method with a negative reward as long as any SO has a total position error above that threshold. This penalizes policies that take longer to achieve the desired accuracy. A positive reward is provided when all SOs are below the catalog uncertainty threshold. An optimal policy is sought that takes actions to achieve the desired catalog uncertainty in minimum time. This work trains the policy in simulation by letting it task a single sensor to "learn" from its performance
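
    The record's deep actor-critic networks are not reproduced here, but the underlying update pattern can be shown with a compact tabular/softmax actor-critic sketch in Python; the toy environment, sizes and constants are assumptions, and the TD error stands in for the advantage used to scale the policy gradient.

        import numpy as np

        rng = np.random.default_rng(0)
        N_STATES, N_ACTIONS = 6, 3
        GAMMA, LR_ACTOR, LR_CRITIC = 0.95, 0.05, 0.1

        theta = np.zeros((N_STATES, N_ACTIONS))   # actor: per-state action preferences (softmax policy)
        values = np.zeros(N_STATES)               # critic: state-value estimates

        def policy(state):
            prefs = theta[state] - theta[state].max()
            p = np.exp(prefs)
            return p / p.sum()

        def env_step(state, action):
            # Hypothetical stand-in for the sensor-tasking environment.
            next_state = rng.integers(N_STATES)
            reward = 1.0 if action == next_state % N_ACTIONS else -1.0
            return next_state, reward

        state = 0
        for _ in range(20000):
            probs = policy(state)
            action = rng.choice(N_ACTIONS, p=probs)        # sample an action from the stochastic policy
            next_state, reward = env_step(state, action)
            # Critic: TD error = reward + discounted next-state value - current value estimate.
            td_error = reward + GAMMA * values[next_state] - values[state]
            values[state] += LR_CRITIC * td_error
            # Actor: policy-gradient step; for a softmax policy, grad log pi(a|s) = one_hot(a) - probs.
            grad_log_pi = -probs
            grad_log_pi[action] += 1.0
            theta[state] += LR_ACTOR * td_error * grad_log_pi
            state = next_state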

  8. Genetic Scheduling and Reinforcement Learning in Multirobot Systems for Intelligent Warehouses

    Directory of Open Access Journals (Sweden)

    Jiajia Dou

    2015-01-01

    Full Text Available A new hybrid solution is presented to improve the efficiency of intelligent warehouses with multirobot systems, where the genetic algorithm (GA based task scheduling is combined with reinforcement learning (RL based path planning for mobile robots. Reinforcement learning is an effective approach to search for a collision-free path in unknown dynamic environments. Genetic algorithm is a simple but splendid evolutionary search method that provides very good solutions for task allocation. In order to achieve higher efficiency of the intelligent warehouse system, we design a new solution by combining these two techniques and provide an effective and alternative way compared with other state-of-the-art methods. Simulation results demonstrate the effectiveness of the proposed approach regarding the optimization of travel time and overall efficiency of the intelligent warehouse system.

  9. Flow Navigation by Smart Microswimmers via Reinforcement Learning

    Science.gov (United States)

    Colabrese, Simona; Gustavsson, Kristian; Celani, Antonio; Biferale, Luca

    2017-04-01

    Smart active particles can acquire some limited knowledge of the fluid environment from simple mechanical cues and exert a control on their preferred steering direction. Their goal is to learn the best way to navigate by exploiting the underlying flow whenever possible. As an example, we focus our attention on smart gravitactic swimmers. These are active particles whose task is to reach the highest altitude within some time horizon, given the constraints enforced by fluid mechanics. By means of numerical experiments, we show that swimmers indeed learn nearly optimal strategies just by experience. A reinforcement learning algorithm allows particles to learn effective strategies even in difficult situations when, in the absence of control, they would end up being trapped by flow structures. These strategies are highly nontrivial and cannot be easily guessed in advance. This Letter illustrates the potential of reinforcement learning algorithms to model adaptive behavior in complex flows and paves the way towards the engineering of smart microswimmers that solve difficult navigation problems.

  10. Learning and tuning fuzzy logic controllers through reinforcements

    Science.gov (United States)

    Berenji, Hamid R.; Khedkar, Pratap

    1992-01-01

    This paper presents a new method for learning and tuning a fuzzy logic controller based on reinforcements from a dynamic system. In particular, our generalized approximate reasoning-based intelligent control (GARIC) architecture (1) learns and tunes a fuzzy logic controller even when only weak reinforcement, such as a binary failure signal, is available; (2) introduces a new conjunction operator in computing the rule strengths of fuzzy control rules; (3) introduces a new localized mean of maximum (LMOM) method in combining the conclusions of several firing control rules; and (4) learns to produce real-valued control actions. Learning is achieved by integrating fuzzy inference into a feedforward neural network, which can then adaptively improve performance by using gradient descent methods. We extend the AHC algorithm of Barto et al. (1983) to include the prior control knowledge of human operators. The GARIC architecture is applied to a cart-pole balancing system and demonstrates significant improvements in terms of the speed of learning and robustness to changes in the dynamic system's parameters over previous schemes for cart-pole balancing.

  11. The ubiquity of model-based reinforcement learning.

    Science.gov (United States)

    Doll, Bradley B; Simon, Dylan A; Daw, Nathaniel D

    2012-12-01

    The reward prediction error (RPE) theory of dopamine (DA) function has enjoyed great success in the neuroscience of learning and decision-making. This theory is derived from model-free reinforcement learning (RL), in which choices are made simply on the basis of previously realized rewards. Recently, attention has turned to correlates of more flexible, albeit computationally complex, model-based methods in the brain. These methods are distinguished from model-free learning by their evaluation of candidate actions using expected future outcomes according to a world model. Puzzlingly, signatures from these computations seem to be pervasive in the very same regions previously thought to support model-free learning. Here, we review recent behavioral and neural evidence about these two systems, in an attempt to reconcile their enigmatic cohabitation in the brain. Copyright © 2012 Elsevier Ltd. All rights reserved.

  12. Use of Inverse Reinforcement Learning for Identity Prediction

    Science.gov (United States)

    Hayes, Roy; Bao, Jonathan; Beling, Peter; Horowitz, Barry

    2011-01-01

    We adopt Markov Decision Processes (MDP) to model sequential decision problems, which have the characteristic that the current decision made by a human decision maker has an uncertain impact on future opportunity. We hypothesize that the individuality of decision makers can be modeled as differences in the reward function under a common MDP model. A machine learning technique, Inverse Reinforcement Learning (IRL), was used to learn an individual's reward function based on limited observation of his or her decision choices. This work serves as an initial investigation for using IRL to analyze decision making, conducted through a human experiment in a cyber shopping environment. Specifically, the ability to determine the demographic identity of users is conducted through prediction analysis and supervised learning. The results show that IRL can be used to correctly identify participants, at a rate of 68% for gender and 66% for one of three college major categories.

  13. Decision theory, reinforcement learning, and the brain.

    Science.gov (United States)

    Dayan, Peter; Daw, Nathaniel D

    2008-12-01

    Decision making is a core competence for animals and humans acting and surviving in environments they only partially comprehend, gaining rewards and punishments for their troubles. Decision-theoretic concepts permeate experiments and computational models in ethology, psychology, and neuroscience. Here, we review a well-known, coherent Bayesian approach to decision making, showing how it unifies issues in Markovian decision problems, signal detection psychophysics, sequential sampling, and optimal exploration and discuss paradigmatic psychological and neural examples of each problem. We discuss computational issues concerning what subjects know about their task and how ambitious they are in seeking optimal solutions; we address algorithmic topics concerning model-based and model-free methods for making choices; and we highlight key aspects of the neural implementation of decision making.

  14. Reinforcement Learning in the Game of Othello: Learning Against a Fixed Opponent and Learning from Self-Play

    NARCIS (Netherlands)

    van der Ree, Michiel; Wiering, Marco

    2013-01-01

    This paper compares three strategies in using reinforcement learning algorithms to let an artificial agent learn to play the game of Othello. The three strategies compared are: learning by self-play, learning from playing against a fixed opponent, and learning from playing against a fixed

  15. Reinforcing communication skills while registered nurses simultaneously learn course content: a response to learning needs.

    Science.gov (United States)

    DeSimone, B B

    1994-01-01

    This article describes the implementation and evaluation of Integrated Skills Reinforcement (ISR) in a baccalaureate nursing course entitled "Principles of Health Assessment" for 15 registered nurse students. ISR is a comprehensive teaching-learning approach that simultaneously reinforces student writing, reading, speaking, and listening skills while students learn course content. The purpose of this study was to assess the influence of ISR on writing skills and student satisfaction. A learner's guide and teacher's guide, created in advance by the teacher, described specific language activities and assignments that were implemented throughout the ISR course. During each class, the teacher promoted discussion, collaboration, and co-inquiry among students, using course content as the vehicle of exchange. Writing was assessed at the beginning and end of the course. The influence of ISR on the content, organization, sentence structure, tone, and strength of position of student writing was analyzed. Writing samples were scored by an independent evaluator trained in methods of holistic scoring. Ninety-three per cent (14 of 15 students) achieved writing growth from 0.5 to 1.5 points on a scale of 6 points. Student response to both the ISR approach and specific ISR activities was assessed by teacher-created surveys administered at the middle and at the end of the course. One hundred per cent of the students at the end of this project agreed that the ISR activities, specifically the writing and reading activities, helped them better understand the course content. These responses differed from evaluations written by the same students at the middle of the course. The ISR approach fostered analysis and communication through active collaboration, behaviors cited as critical for effective participation of nurses in today's complex health care environment.

  16. The effects of prior positive reinforcement contingency on shuttle-box avoidance learning in rats.

    OpenAIRE

    坂田, 省吾; 杉本, 助男

    1991-01-01

    Four groups of rats were exposed to response-reinforcement contingent, yoked reinforcement, or yoked stimulus reinforcement schedules in a Skinner box or to no experimental pretreatment. All groups were subsequently tested for transfer of the learned helplessness effect on a shuttle-box active avoidance task. The yoked stimulus reinforcement group showed some retardation of avoidance learning in comparison with the other three groups. Subjects had previously been implanted with electrodes in their b...

  17. The Study of Reinforcement Learning for Traffic Self-Adaptive Control under Multiagent Markov Game Environment

    Directory of Open Access Journals (Sweden)

    Lun-Hui Xu

    2013-01-01

    Full Text Available The urban traffic self-adaptive control problem is dynamic and uncertain, so the states of the traffic environment are hard to observe. An efficient agent that controls a single intersection can be discovered automatically via multiagent reinforcement learning. However, in the majority of the previous works on this approach, each agent needed perfectly observed information when interacting with the environment and learned individually with less efficient coordination. This study casts traffic self-adaptive control as a multiagent Markov game problem. The design employs a traffic signal control agent (TSCA) for each signalized intersection that coordinates with neighboring TSCAs. A mathematical model for TSCAs’ interaction is built based on a nonzero-sum Markov game, which has been applied to let TSCAs learn how to cooperate. A multiagent Markov game reinforcement learning approach is constructed on the basis of single-agent Q-learning. This method lets each TSCA learn to update its Q-values under the joint actions and imperfect information. The convergence of the proposed algorithm is analyzed theoretically. The simulation results show that the proposed method is convergent and effective in a realistic traffic self-adaptive control setting.
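
    Stripped of the game-theoretic coordination machinery, the basic learning step described above is single-agent Q-learning lifted to joint actions; the sketch below illustrates that shape for one traffic signal control agent and one neighbour. The phase names, reward and states are assumptions for the example only.

        import itertools, random
        from collections import defaultdict

        PHASES = ["NS_green", "EW_green"]          # hypothetical signal-phase actions
        ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

        # One TSCA's Q-values, indexed by (own_state, (own_action, neighbour_action)).
        Q = defaultdict(float)
        JOINT_ACTIONS = list(itertools.product(PHASES, PHASES))

        def choose_joint(state):
            if random.random() < EPSILON:
                return random.choice(JOINT_ACTIONS)
            return max(JOINT_ACTIONS, key=lambda ja: Q[(state, ja)])

        def update(state, joint_action, reward, next_state):
            # Single-agent Q-learning rule applied over the joint-action space.
            best_next = max(Q[(next_state, ja)] for ja in JOINT_ACTIONS)
            Q[(state, joint_action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, joint_action)])

        # Toy usage with a fake environment: reward is the negative queue length after acting.
        state = "low_queue"
        for _ in range(1000):
            ja = choose_joint(state)
            next_state = random.choice(["low_queue", "high_queue"])
            reward = -random.randint(0, 10 if next_state == "high_queue" else 3)
            update(state, ja, reward, next_state)
            state = next_state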

  18. Observation, reflection, and reinforcement: surgery faculty members' and residents' perceptions of how they learned professionalism.

    Science.gov (United States)

    Park, Jason; Woodrow, Sarah I; Reznick, Richard K; Beales, Jennifer; MacRae, Helen M

    2010-01-01

    To explore perceptions of how professionalism is learned in the current academic environment. Professionalism is a core competency in surgery (as in all of medical practice), and its presence or absence affects all aspects of clinical education and practice, but the ways in which professional values and attitudes are best transmitted to developing generations of surgeons have not been well defined. The authors conducted 34 semistructured interviews of individual surgery residents and faculty members at two academic institutions from 2004 to 2006. Interviews consisted of open-ended questions on how the participants learned professionalism and what they perceived as challenges to learning professionalism. Two researchers analyzed the interview transcripts for emergent themes by using a grounded-theory approach. Faculty members' and residents' perceptions of how they learned professionalism reflected four major themes: (1) personal values and upbringing, including premedical education experiences, (2) learning by example from professional role models, (3) the structure of the surgery residency, and (4) formal instruction on professionalism. Of these, role modeling was the dominant theme: Participants identified observation, reflection, and reinforcement as playing key roles in their learning from role models and in distinguishing the sometimes blurred boundary between positive and negative role models. The theoretical framework generated out of this study proposes a focus on specific activities to improve professional education, including an active approach to role modeling through the intentional and explicit demonstration of professional behavior during the course of everyday work; structured, reflective self-examination; and timely and meaningful evaluation and feedback for reinforcement.

  19. Manufacturing Scheduling Using Colored Petri Nets and Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Maria Drakaki

    2017-02-01

    Full Text Available Agent-based intelligent manufacturing control systems are capable of efficiently responding and adapting to environmental changes. Manufacturing system adaptation and evolution can be addressed with learning mechanisms that increase the intelligence of agents. In this paper, a manufacturing scheduling method is presented based on Timed Colored Petri Nets (CTPNs) and reinforcement learning (RL). CTPNs model the manufacturing system and implement the scheduling. In the search for an optimal solution, a scheduling agent uses RL, in particular the Q-learning algorithm. A warehouse order-picking scheduling is presented as a case study to illustrate the method. The proposed scheduling method is compared to existing methods. Simulation and state space results are used to evaluate performance and identify system properties.

  20. The Analysis of Experimental Results of Reinforcement Learning Systems

    Directory of Open Access Journals (Sweden)

    Jaroslav E. Poliscuk

    2002-07-01

    Full Text Available In this article a reinforcement learning method is analyzed and the subject of learning is defined. The essence of this method is the selection of actions by a trial-and-error process and the awarding of delayed rewards. If an environment is characterized by the Markov property, then its step-by-step dynamics enable forecasting of subsequent states and rewards on the basis of the currently known state and action, in accordance with the Markov decision process. The relationship between the present state and value and the potential future states is defined by the Bellman equation. The article also discusses the method of temporal difference learning and the mechanism of eligibility traces, together with the TD(0) and TD(Lambda) algorithms. The theoretical analysis is supplemented by practical studies of an implementation of the Sarsa(Lambda) algorithm with replacing eligibility traces and an epsilon-greedy policy.
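
    For reference, the Sarsa(Lambda) variant mentioned at the end of this record, with replacing eligibility traces and an epsilon-greedy policy, can be sketched as follows in Python; the chain-walk environment and all constants are assumptions chosen only to make the example self-contained.

        import random
        from collections import defaultdict

        ALPHA, GAMMA, LAMBDA, EPSILON = 0.1, 0.95, 0.8, 0.1
        ACTIONS = ["left", "right"]
        Q = defaultdict(float)

        def epsilon_greedy(state):
            if random.random() < EPSILON:
                return random.choice(ACTIONS)
            return max(ACTIONS, key=lambda a: Q[(state, a)])

        def toy_step(state, action):
            # Hypothetical chain-walk: move along states 0..5, reward +1 for reaching state 5.
            state = max(0, min(5, state + (1 if action == "right" else -1)))
            return state, (1.0 if state == 5 else 0.0), state == 5

        for episode in range(500):
            traces = defaultdict(float)                # eligibility traces e(s, a)
            state, action = 0, epsilon_greedy(0)
            done = False
            while not done:
                next_state, reward, done = toy_step(state, action)
                next_action = epsilon_greedy(next_state)
                delta = reward + (0.0 if done else GAMMA * Q[(next_state, next_action)]) - Q[(state, action)]
                traces[(state, action)] = 1.0          # replacing traces: set to 1 rather than accumulate
                for key in list(traces):
                    Q[key] += ALPHA * delta * traces[key]
                    traces[key] *= GAMMA * LAMBDA      # decay every trace after each step
                state, action = next_state, next_action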

  1. Fuzzy Lyapunov Reinforcement Learning for Non Linear Systems.

    Science.gov (United States)

    Kumar, Abhishek; Sharma, Rajneesh

    2017-03-01

    We propose a fuzzy reinforcement learning (RL) based controller that generates a stable control action by Lyapunov-constraining fuzzy linguistic rules. In particular, we attempt to Lyapunov-constrain the consequent part of fuzzy rules in a fuzzy RL setup. Ours is a first attempt at designing a linguistic RL controller with Lyapunov-constrained fuzzy consequents to progressively learn a stable optimal policy. The proposed controller does not need a system model or desired response and can effectively handle disturbances in continuous state-action space problems. The proposed controller has been employed on the benchmark Inverted Pendulum (IP) and Rotational/Translational Proof-Mass Actuator (RTAC) control problems (with and without disturbances). Simulation results and comparisons against (a) baseline fuzzy Q-learning, (b) a Lyapunov-theory-based actor-critic, and (c) a Lyapunov-theory-based Markov game controller elucidate the stability and viability of the proposed control scheme. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.

  2. Protein interaction network constructing based on text mining and reinforcement learning with application to prostate cancer.

    Science.gov (United States)

    Zhu, Fei; Liu, Quan; Zhang, Xiaofang; Shen, Bairong

    2015-08-01

    Constructing an interaction network from biomedical texts is very important and interesting work. The authors take advantage of text mining and reinforcement learning approaches to establish a protein interaction network. Considering the high computational efficiency of co-occurrence-based interaction extraction approaches and the high precision of linguistic pattern approaches, the authors propose an interaction extraction algorithm in which they utilise frequently used linguistic patterns to extract interactions from texts, then find further interactions in extended, unprocessed texts following the basic idea of the co-occurrence approach, while discounting the interactions extracted from the extended texts. They put forward a reinforcement learning-based algorithm to establish a protein interaction network, where nodes represent proteins and edges denote interactions. During the evolutionary process, a node selects another node and the attained reward determines which predicted interaction should be reinforced. The topology of the network is updated by the agent until an optimal network is formed. They used texts downloaded from PubMed to construct a prostate cancer protein interaction network by the proposed methods. The results show that their method achieved a good matching rate. Network topology analysis results also demonstrate that the curves of node degree distribution, node degree probability and probability distribution of the constructed network accord well with those of a scale-free network.
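
    The combination of high-precision linguistic patterns with discounted co-occurrence described above can be illustrated with a deliberately simplified sketch; the regular expression, the discount weight, the naive substring matching and the example sentences are all assumptions, not the authors' extraction pipeline.

        import re
        from collections import defaultdict

        # Hypothetical linguistic pattern: "<PROTEIN> interacts with / binds / activates <PROTEIN>".
        PATTERN = re.compile(r"\b([A-Z][A-Z0-9]+)\s+(?:interacts with|binds|activates)\s+([A-Z][A-Z0-9]+)\b")
        CO_OCCURRENCE_DISCOUNT = 0.5   # assumed weight for pairs found only by co-occurrence

        def extract_interactions(sentences, known_proteins):
            scores = defaultdict(float)
            for sent in sentences:
                hits = PATTERN.findall(sent)
                if hits:
                    for a, b in hits:                        # high-precision pattern match, full weight
                        scores[frozenset((a, b))] += 1.0
                else:
                    mentioned = [p for p in known_proteins if p in sent]   # naive mention lookup
                    for i, a in enumerate(mentioned):        # lower-confidence co-occurrence, discounted
                        for b in mentioned[i + 1:]:
                            scores[frozenset((a, b))] += CO_OCCURRENCE_DISCOUNT
            return scores

        sentences = ["AR interacts with FOXA1 in prostate cells.",
                     "PTEN and TP53 were both downregulated in the tumour samples."]
        print(extract_interactions(sentences, known_proteins={"AR", "FOXA1", "PTEN", "TP53"}))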

  3. Autonomous Inter-Task Transfer in Reinforcement Learning Domains

    Science.gov (United States)

    2008-08-01


  4. An Upside to Reward Sensitivity: The Hippocampus Supports Enhanced Reinforcement Learning in Adolescence.

    Science.gov (United States)

    Davidow, Juliet Y; Foerde, Karin; Galván, Adriana; Shohamy, Daphna

    2016-10-05

    Adolescents are notorious for engaging in reward-seeking behaviors, a tendency attributed to heightened activity in the brain's reward systems during adolescence. It has been suggested that reward sensitivity in adolescence might be adaptive, but evidence of an adaptive role has been scarce. Using a probabilistic reinforcement learning task combined with reinforcement learning models and fMRI, we found that adolescents showed better reinforcement learning and a stronger link between reinforcement learning and episodic memory for rewarding outcomes. This behavioral benefit was related to heightened prediction error-related BOLD activity in the hippocampus and to stronger functional connectivity between the hippocampus and the striatum at the time of reinforcement. These findings reveal an important role for the hippocampus in reinforcement learning in adolescence and suggest that reward sensitivity in adolescence is related to adaptive differences in how adolescents learn from experience. Copyright © 2016 Elsevier Inc. All rights reserved.

  5. Genetic reinforcement learning through symbiotic evolution for fuzzy controller design.

    Science.gov (United States)

    Juang, C F; Lin, J Y; Lin, C T

    2000-01-01

    An efficient genetic reinforcement learning algorithm for designing fuzzy controllers is proposed in this paper. The genetic algorithm (GA) adopted in this paper is based upon symbiotic evolution which, when applied to fuzzy controller design, complements the local mapping property of a fuzzy rule. Using this Symbiotic-Evolution-based Fuzzy Controller (SEFC) design method, the number of control trials, as well as consumed CPU time, are considerably reduced when compared to traditional GA-based fuzzy controller design methods and other types of genetic reinforcement learning schemes. Moreover, unlike traditional fuzzy controllers, which partition the input space into a grid, SEFC partitions the input space in a flexible way, thus creating fewer fuzzy rules. In SEFC, different types of fuzzy rules whose consequent parts are singletons, fuzzy sets, or linear equations (TSK-type fuzzy rules) are allowed. Further, the free parameters (e.g., centers and widths of membership functions) and fuzzy rules are all tuned automatically. For the TSK-type fuzzy rules in particular, which the proposed learning algorithm puts to use, only the significant input variables are selected to participate in the consequent of a rule. The proposed SEFC design method has been applied to different simulated control problems, including the cart-pole balancing system, a magnetic levitation system, and a water bath temperature control system. The proposed SEFC has been verified to be efficient and superior on these control problems and in comparisons with some traditional GA-based fuzzy systems.

  6. Optimal control in microgrid using multi-agent reinforcement learning.

    Science.gov (United States)

    Li, Fu-Dong; Wu, Min; He, Yong; Chen, Xin

    2012-11-01

    This paper presents an improved reinforcement learning method to minimize electricity costs on the premise of satisfying the power balance and generation limits of units in a microgrid in grid-connected mode. Firstly, the microgrid control requirements are analyzed and the objective function of optimal control for the microgrid is proposed. Then, a state variable, "Average Electricity Price Trend", which expresses the most probable transitions of the system, is developed so as to reduce the complexity and randomness of the microgrid, and a multi-agent architecture including agents, state variables, action variables and a reward function is formulated. Furthermore, dynamic hierarchical reinforcement learning, based on the change rate of a key state variable, is established to carry out optimal policy exploration. The analysis shows that the proposed method is beneficial for handling the "curse of dimensionality" problem and speeding up learning in an unknown large-scale world. Finally, the simulation results under JADE (Java Agent Development Framework) demonstrate the validity of the presented method in optimal control for a microgrid in grid-connected mode. Copyright © 2012 ISA. Published by Elsevier Ltd. All rights reserved.

  7. Reconciling Reinforcement Learning Models with Behavioral Extinction and Renewal: Implications for Addiction, Relapse, and Problem Gambling

    Science.gov (United States)

    Redish, A. David; Jensen, Steve; Johnson, Adam; Kurth-Nelson, Zeb

    2007-01-01

    Because learned associations are quickly renewed following extinction, the extinction process must include processes other than unlearning. However, reinforcement learning models, such as the temporal difference reinforcement learning (TDRL) model, treat extinction as an unlearning of associated value and are thus unable to capture renewal. TDRL…

  8. Reinforcement Learning Based on the Bayesian Theorem for Electricity Markets Decision Support

    DEFF Research Database (Denmark)

    Sousa, Tiago; Pinto, Tiago; Praca, Isabel

    2014-01-01

    This paper presents the applicability of a reinforcement learning algorithm based on the application of the Bayesian theorem of probability. The proposed reinforcement learning algorithm is an advantageous and indispensable tool for ALBidS (Adaptive Learning strategic Bidding System), a multi...

  9. Reinforcement Learning Based Web Service Compositions for Mobile Business

    Science.gov (United States)

    Zhou, Juan; Chen, Shouming

    In this paper, we propose a new solution to Reactive Web Service Composition via modeling with Reinforcement Learning and introducing modified (alterable) QoS variables into the model as elements of the Markov Decision Process tuple. Moreover, we give an example of Reactive-WSC-based mobile banking to demonstrate the intrinsic capability of the proposed solution to obtain an optimized service composition, characterized by (alterable) target QoS variable sets with optimized values. Consequently, we conclude that the solution has considerable potential for boosting customer experience and quality of service in Web Services, and in applications across the whole electronic commerce and business sector.

  10. Stochastic abstract policies: generalizing knowledge to improve reinforcement learning.

    Science.gov (United States)

    Koga, Marcelo L; Freire, Valdinei; Costa, Anna H R

    2015-01-01

    Reinforcement learning (RL) enables an agent to learn behavior by acquiring experience through trial-and-error interactions with a dynamic environment. However, knowledge is usually built from scratch and learning to behave may take a long time. Here, we improve the learning performance by leveraging prior knowledge; that is, the learner shows proper behavior from the beginning of a target task, using the knowledge from a set of known, previously solved, source tasks. In this paper, we argue that building stochastic abstract policies that generalize over past experiences is an effective way to provide such improvement and this generalization outperforms the current practice of using a library of policies. We achieve this by contributing a new algorithm, AbsProb-PI-multiple, and a framework for transferring knowledge represented as a stochastic abstract policy to new RL tasks. Stochastic abstract policies offer an effective way to encode knowledge because the abstraction they provide not only generalizes solutions but also facilitates extracting the similarities among tasks. We perform experiments in a robotic navigation environment and analyze the agent's behavior throughout the learning process and also assess the transfer ratio for different numbers of source tasks. We compare our method with the transfer of a library of policies, and experiments show that the use of a generalized policy produces better results by more effectively guiding the agent when learning a target task.

  11. Using reinforcement learning to learn how to play text-based games

    OpenAIRE

    Zelinka, Mikuláš

    2018-01-01

    The ability to learn optimal control policies in systems where action space is defined by sentences in natural language would allow many interesting real-world applications such as automatic optimisation of dialogue systems. Text-based games with multiple endings and rewards are a promising platform for this task, since their feedback allows us to employ reinforcement learning techniques to jointly learn text representations and control policies. We present a general text game playing agent, ...

  12. Two Novel On-policy Reinforcement Learning Algorithms based on TD(lambda)-methods

    NARCIS (Netherlands)

    Wiering, M.A.; Hasselt, H. van

    2007-01-01

    This paper describes two novel on-policy reinforcement learning algorithms, named QV(lambda)-learning and the actor critic learning automaton (ACLA). Both algorithms learn a state value-function using TD(lambda)-methods. The difference between the algorithms is that QV-learning uses the learned
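
    The abstract is truncated here; as a rough sketch of the tabular QV-learning idea it refers to (the state-value function is learned by ordinary temporal-difference updates, and the Q-values bootstrap from that learned V rather than from other Q-values), the following Python fragment may help. The learning rates, action set and exploration rule are assumptions, and the eligibility-trace machinery of the TD(lambda) versions is omitted.

        import random
        from collections import defaultdict

        ALPHA, BETA, GAMMA, EPSILON = 0.1, 0.2, 0.95, 0.1
        ACTIONS = [0, 1]
        Q, V = defaultdict(float), defaultdict(float)

        def act(state):
            if random.random() < EPSILON:
                return random.choice(ACTIONS)
            return max(ACTIONS, key=lambda a: Q[(state, a)])

        def qv_update(state, action, reward, next_state):
            # V is learned with a plain TD update; Q bootstraps from the learned V.
            target = reward + GAMMA * V[next_state]
            V[state] += BETA * (target - V[state])
            Q[(state, action)] += ALPHA * (target - Q[(state, action)])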

  13. Online Pedagogical Tutorial Tactics Optimization Using Genetic-Based Reinforcement Learning.

    Science.gov (United States)

    Lin, Hsuan-Ta; Lee, Po-Ming; Hsiao, Tzu-Chien

    2015-01-01

    Tutorial tactics are policies for an Intelligent Tutoring System (ITS) to decide the next action when there are multiple actions available. Recent research has demonstrated that when the learning contents were controlled so as to be the same, different tutorial tactics would make a difference in students' learning gains. However, the Reinforcement Learning (RL) techniques that were used in previous studies to induce tutorial tactics are insufficient when encountering large problems and hence were used in an offline manner. Therefore, we introduced a Genetic-Based Reinforcement Learning (GBML) approach to induce tutorial tactics in an online-learning manner without relying on any preexisting dataset. The introduced method can learn a set of rules from the environment in a manner similar to RL. It includes a genetic-based optimizer for the rule discovery task, generating new rules from old ones. This increases the scalability of an RL learner for larger problems. The results support our hypothesis about the capability of the GBML method to induce tutorial tactics. This suggests that the GBML method should be favorable in developing real-world ITS applications in the domain of tutorial tactics induction.

  14. Generalization of value in reinforcement learning by humans

    Science.gov (United States)

    Wimmer, G. Elliott; Daw, Nathaniel D.; Shohamy, Daphna

    2012-01-01

    Research in decision making has focused on the role of dopamine and its striatal targets in guiding choices via learned stimulus-reward or stimulus-response associations, behavior that is well-described by reinforcement learning (RL) theories. However, basic RL is relatively limited in scope and does not explain how learning about stimulus regularities or relations may guide decision making. A candidate mechanism for this type of learning comes from the domain of memory, which has highlighted a role for the hippocampus in learning of stimulus-stimulus relations, typically dissociated from the role of the striatum in stimulus-response learning. Here, we used fMRI and computational model-based analyses to examine the joint contributions of these mechanisms to RL. Humans performed an RL task with added relational structure, modeled after tasks used to isolate hippocampal contributions to memory. On each trial participants chose one of four options, but the reward probabilities for pairs of options were correlated across trials. This (uninstructed) relationship between pairs of options potentially enabled an observer to learn about options’ values based on experience with the other options and to generalize across them. We observed BOLD activity related to learning in the striatum and also in the hippocampus. By comparing a basic RL model to one augmented to allow feedback to generalize between correlated options, we tested whether choice behavior and BOLD activity were influenced by the opportunity to generalize across correlated options. Although such generalization goes beyond standard computational accounts of RL and striatal BOLD, both choices and striatal BOLD were better explained by the augmented model. Consistent with the hypothesized role for the hippocampus in this generalization, functional connectivity between the ventral striatum and hippocampus was modulated, across participants, by the ability of the augmented model to capture participants’ choice
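
    The augmented model is described above only in outline; the toy Python sketch below conveys the general idea that feedback for a chosen option also nudges the value of its correlated partner option, scaled by a coupling weight. The pairing, the sign and size of the coupling, and the learning rate are illustrative assumptions rather than the fitted model from the study.

        ALPHA = 0.2          # ordinary RL learning rate
        COUPLING = 0.5       # assumed generalization weight applied to the partner option

        # Four options grouped into two correlated pairs (the pairing was uninstructed in the task).
        values = {"A": 0.0, "B": 0.0, "C": 0.0, "D": 0.0}
        partner = {"A": "B", "B": "A", "C": "D", "D": "C"}

        def update(chosen, reward):
            delta = reward - values[chosen]
            values[chosen] += ALPHA * delta
            # Augmented step: generalize part of the prediction error to the correlated option.
            values[partner[chosen]] += ALPHA * COUPLING * delta

        update("A", 1.0)
        print(values)   # B has moved even though it was never chosen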

  15. Explicit and implicit reinforcement learning across the psychosis spectrum.

    Science.gov (United States)

    Barch, Deanna M; Carter, Cameron S; Gold, James M; Johnson, Sheri L; Kring, Ann M; MacDonald, Angus W; Pizzagalli, Diego A; Ragland, J Daniel; Silverstein, Steven M; Strauss, Milton E

    2017-07-01

    Motivational and hedonic impairments are core features of a variety of types of psychopathology. An important aspect of motivational function is reinforcement learning (RL), including implicit (i.e., outside of conscious awareness) and explicit (i.e., including explicit representations about potential reward associations) learning, as well as both positive reinforcement (learning about actions that lead to reward) and punishment (learning to avoid actions that lead to loss). Here we present data from paradigms designed to assess both positive and negative components of both implicit and explicit RL, examine performance on each of these tasks among individuals with schizophrenia, schizoaffective disorder, and bipolar disorder with psychosis, and examine their relative relationships to specific symptom domains transdiagnostically. None of the diagnostic groups differed significantly from controls on the implicit RL tasks in either bias toward a rewarded response or bias away from a punished response. However, on the explicit RL task, both the individuals with schizophrenia and schizoaffective disorder performed significantly worse than controls, but the individuals with bipolar did not. Worse performance on the explicit RL task, but not the implicit RL task, was related to worse motivation and pleasure symptoms across all diagnostic categories. Performance on explicit RL, but not implicit RL, was related to working memory, which accounted for some of the diagnostic group differences. However, working memory did not account for the relationship of explicit RL to motivation and pleasure symptoms. These findings suggest transdiagnostic relationships across the spectrum of psychotic disorders between motivation and pleasure impairments and explicit RL. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  16. Reinforcement Learning with Autonomous Small Unmanned Aerial Vehicles in Cluttered Environments

    Science.gov (United States)

    Tran, Loc; Cross, Charles; Montague, Gilbert; Motter, Mark; Neilan, James; Qualls, Garry; Rothhaar, Paul; Trujillo, Anna; Allen, B. Danette

    2015-01-01

    We present ongoing work in the Autonomy Incubator at NASA Langley Research Center (LaRC) exploring the efficacy of a data set aggregation approach to reinforcement learning for small unmanned aerial vehicle (sUAV) flight in dense and cluttered environments with reactive obstacle avoidance. The goal is to learn an autonomous flight model using training experiences from a human piloting a sUAV around static obstacles. The training approach uses video data from a forward-facing camera that records the human pilot's flight. Various computer vision based features are extracted from the video relating to edge and gradient information. The recorded human-controlled inputs are used to train an autonomous control model that correlates the extracted feature vector to a yaw command. As part of the reinforcement learning approach, the autonomous control model is iteratively updated with feedback from a human agent who corrects undesired model output. This data driven approach to autonomous obstacle avoidance is explored for simulated forest environments furthering autonomous flight under the tree canopy research. This enables flight in previously inaccessible environments which are of interest to NASA researchers in Earth and Atmospheric sciences.
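
    The data set aggregation loop described above (record human-piloted flights, train a feature-to-yaw model, fly it, have a human correct undesired outputs, retrain on the aggregated data) can be outlined schematically; every function below is a toy stand-in, not the actual vision or flight stack, and the synthetic data exists only so the sketch runs end to end.

        import numpy as np

        rng = np.random.default_rng(0)

        def extract_features(frame):
            # Placeholder for the edge/gradient features computed from the forward-facing camera.
            return np.asarray(frame, dtype=float)

        def train_model(X, y):
            # Toy stand-in regressor: regularized least squares from feature vectors to yaw commands.
            X, y = np.vstack(X), np.asarray(y)
            w = np.linalg.solve(X.T @ X + 1e-3 * np.eye(X.shape[1]), X.T @ y)
            return lambda feat: float(feat @ w)

        def aggregate_and_retrain(demos, fly_episode, human_correct, iterations=5):
            # demos: (frame, human_yaw) pairs recorded while a human pilots the sUAV.
            dataset = list(demos)
            model = train_model([extract_features(f) for f, _ in dataset], [y for _, y in dataset])
            for _ in range(iterations):
                frames, model_yaws = fly_episode(model)            # autonomous flight with current model
                dataset.extend(human_correct(frames, model_yaws))  # human relabels undesired outputs
                model = train_model([extract_features(f) for f, _ in dataset], [y for _, y in dataset])
            return model

        # Toy usage with synthetic data standing in for camera frames and pilot commands.
        true_w = np.array([0.5, -0.2, 0.1])
        demos = [(x, float(x @ true_w)) for x in rng.normal(size=(20, 3))]
        fly = lambda model: ([x for x in rng.normal(size=(10, 3))], None)
        correct = lambda frames, _: [(f, float(np.asarray(f) @ true_w)) for f in frames]
        model = aggregate_and_retrain(demos, fly, correct)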

  17. A multicomponent approach to thinning reinforcer delivery during noncontingent reinforcement schedules.

    Science.gov (United States)

    Slocum, Sarah K; Grauerholz-Fisher, Emma; Peters, Kerri P; Vollmer, Timothy R

    2018-01-01

    We evaluated a noncontingent reinforcement procedure that involved initially providing three subjects with signaled, continuous access to the functional reinforcer for aggression and slowly increasing the amount of time subjects were exposed to the signaled unavailability of the reinforcer. Additionally, alternative potential reinforcers were available throughout the sessions. Results showed immediate and substantial reductions in aggression for all three subjects. The clinical utility of this intervention is discussed, and future research directions are recommended. © 2017 Society for the Experimental Analysis of Behavior.

  18. How much of reinforcement learning is working memory, not reinforcement learning? A behavioral, computational, and neurogenetic analysis

    Science.gov (United States)

    Collins, Anne G. E.; Frank, Michael J.

    2012-01-01

    Instrumental learning involves corticostriatal circuitry and the dopaminergic system. This system is typically modeled in the reinforcement learning (RL) framework by incrementally accumulating reward values of states and actions. However, human learning also implicates prefrontal cortical mechanisms involved in higher level cognitive functions. The interaction of these systems remains poorly understood, and models of human behavior often ignore working memory (WM) and therefore incorrectly assign behavioral variance to the RL system. Here we designed a task that highlights the profound entanglement of these two processes, even in simple learning problems. By systematically varying the size of the learning problem and delay between stimulus repetitions, we separately extracted WM-specific effects of load and delay on learning. We propose a new computational model that accounts for the dynamic integration of RL and WM processes observed in subjects' behavior. Incorporating capacity-limited WM into the model allowed us to capture behavioral variance that could not be captured in a pure RL framework even if we (implausibly) allowed separate RL systems for each set size. The WM component also allowed for a more reasonable estimation of a single RL process. Finally, we report effects of two genetic polymorphisms having relative specificity for prefrontal and basal ganglia functions. Whereas the COMT gene coding for catechol-O-methyl transferase selectively influenced model estimates of WM capacity, the GPR6 gene coding for G-protein-coupled receptor 6 influenced the RL learning rate. Thus, this study allowed us to specify distinct influences of the high-level and low-level cognitive functions on instrumental learning, beyond the possibilities offered by simple RL models. PMID:22487033

  19. Grounding the Meanings in Sensorimotor Behavior using Reinforcement Learning.

    Science.gov (United States)

    Farkaš, Igor; Malík, Tomáš; Rebrová, Kristína

    2012-01-01

    The recent outburst of interest in cognitive developmental robotics is fueled by the ambition to propose ecologically plausible mechanisms of how, among other things, a learning agent/robot could ground linguistic meanings in its sensorimotor behavior. Along this stream, we propose a model that allows the simulated iCub robot to learn the meanings of actions (point, touch, and push) oriented toward objects in robot's peripersonal space. In our experiments, the iCub learns to execute motor actions and comment on them. Architecturally, the model is composed of three neural-network-based modules that are trained in different ways. The first module, a two-layer perceptron, is trained by back-propagation to attend to the target position in the visual scene, given the low-level visual information and the feature-based target information. The second module, having the form of an actor-critic architecture, is the most distinguishing part of our model, and is trained by a continuous version of reinforcement learning to execute actions as sequences, based on a linguistic command. The third module, an echo-state network, is trained to provide the linguistic description of the executed actions. The trained model generalizes well in case of novel action-target combinations with randomized initial arm positions. It can also promptly adapt its behavior if the action/target suddenly changes during motor execution.

  20. Grounding the meanings in sensorimotor behavior using reinforcement learning

    Directory of Open Access Journals (Sweden)

    Igor eFarkaš

    2012-02-01

    Full Text Available The recent outburst of interest in cognitive developmental robotics is fueled by the ambition to propose ecologically plausible mechanisms of how, among other things, a learning agent/robot could ground linguistic meanings in its sensorimotor behaviour. Along this stream, we propose a model that allows the simulated iCub robot to learn the meanings of actions (point, touch and push) oriented towards objects in robot's peripersonal space. In our experiments, the iCub learns to execute motor actions and comment on them. Architecturally, the model is composed of three neural-network-based modules that are trained in different ways. The first module, a two-layer perceptron, is trained by back-propagation to attend to the target position in the visual scene, given the low-level visual information and the feature-based target information. The second module, having the form of an actor-critic architecture, is the most distinguishing part of our model, and is trained by a continuous version of reinforcement learning to execute actions as sequences, based on a linguistic command. The third module, an echo-state network, is trained to provide the linguistic description of the executed actions. The trained model generalises well in case of novel action-target combinations with randomised initial arm positions. It can also promptly adapt its behavior if the action/target suddenly changes during motor execution.

  1. Characterizing Reinforcement Learning Methods through Parameterized Learning Problems

    Science.gov (United States)

    2011-06-03


  2. Agent-based traffic management and reinforcement learning in congested intersection network.

    Science.gov (United States)

    2012-08-01

    This study evaluates the performance of traffic control systems based on reinforcement learning (RL), also called approximate dynamic programming (ADP). Two algorithms have been selected for testing: 1) Q-learning and 2) approximate dynamic programmi...

  3. Reinforcement Learning to Train Ms. Pac-Man Using Higher-order Action-relative Inputs

    NARCIS (Netherlands)

    Bom, Luuk; Henken, Ruud; Wiering, Marco

    2013-01-01

    Reinforcement learning algorithms enable an agent to optimize its behavior from interacting with a specific environment. Although some very successful applications of reinforcement learning algorithms have been developed, it is still an open research question how to scale up to large dynamic

  4. Evaluation of physical damage associated with action selection strategies in reinforcement learning

    NARCIS (Netherlands)

    Koryakovskiy, I.; Vallery, H.; Babuska, R.; Caarls, W.; Dochain, Denis; Henrion, Didier; Peaucelle, Dimitri

    2017-01-01

    Reinforcement learning techniques enable robots to deal with their own dynamics and with unknown environments without using explicit models or preprogrammed behaviors. However, reinforcement learning relies on intrinsically risky exploration, which is often damaging for physical systems. In the

  5. A multiplicative reinforcement learning model capturing learning dynamics and interindividual variability in mice

    Science.gov (United States)

    Bathellier, Brice; Tee, Sui Poh; Hrovat, Christina; Rumpel, Simon

    2013-01-01

    Both in humans and in animals, different individuals may learn the same task with strikingly different speeds; however, the sources of this variability remain elusive. In standard learning models, interindividual variability is often explained by variations of the learning rate, a parameter indicating how much synapses are updated on each learning event. Here, we theoretically show that the initial connectivity between the neurons involved in learning a task is also a strong determinant of how quickly the task is learned, provided that connections are updated in a multiplicative manner. To experimentally test this idea, we trained mice to perform an auditory Go/NoGo discrimination task followed by a reversal to compare learning speed when starting from naive or already trained synaptic connections. All mice learned the initial task, but often displayed sigmoid-like learning curves, with a variable delay period followed by a steep increase in performance, as often observed in operant conditioning. For all mice, learning was much faster in the subsequent reversal training. An accurate fit of all learning curves could be obtained with a reinforcement learning model endowed with a multiplicative learning rule, but not with an additive rule. Surprisingly, the multiplicative model could explain a large fraction of the interindividual variability by variations in the initial synaptic weights. Altogether, these results demonstrate the power of multiplicative learning rules to account for the full dynamics of biological learning and suggest an important role of initial wiring in the brain for predispositions to different tasks. PMID:24255115
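
    The contrast between additive and multiplicative update rules, and the way initial weights shape the delay before a steep rise in performance, can be conveyed with a toy Python sketch; the scalar weight, learning rate and trial counts are illustrative assumptions, not the model fitted to the mouse data.

        import numpy as np

        def learning_curve(initial_weight, multiplicative, lr=0.1, trials=200, target=1.0):
            w, curve = initial_weight, []
            for _ in range(trials):
                error = target - w                                   # per-trial reinforcement prediction error
                w += lr * error * (w if multiplicative else 1.0)     # multiplicative vs additive update
                curve.append(w)
            return np.array(curve)

        # With the multiplicative rule, smaller initial weights give a longer delay before the steep,
        # sigmoid-like rise, which is one way initial connectivity can produce individual differences.
        for w0 in (0.01, 0.05, 0.2):
            curve = learning_curve(w0, multiplicative=True)
            trials_to_half = int(np.argmax(curve >= 0.5))
            print(f"initial weight {w0}: reaches 0.5 after {trials_to_half} trials")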

  6. Intrinsic interactive reinforcement learning - Using error-related potentials for real world human-robot interaction.

    Science.gov (United States)

    Kim, Su Kyoung; Kirchner, Elsa Andrea; Stefes, Arne; Kirchner, Frank

    2017-12-14

    Reinforcement learning (RL) enables robots to learn their optimal behavioral strategies in dynamic environments based on feedback. Explicit human feedback during robot RL is advantageous, since an explicit reward function can be easily adapted. However, it is very demanding and tiresome for a human to continuously and explicitly generate feedback. Therefore, the development of implicit approaches is of high relevance. In this paper, we used an error-related potential (ErrP), an event-related activity in the human electroencephalogram (EEG), as intrinsically generated implicit feedback (reward) for RL. Initially, we validated our approach with seven subjects in a simulated robot learning scenario. ErrPs were detected online in single trials with a balanced accuracy (bACC) of 91%, which was sufficient to learn to recognize gestures and the correct mapping between human gestures and robot actions in parallel. Finally, we validated our approach in a real robot scenario, in which seven subjects freely chose gestures and the real robot correctly learned the mapping between gestures and actions (ErrP detection (90% bACC)). In this paper, we demonstrated that intrinsically generated EEG-based human feedback in RL can successfully be used to implicitly improve gesture-based robot control during human-robot interaction. We call our approach intrinsic interactive RL.

  7. Model-based hierarchical reinforcement learning and human action control.

    Science.gov (United States)

    Botvinick, Matthew; Weinstein, Ari

    2014-11-05

    Recent work has reawakened interest in goal-directed or 'model-based' choice, where decisions are based on prospective evaluation of potential action outcomes. Concurrently, there has been growing attention to the role of hierarchy in decision-making and action control. We focus here on the intersection between these two areas of interest, considering the topic of hierarchical model-based control. To characterize this form of action control, we draw on the computational framework of hierarchical reinforcement learning, using this to interpret recent empirical findings. The resulting picture reveals how hierarchical model-based mechanisms might play a special and pivotal role in human decision-making, dramatically extending the scope and complexity of human behaviour.

  8. How we learn to make decisions: rapid propagation of reinforcement learning prediction errors in humans.

    Science.gov (United States)

    Krigolson, Olav E; Hassall, Cameron D; Handy, Todd C

    2014-03-01

    Our ability to make decisions is predicated upon our knowledge of the outcomes of the actions available to us. Reinforcement learning theory posits that actions followed by a reward or punishment acquire value through the computation of prediction errors-discrepancies between the predicted and the actual reward. A multitude of neuroimaging studies have demonstrated that rewards and punishments evoke neural responses that appear to reflect reinforcement learning prediction errors [e.g., Krigolson, O. E., Pierce, L. J., Holroyd, C. B., & Tanaka, J. W. Learning to become an expert: Reinforcement learning and the acquisition of perceptual expertise. Journal of Cognitive Neuroscience, 21, 1833-1840, 2009; Bayer, H. M., & Glimcher, P. W. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47, 129-141, 2005; O'Doherty, J. P. Reward representations and reward-related learning in the human brain: Insights from neuroimaging. Current Opinion in Neurobiology, 14, 769-776, 2004; Holroyd, C. B., & Coles, M. G. H. The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109, 679-709, 2002]. Here, we used the brain ERP technique to demonstrate that not only do rewards elicit a neural response akin to a prediction error but also that this signal rapidly diminished and propagated to the time of choice presentation with learning. Specifically, in a simple, learnable gambling task, we show that novel rewards elicited a feedback error-related negativity that rapidly decreased in amplitude with learning. Furthermore, we demonstrate the existence of a reward positivity at choice presentation, a previously unreported ERP component that has a similar timing and topography as the feedback error-related negativity that increased in amplitude with learning. The pattern of results we observed mirrored the output of a computational model that we implemented to compute reward
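
    The prediction-error quantity these ERP effects are usually modeled with can be written down in a few lines; the toy gamble probability, learning rate and printed trials below are illustrative assumptions, and on average the error shrinks as the learned value approaches the expected outcome.

        import random

        random.seed(1)
        ALPHA = 0.15
        value = 0.0                 # learned value of choosing the rewarded gamble
        REWARD_PROB = 0.8           # illustrative payoff probability

        for trial in range(1, 41):
            reward = 1.0 if random.random() < REWARD_PROB else 0.0
            prediction_error = reward - value     # quantity the feedback ERN / reward positivity is thought to track
            value += ALPHA * prediction_error
            if trial in (1, 10, 40):
                print(f"trial {trial:2d}: value={value:.2f}, prediction error={prediction_error:+.2f}")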

  9. Novelty and Inductive Generalization in Human Reinforcement Learning.

    Science.gov (United States)

    Gershman, Samuel J; Niv, Yael

    2015-07-01

    In reinforcement learning (RL), a decision maker searching for the most rewarding option is often faced with the question: What is the value of an option that has never been tried before? One way to frame this question is as an inductive problem: How can I generalize my previous experience with one set of options to a novel option? We show how hierarchical Bayesian inference can be used to solve this problem, and we describe an equivalence between the Bayesian model and temporal difference learning algorithms that have been proposed as models of RL in humans and animals. According to our view, the search for the best option is guided by abstract knowledge about the relationships between different options in an environment, resulting in greater search efficiency compared to traditional RL algorithms previously applied to human cognition. In two behavioral experiments, we test several predictions of our model, providing evidence that humans learn and exploit structured inductive knowledge to make predictions about novel options. In light of this model, we suggest a new interpretation of dopaminergic responses to novelty. Copyright © 2015 Cognitive Science Society, Inc.

  10. Reinforcement learning for resource allocation in LEO satellite networks.

    Science.gov (United States)

    Usaha, Wipawee; Barria, Javier A

    2007-06-01

    In this paper, we develop and assess online decision-making algorithms for call admission and routing for low Earth orbit (LEO) satellite networks. It has been shown in a recent paper that, in a LEO satellite system, a semi-Markov decision process formulation of the call admission and routing problem can achieve better performance in terms of an average revenue function than existing routing methods. However, the conventional dynamic programming (DP) numerical solution becomes prohibitive as the problem size increases. In this paper, two solution methods based on reinforcement learning (RL) are proposed in order to circumvent the computational burden of DP. The first method is based on an actor-critic method with temporal-difference (TD) learning. The second method is based on a critic-only method, called optimistic TD learning. The algorithms offer advantages in terms of storage requirements, computational complexity, and computation time, and in terms of an overall long-term average revenue function that penalizes blocked calls. Numerical studies are carried out, and the results obtained show that the RL framework can achieve up to 56% higher average revenue over existing routing methods used in LEO satellite networks with reasonable storage and computational requirements.
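
    As a rough illustration of the actor-critic TD method referred to above, the following tabular sketch shows the core updates; the state/action encoding, learning rates, and reward signal are placeholder assumptions, not the authors' formulation:

        import numpy as np

        # Hedged sketch of a tabular actor-critic with TD learning for an
        # admit/reject decision. Sizes and parameters are illustrative only.
        n_states, n_actions = 10, 2              # e.g. discretized load x {admit, reject}
        V = np.zeros(n_states)                   # critic: state values
        prefs = np.zeros((n_states, n_actions))  # actor: action preferences
        alpha_c, alpha_a, gamma = 0.05, 0.01, 0.99

        def act(s):
            p = np.exp(prefs[s] - prefs[s].max())
            return np.random.choice(n_actions, p=p / p.sum())

        def update(s, a, r, s_next):
            delta = r + gamma * V[s_next] - V[s]   # TD error
            V[s] += alpha_c * delta                # critic update
            prefs[s, a] += alpha_a * delta         # actor update for the chosen action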

  11. Curiosity driven reinforcement learning for motion planning on humanoids

    Science.gov (United States)

    Frank, Mikhail; Leitner, Jürgen; Stollenga, Marijn; Förster, Alexander; Schmidhuber, Jürgen

    2014-01-01

    Most previous work on artificial curiosity (AC) and intrinsic motivation focuses on basic concepts and theory. Experimental results are generally limited to toy scenarios, such as navigation in a simulated maze, or control of a simple mechanical system with one or two degrees of freedom. To study AC in a more realistic setting, we embody a curious agent in the complex iCub humanoid robot. Our novel reinforcement learning (RL) framework consists of a state-of-the-art, low-level, reactive control layer, which controls the iCub while respecting constraints, and a high-level curious agent, which explores the iCub's state-action space through information gain maximization, learning a world model from experience, controlling the actual iCub hardware in real-time. To the best of our knowledge, this is the first ever embodied, curious agent for real-time motion planning on a humanoid. We demonstrate that it can learn compact Markov models to represent large regions of the iCub's configuration space, and that the iCub explores intelligently, showing interest in its physical constraints as well as in objects it finds in its environment. PMID:24432001

  12. Curiosity Driven Reinforcement Learning for Motion Planning on Humanoids

    Directory of Open Access Journals (Sweden)

    Mikhail Frank

    2014-01-01

    Full Text Available Most previous work on artificial curiosity and intrinsic motivation focuses on basic concepts and theory. Experimental results are generally limited to toy scenarios, such as navigation in a simulated maze, or control of a simple mechanical system with one or two degrees of freedom. To study artificial curiosity in a more realistic setting, we embody a curious agent in the complex iCub humanoid robot. Our novel reinforcement learning framework consists of a state-of-the-art, low-level, reactive control layer, which controls the iCub while respecting constraints, and a high-level curious agent, which explores the iCub's state-action space through information gain maximization, learning a world model from experience, controlling the actual iCub hardware in real-time. To the best of our knowledge, this is the first ever embodied, curious agent for real-time motion planning on a humanoid. We demonstrate that it can learn compact Markov models to represent large regions of the iCub's configuration space, and that the iCub explores intelligently, showing interest in its physical constraints as well as in objects it finds in its environment.

  13. Curiosity driven reinforcement learning for motion planning on humanoids.

    Science.gov (United States)

    Frank, Mikhail; Leitner, Jürgen; Stollenga, Marijn; Förster, Alexander; Schmidhuber, Jürgen

    2014-01-06

    Most previous work on artificial curiosity (AC) and intrinsic motivation focuses on basic concepts and theory. Experimental results are generally limited to toy scenarios, such as navigation in a simulated maze, or control of a simple mechanical system with one or two degrees of freedom. To study AC in a more realistic setting, we embody a curious agent in the complex iCub humanoid robot. Our novel reinforcement learning (RL) framework consists of a state-of-the-art, low-level, reactive control layer, which controls the iCub while respecting constraints, and a high-level curious agent, which explores the iCub's state-action space through information gain maximization, learning a world model from experience, controlling the actual iCub hardware in real-time. To the best of our knowledge, this is the first ever embodied, curious agent for real-time motion planning on a humanoid. We demonstrate that it can learn compact Markov models to represent large regions of the iCub's configuration space, and that the iCub explores intelligently, showing interest in its physical constraints as well as in objects it finds in its environment.

  14. Memory Transformation Enhances Reinforcement Learning in Dynamic Environments.

    Science.gov (United States)

    Santoro, Adam; Frankland, Paul W; Richards, Blake A

    2016-11-30

    Over the course of systems consolidation, there is a switch from a reliance on detailed episodic memories to generalized schematic memories. This switch is sometimes referred to as "memory transformation." Here we demonstrate a previously unappreciated benefit of memory transformation, namely, its ability to enhance reinforcement learning in a dynamic environment. We developed a neural network that is trained to find rewards in a foraging task where reward locations are continuously changing. The network can use memories for specific locations (episodic memories) and statistical patterns of locations (schematic memories) to guide its search. We find that switching from an episodic to a schematic strategy over time leads to enhanced performance due to the tendency for the reward location to be highly correlated with itself in the short-term, but regress to a stable distribution in the long-term. We also show that the statistics of the environment determine the optimal utilization of both types of memory. Our work recasts the theoretical question of why memory transformation occurs, shifting the focus from the avoidance of memory interference toward the enhancement of reinforcement learning across multiple timescales. As time passes, memories transform from a highly detailed state to a more gist-like state, in a process called "memory transformation." Theories of memory transformation speak to its advantages in terms of reducing memory interference, increasing memory robustness, and building models of the environment. However, the role of memory transformation from the perspective of an agent that continuously acts and receives reward in its environment is not well explored. In this work, we demonstrate a view of memory transformation that defines it as a way of optimizing behavior across multiple timescales. Copyright © 2016 the authors 0270-6474/16/3612228-15$15.00/0.

  15. Beyond simple reinforcement learning: the computational neurobiology of reward-learning and valuation.

    Science.gov (United States)

    O'Doherty, John P

    2012-04-01

    Neural computational accounts of reward-learning have been dominated by the hypothesis that dopamine neurons behave like a reward-prediction error and thus facilitate reinforcement learning in striatal target neurons. While this framework is consistent with a lot of behavioral and neural evidence, this theory fails to account for a number of behavioral and neurobiological observations. In this special issue of EJN we feature a combination of theoretical and experimental papers highlighting some of the explanatory challenges faced by simple reinforcement-learning models and describing some of the ways in which the framework is being extended in order to address these challenges. © 2012 The Author. European Journal of Neuroscience © 2012 Federation of European Neuroscience Societies and Blackwell Publishing Ltd.

  16. BAM learning of nonlinearly separable tasks by using an asymmetrical output function and reinforcement learning.

    Science.gov (United States)

    Chartier, Sylvain; Boukadoum, Mounir; Amiri, Mahmood

    2009-08-01

    Most bidirectional associative memory (BAM) networks use a symmetrical output function for dual fixed-point behavior. In this paper, we show that by introducing an asymmetry parameter into a recently introduced chaotic BAM output function, prior knowledge can be used to momentarily disable desired attractors from memory, hence biasing the search space to improve recall performance. This property allows control of chaotic wandering, favoring given subspaces over others. In addition, reinforcement learning can then enable a dual BAM architecture to store and recall nonlinearly separable patterns. Our results allow the same BAM framework to model three different types of learning: supervised, reinforcement, and unsupervised. This ability is very promising from the cognitive modeling viewpoint. The new BAM model is also useful from an engineering perspective; our simulation results reveal a notable overall increase in BAM learning and recall performance when using a hybrid model with the general regression neural network (GRNN).

  17. Reinforcement learning for discounted values often loses the goal in the application to animal learning.

    Science.gov (United States)

    Yamaguchi, Yoshiya; Sakai, Yutaka

    2012-11-01

    The impulsive preference of an animal for an immediate reward implies that it might subjectively discount the value of potential future outcomes. A theoretical framework to maximize the discounted subjective value has been established in reinforcement learning theory. The framework has been successfully applied in engineering. However, this study identified a limitation when the framework is applied to animal behavior: in some cases, there is no well-defined learning goal. Here, a possible learning framework is proposed that is well-posed in all cases and that is consistent with the impulsive preference. Copyright © 2012 Elsevier Ltd. All rights reserved.

  18. Reinforcement learning using a continuous time actor-critic framework with spiking neurons.

    Directory of Open Access Journals (Sweden)

    Nicolas Frémaux

    2013-04-01

    Full Text Available Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only been partially elucidated. On one hand, experimental evidence shows that the neuromodulator dopamine carries information about rewards and affects synaptic plasticity. On the other hand, the theory of reinforcement learning provides a framework for reward-based learning. Recent models of reward-modulated spike-timing-dependent plasticity have made first steps towards bridging the gap between the two approaches, but faced two problems. First, reinforcement learning is typically formulated in a discrete framework, ill-adapted to the description of natural situations. Second, biologically plausible models of reward-modulated spike-timing-dependent plasticity require precise calculation of the reward prediction error, yet it remains to be shown how this can be computed by neurons. Here we propose a solution to these problems by extending the continuous temporal difference (TD) learning of Doya (2000) to the case of spiking neurons in an actor-critic network operating in continuous time, and with continuous state and action representations. In our model, the critic learns to predict expected future rewards in real time. Its activity, together with actual rewards, conditions the delivery of a neuromodulatory TD signal to itself and to the actor, which is responsible for action choice. In simulations, we show that such an architecture can solve a Morris water-maze-like navigation task, in a number of trials consistent with reported animal performance. We also use our model to solve the acrobot and the cartpole problems, two complex motor control tasks. Our model provides a plausible way of computing reward prediction error in the brain. Moreover, the analytically derived learning rule is consistent with experimental evidence for dopamine-modulated spike-timing-dependent plasticity.

  19. Reinforcement learning using a continuous time actor-critic framework with spiking neurons.

    Science.gov (United States)

    Frémaux, Nicolas; Sprekeler, Henning; Gerstner, Wulfram

    2013-04-01

    Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only been partially elucidated. On one hand, experimental evidence shows that the neuromodulator dopamine carries information about rewards and affects synaptic plasticity. On the other hand, the theory of reinforcement learning provides a framework for reward-based learning. Recent models of reward-modulated spike-timing-dependent plasticity have made first steps towards bridging the gap between the two approaches, but faced two problems. First, reinforcement learning is typically formulated in a discrete framework, ill-adapted to the description of natural situations. Second, biologically plausible models of reward-modulated spike-timing-dependent plasticity require precise calculation of the reward prediction error, yet it remains to be shown how this can be computed by neurons. Here we propose a solution to these problems by extending the continuous temporal difference (TD) learning of Doya (2000) to the case of spiking neurons in an actor-critic network operating in continuous time, and with continuous state and action representations. In our model, the critic learns to predict expected future rewards in real time. Its activity, together with actual rewards, conditions the delivery of a neuromodulatory TD signal to itself and to the actor, which is responsible for action choice. In simulations, we show that such an architecture can solve a Morris water-maze-like navigation task, in a number of trials consistent with reported animal performance. We also use our model to solve the acrobot and the cartpole problems, two complex motor control tasks. Our model provides a plausible way of computing reward prediction error in the brain. Moreover, the analytically derived learning rule is consistent with experimental evidence for dopamine-modulated spike-timing-dependent plasticity.
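
    For reference, the continuous-time TD error of Doya (2000) that this model extends can be written as follows (a standard textbook form; the spiking network in the paper approximates these quantities with neural activity):

        \delta(t) = r(t) - \frac{1}{\tau}\,V(t) + \frac{dV(t)}{dt},
        \qquad
        V(t) = \left\langle \int_{t}^{\infty} e^{-(s-t)/\tau}\, r(s)\, ds \right\rangle

    where \tau is the discounting time constant and the angle brackets denote an expectation over future rewards.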

  20. Reinforcement Learning Using a Continuous Time Actor-Critic Framework with Spiking Neurons

    Science.gov (United States)

    Frémaux, Nicolas; Sprekeler, Henning; Gerstner, Wulfram

    2013-01-01

    Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only been partially elucidated. On one hand, experimental evidence shows that the neuromodulator dopamine carries information about rewards and affects synaptic plasticity. On the other hand, the theory of reinforcement learning provides a framework for reward-based learning. Recent models of reward-modulated spike-timing-dependent plasticity have made first steps towards bridging the gap between the two approaches, but faced two problems. First, reinforcement learning is typically formulated in a discrete framework, ill-adapted to the description of natural situations. Second, biologically plausible models of reward-modulated spike-timing-dependent plasticity require precise calculation of the reward prediction error, yet it remains to be shown how this can be computed by neurons. Here we propose a solution to these problems by extending the continuous temporal difference (TD) learning of Doya (2000) to the case of spiking neurons in an actor-critic network operating in continuous time, and with continuous state and action representations. In our model, the critic learns to predict expected future rewards in real time. Its activity, together with actual rewards, conditions the delivery of a neuromodulatory TD signal to itself and to the actor, which is responsible for action choice. In simulations, we show that such an architecture can solve a Morris water-maze-like navigation task, in a number of trials consistent with reported animal performance. We also use our model to solve the acrobot and the cartpole problems, two complex motor control tasks. Our model provides a plausible way of computing reward prediction error in the brain. Moreover, the analytically derived learning rule is consistent with experimental evidence for dopamine-modulated spike-timing-dependent plasticity. PMID:23592970

  1. A spiking neural network model of model-free reinforcement learning with high-dimensional sensory input and perceptual ambiguity.

    Science.gov (United States)

    Nakano, Takashi; Otsuka, Makoto; Yoshimoto, Junichiro; Doya, Kenji

    2015-01-01

    A theoretical framework of reinforcement learning plays an important role in understanding action selection in animals. Spiking neural networks provide a theoretically grounded means to test computational hypotheses on neurally plausible algorithms of reinforcement learning through numerical simulation. However, most of these models cannot handle observations which are noisy, or occurred in the past, even though these are inevitable and constraining features of learning in real environments. This class of problem is formally known as partially observable reinforcement learning (PORL) problems. It provides a generalization of reinforcement learning to partially observable domains. In addition, observations in the real world tend to be rich and high-dimensional. In this work, we use a spiking neural network model to approximate the free energy of a restricted Boltzmann machine and apply it to the solution of PORL problems with high-dimensional observations. Our spiking network model solves maze tasks with perceptually ambiguous high-dimensional observations without knowledge of the true environment. An extended model with working memory also solves history-dependent tasks. The way spiking neural networks handle PORL problems may provide a glimpse into the underlying laws of neural information processing which can only be discovered through such a top-down approach.

  2. Does reward frequency or magnitude drive reinforcement-learning in attention-deficit/hyperactivity disorder?

    NARCIS (Netherlands)

    Luman, M.; van Meel, C.S.; Oosterlaan, J.; Sergeant, J.A.; Geurts, H.M.

    2009-01-01

    Children with attention-deficit/hyperactivity disorder (ADHD) show an impaired ability to use feedback in the context of learning. A stimulus-response learning task was used to investigate whether (1) children with ADHD displayed flatter learning curves, (2) reinforcement-learning in ADHD was

  3. Framing Reinforcement Learning from Human Reward: Reward Positivity, Temporal Discounting, Episodicity, and Performance

    Science.gov (United States)

    2014-09-29

    learning tasks (Hester and Stone, 2011). avi-tamer is mostly identical to vi-tamer: the human reward model R̂H is learned as before, using tamer; a... Hester, T., Stone, P., 2011. Learning and using models. In: Wiering, M., van Otterlo, M. (Eds.), Reinforcement Learning: State of the Art. Springer.

  4. Beamforming and Power Control in Sensor Arrays Using Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Náthalee C. Almeida

    2015-03-01

    Full Text Available The use of beamforming and power control, combined or separately, has advantages and disadvantages, depending on the application. The combined use of beamforming and power control has been shown to be highly effective in applications involving the suppression of interference signals from different sources. However, it is necessary to identify efficient methodologies for the combined operation of these two techniques. The most appropriate technique may be obtained by means of the implementation of an intelligent agent capable of making the best selection between beamforming and power control. The present paper proposes an algorithm using reinforcement learning (RL to determine the optimal combination of beamforming and power control in sensor arrays. The RL algorithm used was Q-learning, employing an ε-greedy policy, and training was performed using the offline method. The simulations showed that RL was effective for implementation of a switching policy involving the different techniques, taking advantage of the positive characteristics of each technique in terms of signal reception.
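
    A minimal sketch of the Q-learning scheme with an ε-greedy policy described above (the state encoding, reward, and parameter values are illustrative assumptions, not the authors' implementation):

        import numpy as np

        # Hedged sketch: Q-learning with epsilon-greedy selection between two
        # techniques (0 = beamforming, 1 = power control). Offline training would
        # repeatedly call choose()/update() on logged or simulated transitions.
        n_states, n_actions = 20, 2
        Q = np.zeros((n_states, n_actions))
        alpha, gamma, eps = 0.1, 0.9, 0.1      # assumed hyperparameters

        def choose(state):
            if np.random.rand() < eps:
                return np.random.randint(n_actions)    # explore
            return int(np.argmax(Q[state]))            # exploit

        def update(state, action, reward, next_state):
            target = reward + gamma * Q[next_state].max()
            Q[state, action] += alpha * (target - Q[state, action])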

  5. A reinforcement learning model of joy, distress, hope and fear

    Science.gov (United States)

    Broekens, Joost; Jacobs, Elmer; Jonker, Catholijn M.

    2015-07-01

    In this paper we computationally study the relation between adaptive behaviour and emotion. Using the reinforcement learning framework, we propose that learned state utility models fear (negative) and hope (positive), based on the fact that both signals are about anticipation of loss or gain. Further, we propose that joy/distress is a signal similar to the error signal. We present agent-based simulation experiments that show that this model replicates psychological and behavioural dynamics of emotion. This work distinguishes itself by assessing the dynamics of emotion in an adaptive agent framework - coupling it to the literature on habituation, development, extinction and hope theory. Our results support the idea that the function of emotion is to provide a complex feedback signal for an organism to adapt its behaviour. Our work is relevant for understanding the relation between emotion and adaptation in animals, as well as for human-robot interaction, in particular how emotional signals can be used to communicate between adaptive agents and humans.
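
    Schematically, the proposed mapping ties hope/fear to the sign of the learned state value and joy/distress to the sign of the temporal-difference error; the thresholds and labels in this sketch are my simplifying assumptions:

        # Hedged sketch of the emotion/RL mapping described above.
        def emotions(state_value, td_error):
            hope_fear = "hope" if state_value > 0 else "fear"        # anticipated gain vs. loss
            joy_distress = "joy" if td_error > 0 else "distress"     # better vs. worse than expected
            return hope_fear, joy_distress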

  6. Exploring Feature Dimensions to Learn a New Policy in an Uninformed Reinforcement Learning Task.

    Science.gov (United States)

    Choung, Oh-Hyeon; Lee, Sang Wan; Jeong, Yong

    2017-12-15

    When making a choice with limited information, we explore new features through trial-and-error to learn how they are related. However, few studies have investigated exploratory behaviour when information is limited. In this study, we address, at both the behavioural and neural level, how, when, and why humans explore new feature dimensions to learn a new policy for choosing a state-space. We designed a novel multi-dimensional reinforcement learning task to encourage participants to explore and learn new features, then used a reinforcement learning algorithm to model policy exploration and learning behaviour. Our results provide the first evidence that, when humans explore new feature dimensions, their values are transferred from the previous policy to the new online (active) policy, as opposed to being learned from scratch. We further demonstrated that exploration may be regulated by the level of cognitive ambiguity, and that this process might be controlled by the frontopolar cortex. This opens up new possibilities of further understanding how humans explore new features in an open-space with limited information.

  7. Reinforcement Learning for Predictive Analytics in Smart Cities

    Directory of Open Access Journals (Sweden)

    Kostas Kolomvatsos

    2017-06-01

    Full Text Available The digitization of our lives causes a shift in data production as well as in the required data management. Numerous nodes are capable of producing huge volumes of data in our everyday activities. Sensors, personal smart devices, as well as the Internet of Things (IoT) paradigm, lead to a vast infrastructure that covers all aspects of activities in modern societies. In most cases, the critical issue for public authorities (usually local, like municipalities) is the efficient management of data towards the support of novel services. The reason is that analytics provided on top of the collected data could help in the delivery of new applications that will facilitate citizens’ lives. However, the provision of analytics demands intelligent techniques for the underlying data management. The best-known technique is the separation of huge volumes of data into a number of parts and their parallel management to limit the time required for the delivery of analytics. Afterwards, analytics requests in the form of queries can be realized and derive the necessary knowledge for supporting intelligent applications. In this paper, we define the concept of a Query Controller (QC) that receives queries for analytics and assigns each of them to a processor placed in front of each data partition. We discuss an intelligent process for query assignments that adopts Machine Learning (ML). We adopt two learning schemes, i.e., Reinforcement Learning (RL) and clustering. We report on the comparison of the two schemes and elaborate on their combination. Our aim is to provide an efficient framework to support the decision making of the QC, which should swiftly select the appropriate processor for each query. We provide mathematical formulations for the discussed problem and present simulation results. Through a comprehensive experimental evaluation, we reveal the advantages of the proposed models and describe the outcomes while comparing them with a

  8. Longitudinal investigation on learned helplessness tested under negative and positive reinforcement involving stimulus control.

    Science.gov (United States)

    Oliveira, Emileane C; Hunziker, Maria Helena

    2014-07-01

    In this study, we investigated whether (a) animals demonstrating the learned helplessness effect during an escape contingency also show learning deficits under positive reinforcement contingencies involving stimulus control and (b) exposure to positive reinforcement contingencies eliminates the learned helplessness effect under an escape contingency. Rats were initially exposed to controllable (C), uncontrollable (U) or no (N) shocks. After 24 h, they were exposed to 60 escapable shocks delivered in a shuttlebox. In the following phase, we selected from each group the four subjects that presented the most typical group pattern: no escape learning (learned helplessness effect) in Group U and escape learning in Groups C and N. All subjects were then exposed to two phases: (1) positive reinforcement for lever pressing under a multiple FR/extinction schedule and (2) a re-test under negative reinforcement (escape). A fourth group (n=4) was exposed only to the positive reinforcement sessions. All subjects showed discrimination learning under the multiple schedule. In the escape re-test, the learned helplessness effect was maintained for three of the animals in Group U. These results suggest that the learned helplessness effect did not extend to discriminative behavior that is positively reinforced and that the learned helplessness effect did not revert for most subjects after exposure to positive reinforcement. We discuss some theoretical implications as related to learned helplessness as an effect restricted to aversive contingencies and to the absence of reversion after positive reinforcement. Copyright © 2014. Published by Elsevier B.V.

  9. Reinforcement learning of self-regulated β-oscillations for motor restoration in chronic stroke

    Directory of Open Access Journals (Sweden)

    Georgios Naros

    2015-07-01

    Full Text Available Neurofeedback training of motor imagery-related brain-states with brain-machine interfaces (BMI) is currently being explored prior to standard physiotherapy to improve the motor outcome of stroke rehabilitation. Pilot studies suggest that such a priming intervention before physiotherapy might increase the responsiveness of the brain to the subsequent physiotherapy, thereby improving the clinical outcome. However, there is little evidence up to now that these BMI-based interventions have achieved operant conditioning of specific brain states that facilitate task-specific functional gains beyond the practice of primed physiotherapy. In this context, we argue that BMI technology needs to aim at physiological features relevant for the targeted behavioral gain. Moreover, this therapeutic intervention has to be informed by concepts of reinforcement learning to develop its full potential. Such a refined neurofeedback approach would need to address the following issues: (1) defining a physiological feedback target specific to the intended behavioral gain, e.g., β-band oscillations for cortico-muscular communication (this targeted brain state could well be different from the brain state optimal for the neurofeedback task); (2) selecting a BMI classification and thresholding approach on the basis of learning principles, i.e., balancing challenge and reward of the neurofeedback task instead of maximizing the classification accuracy of the feedback device; and (3) adjusting the feedback in the course of the training period to account for the cognitive load and the learning experience of the participant. The proposed neurofeedback strategy provides evidence for the feasibility of the suggested approach by demonstrating that dynamic threshold adaptation based on reinforcement learning may lead to frequency-specific operant conditioning of β-band oscillations paralleled by task-specific motor improvement; a proposal that requires investigation in a larger cohort of stroke patients.

  10. Frontostriatal and Dopamine Markers of Individual Differences in Reinforcement Learning: A Multi-modal Investigation.

    Science.gov (United States)

    Kaiser, Roselinde H; Treadway, Michael T; Wooten, Dustin W; Kumar, Poornima; Goer, Franziska; Murray, Laura; Beltzer, Miranda; Pechtel, Pia; Whitton, Alexis; Cohen, Andrew L; Alpert, Nathaniel M; El Fakhri, Georges; Normandin, Marc D; Pizzagalli, Diego A

    2017-10-31

    Prior studies have shown that dopamine (DA) functioning in frontostriatal circuits supports reinforcement learning (RL), as phasic DA activity in ventral striatum signals unexpected reward and may drive coordinated activity of striatal and orbitofrontal regions that support updating of action plans. However, the nature of DA functioning in RL is complex, in particular regarding the role of DA clearance in RL behavior. Here, in a multi-modal neuroimaging study with healthy adults, we took an individual differences approach to the examination of RL behavior and DA clearance mechanisms in frontostriatal learning networks. We predicted that better RL would be associated with decreased striatal DA transporter (DAT) availability and increased intrinsic functional connectivity among DA-rich frontostriatal regions. In support of these predictions, individual differences in RL behavior were related to DAT binding potential in ventral striatum and resting-state functional connectivity between ventral striatum and orbitofrontal cortex. Critically, DAT binding potential had an indirect effect on reinforcement learning behavior through frontostriatal connectivity, suggesting potential causal relationships across levels of neurocognitive functioning. These data suggest that individual differences in DA clearance and frontostriatal coordination may serve as markers for RL, and suggest directions for research on psychopathologies characterized by altered RL. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  11. Phasic excitation of ventral tegmental dopamine neurons potentiates the initiation of conditioned approach behavior: Parametric and reinforcement-schedule analyses

    Directory of Open Access Journals (Sweden)

    Anton Ilango

    2014-05-01

    Full Text Available Midbrain dopamine neurons are implicated in motivation and learning. However, it is unclear how phasic excitation of dopamine neurons, which is implicated in learning, is involved in motivation. Here we used a self-stimulation procedure to examine how mice seek optogenetically induced phasic excitation of dopamine neurons, with an emphasis on the temporal dimension. TH-Cre transgenic mice received adeno-associated viral vectors encoding channelrhodopsin-2 into the ventral tegmental area, resulting in selective expression of the opsin in dopamine neurons. These mice were trained to press on a lever for photo-pulse trains that phasically excited dopamine neurons. They learned to self-stimulate in a fast, constant manner, and rapidly reduced pressing during extinction. We first determined effective parameters of photo-pulse trains in self-stimulation. Lever-press rates changed as a function of the manipulation of pulse number, duration, intensity and frequency. We then examined effects of interval and ratio schedules of reinforcement on photo-pulse train reinforcement, which was contrasted with food reinforcement. Reinforcement with food inhibited lever pressing for a few seconds, after which pressing was robustly regulated in a goal-directed manner. In contrast, phasic excitation of dopamine neurons robustly potentiated the initiation of lever pressing; however, this effect did not last more than 1 s and quickly diminished. Indeed, response rates markedly decreased when lever pressing was reinforced with inter-reinforcement interval schedules of 3 or 10 s or ratio schedules requiring multiple responses per reinforcement. Thus, phasic excitation of dopamine neurons briefly potentiates the initiation of approach behavior with an apparent lack of long-term motivational regulation.

  12. Intelligence moderates reinforcement learning: a mini-review of the neural evidence.

    Science.gov (United States)

    Chen, Chong

    2015-06-01

    Our understanding of the neural basis of reinforcement learning and intelligence, two key factors contributing to human strivings, has progressed significantly recently. However, the overlap of these two lines of research, namely, how intelligence affects neural responses during reinforcement learning, remains uninvestigated. A mini-review of three existing studies suggests that higher IQ (especially fluid IQ) may enhance the neural signal of positive prediction error in dorsolateral prefrontal cortex, dorsal anterior cingulate cortex, and striatum, several brain substrates of reinforcement learning or intelligence. Copyright © 2015 the American Physiological Society.

  13. Reinforcement Learning Based Novel Adaptive Learning Framework for Smart Grid Prediction

    Directory of Open Access Journals (Sweden)

    Tian Li

    2017-01-01

    Full Text Available The smart grid is a promising infrastructure for supplying electricity to end users in a safe and reliable manner. With the rapid increase in the share of renewable energy and controllable loads, the operational uncertainty of the smart grid has grown considerably in recent years. Accurate forecasting is essential to the safe and economical operation of the smart grid. However, most existing forecast methods cannot accommodate the smart grid because they are unable to adapt to varying operational conditions. In this paper, reinforcement learning is first exploited to develop an online learning framework for the smart grid. With its capability for multi-time-scale resolution, a wavelet neural network is adopted in the online learning framework to yield a reinforcement learning and wavelet neural network (RLWNN) based adaptive learning scheme. Simulations on two typical prediction problems in the smart grid, wind power prediction and load forecasting, validate the effectiveness and scalability of the proposed RLWNN-based learning framework and algorithm.

  14. Predictive representations can link model-based reinforcement learning to model-free mechanisms.

    Science.gov (United States)

    Russek, Evan M; Momennejad, Ida; Botvinick, Matthew M; Gershman, Samuel J; Daw, Nathaniel D

    2017-09-01

    Humans and animals are capable of evaluating actions by considering their long-run future rewards through a process described using model-based reinforcement learning (RL) algorithms. The mechanisms by which neural circuits perform the computations prescribed by model-based RL remain largely unknown; however, multiple lines of evidence suggest that neural circuits supporting model-based behavior are structurally homologous to and overlapping with those thought to carry out model-free temporal difference (TD) learning. Here, we lay out a family of approaches by which model-based computation may be built upon a core of TD learning. The foundation of this framework is the successor representation, a predictive state representation that, when combined with TD learning of value predictions, can produce a subset of the behaviors associated with model-based learning, while requiring less decision-time computation than dynamic programming. Using simulations, we delineate the precise behavioral capabilities enabled by evaluating actions using this approach, and compare them to those demonstrated by biological organisms. We then introduce two new algorithms that build upon the successor representation while progressively mitigating its limitations. Because this framework can account for the full range of observed putatively model-based behaviors while still utilizing a core TD framework, we suggest that it represents a neurally plausible family of mechanisms for model-based evaluation.
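
    The successor representation at the heart of this framework admits a compact sketch (tabular case; the sizes and learning rates below are illustrative assumptions):

        import numpy as np

        # Hedged sketch of the successor representation (SR). M[s, s'] estimates the
        # discounted expected future occupancy of s' from s, learned by TD; values
        # are then a dot product of occupancies with learned reward weights w.
        n_states = 25
        gamma, alpha = 0.95, 0.1
        M = np.eye(n_states)            # successor matrix
        w = np.zeros(n_states)          # per-state reward weights

        def sr_update(s, s_next, reward):
            onehot = np.eye(n_states)[s]
            M[s] += alpha * (onehot + gamma * M[s_next] - M[s])   # TD update of occupancies
            w[s_next] += alpha * (reward - w[s_next])             # reward-weight update

        def value(s):
            return M[s] @ w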

  15. Simple approach to reinforce hydrogels with cellulose nanocrystals

    Science.gov (United States)

    Yang, Jun; Han, Chun-Rui; Xu, Feng; Sun, Run-Cang

    2014-05-01

    The physical crosslinking of colloidal nanoparticles via dynamic and directional non-covalent interactions has led to significant advances in composite hydrogels. In this paper, we report a simple approach to fabricate tough, stretchable and hysteretic isotropic nanocomposite hydrogels, where rod-like cellulose nanocrystals (CNCs) are encapsulated by flexible polymer chains of poly(N,N-dimethylacrylamide) (PDMA). The CNC-PDMA colloidal clusters build a homogeneously cross-linked network and lead to a significant reinforcing effect in the composites. Hierarchically structured CNC-PDMA clusters, from isolated particles to an interpenetrated network, are observed by transmission electron microscopy measurements. Dynamic shear oscillation measurements are applied to elucidate the differences in network rheological behaviors, which are compared with those of chemically cross-linked PDMA counterparts. Tensile tests indicate that the hybrid hydrogels possess enhanced mechanical properties and a more efficient energy dissipation mechanism. In particular, with only 0.8 wt% of CNC loading, a 4.8-fold increase in Young's modulus, 9.2-fold increase in tensile strength, and 5.8-fold increase in fracture strain are achieved, which is ascribed to a combination of CNC reinforcement in the soft matrix and CNC-PDMA colloidal cluster conformational rearrangement under stretching. Physical interactions within networks serve as reversible sacrificial bonds that dissociate upon deformation, exhibiting large hysteresis as an energy dissipation mechanism via cluster mobility. This result contrasts with the case of chemically cross-linked PDMA counterparts, where stress relaxation is slow due to the permanent cross-links and there is low resistance against crack propagation within the covalent network.

  16. Dopamine-Dependent Reinforcement of Motor Skill Learning: Evidence from Gilles de la Tourette Syndrome

    Science.gov (United States)

    Palminteri, Stefano; Lebreton, Mael; Worbe, Yulia; Hartmann, Andreas; Lehericy, Stephane; Vidailhet, Marie; Grabli, David; Pessiglione, Mathias

    2011-01-01

    Reinforcement learning theory has been extensively used to understand the neural underpinnings of instrumental behaviour. A central assumption is that dopamine signals reward prediction errors, which are used to update action values and ensure better choices in the future. However, educators may share the intuitive idea that reinforcements not only…

  17. Temporal-difference reinforcement learning with distributed representations.

    Directory of Open Access Journals (Sweden)

    Zeb Kurth-Nelson

    2009-10-01

    Full Text Available Temporal-difference (TD) algorithms have been proposed as models of reinforcement learning (RL). We examine two issues of distributed representation in these TD algorithms: distributed representations of belief and distributed discounting factors. Distributed representation of belief allows the believed state of the world to distribute across sets of equivalent states. Distributed exponential discounting factors produce hyperbolic discounting in the behavior of the agent itself. We examine these issues in the context of a TD RL model in which state-belief is distributed over a set of exponentially-discounting "micro-Agents", each of which has a separate discounting factor (gamma). Each microAgent maintains an independent hypothesis about the state of the world, and a separate value-estimate of taking actions within that hypothesized state. The overall agent thus instantiates a flexible representation of an evolving world-state. As with other TD models, the value-error (delta) signal within the model matches dopamine signals recorded from animals in standard conditioning reward-paradigms. The distributed representation of belief provides an explanation for the decrease in dopamine at the conditioned stimulus seen in overtrained animals, for the differences between trace and delay conditioning, and for transient bursts of dopamine seen at movement initiation. Because each microAgent also includes its own exponential discounting factor, the overall agent shows hyperbolic discounting, consistent with behavioral experiments.
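
    The hyperbolic-discounting consequence of distributed exponential discount factors can be checked with a short numerical sketch (the set of gammas and the delayed reward are illustrative assumptions):

        import numpy as np

        # Hedged sketch: averaging exponentially discounted values across
        # micro-agents with different gammas yields an approximately hyperbolic
        # discount curve, i.e. roughly 1 / (1 + k * delay) in shape.
        gammas = np.linspace(0.05, 0.99, 50)        # one discount factor per micro-agent
        delays = np.arange(0, 40)
        reward = 1.0

        discounted = np.array([np.mean(gammas ** d) * reward for d in delays])
        print(discounted[:5])   # decays steeply at first, then flattens (hyperbolic-like)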

  18. Ultrafast photonic reinforcement learning based on laser chaos.

    Science.gov (United States)

    Naruse, Makoto; Terashima, Yuta; Uchida, Atsushi; Kim, Song-Ju

    2017-08-18

    Reinforcement learning involves decision making in dynamic and uncertain environments and constitutes an important element of artificial intelligence (AI). In this work, we experimentally demonstrate that the ultrafast chaotic oscillatory dynamics of lasers efficiently solve the multi-armed bandit problem (MAB), which requires decision making concerning a class of difficult trade-offs called the exploration-exploitation dilemma. To solve the MAB, a certain degree of randomness is required for exploration purposes. However, pseudorandom numbers generated using conventional electronic circuitry encounter severe limitations in terms of their data rate and the quality of randomness due to their algorithmic foundations. We generate laser chaos signals using a semiconductor laser sampled at a maximum rate of 100 GSample/s, and combine them with a simple decision-making principle called tug of war with a variable threshold, to ensure ultrafast, adaptive, and accurate decision making at a maximum adaptation speed of 1 GHz. We found that decision-making performance was maximized with an optimal sampling interval, and we highlight the exact coincidence between the negative autocorrelation inherent in laser chaos and decision-making performance. This study paves the way for a new realm of ultrafast photonics in the age of AI, where the ultrahigh bandwidth of light waves can provide new value.

  19. Approaches toward learning in physiotherapy

    Directory of Open Access Journals (Sweden)

    L. Keiller

    2013-01-01

    Full Text Available The aim of this study was to investigate the approaches toward learning of undergraduate Physiotherapy students in a PBL module to enhance facilitation of learning at the Stellenbosch University, Division of Physiotherapy in South Africa. This quantitative, descriptive study utilized the revised Two-factor Study Process Questionnaire (R-SPQ-2F) to evaluate the study cohorts’ approaches toward learning in the module. Results of the data instruments were analysed statistically and discussed in a descriptive manner. A statistically significantly greater number of students adopted a deep approach toward learning at the commencement of the academic year. Students showed a trend toward an increase in their intrinsic interest in the learning material as the module progressed. Students in the Applied Physiotherapy module (ATP) started to shift their focus from a surface learning approach to a deep learning approach. Further research is needed to determine the long-term changes in approach toward learning and the possible determinants of these changes. This can be done in conjunction with the implementation of quality assurance mechanisms for learning material and earlier preparation of students for the change in the learning environment.

  20. Multi-Objective Reinforcement Learning-based Deep Neural Networks for Cognitive Space Communications

    Science.gov (United States)

    Ferreria, Paulo; Paffenroth, Randy; Wyglinski, Alexander M.; Hackett, Timothy; Bilen, Sven; Reinhart, Richard; Mortensen, Dale

    2017-01-01

    Future communication subsystems of space exploration missions can potentially benefit from software-defined radios (SDRs) controlled by machine learning algorithms. In this paper, we propose a novel hybrid radio resource allocation management control algorithm that integrates multi-objective reinforcement learning and deep artificial neural networks. The objective is to efficiently manage communications system resources by monitoring performance functions with common dependent variables that result in conflicting goals. The uncertainty in the performance of thousands of different possible combinations of radio parameters makes the trade-off between exploration and exploitation in reinforcement learning (RL) much more challenging for future critical space-based missions. Thus, the system should spend as little time as possible on exploring actions, and whenever it explores an action, it should perform at acceptable levels most of the time. The proposed approach enables on-line learning by interactions with the environment and restricts poor resource allocation performance through virtual environment exploration. Improvements in the multiobjective performance can be achieved via transmitter parameter adaptation on a packet-basis, with poorly predicted performance promptly resulting in rejected decisions. Simulations presented in this work considered the DVB-S2 standard adaptive transmitter parameters and additional ones expected to be present in future adaptive radio systems. Performance results are provided by analysis of the proposed hybrid algorithm when operating across a satellite communication channel from Earth to GEO orbit during clear sky conditions. The proposed approach constitutes part of the core cognitive engine proof-of-concept to be delivered to the NASA Glenn Research Center SCaN Testbed located onboard the International Space Station.

  1. Multi-Objective Reinforcement Learning-Based Deep Neural Networks for Cognitive Space Communications

    Science.gov (United States)

    Ferreria, Paulo Victor R.; Paffenroth, Randy; Wyglinski, Alexander M.; Hackett, Timothy M.; Bilen, Sven G.; Reinhart, Richard C.; Mortensen, Dale J.

    2017-01-01

    Future communication subsystems of space exploration missions can potentially benefit from software-defined radios (SDRs) controlled by machine learning algorithms. In this paper, we propose a novel hybrid radio resource allocation management control algorithm that integrates multi-objective reinforcement learning and deep artificial neural networks. The objective is to efficiently manage communications system resources by monitoring performance functions with common dependent variables that result in conflicting goals. The uncertainty in the performance of thousands of different possible combinations of radio parameters makes the trade-off between exploration and exploitation in reinforcement learning (RL) much more challenging for future critical space-based missions. Thus, the system should spend as little time as possible on exploring actions, and whenever it explores an action, it should perform at acceptable levels most of the time. The proposed approach enables on-line learning by interactions with the environment and restricts poor resource allocation performance through virtual environment exploration. Improvements in the multiobjective performance can be achieved via transmitter parameter adaptation on a packet-basis, with poorly predicted performance promptly resulting in rejected decisions. Simulations presented in this work considered the DVB-S2 standard adaptive transmitter parameters and additional ones expected to be present in future adaptive radio systems. Performance results are provided by analysis of the proposed hybrid algorithm when operating across a satellite communication channel from Earth to GEO orbit during clear sky conditions. The proposed approach constitutes part of the core cognitive engine proof-of-concept to be delivered to the NASA Glenn Research Center SCaN Testbed located onboard the International Space Station.

  2. Nonparametric bayesian reward segmentation for skill discovery using inverse reinforcement learning

    CSIR Research Space (South Africa)

    Ranchod, P

    2015-10-01

    Full Text Available We present a method for segmenting a set of unstructured demonstration trajectories to discover reusable skills using inverse reinforcement learning (IRL). Each skill is characterised by a latent reward function which the demonstrator is assumed...

  3. Visual reinforcement audiometry: an Adobe Flash based approach.

    Science.gov (United States)

    Atherton, Steve

    2010-09-01

    Visual Reinforcement Audiometry (VRA) is a key behavioural test for young children. It is central to the diagnosis of hearing-impaired infants (1). Habituation to the visual reinforcement can give misleading results. Medical Illustration ABM University Health Board has designed a collection of Flash animations to overcome this.

  4. [The model of reward choice based on the theory of reinforcement learning].

    Science.gov (United States)

    Smirnitskaia, I A; Frolov, A A; Merzhanova, G Kh

    2007-01-01

    We developed a model of an alimentary instrumental conditioned bar-pressing reflex for cats choosing between an immediate small reinforcement ("impulsive behavior") and a delayed, more valuable reinforcement ("self-control behavior"). Our model is based on reinforcement learning theory. We emulated the contribution of dopamine with the discount coefficient of this theory (a subjective decrease in the value of a delayed reinforcement). The results of computer simulation showed that "cats" with a large discount coefficient demonstrated "self-control behavior", whereas a small discount coefficient was associated with "impulsive behavior". These data are in agreement with the experimental data indicating that impulsive behavior is due to a decreased amount of dopamine in the striatum.
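
    A worked example of the choice rule implied by the model (reward magnitudes, delay, and discount coefficients below are illustrative, not the paper's fitted values):

        # Hedged sketch: a small discount coefficient favors the immediate small
        # reward ("impulsive"), a large one favors the delayed larger reward
        # ("self-control").
        def discounted_value(reward, delay, gamma):
            return reward * gamma ** delay

        small_now, large_later, delay = 1.0, 3.0, 4
        for gamma in (0.5, 0.9):
            now = discounted_value(small_now, 0, gamma)          # 1.0 in both cases
            later = discounted_value(large_later, delay, gamma)  # 0.19 vs. 1.97
            print(gamma, "self-control" if later > now else "impulsive")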

  5. Nucleus accumbens core lesions retard instrumental learning and performance with delayed reinforcement in the rat.

    Science.gov (United States)

    Cardinal, Rudolf N; Cheung, Timothy H C

    2005-02-03

    Delays between actions and their outcomes severely hinder reinforcement learning systems, but little is known of the neural mechanism by which animals overcome this problem and bridge such delays. The nucleus accumbens core (AcbC), part of the ventral striatum, is required for normal preference for a large, delayed reward over a small, immediate reward (self-controlled choice) in rats, but the reason for this is unclear. We investigated the role of the AcbC in learning a free-operant instrumental response using delayed reinforcement, performance of a previously-learned response for delayed reinforcement, and assessment of the relative magnitudes of two different rewards. Groups of rats with excitotoxic or sham lesions of the AcbC acquired an instrumental response with different delays (0, 10, or 20 s) between the lever-press response and reinforcer delivery. A second (inactive) lever was also present, but responding on it was never reinforced. As expected, the delays retarded learning in normal rats. AcbC lesions did not hinder learning in the absence of delays, but AcbC-lesioned rats were impaired in learning when there was a delay, relative to sham-operated controls. All groups eventually acquired the response and discriminated the active lever from the inactive lever to some degree. Rats were subsequently trained to discriminate reinforcers of different magnitudes. AcbC-lesioned rats were more sensitive to differences in reinforcer magnitude than sham-operated controls, suggesting that the deficit in self-controlled choice previously observed in such rats was a consequence of reduced preference for delayed rewards relative to immediate rewards, not of reduced preference for large rewards relative to small rewards. AcbC lesions also impaired the performance of a previously-learned instrumental response in a delay-dependent fashion. These results demonstrate that the AcbC contributes to instrumental learning and performance by bridging delays between subjects' actions and the ensuing outcomes.

  6. Dissociating error-based and reinforcement-based loss functions during sensorimotor learning.

    Science.gov (United States)

    Cashaback, Joshua G A; McGregor, Heather R; Mohatarem, Ayman; Gribble, Paul L

    2017-07-01

    It has been proposed that the sensorimotor system uses a loss (cost) function to evaluate potential movements in the presence of random noise. Here we test this idea in the context of both error-based and reinforcement-based learning. In a reaching task, we laterally shifted a cursor relative to true hand position using a skewed probability distribution. This skewed probability distribution had its mean and mode separated, allowing us to dissociate the optimal predictions of an error-based loss function (corresponding to the mean of the lateral shifts) and a reinforcement-based loss function (corresponding to the mode). We then examined how the sensorimotor system uses error feedback and reinforcement feedback, in isolation and combination, when deciding where to aim the hand during a reach. We found that participants compensated differently to the same skewed lateral shift distribution depending on the form of feedback they received. When provided with error feedback, participants compensated based on the mean of the skewed noise. When provided with reinforcement feedback, participants compensated based on the mode. Participants receiving both error and reinforcement feedback continued to compensate based on the mean while repeatedly missing the target, despite receiving auditory, visual and monetary reinforcement feedback that rewarded hitting the target. Our work shows that reinforcement-based and error-based learning are separable and can occur independently. Further, when error and reinforcement feedback are in conflict, the sensorimotor system heavily weights error feedback over reinforcement feedback.
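
    The dissociation between the two loss functions hinges on the mean and mode of the skewed shift distribution differing; a short sketch of that point (the particular distribution is an illustrative assumption, not the one used in the experiment):

        import numpy as np

        # Hedged sketch: for a right-skewed lateral-shift distribution the mean
        # (optimal aim under an error-based loss) exceeds the mode (optimal aim
        # under a reinforcement-based, hit/miss loss).
        rng = np.random.default_rng(0)
        shifts = rng.gamma(shape=2.0, scale=1.0, size=100_000)   # skewed shifts

        mean_shift = shifts.mean()                   # ~2.0 for this distribution
        counts, edges = np.histogram(shifts, bins=200)
        mode_shift = edges[counts.argmax()]          # ~1.0 for this distribution
        print(round(mean_shift, 2), round(mode_shift, 2))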

  7. Dissociating error-based and reinforcement-based loss functions during sensorimotor learning

    Science.gov (United States)

    McGregor, Heather R.; Mohatarem, Ayman

    2017-01-01

    It has been proposed that the sensorimotor system uses a loss (cost) function to evaluate potential movements in the presence of random noise. Here we test this idea in the context of both error-based and reinforcement-based learning. In a reaching task, we laterally shifted a cursor relative to true hand position using a skewed probability distribution. This skewed probability distribution had its mean and mode separated, allowing us to dissociate the optimal predictions of an error-based loss function (corresponding to the mean of the lateral shifts) and a reinforcement-based loss function (corresponding to the mode). We then examined how the sensorimotor system uses error feedback and reinforcement feedback, in isolation and combination, when deciding where to aim the hand during a reach. We found that participants compensated differently to the same skewed lateral shift distribution depending on the form of feedback they received. When provided with error feedback, participants compensated based on the mean of the skewed noise. When provided with reinforcement feedback, participants compensated based on the mode. Participants receiving both error and reinforcement feedback continued to compensate based on the mean while repeatedly missing the target, despite receiving auditory, visual and monetary reinforcement feedback that rewarded hitting the target. Our work shows that reinforcement-based and error-based learning are separable and can occur independently. Further, when error and reinforcement feedback are in conflict, the sensorimotor system heavily weights error feedback over reinforcement feedback. PMID:28753634

  8. The "proactive" model of learning: Integrative framework for model-free and model-based reinforcement learning utilizing the associative learning-based proactive brain concept.

    Science.gov (United States)

    Zsuga, Judit; Biro, Klara; Papp, Csaba; Tajti, Gabor; Gesztelyi, Rudolf

    2016-02-01

    Reinforcement learning (RL) is a powerful concept underlying forms of associative learning governed by the use of a scalar reward signal, with learning taking place if expectations are violated. RL may be assessed using model-based and model-free approaches. Model-based reinforcement learning involves the amygdala, the hippocampus, and the orbitofrontal cortex (OFC). The model-free system involves the pedunculopontine-tegmental nucleus (PPTgN), the ventral tegmental area (VTA) and the ventral striatum (VS). Based on the functional connectivity of the VS, the model-free and model-based RL systems converge on the VS, which computes value by integrating model-free signals (received as reward prediction errors) with model-based, reward-related input. Using the concept of a reinforcement learning agent, we propose that the VS serves as the value function component of the RL agent. Regarding the model utilized for model-based computations, we turned to the proactive brain concept, which offers a ubiquitous function for the default network based on its great functional overlap with contextual associative areas. Hence, by means of the default network, the brain continuously organizes its environment into context frames, enabling the formulation of analogy-based associations that are turned into predictions of what to expect. The OFC integrates reward-related information into context frames upon computing reward expectation by compiling stimulus-reward and context-reward information offered by the amygdala and hippocampus, respectively. Furthermore, we suggest that the integration of model-based reward expectations into the value signal is further supported by efferents of the OFC that reach structures canonical for model-free learning (e.g., the PPTgN, VTA, and VS). (c) 2016 APA, all rights reserved.

  9. Fiber-reinforced technology in multidisciplinary chairside approaches

    OpenAIRE

    Arhun Neslihan; Arman Ayca

    2008-01-01

    There is an increasing demand to improve dentofacial esthetics in the adult population. This demand usually requires a close collaboration within the various disciplines of dentistry and the patient at every stage of the therapy. The materials and techniques used by these interdisciplinary clinicians must be conservative and minimally invasive. Fiber-reinforced composite technology offers such solutions for chairside applications. This case report presents two cases where fiber-reinforced rib...

  10. Human dorsal striatal activity during choice discriminates reinforcement learning behavior from the gambler's fallacy.

    Science.gov (United States)

    Jessup, Ryan K; O'Doherty, John P

    2011-04-27

    Reinforcement learning theory has generated substantial interest in neurobiology, particularly because of the resemblance between phasic dopamine and reward prediction errors. Actor-critic theories have been adapted to account for the functions of the striatum, with parts of the dorsal striatum equated to the actor. Here, we specifically test whether the human dorsal striatum--as predicted by an actor-critic instantiation--is used on a trial-to-trial basis at the time of choice to choose in accordance with reinforcement learning theory, as opposed to a competing strategy: the gambler's fallacy. Using a partial-brain functional magnetic resonance imaging scanning protocol focused on the striatum and other ventral brain areas, we found that the dorsal striatum is more active when choosing consistent with reinforcement learning compared with the competing strategy. Moreover, an overlapping area of dorsal striatum along with the ventral striatum was found to be correlated with reward prediction errors at the time of outcome, as predicted by the actor-critic framework. These findings suggest that the same region of dorsal striatum involved in learning stimulus-response associations may contribute to the control of behavior during choice, thereby using those learned associations. Intriguingly, neither reinforcement learning nor the gambler's fallacy conformed to the optimal choice strategy on the specific decision-making task we used. Thus, the dorsal striatum may contribute to the control of behavior according to reinforcement learning even when the prescriptions of such an algorithm are suboptimal in terms of maximizing future rewards.

  11. One-back reinforcement dissociates implicit-procedural and explicit-declarative category learning.

    Science.gov (United States)

    Smith, J David; Jamani, Sonia; Boomer, Joseph; Church, Barbara A

    2017-10-10

    The debate over unitary/multiple category-learning utilities is reminiscent of debates about multiple memory systems and unitary/dual codes in knowledge representation. In categorization, researchers continue to seek paradigms to dissociate explicit learning processes (yielding verbalizable rules) from implicit learning processes (yielding stimulus-response associations that remain outside awareness). We introduce a new dissociation here. Participants learned matched category tasks with a multidimensional, information-integration solution or a one-dimensional, rule-based solution. They received reinforcement immediately (0-Back reinforcement) or after one intervening trial (1-Back reinforcement). Lagged reinforcement eliminated implicit, information-integration category learning but preserved explicit, rule-based learning. Moreover, information-integration learners facing lagged reinforcement spontaneously adopted explicit rule strategies that poorly suited their task. The results represent a strong process dissociation in categorization, broadening the range of empirical techniques for testing the multiple-process theoretical perspective. This and related methods that disable associative learning-fostering a transition to explicit-declarative cognition-could have broad utility in comparative, cognitive, and developmental science.

  12. Design issues of a reinforcement-based self-learning fuzzy controller for petrochemical process control

    Science.gov (United States)

    Yen, John; Wang, Haojin; Daugherity, Walter C.

    1992-01-01

    Fuzzy logic controllers have some often-cited advantages over conventional techniques such as PID control, including easier implementation, accommodation to natural language, and the ability to cover a wider range of operating conditions. One major obstacle that hinders the broader application of fuzzy logic controllers is the lack of a systematic way to develop and modify their rules; as a result the creation and modification of fuzzy rules often depends on trial and error or pure experimentation. One of the proposed approaches to address this issue is a self-learning fuzzy logic controller (SFLC) that uses reinforcement learning techniques to learn the desirability of states and to adjust the consequent part of its fuzzy control rules accordingly. Due to the different dynamics of the controlled processes, the performance of a self-learning fuzzy controller is highly contingent on its design. The design issue has not received sufficient attention. The issues related to the design of a SFLC for application to a petrochemical process are discussed, and its performance is compared with that of a PID and a self-tuning fuzzy logic controller.

  13. Stroke-Based Stylization Learning and Rendering with Inverse Reinforcement Learning

    OpenAIRE

    Xie, N.; Zhao, T.; Tian, Feng; Zhang, X. H.; Sugiyama, M.

    2015-01-01

    Among various traditional art forms, brush stroke drawing is one of the widely used styles in modern computer graphic tools such as GIMP, Photoshop and Painter. In this paper, we develop an AI-aided art authoring (A4) system of non-photorealistic rendering that allows users to automatically generate brush stroke paintings in a specific artist’s style. Within the reinforcement learning framework of brush stroke generation proposed by Xie et al. [Xie et al., 2012], our contribution in this p...

  14. Maximization of Learning Speed Due to Neuronal Redundancy in Reinforcement Learning

    Science.gov (United States)

    Takiyama, Ken

    2016-11-01

    Adaptable neural activity contributes to the flexibility of human behavior, which is optimized in situations such as motor learning and decision making. Although learning signals in motor learning and decision making are low-dimensional, neural activity, which is very high dimensional, must be modified to achieve optimal performance based on the low-dimensional signal, resulting in a severe credit-assignment problem. Despite this problem, the human brain contains a vast number of neurons, leaving an open question: what is the functional significance of the huge number of neurons? Here, I address this question by analyzing a redundant neural network with a reinforcement-learning algorithm in which the numbers of neurons and output units are N and M, respectively. Because many combinations of neural activity can generate the same output under the condition of N ≫ M, I refer to the index N - M as neuronal redundancy. Although greater neuronal redundancy makes the credit-assignment problem more severe, I demonstrate that a greater degree of neuronal redundancy facilitates learning speed. Thus, in an apparent contradiction of the credit-assignment problem, I propose the hypothesis that a functional role of a huge number of neurons or a huge degree of neuronal redundancy is to facilitate learning speed.
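
    A toy node-perturbation sketch of the setting described above: the activity of N redundant units drives M outputs through a fixed readout, and learning uses only a scalar reward. The readout matrix, learning rate, and noise level are arbitrary choices for illustration; the sketch sets up the redundancy manipulation but does not reproduce the paper's analytical result on learning speed.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    def train(n_neurons, n_outputs=2, steps=2000, eta=0.05, sigma=0.1):
        """Reward-based (node-perturbation) learning with redundant units.

        Activity u of n_neurons units drives n_outputs through a fixed readout A,
        so many activity patterns produce the same output when n_neurons >> n_outputs.
        """
        A = rng.normal(size=(n_outputs, n_neurons)) / np.sqrt(n_neurons)
        target = rng.normal(size=n_outputs)
        u = np.zeros(n_neurons)
        errors = []
        for _ in range(steps):
            baseline = -np.sum((A @ u - target) ** 2)           # reward without exploration noise
            xi = sigma * rng.normal(size=n_neurons)              # exploratory perturbation
            perturbed = -np.sum((A @ (u + xi) - target) ** 2)    # reward with the perturbation
            u += eta * (perturbed - baseline) * xi / sigma ** 2  # reward-weighted update
            errors.append(-baseline)
        return errors

    for n in (4, 40, 400):   # increasing neuronal redundancy (n_outputs fixed at 2)
        err = train(n)
        print(f"N={n:4d}  error at step 200: {err[200]:.4f}   at step 2000: {err[-1]:.5f}")
    ```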

  15. Variability in Dopamine Genes Dissociates Model-Based and Model-Free Reinforcement Learning.

    Science.gov (United States)

    Doll, Bradley B; Bath, Kevin G; Daw, Nathaniel D; Frank, Michael J

    2016-01-27

    Considerable evidence suggests that multiple learning systems can drive behavior. Choice can proceed reflexively from previous actions and their associated outcomes, as captured by "model-free" learning algorithms, or flexibly from prospective consideration of outcomes that might occur, as captured by "model-based" learning algorithms. However, differential contributions of dopamine to these systems are poorly understood. Dopamine is widely thought to support model-free learning by modulating plasticity in striatum. Model-based learning may also be affected by these striatal effects, or by other dopaminergic effects elsewhere, notably on prefrontal working memory function. Indeed, prominent demonstrations linking striatal dopamine to putatively model-free learning did not rule out model-based effects, whereas other studies have reported dopaminergic modulation of verifiably model-based learning, but without distinguishing a prefrontal versus striatal locus. To clarify the relationships between dopamine, neural systems, and learning strategies, we combine a genetic association approach in humans with two well-studied reinforcement learning tasks: one isolating model-based from model-free behavior and the other sensitive to key aspects of striatal plasticity. Prefrontal function was indexed by a polymorphism in the COMT gene, differences of which reflect dopamine levels in the prefrontal cortex. This polymorphism has been associated with differences in prefrontal activity and working memory. Striatal function was indexed by a gene coding for DARPP-32, which is densely expressed in the striatum where it is necessary for synaptic plasticity. We found evidence for our hypothesis that variations in prefrontal dopamine relate to model-based learning, whereas variations in striatal dopamine function relate to model-free learning. Decisions can stem reflexively from their previously associated outcomes or flexibly from deliberative consideration of potential choice outcomes

  16. Automatic tuning of the reinforcement function

    Energy Technology Data Exchange (ETDEWEB)

    Touzet, C. [Oak Ridge National Lab., TN (United States). Center for Engineering Systems Advanced Research; Santos, J.M. [Univ. de Buenos Aires (Argentina)

    1997-12-31

    The aim of this work is to present a method that helps tune the reinforcement function parameters in a reinforcement learning approach. Since the proposal of neural-based implementations for the reinforcement learning paradigm (which reduced learning time and memory requirements to realistic values), reinforcement functions have become the critical components. Using a general definition for reinforcement functions, the authors solve, in a particular case, the so-called exploration versus exploitation dilemma through the careful computation of the RF parameter values. They propose an algorithm to compute, during the exploration part of the learning phase, an estimate for the parameter values. Experiments with the mobile robot Nomad 200 validate their proposals.

  17. Adolescent-specific patterns of behavior and neural activity during social reinforcement learning

    Science.gov (United States)

    Jones, Rebecca M.; Somerville, Leah H.; Li, Jian; Ruberry, Erika J.; Powers, Alisa; Mehta, Natasha; Dyke, Jonathan; Casey, BJ

    2014-01-01

    Humans are sophisticated social beings. Social cues from others are exceptionally salient, particularly during adolescence. Understanding how adolescents interpret and learn from variable social signals can provide insight into the observed shift in social sensitivity during this period. The current study tested 120 participants between the ages of 8 and 25 years on a social reinforcement learning task where the probability of receiving positive social feedback was parametrically manipulated. Seventy-eight of these participants completed the task during fMRI scanning. Modeling trial-by-trial learning, children and adults showed higher positive learning rates than adolescents, suggesting that adolescents demonstrated less differentiation in their reaction times for peers who provided more positive feedback. Forming expectations about receiving positive social reinforcement correlated with neural activity within the medial prefrontal cortex and ventral striatum across age. Adolescents, unlike children and adults, showed greater insular activity during positive prediction error learning and increased activity in the supplementary motor cortex and the putamen when receiving positive social feedback regardless of the expected outcome, suggesting that peer approval may motivate adolescents towards action. While different amounts of positive social reinforcement enhanced learning in children and adults, all positive social reinforcement equally motivated adolescents. Together, these findings indicate that sensitivity to peer approval during adolescence goes beyond simple reinforcement theory accounts and suggests possible explanations for how peers may motivate adolescent behavior. PMID:24550063
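
    Learning-rate differences of the kind reported above are usually captured with a delta-rule model that has separate learning rates for positive and negative prediction errors. The sketch below assumes such a model with arbitrary parameter values; it is not the fitted model from the study.

    ```python
    import numpy as np

    def asymmetric_rw(feedback, alpha_pos=0.3, alpha_neg=0.1, v0=0.5):
        """Delta-rule value update with separate learning rates for positive and
        negative prediction errors (a common model in social reinforcement learning
        studies; the parameter values here are illustrative, not fitted)."""
        v, values = v0, []
        for outcome in feedback:              # outcome: 1 = positive, 0 = negative feedback
            delta = outcome - v               # prediction error
            alpha = alpha_pos if delta > 0 else alpha_neg
            v += alpha * delta
            values.append(v)
        return values

    # Example: a peer who gives positive feedback on 75% of trials.
    rng = np.random.default_rng(2)
    feedback = (rng.random(40) < 0.75).astype(int)
    print([round(v, 2) for v in asymmetric_rw(feedback)])
    ```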

  18. Oxytocin enhances amygdala-dependent, socially reinforced learning and emotional empathy in humans.

    Science.gov (United States)

    Hurlemann, René; Patin, Alexandra; Onur, Oezguer A; Cohen, Michael X; Baumgartner, Tobias; Metzler, Sarah; Dziobek, Isabel; Gallinat, Juergen; Wagner, Michael; Maier, Wolfgang; Kendrick, Keith M

    2010-04-07

    Oxytocin (OT) is becoming increasingly established as a prosocial neuropeptide in humans with therapeutic potential in treatment of social, cognitive, and mood disorders. However, the potential of OT as a general facilitator of human learning and empathy is unclear. The current double-blind experiments on healthy adult male volunteers investigated first whether treatment with intranasal OT enhanced learning performance on a feedback-guided item-category association task where either social (smiling and angry faces) or nonsocial (green and red lights) reinforcers were used, and second whether it increased either cognitive or emotional empathy measured by the Multifaceted Empathy Test. Further experiments investigated whether OT-sensitive behavioral components required a normal functional amygdala. Results in control groups showed that learning performance was improved when social rather than nonsocial reinforcement was used. Intranasal OT potentiated this social reinforcement advantage and greatly increased emotional, but not cognitive, empathy in response to both positive and negative valence stimuli. Interestingly, after OT treatment, emotional empathy responses in men were raised to levels similar to those found in untreated women. Two patients with selective bilateral damage to the amygdala (monozygotic twins with congenital Urbach-Wiethe disease) were impaired on both OT-sensitive aspects of these learning and empathy tasks, but performed normally on nonsocially reinforced learning and cognitive empathy. Overall these findings provide the first demonstration that OT can facilitate amygdala-dependent, socially reinforced learning and emotional empathy in men.

  19. Learning Approach and Learning: Exploring a New Technological Learning System

    Science.gov (United States)

    Aflalo, Ester; Gabay, Eyal

    2013-01-01

    This study furthers the understanding of the connections between learning approaches and learning. The research population embraced 44 males from the Jewish ultraorthodox community, who abide by distinct methods of study. One group follows the very didactic, linear and structured approach of a methodical and gradual order, while the second group…

  20. Reinforcement learning for adaptive optimal control of unknown continuous-time nonlinear systems with input constraints

    Science.gov (United States)

    Yang, Xiong; Liu, Derong; Wang, Ding

    2014-03-01

    In this paper, an adaptive reinforcement learning-based solution is developed for the infinite-horizon optimal control problem of constrained-input continuous-time nonlinear systems in the presence of nonlinearities with unknown structures. Two different types of neural networks (NNs) are employed to approximate the Hamilton-Jacobi-Bellman equation. That is, a recurrent NN is constructed to identify the unknown dynamical system, and two feedforward NNs are used as the actor and the critic to approximate the optimal control and the optimal cost, respectively. Based on this framework, the action NN and the critic NN are tuned simultaneously, without the requirement for the knowledge of system drift dynamics. Moreover, by using Lyapunov's direct method, the weights of the action NN and the critic NN are guaranteed to be uniformly ultimately bounded, while keeping the closed-loop system stable. To demonstrate the effectiveness of the present approach, simulation results are illustrated.

  1. Functional and structural connectivity underlying reinforcement learning in young and older adults

    NARCIS (Netherlands)

    van de Vijver, I.

    2016-01-01

    We base many of our choices on previous experiences. Learning from the results of our actions to improve our behavior and increase the resulting reward is called reinforcement learning. In this thesis, I investigated how brain areas communicate the necessity and implementation of such behavioral

  2. Predicting Pilot Behavior in Medium Scale Scenarios Using Game Theory and Reinforcement Learning

    Science.gov (United States)

    Yildiz, Yildiray; Agogino, Adrian; Brat, Guillaume

    2013-01-01

    Effective automation is critical in achieving the capacity and safety goals of the Next Generation Air Traffic System. Unfortunately, creating integration and validation tools for such automation is difficult, as the interactions between automation and its human counterparts are complex and unpredictable. This validation becomes even more difficult as we integrate wide-reaching technologies that affect the behavior of different decision makers in the system, such as pilots, controllers and airlines. While overt short-term behavior changes can be explicitly modeled with traditional agent modeling systems, subtle behavior changes caused by the integration of new technologies may snowball into larger problems and be very hard to detect. To overcome these obstacles, we show how the integration of new technologies can be validated by learning behavior models based on goals. In this framework, human participants are not modeled explicitly. Instead, their goals are modeled and, through reinforcement learning, their actions are predicted. The main advantage of this approach is that modeling is done within the context of the entire system, allowing for accurate modeling of all participants as they interact as a whole. In addition, such an approach allows for efficient trade studies and feasibility testing on a wide range of automation scenarios. The goal of this paper is to test that such an approach is feasible. To do this, we implement this approach using a simple discrete-state learning system on a scenario where 50 aircraft need to self-navigate using Automatic Dependent Surveillance-Broadcast (ADS-B) information. In this scenario, we show how the approach can be used to predict the ability of pilots to adequately balance aircraft separation and fly efficient paths. We present results with several levels of complexity and airspace congestion.

  3. Application of reinforcement learning in cognitive radio networks: models and algorithms.

    Science.gov (United States)

    Yau, Kok-Lim Alvin; Poh, Geong-Sen; Chien, Su Fong; Al-Rawi, Hasan A A

    2014-01-01

    Cognitive radio (CR) enables unlicensed users to exploit the underutilized spectrum in licensed spectrum whilst minimizing interference to licensed users. Reinforcement learning (RL), which is an artificial intelligence approach, has been applied to enable each unlicensed user to observe and carry out optimal actions for performance enhancement in a wide range of schemes in CR, such as dynamic channel selection and channel sensing. This paper presents new discussions of RL in the context of CR networks. It provides an extensive review on how most schemes have been approached using the traditional and enhanced RL algorithms through state, action, and reward representations. Examples of the enhancements on RL, which do not appear in the traditional RL approach, are rules and cooperative learning. This paper also reviews performance enhancements brought about by the RL algorithms and open issues. This paper aims to establish a foundation in order to spark new research interests in this area. Our discussion has been presented in a tutorial manner so that it is comprehensive to readers outside the specialty of RL and CR.

  4. Application of Reinforcement Learning in Cognitive Radio Networks: Models and Algorithms

    Directory of Open Access Journals (Sweden)

    Kok-Lim Alvin Yau

    2014-01-01

    Cognitive radio (CR) enables unlicensed users to exploit the underutilized spectrum in licensed spectrum whilst minimizing interference to licensed users. Reinforcement learning (RL), which is an artificial intelligence approach, has been applied to enable each unlicensed user to observe and carry out optimal actions for performance enhancement in a wide range of schemes in CR, such as dynamic channel selection and channel sensing. This paper presents new discussions of RL in the context of CR networks. It provides an extensive review on how most schemes have been approached using the traditional and enhanced RL algorithms through state, action, and reward representations. Examples of the enhancements on RL, which do not appear in the traditional RL approach, are rules and cooperative learning. This paper also reviews performance enhancements brought about by the RL algorithms and open issues. This paper aims to establish a foundation in order to spark new research interests in this area. Our discussion has been presented in a tutorial manner so that it is comprehensive to readers outside the specialty of RL and CR.
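
    As a concrete (and deliberately simplified) example of the state-action-reward framing reviewed above, the sketch below applies stateless Q-learning to dynamic channel selection. The per-channel idle probabilities and the +1/-1 reward scheme are assumed purely for illustration and are not taken from any of the reviewed schemes.

    ```python
    import numpy as np

    rng = np.random.default_rng(3)

    n_channels = 5
    p_idle = np.array([0.9, 0.2, 0.6, 0.4, 0.8])   # assumed per-channel idle probabilities
    q = np.zeros(n_channels)                        # stateless action values, one per channel
    alpha, epsilon = 0.1, 0.1

    for t in range(5000):
        # epsilon-greedy channel selection by the unlicensed (secondary) user
        a = int(rng.integers(n_channels)) if rng.random() < epsilon else int(np.argmax(q))
        reward = 1.0 if rng.random() < p_idle[a] else -1.0   # success vs collision with a licensed user
        q[a] += alpha * (reward - q[a])                      # stateless Q-learning update

    print("learned channel values:", np.round(q, 2))
    print("preferred channel     :", int(np.argmax(q)))
    ```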

  5. Pragmatically Framed Cross-Situational Noun Learning Using Computational Reinforcement Models

    Directory of Open Access Journals (Sweden)

    Shamima Najnin

    2018-01-01

    Cross-situational learning and social pragmatic theories are prominent mechanisms for learning word meanings (i.e., word-object pairs). In this paper, the role of reinforcement is investigated for early word-learning by an artificial agent. When exposed to a group of speakers, the agent comes to understand an initial set of vocabulary items belonging to the language used by the group. Both cross-situational learning and social pragmatic theory are taken into account. As social cues, joint attention and prosodic cues in caregiver's speech are considered. During agent-caregiver interaction, the agent selects a word from the caregiver's utterance and learns the relations between that word and the objects in its visual environment. The “novel words to novel objects” language-specific constraint is assumed for computing rewards. The models are learned by maximizing the expected reward using reinforcement learning algorithms [i.e., table-based algorithms: Q-learning, SARSA, SARSA-λ, and neural network-based algorithms: Q-learning for neural network (Q-NN), neural-fitted Q-network (NFQ), and deep Q-network (DQN)]. Neural network-based reinforcement learning models are chosen over table-based models for better generalization and quicker convergence. Simulations are carried out using mother-infant interaction CHILDES dataset for learning word-object pairings. Reinforcement is modeled in two cross-situational learning cases: (1) with joint attention (Attentional models), and (2) with joint attention and prosodic cues (Attentional-prosodic models). Attentional-prosodic models manifest superior performance to Attentional ones for the task of word-learning. The Attentional-prosodic DQN outperforms existing word-learning models for the same task.

  6. Approaches to Learning to Control Dynamic Uncertainty

    Directory of Open Access Journals (Sweden)

    Magda Osman

    2015-10-01

    In dynamic environments, when faced with a choice of which learning strategy to adopt, do people choose to mostly explore (maximizing their long-term gains) or exploit (maximizing their short-term gains)? More to the point, how does this choice of learning strategy influence one’s later ability to control the environment? In the present study, we explore whether people’s self-reported learning strategies and levels of arousal (i.e., surprise, stress) correspond to performance measures of controlling a Highly Uncertain or Moderately Uncertain dynamic environment. Generally, self-reports suggest a preference for exploring the environment to begin with, after which those in the Highly Uncertain environment generally indicated they exploited more than those in the Moderately Uncertain environment; this difference did not impact on performance on later tests of people’s ability to control the dynamic environment. Levels of arousal were also differentially associated with the uncertainty of the environment. Going beyond behavioral data, our model of dynamic decision-making revealed that, in actual fact, there was no difference in exploitation levels between those in the highly uncertain or moderately uncertain environments, but there were differences based on sensitivity to negative reinforcement. We consider the implications of our findings with respect to learning and strategic approaches to controlling dynamic uncertainty.
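
    The exploration-exploitation trade-off discussed above can be made concrete with a minimal two-option example, sketched below: epsilon controls how often the agent explores rather than exploits its current value estimates. The payoff probabilities and epsilon values are arbitrary assumptions; note how pure exploitation (epsilon = 0) can lock onto the first option that ever pays off.

    ```python
    import numpy as np

    rng = np.random.default_rng(4)

    def run(epsilon, steps=500, p=(0.4, 0.6)):
        """Two-option environment: with probability epsilon the agent explores
        (random choice), otherwise it exploits its current value estimates."""
        q, n = np.zeros(2), np.zeros(2)
        total = 0.0
        for _ in range(steps):
            a = int(rng.integers(2)) if rng.random() < epsilon else int(np.argmax(q))
            r = float(rng.random() < p[a])
            n[a] += 1
            q[a] += (r - q[a]) / n[a]        # incremental sample-average update
            total += r
        return total / steps

    for eps in (0.0, 0.1, 0.5, 1.0):         # from pure exploitation to pure exploration
        print(f"epsilon={eps:>3}: average reward {run(eps):.3f}")
    ```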

  7. Application of Reinforcement Learning Algorithms for the Adaptive Computation of the Smoothing Parameter for Probabilistic Neural Network.

    Science.gov (United States)

    Kusy, Maciej; Zajdel, Roman

    2015-09-01

    In this paper, we propose new methods for the choice and adaptation of the smoothing parameter of the probabilistic neural network (PNN). These methods are based on three reinforcement learning algorithms: Q(0)-learning, Q(λ)-learning, and stateless Q-learning. We consider three types of PNN classifiers: the model that uses a single smoothing parameter for the whole network, the model that utilizes a single smoothing parameter for each data attribute, and the model that possesses a matrix of smoothing parameters different for each data variable and data class. Reinforcement learning is applied as the method of finding such a value of the smoothing parameter that ensures the maximization of prediction ability. PNN models with smoothing parameters computed according to the proposed algorithms are tested on eight databases by calculating the test error with the use of the cross-validation procedure. The results are compared with state-of-the-art methods for PNN training published in the literature to date and, additionally, with a PNN whose sigma is determined by means of the conjugate gradient approach. The results demonstrate that the proposed approaches can be used as alternative PNN training procedures.
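
    A minimal sketch of the stateless Q-learning variant described above, applied to choosing a smoothing parameter: the actions are candidate sigma values and the reward is the resulting prediction ability. The validation_accuracy function below is a synthetic stand-in for a PNN cross-validation score, and all parameter values are illustrative assumptions.

    ```python
    import numpy as np

    rng = np.random.default_rng(5)

    # Candidate smoothing parameters (the "actions" of the stateless learner).
    sigmas = np.array([0.05, 0.1, 0.2, 0.5, 1.0, 2.0])

    def validation_accuracy(sigma):
        """Stand-in for a PNN cross-validation score; in the paper this would be the
        classification accuracy of the PNN with smoothing parameter sigma. Here it is
        a noisy synthetic curve peaking near sigma = 0.5."""
        return np.exp(-(np.log(sigma / 0.5)) ** 2) + 0.05 * rng.normal()

    q = np.zeros(len(sigmas))     # stateless action values
    alpha, epsilon = 0.2, 0.2

    for t in range(300):
        a = int(rng.integers(len(sigmas))) if rng.random() < epsilon else int(np.argmax(q))
        r = validation_accuracy(sigmas[a])          # reward = prediction ability
        q[a] += alpha * (r - q[a])                  # stateless Q-learning update

    print("chosen smoothing parameter:", sigmas[int(np.argmax(q))])
    ```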

  8. EFFICIENT SPECTRUM UTILIZATION IN COGNITIVE RADIO THROUGH REINFORCEMENT LEARNING

    Directory of Open Access Journals (Sweden)

    Dhananjay Kumar

    2013-09-01

    Machine learning schemes can be employed in cognitive radio systems to intelligently locate the spectrum holes with some knowledge about the operating environment. In this paper, we formulate a variation of the Actor Critic Learning algorithm known as the Continuous Actor Critic Learning Automaton (CACLA) and compare this scheme with the Actor Critic Learning scheme and the existing Q-learning scheme. Simulation results show that our CACLA scheme has a shorter execution time and achieves higher throughput compared to the other two schemes.
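
    A minimal sketch of the CACLA update rule mentioned above, on a single-state continuous-action toy problem rather than a cognitive radio task: the critic is updated by the temporal-difference error, and the actor is moved toward the explored action only when that error is positive. The target value, learning rates, and exploration noise are assumptions made for illustration.

    ```python
    import numpy as np

    rng = np.random.default_rng(6)

    # Toy single-state problem: find the continuous action a* = 0.7 that maximizes
    # reward r = -(a - a*)^2 (no next state, so the TD error has no bootstrap term).
    a_star = 0.7
    actor, critic = 0.0, 0.0                 # actor: proposed action; critic: value estimate V
    alpha_actor, alpha_critic, sigma = 0.1, 0.1, 0.2

    for t in range(2000):
        action = actor + sigma * rng.normal()         # Gaussian exploration around the actor output
        reward = -(action - a_star) ** 2
        td_error = reward - critic
        critic += alpha_critic * td_error             # standard TD update of the critic
        if td_error > 0:                              # CACLA rule: update the actor only when the
            actor += alpha_actor * (action - actor)   # explored action did better than expected

    print("learned action:", round(actor, 3), "(target 0.7)")
    ```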

  9. Maximizing exposure therapy: an inhibitory learning approach.

    Science.gov (United States)

    Craske, Michelle G; Treanor, Michael; Conway, Christopher C; Zbozinek, Tomislav; Vervliet, Bram

    2014-07-01

    Exposure therapy is an effective approach for treating anxiety disorders, although a substantial number of individuals fail to benefit or experience a return of fear after treatment. Research suggests that anxious individuals show deficits in the mechanisms believed to underlie exposure therapy, such as inhibitory learning. Targeting these processes may help improve the efficacy of exposure-based procedures. Although evidence supports an inhibitory learning model of extinction, there has been little discussion of how to implement this model in clinical practice. The primary aim of this paper is to provide examples to clinicians for how to apply this model to optimize exposure therapy with anxious clients, in ways that distinguish it from a 'fear habituation' approach and 'belief disconfirmation' approach within standard cognitive-behavior therapy. Exposure optimization strategies include (1) expectancy violation, (2) deepened extinction, (3) occasional reinforced extinction, (4) removal of safety signals, (5) variability, (6) retrieval cues, (7) multiple contexts, and (8) affect labeling. Case studies illustrate methods of applying these techniques with a variety of anxiety disorders, including obsessive-compulsive disorder, posttraumatic stress disorder, social phobia, specific phobia, and panic disorder. Copyright © 2014 Elsevier Ltd. All rights reserved.

  10. Energy and Environmental Efficiency for the N-Ammonia Removal Process in Wastewater Treatment Plants by Means of Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Félix Hernández-del-Olmo

    2016-09-01

    Currently, energy and environmental efficiency are critical aspects in wastewater treatment plants (WWTPs). In fact, WWTPs are significant energy consumers, especially in the active sludge process (ASP) for the N-ammonia removal. In this paper, we face the challenge of simultaneously improving the economic and environmental performance by using a reinforcement learning approach. This approach improves the costs of the N-ammonia removal process in the extended WWTP Benchmark Simulation Model 1 (BSM1). It also performs better than a manual plant operator when disturbances affect the plant. Satisfactory experimental results show significant savings in a year of a working BSM1 plant.

  11. Pareto Optimal Solutions for Network Defense Strategy Selection Simulator in Multi-Objective Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Yang Sun

    2018-01-01

    Using Pareto optimization in Multi-Objective Reinforcement Learning (MORL) leads to better learning results for network defense games. This is particularly useful for network security agents, who must often balance several goals when choosing what action to take in defense of a network. If the defender knows his preferred reward distribution, the advantages of Pareto optimization can be retained by using a scalarization algorithm prior to the implementation of the MORL. In this paper, we simulate a network defense scenario by creating a multi-objective zero-sum game and using Pareto optimization and MORL to determine optimal solutions and compare those solutions to different scalarization approaches. We build a Pareto Defense Strategy Selection Simulator (PDSSS) system for assisting network administrators on decision-making, specifically, on defense strategy selection, and the experiment results show that the Satisficing Trade-Off Method (STOM) scalarization approach performs better than linear scalarization or GUESS method. The results of this paper can aid network security agents attempting to find an optimal defense policy for network security games.
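
    Scalarization, as used above, collapses a vector-valued reward into a scalar before a standard reinforcement learning update. The sketch below shows a linear weighted sum next to a Chebyshev-style alternative driven by an aspiration point; the latter is only meant to illustrate a nonlinear, preference-based scalarization and is not the exact STOM procedure, and the reward vector and weights are invented for the example.

    ```python
    import numpy as np

    def linear_scalarize(reward_vec, weights):
        """Weighted sum of objectives (the linear scalarization baseline)."""
        return float(np.dot(weights, reward_vec))

    def chebyshev_scalarize(reward_vec, weights, aspiration):
        """Weighted Chebyshev scalarization relative to an aspiration point. This is a
        generic preference-driven alternative to the weighted sum; the STOM procedure
        referenced in the paper differs in its details."""
        return -float(np.max(weights * (aspiration - np.asarray(reward_vec))))

    # Vector reward for one defense action: (security gain, negative maintenance cost)
    r = np.array([0.6, -0.2])
    w = np.array([0.7, 0.3])                    # defender's preference weights
    print("linear    :", linear_scalarize(r, w))
    print("chebyshev :", chebyshev_scalarize(r, w, aspiration=np.array([1.0, 0.0])))
    ```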

  12. Neural correlates of reinforcement learning and social preferences in competitive bidding.

    Science.gov (United States)

    van den Bos, Wouter; Talwar, Arjun; McClure, Samuel M

    2013-01-30

    In competitive social environments, people often deviate from what rational choice theory prescribes, resulting in losses or suboptimal monetary gains. We investigate how competition affects learning and decision-making in a common value auction task. During the experiment, groups of five human participants were simultaneously scanned using MRI while playing the auction task. We first demonstrate that bidding is well characterized by reinforcement learning with biased reward representations dependent on social preferences. Indicative of reinforcement learning, we found that estimated trial-by-trial prediction errors correlated with activity in the striatum and ventromedial prefrontal cortex. Additionally, we found that individual differences in social preferences were related to activity in the temporal-parietal junction and anterior insula. Connectivity analyses suggest that monetary and social value signals are integrated in the ventromedial prefrontal cortex and striatum. Based on these results, we argue for a novel mechanistic account for the integration of reinforcement history and social preferences in competitive decision-making.
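
    One common way to formalize a "biased reward representation dependent on social preferences" is to run the prediction-error update on an inequity-averse utility rather than on the raw payoff. The sketch below uses a Fehr-Schmidt-style utility with arbitrary envy and guilt parameters; it illustrates the idea but is not the model fitted in the study.

    ```python
    def subjective_utility(own_payoff, other_payoff, envy=0.5, guilt=0.25):
        """Fehr-Schmidt-style inequity-averse utility; used here only to illustrate
        how a socially biased reward could enter a learning rule."""
        disadvantage = max(other_payoff - own_payoff, 0.0)
        advantage = max(own_payoff - other_payoff, 0.0)
        return own_payoff - envy * disadvantage - guilt * advantage

    def update_value(v, own_payoff, other_payoff, alpha=0.2):
        """Delta-rule update driven by the prediction error on subjective utility."""
        delta = subjective_utility(own_payoff, other_payoff) - v
        return v + alpha * delta, delta

    v = 0.0
    for own, other in [(10, 2), (4, 9), (6, 6)]:   # illustrative auction outcomes
        v, delta = update_value(v, own, other)
        print(f"own={own} other={other}  prediction error={delta:+.2f}  value={v:.2f}")
    ```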

  13. A Robust Cooperated Control Method with Reinforcement Learning and Adaptive H∞ Control

    Science.gov (United States)

    Obayashi, Masanao; Uchiyama, Shogo; Kuremoto, Takashi; Kobayashi, Kunikazu

    This study proposes a robust cooperated control method that combines reinforcement learning with robust control. A remarkable characteristic of reinforcement learning is that it does not require a model formula; however, it does not guarantee the stability of the system. On the other hand, robust control guarantees stability and robustness, but it requires a model formula. We employ both the actor-critic method, a kind of reinforcement learning that controls continuous-valued actions with a minimal amount of computation, and traditional robust control, that is, H∞ control. The proposed method was compared with the conventional control method, that is, the actor-critic method alone, through a computer simulation of controlling the angle and the position of a crane system; the simulation results showed the effectiveness of the proposed method.

  14. Learned helplessness and immunization: sensitivity to response-reinforcer independence in immunized rats.

    Science.gov (United States)

    Warren, D A; Rosellini, R A; Plonsky, M; DeCola, J P

    1985-10-01

    In experiments 1 and 2, we examined the learned helplessness and immunization effects using a test in which appetitive responding was extinguished by delivering noncontingent reinforcers. Contrary to learned helplessness theory, "immunized" animals showed performance virtually identical to that of animals exposed only to inescapable shock, and different from nonshocked controls. Experiment 2 suggests that the helplessness effect and the lack of immunization are not due to direct response suppression resulting from shock. In Experiment 3, where the immunization effect was assessed by measuring the acquisition of a response to obtain food when there was a positive response-reinforcer contingency, immunization was observed. These results cannot be explained on the basis of proactive interference, but suggest that animals exposed to the immunization procedure acquire an expectancy of response-reinforcer independence during inescapable shock. Thus, immunization effects may reflect the differential expression of expectancies, rather than their differential acquisition as learned helplessness theory postulates.

  15. Integration of reinforcement learning and optimal decision-making theories of the basal ganglia.

    Science.gov (United States)

    Bogacz, Rafal; Larsen, Tobias

    2011-04-01

    This article seeks to integrate two sets of theories describing action selection in the basal ganglia: reinforcement learning theories describing learning which actions to select to maximize reward and decision-making theories proposing that the basal ganglia selects actions on the basis of sensory evidence accumulated in the cortex. In particular, we present a model that integrates the actor-critic model of reinforcement learning and a model assuming that the cortico-basal-ganglia circuit implements a statistically optimal decision-making procedure. The values of cortico-striatal weights required for optimal decision making in our model differ from those provided by standard reinforcement learning models. Nevertheless, we show that an actor-critic model converges to the weights required for optimal decision making when biologically realistic limits on synaptic weights are introduced. We also describe the model's predictions concerning reaction times and neural responses during learning, and we discuss directions required for further integration of reinforcement learning and optimal decision-making theories.

  16. Nucleus accumbens core lesions retard instrumental learning and performance with delayed reinforcement in the rat

    Directory of Open Access Journals (Sweden)

    Cheung Timothy HC

    2005-02-01

    Background: Delays between actions and their outcomes severely hinder reinforcement learning systems, but little is known of the neural mechanism by which animals overcome this problem and bridge such delays. The nucleus accumbens core (AcbC), part of the ventral striatum, is required for normal preference for a large, delayed reward over a small, immediate reward (self-controlled choice) in rats, but the reason for this is unclear. We investigated the role of the AcbC in learning a free-operant instrumental response using delayed reinforcement, performance of a previously-learned response for delayed reinforcement, and assessment of the relative magnitudes of two different rewards. Results: Groups of rats with excitotoxic or sham lesions of the AcbC acquired an instrumental response with different delays (0, 10, or 20 s) between the lever-press response and reinforcer delivery. A second (inactive) lever was also present, but responding on it was never reinforced. As expected, the delays retarded learning in normal rats. AcbC lesions did not hinder learning in the absence of delays, but AcbC-lesioned rats were impaired in learning when there was a delay, relative to sham-operated controls. All groups eventually acquired the response and discriminated the active lever from the inactive lever to some degree. Rats were subsequently trained to discriminate reinforcers of different magnitudes. AcbC-lesioned rats were more sensitive to differences in reinforcer magnitude than sham-operated controls, suggesting that the deficit in self-controlled choice previously observed in such rats was a consequence of reduced preference for delayed rewards relative to immediate rewards, not of reduced preference for large rewards relative to small rewards. AcbC lesions also impaired the performance of a previously-learned instrumental response in a delay-dependent fashion. Conclusions: These results demonstrate that the AcbC contributes to instrumental learning

  17. Experiments with Online Reinforcement Learning in Real-Time Strategy Games

    DEFF Research Database (Denmark)

    Toftgaard Andersen, Kresten; Zeng, Yifeng; Dahl Christensen, Dennis

    2009-01-01

    Real-time strategy (RTS) games provide a challenging platform to implement online reinforcement learning (RL) techniques in a real application. Computer, as one game player, monitors opponents' (human or other computers) strategies and then updates its own policy using RL methods. In this article…, we first examine the suitability of applying the online RL in various computer games. Reinforcement learning application depends on both RL complexity and the game features. We then propose a multi-layer framework for implementing online RL in an RTS game. The framework significantly reduces RL… the effectiveness of our proposed framework and shed light on relevant issues in using online RL in RTS games…

  18. Hippocampal lesions facilitate instrumental learning with delayed reinforcement but induce impulsive choice in rats

    Directory of Open Access Journals (Sweden)

    Cheung Timothy HC

    2005-05-01

    Background: Animals must frequently act to influence the world even when the reinforcing outcomes of their actions are delayed. Learning with action-outcome delays is a complex problem, and little is known of the neural mechanisms that bridge such delays. When outcomes are delayed, they may be attributed to (or associated with) the action that caused them, or mistakenly attributed to other stimuli, such as the environmental context. Consequently, animals that are poor at forming context-outcome associations might learn action-outcome associations better with delayed reinforcement than normal animals. The hippocampus contributes to the representation of environmental context, being required for aspects of contextual conditioning. We therefore hypothesized that animals with hippocampal lesions would be better than normal animals at learning to act on the basis of delayed reinforcement. We tested the ability of hippocampal-lesioned rats to learn a free-operant instrumental response using delayed reinforcement, and what is potentially a related ability – the ability to exhibit self-controlled choice, or to sacrifice an immediate, small reward in order to obtain a delayed but larger reward. Results: Rats with sham or excitotoxic hippocampal lesions acquired an instrumental response with different delays (0, 10, or 20 s) between the response and reinforcer delivery. These delays retarded learning in normal rats. Hippocampal-lesioned rats responded slightly less than sham-operated controls in the absence of delays, but they became better at learning (relative to shams) as the delays increased; delays impaired learning less in hippocampal-lesioned rats than in shams. In contrast, lesioned rats exhibited impulsive choice, preferring an immediate, small reward to a delayed, larger reward, even though they preferred the large reward when it was not delayed. Conclusion: These results support the view that the hippocampus hinders action-outcome learning

  19. The curse of planning: dissecting multiple reinforcement-learning systems by taxing the central executive.

    Science.gov (United States)

    Otto, A Ross; Gershman, Samuel J; Markman, Arthur B; Daw, Nathaniel D

    2013-05-01

    A number of accounts of human and animal behavior posit the operation of parallel and competing valuation systems in the control of choice behavior. In these accounts, a flexible but computationally expensive model-based reinforcement-learning system has been contrasted with a less flexible but more efficient model-free reinforcement-learning system. The factors governing which system controls behavior, and under what circumstances, are still unclear. Following the hypothesis that model-based reinforcement learning requires cognitive resources, we demonstrated that having human decision makers perform a demanding secondary task engenders increased reliance on a model-free reinforcement-learning strategy. Further, we showed that, across trials, people negotiate the trade-off between the two systems dynamically as a function of concurrent executive-function demands, and people's choice latencies reflect the computational expenses of the strategy they employ. These results demonstrate that competition between multiple learning systems can be controlled on a trial-by-trial basis by modulating the availability of cognitive resources.
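
    A common way to express the arbitration studied above is as a weighted mixture of model-based and model-free action values, with the weight standing in for reliance on the planning system. The sketch below fixes that weight by hand to mimic the effect of a concurrent load; the action values and softmax temperature are invented for illustration and do not come from the study.

    ```python
    import numpy as np

    def combined_values(q_mb, q_mf, w):
        """Weighted mixture of model-based and model-free action values; w stands in
        for the (here fixed, in reality dynamically negotiated) reliance on planning."""
        return w * np.asarray(q_mb) + (1.0 - w) * np.asarray(q_mf)

    def softmax_choice_probs(q, beta=3.0):
        z = beta * (q - np.max(q))
        e = np.exp(z)
        return e / e.sum()

    q_mb = [0.8, 0.2]          # planning favors action 0
    q_mf = [0.3, 0.6]          # the cached (habitual) values favor action 1

    for w, label in [(0.7, "no load"), (0.2, "concurrent working-memory load")]:
        p = softmax_choice_probs(combined_values(q_mb, q_mf, w))
        print(f"{label:35s} P(action 0) = {p[0]:.2f}")
    ```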

  20. Mechanisms of hierarchical reinforcement learning in cortico-striatal circuits 2: evidence from fMRI.

    Science.gov (United States)

    Badre, David; Frank, Michael J

    2012-03-01

    The frontal lobes may be organized hierarchically such that more rostral frontal regions modulate cognitive control operations in caudal regions. In our companion paper (Frank MJ, Badre D. 2011. Mechanisms of hierarchical reinforcement learning in corticostriatal circuits I: computational analysis. 22:509-526), we provide novel neural circuit and algorithmic models of hierarchical cognitive control in cortico-striatal circuits. Here, we test key model predictions using functional magnetic resonance imaging (fMRI). Our neural circuit model proposes that contextual representations in rostral frontal cortex influence the striatal gating of contextual representations in caudal frontal cortex. Reinforcement learning operates at each level, such that the system adaptively learns to gate higher order contextual information into rostral regions. Our algorithmic Bayesian "mixture of experts" model captures the key computations of this neural model and provides trial-by-trial estimates of the learner's latent hypothesis states. In the present paper, we used these quantitative estimates to reanalyze fMRI data from a hierarchical reinforcement learning task reported in Badre D, Kayser AS, D'Esposito M. 2010. Frontal cortex and the discovery of abstract action rules. Neuron. 66:315--326. Results validate key predictions of the models and provide evidence for an individual cortico-striatal circuit for reinforcement learning of hierarchical structure at a specific level of policy abstraction. These findings are initially consistent with the proposal that hierarchical control in frontal cortex may emerge from interactions among nested cortico-striatal circuits at different levels of abstraction.

  1. Cardiac Concomitants of Feedback and Prediction Error Processing in Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Lucas Kastner

    2017-10-01

    Successful learning hinges on the evaluation of positive and negative feedback. We assessed differential learning from reward and punishment in a monetary reinforcement learning paradigm, together with cardiac concomitants of positive and negative feedback processing. On the behavioral level, learning from reward resulted in more advantageous behavior than learning from punishment, suggesting a differential impact of reward and punishment on successful feedback-based learning. On the autonomic level, learning and feedback processing were closely mirrored by phasic cardiac responses on a trial-by-trial basis: (1) Negative feedback was accompanied by faster and prolonged heart rate deceleration compared to positive feedback. (2) Cardiac responses shifted from feedback presentation at the beginning of learning to stimulus presentation later on. (3) Most importantly, the strength of phasic cardiac responses to the presentation of feedback correlated with the strength of prediction error signals that alert the learner to the necessity for behavioral adaptation. Considering participants' weight status and gender revealed obesity-related deficits in learning to avoid negative consequences and less consistent behavioral adaptation in women compared to men. In sum, our results provide strong new evidence for the notion that during learning phasic cardiac responses reflect an internal value and feedback monitoring system that is sensitive to the violation of performance-based expectations. Moreover, inter-individual differences in weight status and gender may affect both behavioral and autonomic responses in reinforcement-based learning.

  2. Adaptive internal state space construction method for reinforcement learning of a real-world agent.

    Science.gov (United States)

    Samejima, K; Omori, T

    1999-10-01

    One of the difficulties encountered in the application of reinforcement learning to real-world problems is the construction of a discrete state space from a continuous sensory input signal. In the absence of a priori knowledge about the task, a straightforward approach to this problem is to discretize the input space into a grid, and to use a lookup table. However, this method suffers from the curse of dimensionality. Some studies use continuous function approximators such as neural networks instead of lookup tables. However, when global basis functions such as sigmoid functions are used, convergence cannot be guaranteed. To overcome this problem, we propose a method in which local basis functions are incrementally assigned depending on the task requirement. Initially, only one basis function is allocated over the entire space. The basis function is divided according to the statistical property of locally weighted temporal difference error (TD error) of the value function. We applied this method to an autonomous robot collision avoidance problem, and evaluated the validity of the algorithm in simulation. The proposed algorithm, which we call the adaptive basis division (ABD) algorithm, achieved the task using a smaller number of basis functions than the conventional methods. Moreover, we applied the method to a goal-directed navigation problem of a real mobile robot. The action strategy was learned using a database of sensor data, and it was then used for navigation of a real machine. The robot reached the goal using a smaller number of internal states than with the conventional methods.

  3. A clustering-based graph Laplacian framework for value function approximation in reinforcement learning.

    Science.gov (United States)

    Xu, Xin; Huang, Zhenhua; Graves, Daniel; Pedrycz, Witold

    2014-12-01

    In order to deal with the sequential decision problems with large or continuous state spaces, feature representation and function approximation have been a major research topic in reinforcement learning (RL). In this paper, a clustering-based graph Laplacian framework is presented for feature representation and value function approximation (VFA) in RL. By making use of clustering-based techniques, that is, K-means clustering or fuzzy C-means clustering, a graph Laplacian is constructed by subsampling in Markov decision processes (MDPs) with continuous state spaces. The basis functions for VFA can be automatically generated from spectral analysis of the graph Laplacian. The clustering-based graph Laplacian is integrated with a class of approximation policy iteration algorithms called representation policy iteration (RPI) for RL in MDPs with continuous state spaces. Simulation and experimental results show that, compared with previous RPI methods, the proposed approach needs fewer sample points to compute an efficient set of basis functions and the learning control performance can be improved for a variety of parameter settings.
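
    A compact sketch of the pipeline described above, under simplifying assumptions: states sampled from a continuous space are subsampled with plain K-means, a Gaussian-kernel graph is built on the cluster centers, and the low-order eigenvectors of the normalized graph Laplacian serve as basis functions, assigned to new states by nearest-center lookup. The kernel bandwidth, the numbers of clusters and basis functions, and the nearest-center assignment are illustrative choices, not the paper's exact construction.

    ```python
    import numpy as np

    rng = np.random.default_rng(7)

    def kmeans(points, k, iters=50):
        """Plain K-means, used here only to subsample representative states."""
        centers = points[rng.choice(len(points), k, replace=False)]
        for _ in range(iters):
            labels = np.argmin(((points[:, None] - centers[None]) ** 2).sum(-1), axis=1)
            for j in range(k):
                if np.any(labels == j):
                    centers[j] = points[labels == j].mean(axis=0)
        return centers

    def laplacian_basis(centers, n_basis=5, bandwidth=0.5):
        """Gaussian-kernel graph on the cluster centers; the eigenvectors of the
        normalized graph Laplacian with the smallest eigenvalues serve as basis
        functions for value function approximation."""
        d2 = ((centers[:, None] - centers[None]) ** 2).sum(-1)
        w = np.exp(-d2 / (2 * bandwidth ** 2))
        np.fill_diagonal(w, 0.0)
        d = w.sum(axis=1)
        lap = np.eye(len(centers)) - (w / np.sqrt(d)[:, None]) / np.sqrt(d)[None, :]
        eigvals, eigvecs = np.linalg.eigh(lap)
        return eigvecs[:, :n_basis]                 # one feature vector per center

    def features(state, centers, basis):
        """Nearest-center lookup: a state gets the basis values of its closest center."""
        idx = int(np.argmin(((centers - state) ** 2).sum(-1)))
        return basis[idx]

    # Illustrative 2-D continuous state space sampled by random exploration.
    states = rng.random((500, 2))
    centers = kmeans(states, k=40)
    basis = laplacian_basis(centers)
    print("features for state (0.1, 0.9):", np.round(features(np.array([0.1, 0.9]), centers, basis), 3))
    ```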

  4. Selective learning impairment of delayed reinforcement autoshaped behavior caused by low doses of trimethyltin.

    Science.gov (United States)

    Cohen, C A; Messing, R B; Sparber, S B

    1987-01-01

    The organometal neurotoxin trimethyltin (TMT) induces impaired learning and memory for various tasks. However, administration is also associated with other "non-specific" behavioral changes which may be responsible for effects on conditioned behaviors. To determine if TMT treatment causes a specific learning impairment, three experiments were done using variations of a delay of reinforcement autoshaping task in which rats learn to associate the presentation and retraction of a lever with the delivery of a food pellet reinforcer. No significant effects of TMT treatment were found with a short (4 s) delay of reinforcement, indicating that rats were motivated and had the sensorimotor capacity for learning. When the delay was increased to 6 s, 3.0 or 6.0 mg TMT/kg produced dose-related reductions in behaviors directed towards the lever. Performance of a group given 7.5 mg TMT/kg, while still impaired relative to controls, appeared to be better than the performance of groups given lower doses. This paradoxical effect was investigated with a latent inhibition paradigm, in which rats were pre-exposed to the Skinner boxes for several sessions without delivery of food reinforcement. Control rats showed retardation of autoshaping when food reinforcement was subsequently introduced. Rats given 7.5 mg TMT/kg exhibited elevated levels of lever responding during pre-exposure and autoshaping sessions. The results indicate that 7.5 mg TMT/kg produces learning impairments which are confounded by hyperreactivity to the environment and an inability to suppress behavior toward irrelevant stimuli. In contrast, low doses of TMT cause learning impairments which are not confounded by hyperreactivity, and may prove to be useful models for studying specific associational dysfunctions.

  5. Reinforcement function design and bias for efficient learning in mobile robots

    Energy Technology Data Exchange (ETDEWEB)

    Touzet, C. [Oak Ridge National Lab., TN (United States). Computer Science and Mathematics Div.; Santos, J.M. [Univ. of Buenos Aires (Argentina). Dept. Computacion

    1998-06-01

    The main paradigm in sub-symbolic learning robot domain is the reinforcement learning method. Various techniques have been developed to deal with the memorization/generalization problem, demonstrating the superior ability of artificial neural network implementations. In this paper, the authors address the issue of designing the reinforcement so as to optimize the exploration part of the learning. They also present and summarize works relative to the use of bias intended to achieve the effective synthesis of the desired behavior. Demonstrative experiments involving a self-organizing map implementation of the Q-learning and real mobile robots (Nomad 200 and Khepera) in a task of obstacle avoidance behavior synthesis are described. 3 figs., 5 tabs.

  6. Learning Agent for a Heat-Pump Thermostat with a Set-Back Strategy Using Model-Free Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Frederik Ruelens

    2015-08-01

    The conventional control paradigm for a heat pump with a less efficient auxiliary heating element is to keep its temperature set point constant during the day. This constant temperature set point ensures that the heat pump operates in its more efficient heat-pump mode and minimizes the risk of activating the less efficient auxiliary heating element. As an alternative to a constant set-point strategy, this paper proposes a learning agent for a thermostat with a set-back strategy. This set-back strategy relaxes the set-point temperature during convenient moments, e.g., when the occupants are not at home. Finding an optimal set-back strategy requires solving a sequential decision-making process under uncertainty, which presents two challenges. The first challenge is that for most residential buildings, a description of the thermal characteristics of the building is unavailable and challenging to obtain. The second challenge is that the relevant information on the state, i.e., the building envelope, cannot be measured by the learning agent. In order to overcome these two challenges, our paper proposes an auto-encoder coupled with a batch reinforcement learning technique. The proposed approach is validated for two building types with different thermal characteristics for heating in the winter and cooling in the summer. The simulation results indicate that the proposed learning agent can reduce the energy consumption by 4%–9% during 100 winter days and by 9%–11% during 80 summer days compared to the conventional constant set-point strategy.
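
    The batch reinforcement learning component can be illustrated with fitted Q-iteration on a crude thermostat toy problem, sketched below: a fixed batch of transitions gathered under an exploratory policy is swept repeatedly, regressing Q-values onto bootstrapped targets. The thermal dynamics, comfort band, energy penalty, and the discretized (averaging) regressor are all stand-in assumptions; the paper itself uses an auto-encoder for state features and a building simulation.

    ```python
    import numpy as np

    rng = np.random.default_rng(8)

    # Toy thermostat environment. State: indoor temperature (deg C);
    # actions: 0 = heat pump off, 1 = heat pump on. Comfort band: 20-22 deg C.
    def step(temp, action):
        next_temp = temp + (0.8 if action == 1 else -0.4) + 0.1 * rng.normal()
        reward = (1.0 if 20.0 <= next_temp <= 22.0 else -1.0) - 0.3 * action  # comfort minus energy cost
        return next_temp, reward

    # Collect a fixed batch of transitions under an exploratory (random) policy.
    batch, temp = [], 18.0
    for t in range(5000):
        action = int(rng.integers(2))
        next_temp, reward = step(temp, action)
        batch.append((temp, action, reward, next_temp))
        temp = next_temp if 10.0 < next_temp < 30.0 else 18.0

    # Fitted Q-iteration on the batch, with temperature discretized into bins
    # (a piecewise-constant "averaging" regressor keeps the sketch short and stable).
    bins = np.linspace(10.0, 30.0, 41)
    def s_idx(temperature):
        return int(np.clip(np.digitize(temperature, bins), 0, len(bins) - 1))

    q, gamma = np.zeros((len(bins), 2)), 0.95
    for _ in range(50):                              # repeated regression sweeps over the same batch
        target_sum, count = np.zeros_like(q), np.zeros_like(q)
        for temp, a, r, next_temp in batch:
            target = r + gamma * q[s_idx(next_temp)].max()
            target_sum[s_idx(temp), a] += target
            count[s_idx(temp), a] += 1
        q = np.where(count > 0, target_sum / np.maximum(count, 1), q)

    print("policy at 19.5 deg C:", "heat" if q[s_idx(19.5)].argmax() == 1 else "off")
    print("policy at 22.5 deg C:", "heat" if q[s_idx(22.5)].argmax() == 1 else "off")
    ```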

  7. IMPLEMENTATION OF MULTIAGENT REINFORCEMENT LEARNING MECHANISM FOR OPTIMAL ISLANDING OPERATION OF DISTRIBUTION NETWORK

    DEFF Research Database (Denmark)

    Saleem, Arshad; Lind, Morten

    2008-01-01

    among electric power utilities to utilize modern information and communication technologies (ICT) in order to improve the automation of the distribution system. In this paper we present our work for the implementation of a dynamic multi-agent based distributed reinforcement learning mechanism...

  8. Life Span Differences in Electrophysiological Correlates of Monitoring Gains and Losses during Probabilistic Reinforcement Learning

    Science.gov (United States)

    Hammerer, Dorothea; Li, Shu-Chen; Muller, Viktor; Lindenberger, Ulman

    2011-01-01

    By recording the feedback-related negativity (FRN) in response to gains and losses, we investigated the contribution of outcome monitoring mechanisms to age-associated differences in probabilistic reinforcement learning. Specifically, we assessed the difference of the monitoring reactions to gains and losses to investigate the monitoring of…

  9. Bayes factors for reinforcement-learning models of the Iowa Gambling Task

    NARCIS (Netherlands)

    Steingroever, H.; Wetzels, R.; Wagenmakers, E.-J.

    2016-01-01

    The psychological processes that underlie performance on the Iowa gambling task (IGT) are often isolated with the help of reinforcement-learning (RL) models. The most popular method to compare RL models is the BIC post hoc fit criterion—a criterion that considers goodness-of-fit relative to model

  10. Improved deep reinforcement learning for robotics through distribution-based experience retention

    NARCIS (Netherlands)

    de Bruin, T.D.; Kober, J.; Tuyls, K.P.; Babuska, R.; Kwon, Dong-Soo; Kang, Chul-Goo; Suh, Il Hong

    2016-01-01

    Recent years have seen a growing interest in the use of deep neural networks as function approximators in reinforcement learning. In this paper, an experience replay method is proposed that ensures that the distribution of the experiences used for training is between that of the policy and a uniform

  11. Complexity of stochastic branch and bound for belief tree search in Bayesian reinforcement learning

    NARCIS (Netherlands)

    Dimitrakakis, C.

    2009-01-01

    There has been a lot of recent work on Bayesian methods for reinforcement learning exhibiting near-optimal online performance. The main obstacle facing such methods is that in most problems of interest, the optimal solution involves planning in an infinitely large tree. However, it is possible to

  12. Your Classroom as an Experiment in Education: The Reinforcement Theory of Learning

    Science.gov (United States)

    Fuller, Robert G.

    1976-01-01

    Presents the reinforcement theory of learning and explains how it relates to the Keller Plan of instruction. Advocates the use of the Keller Plan as an alternative to the lecture-demonstration system and provides an annotated bibliography of pertinent material. (GS)

  13. SMART (Stochastic Model Acquisition with ReinforcemenT) learning agents: A preliminary report

    OpenAIRE

    Child, C. H. T.; Stathis, K.

    2005-01-01

    We present a framework for building agents that learn using SMART, a system that combines stochastic model acquisition with reinforcement learning to enable an agent to model its environment through experience and subsequently form action selection policies using the acquired model. We extend an existing algorithm for automatic creation of stochastic strips operators [9] as a preliminary method of environment modelling. We then define the process of generation of future states using these ope...

  14. Dynamic Interaction between Reinforcement Learning and Attention in Multidimensional Environments.

    Science.gov (United States)

    Leong, Yuan Chang; Radulescu, Angela; Daniel, Reka; DeWoskin, Vivian; Niv, Yael

    2017-01-18

    Little is known about the relationship between attention and learning during decision making. Using eye tracking and multivariate pattern analysis of fMRI data, we measured participants' dimensional attention as they performed a trial-and-error learning task in which only one of three stimulus dimensions was relevant for reward at any given time. Analysis of participants' choices revealed that attention biased both value computation during choice and value update during learning. Value signals in the ventromedial prefrontal cortex and prediction errors in the striatum were similarly biased by attention. In turn, participants' focus of attention was dynamically modulated by ongoing learning. Attentional switches across dimensions correlated with activity in a frontoparietal attention network, which showed enhanced connectivity with the ventromedial prefrontal cortex between switches. Our results suggest a bidirectional interaction between attention and learning: attention constrains learning to relevant dimensions of the environment, while we learn what to attend to via trial and error. Copyright © 2017 Elsevier Inc. All rights reserved.

  15. The wonder approach to learning

    Directory of Open Access Journals (Sweden)

    Catherine eL'Ecuyer

    2014-10-01

    Full Text Available Wonder, innate in the child, is an inner desire to learn that awaits reality in order to be awakened. Wonder is at the origin of reality-based consciousness, thus of learning. The scope of wonder, which occurs at a metaphysical level, is greater than that of curiosity. Unfortunate misinterpretations of neuroscience have led to false brain-based ideas in the field of education, all of these based on the scientifically wrong assumption that children’s learning depends on an enriched environment. These beliefs have reinforced the Behaviorist Approach to education and to parenting and have contributed to deadening our children’s sense of wonder. We suggest wonder as the center of all motivation and action in the child. Wonder is what makes life genuinely personal. Beauty is what triggers wonder. Wonder attunes to beauty through sensitivity and is unfolded by attachment. When wonder, beauty, sensitivity and secure attachment are present, learning is meaningful. On the contrary, when there is no volitional dimension involved (no wonder), no end or meaning (no beauty), and no trusting predisposition (no secure attachment), the rigid and limiting mechanical process of so-called learning through mere repetition becomes a deadening and alienating routine. This could be described as training, not as learning, because it does not contemplate the human being as a whole.

  16. Protecting against evaluation overfitting in empirical reinforcement learning

    NARCIS (Netherlands)

    Whiteson, S.; Tanner, B.; Taylor, M.E.; Stone, P.

    2011-01-01

    Empirical evaluations play an important role in machine learning. However, the usefulness of any evaluation depends on the empirical methodology employed. Designing good empirical methodologies is difficult in part because agents can overfit test evaluations and thereby obtain misleadingly high

  17. Cerebellar and prefrontal cortex contributions to adaptation, strategies, and reinforcement learning.

    Science.gov (United States)

    Taylor, Jordan A; Ivry, Richard B

    2014-01-01

    Traditionally, motor learning has been studied as an implicit learning process, one in which movement errors are used to improve performance in a continuous, gradual manner. The cerebellum figures prominently in this literature given well-established ideas about the role of this system in error-based learning and the production of automatized skills. Recent developments have brought into focus the relevance of multiple learning mechanisms for sensorimotor learning. These include processes involving repetition, reinforcement learning, and strategy utilization. We examine these developments, considering their implications for understanding cerebellar function and how this structure interacts with other neural systems to support motor learning. Converging lines of evidence from behavioral, computational, and neuropsychological studies suggest a fundamental distinction between processes that use error information to improve action execution or action selection. While the cerebellum is clearly linked to the former, its role in the latter remains an open question. © 2014 Elsevier B.V. All rights reserved.

  18. Cerebellar and Prefrontal Cortex Contributions to Adaptation, Strategies, and Reinforcement Learning

    Science.gov (United States)

    Taylor, Jordan A.; Ivry, Richard B.

    2014-01-01

    Traditionally, motor learning has been studied as an implicit learning process, one in which movement errors are used to improve performance in a continuous, gradual manner. The cerebellum figures prominently in this literature given well-established ideas about the role of this system in error-based learning and the production of automatized skills. Recent developments have brought into focus the relevance of multiple learning mechanisms for sensorimotor learning. These include processes involving repetition, reinforcement learning, and strategy utilization. We examine these developments, considering their implications for understanding cerebellar function and how this structure interacts with other neural systems to support motor learning. Converging lines of evidence from behavioral, computational, and neuropsychological studies suggest a fundamental distinction between processes that use error information to improve action execution or action selection. While the cerebellum is clearly linked to the former, its role in the latter remains an open question. PMID:24916295

  19. Reinforced Concrete Finite Element Modeling based on the Discrete Crack Approach

    Directory of Open Access Journals (Sweden)

    Sri Tudjono

    2016-09-01

    Full Text Available The behavior of reinforced concrete elements is complex due to the nature of concrete, which is weak in tension. Among these complex issues are the initial cracking and crack propagation of concrete, and the bond-slip phenomenon between the concrete and reinforcing steel. Laboratory-tested specimens are not only costly but also limited in number; a finite element analysis is therefore favored in combination with experimental data. The finite element technique of inserting cracks is one approach to studying the behavior of reinforced concrete structures through numerical simulation. In finite element modeling, the cracks can be represented by either smeared or discrete cracks. The discrete crack method has the potential to include strain discontinuities within the structure. A finite element model (FEM) including the concrete cracking and the bond-slip was developed to simulate the nonlinear response of reinforced concrete structures.

  20. Confirmation bias in human reinforcement learning: Evidence from counterfactual feedback processing.

    Science.gov (United States)

    Palminteri, Stefano; Lefebvre, Germain; Kilford, Emma J; Blakemore, Sarah-Jayne

    2017-08-01

    Previous studies suggest that factual learning, that is, learning from obtained outcomes, is biased, such that participants preferentially take into account positive, as compared to negative, prediction errors. However, whether or not the prediction error valence also affects counterfactual learning, that is, learning from forgone outcomes, is unknown. To address this question, we analysed the performance of two groups of participants on reinforcement learning tasks using a computational model that was adapted to test if prediction error valence influences learning. We carried out two experiments: in the factual learning experiment, participants learned from partial feedback (i.e., the outcome of the chosen option only); in the counterfactual learning experiment, participants learned from complete feedback information (i.e., the outcomes of both the chosen and unchosen option were displayed). In the factual learning experiment, we replicated previous findings of a valence-induced bias, whereby participants learned preferentially from positive, relative to negative, prediction errors. In contrast, for counterfactual learning, we found the opposite valence-induced bias: negative prediction errors were preferentially taken into account, relative to positive ones. When considering valence-induced bias in the context of both factual and counterfactual learning, it appears that people tend to preferentially take into account information that confirms their current choice.
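
    The computational model referred to here belongs to the family that fits separate learning rates to positive and negative prediction errors. A minimal sketch of that asymmetric update is shown below; the parameter values are illustrative assumptions, not the fitted estimates from the study.

      def update_value(value, outcome, alpha_pos=0.30, alpha_neg=0.10):
          # Rescorla-Wagner-style update with valence-dependent learning rates.
          delta = outcome - value                    # prediction error
          alpha = alpha_pos if delta > 0 else alpha_neg
          return value + alpha * delta

      # Factual learning bias: alpha_pos > alpha_neg for the chosen option.
      # The reported counterfactual bias corresponds to the reverse ordering when
      # updating the forgone option from complete feedback.
      v = 0.0
      for outcome in [1, 0, 1, 1, 0]:
          v = update_value(v, outcome)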

  1. Object habituation in horses: The effect of voluntary vs. negatively reinforced approach to frightening stimuli

    DEFF Research Database (Denmark)

    Christensen, Janne Winther

    2013-01-01

    Reasons for performing the study: The ability of horses to habituate to novel objects influences safety in the horse–human relationship. However, the effectiveness of different habituation techniques has not been investigated in detail. Objectives: 1) To investigate whether horses show increased...... stress responses when negatively reinforced to approach novel objects, compared with horses allowed to voluntarily explore the objects and 2) whether a negatively reinforced approach facilitates object habituation. Methods: Twenty-two 2–3-year-old Danish Warmblood geldings were included. Half...... of the horses (NR group) were negatively reinforced by a familiar human handler to approach a collection of novel objects in a test arena. The other half were individually released in the arena and were free to explore the objects (VOL group). On the next day, the horses were exposed to the objects again...

  2. A model of reward choice based on the theory of reinforcement learning.

    Science.gov (United States)

    Smirnitskaya, I A; Frolov, A A; Merzhanova, G Kh

    2008-03-01

    A model explaining behavioral "impulsivity" and "self-control" is proposed on the basis of the theory of reinforcement learning. The discount coefficient gamma, which in this theory accounts for the subjective reduction in the value of a delayed reinforcement, is identified with the overall level of dopaminergic neuron activity which, according to published data, also determines the behavioral variant. Computer modeling showed that high values of gamma are characteristic of predominantly "self-controlled" subjects, while smaller values of gamma are characteristic of "impulsive" subjects.
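
    A short worked sketch of the discount coefficient gamma may clarify the mapping from gamma to "impulsive" versus "self-controlled" choice. The reward sizes, delays and gamma values are illustrative assumptions, not the model's parameters.

      def discounted_value(reward, delay, gamma):
          # A reward delayed by `delay` steps is worth gamma**delay of its face value.
          return reward * gamma ** delay

      for gamma in (0.6, 0.95):                      # "impulsive" vs. "self-controlled" (assumed values)
          small_now   = discounted_value(1.0, delay=0, gamma=gamma)
          large_later = discounted_value(3.0, delay=6, gamma=gamma)
          choice = "large delayed" if large_later > small_now else "small immediate"
          print(f"gamma={gamma}: prefers the {choice} reward")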

  3. Stress affects instrumental learning based on positive or negative reinforcement in interaction with personality in domestic horses.

    Science.gov (United States)

    Valenchon, Mathilde; Lévy, Frédéric; Moussu, Chantal; Lansade, Léa

    2017-01-01

    The present study investigated how stress affects instrumental learning performance in horses (Equus caballus) depending on the type of reinforcement. Horses were assigned to four groups (N = 15 per group); each group received training with negative or positive reinforcement in the presence or absence of stressors unrelated to the learning task. The instrumental learning task consisted of the horse entering one of two compartments at the appearance of a visual signal given by the experimenter. In the absence of stressors unrelated to the task, learning performance did not differ between negative and positive reinforcements. The presence of stressors unrelated to the task (exposure to novel and sudden stimuli) impaired learning performance. Interestingly, this learning deficit was smaller when the negative reinforcement was used. The negative reinforcement, considered as a stressor related to the task, could have counterbalanced the impact of the extrinsic stressor by focusing attention toward the learning task. In addition, learning performance appears to differ between certain dimensions of personality depending on the presence of stressors and the type of reinforcement. These results suggest that when negative reinforcement is used (i.e. stressor related to the task), the most fearful horses may be the best performers in the absence of stressors but the worst performers when stressors are present. On the contrary, when positive reinforcement is used, the most fearful horses appear to be consistently the worst performers, with and without exposure to stressors unrelated to the learning task. This study is the first to demonstrate in ungulates that stress affects learning performance differentially according to the type of reinforcement and in interaction with personality. It provides fundamental and applied perspectives in the understanding of the relationships between personality and training abilities.

  4. Stress affects instrumental learning based on positive or negative reinforcement in interaction with personality in domestic horses.

    Directory of Open Access Journals (Sweden)

    Mathilde Valenchon

    Full Text Available The present study investigated how stress affects instrumental learning performance in horses (Equus caballus) depending on the type of reinforcement. Horses were assigned to four groups (N = 15 per group); each group received training with negative or positive reinforcement in the presence or absence of stressors unrelated to the learning task. The instrumental learning task consisted of the horse entering one of two compartments at the appearance of a visual signal given by the experimenter. In the absence of stressors unrelated to the task, learning performance did not differ between negative and positive reinforcements. The presence of stressors unrelated to the task (exposure to novel and sudden stimuli) impaired learning performance. Interestingly, this learning deficit was smaller when the negative reinforcement was used. The negative reinforcement, considered as a stressor related to the task, could have counterbalanced the impact of the extrinsic stressor by focusing attention toward the learning task. In addition, learning performance appears to differ between certain dimensions of personality depending on the presence of stressors and the type of reinforcement. These results suggest that when negative reinforcement is used (i.e. stressor related to the task), the most fearful horses may be the best performers in the absence of stressors but the worst performers when stressors are present. On the contrary, when positive reinforcement is used, the most fearful horses appear to be consistently the worst performers, with and without exposure to stressors unrelated to the learning task. This study is the first to demonstrate in ungulates that stress affects learning performance differentially according to the type of reinforcement and in interaction with personality. It provides fundamental and applied perspectives in the understanding of the relationships between personality and training abilities.

  5. Stress enhances model-free reinforcement learning only after negative outcome.

    Science.gov (United States)

    Park, Heyeon; Lee, Daeyeol; Chey, Jeanyung

    2017-01-01

    Previous studies found that stress shifts behavioral control by promoting habits while decreasing goal-directed behaviors during reward-based decision-making. It is, however, unclear how stress disrupts the relative contribution of the two systems controlling reward-seeking behavior, i.e. model-free (or habit) and model-based (or goal-directed). Here, we investigated whether stress biases the contribution of model-free and model-based reinforcement learning processes differently depending on the valence of outcome, and whether stress alters the learning rate, i.e., how quickly information from the new environment is incorporated into choices. Participants were randomly assigned to either a stress or a control condition, and performed a two-stage Markov decision-making task in which the reward probabilities underwent periodic reversals without notice. We found that stress increased the contribution of model-free reinforcement learning only after negative outcome. Furthermore, stress decreased the learning rate. The results suggest that stress diminishes one's ability to make adaptive choices in multiple aspects of reinforcement learning. This finding has implications for understanding how stress facilitates maladaptive habits, such as addictive behavior, and other dysfunctional behaviors associated with stress in clinical and educational contexts.
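
    Two-stage Markov decision tasks of this kind are usually analysed with a hybrid model in which first-stage choices reflect a weighted mixture of model-free and model-based values. The simplified sketch below (no eligibility traces, no softmax choice rule) is only meant to illustrate that mixture; the transition matrix, learning rate and weight are illustrative assumptions. Stress effects of the kind reported here would show up in the fitted weight (the model-free/model-based balance) and in the learning rate.

      import numpy as np

      alpha, w = 0.2, 0.5                 # learning rate and model-based weight (assumptions)
      q_mf = np.zeros(2)                  # model-free values of the two first-stage actions
      q_stage2 = np.zeros(2)              # learned values of the two second-stage states
      P = np.array([[0.7, 0.3],           # common/rare transition probabilities (assumption)
                    [0.3, 0.7]])

      def first_stage_values():
          q_mb = P @ q_stage2             # model-based: expected second-stage value of each action
          return w * q_mb + (1 - w) * q_mf

      def update(action, stage2_state, reward):
          # model-free update of the chosen first-stage action toward the final reward
          q_mf[action] += alpha * (reward - q_mf[action])
          # update of the visited second-stage state, which feeds the model-based system
          q_stage2[stage2_state] += alpha * (reward - q_stage2[stage2_state])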

  6. DYNAMIC AND INCREMENTAL EXPLORATION STRATEGY IN FUSION ADAPTIVE RESONANCE THEORY FOR ONLINE REINFORCEMENT LEARNING

    Directory of Open Access Journals (Sweden)

    Budhitama Subagdja

    2016-06-01

    Full Text Available One of the fundamental challenges in reinforcement learning is to setup a proper balance between exploration and exploitation to obtain the maximum cummulative reward in the long run. Most protocols for exploration bound the overall values to a convergent level of performance. If new knowledge is inserted or the environment is suddenly changed, the issue becomes more intricate as the exploration must compromise the pre-existing knowledge. This paper presents a type of multi-channel adaptive resonance theory (ART neural network model called fusion ART which serves as a fuzzy approximator for reinforcement learning with inherent features that can regulate the exploration strategy. This intrinsic regulation is driven by the condition of the knowledge learnt so far by the agent. The model offers a stable but incremental reinforcement learning that can involve prior rules as bootstrap knowledge for guiding the agent to select the right action. Experiments in obstacle avoidance and navigation tasks demonstrate that in the configuration of learning wherein the agent learns from scratch, the inherent exploration model in fusion ART model is comparable to the basic E-greedy policy. On the other hand, the model is demonstrated to deal with prior knowledge and strike a balance between exploration and exploitation.

  7. A reinforcement learning mechanism responsible for the valuation of free choice.

    Science.gov (United States)

    Cockburn, Jeffrey; Collins, Anne G E; Frank, Michael J

    2014-08-06

    Humans exhibit a preference for options they have freely chosen over equally valued options they have not; however, the neural mechanism that drives this bias and its functional significance have yet to be identified. Here, we propose a model in which choice biases arise due to amplified positive reward prediction errors associated with free choice. Using a novel variant of a probabilistic learning task, we show that choice biases are selective to options that are predominantly associated with positive outcomes. A polymorphism in DARPP-32, a gene linked to dopaminergic striatal plasticity and individual differences in reinforcement learning, was found to predict the effect of choice as a function of value. We propose that these choice biases are the behavioral byproduct of a credit assignment mechanism responsible for ensuring the effective delivery of dopaminergic reinforcement learning signals broadcast to the striatum. Copyright © 2014 Elsevier Inc. All rights reserved.

  8. What Can Reinforcement Learning Teach Us About Non-Equilibrium Quantum Dynamics

    Science.gov (United States)

    Bukov, Marin; Day, Alexandre; Sels, Dries; Weinberg, Phillip; Polkovnikov, Anatoli; Mehta, Pankaj

    Equilibrium thermodynamics and statistical physics are the building blocks of modern science and technology. Yet, our understanding of thermodynamic processes away from equilibrium is largely missing. In this talk, I will reveal the potential of what artificial intelligence can teach us about the complex behaviour of non-equilibrium systems. Specifically, I will discuss the problem of finding optimal drive protocols to prepare a desired target state in quantum mechanical systems by applying ideas from Reinforcement Learning [one can think of Reinforcement Learning as the study of how an agent (e.g. a robot) can learn and perfect a given policy through interactions with an environment.]. The driving protocols learnt by our agent suggest that the non-equilibrium world features possibilities easily defying intuition based on equilibrium physics.

  9. A cholinergic feedback circuit to regulate striatal population uncertainty and optimize reinforcement learning.

    Science.gov (United States)

    Franklin, Nicholas T; Frank, Michael J

    2015-12-25

    Convergent evidence suggests that the basal ganglia support reinforcement learning by adjusting action values according to reward prediction errors. However, adaptive behavior in stochastic environments requires the consideration of uncertainty to dynamically adjust the learning rate. We consider how cholinergic tonically active interneurons (TANs) may endow the striatum with such a mechanism in computational models spanning Marr's three levels of analysis. In the neural model, TANs modulate the excitability of spiny neurons, their population response to reinforcement, and hence the effective learning rate. Long TAN pauses facilitated robustness to spurious outcomes by increasing divergence in synaptic weights between neurons coding for alternative action values, whereas short TAN pauses facilitated stochastic behavior but increased responsiveness to change-points in outcome contingencies. A feedback control system allowed TAN pauses to be dynamically modulated by uncertainty across the spiny neuron population, allowing the system to self-tune and optimize performance across stochastic environments.
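
    At the algorithmic level, the mechanism amounts to letting an uncertainty signal scale the effective learning rate. The sketch below uses a running average of unsigned prediction errors as a stand-in for the uncertainty carried by TAN pause durations in the model; the functional form and constants are illustrative assumptions.

      def adaptive_update(value, outcome, uncertainty, base_alpha=0.05, gain=0.5):
          # Higher uncertainty -> larger effective learning rate (fast revision after change-points);
          # lower uncertainty -> smaller learning rate (robustness to spurious outcomes).
          delta = outcome - value
          alpha = min(1.0, base_alpha + gain * uncertainty)
          new_value = value + alpha * delta
          new_uncertainty = 0.9 * uncertainty + 0.1 * abs(delta)   # crude uncertainty tracker (assumption)
          return new_value, new_uncertainty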

  10. Homeostatic reinforcement learning for integrating reward collection and physiological stability.

    Science.gov (United States)

    Keramati, Mehdi; Gutkin, Boris

    2014-12-02

    Efficient regulation of internal homeostasis and defending it against perturbations requires adaptive behavioral strategies. However, the computational principles mediating the interaction between homeostatic and associative learning processes remain undefined. Here we use a definition of primary rewards, as outcomes fulfilling physiological needs, to build a normative theory showing how learning motivated behaviors may be modulated by internal states. Within this framework, we mathematically prove that seeking rewards is equivalent to the fundamental objective of physiological stability, defining the notion of physiological rationality of behavior. We further suggest a formal basis for temporal discounting of rewards by showing that discounting motivates animals to follow the shortest path in the space of physiological variables toward the desired setpoint. We also explain how animals learn to act predictively to preclude prospective homeostatic challenges, and several other behavioral patterns. Finally, we suggest a computational role for interaction between hypothalamus and the brain reward system.
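
    The core of the framework can be written down compactly: reward is the reduction in a drive function measuring the distance of the internal state from its homeostatic setpoint. The sketch below uses a quadratic drive as a simple special case; the setpoint values and the quadratic form are illustrative assumptions (the paper works with a more general distance function).

      import numpy as np

      setpoint = np.array([37.0, 5.0])   # e.g. body temperature, blood glucose (toy units, assumption)

      def drive(internal_state):
          # Distance of the internal state from the setpoint; quadratic for simplicity.
          return float(np.sum((setpoint - internal_state) ** 2))

      def homeostatic_reward(state_before, state_after):
          # Outcomes that move the internal state toward the setpoint are rewarding;
          # outcomes that push it away are punishing.
          return drive(state_before) - drive(state_after)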

  11. Reinforcement Learning for Constrained Energy Trading Games With Incomplete Information.

    Science.gov (United States)

    Wang, Huiwei; Huang, Tingwen; Liao, Xiaofeng; Abu-Rub, Haitham; Chen, Guo

    2017-10-01

    This paper considers the problem of designing adaptive learning algorithms to seek the Nash equilibrium (NE) of the constrained energy trading game among individually strategic players with incomplete information. In this game, each player uses a learning automaton scheme to generate the action probability distribution based on his/her private information in order to maximize his/her own averaged utility. It is shown that if one of the admissible mixed strategies converges to the NE with probability one, then the averaged utility and trading quantity almost surely converge to their expected values, respectively. For the given discontinuous pricing function, the utility function has already been proved to be upper semicontinuous and payoff secure, which guarantees the existence of the mixed-strategy NE. By the strict diagonal concavity of the regularized Lagrange function, the uniqueness of the NE is also guaranteed. Finally, an adaptive learning algorithm is provided to generate the strategy probability distribution for seeking the mixed-strategy NE.
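
    The learning automaton scheme mentioned above updates a probability distribution over actions directly from scalar feedback. A minimal linear reward-inaction sketch is given below; the step size and the assumption that feedback is normalised to [0, 1] are illustrative, not the paper's exact algorithm.

      import random

      def choose(probs):
          # Sample an action index from the current mixed strategy.
          r, acc = random.random(), 0.0
          for i, p in enumerate(probs):
              acc += p
              if r < acc:
                  return i
          return len(probs) - 1

      def lri_update(probs, action, feedback, step=0.05):
          # Linear reward-inaction: move probability mass toward the rewarded action,
          # in proportion to the (normalised) feedback; the distribution stays normalised.
          return [p + step * feedback * ((1.0 if i == action else 0.0) - p)
                  for i, p in enumerate(probs)]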

  12. Robot-assisted motor training: assistance decreases exploration during reinforcement learning.

    Science.gov (United States)

    Sans-Muntadas, Albert; Duarte, Jaime E; Reinkensmeyer, David J

    2014-01-01

    Reinforcement learning (RL) is a form of motor learning that robotic therapy devices could potentially manipulate to promote neurorehabilitation. We developed a system that requires trainees to use RL to learn a predefined target movement. The system provides higher rewards for movements that are more similar to the target movement. We also developed a novel algorithm that rewards trainees of different abilities with comparable reward sizes. This algorithm measures a trainee's performance relative to their best performance, rather than relative to an absolute target performance, to determine reward. We hypothesized this algorithm would permit subjects who cannot normally achieve high reward levels to do so while still learning. In an experiment with 21 unimpaired human subjects, we found that all subjects quickly learned to make a first target movement with and without the reward equalization. However, artificially increasing reward decreased the subjects' tendency to engage in exploration and therefore slowed learning, particularly when we changed the target movement. An anti-slacking watchdog algorithm further slowed learning. These results suggest that robotic algorithms that assist trainees in achieving rewards or in preventing slacking might, over time, discourage the exploration needed for reinforcement learning.
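
    The reward-equalisation idea, scoring each trial against the trainee's own best performance rather than an absolute target, can be sketched as follows. The error metric and the linear scaling are illustrative assumptions, not the authors' exact algorithm.

      def relative_reward(trial_error, best_error_so_far):
          # Update the personal best, then reward proximity to it, so trainees of
          # different ability can reach comparable reward levels.
          best_error_so_far = min(best_error_so_far, trial_error)
          shortfall = (trial_error - best_error_so_far) / max(best_error_so_far, 1e-6)
          return max(0.0, 1.0 - shortfall), best_error_so_far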

  13. TEXPLORE temporal difference reinforcement learning for robots and time-constrained domains

    CERN Document Server

    Hester, Todd

    2013-01-01

    This book presents and develops new reinforcement learning methods that enable fast and robust learning on robots in real-time. Robots have the potential to solve many problems in society, because of their ability to work in dangerous places doing necessary jobs that no one wants or is able to do. One barrier to their widespread deployment is that they are mainly limited to tasks where it is possible to hand-program behaviors for every situation that may be encountered. For robots to meet their potential, they need methods that enable them to learn and adapt to novel situations that they were not programmed for. Reinforcement learning (RL) is a paradigm for learning sequential decision making processes and could solve the problems of learning and adaptation on robots. This book identifies four key challenges that must be addressed for an RL algorithm to be practical for robotic control tasks. These RL for Robotics Challenges are: 1) it must learn in very few samples; 2) it must learn in domains with continuou...

  14. The effects of aging on the interaction between reinforcement learning and attention.

    Science.gov (United States)

    Radulescu, Angela; Daniel, Reka; Niv, Yael

    2016-11-01

    Reinforcement learning (RL) in complex environments relies on selective attention to uncover those aspects of the environment that are most predictive of reward. Whereas previous work has focused on age-related changes in RL, it is not known whether older adults learn differently from younger adults when selective attention is required. In 2 experiments, we examined how aging affects the interaction between RL and selective attention. Younger and older adults performed a learning task in which only 1 stimulus dimension was relevant to predicting reward, and within it, 1 "target" feature was the most rewarding. Participants had to discover this target feature through trial and error. In Experiment 1, stimuli varied on 1 or 3 dimensions and participants received hints that revealed the target feature, the relevant dimension, or gave no information. Group-related differences in accuracy and RTs differed systematically as a function of the number of dimensions and the type of hint available. In Experiment 2 we used trial-by-trial computational modeling of the learning process to test for age-related differences in learning strategies. Behavior of both young and older adults was explained well by a reinforcement-learning model that uses selective attention to constrain learning. However, the model suggested that older adults restricted their learning to fewer features, employing more focused attention than younger adults. Furthermore, this difference in strategy predicted age-related deficits in accuracy. We discuss these results suggesting that a narrower filter of attention may reflect an adaptation to the reduced capabilities of the reinforcement learning system. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
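
    The model class referred to here values a multidimensional stimulus as an attention-weighted sum of its feature values and scales learning by the same attention weights, so narrower attention means fewer features get updated. The sketch below is a generic member of that class, with dimension counts, weights and the learning rate as illustrative assumptions.

      import numpy as np

      n_dims, n_feats = 3, 3                 # e.g. color, shape, texture with 3 features each (assumption)
      feature_values = np.zeros((n_dims, n_feats))
      attention = np.ones(n_dims) / n_dims   # attention weights over dimensions (sum to 1)

      def stimulus_value(stimulus):
          # stimulus: the feature index present on each dimension.
          return sum(attention[d] * feature_values[d, stimulus[d]] for d in range(n_dims))

      def update(stimulus, reward, alpha=0.3):
          delta = reward - stimulus_value(stimulus)
          for d in range(n_dims):
              # Learning on each dimension is gated by attention to that dimension;
              # a sharply peaked attention vector restricts learning to few features.
              feature_values[d, stimulus[d]] += alpha * attention[d] * delta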

  15. Approaching work and learning indirectly

    OpenAIRE

    Macintyre, Ronald; Thomson, Joan

    2013-01-01

    This paper approaches work and learning indirectly. It is part of a mixed-method longitudinal study that looks at the articulation from Higher National (HN) to part-time Higher Education (HE). Since 2003 the Open University in Scotland (OU) has been collecting quantitative and qualitative data on students who have HN qualifications. This paper looks at the experiences of those students and the central place of work in those journeys. The paper begins with a statistical overview. HN St...

  16. Reinforcement learning deficits in people with schizophrenia persist after extended trials.

    Science.gov (United States)

    Cicero, David C; Martin, Elizabeth A; Becker, Theresa M; Kerns, John G

    2014-12-30

    Previous research suggests that people with schizophrenia have difficulty learning from positive feedback and when learning needs to occur rapidly. However, they seem to have relatively intact learning from negative feedback when learning occurs gradually. Participants are typically given a limited amount of acquisition trials to learn the reward contingencies and then tested about what they learned. The current study examined whether participants with schizophrenia continue to display these deficits when given extra time to learn the contingences. Participants with schizophrenia and matched healthy controls completed the Probabilistic Selection Task, which measures positive and negative feedback learning separately. Participants with schizophrenia showed a deficit in learning from both positive feedback and negative feedback. These reward learning deficits persisted even if people with schizophrenia are given extra time (up to 10 blocks of 60 trials) to learn the reward contingencies. These results suggest that the observed deficits cannot be attributed solely to slower learning and instead reflect a specific deficit in reinforcement learning. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  17. Blended Learning: An Innovative Approach

    Science.gov (United States)

    Lalima; Dangwal, Kiran Lata

    2017-01-01

    Blended learning is an innovative concept that embraces the advantages of both traditional classroom teaching and ICT-supported learning, including both offline and online learning. It has scope for collaborative learning, constructive learning and computer-assisted learning (CAI). Blended learning needs rigorous efforts, right…

  18. Instructional control of reinforcement learning: A behavioral and neurocomputational investigation

    NARCIS (Netherlands)

    Doll, B.B.; Jacobs, W.J.; Sanfey, A.G.; Frank, M.J.

    2009-01-01

    Humans learn how to behave directly through environmental experience and indirectly through rules and instructions. Behavior analytic research has shown that instructions can control behavior, even when such behavior leads to sub-optimal outcomes (Hayes, S (Ed) 1989. Rule-governed behavior:

  19. Oculomotor learning revisited: a model of reinforcement learning in the basal ganglia incorporating an efference copy of motor actions

    Directory of Open Access Journals (Sweden)

    Michale Sean Fee

    2012-06-01

    Full Text Available In its simplest formulation, reinforcement learning is based on the idea that if an action taken in a particular context is followed by a favorable outcome, then, in the same context, the tendency to produce that action should be strengthened, or reinforced. While reinforcement learning forms the basis of many current theories of basal ganglia (BG) function, these models do not incorporate distinct computational roles for signals that convey context, and those that convey what action an animal takes. Recent experiments in the songbird suggest that vocal-related BG circuitry receives two functionally distinct excitatory inputs. One input is from a cortical region that carries context information about the current ‘time’ in the motor sequence. The other is an efference copy of motor commands from a separate cortical brain region that generates vocal variability during learning. Based on these findings, I propose here a general model of vertebrate BG function that combines context information with a distinct motor efference copy signal. The signals are integrated by a learning rule in which efference copy inputs gate the potentiation of context inputs (but not efference copy inputs) onto medium spiny neurons in response to a rewarded action. The hypothesis is described in terms of a circuit that implements the learning of visually-guided saccades. The model makes testable predictions about the anatomical and functional properties of hypothesized context and efference copy inputs to the striatum from both thalamic and cortical sources.

  20. Somatic and Reinforcement-Based Plasticity in the Initial Stages of Human Motor Learning.

    Science.gov (United States)

    Sidarta, Ananda; Vahdat, Shahabeddin; Bernardi, Nicolò F; Ostry, David J

    2016-11-16

    As one learns to dance or play tennis, the desired somatosensory state is typically unknown. Trial and error is important as motor behavior is shaped by successful and unsuccessful movements. As an experimental model, we designed a task in which human participants make reaching movements to a hidden target and receive positive reinforcement when successful. We identified somatic and reinforcement-based sources of plasticity on the basis of changes in functional connectivity using resting-state fMRI before and after learning. The neuroimaging data revealed reinforcement-related changes in both motor and somatosensory brain areas in which a strengthening of connectivity was related to the amount of positive reinforcement during learning. Areas of prefrontal cortex were similarly altered in relation to reinforcement, with connectivity between sensorimotor areas of putamen and the reward-related ventromedial prefrontal cortex strengthened in relation to the amount of successful feedback received. In other analyses, we assessed connectivity related to changes in movement direction between trials, a type of variability that presumably reflects exploratory strategies during learning. We found that connectivity in a network linking motor and somatosensory cortices increased with trial-to-trial changes in direction. Connectivity varied as well with the change in movement direction following incorrect movements. Here the changes were observed in a somatic memory and decision making network involving ventrolateral prefrontal cortex and second somatosensory cortex. Our results point to the idea that the initial stages of motor learning are not wholly motor but rather involve plasticity in somatic and prefrontal networks related both to reward and exploration. In the initial stages of motor learning, the placement of the limbs is learned primarily through trial and error. In an experimental analog, participants make reaching movements to a hidden target and receive positive

  1. Homeostatic reinforcement learning for integrating reward collection and physiological stability

    OpenAIRE

    Keramati, Mehdi; Gutkin, Boris

    2014-01-01

    eLife digest Our survival depends on our ability to maintain internal states, such as body temperature and blood sugar levels, within narrowly defined ranges, despite being subject to constantly changing external forces. This process, which is known as homeostasis, requires humans and other animals to carry out specific behaviors, such as seeking out warmth or food, to compensate for changes in their environment. Animals must also learn to prevent the potential impact of changes that can be ant...

  2. Stress Modulates Reinforcement Learning in Younger and Older Adults

    OpenAIRE

    Lighthall, Nichole R.; Gorlick, Marissa A.; Schoeke, Andrej; Frank, Michael J.; Mather, Mara

    2012-01-01

    Animal research and human neuroimaging studies indicate that stress increases dopamine levels in brain regions involved in reward processing and stress also appears to increase the attractiveness of addictive drugs. The current study tested the hypothesis that stress increases reward salience, leading to more effective learning about positive than negative outcomes in a probabilistic selection task. Changes to dopamine pathways with age raise the question of whether stress effects on incentiv...

  3. An Integrated Model of Associative and Reinforcement Learning

    Science.gov (United States)

    2012-08-01

    from robotics (Peters, Vijayakumar, & Schaal, 2003) and artificial intelligence (Russell & Norvig, 1995) to cognitive architectures (Fu & Anderson...cycles are less expensive than actions (e.g. robotics). One of the limitations of RL as a complete model of human decision-making becomes apparent in...1948). For example, Blodgett (1929) ran three groups of rats in a maze-learning experiment. One group (the control) was rewarded upon reaching the

  4. Homeostatic reinforcement learning for integrating reward collection and physiological stability

    Science.gov (United States)

    Keramati, Mehdi; Gutkin, Boris

    2014-01-01

    Efficient regulation of internal homeostasis and defending it against perturbations requires adaptive behavioral strategies. However, the computational principles mediating the interaction between homeostatic and associative learning processes remain undefined. Here we use a definition of primary rewards, as outcomes fulfilling physiological needs, to build a normative theory showing how learning motivated behaviors may be modulated by internal states. Within this framework, we mathematically prove that seeking rewards is equivalent to the fundamental objective of physiological stability, defining the notion of physiological rationality of behavior. We further suggest a formal basis for temporal discounting of rewards by showing that discounting motivates animals to follow the shortest path in the space of physiological variables toward the desired setpoint. We also explain how animals learn to act predictively to preclude prospective homeostatic challenges, and several other behavioral patterns. Finally, we suggest a computational role for interaction between hypothalamus and the brain reward system. DOI: http://dx.doi.org/10.7554/eLife.04811.001 PMID:25457346

  5. Measuring reinforcement learning and motivation constructs in experimental animals: relevance to the negative symptoms of schizophrenia.

    Science.gov (United States)

    Markou, Athina; Salamone, John D; Bussey, Timothy J; Mar, Adam C; Brunner, Daniela; Gilmour, Gary; Balsam, Peter

    2013-11-01

    The present review article summarizes and expands upon the discussions that were initiated during a meeting of the Cognitive Neuroscience Treatment Research to Improve Cognition in Schizophrenia (CNTRICS; http://cntrics.ucdavis.edu) meeting. A major goal of the CNTRICS meeting was to identify experimental procedures and measures that can be used in laboratory animals to assess psychological constructs that are related to the psychopathology of schizophrenia. The issues discussed in this review reflect the deliberations of the Motivation Working Group of the CNTRICS meeting, which included most of the authors of this article as well as additional participants. After receiving task nominations from the general research community, this working group was asked to identify experimental procedures in laboratory animals that can assess aspects of reinforcement learning and motivation that may be relevant for research on the negative symptoms of schizophrenia, as well as other disorders characterized by deficits in reinforcement learning and motivation. The tasks described here that assess reinforcement learning are the Autoshaping Task, Probabilistic Reward Learning Tasks, and the Response Bias Probabilistic Reward Task. The tasks described here that assess motivation are Outcome Devaluation and Contingency Degradation Tasks and Effort-Based Tasks. In addition to describing such methods and procedures, the present article provides a working vocabulary for research and theory in this field, as well as an industry perspective about how such tasks may be used in drug discovery. It is hoped that this review can aid investigators who are conducting research in this complex area, promote translational studies by highlighting shared research goals and fostering a common vocabulary across basic and clinical fields, and facilitate the development of medications for the treatment of symptoms mediated by reinforcement learning and motivational deficits. Copyright © 2013 Elsevier

  6. Goal-Directed and Habit-Like Modulations of Stimulus Processing during Reinforcement Learning.

    Science.gov (United States)

    Luque, David; Beesley, Tom; Morris, Richard W; Jack, Bradley N; Griffiths, Oren; Whitford, Thomas J; Le Pelley, Mike E

    2017-03-15

    Recent research has shown that perceptual processing of stimuli previously associated with high-value rewards is automatically prioritized even when rewards are no longer available. It has been hypothesized that such reward-related modulation of stimulus salience is conceptually similar to an "attentional habit." Recording event-related potentials in humans during a reinforcement learning task, we show strong evidence in favor of this hypothesis. Resistance to outcome devaluation (the defining feature of a habit) was shown by the stimulus-locked P1 component, reflecting activity in the extrastriate visual cortex. Analysis at longer latencies revealed a positive component (corresponding to the P3b, from 550-700 ms) sensitive to outcome devaluation. Therefore, distinct spatiotemporal patterns of brain activity were observed corresponding to habitual and goal-directed processes. These results demonstrate that reinforcement learning engages both attentional habits and goal-directed processes in parallel. Consequences for brain and computational models of reinforcement learning are discussed. SIGNIFICANCE STATEMENT The human attentional network adapts to detect stimuli that predict important rewards. A recent hypothesis suggests that the visual cortex automatically prioritizes reward-related stimuli, driven by cached representations of reward value; that is, stimulus-response habits. Alternatively, the neural system may track the current value of the predicted outcome. Our results demonstrate for the first time that visual cortex activity is increased for reward-related stimuli even when the rewarding event is temporarily devalued. In contrast, longer-latency brain activity was specifically sensitive to transient changes in reward value. Therefore, we show that both habit-like attention and goal-directed processes occur in the same learning episode at different latencies. This result has important consequences for computational models of reinforcement learning

  7. Measuring reinforcement learning and motivation constructs in experimental animals: relevance to the negative symptoms of schizophrenia

    Science.gov (United States)

    Markou, Athina; Salamone, John D.; Bussey, Timothy; Mar, Adam; Brunner, Daniela; Gilmour, Gary; Balsam, Peter

    2013-01-01

    The present review article summarizes and expands upon the discussions that were initiated during a meeting of the Cognitive Neuroscience Treatment Research to Improve Cognition in Schizophrenia (CNTRICS; http://cntrics.ucdavis.edu). A major goal of the CNTRICS meeting was to identify experimental procedures and measures that can be used in laboratory animals to assess psychological constructs that are related to the psychopathology of schizophrenia. The issues discussed in this review reflect the deliberations of the Motivation Working Group of the CNTRICS meeting, which included most of the authors of this article as well as additional participants. After receiving task nominations from the general research community, this working group was asked to identify experimental procedures in laboratory animals that can assess aspects of reinforcement learning and motivation that may be relevant for research on the negative symptoms of schizophrenia, as well as other disorders characterized by deficits in reinforcement learning and motivation. The tasks described here that assess reinforcement learning are the Autoshaping Task, Probabilistic Reward Learning Tasks, and the Response Bias Probabilistic Reward Task. The tasks described here that assess motivation are Outcome Devaluation and Contingency Degradation Tasks and Effort-Based Tasks. In addition to describing such methods and procedures, the present article provides a working vocabulary for research and theory in this field, as well as an industry perspective about how such tasks may be used in drug discovery. It is hoped that this review can aid investigators who are conducting research in this complex area, promote translational studies by highlighting shared research goals and fostering a common vocabulary across basic and clinical fields, and facilitate the development of medications for the treatment of symptoms mediated by reinforcement learning and motivational deficits. PMID:23994273

  8. Extraversion differentiates between model-based and model-free strategies in a reinforcement learning task

    Science.gov (United States)

    Skatova, Anya; Chan, Patricia A.; Daw, Nathaniel D.

    2013-01-01

    Prominent computational models describe a neural mechanism for learning from reward prediction errors, and it has been suggested that variations in this mechanism are reflected in personality factors such as trait extraversion. However, although trait extraversion has been linked to improved reward learning, it is not yet known whether this relationship is selective for the particular computational strategy associated with error-driven learning, known as model-free reinforcement learning, vs. another strategy, model-based learning, which the brain is also known to employ. In the present study we test this relationship by examining whether humans' scores on an extraversion scale predict individual differences in the balance between model-based and model-free learning strategies in a sequentially structured decision task designed to distinguish between them. In previous studies with this task, participants have shown a combination of both types of learning, but with substantial individual variation in the balance between them. In the current study, extraversion predicted worse behavior across both sorts of learning. However, the hypothesis that extraverts would be selectively better at model-free reinforcement learning held up among a subset of the more engaged participants, and overall, higher task engagement was associated with a more selective pattern by which extraversion predicted better model-free learning. The findings indicate a relationship between a broad personality orientation and detailed computational learning mechanisms. Results like those in the present study suggest an intriguing and rich relationship between core neuro-computational mechanisms and broader life orientations and outcomes. PMID:24027514

  9. Reinforcement learning design-based adaptive tracking control with less learning parameters for nonlinear discrete-time MIMO systems.

    Science.gov (United States)

    Liu, Yan-Jun; Tang, Li; Tong, Shaocheng; Chen, C L Philip; Li, Dong-Juan

    2015-01-01

    Based on a neural network (NN) approximator, an online reinforcement learning algorithm is proposed for a class of affine multiple input and multiple output (MIMO) nonlinear discrete-time systems with unknown functions and disturbances. In the design procedure, two networks are provided: one is an action network that generates an optimal control signal, and the other is a critic network that approximates the cost function. An optimal control signal and adaptation laws can be generated based on the two NNs. In previous approaches, the weights of the critic and action networks are updated based on the gradient descent rule, and the estimates of the optimal weight vectors are directly adjusted in the design. Consequently, compared with existing results, the main contributions of this paper are: 1) only two parameters need to be adjusted, so the number of adaptation laws is smaller than in previous results; and 2) the parameter updates do not depend on the number of subsystems for MIMO systems, and the tuning rules are replaced by adjusting the norms of the optimal weight vectors in both the action and critic networks. It is proven that the tracking errors, the adaptation laws, and the control inputs are uniformly bounded using the Lyapunov analysis method. Simulation examples are employed to illustrate the effectiveness of the proposed algorithm.
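
    The action-network/critic-network division of labour can be illustrated with a generic one-step actor-critic on a toy problem. This is not the paper's adaptive control law for affine MIMO discrete-time systems; the bandit-style environment, the softmax parameterisation and all constants are assumptions for illustration only.

      import numpy as np

      rng = np.random.default_rng(0)
      n_actions = 3
      prefs = np.zeros(n_actions)              # "action network": action preferences
      value = 0.0                              # "critic network": baseline value estimate
      alpha_actor, alpha_critic = 0.1, 0.2
      true_means = np.array([0.1, 0.5, 0.9])   # hidden payoffs of the toy actions (assumption)

      def softmax(x):
          e = np.exp(x - x.max())
          return e / e.sum()

      for t in range(2000):
          pi = softmax(prefs)
          a = rng.choice(n_actions, p=pi)
          r = rng.normal(true_means[a], 0.1)
          delta = r - value                        # critic's prediction error
          value += alpha_critic * delta            # critic update
          grad = -pi
          grad[a] += 1.0                           # gradient of log pi(a) w.r.t. the preferences
          prefs += alpha_actor * delta * grad      # actor update
      print(softmax(prefs))                        # the policy should come to favour the best action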

  10. Beyond Positive Reinforcement: OBM as a Humanizing Approach to Management Practices

    Science.gov (United States)

    Crowell, Charles R.

    2005-01-01

    This article comments on the need to recognize that OBM already is a "positive psychology" for many more reasons than just that it embraces positive reinforcement as a cornerstone of workplace improvement. This paper suggests there are at least 10 ways in which OBM constitutes a distinctly "positive" and humanizing approach to management…

  11. Selection for Reinforcement-Free Learning Ability as an Organizing Factor in the Evolution of Cognition

    Directory of Open Access Journals (Sweden)

    Solvi Arnold

    2013-01-01

    Full Text Available This research explores the relation between environmental structure and neurocognitive structure. We hypothesize that selection pressure on abilities for efficient learning (especially in settings with limited or no reward information) translates into selection pressure on correspondence relations between neurocognitive and environmental structure, since such correspondence allows for simple changes in the environment to be handled with simple learning updates in neurocognitive structure. We present a model in which a simple form of reinforcement-free learning is evolved in neural networks using neuromodulation and analyze the effect this selection for learning ability has on the virtual species' neural organization. We find a higher degree of organization than in a control population evolved without learning ability and discuss the relation between the observed neural structure and the environmental structure. We discuss our findings in the context of the environmental complexity thesis, the Baldwin effect, and other interactions between adaptation processes.

  12. Reinforcement learning and counterfactual reasoning explain adaptive behavior in a changing environment.

    Science.gov (United States)

    Zhang, Yunfeng; Paik, Jaehyon; Pirolli, Peter

    2015-04-01

    Animals routinely adapt to changes in the environment in order to survive. Though reinforcement learning may play a role in such adaptation, it is not clear that it is the only mechanism involved, as it is not well suited to producing rapid, relatively immediate changes in strategies in response to environmental changes. This research proposes that counterfactual reasoning might be an additional mechanism that facilitates change detection. An experiment is conducted in which a task state changes over time and the participants had to detect the changes in order to perform well and gain monetary rewards. A cognitive model is constructed that incorporates reinforcement learning with counterfactual reasoning to help quickly adjust the utility of task strategies in response to changes. The results show that the model can accurately explain human data and that counterfactual reasoning is key to reproducing the various effects observed in this change detection paradigm. Copyright © 2015 Cognitive Science Society, Inc.
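
    The interplay the model proposes, ordinary reinforcement-learning updates plus a counterfactual adjustment of the unchosen strategy, can be sketched as follows. The utility representation and rates are illustrative assumptions, not the parameters of the authors' cognitive model.

      def update_utilities(utilities, chosen, reward, forgone, alpha=0.2, alpha_cf=0.2):
          # utilities: dict mapping strategy name -> current utility
          # forgone:   dict mapping unchosen strategy -> inferred (counterfactual) outcome
          utilities = dict(utilities)
          # factual reinforcement-learning update of the chosen strategy
          utilities[chosen] += alpha * (reward - utilities[chosen])
          # counterfactual updates speed re-adaptation when the task state changes
          for option, cf_outcome in forgone.items():
              if option != chosen:
                  utilities[option] += alpha_cf * (cf_outcome - utilities[option])
          return utilities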

  13. An information-theoretic analysis of return maximization in reinforcement learning.

    Science.gov (United States)

    Iwata, Kazunori

    2011-12-01

    We present a general analysis of return maximization in reinforcement learning. This analysis does not require assumptions of Markovianity, stationarity, and ergodicity for the stochastic sequential decision processes of reinforcement learning. Instead, our analysis assumes the asymptotic equipartition property fundamental to information theory, providing a substantially different view from that in the literature. As our main results, we show that return maximization is achieved by the overlap of typical and best sequence sets, and we present a class of stochastic sequential decision processes with the necessary condition for return maximization. We also describe several examples of best sequences in terms of return maximization in the class of stochastic sequential decision processes, which satisfy the necessary condition. Copyright © 2011 Elsevier Ltd. All rights reserved.

  14. What is the optimal task difficulty for reinforcement learning of brain self-regulation?

    Science.gov (United States)

    Bauer, Robert; Vukelić, Mathias; Gharabaghi, Alireza

    2016-09-01

    The balance between action and reward during neurofeedback may influence reinforcement learning of brain self-regulation. Eleven healthy volunteers participated in three runs of motor imagery-based brain-machine interface feedback where a robot passively opened the hand contingent on β-band modulation. For each run, the β-desynchronization threshold to initiate the hand robot movement increased in difficulty (low, moderate, and demanding). In this context, the incentive to learn was estimated by the change of reward per action, operationalized as the change in reward duration per movement onset. Variance analysis revealed a significant interaction between threshold difficulty and the relationship between reward duration and number of movement onsets, indicating a negative learning incentive for low difficulty, but a positive learning incentive for moderate and demanding runs. Exploration of different thresholds in the same data set indicated that the learning incentive peaked at higher thresholds than the threshold which resulted in maximum classification accuracy. Specificity is more important than sensitivity of neurofeedback for reinforcement learning of brain self-regulation. Learning efficiency requires adequate challenge by neurofeedback interventions. Copyright © 2016 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.

  15. Effects of Ventral Striatum Lesions on Stimulus-Based versus Action-Based Reinforcement Learning.

    Science.gov (United States)

    Rothenhoefer, Kathryn M; Costa, Vincent D; Bartolo, Ramón; Vicario-Feliciano, Raquel; Murray, Elisabeth A; Averbeck, Bruno B

    2017-07-19

    Learning the values of actions versus stimuli may depend on separable neural circuits. In the current study, we evaluated the performance of rhesus macaques with ventral striatum (VS) lesions on a two-arm bandit task that had randomly interleaved blocks of stimulus-based and action-based reinforcement learning (RL). Compared with controls, monkeys with VS lesions had deficits in learning to select rewarding images but not rewarding actions. We used a RL model to quantify learning and choice consistency and found that, in stimulus-based RL, the VS lesion monkeys were more influenced by negative feedback and had lower choice consistency than controls. Using a Bayesian model to parse the groups' learning strategies, we also found that VS lesion monkeys defaulted to an action-based choice strategy. Therefore, the VS is involved specifically in learning the value of stimuli, not actions. SIGNIFICANCE STATEMENT Reinforcement learning models of the ventral striatum (VS) often assume that it maintains an estimate of state value. This suggests that it plays a general role in learning whether rewards are assigned based on a chosen action or stimulus. In the present experiment, we examined the effects of VS lesions on monkeys' ability to learn that choosing a particular action or stimulus was more likely to lead to reward. We found that VS lesions caused a specific deficit in the monkeys' ability to discriminate between images with different values, whereas their ability to discriminate between actions with different values remained intact. Our results therefore suggest that the VS plays a specific role in learning to select rewarded stimuli. Copyright © 2017 the authors 0270-6474/17/376902-13$15.00/0.
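
    The kind of model referred to above can be illustrated with a short sketch: a two-armed bandit learner with separate learning rates for positive and negative prediction errors and a softmax inverse temperature standing in for choice consistency. The parameter names, values, and reward probabilities are illustrative assumptions, not the authors' fitted model.

```python
# Illustrative sketch of an RL model used to quantify learning and choice consistency:
# separate learning rates for positive and negative feedback, plus a softmax inverse
# temperature (assumed parameter names, not the authors' code).
import numpy as np

def simulate_two_arm_bandit(alpha_pos, alpha_neg, beta, p_reward=(0.7, 0.3),
                            n_trials=200, seed=0):
    rng = np.random.default_rng(seed)
    q = np.zeros(2)                      # value estimates for the two options
    choices, rewards = [], []
    for _ in range(n_trials):
        p_choose = np.exp(beta * q) / np.sum(np.exp(beta * q))   # choice consistency
        c = rng.choice(2, p=p_choose)
        r = float(rng.random() < p_reward[c])
        delta = r - q[c]                 # prediction error
        q[c] += (alpha_pos if delta > 0 else alpha_neg) * delta  # asymmetric learning
        choices.append(c)
        rewards.append(r)
    return np.array(choices), np.array(rewards)

choices, rewards = simulate_two_arm_bandit(alpha_pos=0.3, alpha_neg=0.1, beta=5.0)
print("proportion of choices of the better option:", np.mean(choices == 0))
```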

  16. A flexible mechanism of rule selection enables rapid feature-based reinforcement learning

    Directory of Open Access Journals (Sweden)

    Matthew Balcarras

    2016-03-01

    Full Text Available Learning in a new environment is influenced by prior learning and experience. Correctly applying a rule that maps a context to stimuli, actions, and outcomes enables faster learning and better outcomes compared to relying on strategies for learning that are ignorant of task structure. However, it is often difficult to know when and how to apply learned rules in new contexts. In our study we explored how subjects employ different strategies for learning the relationship between stimulus features and positive outcomes in a probabilistic task context. We test the hypothesis that task naive subjects will show enhanced learning of feature specific reward associations by switching to the use of an abstract rule that associates stimuli by feature type and restricts selections to that dimension. To test this hypothesis we designed a decision making task where subjects receive probabilistic feedback following choices between pairs of stimuli. In the task, trials are grouped in two contexts by blocks, where in one type of block there is no unique relationship between a specific feature dimension (stimulus shape or colour) and positive outcomes, and following an un-cued transition, alternating blocks have outcomes that are linked to either stimulus shape or colour. Two-thirds of subjects (n=22/32) exhibited behaviour that was best fit by a hierarchical feature-rule model. Supporting the prediction of the model mechanism these subjects showed significantly enhanced performance in feature-reward blocks, and rapidly switched their choice strategy to using abstract feature rules when reward contingencies changed. Choice behaviour of other subjects (n=10/32) was fit by a range of alternative reinforcement learning models representing strategies that do not benefit from applying previously learned rules. In summary, these results show that untrained subjects are capable of flexibly shifting between behavioural rules by leveraging simple model-free reinforcement

  17. A Formal Verification Model for Performance Analysis of Reinforcement Learning Algorithms Applied to Dynamic Networks

    OpenAIRE

    Shrirang Ambaji KULKARNI; Raghavendra G. RAO

    2017-01-01

    Routing data packets in a dynamic network is a difficult and important problem in computer networks. As the network is dynamic, it is subject to frequent topology changes and is subject to variable link costs due to congestion and bandwidth. Existing shortest path algorithms fail to converge to better solutions under dynamic network conditions. Reinforcement learning algorithms possess better adaptation techniques in dynamic environments. In this paper we apply model based Q-Routing technique ...
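
    For context, the classical Q-Routing update that model-based variants build on can be sketched as follows; the topology, the delay model, and the parameter values are illustrative assumptions rather than the setup used in the paper.

```python
# A minimal sketch of the classical Q-Routing update (Boyan & Littman style); the
# network, delays, and parameter values here are illustrative assumptions.
import random

n_nodes = 6
neighbors = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3, 5], 5: [4]}
# Q[x][d][y]: estimated delivery time from node x to destination d via neighbor y.
Q = {x: {d: {y: 1.0 for y in neighbors[x]} for d in range(n_nodes)} for x in range(n_nodes)}
alpha = 0.5

def route_packet(src, dst, max_hops=20):
    x, hops = src, 0
    while x != dst and hops < max_hops:
        y = min(Q[x][dst], key=Q[x][dst].get)         # greedy neighbor choice
        q_delay, s_delay = random.uniform(0, 1), 1.0  # queueing + transmission delay
        t = 0.0 if y == dst else min(Q[y][dst].values())
        # Q-Routing update: bootstrap on the neighbor's best delivery-time estimate.
        Q[x][dst][y] += alpha * (q_delay + s_delay + t - Q[x][dst][y])
        x, hops = y, hops + 1
    return hops

for _ in range(500):
    route_packet(random.randrange(n_nodes), random.randrange(n_nodes))
print("estimated delay 0 -> 5:", min(Q[0][5].values()))
```

    Because each node bootstraps its estimate on the chosen neighbor's best estimate, the estimates can track topology and congestion changes, which is the property the formal verification model above analyzes.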

  18. Quality of Service Issues for Reinforcement Learning Based Routing Algorithm for Ad-Hoc Networks

    OpenAIRE

    Kulkarni, Shrirang Ambaji; Rao, G. Raghavendra

    2012-01-01

    Mobile ad-hoc networks are dynamic networks which are decentralized and autonomous in nature. Many routing algorithms have been proposed for these dynamic networks. It is an important problem to model Quality of Service requirements on these types of algorithms, which traditionally have certain limitations. To model this scenario we have considered a reinforcement learning algorithm, SAMPLE. SAMPLE promises to deal effectively with congestion under high traffic load. As it is natural for ad...

  19. Application of fuzzy logic-neural network based reinforcement learning to proximity and docking operations

    Science.gov (United States)

    Jani, Yashvant

    1992-01-01

    As part of the Research Institute for Computing and Information Systems (RICIS) activity, the reinforcement learning techniques developed at Ames Research Center are being applied to proximity and docking operations using the Shuttle and Solar Max satellite simulation. This activity is carried out in the software technology laboratory utilizing the Orbital Operations Simulator (OOS). This interim report provides the status of the project and outlines the future plans.

  20. Challenges in adapting imitation and reinforcement learning to compliant robots

    Directory of Open Access Journals (Sweden)

    Calinon Sylvain

    2011-12-01

    Full Text Available There is an exponential increase in the range of tasks that robots are forecast to accomplish. (Re)programming these robots becomes a critical issue for their commercialization and for their applications to real-world scenarios in which users without expertise in robotics wish to adapt the robot to their needs. This paper addresses the problem of designing user-friendly human-robot interfaces to transfer skills in a fast and efficient manner. This paper presents recent work conducted at the Learning and Interaction group at ADVR-IIT, ranging from skill acquisition through kinesthetic teaching to self-refinement strategies initiated from demonstrations. Our group started to explore the use of imitation and exploration strategies that can take advantage of the compliant capabilities of recent robot hardware and control architectures.

  1. Application Of Reinforcement Learning In Heading Control Of A Fixed Wing UAV Using X-Plane Platform

    National Research Council Canada - National Science Library

    Kimathi; S; Kangethe; Kihato; P

    2015-01-01

    .... In this paper we propose an online adaptive reinforcement learning heading controller. The autopilot heading controller will be designed in MatlabSimulink for controlling a UAV in X-Plane test platform...

  2. Opponent actor learning (OpAL): modeling interactive effects of striatal dopamine on reinforcement learning and choice incentive.

    Science.gov (United States)

    Collins, Anne G E; Frank, Michael J

    2014-07-01

    The striatal dopaminergic system has been implicated in reinforcement learning (RL), motor performance, and incentive motivation. Various computational models have been proposed to account for each of these effects individually, but a formal analysis of their interactions is lacking. Here we present a novel algorithmic model expanding the classical actor-critic architecture to include fundamental interactive properties of neural circuit models, incorporating both incentive and learning effects into a single theoretical framework. The standard actor is replaced by a dual opponent actor system representing distinct striatal populations, which come to differentially specialize in discriminating positive and negative action values. Dopamine modulates the degree to which each actor component contributes to both learning and choice discriminations. In contrast to standard frameworks, this model simultaneously captures documented effects of dopamine on both learning and choice incentive (and their interactions) across a variety of studies, including probabilistic RL, effort-based choice, and motor skill learning. (c) 2014 APA, all rights reserved.
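
    A schematic sketch of an opponent actor-critic in this spirit is given below: a critic tracks expected value, while separate G ("go") and N ("no-go") actor weights are updated multiplicatively from positive and negative prediction errors, and a dopamine-like asymmetry would enter through the choice weights beta_g and beta_n. All names and values are assumptions for illustration and do not reproduce the published OpAL equations.

```python
# Schematic opponent actor-critic for a two-option task: a critic tracks expected
# value; G ("go") and N ("no-go") actor weights are updated multiplicatively from
# positive and negative prediction errors. Names and values are assumptions.
import numpy as np

rng = np.random.default_rng(1)
n_options = 2
V = np.zeros(n_options)          # critic values
G = np.ones(n_options)           # "go" actor weights
N = np.ones(n_options)           # "no-go" actor weights
alpha_c, alpha_g, alpha_n = 0.1, 0.1, 0.1
beta_g, beta_n = 1.0, 1.0        # dopamine-like asymmetry would scale these differently
p_reward = np.array([0.8, 0.2])

for _ in range(500):
    act = beta_g * G - beta_n * N                      # net action propensity
    p = np.exp(act) / np.sum(np.exp(act))
    c = rng.choice(n_options, p=p)
    r = float(rng.random() < p_reward[c])
    delta = r - V[c]                                   # critic prediction error
    V[c] += alpha_c * delta
    G[c] += alpha_g * G[c] * delta                     # grows with positive errors
    N[c] += alpha_n * N[c] * (-delta)                  # grows with negative errors

print("G:", G.round(2), "N:", N.round(2))
```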

  3. Conceptions of Learning and Approaches to Learning in Portuguese Students

    Science.gov (United States)

    Duarte, Antonio M.

    2007-01-01

    The article describes a study that attempted to characterise Portuguese students' conceptions of learning and approaches to learning. A sample of university students answered open questions on the meaning, process and context of learning. Results, derived from content analysis, replicate most conceptions of learning described by phenomenographical…

  4. Specific features of learning with nociceptive electrical reinforcement in rats of different genetic strains: role of brain neurotransmitter systems.

    Science.gov (United States)

    Kalyuzhnyi, A L; Litvinova, S V; Shulgovskii, V V; Panchenko, L F

    2005-01-01

    The interaction between neurotransmitter systems and the opioid system in rats of different strains was studied during learning and emotional stress. The interaction between the noradrenergic, serotoninergic, and opioid systems in the brain is the main neurochemical mechanism underlying learning reinforced by nociceptive electrical stimulation, which determines individual differences in the rate and type of learning.

  5. Reinforcement learning models and their neural correlates: An activation likelihood estimation meta-analysis.

    Science.gov (United States)

    Chase, Henry W; Kumar, Poornima; Eickhoff, Simon B; Dombrovski, Alexandre Y

    2015-06-01

    Reinforcement learning describes motivated behavior in terms of two abstract signals. The representation of discrepancies between expected and actual rewards/punishments (prediction error) is thought to update the expected value of actions and predictive stimuli. Electrophysiological and lesion studies have suggested that mesostriatal prediction error signals control behavior through synaptic modification of cortico-striato-thalamic networks. Signals in the ventromedial prefrontal and orbitofrontal cortex are implicated in representing expected value. To obtain unbiased maps of these representations in the human brain, we performed a meta-analysis of functional magnetic resonance imaging studies that had employed algorithmic reinforcement learning models across a variety of experimental paradigms. We found that the ventral striatum (medial and lateral) and midbrain/thalamus represented reward prediction errors, consistent with animal studies. Prediction error signals were also seen in the frontal operculum/insula, particularly for social rewards. In Pavlovian studies, striatal prediction error signals extended into the amygdala, whereas instrumental tasks engaged the caudate. Prediction error maps were sensitive to the model-fitting procedure (fixed or individually estimated) and to the extent of spatial smoothing. A correlate of expected value was found in a posterior region of the ventromedial prefrontal cortex, caudal and medial to the orbitofrontal regions identified in animal studies. These findings highlight a reproducible motif of reinforcement learning in the cortico-striatal loops and identify methodological dimensions that may influence the reproducibility of activation patterns across studies.

  6. A Novel Reinforcement Learning Architecture for Continuous State and Action Spaces

    Directory of Open Access Journals (Sweden)

    Víctor Uc-Cetina

    2013-01-01

    The proposed architecture splits in two actors the work required to learn the final policy. One actor decides what action must be performed; meanwhile, a second actor determines the right parameters for the selected action. We tested our architecture and one algorithm based on it solving the robot dribbling problem, a challenging robot control problem taken from the RoboCup competitions. Our experimental work with three different function approximators provides enough evidence to prove that the proposed architecture can be used to implement fast, robust, and reliable reinforcement learning algorithms.

  7. Reaching control of a full-torso, modelled musculoskeletal robot using muscle synergies emergent under reinforcement learning.

    Science.gov (United States)

    Diamond, A; Holland, O E

    2014-03-01

    'Anthropomimetic' robots mimic both human morphology and internal structure (skeleton, muscles, compliance and high redundancy), thus presenting a formidable challenge to conventional control. Here we derive a novel controller for this class of robot which learns effective reaching actions through the sustained activation of weighted muscle synergies, an approach which draws upon compelling, recent evidence from animal and human studies, but is almost unexplored to date in the musculoskeletal robot literature. Since the effective synergy patterns for a given robot will be unknown, we derive a reinforcement-learning approach intended to allow their emergence, in particular those patterns aiding linearization of control. Using an extensive physics-based model of the anthropomimetic ECCERobot, we find that effective reaching actions can be learned comprising only two sequential motor co-activation patterns, each controlled by just a single common driving signal. Factor analysis shows the emergent muscle co-activations can be largely reconstructed using weighted combinations of only 13 common fragments. Testing these 'candidate' synergies as drivable units, the same controller now learns the reaching task both faster and better.

  8. Expectancies in decision making, reinforcement learning, and ventral striatum

    Directory of Open Access Journals (Sweden)

    Matthijs A A Van Der Meer

    2010-05-01

    Full Text Available Decisions can arise in different ways, such as a gut feeling, doing what worked last time, or planful deliberation. Different decision-making systems are dissociable behaviorally, map onto distinct brain systems, and require different computational demands. For instance, 'model-free' decision strategies use prediction errors to estimate scalar action values from previous experience, while 'model-based' strategies leverage internal forward models to generate and evaluate potentially rich outcome expectancies. Animal learning studies indicate that expectancies may arise from different sources, including not only forward models but also Pavlovian associations, and the flexibility with which such representations impact behavior may depend on how they are generated. In the light of these considerations, we review the results of van der Meer and Redish (2009a), who found that ventral striatal neurons that respond to reward delivery can also be activated at other points, notably at a decision point where hippocampal forward representations were also observed. These data suggest the possibility that ventral striatal reward representations contribute to model-based expectancies used in deliberative decision-making.

  9. Machine learning an artificial intelligence approach

    CERN Document Server

    Banerjee, R; Bradshaw, Gary; Carbonell, Jaime Guillermo; Mitchell, Tom Michael; Michalski, Ryszard Spencer

    1983-01-01

    Machine Learning: An Artificial Intelligence Approach contains tutorial overviews and research papers representative of trends in the area of machine learning as viewed from an artificial intelligence perspective. The book is organized into six parts. Part I provides an overview of machine learning and explains why machines should learn. Part II covers important issues affecting the design of learning programs-particularly programs that learn from examples. It also describes inductive learning systems. Part III deals with learning by analogy, by experimentation, and from experience. Parts IV a

  10. SAwSu: an integrated model of associative and reinforcement learning.

    Science.gov (United States)

    Veksler, Vladislav D; Myers, Christopher W; Gluck, Kevin A

    2014-04-01

    Successfully explaining and replicating the complexity and generality of human and animal learning will require the integration of a variety of learning mechanisms. Here, we introduce a computational model which integrates associative learning (AL) and reinforcement learning (RL). We contrast the integrated model with standalone AL and RL models in three simulation studies. First, a synthetic grid-navigation task is employed to highlight performance advantages for the integrated model in an environment where the reward structure is both diverse and dynamic. The second and third simulations contrast the performances of the three models in behavioral experiments, demonstrating advantages for the integrated model in accounting for behavioral data. Copyright © 2014 Cognitive Science Society, Inc.

  11. Short-Term Load Forecasting Using Adaptive Annealing Learning Algorithm Based Reinforcement Neural Network

    Directory of Open Access Journals (Sweden)

    Cheng-Ming Lee

    2016-11-01

    Full Text Available A reinforcement learning algorithm is proposed to improve the accuracy of short-term load forecasting (STLF) in this article. The proposed model integrates radial basis function neural network (RBFNN), support vector regression (SVR), and adaptive annealing learning algorithm (AALA). In the proposed methodology, firstly, the initial structure of RBFNN is determined by using an SVR. Then, an AALA with time-varying learning rates is used to optimize the initial parameters of SVR-RBFNN (AALA-SVR-RBFNN). In order to overcome the stagnation for searching optimal RBFNN, a particle swarm optimization (PSO) is applied to simultaneously find promising learning rates in AALA. Finally, the short-term load demands are predicted by using the optimal RBFNN. The performance of the proposed methodology is verified on the actual load dataset from the Taiwan Power Company (TPC). Simulation results reveal that the proposed AALA-SVR-RBFNN can achieve a better load forecasting precision compared to various RBFNNs.

  12. Reinforcement learning of targeted movement in a spiking neuronal model of motor cortex.

    Science.gov (United States)

    Chadderdon, George L; Neymotin, Samuel A; Kerr, Cliff C; Lytton, William W

    2012-01-01

    Sensorimotor control has traditionally been considered from a control theory perspective, without relation to neurobiology. In contrast, here we utilized a spiking-neuron model of motor cortex and trained it to perform a simple movement task, which consisted of rotating a single-joint "forearm" to a target. Learning was based on a reinforcement mechanism analogous to that of the dopamine system. This provided a global reward or punishment signal in response to decreasing or increasing distance from hand to target, respectively. Output was partially driven by Poisson motor babbling, creating stochastic movements that could then be shaped by learning. The virtual forearm consisted of a single segment rotated around an elbow joint, controlled by flexor and extensor muscles. The model consisted of 144 excitatory and 64 inhibitory event-based neurons, each with AMPA, NMDA, and GABA synapses. Proprioceptive cell input to this model encoded the 2 muscle lengths. Plasticity was only enabled in feedforward connections between input and output excitatory units, using spike-timing-dependent eligibility traces for synaptic credit or blame assignment. Learning resulted from a global 3-valued signal: reward (+1), no learning (0), or punishment (-1), corresponding to phasic increases, lack of change, or phasic decreases of dopaminergic cell firing, respectively. Successful learning only occurred when both reward and punishment were enabled. In this case, 5 target angles were learned successfully within 180 s of simulation time, with a median error of 8 degrees. Motor babbling allowed exploratory learning, but decreased the stability of the learned behavior, since the hand continued moving after reaching the target. Our model demonstrated that a global reinforcement signal, coupled with eligibility traces for synaptic plasticity, can train a spiking sensorimotor network to perform goal-directed motor behavior.
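
    A heavily simplified, rate-based sketch of this learning scheme (eligibility traces gated by a global reward/punishment signal) is given below; it abstracts away the spiking dynamics, the muscle model, and the task, and all constants are assumptions.

```python
# Rate-based toy version of reinforcement learning with eligibility traces: exploratory
# "babbling" noise is credited through a decaying trace, and a global reward/punishment
# signal gates the weight change. All sizes, constants, and the task are assumptions.
import numpy as np

rng = np.random.default_rng(2)
n_in, n_out = 20, 5
W = rng.uniform(0, 0.1, (n_out, n_in))
trace = np.zeros_like(W)
tau, lr = 0.9, 0.01
target_angle = 0.7    # toy stand-in for the target joint angle
distance = 1.0

for step in range(2000):
    x = rng.random(n_in)                       # proprioceptive-like input
    noise = rng.normal(0, 0.1, n_out)          # motor babbling / exploration
    y = W @ x + noise                          # motor-unit activity
    angle = y.mean()                           # toy "arm angle" read-out
    new_distance = abs(target_angle - angle)
    reward = 1.0 if new_distance < distance else -1.0   # global signal (0 case omitted)
    trace = tau * trace + np.outer(noise, x)   # eligibility credits the exploratory part
    W += lr * reward * trace                   # reward/punishment gates plasticity
    distance = new_distance

print("final distance to target angle:", round(distance, 3))
```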

  13. Reinforcement learning of targeted movement in a spiking neuronal model of motor cortex.

    Directory of Open Access Journals (Sweden)

    George L Chadderdon

    Full Text Available Sensorimotor control has traditionally been considered from a control theory perspective, without relation to neurobiology. In contrast, here we utilized a spiking-neuron model of motor cortex and trained it to perform a simple movement task, which consisted of rotating a single-joint "forearm" to a target. Learning was based on a reinforcement mechanism analogous to that of the dopamine system. This provided a global reward or punishment signal in response to decreasing or increasing distance from hand to target, respectively. Output was partially driven by Poisson motor babbling, creating stochastic movements that could then be shaped by learning. The virtual forearm consisted of a single segment rotated around an elbow joint, controlled by flexor and extensor muscles. The model consisted of 144 excitatory and 64 inhibitory event-based neurons, each with AMPA, NMDA, and GABA synapses. Proprioceptive cell input to this model encoded the 2 muscle lengths. Plasticity was only enabled in feedforward connections between input and output excitatory units, using spike-timing-dependent eligibility traces for synaptic credit or blame assignment. Learning resulted from a global 3-valued signal: reward (+1), no learning (0), or punishment (-1), corresponding to phasic increases, lack of change, or phasic decreases of dopaminergic cell firing, respectively. Successful learning only occurred when both reward and punishment were enabled. In this case, 5 target angles were learned successfully within 180 s of simulation time, with a median error of 8 degrees. Motor babbling allowed exploratory learning, but decreased the stability of the learned behavior, since the hand continued moving after reaching the target. Our model demonstrated that a global reinforcement signal, coupled with eligibility traces for synaptic plasticity, can train a spiking sensorimotor network to perform goal-directed motor behavior.

  14. Neural mechanisms of reinforcement learning in unmedicated patients with major depressive disorder.

    Science.gov (United States)

    Rothkirch, Marcus; Tonn, Jonas; Köhler, Stephan; Sterzer, Philipp

    2017-04-01

    According to current concepts, major depressive disorder is strongly related to dysfunctional neural processing of motivational information, entailing impairments in reinforcement learning. While computational modelling can reveal the precise nature of neural learning signals, it has not been used to study learning-related neural dysfunctions in unmedicated patients with major depressive disorder so far. We thus aimed at comparing the neural coding of reward and punishment prediction errors, representing indicators of neural learning-related processes, between unmedicated patients with major depressive disorder and healthy participants. To this end, a group of unmedicated patients with major depressive disorder (n = 28) and a group of age- and sex-matched healthy control participants (n = 30) completed an instrumental learning task involving monetary gains and losses during functional magnetic resonance imaging. The two groups did not differ in their learning performance. Patients and control participants showed the same level of prediction error-related activity in the ventral striatum and the anterior insula. In contrast, neural coding of reward prediction errors in the medial orbitofrontal cortex was reduced in patients. Moreover, neural reward prediction error signals in the medial orbitofrontal cortex and ventral striatum showed negative correlations with anhedonia severity. Using a standard instrumental learning paradigm we found no evidence for an overall impairment of reinforcement learning in medication-free patients with major depressive disorder. Importantly, however, the attenuated neural coding of reward in the medial orbitofrontal cortex and the relation between anhedonia and reduced reward prediction error-signalling in the medial orbitofrontal cortex and ventral striatum likely reflect an impairment in experiencing pleasure from rewarding events as a key mechanism of anhedonia in major depressive disorder. © The Author (2017). Published by Oxford

  15. THE SYSTEMIC APPROACH TO TEACHING AND LEARNING:

    African Journals Online (AJOL)

    Rita Wilkinson

    of chemical education would definitely result in improvement of social ... The Systemic Approach in Teaching and Learning (SATL) is based on ... learning becomes a powerful recruitment tool, particularly for global education, and empowers.

  16. THE SYSTEMIC APPROACH TO TEACHING AND LEARNING ...

    African Journals Online (AJOL)

    Preferred Customer

    The Systemic Approach to Teaching and Learning (SATL) is based on .... Meaningful learning presupposes that the learner has a disposition to relate the .... Students (n=429) in six (6) secondary schools in the Cairo and Giza (Egypt).

  17. Design and Evaluation of Two Blended Learning Approaches: Lessons Learned

    Science.gov (United States)

    Cheung, Wing Sum; Hew, Khe Foon

    2011-01-01

    In this paper, we share two blended learning approaches used at the National Institute of Education in Singapore. We have been using these two approaches in the last twelve years in many courses ranging from the diploma to graduate programs. For the first blended learning approach, we integrated one asynchronous communication tool with face to…

  18. Blended learning for reinforcing dental pharmacology in the clinical years: A qualitative analysis.

    Science.gov (United States)

    Eachempati, Prashanti; Kiran Kumar, K S; Sumanth, K N

    2016-10-01

    Blended learning has become the method of choice in educational institutions because of its systematic integration of traditional classroom teaching and online components. This study aims to analyze students' reflections regarding blended learning in dental pharmacology. A cross-sectional study was conducted in the Faculty of Dentistry, Melaka-Manipal Medical College among 3rd and 4th year BDS students. A total of 145 dental students, who consented, participated in the study. Students were divided into 14 groups. Nine online sessions followed by nine face-to-face discussions were held. Each session addressed topics related to oral lesions and orofacial pain with pharmacological applications. After each week, students were asked to reflect on blended learning. On completion of 9 weeks, reflections were collected and analyzed. Qualitative analysis was done using the thematic analysis model suggested by Braun and Clarke. Four main themes were identified, namely, merits of blended learning, skill in writing prescriptions for oral diseases, dosages of drugs, and identification of strengths and weaknesses. In general, the participants had positive feedback regarding blended learning. Students felt more confident in drug selection and prescription writing. They could recollect the doses better after the online and face-to-face sessions. Most interestingly, the students reflected that they were able to identify their strengths and weaknesses after the blended learning sessions. The blended learning module was successfully implemented for reinforcing dental pharmacology. The results obtained in this study enable us to plan future comparative studies to know the effectiveness of blended learning in dental pharmacology.

  19. Simulating the effect of reinforcement learning on neuronal synchrony and periodicity in the striatum

    Directory of Open Access Journals (Sweden)

    Sebastien Helie

    2016-04-01

    Full Text Available The study of rhythms and oscillations in the brain is gaining attention. While it is unclear exactly what the role of oscillation, synchrony, and rhythm is, it appears increasingly likely that synchrony is related to normal and abnormal brain states and possibly cognition. In this article, we explore the relationship between basal ganglia (BG) synchrony and reinforcement learning. We simulate a biologically-realistic model of the striatum initially proposed by Ponzi and Wickens (2010) and enhance the model by adding plastic cortico-BG synapses that can be modified using reinforcement learning. The effect of reinforcement learning on striatal rhythmic activity is then explored, and disrupted using simulated deep brain stimulation (DBS). The stimulator injects current in the brain structure to which it is attached, which affects neuronal synchrony. The results show that training the model without DBS yields high accuracy in the learning task and reduces the number of active neurons in the striatum, along with an increased firing periodicity and a decreased firing synchrony between neurons in the same assembly. In addition, a spectral decomposition shows a stronger signal for correct trials than incorrect trials in high frequency bands. If the DBS is ON during the training phase, but not the test phase, the amount of learning in the model is reduced, along with firing periodicity. Similar to when the DBS is OFF, spectral decomposition shows a stronger signal for correct trials than for incorrect trials in high frequency domains, but this phenomenon happens in higher frequency bands than when the DBS is OFF. Synchrony between the neurons is not affected. Finally, the results show that turning the DBS ON at test increases both firing periodicity and striatal synchrony, and spectral decomposition of the signal shows that neural activity synchronizes with the DBS fundamental frequency (and its harmonics). Turning the DBS ON during the test phase results

  20. Machine Learning Approaches for Clinical Psychology and Psychiatry.

    Science.gov (United States)

    Dwyer, Dominic B; Falkai, Peter; Koutsouleris, Nikolaos

    2018-01-29

    Machine learning approaches for clinical psychology and psychiatry explicitly focus on learning statistical functions from multidimensional data sets to make generalizable predictions about individuals. The goal of this review is to provide an accessible understanding of why this approach is important for future practice because of its potential to augment decisions associated with the diagnosis, prognosis, and treatment of people suffering from mental illness using clinical and biological data. To this end, the limitations of current statistical paradigms in mental health research are critiqued, and an introduction is provided to the critical machine learning methods used in clinical studies. A selective literature review is then presented aiming to reinforce the usefulness of machine learning methods and provide evidence of their potential. In the context of promising initial results, the current limitations of machine learning approaches are addressed, and considerations for future clinical translation are outlined. Expected final online publication date for the Annual Review of Clinical Psychology Volume 14 is May 7, 2018. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.

  1. Identifying cognitive remediation change through computational modelling--effects on reinforcement learning in schizophrenia.

    Science.gov (United States)

    Cella, Matteo; Bishara, Anthony J; Medin, Evelina; Swan, Sarah; Reeder, Clare; Wykes, Til

    2014-11-01

    Converging research suggests that individuals with schizophrenia show a marked impairment in reinforcement learning, particularly in tasks requiring flexibility and adaptation. The problem has been associated with dopamine reward systems. This study explores, for the first time, the characteristics of this impairment and how it is affected by a behavioral intervention: cognitive remediation. Using computational modelling, 3 reinforcement learning parameters based on the Wisconsin Card Sorting Test (WCST) trial-by-trial performance were estimated: R (reward sensitivity), P (punishment sensitivity), and D (choice consistency). In Study 1 the parameters were compared between a group of individuals with schizophrenia (n = 100) and a healthy control group (n = 50). In Study 2 the effect of cognitive remediation therapy (CRT) on these parameters was assessed in 2 groups of individuals with schizophrenia, one receiving CRT (n = 37) and the other receiving treatment as usual (TAU, n = 34). In Study 1 individuals with schizophrenia showed impairment in the R and P parameters compared with healthy controls. Study 2 demonstrated that sensitivity to negative feedback (P) and reward (R) improved in the CRT group after therapy compared with the TAU group. R and P parameter change correlated with WCST outputs. Improvements in R and P after CRT were associated with working memory gains and reduction of negative symptoms, respectively. Schizophrenia reinforcement learning difficulties negatively influence performance in shift learning tasks. CRT can improve sensitivity to reward and punishment. Identifying parameters that show change may be useful in experimental medicine studies to identify cognitive domains susceptible to improvement. © The Author 2013. Published by Oxford University Press on behalf of the Maryland Psychiatric Research Center. All rights reserved. For permissions, please email: journals.permissions@oup.com.
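
    The flavor of such a three-parameter trial-by-trial model can be conveyed with the sketch below, which uses a reward sensitivity R, a punishment sensitivity P, and a consistency exponent D on a toy rule-switching task; the dimensional attention of the actual WCST model is omitted, and all details are illustrative assumptions.

```python
# Simplified three-parameter trial-by-trial model in the spirit of the R (reward
# sensitivity), P (punishment sensitivity), D (choice consistency) scheme above.
# The attention-to-dimensions component of the real WCST model is omitted.
import numpy as np

def simulate(R, P, D, n_trials=100, n_options=4, seed=0):
    rng = np.random.default_rng(seed)
    correct = 0                               # currently reinforced option (sorting rule)
    v = np.ones(n_options) / n_options        # expectancies for each option
    n_correct = 0
    for t in range(n_trials):
        if t > 0 and t % 20 == 0:             # rule switch, as in WCST category shifts
            correct = (correct + 1) % n_options
        p = v ** D / np.sum(v ** D)           # consistency D sharpens or flattens choice
        c = rng.choice(n_options, p=p)
        if c == correct:
            v[c] += R * (1.0 - v[c])          # reward sensitivity scales positive update
            n_correct += 1
        else:
            v[c] -= P * v[c]                  # punishment sensitivity scales negative update
        v = np.clip(v, 1e-6, None)
    return n_correct / n_trials

print("accuracy:", simulate(R=0.4, P=0.4, D=3.0))
```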

  2. Case-based approaches for knowledge application and organisational learning

    DEFF Research Database (Denmark)

    Wang, Chengbo; Johansen, John; Luxhøj, James T.

    2005-01-01

    In dealing with the strategic issues within a manufacturing system, it is necessary to facilitate formulating the composing elements of a set of strategic manufacturing practices and activity patterns that will support an enterprise to reinforce and increase its competitive advantage. These practices and activity patterns are based on learning and applying the knowledge internal and external to an organisation. To ensure their smooth formulation process, two important techniques are designed: an expert adaptation approach and an expert evaluation approach. These two approaches provide structured processes to execute the organisational learning and knowledge application, which are intended to guide the practitioners during the process of manufacturing competence development and improvement. They are based on Case-Based Reasoning (CBR) methodology and rely on cases as the primary knowledge supply...

  3. An Energy-Efficient Spectrum-Aware Reinforcement Learning-Based Clustering Algorithm for Cognitive Radio Sensor Networks.

    Science.gov (United States)

    Mustapha, Ibrahim; Mohd Ali, Borhanuddin; Rasid, Mohd Fadlee A; Sali, Aduwati; Mohamad, Hafizal

    2015-08-13

    It is well-known that clustering partitions a network into logical groups of nodes in order to achieve energy efficiency and to enhance dynamic channel access in cognitive radio through cooperative sensing. While the topic of energy efficiency has been well investigated in conventional wireless sensor networks, the latter has not been extensively explored. In this paper, we propose a reinforcement learning-based spectrum-aware clustering algorithm that allows a member node to learn the energy and cooperative sensing costs for neighboring clusters to achieve an optimal solution. Each member node selects an optimal cluster that satisfies pairwise constraints, minimizes network energy consumption and enhances channel sensing performance through an exploration technique. We first model the network energy consumption and then determine the optimal number of clusters for the network. The problem of selecting an optimal cluster is formulated as a Markov Decision Process (MDP) in the algorithm and the obtained simulation results show convergence, learning and adaptability of the algorithm to a dynamic environment towards achieving an optimal solution. Performance comparisons of our algorithm with the Groupwise Spectrum Aware (GWSA)-based algorithm in terms of Sum of Square Error (SSE), complexity, network energy consumption and probability of detection indicate improved performance from the proposed approach. The results further reveal that an energy savings of 9% and a significant Primary User (PU) detection improvement can be achieved with the proposed approach.

  4. An Energy-Efficient Spectrum-Aware Reinforcement Learning-Based Clustering Algorithm for Cognitive Radio Sensor Networks

    Science.gov (United States)

    Mustapha, Ibrahim; Ali, Borhanuddin Mohd; Rasid, Mohd Fadlee A.; Sali, Aduwati; Mohamad, Hafizal

    2015-01-01

    It is well-known that clustering partitions a network into logical groups of nodes in order to achieve energy efficiency and to enhance dynamic channel access in cognitive radio through cooperative sensing. While the topic of energy efficiency has been well investigated in conventional wireless sensor networks, the latter has not been extensively explored. In this paper, we propose a reinforcement learning-based spectrum-aware clustering algorithm that allows a member node to learn the energy and cooperative sensing costs for neighboring clusters to achieve an optimal solution. Each member node selects an optimal cluster that satisfies pairwise constraints, minimizes network energy consumption and enhances channel sensing performance through an exploration technique. We first model the network energy consumption and then determine the optimal number of clusters for the network. The problem of selecting an optimal cluster is formulated as a Markov Decision Process (MDP) in the algorithm and the obtained simulation results show convergence, learning and adaptability of the algorithm to a dynamic environment towards achieving an optimal solution. Performance comparisons of our algorithm with the Groupwise Spectrum Aware (GWSA)-based algorithm in terms of Sum of Square Error (SSE), complexity, network energy consumption and probability of detection indicate improved performance from the proposed approach. The results further reveal that an energy savings of 9% and a significant Primary User (PU) detection improvement can be achieved with the proposed approach. PMID:26287191

  5. Multi-Objective Reinforcement Learning for Cognitive Radio-Based Satellite Communications

    Science.gov (United States)

    Ferreira, Paulo Victor R.; Paffenroth, Randy; Wyglinski, Alexander M.; Hackett, Timothy M.; Bilen, Sven G.; Reinhart, Richard C.; Mortensen, Dale J.

    2016-01-01

    Previous research on cognitive radios has addressed the performance of various machine-learning and optimization techniques for decision making of terrestrial link properties. In this paper, we present our recent investigations with respect to reinforcement learning that potentially can be employed by future cognitive radios installed onboard satellite communications systems specifically tasked with radio resource management. This work analyzes the performance of learning, reasoning, and decision making while considering multiple objectives for time-varying communications channels, as well as different cross-layer requirements. Based on the urgent demand for increased bandwidth, which is being addressed by the next generation of high-throughput satellites, the performance of cognitive radio is assessed considering links between a geostationary satellite and a fixed ground station operating at Ka-band (26 GHz). Simulation results show multiple objective performance improvements of more than 3.5 times for clear sky conditions and 6.8 times for rain conditions.

  6. Multi-Objective Reinforcement Learning for Cognitive Radio Based Satellite Communications

    Science.gov (United States)

    Ferreira, Paulo; Paffenroth, Randy; Wyglinski, Alexander; Hackett, Timothy; Bilen, Sven; Reinhart, Richard; Mortensen, Dale John

    2016-01-01

    Previous research on cognitive radios has addressed the performance of various machine learning and optimization techniques for decision making of terrestrial link properties. In this paper, we present our recent investigations with respect to reinforcement learning that potentially can be employed by future cognitive radios installed onboard satellite communications systems specifically tasked with radio resource management. This work analyzes the performance of learning, reasoning, and decision making while considering multiple objectives for time-varying communications channels, as well as different cross-layer requirements. Based on the urgent demand for increased bandwidth, which is being addressed by the next generation of high-throughput satellites, the performance of cognitive radio is assessed considering links between a geostationary satellite and a fixed ground station operating at Ka-band (26 GHz). Simulation results show multiple objective performance improvements of more than 3.5 times for clear sky conditions and 6.8 times for rain conditions.

  7. Approaches to e-learning

    DEFF Research Database (Denmark)

    Hartvig, Susanne Akrawi; Petersson, Eva

    2013-01-01

    E-learning has made its entrance into educational institutions. Compared to traditional learning methods, e-learning has the benefit of enabling educational institutions to attract more students. E-learning not only opens up for increased enrollment, it also gives students who would otherwise not be able to take the education the possibility to do so. This paper introduces Axel Honneth's theory on the need for recognition as a framework to understand the role and function of interaction in relation to e-learning. The paper argues that an increased focus on the dialectic relationship between recognition and learning will enable an optimization of the learning conditions and the interactive affordances targeting students under e-learning programs. The paper concludes that the engagement and motivation to learn are not only influenced by but dependent on recognition.

  8. Project Management Approaches for Online Learning Design

    Science.gov (United States)

    Eby, Gulsun; Yuzer, T. Volkan

    2013-01-01

    Developments in online learning and its design are areas that continue to grow in order to enhance students' learning environments and experiences. However, in the implementation of new technologies, the importance of properly and fairly overseeing these courses is often undervalued. "Project Management Approaches for Online Learning Design"…

  9. Machine learning approaches in medical image analysis

    DEFF Research Database (Denmark)

    de Bruijne, Marleen

    2016-01-01

    Machine learning approaches are increasingly successful in image-based diagnosis, disease prognosis, and risk assessment. This paper highlights new research directions and discusses three main challenges related to machine learning in medical imaging: coping with variation in imaging protocols, learning from weak labels, and interpretation and evaluation of results.

  10. Universal effect of dynamical reinforcement learning mechanism in spatial evolutionary games

    Science.gov (United States)

    Zhang, Hai-Feng; Wu, Zhi-Xi; Wang, Bing-Hong

    2012-06-01

    One of the prototypical mechanisms in understanding the ubiquitous cooperation in social dilemma situations is the win-stay, lose-shift rule. In this work, a generalized win-stay, lose-shift learning model—a reinforcement learning model with dynamic aspiration level—is proposed to describe how humans adapt their social behaviors based on their social experiences. In the model, the players incorporate the information of the outcomes in previous rounds with time-dependent aspiration payoffs to regulate the probability of choosing cooperation. By investigating such a reinforcement learning rule in the spatial prisoner's dilemma game and public goods game, a most noteworthy viewpoint is that moderate greediness (i.e. moderate aspiration level) favors best the development and organization of collective cooperation. The generality of this observation is tested against different regulation strengths and different types of network of interaction as well. We also make comparisons with two recently proposed models to highlight the importance of the mechanism of adaptive aspiration level in supporting cooperation in structured populations.
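
    A minimal sketch of a reinforcement learner with a dynamic aspiration level (the generalized win-stay, lose-shift idea) is given below for a repeated prisoner's dilemma against a stand-in opponent; the payoff normalization, the opponent, and the parameter values are assumptions rather than the paper's spatial-game setup.

```python
# Bush-Mosteller-style reinforcement learner with a dynamic aspiration level: payoffs
# above aspiration reinforce the last action, payoffs below weaken it, and the
# aspiration tracks recent payoffs. Opponent and parameter values are assumptions.
import random

# Prisoner's dilemma payoffs to the learner: (my_move, opp_move) -> payoff.
payoff = {("C", "C"): 3.0, ("C", "D"): 0.0, ("D", "C"): 5.0, ("D", "D"): 1.0}
p_coop = 0.5          # probability the learner cooperates
aspiration = 2.0      # dynamic aspiration level
beta, h = 0.2, 0.1    # learning strength and aspiration adaptation rate ("greediness")

for _ in range(2000):
    my = "C" if random.random() < p_coop else "D"
    opp = "C" if random.random() < 0.5 else "D"      # stand-in opponent (random play)
    r = payoff[(my, opp)]
    s = max(-1.0, min(1.0, (r - aspiration) / 4.0))  # normalized stimulus
    if my == "C":
        p_coop += beta * s * (1 - p_coop) if s >= 0 else beta * s * p_coop
    else:
        p_coop -= beta * s * p_coop if s >= 0 else beta * s * (1 - p_coop)
    p_coop = min(1.0, max(0.0, p_coop))
    aspiration = (1 - h) * aspiration + h * r        # aspiration tracks experience

print("final cooperation probability:", round(p_coop, 2))
```

    The adaptation rate h plays the role of the "greediness" discussed above: a moderate value lets the aspiration settle at an intermediate level, whereas extreme values make the learner either insatiable or too easily satisfied.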

  11. Sex-dependent effects on tasks assessing reinforcement learning and interference inhibition

    Directory of Open Access Journals (Sweden)

    Kelly L Evans

    2015-07-01

    Full Text Available Increasing evidence suggests that the prefrontal cortex (PFC) is influenced by sex steroids and that some cognitive functions dependent on the PFC may be sexually differentiated in humans. Past work has identified a male advantage on certain complex reinforcement learning tasks, but it is unclear which latent task components are important to elicit the sex difference. The objective of the current study was to investigate whether there are sex differences on measures of response inhibition and valenced feedback processing, elements that are shared by previously studied reinforcement learning tasks. Healthy young adults (90 males, 86 females) matched in general intelligence completed the Probabilistic Selection Task (PST), a Simon task, and the Stop-Signal task. On the PST, females were more accurate than males in learning from positive (but not negative) feedback. On the Simon task, males were faster than females, especially in the face of incongruent stimuli. No sex difference was observed in Stop-Signal reaction time. The current findings provide preliminary support for a sex difference in the processing of valenced feedback and in interference inhibition.

  12. Dynamic pricing and automated resource allocation for complex information services reinforcement learning and combinatorial auctions

    CERN Document Server

    Schwind, Michael; Fandel, G

    2007-01-01

    Many firms provide their customers with online information products which require limited resources such as server capacity. This book develops allocation mechanisms that aim to ensure an efficient resource allocation in modern IT-services. Recent methods of artificial intelligence, such as neural networks and reinforcement learning, and nature-oriented optimization methods, such as genetic algorithms and simulated annealing, are advanced and applied to allocation processes in distributed IT-infrastructures, e.g. grid systems. The author presents two methods, both of which use the users' w…

  13. Time-Varying Formation Controllers for Unmanned Aerial Vehicles Using Deep Reinforcement Learning

    OpenAIRE

    Conde, Ronny; Llata, José Ramón; Torre-Ferrero, Carlos

    2017-01-01

    We consider the problem of designing scalable and portable controllers for unmanned aerial vehicles (UAVs) to reach time-varying formations as quickly as possible. This brief confirms that deep reinforcement learning can be used in a multi-agent fashion to drive UAVs to reach any formation while taking into account optimality and portability. We use a deep neural network to estimate how good a state is, so the agent can choose actions accordingly. The system is tested with different non-high-...

  14. Machine Learning Approaches for Music Information Retrieval

    OpenAIRE

    Li, Tao; Ogihara, Mitsunori; Shao, Bo; Wang, Dingding

    2009-01-01

    We discussed the following machine learning approaches used in music information retrieval: (1) multi-class classification methods for music genre categorization; (2) multi-label classification methods for emotion detection; (3) clustering methods for music style identification; and (4) semi-supervised learning methods for music recommendation. Experimental results are also presented to evaluate the approaches.

  15. Learning Actions Models: Qualitative Approach

    DEFF Research Database (Denmark)

    Bolander, Thomas; Gierasimczuk, Nina

    2015-01-01

    identifiability (conclusively inferring the appropriate action model in finite time) and identifiability in the limit (inconclusive convergence to the right action model). We show that deterministic actions are finitely identifiable, while non-deterministic actions require more learning power: they are identifiable in the limit. We then move on to a particular learning method, which proceeds via restriction of a space of events within a learning-specific action model. This way of learning closely resembles the well-known update method from dynamic epistemic logic. We introduce several different learning

  16. Reinforcement Learning and Episodic Memory in Humans and Animals: An Integrative Framework.

    Science.gov (United States)

    Gershman, Samuel J; Daw, Nathaniel D

    2017-01-03

    We review the psychology and neuroscience of reinforcement learning (RL), which has experienced significant progress in the past two decades, enabled by the comprehensive experimental study of simple learning and decision-making tasks. However, one challenge in the study of RL is computational: The simplicity of these tasks ignores important aspects of reinforcement learning in the real world: (a) State spaces are high-dimensional, continuous, and partially observable; this implies that (b) data are relatively sparse and, indeed, precisely the same situation may never be encountered twice; furthermore, (c) rewards depend on the long-term consequences of actions in ways that violate the classical assumptions that make RL tractable. A seemingly distinct challenge is that, cognitively, theories of RL have largely involved procedural and semantic memory, the way in which knowledge about action values or world models extracted gradually from many experiences can drive choice. This focus on semantic memory leaves out many aspects of memory, such as episodic memory, related to the traces of individual events. We suggest that these two challenges are related. The computational challenge can be dealt with, in part, by endowing RL systems with episodic memory, allowing them to (a) efficiently approximate value functions over complex state spaces, (b) learn with very little data, and (c) bridge long-term dependencies between actions and rewards. We review the computational theory underlying this proposal and the empirical evidence to support it. Our proposal suggests that the ubiquitous and diverse roles of memory in RL may function as part of an integrated learning system.

  17. Assessment of the Uretek process on continuously reinforced concrete pavement, jointed concrete pavement, and bridge approach slabs : technical assistance report.

    Science.gov (United States)

    2004-12-01

    This study evaluates the rehabilitation method utilizing the injection of Uretek (polyurethane) into the pavement structures on continuously reinforced concrete pavement (CRCP), jointed concrete pavement (JCP), and bridge approach slabs. The polyuret...

  18. Variance-penalized Markov decision processes: dynamic programming and reinforcement learning techniques

    Science.gov (United States)

    Gosavi, Abhijit

    2014-08-01

    In control systems theory, the Markov decision process (MDP) is a widely used optimization model involving selection of the optimal action in each state visited by a discrete-event system driven by Markov chains. The classical MDP model is suitable for an agent/decision-maker interested in maximizing expected revenues, but does not account for minimizing variability in the revenues. An MDP model in which the agent can maximize the revenues while simultaneously controlling the variance in the revenues is proposed. This work is rooted in machine learning/neural network concepts, where updating is based on system feedback and step sizes. First, a Bellman equation for the problem is proposed. Thereafter, convergent dynamic programming and reinforcement learning techniques for solving the MDP are provided along with encouraging numerical results on a small MDP and a preventive maintenance problem.
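
    For reference, a standard way of writing the variance-penalized average-reward objective that such models optimize is the following; the notation is generic and may differ from the paper's.

```latex
% Variance-penalized MDP objective (standard form; notation may differ from the paper).
% The agent seeks a policy \pi maximizing long-run average reward minus a weighted
% penalty on the variability of the per-step reward:
\[
  \max_{\pi} \; \rho^{\pi} - \theta\, \sigma^{2,\pi},
  \qquad
  \rho^{\pi} = \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}^{\pi}\!\left[\sum_{t=1}^{T} r_t\right],
  \qquad
  \sigma^{2,\pi} = \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}^{\pi}\!\left[\sum_{t=1}^{T} \bigl(r_t - \rho^{\pi}\bigr)^{2}\right],
\]
where $\theta \ge 0$ trades off expected reward against variability in the revenues.
```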

  19. An analysis of intergroup rivalry using Ising model and reinforcement learning

    Science.gov (United States)

    Zhao, Feng-Fei; Qin, Zheng; Shao, Zhuo

    2014-01-01

    Modeling of intergroup rivalry can help us better understand economic competitions, political elections and other similar activities. The result of intergroup rivalry depends on the co-evolution of individual behavior within one group and the impact from the rival group. In this paper, we model the rivalry behavior using Ising model. Different from other simulation studies using Ising model, the evolution rules of each individual in our model are not static, but have the ability to learn from historical experience using reinforcement learning technique, which makes the simulation more close to real human behavior. We studied the phase transition in intergroup rivalry and focused on the impact of the degree of social freedom, the personality of group members and the social experience of individuals. The results of computer simulation show that a society with a low degree of social freedom and highly educated, experienced individuals is more likely to be one-sided in intergroup rivalry.

  20. Automatic Skill Acquisition in Reinforcement Learning Agents Using Connection Bridge Centrality

    Science.gov (United States)

    Moradi, Parham; Shiri, Mohammad Ebrahim; Entezari, Negin

    Incorporating skills into reinforcement learning methods accelerates agents' learning performance. The key problem of automatic skill discovery is to find subgoal states and create skills to reach them. Among the proposed algorithms, those based on graph centrality measures have achieved precise results. In this paper we propose a new graph centrality measure for identifying subgoal states, which is crucial for developing useful skills. The main advantage of the proposed centrality measure is that it considers both local and global information about the agent's states when scoring them, which leads to the identification of real subgoal states. We show through simulations on three benchmark tasks, namely "four-room grid world", "taxi driver grid world" and "soccer simulation grid world", that a procedure based on the proposed centrality measure performs better than procedures based on other centrality measures.
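    The paper's connection bridge centrality measure is not spelled out in the abstract, so the sketch below illustrates only the general recipe for centrality-based subgoal discovery, using ordinary betweenness centrality as a stand-in: build a state-transition graph from observed transitions and flag the highest-centrality states (e.g. doorways in a four-room grid world) as candidate subgoals. The networkx dependency and the toy transition list are assumptions for illustration.

```python
import networkx as nx

def candidate_subgoals(transitions, top_k=3):
    """Illustrative subgoal discovery: build the state-transition graph from
    observed (state, next_state) pairs and rank states by betweenness
    centrality; high-centrality states are candidate subgoals."""
    g = nx.DiGraph()
    g.add_edges_from(transitions)
    centrality = nx.betweenness_centrality(g)
    ranked = sorted(centrality, key=centrality.get, reverse=True)
    return ranked[:top_k]

# toy usage: two "rooms" {0,1,2} and {4,5,6} joined by the doorway state 3
example_transitions = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6),
                       (6, 5), (5, 4), (4, 3), (3, 2), (2, 1), (1, 0)]
print(candidate_subgoals(example_transitions))  # state 3 should rank highest
```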

  1. The role of multiple neuromodulators in reinforcement learning that is based on competition between eligibility traces

    Directory of Open Access Journals (Sweden)

    Marco A Huertas

    2016-12-01

    Full Text Available The ability to maximize reward and avoid punishment is essential for animal survival. Reinforcement learning (RL) refers to the algorithms used by biological or artificial systems to learn how to maximize reward or avoid negative outcomes based on past experiences. While RL is also important in machine learning, the types of mechanistic constraints encountered by biological machinery might be different from those for artificial systems. Two major problems encountered by RL are how to relate a stimulus to a reinforcing signal that is delayed in time (temporal credit assignment), and how to stop learning once the target behaviors are attained (stopping rule). To address the first problem, synaptic eligibility traces were introduced, bridging the temporal gap between a stimulus and its reward. Although these were mere theoretical constructs, recent experiments have provided evidence of their existence. These experiments also reveal that the presence of specific neuromodulators converts the traces into changes in synaptic efficacy. A mechanistic implementation of the stopping rule usually assumes the inhibition of the reward nucleus; however, recent experimental results have shown that learning terminates at the appropriate network state even in setups where the reward cannot be inhibited. In an effort to describe a learning rule that solves the temporal credit assignment problem and implements a biologically plausible stopping rule, we proposed a model based on two separate synaptic eligibility traces, one for long-term potentiation (LTP) and one for long-term depression (LTD), each obeying different dynamics and having different effective magnitudes. The model has been shown to successfully generate stable learning in recurrent networks. Although the model assumes the presence of a single neuromodulator, evidence indicates that there are different neuromodulators for expressing the different traces. What could be the role of different
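    A minimal sketch of the two-trace idea is given below, with assumed dynamics: coincident pre- and postsynaptic activity charges both an LTP and an LTD eligibility trace, each with its own time constant and effective magnitude, and a delayed neuromodulatory reward signal converts their difference into a weight change. The parameter names and the specific update form are illustrative assumptions, not the equations of the proposed model.

```python
import numpy as np

def two_trace_weight_update(pre, post, reward_signal, steps=200,
                            tau_ltp=20.0, tau_ltd=40.0,
                            a_ltp=1.0, a_ltd=0.6, lr=0.01, dt=1.0):
    """Sketch of learning with competing LTP and LTD eligibility traces.
    Coincident pre/post activity charges both traces; a delayed
    neuromodulatory reward converts their difference into a weight change."""
    w = 0.5
    e_ltp, e_ltd = 0.0, 0.0
    for t in range(steps):
        coincidence = pre[t] * post[t]
        # each trace decays with its own time constant and effective magnitude
        e_ltp += dt * (-e_ltp / tau_ltp + a_ltp * coincidence)
        e_ltd += dt * (-e_ltd / tau_ltd + a_ltd * coincidence)
        # a neuromodulator pulse (e.g. reward) gates the competition
        w += lr * reward_signal[t] * (e_ltp - e_ltd)
    return w

# toy usage: paired activity early on, reward arriving a few steps later
pre = np.zeros(200); post = np.zeros(200); reward = np.zeros(200)
pre[10:15] = 1.0; post[10:15] = 1.0; reward[30] = 1.0
print(two_trace_weight_update(pre, post, reward))
```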

  2. A Day-to-Day Route Choice Model Based on Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Fangfang Wei

    2014-01-01

    Full Text Available Day-to-day traffic dynamics are generated by individual travelers' route choice and route adjustment behaviors, which are well suited to study using agent-based models and learning theory. In this paper, we propose a day-to-day route choice model based on reinforcement learning and multiagent simulation. Travelers' memory, learning rate, and experience cognition are taken into account. The model is then verified and analyzed. Results show that the network flow can converge to user equilibrium (UE) if travelers can remember all the travel times they have experienced, but this is not necessarily the case under limited memory; a higher learning rate strengthens flow fluctuations, whereas memory dampens them; moreover, a high learning rate produces cyclical oscillation during flow evolution. Finally, scenarios of link capacity degradation and random link capacity are used to illustrate the model's applications. Analyses and applications of the model demonstrate that it is reasonable and useful for studying day-to-day traffic dynamics.
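    The following is a compact, hypothetical sketch of the kind of day-to-day route choice dynamics described above: travelers update a learned valuation of each route from the travel time they experienced, and choose routes by softmax over those valuations. The two-route network, the BPR-style congestion function, and the parameter values are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

n_travelers, n_routes, days = 200, 2, 60
alpha = 0.3                      # learning rate
capacity = np.array([100.0, 100.0])
free_flow = np.array([10.0, 10.0])

# each traveler's learned (negative-cost) valuation per route
value = np.zeros((n_travelers, n_routes))

def travel_time(flow):
    # BPR-style congestion function (assumed for illustration)
    return free_flow * (1.0 + 0.15 * (flow / capacity) ** 4)

for day in range(days):
    # softmax route choice from current valuations
    probs = np.exp(value) / np.exp(value).sum(axis=1, keepdims=True)
    choices = np.array([rng.choice(n_routes, p=p) for p in probs])
    flows = np.bincount(choices, minlength=n_routes)
    times = travel_time(flows)
    # reinforcement update from the experienced travel time on the chosen route
    for i, r in enumerate(choices):
        value[i, r] += alpha * (-times[r] - value[i, r])

print("final flows:", flows, "travel times:", np.round(times, 2))
```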

  3. Minimizing endpoint variability through reinforcement learning during reaching movements involving shoulder, elbow and wrist.

    Science.gov (United States)

    Mehler, David Marc Anton; Reichenbach, Alexandra; Klein, Julius; Diedrichsen, Jörn

    2017-01-01

    Reaching movements comprise coordinated action across multiple joints. The human skeleton is redundant for this task because different joint configurations can lead to the same endpoint in space. How do people learn to use combinations of joints that maximize success in goal-directed motor tasks? To answer this question, we used a 3-degree-of-freedom manipulandum to measure shoulder, elbow and wrist joint movements during reaching in a plane. We tested whether a shift in the relative contribution of the wrist and elbow joints to a reaching movement could be learned through an implicit reinforcement regime. Unknown to the participants, we decreased the task success for certain joint configurations (wrist flexion or extension, respectively) by adding random variability to the endpoint feedback. In return, the opposite wrist postures were rewarded in the two experimental groups (flexion and extension group). We found that the joint configuration slowly shifted towards movements that provided more control over the endpoint and hence higher task success. While the overall learning was significant, only the group that was guided to extend the wrist joint more during the movement showed substantial learning. Importantly, all changes in movement pattern occurred independently of conscious awareness of the experimental manipulation. These findings suggest that the motor system is generally sensitive to its output variability and can optimize joint-space solutions that minimize task-relevant output variability. We discuss biomechanical biases (e.g. a joint's range of movement) that could pose hurdles to the learning process.

  4. Energy Management Strategy for a Hybrid Electric Vehicle Based on Deep Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Yue Hu

    2018-01-01

    Full Text Available An energy management strategy (EMS) is important for hybrid electric vehicles (HEVs) since it plays a decisive role in the performance of the vehicle. However, the variation of future driving conditions deeply influences the effectiveness of the EMS. Most existing EMS methods simply follow predefined rules that are not adaptive to different driving conditions online. Therefore, it is useful for the EMS to be able to learn from the environment or driving cycle. In this paper, a deep reinforcement learning (DRL)-based EMS is designed such that it can learn to select actions directly from the states without any prediction or predefined rules. Furthermore, a DRL-based online learning architecture is presented, which is important for applying the DRL algorithm to HEV energy management under different driving conditions. Simulation experiments have been conducted using MATLAB and Advanced Vehicle Simulator (ADVISOR) co-simulation. Experimental results validate the effectiveness of the DRL-based EMS compared with the rule-based EMS in terms of fuel economy. The online learning architecture is also shown to be effective. The proposed method ensures optimality, as well as real-time applicability, in HEVs.

  5. Multiobjective Reinforcement Learning for Traffic Signal Control Using Vehicular Ad Hoc Network

    Directory of Open Access Journals (Sweden)

    Houli Duan

    2010-01-01

    Full Text Available We propose a new multiobjective control algorithm based on reinforcement learning for urban traffic signal control, named multi-RL. A multiagent structure is used to describe the traffic system. A vehicular ad hoc network is used for the data exchange among agents. A reinforcement learning algorithm is applied to predict the overall value of the optimization objective given vehicles' states. The policy which minimizes the cumulative value of the optimization objective is regarded as the optimal one. In order to make the method adaptive to various traffic conditions, we also introduce a multiobjective control scheme in which the optimization objective is selected adaptively according to real-time traffic states. The optimization objectives include vehicle stops, the average waiting time, and the maximum queue length of the next intersection. In addition, our model also accommodates priority control for buses and emergency vehicles. The simulation results indicate that our algorithm performs more efficiently than traditional traffic light control methods.

  6. Multiobjective Reinforcement Learning for Traffic Signal Control Using Vehicular Ad Hoc Network

    Science.gov (United States)

    Houli, Duan; Zhiheng, Li; Yi, Zhang

    2010-12-01

    We propose a new multiobjective control algorithm based on reinforcement learning for urban traffic signal control, named multi-RL. A multiagent structure is used to describe the traffic system. A vehicular ad hoc network is used for the data exchange among agents. A reinforcement learning algorithm is applied to predict the overall value of the optimization objective given vehicles' states. The policy which minimizes the cumulative value of the optimization objective is regarded as the optimal one. In order to make the method adaptive to various traffic conditions, we also introduce a multiobjective control scheme in which the optimization objective is selected adaptively according to real-time traffic states. The optimization objectives include vehicle stops, the average waiting time, and the maximum queue length of the next intersection. In addition, our model also accommodates priority control for buses and emergency vehicles. The simulation results indicate that our algorithm performs more efficiently than traditional traffic light control methods.
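    As a rough illustration of the adaptive multiobjective scheme (not the multi-RL algorithm itself), the sketch below switches the optimization objective with the observed congestion level and runs a tabular Q-learning update whose reward is the negative value of the currently selected objective. The thresholds, phase names, and state encoding are assumptions for illustration.

```python
import random
from collections import defaultdict

# assumed objectives, switched adaptively with the observed congestion level
def pick_objective(queue_length):
    if queue_length > 20:
        return "max_queue"        # heavy congestion: limit spill-back
    elif queue_length > 8:
        return "waiting_time"     # moderate traffic: minimize average waiting time
    return "stops"                # light traffic: minimize vehicle stops

Q = defaultdict(lambda: defaultdict(float))   # Q[(state, objective)][phase]
alpha, gamma, epsilon = 0.1, 0.9, 0.1
phases = ["NS_green", "EW_green"]

def choose_phase(state, objective):
    if random.random() < epsilon:
        return random.choice(phases)
    key = (state, objective)
    return max(phases, key=lambda p: Q[key][p])

def update(state, objective, phase, cost, next_state):
    """Q-learning step: the 'reward' is the negative value of the currently
    selected objective (stops, waiting time, or maximum queue length)."""
    key, next_key = (state, objective), (next_state, objective)
    best_next = max(Q[next_key][p] for p in phases)
    Q[key][phase] += alpha * (-cost + gamma * best_next - Q[key][phase])
```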

  7. Reinforcement learning for adaptive threshold control of restorative brain-computer interfaces: a Bayesian simulation.

    Science.gov (United States)

    Bauer, Robert; Gharabaghi, Alireza

    2015-01-01

    Restorative brain-computer interfaces (BCI) are increasingly used to provide feedback of neuronal states in a bid to normalize pathological brain activity and achieve behavioral gains. However, patients and healthy subjects alike often show a large variability, or even inability, of brain self-regulation for BCI control, known as BCI illiteracy. Although current co-adaptive algorithms are powerful for assistive BCIs, their inherent class switching clashes with the operant conditioning goal of restorative BCIs. Moreover, due to the treatment rationale, the classifier of restorative BCIs usually has a constrained feature space, thus limiting the possibility of classifier adaptation. In this context, we applied a Bayesian model of neurofeedback and reinforcement learning for different threshold selection strategies to study the impact of threshold adaptation of a linear classifier on optimizing restorative BCIs. For each feedback iteration, we first determined the thresholds that result in minimal action entropy and maximal instructional efficiency. We then used the resulting vector for the simulation of continuous threshold adaptation. We could thus show that threshold adaptation can improve reinforcement learning, particularly in cases of BCI illiteracy. Finally, on the basis of information theory, we provided an explanation for the achieved benefits of adaptive threshold setting.

  8. FMRQ-A Multiagent Reinforcement Learning Algorithm for Fully Cooperative Tasks.

    Science.gov (United States)

    Zhang, Zhen; Zhao, Dongbin; Gao, Junwei; Wang, Dongqing; Dai, Yujie

    2017-06-01

    In this paper, we propose a multiagent reinforcement learning algorithm dealing with fully cooperative tasks. The algorithm is called frequency of the maximum reward Q-learning (FMRQ). FMRQ aims to achieve one of the optimal Nash equilibria so as to optimize the performance index in multiagent systems. The frequency of obtaining the highest global immediate reward, rather than the immediate reward itself, is used as the reinforcement signal. With FMRQ each agent does not need to observe the other agents' actions and only shares its state and reward at each step. We validate FMRQ through case studies of repeated games: four cases of two-player two-action games and one case of a three-player two-action game. It is demonstrated that FMRQ can converge to one of the optimal Nash equilibria in these cases. Moreover, comparison experiments on tasks with multiple states and finite steps are conducted. One is a box-pushing task and the other is a distributed sensor network problem. Experimental results show that the proposed algorithm outperforms the others.

  9. Reinforcement learning for adaptive threshold control of restorative brain-computer interfaces: a Bayesian simulation

    Directory of Open Access Journals (Sweden)

    Robert eBauer

    2015-02-01

    Full Text Available Restorative brain-computer interfaces (BCI) are increasingly used to provide feedback of neuronal states in a bid to normalize pathological brain activity and achieve behavioral gains. However, patients and healthy subjects alike often show a large variability, or even inability, of brain self-regulation for BCI control, known as BCI illiteracy. Although current co-adaptive algorithms are powerful for assistive BCIs, their inherent class switching clashes with the operant conditioning goal of restorative BCIs. Moreover, due to the treatment rationale, the classifier of restorative BCIs usually has a constrained feature space, thus limiting the possibility of classifier adaptation. In this context, we applied a Bayesian model of neurofeedback and reinforcement learning for different threshold selection strategies to study the impact of threshold adaptation of a linear classifier on optimizing restorative BCIs. For each feedback iteration, we first determined the thresholds that result in minimal action entropy and maximal instructional efficiency. We then used the resulting vector for the simulation of continuous threshold adaptation. We could thus show that threshold adaptation can improve reinforcement learning, particularly in cases of BCI illiteracy. Finally, on the basis of information theory, we provided an explanation for the achieved benefits of adaptive threshold setting.

  10. Learning and geometry computational approaches

    CERN Document Server

    Smith, Carl

    1996-01-01

    The field of computational learning theory arose out of the desire to formally understand the process of learning. As potential applications to artificial intelligence became apparent, the new field grew rapidly. The learning of geometric objects became a natural area of study. The possibility of using learning techniques to compensate for unsolvability provided an attraction for individuals with an immediate need to solve such difficult problems. Researchers at the Center for Night Vision were interested in solving the problem of interpreting data produced by a variety of sensors. Current vision techniques, which have a strong geometric component, can be used to extract features. However, these techniques fall short of useful recognition of the sensed objects. One potential solution is to incorporate learning techniques into the geometric manipulation of sensor data. As a first step toward realizing such a solution, the Systems Research Center at the University of Maryland, in conjunction with the C...

  11. Electrophysiological correlates of reinforcement learning in young people with Tourette syndrome with and without co-occurring ADHD symptoms.

    Science.gov (United States)

    Shephard, Elizabeth; Jackson, Georgina M; Groom, Madeleine J

    2016-06-01

    Altered reinforcement learning is implicated in the causes of Tourette syndrome (TS) and attention-deficit/hyperactivity disorder (ADHD). TS and ADHD frequently co-occur but how this affects reinforcement learning has not been investigated. We examined the ability of young people with TS (n=18), TS+ADHD (n=17), ADHD (n=13) and typically developing controls (n=20) to learn and reverse stimulus-response (S-R) associations based on positive and negative reinforcement feedback. We used a 2 (TS-yes, TS-no)×2 (ADHD-yes, ADHD-no) factorial design to assess the effects of TS, ADHD, and their interaction on behavioural (accuracy, RT) and event-related potential (stimulus-locked P3, feedback-locked P2, feedback-related negativity, FRN) indices of learning and reversing the S-R associations. TS was associated with intact learning and reversal performance and largely typical ERP amplitudes. ADHD was associated with lower accuracy during S-R learning and impaired reversal learning (significantly reduced accuracy and a trend for smaller P3 amplitude). The results indicate that co-occurring ADHD symptoms impair reversal learning in TS+ADHD. The implications of these findings for behavioural tic therapies are discussed. Copyright © 2016 ISDN. Published by Elsevier Ltd. All rights reserved.

  12. Using reinforcement learning to provide stable brain-machine interface control despite neural input reorganization.

    Directory of Open Access Journals (Sweden)

    Eric A Pohlmeyer

    Full Text Available Brain-machine interface (BMI) systems give users direct neural control of robotic, communication, or functional electrical stimulation systems. As BMI systems begin transitioning from laboratory settings into activities of daily living, an important goal is to develop neural decoding algorithms that can be calibrated with a minimal burden on the user, provide stable control for long periods of time, and can be responsive to fluctuations in the decoder's neural input space (e.g. neurons appearing or being lost amongst electrode recordings). These are significant challenges for static neural decoding algorithms that assume stationary input/output relationships. Here we use an actor-critic reinforcement learning architecture to provide an adaptive BMI controller that can successfully adapt to dramatic neural reorganizations, can maintain its performance over long time periods, and which does not require the user to produce specific kinetic or kinematic activities to calibrate the BMI. Two marmoset monkeys used the Reinforcement Learning BMI (RLBMI) to successfully control a robotic arm during a two-target reaching task. The RLBMI was initialized using random initial conditions, and it quickly learned to control the robot from brain states using only a binary evaluative feedback regarding whether previously chosen robot actions were good or bad. The RLBMI was able to maintain control over the system throughout sessions spanning multiple weeks. Furthermore, the RLBMI was able to quickly adapt and maintain control of the robot despite dramatic perturbations to the neural inputs, including a series of tests in which the neuron input space was deliberately halved or doubled.

  13. Using reinforcement learning to provide stable brain-machine interface control despite neural input reorganization.

    Science.gov (United States)

    Pohlmeyer, Eric A; Mahmoudi, Babak; Geng, Shijia; Prins, Noeline W; Sanchez, Justin C

    2014-01-01

    Brain-machine interface (BMI) systems give users direct neural control of robotic, communication, or functional electrical stimulation systems. As BMI systems begin transitioning from laboratory settings into activities of daily living, an important goal is to develop neural decoding algorithms that can be calibrated with a minimal burden on the user, provide stable control for long periods of time, and can be responsive to fluctuations in the decoder's neural input space (e.g. neurons appearing or being lost amongst electrode recordings). These are significant challenges for static neural decoding algorithms that assume stationary input/output relationships. Here we use an actor-critic reinforcement learning architecture to provide an adaptive BMI controller that can successfully adapt to dramatic neural reorganizations, can maintain its performance over long time periods, and which does not require the user to produce specific kinetic or kinematic activities to calibrate the BMI. Two marmoset monkeys used the Reinforcement Learning BMI (RLBMI) to successfully control a robotic arm during a two-target reaching task. The RLBMI was initialized using random initial conditions, and it quickly learned to control the robot from brain states using only a binary evaluative feedback regarding whether previously chosen robot actions were good or bad. The RLBMI was able to maintain control over the system throughout sessions spanning multiple weeks. Furthermore, the RLBMI was able to quickly adapt and maintain control of the robot despite dramatic perturbations to the neural inputs, including a series of tests in which the neuron input space was deliberately halved or doubled.

  14. Long term effects of aversive reinforcement on colour discrimination learning in free-flying bumblebees.

    Directory of Open Access Journals (Sweden)

    Miguel A Rodríguez-Gironés

    Full Text Available The results of behavioural experiments provide important information about the structure and information-processing abilities of the visual system. Nevertheless, if we want to infer from behavioural data how the visual system operates, it is important to know how different learning protocols affect performance and to devise protocols that minimise noise in the response of experimental subjects. The purpose of this work was to investigate how reinforcement schedule and individual variability affect the learning process in a colour discrimination task. Free-flying bumblebees were trained to discriminate between two perceptually similar colours. The target colour was associated with sucrose solution, and the distractor could be associated with water or quinine solution throughout the experiment, or with one substance during the first half of the experiment and the other during the second half. Both acquisition and final performance of the discrimination task (measured as proportion of correct choices were determined by the choice of reinforcer during the first half of the experiment: regardless of whether bees were trained with water or quinine during the second half of the experiment, bees trained with quinine during the first half learned the task faster and performed better during the whole experiment. Our results confirm that the choice of stimuli used during training affects the rate at which colour discrimination tasks are acquired and show that early contact with a strongly aversive stimulus can be sufficient to maintain high levels of attention during several hours. On the other hand, bees which took more time to decide on which flower to alight were more likely to make correct choices than bees which made fast decisions. This result supports the existence of a trade-off between foraging speed and accuracy, and highlights the importance of measuring choice latencies during behavioural experiments focusing on cognitive abilities.

  15. Interactions Among Working Memory, Reinforcement Learning, and Effort in Value-Based Choice: A New Paradigm and Selective Deficits in Schizophrenia.

    Science.gov (United States)

    Collins, Anne G E; Albrecht, Matthew A; Waltz, James A; Gold, James M; Frank, Michael J

    2017-09-15

    When studying learning, researchers directly observe only the participants' choices, which are often assumed to arise from a unitary learning process. However, a number of separable systems, such as working memory (WM) and reinforcement learning (RL), contribute simultaneously to human learning. Identifying each system's contributions is essential for mapping the neural substrates contributing in parallel to behavior; computational modeling can help to design tasks that allow such a separable identification of processes and infer their contributions in individuals. We present a new experimental protocol that separately identifies the contributions of RL and WM to learning, is sensitive to parametric variations in both, and allows us to investigate whether the processes interact. In experiments 1 and 2, we tested this protocol with healthy young adults (n = 29 and n = 52, respectively). In experiment 3, we used it to investigate learning deficits in medicated individuals with schizophrenia (n = 49 patients, n = 32 control subjects). Experiments 1 and 2 established WM and RL contributions to learning, as evidenced by parametric modulations of choice by load and delay and reward history, respectively. They also showed interactions between WM and RL, where RL was enhanced under high WM load. Moreover, we observed a cost of mental effort when controlling for reinforcement history: participants preferred stimuli they encountered under low WM load. Experiment 3 revealed selective deficits in WM contributions and preserved RL value learning in individuals with schizophrenia compared with control subjects. Computational approaches allow us to disentangle contributions of multiple systems to learning and, consequently, to further our understanding of psychiatric diseases. Copyright © 2017 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.
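    A simplified sketch of how RL and WM contributions can be separated in a model is shown below: a slow, incremental Q-learner is mixed with a fast, one-shot, capacity-limited working-memory store, and the WM weight shrinks when the stimulus set size exceeds capacity. The parameter names (capacity, rho, softmax temperature beta) follow this general family of models but are assumptions here, not necessarily the authors' exact formulation.

```python
import numpy as np

class RLWMChooser:
    """Sketch of a mixture of a slow RL learner and a fast, capacity-limited
    working-memory store, as one way to separate their contributions."""

    def __init__(self, n_stimuli, n_actions, set_size,
                 alpha=0.1, beta=8.0, capacity=3, rho=0.9):
        self.q = np.ones((n_stimuli, n_actions)) / n_actions   # incremental RL values
        self.wm = np.ones((n_stimuli, n_actions)) / n_actions  # one-shot WM store
        self.alpha, self.beta = alpha, beta
        # WM influence shrinks when the set size exceeds capacity
        self.w = rho * min(1.0, capacity / set_size)

    def policy(self, stimulus):
        def softmax(v):
            e = np.exp(self.beta * (v - v.max()))
            return e / e.sum()
        return self.w * softmax(self.wm[stimulus]) + (1 - self.w) * softmax(self.q[stimulus])

    def choose(self, stimulus):
        return int(np.random.choice(len(self.q[stimulus]), p=self.policy(stimulus)))

    def update(self, stimulus, action, reward):
        self.q[stimulus, action] += self.alpha * (reward - self.q[stimulus, action])
        self.wm[stimulus] = 0.0
        self.wm[stimulus, action] = reward   # WM stores the last outcome in one shot
```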

  16. Attentional Selection Can Be Predicted by Reinforcement Learning of Task-relevant Stimulus Features Weighted by Value-independent Stickiness.

    Science.gov (United States)

    Balcarras, Matthew; Ardid, Salva; Kaping, Daniel; Everling, Stefan; Womelsdorf, Thilo

    2016-02-01

    Attention includes processes that evaluate stimuli relevance, select the most relevant stimulus against less relevant stimuli, and bias choice behavior toward the selected information. It is not clear how these processes interact. Here, we captured these processes in a reinforcement learning framework applied to a feature-based attention task that required macaques to learn and update the value of stimulus features while ignoring nonrelevant sensory features, locations, and action plans. We found that value-based reinforcement learning mechanisms could account for feature-based attentional selection and choice behavior but required a value-independent stickiness selection process to explain selection errors while at asymptotic behavior. By comparing different reinforcement learning schemes, we found that trial-by-trial selections were best predicted by a model that only represents expected values for the task-relevant feature dimension, with nonrelevant stimulus features and action plans having only a marginal influence on covert selections. These findings show that attentional control subprocesses can be described by (1) the reinforcement learning of feature values within a restricted feature space that excludes irrelevant feature dimensions, (2) a stochastic selection process on feature-specific value representations, and (3) value-independent stickiness toward previous feature selections akin to perseveration in the motor domain. We speculate that these three mechanisms are implemented by distinct but interacting brain circuits and that the proposed formal account of feature-based stimulus selection will be important to understand how attentional subprocesses are implemented in primate brain networks.

  17. Stochastic collusion and the power law of learning: a general reinforcement learning model of cooperation

    NARCIS (Netherlands)

    Flache, A.

    2002-01-01

    Concerns about models of cultural adaptation as analogs of genetic selection have led cognitive game theorists to explore learning-theoretic specifications. Two prominent examples, the Bush-Mosteller stochastic learning model and the Roth-Erev payoff-matching model, are aligned and integrated as

  18. Behavioral and Electrophysiological Alterations for Reinforcement Learning in Manic and Euthymic Patients with Bipolar Disorder.

    Science.gov (United States)

    Ryu, Vin; Ha, Ra Yeon; Lee, Su Jin; Ha, Kyooseob; Cho, Hyun-Sang

    2017-03-01

    Bipolar disorder is characterized by behavioral changes such as risk-taking and increasing goal-directed activities, which may result from altered reward processing. Patients with bipolar disorder show impaired reward learning in situations that require the integration of reinforced feedback over time. In this study, we examined the behavioral and electrophysiological characteristics of reward learning in manic and euthymic patients with bipolar disorder using a probabilistic reward task. Twenty-four manic and 20 euthymic patients with bipolar I disorder and 24 healthy control subjects performed the probabilistic reward task. We assessed response bias (RB) as a preference for the stimulus paired with the more frequent reward and feedback-related negativity (FRN) to correct identification of the rich stimulus. Both manic and euthymic patients showed significantly lower RB scores in the early learning stage (block 1) in comparison with the late learning stage (block 2 or block 3) of the task, as well as significantly lower RB scores in the early stage compared to healthy subjects. Relatively more negative FRN amplitude is elicited by no presentation of an expected reward, compared to that elicited by presentation of expected feedback. The FRN became significantly more negative from the early (block 1) to the later stages (blocks 2 and 3) in both manic and euthymic patients, but not in healthy subjects. Changes in RB scores and FRN amplitudes between blocks 2 and 3 and block 1 correlated positively in healthy controls, but correlated negatively in manic and euthymic patients. The severity of manic symptoms correlated positively with reward learning scores and negatively with the FRN. These findings suggest that patients with bipolar disorder during euthymic or manic states have behavioral and electrophysiological alterations in reward learning compared to healthy subjects. This dysfunctional reward processing may be related to the abnormal decision-making or altered

  19. A Guided Discovery Approach for Learning Glycolysis.

    Science.gov (United States)

    Schultz, Emeric

    1997-01-01

    Argues that more attention should be given to teaching students how to learn the rudiments of specific metabolic pathways. This approach describes a unique way of learning the glycolytic pathway in stepwise fashion. The pedagogy involves clear rote components that are connected to a set of generalizations that develop and enhance important…

  20. Template Approach for Adaptive Learning Strategies

    NARCIS (Netherlands)

    Abbing, Jana; Koidl, Kevin

    2006-01-01

    Please, cite this publication as: Abbing, J. & Koidl, K. (2006). Template Approach for Adaptive Learning Strategies. Proceedings of Adaptive Hypermedia. June, Dublin, Ireland. Retrieved June 30th, 2006, from http://dspace.learningnetworks.org

  1. Cultivating collaborative improvement: an action learning approach

    NARCIS (Netherlands)

    Middel, H.G.A.; McNichols, Timothy

    2006-01-01

    The process of implementing collaborative initiatives across disparate members of supply networks is fraught with difficulties. One approach designed to tackle the difficulties of organisational change and interorganisational improvement in practice is 'action learning'. This paper examines the

  2. Keep it clean: a visual approach to reinforce hand hygiene compliance in the emergency department.

    Science.gov (United States)

    Wiles, Lynn L; Roberts, Chris; Schmidt, Kim

    2015-03-01

    Although hand hygiene strategies significantly reduce health care-associated infections, multiple studies have documented that hand hygiene is the most overlooked and poorly performed infection control intervention. Emergency nurses and technicians (n = 95) in a 41-bed emergency department in eastern Virginia completed pretests and posttests, an education module, and two experiential learning activities reinforcing hand hygiene and infection control protocols. Posttest scores were significantly higher than pretest scores (t (108) = -6.928, P = .048). Hand hygiene compliance rates improved at the conclusion of the project and 3 months after the study (F (2, 15) = 9.89, P = .002). Interfaces with staff as they completed the interactive exercise, as well as anecdotal notes collected during the study, identified key times when compliance suffered and offered opportunities to further improve hand hygiene and, ultimately, patient safety. Copyright © 2015 Emergency Nurses Association. Published by Elsevier Inc. All rights reserved.

  3. Machine learning a theoretical approach

    CERN Document Server

    Natarajan, Balas K

    2014-01-01

    This is the first comprehensive introduction to computational learning theory. The author's uniform presentation of fundamental results and their applications offers AI researchers a theoretical perspective on the problems they study. The book presents tools for the analysis of probabilistic models of learning, tools that crisply classify what is and is not efficiently learnable. After a general introduction to Valiant's PAC paradigm and the important notion of the Vapnik-Chervonenkis dimension, the author explores specific topics such as finite automata and neural networks. The presentation

  4. MOBILE LEARNING: CONTEXT ADAPTATION AND SCENARIO APPROACH

    Directory of Open Access Journals (Sweden)

    Vladimir V. Kureichik

    2013-01-01

    Full Text Available The paper proposes an open, component-based architecture model for context-dependent computer training systems, tailored to the needs of software applications in intelligent learning environments and adaptive learning systems. The structure of a content management system is developed based on the Semantic Web. The engine is to be developed on the basis of probabilistic automata. Another part of the work concerns the development of learning scenarios and the possibilities for their adaptation. The context approach for personalizing learning style is described in the paper as well.

  5. Cultural differences in learning approaches

    NARCIS (Netherlands)

    Tempelaar, D.T.; Rienties, B.C.; Giesbers, S.J.H.; Schim van der Loeff, S.; Van den Bossche, P.; Gijselaers, W.H.; Milter, R.G.

    2013-01-01

    Cultural differences in learning-related dispositions are investigated amongst 7,300 first year students from 81 different nationalities, using the framework of Hofstede (Culture’s consequences: international differences in work-related values. Sage, Beverly Hills, 1980). Comparing levels and

  6. A Cultural Approach to Learning

    DEFF Research Database (Denmark)

    Rasmussen, Lauge Baungaard

    1998-01-01

    In this article learning is discussed in relation to different understandings of culture. In particular, the dialectics of 'Enlightenment' in Western culture are reflected upon, as well as low- and high-context communication and learning in different types of culture. Finally the Weberian methodology...

  7. Altered Risk-Based Decision Making following Adolescent Alcohol Use Results from an Imbalance in Reinforcement Learning in Rats

    Science.gov (United States)

    Hart, Andrew S.; Collins, Anne L.; Bernstein, Ilene L.; Phillips, Paul E. M.

    2012-01-01

    Alcohol use during adolescence has profound and enduring consequences on decision-making under risk. However, the fundamental psychological processes underlying these changes are unknown. Here, we show that alcohol use produces over-fast learning for better-than-expected, but not worse-than-expected, outcomes without altering subjective reward valuation. We constructed a simple reinforcement learning model to simulate altered decision making using behavioral parameters extracted from rats with a history of adolescent alcohol use. Remarkably, the learning imbalance alone was sufficient to simulate the divergence in choice behavior observed between these groups of animals. These findings identify a selective alteration in reinforcement learning following adolescent alcohol use that can account for a robust change in risk-based decision making persisting into later life. PMID:22615989

  8. Goal-directed EEG activity evoked by discriminative stimuli in reinforcement learning.

    Science.gov (United States)

    Luque, David; Morís, Joaquín; Rushby, Jacqueline A; Le Pelley, Mike E

    2015-02-01

    In reinforcement learning (RL), discriminative stimuli (S) allow agents to anticipate the value of a future outcome, and the response that will produce that outcome. We examined this processing by recording EEG locked to S during RL. Incentive value of outcomes and predictive value of S were manipulated, allowing us to discriminate between outcome-related and response-related activity. S predicting the correct response differed from nonpredictive S in the P2. S paired with high-value outcomes differed from those paired with low-value outcomes in a frontocentral positivity and in the P3b. A slow negativity then distinguished between predictive and nonpredictive S. These results suggest that, first, attention prioritizes detection of informative S. Activation of mental representations of these informative S then retrieves representations of outcomes, which in turn retrieve representations of responses that previously produced those outcomes. © 2014 Society for Psychophysiological Research.

  9. A Formal Verification Model for Performance Analysis of Reinforcement Learning Algorithms Applied to Dynamic Networks

    Directory of Open Access Journals (Sweden)

    Shrirang Ambaji KULKARNI

    2017-04-01

    Full Text Available Routing data packets in a dynamic network is a difficult and important problem in computer networks. Because the network is dynamic, it is subject to frequent topology changes and to variable link costs due to congestion and bandwidth constraints. Existing shortest-path algorithms fail to converge to better solutions under dynamic network conditions. Reinforcement learning algorithms possess better adaptation techniques in dynamic environments. In this paper we apply a model-based Q-Routing technique for routing in a dynamic network. To analyze the correctness of Q-Routing algorithms mathematically, we provide a proof and also implement a SPIN-based verification model. We also perform simulation-based analysis of Q-Routing for the given metrics.
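    For readers unfamiliar with Q-Routing, the sketch below shows the classic per-node update on which such approaches build: each node keeps Q[destination][neighbor], an estimate of the remaining delivery time if a packet for that destination is forwarded to that neighbor, and updates it from the neighbor's own reported estimate. The class structure, epsilon-exploration, and variable names are illustrative assumptions, not the verified model from the paper.

```python
import random
from collections import defaultdict

class QRouter:
    """Sketch of Q-routing at a single node: Q[dest][neighbor] estimates the
    remaining delivery time to dest if the packet is forwarded to neighbor."""

    def __init__(self, neighbors, alpha=0.5, epsilon=0.05):
        self.neighbors = list(neighbors)
        self.alpha, self.epsilon = alpha, epsilon
        self.Q = defaultdict(lambda: {n: 0.0 for n in self.neighbors})

    def forward(self, dest):
        if random.random() < self.epsilon:       # occasional exploration
            return random.choice(self.neighbors)
        return min(self.neighbors, key=lambda n: self.Q[dest][n])

    def update(self, dest, neighbor, queue_delay, link_delay, neighbor_estimate):
        """neighbor_estimate: the neighbor's own min_k Q[dest][k], reported back."""
        target = queue_delay + link_delay + neighbor_estimate
        self.Q[dest][neighbor] += self.alpha * (target - self.Q[dest][neighbor])
```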

  10. Understanding dopamine and reinforcement learning: the dopamine reward prediction error hypothesis.

    Science.gov (United States)

    Glimcher, Paul W

    2011-09-13

    A number of recent advances have been achieved in the study of midbrain dopaminergic neurons. Understanding these advances and how they relate to one another requires a deep understanding of the computational models that serve as an explanatory framework and guide ongoing experimental inquiry. This intertwining of theory and experiment now suggests very clearly that the phasic activity of the midbrain dopamine neurons provides a global mechanism for synaptic modification. These synaptic modifications, in turn, provide the mechanistic underpinning for a specific class of reinforcement learning mechanisms that now seem to underlie much of human and animal behavior. This review describes both the critical empirical findings that are at the root of this conclusion and the fantastic theoretical advances from which this conclusion is drawn.
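    The prediction error at the heart of this account is the temporal-difference (TD) error, delta_t = r_t + gamma * V(s_{t+1}) - V(s_t), which phasic dopamine activity is proposed to report. The short worked example below uses made-up states and rewards purely to show how, with repeated training, the prediction error (and hence the learned value) migrates from the time of reward delivery back to the predictive cue.

```python
# Minimal temporal-difference (TD) prediction-error illustration.
# delta_t = r_t + gamma * V(s_{t+1}) - V(s_t); phasic dopamine is proposed
# to report delta_t and drive synaptic modification.
gamma, alpha = 0.95, 0.1
V = {"cue": 0.0, "delay": 0.0, "reward_state": 0.0}
episode = [("cue", "delay", 0.0), ("delay", "reward_state", 1.0)]

for trial in range(100):
    for s, s_next, r in episode:
        delta = r + gamma * V[s_next] - V[s]   # reward prediction error
        V[s] += alpha * delta

# After learning, value (and the prediction error) has propagated back to the cue.
print({k: round(v, 3) for k, v in V.items()})
```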

  11. Market Model for Resource Allocation in Emerging Sensor Networks with Reinforcement Learning.

    Science.gov (United States)

    Zhang, Yue; Song, Bin; Zhang, Ying; Du, Xiaojiang; Guizani, Mohsen

    2016-11-29

    Emerging sensor networks (ESNs) are an inevitable trend with the development of the Internet of Things (IoT), and intend to connect almost every intelligent device. Therefore, it is critical to study resource allocation in such an environment, due to the concern of efficiency, especially when resources are limited. By viewing ESNs as multi-agent environments, we model them with an agent-based modelling (ABM) method and deal with resource allocation problems with market models, after describing users' patterns. Reinforcement learning methods are introduced to estimate users' patterns and verify the outcomes in our market models. Experimental results show the efficiency of our methods, which are also capable of guiding topology management.

  12. Market Model for Resource Allocation in Emerging Sensor Networks with Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Yue Zhang

    2016-11-01

    Full Text Available Emerging sensor networks (ESNs) are an inevitable trend with the development of the Internet of Things (IoT), and intend to connect almost every intelligent device. Therefore, it is critical to study resource allocation in such an environment, due to the concern of efficiency, especially when resources are limited. By viewing ESNs as multi-agent environments, we model them with an agent-based modelling (ABM) method and deal with resource allocation problems with market models, after describing users’ patterns. Reinforcement learning methods are introduced to estimate users’ patterns and verify the outcomes in our market models. Experimental results show the efficiency of our methods, which are also capable of guiding topology management.

  13. Frontostriatal anatomical connections predict age- and difficulty-related differences in reinforcement learning.

    Science.gov (United States)

    van de Vijver, Irene; Ridderinkhof, K Richard; Harsay, Helga; Reneman, Liesbeth; Cavanagh, James F; Buitenweg, Jessika I V; Cohen, Michael X

    2016-10-01

    Reinforcement learning (RL) is supported by a network of striatal and frontal cortical structures that are connected through white-matter fiber bundles. With age, the integrity of these white-matter connections declines. The role of structural frontostriatal connectivity in individual and age-related differences in RL is unclear, although local white-matter density and diffusivity have been linked to individual differences in RL. Here we show that frontostriatal tract counts in young human adults (aged 18-28), as assessed noninvasively with diffusion-weighted magnetic resonance imaging and probabilistic tractography, positively predicted individual differences in RL when learning was difficult (70% valid feedback). In older adults (aged 63-87), in contrast, learning under both easy (90% valid feedback) and difficult conditions was predicted by tract counts in the same frontostriatal network. Furthermore, network-level analyses showed a double dissociation between the task-relevant networks in young and older adults, suggesting that older adults relied on different frontostriatal networks than young adults to obtain the same task performance. These results highlight the importance of successful information integration across striatal and frontal regions during RL, especially with variable outcomes. Copyright © 2016 Elsevier Inc. All rights reserved.

  14. From free energy to expected energy: Improving energy-based value function approximation in reinforcement learning.

    Science.gov (United States)

    Elfwing, Stefan; Uchibe, Eiji; Doya, Kenji

    2016-12-01

    Free-energy based reinforcement learning (FERL) was proposed for learning in high-dimensional state and action spaces. However, the FERL method only really works well with binary, or close to binary, state input, where the number of active states is smaller than the number of non-active states. In the FERL method, the value function is approximated by the negative free energy of a restricted Boltzmann machine (RBM). In our earlier study, we demonstrated that the performance and the robustness of the FERL method can be improved by scaling the free energy by a constant that is related to the size of the network. In this study, we propose that RBM function approximation can be further improved by approximating the value function by the negative expected energy (EERL), instead of the negative free energy, as well as being able to handle continuous state input. We validate our proposed method by demonstrating that EERL: (1) outperforms FERL, as well as standard neural network and linear function approximation, for three versions of a gridworld task with high-dimensional image state input; (2) achieves new state-of-the-art results in stochastic SZ-Tetris in both model-free and model-based learning settings; and (3) significantly outperforms FERL and standard neural network function approximation for a robot navigation task with raw and noisy RGB images as state input and a large number of actions. Copyright © 2016 The Author(s). Published by Elsevier Ltd. All rights reserved.
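    To make the distinction concrete, the sketch below computes both quantities for a small RBM with random placeholder weights: the negative free energy used by FERL and the negative expected energy proposed for EERL. The formulas are the standard RBM expressions; the network size and parameters are assumptions for illustration and are unrelated to the tasks reported in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 4
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
a = rng.normal(scale=0.1, size=n_visible)   # visible biases
b = rng.normal(scale=0.1, size=n_hidden)    # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def negative_free_energy(v):
    # -F(v) = a.v + sum_j log(1 + exp(b_j + (v W)_j))
    pre = b + v @ W
    return float(a @ v + np.sum(np.log1p(np.exp(pre))))

def negative_expected_energy(v):
    # -E[E(v,h)] under p(h|v): a.v + sum_j p_j * (b_j + (v W)_j)
    pre = b + v @ W
    p = sigmoid(pre)
    return float(a @ v + np.sum(p * pre))

v = rng.integers(0, 2, size=n_visible).astype(float)  # a binary state vector
print("negative free energy:    ", round(negative_free_energy(v), 4))
print("negative expected energy:", round(negative_expected_energy(v), 4))
```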

  15. A Biologically Plausible Architecture of the Striatum to Solve Context-Dependent Reinforcement Learning Tasks.

    Science.gov (United States)

    Shivkumar, Sabyasachi; Muralidharan, Vignesh; Chakravarthy, V Srinivasa

    2017-01-01

    Basal ganglia circuit is an important subcortical system of the brain thought to be responsible for reward-based learning. Striatum, the largest nucleus of the basal ganglia, serves as an input port that maps cortical information. Microanatomical studies show that the striatum is a mosaic of specialized input-output structures called striosomes and regions of the surrounding matrix called the matrisomes. We have developed a computational model of the striatum using layered self-organizing maps to capture the center-surround structure seen experimentally and explain its functional significance. We believe that these structural components could build representations of state and action spaces in different environments. The striatum model is then integrated with other components of basal ganglia, making it capable of solving reinforcement learning tasks. We have proposed a biologically plausible mechanism of action-based learning where the striosome biases the matrisome activity toward a preferred action. Several studies indicate that the striatum is critical in solving context dependent problems. We build on this hypothesis and the proposed model exploits the modularity of the striatum to efficiently solve such tasks.

  16. A Biologically Plausible Architecture of the Striatum to Solve Context-Dependent Reinforcement Learning Tasks

    Directory of Open Access Journals (Sweden)

    Sabyasachi Shivkumar

    2017-06-01

    Full Text Available Basal ganglia circuit is an important subcortical system of the brain thought to be responsible for reward-based learning. Striatum, the largest nucleus of the basal ganglia, serves as an input port that maps cortical information. Microanatomical studies show that the striatum is a mosaic of specialized input-output structures called striosomes and regions of the surrounding matrix called the matrisomes. We have developed a computational model of the striatum using layered self-organizing maps to capture the center-surround structure seen experimentally and explain its functional significance. We believe that these structural components could build representations of state and action spaces in different environments. The striatum model is then integrated with other components of basal ganglia, making it capable of solving reinforcement learning tasks. We have proposed a biologically plausible mechanism of action-based learning where the striosome biases the matrisome activity toward a preferred action. Several studies indicate that the striatum is critical in solving context dependent problems. We build on this hypothesis and the proposed model exploits the modularity of the striatum to efficiently solve such tasks.

  17. Investigation of Drive-Reinforcement Learning and Application of Learning to Flight Control

    Science.gov (United States)

    1993-08-01

    subsections present the details of the fixed, adaptive, and learning components of the angular rate controllers. [The remainder of this scanned abstract, including equation (6.26) and the opening of Section 6.2.2 (Learning Control) for the nonlinear system, is illegible OCR output.]

  18. A Transfer Learning Approach for Network Modeling

    Science.gov (United States)

    Huang, Shuai; Li, Jing; Chen, Kewei; Wu, Teresa; Ye, Jieping; Wu, Xia; Yao, Li

    2012-01-01

    Networks models have been widely used in many domains to characterize the interacting relationship between physical entities. A typical problem faced is to identify the networks of multiple related tasks that share some similarities. In this case, a transfer learning approach that can leverage the knowledge gained during the modeling of one task to help better model another task is highly desirable. In this paper, we propose a transfer learning approach, which adopts a Bayesian hierarchical model framework to characterize task relatedness and additionally uses the L1-regularization to ensure robust learning of the networks with limited sample sizes. A method based on the Expectation-Maximization (EM) algorithm is further developed to learn the networks from data. Simulation studies are performed, which demonstrate the superiority of the proposed transfer learning approach over single task learning that learns the network of each task in isolation. The proposed approach is also applied to identification of brain connectivity networks of Alzheimer’s disease (AD) from functional magnetic resonance image (fMRI) data. The findings are consistent with the AD literature. PMID:24526804

  19. Grid cells, place cells, and geodesic generalization for spatial reinforcement learning.

    Directory of Open Access Journals (Sweden)

    Nicholas J Gustafson

    2011-10-01

    Full Text Available Reinforcement learning (RL) provides an influential characterization of the brain's mechanisms for learning to make advantageous choices. An important problem, though, is how complex tasks can be represented in a way that enables efficient learning. We consider this problem through the lens of spatial navigation, examining how two of the brain's location representations--hippocampal place cells and entorhinal grid cells--are adapted to serve as basis functions for approximating value over space for RL. Although much previous work has focused on these systems' roles in combining upstream sensory cues to track location, revisiting these representations with a focus on how they support this downstream decision function offers complementary insights into their characteristics. Rather than localization, the key problem in learning is generalization between past and present situations, which may not match perfectly. Accordingly, although neural populations collectively offer a precise representation of position, our simulations of navigational tasks verify the suggestion that RL gains efficiency from the more diffuse tuning of individual neurons, which allows learning about rewards to generalize over longer distances given fewer training experiences. However, work on generalization in RL suggests the underlying representation should respect the environment's layout. In particular, although it is often assumed that neurons track location in Euclidean coordinates (that a place cell's activity declines "as the crow flies" away from its peak), the relevant metric for value is geodesic: the distance along a path, around any obstacles. We formalize this intuition and present simulations showing how Euclidean, but not geodesic, representations can interfere with RL by generalizing inappropriately across barriers. Our proposal that place and grid responses should be modulated by geodesic distances suggests novel predictions about how obstacles should affect

  20. Grid cells, place cells, and geodesic generalization for spatial reinforcement learning.

    Science.gov (United States)

    Gustafson, Nicholas J; Daw, Nathaniel D

    2011-10-01

    Reinforcement learning (RL) provides an influential characterization of the brain's mechanisms for learning to make advantageous choices. An important problem, though, is how complex tasks can be represented in a way that enables efficient learning. We consider this problem through the lens of spatial navigation, examining how two of the brain's location representations--hippocampal place cells and entorhinal grid cells--are adapted to serve as basis functions for approximating value over space for RL. Although much previous work has focused on these systems' roles in combining upstream sensory cues to track location, revisiting these representations with a focus on how they support this downstream decision function offers complementary insights into their characteristics. Rather than localization, the key problem in learning is generalization between past and present situations, which may not match perfectly. Accordingly, although neural populations collectively offer a precise representation of position, our simulations of navigational tasks verify the suggestion that RL gains efficiency from the more diffuse tuning of individual neurons, which allows learning about rewards to generalize over longer distances given fewer training experiences. However, work on generalization in RL suggests the underlying representation should respect the environment's layout. In particular, although it is often assumed that neurons track location in Euclidean coordinates (that a place cell's activity declines "as the crow flies" away from its peak), the relevant metric for value is geodesic: the distance along a path, around any obstacles. We formalize this intuition and present simulations showing how Euclidean, but not geodesic, representations can interfere with RL by generalizing inappropriately across barriers. Our proposal that place and grid responses should be modulated by geodesic distances suggests novel predictions about how obstacles should affect spatial firing fields

  1. A Lamb waves based statistical approach to structural health monitoring of carbon fibre reinforced polymer composites.

    Science.gov (United States)

    Carboni, Michele; Gianneo, Andrea; Giglio, Marco

    2015-07-01

    This research investigates a Lamb-wave based structural health monitoring approach matching an out-of-phase actuation of a pair of piezoceramic transducers at low frequency. The target is a typical quasi-isotropic carbon fibre reinforced polymer aeronautical laminate subjected to artificial, via Teflon patches, and natural, via suitable low velocity drop weight impact tests, delaminations. The performance and main influencing factors of such an approach are studied through a Design of Experiment statistical method, considering both Pulse Echo and Pitch Catch configurations of PZT sensors. Results show that some factors and their interactions can effectively influence the detection of a delamination-like damage. Copyright © 2015 Elsevier B.V. All rights reserved.

  2. Study strategies and approaches to learning

    DEFF Research Database (Denmark)

    Christensen, Hans Peter

    In this study students’ study strategies have been compared to their approaches to learning. The time students spend on different study activities has been investigated at the Technical University of Denmark, and as a pilot project a few students also filled in a reduced version of Biggs' Study Process Questionnaire to identify their approach to learning. It was hypothesised that the students’ learning approach would depend more on the quality of the study work than on the quantity; that an active and reflective study strategy was required to obtain deep conceptual understanding. The results do not allow conclusive conclusions, but they raise some intriguing questions. Is time more important than the kind of learning activities for obtaining conceptual understanding – is all that matters from a teaching point of view to ensure that the students spend long hours studying?

  3. Feedback for reinforcement learning based brain-machine interfaces using confidence metrics

    Science.gov (United States)

    Prins, Noeline W.; Sanchez, Justin C.; Prasad, Abhishek

    2017-06-01

    Objective. For brain-machine interfaces (BMI) to be used in activities of daily living by paralyzed individuals, the BMI should be as autonomous as possible. One of the challenges is how the feedback is extracted and utilized in the BMI. Our long-term goal is to create autonomous BMIs that can utilize evaluative feedback from the brain to update the decoding algorithm and use it intelligently in order to adapt the decoder. In this study, we show how to extract the necessary evaluative feedback from a biologically realistic (synthetic) source, how to use both the quantity and the quality of that feedback, and how the feedback information can be incorporated into a reinforcement learning (RL) controller architecture to maximize its performance. Approach. Motivated by the perception-action-reward cycle (PARC) in the brain, which links reward to cognitive decision making and goal-directed behavior, we used a reward-based RL architecture named Actor-Critic RL as the model. Instead of using an error signal towards building an autonomous BMI, we envision using a reward signal from the nucleus accumbens (NAcc), which plays a key role in linking reward to motor behaviors. To deal with the complexity and non-stationarity of biological reward signals, we used a confidence metric to indicate the degree of feedback accuracy. This confidence was added to the Actor’s weight update equation in the RL controller architecture. If the confidence was high (>0.2), the BMI decoder used this feedback to update its parameters. However, when the confidence was low, the BMI decoder ignored the feedback and did not update its parameters. The range between high confidence and low confidence was termed the ‘ambiguous’ region. When the feedback was within this region, the BMI decoder updated its weights at a lower rate than when fully confident, with the rate determined by the confidence. We used two biologically realistic models to generate synthetic data for MI (Izhikevich
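
    The confidence-gating rule lends itself to a compact sketch. The snippet below is a hypothetical rendering, not the study's decoder: the thresholds, feature dimensions, and the linear actor are illustrative, and only the described gating logic (ignore low-confidence feedback, scale updates in the ambiguous region, apply the full rate when confidence is high) is taken from the abstract.

```python
import numpy as np

def confidence_gated_update(weights, features, action, td_error, confidence,
                            lr=0.05, low=0.2, high=0.8):
    """Actor weight update gated by a feedback-confidence metric.

    - confidence < low         : feedback judged unreliable, no update
    - low <= confidence < high : 'ambiguous' region, learning rate scaled by confidence
    - confidence >= high       : fully trusted feedback, full learning rate
    """
    if confidence < low:
        return weights                       # ignore the feedback entirely
    gain = 1.0 if confidence >= high else confidence
    weights = weights.copy()
    weights[action] += lr * gain * td_error * features
    return weights

# Toy usage: 3 candidate actions decoded from a 5-dimensional neural feature vector
rng = np.random.default_rng(0)
W = np.zeros((3, 5))
phi = rng.normal(size=5)
W = confidence_gated_update(W, phi, action=1, td_error=+1.0, confidence=0.65)
```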

  4. E-Learning Approach in Teacher Training

    Directory of Open Access Journals (Sweden)

    A. Seda YUCEL

    2006-10-01

    Full Text Available There has been an increasing interest in e-learning in teacher training at universities during the last ten years. With developing technology, educational methods have changed, as have many other processes. First, a definition of e-learning as a new approach should be given. E-learning can briefly be defined as a web-based educational system on a platform with Internet, Intranet or computer access. The concept of e-learning has two main forms: synchronized (where a group of students and an instructor hold an online conference meeting in a computer environment) and asynchronized (where individuals carry out self-training in computer environments). Students have access to the course contents whenever they want and communicate with their peers or teachers via communication tools such as e-mail and forums. For a distance learning system to succeed in e-learning, the program should be planned as both synchronized and asynchronized.

  5. Concept Learning for Achieving Personalized Ontologies: An Active Learning Approach

    Science.gov (United States)

    Şensoy, Murat; Yolum, Pinar

    In many multiagent approaches, it is usual to assume the existence of a common ontology among agents. However, in dynamic systems, the existence of such an ontology is unrealistic and its maintenance is cumbersome. The burden of maintaining a common ontology can be alleviated by enabling agents to evolve their ontologies personally. However, with different ontologies, agents are likely to run into communication problems since their vocabularies are different from each other. Therefore, to achieve personalized ontologies, agents must have a means to understand the concepts used by others. Consequently, this paper proposes an approach that enables agents to teach each other concepts from their ontologies using examples. Unlike other concept learning approaches, our approach enables the learner to elicit the most informative examples interactively from the teacher. Hence, the learner participates in the learning process actively. We empirically compare the proposed approach with previous concept learning approaches. Our experiments show that using the proposed approach, agents can learn new concepts successfully and with fewer examples.

  6. Assessing Approaches to Learning in School Readiness

    Directory of Open Access Journals (Sweden)

    Otilia C. Barbu

    2015-07-01

    Full Text Available This study examines the psychometric properties of two assessments of children’s approaches to learning: the Devereux Early Childhood Assessment (DECA) and a 13-item approaches to learning rating scale (AtL) derived from the Arizona Early Learning Standards (AELS). First, we administered questionnaires to 1,145 randomly selected parents/guardians of first-time kindergarteners. Second, we employed confirmatory factor analysis (CFA) with parceling for DECA to reduce errors due to item specificity and prevent convergence difficulties when simultaneously estimating DECA and AtL models. Results indicated an overlap of 55% to 72% variance between the domains of the two instruments and suggested that the new AtL instrument is an easily administered alternative to the DECA for measuring children’s approaches to learning. This is one of the first studies that investigated DECA’s approaches to learning dimension and explored the measurement properties of an instrument purposely derived from a state’s early learning guidelines.

  7. A comparison of analytical approaches for the assessment of seismic displacements of geosynthetically reinforced geostructures

    DEFF Research Database (Denmark)

    Tzavara, I.; Tsompanakis, Y.; Zania, Varvara

    2012-01-01

    Aim of the current study is to assess the dynamic response of reinforced soil structures and the potential of the geosynthetics to prevent the seismic induced instabilities taking advantage of their reinforcing effect. For this purpose, representative models of reinforced soil slopes are develope...

  8. Examining Organizational Learning in Schools: The Role of Psychological Safety, Experimentation, and Leadership that Reinforces Learning

    Science.gov (United States)

    Higgins, Monica; Ishimaru, Ann; Holcombe, Rebecca; Fowler, Amy

    2012-01-01

    This study draws upon theory and methods from the field of organizational behavior to examine organizational learning (OL) in the context of a large urban US school district. We build upon prior literature on OL from the field of organizational behavior to introduce and validate three subscales that assess key dimensions of organizational learning…

  9. CAPES: Unsupervised Storage Performance Tuning Using Neural Network-Based Deep Reinforcement Learning

    CERN Multimedia

    CERN. Geneva

    2017-01-01

    Parameter tuning is an important task of storage performance optimization. Current practice usually involves numerous tweak-benchmark cycles that are slow and costly. To address this issue, we developed CAPES, a model-less deep reinforcement learning-based unsupervised parameter tuning system driven by a deep neural network (DNN). It is designed to find the optimal values of tunable parameters in computer systems, from a simple client-server system to a large data center, where human tuning can be costly and often cannot achieve optimal performance. CAPES takes periodic measurements of a target computer system’s state, and trains a DNN which uses Q-learning to suggest changes to the system’s current parameter values. CAPES is minimally intrusive, and can be deployed into a production system to collect training data and suggest tuning actions during the system’s daily operation. Evaluation of a prototype on a Lustre system demonstrates an increase in I/O throughput up to 45% at saturation point. About the...
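
    The measure-suggest-observe cycle described here maps onto a standard Q-learning control loop. The sketch below is a generic, tabular stand-in rather than CAPES itself (which uses a deep network); the measurement, action set, and reward are invented placeholders for a single tunable parameter.

```python
import random
from collections import defaultdict

def discretize(state, bins=10):
    """Bucket continuous system measurements into a coarse, hashable state."""
    return tuple(min(int(x * bins), bins - 1) for x in state)

def tuning_loop(measure, apply_action, actions, steps=1000,
                alpha=0.1, gamma=0.9, epsilon=0.1):
    """Periodic measure -> suggest parameter change -> observe reward cycle."""
    Q = defaultdict(float)
    state = discretize(measure())
    for _ in range(steps):
        if random.random() < epsilon:                      # explore
            action = random.choice(actions)
        else:                                              # exploit
            action = max(actions, key=lambda a: Q[(state, a)])
        reward = apply_action(action)                      # e.g. change in I/O throughput
        nxt = discretize(measure())
        best_next = max(Q[(nxt, a)] for a in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = nxt
    return Q

# Toy usage against a simulated system whose optimal "read_ahead" setting is 2.
current = {"read_ahead": 0}
def measure():            # normalized distance from the optimum as the observed state
    return (abs(current["read_ahead"] - 2) / 4.0,)
def apply_action(delta):  # action = increment/decrement the tunable parameter
    current["read_ahead"] = max(0, min(4, current["read_ahead"] + delta))
    return -abs(current["read_ahead"] - 2)                 # higher is better
Q = tuning_loop(measure, apply_action, actions=[-1, 0, +1], steps=500)
```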

  10. Quantized Attention-Gated Kernel Reinforcement Learning for Brain-Machine Interface Decoding.

    Science.gov (United States)

    Wang, Fang; Wang, Yiwen; Xu, Kai; Li, Hongbao; Liao, Yuxi; Zhang, Qiaosheng; Zhang, Shaomin; Zheng, Xiaoxiang; Principe, Jose C

    2017-04-01

    Reinforcement learning (RL)-based decoders in brain-machine interfaces (BMIs) interpret dynamic neural activity without patients' real limb movements. In conventional RL, the goal state is selected by the user or defined by the physics of the problem, and the decoder finds an optimal policy essentially by assigning credit over time, which is normally very time-consuming. However, BMI tasks require finding a good policy in very few trials, which imposes a limit on the complexity of the tasks that can be learned before the animal quits. Therefore, this paper explores the possibility of letting the agent infer potential goals through actions over space with multiple objects, using the instantaneous reward to assign credit spatially. A previous method, attention-gated RL, employs a multilayer perceptron trained with backpropagation, but it is prone to entrapment in local minima. We propose a quantized attention-gated kernel RL (QAGKRL) to avoid local minima in the spatial credit assignment adaptation and to sparsify the network topology. The experimental results show that the QAGKRL achieves higher success rates and more stable performance, indicating its powerful decoding ability for the more sophisticated BMI tasks required in clinical applications.
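
    As a rough illustration of the "quantized kernel" part of the idea (the attention-gated credit assignment is deliberately omitted here, and all parameters are assumptions rather than details of QAGKRL), the sketch below keeps a quantized dictionary of state prototypes, adds a new prototype only when no existing centre lies within a quantization radius, and represents Q-values as kernel expansions over that dictionary.

```python
import numpy as np

class QuantizedKernelQ:
    """Kernel Q-function with a quantized dictionary of state prototypes."""
    def __init__(self, n_actions, sigma=1.0, quant_radius=0.5, lr=0.1):
        self.centres, self.coeffs = [], []        # prototypes and per-action weights
        self.n_actions, self.sigma = n_actions, sigma
        self.quant_radius, self.lr = quant_radius, lr

    def _kernel(self, s):
        if not self.centres:
            return np.zeros(0)
        d = np.linalg.norm(np.asarray(self.centres) - np.asarray(s), axis=1)
        return np.exp(-(d ** 2) / (2 * self.sigma ** 2))

    def q_values(self, s):
        k = self._kernel(s)
        if k.size == 0:
            return np.zeros(self.n_actions)
        return np.asarray(self.coeffs).T @ k      # shape (n_actions,)

    def update(self, s, action, reward):
        """Immediate-reward TD update with quantization of the dictionary."""
        err = reward - self.q_values(s)[action]
        k = self._kernel(s)
        if k.size == 0 or np.min(
                np.linalg.norm(np.asarray(self.centres) - np.asarray(s), axis=1)
        ) > self.quant_radius:
            self.centres.append(np.asarray(s, dtype=float))   # grow the dictionary
            self.coeffs.append(np.zeros(self.n_actions))
            k = self._kernel(s)
        for i, ki in enumerate(k):                # distribute credit over centres
            self.coeffs[i][action] += self.lr * err * ki

# Toy usage: reward-driven selection among 4 candidate targets from 2-D features.
rng = np.random.default_rng(0)
agent = QuantizedKernelQ(n_actions=4)
for _ in range(100):
    s = rng.normal(size=2)
    a = int(np.argmax(agent.q_values(s))) if rng.random() > 0.2 else int(rng.integers(0, 4))
    agent.update(s, a, reward=1.0 if a == 0 else 0.0)
```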

  11. Toward an autonomous brain machine interface: integrating sensorimotor reward modulation and reinforcement learning.

    Science.gov (United States)

    Marsh, Brandi T; Tarigoppula, Venkata S Aditya; Chen, Chen; Francis, Joseph T

    2015-05-13

    For decades, neurophysiologists have worked on elucidating the function of the cortical sensorimotor control system from the standpoint of kinematics or dynamics. Recently, computational neuroscientists have developed models that can emulate changes seen in the primary motor cortex during learning. However, these simulations rely on the existence of a reward-like signal in the primary sensorimotor cortex. Reward modulation of the primary sensorimotor cortex has yet to be characterized at the level of neural units. Here we demonstrate that single units/multiunits and local field potentials in the primary motor (M1) cortex of nonhuman primates (Macaca radiata) are modulated by reward expectation during reaching movements and that this modulation is present even while subjects passively view cursor motions that are predictive of either reward or nonreward. After establishing this reward modulation, we set out to determine whether we could correctly classify rewarding versus nonrewarding trials, on a moment-to-moment basis. This reward information could then be used in collaboration with reinforcement learning principles toward an autonomous brain-machine interface. The autonomous brain-machine interface would use M1 for both decoding movement intention and extraction of reward expectation information as evaluative feedback, which would then update the decoding algorithm as necessary. In the work presented here, we show that this, in theory, is possible. Copyright © 2015 the authors 0270-6474/15/357374-14$15.00/0.

  12. INTEGRATED EXPERIENCE APPROACH TO LEARNING.

    Science.gov (United States)

    POSTLETHWAIT, S.N.; AND OTHERS

    THE USE OF AUDIOTUTORIAL TECHNIQUES FOR TEACHING INTRODUCTORY COLLEGE BOTANY IS DESCRIBED. SPECIFIC PRACTICES USED AT PURDUE UNIVERSITY TO ILLUSTRATE DIFFERENT FACETS OF THE APPROACH ARE ANALYZED. INCLUDED ARE INDEPENDENT STUDY SESSIONS, SMALL ASSEMBLY SESSIONS, GENERAL ASSEMBLY SESSIONS, AND HOME STUDY SESSIONS. ILLUSTRATIONS AND SPECIFICATIONS…

  13. Reinforcement learning modulates the stability of cognitive control settings for object selection

    Directory of Open Access Journals (Sweden)

    Anthony William Sali

    2013-12-01

    Full Text Available Cognitive flexibility reflects both a trait that reliably differs between individuals and a state that can fluctuate moment-to-moment. Whether individuals can undergo persistent changes in cognitive flexibility as a result of reward learning is less understood. Here, we investigated whether reinforcing a periodic shift in an object selection strategy can make an individual more prone to switch strategies in a subsequent unrelated task. Participants completed two different choice tasks in which they selected one of four objects in an attempt to obtain a hidden reward on each trial. During a training phase, objects were defined by color. Participants received either consistent reward contingencies in which one color was more often rewarded, or contingencies in which the color that was more often rewarded changed periodically and without warning. Following the training phase, all participants completed a test phase in which reward contingencies were defined by spatial location and the location that was more often rewarded remained constant across the entire task. Those participants who received inconsistent contingencies during training continued to make more variable selections during the test phase in comparison to those who received the consistent training. Furthermore, a difference in the likelihood to switch selections on a trial-by-trial basis emerged between training groups: participants who received consistent contingencies during training were less likely to switch object selections following an unrewarded trial and more likely to repeat a selection following reward. Our findings provide evidence that the extent to which priority shifting is reinforced modulates the stability of cognitive control settings in a persistent manner, such that individuals become generally more or less prone to shifting priorities in the future.

  14. Modeling the contributions of Basal ganglia and Hippocampus to spatial navigation using reinforcement learning.

    Directory of Open Access Journals (Sweden)

    Deepika Sukumar

    Full Text Available A computational neural model that describes the competing roles of Basal Ganglia and Hippocampus in spatial navigation is presented. Model performance is evaluated on a simulated Morris water maze explored by a model rat. Cue-based and place-based navigational strategies, thought to be subserved by the Basal ganglia and Hippocampus respectively, are described. In cue-based navigation, the model rat learns to directly head towards a visible target, while in place-based navigation the target position is represented in terms of spatial context provided by an array of poles placed around the pool. Learning is formulated within the framework of Reinforcement Learning, with the nigrostriatal dopamine signal playing the role of Temporal Difference Error. Navigation inherently involves two apparently contradictory movements: goal oriented movements vs. random, wandering movements. The model hypothesizes that while the goal-directedness is determined by the gradient in Value function, randomness is driven by the complex activity of the SubThalamic Nucleus (STN)-Globus Pallidus externa (GPe) system. Each navigational system is associated with a Critic, prescribing actions that maximize value gradients for the corresponding system. In the integrated system, that incorporates both cue-based and place-based forms of navigation, navigation at a given position is determined by the system whose value function is greater at that position. The proposed model describes the experimental results of [1], a lesion-study that investigates the competition between cue-based and place-based navigational systems. The present study also examines impaired navigational performance under Parkinsonian-like conditions. The integrated navigational system, operated under dopamine-deficient conditions, exhibits increased escape latency as was observed in experimental literature describing MPTP model rats navigating a water maze.

  15. Modeling the contributions of Basal ganglia and Hippocampus to spatial navigation using reinforcement learning.

    Science.gov (United States)

    Sukumar, Deepika; Rengaswamy, Maithreye; Chakravarthy, V Srinivasa

    2012-01-01

    A computational neural model that describes the competing roles of Basal Ganglia and Hippocampus in spatial navigation is presented. Model performance is evaluated on a simulated Morris water maze explored by a model rat. Cue-based and place-based navigational strategies, thought to be subserved by the Basal ganglia and Hippocampus respectively, are described. In cue-based navigation, the model rat learns to directly head towards a visible target, while in place-based navigation the target position is represented in terms of spatial context provided by an array of poles placed around the pool. Learning is formulated within the framework of Reinforcement Learning, with the nigrostriatal dopamine signal playing the role of Temporal Difference Error. Navigation inherently involves two apparently contradictory movements: goal oriented movements vs. random, wandering movements. The model hypothesizes that while the goal-directedness is determined by the gradient in Value function, randomness is driven by the complex activity of the SubThalamic Nucleus (STN)-Globus Pallidus externa (GPe) system. Each navigational system is associated with a Critic, prescribing actions that maximize value gradients for the corresponding system. In the integrated system, that incorporates both cue-based and place-based forms of navigation, navigation at a given position is determined by the system whose value function is greater at that position. The proposed model describes the experimental results of [1], a lesion-study that investigates the competition between cue-based and place-based navigational systems. The present study also examines impaired navigational performance under Parkinsonian-like conditions. The integrated navigational system, operated under dopamine-deficient conditions, exhibits increased escape latency as was observed in experimental literature describing MPTP model rats navigating a water maze.
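
    The arbitration rule described in both records above can be stated very compactly; the toy sketch below is an illustrative reading, not the published model: each navigational system carries its own value function, control at a position goes to the system whose value is larger there, and both systems would be trained with the same temporal-difference (dopamine-like) error.

```python
import numpy as np

def td_error(reward, v_next, v_now, gamma=0.95):
    """Temporal-difference error, the model's stand-in for the dopamine signal."""
    return reward + gamma * v_next - v_now

def select_system(value_cue, value_place, position):
    """Hand control to whichever navigational system values this position more."""
    return "cue" if value_cue(position) >= value_place(position) else "place"

# Toy value functions: the cue system peaks at a visible platform, while the
# place system (a flat placeholder here, before any place learning) would
# encode value relative to the configuration of surrounding poles.
platform = np.array([0.8, 0.8])
value_cue = lambda p: float(np.exp(-np.linalg.norm(np.asarray(p) - platform)))
value_place = lambda p: 0.5

print(select_system(value_cue, value_place, position=(0.7, 0.9)))   # -> "cue"
```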

  16. The Effects of Observation of Learn Units during Reinforcement and Correction Conditions on the Rate of Learning Math Algorithms by Fifth Grade Students

    Science.gov (United States)

    Neu, Jessica Adele

    2013-01-01

    I conducted two studies on the comparative effects of the observation of learn units during (a) reinforcement or (b) correction conditions on the acquisition of math objectives. The dependent variables were the within-session cumulative numbers of correct responses emitted during observational sessions. The independent variables were the…

  17. Using a search engine-based mutually reinforcing approach to assess the semantic relatedness of biomedical terms.

    Science.gov (United States)

    Hsu, Yi-Yu; Chen, Hung-Yu; Kao, Hung-Yu

    2013-01-01

    Determining the semantic relatedness of two biomedical terms is an important task for many text-mining applications in the biomedical field. Previous studies, such as those using ontology-based and corpus-based approaches, measured semantic relatedness by using information from the structure of biomedical literature, but these methods are limited by the small size of training resources. To increase the size of training datasets, the outputs of search engines have been used extensively to analyze the lexical patterns of biomedical terms. In this work, we propose the Mutually Reinforcing Lexical Pattern Ranking (ReLPR) algorithm for learning and exploring the lexical patterns of synonym pairs in biomedical text. ReLPR employs lexical patterns and their pattern containers to assess the semantic relatedness of biomedical terms. By combining sentence structures and the linking activities between containers and lexical patterns, our algorithm can explore the correlation between two biomedical terms. The average correlation coefficient of the ReLPR algorithm was 0.82 for various datasets. The results of the ReLPR algorithm were significantly superior to those of previous methods.

  18. Neural Control of a Tracking Task via Attention-Gated Reinforcement Learning for Brain-Machine Interfaces.

    Science.gov (United States)

    Wang, Yiwen; Wang, Fang; Xu, Kai; Zhang, Qiaosheng; Zhang, Shaomin; Zheng, Xiaoxiang

    2015-05-01

    Reinforcement learning (RL)-based brain machine interfaces (BMIs) enable the user to learn from the environment through interactions to complete the task without desired signals, which is promising for clinical applications. Previous studies exploited Q-learning techniques to discriminate neural states into simple directional actions providing the trial initial timing. However, the movements in BMI applications can be quite complicated, and the action timing explicitly shows the intention of when to move. The rich actions and the corresponding neural states form a large state-action space, imposing generalization difficulty on Q-learning. In this paper, we propose to adopt attention-gated reinforcement learning (AGREL) as a new learning scheme for BMIs to adaptively decode high-dimensional neural activities into seven distinct movements (directional moves, holding and resting), owing to its efficient weight updating. We apply AGREL on neural data recorded from M1 of a monkey to directly predict a seven-action set in a time sequence to reconstruct the trajectory of a center-out task. Compared to Q-learning techniques, AGREL could improve the target acquisition rate to 90.16% on average, with faster convergence and more stability in following neural activity over multiple days, indicating the potential to achieve better online decoding performance for more complicated BMI tasks.
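
    A much-simplified, single-layer sketch of the attention-gated update is given below to make the mechanism concrete. It is not the paper's network: the full AGREL scheme also feeds an attentional signal back to hidden units, whereas this illustration keeps only the output-layer gating, in which the connections onto the action unit that was actually selected are the only ones updated, scaled by the reward prediction error.

```python
import numpy as np

def softmax(x, temp=1.0):
    z = np.exp((x - np.max(x)) / temp)
    return z / z.sum()

def agrel_step(W, features, reward_fn, rng, lr=0.05):
    """One step of a simplified, single-layer AGREL-style learner."""
    q = W @ features                      # activation of each action unit
    p = softmax(q)
    action = rng.choice(len(q), p=p)      # stochastic action selection
    delta = reward_fn(action) - q[action] # reward prediction error
    W = W.copy()
    W[action] += lr * delta * features    # gated update: selected unit only
    return W, action, delta

# Toy usage: 7 candidate actions decoded from a 20-dimensional neural feature vector.
rng = np.random.default_rng(1)
W = np.zeros((7, 20))
target_action = 3
reward_fn = lambda a: 1.0 if a == target_action else 0.0
phi = rng.normal(size=20)
for _ in range(200):
    W, a, d = agrel_step(W, phi, reward_fn, rng)
```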

  19. A visual approach to efficient analysis and quantification of ductile iron and reinforced sprayed concrete.

    Science.gov (United States)

    Fritz, Laura; Hadwiger, Markus; Geier, Georg; Pittino, Gerhard; Gröller, M Eduard

    2009-01-01

    This paper describes advanced volume visualization and quantification for applications in non-destructive testing (NDT), which results in novel and highly effective interactive workflows for NDT practitioners. We employ a visual approach to explore and quantify the features of interest, based on transfer functions in the parameter spaces of specific application scenarios. Examples are the orientations of fibres or the roundness of particles. The applicability and effectiveness of our approach is illustrated using two specific scenarios of high practical relevance. First, we discuss the analysis of Steel Fibre Reinforced Sprayed Concrete (SFRSpC). We investigate the orientations of the enclosed steel fibres and their distribution, depending on the concrete's application direction. This is a crucial step in assessing the material's behavior under mechanical stress, which is still in its infancy and therefore a hot topic in the building industry. The second application scenario is the designation of the microstructure of ductile cast irons with respect to the contained graphite. This corresponds to the requirements of the ISO standard 945-1, which deals with 2D metallographic samples. We illustrate how the necessary analysis steps can be carried out much more efficiently using our system for 3D volumes. Overall, we show that a visual approach with custom transfer functions in specific application domains offers significant benefits and has the potential of greatly improving and optimizing the workflows of domain scientists and engineers.

  20. Closed-loop adaptation of neurofeedback based on mental effort facilitates reinforcement learning of brain self-regulation.

    Science.gov (United States)

    Bauer, Robert; Fels, Meike; Royter, Vladislav; Raco, Valerio; Gharabaghi, Alireza

    2016-09-01

    Considering self-rated mental effort during neurofeedback may improve training of brain self-regulation. Twenty-one healthy, right-handed subjects performed kinesthetic motor imagery of opening their left hand, while threshold-based classification of beta-band desynchronization resulted in proprioceptive robotic feedback. The experiment consisted of two blocks in a cross-over design. The participants rated their perceived mental effort nine times per block. In the adaptive block, the threshold was adjusted on the basis of these ratings whereas adjustments were carried out at random in the other block. Electroencephalography was used to examine the cortical activation patterns during the training sessions. The perceived mental effort was correlated with the difficulty threshold of neurofeedback training. Adaptive threshold-setting reduced mental effort and increased the classification accuracy and positive predictive value. This was paralleled by an inter-hemispheric cortical activation pattern in low frequency bands connecting the right frontal and left parietal areas. Optimal balance of mental effort was achieved at thresholds significantly higher than maximum classification accuracy. Rating of mental effort is a feasible approach for effective threshold-adaptation during neurofeedback training. Closed-loop adaptation of the neurofeedback difficulty level facilitates reinforcement learning of brain self-regulation. Copyright © 2016 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.
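
    The closed-loop adjustment can be caricatured as a one-line control rule. The sketch below is purely illustrative (the rating scale, step size, and the convention that a higher threshold means a more demanding desynchronization criterion are all assumptions, not details from the study): after each self-rating the threshold is nudged so that perceived effort drifts toward a target level rather than toward maximum classification accuracy.

```python
def adapt_threshold(threshold, effort_rating, target_effort=5.0,
                    step=0.05, lower=0.0, upper=1.0):
    """Nudge the neurofeedback difficulty threshold after each self-rating.

    Convention (an assumption of this sketch): a higher threshold value means a
    more demanding desynchronization criterion. If the participant reports more
    effort than the target, the task is made less demanding, and vice versa.
    """
    threshold -= step * (effort_rating - target_effort)
    return min(max(threshold, lower), upper)

# Example: nine ratings per block (0-10 scale), starting from a mid-range threshold.
threshold = 0.5
for rating in [8, 7, 7, 6, 5, 5, 4, 5, 5]:
    threshold = adapt_threshold(threshold, rating)
```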

  1. Feedback-related negativity is enhanced in adolescence during a gambling task with and without probabilistic reinforcement learning.

    Science.gov (United States)

    Martínez-Velázquez, Eduardo S; Ramos-Loyo, Julieta; González-Garrido, Andrés A; Sequeira, Henrique

    2015-01-21

    Feedback-related negativity (FRN) is a negative deflection over frontocentral regions that appears around 250 ms after gain or loss feedback on chosen alternatives in a gambling task. Few studies have reported FRN enhancement in adolescents compared with adults in a gambling task without probabilistic reinforcement learning, despite the fact that learning from positive or negative consequences is crucial for decision-making during adolescence. Therefore, the aim of the present research was to identify differences in FRN amplitude and latency between adolescents and adults on a gambling task with favorable and unfavorable probabilistic reinforcement learning conditions, in addition to a nonlearning condition with monetary gains and losses. Higher rate scores of high-magnitude choices during the final 30 trials compared with the first 30 trials were observed during the favorable condition, whereas lower rates were observed during the unfavorable condition in both groups. Higher FRN amplitude in all conditions and longer latency in the nonlearning condition were observed in adolescents compared with adults and in relation to losses. Results indicate that both the adolescents and the adults improved their performance in relation to positive and negative feedback. However, the FRN findings suggest an increased sensitivity to external feedback about losses in adolescents compared with adults, irrespective of the presence or absence of probabilistic reinforcement learning. These results reflect processing differences in the neural monitoring system and provide new perspectives on the dynamic development of the adolescent brain.

  2. Machine Learning Approaches in Cardiovascular Imaging.

    Science.gov (United States)

    Henglin, Mir; Stein, Gillian; Hushcha, Pavel V; Snoek, Jasper; Wiltschko, Alexander B; Cheng, Susan

    2017-10-01

    Cardiovascular imaging technologies continue to increase in their capacity to capture and store large quantities of data. Modern computational methods, developed in the field of machine learning, offer new approaches to leveraging the growing volume of imaging data available for analyses. Machine learning methods can now address data-related problems ranging from simple analytic queries of existing measurement data to the more complex challenges involved in analyzing raw images. To date, machine learning has been used in 2 broad and highly interconnected areas: automation of tasks that might otherwise be performed by a human and generation of clinically important new knowledge. Most cardiovascular imaging studies have focused on task-oriented problems, but more studies involving algorithms aimed at generating new clinical insights are emerging. Continued expansion in the size and dimensionality of cardiovascular imaging databases is driving strong interest in applying powerful deep learning methods, in particular, to analyze these data. Overall, the most effective approaches will require an investment in the resources needed to appropriately prepare such large data sets for analyses. Notwithstanding current technical and logistical challenges, machine learning and especially deep learning methods have much to offer and will substantially impact the future practice and science of cardiovascular imaging. © 2017 American Heart Association, Inc.

  3. Effectiveness of Contrasting Approaches to Response-Contingent Learning among Children with Significant Developmental Delays and Disabilities

    Science.gov (United States)

    Raab, Melinda; Dunst, Carl J.; Hamby, Deborah W.

    2016-01-01

    Findings from a randomized controlled design study of an ability-based versus needs-based approach to response-contingent learning among children with significant developmental delays and disabilities who did not use instrumental behavior to produce reinforcing consequences are reported. The ability-based intervention and needs-based intervention…

  4. Is all motivation good for learning? Dissociable influences of approach and avoidance motivation in declarative memory.

    Science.gov (United States)

    Murty, Vishnu P; LaBar, Kevin S; Hamilton, Derek A; Adcock, R Alison

    2011-01-01

    The present study investigated the effects of approach versus avoidance motivation on declarative learning. Human participants navigated a virtual reality version of the Morris water task, a classic spatial memory paradigm, adapted to permit the experimental manipulation of motivation during learning. During this task, participants were instructed to navigate to correct platforms while avoiding incorrect platforms. To manipulate motivational states participants were either rewarded for navigating to correct locations (approach) or punished for navigating to incorrect platforms (avoidance). Participants' skin conductance levels (SCLs) were recorded during navigation to investigate the role of physiological arousal in motivated learning. Behavioral results revealed that, overall, approach motivation enhanced and avoidance motivation impaired memory performance compared to nonmotivated spatial learning. This advantage was evident across several performance indices, including accuracy, learning rate, path length, and proximity to platform locations during probe trials. SCL analysis revealed three key findings. First, within subjects, arousal interacted with approach motivation, such that high arousal on a given trial was associated with performance deficits. In addition, across subjects, high arousal negated or reversed the benefits of approach motivation. Finally, low-performing, highly aroused participants showed SCL responses similar to those of avoidance-motivation participants, suggesting that for these individuals, opportunities for reward may evoke states of learning similar to those typically evoked by threats of punishment. These results provide a novel characterization of how approach and avoidance motivation influence declarative memory and indicate a critical and selective role for arousal in determining how reinforcement influences goal-oriented learning.

  5. Deficient reinforcement learning in medial frontal cortex as a model of dopamine-related motivational deficits in ADHD.

    Science.gov (United States)

    Silvetti, Massimo; Wiersema, Jan R; Sonuga-Barke, Edmund; Verguts, Tom

    2013-10-01

    Attention Deficit/Hyperactivity Disorder (ADHD) is a pathophysiologically complex and heterogeneous condition with both cognitive and motivational components. We propose a novel computational hypothesis of motivational deficits in ADHD, drawing together recent evidence on the role of anterior cingulate cortex (ACC) and associated mesolimbic dopamine circuits in both reinforcement learning and ADHD. Based on findings of dopamine dysregulation and ACC involvement in ADHD we simulated a lesion in a previously validated computational model of ACC (Reward Value and Prediction Model, RVPM). We explored the effects of the lesion on the processing of reinforcement signals. We tested specific behavioral predictions about the profile of reinforcement-related deficits in ADHD in three experimental contexts; probability tracking task, partial and continuous reward schedules, and immediate versus delayed rewards. In addition, predictions were made at the neurophysiological level. Behavioral and neurophysiological predictions from the RVPM-based lesion-model of motivational dysfunction in ADHD were confirmed by data from previously published studies. RVPM represents a promising model of ADHD reinforcement learning suggesting that ACC dysregulation might play a role in the pathogenesis of motivational deficits in ADHD. However, more behavioral and neurophysiological studies are required to test core predictions of the model. In addition, the interaction with different brain networks underpinning other aspects of ADHD neuropathology (i.e., executive function) needs to be better understood. Copyright © 2013 Elsevier Ltd. All rights reserved.

  6. Dialogical, Enquiry and Participatory Approaches to Learning

    DEFF Research Database (Denmark)

    Hurford, Donna; Rowley, Chris

    2018-01-01

    to what is to be learnt, what is in the adult’s mind, and both contribute to and are involved in the learning process, although the project also found that such exchanges do not occur frequently and that freely chosen play activities often provided the best opportunities for adults to extend children’s......Dialogical enquiry and participatory approaches This chapter is concerned with approaches to leading children into active participation and enquiry, through involvement in their own learning, both at Key Stages 1 and 2. The terms ‘enquiry’, ‘learning’ and ‘active participation’ are closely related...... Years (REPEY) Project (Siraj-Blatchford et al. 2002). This project found that the most effective strategies and techniques for promoting learning in the early years involved adult–child interactions in which the adult responds to the child’s understanding of a subject or activity, the child responds...

  7. Active Learning Approaches in Zimbabwean Science and ...

    African Journals Online (AJOL)

    Active Learning Approaches in Zimbabwean Science and Mathematics Classrooms. Rudo Kwari. Abstract: No abstract available. Zimbabwe Journal of Educational Research Vol. 15(3) 2003: 151-160.

  8. A Mixed Learning Approach in Mechatronics Education

    Science.gov (United States)

    Yilmaz, O.; Tuncalp, K.

    2011-01-01

    This study aims to investigate the effect of a Web-based mixed learning approach model on mechatronics education. The model combines different perception methods such as reading, listening, and speaking and practice methods developed in accordance with the vocational background of students enrolled in the course Electromechanical Systems in…

  9. THE SYSTEMIC APPROACH TO TEACHING AND LEARNING:

    African Journals Online (AJOL)

    Rita Wilkinson

    in understanding the diseases associated with improper metabolic reactions. This teaching technique will open a new thinking approach and develop interest in students, rather than leaving them confused and wary of learning. Teaching through connectivity will present the basic concepts in a cognitive way, and students will be able ...

  10. [Competence approach in teaching-learning process].

    Science.gov (United States)

    da Silva, César Cavalcanti; da Silva, Ana Tereza M C; de Oliveira, Iaponira Cortez C; de Leon, Casandra Genoveva R M P; Serrão, Maria do Carmo P N

    2005-01-01

    The work relates the experience of a teaching-learning process based on the competence approach. The detailing of that teaching process reveals the authors' concern with the scarcity of studies on the practical development of the competence theme in educational practice, and points to the need for more research in that dimension.

  11. THE SYSTEMIC APPROACH TO TEACHING AND LEARNING ...

    African Journals Online (AJOL)

    unesco

    ABSTRACT. The systemic approach to teaching and learning (SATL) has been developed during the last decade (1-7). SATL techniques have been implemented and evaluated in many different disciplines (chemistry, biology, physics, and math) at all educational levels, but the major SATL applications have been reported ...

  12. Transformative Learning Approaches for Public Relations Pedagogy

    Science.gov (United States)

    Motion, Judy; Burgess, Lois

    2014-01-01

    Public relations educators are frequently challenged by students' flawed perceptions of public relations. Two contrasting case studies are presented in this paper to illustrate how socially-oriented paradigms may be applied to a real-client project to deliver a transformative learning experience. A discourse-analytic approach is applied within the…

  13. A Learning Object Approach To Evidence based learning

    Directory of Open Access Journals (Sweden)

    Zabin Visram

    2005-06-01

    Full Text Available This paper describes the philosophy, development and framework of the body of elements formulated to provide an approach to evidence-based learning sustained by Learning Objects and web-based technology. Owing to the demands for continuous improvement in the delivery of healthcare, and the continuous endeavour to improve quality of life, there is an ongoing need for practitioners to update their knowledge by completing accredited courses. The rapid advances in medical science mean, increasingly, that there is a pressing need to adopt wireless schemes whereby bespoke courses can be developed to help practitioners keep up with an expanding knowledge base. Evidently, without current best evidence, practice risks becoming rapidly out of date, to the detriment of the patient. There is a need to provide a tactical, operational and effective environment which allows professionals to update their education and complete specialised training just-in-time, in their own time and location. Following this demand in the marketplace, the information engineering group, in combination with several medical and dental schools, set out to develop and design a conceptual framework which forms the basis of pioneering research that, at last, enables practitioners to adopt a philosophy of lifelong learning. The body and structure of this framework are subsumed under the term Object-oriented approach to Evidence Based learning, Just-in-time, via Internet sustained by Reusable Learning Objects (the OEBJIRLO Progression). The technical pillars which support this concept of lifelong learning rest on the foundations of object-oriented technology, Learning Objects, just-in-time education, data mining, intelligent agent technology, Flash interconnectivity and remote wireless technology, which allow practitioners to update their professional skills and complete specialised training leading to accredited qualifications. This paper sets out to develop and

  14. A systematic review of the effectiveness of the community reinforcement approach in alcohol, cocaine and opioid addiction.

    NARCIS (Netherlands)

    Roozen, H.G.; Boulogne, J.J.; Tulder, van M.; Brink, van den W.; Jong, de C.A.J.; Kerkhof, A.J.F.M.

    2004-01-01

    Abstract The community reinforcement approach (CRA) has been applied in the treatment of disorders resulting from alcohol, cocaine and opioid use. The objectives were to review the effectiveness of (1) CRA compared with usual care, and (2) CRA versus CRA plus contingency management. Studies were

  15. A systematic review of the effectiveness of the community reinforcement approach in alcohol, cocaine and opioid addiction.

    NARCIS (Netherlands)

    Roozen, H.G.; Boulogne, J.J.; van Tulder, M.; van den Brink, W.; de Jong, C.A.J.; Kerkhof, A.J.F.M.

    2004-01-01

    The community reinforcement approach (CRA) has been applied in the treatment of disorders resulting from alcohol, cocaine and opioid use. The objectives were to review the effectiveness of (1) CRA compared with usual care, and (2) CRA versus CRA plus contingency management. Studies were selected

  16. A systematic review of the effectiveness of the community reinforcement approach in alcohol, cocaine and opioid addiction

    NARCIS (Netherlands)

    Roozen, Hendrik G.; Boulogne, Jiska J.; van Tulder, Maurits W.; van den Brink, Wim; de Jong, Cor A. J.; Kerkhof, Ad J. F. M.

    2004-01-01

    The community reinforcement approach (CRA) has been applied in the treatment of disorders resulting from alcohol, cocaine and opioid use. The objectives were to review the effectiveness of (1) CRA compared with usual care, and (2) CRA versus CRA plus contingency management. Studies were selected

  17. Conditioned reinforcement and information theory reconsidered.

    Science.gov (United States)

    Shahan, Timothy A; Cunningham, Paul

    2015-03-01

    The idea that stimuli might function as conditioned reinforcers because of the information they convey about primary reinforcers has a long history in the study of learning. However, formal application of information theory to conditioned reinforcement has been largely abandoned in modern theorizing because of its failures with respect to observing behavior. In this paper we show how recent advances in the application of information theory to Pavlovian conditioning offer a novel approach to conditioned reinforcement. The critical feature of this approach is that calculations of information are based on reductions of uncertainty about expected time to primary reinforcement signaled by a conditioned reinforcer. Using this approach, we show that previous failures of information theory with observing behavior can be remedied, and that the resulting framework produces predictions similar to Delay Reduction Theory in both observing-response and concurrent-chains procedures. We suggest that the similarity of these predictions might offer an analytically grounded reason for why Delay Reduction Theory has been a successful theory of conditioned reinforcement. Finally, we suggest that the approach provides a formal basis for the assertion that conditioned reinforcement results from Pavlovian conditioning and may provide an integrative approach encompassing both domains. © Society for the Experimental Analysis of Behavior.
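
    The two quantities contrasted in this abstract can be written down compactly. The expressions below are a standard textbook-style rendering, not necessarily the authors' exact formulation: the information a conditioned reinforcer conveys is the reduction in uncertainty about time to primary reinforcement, and Delay Reduction Theory values a stimulus by the proportional reduction in expected delay that its onset signals.

```latex
% Temporal information conveyed by a conditioned reinforcer, with C the
% average (contextual) time to primary reinforcement and T the delay it signals:
H_{\mathrm{signal}} \;=\; \log_2\!\left(\frac{C}{T}\right)

% Delay Reduction Theory: the conditioned-reinforcing strength of a stimulus is
% proportional to the reduction in expected delay to reward signaled by its
% onset, relative to the overall expected delay T:
V_{\mathrm{stimulus}} \;\propto\; \frac{T - t_{\mathrm{stimulus}}}{T}
```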

  18. Reinforcing outpatient medical student learning using brief computer tutorials: the Patient-Teacher-Tutorial sequence

    Directory of Open Access Journals (Sweden)

    Pusic Martin V

    2012-08-01

    Full Text Available Abstract Background At present, what students read after an outpatient encounter is largely left up to them. Our objective was to evaluate the educational efficacy of a clinical education model in which the student moves through a sequence that includes immediately reinforcing their learning using a specifically designed computer tutorial. Methods Prior to a 14-day Pediatric Emergency rotation, medical students completed pre-tests for two common pediatric topics: Oral Rehydration Solutions (ORS) and Fever Without Source (FWS). After encountering a patient with either FWS or a patient needing ORS, the student logged into a computer that randomly assigned them to either (a) completing a relevant computer tutorial (e.g. FWS patient + FWS tutorial = “in sequence”) or (b) completing the non-relevant tutorial (e.g. FWS patient + ORS tutorial = “out of sequence”). At the end of their rotation, they were tested again on both topics. Our main outcome was post-test scores on a given tutorial topic, contrasted by whether the tutorial was done in- or out-of-sequence. Results Ninety-two students completed the study protocol, with 41 in the ‘in sequence’ group. Pre-test scores did not differ significantly. Overall, doing a computer tutorial in sequence resulted in significantly greater post-test scores (z-score 1.1 (SD 0.70) in sequence vs. 0.52 (SD 1.1) out-of-sequence; 95% CI for difference +0.16, +0.93). Students spent longer on the tutorials when they were done in sequence (12.1 min (SD 7.3) vs. 10.5 (SD 6.5)), though the difference was not statistically significant (95% CI for difference: -1.2 min, +4.5). Conclusions Outpatient learning frameworks could be structured to take best advantage of the heightened learning potential created by patient encounters. We propose the Patient-Teacher-Tutorial sequence as a framework for organizing learning in outpatient clinical settings.

  19. Implementation of real-time energy management strategy based on reinforcement learning for hybrid electric vehicles and simulation validation.

    Directory of Open Access Journals (Sweden)

    Zehui Kong

    Full Text Available To further improve the fuel economy of series hybrid electric tracked vehicles, a reinforcement learning (RL)-based real-time energy management strategy is developed in this paper. In order to utilize the statistical characteristics of the online driving schedule effectively, a recursive algorithm for the transition probability matrix (TPM) of power request is derived. Reinforcement learning is applied to calculate and update the control policy at regular intervals, adapting to the varying driving conditions. A forward-facing powertrain model is built in detail, including the engine-generator model, battery model and vehicle dynamics model. The robustness and adaptability of the real-time energy management strategy are validated through comparison with a stationary control strategy based on an initial transition probability matrix (TPM) generated from a long naturalistic driving cycle in the simulation. Results indicate that the proposed method has better fuel economy than the stationary one and is more effective in real-time control.
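
    The two ingredients named in this abstract, a recursively updated transition probability matrix of the power request and a periodically refreshed policy, can be sketched generically. The code below is an illustration under simplifying assumptions (exogenous, action-independent power-request dynamics, a made-up fuel-cost table, and discrete engine settings), not the paper's controller.

```python
import numpy as np

class RecursiveTPM:
    """Running estimate of the power-request transition probability matrix."""
    def __init__(self, n_levels):
        self.counts = np.ones((n_levels, n_levels))      # Laplace-smoothed counts
        self.prev = None

    def observe(self, level):
        if self.prev is not None:
            self.counts[self.prev, level] += 1
        self.prev = level

    @property
    def matrix(self):
        return self.counts / self.counts.sum(axis=1, keepdims=True)

def update_policy(tpm, costs, gamma=0.95, iters=200):
    """Value iteration over (power-request level, engine-power action) pairs."""
    n_s, n_a = costs.shape
    V = np.zeros(n_s)
    for _ in range(iters):
        Q = costs + gamma * (tpm @ V)[:, None]           # expected next-state value
        V = Q.min(axis=1)
    return Q.argmin(axis=1)                              # engine setting per request level

# Toy usage: 5 discretized power-request levels, 3 engine-power settings, and a
# fuel-cost table that penalizes mismatch between request and engine output.
levels, settings = 5, 3
costs = np.abs(np.arange(levels)[:, None] - np.linspace(0, levels - 1, settings)[None, :])
tpm_est = RecursiveTPM(levels)
for demand in np.random.default_rng(2).integers(0, levels, 1000):
    tpm_est.observe(int(demand))
policy = update_policy(tpm_est.matrix, costs)
```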

  20. Multiagent-Based Simulation of Temporal-Spatial Characteristics of Activity-Travel Patterns Using Interactive Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Min Yang

    2014-01-01

    Full Text Available We propose a multiagent-based reinforcement learning algorithm, in which the interactions between travelers and the environment are considered to simulate temporal-spatial characteristics of activity-travel patterns in a city. Road congestion degree is added to the reinforcement learning algorithm as a medium that passes the influence of one traveler’s decision to others. Meanwhile, the agents used in the algorithm are initialized from typical activity patterns extracted from the travel survey diary data of Shangyu city in China. In the simulation, both macroscopic activity-travel characteristics such as traffic flow spatial-temporal distribution and microscopic characteristics such as activity-travel schedules of each agent are obtained. Comparing the simulation results with the survey data, we find that deviation of the peak-hour traffic flow is less than 5%, while the correlation of the simulated versus survey location choice distribution is over 0.9.
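
    The coupling mechanism, in which one traveler's choice affects others through congestion entering the reward, can be illustrated with a toy departure-time game. The sketch below is not the paper's simulator: the slot structure, capacity, and reward are invented, and only the idea of independent learners linked through a congestion-dependent reward is retained.

```python
import numpy as np

def simulate(n_agents=500, n_slots=6, capacity=120, days=200,
             alpha=0.1, epsilon=0.1, seed=3):
    """Independent Q-learners coupled only through a congestion-based reward."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_agents, n_slots))
    for _ in range(days):
        explore = rng.random(n_agents) < epsilon
        greedy = Q.argmax(axis=1)
        random_choice = rng.integers(0, n_slots, n_agents)
        slots = np.where(explore, random_choice, greedy)

        load = np.bincount(slots, minlength=n_slots)        # travelers per slot
        congestion = np.maximum(0.0, load - capacity) / capacity
        reward = -congestion[slots]                         # shared externality

        rows = np.arange(n_agents)
        Q[rows, slots] += alpha * (reward - Q[rows, slots])
    return np.bincount(slots, minlength=n_slots)

print(simulate())   # flow per departure slot after learning
```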