WorldWideScience

Sample records for fuzzy reinforcement learning

  1. Refining Linear Fuzzy Rules by Reinforcement Learning

    Science.gov (United States)

    Berenji, Hamid R.; Khedkar, Pratap S.; Malkani, Anil

    1996-01-01

    Linear fuzzy rules are increasingly used in the development of fuzzy logic systems. Radial basis functions have also been used in the antecedents of the rules for clustering in product space, which can automatically generate a set of linear fuzzy rules from an input/output data set. These rules are usually refined by manual methods. This paper presents a method for refining the parameters of these rules using reinforcement learning, which can be applied in domains where supervised input-output data are not available and reinforcements are received only after a long sequence of actions. This is shown for a generalization of radial basis functions. The formation of fuzzy rules from data and their automatic refinement is an important step toward applying reinforcement learning methods in domains where only limited input-output data are available.
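
    As a concrete and purely illustrative reading of the setup above, the sketch below implements a tiny Takagi-Sugeno-style rule base whose antecedents are Gaussian radial basis functions and whose linear consequent parameters are nudged by a scalar reinforcement signal. The rule centers, widths, and the update rule are assumptions of this sketch, not the refinement procedure of Berenji et al.

        import numpy as np

        # Minimal sketch (not the paper's exact formulation): a Takagi-Sugeno rule base
        # with Gaussian RBF antecedents and linear consequents. A scalar reinforcement
        # nudges the consequent parameters of the rules that fired most strongly.
        class LinearFuzzyRules:
            def __init__(self, centers, widths, n_inputs):
                self.centers = np.asarray(centers)            # (n_rules, n_inputs) RBF centers
                self.widths = np.asarray(widths)              # (n_rules,) RBF widths
                self.theta = np.zeros((len(centers), n_inputs + 1))  # linear consequents [bias, weights]

            def firing(self, x):
                d2 = np.sum((self.centers - x) ** 2, axis=1)
                w = np.exp(-d2 / (2.0 * self.widths ** 2))    # rule firing strengths
                return w / (w.sum() + 1e-12)                  # normalized

            def output(self, x):
                w = self.firing(x)
                consequents = self.theta @ np.append(1.0, x)  # each rule's linear output
                return float(w @ consequents)

            def reinforce(self, x, reinforcement, lr=0.05):
                # Hypothetical update: move each rule's consequent in proportion to its
                # firing strength and the (possibly delayed) reinforcement signal.
                w = self.firing(x)
                self.theta += lr * reinforcement * np.outer(w, np.append(1.0, x))

        rules = LinearFuzzyRules(centers=[[0.0], [0.5], [1.0]], widths=[0.2, 0.2, 0.2], n_inputs=1)
        print(rules.output(np.array([0.4])))
        rules.reinforce(np.array([0.4]), reinforcement=+1.0)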

  2. Fuzzy Lyapunov Reinforcement Learning for Nonlinear Systems.

    Science.gov (United States)

    Kumar, Abhishek; Sharma, Rajneesh

    2017-03-01

    We propose a fuzzy reinforcement learning (RL) based controller that generates a stable control action by Lyapunov-constraining the fuzzy linguistic rules. In particular, we attempt to Lyapunov-constrain the consequent part of the fuzzy rules in a fuzzy RL setup. Ours is a first attempt at designing a linguistic RL controller with Lyapunov-constrained fuzzy consequents to progressively learn a stable optimal policy. The proposed controller does not need a system model or desired response and can effectively handle disturbances in continuous state-action space problems. The proposed controller has been employed on the benchmark Inverted Pendulum (IP) and Rotational/Translational Proof-Mass Actuator (RTAC) control problems (with and without disturbances). Simulation results and comparison against a) baseline fuzzy Q-learning, b) Lyapunov theory based Actor-Critic, and c) Lyapunov theory based Markov game controller elucidate the stability and viability of the proposed control scheme.

  3. Using Fuzzy Logic for Performance Evaluation in Reinforcement Learning

    Science.gov (United States)

    Berenji, Hamid R.; Khedkar, Pratap S.

    1992-01-01

    Current reinforcement learning algorithms require long training periods which generally limit their applicability to small-size problems. A new architecture is described which uses fuzzy rules to initialize its two neural networks: a neural network for performance evaluation and another for action selection. This architecture is applied to the control of dynamic systems, and it is demonstrated that it is possible to start with approximate prior knowledge and learn to refine it through experiments using reinforcement learning.

  4. A reinforcement learning-based architecture for fuzzy logic control

    Science.gov (United States)

    Berenji, Hamid R.

    1992-01-01

    This paper introduces a new method for learning to refine a rule-based fuzzy logic controller. A reinforcement learning technique is used in conjunction with a multilayer neural network model of a fuzzy controller. The approximate reasoning based intelligent control (ARIC) architecture proposed here learns by updating its prediction of the physical system's behavior and fine-tuning a control knowledge base. Its theory is related to Sutton's temporal difference (TD) method. Because ARIC has the advantage of using the control knowledge of an experienced operator and fine-tuning it through the process of learning, it learns faster than systems that train networks from scratch. The approach is applied to a cart-pole balancing system.
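
    The abstract relates ARIC's learning to Sutton's temporal difference method. A generic TD(0) actor-critic update of the kind that idea suggests is sketched below; the tabular representation, learning rates, and the placeholder env.sample() plant are illustrative assumptions, not the ARIC implementation.

        import numpy as np

        # Illustrative actor-critic sketch in the spirit of the TD-based tuning the
        # abstract describes; this is a generic TD(0) actor-critic, not the ARIC code.
        n_states, n_actions = 10, 2
        V = np.zeros(n_states)                    # critic: state-value estimates
        prefs = np.zeros((n_states, n_actions))   # actor: action preferences
        gamma, alpha_v, alpha_p = 0.95, 0.1, 0.05

        def softmax(z):
            z = z - z.max()
            e = np.exp(z)
            return e / e.sum()

        def step(env, s):
            # env.sample(s, a) is a stand-in for the controlled plant; it returns
            # (next_state, reward) and is assumed here for illustration only.
            a = np.random.choice(n_actions, p=softmax(prefs[s]))
            s_next, r = env.sample(s, a)
            td_error = r + gamma * V[s_next] - V[s]   # internal reinforcement
            V[s] += alpha_v * td_error                # critic update
            prefs[s, a] += alpha_p * td_error         # actor update
            return s_next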

  5. Temporal Difference based Tuning of Fuzzy Logic Controller through Reinforcement Learning to Control an Inverted Pendulum

    Directory of Open Access Journals (Sweden)

    Raj kumar

    2012-08-01

    This paper presents a self-tuning method for fuzzy logic controllers. The consequent part of the fuzzy logic controller is self-tuned through the Q-learning algorithm of reinforcement learning. The off-policy temporal difference algorithm is used for tuning; it directly approximates the action-value function that yields the maximum reward. In this way, the Q-learning algorithm is applied to a continuous environment. The approach retains the advantages of a fuzzy logic controller: it is robust under environmental uncertainties, and no expert knowledge is required to design the rule base of the fuzzy logic controller.
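
    To make the consequent-tuning idea concrete, the following is a minimal fuzzy Q-learning sketch: each rule holds q-values for a small set of candidate consequents, the control action is a firing-strength-weighted blend of the chosen candidates, and a temporal-difference error updates the q-values of the rules that fired. The rule firing vector phi, the candidate action set, and the hyperparameters are assumptions of the sketch, not the paper's design.

        import numpy as np

        # Minimal fuzzy Q-learning sketch: per-rule q-values over candidate consequents,
        # blended by rule firing strengths; illustrative only.
        n_rules, n_candidates = 5, 3
        candidate_actions = np.linspace(-1.0, 1.0, n_candidates)
        q = np.zeros((n_rules, n_candidates))       # per-rule action values
        gamma, alpha, epsilon = 0.95, 0.1, 0.1

        def act(phi):
            # epsilon-greedy choice of a candidate consequent for each rule,
            # then defuzzify by firing-strength-weighted averaging.
            greedy = q.argmax(axis=1)
            chosen = np.where(np.random.rand(n_rules) < epsilon,
                              np.random.randint(n_candidates, size=n_rules), greedy)
            u = float(phi @ candidate_actions[chosen])
            return u, chosen

        def update(phi, chosen, reward, phi_next):
            q_sa = float(phi @ q[np.arange(n_rules), chosen])
            q_next = float(phi_next @ q.max(axis=1))
            td_error = reward + gamma * q_next - q_sa
            q[np.arange(n_rules), chosen] += alpha * td_error * phi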

  6. Self-learning Fuzzy Controllers Based On a Real-time Reinforcement Genetic Algorithm

    Institute of Scientific and Technical Information of China (English)

    FANG Jian-an; MIAO Qing-ying; GUO Zhao-xia; SHAO Shi-huang

    2002-01-01

    This paper presents a novel method for constructing fuzzy controllers based on a real-time reinforcement genetic algorithm. The methodology introduces the real-time learning capability of neural networks into the global search process of the genetic algorithm, aiming to enhance its convergence rate and real-time learning ability. The resulting algorithm is then used to construct fuzzy controllers for complex dynamic systems without any knowledge of the system dynamics or prior control experience. The cart-pole system is employed as a test bed to demonstrate the effectiveness of the proposed control scheme and the robustness of the acquired fuzzy controller, with comparable results.

  7. A special hierarchical fuzzy neural-networks based reinforcement learning for multi-variables system

    Institute of Scientific and Technical Information of China (English)

    ZHANG Wen-zhi; LU Tian-sheng

    2005-01-01

    This paper proposes a reinforcement learning scheme based on a special Hierarchical Fuzzy Neural-Network (HFNN) for solving complicated learning tasks in a continuous multi-variable environment. The output of the previous layer in the HFNN is no longer used in the if-part of the next layer, but only in the then-part. Thus it can handle the difficulty that arises when the output of the previous layer is meaningless or its meaning is uncertain. The proposed HFNN has a minimal number of fuzzy rules, successfully avoids the rule-combination explosion, and decreases the computation and memory requirements. In the learning process, two HFNNs with the same structure perform fuzzy action composition and evaluation function approximation simultaneously, with the parameters of the neural networks tuned and updated on line by a gradient descent algorithm. The reinforcement learning method is proved to be correct and feasible by simulation of a double inverted pendulum system.

  8. Design issues of a reinforcement-based self-learning fuzzy controller for petrochemical process control

    Science.gov (United States)

    Yen, John; Wang, Haojin; Daugherity, Walter C.

    1992-01-01

    Fuzzy logic controllers have some often-cited advantages over conventional techniques such as PID control, including easier implementation, accommodation to natural language, and the ability to cover a wider range of operating conditions. One major obstacle that hinders the broader application of fuzzy logic controllers is the lack of a systematic way to develop and modify their rules; as a result the creation and modification of fuzzy rules often depends on trial and error or pure experimentation. One of the proposed approaches to address this issue is a self-learning fuzzy logic controller (SFLC) that uses reinforcement learning techniques to learn the desirability of states and to adjust the consequent part of its fuzzy control rules accordingly. Due to the different dynamics of the controlled processes, the performance of a self-learning fuzzy controller is highly contingent on its design. The design issue has not received sufficient attention. The issues related to the design of a SFLC for application to a petrochemical process are discussed, and its performance is compared with that of a PID and a self-tuning fuzzy logic controller.

  9. Design issues for a reinforcement-based self-learning fuzzy controller

    Science.gov (United States)

    Yen, John; Wang, Haojin; Daugherity, Walter

    1993-01-01

    Fuzzy logic controllers have some often-cited advantages over conventional techniques such as PID control: easy implementation, accommodation of natural language, the ability to cover a wider range of operating conditions, and others. One major obstacle that hinders their broader application is the lack of a systematic way to develop and modify their rules; as a result, the creation and modification of fuzzy rules often depends on trial and error or pure experimentation. One of the proposed approaches to address this issue is a self-learning fuzzy logic controller (SFLC) that uses reinforcement learning techniques to learn the desirability of states and to adjust the consequent part of fuzzy control rules accordingly. Due to the different dynamics of the controlled processes, the performance of a self-learning fuzzy controller is highly contingent on its design. The design issue has not received sufficient attention. The issues related to the design of a SFLC for application to a chemical process are discussed, and its performance is compared with that of a PID and a self-tuning fuzzy logic controller.

  10. Incorporation of Perception-based Information in Robot Learning Using Fuzzy Reinforcement Learning Agents

    Institute of Scientific and Technical Information of China (English)

    ZHOU Changjiu; MENG Qingchun; GUO Zhongwen; QU Weifen; YIN Bo

    2002-01-01

    Robot learning in unstructured environments has proved to be an extremely challenging problem, mainly because of the many uncertainties always present in the real world. Human beings, on the other hand, seem to cope very well with uncertain and unpredictable environments, often relying on perception-based information. Furthermore, human beings can also utilize perceptions to guide their learning to those parts of the perception-action space that are actually relevant to the task. Therefore, we conduct research aimed at improving robot learning through the incorporation of both perception-based and measurement-based information. For this reason, a fuzzy reinforcement learning (FRL) agent is proposed in this paper. Based on a neural-fuzzy architecture, different kinds of information can be incorporated into the FRL agent to initialise its action network, critic network and evaluation feedback module so as to accelerate its learning. By making use of the global optimisation capability of GAs (genetic algorithms), a GA-based FRL (GAFRL) agent is presented to solve the local minima problem in traditional actor-critic reinforcement learning. On the other hand, with the prediction capability of the critic network, GAs can perform a more effective global search. Different GAFRL agents are constructed and verified by using the simulation model of a physical biped robot. The simulation analysis shows that the biped learning rate for dynamic balance can be improved by incorporating perception-based information on biped balancing and walking evaluation. The biped robot can find applications in ocean exploration, detection or sea rescue activities, as well as military maritime activities.

  11. A Neuro-Control Design Based on Fuzzy Reinforcement Learning

    DEFF Research Database (Denmark)

    Katebi, S.D.; Blanke, M.

    … ones instruct the neuro-control unit to adjust its weights and are simultaneously stored in the memory unit during the training phase. In response to the internal reinforcement signal (set point threshold deviation), the stored information is retrieved by the action applier unit and utilized for re-adjustment of the neural network during the recall phase. In order to illustrate the effectiveness of the proposed technique, the controller is tested on a cart-pole balancing problem. Results of extensive simulation studies show a very good performance in comparison with other intelligent control methods.

  12. Cooperation and Coordination Between Fuzzy Reinforcement Learning Agents in Continuous State Partially Observable Markov Decision Processes

    Science.gov (United States)

    Berenji, Hamid R.; Vengerov, David

    1999-01-01

    Successful operations of future multi-agent intelligent systems require efficient cooperation schemes between agents sharing learning experiences. We consider a pseudo-realistic world in which one or more opportunities appear and disappear in random locations. Agents use fuzzy reinforcement learning to learn which opportunities are most worthy of pursuing based on their promised rewards, expected lifetimes, path lengths and expected path costs. We show that this world is partially observable because the history of an agent influences the distribution of its future states. We consider a cooperation mechanism in which agents share experience by using and updating one joint behavior policy. We also implement a coordination mechanism for allocating opportunities to different agents in the same world. Our results demonstrate that K cooperative agents each learning in a separate world over N time steps outperform K independent agents each learning in a separate world over K*N time steps, with this result becoming more pronounced as the degree of partial observability in the environment increases. We also show that cooperation between agents learning in the same world decreases performance with respect to independent agents. Since cooperation reduces diversity between agents, we conclude that diversity is a key parameter in the trade-off between maximizing utility from cooperation when diversity is low and maximizing utility from competitive coordination when diversity is high.
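
    The cooperation mechanism described above, in which several agents use and update one joint behavior policy, can be sketched with a toy tabular experiment: K agents either share a single Q-table (the same object is updated by every agent) or keep independent tables. The placeholder dynamics and reward below are illustrative only, not the paper's simulated world.

        import numpy as np

        # Toy sketch: K agents sharing one tabular policy vs. learning independently.
        def run(K, steps, shared, n_states=20, n_actions=4,
                alpha=0.1, gamma=0.9, epsilon=0.1):
            rng = np.random.default_rng(0)
            tables = [np.zeros((n_states, n_actions))] * K if shared \
                     else [np.zeros((n_states, n_actions)) for _ in range(K)]
            states = rng.integers(n_states, size=K)
            total = 0.0
            for _ in range(steps):
                for k in range(K):
                    Q = tables[k]              # same object for all k when shared
                    s = states[k]
                    a = rng.integers(n_actions) if rng.random() < epsilon else int(Q[s].argmax())
                    s_next = rng.integers(n_states)            # placeholder dynamics
                    r = float(a == s % n_actions)              # placeholder reward
                    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
                    states[k] = s_next
                    total += r
            return total / (K * steps)

        print(run(K=4, steps=5000, shared=True), run(K=4, steps=5000, shared=False))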

  13. Application of fuzzy logic-neural network based reinforcement learning to proximity and docking operations: Translational controller results

    Science.gov (United States)

    Jani, Yashvant

    1992-01-01

    The reinforcement learning techniques developed at Ames Research Center are being applied to proximity and docking operations using the Shuttle and Solar Maximum Mission (SMM) satellite simulation. In utilizing these fuzzy learning techniques, we also use the Approximate Reasoning based Intelligent Control (ARIC) architecture, and so we use the two terms interchangeably to refer to the same thing. This activity is carried out in the Software Technology Laboratory utilizing the Orbital Operations Simulator (OOS). This report is deliverable D3 in our project activity and provides the test results of the fuzzy learning translational controller. The report is organized in six sections. Based on our experience and analysis with the attitude controller, we have modified the basic configuration of the reinforcement learning algorithm in ARIC as described in section 2. The shuttle translational controller and its implementation in the fuzzy learning architecture are described in section 3. Two test cases that we have performed are described in section 4. Our results and conclusions are discussed in section 5, and section 6 provides future plans and a summary for the project.

  14. Variable Admittance Control Based on Fuzzy Reinforcement Learning for Minimally Invasive Surgery Manipulator

    OpenAIRE

    Du, Zhijiang; Wang, Wei; Yan, Zhiyuan; Dong, Wei; Wang, Weidong

    2017-01-01

    In order to get natural and intuitive physical interaction in the pose adjustment of the minimally invasive surgery manipulator, a hybrid variable admittance model based on Fuzzy Sarsa(λ)-learning is proposed in this paper. The proposed model provides continuous variable virtual damping to the admittance controller to respond to human intentions, and it effectively enhances the comfort level during the task execution by modifying the generated virtual damping dynamically. A fuzzy partition de...

  15. A reinforcement learning trained fuzzy neural network controller for maintaining wireless communication connections in multi-robot systems

    Science.gov (United States)

    Zhong, Xu; Zhou, Yu

    2014-05-01

    This paper presents a decentralized multi-robot motion control strategy to facilitate a multi-robot system, comprised of collaborative mobile robots coordinated through wireless communications, to form and maintain desired wireless communication coverage in a realistic environment with unstable wireless signaling condition. A fuzzy neural network controller is proposed for each robot to maintain the wireless link quality with its neighbors. The controller is trained through reinforcement learning to establish the relationship between the wireless link quality and robot motion decision, via consecutive interactions between the controller and environment. The tuned fuzzy neural network controller is applied to a multi-robot deployment process to form and maintain desired wireless communication coverage. The effectiveness of the proposed control scheme is verified through simulations under different wireless signal propagation conditions.

  16. Variable Admittance Control Based on Fuzzy Reinforcement Learning for Minimally Invasive Surgery Manipulator.

    Science.gov (United States)

    Du, Zhijiang; Wang, Wei; Yan, Zhiyuan; Dong, Wei; Wang, Weidong

    2017-04-12

    In order to get natural and intuitive physical interaction in the pose adjustment of the minimally invasive surgery manipulator, a hybrid variable admittance model based on Fuzzy Sarsa(λ)-learning is proposed in this paper. The proposed model provides continuous variable virtual damping to the admittance controller to respond to human intentions, and it effectively enhances the comfort level during the task execution by modifying the generated virtual damping dynamically. A fuzzy partition defined over the state space is used to capture the characteristics of the operator in physical human-robot interaction. For the purpose of maximizing the performance index in the long run, according to the identification of the current state input, the virtual damping compensations are determined by a trained strategy which can be learned through the experience generated from interaction with humans, and the influence caused by humans and the changing dynamics in the robot are also considered in the learning process. To evaluate the performance of the proposed model, some comparative experiments in joint space are conducted on our experimental minimally invasive surgical manipulator.
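
    A rough sketch of what Sarsa(λ)-learning over a fuzzy state partition can look like is given below: membership degrees of triangular fuzzy sets act as features, eligibility traces spread the temporal-difference error over recently active sets, and the action index stands in for a discrete damping compensation. The membership functions, action set, and constants are assumptions of this sketch rather than the authors' controller.

        import numpy as np

        # Sketch of Sarsa(lambda) over a fuzzy state partition; illustrative only.
        n_sets, n_actions = 7, 5                    # fuzzy sets over the state, candidate dampings
        q = np.zeros((n_sets, n_actions))
        e = np.zeros_like(q)                        # eligibility traces
        alpha, gamma, lam, epsilon = 0.1, 0.9, 0.8, 0.1
        centers = np.linspace(-1.0, 1.0, n_sets)

        def membership(x, width=0.4):
            mu = np.maximum(0.0, 1.0 - np.abs(x - centers) / width)  # triangular sets
            return mu / (mu.sum() + 1e-12)

        def choose(mu):
            if np.random.rand() < epsilon:
                return np.random.randint(n_actions)
            return int((mu @ q).argmax())           # action scored by fuzzy-weighted q

        def sarsa_lambda_step(x, a, reward, x_next, a_next):
            global e
            mu, mu_next = membership(x), membership(x_next)
            td_error = reward + gamma * float(mu_next @ q[:, a_next]) - float(mu @ q[:, a])
            e *= gamma * lam
            e[:, a] += mu                           # accumulate traces on active sets
            q[:] += alpha * td_error * e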

  17. Refining fuzzy logic controllers with machine learning

    Science.gov (United States)

    Berenji, Hamid R.

    1994-01-01

    In this paper, we describe the GARIC (Generalized Approximate Reasoning-Based Intelligent Control) architecture, which learns from its past performance and modifies the labels in the fuzzy rules to improve performance. It uses fuzzy reinforcement learning which is a hybrid method of fuzzy logic and reinforcement learning. This technology can simplify and automate the application of fuzzy logic control to a variety of systems. GARIC has been applied in simulation studies of the Space Shuttle rendezvous and docking experiments. It has the potential of being applied in other aerospace systems as well as in consumer products such as appliances, cameras, and cars.

  18. Partial Planning Reinforcement Learning

    Science.gov (United States)

    2012-08-31

    This project explored several problems in the areas of reinforcement learning, probabilistic planning, and transfer learning. In particular, it studied Bayesian optimization for model-based and model-free reinforcement learning, and transfer in the context of model-free reinforcement learning based on …

  19. Variable Resolution Reinforcement Learning.

    Science.gov (United States)

    1995-04-01

    Can reinforcement learning ever become a practical method for real control problems? This paper begins by reviewing three reinforcement learning algorithms … reinforcement learning. In addition to exploring state space and developing a control policy to achieve a task, partigame also learns a kd-tree partitioning of …

  20. Reinforcement-Based Fuzzy Neural Network Control with Automatic Rule Generation

    Institute of Scientific and Technical Information of China (English)

    1999-01-01

    A reinforcement-based fuzzy neural network control with automatic rule generation (RBFNNC) is proposed. A set of optimized fuzzy control rules can be automatically generated through reinforcement learning based on the state variables of the object system. RBFNNC was applied to a cart-pole balancing system, and simulation results show significant improvements in the rule generation.

  1. Application of fuzzy logic-neural network based reinforcement learning to proximity and docking operations: Special approach/docking testcase results

    Science.gov (United States)

    Jani, Yashvant

    1993-01-01

    As part of the RICIS project, the reinforcement learning techniques developed at Ames Research Center are being applied to proximity and docking operations using the Shuttle and Solar Maximum Mission (SMM) satellite simulation. In utilizing these fuzzy learning techniques, we use the Approximate Reasoning based Intelligent Control (ARIC) architecture, and so we use these two terms interchangeably to refer to the same thing. This activity is carried out in the Software Technology Laboratory utilizing the Orbital Operations Simulator (OOS) and programming/testing support from other contractor personnel. This report is the final deliverable D4 in our milestones and project activity. It provides the test results for the special testcase of the approach/docking scenario for the shuttle and the SMM satellite. Based on our experience and analysis with the attitude and translational controllers, we have modified the basic configuration of the reinforcement learning algorithm in ARIC. The shuttle translational controller and its implementation in ARIC are described in our deliverable D3. In order to simulate the final approach and docking operations, we have set up this special testcase as described in section 2. The ARIC performance results for these operations are discussed in section 3, and conclusions are provided in section 4 along with a summary of the project.

  2. Reinforcement Learning: A Tutorial.

    Science.gov (United States)

    1997-01-01

    The purpose of this tutorial is to provide an introduction to reinforcement learning (RL) at a level easily understood by students and researchers in … provides a simple example to develop intuition about the underlying dynamic programming mechanism. In Section 2 the parts of a reinforcement learning problem … reinforcement learning algorithms. These include TD(λ) and both the residual and direct forms of value iteration, Q-learning, and advantage learning.

  3. Intelligent multiagent coordination based on reinforcement hierarchical neuro-fuzzy models.

    Science.gov (United States)

    Mendoza, Leonardo Forero; Vellasco, Marley; Figueiredo, Karla

    2014-12-01

    This paper presents the research and development of two hybrid neuro-fuzzy models for the hierarchical coordination of multiple intelligent agents. The main objective of the models is to have multiple agents interact intelligently with each other in complex systems. We developed two new coordination models for intelligent multiagent systems, which integrate the Reinforcement Learning Hierarchical Neuro-Fuzzy model with two proposed coordination mechanisms: the MultiAgent Reinforcement Learning Hierarchical Neuro-Fuzzy with a market-driven coordination mechanism (MA-RL-HNFP-MD) and the MultiAgent Reinforcement Learning Hierarchical Neuro-Fuzzy with graph coordination (MA-RL-HNFP-CG). In order to evaluate the proposed models and verify the contribution of the proposed coordination mechanisms, two multiagent benchmark applications were developed: the pursuit game and robot soccer simulation. The results obtained demonstrate that the proposed coordination mechanisms greatly improve the performance of the multiagent system when compared with other strategies.

  4. Algorithms for Reinforcement Learning

    CERN Document Server

    Szepesvari, Csaba

    2010-01-01

    Reinforcement learning is a learning paradigm concerned with learning to control a system so as to maximize a numerical performance measure that expresses a long-term objective. What distinguishes reinforcement learning from supervised learning is that only partial feedback is given to the learner about the learner's predictions. Further, the predictions may have long term effects through influencing the future state of the controlled system. Thus, time plays a special role. The goal in reinforcement learning is to develop efficient learning algorithms, as well as to understand the algorithms'

  5. Hierarchical Multiagent Reinforcement Learning

    Science.gov (United States)

    2004-01-25

    In this paper, we investigate the use of hierarchical reinforcement learning (HRL) to speed up the acquisition of cooperative multiagent tasks. We introduce a hierarchical multiagent reinforcement learning (RL) framework and propose a hierarchical multiagent RL algorithm called Cooperative HRL. …

  6. Reinforcement learning in scheduling

    Science.gov (United States)

    Dietterich, Tom G.; Ok, Dokyeong; Zhang, Wei; Tadepalli, Prasad

    1994-01-01

    The goal of this research is to apply reinforcement learning methods to real-world problems like scheduling. In this preliminary paper, we show that learning to solve scheduling problems such as the Space Shuttle Payload Processing and the Automatic Guided Vehicle (AGV) scheduling can be usefully studied in the reinforcement learning framework. We discuss some of the special challenges posed by the scheduling domain to these methods and propose some possible solutions we plan to implement.

  7. Episodic reinforcement learning control approach for biped walking

    Directory of Open Access Journals (Sweden)

    Katić Duško

    2012-01-01

    This paper presents a hybrid dynamic control approach to the realization of humanoid biped robotic walking, focusing on policy gradient episodic reinforcement learning with fuzzy evaluative feedback. The proposed controller structure involves two feedback loops: a conventional computed torque controller and an episodic reinforcement learning controller. The reinforcement learning part includes fuzzy information about Zero-Moment-Point errors. Simulation tests using a medium-size 36-DOF humanoid robot MEXONE were performed to demonstrate the effectiveness of our method.

  8. Reinforcement and learning

    NARCIS (Netherlands)

    Servedio, M.R.; Sæther, S.A.; Sætre, G.-P.

    2009-01-01

    Evidence has been accumulating to support the process of reinforcement as a potential mechanism in speciation. In many species, mate choice decisions are influenced by cultural factors, including learned mating preferences (sexual imprinting) or learned mate attraction signals (e.g., bird song). It

  9. Motivated Reinforcement Learning

    CERN Document Server

    Maher, Mary Lou

    2009-01-01

    Motivated learning is a research field in artificial intelligence and cognitive modelling. This book describes how motivated reinforcement learning agents can be used in computer games for the design of non-player characters that can adapt their behaviour in response to unexpected changes in their environment

  10. Reinforcement and learning

    NARCIS (Netherlands)

    Servedio, M.R.; Sæther, S.A.; Sætre, G.-P.

    2009-01-01

    Evidence has been accumulating to support the process of reinforcement as a potential mechanism in speciation. In many species, mate choice decisions are influenced by cultural factors, including learned mating preferences (sexual imprinting) or learned mate attraction signals (e.g., bird song). It

  11. A neural fuzzy controller learning by fuzzy error propagation

    Science.gov (United States)

    Nauck, Detlef; Kruse, Rudolf

    1992-01-01

    In this paper, we describe a procedure to integrate techniques for the adaptation of membership functions in a linguistic variable based fuzzy control environment by using neural network learning principles. This is an extension of our previous work. We solve this problem by defining a fuzzy error that is propagated back through the architecture of our fuzzy controller. According to this fuzzy error and the strength of its antecedent, each fuzzy rule determines its share of the error. Depending on the current state of the controlled system and the control action derived from the conclusion, each rule tunes the membership functions of its antecedent and its conclusion. In this way we obtain an unsupervised learning technique that enables a fuzzy controller to adapt to a control task by knowing only the global state and the fuzzy error.
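
    The following minimal sketch illustrates the flavor of such fuzzy error propagation: a scalar fuzzy error is shared among the rules in proportion to their firing strength, and each rule shifts its antecedent membership-function center accordingly. The triangular memberships and the specific update are assumptions made for illustration, not the procedure of Nauck and Kruse.

        import numpy as np

        # Sketch of distributing a fuzzy error over the rules that fired and shifting
        # their membership-function centers; illustrative assumption only.
        centers = np.linspace(-1.0, 1.0, 5)      # antecedent membership centers
        width = 0.5

        def firing(x):
            return np.maximum(0.0, 1.0 - np.abs(x - centers) / width)

        def propagate_fuzzy_error(x, fuzzy_error, lr=0.05):
            # Each rule takes a share of the error proportional to how strongly it
            # fired, and pulls its membership center toward (or away from) the input.
            global centers
            mu = firing(x)
            share = mu / (mu.sum() + 1e-12)
            centers += lr * fuzzy_error * share * (x - centers)

        propagate_fuzzy_error(x=0.2, fuzzy_error=0.4)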

  12. Learning fuzzy logic control system

    Science.gov (United States)

    Lung, Leung Kam

    1994-01-01

    The performance of the Learning Fuzzy Logic Control System (LFLCS), developed in this thesis, has been evaluated. The Learning Fuzzy Logic Controller (LFLC) learns to control the motor by learning the set of teaching values that are generated by a classical PI controller. It is assumed that the classical PI controller is tuned to minimize the error of a position control system of the D.C. motor. The Learning Fuzzy Logic Controller developed in this thesis is a multi-input single-output network. Training of the Learning Fuzzy Logic Controller is implemented off-line. Upon completion of the training process (using Supervised Learning and Unsupervised Learning), the LFLC replaces the classical PI controller. In this thesis, a closed-loop position control system of a D.C. motor using the LFLC is implemented. The primary focus is on the learning capabilities of the Learning Fuzzy Logic Controller. The learning includes symbolic representation of the Input Linguistic Nodes set and Output Linguistic Nodes set. In addition, we investigate the knowledge-based representation for the network. As part of the design process, we implement a digital computer simulation of the LFLCS. The computer simulation program is written in the 'C' computer language and runs on the DOS platform. The LFLCS designed in this thesis has been developed on an IBM-compatible 486-DX2 66 computer. First, the performance of the Learning Fuzzy Logic Controller is evaluated by comparing the angular shaft position of the D.C. motor controlled by a conventional PI controller and that controlled by the LFLC. Second, the symbolic representation of the LFLC and the knowledge-based representation for the network are investigated by observing the parameters of the Fuzzy Logic membership functions and the links at each layer of the LFLC. While there are some limitations of application with this approach, the result of the simulation shows that the LFLC is able to control the angular shaft position of the …

  13. Manifold Regularized Reinforcement Learning.

    Science.gov (United States)

    Li, Hongliang; Liu, Derong; Wang, Ding

    2017-01-27

    This paper introduces a novel manifold regularized reinforcement learning scheme for continuous Markov decision processes. Smooth feature representations for value function approximation can be automatically learned using the unsupervised manifold regularization method. The learned features are data-driven, and can be adapted to the geometry of the state space. Furthermore, the scheme provides a direct basis representation extension for novel samples during policy learning and control. The performance of the proposed scheme is evaluated on two benchmark control tasks, i.e., the inverted pendulum and the energy storage problem. Simulation results illustrate the concepts of the proposed scheme and show that it can obtain excellent performance.

  14. Reinforcement learning with Marr.

    Science.gov (United States)

    Niv, Yael; Langdon, Angela

    2016-10-01

    To many, the poster child for David Marr's famous three levels of scientific inquiry is reinforcement learning-a computational theory of reward optimization, which readily prescribes algorithmic solutions that evidence striking resemblance to signals found in the brain, suggesting a straightforward neural implementation. Here we review questions that remain open at each level of analysis, concluding that the path forward to their resolution calls for inspiration across levels, rather than a focus on mutual constraints.

  15. Reinforcement Learning Trees.

    Science.gov (United States)

    Zhu, Ruoqing; Zeng, Donglin; Kosorok, Michael R

    In this paper, we introduce a new type of tree-based method, reinforcement learning trees (RLT), which exhibits significantly improved performance over traditional methods such as random forests (Breiman, 2001) under high-dimensional settings. The innovations are three-fold. First, the new method implements reinforcement learning at each selection of a splitting variable during the tree construction processes. By splitting on the variable that brings the greatest future improvement in later splits, rather than choosing the one with largest marginal effect from the immediate split, the constructed tree utilizes the available samples in a more efficient way. Moreover, such an approach enables linear combination cuts at little extra computational cost. Second, we propose a variable muting procedure that progressively eliminates noise variables during the construction of each individual tree. The muting procedure also takes advantage of reinforcement learning and prevents noise variables from being considered in the search for splitting rules, so that towards terminal nodes, where the sample size is small, the splitting rules are still constructed from only strong variables. Last, we investigate asymptotic properties of the proposed method under basic assumptions and discuss rationale in general settings.

  16. Quantum reinforcement learning.

    Science.gov (United States)

    Dong, Daoyi; Chen, Chunlin; Li, Hanxiong; Tarn, Tzyh-Jong

    2008-10-01

    The key approaches for machine learning, particularly learning in unknown probabilistic environments, are new representations and computation mechanisms. In this paper, a novel quantum reinforcement learning (QRL) method is proposed by combining quantum theory and reinforcement learning (RL). Inspired by the state superposition principle and quantum parallelism, a framework of a value-updating algorithm is introduced. The state (action) in traditional RL is identified as the eigen state (eigen action) in QRL. The state (action) set can be represented with a quantum superposition state, and the eigen state (eigen action) can be obtained by randomly observing the simulated quantum state according to the collapse postulate of quantum measurement. The probability of the eigen action is determined by the probability amplitude, which is updated in parallel according to rewards. Some related characteristics of QRL such as convergence, optimality, and balancing between exploration and exploitation are also analyzed, which shows that this approach makes a good trade-off between exploration and exploitation using the probability amplitude and can speed up learning through quantum parallelism. To evaluate the performance and practicability of QRL, several simulated experiments are given, and the results demonstrate the effectiveness and superiority of the QRL algorithm for some complex problems. This paper is also an effective exploration of the application of quantum computation to artificial intelligence.
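
    The amplitude-updating idea summarized above can be imitated classically in a few lines: keep a unit-norm vector of action amplitudes, sample an "eigen action" with probability equal to the squared amplitude, and reinforce the amplitude of rewarded actions before renormalizing. This is only a conceptual illustration, not the QRL algorithm of the paper.

        import numpy as np

        # Classical simulation of amplitude-based action selection and reinforcement;
        # a conceptual sketch, not the paper's QRL method.
        n_actions = 4
        amplitudes = np.ones(n_actions) / np.sqrt(n_actions)   # uniform superposition
        k = 0.05                                                # reinforcement gain

        def observe():
            probs = amplitudes ** 2
            return int(np.random.choice(n_actions, p=probs / probs.sum()))

        def reinforce(action, reward):
            global amplitudes
            amplitudes[action] += k * reward                    # strengthen rewarded action
            amplitudes = np.clip(amplitudes, 1e-6, None)
            amplitudes /= np.linalg.norm(amplitudes)            # keep a unit-norm state

        a = observe()
        reinforce(a, reward=1.0)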

  17. Deep Reinforcement Learning: An Overview

    OpenAIRE

    Li, Yuxi

    2017-01-01

    We give an overview of recent exciting achievements of deep reinforcement learning (RL). We start with background of deep learning and reinforcement learning, as well as introduction of testbeds. Next we discuss Deep Q-Network (DQN) and its extensions, asynchronous methods, policy optimization, reward, and planning. After that, we talk about attention and memory, unsupervised learning, and learning to learn. Then we discuss various applications of RL, including games, in particular, AlphaGo, ...

  18. Reinforcement learning for energy conservation and comfort in buildings

    Energy Technology Data Exchange (ETDEWEB)

    Dalamagkidis, K. [Computer Science and Engineering Department, University of South Florida, Tampa, FL (United States)]; Kolokotsa, D. [Technical Educational Institute of Crete, Department of Natural Resources and Environment, Chania, Crete (Greece)]; Kalaitzakis, K.; Stavrakakis, G.S. [Technical University of Crete, Chania, Crete (Greece)]

    2007-07-15

    This paper deals with the issue of achieving comfort in buildings with minimal energy consumption. Specifically a reinforcement learning controller is developed and simulated using the Matlab/Simulink environment. The reinforcement learning signal used is a function of the thermal comfort of the building occupants, the indoor air quality and the energy consumption. This controller is then compared with a traditional on/off controller, as well as a Fuzzy-PD controller. The results show that, even after a couple of simulated years of training, the reinforcement learning controller has equivalent or better performance when compared to the other controllers. (author)
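
    A hedged illustration of how such a composite reinforcement signal might be assembled is shown below, penalizing thermal discomfort (e.g. the PMV index), indoor air quality (CO2 above a setpoint) and energy use. The weights, scaling and setpoint are assumptions of the sketch, not the values used in the paper.

        # Illustrative composition of a comfort/IAQ/energy reinforcement signal.
        def comfort_energy_reward(pmv, co2_ppm, energy_kwh,
                                  w_comfort=1.0, w_iaq=0.5, w_energy=0.2,
                                  co2_setpoint=800.0):
            discomfort = abs(pmv)                                   # |PMV| near 0 is comfortable
            iaq_penalty = max(0.0, co2_ppm - co2_setpoint) / 1000.0 # scaled CO2 excess
            return -(w_comfort * discomfort + w_iaq * iaq_penalty + w_energy * energy_kwh)

        print(comfort_energy_reward(pmv=0.3, co2_ppm=900.0, energy_kwh=1.2))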

  19. [Reinforcement learning by striatum].

    Science.gov (United States)

    Kunisato, Yoshihiko; Okada, Go; Okamoto, Yasumasa

    2009-04-01

    Recently, computational models of reinforcement learning have been applied to the analysis of neuroimaging data. It has been clarified that the striatum plays a key role in decision making. We review reinforcement learning theory and the biological structures, such as brain regions and neuromodulator signals, associated with reinforcement learning. We also investigated the function of the striatum and the neurotransmitter serotonin in reward prediction. We first studied the brain mechanisms for reward prediction at different time scales. Our experiment on the striatum showed that the ventroanterior regions are involved in predicting immediate rewards and the dorsoposterior regions are involved in predicting future rewards. Further, we investigated whether serotonin regulates reward selection and whether striatal functions are specialized for reward prediction at different time scales. To this end, we regulated the dietary intake of tryptophan, a precursor of serotonin. Our experiment showed that the activity of the ventral part of the striatum was correlated with reward prediction at shorter time scales, and this activity was stronger at low serotonin levels. By contrast, the activity of the dorsal part of the striatum was correlated with reward prediction at longer time scales, and this activity was stronger at high serotonin levels. Further, a higher proportion of small-reward choices, together with a higher rate of discounting of delayed rewards, was observed in the low-serotonin condition than in the control and high-serotonin conditions. Further examination is required in the future to assess the relation between the disturbance of reward prediction caused by low serotonin and mental disorders related to serotonin, such as depression.

  20. Reinforcement Learning Through Gradient Descent

    Science.gov (United States)

    1999-05-14

    Reinforcement learning is often done using parameterized function approximators to store value functions. Algorithms are typically developed for … practice of existing types of algorithms, the gradient descent approach makes it possible to create entirely new classes of reinforcement learning algorithms.

  1. Reinforcement Learning Control Using Fuzzy Adaptive Critic (采用模糊自适应评价的增强式学习控制)

    Institute of Scientific and Technical Information of China (English)

    王直杰; 方建安; 邵世煌

    2000-01-01

    This paper proposes a reinforcement learning control system (FLAC/ASN) based on a fuzzy adaptive critic (FLAC). FLAC represents the learned knowledge with fuzzy rules, so the experience of experts can be incorporated naturally. FLAC learns by the temporal difference method. The action selection network (ASN) is a multilayer feedforward network. Simulation results show that FLAC/ASN has very good learning performance.

  2. Machining analysis of natural fibre reinforced composites using fuzzy logic

    Science.gov (United States)

    Balasubramanian, K.; Sultan, M. T. H.; Cardona, F.; Rajeswari, N.

    2016-10-01

    In this work, a new composite plate with natural jute fibre as the reinforcement and isophthalic polyester as the resin was manufactured and subjected to a series of end milling operations by changing three input factors, namely speed, feed rate and depth of cut. During each operation, the output responses, namely thrust force and torque, were measured. The responses were analyzed using the Taguchi method to examine the relation between the input factors and output responses, and also to identify the factors with the most influence on the responses. The data were also analyzed using a fuzzy rule model to predict the responses over a range of input factors. The results showed that all three factors chosen have a significant effect on the responses. The fuzzy model data, in comparison with the experimental values, show only a marginal error, and hence the prediction was highly satisfactory.

  3. Evolutionary computation for reinforcement learning

    NARCIS (Netherlands)

    S. Whiteson

    2012-01-01

    Algorithms for evolutionary computation, which simulate the process of natural selection to solve optimization problems, are an effective tool for discovering high-performing reinforcement-learning policies. Because they can automatically find good representations, handle continuous action spaces, a

  4. Reinforcement Learning Algorithms in Humanoid Robotics

    OpenAIRE

    Katic, Dusko; Vukobratovic, Miomir

    2007-01-01

    This study considers optimal solutions for the application of reinforcement learning in humanoid robotics. Humanoid robotics is a very challenging domain for reinforcement learning. Reinforcement learning control algorithms represent a general framework for taking traditional robotics towards true autonomy and versatility. The reinforcement learning paradigm has been successfully implemented for some special types of humanoid robots in the last 10 years. Reinforcement learning is well …

  5. Fuzzy self-learning control for magnetic servo system

    Science.gov (United States)

    Tarn, J. H.; Kuo, L. T.; Juang, K. Y.; Lin, C. E.

    1994-01-01

    It is known that an effective control system is the key condition for successful implementation of high-performance magnetic servo systems. Major issues in designing such control systems are nonlinearity; unmodeled dynamics, such as secondary effects of copper resistance, stray fields, and saturation; and disturbance rejection, since the load effect acts directly on the servo system without transmission elements. One typical approach to designing control systems under these conditions is a special type of nonlinear feedback called gain scheduling. It accommodates linear regulators whose parameters are changed as a function of operating conditions in a preprogrammed way. In this paper, an on-line learning fuzzy control strategy is proposed. To inherit the wealth of linear control design, the relations between linear feedback and fuzzy logic controllers have been established. The exercise of engineering axioms of linear control design is thus transformed into tuning of appropriate fuzzy parameters. Furthermore, fuzzy logic control brings the domain of candidate control laws from linear into nonlinear, and brings new prospects into the design of the local controllers. On the other hand, a self-learning scheme is utilized to automatically tune the fuzzy rule base. It is based on a network learning infrastructure; statistical approximation to assign credit; an animal learning method to update the reinforcement map with a fast learning rate; and a temporal difference predictive scheme to optimize the control laws. Different from supervised and statistical unsupervised learning schemes, the proposed method learns on-line from past experience and information from the process and forms the rule base of an FLC system from randomly assigned initial control rules.

  6. Reinforcement learning and Tourette syndrome.

    Science.gov (United States)

    Palminteri, Stefano; Pessiglione, Mathias

    2013-01-01

    In this chapter, we report the first experimental explorations of reinforcement learning in Tourette syndrome, realized by our team in the last few years. This report will be preceded by an introduction aimed to provide the reader with the state of the art of the knowledge concerning the neural bases of reinforcement learning at the moment of these studies and the scientific rationale beyond them. In short, reinforcement learning is learning by trial and error to maximize rewards and minimize punishments. This decision-making and learning process implicates the dopaminergic system projecting to the frontal cortex-basal ganglia circuits. A large body of evidence suggests that the dysfunction of the same neural systems is implicated in the pathophysiology of Tourette syndrome. Our results show that Tourette condition, as well as the most common pharmacological treatments (dopamine antagonists), affects reinforcement learning performance in these patients. Specifically, the results suggest a deficit in negative reinforcement learning, possibly underpinned by a functional hyperdopaminergia, which could explain the persistence of tics, despite their evident inadaptive (negative) value. This idea, together with the implications of these results in Tourette therapy and the future perspectives, is discussed in Section 4 of this chapter.

  7. Reactive fuzzy controller design by Q-learning for mobile robot navigation

    Institute of Scientific and Technical Information of China (English)

    ZHANG Wen-zhi; LV Tian-sheng

    2005-01-01

    In this paper a learning mechanism for the design of a reactive fuzzy controller for a mobile robot navigating in unknown environments is proposed. The fuzzy logic controller is constructed based on the kinematics model of a real robot. The approach to learning the fuzzy rule base by the relatively simple and computationally inexpensive Q-learning algorithm is described in detail. After analyzing the credit assignment problem caused by rule collisions, a remedy is presented. Furthermore, time-varying parameters are used to increase the learning speed. Simulation results prove that the mechanism can learn fuzzy navigation rules successfully using only a scalar reinforcement signal, and the rule base learned is proved to be correct and feasible on real robot platforms.

  8. Adaptive representations for reinforcement learning

    NARCIS (Netherlands)

    Whiteson, S.

    2010-01-01

    This book presents new algorithms for reinforcement learning, a form of machine learning in which an autonomous agent seeks a control policy for a sequential decision task. Since current methods typically rely on manually designed solution representations, agents that automatically adapt their own representations …

  9. Reinforcement learning in supply chains.

    Science.gov (United States)

    Valluri, Annapurna; North, Michael J; Macal, Charles M

    2009-10-01

    Effective management of supply chains creates value and can strategically position companies. In practice, human beings have been found to be both surprisingly successful and disappointingly inept at managing supply chains. The related fields of cognitive psychology and artificial intelligence have postulated a variety of potential mechanisms to explain this behavior. One of the leading candidates is reinforcement learning. This paper applies agent-based modeling to investigate the comparative behavioral consequences of three simple reinforcement learning algorithms in a multi-stage supply chain. For the first time, our findings show that the specific algorithm that is employed can have dramatic effects on the results obtained. Reinforcement learning is found to be valuable in multi-stage supply chains with several learning agents, as independent agents can learn to coordinate their behavior. However, learning in multi-stage supply chains using these postulated approaches from cognitive psychology and artificial intelligence takes extremely long time periods to achieve stability, which raises questions about their ability to explain behavior in real supply chains. The fact that it takes thousands of periods for agents to learn in this simple multi-agent setting provides new evidence that real-world decision makers are unlikely to be using strict reinforcement learning in practice.
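
    The abstract does not specify which three reinforcement learning algorithms were compared. As one plausible example of a simple reinforcement learner used in cognitive-psychology models, the sketch below implements a propensity-based (Roth-Erev style) ordering agent, with the action set, reward handling and recency parameter chosen purely for illustration.

        import numpy as np

        # Sketch of a propensity-based ordering agent; illustrative assumption,
        # not necessarily one of the algorithms studied in the paper.
        order_options = np.arange(0, 21)                  # candidate order quantities
        propensities = np.ones_like(order_options, dtype=float)

        def choose_order(rng):
            p = propensities / propensities.sum()
            return int(rng.choice(order_options, p=p))

        def reinforce(order, profit, recency=0.1):
            # Roth-Erev style update: decay all propensities, credit the chosen order.
            global propensities
            propensities *= (1.0 - recency)
            propensities[order_options == order] += max(profit, 0.0)

        rng = np.random.default_rng(1)
        q = choose_order(rng)
        reinforce(q, profit=5.0)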

  10. A Fuzzy Approach to Classify Learning Disability

    Directory of Open Access Journals (Sweden)

    Pooja Manghirmalani

    2012-05-01

    The endeavor of this work is to support the special education community in its quest to join the mainstream. The initial segment of the paper gives an exhaustive study of the different mechanisms for diagnosing learning disability. After diagnosis of a learning disability, the further classification into its types, that is, dyslexia, dysgraphia or dyscalculia, is fuzzy. Hence the paper proposes a model based on a Fuzzy Expert System which enables the classification of learning disability into its various types. This expert system facilitates simulating conditions which are otherwise imprecisely defined.

  11. Reinforcement Learning by Value Gradients

    CERN Document Server

    Fairbank, Michael

    2008-01-01

    The concept of the value-gradient is introduced and developed in the context of reinforcement learning. It is shown that by learning the value-gradients exploration or stochastic behaviour is no longer needed to find locally optimal trajectories. This is the main motivation for using value-gradients, and it is argued that learning value-gradients is the actual objective of any value-function learning algorithm for control problems. It is also argued that learning value-gradients is significantly more efficient than learning just the values, and this argument is supported in experiments by efficiency gains of several orders of magnitude, in several problem domains. Once value-gradients are introduced into learning, several analyses become possible. For example, a surprising equivalence between a value-gradient learning algorithm and a policy-gradient learning algorithm is proven, and this provides a robust convergence proof for control problems using a value function with a general function approximator.

  12. Reinforcement Learning for Relational MDPs

    NARCIS (Netherlands)

    van Otterlo, M.; Nowe, A.; Lenaerts, T.; Steenhaut, K.

    2004-01-01

    In this paper we present a new method for reinforcement learning in relational domains. A logical language is employed to abstract over states and actions, thereby decreasing the size of the state-action space significantly. A probabilistic transition model of the abstracted Markov-Decision-Process

  13. Rational and Mechanistic Perspectives on Reinforcement Learning

    Science.gov (United States)

    Chater, Nick

    2009-01-01

    This special issue describes important recent developments in applying reinforcement learning models to capture neural and cognitive function. But reinforcement learning, as a theoretical framework, can apply at two very different levels of description: "mechanistic" and "rational." Reinforcement learning is often viewed in mechanistic terms--as…

  14. Reinforcement Learning via AIXI Approximation

    CERN Document Server

    Veness, Joel; Hutter, Marcus; Silver, David

    2010-01-01

    This paper introduces a principled approach for the design of a scalable general reinforcement learning agent. This approach is based on a direct approximation of AIXI, a Bayesian optimality notion for general reinforcement learning agents. Previously, it has been unclear whether the theory of AIXI could motivate the design of practical algorithms. We answer this hitherto open question in the affirmative, by providing the first computationally feasible approximation to the AIXI agent. To develop our approximation, we introduce a Monte Carlo Tree Search algorithm along with an agent-specific extension of the Context Tree Weighting algorithm. Empirically, we present a set of encouraging results on a number of stochastic, unknown, and partially observable domains.

  15. Adaptive Bases for Reinforcement Learning

    CERN Document Server

    Di Castro, Dotan

    2010-01-01

    We consider the problem of reinforcement learning using function approximation, where the approximating basis can change dynamically while interacting with the environment. A motivation for such an approach is maximizing the value function fitness to the problem faced. Three errors are considered: approximation square error, Bellman residual, and projected Bellman residual. Algorithms under the actor-critic framework are presented, and shown to converge. The advantage of such an adaptive basis is demonstrated in simulations.

  16. Adaptive Bases for Reinforcement Learning

    OpenAIRE

    Di Castro, Dotan; Mannor, Shie

    2010-01-01

    We consider the problem of reinforcement learning using function approximation, where the approximating basis can change dynamically while interacting with the environment. A motivation for such an approach is maximizing the value function fitness to the problem faced. Three errors are considered: approximation square error, Bellman residual, and projected Bellman residual. Algorithms under the actor-critic framework are presented, and shown to converge. The advantage of such an adaptive basis is demonstrated in simulations.

  17. Competitive exception learning using fuzzy frequency distributions

    NARCIS (Netherlands)

    W.-M. van den Bergh (Willem-Max); J.H. van den Berg (Jan)

    2000-01-01

    A competitive exception learning algorithm for finding a non-linear mapping is proposed which puts the emphasis on the discovery of the important exceptions rather than the main rules. To do so, we first cluster the output space using a competitive fuzzy clustering algorithm and derive a …

  18. Reinforced Intrusion Detection Using Pursuit Reinforcement Competitive Learning

    Directory of Open Access Journals (Sweden)

    Indah Yulia Prafitaning Tiyas

    2014-06-01

    Today, information technology is growing rapidly and all information can be obtained much more easily. This raises some new problems; one of them is unauthorized access to the system. We need a reliable network security system that is resistant to a variety of attacks against the system. Therefore, an Intrusion Detection System (IDS) is required to overcome the problem of intrusions. Much research has been done on intrusion detection using classification methods. Classification methods have high precision, but it takes effort to determine an appropriate classification model for the classification problem. In this paper, we propose a new reinforced approach to detect intrusion with on-line clustering using reinforcement learning. Reinforcement learning is a new paradigm in machine learning which involves interaction with the environment. It works with a reward and punishment mechanism to achieve a solution. We apply reinforcement learning to the intrusion detection problem, considering competitive learning using Pursuit Reinforcement Competitive Learning (PRCL). Based on the experimental results, PRCL can detect intrusions in real time with high accuracy (99.816% for DoS, 95.015% for Probe, 94.731% for R2L and 99.373% for U2R) and high speed (44 ms). The proposed approach can help network administrators to detect intrusions, so the computer network security system becomes reliable. Keywords: Intrusion Detection System, On-Line Clustering, Reinforcement Learning, Unsupervised Learning.

  19. Bayesian multitask inverse reinforcement learning

    CERN Document Server

    Dimitrakakis, Christos

    2011-01-01

    We generalise the problem of inverse reinforcement learning to multiple tasks, from a set of demonstrations. Each demonstration may represent one expert trying to solve a different task. Alternatively, one may see each demonstration as given by a different expert trying to solve the same task. Our main technical contribution is to solve the problem by formalising it as statistical preference elicitation, via a number of structured priors whose form captures our biases about the relatedness of different tasks or expert policies. We show that our methodology allows us not only to learn efficiently from multiple experts but also to effectively differentiate between the goals of each. Possible applications include analysing the intrinsic motivations of subjects in behavioural experiments and imitation learning from multiple teachers.

  20. A Reinforcement Learning Approach to Control.

    Science.gov (United States)

    1997-05-31

    acquisition is inherently a partially observable Markov decision problem. This report describes an efficient, scalable reinforcement learning approach to the...deployment of refined intelligent gaze control techniques. This report first lays a theoretical foundation for reinforcement learning. It then introduces...perform well in both high and low SNR ATR environments. Reinforcement learning coupled with history features appears to be both a sound foundation and a practical scalable base for gaze control.

  1. Feature Reinforcement Learning In Practice

    CERN Document Server

    Nguyen, Phuong; Hutter, Marcus

    2011-01-01

    Following a recent surge in using history-based methods for resolving perceptual aliasing in reinforcement learning, we introduce an algorithm based on the feature reinforcement learning framework called PhiMDP. To create a practical algorithm we devise a stochastic search procedure for a class of context trees based on parallel tempering and a specialized proposal distribution. We provide the first empirical evaluation for PhiMDP. Our proposed algorithm achieves superior performance to the classical U-tree algorithm and the recent active-LZ algorithm, and is competitive with MC-AIXI-CTW, which maintains a Bayesian mixture over all context trees up to a chosen depth. We are encouraged by our ability to compete with this sophisticated method using an algorithm that simply picks one single model, and uses Q-learning on the corresponding MDP. Our PhiMDP algorithm is much simpler, yet consumes less time and memory. These results show promise for our future work on attacking more complex and larger problems.

  2. Embedded Incremental Feature Selection for Reinforcement Learning

    Science.gov (United States)

    2012-05-01

    Classical reinforcement learning techniques become impractical in domains with large complex state spaces. The size of a domain’s state space is...require all the provided features. In this paper we present a feature selection algorithm for reinforcement learning called Incremental Feature

  3. Neuroevolutionary reinforcement learning for generalized helicopter control

    NARCIS (Netherlands)

    Koppejan, R.; Whiteson, S.

    2009-01-01

    Helicopter hovering is an important challenge problem in the field of reinforcement learning. This paper considers several neuroevolutionary approaches to discovering robust controllers for a generalized version of the problem used in the 2008 Reinforcement Learning Competition, in which wind in the

  4. Reinforcement Learning State-of-the-Art

    CERN Document Server

    Wiering, Marco

    2012-01-01

    Reinforcement learning encompasses both a science of adaptive behavior of rational beings in uncertain environments and a computational methodology for finding optimal behaviors for challenging problems in control, optimization and adaptive behavior of intelligent agents. As a field, reinforcement learning has progressed tremendously in the past decade. The main goal of this book is to present an up-to-date series of survey articles on the main contemporary sub-fields of reinforcement learning. This includes surveys on partially observable environments, hierarchical task decompositions, relational knowledge representation and predictive state representations. Furthermore, topics such as transfer, evolutionary methods and continuous spaces in reinforcement learning are surveyed. In addition, several chapters review reinforcement learning methods in robotics, in games, and in computational neuroscience. In total seventeen different subfields are presented by mostly young experts in those areas, and together the...

  5. Risk-sensitive reinforcement learning.

    Science.gov (United States)

    Shen, Yun; Tobia, Michael J; Sommer, Tobias; Obermayer, Klaus

    2014-07-01

    We derive a family of risk-sensitive reinforcement learning methods for agents, who face sequential decision-making tasks in uncertain environments. By applying a utility function to the temporal difference (TD) error, nonlinear transformations are effectively applied not only to the received rewards but also to the true transition probabilities of the underlying Markov decision process. When appropriate utility functions are chosen, the agents' behaviors express key features of human behavior as predicted by prospect theory (Kahneman & Tversky, 1979 ), for example, different risk preferences for gains and losses, as well as the shape of subjective probability curves. We derive a risk-sensitive Q-learning algorithm, which is necessary for modeling human behavior when transition probabilities are unknown, and prove its convergence. As a proof of principle for the applicability of the new framework, we apply it to quantify human behavior in a sequential investment task. We find that the risk-sensitive variant provides a significantly better fit to the behavioral data and that it leads to an interpretation of the subject's responses that is indeed consistent with prospect theory. The analysis of simultaneously measured fMRI signals shows a significant correlation of the risk-sensitive TD error with BOLD signal change in the ventral striatum. In addition we find a significant correlation of the risk-sensitive Q-values with neural activity in the striatum, cingulate cortex, and insula that is not present if standard Q-values are used.
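
    As a rough illustration of the core idea, applying a utility function to the TD error before the update, a minimal sketch follows; the utility shape and all parameter values are assumptions, not the paper's.

      import numpy as np

      def utility(td_error, w_gain=0.8, w_loss=1.2):
          """Prospect-theory-flavoured utility: losses weighted more than gains (assumed form)."""
          return w_gain * td_error if td_error >= 0 else w_loss * td_error

      def risk_sensitive_q_update(Q, s, a, r, s_next, lr=0.1, gamma=0.95):
          """Q-learning step in which the TD error is passed through the utility before updating."""
          td_error = r + gamma * np.max(Q[s_next]) - Q[s, a]
          Q[s, a] += lr * utility(td_error)
          return Q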

  6. A neural signature of hierarchical reinforcement learning.

    Science.gov (United States)

    Ribas-Fernandes, José J F; Solway, Alec; Diuk, Carlos; McGuire, Joseph T; Barto, Andrew G; Niv, Yael; Botvinick, Matthew M

    2011-07-28

    Human behavior displays hierarchical structure: simple actions cohere into subtask sequences, which work together to accomplish overall task goals. Although the neural substrates of such hierarchy have been the target of increasing research, they remain poorly understood. We propose that the computations supporting hierarchical behavior may relate to those in hierarchical reinforcement learning (HRL), a machine-learning framework that extends reinforcement-learning mechanisms into hierarchical domains. To test this, we leveraged a distinctive prediction arising from HRL. In ordinary reinforcement learning, reward prediction errors are computed when there is an unanticipated change in the prospects for accomplishing overall task goals. HRL entails that prediction errors should also occur in relation to task subgoals. In three neuroimaging studies we observed neural responses consistent with such subgoal-related reward prediction errors, within structures previously implicated in reinforcement learning. The results reported support the relevance of HRL to the neural processes underlying hierarchical behavior.

  7. Immune Genetic Learning of Fuzzy Cognitive Map

    Institute of Scientific and Technical Information of China (English)

    LIN Chun-mei; HE Yue; TANG Bing-yong

    2006-01-01

    This paper presents a hybrid methodology of automatically constructing fuzzy cognitive map (FCM). The method uses immune genetic algorithm to learn the connection matrix of FCM. In the algorithm, the DNA coding method is used and an immune operator based on immune mechanism is constructed. The characteristics of the system and the experts' knowledge are abstracted as vaccine for restraining the degenerative phenomena during evolution so as to improve the algorithmic efficiency. Finally, an illustrative example is provided, and its results suggest that the method is capable of automatically generating FCM model.
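
    The paper learns the connection matrix; the sketch below only shows the standard fuzzy cognitive map inference step that such a matrix drives, using a hypothetical toy matrix, and leaves the immune genetic search over the matrix unimplemented.

      import numpy as np

      def fcm_step(activations, W):
          """One fuzzy cognitive map inference step: each concept is driven by the weighted
          influence of the others and squashed by a sigmoid (a standard FCM update rule).
          W[j, i] is the influence of concept j on concept i."""
          return 1.0 / (1.0 + np.exp(-(W.T @ activations)))

      # Toy 3-concept map with an illustrative connection matrix; the immune genetic
      # algorithm described in the paper would search over W to fit expert knowledge.
      W = np.array([[ 0.0,  0.6, -0.3],
                    [ 0.2,  0.0,  0.5],
                    [-0.4,  0.1,  0.0]])
      state = np.array([0.5, 0.2, 0.8])
      for _ in range(5):
          state = fcm_step(state, W)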

  8. Reinforcement Learning in Repeated Portfolio Decisions

    OpenAIRE

    Diao, Linan; Rieskamp, Jörg

    2011-01-01

    How do people make investment decisions when they receive outcome feedback? We examined how well the standard mean-variance model and two reinforcement models predict people's portfolio decisions. The basic reinforcement model predicts a learning process that relies solely on the portfolio's overall return, whereas the proposed extended reinforcement model also takes the risk and covariance of the investments into account. The experimental results illustrate that people reacted sensitively to...

  9. Reinforcement learning in market games

    CERN Document Server

    Piotrowski, Edward W; Szczypinska, Anna

    2007-01-01

    Financial markets investors are involved in many games -- they must interact with other agents to achieve their goals. Among them are those directly connected with their activity on markets, but one cannot neglect other aspects that influence human decisions and their performance as investors. Distinguishing all subgames is usually beyond hope and resource consuming. In this paper we study how investors facing many different games gather information and form their decisions despite being unaware of the complete structure of the game. To this end we apply reinforcement learning methods to the Information Theory Model of Markets (ITMM). Following Mengel, we can try to distinguish a class $\Gamma$ of games and possible actions (strategies) $a^{i}_{m_{i}}$ for the $i$-th agent. Any agent divides the whole class of games into subclasses she/he thinks are analogous and therefore adopts the same strategy for a given subclass. The criteria for partitioning are based on profit and costs analysis. The analogy classe...

  10. PENGENALAN KEPRIBADIAN SESEORANG BERDASARKAN SIDIK JARI DENGAN METODE FUZZY LEARNING VECTOR QUANTIZATION DAN FUZZY BACKPROPAGATION

    Directory of Open Access Journals (Sweden)

    I Gede Sujana Eka Putra

    2014-12-01

    Full Text Available Personality can be identified through analysis of fingerprint patterns. Personality assessment generally relies on psychometric tests carried out through a relatively long series of stages; through fingerprint pattern analysis, personality can be identified more efficiently. This study proposes the Fuzzy Learning Vector Quantization (Fuzzy LVQ) classification algorithm, chosen for its fast computation time and high recognition rate, together with the Fuzzy Backpropagation method, which is able to handle non-linear data models. The research stages consist of acquisition and classification. The first stage comprises fingerprint acquisition, feature extraction, training, and pre-classification. In the classification stage, the test fingerprint features are classified using the Fuzzy LVQ algorithm and compared with Fuzzy Backpropagation. Personality is identified from the classified patterns using a dermatoglyphics knowledge base. Performance is measured by matching the pre-classification and classification results. The results show that Fuzzy LVQ achieved the highest matching rate, 93.78%, with a maximum of 100 training epochs at a target error of 10^-6, whereas Fuzzy Backpropagation achieved a highest matching rate of 93.30% with more than 1000 epochs at a target error of 10^-3. This indicates that Fuzzy LVQ performs better than Fuzzy Backpropagation. A respondent survey was conducted to compare the system's personality analysis with the respondents' own personalities, and the survey results show that the system's analysis largely matched the respondents' personalities.

  11. Reinforcement learning improves behaviour from evaluative feedback

    Science.gov (United States)

    Littman, Michael L.

    2015-05-01

    Reinforcement learning is a branch of machine learning concerned with using experience gained through interacting with the world and evaluative feedback to improve a system's ability to make behavioural decisions. It has been called the artificial intelligence problem in a microcosm because learning algorithms must act autonomously to perform well and achieve their goals. Partly driven by the increasing availability of rich data, recent years have seen exciting advances in the theory and practice of reinforcement learning, including developments in fundamental technical areas such as generalization, planning, exploration and empirical methodology, leading to increasing applicability to real-life problems.

  12. On the integration of reinforcement learning and approximate reasoning for control

    Science.gov (United States)

    Berenji, Hamid R.

    1991-01-01

    The author discusses the importance of strengthening the knowledge representation characteristic of reinforcement learning techniques using methods such as approximate reasoning. The ARIC (approximate reasoning-based intelligent control) architecture is an example of such a hybrid approach in which the fuzzy control rules are modified (fine-tuned) using reinforcement learning. ARIC also demonstrates that it is possible to start with an approximately correct control knowledge base and learn to refine this knowledge through further experience. On the other hand, techniques such as the TD (temporal difference) algorithm and Q-learning establish stronger theoretical foundations for their use in adaptive control and also in stability analysis of hybrid reinforcement learning and approximate reasoning-based controllers.

  13. Racial bias shapes social reinforcement learning.

    Science.gov (United States)

    Lindström, Björn; Selbing, Ida; Molapour, Tanaz; Olsson, Andreas

    2014-03-01

    Both emotional facial expressions and markers of racial-group belonging are ubiquitous signals in social interaction, but little is known about how these signals together affect future behavior through learning. To address this issue, we investigated how emotional (threatening or friendly) in-group and out-group faces reinforced behavior in a reinforcement-learning task. We asked whether reinforcement learning would be modulated by intergroup attitudes (i.e., racial bias). The results showed that individual differences in racial bias critically modulated reinforcement learning. As predicted, racial bias was associated with more efficiently learned avoidance of threatening out-group individuals. We used computational modeling analysis to quantitatively delimit the underlying processes affected by social reinforcement. These analyses showed that racial bias modulates the rate at which exposure to threatening out-group individuals is transformed into future avoidance behavior. In concert, these results shed new light on the learning processes underlying social interaction with racial-in-group and out-group individuals.

  14. Learning to trade via direct reinforcement.

    Science.gov (United States)

    Moody, J; Saffell, M

    2001-01-01

    We present methods for optimizing portfolios, asset allocations, and trading systems based on direct reinforcement (DR). In this approach, investment decision-making is viewed as a stochastic control problem, and strategies are discovered directly. We present an adaptive algorithm called recurrent reinforcement learning (RRL) for discovering investment policies. The need to build forecasting models is eliminated, and better trading performance is obtained. The direct reinforcement approach differs from dynamic programming and reinforcement algorithms such as TD-learning and Q-learning, which attempt to estimate a value function for the control problem. We find that the RRL direct reinforcement framework enables a simpler problem representation, avoids Bellman's curse of dimensionality and offers compelling advantages in efficiency. We demonstrate how direct reinforcement can be used to optimize risk-adjusted investment returns (including the differential Sharpe ratio), while accounting for the effects of transaction costs. In extensive simulation work using real financial data, we find that our approach based on RRL produces better trading strategies than systems utilizing Q-learning (a value function method). Real-world applications include an intra-daily currency trader and a monthly asset allocation system for the S&P 500 Stock Index and T-Bills.
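
    A minimal sketch of the recurrent-trader idea follows, under simplifying assumptions: the position depends on the most recent return and the previous position, and the parameters are tuned to improve a plain Sharpe ratio by finite differences, whereas the paper uses the differential Sharpe ratio and exact recursive gradients. All names and values here are illustrative.

      import numpy as np

      def rrl_returns(theta, prices, cost=0.001):
          """Recurrent trader: position F_t in [-1, 1] depends on the last return and F_{t-1};
          the trading return subtracts transaction costs on position changes (simplified form)."""
          r = np.diff(prices) / prices[:-1]
          F_prev, out = 0.0, []
          for t in range(1, len(r)):
              x = np.array([1.0, r[t - 1], F_prev])         # bias, last return, last position
              F = np.tanh(theta @ x)
              out.append(F_prev * r[t] - cost * abs(F - F_prev))
              F_prev = F
          return np.array(out)

      def sharpe(returns):
          return returns.mean() / (returns.std() + 1e-8)

      # Crude finite-difference ascent on the Sharpe ratio of the trading returns.
      theta = np.zeros(3)
      prices = np.cumprod(1 + 0.01 * np.random.default_rng(1).standard_normal(500))
      for _ in range(50):
          base, grad = sharpe(rrl_returns(theta, prices)), np.zeros_like(theta)
          for i in range(3):
              d = np.zeros(3); d[i] = 1e-4
              grad[i] = (sharpe(rrl_returns(theta + d, prices)) - base) / 1e-4
          theta += 0.1 * grad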

  15. Adaptive Educational Software by Applying Reinforcement Learning

    Science.gov (United States)

    Bennane, Abdellah

    2013-01-01

    The introduction of the intelligence in teaching software is the object of this paper. In software elaboration process, one uses some learning techniques in order to adapt the teaching software to characteristics of student. Generally, one uses the artificial intelligence techniques like reinforcement learning, Bayesian network in order to adapt…

  16. A REINFORCEMENT LEARNING MODEL OF PERSUASIVE COMMUNICATION.

    Science.gov (United States)

    WEISS, ROBERT FRANK

    Theoretical and experimental analogies are drawn between learning theory and persuasive communication as an extension of liberalized stimulus response theory. In the first experiment on instrumental conditioning of attitudes, the subjects read an opinion to be learned, followed by a supporting argument assumed to function as a reinforcer. The time…

  17. Using a board game to reinforce learning.

    Science.gov (United States)

    Yoon, Bona; Rodriguez, Leslie; Faselis, Charles J; Liappis, Angelike P

    2014-03-01

    Experiential gaming strategies offer a variation on traditional learning. A board game was used to present synthesized content of fundamental catheter care concepts and reinforce evidence-based practices relevant to nursing. Board games are innovative educational tools that can enhance active learning. Copyright 2014, SLACK Incorporated.

  18. Reinforcement Learning with Bounded Information Loss

    Science.gov (United States)

    Peters, Jan; Mülling, Katharina; Seldin, Yevgeny; Altun, Yasemin

    2011-03-01

    Policy search is a successful approach to reinforcement learning. However, policy improvements often result in the loss of information. Hence, it has been marred by premature convergence and implausible solutions. As first suggested in the context of covariant or natural policy gradients, many of these problems may be addressed by constraining the information loss. In this paper, we continue this line of reasoning and suggest two reinforcement learning methods, i.e., a model-based and a model-free algorithm, that bound the loss in relative entropy while maximizing their return. The resulting methods differ significantly from previous policy gradient approaches and yield an exact update step. They work well on typical reinforcement learning benchmark problems as well as novel evaluations in robotics. We also show a Bayesian bound motivation of this new approach [8].

  19. Evolution with reinforcement learning in negotiation.

    Science.gov (United States)

    Zou, Yi; Zhan, Wenjie; Shao, Yuan

    2014-01-01

    Adaptive behavior depends less on the details of the negotiation process and makes more robust predictions in the long term than in the short term. However, the extant literature on population dynamics for behavior adjustment has only examined the current situation. To offset this limitation, we propose a synergy of an evolutionary algorithm and reinforcement learning to investigate long-term collective performance and strategy evolution. The model adopts reinforcement learning with a tradeoff between historical and current information to make decisions when the strategies of agents evolve through repeated interactions. The results demonstrate that the strategies in populations converge to stable states, and the agents gradually form steady negotiation habits. Agents that adopt reinforcement learning perform better in payoff, fairness, and stability than their counterparts using a classic evolutionary algorithm.

  20. Preference elicitation and inverse reinforcement learning

    CERN Document Server

    Rothkopf, Constantin

    2011-01-01

    We state the problem of inverse reinforcement learning in terms of preference elicitation, resulting in a principled (Bayesian) statistical formulation. This generalises previous work on Bayesian inverse reinforcement learning and allows us to obtain a posterior distribution on the agent's preferences, policy and optionally, the obtained reward sequence, from observations. We examine the relation of the resulting approach to other statistical methods for inverse reinforcement learning via analysis and experimental results. We show that preferences can be determined accurately, even if the observed agent's policy is sub-optimal with respect to its own preferences. In that case, significantly improved policies with respect to the agent's preferences are obtained, compared to both other methods and to the performance of the demonstrated policy.

  1. Accelerating Reinforcement Learning through Implicit Imitation

    CERN Document Server

    Boutilier, C; 10.1613/jair.898

    2011-01-01

    Imitation can be viewed as a means of enhancing learning in multiagent environments. It augments an agent's ability to learn useful behaviors by making intelligent use of the knowledge implicit in behaviors demonstrated by cooperative teachers or other more experienced agents. We propose and study a formal model of implicit imitation that can accelerate reinforcement learning dramatically in certain cases. Roughly, by observing a mentor, a reinforcement-learning agent can extract information about its own capabilities in, and the relative value of, unvisited parts of the state space. We study two specific instantiations of this model, one in which the learning agent and the mentor have identical abilities, and one designed to deal with agents and mentors with different action sets. We illustrate the benefits of implicit imitation by integrating it with prioritized sweeping, and demonstrating improved performance and convergence through observation of single and multiple mentors. Though we make some stringent ...

  2. Reinforcement learning with partitioning function system

    Institute of Scientific and Technical Information of China (English)

    李伟; 叶庆泰; 朱昌明

    2004-01-01

    The size of the state space is the limiting factor in applying reinforcement learning algorithms to practical cases. A reinforcement learning system with a partitioning function (RLWPF) is established, in which the state space is partitioned into several regions. The performance principle of RLWPF is based on a semi-Markov decision process and has general significance: it can be applied to any reinforcement learning problem with a large state space. In RLWPF, the partitioning module dispatches agents into different regions in order to decrease the state space of each agent. This article proves the convergence of the SARSA algorithm for a semi-Markov decision process, ensuring the convergence of RLWPF by analyzing the equivalence of the two value functions in the semi-Markov decision processes before and after partitioning. This article shows that the optimal policy learned by RLWPF is consistent with prior domain knowledge. An elevator group system is devised to decrease the average waiting time of passengers. Four agents control four elevator cars respectively. Based on RLWPF, a partitioning module is developed by defining a uniform round trip time as the partitioning criterion, making the wait time of most passengers more or less identical; elevator cars then only answer hall calls in their own region. Compared with ordinary elevator systems and reinforcement learning systems without a partitioning module, the performance results show the advantage of RLWPF.

  3. Autonomous reinforcement learning with experience replay.

    Science.gov (United States)

    Wawrzyński, Paweł; Tanwani, Ajay Kumar

    2013-05-01

    This paper considers the issues of efficiency and autonomy that are required to make reinforcement learning suitable for real-life control tasks. A real-time reinforcement learning algorithm is presented that repeatedly adjusts the control policy with the use of previously collected samples, and autonomously estimates the appropriate step-sizes for the learning updates. The algorithm is based on the actor-critic with experience replay whose step-sizes are determined on-line by an enhanced fixed point algorithm for on-line neural network training. An experimental study with simulated octopus arm and half-cheetah demonstrates the feasibility of the proposed algorithm to solve difficult learning control problems in an autonomous way within reasonably short time.

  4. Online support vector regression for reinforcement learning

    Institute of Scientific and Technical Information of China (English)

    Yu Zhenhua; Cai Yuanli

    2007-01-01

    The goal in reinforcement learning is to learn the value of state-action pairs in order to maximize the total reward. For continuous states and actions in the real world, the representation of value functions is critical. Furthermore, the samples in value functions are obtained sequentially. Therefore, an online support vector regression (OSVR) is set up as a function approximator to estimate value functions in reinforcement learning. OSVR updates the regression function by analyzing the possible variation of support vector sets after new samples are inserted into the training set. To evaluate the OSVR learning ability, it is applied to the mountain-car task. The simulation results indicate that OSVR has a preferable convergence speed and can solve continuous problems that are infeasible using a lookup table.

  5. Efficient Abstraction Selection in Reinforcement Learning

    NARCIS (Netherlands)

    van Seijen, H.; Whiteson, S.; Kester, L.; Frisch, A.M.; Gregory, P.

    2013-01-01

    This paper introduces a novel approach for abstraction selection in reinforcement learning problems modelled as factored Markov decision processes (MDPs), for which a state is described via a set of state components. In abstraction selection, an agent must choose an abstraction from a set of candida

  6. Tank War Using Online Reinforcement Learning

    DEFF Research Database (Denmark)

    Toftgaard Andersen, Kresten; Zeng, Yifeng; Dahl Christensen, Dennis

    2009-01-01

    Real-Time Strategy (RTS) games provide a challenging platform for implementing online reinforcement learning (RL) techniques in a real application. The computer, as one player, monitors the strategies of its opponents (human or other computers) and then updates its own policy using RL methods. In this paper, we propose...

  7. Geographical Inquiry and Learning Reinforcement Theory.

    Science.gov (United States)

    Davies, Christopher S.

    1983-01-01

    Although instructors have been reluctant to utilize the Keller Plan (a personalized system of instruction), it lends itself to teaching introductory geography. College students found that the routine and frequent reinforcement led to progressive learning. However, it does not lend itself to the study of reflexive or polemical concepts. (IS)

  9. Efficient abstraction selection in reinforcement learning

    NARCIS (Netherlands)

    Seijen, H. van; Whiteson, S.; Kester, L.

    2013-01-01

    This paper introduces a novel approach for abstraction selection in reinforcement learning problems modelled as factored Markov decision processes (MDPs), for which a state is described via a set of state components. In abstraction selection, an agent must choose an abstraction from a set of candida

  10. Relational Reinforcement Learning in Infinite Mario

    CERN Document Server

    Mohan, Shiwali

    2012-01-01

    Relational representations in reinforcement learning allow for the use of structural information, like the presence of objects and relationships between them, in the description of value functions. Through this paper, we show that such representations allow for the inclusion of background knowledge that qualitatively describes a state and can be used to design agents that demonstrate learning behavior in domains with large state and action spaces such as computer games.

  11. Tree-Based Hierarchical Reinforcement Learning

    Science.gov (United States)

    2002-08-01

    algorithms for Reinforcement Learning and Semi-Markov Decision Problem solving (Puterman, 1994; Sutton and Barto, 1998). This chapter formally describes...learn a policy. A policy is a method of controlling an agent in an environment; see Puterman (1994) for a complete taxonomy. In this thesis we...to work with and so we refer the interested reader to Puterman (1994) and instead use the next approximation. The reformulation we use is known as

  12. Rho-learning: a robotics oriented reinforcement learning algorithm

    OpenAIRE

    Porta Pleite, Josep Maria

    2000-01-01

    We present a new reinforcement learning system more suitable for use in robotics than existing ones. Existing reinforcement learning algorithms are not specifically tailored for robotics, so they do not take advantage of the characteristics of robotic perception or of the expected complexity of the tasks that robots are likely to face. In a robot, the information about the environment comes from a set of qualitatively different sensors, and in the main part of tasks small subsets of t...

  13. Reinforcement Learning in Information Searching

    Science.gov (United States)

    Cen, Yonghua; Gan, Liren; Bai, Chen

    2013-01-01

    Introduction: The study seeks to answer two questions: How do university students learn to use correct strategies to conduct scholarly information searches without instructions? and, What are the differences in learning mechanisms between users at different cognitive levels? Method: Two groups of users, thirteen first year undergraduate students…

  14. Reinforcement Learning by Comparing Immediate Reward

    CERN Document Server

    Pandey, Punit; Kumar, Shishir

    2010-01-01

    This paper introduces a reinforcement learning approach that compares immediate rewards, using a variation of the Q-learning algorithm. Unlike conventional Q-learning, the proposed algorithm compares the current reward with the immediate reward of the previous move and acts accordingly. Relative-reward-based Q-learning is an approach towards interactive learning. Q-learning is a model-free reinforcement learning method used to train agents. It is observed that under normal circumstances the algorithm takes more episodes to reach the optimal Q-value, due to normal or sometimes negative rewards. In this new form of the algorithm, agents select only those actions which have a higher immediate reward signal in comparison to the previous one. The contribution of this article is the presentation of a new Q-learning algorithm that maximizes performance and reduces the number of episodes required to reach the optimal Q-value. The effectiveness of the proposed algorithm is simulated in a 20 x20 Grid world dete...
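
    The abstract leaves the exact selection rule open; one possible reading, sketched below as reward shaping with a bonus whenever the immediate reward improves on the previous one, is an assumption rather than the authors' algorithm.

      import numpy as np

      def relative_q_update(Q, s, a, r, r_prev, s_next, lr=0.1, gamma=0.9, bonus=0.05):
          """Q-learning update with a small bonus when the immediate reward improves on the
          previous step's reward -- one plausible reading of 'relative reward based' Q-learning."""
          shaped_r = r + (bonus if r > r_prev else 0.0)
          Q[s, a] += lr * (shaped_r + gamma * np.max(Q[s_next]) - Q[s, a])
          return Q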

  15. Reinforcement Learning for a New Piano Mover

    Directory of Open Access Journals (Sweden)

    Yuko Ishiwaka

    2005-08-01

    Full Text Available We attempt to achieve cooperative behavior of autonomous decentralized agents constructed via Q-learning, which is a type of reinforcement learning. As such, in the present paper, we examine the piano mover's problem. We propose a multi-agent architecture that has a training agent, learning agents and an intermediate agent. Learning agents are heterogeneous and can communicate with each other. The movement of an object with three kinds of agent depends on the composition of the actions of the learning agents. By learning its own shape through the learning agents, avoidance of obstacles by the object is expected. We simulate the proposed method in a two-dimensional continuous world. Results obtained in the present investigation reveal the effectiveness of the proposed method.

  17. The Computational Development of Reinforcement Learning during Adolescence

    National Research Council Canada - National Science Library

    Palminteri, Stefano; Kilford, Emma J; Coricelli, Giorgio; Blakemore, Sarah-Jayne

    2016-01-01

    .... Adolescents and adults carried out a novel reinforcement learning paradigm in which participants learned the association between cues and probabilistic outcomes, where the outcomes differed in valence...

  18. Reinforcement Learning Based Artificial Immune Classifier

    Directory of Open Access Journals (Sweden)

    Mehmet Karakose

    2013-01-01

    Full Text Available One of the widely used methods for classification, which is a decision-making process, is artificial immune systems. Artificial immune systems, based on the natural immune system, can be successfully applied to classification, optimization, recognition, and learning in real-world problems. In this study, a reinforcement learning based artificial immune classifier is proposed as a new approach. This approach uses reinforcement learning to find better antibodies with immune operators. The proposed approach offers several advantages over other methods in the literature, such as effectiveness, fewer memory cells, high accuracy, speed, and data adaptability. The performance of the proposed approach is demonstrated by simulation and experimental results using real data in Matlab and on an FPGA. Some benchmark data and remote image data are used for the experimental results. Comparative results with a supervised/unsupervised artificial immune system, a negative selection classifier, and a resource limited artificial immune classifier are given to demonstrate the effectiveness of the proposed new method.

  19. Multi-Agent Reinforcement Learning and Adaptive Neural Networks.

    Science.gov (United States)

    2007-11-02

    learning method. The objective was to study the utility of reinforcement learning as an approach to complex decentralized control problems. The major...accomplishment was a detailed study of multi-agent reinforcement learning applied to a large-scale decentralized stochastic control problem. This study...included a very successful demonstration that a multi-agent reinforcement learning system using neural networks could learn high-performance

  20. Performance evaluation of corrosion-affected reinforced concrete bridge girders using Markov chains with fuzzy states

    Indian Academy of Sciences (India)

    M B ANOOP; K BALAJI RAO

    2016-08-01

    A methodology for performance evaluation of reinforced concrete bridge girders in corrosive environments is proposed. The methodology uses the concept of performability and considers both serviceability- and ultimate-limit states. The serviceability limit states are defined based on the degree of cracking (characterized by crack width) in the girder due to chloride-induced corrosion of reinforcement, and the ultimate limit states are defined based on the flexural load carrying capacity of the girder (characterized in terms of rating factor using the load and resistance factor rating method). The condition of the bridge girder is specified by the assignment of a condition state from a set of predefined condition states. Generally, the classification of condition states is linguistic, while the condition states are considered to be mutually exclusive and collectively exhaustive. In the present study, the condition states of the bridge girder are also represented by fuzzy sets to account for the ambiguities arising from the linguistic classification of condition states. A non-homogeneous Markov chain (MC) model is used for modeling the condition state evolution of the bridge girder with time. The usefulness of the proposed methodology is demonstrated through a case study of a severely distressed beam of the Rocky Point Viaduct. The results obtained using the proposed approach are compared with those obtained using a conventional MC model. It is noted that the use of an MC with fuzzy states leads to conservative decision making for the problem considered in the case study.
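
    A toy sketch of the underlying computation, propagating condition-state probabilities through a non-homogeneous Markov chain and then reading off fuzzy memberships, is given below; every matrix and membership value is illustrative, not taken from the paper.

      import numpy as np

      def transition_matrix(year):
          """Hypothetical non-homogeneous deterioration model: staying put gets less likely over time."""
          p_stay = max(0.95 - 0.01 * year, 0.5)
          P = np.zeros((4, 4))
          for i in range(3):
              P[i, i], P[i, i + 1] = p_stay, 1 - p_stay
          P[3, 3] = 1.0                               # worst condition state is absorbing
          return P

      state_probs = np.array([1.0, 0.0, 0.0, 0.0])    # girder starts in the best condition state
      for year in range(25):
          state_probs = state_probs @ transition_matrix(year)

      # Fuzzy condition states: each crisp state belongs partially to linguistic grades.
      membership = np.array([[1.0, 0.2, 0.0, 0.0],    # "good"
                             [0.0, 0.8, 0.3, 0.0],    # "fair"
                             [0.0, 0.0, 0.7, 0.4],    # "poor"
                             [0.0, 0.0, 0.0, 0.9]])   # "critical"
      fuzzy_condition = membership @ state_probs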

  1. Reinforcement Learning and Savings Behavior

    OpenAIRE

    Laibson, David I.; Choi, James J.; Madrian, Brigitte; Metrick, Andrew

    2007-01-01

    We show that individual investors over-extrapolate from their personal experience when making savings decisions. Investors who experience particularly rewarding outcomes from saving in their 401(k)—a high average and/or low variance return—increase their 401(k) savings rate more than investors who have less rewarding experiences with saving. This finding is not driven by aggregate time-series shocks, income effects, rational learning about investing skill, investor fixed effects, or time-...

  2. Reinforcement Learning and Savings Behavior.

    Science.gov (United States)

    Choi, James J; Laibson, David; Madrian, Brigitte C; Metrick, Andrew

    2009-12-01

    We show that individual investors over-extrapolate from their personal experience when making savings decisions. Investors who experience particularly rewarding outcomes from saving in their 401(k)-a high average and/or low variance return-increase their 401(k) savings rate more than investors who have less rewarding experiences with saving. This finding is not driven by aggregate time-series shocks, income effects, rational learning about investing skill, investor fixed effects, or time-varying investor-level heterogeneity that is correlated with portfolio allocations to stock, bond, and cash asset classes. We discuss implications for the equity premium puzzle and interventions aimed at improving household financial outcomes.

  3. Optimal chaos control through reinforcement learning.

    Science.gov (United States)

    Gadaleta, Sabino; Dangelmayr, Gerhard

    1999-09-01

    A general purpose chaos control algorithm based on reinforcement learning is introduced and applied to the stabilization of unstable periodic orbits in various chaotic systems and to the targeting problem. The algorithm does not require any information about the dynamical system nor about the location of periodic orbits. Numerical tests demonstrate good and fast performance under noisy and nonstationary conditions. (c) 1999 American Institute of Physics.

  4. Reinforcement learning in signaling game

    CERN Document Server

    Hu, Yilei; Tarrès, Pierre

    2011-01-01

    We consider a signaling game originally introduced by Skyrms, which models how two interacting players learn to signal each other and thus create a common language. The first rigorous analysis was done by Argiento, Pemantle, Skyrms and Volkov (2009) with 2 states, 2 signals and 2 acts. We study the case of M_1 states, M_2 signals and M_1 acts for general M_1, M_2. We prove that the expected payoff increases on average and thus converges a.s., and that a limit bipartite graph emerges, such that no signal-state correspondence is associated with both a synonym and an informational bottleneck. Finally, we show that any graph correspondence with the above property is a limit configuration with positive probability.
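
    A minimal sketch of urn-based (Roth-Erev style) reinforcement in such a signaling game follows; the success-only reinforcement scheme and the initial urn contents are simplifying assumptions used to illustrate the dynamics, not a reproduction of the paper's analysis.

      import numpy as np

      rng = np.random.default_rng(0)
      M1, M2 = 3, 3                                    # states/acts and signals
      sender_urns = np.ones((M1, M2))                  # ball counts: state -> signal
      receiver_urns = np.ones((M2, M1))                # ball counts: signal -> act

      def play_round():
          """One round: nature picks a state, sender signals, receiver acts; a correct act
          reinforces both choices by adding one ball to the corresponding urns."""
          state = rng.integers(M1)
          signal = rng.choice(M2, p=sender_urns[state] / sender_urns[state].sum())
          act = rng.choice(M1, p=receiver_urns[signal] / receiver_urns[signal].sum())
          if act == state:
              sender_urns[state, signal] += 1
              receiver_urns[signal, act] += 1

      for _ in range(20000):
          play_round()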

  5. Online least-squares policy iteration for reinforcement learning control

    OpenAIRE

    2010-01-01

    Reinforcement learning is a promising paradigm for learning optimal control. We consider policy iteration (PI) algorithms for reinforcement learning, which iteratively evaluate and improve control policies. State-of-the-art, least-squares techniques for policy evaluation are sample-efficient and have relaxed convergence requirements. However, they are typically used in offline PI, whereas a central goal of reinforcement learning is to develop online algorithms. Therefore, we propose an online...

  6. APPLICATION OF HIERARCHICAL REINFORCEMENT LEARNING IN ENGINEERING DOMAIN

    Institute of Scientific and Technical Information of China (English)

    WEI LI; Qingtai YE; Changming ZHU

    2005-01-01

    The slow convergence rate of reinforcement learning algorithms limits their wider application. In engineering domains, hierarchical reinforcement learning is developed to perform actions temporally according to prior knowledge. This system can converge fast due to the reduced state space. A test on elevator group control shows the power of the new system. Two conventional group control algorithms are adopted as prior knowledge. Performance indicates that hierarchical reinforcement learning can reduce the learning time dramatically.

  7. Professional Learning: A Fuzzy Logic-Based Modelling Approach

    Science.gov (United States)

    Gravani, M. N.; Hadjileontiadou, S. J.; Nikolaidou, G. N.; Hadjileontiadis, L. J.

    2007-01-01

    Studies have suggested that professional learning is influenced by two key parameters, i.e., climate and planning, and their associated variables (mutual respect, collaboration, mutual trust, supportiveness, openness). In this paper, we applied analysis of the relationships between the proposed quantitative, fuzzy logic-based model and a series of…

  8. Smart damping of laminated fuzzy fiber reinforced composite shells using 1-3 piezoelectric composites

    Science.gov (United States)

    Kundalwal, S. I.; Kumar, R. Suresh; Ray, M. C.

    2013-10-01

    This paper deals with the investigation of active constrained layer damping (ACLD) of smart laminated continuous fuzzy fiber reinforced composite (FFRC) shells. The distinct constructional feature of a novel FFRC is that the uniformly spaced short carbon nanotubes (CNTs) are radially grown on the circumferential surfaces of the continuous carbon fiber reinforcements. The constraining layer of the ACLD treatment is considered to be made of vertically/obliquely reinforced 1-3 piezoelectric composite materials. A finite element (FE) model is developed for the laminated FFRC shells integrated with the two patches of the ACLD treatment to investigate the damping characteristics of the laminated FFRC shells. The effect of variation of the orientation angle of the piezoelectric fibers on the damping characteristics of the laminated FFRC shells has been studied when the piezoelectric fibers are coplanar with either of the two mutually orthogonal vertical planes of the piezoelectric composite layer. It is revealed that radial growth of CNTs on the circumferential surfaces of the carbon fibers enhances the attenuation of the amplitude of vibrations and the natural frequencies of the laminated FFRC shells over those of laminated base composite shells without CNTs.

  9. Adaptive Educational Software by Applying Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Abdellah BENNANE

    2013-04-01

    Full Text Available The introduction of intelligence into teaching software is the object of this paper. In the software elaboration process, one uses some learning techniques in order to adapt the teaching software to the characteristics of the student. Generally, one uses artificial intelligence techniques like reinforcement learning and Bayesian networks in order to adapt the system to the internal and external conditions of the environment, and to allow this system to interact efficiently with its potential users. The intention is to automate and manage the pedagogical process of the tutoring system, in particular the selection of the content and manner of pedagogic situations. Researchers create a pedagogic learning agent that simplifies the manual logic and supports the progress and management of the teaching process (tutor-learner) through natural interactions.

  10. Real-Time Scheduling via Reinforcement Learning

    CERN Document Server

    Glaubius, Robert; Gill, Christopher; Smart, William D

    2012-01-01

    Cyber-physical systems, such as mobile robots, must respond adaptively to dynamic operating conditions. Effective operation of these systems requires that sensing and actuation tasks are performed in a timely manner. Additionally, execution of mission specific tasks such as imaging a room must be balanced against the need to perform more general tasks such as obstacle avoidance. This problem has been addressed by maintaining relative utilization of shared resources among tasks near a user-specified target level. Producing optimal scheduling strategies requires complete prior knowledge of task behavior, which is unlikely to be available in practice. Instead, suitable scheduling strategies must be learned online through interaction with the system. We consider the sample complexity of reinforcement learning in this domain, and demonstrate that while the problem state space is countably infinite, we may leverage the problem's structure to guarantee efficient learning.

  11. Multi-Agent Reinforcement Learning Algorithm Based on Action Prediction

    Institute of Scientific and Technical Information of China (English)

    TONG Liang; LU Ji-lian

    2006-01-01

    Multi-agent reinforcement learning algorithms are studied. A prediction-based multi-agent reinforcement learning algorithm is presented for a multi-robot cooperation task. A multi-robot cooperation experiment based on a multi-agent inverted pendulum is conducted to test the efficiency of the new algorithm, and the experimental results show that the new algorithm can achieve the cooperation strategy much faster than the primitive multi-agent reinforcement learning algorithm.

  12. Nonconvergence to Saddle Boundary Points under Perturbed Reinforcement Learning

    Science.gov (United States)

    2012-12-07

    This paper presents a novel reinforcement learning algorithm and provides conditions for global convergence to Nash equilibria. For several classes...of reinforcement learning schemes, including the ones proposed here, excluding convergence to action profiles which are not Nash equilibria may not be...perturbed reinforcement learning scheme where the strategy of each agent is perturbed by a strategy-dependent perturbation (or mutations) function

  13. Stochastic Scheduling and Planning Using Reinforcement Learning

    Science.gov (United States)

    2007-11-02

    reinforcement learning (RL) methods to large-scale optimization problems relevant to Air Force operations planning, scheduling, and maintenance. The objectives of this project were to: (1) investigate the utility of RL on large-scale logistics problems; (2) extend existing RL theory and practice to enhance the ease of application and the performance of RL on these problems; and (3) explore new problem formulations in order to take maximal advantage of RL methods. A method using RL to modify local search cost functions was developed and shown to yield significant

  14. Model-Based Reinforcement Learning under Concurrent Schedules of Reinforcement in Rodents

    Science.gov (United States)

    Huh, Namjung; Jo, Suhyun; Kim, Hoseok; Sul, Jung Hoon; Jung, Min Whan

    2009-01-01

    Reinforcement learning theories postulate that actions are chosen to maximize a long-term sum of positive outcomes based on value functions, which are subjective estimates of future rewards. In simple reinforcement learning algorithms, value functions are updated only by trial-and-error, whereas they are updated according to the decision-maker's…

  15. Vision-based reinforcement learning using approximate policy iteration

    OpenAIRE

    2009-01-01

    A major issue for reinforcement learning (RL) applied to robotics is the time required to learn a new skill. While RL has been used to learn mobile robot control in many simulated domains, applications involving learning on real robots are still relatively rare. In this paper, the Least-Squares Policy Iteration (LSPI) reinforcement learning algorithm and a new model-based algorithm Least-Squares Policy Iteration with Prioritized Sweeping (LSPI+), are implemented on a mobile robot to acquir...

  16. Airline Passenger Profiling Based on Fuzzy Deep Machine Learning.

    Science.gov (United States)

    Zheng, Yu-Jun; Sheng, Wei-Guo; Sun, Xing-Ming; Chen, Sheng-Yong

    2016-09-27

    Passenger profiling plays a vital part of commercial aviation security, but classical methods become very inefficient in handling the rapidly increasing amounts of electronic records. This paper proposes a deep learning approach to passenger profiling. The center of our approach is a Pythagorean fuzzy deep Boltzmann machine (PFDBM), whose parameters are expressed by Pythagorean fuzzy numbers such that each neuron can learn how a feature affects the production of the correct output from both the positive and negative sides. We propose a hybrid algorithm combining a gradient-based method and an evolutionary algorithm for training the PFDBM. Based on the novel learning model, we develop a deep neural network (DNN) for classifying normal passengers and potential attackers, and further develop an integrated DNN for identifying group attackers whose individual features are insufficient to reveal the abnormality. Experiments on data sets from Air China show that our approach provides much higher learning ability and classification accuracy than existing profilers. It is expected that the fuzzy deep learning approach can be adapted for a variety of complex pattern analysis tasks.
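
    For reference, a Pythagorean fuzzy number assigns a membership degree $\mu$ and a non-membership degree $\nu$ subject to $0 \le \mu^2 + \nu^2 \le 1$, with hesitancy $\pi = \sqrt{1 - \mu^2 - \nu^2}$ (the standard definition, stated here for context rather than quoted from the paper). Relaxing the ordinary fuzzy constraint $\mu + \nu \le 1$ in this way is what allows each PFDBM neuron to weigh a feature from both the positive and the negative side.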

  17. Parametric Return Density Estimation for Reinforcement Learning

    CERN Document Server

    Morimura, Tetsuro; Kashima, Hisashi; Hachiya, Hirotaka; Tanaka, Toshiyuki

    2012-01-01

    Most conventional Reinforcement Learning (RL) algorithms aim to optimize decision-making rules in terms of the expected returns. However, especially for risk management purposes, other risk-sensitive criteria such as the value-at-risk or the expected shortfall are sometimes preferred in real applications. Here, we describe a parametric method for estimating the density of the returns, which allows us to handle various criteria in a unified manner. We first extend the Bellman equation for the conditional expected return to cover a conditional probability density of the returns. Then we derive an extension of the TD-learning algorithm for estimating the return densities in an unknown environment. As test instances, several parametric density estimation algorithms are presented for the Gaussian, Laplace, and skewed Laplace distributions. We show that these algorithms lead to risk-sensitive as well as robust RL paradigms through numerical experiments.
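
    As a sketch of what a parametric return-density TD update can look like in the Gaussian case, the following tracks the first two moments of the return per state; the parameterisation and the moment-matching targets are an illustration, not the paper's derivation.

      import numpy as np

      n_states = 10
      mean = np.zeros(n_states)          # estimated mean of the return Z(s)
      second = np.ones(n_states)         # estimated second moment of the return Z(s)

      def density_td_update(s, r, s_next, lr=0.05, gamma=0.95):
          """Move the Gaussian parameters toward the moments of r + gamma * Z(s_next)."""
          target_mean = r + gamma * mean[s_next]
          target_second = r**2 + 2 * gamma * r * mean[s_next] + gamma**2 * second[s_next]
          mean[s] += lr * (target_mean - mean[s])
          second[s] += lr * (target_second - second[s])
          var = max(second[s] - mean[s]**2, 1e-8)   # recover the variance when needed
          return mean[s], var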

  18. A parallel framework for Bayesian reinforcement learning

    Science.gov (United States)

    Barrett, Enda; Duggan, Jim; Howley, Enda

    2014-01-01

    Solving a finite Markov decision process using techniques from dynamic programming such as value or policy iteration require a complete model of the environmental dynamics. The distribution of rewards, transition probabilities, states and actions all need to be fully observable, discrete and complete. For many problem domains, a complete model containing a full representation of the environmental dynamics may not be readily available. Bayesian reinforcement learning (RL) is a technique devised to make better use of the information observed through learning than simply computing Q-functions. However, this approach can often require extensive experience in order to build up an accurate representation of the true values. To address this issue, this paper proposes a method for parallelising a Bayesian RL technique aimed at reducing the time it takes to approximate the missing model. We demonstrate the technique on learning next state transition probabilities without prior knowledge. The approach is general enough for approximating any probabilistically driven component of the model. The solution involves multiple learning agents learning in parallel on the same task. Agents share probability density estimates amongst each other in an effort to speed up convergence to the true values.
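
    One simple way to realise this kind of sharing is for each agent to keep Dirichlet counts over next-state transitions and to pool fresh observations at synchronisation points, as sketched below; the shapes, prior and merging rule are illustrative assumptions rather than the paper's scheme.

      import numpy as np

      n_agents, n_states, n_actions = 4, 5, 2
      shared = np.ones((n_states, n_actions, n_states))            # shared Dirichlet prior/posterior
      fresh = np.zeros((n_agents, n_states, n_actions, n_states))  # per-agent counts since last merge

      def observe(agent, s, a, s_next):
          fresh[agent, s, a, s_next] += 1.0

      def merge():
          """Pool every agent's new evidence into the shared posterior, then reset the buffers."""
          global shared
          shared = shared + fresh.sum(axis=0)
          fresh[:] = 0.0

      def transition_probs(agent, s, a):
          """An agent's current estimate: shared counts plus its own unsynchronised evidence."""
          c = shared[s, a] + fresh[agent, s, a]
          return c / c.sum()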

  19. Credit assignment during movement reinforcement learning.

    Science.gov (United States)

    Dam, Gregory; Kording, Konrad; Wei, Kunlin

    2013-01-01

    We often need to learn how to move based on a single performance measure that reflects the overall success of our movements. However, movements have many properties, such as their trajectories, speeds and timing of end-points, thus the brain needs to decide which properties of movements should be improved; it needs to solve the credit assignment problem. Currently, little is known about how humans solve credit assignment problems in the context of reinforcement learning. Here we tested how human participants solve such problems during a trajectory-learning task. Without an explicitly-defined target movement, participants made hand reaches and received monetary rewards as feedback on a trial-by-trial basis. The curvature and direction of the attempted reach trajectories determined the monetary rewards received in a manner that can be manipulated experimentally. Based on the history of action-reward pairs, participants quickly solved the credit assignment problem and learned the implicit payoff function. A Bayesian credit-assignment model with built-in forgetting accurately predicts their trial-by-trial learning.

  20. Vicarious reinforcement learning signals when instructing others.

    Science.gov (United States)

    Apps, Matthew A J; Lesage, Elise; Ramnani, Narender

    2015-02-18

    Reinforcement learning (RL) theory posits that learning is driven by discrepancies between the predicted and actual outcomes of actions (prediction errors [PEs]). In social environments, learning is often guided by similar RL mechanisms. For example, teachers monitor the actions of students and provide feedback to them. This feedback evokes PEs in students that guide their learning. We report the first study that investigates the neural mechanisms that underpin RL signals in the brain of a teacher. Neurons in the anterior cingulate cortex (ACC) signal PEs when learning from the outcomes of one's own actions but also signal information when outcomes are received by others. Does a teacher's ACC signal PEs when monitoring a student's learning? Using fMRI, we studied brain activity in human subjects (teachers) as they taught a confederate (student) action-outcome associations by providing positive or negative feedback. We examined activity time-locked to the students' responses, when teachers infer student predictions and know actual outcomes. We fitted a RL-based computational model to the behavior of the student to characterize their learning, and examined whether a teacher's ACC signals when a student's predictions are wrong. In line with our hypothesis, activity in the teacher's ACC covaried with the PE values in the model. Additionally, activity in the teacher's insula and ventromedial prefrontal cortex covaried with the predicted value according to the student. Our findings highlight that the ACC signals PEs vicariously for others' erroneous predictions, when monitoring and instructing their learning. These results suggest that RL mechanisms, processed vicariously, may underpin and facilitate teaching behaviors.

  1. Reinforcement Learning for Ramp Control: An Analysis of Learning Parameters

    Directory of Open Access Journals (Sweden)

    Chao Lu

    2016-08-01

    Full Text Available Reinforcement Learning (RL) has been proposed to deal with ramp control problems under dynamic traffic conditions; however, there is a lack of sufficient research on the behaviour and impacts of different learning parameters. This paper describes a ramp control agent based on the RL mechanism and thoroughly analyzes the influence of three learning parameters, namely the learning rate, the discount rate and the action selection parameter, on the algorithm performance. Two indices for the learning speed and convergence stability were used to measure the algorithm performance, based on which a series of simulation-based experiments were designed and conducted using a macroscopic traffic flow model. Simulation results showed that, compared with the discount rate, the learning rate and action selection parameter had more remarkable impacts on the algorithm performance. Based on the analysis, some suggestions about how to select suitable parameter values that can achieve a superior performance are provided.
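
    For context, the three parameters correspond to the arguments of a textbook epsilon-greedy Q-learning step like the sketch below; the environment interface is hypothetical and the update is standard Q-learning, not the paper's specific ramp-control agent.

      import numpy as np

      def q_learning_episode(env, Q, learning_rate=0.1, discount_rate=0.9, epsilon=0.1,
                             rng=np.random.default_rng()):
          """One episode of epsilon-greedy Q-learning; the three keyword arguments mirror the
          learning rate, discount rate and action-selection parameter studied in the paper.
          `env` is a hypothetical object: reset() returns a state index and step(a) returns
          (next_state, reward, done)."""
          s, done, total = env.reset(), False, 0.0
          while not done:
              if rng.random() < epsilon:                     # action-selection parameter
                  a = int(rng.integers(Q.shape[1]))
              else:
                  a = int(np.argmax(Q[s]))
              s_next, r, done = env.step(a)
              target = r + (0.0 if done else discount_rate * np.max(Q[s_next]))
              Q[s, a] += learning_rate * (target - Q[s, a])  # learning-rate-weighted TD step
              s, total = s_next, total + r
          return total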

  2. Proceedings of the Fifth European workshop on Reinforcement Learning

    NARCIS (Netherlands)

    Wiering, M.A.

    2008-01-01

    The Fifth European Workshop on Reinforcement Learning (EWRL-5) has gathered a wide variety of researchers interested in many different topics in reinforcement learning (RL). First of all, several papers describe RL algorithms for solving POMDPs. In this category, there is a paper by Hartley and Wyat

  3. Punishment Insensitivity and Impaired Reinforcement Learning in Preschoolers

    Science.gov (United States)

    Briggs-Gowan, Margaret J.; Nichols, Sara R.; Voss, Joel; Zobel, Elvira; Carter, Alice S.; McCarthy, Kimberly J.; Pine, Daniel S.; Blair, James; Wakschlag, Lauren S.

    2014-01-01

    Background: Youth and adults with psychopathic traits display disrupted reinforcement learning. Advances in measurement now enable examination of this association in preschoolers. The current study examines relations between reinforcement learning in preschoolers and parent ratings of reduced responsiveness to socialization, conceptualized as a…

  4. Reinforcement Learning in Robotics: Applications and Real-World Challenges

    OpenAIRE

    Petar Kormushev; Sylvain Calinon; Darwin G Caldwell

    2013-01-01

    In robotics, the ultimate goal of reinforcement learning is to endow robots with the ability to learn, improve, adapt and reproduce tasks with dynamically changing constraints based on exploration and autonomous learning. We give a summary of the state-of-the-art of reinforcement learning in the context of robotics, in terms of both algorithms and policy representations. Numerous challenges faced by the policy representation in robotics are identified. Three recent examples for the applicatio...

  5. Multiagent reinforcement learning through merging individually learned value functions

    Institute of Scientific and Technical Information of China (English)

    ZHANG Hua-xiang; HUANG Shang-teng

    2005-01-01

    In cooperative multiagent systems, learning the optimal policies of multiple agents is very difficult. As the numbers of states and actions increase exponentially with the number of agents, their action policies become more intractable. By learning value functions, an agent can learn its optimal action policy for a task. If a task can be decomposed into several subtasks and the agents have learned the optimal value functions for each subtask, this knowledge can help the agents learn the optimal action policies for the whole task when they are acting simultaneously. By merging the agents' independently learned optimal value functions, a novel multiagent online reinforcement learning algorithm, LU-Q, is proposed. By applying a transformation to the individually learned value functions, the constraints on the optimal value functions of each subtask are loosened. In each learning iteration of LU-Q, the agents' joint action set in a state is processed: some actions of that state are pruned from the available action set according to the multiagent value function defined in LU-Q. As the available action set of each state is reduced gradually over the iterations of LU-Q, the convergence of the value functions is accelerated. LU-Q's effectiveness, soundness and convergence are analyzed, and the experimental results show that the learning performance of LU-Q is better than that of standard Q-learning.
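
    As a rough, assumption-laden illustration of the general idea of merging independently learned value functions and pruning joint actions (not the LU-Q algorithm itself), the sketch below sums per-agent Q-values for each joint action and keeps only joint actions whose merged value is within a threshold of the best one; all names, tables and the threshold are hypothetical.

```python
def merge_q(q_tables, state, joint_actions):
    """Sum each agent's independently learned Q-value for a joint action."""
    return {ja: sum(q[(state, a)] for q, a in zip(q_tables, ja)) for ja in joint_actions}

def prune_actions(merged, threshold=0.5):
    """Keep only joint actions whose merged value is close to the best one."""
    best = max(merged.values())
    return [ja for ja, v in merged.items() if v >= best - threshold]

# Hypothetical example: two agents, each with a small learned Q-table.
q1 = {("s", "left"): 0.2, ("s", "right"): 0.9}
q2 = {("s", "up"): 0.7, ("s", "down"): 0.1}
joint = [(a1, a2) for a1 in ("left", "right") for a2 in ("up", "down")]
merged = merge_q([q1, q2], "s", joint)
print(prune_actions(merged))   # joint actions retained for further learning
```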

  6. Fuzzy adaptive learning control network with sigmoid membership function

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    To get simpler operation in the modified fuzzy adaptive learning control network (FALCON) in some engineering applications, a sigmoid nonlinear function is employed as a substitute for the traditional Gaussian membership function. To make the modified FALCON learn more efficiently and stably, a simulated annealing (SA) learning coefficient is introduced into the learning algorithm. First, the basic concepts and main advantages of FALCON were briefly reviewed. Subsequently, the topological structure and node operations were illustrated; the gradient-descent learning algorithm with the SA learning coefficient was derived; and the distinctions between the archetype and the modification were analyzed. Finally, the significance and worth of the modified FALCON were validated by its application to probability prediction of the anode effect in aluminium electrolysis cells.
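
    The substitution discussed above, replacing a Gaussian membership function with a sigmoid, is easy to visualize; the sketch below compares the two shapes and adds an exponentially cooled learning coefficient in the spirit of simulated annealing. The parameter values and cooling schedule are illustrative assumptions, not the exact functions used in the modified FALCON.

```python
import math

def gaussian_membership(x, center, sigma):
    """Traditional Gaussian membership function."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))

def sigmoid_membership(x, slope, offset):
    """Sigmoid membership function used here as a simpler substitute."""
    return 1.0 / (1.0 + math.exp(-slope * (x - offset)))

def sa_learning_coefficient(eta0, step, temperature=50.0):
    """An exponentially 'cooled' learning coefficient, in the spirit of
    simulated annealing (the paper's exact schedule is not reproduced)."""
    return eta0 * math.exp(-step / temperature)

# Compare the two membership shapes over a small input range.
for x in [0.0, 0.25, 0.5, 0.75, 1.0]:
    g = gaussian_membership(x, center=0.5, sigma=0.2)
    s = sigmoid_membership(x, slope=10.0, offset=0.5)
    print(f"x={x:.2f}  gaussian={g:.3f}  sigmoid={s:.3f}")
```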

  7. Reinforcement learning for port-hamiltonian systems.

    Science.gov (United States)

    Sprangers, Olivier; Babuška, Robert; Nageshrao, Subramanya P; Lopes, Gabriel A D

    2015-05-01

    Passivity-based control (PBC) for port-Hamiltonian systems provides an intuitive way of achieving stabilization by rendering a system passive with respect to a desired storage function. However, in most instances the control law is obtained without any performance considerations and it has to be calculated by solving a complex partial differential equation (PDE). In order to address these issues we introduce a reinforcement learning (RL) approach into the energy-balancing passivity-based control (EB-PBC) method, which is a form of PBC in which the closed-loop energy is equal to the difference between the stored and supplied energies. We propose a technique to parameterize EB-PBC that preserves the system's PDE matching conditions, does not require the specification of a global desired Hamiltonian, includes performance criteria, and is robust. The parameters of the control law are found by using actor-critic (AC) RL, enabling the search for near-optimal control policies satisfying a desired closed-loop energy landscape. The advantage is that the solutions learned can be interpreted in terms of energy shaping and damping injection, which makes it possible to numerically assess stability using passivity theory. From the RL perspective, our proposal allows for the class of port-Hamiltonian systems to be incorporated in the AC framework, speeding up the learning thanks to the resulting parameterization of the policy. The method has been successfully applied to the pendulum swing-up problem in simulations and real-life experiments.

  8. Reinforcement learning in complementarity game and population dynamics.

    Science.gov (United States)

    Jost, Jürgen; Li, Wei

    2014-02-01

    We systematically test and compare different reinforcement learning schemes in a complementarity game [J. Jost and W. Li, Physica A 345, 245 (2005)] played between members of two populations. More precisely, we study the Roth-Erev, Bush-Mosteller, and SoftMax reinforcement learning schemes. A modified version of Roth-Erev with a power exponent of 1.5, as opposed to 1 in the standard version, performs best. We also compare these reinforcement learning strategies with evolutionary schemes. This gives insight into aspects like the issue of quick adaptation as opposed to systematic exploration or the role of learning rates.
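
    A hedged sketch of the modified Roth-Erev scheme described above: propensities accumulate received payoffs, and choice probabilities are proportional to propensities raised to a power exponent (1.5 in the best-performing variant). The initial propensity, the payoff interface and the omission of the scheme's forgetting/experimentation terms are simplifying assumptions.

```python
import random

class RothErevAgent:
    """Roth-Erev reinforcement learner with a power exponent on propensities."""

    def __init__(self, n_actions, exponent=1.5, initial_propensity=1.0):
        self.propensities = [initial_propensity] * n_actions
        self.exponent = exponent

    def choose(self):
        # Choice probability proportional to propensity ** exponent.
        weights = [p ** self.exponent for p in self.propensities]
        total = sum(weights)
        r, cumulative = random.random() * total, 0.0
        for action, w in enumerate(weights):
            cumulative += w
            if r <= cumulative:
                return action
        return len(weights) - 1

    def reinforce(self, action, payoff):
        # Standard Roth-Erev update: add the received payoff to the chosen
        # action's propensity (forgetting/experimentation terms omitted here).
        self.propensities[action] += payoff
```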

  9. Fuzzy-logic based learning style prediction in e-learning using web interface information

    Indian Academy of Sciences (India)

    L Jegatha Deborah; R Sathiyaseelan; S Audithan; P Vijayakumar

    2015-04-01

    The e-learners' excellence can be improved by recommending suitable e-contents available on e-learning servers based on investigating their learning styles. The learning styles have to be predicted carefully, because the psychological balance is variable in nature and the e-learners are diversified based on learning patterns, environment, time and mood. Moreover, the knowledge about the learners used for learning style prediction is uncertain in nature. This paper identifies the Felder–Silverman learning style model as a suitable model for learning style prediction, especially in web environments, and proposes to use fuzzy rules to handle the uncertainty in the learning style predictions. The evaluation used Gaussian membership function based fuzzy logic for 120 students learning the C programming language, and it was observed that the proposed model improved the prediction accuracy significantly.

  10. The role of GABAB receptors in human reinforcement learning.

    Science.gov (United States)

    Ort, Andres; Kometer, Michael; Rohde, Judith; Seifritz, Erich; Vollenweider, Franz X

    2014-10-01

    Behavioral evidence from human studies suggests that the γ-aminobutyric acid type B receptor (GABAB receptor) agonist baclofen modulates reinforcement learning and reduces craving in patients with addiction spectrum disorders. However, in contrast to the well established role of dopamine in reinforcement learning, the mechanisms by which the GABAB receptor influences reinforcement learning in humans remain completely unknown. To further elucidate this issue, a cross-over, double-blind, placebo-controlled study was performed in healthy human subjects (N=15) to test the effects of baclofen (20 and 50mg p.o.) on probabilistic reinforcement learning. Outcomes were the feedback-induced P2 component of the event-related potential, the feedback-related negativity, and the P300 component of the event-related potential. Baclofen produced a reduction of P2 amplitude over the course of the experiment, but did not modulate the feedback-related negativity. Furthermore, there was a trend towards increased learning after baclofen administration relative to placebo over the course of the experiment. The present results extend previous theories of reinforcement learning, which focus on the importance of mesolimbic dopamine signaling, and indicate that stimulation of cortical GABAB receptors in a fronto-parietal network leads to better attentional allocation in reinforcement learning. This observation is a first step in our understanding of how baclofen may improve reinforcement learning in healthy subjects. Further studies with bigger sample sizes are needed to corroborate this conclusion and furthermore, test this effect in patients with addiction spectrum disorder.

  11. Solution to reinforcement learning problems with artificial potential field

    Institute of Scientific and Technical Information of China (English)

    XIE Li-juan; XIE Guang-rong; CHEN Huan-wen; LI Xiao-li

    2008-01-01

    A novel method was designed to solve reinforcement learning problems with an artificial potential field. First, a reinforcement learning problem was transformed into a path planning problem by using an artificial potential field (APF), which is a very appropriate way to model a reinforcement learning problem. Second, a new APF algorithm was proposed to overcome the local minimum problem of potential field methods using a virtual water-flow concept. The performance of the new method was tested on a gridworld problem known as the key-and-door maze. The experimental results show that within 45 trials, good and deterministic policies are found in almost all simulations. In comparison with WIERING's HQ-learning system, which needs 20 000 trials for a stable solution, the proposed method obtains an optimal and stable policy far more quickly. Therefore, the new method is simple and effective for giving an optimal solution to the reinforcement learning problem.
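
    The artificial potential field idea referenced above combines an attractive potential toward the goal with repulsive potentials around obstacles; the sketch below is a generic APF gradient step (gains, influence radius and step size are assumed values) and does not include the paper's virtual water-flow modification for escaping local minima.

```python
import math

def apf_force(pos, goal, obstacles, k_att=1.0, k_rep=100.0, influence=2.0):
    """Return the (fx, fy) force from a basic attractive/repulsive potential field."""
    # Attractive force pulls the agent linearly toward the goal.
    fx = k_att * (goal[0] - pos[0])
    fy = k_att * (goal[1] - pos[1])
    # Each obstacle within its influence radius pushes the agent away.
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if 1e-9 < d < influence:
            rep = k_rep * (1.0 / d - 1.0 / influence) / d ** 2
            fx += rep * dx / d
            fy += rep * dy / d
    return fx, fy

# One gradient-descent step on the potential (step size is an assumption).
pos, goal = (0.0, 0.0), (5.0, 5.0)
fx, fy = apf_force(pos, goal, obstacles=[(2.0, 2.0)])
pos = (pos[0] + 0.1 * fx, pos[1] + 0.1 * fy)
print(pos)
```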

  12. Pass-ball training based on genetic reinforcement learning

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    Introduces a computation model that mixes a genetic algorithm with reinforcement learning for independent agent learning in continuous, distributed, open environments. The model takes full advantage of the reactivity and robustness of the reinforcement learning algorithm and of the genetic algorithm's suitability for problems with high dimensionality, large populations and complex environments, and concludes that, with proper training, the results verify that this method works in complex multi-agent environments.

  13. Reinforcement Learning in BitTorrent Systems

    CERN Document Server

    Izhak-Ratzin, Rafit; van der Schaar, Mihaela

    2010-01-01

    Recent research efforts have shown that the popular BitTorrent protocol does not provide fair resource reciprocation and may allow free-riding. In this paper, we propose a BitTorrent-like protocol that replaces the peer selection mechanisms in the regular BitTorrent protocol with a novel reinforcement learning (RL) based mechanism. Due to the inherent operation of P2P systems, which involves repeated interactions among peers over a long period of time, the peers can efficiently identify free-riders as well as desirable collaborators by learning the behavior of their associated peers. Thus, it can help peers improve their download rates and discourage free-riding, while improving fairness in the system. We model the peers' interactions in the BitTorrent-like network as a repeated interaction game, where we explicitly consider the strategic behavior of the peers. A peer, which applies the RL-based mechanism, uses a partial history of the observations on associated peers' statistical reciprocal behaviors to deter...

  14. Prespeech motor learning in a neural network using reinforcement.

    Science.gov (United States)

    Warlaumont, Anne S; Westermann, Gert; Buder, Eugene H; Oller, D Kimbrough

    2013-02-01

    Vocal motor development in infancy provides a crucial foundation for language development. Some significant early accomplishments include learning to control the process of phonation (the production of sound at the larynx) and learning to produce the sounds of one's language. Previous work has shown that social reinforcement shapes the kinds of vocalizations infants produce. We present a neural network model that provides an account of how vocal learning may be guided by reinforcement. The model consists of a self-organizing map that outputs to muscles of a realistic vocalization synthesizer. Vocalizations are spontaneously produced by the network. If a vocalization meets certain acoustic criteria, it is reinforced, and the weights are updated to make similar muscle activations increasingly likely to recur. We ran simulations of the model under various reinforcement criteria and tested the types of vocalizations it produced after learning in the different conditions. When reinforcement was contingent on the production of phonated (i.e. voiced) sounds, the network's post-learning productions were almost always phonated, whereas when reinforcement was not contingent on phonation, the network's post-learning productions were almost always not phonated. When reinforcement was contingent on both phonation and proximity to English vowels as opposed to Korean vowels, the model's post-learning productions were more likely to resemble the English vowels and vice versa.

  15. Changes in corticostriatal connectivity during reinforcement learning in humans.

    Science.gov (United States)

    Horga, Guillermo; Maia, Tiago V; Marsh, Rachel; Hao, Xuejun; Xu, Dongrong; Duan, Yunsuo; Tau, Gregory Z; Graniello, Barbara; Wang, Zhishun; Kangarlu, Alayar; Martinez, Diana; Packard, Mark G; Peterson, Bradley S

    2015-02-01

    Many computational models assume that reinforcement learning relies on changes in synaptic efficacy between cortical regions representing stimuli and striatal regions involved in response selection, but this assumption has thus far lacked empirical support in humans. We recorded hemodynamic signals with fMRI while participants navigated a virtual maze to find hidden rewards. We fitted a reinforcement-learning algorithm to participants' choice behavior and evaluated the neural activity and the changes in functional connectivity related to trial-by-trial learning variables. Activity in the posterior putamen during choice periods increased progressively during learning. Furthermore, the functional connections between the sensorimotor cortex and the posterior putamen strengthened progressively as participants learned the task. These changes in corticostriatal connectivity differentiated participants who learned the task from those who did not. These findings provide a direct link between changes in corticostriatal connectivity and learning, thereby supporting a central assumption common to several computational models of reinforcement learning.

  16. Reinforcement learning in multidimensional environments relies on attention mechanisms.

    Science.gov (United States)

    Niv, Yael; Daniel, Reka; Geana, Andra; Gershman, Samuel J; Leong, Yuan Chang; Radulescu, Angela; Wilson, Robert C

    2015-05-27

    In recent years, ideas from the computational field of reinforcement learning have revolutionized the study of learning in the brain, famously providing new, precise theories of how dopamine affects learning in the basal ganglia. However, reinforcement learning algorithms are notorious for not scaling well to multidimensional environments, as is required for real-world learning. We hypothesized that the brain naturally reduces the dimensionality of real-world problems to only those dimensions that are relevant to predicting reward, and conducted an experiment to assess by what algorithms and with what neural mechanisms this "representation learning" process is realized in humans. Our results suggest that a bilateral attentional control network comprising the intraparietal sulcus, precuneus, and dorsolateral prefrontal cortex is involved in selecting what dimensions are relevant to the task at hand, effectively updating the task representation through trial and error. In this way, cortical attention mechanisms interact with learning in the basal ganglia to solve the "curse of dimensionality" in reinforcement learning.

  17. CENTRIC MANAGEMENT SYSTEM BASED ON NEURO-FUZZY TOPOLOGY

    Directory of Open Access Journals (Sweden)

    Shumkov Y. A.

    2014-11-01

    Full Text Available The article describes the network-centric approach to a building control system based on the "inner teacher" neuro-fuzzy topology, which uses the principles of reinforcement learning.

  18. A Fuzzy Logic Framework for Integrating Multiple Learned Models

    Energy Technology Data Exchange (ETDEWEB)

    Hartog, Bobi Kai Den [Univ. of Nebraska, Lincoln, NE (United States)

    1999-03-01

    The Artificial Intelligence field of Integrating Multiple Learned Models (IMLM) explores ways to combine results from sets of trained programs. Aroclor Interpretation is an ill-conditioned problem in which trained programs must operate in scenarios outside their training ranges because it is intractable to train them completely. Consequently, they fail in ways related to the scenarios. We developed a general-purpose IMLM solution, the Combiner, and applied it to Aroclor Interpretation. The Combiner's first step, Scenario Identification (SI), learns rules from very sparse, synthetic training data consisting of results from a suite of trained programs called Methods. SI produces fuzzy belief weights for each scenario by approximately matching the rules. The Combiner's second step, Aroclor Presence Detection (AP), classifies each of three Aroclors as present or absent in a sample. The third step, Aroclor Quantification (AQ), produces quantitative values for the concentration of each Aroclor in a sample. AP and AQ use automatically learned empirical biases for each of the Methods in each scenario. Through fuzzy logic, AP and AQ combine the scenario weights, the automatically learned biases, and the Methods' results to determine results for a sample.

  19. Model-based reinforcement learning with dimension reduction.

    Science.gov (United States)

    Tangkaratt, Voot; Morimoto, Jun; Sugiyama, Masashi

    2016-12-01

    The goal of reinforcement learning is to learn an optimal policy which controls an agent to acquire the maximum cumulative reward. The model-based reinforcement learning approach learns a transition model of the environment from data, and then derives the optimal policy using the transition model. However, learning an accurate transition model in high-dimensional environments requires a large amount of data which is difficult to obtain. To overcome this difficulty, in this paper, we propose to combine model-based reinforcement learning with the recently developed least-squares conditional entropy (LSCE) method, which simultaneously performs transition model estimation and dimension reduction. We also further extend the proposed method to imitation learning scenarios. The experimental results show that policy search combined with LSCE performs well for high-dimensional control tasks including real humanoid robot control.

  20. Novelty as a Reinforcer for Position Learning in Children

    Science.gov (United States)

    Wilson, Marian Monyok

    1974-01-01

    The stimulus-familiarization-effect (SFE) paradigm, a reaction-time (RT) task based on a response-to-novelty procedure, was modified to assess response for novelty, i.e., a response-reinforcement sequence. The potential implications of attention for reinforcement theory and learning in general are discussed. (Author/CS)

  1. Covert Operant Reinforcement of Remedial Reading Learning Tasks.

    Science.gov (United States)

    Schmickley, Verne G.

    The effects of covert operant reinforcement upon remedial reading learning tasks were investigated. Forty junior high school students were taught to imagine either neutral scenes (control) or positive scenes (treatment) upon cue while reading. It was hypothesized that positive covert reinforcement would enhance performance on several measures of…

  2. On the Possibility of a Reinforcement Theory of Cognitive Learning.

    Science.gov (United States)

    Smith, Kendon

    This paper discusses cognitive learning in terms of reinforcement theory and presents arguments suggesting that a viable theory of cognition based on reinforcement principles is not out of the question. This position is supported by a discussion of the weaknesses of theories based entirely on contiguity and of considerations that are more positive…

  3. A novel compensation-based recurrent fuzzy neural network and its learning algorithm

    Institute of Scientific and Technical Information of China (English)

    WU Bo; WU Ke; LU JianHong

    2009-01-01

    Based on a detailed study of several kinds of fuzzy neural networks, we propose a novel compensation-based recurrent fuzzy neural network (CRFNN) by adding a recurrent element and a compensatory element to the conventional fuzzy neural network. Then, we propose a sequential learning method for the structure identification of the CRFNN in order to determine the fuzzy rules and their correlative parameters effectively. Furthermore, we improve the BP algorithm based on the characteristics of the proposed CRFNN to train the network. By modeling typical nonlinear systems, we draw the conclusion that the proposed CRFNN has excellent dynamic response and strong learning ability.

  4. Reinforcement Based Fuzzy Neural Network Control with Automatic Rule Generation

    Institute of Scientific and Technical Information of China (English)

    吴耿锋; 傅忠谦

    2001-01-01

    A reinforcement based fuzzy neural network controller (RBFNNC) is proposed. A set of optimised fuzzy control rules can be automatically generated through reinforcement learning based on the state variables of the controlled object. RBFNNC was applied to a cart-pole balancing system, and the simulation results show that its structure and reinforcement learning algorithm are effective, with significant improvements in rule generation.
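
    As a very rough sketch of the general idea of tuning fuzzy rule consequents with a scalar reinforcement signal, the code below perturbs the defuzzified action, observes a reinforcement, and shifts each rule's consequent toward (or away from) the explored action in proportion to its firing strength; the rule structure, update form and constants are illustrative assumptions and do not reproduce the RBFNNC architecture or its automatic rule-generation procedure.

```python
import math
import random

def firing_strengths(x, centers, width=1.0):
    """Gaussian antecedent firing strength of each fuzzy rule (single input)."""
    return [math.exp(-((x - c) ** 2) / (2 * width ** 2)) for c in centers]

def fuzzy_action(strengths, consequents):
    """Zero-order Takagi-Sugeno defuzzification (weighted average of consequents)."""
    return sum(w * c for w, c in zip(strengths, consequents)) / sum(strengths)

def reinforce(consequents, strengths, explored_action, reinforcement, eta=0.1):
    """Move each rule's consequent toward the explored action when reinforcement
    is positive (and away when negative), weighted by its normalized firing
    strength; a GARIC-style exploration/credit rule used here for illustration."""
    total = sum(strengths)
    return [c + eta * reinforcement * (explored_action - c) * (w / total)
            for c, w in zip(consequents, strengths)]

# Hypothetical usage: perturb the nominal action, observe reinforcement, update.
centers, consequents = [-1.0, 0.0, 1.0], [0.0, 0.0, 0.0]
x = 0.3
w = firing_strengths(x, centers)
a_nominal = fuzzy_action(w, consequents)
a_explored = a_nominal + random.gauss(0.0, 0.2)   # exploration noise
r = 1.0                                           # assumed reinforcement signal
consequents = reinforce(consequents, w, a_explored, r)
```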

  5. Habits, action sequences and reinforcement learning.

    Science.gov (United States)

    Dezfouli, Amir; Balleine, Bernard W

    2012-04-01

    It is now widely accepted that instrumental actions can be either goal-directed or habitual; whereas the former are rapidly acquired and regulated by their outcome, the latter are reflexive, elicited by antecedent stimuli rather than their consequences. Model-based reinforcement learning (RL) provides an elegant description of goal-directed action. Through exposure to states, actions and rewards, the agent rapidly constructs a model of the world and can choose an appropriate action based on quite abstract changes in environmental and evaluative demands. This model is powerful but has a problem explaining the development of habitual actions. To account for habits, theorists have argued that another action controller is required, called model-free RL, that does not form a model of the world but rather caches action values within states allowing a state to select an action based on its reward history rather than its consequences. Nevertheless, there are persistent problems with important predictions from the model; most notably the failure of model-free RL correctly to predict the insensitivity of habitual actions to changes in the action-reward contingency. Here, we suggest that introducing model-free RL in instrumental conditioning is unnecessary, and demonstrate that reconceptualizing habits as action sequences allows model-based RL to be applied to both goal-directed and habitual actions in a manner consistent with what real animals do. This approach has significant implications for the way habits are currently investigated and generates new experimental predictions.

  6. Robust central pattern generators for embodied hierarchical reinforcement learning

    NARCIS (Netherlands)

    Snel, M.; Whiteson, S.; Kuniyoshi, Y.

    2011-01-01

    Hierarchical organization of behavior and learning is widespread in animals and robots, among others to facilitate dealing with multiple tasks. In hierarchical reinforcement learning, agents usually have to learn to recombine or modulate low-level behaviors when facing a new task, which costs time t

  7. Exploiting Best-Match Equations for Efficient Reinforcement Learning

    NARCIS (Netherlands)

    Seijen, H.H. van; Whiteson, S.; Hasselt, H. van; Wiering, M.

    2011-01-01

    This article presents and evaluates best-match learning, a new approach to reinforcement learning that trades off the sample efficiency of model-based methods with the space efficiency of model-free methods. Best-match learning works by approximating the solution to a set of best-match equations,

  8. Human operant learning under concurrent reinforcement of response variability

    NARCIS (Netherlands)

    Maes, J.H.R.; Goot, M.H. van der

    2006-01-01

    This study asked whether the concurrent reinforcement of behavioral variability facilitates learning to emit a difficult target response. Sixty students repeatedly pressed sequences of keys, with an originally infrequently occurring target sequence consistently being followed by positive feedback. T

  9. Reinforcement learning based backstepping control of power system oscillations

    Energy Technology Data Exchange (ETDEWEB)

    Karimi, Ali; Eftekharnejad, Sara; Feliachi, Ali [Advanced Power and Electric Research Center (APERC), West Virginia University, Morgantown, WV 26506-6109 (United States)

    2009-11-15

    This paper proposes a reinforcement learning based backstepping control technique for damping oscillations in electric power systems using the generators excitation systems. Decentralized controllers are first designed using the backstepping technique. Then, reinforcement learning is used to tune the gains of these controllers to adapt to various operating conditions. Simulation results for a two area power system show that the proposed control technique provides better damping than (i) conventional power system stabilizers and (ii) backstepping fixed gain controllers. (author)

  10. Learning to Perform Physics Experiments via Deep Reinforcement Learning

    CERN Document Server

    Denil, Misha; Kulkarni, Tejas D; Erez, Tom; Battaglia, Peter; de Freitas, Nando

    2016-01-01

    When encountering novel objects, humans are able to infer a wide range of physical properties such as mass, friction and deformability by interacting with them in a goal-driven way. This process of active interaction is in the same spirit as a scientist performing an experiment to discover hidden facts. Recent advances in artificial intelligence have yielded machines that can achieve superhuman performance in Go, Atari, natural language processing, and complex control problems, but it is not clear that these systems can rival the scientific intuition of even a young child. In this work we introduce a basic set of tasks that require agents to estimate hidden properties such as mass and cohesion of objects in an interactive simulated environment where they can manipulate the objects and observe the consequences. We found that state-of-the-art deep reinforcement learning methods can learn to perform the experiments necessary to discover such hidden properties. By systematically manipulating the problem difficulty and...

  11. Reinforcement learning, conditioning, and the brain: Successes and challenges.

    Science.gov (United States)

    Maia, Tiago V

    2009-12-01

    The field of reinforcement learning has greatly influenced the neuroscientific study of conditioning. This article provides an introduction to reinforcement learning followed by an examination of the successes and challenges using reinforcement learning to understand the neural bases of conditioning. Successes reviewed include (1) the mapping of positive and negative prediction errors to the firing of dopamine neurons and neurons in the lateral habenula, respectively; (2) the mapping of model-based and model-free reinforcement learning to associative and sensorimotor cortico-basal ganglia-thalamo-cortical circuits, respectively; and (3) the mapping of actor and critic to the dorsal and ventral striatum, respectively. Challenges reviewed consist of several behavioral and neural findings that are at odds with standard reinforcement-learning models, including, among others, evidence for hyperbolic discounting and adaptive coding. The article suggests ways of reconciling reinforcement-learning models with many of the challenging findings, and highlights the need for further theoretical developments where necessary. Additional information related to this study may be downloaded from http://cabn.psychonomic-journals.org/content/supplemental.

  12. A Proposal of Predictive Reinforcement Learning Realizing Moving Obstacle Avoidance

    Science.gov (United States)

    Takeda, Masato; Nagao, Tomoharu

    In recent years, research on autonomous robots operating in real environments has developed. In particular, moving-obstacle avoidance is one of the most important tasks for such robots. Reinforcement learning is a typical method of action acquisition for autonomous mobile robots performing obstacle avoidance. However, it has been shown that reinforcement learning has various problems in unknown environments. In order to solve these problems, we propose predictive reinforcement learning for moving-obstacle avoidance. In predictive reinforcement learning, rules are not defined as state-action pairs as in conventional reinforcement learning; instead, they are defined over the transitions of states produced by robot actions between steps. We expect these rules to enable robots to adapt to unknown environments because they are independent of any particular environment in which moving obstacles exist. Robots implementing these rules predict the next state. After this prediction, the robots reinforce their rules by comparing observed states with predicted ones and foresee collisions with obstacles; they then select safer actions. In this paper, we verify the efficiency of our method in several simulations. First, the robot is trained in a learning environment where moving obstacles exist. After that, we run experiments to verify its ability to adapt to unknown environments. As a result, the robot acquires moving-obstacle avoidance actions.

  13. Behavioral and neural properties of social reinforcement learning.

    Science.gov (United States)

    Jones, Rebecca M; Somerville, Leah H; Li, Jian; Ruberry, Erika J; Libby, Victoria; Glover, Gary; Voss, Henning U; Ballon, Douglas J; Casey, B J

    2011-09-14

    Social learning is critical for engaging in complex interactions with other individuals. Learning from positive social exchanges, such as acceptance from peers, may be similar to basic reinforcement learning. We formally test this hypothesis by developing a novel paradigm that is based on work in nonhuman primates and human imaging studies of reinforcement learning. The probability of receiving positive social reinforcement from three distinct peers was parametrically manipulated while brain activity was recorded in healthy adults using event-related functional magnetic resonance imaging. Over the course of the experiment, participants responded more quickly to faces of peers who provided more frequent positive social reinforcement, and rated them as more likeable. Modeling trial-by-trial learning showed ventral striatum and orbital frontal cortex activity correlated positively with forming expectations about receiving social reinforcement. Rostral anterior cingulate cortex activity tracked positively with modulations of expected value of the cues (peers). Together, the findings across three levels of analysis (social preferences, response latencies, and modeling of neural responses) are consistent with reinforcement learning theory and nonhuman primate electrophysiological studies of reward. This work highlights the fundamental influence of acceptance by one's peers in altering subsequent behavior.

  14. Reinforcement learning of motor skills with policy gradients.

    Science.gov (United States)

    Peters, Jan; Schaal, Stefan

    2008-05-01

    Autonomous learning is one of the hallmarks of human and animal behavior, and understanding the principles of learning will be crucial in order to achieve true autonomy in advanced machines like humanoid robots. In this paper, we examine learning of complex motor skills with human-like limbs. While supervised learning can offer useful tools for bootstrapping behavior, e.g., by learning from demonstration, it is only reinforcement learning that offers a general approach to the final trial-and-error improvement that is needed by each individual acquiring a skill. Neither neurobiological nor machine learning studies have, so far, offered compelling results on how reinforcement learning can be scaled to the high-dimensional continuous state and action spaces of humans or humanoids. Here, we combine two recent research developments on learning motor control in order to achieve this scaling. First, we interpret the idea of modular motor control by means of motor primitives as a suitable way to generate parameterized control policies for reinforcement learning. Second, we combine motor primitives with the theory of stochastic policy gradient learning, which currently seems to be the only feasible framework for reinforcement learning for humanoids. We evaluate different policy gradient methods with a focus on their applicability to parameterized motor primitives. We compare these algorithms in the context of motor primitive learning, and show that our most modern algorithm, the Episodic Natural Actor-Critic outperforms previous algorithms by at least an order of magnitude. We demonstrate the efficiency of this reinforcement learning method in the application of learning to hit a baseball with an anthropomorphic robot arm.
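
    The episodic policy-gradient idea underlying this line of work can be sketched compactly with a vanilla REINFORCE update on a toy one-step task; the Gaussian policy, the task, and the step sizes below are illustrative assumptions, and the paper's Episodic Natural Actor-Critic additionally preconditions the gradient with the Fisher information matrix and uses a learned critic as a baseline to reduce variance.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)        # policy parameters: [weight, bias] of a Gaussian policy mean
sigma, alpha = 0.5, 0.01   # exploration noise and learning rate (assumed values)

def policy_mean(theta, s):
    return theta[0] * s + theta[1]

for episode in range(2000):
    # One-step episode on a toy task: reward is highest when action ~= 2 * state.
    s = rng.uniform(-1, 1)
    mu = policy_mean(theta, s)
    a = rng.normal(mu, sigma)
    r = -(a - 2.0 * s) ** 2

    # REINFORCE: grad of log pi(a|s) for a Gaussian policy with fixed sigma.
    grad_log_pi = np.array([(a - mu) / sigma**2 * s, (a - mu) / sigma**2])
    theta += alpha * r * grad_log_pi   # stochastic gradient ascent on expected return

print(theta)   # theta[0] should drift toward 2.0, theta[1] toward 0.0
```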

  15. Learning of interval and general type-2 fuzzy logic systems using simulated annealing: Theory and practice

    OpenAIRE

    Almaraashia, M.; John, Robert; Hopgood, A.; S. Ahmadi

    2016-01-01

    This paper reports the use of simulated annealing to design more efficient fuzzy logic systems to model problems with associated uncertainties. Simulated annealing is used within this work as a method for learning the best configurations of interval and general type-2 fuzzy logic systems to maximize their modeling ability. The combination of simulated annealing with these models is presented in the modeling of four benchmark problems including real-world problems. The type-2 fuzzy logic syste...

  16. Simulation of thermal behavior of residential buildings using fuzzy active learning method

    OpenAIRE

    Masoud Taheri Shahraein; Hamid Taheri Shahraiyni; Melika Sanaeifar

    2015-01-01

    In this paper, a fuzzy modeling technique called Modified Active Learning Method (MALM) was introduced and utilized for fuzzy simulation of indoor and inner surface temperatures in residential buildings using meteorological data and its capability for fuzzy simulation was compared with other studies. The case studies for simulations were two residential apartments in the Fakouri and Rezashahr neighborhoods of Mashhad, Iran. The hourly inner surface and indoor temperature data were accumulated...

  17. Robot path planning in dynamic environment based on reinforcement learning

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    Proposes an adaptive learning method based on reinforcement learning for the robot path planning problem, which enables the robot to adaptively learn and perform effective path planning, avoid moving obstacles and reach the target, thereby achieving automatic construction of the path planning strategy and making the system adaptive to the dynamic environments of multi-robot systems. Computer simulation experiments lead to the conclusion that the method is powerful for solving the multi-robot path planning problem and is a meaningful attempt to apply reinforcement learning techniques in multi-robot systems to raise the system's degree of intelligence.

  18. Model-based reinforcement learning under concurrent schedules of reinforcement in rodents.

    Science.gov (United States)

    Huh, Namjung; Jo, Suhyun; Kim, Hoseok; Sul, Jung Hoon; Jung, Min Whan

    2009-05-01

    Reinforcement learning theories postulate that actions are chosen to maximize a long-term sum of positive outcomes based on value functions, which are subjective estimates of future rewards. In simple reinforcement learning algorithms, value functions are updated only by trial-and-error, whereas they are updated according to the decision-maker's knowledge or model of the environment in model-based reinforcement learning algorithms. To investigate how animals update value functions, we trained rats under two different free-choice tasks. The reward probability of the unchosen target remained unchanged in one task, whereas it increased over time since the target was last chosen in the other task. The results show that goal choice probability increased as a function of the number of consecutive alternative choices in the latter, but not the former task, indicating that the animals were aware of time-dependent increases in arming probability and used this information in choosing goals. In addition, the choice behavior in the latter task was better accounted for by a model-based reinforcement learning algorithm. Our results show that rats adopt a decision-making process that cannot be accounted for by simple reinforcement learning models even in a relatively simple binary choice task, suggesting that rats can readily improve their decision-making strategy through the knowledge of their environments.

  19. A new accelerating algorithm for multi-agent reinforcement learning

    Institute of Scientific and Technical Information of China (English)

    ZHANG Ru-bo; ZHONG Yu; GU Guo-chang

    2005-01-01

    In multi-agent systems, joint actions must be employed to achieve cooperation because the evaluation of an agent's behavior often depends on the other agents' behaviors. However, joint-action reinforcement learning algorithms suffer from slow convergence because of the enormous learning space produced by joint actions. In this article, a prediction-based reinforcement learning algorithm is presented for multi-agent cooperation tasks, which requires all agents to learn to predict the probabilities of actions that other agents may execute. A multi-robot cooperation experiment was run to test the efficacy of the new algorithm, and the experimental results show that the new algorithm can achieve the cooperation policy much faster than the primitive reinforcement learning algorithm.
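
    The core mechanism described above, learning a predictive model of another agent's action frequencies and using it to form expected action values, can be illustrated with a generic joint-action learner; the class below is an assumption-based sketch (Laplace-smoothed frequency estimates, tabular Q-values), not the paper's algorithm.

```python
from collections import defaultdict

class PredictiveAgent:
    """Keeps empirical estimates of the other agent's action probabilities and
    uses them to form expected Q-values over its own actions."""

    def __init__(self, my_actions, other_actions, alpha=0.1, gamma=0.9):
        self.q = defaultdict(float)                       # Q[(state, my_a, other_a)]
        self.counts = defaultdict(lambda: defaultdict(int))
        self.my_actions, self.other_actions = my_actions, other_actions
        self.alpha, self.gamma = alpha, gamma

    def predict(self, state, other_a):
        """P(other agent plays other_a | state), from observed frequencies."""
        c = self.counts[state]
        total = sum(c.values())
        return (c[other_a] + 1) / (total + len(self.other_actions))  # Laplace smoothing

    def expected_value(self, state, my_a):
        return sum(self.predict(state, oa) * self.q[(state, my_a, oa)]
                   for oa in self.other_actions)

    def learn(self, state, my_a, other_a, reward, next_state):
        self.counts[state][other_a] += 1
        best_next = max(self.expected_value(next_state, a) for a in self.my_actions)
        target = reward + self.gamma * best_next
        key = (state, my_a, other_a)
        self.q[key] += self.alpha * (target - self.q[key])
```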

  20. Reinforcement Learning in Robotics: Applications and Real-World Challenges

    Directory of Open Access Journals (Sweden)

    Petar Kormushev

    2013-07-01

    Full Text Available In robotics, the ultimate goal of reinforcement learning is to endow robots with the ability to learn, improve, adapt and reproduce tasks with dynamically changing constraints based on exploration and autonomous learning. We give a summary of the state-of-the-art of reinforcement learning in the context of robotics, in terms of both algorithms and policy representations. Numerous challenges faced by the policy representation in robotics are identified. Three recent examples for the application of reinforcement learning to real-world robots are described: a pancake flipping task, a bipedal walking energy minimization task and an archery-based aiming task. In all examples, a state-of-the-art expectation-maximization-based reinforcement learning is used, and different policy representations are proposed and evaluated for each task. The proposed policy representations offer viable solutions to six rarely-addressed challenges in policy representations: correlations, adaptability, multi-resolution, globality, multi-dimensionality and convergence. Both the successes and the practical difficulties encountered in these examples are discussed. Based on insights from these particular cases, conclusions are drawn about the state-of-the-art and the future perspective directions for reinforcement learning in robotics.

  1. Multi-agent machine learning a reinforcement approach

    CERN Document Server

    Schwartz, H M

    2014-01-01

    The book begins with a chapter on traditional methods of supervised learning, covering recursive least squares learning, mean square error methods, and stochastic approximation. Chapter 2 covers single agent reinforcement learning. Topics include learning value functions, Markov games, and TD learning with eligibility traces. Chapter 3 discusses two player games including two player matrix games with both pure and mixed strategies. Numerous algorithms and examples are presented. Chapter 4 covers learning in multi-player games, stochastic games, and Markov games, focusing on learning multi-pla

  2. Improving Robot Motor Learning with Negatively Valenced Reinforcement Signals.

    Science.gov (United States)

    Navarro-Guerrero, Nicolás; Lowe, Robert J; Wermter, Stefan

    2017-01-01

    Both nociception and punishment signals have been used in robotics. However, the potential for using these negatively valenced types of reinforcement learning signals for robot learning has not been exploited in detail yet. Nociceptive signals are primarily used as triggers of preprogrammed action sequences. Punishment signals are typically disembodied, i.e., with no or little relation to the agent-intrinsic limitations, and they are often used to impose behavioral constraints. Here, we provide an alternative approach for nociceptive signals as drivers of learning rather than simple triggers of preprogrammed behavior. Explicitly, we use nociception to expand the state space while we use punishment as a negative reinforcement learning signal. We compare the performance, in terms of task error, the amount of perceived nociception, and the length of learned action sequences, of different neural networks imbued with punishment-based reinforcement signals for inverse kinematic learning. We contrast the performance of a version of the neural network that receives nociceptive inputs to that without such a process. Furthermore, we provide evidence that nociception can improve learning, making the algorithm more robust against network initializations, as well as behavioral performance by reducing the task error, perceived nociception, and length of learned action sequences. Moreover, we provide evidence that punishment, at least as typically used within reinforcement learning applications, may be detrimental in all relevant metrics.

  3. Obtaining ABET Student Outcome Satisfaction from Course Learning Outcome Data Using Fuzzy Logic

    Science.gov (United States)

    Imam, Muhammad Hasan; Tasadduq, Imran Ali; Ahmad, Abdul-Rahim; Aldosari, Fahd

    2017-01-01

    One of the approaches for obtaining the satisfaction data for ABET "Student Outcomes" (SOs) is to transform Course Learning Outcomes (CLOs) satisfaction data obtained through assessment of CLOs to SO satisfaction data. Considering the fuzzy nature of metrics of CLOs and SOs, a Fuzzy Logic algorithm has been proposed to extract SO…

  4. Microstimulation of the human substantia nigra alters reinforcement learning.

    Science.gov (United States)

    Ramayya, Ashwin G; Misra, Amrit; Baltuch, Gordon H; Kahana, Michael J

    2014-05-14

    Animal studies have shown that substantia nigra (SN) dopaminergic (DA) neurons strengthen action-reward associations during reinforcement learning, but their role in human learning is not known. Here, we applied microstimulation in the SN of 11 patients undergoing deep brain stimulation surgery for the treatment of Parkinson's disease as they performed a two-alternative probability learning task in which rewards were contingent on stimuli, rather than actions. Subjects demonstrated decreased learning from reward trials that were accompanied by phasic SN microstimulation compared with reward trials without stimulation. Subjects who showed large decreases in learning also showed an increased bias toward repeating actions after stimulation trials; therefore, stimulation may have decreased learning by strengthening action-reward associations rather than stimulus-reward associations. Our findings build on previous studies implicating SN DA neurons in preferentially strengthening action-reward associations during reinforcement learning.

  5. Social Learning, Reinforcement and Crime: Evidence from Three European Cities

    Science.gov (United States)

    Tittle, Charles R.; Antonaccio, Olena; Botchkovar, Ekaterina

    2012-01-01

    This study reports a cross-cultural test of Social Learning Theory using direct measures of social learning constructs and focusing on the causal structure implied by the theory. Overall, the results strongly confirm the main thrust of the theory. Prior criminal reinforcement and current crime-favorable definitions are highly related in all three…

  6. Role of dopamine D2 receptors in human reinforcement learning.

    Science.gov (United States)

    Eisenegger, Christoph; Naef, Michael; Linssen, Anke; Clark, Luke; Gandamaneni, Praveen K; Müller, Ulrich; Robbins, Trevor W

    2014-09-01

    Influential neurocomputational models emphasize dopamine (DA) as an electrophysiological and neurochemical correlate of reinforcement learning. However, evidence of a specific causal role of DA receptors in learning has been less forthcoming, especially in humans. Here we combine, in a between-subjects design, administration of a high dose of the selective DA D2/3-receptor antagonist sulpiride with genetic analysis of the DA D2 receptor in a behavioral study of reinforcement learning in a sample of 78 healthy male volunteers. In contrast to predictions of prevailing models emphasizing DA's pivotal role in learning via prediction errors, we found that sulpiride did not disrupt learning, but rather induced profound impairments in choice performance. The disruption was selective for stimuli indicating reward, whereas loss avoidance performance was unaffected. Effects were driven by volunteers with higher serum levels of the drug, and in those with genetically determined lower density of striatal DA D2 receptors. This is the clearest demonstration to date for a causal modulatory role of the DA D2 receptor in choice performance that might be distinct from learning. Our findings challenge current reward prediction error models of reinforcement learning, and suggest that classical animal models emphasizing a role of postsynaptic DA D2 receptors in motivational aspects of reinforcement learning may apply to humans as well.

  8. Reinforcement learning in complementarity game and population dynamics

    Science.gov (United States)

    Jost, Jürgen; Li, Wei

    2014-02-01

    We systematically test and compare different reinforcement learning schemes in a complementarity game [J. Jost and W. Li, Physica A 345, 245 (2005), 10.1016/j.physa.2004.07.005] played between members of two populations. More precisely, we study the Roth-Erev, Bush-Mosteller, and SoftMax reinforcement learning schemes. A modified version of Roth-Erev with a power exponent of 1.5, as opposed to 1 in the standard version, performs best. We also compare these reinforcement learning strategies with evolutionary schemes. This gives insight into aspects like the issue of quick adaptation as opposed to systematic exploration or the role of learning rates.

  9. Generalization of value in reinforcement learning by humans.

    Science.gov (United States)

    Wimmer, G Elliott; Daw, Nathaniel D; Shohamy, Daphna

    2012-04-01

    Research in decision-making has focused on the role of dopamine and its striatal targets in guiding choices via learned stimulus-reward or stimulus-response associations, behavior that is well described by reinforcement learning theories. However, basic reinforcement learning is relatively limited in scope and does not explain how learning about stimulus regularities or relations may guide decision-making. A candidate mechanism for this type of learning comes from the domain of memory, which has highlighted a role for the hippocampus in learning of stimulus-stimulus relations, typically dissociated from the role of the striatum in stimulus-response learning. Here, we used functional magnetic resonance imaging and computational model-based analyses to examine the joint contributions of these mechanisms to reinforcement learning. Humans performed a reinforcement learning task with added relational structure, modeled after tasks used to isolate hippocampal contributions to memory. On each trial participants chose one of four options, but the reward probabilities for pairs of options were correlated across trials. This (uninstructed) relationship between pairs of options potentially enabled an observer to learn about option values based on experience with the other options and to generalize across them. We observed blood oxygen level-dependent (BOLD) activity related to learning in the striatum and also in the hippocampus. By comparing a basic reinforcement learning model to one augmented to allow feedback to generalize between correlated options, we tested whether choice behavior and BOLD activity were influenced by the opportunity to generalize across correlated options. Although such generalization goes beyond standard computational accounts of reinforcement learning and striatal BOLD, both choices and striatal BOLD activity were better explained by the augmented model. Consistent with the hypothesized role for the hippocampus in this generalization, functional

  10. Human-level control through deep reinforcement learning

    Science.gov (United States)

    Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A.; Veness, Joel; Bellemare, Marc G.; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K.; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis

    2015-02-01

    The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.

  11. Human-level control through deep reinforcement learning.

    Science.gov (United States)

    Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A; Veness, Joel; Bellemare, Marc G; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis

    2015-02-26

    The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
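
    Stripped of the convolutional front end, replay buffer and training schedule, the core deep Q-network update described above is a bootstrapped regression toward r + gamma * max_a' Q_target(s', a'); the PyTorch sketch below illustrates that update with a tiny fully connected network and assumed dimensions, and uses a plain squared-error loss and Adam for brevity rather than the paper's clipped error term and RMSProp.

```python
import torch
import torch.nn as nn

# Tiny stand-in for the Q-network (the paper uses a convolutional network on pixels).
n_states, n_actions, gamma = 4, 2, 0.99
q_net = nn.Sequential(nn.Linear(n_states, 32), nn.ReLU(), nn.Linear(32, n_actions))
target_net = nn.Sequential(nn.Linear(n_states, 32), nn.ReLU(), nn.Linear(32, n_actions))
target_net.load_state_dict(q_net.state_dict())   # periodically re-synchronized
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def dqn_update(states, actions, rewards, next_states, dones):
    """One gradient step on the DQN loss for a minibatch of transitions.
    states/next_states: float tensors [B, n_states]; actions: long tensor [B];
    rewards/dones: float tensors [B]."""
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrapped target: r + gamma * max_a' Q_target(s', a'), cut at episode end.
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * next_q * (1.0 - dones)
    loss = nn.functional.mse_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```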

  12. Social Cognition as Reinforcement Learning: Feedback Modulates Emotion Inference.

    Science.gov (United States)

    Zaki, Jamil; Kallman, Seth; Wimmer, G Elliott; Ochsner, Kevin; Shohamy, Daphna

    2016-09-01

    Neuroscientific studies of social cognition typically employ paradigms in which perceivers draw single-shot inferences about the internal states of strangers. Real-world social inference features much different parameters: People often encounter and learn about particular social targets (e.g., friends) over time and receive feedback about whether their inferences are correct or incorrect. Here, we examined this process and, more broadly, the intersection between social cognition and reinforcement learning. Perceivers were scanned using fMRI while repeatedly encountering three social targets who produced conflicting visual and verbal emotional cues. Perceivers guessed how targets felt and received feedback about whether they had guessed correctly. Visual cues reliably predicted one target's emotion, verbal cues predicted a second target's emotion, and neither reliably predicted the third target's emotion. Perceivers successfully used this information to update their judgments over time. Furthermore, trial-by-trial learning signals, estimated using two reinforcement learning models, tracked activity in ventral striatum and ventromedial pFC, structures associated with reinforcement learning, and regions associated with updating social impressions, including TPJ. These data suggest that learning about others' emotions, like other forms of feedback learning, relies on domain-general reinforcement mechanisms as well as domain-specific social information processing.

  13. Can model-free reinforcement learning explain deontological moral judgments?

    Science.gov (United States)

    Ayars, Alisabeth

    2016-05-01

    Dual-systems frameworks propose that moral judgments are derived from both an immediate emotional response, and controlled/rational cognition. Recently Cushman (2013) proposed a new dual-system theory based on model-free and model-based reinforcement learning. Model-free learning attaches values to actions based on their history of reward and punishment, and explains some deontological, non-utilitarian judgments. Model-based learning involves the construction of a causal model of the world and allows for far-sighted planning; this form of learning fits well with utilitarian considerations that seek to maximize certain kinds of outcomes. I present three concerns regarding the use of model-free reinforcement learning to explain deontological moral judgment. First, many actions that humans find aversive from model-free learning are not judged to be morally wrong. Moral judgment must require something in addition to model-free learning. Second, there is a dearth of evidence for central predictions of the reinforcement account, e.g., that people with different reinforcement histories will, all else equal, make different moral judgments. Finally, to account for the effect of intention within the framework requires certain assumptions which lack support. These challenges are reasonable foci for future empirical/theoretical work on the model-free/model-based framework.

  14. Time representation in reinforcement learning models of the basal ganglia

    Directory of Open Access Journals (Sweden)

    Samuel Joseph Gershman

    2014-01-01

    Full Text Available Reinforcement learning models have been influential in understanding many aspects of basal ganglia function, from reward prediction to action selection. Time plays an important role in these models, but there is still no theoretical consensus about what kind of time representation is used by the basal ganglia. We review several theoretical accounts and their supporting evidence. We then discuss the relationship between reinforcement learning models and the timing mechanisms that have been attributed to the basal ganglia. We hypothesize that a single computational system may underlie both reinforcement learning and interval timing—the perception of duration in the range of seconds to hours. This hypothesis, which extends earlier models by incorporating a time-sensitive action selection mechanism, may have important implications for understanding disorders like Parkinson's disease in which both decision making and timing are impaired.

  15. [Multiple Dopamine Signals and Their Contributions to Reinforcement Learning].

    Science.gov (United States)

    Matsumoto, Masayuki

    2016-10-01

    Midbrain dopamine neurons are activated by reward and by sensory cues that predict reward. Their responses resemble a reward prediction error, the discrepancy between obtained and expected reward values, which has been thought to play an important role as a teaching signal in reinforcement learning. Indeed, pharmacological blockade of dopamine transmission interferes with reinforcement learning. Recent studies reported, however, that not all dopamine neurons transmit the reward-related signal. They found that a subset of dopamine neurons transmits signals related to non-rewarding, salient experiences such as aversive stimulation and cognitively demanding events. How these signals contribute to animal behavior is not yet well understood. This article reviews recent findings on dopamine signals related to rewarding and non-rewarding experiences, and discusses their contributions to reinforcement learning.
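
    The reward-prediction-error signal discussed above is commonly formalised as the temporal-difference error of TD(0); the sketch below uses assumed learning-rate and discount values purely for illustration.

```python
# Sketch of the temporal-difference (reward prediction) error that phasic
# dopamine responses are thought to resemble. Parameter values are assumed.
ALPHA = 0.1    # learning rate (assumed)
GAMMA = 0.95   # discount factor (assumed)

def td_update(V, state, reward, next_state):
    """TD(0): delta = r + gamma * V(s') - V(s); V(s) moves toward the target."""
    delta = reward + GAMMA * V.get(next_state, 0.0) - V.get(state, 0.0)
    V[state] = V.get(state, 0.0) + ALPHA * delta
    return delta   # positive when the outcome is better than expected

V = {}
print(td_update(V, "cue", reward=1.0, next_state="end"))  # unexpected reward -> positive error
```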

  16. Adaptive Fuzzy Systems in Computational Intelligence

    Science.gov (United States)

    Berenji, Hamid R.

    1996-01-01

    In recent years, interest in computational intelligence techniques, which currently include neural networks, fuzzy systems, and evolutionary programming, has grown significantly, and a number of their applications have been developed in government and industry. In the future, an essential element in these systems will be fuzzy systems that can learn from experience by using neural networks to refine their performance. The GARIC architecture, introduced earlier, is an example of a fuzzy reinforcement learning system which has been applied in several control domains such as cart-pole balancing, simulation of Space Shuttle orbital operations, and tether control. A number of examples from GARIC's applications in these domains will be demonstrated.
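
    As a rough illustration of the kind of fuzzy control knowledge such a system refines (not the GARIC rule base itself), the sketch below evaluates a few hypothetical triangular-membership rules for a cart-pole-like state and defuzzifies by weighted average.

```python
# Minimal fuzzy-inference step of the kind a fuzzy reinforcement learner refines.
# The rules, membership ranges and consequent forces are made up for illustration.

def tri(x, a, b, c):
    """Triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

# Each rule: (firing strength from angle and angular velocity, consequent force)
rules = [
    (lambda th, om: min(tri(th, -0.2, 0.0, 0.2), tri(om, -1.0, 0.0, 1.0)), 0.0),
    (lambda th, om: min(tri(th, 0.0, 0.2, 0.4), tri(om, 0.0, 1.0, 2.0)), 10.0),
    (lambda th, om: min(tri(th, -0.4, -0.2, 0.0), tri(om, -2.0, -1.0, 0.0)), -10.0),
]

def control_action(theta, omega):
    strengths = [fire(theta, omega) for fire, _ in rules]
    total = sum(strengths)
    if total == 0.0:
        return 0.0
    # Weighted-average defuzzification of the rule consequents.
    return sum(w * f for w, (_, f) in zip(strengths, rules)) / total

print(control_action(0.1, 0.5))  # small positive corrective force
```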

  17. A New Approach of Learning Hierarchy Construction Based on Fuzzy Logic

    Directory of Open Access Journals (Sweden)

    Ali AAJLI

    2014-10-01

    Full Text Available In recent years, adaptive learning systems rely increasingly on learning hierarchies to customize the educational logic developed in their courses. Most approaches do not consider that the prerequisite relationships between skills are fuzzy relationships. In this article, we describe a new approach that applies fuzzy logic techniques to the construction of learning hierarchies. For this, we use a learning hierarchy predefined by one or more experts of a specific field. However, the prerequisite relationships between the skills in the learning hierarchy are not definitive; they are fuzzy relationships. Indeed, we measure the relevance degree of all relationships existing in this learning hierarchy and try to answer the following question: are the prerequisite relationships predefined in the initial learning hierarchy correctly established or not?

  18. Fuzzy comprehensive evaluation model of interuniversity collaborative learning based on network

    Science.gov (United States)

    Wenhui, Ma; Yu, Wang

    2017-06-01

    Learning evaluation is an effective method, which plays an important role in the network education evaluation system. But most current network learning evaluation methods still use the traditional university education evaluation system, which does not take web-based learning characteristics into account and is ill-suited to the rapid development of interuniversity collaborative learning based on networks. A fuzzy comprehensive evaluation method, combining fuzzy theory with the analytic hierarchy process, is used to evaluate interuniversity collaborative learning. The analytic hierarchy process is used to determine the weight of the evaluation factors of each layer and to carry out the consistency check. According to the fuzzy comprehensive evaluation method, we establish an interuniversity collaborative learning evaluation mathematical model. The proposed scheme provides a new way of thinking about interuniversity collaborative learning evaluation based on networks.
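
    A minimal sketch of the fuzzy comprehensive evaluation step described above is given below; the AHP-derived weight vector, membership matrix and grade labels are illustrative assumptions, not values from the study.

```python
import numpy as np

# Fuzzy comprehensive evaluation: combine an AHP-style weight vector with a
# membership matrix (rows = evaluation factors, columns = rating grades).
weights = np.array([0.4, 0.3, 0.2, 0.1])   # factor weights from AHP (assumed)
R = np.array([                              # membership of each factor in each grade
    [0.5, 0.3, 0.2, 0.0],
    [0.2, 0.5, 0.2, 0.1],
    [0.1, 0.4, 0.4, 0.1],
    [0.0, 0.3, 0.5, 0.2],
])

B = weights @ R                 # composite membership in each grade
B = B / B.sum()                 # normalise
grade = ["excellent", "good", "fair", "poor"][int(np.argmax(B))]
print(B, grade)
```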

  19. The Computational Development of Reinforcement Learning during Adolescence

    Science.gov (United States)

    Palminteri, Stefano; Coricelli, Giorgio; Blakemore, Sarah-Jayne

    2016-01-01

    Adolescence is a period of life characterised by changes in learning and decision-making. Learning and decision-making do not rely on a unitary system, but instead require the coordination of different cognitive processes that can be mathematically formalised as dissociable computational modules. Here, we aimed to trace the developmental time-course of the computational modules responsible for learning from reward or punishment, and learning from counterfactual feedback. Adolescents and adults carried out a novel reinforcement learning paradigm in which participants learned the association between cues and probabilistic outcomes, where the outcomes differed in valence (reward versus punishment) and feedback was either partial or complete (either the outcome of the chosen option only, or the outcomes of both the chosen and unchosen option, were displayed). Computational strategies changed during development: whereas adolescents’ behaviour was better explained by a basic reinforcement learning algorithm, adults’ behaviour integrated increasingly complex computational features, namely a counterfactual learning module (enabling enhanced performance in the presence of complete feedback) and a value contextualisation module (enabling symmetrical reward and punishment learning). Unlike adults, adolescent performance did not benefit from counterfactual (complete) feedback. In addition, while adults learned symmetrically from both reward and punishment, adolescents learned from reward but were less likely to learn from punishment. This tendency to rely on rewards and not to consider alternative consequences of actions might contribute to our understanding of decision-making in adolescence. PMID:27322574
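
    A rough sketch of what a counterfactual learning module adds to a basic reinforcement learning rule is shown below: under complete feedback the unchosen option is also updated from its forgone outcome. The learning rate and option labels are illustrative assumptions, not the fitted parameters of the study.

```python
ALPHA = 0.2   # learning rate (assumed)

def update(Q, chosen, outcome, unchosen=None, forgone=None):
    """Delta rule for the chosen option; with complete feedback the unchosen
    option is also updated from its forgone (counterfactual) outcome."""
    Q[chosen] = Q.get(chosen, 0.0) + ALPHA * (outcome - Q.get(chosen, 0.0))
    if unchosen is not None and forgone is not None:
        Q[unchosen] = Q.get(unchosen, 0.0) + ALPHA * (forgone - Q.get(unchosen, 0.0))

Q = {}
update(Q, chosen="A", outcome=1.0, unchosen="B", forgone=0.0)  # complete-feedback trial
print(Q)
```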

  20. The Computational Development of Reinforcement Learning during Adolescence.

    Directory of Open Access Journals (Sweden)

    Stefano Palminteri

    2016-06-01

    Full Text Available Adolescence is a period of life characterised by changes in learning and decision-making. Learning and decision-making do not rely on a unitary system, but instead require the coordination of different cognitive processes that can be mathematically formalised as dissociable computational modules. Here, we aimed to trace the developmental time-course of the computational modules responsible for learning from reward or punishment, and learning from counterfactual feedback. Adolescents and adults carried out a novel reinforcement learning paradigm in which participants learned the association between cues and probabilistic outcomes, where the outcomes differed in valence (reward versus punishment) and feedback was either partial or complete (either the outcome of the chosen option only, or the outcomes of both the chosen and unchosen option, were displayed). Computational strategies changed during development: whereas adolescents' behaviour was better explained by a basic reinforcement learning algorithm, adults' behaviour integrated increasingly complex computational features, namely a counterfactual learning module (enabling enhanced performance in the presence of complete feedback) and a value contextualisation module (enabling symmetrical reward and punishment learning). Unlike adults, adolescent performance did not benefit from counterfactual (complete) feedback. In addition, while adults learned symmetrically from both reward and punishment, adolescents learned from reward but were less likely to learn from punishment. This tendency to rely on rewards and not to consider alternative consequences of actions might contribute to our understanding of decision-making in adolescence.

  1. Stress modulates reinforcement learning in younger and older adults.

    Science.gov (United States)

    Lighthall, Nichole R; Gorlick, Marissa A; Schoeke, Andrej; Frank, Michael J; Mather, Mara

    2013-03-01

    Animal research and human neuroimaging studies indicate that stress increases dopamine levels in brain regions involved in reward processing, and stress also appears to increase the attractiveness of addictive drugs. The current study tested the hypothesis that stress increases reward salience, leading to more effective learning about positive than negative outcomes in a probabilistic selection task. Changes to dopamine pathways with age raise the question of whether stress effects on incentive-based learning differ by age. Thus, the present study also examined whether effects of stress on reinforcement learning differed for younger (age 18-34) and older participants (age 65-85). Cold pressor stress was administered to half of the participants in each age group, and salivary cortisol levels were used to confirm biophysiological response to cold stress. After the manipulation, participants completed a probabilistic learning task involving positive and negative feedback. In both younger and older adults, stress enhanced learning about cues that predicted positive outcomes. In addition, during the initial learning phase, stress diminished sensitivity to recent feedback across age groups. These results indicate that stress affects reinforcement learning in both younger and older adults and suggest that stress exerts different effects on specific components of reinforcement learning depending on their neural underpinnings.

  2. Reinforcement and inference in cross-situational word learning

    Directory of Open Access Journals (Sweden)

    Paulo F.C. Tilles

    2013-11-01

    Full Text Available Cross-situational word learning is based on the notion that a learner can determine the referent of a word by finding something in common across many observed uses of that word. Here we propose an adaptive learning algorithm that contains a parameter that controls the strength of the reinforcement applied to associations between concurrent words and referents, and a parameter that regulates inference, which includes built-in biases, such as mutual exclusivity, and information of past learning events. By adjusting these parameters so that the model predictions agree with data from representative experiments on cross-situational word learning, we were able to explain the learning strategies adopted by the participants of those experiments in terms of a trade-off between reinforcement and inference. These strategies can vary wildly depending on the conditions of the experiments. For instance, for fast mapping experiments (i.e., the correct referent could, in principle, be inferred in a single observation) inference is prevalent, whereas for segregated contextual diversity experiments (i.e., the referents are separated in groups and are exhibited with members of their groups only) reinforcement is predominant. Other experiments are explained with more balanced doses of reinforcement and inference.

  3. A reinforcement learning approach to gait training improves retention.

    Science.gov (United States)

    Hasson, Christopher J; Manczurowsky, Julia; Yen, Sheng-Che

    2015-01-01

    Many gait training programs are based on supervised learning principles: an individual is guided towards a desired gait pattern with directional error feedback. While this results in rapid adaptation, improvements quickly disappear. This study tested the hypothesis that a reinforcement learning approach improves retention and transfer of a new gait pattern. The results of a pilot study and larger experiment are presented. Healthy subjects were randomly assigned to either a supervised group, who received explicit instructions and directional error feedback while they learned a new gait pattern on a treadmill, or a reinforcement group, who was only shown whether they were close to or far from the desired gait. Subjects practiced for 10 min, followed by immediate and overnight retention and over-ground transfer tests. The pilot study showed that subjects could learn a new gait pattern under a reinforcement learning paradigm. The larger experiment, which had twice as many subjects (16 in each group) showed that the reinforcement group had better overnight retention than the supervised group (a 32% vs. 120% error increase, respectively), but there were no differences for over-ground transfer. These results suggest that encouraging participants to find rewarding actions through self-guided exploration is beneficial for retention.

  4. Enhanced Experience Replay for Deep Reinforcement Learning

    Science.gov (United States)

    2015-11-01

    researchers have started to study video games as a simpler proxy problem that relies on the same principles. The current state-of-the-art system (Mnih et al. 2015) uses a convolutional neural network to automatically extract relevant features from the video-game display, then uses reinforcement...large collections of hand-labeled training data and are usually used to solve problems that can be posed as classification or regression tasks. On the
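
    The experience-replay mechanism this line of work builds on can be sketched as a plain uniform replay buffer; the capacity and batch size below are illustrative assumptions rather than the report's settings.

```python
import random
from collections import deque

# A plain uniform experience-replay buffer; capacity and batch size are assumed.
class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        """Uniform sampling breaks the temporal correlation between consecutive
        transitions, which stabilises training of the value network."""
        return random.sample(self.buffer, batch_size)

buf = ReplayBuffer()
for t in range(100):
    buf.add(t, 0, 0.0, t + 1, False)
print(len(buf.sample(8)))
```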

  5. Learning the specific quality of taste reinforcement in larval Drosophila.

    Science.gov (United States)

    Schleyer, Michael; Miura, Daisuke; Tanimura, Teiichi; Gerber, Bertram

    2015-01-27

    The only property of reinforcement insects are commonly thought to learn about is its value. We show that larval Drosophila not only remember the value of reinforcement (How much?), but also its quality (What?). This is demonstrated both within the appetitive domain by using sugar vs amino acid as different reward qualities, and within the aversive domain by using bitter vs high-concentration salt as different qualities of punishment. From the available literature, such nuanced memories for the quality of reinforcement are unexpected and pose a challenge to present models of how insect memory is organized. Given that animals as simple as larval Drosophila, endowed with but 10,000 neurons, operate with both reinforcement value and quality, we suggest that both are fundamental aspects of mnemonic processing, in any brain.

  6. Reinforcement learning in young adults with developmental language impairment.

    Science.gov (United States)

    Lee, Joanna C; Tomblin, J Bruce

    2012-12-01

    The aim of the study was to examine reinforcement learning (RL) in young adults with developmental language impairment (DLI) within the context of a neurocomputational model of the basal ganglia-dopamine system (Frank, Seeberger, & O'Reilly, 2004). Two groups of young adults, one with DLI and the other without, were recruited. A probabilistic selection task was used to assess how participants implicitly extracted reinforcement history from the environment based on probabilistic positive/negative feedback. The findings showed impaired RL in individuals with DLI, indicating an altered gating function of the striatum in testing. However, they exploited similar learning strategies as comparison participants at the beginning of training, reflecting relatively intact functions of the prefrontal cortex to rapidly update reinforcement information. Within the context of Frank's model, these results can be interpreted as evidence for alterations in the basal ganglia of individuals with DLI.

  7. Robot Navigation using Reinforcement Learning and Slow Feature Analysis

    CERN Document Server

    Böhmer, Wendelin

    2012-01-01

    The application of reinforcement learning algorithms onto real life problems always bears the challenge of filtering the environmental state out of raw sensor readings. While most approaches use heuristics, biology suggests that there must exist an unsupervised method to construct such filters automatically. Besides the extraction of environmental states, the filters have to represent them in a fashion that supports modern reinforcement algorithms. Many popular algorithms use a linear architecture, so one should aim at filters that have good approximation properties in combination with linear functions. This thesis proposes the unsupervised method slow feature analysis (SFA) for this task. Presented with a random sequence of sensor readings, SFA learns a set of filters. With growing model complexity and training examples, the filters converge against trigonometric polynomial functions. These are known to possess excellent approximation capabilities and should therefore support the reinforcement algorith...
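
    A minimal linear slow feature analysis step can be written directly with an eigendecomposition, as sketched below; the toy signal is an assumption for illustration, and the nonlinear (polynomial-expanded) variant used for rich sensor data is omitted.

```python
import numpy as np

# Linear SFA sketch: whiten the signal, then take the directions in which the
# whitened signal changes most slowly (smallest eigenvalues of the covariance
# of its temporal differences).
def linear_sfa(X, n_features=2):
    X = X - X.mean(axis=0)                      # center
    d, E = np.linalg.eigh(np.cov(X, rowvar=False))
    W_whiten = E / np.sqrt(d)                   # whitening transform (columns)
    Z = X @ W_whiten
    dZ = np.diff(Z, axis=0)                     # temporal differences
    d2, E2 = np.linalg.eigh(np.cov(dZ, rowvar=False))
    slow = E2[:, :n_features]                   # slowest directions (ascending eigenvalues)
    return Z @ slow                             # slow features over time

# Toy sensor stream: one slow and one fast oscillation plus a noisy copy (assumed).
t = np.linspace(0, 10, 500)
X = np.column_stack([np.sin(0.5 * t),
                     np.sin(5.0 * t),
                     np.sin(0.5 * t) + 0.1 * np.random.randn(t.size)])
print(linear_sfa(X).shape)
```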

  8. A Simple and Effective Remedial Learning System with a Fuzzy Expert System

    Science.gov (United States)

    Lin, C.-C.; Guo, K.-H.; Lin, Y.-C.

    2016-01-01

    This study aims at implementing a simple and effective remedial learning system. Based on fuzzy inference, a remedial learning material selection system is proposed for a digital logic course. Two learning concepts of the course have been used in the proposed system: number systems and combinational logic. We conducted an experiment to validate…

  9. A Simple and Effective Remedial Learning System with a Fuzzy Expert System

    Science.gov (United States)

    Lin, C.-C.; Guo, K.-H.; Lin, Y.-C.

    2016-01-01

    This study aims at implementing a simple and effective remedial learning system. Based on fuzzy inference, a remedial learning material selection system is proposed for a digital logic course. Two learning concepts of the course have been used in the proposed system: number systems and combinational logic. We conducted an experiment to validate…

  10. Reinforcement and Systemic Machine Learning for Decision Making

    CERN Document Server

    Kulkarni, Parag

    2012-01-01

    Reinforcement and Systemic Machine Learning for Decision Making There are always difficulties in making machines that learn from experience. Complete information is not always available-or it becomes available in bits and pieces over a period of time. With respect to systemic learning, there is a need to understand the impact of decisions and actions on a system over that period of time. This book takes a holistic approach to addressing that need and presents a new paradigm-creating new learning applications and, ultimately, more intelligent machines. The first book of its kind in this new an

  11. An Adaptive Neuro-Fuzzy Inference System Based Modeling for Corrosion-Damaged Reinforced HSC Beams Strengthened with External Glass Fibre Reinforced Polymer Laminates

    Directory of Open Access Journals (Sweden)

    P. N. Raghunath

    2012-01-01

    Full Text Available Problem statement: This study presents the results of an ANFIS-based model proposed for predicting the performance characteristics of reinforced HSC beams subjected to different levels of corrosion damage and strengthened with externally bonded glass fibre reinforced polymer laminates. Approach: A total of 21 beam specimens of size 150×250×3000 mm were cast and tested. Results: Out of the 21 specimens, 7 specimens were tested without any corrosion damage (R-Series), 7 after inducing 10% corrosion damage (A-Series) and another 7 after inducing 25% corrosion damage (B-Series). Out of the seven specimens in each series, one was tested without any laminate, three specimens were tested after applying 3 mm thick CSM, UDC and WR laminates and another three specimens after applying 5 mm thick CSM, UDC and WR laminates. Conclusion/Recommendations: The test results show that the beams strengthened with externally bonded GFRP laminates exhibit increased strength, stiffness, ductility and composite action until failure. An Adaptive Neuro-Fuzzy Inference System (ANFIS) model is developed for predicting the study parameters for input values lying within the range of this experimental study.

  12. Reinforcement Learning Explains Conditional Cooperation and Its Moody Cousin.

    Directory of Open Access Journals (Sweden)

    Takahiro Ezaki

    2016-07-01

    Full Text Available Direct reciprocity, or repeated interaction, is a main mechanism to sustain cooperation under social dilemmas involving two individuals. For larger groups and networks, which are probably more relevant to understanding and engineering our society, experiments employing repeated multiplayer social dilemma games have suggested that humans often show conditional cooperation behavior and its moody variant. Mechanisms underlying these behaviors largely remain unclear. Here we provide a proximate account for this behavior by showing that individuals adopting a type of reinforcement learning, called aspiration learning, phenomenologically behave as conditional cooperators. By definition, individuals are satisfied if and only if the obtained payoff is larger than a fixed aspiration level. They reinforce actions that have resulted in satisfactory outcomes and anti-reinforce those yielding unsatisfactory outcomes. The results obtained in the present study are general in that they explain extant experimental results obtained for both so-called moody and non-moody conditional cooperation, prisoner's dilemma and public goods games, and well-mixed groups and networks. Different from the previous theory, individuals are assumed to have no access to information about what other individuals are doing such that they cannot explicitly use conditional cooperation rules. In this sense, myopic aspiration learning in which the unconditional propensity of cooperation is modulated in every discrete time step explains conditional behavior of humans. Aspiration learners showing (moody) conditional cooperation obeyed a noisy GRIM-like strategy. This is different from Pavlov, a reinforcement learning strategy promoting mutual cooperation in two-player situations.
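
    The aspiration-learning rule invoked above can be sketched as a Bush-Mosteller-style update of the propensity to cooperate; the aspiration level and learning rate below are illustrative assumptions, not the fitted values.

```python
import random

# Aspiration learning sketch: reinforce the action just taken if the payoff
# exceeded a fixed aspiration level, anti-reinforce it otherwise.
ASPIRATION = 1.0   # fixed aspiration level (assumed)
BETA = 0.2         # learning rate (assumed)

def update_propensity(p_cooperate, action, payoff):
    """Move the cooperation propensity toward 1 or 0 depending on whether the
    chosen action ('C' or 'D') produced a satisfactory payoff."""
    satisfied = payoff >= ASPIRATION
    target = 1.0 if (action == "C") == satisfied else 0.0
    return p_cooperate + BETA * (target - p_cooperate)

p = 0.5
action = "C" if random.random() < p else "D"
p = update_propensity(p, action, payoff=2.0)   # satisfactory outcome this round
print(action, p)
```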

  13. Reinforcement learning accounts for moody conditional cooperation behavior: experimental results

    Science.gov (United States)

    Horita, Yutaka; Takezawa, Masanori; Inukai, Keigo; Kita, Toshimasa; Masuda, Naoki

    2017-01-01

    In social dilemma games, human participants often show conditional cooperation (CC) behavior or its variant called moody conditional cooperation (MCC), with which they basically tend to cooperate when many other peers have previously cooperated. Recent computational studies showed that CC and MCC behavioral patterns could be explained by reinforcement learning. In the present study, we use a repeated multiplayer prisoner’s dilemma game and the repeated public goods game played by human participants to examine whether MCC is observed across different types of game and the possibility that reinforcement learning explains observed behavior. We observed MCC behavior in both games, but the MCC that we observed was different from that observed in the past experiments. In the present study, whether or not a focal participant cooperated previously affected the overall level of cooperation, instead of changing the tendency of cooperation in response to cooperation of other participants in the previous time step. We found that, across different conditions, reinforcement learning models were approximately as accurate as a MCC model in describing the experimental results. Consistent with the previous computational studies, the present results suggest that reinforcement learning may be a major proximate mechanism governing MCC behavior. PMID:28071646

  14. Traffic Light Control by Multiagent Reinforcement Learning Systems

    NARCIS (Netherlands)

    Bakker, B.; Whiteson, S.; Kester, L.J.H.M.; Groen, F.C.A.

    2010-01-01

    Traffic light control is one of the main means of controlling road traffic. Improving traffic control is important because it can lead to higher traffic throughput and reduced traffic congestion. This chapter describes multiagent reinforcement learning techniques for automatic optimization of

  15. Joy, Distress, Hope, and Fear in Reinforcement Learning (Extended Abstract)

    NARCIS (Netherlands)

    Jacobs, E.J.; Broekens, J.; Jonker, C.M.

    2014-01-01

    In this paper we present a mapping between joy, distress, hope and fear, and Reinforcement Learning primitives. Joy / distress is a signal that is derived from the RL update signal, while hope/fear is derived from the utility of the current state. Agent-based simulation experiments replicate psychol

  16. Generalized domains for empirical evaluations in reinforcement learning

    NARCIS (Netherlands)

    Whiteson, S.; Tanner, B.; Taylor, M.E.; Stone, P.

    2009-01-01

    Many empirical results in reinforcement learning are based on a very small set of environments. These results often represent the best algorithm parameters that were found after an ad-hoc tuning or fitting process. We argue that presenting tuned scores from a small set of environments leads to metho

  17. Generalized domains for empirical evaluations in reinforcement learning

    NARCIS (Netherlands)

    Whiteson, S.; Tanner, B.; Taylor, M.E.; Stone, P.

    2009-01-01

    Many empirical results in reinforcement learning are based on a very small set of environments. These results often represent the best algorithm parameters that were found after an ad-hoc tuning or fitting process. We argue that presenting tuned scores from a small set of environments leads to

  18. Reinforcement Learning in Young Adults with Developmental Language Impairment

    Science.gov (United States)

    Lee, Joanna C.; Tomblin, J. Bruce

    2012-01-01

    The aim of the study was to examine reinforcement learning (RL) in young adults with developmental language impairment (DLI) within the context of a neurocomputational model of the basal ganglia-dopamine system (Frank, Seeberger, & O'Reilly, 2004). Two groups of young adults, one with DLI and the other without, were recruited. A probabilistic…

  19. Reinforcement learning accounts for moody conditional cooperation behavior: experimental results.

    Science.gov (United States)

    Horita, Yutaka; Takezawa, Masanori; Inukai, Keigo; Kita, Toshimasa; Masuda, Naoki

    2017-01-10

    In social dilemma games, human participants often show conditional cooperation (CC) behavior or its variant called moody conditional cooperation (MCC), with which they basically tend to cooperate when many other peers have previously cooperated. Recent computational studies showed that CC and MCC behavioral patterns could be explained by reinforcement learning. In the present study, we use a repeated multiplayer prisoner's dilemma game and the repeated public goods game played by human participants to examine whether MCC is observed across different types of game and the possibility that reinforcement learning explains observed behavior. We observed MCC behavior in both games, but the MCC that we observed was different from that observed in the past experiments. In the present study, whether or not a focal participant cooperated previously affected the overall level of cooperation, instead of changing the tendency of cooperation in response to cooperation of other participants in the previous time step. We found that, across different conditions, reinforcement learning models were approximately as accurate as a MCC model in describing the experimental results. Consistent with the previous computational studies, the present results suggest that reinforcement learning may be a major proximate mechanism governing MCC behavior.

  20. Reinforcement Learning Explains Conditional Cooperation and Its Moody Cousin.

    Science.gov (United States)

    Ezaki, Takahiro; Horita, Yutaka; Takezawa, Masanori; Masuda, Naoki

    2016-07-01

    Direct reciprocity, or repeated interaction, is a main mechanism to sustain cooperation under social dilemmas involving two individuals. For larger groups and networks, which are probably more relevant to understanding and engineering our society, experiments employing repeated multiplayer social dilemma games have suggested that humans often show conditional cooperation behavior and its moody variant. Mechanisms underlying these behaviors largely remain unclear. Here we provide a proximate account for this behavior by showing that individuals adopting a type of reinforcement learning, called aspiration learning, phenomenologically behave as conditional cooperators. By definition, individuals are satisfied if and only if the obtained payoff is larger than a fixed aspiration level. They reinforce actions that have resulted in satisfactory outcomes and anti-reinforce those yielding unsatisfactory outcomes. The results obtained in the present study are general in that they explain extant experimental results obtained for both so-called moody and non-moody conditional cooperation, prisoner's dilemma and public goods games, and well-mixed groups and networks. Different from the previous theory, individuals are assumed to have no access to information about what other individuals are doing such that they cannot explicitly use conditional cooperation rules. In this sense, myopic aspiration learning in which the unconditional propensity of cooperation is modulated in every discrete time step explains conditional behavior of humans. Aspiration learners showing (moody) conditional cooperation obeyed a noisy GRIM-like strategy. This is different from Pavlov, a reinforcement learning strategy promoting mutual cooperation in two-player situations.

  1. Generalized domains for empirical evaluations in reinforcement learning

    NARCIS (Netherlands)

    Whiteson, S.; Tanner, B.; Taylor, M.E.; Stone, P.

    2009-01-01

    Many empirical results in reinforcement learning are based on a very small set of environments. These results often represent the best algorithm parameters that were found after an ad-hoc tuning or fitting process. We argue that presenting tuned scores from a small set of environments leads to metho

  2. A Reinforcement Learning Agent for Minutiae Extraction from Fingerprints

    NARCIS (Netherlands)

    Bazen, Asker M.; Otterlo, van Martijn; Gerez, Sabih H.; Poel, Mannes; Kröse, B.; De Rijke, M.; Schreiber, G.; Someren, van M.

    2001-01-01

    In this paper we show that reinforcement learning can be used for minutiae detection in fingerprint matching. Minutiae are characteristic features of fingerprints that determine their uniqueness. Classical approaches use a series of image processing steps for this task, but lack robustness because t

  3. Reinforcement learning: the good, the bad and the ugly.

    Science.gov (United States)

    Dayan, Peter; Niv, Yael

    2008-04-01

    Reinforcement learning provides both qualitative and quantitative frameworks for understanding and modeling adaptive decision-making in the face of rewards and punishments. Here we review the latest dispatches from the forefront of this field, and map out some of the territories where lie monsters.

  4. Scaling Ant Colony Optimization with Hierarchical Reinforcement Learning Partitioning

    Science.gov (United States)

    2007-09-01

    on Mathematical Statistics and Probabilities, 281–297. 1967. 14. Parr, Ronald and Stuart Russell. “Reinforcement Learning with Hierarchies of...to affect the environment and the environment to affect the learning [14]. Parr does explain the environment can be partially observable and the...the execution of all other machines and monitors the completion of all machine actions. Parr uses a grid world to explain the setup of the navigation

  5. The fuzzy probability model for durability of reinforced concrete members

    Institute of Scientific and Technical Information of China (English)

    袁秋平; 袁非鼎; 许凌波

    2011-01-01

    By analyzing the causes of durability damage in reinforced concrete members, and taking into account the fuzziness and randomness of such damage, a fuzzy probability model is adopted to assess the durability state of reinforced concrete members. In this model, the durability state is divided into four grades and five main factors affecting durability are selected; each factor is first assessed individually, using a normal-distribution function as its membership function, and the results are then combined to assess the overall durability state of the member. Verification with an engineering example shows that, compared with the conventional fuzzy comprehensive judgment method, the predictions of the fuzzy probability model are closer to the actual situation. This indicates that the model not only provides a new way to assess the durability state of reinforced concrete members, but is also superior to the traditional fuzzy comprehensive judgment method.
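
    The single-factor assessment step described above can be sketched with a normal-distribution membership function; the grade centres, spread and factor value below are illustrative assumptions, not calibrated values from the study.

```python
import math

# Normal-distribution membership: degree to which a measured factor belongs to
# each durability grade. Grade centres, sigma and the factor value are assumed.
def gaussian_membership(x, centre, sigma):
    return math.exp(-((x - centre) ** 2) / (2 * sigma ** 2))

grade_centres = {"good": 0.9, "fair": 0.6, "poor": 0.3, "severe": 0.1}  # assumed grades
sigma = 0.15

factor_value = 0.55   # e.g. a normalised condition score for one factor
memberships = {g: gaussian_membership(factor_value, c, sigma) for g, c in grade_centres.items()}
print(max(memberships, key=memberships.get), memberships)
```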

  6. Neurofeedback in Learning Disabled Children: Visual versus Auditory Reinforcement.

    Science.gov (United States)

    Fernández, Thalía; Bosch-Bayard, Jorge; Harmony, Thalía; Caballero, María I; Díaz-Comas, Lourdes; Galán, Lídice; Ricardo-Garcell, Josefina; Aubert, Eduardo; Otero-Ojeda, Gloria

    2016-03-01

    Children with learning disabilities (LD) frequently have an EEG characterized by an excess of theta and a deficit of alpha activities. NFB using an auditory stimulus as reinforcer has proven to be a useful tool to treat LD children by positively reinforcing decreases of the theta/alpha ratio. The aim of the present study was to optimize the NFB procedure by comparing the efficacy of visual (with eyes open) versus auditory (with eyes closed) reinforcers. Twenty LD children with an abnormally high theta/alpha ratio were randomly assigned to the Auditory or the Visual group, where a 500 Hz tone or a visual stimulus (a white square), respectively, was used as a positive reinforcer when the value of the theta/alpha ratio was reduced. Both groups had signs consistent with EEG maturation, but only the Auditory Group showed behavioral/cognitive improvements. In conclusion, the auditory reinforcer was more efficacious in reducing the theta/alpha ratio, and it improved the cognitive abilities more than the visual reinforcer.

  7. Intelligent Inventory Control via Ruminative Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Tatpong Katanyukul

    2014-01-01

    Ruminative reinforcement learning (RRL) has been introduced recently based on this approach. RRL is motivated by how humans contemplate the consequences of their actions in trying to learn how to make a better decision. This study further investigates the issues of RRL and proposes new RRL methods applied to inventory management. Our investigation provides insight into different RRL characteristics, and our experimental results show the viability of the new methods.

  8. A Bayesian Sampling Approach to Exploration in Reinforcement Learning

    CERN Document Server

    Asmuth, John; Littman, Michael L; Nouri, Ali; Wingate, David

    2012-01-01

    We present a modular approach to reinforcement learning that uses a Bayesian representation of the uncertainty over models. The approach, BOSS (Best of Sampled Set), drives exploration by sampling multiple models from the posterior and selecting actions optimistically. It extends previous work by providing a rule for deciding when to resample and how to combine the models. We show that our algorithm achieves near-optimal reward with high probability with a sample complexity that is low relative to the speed at which the posterior distribution converges during learning. We demonstrate that BOSS performs quite favorably compared to state-of-the-art reinforcement-learning approaches and illustrate its flexibility by pairing it with a non-parametric model that generalizes across states.
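
    As a much-simplified analogue of posterior-sampling exploration (a bandit sketch, not the BOSS algorithm itself), the code below samples a reward model for each arm from a Beta posterior and acts greedily on the sample; the priors, arm count and reward probabilities are assumptions.

```python
import numpy as np

# Posterior-sampling exploration in a Bernoulli bandit: draw one model per arm
# from its Beta posterior and act greedily with respect to the draw.
rng = np.random.default_rng(0)
true_p = [0.3, 0.5, 0.7]                 # unknown reward probabilities (assumed)
successes = np.ones(3)                   # Beta(1, 1) priors
failures = np.ones(3)

for t in range(1000):
    sampled_models = rng.beta(successes, failures)   # one sampled model per arm
    arm = int(np.argmax(sampled_models))             # act greedily on the sample
    reward = rng.random() < true_p[arm]
    successes[arm] += reward
    failures[arm] += 1 - reward

print(successes / (successes + failures))            # posterior mean per arm
```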

  9. Reinforcement learning for mobile robot: from reaction to deliberation

    Institute of Scientific and Technical Information of China (English)

    2005-01-01

    Reinforcement learning has been widely used for mobile robot learning and control. Some progress in this kind of approach is surveyed and discussed in a new way which emphasizes different levels of algorithms according to the different complexity of tasks. The central conjecture is that approaches which combine reactive and deliberative control for robotics scale better to complex real-world applications than purely reactive or deliberative ones. This paper describes basic reactive reinforcement learning algorithms and two classes of approaches to achieve deliberation, which are modular methods and hierarchical methods. By combining reactive and deliberative paradigms, the whole system gains advantages from different control levels. The paper gives results of experiments as a case study to verify the effectiveness of the proposed approaches.

  10. Mobile robots exploration through cnn-based reinforcement learning.

    Science.gov (United States)

    Tai, Lei; Liu, Ming

    2016-01-01

    Exploration in an unknown environment is an elemental application for mobile robots. In this paper, we outline a reinforcement learning method aimed at solving the exploration problem in a corridor environment. The learning model took the depth image from an RGB-D sensor as the only input. The feature representation of the depth image was extracted through a pre-trained convolutional-neural-network model. Based on the recent success of deep Q-networks in artificial intelligence, the robot controller achieved the exploration and obstacle avoidance abilities in several different simulated environments. It is the first time that reinforcement learning has been used to build an exploration strategy for mobile robots from raw sensor information.

  11. Working memory contributions to reinforcement learning impairments in schizophrenia.

    Science.gov (United States)

    Collins, Anne G E; Brown, Jaime K; Gold, James M; Waltz, James A; Frank, Michael J

    2014-10-08

    Previous research has shown that patients with schizophrenia are impaired in reinforcement learning tasks. However, behavioral learning curves in such tasks originate from the interaction of multiple neural processes, including the basal ganglia- and dopamine-dependent reinforcement learning (RL) system, but also prefrontal cortex-dependent cognitive strategies involving working memory (WM). Thus, it is unclear which specific system induces impairments in schizophrenia. We recently developed a task and computational model allowing us to separately assess the roles of RL (slow, cumulative learning) mechanisms versus WM (fast but capacity-limited) mechanisms in healthy adult human subjects. Here, we used this task to assess patients' specific sources of impairments in learning. In 15 separate blocks, subjects learned to pick one of three actions for stimuli. The number of stimuli to learn in each block varied from two to six, allowing us to separate influences of capacity-limited WM from the incremental RL system. As expected, both patients (n = 49) and healthy controls (n = 36) showed effects of set size and delay between stimulus repetitions, confirming the presence of working memory effects. Patients performed significantly worse than controls overall, but computational model fits and behavioral analyses indicate that these deficits could be entirely accounted for by changes in WM parameters (capacity and reliability), whereas RL processes were spared. These results suggest that the working memory system contributes strongly to learning impairments in schizophrenia.

  12. Challenges in the Verification of Reinforcement Learning Algorithms

    Science.gov (United States)

    Van Wesel, Perry; Goodloe, Alwyn E.

    2017-01-01

    Machine learning (ML) is increasingly being applied to a wide array of domains from search engines to autonomous vehicles. These algorithms, however, are notoriously complex and hard to verify. This work looks at the assumptions underlying machine learning algorithms as well as some of the challenges in trying to verify ML algorithms. Furthermore, we focus on the specific challenges of verifying reinforcement learning algorithms. These are highlighted using a specific example. Ultimately, we do not offer a solution to the complex problem of ML verification, but point out possible approaches for verification and interesting research opportunities.

  13. Amygdala and Ventral Striatum Make Distinct Contributions to Reinforcement Learning.

    Science.gov (United States)

    Costa, Vincent D; Dal Monte, Olga; Lucas, Daniel R; Murray, Elisabeth A; Averbeck, Bruno B

    2016-10-19

    Reinforcement learning (RL) theories posit that dopaminergic signals are integrated within the striatum to associate choices with outcomes. Often overlooked is that the amygdala also receives dopaminergic input and is involved in Pavlovian processes that influence choice behavior. To determine the relative contributions of the ventral striatum (VS) and amygdala to appetitive RL, we tested rhesus macaques with VS or amygdala lesions on deterministic and stochastic versions of a two-arm bandit reversal learning task. When learning was characterized with an RL model relative to controls, amygdala lesions caused general decreases in learning from positive feedback and choice consistency. By comparison, VS lesions only affected learning in the stochastic task. Moreover, the VS lesions hastened the monkeys' choice reaction times, which emphasized a speed-accuracy trade-off that accounted for errors in deterministic learning. These results update standard accounts of RL by emphasizing distinct contributions of the amygdala and VS to RL.

  14. STUDY ON FUZZY SELF-LEARNING CONTROL SYSTEM FOR SHIP STEERING

    Institute of Scientific and Technical Information of China (English)

    LIU Qing; WU Xiu-heng; ZOU Zao-jian

    2004-01-01

    Fuzzy control has shown success in some application areas and has emerged as an alternative to some conventional control schemes. There are also some drawbacks to this approach: for example, it is hard to justify the choice of fuzzy controller parameters and control rules, and control precision is low. Fuzzy control is developing towards self-learning and adaptive schemes. Ship steering motion is a nonlinear, coupled, time-delayed, complicated system, and how to control it effectively is a problem that many scholars are studying. In this paper, based on the repeated control of a robot, a self-learning algorithm was worked out. The algorithm was realized with fuzzy logic and applied to cargo ship steering; it is the first time the algorithm has been used for cargo ship steering. Our simulation results show that the algorithm is effective and has several potential advantages over conventional fuzzy control. This work lays a foundation for modeling and analyzing the fuzzy learning control system.

  15. Effective reinforcement learning following cerebellar damage requires a balance between exploration and motor noise.

    Science.gov (United States)

    Therrien, Amanda S; Wolpert, Daniel M; Bastian, Amy J

    2016-01-01

    Reinforcement and error-based processes are essential for motor learning, with the cerebellum thought to be required only for the error-based mechanism. Here we examined learning and retention of a reaching skill under both processes. Control subjects learned similarly from reinforcement and error-based feedback, but showed much better retention under reinforcement. To apply reinforcement to cerebellar patients, we developed a closed-loop reinforcement schedule in which task difficulty was controlled based on recent performance. This schedule produced substantial learning in cerebellar patients and controls. Cerebellar patients varied in their learning under reinforcement but fully retained what was learned. In contrast, they showed complete lack of retention in error-based learning. We developed a mechanistic model of the reinforcement task and found that learning depended on a balance between exploration variability and motor noise. While the cerebellar and control groups had similar exploration variability, the patients had greater motor noise and hence learned less. Our results suggest that cerebellar damage indirectly impairs reinforcement learning by increasing motor noise, but does not interfere with the reinforcement mechanism itself. Therefore, reinforcement can be used to learn and retain novel skills, but optimal reinforcement learning requires a balance between exploration variability and motor noise.

  16. Reinforcing Communication Skills While Registered Nurses Simultaneously Learn Course Content: A Response to Learning Needs.

    Science.gov (United States)

    De Simone, Barbara B.

    1994-01-01

    Fifteen nursing students participated in Integrated Skills Reinforcement, an approach that reinforces writing, reading, speaking, and listening skills while students learn course content. Pre-/postassessment of writing showed that 93% achieved writing improvement. All students agreed that the approach improved understanding of course content, a…

  17. Reinforcement learning agents providing advice in complex video games

    Science.gov (United States)

    Taylor, Matthew E.; Carboni, Nicholas; Fachantidis, Anestis; Vlahavas, Ioannis; Torrey, Lisa

    2014-01-01

    This article introduces a teacher-student framework for reinforcement learning, synthesising and extending material that appeared in conference proceedings [Torrey, L., & Taylor, M. E. (2013). Teaching on a budget: Agents advising agents in reinforcement learning. Proceedings of the international conference on autonomous agents and multiagent systems] and in a non-archival workshop paper [Carboni, N., & Taylor, M. E. (2013, May). Preliminary results for 1 vs. 1 tactics in StarCraft. Proceedings of the adaptive and learning agents workshop (at AAMAS-13)]. In this framework, a teacher agent instructs a student agent by suggesting actions the student should take as it learns. However, the teacher may only give such advice a limited number of times. We present several novel algorithms that teachers can use to budget their advice effectively, and we evaluate them in two complex video games: StarCraft and Pac-Man. Our results show that the same amount of advice, given at different moments, can have different effects on student learning, and that teachers can significantly affect student learning even when students use different learning methods and state representations.

  18. Democratic reinforcement: learning via self-organization

    Energy Technology Data Exchange (ETDEWEB)

    Stassinopoulos, D. [Florida Atlantic Univ., Boca Raton, FL (United States); Bak, P. [Brookhaven National Lab., Upton, NY (United States)

    1995-12-31

    The problem of learning in the absence of external intelligence is discussed in the context of a simple model. The model consists of a set of randomly connected, or layered, integrate-and-fire neurons. Inputs to and outputs from the environment are connected randomly to subsets of neurons. The connections between firing neurons are strengthened or weakened according to whether the action is successful or not. The model departs from the traditional gradient-descent based approaches to learning by operating at a highly susceptible "critical" state, with low activity and sparse connections between firing neurons. Quantitative studies on the performance of our model in a simple association task show that by tuning our system close to this critical state we can obtain dramatic gains in performance.

  19. Pleasurable music affects reinforcement learning according to the listener

    Science.gov (United States)

    Gold, Benjamin P.; Frank, Michael J.; Bogert, Brigitte; Brattico, Elvira

    2013-01-01

    Mounting evidence links the enjoyment of music to brain areas implicated in emotion and the dopaminergic reward system. In particular, dopamine release in the ventral striatum seems to play a major role in the rewarding aspect of music listening. Striatal dopamine also influences reinforcement learning, such that subjects with greater dopamine efficacy learn better to approach rewards while those with lesser dopamine efficacy learn better to avoid punishments. In this study, we explored the practical implications of musical pleasure through its ability to facilitate reinforcement learning via non-pharmacological dopamine elicitation. Subjects from a wide variety of musical backgrounds chose a pleasurable and a neutral piece of music from an experimenter-compiled database, and then listened to one or both of these pieces (according to pseudo-random group assignment) as they performed a reinforcement learning task dependent on dopamine transmission. We assessed musical backgrounds as well as typical listening patterns with the new Helsinki Inventory of Music and Affective Behaviors (HIMAB), and separately investigated behavior for the training and test phases of the learning task. Subjects with more musical experience trained better with neutral music and tested better with pleasurable music, while those with less musical experience exhibited the opposite effect. HIMAB results regarding listening behaviors and subjective music ratings indicate that these effects arose from different listening styles: namely, more affective listening in non-musicians and more analytical listening in musicians. In conclusion, musical pleasure was able to influence task performance, and the shape of this effect depended on group and individual factors. These findings have implications in affective neuroscience, neuroaesthetics, learning, and music therapy. PMID:23970875

  20. Pleasurable music affects reinforcement learning according to the listener.

    Science.gov (United States)

    Gold, Benjamin P; Frank, Michael J; Bogert, Brigitte; Brattico, Elvira

    2013-01-01

    Mounting evidence links the enjoyment of music to brain areas implicated in emotion and the dopaminergic reward system. In particular, dopamine release in the ventral striatum seems to play a major role in the rewarding aspect of music listening. Striatal dopamine also influences reinforcement learning, such that subjects with greater dopamine efficacy learn better to approach rewards while those with lesser dopamine efficacy learn better to avoid punishments. In this study, we explored the practical implications of musical pleasure through its ability to facilitate reinforcement learning via non-pharmacological dopamine elicitation. Subjects from a wide variety of musical backgrounds chose a pleasurable and a neutral piece of music from an experimenter-compiled database, and then listened to one or both of these pieces (according to pseudo-random group assignment) as they performed a reinforcement learning task dependent on dopamine transmission. We assessed musical backgrounds as well as typical listening patterns with the new Helsinki Inventory of Music and Affective Behaviors (HIMAB), and separately investigated behavior for the training and test phases of the learning task. Subjects with more musical experience trained better with neutral music and tested better with pleasurable music, while those with less musical experience exhibited the opposite effect. HIMAB results regarding listening behaviors and subjective music ratings indicate that these effects arose from different listening styles: namely, more affective listening in non-musicians and more analytical listening in musicians. In conclusion, musical pleasure was able to influence task performance, and the shape of this effect depended on group and individual factors. These findings have implications in affective neuroscience, neuroaesthetics, learning, and music therapy.

  1. Pleasurable music affects reinforcement learning according to the listener

    Directory of Open Access Journals (Sweden)

    Benjamin P Gold

    2013-08-01

    Full Text Available Mounting evidence links the enjoyment of music to brain areas implicated in emotion and the dopaminergic reward system. In particular, dopamine release in the ventral striatum seems to play a major role in the rewarding aspect of music listening. Striatal dopamine also influences reinforcement learning, such that subjects with greater dopamine efficacy learn better to approach rewards while those with lesser dopamine efficacy learn better to avoid punishments. In this study, we explored the practical implications of musical pleasure through its ability to facilitate reinforcement learning via non-pharmacological dopamine elicitation. Subjects from a wide variety of musical backgrounds chose a pleasurable and a neutral piece of music from an experimenter-compiled database, and then listened to one or both of these pieces (according to pseudo-random group assignment) as they performed a reinforcement learning task dependent on dopamine transmission. We assessed musical backgrounds as well as typical listening patterns with the new Helsinki Inventory of Music and Affective Behaviors (HIMAB), and separately investigated behavior for the training and test phases of the learning task. Subjects with more musical experience trained better with neutral music and tested better with pleasurable music, while those with less musical experience exhibited the opposite effect. HIMAB results regarding listening behaviors and subjective music ratings indicate that these effects arose from different listening styles: namely, more affective listening in non-musicians and more analytical listening in musicians. In conclusion, musical pleasure was able to influence task performance, and the shape of this effect depended on group and individual factors. These findings have implications in affective neuroscience, neuroaesthetics, learning, and music therapy.

  2. Exploring compact reinforcement-learning representations with linear regression

    CERN Document Server

    Walsh, Thomas J; Diuk, Carlos; Littman, Michael L

    2012-01-01

    This paper presents a new algorithm for online linear regression whose efficiency guarantees satisfy the requirements of the KWIK (Knows What It Knows) framework. The algorithm improves on the complexity bounds of the current state-of-the-art procedure in this setting. We explore several applications of this algorithm for learning compact reinforcement-learning representations. We show that KWIK linear regression can be used to learn the reward function of a factored MDP and the probabilities of action outcomes in Stochastic STRIPS and Object Oriented MDPs, none of which have been proven to be efficiently learnable in the RL setting before. We also combine KWIK linear regression with other KWIK learners to learn larger portions of these models, including experiments on learning factored MDP transition and reward functions together.
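
    The following Python sketch illustrates the KWIK idea in its simplest form rather than the algorithm of the paper: an online linear regressor that answers a query only when the query point is well covered by previously observed data, and otherwise returns None ("I don't know") and requests the label. The ridge term, the leverage-style confidence test, and the threshold are illustrative assumptions.

      import numpy as np

      class KWIKLinearRegression:
          """Sketch of a KWIK-style online linear regressor: it predicts only
          when the query point is well covered by past data, otherwise it
          returns None ("I don't know") and expects to observe the true label."""

          def __init__(self, dim, ridge=1e-3, confidence=0.1):
              self.A = ridge * np.eye(dim)   # regularized Gram matrix X^T X + ridge*I
              self.b = np.zeros(dim)         # X^T y
              self.confidence = confidence   # uncertainty threshold for answering

          def predict(self, x):
              x = np.asarray(x, dtype=float)
              A_inv = np.linalg.inv(self.A)
              uncertainty = float(x @ A_inv @ x)   # leverage-style uncertainty of the query
              if uncertainty > self.confidence:
                  return None                      # "I don't know": request the label
              w = A_inv @ self.b                   # current least-squares weights
              return float(w @ x)

          def observe(self, x, y):
              x = np.asarray(x, dtype=float)
              self.A += np.outer(x, x)
              self.b += y * x

      # Usage: learn a noisy linear reward function online, paying for labels only when unsure.
      rng = np.random.default_rng(0)
      true_w = np.array([0.5, -2.0, 1.0])
      model = KWIKLinearRegression(dim=3)
      for _ in range(200):
          x = rng.normal(size=3)
          y = true_w @ x + 0.01 * rng.normal()
          if model.predict(x) is None:
              model.observe(x, y)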

  3. Multiagent cooperation and competition with deep reinforcement learning

    Science.gov (United States)

    Kodelja, Dorian; Kuzovkin, Ilya; Korjus, Kristjan; Aru, Juhan; Aru, Jaan; Vicente, Raul

    2017-01-01

    Evolution of cooperation and competition can appear when multiple adaptive agents share a biological, social, or technological niche. In the present work we study how cooperation and competition emerge between autonomous agents that learn by reinforcement while using only their raw visual input as the state representation. In particular, we extend the Deep Q-Learning framework to multiagent environments to investigate the interaction between two learning agents in the well-known video game Pong. By manipulating the classical rewarding scheme of Pong we show how competitive and collaborative behaviors emerge. We also describe the progression from competitive to collaborative behavior when the incentive to cooperate is increased. Finally we show how learning by playing against another adaptive agent, instead of against a hard-wired algorithm, results in more robust strategies. The present work shows that Deep Q-Networks can become a useful tool for studying decentralized learning of multiagent systems coping with high-dimensional environments. PMID:28380078

  4. Emotional Multiagent Reinforcement Learning in Spatial Social Dilemmas.

    Science.gov (United States)

    Yu, Chao; Zhang, Minjie; Ren, Fenghui; Tan, Guozhen

    2015-12-01

    Social dilemmas have attracted extensive interest in the research of multiagent systems in order to study the emergence of cooperative behaviors among selfish agents. Understanding how agents can achieve cooperation in social dilemmas through learning from local experience is a critical problem that has motivated researchers for decades. This paper investigates the possibility of exploiting emotions in agent learning in order to facilitate the emergence of cooperation in social dilemmas. In particular, the spatial version of social dilemmas is considered to study the impact of local interactions on the emergence of cooperation in the whole system. A double-layered emotional multiagent reinforcement learning framework is proposed to endow agents with internal cognitive and emotional capabilities that can drive these agents to learn cooperative behaviors. Experimental results reveal that various network topologies and agent heterogeneities have significant impacts on agent learning behaviors in the proposed framework, and under certain circumstances, high levels of cooperation can be achieved among the agents.

  5. Multiagent cooperation and competition with deep reinforcement learning.

    Science.gov (United States)

    Tampuu, Ardi; Matiisen, Tambet; Kodelja, Dorian; Kuzovkin, Ilya; Korjus, Kristjan; Aru, Juhan; Aru, Jaan; Vicente, Raul

    2017-01-01

    Evolution of cooperation and competition can appear when multiple adaptive agents share a biological, social, or technological niche. In the present work we study how cooperation and competition emerge between autonomous agents that learn by reinforcement while using only their raw visual input as the state representation. In particular, we extend the Deep Q-Learning framework to multiagent environments to investigate the interaction between two learning agents in the well-known video game Pong. By manipulating the classical rewarding scheme of Pong we show how competitive and collaborative behaviors emerge. We also describe the progression from competitive to collaborative behavior when the incentive to cooperate is increased. Finally we show how learning by playing against another adaptive agent, instead of against a hard-wired algorithm, results in more robust strategies. The present work shows that Deep Q-Networks can become a useful tool for studying decentralized learning of multiagent systems coping with high-dimensional environments.
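
    The reward manipulation itself is simple enough to sketch: a single parameter applied symmetrically to both agents interpolates between fully competitive (zero-sum) scoring and a regime in which losing the ball is costly for both players, which encourages cooperative rallies. The values below are a generic reconstruction of that idea rather than the paper's exact settings.

      def pong_rewards(scorer, rho):
          """Per-point rewards (left, right) after `scorer` wins the point.

          The conceding agent always gets -1 for losing the ball; the scoring
          agent gets `rho`. rho = +1 gives the classic competitive zero-sum
          game, rho = -1 makes losing the ball costly for both agents, which
          pushes them toward cooperative rallies; intermediate values
          interpolate between the two regimes."""
          left = rho if scorer == "left" else -1.0
          right = rho if scorer == "right" else -1.0
          return left, right

      # Usage: sweep the incentive from competition to cooperation.
      for rho in (1.0, 0.5, 0.0, -0.5, -1.0):
          print(rho, pong_rewards("left", rho))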

  6. Adaptive segmentation of digital mammograms through reinforcement learning

    Institute of Scientific and Technical Information of China (English)

    LIU Xin-yue; FANG Xiao-xuan; HUANG Lian-qing

    2005-01-01

    An approach based on reinforcement learning for automated segmentation is presented. The approach consists of two modules: a segmentation module and a learning module. The segmentation module uses a region-growing algorithm combined with smoothing and morphological filtering to segment mammograms. The learning module uses the segmentation output as feedback to learn, via reinforcement learning techniques, to select the optimal parameter settings of the segmentation algorithm according to the image properties. The approach can adapt itself to various kinds of mammograms through training and therefore obviates the tedious and error-prone manual tuning of parameter settings. Quantitative test results show that the approach is accurate for several kinds of mammograms. Compared to previously proposed approaches, it is more adaptable to different mammograms.
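
    As a rough illustration of the learning module's role (not the paper's algorithm), the sketch below uses tabular Q-learning to pick a region-growing threshold for an image based on a coarse image property, with a reward derived from segmentation quality. The state discretization, threshold set, and quality function are hypothetical placeholders.

      import random

      # Hypothetical discretization: a coarse image property (e.g. contrast bucket)
      # serves as the state; the action is a region-growing threshold setting.
      STATES = ["low_contrast", "medium_contrast", "high_contrast"]
      THRESHOLDS = [0.2, 0.4, 0.6, 0.8]

      Q = {(s, a): 0.0 for s in STATES for a in THRESHOLDS}
      alpha, epsilon = 0.2, 0.1

      def segmentation_quality(state, threshold):
          """Placeholder reward: in practice this would compare the segmentation
          produced with `threshold` against expert annotations (e.g. overlap)."""
          best = {"low_contrast": 0.2, "medium_contrast": 0.4, "high_contrast": 0.6}
          return 1.0 - abs(best[state] - threshold)

      for episode in range(2000):
          state = random.choice(STATES)                      # a new training image
          if random.random() < epsilon:                      # explore
              action = random.choice(THRESHOLDS)
          else:                                              # exploit
              action = max(THRESHOLDS, key=lambda a: Q[(state, a)])
          reward = segmentation_quality(state, action)
          # One-step (bandit-style) Q update: each image is a single decision.
          Q[(state, action)] += alpha * (reward - Q[(state, action)])

      policy = {s: max(THRESHOLDS, key=lambda a: Q[(s, a)]) for s in STATES}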

  7. Deep Direct Reinforcement Learning for Financial Signal Representation and Trading.

    Science.gov (United States)

    Deng, Yue; Bao, Feng; Kong, Youyong; Ren, Zhiquan; Dai, Qionghai

    2017-03-01

    Can we train a computer to beat experienced traders at financial asset trading? In this paper, we try to address this challenge by introducing a recurrent deep neural network (NN) for real-time financial signal representation and trading. Our model is inspired by two biologically related learning concepts, deep learning (DL) and reinforcement learning (RL). In the framework, the DL part automatically senses the dynamic market condition for informative feature learning. Then, the RL module interacts with the deep representations and makes trading decisions to accumulate the ultimate rewards in an unknown environment. The learning system is implemented in a complex NN that exhibits both deep and recurrent structures. Hence, we propose a task-aware backpropagation-through-time method to cope with the vanishing-gradient issue in deep training. The robustness of the neural system is verified on both the stock and the commodity futures markets under broad testing conditions.

  8. Reward and reinforcement activity in the nucleus accumbens during learning

    Directory of Open Access Journals (Sweden)

    John Thomas Gale

    2014-04-01

    Full Text Available The nucleus accumbens core (NAcc) has been implicated in learning associations between sensory cues and profitable motor responses. However, the precise mechanisms that underlie these functions remain unclear. We recorded single-neuron activity from the NAcc of primates trained to perform a visual-motor associative learning task. During learning, we found two distinct classes of NAcc neurons. The first class demonstrated progressive increases in firing rates at the go-cue, feedback/tone and reward epochs of the task, as novel associations were learned. This suggests that these neurons may play a role in the exploitation of rewarding behaviors. In contrast, the second class exhibited attenuated firing rates, but only at the reward epoch of the task. These findings suggest that some NAcc neurons play a role in reward-based reinforcement during learning.

  9. A Proposal of Adaptive PID Controller Based on Reinforcement Learning

    Institute of Scientific and Technical Information of China (English)

    WANG Xue-song; CHENG Yu-hu; SUN Wei

    2007-01-01

    To address the lack of self-tuning of PID parameters in conventional PID controllers, the structure and learning algorithm of an adaptive PID controller based on reinforcement learning are proposed. Actor-Critic learning is used to tune the PID parameters adaptively by exploiting the model-free, online learning properties of reinforcement learning. In order to reduce the demand for storage space and to improve learning efficiency, a single RBF neural network is used to approximate both the policy function of the Actor and the value function of the Critic. The inputs of the RBF network are the system error and its first- and second-order differences. The Actor realizes the mapping from the system state to the PID parameters, while the Critic evaluates the outputs of the Actor and produces a TD error. Based on a TD-error performance index and the gradient descent method, updating rules for the RBF kernel functions and network weights are given. Simulation results show that the proposed controller is effective for complex nonlinear systems, adaptable, and strongly robust, outperforming a conventional PID controller.
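
    A minimal sketch of the actor-critic arrangement described above, assuming a toy first-order plant and PD gains only (the integral term is omitted for brevity): shared RBF features of the error and its difference feed an actor that outputs gain adjustments and a critic that estimates value, and a single TD error drives both updates. The plant model, RBF centers, and learning rates are illustrative, not those of the paper.

      import numpy as np

      rng = np.random.default_rng(1)

      # RBF features over (error, delta_error), shared by actor and critic.
      centers = np.array([(e, d) for e in (-1, 0, 1) for d in (-1, 0, 1)], float)
      width = 0.8
      def features(e, de):
          x = np.array([e, de])
          return np.exp(-np.sum((centers - x) ** 2, axis=1) / (2 * width ** 2))

      n = len(centers)
      actor_w = np.zeros((n, 2))      # maps features -> (Kp, Kd) adjustments
      critic_w = np.zeros(n)          # maps features -> state value
      alpha_a, alpha_c, gamma, sigma = 0.02, 0.1, 0.95, 0.05
      base_gains = np.array([1.0, 0.1])          # nominal (Kp, Kd)

      for episode in range(300):
          y, setpoint, prev_e = 0.0, 1.0, 0.0
          for step in range(50):
              e = setpoint - y
              de = e - prev_e
              phi = features(e, de)
              mean_gains = base_gains + phi @ actor_w
              # Gaussian exploration around the actor's gains, kept in a safe range.
              gains = np.clip(mean_gains + sigma * rng.normal(size=2), 0.0, 10.0)
              u = gains[0] * e + gains[1] * de             # PD control law
              y_next = 0.9 * y + 0.1 * u                   # toy first-order plant
              e_next = setpoint - y_next
              reward = -e_next ** 2                        # penalize tracking error
              phi_next = features(e_next, e_next - e)
              td_error = reward + gamma * critic_w @ phi_next - critic_w @ phi
              critic_w += alpha_c * td_error * phi         # critic update
              # Actor: policy-gradient-style update toward gains that raised the TD error.
              actor_w += alpha_a * td_error * np.outer(phi, (gains - mean_gains) / sigma ** 2)
              y, prev_e = y_next, e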

  10. Simulation-based optimization parametric optimization techniques and reinforcement learning

    CERN Document Server

    Gosavi, Abhijit

    2003-01-01

    Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning introduces the evolving area of simulation-based optimization. The book's objective is two-fold: (1) It examines the mathematical governing principles of simulation-based optimization, thereby providing the reader with the ability to model relevant real-life problems using these techniques. (2) It outlines the computational technology underlying these methods. Taken together these two aspects demonstrate that the mathematical and computational methods discussed in this book do work. Broadly speaking, the book has two parts: (1) parametric (static) optimization and (2) control (dynamic) optimization. Some of the book's special features are: *An accessible introduction to reinforcement learning and parametric-optimization techniques. *A step-by-step description of several algorithms of simulation-based optimization. *A clear and simple introduction to the methodology of neural networks. *A gentle introduction to converg...

  11. Reinforcement learning for adaptive dialogue systems

    CERN Document Server

    Rieser, Verena

    2011-01-01

    The past decade has seen a revolution in the field of spoken dialogue systems. As in other areas of Computer Science and Artificial Intelligence, data-driven methods are now being used to drive new methodologies for system development and evaluation. This book is a unique contribution to that ongoing change. A new methodology for developing spoken dialogue systems is described in detail. The journey starts and ends with human behaviour in interaction, and explores methods for learning from the data, for building simulation environments for training and testing systems, and for evaluating the r

  12. A fuzzy method to learn text classifier from labeled and unlabeled examples

    Institute of Scientific and Technical Information of China (English)

    刘宏; 黄上腾

    2004-01-01

    In text classification, labeling documents is a tedious and costly task, as it consumes a lot of expert time. On the other hand, it is usually easier to obtain many unlabeled documents with the help of tools such as digital libraries, crawler programs, and search engines. To learn a text classifier from labeled and unlabeled examples, a novel fuzzy method is proposed. First, a seeded fuzzy c-means clustering algorithm is proposed to learn fuzzy clusters from a set of labeled and unlabeled examples. Second, based on the resulting fuzzy clusters, examples with high confidence are selected to construct a training data set. Finally, the constructed training data set is used to train a Fuzzy Support Vector Machine (FSVM) and obtain the text classifier. Empirical results on two benchmark datasets indicate that, by incorporating unlabeled examples into the learning process, the method performs significantly better than an FSVM trained with a small number of labeled examples only. The proposed method also performs at least as well as the related method, EM with Naive Bayes. One advantage of the proposed method is that it does not rely on any parametric assumptions about the data, as is usually the case with the generative methods widely used in semi-supervised learning.
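
    The seeding step can be sketched compactly: centroids are initialized from the class means of the labeled examples, standard fuzzy c-means updates then run over labeled and unlabeled data together, and unlabeled points with high membership are promoted to pseudo-labeled training examples. The fuzzifier value and confidence cutoff below are illustrative assumptions, and the final FSVM training step is omitted.

      import numpy as np

      def seeded_fuzzy_cmeans(X_labeled, y_labeled, X_unlabeled, n_iter=50, m=2.0):
          """Sketch: seed one cluster per class from labeled data, then run
          fuzzy c-means over all points and return memberships of unlabeled data."""
          classes = np.unique(y_labeled)
          X = np.vstack([X_labeled, X_unlabeled])
          # Seed centroids with the class means of the labeled examples.
          centers = np.array([X_labeled[y_labeled == c].mean(axis=0) for c in classes])
          for _ in range(n_iter):
              d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
              # Standard FCM membership update with fuzzifier m.
              u = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1)), axis=2)
              um = u ** m
              centers = (um.T @ X) / um.sum(axis=0)[:, None]
          return classes, u[len(X_labeled):]      # memberships of the unlabeled points

      # Usage: promote confident unlabeled points to the training set.
      rng = np.random.default_rng(2)
      Xl = np.vstack([rng.normal(0, 1, (5, 2)), rng.normal(4, 1, (5, 2))])
      yl = np.array([0] * 5 + [1] * 5)
      Xu = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
      classes, memberships = seeded_fuzzy_cmeans(Xl, yl, Xu)
      confident = memberships.max(axis=1) > 0.9            # illustrative cutoff
      pseudo_labels = classes[memberships.argmax(axis=1)][confident]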

  13. A Review on Anti-Plagiarism Approach Using Reinforcement Learning

    OpenAIRE

    Mr. Sudhir D. Salunkhe

    2013-01-01

    Nowadays plagiarism has become a serious problem, especially in academics and education, and detecting it is a challenging task, particularly text plagiarism in students' documents. Students or other authors plagiarize an original document and present it as their own without giving credit to the original author. To detect such dishonesty in document writing, an anti-plagiarism system is proposed, in which reinforcement learning can be used to get a fast indication of plagiarism in susp...

  14. Intranasal oxytocin enhances socially-reinforced learning in rhesus monkeys

    Directory of Open Access Journals (Sweden)

    Lisa A Parr

    2014-09-01

    Full Text Available There are currently no drugs approved for the treatment of social deficits associated with autism spectrum disorders (ASD). One hypothesis for these deficits is that individuals with ASD lack the motivation to attend to social cues because those cues are not implicitly rewarding. Therefore, any drug that could enhance the rewarding quality of social stimuli could have a profound impact on the treatment of ASD, and other social disorders. Oxytocin (OT) is a neuropeptide that has been effective in enhancing social cognition and social reward in humans. The present study examined the ability of OT to selectively enhance learning after social compared to nonsocial reward in rhesus monkeys, an important species for modeling the neurobiology of social behavior in humans. Monkeys were required to learn an implicit visual matching task after receiving either intranasal (IN) OT or Placebo (saline). Correct trials were rewarded with the presentation of positive and negative social (play faces/threat faces) or nonsocial (banana/cage locks) stimuli, plus food. Incorrect trials were not rewarded. Results demonstrated a strong effect of socially-reinforced learning: monkeys performed significantly better when reinforced with social versus nonsocial stimuli. Additionally, socially-reinforced learning was significantly better and occurred faster after IN-OT compared to placebo treatment. Performance in the IN-OT, but not Placebo, condition was also significantly better when the reinforcement stimuli were emotionally positive compared to negative facial expressions. These data support the hypothesis that OT may function to enhance prosocial behavior in primates by increasing the rewarding quality of emotionally positive, social compared to emotionally negative or nonsocial images. These data also support the use of the rhesus monkey as a model for exploring the neurobiological basis of social behavior and its impairment.

  15. Comparative study of a learning fuzzy PID controller and a self-tuning controller.

    Science.gov (United States)

    Kazemian, H B

    2001-01-01

    The self-organising fuzzy controller is an extension of the rule-based fuzzy controller with an additional learning capability. The self-organising fuzzy (SOF) is used as a master controller to readjust conventional PID gains at the actuator level during the system operation, copying the experience of a human operator. The application of the self-organising fuzzy PID (SOF-PID) controller to a 2-link non-linear revolute-joint robot-arm is studied using path tracking trajectories at the setpoint. For the purpose of comparison, the same experiments are repeated by using the self-tuning controller subject to the same data supplied at the setpoint. For the path tracking experiments, the output trajectories of the SOF-PID controller followed the specified path closer and smoother than the self-tuning controller.

  16. Simulation of thermal behavior of residential buildings using fuzzy active learning method

    Directory of Open Access Journals (Sweden)

    Masoud Taheri Shahraein

    2015-01-01

    Full Text Available In this paper, a fuzzy modeling technique called Modified Active Learning Method (MALM) was introduced and utilized for fuzzy simulation of indoor and inner surface temperatures in residential buildings using meteorological data, and its capability for fuzzy simulation was compared with other studies. The case studies for simulations were two residential apartments in the Fakouri and Rezashahr neighborhoods of Mashhad, Iran. The hourly inner surface and indoor temperature data were accumulated during measurements taken in 2010 and 2011 in different rooms of the apartments under heating and natural ventilation conditions. Hourly meteorological data (dry bulb temperature, wind speed and direction, and solar radiation) were measured by a meteorological station and utilized with zero- to three-hour lags as input variables for the simulation of inner surface and indoor temperatures. The results of the simulations demonstrated the capability of MALM to be used for nonlinear fuzzy simulation of inner surface and indoor temperatures in residential apartments.

  17. Storage and recall capabilities of fuzzy morphological associative memories with adjunction-based learning.

    Science.gov (United States)

    Valle, Marcos Eduardo; Sussner, Peter

    2011-01-01

    We recently employed concepts of mathematical morphology to introduce fuzzy morphological associative memories (FMAMs), a broad class of fuzzy associative memories (FAMs). We observed that many well-known FAM models can be classified as belonging to the class of FMAMs. Moreover, we developed a general learning strategy for FMAMs using the concept of adjunction of mathematical morphology. In this paper, we describe the properties of FMAMs with adjunction-based learning. In particular, we characterize the recall phase of these models. Furthermore, we prove several theorems concerning the storage capacity, noise tolerance, fixed points, and convergence of auto-associative FMAMs. These theorems are corroborated by experimental results concerning the reconstruction of noisy images. Finally, we successfully employ FMAMs with adjunction-based learning in order to implement fuzzy rule-based systems in an application to a time-series prediction problem in industry.

  18. Experienced Gray Wolf Optimization Through Reinforcement Learning and Neural Networks.

    Science.gov (United States)

    Emary, E; Zawbaa, Hossam M; Grosan, Crina

    2017-01-10

    In this paper, a variant of gray wolf optimization (GWO) that uses reinforcement learning principles combined with neural networks to enhance performance is proposed. The aim is to overcome, through reinforcement learning, the common challenge of setting the right parameters for the algorithm. In GWO, a single parameter is used to control the exploration/exploitation rate, which influences the performance of the algorithm. Rather than changing this parameter globally for all the agents, we use reinforcement learning to set it on an individual basis. The adaptation of the exploration rate for each agent depends on the agent's own experience and the current terrain of the search space. In order to achieve this, an experience repository is built, based on a neural network, to map a set of agents' states to a set of corresponding actions that specifically influence the exploration rate. The experience repository is updated by all the search agents to reflect their experience and to continuously improve future actions. The resulting algorithm is called experienced GWO (EGWO) and its performance is assessed on solving feature selection problems and on finding optimal weights for neural networks. We use a set of performance indicators to evaluate the efficiency of the method. Results over various data sets demonstrate the advantage of EGWO over the original GWO and over other metaheuristics, such as genetic algorithms and particle swarm optimization.

  19. Cocaine addiction as a homeostatic reinforcement learning disorder.

    Science.gov (United States)

    Keramati, Mehdi; Durand, Audrey; Girardeau, Paul; Gutkin, Boris; Ahmed, Serge H

    2017-03-01

    Drug addiction implicates both reward learning and homeostatic regulation mechanisms of the brain. This has stimulated 2 partially successful theoretical perspectives on addiction. Many important aspects of addiction, however, remain to be explained within a single, unified framework that integrates the 2 mechanisms. Building upon a recently developed homeostatic reinforcement learning theory, the authors focus on a key transition stage of addiction that is well modeled in animals, escalation of drug use, and propose a computational theory of cocaine addiction where cocaine reinforces behavior due to its rapid homeostatic corrective effect, whereas its chronic use induces slow and long-lasting changes in homeostatic setpoint. Simulations show that our new theory accounts for key behavioral and neurobiological features of addiction, most notably, escalation of cocaine use, drug-primed craving and relapse, individual differences underlying dose-response curves, and dopamine D2-receptor downregulation in addicts. The theory also generates unique predictions about cocaine self-administration behavior in rats that are confirmed by new experimental results. Viewing addiction as a homeostatic reinforcement learning disorder coherently explains many behavioral and neurobiological aspects of the transition to cocaine addiction, and suggests a new perspective toward understanding addiction. (PsycINFO Database Record

  20. Individual differences in reinforcement learning: behavioral, electrophysiological, and neuroimaging correlates.

    Science.gov (United States)

    Santesso, Diane L; Dillon, Daniel G; Birk, Jeffrey L; Holmes, Avram J; Goetz, Elena; Bogdan, Ryan; Pizzagalli, Diego A

    2008-08-15

    During reinforcement learning, phasic modulations of activity in midbrain dopamine neurons are conveyed to the dorsal anterior cingulate cortex (dACC) and basal ganglia (BG) and serve to guide adaptive responding. While the animal literature supports a role for the dACC in integrating reward history over time, most human electrophysiological studies of dACC function have focused on responses to single positive and negative outcomes. The present electrophysiological study investigated the role of the dACC in probabilistic reward learning in healthy subjects using a task that required integration of reinforcement history over time. We recorded the feedback-related negativity (FRN) to reward feedback in subjects who developed a response bias toward a more frequently rewarded ("rich") stimulus ("learners") versus subjects who did not ("non-learners"). Compared to non-learners, learners showed more positive (i.e., smaller) FRNs and greater dACC activation upon receiving reward for correct identification of the rich stimulus. In addition, dACC activation and a bias to select the rich stimulus were positively correlated. The same participants also completed a monetary incentive delay (MID) task administered during functional magnetic resonance imaging. Compared to non-learners, learners displayed stronger BG responses to reward in the MID task. These findings raise the possibility that learners in the probabilistic reinforcement task were characterized by stronger dACC and BG responses to rewarding outcomes. Furthermore, these results highlight the importance of the dACC to probabilistic reward learning in humans.

  1. Application of reinforcement learning for segmentation of transrectal ultrasound images

    Directory of Open Access Journals (Sweden)

    Tizhoosh Hamid R

    2008-04-01

    Full Text Available Abstract Background Among different medical image modalities, ultrasound imaging has a very widespread clinical use. But, due to some factors, such as poor image contrast, noise and missing or diffuse boundaries, the ultrasound images are inherently difficult to segment. An important application is estimation of the location and volume of the prostate in transrectal ultrasound (TRUS images. For this purpose, manual segmentation is a tedious and time consuming procedure. Methods We introduce a new method for the segmentation of the prostate in transrectal ultrasound images, using a reinforcement learning scheme. This algorithm is used to find the appropriate local values for sub-images and to extract the prostate. It contains an offline stage, where the reinforcement learning agent uses some images and manually segmented versions of these images to learn from. The reinforcement agent is provided with reward/punishment, determined objectively to explore/exploit the solution space. After this stage, the agent has acquired knowledge stored in the Q-matrix. The agent can then use this knowledge for new input images to extract a coarse version of the prostate. Results We have carried out experiments to segment TRUS images. The results demonstrate the potential of this approach in the field of medical image segmentation. Conclusion By using the proposed method, we can find the appropriate local values and segment the prostate. This approach can be used for segmentation tasks containing one object of interest. To improve this prototype, more investigations are needed.

  2. Towards autonomous neuroprosthetic control using Hebbian reinforcement learning

    Science.gov (United States)

    Mahmoudi, Babak; Pohlmeyer, Eric A.; Prins, Noeline W.; Geng, Shijia; Sanchez, Justin C.

    2013-12-01

    Objective. Our goal was to design an adaptive neuroprosthetic controller that could learn the mapping from neural states to prosthetic actions and automatically adjust adaptation using only a binary evaluative feedback as a measure of desirability/undesirability of performance. Approach. Hebbian reinforcement learning (HRL) in a connectionist network was used for the design of the adaptive controller. The method combines the efficiency of supervised learning with the generality of reinforcement learning. The convergence properties of this approach were studied using both closed-loop control simulations and open-loop simulations that used primate neural data from robot-assisted reaching tasks. Main results. The HRL controller was able to perform classification and regression tasks using its episodic and sequential learning modes, respectively. In our experiments, the HRL controller quickly achieved convergence to an effective control policy, followed by robust performance. The controller also automatically stopped adapting the parameters after converging to a satisfactory control policy. Additionally, when the input neural vector was reorganized, the controller resumed adaptation to maintain performance. Significance. By estimating an evaluative feedback directly from the user, the HRL control algorithm may provide an efficient method for autonomous adaptation of neuroprosthetic systems. This method may enable the user to teach the controller the desired behavior using only a simple feedback signal.

  3. Reinforcement Learning in the Game of Othello: Learning Against a Fixed Opponent and Learning from Self-Play

    NARCIS (Netherlands)

    van der Ree, Michiel; Wiering, Marco

    2013-01-01

    This paper compares three strategies in using reinforcement learning algorithms to let an artificial agent learnto play the game of Othello. The three strategies that are compared are: Learning by self-play, learning from playing against a fixed opponent, and learning from playing against a fixed

  4. Reinforcement Learning in the Game of Othello: Learning Against a Fixed Opponent and Learning from Self-Play

    NARCIS (Netherlands)

    van der Ree, Michiel; Wiering, Marco

    2013-01-01

    This paper compares three strategies in using reinforcement learning algorithms to let an artificial agent learnto play the game of Othello. The three strategies that are compared are: Learning by self-play, learning from playing against a fixed opponent, and learning from playing against a fixed op

  5. A comparative analysis of three metaheuristic methods applied to fuzzy cognitive maps learning

    Directory of Open Access Journals (Sweden)

    Bruno A. Angélico

    2013-12-01

    Full Text Available This work analyses the performance of three different population-based metaheuristic approaches applied to Fuzzy Cognitive Map (FCM) learning in qualitative control of processes. Fuzzy cognitive maps make it possible to include prior specialist knowledge in the control rule. In particular, Particle Swarm Optimization (PSO), Genetic Algorithm (GA), and Ant Colony Optimization (ACO) are considered for obtaining appropriate weight matrices for learning the FCM. A statistical convergence analysis within 10000 simulations of each algorithm is presented. In order to validate the proposed approach, two industrial control process problems previously described in the literature are considered in this work.

  6. Two-layer networked learning control using self-learning fuzzy control algorithms

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    Since the existing single-layer networked control systems have some inherent limitations and cannot effectively handle the problems associated with unreliable networks, a novel two-layer networked learning control system (NLCS) is proposed in this paper. Its lower layer has a number of local controllers that are operated independently, and its upper layer has a learning agent that communicates with the independent local controllers in the lower layer. To implement such a system, a packet-discard strategy is firstly developed to deal with network-induced delay and data packet loss. A cubic spline interpolator is then employed to compensate the lost data. Finally, the output of the learning agent based on a novel radial basis function neural network (RBFNN) is used to update the parameters of fuzzy controllers. A nonlinear heating, ventilation and air-conditioning (HVAC) system is used to demonstrate the feasibility and effectiveness of the proposed system.

  7. Distributed Reinforcement Learning Approach for Vehicular Ad Hoc Networks

    Science.gov (United States)

    Wu, Celimuge; Kumekawa, Kazuya; Kato, Toshihiko

    In Vehicular Ad hoc Networks (VANETs), general purpose ad hoc routing protocols such as AODV cannot work efficiently due to the frequent changes in network topology caused by vehicle movement. This paper proposes a VANET routing protocol QLAODV (Q-Learning AODV) which suits unicast applications in high mobility scenarios. QLAODV is a distributed reinforcement learning routing protocol, which uses a Q-Learning algorithm to infer network state information and uses unicast control packets to check the path availability in a real time manner in order to allow Q-Learning to work efficiently in a highly dynamic network environment. QLAODV is favored by its dynamic route change mechanism, which makes it capable of reacting quickly to network topology changes. We present an analysis of the performance of QLAODV by simulation using different mobility models. The simulation results show that QLAODV can efficiently handle unicast applications in VANETs.
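
    The sketch below shows the underlying idea rather than QLAODV itself: each node keeps Q-values over (destination, next-hop) pairs and updates them from feedback of the kind such protocols piggyback on control packets. The topology, reward shaping, and learning rates are toy assumptions.

      import random
      from collections import defaultdict

      # Toy topology: node -> current neighbors (would change with vehicle movement).
      neighbors = {
          "A": ["B", "C"], "B": ["A", "C", "D"],
          "C": ["A", "B", "D"], "D": ["B", "C"],
      }
      DEST = "D"
      alpha, gamma, epsilon = 0.3, 0.9, 0.1

      # Q[node][next_hop]: estimated quality of forwarding via that neighbor toward DEST.
      Q = defaultdict(lambda: defaultdict(float))

      def choose_next_hop(node):
          if random.random() < epsilon:
              return random.choice(neighbors[node])
          return max(neighbors[node], key=lambda nh: Q[node][nh])

      for packet in range(5000):
          node = random.choice(["A", "B", "C"])
          for hop in range(10):                       # forward until delivered or TTL expires
              nxt = choose_next_hop(node)
              reward = 1.0 if nxt == DEST else -0.1   # delivery bonus, per-hop cost
              best_next = 0.0 if nxt == DEST else max(Q[nxt][nh] for nh in neighbors[nxt])
              # Feedback that QLAODV-style protocols would piggyback on control packets.
              Q[node][nxt] += alpha * (reward + gamma * best_next - Q[node][nxt])
              if nxt == DEST:
                  break
              node = nxt

      best_from_A = max(neighbors["A"], key=lambda nh: Q["A"][nh])   # learned next hop at A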

  8. Use of Inverse Reinforcement Learning for Identity Prediction

    Science.gov (United States)

    Hayes, Roy; Bao, Jonathan; Beling, Peter; Horowitz, Barry

    2011-01-01

    We adopt Markov Decision Processes (MDP) to model sequential decision problems, which have the characteristic that the current decision made by a human decision maker has an uncertain impact on future opportunity. We hypothesize that the individuality of decision makers can be modeled as differences in the reward function under a common MDP model. A machine learning technique, Inverse Reinforcement Learning (IRL), was used to learn an individual's reward function based on limited observation of his or her decision choices. This work serves as an initial investigation for using IRL to analyze decision making, conducted through a human experiment in a cyber shopping environment. Specifically, the ability to determine the demographic identity of users is conducted through prediction analysis and supervised learning. The results show that IRL can be used to correctly identify participants, at a rate of 68% for gender and 66% for one of three college major categories.

  9. The ubiquity of model-based reinforcement learning.

    Science.gov (United States)

    Doll, Bradley B; Simon, Dylan A; Daw, Nathaniel D

    2012-12-01

    The reward prediction error (RPE) theory of dopamine (DA) function has enjoyed great success in the neuroscience of learning and decision-making. This theory is derived from model-free reinforcement learning (RL), in which choices are made simply on the basis of previously realized rewards. Recently, attention has turned to correlates of more flexible, albeit computationally complex, model-based methods in the brain. These methods are distinguished from model-free learning by their evaluation of candidate actions using expected future outcomes according to a world model. Puzzlingly, signatures from these computations seem to be pervasive in the very same regions previously thought to support model-free learning. Here, we review recent behavioral and neural evidence about these two systems, in attempt to reconcile their enigmatic cohabitation in the brain.

  10. Towards End-to-End Learning for Dialog State Tracking and Management using Deep Reinforcement Learning

    OpenAIRE

    Zhao, Tiancheng; Eskenazi, Maxine

    2016-01-01

    This paper presents an end-to-end framework for task-oriented dialog systems using a variant of Deep Recurrent Q-Networks (DRQN). The model is able to interface with a relational database and jointly learn policies for both language understanding and dialog strategy. Moreover, we propose a hybrid algorithm that combines the strength of reinforcement learning and supervised learning to achieve faster learning speed. We evaluated the proposed model on a 20 Question Game conversational game simu...

  11. Performance Improvement of Fuzzy and Neuro Fuzzy Systems: Prediction of Learning Disabilities in School-age Children

    Directory of Open Access Journals (Sweden)

    Julie M. David

    2013-11-01

    Full Text Available Learning Disability (LD is a classification including several disorders in which a child has difficulty in learning in a typical manner, usually caused by an unknown factor or factors. LD affects about 15% of children enrolled in schools. The prediction of learning disability is a complicated task since the identification of LD from diverse features or signs is a complicated problem. There is no cure for learning disabilities and they are life-long. The problems of children with specific learning disabilities have been a cause of concern to parents and teachers for some time. The aim of this paper is to develop a new algorithm for imputing missing values and to determine the significance of the missing value imputation method and dimensionality reduction method in the performance of fuzzy and neuro fuzzy classifiers with specific emphasis on prediction of learning disabilities in school age children. In the basic assessment method for prediction of LD, checklists are generally used and the data cases thus collected fully depends on the mood of children and may have also contain redundant as well as missing values. Therefore, in this study, we are proposing a new algorithm, viz. the correlation based new algorithm for imputing the missing values and Principal Component Analysis (PCA for reducing the irrelevant attributes. After the study, it is found that, the preprocessing methods applied by us improves the quality of data and thereby increases the accuracy of the classifiers. The system is implemented in Math works Software Mat Lab 7.10. The results obtained from this study have illustrated that the developed missing value imputation method is very good contribution in prediction system and is capable of improving the performance of a classifier.

  12. Preliminary Work for Examining the Scalability of Reinforcement Learning

    Science.gov (United States)

    Clouse, Jeff

    1998-01-01

    Researchers began studying automated agents that learn to perform multiple-step tasks early in the history of artificial intelligence (Samuel, 1963; Samuel, 1967; Waterman, 1970; Fikes, Hart & Nilsonn, 1972). Multiple-step tasks are tasks that can only be solved via a sequence of decisions, such as control problems, robotics problems, classic problem-solving, and game-playing. The objective of agents attempting to learn such tasks is to use the resources they have available in order to become more proficient at the tasks. In particular, each agent attempts to develop a good policy, a mapping from states to actions, that allows it to select actions that optimize a measure of its performance on the task; for example, reducing the number of steps necessary to complete the task successfully. Our study focuses on reinforcement learning, a set of learning techniques where the learner performs trial-and-error experiments in the task and adapts its policy based on the outcome of those experiments. Much of the work in reinforcement learning has focused on a particular, simple representation, where every problem state is represented explicitly in a table, and associated with each state are the actions that can be chosen in that state. A major advantage of this table lookup representation is that one can prove that certain reinforcement learning techniques will develop an optimal policy for the current task. The drawback is that the representation limits the application of reinforcement learning to multiple-step tasks with relatively small state-spaces. There has been a little theoretical work that proves that convergence to optimal solutions can be obtained when using generalization structures, but the structures are quite simple. The theory says little about complex structures, such as multi-layer, feedforward artificial neural networks (Rumelhart & McClelland, 1986), but empirical results indicate that the use of reinforcement learning with such structures is promising

  13. Partial reinforcement effects on learning and extinction of place preferences in the water maze.

    Science.gov (United States)

    Prados, José; Sansa, Joan; Artigas, Antonio A

    2008-11-01

    In two experiments, two groups of rats were trained in a navigation task according to either a continuous or a partial schedule of reinforcement. In Experiment 1, animals that were given continuous reinforcement extinguished the spatial response of approaching the goal location more readily than animals given partial reinforcement-a partial reinforcement extinction effect. In Experiment 2, after partially or continuously reinforced training, animals were trained in a new task that made use of the same reinforcer according to a continuous reinforcement schedule. Animals initially given partial reinforcement performed better in the novel task than did rats initially given continuous reinforcement. These results replicate, in the spatial domain, well-known partial reinforcement phenomena typically observed in the context of Pavlovian and instrumental conditioning, suggesting that similar principles govern spatial and associative learning. The results reported support the notion that salience modulation processes play a key role in determining partial reinforcement effects.

  14. Context Transfer in Reinforcement Learning Using Action-Value Functions

    Directory of Open Access Journals (Sweden)

    Amin Mousavi

    2014-01-01

    Full Text Available This paper discusses the notion of context transfer in reinforcement learning tasks. Context transfer, as defined in this paper, implies knowledge transfer between source and target tasks that share the same environment dynamics and reward function but have different states or action spaces. In other words, the agents learn the same task while using different sensors and actuators. This requires the existence of an underlying common Markov decision process (MDP to which all the agents’ MDPs can be mapped. This is formulated in terms of the notion of MDP homomorphism. The learning framework is Q-learning. To transfer the knowledge between these tasks, the feature space is used as a translator and is expressed as a partial mapping between the state-action spaces of different tasks. The Q-values learned during the learning process of the source tasks are mapped to the sets of Q-values for the target task. These transferred Q-values are merged together and used to initialize the learning process of the target task. An interval-based approach is used to represent and merge the knowledge of the source tasks. Empirical results show that the transferred initialization can be beneficial to the learning process of the target task.
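
    A minimal sketch of the transfer step described above: source-task Q-values are carried into the target task through a partial state-action mapping derived from shared features, merged (here by reducing the collected interval to its midpoint), and used to initialize the target task's Q-table. The mapping and the merge rule are illustrative stand-ins for the paper's interval-based approach.

      from collections import defaultdict

      def transfer_q_values(source_qs, state_map, action_map):
          """Sketch: map Q-values of several source tasks into the target task's
          state-action space and merge them into an initialization for Q-learning.

          source_qs : list of dicts {(s_src, a_src): q}
          state_map : partial dict {s_src: s_tgt} derived from shared features
          action_map: partial dict {a_src: a_tgt}
          """
          collected = defaultdict(list)
          for q in source_qs:
              for (s, a), value in q.items():
                  if s in state_map and a in action_map:      # the mapping is partial
                      collected[(state_map[s], action_map[a])].append(value)
          # Merge: the transferred estimates form an interval that is reduced to its
          # midpoint; the target task starts learning from this initialization.
          return {sa: (min(v) + max(v)) / 2.0 for sa, v in collected.items()}

      # Usage with two toy source tasks that share an underlying MDP.
      q_task1 = {("near", "push"): 0.8, ("far", "push"): 0.1}
      q_task2 = {("close", "shove"): 0.6, ("distant", "shove"): 0.2}
      q_init = transfer_q_values(
          [q_task1, q_task2],
          state_map={"near": "s0", "close": "s0", "far": "s1", "distant": "s1"},
          action_map={"push": "a0", "shove": "a0"},
      )
      # q_init maps ('s0', 'a0') to 0.7 and ('s1', 'a0') to about 0.15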

  15. Manifold-Based Reinforcement Learning via Locally Linear Reconstruction.

    Science.gov (United States)

    Xu, Xin; Huang, Zhenhua; Zuo, Lei; He, Haibo

    2017-04-01

    Feature representation is critical not only for pattern recognition tasks but also for reinforcement learning (RL) methods to solve learning control problems under uncertainties. In this paper, a manifold-based RL approach using the principle of locally linear reconstruction (LLR) is proposed for Markov decision processes with large or continuous state spaces. In the proposed approach, an LLR-based feature learning scheme is developed for value function approximation in RL, where a set of smooth feature vectors is generated by preserving the local approximation properties of neighboring points in the original state space. By using the proposed feature learning scheme, an LLR-based approximate policy iteration (API) algorithm is designed for learning control problems with large or continuous state spaces. The relationship between the value approximation error of a new data point and the estimated values of its nearest neighbors is analyzed. In order to compare different feature representation and learning approaches for RL, a comprehensive simulation and experimental study was conducted on three benchmark learning control problems. It is illustrated that under a wide range of parameter settings, the LLR-based API algorithm can obtain better learning control performance than the previous API methods with different feature representation schemes.

  16. Context transfer in reinforcement learning using action-value functions.

    Science.gov (United States)

    Mousavi, Amin; Nadjar Araabi, Babak; Nili Ahmadabadi, Majid

    2014-01-01

    This paper discusses the notion of context transfer in reinforcement learning tasks. Context transfer, as defined in this paper, implies knowledge transfer between source and target tasks that share the same environment dynamics and reward function but have different states or action spaces. In other words, the agents learn the same task while using different sensors and actuators. This requires the existence of an underlying common Markov decision process (MDP) to which all the agents' MDPs can be mapped. This is formulated in terms of the notion of MDP homomorphism. The learning framework is Q-learning. To transfer the knowledge between these tasks, the feature space is used as a translator and is expressed as a partial mapping between the state-action spaces of different tasks. The Q-values learned during the learning process of the source tasks are mapped to the sets of Q-values for the target task. These transferred Q-values are merged together and used to initialize the learning process of the target task. An interval-based approach is used to represent and merge the knowledge of the source tasks. Empirical results show that the transferred initialization can be beneficial to the learning process of the target task.

  17. Credit assignment in movement-dependent reinforcement learning.

    Science.gov (United States)

    McDougle, Samuel D; Boggess, Matthew J; Crossley, Matthew J; Parvin, Darius; Ivry, Richard B; Taylor, Jordan A

    2016-06-14

    When a person fails to obtain an expected reward from an object in the environment, they face a credit assignment problem: Did the absence of reward reflect an extrinsic property of the environment or an intrinsic error in motor execution? To explore this problem, we modified a popular decision-making task used in studies of reinforcement learning, the two-armed bandit task. We compared a version in which choices were indicated by key presses, the standard response in such tasks, to a version in which the choices were indicated by reaching movements, which affords execution failures. In the key press condition, participants exhibited a strong risk aversion bias; strikingly, this bias reversed in the reaching condition. This result can be explained by a reinforcement model wherein movement errors influence decision-making, either by gating reward prediction errors or by modifying an implicit representation of motor competence. Two further experiments support the gating hypothesis. First, we used a condition in which we provided visual cues indicative of movement errors but informed the participants that trial outcomes were independent of their actual movements. The main result was replicated, indicating that the gating process is independent of participants' explicit sense of control. Second, individuals with cerebellar degeneration failed to modulate their behavior between the key press and reach conditions, providing converging evidence of an implicit influence of movement error signals on reinforcement learning. These results provide a mechanistically tractable solution to the credit assignment problem.
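
    The gating account can be caricatured in a few lines: a standard two-armed bandit learner whose reward prediction error is simply not applied on trials where the movement itself missed, so unrewarded execution failures do not count against the chosen option. The payoff probabilities, miss rate, and gating rule are illustrative assumptions.

      import random

      def gated_bandit(n_trials=10000, alpha=0.2, miss_prob=0.15, gate=True):
          """Two-armed bandit with reaching: on a 'miss' trial the reward is lost
          for motor reasons; with gate=True the reward prediction error is not
          applied on those trials (the gating hypothesis)."""
          reward_prob = [0.7, 0.4]          # extrinsic payoff of the two options
          V = [0.0, 0.0]
          for _ in range(n_trials):
              choice = 0 if V[0] >= V[1] else 1
              if random.random() < 0.1:                 # occasional exploration
                  choice = random.randrange(2)
              missed = random.random() < miss_prob      # execution failure
              reward = 0.0 if missed else float(random.random() < reward_prob[choice])
              if missed and gate:
                  continue                              # gated: no credit assignment
              V[choice] += alpha * (reward - V[choice])
          return V

      print(gated_bandit(gate=True))    # values track the true payoffs
      print(gated_bandit(gate=False))   # values are dragged down by motor errors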

  18. Reinforcement learning of periodical gaits in locomotion robots

    Science.gov (United States)

    Svinin, Mikhail; Yamada, Kazuyaki; Ushio, S.; Ueda, Kanji

    1999-08-01

    Emergence of stable gaits in locomotion robots is studied in this paper. A classifier system, implementing an instance-based reinforcement learning scheme, is used for sensory-motor control of an eight-legged mobile robot. An important feature of the classifier system is its ability to work with a continuous sensor space. The robot has no prior knowledge of the environment, no internal model of itself, and no goal coordinates. It is only assumed that the robot can acquire stable gaits by learning how to reach a light source. During the learning process the control system is self-organized by reinforcement signals. Reaching the light source defines a global reward. Forward motion gets a local reward, while stepping back and falling down get a local punishment. Feasibility of the proposed self-organized system is tested in simulation and experiment. The control actions are specified at the leg level. It is shown that, as learning progresses, the number of action rules in the classifier system stabilizes at a certain level, corresponding to the acquired gait patterns.

  19. Beyond simple reinforcement learning: the computational neurobiology of reward-learning and valuation.

    Science.gov (United States)

    O'Doherty, John P

    2012-04-01

    Neural computational accounts of reward-learning have been dominated by the hypothesis that dopamine neurons behave like a reward-prediction error and thus facilitate reinforcement learning in striatal target neurons. While this framework is consistent with a lot of behavioral and neural evidence, this theory fails to account for a number of behavioral and neurobiological observations. In this special issue of EJN we feature a combination of theoretical and experimental papers highlighting some of the explanatory challenges faced by simple reinforcement-learning models and describing some of the ways in which the framework is being extended in order to address these challenges.

  20. An English Vocabulary Learning System Based on Fuzzy Theory and Memory Cycle

    Science.gov (United States)

    Wang, Tzone I.; Chiu, Ti Kai; Huang, Liang Jun; Fu, Ru Xuan; Hsieh, Tung-Cheng

    This paper proposes an English vocabulary learning system based on fuzzy theory and memory cycle theory to help a learner memorize vocabulary easily. By using fuzzy inference and personal memory cycles, it is possible to find the article that best suits a learner. After reading an article, a quiz is provided for the learner to improve his or her memory of the vocabulary in the article. Earlier research used only explicit responses (e.g., quiz results) to update the memory cycles of newly learned vocabulary; beyond that approach, this paper proposes a methodology that also implicitly modifies the memory cycles of already learned words. By intensively reading articles recommended by the approach, a learner learns new words quickly while implicitly reviewing learned words, so that the learner's vocabulary ability improves efficiently.
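
    A small sketch of how a per-word memory cycle might combine both kinds of update: an explicit quiz answer lengthens or resets the review interval, while merely re-encountering the word in a recommended article nudges the interval upward by a smaller factor. The multipliers are illustrative assumptions, not the system's actual parameters.

      from dataclasses import dataclass

      @dataclass
      class WordMemory:
          word: str
          cycle_days: float = 1.0       # current memory cycle (review interval)

          def explicit_update(self, answered_correctly: bool):
              """Quiz feedback: lengthen the cycle on success, reset it on failure."""
              self.cycle_days = self.cycle_days * 2.0 if answered_correctly else 1.0

          def implicit_update(self):
              """Implicit review: the word was re-read in a recommended article."""
              self.cycle_days *= 1.2    # smaller boost than an explicit correct answer

      w = WordMemory("vocabulary")
      w.implicit_update()               # seen again while reading
      w.explicit_update(True)           # answered correctly in the quiz
      print(w.word, round(w.cycle_days, 2))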

  1. Extinction learning, reconsolidation and the internal reinforcement hypothesis.

    Science.gov (United States)

    Eisenhardt, Dorothea; Menzel, Randolf

    2007-02-01

    Retrieving a consolidated memory--by exposing an animal to the learned stimulus but not to the associated reinforcement--leads to two opposing processes: one that weakens the old memory as a result of extinction learning, and another that strengthens the old, already-consolidated memory as a result of some less well-understood form of learning. This latter process of memory strengthening is often referred to as "reconsolidation", since protein synthesis can inhibit this form of memory formation. Although the behavioral phenomena of the two antagonizing forms of learning are well documented, the mechanisms behind the corresponding processes of memory formation are still quite controversial. Referring to results of extinction/reconsolidation experiments in honeybees, we argue that two opposing learning processes--with their respective consolidation phases and memories--are initiated by retrieval trials: extinction learning and reminder learning, the latter leading to the phenomenon of spontaneous recovery from extinction, a process that can be blocked with protein synthesis inhibition.

  2. ADOPEL: ADAPTIVE DATA COLLECTION PROTOCOL USING REINFORCEMENT LEARNING FOR VANETS

    Directory of Open Access Journals (Sweden)

    Ahmed Soua

    2014-01-01

    Full Text Available Efficient propagation of information over a vehicular wireless network has usually been the focus of the research community, yet few contributions have been made in the field of vehicular data collection, and especially in applying learning techniques to such a rapidly changing networking environment. These smart learning approaches excel at making the collection operation more reactive to node mobility and topology changes, compared to traditional techniques that simply adapt MANET solutions. To exploit the efficiency opportunities offered by these learning techniques, an Adaptive Data collection Protocol using reinforcement Learning (ADOPEL) is proposed for VANETs. The proposal is based on a distributed learning algorithm with a reward function that takes into account the delay and the number of aggregatable packets. The Q-learning technique gives vehicles the opportunity to optimize their interactions with the highly dynamic environment through their experience in the network. Compared to non-learning schemes, the proposal confirms its efficiency and achieves a good tradeoff between delay and collection ratio.

  3. An Improved Reinforcement Learning System Using Affective Factors

    Directory of Open Access Journals (Sweden)

    Takashi Kuremoto

    2013-07-01

    Full Text Available As a powerful and intelligent machine learning method, reinforcement learning (RL) has been widely used in many fields such as game theory, adaptive control, multi-agent systems, nonlinear forecasting, and so on. The main contribution of this technique is its exploration and exploitation approaches to find the optimal or semi-optimal solution of goal-directed problems. However, when RL is applied to multi-agent systems (MASs), problems such as the “curse of dimensionality”, the “perceptual aliasing problem”, and uncertainty of the environment constitute high hurdles to RL. Meanwhile, although RL is inspired by behavioral psychology and uses reward/punishment from the environment, higher mental factors such as affects, emotions, and motivations are rarely adopted in the learning procedure of RL. In this paper, to address the challenges of agent learning in MASs, we propose a computational motivation function, which adopts two principal affective factors, “Arousal” and “Pleasure”, of Russell’s circumplex model of affects, to improve the learning performance of a conventional RL algorithm named Q-learning (QL). Computer simulations of pursuit problems with static and dynamic prey were carried out, and the results showed that, compared with conventional QL, the proposed method gives agents a faster and more stable learning performance.
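
    A minimal sketch of folding a motivation signal into Q-learning, assuming a simple linear motivation function of the two affective factors (the actual function in the paper may differ): the scalar driving the update is the environmental reward plus the internal motivation value.

      def motivation(arousal, pleasure):
          """Illustrative motivation term built from two affective factors."""
          return 0.5 * pleasure + 0.2 * arousal

      def affective_q_update(Q, state, action, reward, next_state, actions,
                             arousal, pleasure, alpha=0.1, gamma=0.95):
          """One Q-learning step in which the learning signal is the external
          reward plus the agent's internal motivation value."""
          signal = reward + motivation(arousal, pleasure)
          best_next = max(Q.get((next_state, a), 0.0) for a in actions)
          old = Q.get((state, action), 0.0)
          Q[(state, action)] = old + alpha * (signal + gamma * best_next - old)

      # Toy usage: pleasure rises when the prey gets closer, arousal with its speed.
      Q, actions = {}, ["up", "down", "left", "right"]
      affective_q_update(Q, state=(2, 3), action="up", reward=0.0,
                         next_state=(2, 2), actions=actions,
                         arousal=0.4, pleasure=0.8)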

  4. Multi-timescale Nexting in a Reinforcement Learning Robot

    CERN Document Server

    Modayil, Joseph; Sutton, Richard S

    2011-01-01

    The term "nexting" has been used by psychologists to refer to the propensity of people and many other animals to continually predict what will happen next in an immediate, local, and personal sense. The ability to "next" constitutes a basic kind of awareness and knowledge of one's environment. In this paper we present results with a robot that learns to next in real time, predicting thousands of features of the world's state, including all sensory inputs, at timescales from 0.1 to 8 seconds. This was achieved by treating each state feature as a reward and applying temporal-difference methods to learn a corresponding value function with a discount rate corresponding to the timescale. That is, instead of predicting a single distinguished reward on a long timescale, as in conventional reinforcement learning, we predicted many state features at multiple short timescales. Although this approach is conceptually straightforward, there are many computational and performance challenges in implementing it in real time ...

  5. Manufacturing Scheduling Using Colored Petri Nets and Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Maria Drakaki

    2017-02-01

    Full Text Available Agent-based intelligent manufacturing control systems are capable to efficiently respond and adapt to environmental changes. Manufacturing system adaptation and evolution can be addressed with learning mechanisms that increase the intelligence of agents. In this paper a manufacturing scheduling method is presented based on Timed Colored Petri Nets (CTPNs and reinforcement learning (RL. CTPNs model the manufacturing system and implement the scheduling. In the search for an optimal solution a scheduling agent uses RL and in particular the Q-learning algorithm. A warehouse order-picking scheduling is presented as a case study to illustrate the method. The proposed scheduling method is compared to existing methods. Simulation and state space results are used to evaluate performance and identify system properties.

  6. PAC-Bayesian Policy Evaluation for Reinforcement Learning

    CERN Document Server

    Fard, Mahdi Milani; Szepesvari, Csaba

    2012-01-01

    Bayesian priors offer a compact yet general means of incorporating domain knowledge into many learning tasks. The correctness of the Bayesian analysis and inference, however, largely depends on accuracy and correctness of these priors. PAC-Bayesian methods overcome this problem by providing bounds that hold regardless of the correctness of the prior distribution. This paper introduces the first PAC-Bayesian bound for the batch reinforcement learning problem with function approximation. We show how this bound can be used to perform model-selection in a transfer learning scenario. Our empirical results confirm that PAC-Bayesian policy evaluation is able to leverage prior distributions when they are informative and, unlike standard Bayesian RL approaches, ignore them when they are misleading.

  7. Risk-sensitive reinforcement learning algorithms with generalized average criterion

    Institute of Scientific and Technical Information of China (English)

    YIN Chang-ming; WANG Han-xing; ZHAO Fei

    2007-01-01

    A new algorithm is proposed that potentially sacrifices the optimality of control policies in order to obtain robustness of solutions. Robustness of solutions may become a very important property for a learning system when there is a mismatch between the theoretical model and the practical physical system, when the practical system is not static, or when the availability of a control action changes over time. The main contribution is that a set of approximation algorithms and their convergence results are given. A generalized average operator, instead of the usual optimal operator max (or min), is applied to study a class of important learning algorithms, dynamic programming algorithms, and their convergence is discussed from a theoretical point of view. The purpose of this research is to improve the robustness of reinforcement learning algorithms theoretically.
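
    The abstract does not specify the generalized average operator; the sketch below uses a mellowmax-style log-sum-exp average with inverse temperature beta, which interpolates between the arithmetic mean (small beta) and the hard max (large beta). This is one plausible instance, not the authors' exact operator.

```python
import math

def generalized_average(values, beta=5.0):
    """Soft operator between the arithmetic mean (small beta) and max (large beta)."""
    m = max(values)                       # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(beta * (v - m)) for v in values)) / beta
    return lse - math.log(len(values)) / beta

def backup(next_action_values, reward, gamma=0.95, beta=5.0):
    """One value backup with the hard max replaced by the generalized average."""
    return reward + gamma * generalized_average(next_action_values, beta)
```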

  8. An Upside to Reward Sensitivity: The Hippocampus Supports Enhanced Reinforcement Learning in Adolescence.

    Science.gov (United States)

    Davidow, Juliet Y; Foerde, Karin; Galván, Adriana; Shohamy, Daphna

    2016-10-05

    Adolescents are notorious for engaging in reward-seeking behaviors, a tendency attributed to heightened activity in the brain's reward systems during adolescence. It has been suggested that reward sensitivity in adolescence might be adaptive, but evidence of an adaptive role has been scarce. Using a probabilistic reinforcement learning task combined with reinforcement learning models and fMRI, we found that adolescents showed better reinforcement learning and a stronger link between reinforcement learning and episodic memory for rewarding outcomes. This behavioral benefit was related to heightened prediction error-related BOLD activity in the hippocampus and to stronger functional connectivity between the hippocampus and the striatum at the time of reinforcement. These findings reveal an important role for the hippocampus in reinforcement learning in adolescence and suggest that reward sensitivity in adolescence is related to adaptive differences in how adolescents learn from experience.

  9. Evaluation of students' perceptions on game based learning program using fuzzy set conjoint analysis

    Science.gov (United States)

    Sofian, Siti Siryani; Rambely, Azmin Sham

    2017-04-01

    The effectiveness of game-based learning (GBL) can be determined by applying fuzzy set conjoint analysis. The analysis was used because of the fuzziness involved in determining individual perceptions. This study involved a survey of 36 students aged 16 from SMK Mersing, Johor, who participated in a Mathematics Discovery Camp organized by the UKM research group PRISMatik. The aim of this research was to determine the effectiveness of the module delivered to cultivate interest in mathematics through game-based learning conveying different values. There were 11 games conducted for the participants, and students' perceptions were measured against six criteria. A seven-point Likert scale was used to collect students' preferences and perceptions. This scale represented seven linguistic terms indicating their perceptions of each GBL module. Scores of perceptions were transformed into degrees of similarity using fuzzy set conjoint analysis. It was found that the Geometric Analysis Recreation (GEAR) module was able to increase participant preference with respect to the six attributes generated. The computations were also made for the other 10 games conducted during the camp. The results showed that interest, passion, and teamwork were the strongest values obtained from the GBL activities in this camp, as participants very strongly agreed that these attributes fulfilled their preferences in every module. This was an indicator of the program's efficiency. The evaluation using fuzzy conjoint analysis demonstrated the success of a fuzzy approach for evaluating students' perceptions of GBL.

  10. The Effects of Large Disturbances on On-Line Reinforcement Learning for a Walking Robot

    NARCIS (Netherlands)

    Schuitema, E.; Caarls, W.; Wisse, M.; Jonker, P.P.; Babuska, R.

    2010-01-01

    Reinforcement Learning is a promising paradigm for adding learning capabilities to humanoid robots. One of the difficulties of the real world is the presence of disturbances. In Reinforcement Learning, disturbances are typically dealt with stochastically. However, large and infrequent disturbances d

  11. Reconciling Reinforcement Learning Models with Behavioral Extinction and Renewal: Implications for Addiction, Relapse, and Problem Gambling

    Science.gov (United States)

    Redish, A. David; Jensen, Steve; Johnson, Adam; Kurth-Nelson, Zeb

    2007-01-01

    Because learned associations are quickly renewed following extinction, the extinction process must include processes other than unlearning. However, reinforcement learning models, such as the temporal difference reinforcement learning (TDRL) model, treat extinction as an unlearning of associated value and are thus unable to capture renewal. TDRL…

  13. Two Sides of the Same Coin: Learning via Positive and Negative Reinforcers in the Human Striatum

    OpenAIRE

    Niznikiewicz, Michael A.; Delgado, Mauricio R.

    2011-01-01

    The human striatum has been previously implicated in the processing of positive reinforcement, but less is known about its role in processing negative reinforcement. In this experiment, participants learn specific approach or avoidance responses, mediated by positive and negative reinforcers respectively, to investigate how affective learning and the associated neural activity are influenced by the motivational context in which learning occurs. The paradigm was divided into two discrete sessions, whe...

  14. Distributed Reinforcement Learning for Policy Synchronization in Infinite-Horizon Dec-POMDPs

    Science.gov (United States)

    2012-01-01

    [Report documentation form residue; recoverable information only: "Distributed Reinforcement Learning for Policy Synchronization in Infinite-Horizon Dec-POMDPs"; subject terms: Dec-POMDPs, reinforcement learning, multi...; U.S. Army Research Office, P.O. Box 12211, Research Triangle Park, NC 27709-2211.]

  15. Multi-agent reinforcement learning using modular neural network Q-learning algorithms

    Institute of Scientific and Technical Information of China (English)

    YANG Yin-xian; FANG Kai

    2005-01-01

    Reinforcement learning is an excellent approach used in artificial intelligence, automatic control, and related fields. However, ordinary reinforcement learning algorithms, such as Q-learning with a lookup table, cannot cope with extremely complex and dynamic environments because of the huge state space. To reduce the state space, a modular neural network Q-learning algorithm is proposed, which combines the Q-learning algorithm with neural networks and a modular method. A feedforward neural network, an Elman neural network, and a radial-basis neural network are separately employed to construct such an algorithm. It is revealed that the Elman neural network Q-learning algorithm has the best performance when the same neural network training method, i.e., the gradient-descent error back-propagation algorithm, is applied.
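
    As a rough illustration of the general approach (not the paper's exact modular architecture or training setup), the sketch below shows Q-learning with one small feedforward network trained by gradient descent on the TD error; the Elman and radial-basis variants and the module decomposition are omitted.

```python
import numpy as np

# Minimal sketch of Q-learning with a small feedforward network as the function
# approximator, updated by semi-gradient descent on the TD error.

class NeuralQModule:
    def __init__(self, n_inputs, n_actions, n_hidden=16, lr=0.01, gamma=0.95):
        rng = np.random.default_rng(0)
        self.w1 = rng.normal(0.0, 0.1, (n_inputs, n_hidden))
        self.w2 = rng.normal(0.0, 0.1, (n_hidden, n_actions))
        self.lr, self.gamma = lr, gamma

    def q_values(self, x):
        h = np.tanh(x @ self.w1)          # hidden activations
        return h, h @ self.w2             # Q-value for every action

    def update(self, x, action, reward, x_next, done):
        h, q = self.q_values(x)
        _, q_next = self.q_values(x_next)
        target = reward if done else reward + self.gamma * np.max(q_next)
        td_error = target - q[action]
        # Backpropagate the TD error through the chosen action's output only.
        grad_w2 = np.outer(h, np.eye(len(q))[action]) * td_error
        grad_h = self.w2[:, action] * td_error
        grad_w1 = np.outer(x, (1.0 - h ** 2) * grad_h)
        self.w1 += self.lr * grad_w1
        self.w2 += self.lr * grad_w2
```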

  16. Determining e-Portfolio Elements in Learning Process Using Fuzzy Delphi Analysis

    Science.gov (United States)

    Mohamad, Syamsul Nor Azlan; Embi, Mohamad Amin; Nordin, Norazah

    2015-01-01

    The present article introduces the Fuzzy Delphi method results obtained in a study on determining e-Portfolio elements in the learning process for an art and design context. The method relies on qualified experts to assure the validity of the collected information. In particular, the confirmation of elements is based on experts' opinion and…

  17. An Example of the Use of Fuzzy Set Concepts in Modeling Learning Disability.

    Science.gov (United States)

    Horvath, Michael J.; And Others

    1980-01-01

    The way a particular clinician judges, from data, the degree to which a child is in the category "learning disabled" was modeled on the basis of clinician's statement of the traits that comprise the handicap. The model illustrates the use of fuzzy set theory. (Author/RL)

  18. A Study on the Rare Factors Exploration of Learning Effectiveness by Using Fuzzy Data Mining

    Science.gov (United States)

    Chen, Chen-Tung; Chang, Kai-Yi

    2017-01-01

    The phenomenon of low fertility has negatively impacted the social structure of the educational environment in Taiwan. Increasing the learning effectiveness of students has therefore become the most important issue for universities in Taiwan. Because the judgments of evaluators are subjective and the attributes of the influencing factors are always fuzzy, it…

  19. Optimal control in microgrid using multi-agent reinforcement learning.

    Science.gov (United States)

    Li, Fu-Dong; Wu, Min; He, Yong; Chen, Xin

    2012-11-01

    This paper presents an improved reinforcement learning method to minimize electricity costs on the premise of satisfying the power balance and generation limits of units in a microgrid operating in grid-connected mode. Firstly, the microgrid control requirements are analyzed and the objective function for optimal control of the microgrid is proposed. Then, a state variable, "Average Electricity Price Trend", which expresses the most probable transitions of the system, is developed to reduce the complexity and randomness of the microgrid, and a multi-agent architecture including agents, state variables, action variables, and a reward function is formulated. Furthermore, dynamic hierarchical reinforcement learning, based on the rate of change of a key state variable, is established to carry out optimal policy exploration. The analysis shows that the proposed method helps handle the "curse of dimensionality" and speeds up learning in an unknown large-scale world. Finally, simulation results under JADE (Java Agent Development Framework) demonstrate the validity of the presented method for optimal control of a microgrid in grid-connected mode.
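
    The paper's objective is described only qualitatively above; the following is a hypothetical sketch of the kind of reward such an energy-management agent might use, penalizing electricity cost, power imbalance, and generation-limit violations. The weights and argument names are assumptions for illustration.

```python
# Hypothetical microgrid reward: lower electricity cost is better, and violations
# of the power balance or of generation limits are penalized.

def microgrid_reward(cost, power_imbalance_kw, limit_violation_kw,
                     balance_weight=10.0, limit_weight=10.0):
    return -(cost
             + balance_weight * abs(power_imbalance_kw)
             + limit_weight * max(0.0, limit_violation_kw))
```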

  20. Individual Differences in Reinforcement Learning: Behavioral, Electrophysiological, and Neuroimaging Correlates

    Science.gov (United States)

    Santesso, Diane L.; Dillon, Daniel G.; Birk, Jeffrey L.; Holmes, Avram J.; Goetz, Elena; Bogdan, Ryan; Pizzagalli, Diego A.

    2008-01-01

    During reinforcement learning, phasic modulations of activity in midbrain dopamine neurons are conveyed to the dorsal anterior cingulate cortex (dACC) and basal ganglia and serve to guide adaptive responding. While the animal literature supports a role for the dACC in integrating reward history over time, most human electrophysiological studies of dACC function have focused on responses to single positive and negative outcomes. The present electrophysiological study investigated the role of the dACC in probabilistic reward learning in healthy subjects using a task that required integration of reinforcement history over time. We recorded the feedback-related negativity (FRN) to reward feedback in subjects who developed a response bias toward a more frequently rewarded (“rich”) stimulus (“learners”) versus subjects who did not (“non-learners”). Compared to non-learners, learners showed more positive (i.e., smaller) FRNs and greater dACC activation upon receiving reward for correct identification of the rich stimulus. In addition, dACC activation and a bias to select the rich stimulus were positively correlated. The same participants also completed a monetary incentive delay (MID) task administered during functional magnetic resonance imaging. Compared to non-learners, learners displayed stronger basal ganglia responses to reward in the MID task. These findings raise the possibility that learners in the probabilistic reinforcement task were characterized by stronger dACC and basal ganglia responses to rewarding outcomes. Furthermore, these results highlight the importance of the dACC to probabilistic reward learning in humans. PMID:18595740

  1. Simulation of rat behavior by a reinforcement learning algorithm in consideration of appearance probabilities of reinforcement signals.

    Science.gov (United States)

    Murakoshi, Kazushi; Noguchi, Takuya

    2005-04-01

    Brown and Wanger [Brown, R.T., Wanger, A.R., 1964. Resistance to punishment and extinction following training with shock or nonreinforcement. J. Exp. Psychol. 68, 503-507] investigated rat behaviors with the following features: (1) rats were exposed to reward and punishment at the same time, (2) the environment changed and rats relearned, and (3) rats were stochastically exposed to reward and punishment. The results are that exposure to nonreinforcement produces resistance to the decremental effects of behavior after a stochastic reward schedule, and that exposure to both punishment and reinforcement produces resistance to the decremental effects of behavior after a stochastic punishment schedule. This paper aims to simulate these rat behaviors with a reinforcement learning algorithm that takes the appearance probabilities of reinforcement signals into consideration. Earlier reinforcement learning algorithms were unable to simulate the behavior described in feature (3). We improve on them by controlling the learning parameters in consideration of the acquisition probabilities of reinforcement signals. The proposed algorithm qualitatively reproduces the results of the animal experiment of Brown and Wanger.

  2. Reinforcement Learning Based Web Service Compositions for Mobile Business

    Science.gov (United States)

    Zhou, Juan; Chen, Shouming

    In this paper, we propose a new solution to Reactive Web Service Composition by modeling it with Reinforcement Learning and introducing modified (alterable) QoS variables into the model as elements of the Markov Decision Process tuple. Moreover, we give an example of Reactive-WSC-based mobile banking to demonstrate the solution's intrinsic capability of obtaining an optimized service composition, characterized by (alterable) target QoS variable sets with optimized values. We conclude that the solution has considerable potential for improving customer experience and quality of service in Web Services, and in applications across the electronic commerce and business sector.

  3. Curiosity driven reinforcement learning for motion planning on humanoids

    OpenAIRE

    Mikhail eFrank; Jürgen eLeitner; Marijn eStollenga; Alexander eFörster; Jürgen eSchmidhuber

    2014-01-01

    Most previous work on artificial curiosity and intrinsic motivation focuses on basic concepts and theory. Experimental results are generally limited to toy scenarios, such as navigation in a simulated maze, or control of a simple mechanical system with one or two degrees of freedom. To study artificial curiosity in a more realistic setting, we embody a curious agent in the complex iCub humanoid robot. Our novel reinforcement learning framework consists of a state-of-the...

  4. Finding intrinsic rewards by embodied evolution and constrained reinforcement learning.

    Science.gov (United States)

    Uchibe, Eiji; Doya, Kenji

    2008-12-01

    Understanding the design principle of reward functions is a substantial challenge both in artificial intelligence and neuroscience. Successful acquisition of a task usually requires not only rewards for goals, but also for intermediate states to promote effective exploration. This paper proposes a method for designing 'intrinsic' rewards of autonomous agents by combining constrained policy gradient reinforcement learning and embodied evolution. To validate the method, we use Cyber Rodent robots, in which collision avoidance, recharging from battery packs, and 'mating' by software reproduction are three major 'extrinsic' rewards. We show in hardware experiments that the robots can find appropriate 'intrinsic' rewards for the vision of battery packs and other robots to promote approach behaviors.

  5. Distributed reinforcement learning for adaptive and robust network intrusion response

    Science.gov (United States)

    Malialis, Kleanthis; Devlin, Sam; Kudenko, Daniel

    2015-07-01

    Distributed denial of service (DDoS) attacks constitute a rapidly evolving threat in the current Internet. Multiagent Router Throttling is a novel approach to defend against DDoS attacks where multiple reinforcement learning agents are installed on a set of routers and learn to rate-limit or throttle traffic towards a victim server. The focus of this paper is on online learning and scalability. We propose an approach that incorporates task decomposition, team rewards and a form of reward shaping called difference rewards. One of the novel characteristics of the proposed system is that it provides a decentralised coordinated response to the DDoS problem, thus being resilient to DDoS attacks themselves. The proposed system learns remarkably fast, thus being suitable for online learning. Furthermore, its scalability is successfully demonstrated in experiments involving 1000 learning agents. We compare our approach against a baseline and a popular state-of-the-art throttling technique from the network security literature and show that the proposed approach is more effective, adaptive to sophisticated attack rate dynamics and robust to agent failures.
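
    The paper's reward shaping is only named above; the sketch below shows the generic form of a difference reward, in which an agent is credited with the global utility minus the utility obtained when its own action is replaced by a default action. The function global_utility is a hypothetical stand-in for the system-level objective.

```python
# Difference reward: isolates one agent's contribution to the global utility.

def difference_reward(global_utility, joint_action, agent_index, default_action):
    g = global_utility(joint_action)
    counterfactual = list(joint_action)
    counterfactual[agent_index] = default_action   # replace this agent's action
    return g - global_utility(counterfactual)
```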

  6. Reinforcement learning versus proportional-integral-derivative control of hypnosis in a simulated intraoperative patient

    National Research Council Canada - National Science Library

    Moore, Brett L; Quasny, Todd M; Doufas, Anthony G

    2011-01-01

    .... We investigated the application of reinforcement learning (RL), an intelligent systems control method, to closed-loop BIS-guided, propofol-induced hypnosis in simulated intraoperative patients...

  7. Longitudinal investigation on learned helplessness tested under negative and positive reinforcement involving stimulus control.

    Science.gov (United States)

    Oliveira, Emileane C; Hunziker, Maria Helena

    2014-07-01

    In this study, we investigated whether (a) animals demonstrating the learned helplessness effect during an escape contingency also show learning deficits under positive reinforcement contingencies involving stimulus control and (b) exposure to positive reinforcement contingencies eliminates the learned helplessness effect under an escape contingency. Rats were initially exposed to controllable (C), uncontrollable (U) or no (N) shocks. After 24 h, they were exposed to 60 escapable shocks delivered in a shuttlebox. In the following phase, we selected from each group the four subjects that presented the most typical group pattern: no escape learning (learned helplessness effect) in Group U and escape learning in Groups C and N. All subjects were then exposed to two phases: (1) positive reinforcement of lever pressing under a multiple FR/extinction schedule and (2) a re-test under negative reinforcement (escape). A fourth group (n=4) was exposed only to the positive reinforcement sessions. All subjects showed discrimination learning under the multiple schedule. In the escape re-test, the learned helplessness effect was maintained for three of the animals in Group U. These results suggest that the learned helplessness effect did not extend to discriminative behavior that is positively reinforced and that the learned helplessness effect did not revert for most subjects after exposure to positive reinforcement. We discuss some theoretical implications related to learned helplessness as an effect restricted to aversive contingencies and to the absence of reversion after positive reinforcement.

  8. Two Novel On-policy Reinforcement Learning Algorithms based on TD(lambda)-methods

    NARCIS (Netherlands)

    Wiering, M.A.; Hasselt, H. van

    2007-01-01

    This paper describes two novel on-policy reinforcement learning algorithms, named QV(lambda)-learning and the actor critic learning automaton (ACLA). Both algorithms learn a state value-function using TD(lambda)-methods. The difference between the algorithms is that QV-learning uses the learned

  9. Fuzzy Neural Networks Learning by Variable-Dimensional Quantum-behaved Particle Swarm Optimization Algorithm

    Directory of Open Access Journals (Sweden)

    Jing Zhao

    2013-10-01

    Full Text Available The evolutionary learning of fuzzy neural networks (FNN) consists of structure learning, to determine the proper number of fuzzy rules, and parameter learning, to adjust the network parameters. Many optimization algorithms can be applied to evolve FNN. However, the search space of most algorithms has a fixed dimension, which is not suited to the dynamic structure learning of FNN. We propose a novel technique, named the variable-dimensional quantum-behaved particle swarm optimization algorithm (VDQPSO), to address this problem. In the proposed algorithm, the optimum dimension, which is unknown at the beginning, is updated together with the position of the swarm. The optimum dimension reached at the end of the optimization process corresponds to a unique FNN structure for which the optimum parameters can be achieved. The results of a chaotic time-series prediction experiment show that the proposed technique is effective: it can evolve to an optimum or near-optimum FNN structure and optimum parameters.

  10. Metropolis Criterion Based Fuzzy Q-Learning Energy Management for Smart Grids

    Directory of Open Access Journals (Sweden)

    Haibin Yu

    2012-12-01

    Full Text Available For energy management problems in demand response for the electricity grid, a Metropolis-criterion-based fuzzy Q-learning consumer energy management controller (CEMC) is proposed. Because of uncertainty and highly time-varying conditions, it is not easy to accurately obtain complete information about consumer behavior in the electricity grid. In this case Q-learning, which does not depend on a mathematical model or prior knowledge, performs well. Fuzzy inference and the Metropolis criterion are introduced to facilitate generalization over the large state space and to balance exploration and exploitation in action selection, respectively. Simulation results show that the proposed controller can learn to take the best action to regulate consumer behavior, with low average end-user financial costs and high consumer satisfaction.
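
    The fuzzy-inference part is omitted here; the sketch below illustrates only the Metropolis-criterion action selection that the abstract refers to, in which an exploratory action with a lower Q-value is accepted with a probability that shrinks as a temperature parameter is annealed. Parameter names are assumptions.

```python
import math
import random

def metropolis_select(q_values, temperature):
    """Pick between the greedy action and a random candidate via the Metropolis rule."""
    greedy = max(range(len(q_values)), key=lambda a: q_values[a])
    candidate = random.randrange(len(q_values))
    if q_values[candidate] >= q_values[greedy]:
        return candidate                   # a better (or equal) candidate is always taken
    accept_prob = math.exp((q_values[candidate] - q_values[greedy]) / temperature)
    return candidate if random.random() < accept_prob else greedy
```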

  11. Reinforcement learning controller design for affine nonlinear discrete-time systems using online approximators.

    Science.gov (United States)

    Yang, Qinmin; Jagannathan, Sarangapani

    2012-04-01

    In this paper, reinforcement learning state- and output-feedback-based adaptive critic controller designs are proposed, using online approximators (OLAs), for general multi-input multi-output affine unknown nonlinear discrete-time systems in the presence of bounded disturbances. The proposed controller design has two entities: an action network designed to produce the optimal signal and a critic network that evaluates the performance of the action network. The critic estimates the cost-to-go function, which is tuned online using recursive equations derived from heuristic dynamic programming. Here, neural networks (NNs) are used for both the action and the critic, although any OLAs, such as radial basis functions, splines, fuzzy logic, etc., can be utilized. For the output-feedback counterpart, an additional NN is designated as the observer to estimate the unavailable system states, and thus a separation principle is not required. The NN weight tuning laws for the controller schemes are also derived while ensuring uniform ultimate boundedness of the closed-loop system using Lyapunov theory. Finally, the effectiveness of the two controllers is tested in simulation on a pendulum balancing system and a two-link robotic arm system.

  12. On the Performance of Maximum Likelihood Inverse Reinforcement Learning

    CERN Document Server

    Ratia, Héctor; Martinez-Cantin, Ruben

    2012-01-01

    Inverse reinforcement learning (IRL) addresses the problem of recovering a task description given a demonstration of the optimal policy used to solve such a task. The optimal policy is usually provided by an expert or teacher, making IRL especially suitable for the problem of apprenticeship learning. The task description is encoded in the form of a reward function of a Markov decision process (MDP). Several algorithms have been proposed to find the reward function corresponding to a set of demonstrations. One of the algorithms that has provided the best results in different applications is a gradient method that optimizes a policy squared-error criterion. On a parallel line of research, other authors have recently presented a gradient approximation of the maximum likelihood estimate of the reward signal. In general, both approaches approximate the gradient estimate and the criteria at different stages to make the algorithm tractable and efficient. In this work, we provide a detailed description of the different metho...

  13. Computational Model of Music Sight Reading: A Reinforcement Learning Approach

    CERN Document Server

    Yahya, Keyvan

    2010-01-01

    Although the music sight-reading process has usually been studied from cognitive or neurological viewpoints, computational learning methods such as reinforcement learning have not yet been used to model such processes. In this paper, with regard to the essential properties of our specific problem, we consider the value-function concept and show that the optimum policy can be obtained by the method we offer without becoming involved in computing the complex value functions, which are in most cases inexact. The algorithm we offer here is essentially a PDE-based algorithm associated with stochastic optimization programming, and we consider it more applicable in this case than standard algorithms such as the temporal-difference method.

  14. Reinforcement Learning in Distributed Domains: Beyond Team Games

    Science.gov (United States)

    Wolpert, David H.; Sill, Joseph; Turner, Kagan

    2000-01-01

    Distributed search algorithms are crucial in dealing with large optimization problems, particularly when a centralized approach is not only impractical but infeasible. Many machine learning concepts have been applied to search algorithms in order to improve their effectiveness. In this article we present an algorithm that blends Reinforcement Learning (RL) and hill climbing directly, by using the RL signal to guide the exploration step of a hill climbing algorithm. We apply this algorithm to the domain of constellations of communication satellites, where the goal is to minimize the loss of importance-weighted data. We introduce the concept of 'ghost' traffic, where correctly setting this traffic induces the satellites to act so as to optimize the world utility. Our results indicate that the bi-utility search introduced in this paper outperforms both traditional hill climbing algorithms and distributed RL approaches such as team games.

  15. Fuzzy cognitive maps for applied sciences and engineering from fundamentals to extensions and learning algorithms

    CERN Document Server

    2014-01-01

    Fuzzy Cognitive Maps (FCM) constitute cognitive models in the form of fuzzy directed graphs consisting of two basic elements: the nodes, which basically correspond to “concepts” bearing different states of activation depending on the knowledge they represent, and the “edges”, denoting the causal effects that each source node exerts on the receiving concept, expressed through weights. Weights take values in the interval [-1,1], denoting a positive, negative or neutral causal relationship between two concepts. An FCM can typically be obtained through linguistic terms, inherent to fuzzy systems, but with a structure similar to that of neural networks, which facilitates data processing and provides capabilities for training and adaptation. During the last 10 years, there has been exponential growth in the number of published papers on FCMs, showing great impact potential. Different FCM structures and learning schemes have been developed, while numerous studies report their use in many contexts with highly successful m...
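
    A minimal sketch of one common FCM inference step, assuming a sigmoid squashing function and the convention that a concept's next activation combines its current activation with the weighted activations of the concepts that point to it; particular FCM variants and learning schemes differ.

```python
import math

def fcm_step(activations, weights, squash=lambda x: 1.0 / (1.0 + math.exp(-x))):
    """One synchronous update of all concept activations in a fuzzy cognitive map."""
    n = len(activations)
    new_activations = []
    for j in range(n):
        total = activations[j] + sum(weights[i][j] * activations[i]
                                     for i in range(n) if i != j)
        new_activations.append(squash(total))
    return new_activations
```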

  16. Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation.

    Science.gov (United States)

    Kato, Ayaka; Morita, Kenji

    2016-10-01

    It has been suggested that dopamine (DA) represents reward-prediction-error (RPE) defined in reinforcement learning and therefore DA responds to unpredicted but not predicted reward. However, recent studies have found DA response sustained towards predictable reward in tasks involving self-paced behavior, and suggested that this response represents a motivational signal. We have previously shown that RPE can sustain if there is decay/forgetting of learned-values, which can be implemented as decay of synaptic strengths storing learned-values. This account, however, did not explain the suggested link between tonic/sustained DA and motivation. In the present work, we explored the motivational effects of the value-decay in self-paced approach behavior, modeled as a series of 'Go' or 'No-Go' selections towards a goal. Through simulations, we found that the value-decay can enhance motivation, specifically, facilitate fast goal-reaching, albeit counterintuitively. Mathematical analyses revealed that underlying potential mechanisms are twofold: (1) decay-induced sustained RPE creates a gradient of 'Go' values towards a goal, and (2) value-contrasts between 'Go' and 'No-Go' are generated because while chosen values are continually updated, unchosen values simply decay. Our model provides potential explanations for the key experimental findings that suggest DA's roles in motivation: (i) slowdown of behavior by post-training blockade of DA signaling, (ii) observations that DA blockade severely impairs effortful actions to obtain rewards while largely sparing seeking of easily obtainable rewards, and (iii) relationships between the reward amount, the level of motivation reflected in the speed of behavior, and the average level of DA. These results indicate that reinforcement learning with value-decay, or forgetting, provides a parsimonious mechanistic account for the DA's roles in value-learning and motivation. Our results also suggest that when biological systems for value-learning
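
    A hedged sketch of the mechanism described above: a standard TD update for the visited state plus a small multiplicative decay of all stored values at every step, which keeps prediction errors from vanishing along a well-learned path toward the goal. Parameter values are illustrative assumptions, not the paper's.

```python
def td_update_with_decay(values, state, next_state, reward,
                         alpha=0.5, gamma=0.97, decay=0.01):
    """TD(0) update with forgetting; `values` maps states to learned values."""
    rpe = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * rpe
    for s in values:
        values[s] *= (1.0 - decay)   # all stored values decay a little each step
    return rpe
```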

  17. Memory Transformation Enhances Reinforcement Learning in Dynamic Environments.

    Science.gov (United States)

    Santoro, Adam; Frankland, Paul W; Richards, Blake A

    2016-11-30

    Over the course of systems consolidation, there is a switch from a reliance on detailed episodic memories to generalized schematic memories. This switch is sometimes referred to as "memory transformation." Here we demonstrate a previously unappreciated benefit of memory transformation, namely, its ability to enhance reinforcement learning in a dynamic environment. We developed a neural network that is trained to find rewards in a foraging task where reward locations are continuously changing. The network can use memories for specific locations (episodic memories) and statistical patterns of locations (schematic memories) to guide its search. We find that switching from an episodic to a schematic strategy over time leads to enhanced performance due to the tendency for the reward location to be highly correlated with itself in the short-term, but regress to a stable distribution in the long-term. We also show that the statistics of the environment determine the optimal utilization of both types of memory. Our work recasts the theoretical question of why memory transformation occurs, shifting the focus from the avoidance of memory interference toward the enhancement of reinforcement learning across multiple timescales.

  18. Research on Cloud Computing Resources Provisioning Based on Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Zhiping Peng

    2015-01-01

    Full Text Available As one of the core issues in cloud computing, resource management adopts virtualization technology to shield the underlying resource heterogeneity and complexity, so that massive distributed resources form a unified giant resource pool. Efficient resource provisioning can be achieved by rationally implementing resource management methods and techniques. Therefore, how to manage cloud computing resources effectively becomes a challenging research topic. By analyzing the execution of a user job in the cloud computing environment, we propose a novel resource provisioning scheme based on reinforcement learning and queuing theory. With the introduction of the concepts of Segmentation Service Level Agreement (SSLA) and Utilization Unit Time Cost (UUTC), we view the resource provisioning problem in cloud computing as a sequential decision issue; we then design a novel optimization objective function and employ reinforcement learning to solve it. Experimental results not only demonstrate the effectiveness of the proposed scheme but also show that it outperforms common methods in resource utilization rate, SLA collision avoidance, and user costs.

  19. The combination of appetitive and aversive reinforcers and the nature of their interaction during auditory learning.

    Science.gov (United States)

    Ilango, A; Wetzel, W; Scheich, H; Ohl, F W

    2010-03-31

    Learned changes in behavior can be elicited by either appetitive or aversive reinforcers. It is, however, not clear whether the two types of motivation, (approaching appetitive stimuli and avoiding aversive stimuli) drive learning in the same or different ways, nor is their interaction understood in situations where the two types are combined in a single experiment. To investigate this question we have developed a novel learning paradigm for Mongolian gerbils, which not only allows rewards and punishments to be presented in isolation or in combination with each other, but also can use these opposite reinforcers to drive the same learned behavior. Specifically, we studied learning of tone-conditioned hurdle crossing in a shuttle box driven by either an appetitive reinforcer (brain stimulation reward) or an aversive reinforcer (electrical footshock), or by a combination of both. Combination of the two reinforcers potentiated speed of acquisition, led to maximum possible performance, and delayed extinction as compared to either reinforcer alone. Additional experiments, using partial reinforcement protocols and experiments in which one of the reinforcers was omitted after the animals had been previously trained with the combination of both reinforcers, indicated that appetitive and aversive reinforcers operated together but acted in different ways: in this particular experimental context, punishment appeared to be more effective for initial acquisition and reward more effective to maintain a high level of conditioned responses (CRs). The results imply that learning mechanisms in problem solving were maximally effective when the initial punishment of mistakes was combined with the subsequent rewarding of correct performance.

  20. "Notice of Violation of IEEE Publication Principles" Multiobjective Reinforcement Learning: A Comprehensive Overview.

    Science.gov (United States)

    Liu, Chunming; Xu, Xin; Hu, Dewen

    2013-04-29

    Reinforcement learning is a powerful mechanism for enabling agents to learn in an unknown environment, and most reinforcement learning algorithms aim to maximize some numerical value, which represents only one long-term objective. However, multiple long-term objectives are exhibited in many real-world decision and control problems; therefore, recently, there has been growing interest in solving multiobjective reinforcement learning (MORL) problems with multiple conflicting objectives. The aim of this paper is to present a comprehensive overview of MORL. In this paper, the basic architecture, research topics, and naive solutions of MORL are introduced at first. Then, several representative MORL approaches and some important directions of recent research are reviewed. The relationships between MORL and other related research are also discussed, which include multiobjective optimization, hierarchical reinforcement learning, and multi-agent reinforcement learning. Finally, research challenges and open problems of MORL techniques are highlighted.

  1. Teaching-Learning by Means of a Fuzzy-Causal User Model

    Science.gov (United States)

    Peña Ayala, Alejandro

    In this research the teaching-learning phenomenon that occurs during an E-learning experience is tackled from a fuzzy-causal perspective. The approach is suitable for dealing with intangible objects of a domain, such as personality, that are stated as linguistic variables. In addition, the bias that teaching content exerts on the user’s mind is sketched through causal relationships. Moreover, by means of fuzzy-causal inference, the user’s learning is estimated prior to delivering a lecture, and this estimate is taken into account to adapt the behavior of a Web-based education system (WBES). In an experimental trial, volunteers who received lecture options chosen by this user model (UM) achieved higher learning than participants who received randomly selected lecture options. Such empirical evidence highlights the added value that a UM offers in adapting a WBES.

  2. Grounding the meanings in sensorimotor behavior using reinforcement learning

    Directory of Open Access Journals (Sweden)

    Igor eFarkaš

    2012-02-01

    Full Text Available The recent outburst of interest in cognitive developmental robotics is fueled by the ambition to propose ecologically plausible mechanisms of how, among other things, a learning agent/robot could ground linguistic meanings in its sensorimotor behaviour. Along this stream, we propose a model that allows the simulated iCub robot to learn the meanings of actions (point, touch and push) oriented towards objects in the robot's peripersonal space. In our experiments, the iCub learns to execute motor actions and comment on them. Architecturally, the model is composed of three neural-network-based modules that are trained in different ways. The first module, a two-layer perceptron, is trained by back-propagation to attend to the target position in the visual scene, given the low-level visual information and the feature-based target information. The second module, having the form of an actor-critic architecture, is the most distinctive part of our model, and is trained by a continuous version of reinforcement learning to execute actions as sequences, based on a linguistic command. The third module, an echo-state network, is trained to provide the linguistic description of the executed actions. The trained model generalises well to novel action-target combinations with randomised initial arm positions. It can also promptly adapt its behaviour if the action/target suddenly changes during motor execution.

  3. Reinforcement Learning in a Nonstationary Environment: The El Farol Problem

    Science.gov (United States)

    Bell, Ann Maria

    1999-01-01

    This paper examines the performance of simple learning rules in a complex adaptive system based on a coordination problem modeled on the El Farol problem. The key features of the El Farol problem are that it typically involves a medium number of agents and that agents' pay-off functions have a discontinuous response to increased congestion. First we consider a single adaptive agent facing a stationary environment. We demonstrate that the simple learning rules proposed by Roth and Er'ev can be extremely sensitive to small changes in the initial conditions and that events early in a simulation can affect the performance of the rule over a relatively long time horizon. In contrast, a reinforcement learning rule based on standard practice in the computer science literature converges rapidly and robustly. The situation is reversed when multiple adaptive agents interact: the RE algorithms often converge rapidly to a stable average aggregate attendance despite the slow and erratic behavior of individual learners, while the CS-based learners frequently over-attend in the early and intermediate terms. The symmetric mixed-strategy equilibrium is unstable: all three learning rules ultimately tend towards pure strategies or stabilize in the medium term at non-equilibrium probabilities of attendance. The brittleness of the algorithms in different contexts emphasizes the importance of thorough and thoughtful examination of simulation-based results.
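
    For concreteness, the sketch below shows a basic Roth-Erev style reinforcement rule of the kind referred to above: each action keeps a propensity that is incremented by realized payoffs, and actions are chosen with probability proportional to propensity. The forgetting and experimentation parameters of the full rule, and the exact variant used in the paper, are omitted.

```python
import random

def choose(propensities):
    """Pick an action with probability proportional to its propensity."""
    total = sum(propensities)
    r = random.uniform(0.0, total)
    cumulative = 0.0
    for action, p in enumerate(propensities):
        cumulative += p
        if r <= cumulative:
            return action
    return len(propensities) - 1

def reinforce(propensities, action, payoff):
    """Increment the chosen action's propensity by the realized payoff."""
    propensities[action] += max(payoff, 0.0)   # payoffs assumed non-negative here
    return propensities
```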

  4. Reinforcement Learning for Routing in Cognitive Radio Ad Hoc Networks

    Directory of Open Access Journals (Sweden)

    Hasan A. A. Al-Rawi

    2014-01-01

    Full Text Available Cognitive radio (CR) enables unlicensed users (or secondary users, SUs) to sense for and exploit underutilized licensed spectrum owned by the licensed users (or primary users, PUs). Reinforcement learning (RL) is an artificial intelligence approach that enables a node to observe, learn, and make appropriate decisions on action selection in order to maximize network performance. Routing enables a source node to search for a least-cost route to its destination node. While there have been increasing efforts to enhance the traditional RL approach for routing in wireless networks, this research area remains largely unexplored in the domain of routing in CR networks. This paper applies RL in routing and investigates the effects of various features of RL (i.e., reward function, exploitation, and exploration, as well as learning rate) through simulation. New approaches and recommendations are proposed to enhance the features in order to improve the network performance brought about by RL to routing. Simulation results show that the RL parameters of the reward function, exploitation, and exploration, as well as learning rate, must be well regulated, and the new approaches proposed in this paper improve SUs’ network performance without significantly jeopardizing PUs’ network performance, specifically SUs’ interference to PUs.

  5. Reinforcement learning for routing in cognitive radio ad hoc networks.

    Science.gov (United States)

    Al-Rawi, Hasan A A; Yau, Kok-Lim Alvin; Mohamad, Hafizal; Ramli, Nordin; Hashim, Wahidah

    2014-01-01

    Cognitive radio (CR) enables unlicensed users (or secondary users, SUs) to sense for and exploit underutilized licensed spectrum owned by the licensed users (or primary users, PUs). Reinforcement learning (RL) is an artificial intelligence approach that enables a node to observe, learn, and make appropriate decisions on action selection in order to maximize network performance. Routing enables a source node to search for a least-cost route to its destination node. While there have been increasing efforts to enhance the traditional RL approach for routing in wireless networks, this research area remains largely unexplored in the domain of routing in CR networks. This paper applies RL in routing and investigates the effects of various features of RL (i.e., reward function, exploitation, and exploration, as well as learning rate) through simulation. New approaches and recommendations are proposed to enhance the features in order to improve the network performance brought about by RL to routing. Simulation results show that the RL parameters of the reward function, exploitation, and exploration, as well as learning rate, must be well regulated, and the new approaches proposed in this paper improve SUs' network performance without significantly jeopardizing PUs' network performance, specifically SUs' interference to PUs.
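
    The abstract does not give the update rule itself; the sketch below is in the spirit of classic Q-routing (Boyan and Littman), in which each node learns the estimated cost of delivering to a destination via each neighbor from that neighbor's own best estimate. The CR-specific reward terms (for example, SU interference to PUs) are not modeled here.

```python
def q_routing_update(q, node, dest, neighbor, link_cost, neighbor_q, alpha=0.3):
    """q[(node, dest, neighbor)] estimates the cost of reaching dest via neighbor."""
    best_from_neighbor = min(neighbor_q.values()) if neighbor_q else 0.0
    old = q.get((node, dest, neighbor), 0.0)
    q[(node, dest, neighbor)] = old + alpha * (link_cost + best_from_neighbor - old)
    return q
```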

  6. Adventitious Reinforcement of Maladaptive Stimulus Control Interferes with Learning.

    Science.gov (United States)

    Saunders, Kathryn J; Hine, Kathleen; Hayashi, Yusuke; Williams, Dean C

    2016-09-01

    Persistent error patterns sometimes develop when teaching new discriminations. These patterns can be adventitiously reinforced, especially during long periods of chance-level responding (including baseline). Such behaviors can interfere with learning a new discrimination. They can also disrupt already learned discriminations if they re-emerge during teaching procedures that generate errors. We present an example of this process. Our goal was to teach a boy with intellectual disabilities to touch one of two shapes on a computer screen (in technical terms, a simple simultaneous discrimination). We used a size-fading procedure. The correct stimulus was at full size, and the incorrect-stimulus size increased in increments of 10%. Performance was nearly error free up to and including 60% of full size. In a probe session with the incorrect stimulus at full size, however, accuracy plummeted. Also, a pattern of switching between choices, which apparently had been established in classroom instruction, re-emerged. The switching pattern interfered with already-learned discriminations. Despite his having previously mastered a fading step with the incorrect stimulus at 60% of full size, we were unable to maintain consistently high accuracy beyond 20% of full size. We refined the teaching program so that fading was done in smaller steps (5%), and decisions to "step back" to a smaller incorrect stimulus were made after every 5 trials instead of every 20. Errors were rare, switching behavior stopped, and he mastered the discrimination. This is a practical example of the importance of designing instruction that prevents adventitious reinforcement of maladaptive discriminated response patterns by reducing errors during acquisition.

  7. Magnetic induction of hyperthermia by a modified self-learning fuzzy temperature controller

    Science.gov (United States)

    Wang, Wei-Cheng; Tai, Cheng-Chi

    2017-07-01

    The aim of this study was to develop a temperature controller for magnetic induction hyperthermia (MIH). A closed-loop controller was applied to track a reference model to guarantee a desired temperature response. The MIH system generated an alternating magnetic field to heat a material of high magnetic permeability. This wireless induction heating has few side effects when applied to cancer treatment. The effects of hyperthermia strongly depend on precise control of temperature. However, during the treatment process, control performance is degraded by severe perturbations and parameter variations. In this study, a modified self-learning fuzzy logic controller (SLFLC) with a gain-tuning mechanism was implemented to obtain high control performance over a wide range of treatment situations. This was done by appropriately altering the output scaling factor of a fuzzy inverse model to adjust the control rules. The proposed SLFLC was compared to the classical self-tuning fuzzy logic controller and to fuzzy model reference learning control, and it was verified by in vitro experiments with porcine liver. The experimental results indicated that the proposed controller shows greater robustness and excellent adaptability with respect to the temperature control of the MIH system.

  8. Reinforcement Learning to Train Ms. Pac-Man Using Higher-order Action-relative Inputs

    NARCIS (Netherlands)

    Bom, Luuk; Henken, Ruud; Wiering, Marco

    2013-01-01

    Reinforcement learning algorithms enable an agent to optimize its behavior from interacting with a specific environment. Although some very successful applications of reinforcement learning algorithms have been developed, it is still an open research question how to scale up to large dynamic

  9. Vision-based landing of a simulated unmanned aerial vehicle with fast reinforcement learning

    OpenAIRE

    2010-01-01

    Landing is one of the difficult challenges for an unmanned aerial vehicle (UAV). In this paper, we propose a vision-based landing approach for an autonomous UAV using reinforcement learning (RL). The autonomous UAV learns the landing skill from scratch by interacting with the environment. The reinforcement learning algorithm explored and extended in this study is Least-Squares Policy Iteration (LSPI) to gain a fast learning process and a smooth landing trajectory. The proposed approach has...

  10. Delivery of Learning Knowledge Objects Using Fuzzy Clustering

    Science.gov (United States)

    Sabitha, A. Sai; Mehrotra, Deepti; Bansal, Abhay

    2016-01-01

    The e-Learning industry is rapidly changing, and current learning trends are based on personalized, social, and mobile learning, content reusability, cloud-based approaches, and talent management. Learning systems have attained significant growth, catering to the needs of a wide range of learners with different approaches and styles of learning. Objects…

  11. A multiplicative reinforcement learning model capturing learning dynamics and interindividual variability in mice.

    Science.gov (United States)

    Bathellier, Brice; Tee, Sui Poh; Hrovat, Christina; Rumpel, Simon

    2013-12-03

    Both in humans and in animals, different individuals may learn the same task with strikingly different speeds; however, the sources of this variability remain elusive. In standard learning models, interindividual variability is often explained by variations of the learning rate, a parameter indicating how much synapses are updated on each learning event. Here, we theoretically show that the initial connectivity between the neurons involved in learning a task is also a strong determinant of how quickly the task is learned, provided that connections are updated in a multiplicative manner. To experimentally test this idea, we trained mice to perform an auditory Go/NoGo discrimination task followed by a reversal to compare learning speed when starting from naive or already trained synaptic connections. All mice learned the initial task, but often displayed sigmoid-like learning curves, with a variable delay period followed by a steep increase in performance, as often observed in operant conditioning. For all mice, learning was much faster in the subsequent reversal training. An accurate fit of all learning curves could be obtained with a reinforcement learning model endowed with a multiplicative learning rule, but not with an additive rule. Surprisingly, the multiplicative model could explain a large fraction of the interindividual variability by variations in the initial synaptic weights. Altogether, these results demonstrate the power of multiplicative learning rules to account for the full dynamics of biological learning and suggest an important role of initial wiring in the brain for predispositions to different tasks.
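
    A minimal sketch of the contrast drawn above between additive and multiplicative updates of a connection weight; the variable names and the exact functional form are illustrative assumptions, not the paper's fitted model.

```python
def additive_update(weight, error, pre_activity, lr=0.1):
    """Change is independent of the current weight."""
    return weight + lr * error * pre_activity

def multiplicative_update(weight, error, pre_activity, lr=0.1):
    """Change is proportional to the current weight, so strong connections move faster."""
    return weight * (1.0 + lr * error * pre_activity)
```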

  12. Fuzzy logic and neural network technologies

    Science.gov (United States)

    Villarreal, James A.; Lea, Robert N.; Savely, Robert T.

    1992-01-01

    Applications of fuzzy logic technologies in NASA projects are reviewed to examine their advantages in the development of neural networks for aerospace and commercial expert systems and control. Examples of fuzzy-logic applications include a 6-DOF spacecraft controller, collision-avoidance systems, and reinforcement-learning techniques. The commercial applications examined include a fuzzy autofocusing system, an air conditioning system, and an automobile transmission application. The practical use of fuzzy logic is set in the theoretical context of artificial neural systems (ANSs) to give the background for an overview of ANS research programs at NASA. The research and application programs include the Network Execution and Training Simulator and faster training algorithms such as the Difference Optimized Training Scheme. The networks are well suited for pattern-recognition applications such as predicting sunspots, controlling posture maintenance, and conducting adaptive diagnoses.

  13. How we learn to make decisions: rapid propagation of reinforcement learning prediction errors in humans.

    Science.gov (United States)

    Krigolson, Olav E; Hassall, Cameron D; Handy, Todd C

    2014-03-01

    Our ability to make decisions is predicated upon our knowledge of the outcomes of the actions available to us. Reinforcement learning theory posits that actions followed by a reward or punishment acquire value through the computation of prediction errors-discrepancies between the predicted and the actual reward. A multitude of neuroimaging studies have demonstrated that rewards and punishments evoke neural responses that appear to reflect reinforcement learning prediction errors [e.g., Krigolson, O. E., Pierce, L. J., Holroyd, C. B., & Tanaka, J. W. Learning to become an expert: Reinforcement learning and the acquisition of perceptual expertise. Journal of Cognitive Neuroscience, 21, 1833-1840, 2009; Bayer, H. M., & Glimcher, P. W. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47, 129-141, 2005; O'Doherty, J. P. Reward representations and reward-related learning in the human brain: Insights from neuroimaging. Current Opinion in Neurobiology, 14, 769-776, 2004; Holroyd, C. B., & Coles, M. G. H. The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109, 679-709, 2002]. Here, we used the brain ERP technique to demonstrate that not only do rewards elicit a neural response akin to a prediction error but also that this signal rapidly diminished and propagated to the time of choice presentation with learning. Specifically, in a simple, learnable gambling task, we show that novel rewards elicited a feedback error-related negativity that rapidly decreased in amplitude with learning. Furthermore, we demonstrate the existence of a reward positivity at choice presentation, a previously unreported ERP component that has a similar timing and topography as the feedback error-related negativity that increased in amplitude with learning. The pattern of results we observed mirrored the output of a computational model that we implemented to compute reward

  14. Influence of the Migration Process on the Learning Performances of Fuzzy Knowledge Bases

    DEFF Research Database (Denmark)

    Akrout, Khaled; Baron, Luc; Balazinski, Marek;

    2007-01-01

    This paper presents the influence of the migration process between populations in GENO-FLOU, an environment for learning fuzzy knowledge bases by genetic algorithms. Initially the algorithm did not use migration. For learning, the algorithm uses a hybrid coding: binary for the rule base and real-valued for the data base. This hybrid coding, used with a set of specialized reproduction operators, proved to be an effective learning environment. Simulations were made in this environment after adding a migration process. While varying the number of populations...

  15. Method of Dynamic Knowledge Representation and Learning Based on Fuzzy Petri Nets

    Institute of Scientific and Technical Information of China (English)

    WEI Sheng-jun; HU Chang-zhen; SUN Ming-qian

    2008-01-01

    A method of knowledge representation and learning based on fuzzy Petri nets was designed, in which the weights, threshold values, and certainty factors of the knowledge model can be adjusted dynamically. The advantages of knowledge representation based on production rules and on neural networks are integrated into this method. Like production-rule knowledge representation, this method has a clear structure and parameters with specific meanings. In addition, it has learning and parallel reasoning abilities, as neural-network knowledge representation does. Simulation results show that the learning algorithm converges, and that the weights, threshold values, and certainty factors reach the desired levels after training.

  16. Curiosity Driven Reinforcement Learning for Motion Planning on Humanoids

    Directory of Open Access Journals (Sweden)

    Mikhail eFrank

    2014-01-01

    Full Text Available Most previous work on artificial curiosity and intrinsic motivation focuses on basic concepts and theory. Experimental results are generally limited to toy scenarios, such as navigation in a simulated maze, or control of a simple mechanical system with one or two degrees of freedom. To study artificial curiosity in a more realistic setting, we embody a curious agent in the complex iCub humanoid robot. Our novel reinforcement learning framework consists of a state-of-the-art, low-level, reactive control layer, which controls the iCub while respecting constraints, and a high-level curious agent, which explores the iCub's state-action space through information gain maximization, learning a world model from experience, controlling the actual iCub hardware in real-time. To the best of our knowledge, this is the first ever embodied, curious agent for real-time motion planning on a humanoid. We demonstrate that it can learn compact Markov models to represent large regions of the iCub's configuration space, and that the iCub explores intelligently, showing interest in its physical constraints as well as in objects it finds in its environment.

  17. Curiosity driven reinforcement learning for motion planning on humanoids.

    Science.gov (United States)

    Frank, Mikhail; Leitner, Jürgen; Stollenga, Marijn; Förster, Alexander; Schmidhuber, Jürgen

    2014-01-06

    Most previous work on artificial curiosity (AC) and intrinsic motivation focuses on basic concepts and theory. Experimental results are generally limited to toy scenarios, such as navigation in a simulated maze, or control of a simple mechanical system with one or two degrees of freedom. To study AC in a more realistic setting, we embody a curious agent in the complex iCub humanoid robot. Our novel reinforcement learning (RL) framework consists of a state-of-the-art, low-level, reactive control layer, which controls the iCub while respecting constraints, and a high-level curious agent, which explores the iCub's state-action space through information gain maximization, learning a world model from experience, controlling the actual iCub hardware in real-time. To the best of our knowledge, this is the first ever embodied, curious agent for real-time motion planning on a humanoid. We demonstrate that it can learn compact Markov models to represent large regions of the iCub's configuration space, and that the iCub explores intelligently, showing interest in its physical constraints as well as in objects it finds in its environment.

  18. Software Agent with Reinforcement Learning Approach for Medical Image Segmentation

    Institute of Scientific and Technical Information of China (English)

    Mahsa Chitsaz; Chaw Seng Woo

    2011-01-01

    Many image segmentation solutions are problem-based. Medical images have very similar grey levels and texture among the objects of interest. Therefore, medical image segmentation still requires improvement, although research has been carried out over the last few decades. We design a self-learning framework to extract several objects of interest simultaneously from Computed Tomography (CT) images. Our segmentation method has a learning phase that is based on a reinforcement learning (RL) system. Each RL agent works on a particular sub-image of an input image to find a suitable value for each object in it. The RL system is defined by states, actions and rewards. We defined some actions for each state in the sub-image. A reward function computes a reward for each action of the RL agent. Finally, the valuable information from discovering all states of the objects of interest is stored in a Q-matrix, and the final result can be applied in the segmentation of similar images. The experimental results for cranial CT images demonstrated segmentation accuracy above 95%.
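
    The Q-matrix mentioned above suggests a tabular Q-learning update; the sketch below is a hypothetical illustration in which the states, actions and reward are placeholders, not the authors' actual segmentation features.

      import numpy as np

      # Hypothetical sketch: tabular Q-learning over discretised sub-image states.
      # The reward is a stand-in for the segmentation-quality reward described above.
      n_states, n_actions = 16, 4          # e.g. coarse intensity bins x threshold adjustments
      Q = np.zeros((n_states, n_actions))  # the "Q-matrix" stored for reuse on similar images

      def q_update(s, a, r, s_next, alpha=0.2, gamma=0.9):
          td_target = r + gamma * Q[s_next].max()
          Q[s, a] += alpha * (td_target - Q[s, a])

      # one illustrative transition: adjusting a threshold in state 3 yields reward 0.8
      q_update(s=3, a=1, r=0.8, s_next=7)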

  19. Reinforcement learning in depression: A review of computational research.

    Science.gov (United States)

    Chen, Chong; Takahashi, Taiki; Nakagawa, Shin; Inoue, Takeshi; Kusumi, Ichiro

    2015-08-01

    Despite being considered primarily a mood disorder, major depressive disorder (MDD) is characterized by cognitive and decision-making deficits. Recent research has employed computational models of reinforcement learning (RL) to address these deficits. The computational approach has the advantage of making explicit predictions about learning and behavior, specifying the process parameters of RL, differentiating between model-free and model-based RL, and enabling computational model-based functional magnetic resonance imaging and electroencephalography. These merits have given rise to the emerging field of computational psychiatry, and here we review specific studies that focused on MDD. Considerable evidence suggests that MDD is associated with impaired brain signals of reward prediction error and expected value ('wanting'), decreased reward sensitivity ('liking') and/or learning (be it model-free or model-based), etc., although the causality remains unclear. These parameters may serve as valuable intermediate phenotypes of MDD, linking general clinical symptoms to underlying molecular dysfunctions. We believe future computational research at clinical, systems, and cellular/molecular/genetic levels will propel us toward a better understanding of the disease.

  20. Novelty and Inductive Generalization in Human Reinforcement Learning.

    Science.gov (United States)

    Gershman, Samuel J; Niv, Yael

    2015-07-01

    In reinforcement learning (RL), a decision maker searching for the most rewarding option is often faced with the question: What is the value of an option that has never been tried before? One way to frame this question is as an inductive problem: How can I generalize my previous experience with one set of options to a novel option? We show how hierarchical Bayesian inference can be used to solve this problem, and we describe an equivalence between the Bayesian model and temporal difference learning algorithms that have been proposed as models of RL in humans and animals. According to our view, the search for the best option is guided by abstract knowledge about the relationships between different options in an environment, resulting in greater search efficiency compared to traditional RL algorithms previously applied to human cognition. In two behavioral experiments, we test several predictions of our model, providing evidence that humans learn and exploit structured inductive knowledge to make predictions about novel options. In light of this model, we suggest a new interpretation of dopaminergic responses to novelty.

  1. A Review on Anti-Plagiarism Approach Using Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Mr. Sudhir D. Salunkhe

    2013-07-01

    Full Text Available Nowadays plagiarism has become a serious problem, especially in academics and education, and detecting it is a challenging task, particularly text plagiarism in students' documents. Students or other authors plagiarize an original document and present it as their own without giving credit to the original author. To detect such dishonesty in document writing, an anti-plagiarism system is proposed in which reinforcement learning can be used to obtain a fast response about plagiarism in a suspected document. The suspected document is compared with local as well as global databases over the web, and the final result is then calculated as a percentage for the suspected document.

  2. Optimism in Reinforcement Learning Based on Kullback-Leibler Divergence

    CERN Document Server

    Filippi, Sarah; Garivier, Aurélien

    2010-01-01

    We consider model-based reinforcement learning in finite Markov Decision Processes (MDPs), focussing on so-called optimistic strategies. Optimism is usually implemented by carrying out extended value iterations, under a constraint of consistency with the estimated model transition probabilities. In this paper, we strongly argue in favor of using the Kullback-Leibler (KL) divergence for this purpose. By studying the linear maximization problem under KL constraints, we provide an efficient algorithm for solving KL-optimistic extended value iteration. When implemented within the structure of UCRL2, the near-optimal method introduced by [Auer et al, 2008], this algorithm also achieves bounded regrets in the undiscounted case. We however provide some geometric arguments as well as a concrete illustration on a simulated example to explain the observed improved practical behavior, particularly when the MDP has reduced connectivity. To analyze this new algorithm, termed KL-UCRL, we also rely on recent deviation bou...

  3. Variance-Based Rewards for Approximate Bayesian Reinforcement Learning

    CERN Document Server

    Sorg, Jonathan; Lewis, Richard L

    2012-01-01

    The explore/exploit dilemma is one of the central challenges in Reinforcement Learning (RL). Bayesian RL solves the dilemma by providing the agent with information in the form of a prior distribution over environments; however, full Bayesian planning is intractable. Planning with the mean MDP is a common myopic approximation of Bayesian planning. We derive a novel reward bonus that is a function of the posterior distribution over environments, which, when added to the reward in planning with the mean MDP, results in an agent which explores efficiently and effectively. Although our method is similar to existing methods when given an uninformative or unstructured prior, unlike existing methods, our method can exploit structured priors. We prove that our method results in a polynomial sample complexity and empirically demonstrate its advantages in a structured exploration task.

  4. Conflict acts as an implicit cost in reinforcement learning.

    Science.gov (United States)

    Cavanagh, James F; Masters, Sean E; Bath, Kevin; Frank, Michael J

    2014-11-04

    Conflict has been proposed to act as a cost in action selection, implying a general function of medio-frontal cortex in the adaptation to aversive events. Here we investigate whether response conflict acts as a cost during reinforcement learning by modulating experienced reward values in cortical and striatal systems. Electroencephalography recordings show that conflict diminishes the relationship between reward-related frontal theta power and cue preference, yet it enhances the relationship between punishment and cue avoidance. Individual differences in the cost of conflict on reward versus punishment sensitivity are also related to a genetic polymorphism associated with striatal D1 versus D2 pathway balance (DARPP-32). We manipulate these patterns with the D2 agonist cabergoline, which induces a strong bias to amplify the aversive value of punishment outcomes following conflict. Collectively, these findings demonstrate that interactive cortico-striatal systems implicitly modulate experienced reward and punishment values as a function of conflict.

  5. A reinforcement learning approach to instrumental contingency degradation in rats.

    Science.gov (United States)

    Dutech, Alain; Coutureau, Etienne; Marchand, Alain R

    2011-01-01

    Goal-directed action involves a representation of action consequences. Adapting to changes in action-outcome contingency requires the prefrontal region. Indeed, rats with lesions of the medial prefrontal cortex do not adapt their free operant response when food delivery becomes unrelated to lever-pressing. The present study explores the bases of this deficit through a combined behavioural and computational approach. We show that lesioned rats retain some behavioural flexibility and stop pressing if this action prevents food delivery. We attempt to model this phenomenon in a reinforcement learning framework. The model assumes that distinct action values are learned in an incremental manner in distinct states. The model represents states as n-tuples of events, emphasizing sequences rather than the continuous passage of time. Probabilities of lever-pressing and visits to the food magazine observed in the behavioural experiments are first analyzed as a function of these states, to identify sequences of events that influence action choice. Observed action probabilities appear to be essentially a function of the last event that occurred, with reward delivery and waiting significantly facilitating magazine visits and lever-pressing respectively. Behavioural sequences of normal and lesioned rats are then fed into the model, action values are updated at each event transition according to the SARSA algorithm, and predicted action probabilities are derived through a softmax policy. The model captures the time course of learning, as well as the differential adaptation of normal and prefrontal lesioned rats to contingency degradation with the same parameters for both groups. The results suggest that simple temporal difference algorithms with low learning rates can largely account for instrumental learning and performance. Prefrontal lesioned rats appear to mainly differ from control rats in their low rates of visits to the magazine after a lever press, and their inability to
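
    The core of the model described above (incremental action values per event-defined state, SARSA updates, and a softmax policy) can be sketched as follows; the states, actions and parameter values are illustrative, not the fitted ones.

      import numpy as np

      # Sketch of a SARSA learner over event-defined states with a softmax policy.
      actions = ["lever_press", "magazine_visit", "wait"]
      Q = {}  # maps (state, action) -> value; states are tuples of recent events

      def q(s, a):
          return Q.get((s, a), 0.0)

      def softmax_policy(s, beta=3.0):
          prefs = np.array([q(s, a) for a in actions])
          p = np.exp(beta * prefs)
          return p / p.sum()

      def sarsa_update(s, a, r, s_next, a_next, alpha=0.05, gamma=0.95):
          target = r + gamma * q(s_next, a_next)
          Q[(s, a)] = q(s, a) + alpha * (target - q(s, a))

      # one illustrative transition: a lever press after reward delivery, then a magazine visit
      sarsa_update(("reward",), "lever_press", 0.0, ("lever_press",), "magazine_visit")
      print(softmax_policy(("reward",)))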

  6. Fuzzy control in robot-soccer, evolutionary learning in the first layer of control

    Directory of Open Access Journals (Sweden)

    Peter J Thomas

    2003-02-01

    Full Text Available In this paper an evolutionary algorithm is developed to learn a fuzzy knowledge base for the control of a soccer-playing micro-robot, from any configuration belonging to a grid of initial configurations, to hit the ball along the ball-to-goal line of sight. The knowledge base uses a relative co-ordinate system including the left and right wheel velocities of the robot. Final path positions allow the robot to face the ball forwards or in reverse, and take its physical dimensions into account.

  7. A Reinforcement Learning Model Using Neural Networks for Music Sight Reading Learning Problem

    CERN Document Server

    Yahya, Keyvan

    2010-01-01

    Music sight reading is a complex process; when it occurs in the brain, certain learning attributes emerge. Besides giving a model based on the actor-critic method in reinforcement learning, we consider the agent to have a neural network structure. We studied where the sight-reading process happens and also a serious problem: how the synaptic weights are adjusted through the learning process. The model we offer here is a computational model accompanied by an updated-weights equation for adjusting the weights.

  8. Reinforcement learning for discounted values often loses the goal in the application to animal learning.

    Science.gov (United States)

    Yamaguchi, Yoshiya; Sakai, Yutaka

    2012-11-01

    The impulsive preference of an animal for an immediate reward implies that it might subjectively discount the value of potential future outcomes. A theoretical framework to maximize the discounted subjective value has been established in reinforcement learning theory. The framework has been successfully applied in engineering. However, this study identified a limitation when the framework is applied to animal behavior: in some cases, there is no learning goal. Here a possible learning framework is proposed that is well posed in all cases and that is consistent with the impulsive preference.

  9. Does reward frequency or magnitude drive reinforcement-learning in attention-deficit/hyperactivity disorder?

    NARCIS (Netherlands)

    Luman, M.; van Meel, C.S.; Oosterlaan, J.; Sergeant, J.A.; Geurts, H.M.

    2009-01-01

    Children with attention-deficit/hyperactivity disorder (ADHD) show an impaired ability to use feedback in the context of learning. A stimulus-response learning task was used to investigate whether (1) children with ADHD displayed flatter learning curves, (2) reinforcement-learning in ADHD was sensit

  10. SVR learning-based spatiotemporal fuzzy logic controller for nonlinear spatially distributed dynamic systems.

    Science.gov (United States)

    Zhang, Xian-Xia; Jiang, Ye; Li, Han-Xiong; Li, Shao-Yuan

    2013-10-01

    A data-driven 3-D fuzzy logic controller (3-D FLC) design methodology based on support vector regression (SVR) learning is developed for nonlinear spatially distributed dynamic systems. Initially, the spatial information expression and processing as well as the fuzzy linguistic expression and rule inference of a 3-D FLC are integrated into spatial fuzzy basis functions (SFBFs), and then the 3-D FLC can be depicted by a three-layer network structure. By relating SFBFs of the 3-D FLC directly to spatial kernel functions of an SVR, an equivalence relationship between the 3-D FLC and the SVR is established, which means that the 3-D FLC can be designed with the help of SVR learning. Subsequently, for easy implementation, a systematic SVR learning-based 3-D FLC design scheme is formulated. In addition, the universal approximation capability of the proposed 3-D FLC is presented. Finally, the control of a nonlinear catalytic packed-bed reactor is considered as an application to demonstrate the effectiveness of the proposed 3-D FLC.
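
    The central equivalence claimed above (spatial fuzzy basis functions of the 3-D FLC matching the kernel expansion of an SVR) can be illustrated with a standard RBF-kernel SVR; the data, kernel width and the reading of support vectors as rule centres below are assumptions for illustration, not the authors' design procedure.

      import numpy as np
      from sklearn.svm import SVR

      # Illustrative sketch: an RBF-kernel SVR fitted to input/output control data.
      # Its prediction f(x) = sum_i alpha_i * exp(-gamma * ||x - x_i||^2) + b has the
      # same form as a fuzzy basis function expansion, so each support vector x_i can
      # be read as the centre of a Gaussian membership function and alpha_i as a rule weight.
      rng = np.random.default_rng(0)
      X = rng.random((200, 2))                   # stand-in for controller inputs
      y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1]    # stand-in for recorded control actions

      svr = SVR(kernel="rbf", gamma=2.0, C=10.0, epsilon=0.01).fit(X, y)
      rule_centres = svr.support_vectors_        # candidate membership-function centres
      rule_weights = svr.dual_coef_.ravel()      # corresponding consequent weights
      print(len(rule_centres), "rules extracted; bias =", svr.intercept_[0])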

  11. B-tree search reinforcement learning for model based intelligent agent

    Science.gov (United States)

    Bhuvaneswari, S.; Vignashwaran, R.

    2013-03-01

    Agents trained by learning techniques provide a powerful approximation of active solutions for naive approaches. In this study, B-trees with reinforcement learning are used to moderate the data search for information retrieval, so as to achieve accuracy with minimum search time. The impact of the variables and tactics applied in training is determined using reinforcement learning. Agents based on these techniques achieve a satisfactory baseline performance and act as finite agents, based on the predetermined model, against competing approaches.

  12. Learning to use working memory: a reinforcement learning gating model of rule acquisition in rats

    Directory of Open Access Journals (Sweden)

    Kevin eLloyd

    2012-10-01

    Full Text Available Learning to form appropriate, task-relevant working memory representations is a complex process central to cognition. Gating models frame working memory as a collection of past observations and use reinforcement learning to solve the problem of when to update these observations. Investigation of how gating models relate to brain and behavior remains, however, at an early stage. The current study sought to explore the ability of simple reinforcement learning gating models to replicate rule learning behavior in rats. Rats were trained in a maze-based spatial learning task that required animals to make trial-by-trial choices contingent upon their previous experience. Using an abstract version of this task, we tested the ability of two gating algorithms, one based on the Actor-Critic and the other on the State-Action-Reward-State-Action (SARSA) algorithm, to generate behavior consistent with the rats’. Both models produced rule-acquisition behavior consistent with the experimental data, though only the SARSA gating model mirrored faster learning following rule reversal. We also found that both gating models learned multiple strategies in solving the initial task, a property which highlights the multi-agent nature of such models and which is of importance in considering the neural basis of individual differences in behavior.

  13. The Drive-Reinforcement Neuronal Model: A Real-Time Learning Mechanism For Unsupervised Learning

    Science.gov (United States)

    Klopf, A. H.

    1988-05-01

    The drive-reinforcement neuronal model is described as an example of a newly discovered class of real-time learning mechanisms that correlate earlier derivatives of inputs with later derivatives of outputs. The drive-reinforcement neuronal model has been demonstrated to predict a wide range of classical conditioning phenomena in animal learning. A variety of classes of connectionist and neural network models have been investigated in recent years (Hinton and Anderson, 1981; Levine, 1983; Barto, 1985; Feldman, 1985; Rumelhart and McClelland, 1986). After a brief review of these models, discussion will focus on the class of real-time models because they appear to be making the strongest contact with the experimental evidence of animal learning. Theoretical models in physics have inspired Boltzmann machines (Ackley, Hinton, and Sejnowski, 1985) and what are sometimes called Hopfield networks (Hopfield, 1982; Hopfield and Tank, 1986). These connectionist models utilize symmetric connections and adaptive equilibrium processes during which the networks settle into minimal energy states. Networks utilizing error-correction learning mechanisms go back to Rosenblatt's (1962) perceptron and Widrow's (1962) adaline and currently take the form of back propagation networks (Parker, 1985; Rumelhart, Hinton, and Williams, 1985, 1986). These networks require a "teacher" or "trainer" to provide error signals indicating the difference between desired and actual responses. Networks employing real-time learning mechanisms, in which the temporal association of signals is of fundamental importance, go back to Hebb (1949). Real-time learning mechanisms may require no teacher or trainer and thus may lend themselves to unsupervised learning. Such models have been extended by Klopf (1972, 1982), who introduced the notions of synaptic eligibility and generalized reinforcement. Sutton and Barto (1981) advanced this class of models by proposing that a derivative of the theoretical neuron's out

  14. A Reinforcement Learning Framework for Spiking Networks with Dynamic Synapses

    Directory of Open Access Journals (Sweden)

    Karim El-Laithy

    2011-01-01

    Full Text Available An integration of both Hebbian-based and reinforcement learning (RL) rules is presented for dynamic synapses. The proposed framework permits the Hebbian rule to update the hidden synaptic model parameters regulating the synaptic response rather than the synaptic weights. This is performed using both the value and the sign of the temporal difference in the reward signal after each trial. Applying this framework, a spiking network with spike-timing-dependent synapses is tested to learn the exclusive-OR computation on a temporally coded basis. Reward values are calculated from the distance between the output spike train of the network and a reference target one. Results show that the network is able to capture the required dynamics and that the proposed framework can indeed provide an integrated version of Hebbian learning and RL. The proposed framework is tractable and less computationally expensive. The framework is applicable to a wide class of synaptic models and is not restricted to the neural representation used here. This generality, along with the reported results, supports adopting the introduced approach to benefit from biologically plausible synaptic models in a wide range of intuitive signal processing.

  15. Beamforming and Power Control in Sensor Arrays Using Reinforcement Learning

    Science.gov (United States)

    Almeida, Náthalee C.; Fernandes, Marcelo A.C.; Neto, Adrião D.D.

    2015-01-01

    The use of beamforming and power control, combined or separately, has advantages and disadvantages, depending on the application. The combined use of beamforming and power control has been shown to be highly effective in applications involving the suppression of interference signals from different sources. However, it is necessary to identify efficient methodologies for the combined operation of these two techniques. The most appropriate technique may be obtained by means of the implementation of an intelligent agent capable of making the best selection between beamforming and power control. The present paper proposes an algorithm using reinforcement learning (RL) to determine the optimal combination of beamforming and power control in sensor arrays. The RL algorithm used was Q-learning, employing an ε-greedy policy, and training was performed using the offline method. The simulations showed that RL was effective for implementation of a switching policy involving the different techniques, taking advantage of the positive characteristics of each technique in terms of signal reception. PMID:25808769
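
    A minimal version of the Q-learning/ε-greedy switching policy described here might look like the sketch below; the discretised states and the reward are placeholders for the array and interference model used in the paper, not its actual formulation.

      import random

      # Sketch: Q-learning over channel states, choosing between two techniques
      # with an epsilon-greedy policy; the reward stands in for a reception-quality measure.
      states = range(5)                     # e.g. discretised interference levels
      actions = ["beamforming", "power_control"]
      Q = {(s, a): 0.0 for s in states for a in actions}

      def choose(s, eps=0.1):
          if random.random() < eps:
              return random.choice(actions)
          return max(actions, key=lambda a: Q[(s, a)])

      def learn(s, a, r, s_next, alpha=0.1, gamma=0.9):
          best_next = max(Q[(s_next, b)] for b in actions)
          Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])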

  16. Beamforming and Power Control in Sensor Arrays Using Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Náthalee C. Almeida

    2015-03-01

    Full Text Available The use of beamforming and power control, combined or separately, has advantages and disadvantages, depending on the application. The combined use of beamforming and power control has been shown to be highly effective in applications involving the suppression of interference signals from different sources. However, it is necessary to identify efficient methodologies for the combined operation of these two techniques. The most appropriate technique may be obtained by means of the implementation of an intelligent agent capable of making the best selection between beamforming and power control. The present paper proposes an algorithm using reinforcement learning (RL) to determine the optimal combination of beamforming and power control in sensor arrays. The RL algorithm used was Q-learning, employing an ε-greedy policy, and training was performed using the offline method. The simulations showed that RL was effective for implementation of a switching policy involving the different techniques, taking advantage of the positive characteristics of each technique in terms of signal reception.

  17. A reinforcement learning model of joy, distress, hope and fear

    Science.gov (United States)

    Broekens, Joost; Jacobs, Elmer; Jonker, Catholijn M.

    2015-07-01

    In this paper we computationally study the relation between adaptive behaviour and emotion. Using the reinforcement learning framework, we propose that learned state utility models fear (negative) and hope (positive), based on the fact that both signals are about anticipation of loss or gain. Further, we propose that joy/distress is a signal similar to the error signal. We present agent-based simulation experiments that show that this model replicates psychological and behavioural dynamics of emotion. This work distinguishes itself by assessing the dynamics of emotion in an adaptive agent framework, coupling it to the literature on habituation, development, extinction and hope theory. Our results support the idea that the function of emotion is to provide a complex feedback signal for an organism to adapt its behaviour. Our work is relevant for understanding the relation between emotion and adaptation in animals, as well as for human-robot interaction, in particular how emotional signals can be used to communicate between adaptive agents and humans.
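
    Read literally, the proposed mapping can be sketched as a few lines on top of a standard TD learner: hope and fear track the sign of the learned state utility, and joy and distress the sign of the TD error. The rectification used below is an illustrative assumption, not the paper's exact formulation.

      # Sketch of the proposed mapping from RL quantities to emotion-like signals.
      # V is the learned state utility, delta the temporal-difference error.
      def emotions(V, delta):
          hope     = max(V, 0.0)       # positive anticipated value
          fear     = max(-V, 0.0)      # negative anticipated value
          joy      = max(delta, 0.0)   # better-than-expected outcome
          distress = max(-delta, 0.0)  # worse-than-expected outcome
          return {"hope": hope, "fear": fear, "joy": joy, "distress": distress}

      print(emotions(V=0.4, delta=-0.25))   # hopeful state, mildly disappointing outcome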

  18. Reinforcement learning techniques for controlling resources in power networks

    Science.gov (United States)

    Kowli, Anupama Sunil

    As power grids transition towards increased reliance on renewable generation, energy storage and demand response resources, an effective control architecture is required to harness the full functionalities of these resources. There is a critical need for control techniques that recognize the unique characteristics of the different resources and exploit the flexibility afforded by them to provide ancillary services to the grid. The work presented in this dissertation addresses these needs. Specifically, new algorithms are proposed, which allow control synthesis in settings wherein the precise distribution of the uncertainty and its temporal statistics are not known. These algorithms are based on recent developments in Markov decision theory, approximate dynamic programming and reinforcement learning. They impose minimal assumptions on the system model and allow the control to be "learned" based on the actual dynamics of the system. Furthermore, they can accommodate complex constraints such as capacity and ramping limits on generation resources, state-of-charge constraints on storage resources, comfort-related limitations on demand response resources and power flow limits on transmission lines. Numerical studies demonstrating applications of these algorithms to practical control problems in power systems are discussed. Results demonstrate how the proposed control algorithms can be used to improve the performance and reduce the computational complexity of the economic dispatch mechanism in a power network. We argue that the proposed algorithms are eminently suitable to develop operational decision-making tools for large power grids with many resources and many sources of uncertainty.

  19. Intelligent Fuzzy Spelling Evaluator for e-Learning Systems

    Science.gov (United States)

    Chakraborty, Udit Kr.; Konar, Debanjan; Roy, Samir; Choudhury, Sankhayan

    2016-01-01

    Evaluating Learners' Response in an e-Learning environment has been the topic of current research in areas of Human Computer Interaction, e-Learning, Education Technology and even Natural Language Processing. The current paper presents a twofold strategy to evaluate single word response of a learner in an e-Learning environment. The response of…

  20. Simulation of fuzzy adaptation of cognitive/learning styles to user navigation/presentation preferences by using MATLAB

    Directory of Open Access Journals (Sweden)

    Ilham N. HUSYINOV

    2014-01-01

    Full Text Available The purpose of this paper is to present a simulation methodology for a fuzzy adaptive interface in an environment of imperfect, multimodal, complex nonlinear hyper-information space. The fuzzy adaptation of the user's information navigation and presentation preferences to cognitive/learning styles is simulated by using MATLAB. To this end, fuzzy if-then rules in natural language expressions are utilized. The important implication of this approach is that uncertain and vague information is handled and the design of human-computer interaction systems is facilitated with a high-level intelligence capability.

  1. Rule-bases construction through self-learning for a table-based Sugeno-Takagi fuzzy logic control system

    Directory of Open Access Journals (Sweden)

    C. Boldisor

    2009-12-01

    Full Text Available A self-learning based methodology for building the rule-base of a fuzzy logic controller (FLC) is presented and verified, aiming to add intelligent characteristics to fuzzy logic control systems. The methodology is a simplified version of those presented in today's literature. Some aspects are intentionally ignored, since they rarely appear in control system engineering, and a SISO process is considered here. The fuzzy inference system obtained is of the table-based Sugeno-Takagi type. The system's desired performance is defined by a reference model, and rules are extracted from recorded data after the correct control actions are learned. The presented algorithm is tested by constructing the rule-base of a fuzzy controller for a DC drive application. The system's performance and the method's viability are analyzed.

  2. Introduction to Fuzzy Set Theory

    Science.gov (United States)

    Kosko, Bart

    1990-01-01

    An introduction to fuzzy set theory is described. Topics covered include: neural networks and fuzzy systems; the dynamical systems approach to machine intelligence; intelligent behavior as adaptive model-free estimation; fuzziness versus probability; fuzzy sets; the entropy-subsethood theorem; adaptive fuzzy systems for backing up a truck-and-trailer; product-space clustering with differential competitive learning; and adaptive fuzzy system for target tracking.

  3. Reinforcement Learning for Predictive Analytics in Smart Cities

    Directory of Open Access Journals (Sweden)

    Kostas Kolomvatsos

    2017-06-01

    Full Text Available The digitization of our lives causes a shift in data production as well as in the required data management. Numerous nodes are capable of producing huge volumes of data in our everyday activities. Sensors, personal smart devices as well as the Internet of Things (IoT) paradigm lead to a vast infrastructure that covers all the aspects of activities in modern societies. In most cases, the critical issue for public authorities (usually local, like municipalities) is the efficient management of data towards the support of novel services. The reason is that analytics provided on top of the collected data could help in the delivery of new applications that will facilitate citizens’ lives. However, the provision of analytics demands intelligent techniques for the underlying data management. The best-known technique is the separation of huge volumes of data into a number of parts and their parallel management to limit the required time for the delivery of analytics. Afterwards, analytics requests in the form of queries can be realized to derive the necessary knowledge for supporting intelligent applications. In this paper, we define the concept of a Query Controller (QC) that receives queries for analytics and assigns each of them to a processor placed in front of each data partition. We discuss an intelligent process for query assignments that adopts Machine Learning (ML). We adopt two learning schemes, i.e., Reinforcement Learning (RL) and clustering. We report on the comparison of the two schemes and elaborate on their combination. Our aim is to provide an efficient framework to support the decision making of the QC, which should swiftly select the appropriate processor for each query. We provide mathematical formulations for the discussed problem and present simulation results. Through a comprehensive experimental evaluation, we reveal the advantages of the proposed models and describe the outcomes while comparing them with a
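
    As a rough sketch of how the QC's assignment decision could be learned (the query classes, latency-based reward and processor model below are hypothetical, not the paper's formulation), an ε-greedy learner over processors might look like this:

      import random

      # Hypothetical sketch of the Query Controller (QC) loop: each incoming analytics
      # query is routed to one of several processors; negative response time serves as
      # reward, so the QC learns to prefer fast processors for each query class.
      n_processors = 4
      Q = {}  # (query_class, processor) -> estimated negative latency

      def assign(query_class, eps=0.1):
          if random.random() < eps:
              return random.randrange(n_processors)
          return max(range(n_processors), key=lambda p: Q.get((query_class, p), 0.0))

      def feedback(query_class, processor, latency, alpha=0.2):
          key = (query_class, processor)
          Q[key] = Q.get(key, 0.0) + alpha * (-latency - Q.get(key, 0.0))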

  4. Kernel-based least squares policy iteration for reinforcement learning.

    Science.gov (United States)

    Xu, Xin; Hu, Dewen; Lu, Xicheng

    2007-07-01

    In this paper, we present a kernel-based least squares policy iteration (KLSPI) algorithm for reinforcement learning (RL) in large or continuous state spaces, which can be used to realize adaptive feedback control of uncertain dynamic systems. By using KLSPI, near-optimal control policies can be obtained without much a priori knowledge on dynamic models of control plants. In KLSPI, Mercer kernels are used in the policy evaluation of a policy iteration process, where a new kernel-based least squares temporal-difference algorithm called KLSTD-Q is proposed for efficient policy evaluation. To keep the sparsity and improve the generalization ability of KLSTD-Q solutions, a kernel sparsification procedure based on approximate linear dependency (ALD) is performed. Compared to the previous works on approximate RL methods, KLSPI makes two progresses to eliminate the main difficulties of existing results. One is the better convergence and (near) optimality guarantee by using the KLSTD-Q algorithm for policy evaluation with high precision. The other is the automatic feature selection using the ALD-based kernel sparsification. Therefore, the KLSPI algorithm provides a general RL method with generalization performance and convergence guarantee for large-scale Markov decision problems (MDPs). Experimental results on a typical RL task for a stochastic chain problem demonstrate that KLSPI can consistently achieve better learning efficiency and policy quality than the previous least squares policy iteration (LSPI) algorithm. Furthermore, the KLSPI method was also evaluated on two nonlinear feedback control problems, including a ship heading control problem and the swing up control of a double-link underactuated pendulum called acrobot. Simulation results illustrate that the proposed method can optimize controller performance using little a priori information of uncertain dynamic systems. It is also demonstrated that KLSPI can be applied to online learning control by incorporating
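
    The policy-evaluation step at the core of (K)LSPI can be illustrated, leaving out the kernel machinery and ALD sparsification, by plain least-squares TD-Q over a fixed feature map; the features and sample format below are assumptions for illustration only.

      import numpy as np

      # Sketch of least-squares TD-Q (the linear special case of KLSTD-Q): given samples
      # (phi(s,a), r, phi(s',a')) collected under some policy, solve A w = b for the
      # Q-function weights instead of iterating stochastic updates.
      def lstd_q(phis, rewards, phis_next, gamma=0.95, reg=1e-3):
          d = phis.shape[1]
          A = reg * np.eye(d)
          b = np.zeros(d)
          for phi, r, phi_next in zip(phis, rewards, phis_next):
              A += np.outer(phi, phi - gamma * phi_next)
              b += phi * r
          return np.linalg.solve(A, b)   # weights of the approximate Q-function

      # tiny synthetic batch with 3-dimensional features
      rng = np.random.default_rng(0)
      phis, phis_next = rng.random((50, 3)), rng.random((50, 3))
      print("learned weights:", lstd_q(phis, rng.random(50), phis_next))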

  5. Optimizing Dialogue Management with Reinforcement Learning: Experiments with the NJFun System

    CERN Document Server

    Kearns, M; Singh, S; Walker, M; 10.1613/jair.859

    2011-01-01

    Designing the dialogue policy of a spoken dialogue system involves many nontrivial choices. This paper presents a reinforcement learning approach for automatically optimizing a dialogue policy, which addresses the technical challenges in applying reinforcement learning to a working dialogue system with human users. We report on the design, construction and empirical evaluation of NJFun, an experimental spoken dialogue system that provides users with access to information about fun things to do in New Jersey. Our results show that by optimizing its performance via reinforcement learning, NJFun measurably improves system performance.

  6. Intelligence moderates reinforcement learning: a mini-review of the neural evidence.

    Science.gov (United States)

    Chen, Chong

    2015-06-01

    Our understanding of the neural basis of reinforcement learning and intelligence, two key factors contributing to human strivings, has progressed significantly recently. However, the overlap of these two lines of research, namely, how intelligence affects neural responses during reinforcement learning, remains uninvestigated. A mini-review of three existing studies suggests that higher IQ (especially fluid IQ) may enhance the neural signal of positive prediction error in dorsolateral prefrontal cortex, dorsal anterior cingulate cortex, and striatum, several brain substrates of reinforcement learning or intelligence.

  7. Reinforcement Learning Based Novel Adaptive Learning Framework for Smart Grid Prediction

    Directory of Open Access Journals (Sweden)

    Tian Li

    2017-01-01

    Full Text Available The smart grid is a promising infrastructure for supplying electricity to end users in a safe and reliable manner. With the rapid increase of the share of renewable energy and controllable loads in the smart grid, its operational uncertainty has increased rapidly in recent years. Forecasting is essential for the safe and economical operation of the smart grid. However, most existing forecasting methods cannot cope with the smart grid because of their inability to adapt to varying operational conditions. In this paper, reinforcement learning is first exploited to develop an online learning framework for the smart grid. With its capability of multi-time-scale resolution, a wavelet neural network has been adopted in the online learning framework to yield a reinforcement learning and wavelet neural network (RLWNN) based adaptive learning scheme. Simulations on two typical prediction problems in the smart grid, wind power prediction and load forecasting, validate the effectiveness and scalability of the proposed RLWNN-based learning framework and algorithm.

  8. Reference Function Based Spatiotemporal Fuzzy Logic Control Design Using Support Vector Regression Learning

    Directory of Open Access Journals (Sweden)

    Xian-Xia Zhang

    2013-01-01

    Full Text Available This paper presents a reference function based 3D FLC design methodology using support vector regression (SVR) learning. The concept of reference function is introduced to 3D FLC for the generation of 3D membership functions (MFs), which enhance the capability of the 3D FLC to cope with more kinds of MFs. The nonlinear mathematical expression of the reference function based 3D FLC is derived, and spatial fuzzy basis functions are defined. Via relating spatial fuzzy basis functions of a 3D FLC to kernel functions of an SVR, an equivalence relationship between a 3D FLC and an SVR is established. Therefore, a 3D FLC can be constructed using the learned results of an SVR. Furthermore, the universal approximation capability of the proposed 3D fuzzy system is proven in terms of the finite covering theorem. Finally, the proposed method is applied to a catalytic packed-bed reactor and simulation results have verified its effectiveness.

  9. An Application of Reinforcement Learning to Dialogue Strategy Selection in a Spoken Dialogue System for Email

    CERN Document Server

    Walker, M A

    2011-01-01

    This paper describes a novel method by which a spoken dialogue system can learn to choose an optimal dialogue strategy from its experience interacting with human users. The method is based on a combination of reinforcement learning and performance modeling of spoken dialogue systems. The reinforcement learning component applies Q-learning (Watkins, 1989), while the performance modeling component applies the PARADISE evaluation framework (Walker et al., 1997) to learn the performance function (reward) used in reinforcement learning. We illustrate the method with a spoken dialogue system named ELVIS (EmaiL Voice Interactive System), that supports access to email over the phone. We conduct a set of experiments for training an optimal dialogue strategy on a corpus of 219 dialogues in which human users interact with ELVIS over the phone. We then test that strategy on a corpus of 18 dialogues. We show that ELVIS can learn to optimize its strategy selection for agent initiative, for reading messages, and for summari...

  10. Dopamine-Dependent Reinforcement of Motor Skill Learning: Evidence from Gilles de la Tourette Syndrome

    Science.gov (United States)

    Palminteri, Stefano; Lebreton, Mael; Worbe, Yulia; Hartmann, Andreas; Lehericy, Stephane; Vidailhet, Marie; Grabli, David; Pessiglione, Mathias

    2011-01-01

    Reinforcement learning theory has been extensively used to understand the neural underpinnings of instrumental behaviour. A central assumption surrounds dopamine signalling reward prediction errors, so as to update action values and ensure better choices in the future. However, educators may share the intuitive idea that reinforcements not only…

  12. Fuzzy Boundary and Fuzzy Semiboundary

    OpenAIRE

    Athar, M.; Ahmad, B.

    2008-01-01

    We present several properties of fuzzy boundary and fuzzy semiboundary which have been supported by examples. Properties of fuzzy semi-interior, fuzzy semiclosure, fuzzy boundary, and fuzzy semiboundary have been obtained in product-related spaces. We give necessary conditions for fuzzy continuous (resp., fuzzy semicontinuous, fuzzy irresolute) functions. Moreover, fuzzy continuous (resp., fuzzy semicontinuous, fuzzy irresolute) functions have been characterized via fuzzy-derived (resp., fuzz...

  13. Efficient exploration through active learning for value function approximation in reinforcement learning.

    Science.gov (United States)

    Akiyama, Takayuki; Hachiya, Hirotaka; Sugiyama, Masashi

    2010-06-01

    Appropriately designing sampling policies is highly important for obtaining better control policies in reinforcement learning. In this paper, we first show that the least-squares policy iteration (LSPI) framework allows us to employ statistical active learning methods for linear regression. Then we propose a design method of good sampling policies for efficient exploration, which is particularly useful when the sampling cost of immediate rewards is high. The effectiveness of the proposed method, which we call active policy iteration (API), is demonstrated through simulations with a batting robot.

  14. Dynamic Fuzzy Logic-Based Quality of Interaction within Blended-Learning: The Rare and Contemporary Dance Cases

    Science.gov (United States)

    Dias, Sofia B.; Diniz, José A.; Hadjileontiadis, Leontios J.

    2014-01-01

    The combination of the process of pedagogical planning within the Blended (b-) learning environment with the users' quality of interaction ("QoI") with the Learning Management System (LMS) is explored here. The required "QoI" (both for professors and students) is estimated by adopting a fuzzy logic-based modeling approach,…

  15. Fast Reinforcement Learning for Energy-Efficient Wireless Communications

    CERN Document Server

    Mastronarde, Nicholas

    2010-01-01

    We consider the problem of energy-efficient point-to-point transmission of delay-sensitive data (e.g. multimedia data) over a fading channel. Existing research on this topic utilizes either physical-layer centric solutions, namely power-control and adaptive modulation and coding (AMC), or system-level solutions based on dynamic power management (DPM); however, there is currently no rigorous and unified framework for simultaneously utilizing both physical-layer centric and system-level techniques to achieve the minimum possible energy consumption, under delay constraints, in the presence of stochastic and a priori unknown traffic and channel conditions. In this report, we propose such a framework. We formulate the stochastic optimization problem as a Markov decision process (MDP) and solve it online using reinforcement learning. The advantages of the proposed online method are that (i) it does not require a priori knowledge of the traffic arrival and channel statistics to determine the jointly optimal power-co...

  16. Risk-Sensitive Reinforcement Learning Applied to Control under Constraints

    CERN Document Server

    Geibel, P; 10.1613/jair.1666

    2011-01-01

    In this paper, we consider Markov Decision Processes (MDPs) with error states. Error states are those states entering which is undesirable or dangerous. We define the risk with respect to a policy as the probability of entering such a state when the policy is pursued. We consider the problem of finding good policies whose risk is smaller than some user-specified threshold, and formalize it as a constrained MDP with two criteria. The first criterion corresponds to the value function originally given. We will show that the risk can be formulated as a second criterion function based on a cumulative return, whose definition is independent of the original value function. We present a model free, heuristic reinforcement learning algorithm that aims at finding good deterministic policies. It is based on weighting the original value function and the risk. The weight parameter is adapted in order to find a feasible solution for the constrained problem that has a good performance with respect to the value function. The...
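
    The weighting idea described above, trading the original value criterion against the risk of entering an error state, can be caricatured in a few lines; the scoring rule below is a simplified stand-in, not the paper's exact algorithm.

      # Simplified sketch: each action keeps an estimated value and an estimated
      # probability of leading to an error state; actions are ranked by value minus a
      # weighted risk term. The weight xi would be adapted so that the chosen policy
      # stays below the user-specified risk threshold.
      def score(value, risk, xi):
          return value - xi * risk

      candidates = {"a1": (10.0, 0.30), "a2": (7.0, 0.05)}   # action -> (value, risk)
      for xi in (0.0, 20.0, 60.0):
          best = max(candidates, key=lambda a: score(*candidates[a], xi))
          print(f"xi={xi:5.1f}  chosen action: {best}")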

  17. Off-policy reinforcement learning for H∞ control design.

    Science.gov (United States)

    Luo, Biao; Wu, Huai-Ning; Huang, Tingwen

    2015-01-01

    The H∞ control design problem is considered for nonlinear systems with unknown internal system model. It is known that the nonlinear H∞ control problem can be transformed into solving the so-called Hamilton-Jacobi-Isaacs (HJI) equation, which is a nonlinear partial differential equation that is generally impossible to solve analytically. Even worse, model-based approaches cannot be used for approximately solving the HJI equation when the accurate system model is unavailable or costly to obtain in practice. To overcome these difficulties, an off-policy reinforcement learning (RL) method is introduced to learn the solution of the HJI equation from real system data instead of a mathematical system model, and its convergence is proved. In the off-policy RL method, the system data can be generated with arbitrary policies rather than the evaluating policy, which is extremely important and promising for practical systems. For implementation purposes, a neural network (NN)-based actor-critic structure is employed and a least-square NN weight update algorithm is derived based on the method of weighted residuals. Finally, the developed NN-based off-policy RL method is tested on a linear F16 aircraft plant, and further applied to a rotational/translational actuator system.

  18. Temporal-difference reinforcement learning with distributed representations.

    Directory of Open Access Journals (Sweden)

    Zeb Kurth-Nelson

    Full Text Available Temporal-difference (TD) algorithms have been proposed as models of reinforcement learning (RL). We examine two issues of distributed representation in these TD algorithms: distributed representations of belief and distributed discounting factors. Distributed representation of belief allows the believed state of the world to distribute across sets of equivalent states. Distributed exponential discounting factors produce hyperbolic discounting in the behavior of the agent itself. We examine these issues in the context of a TD RL model in which state-belief is distributed over a set of exponentially-discounting "micro-Agents", each of which has a separate discounting factor (gamma). Each microAgent maintains an independent hypothesis about the state of the world, and a separate value-estimate of taking actions within that hypothesized state. The overall agent thus instantiates a flexible representation of an evolving world-state. As with other TD models, the value-error (delta) signal within the model matches dopamine signals recorded from animals in standard conditioning reward-paradigms. The distributed representation of belief provides an explanation for the decrease in dopamine at the conditioned stimulus seen in overtrained animals, for the differences between trace and delay conditioning, and for transient bursts of dopamine seen at movement initiation. Because each microAgent also includes its own exponential discounting factor, the overall agent shows hyperbolic discounting, consistent with behavioral experiments.
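
    The claim that a population of exponentially discounting microAgents yields hyperbolic discounting in the aggregate can be checked numerically; the particular spread of gamma values and the hyperbolic constant below are arbitrary illustrations.

      import numpy as np

      # Sketch: averaging exponential discount curves gamma**d over a spread of gammas
      # gives a curve close to a hyperbolic 1/(1 + k*d), the behavioural signature the
      # distributed-microAgent model predicts.
      gammas = np.linspace(0.05, 0.99, 40)          # one discount factor per microAgent
      delays = np.arange(0, 30)
      ensemble = np.mean([g ** delays for g in gammas], axis=0)

      hyperbolic = 1.0 / (1.0 + 1.0 * delays)       # k = 1 chosen for this gamma range
      print("max deviation:", np.max(np.abs(ensemble - hyperbolic)))   # stays small here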

  19. Cortical mechanisms for reinforcement learning in competitive games.

    Science.gov (United States)

    Seo, Hyojung; Lee, Daeyeol

    2008-12-12

    Game theory analyses optimal strategies for multiple decision makers interacting in a social group. However, the behaviours of individual humans and animals often deviate systematically from the optimal strategies described by game theory. The behaviours of rhesus monkeys (Macaca mulatta) in simple zero-sum games showed similar patterns, but their departures from the optimal strategies were well accounted for by a simple reinforcement-learning algorithm. During a computer-simulated zero-sum game, neurons in the dorsolateral prefrontal cortex often encoded the previous choices of the animal and its opponent as well as the animal's reward history. By contrast, the neurons in the anterior cingulate cortex predominantly encoded the animal's reward history. Using simple competitive games, therefore, we have demonstrated functional specialization between different areas of the primate frontal cortex involved in outcome monitoring and action selection. Temporally extended signals related to the animal's previous choices might facilitate the association between choices and their delayed outcomes, whereas information about the choices of the opponent might be used to estimate the reward expected from a particular action. Finally, signals related to the reward history might be used to monitor the overall success of the animal's current decision-making strategy.

  20. Robot Cognitive Control with a Neurophysiologically Inspired Reinforcement Learning Model

    Science.gov (United States)

    Khamassi, Mehdi; Lallée, Stéphane; Enel, Pierre; Procyk, Emmanuel; Dominey, Peter F.

    2011-01-01

    A major challenge in modern robotics is to liberate robots from controlled industrial settings, and allow them to interact with humans and changing environments in the real-world. The current research attempts to determine if a neurophysiologically motivated model of cortical function in the primate can help to address this challenge. Primates are endowed with cognitive systems that allow them to maximize the feedback from their environment by learning the values of actions in diverse situations and by adjusting their behavioral parameters (i.e., cognitive control) to accommodate unexpected events. In such contexts uncertainty can arise from at least two distinct sources – expected uncertainty resulting from noise during sensory-motor interaction in a known context, and unexpected uncertainty resulting from the changing probabilistic structure of the environment. However, it is not clear how neurophysiological mechanisms of reinforcement learning and cognitive control integrate in the brain to produce efficient behavior. Based on primate neuroanatomy and neurophysiology, we propose a novel computational model for the interaction between lateral prefrontal and anterior cingulate cortex reconciling previous models dedicated to these two functions. We deployed the model in two robots and demonstrate that, based on adaptive regulation of a meta-parameter β that controls the exploration rate, the model can robustly deal with the two kinds of uncertainties in the real-world. In addition the model could reproduce monkey behavioral performance and neurophysiological data in two problem-solving tasks. A last experiment extends this to human–robot interaction with the iCub humanoid, and novel sources of uncertainty corresponding to “cheating” by the human. The combined results provide concrete evidence for the ability of neurophysiologically inspired cognitive systems to control advanced robots in the real-world. PMID:21808619
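
    The meta-parameter regulation described above can be sketched as a softmax policy whose inverse temperature β rises when the context is stable and drops after surprising outcomes; the specific adaptation rule and threshold below are simplified stand-ins, not the published model.

      import numpy as np

      # Sketch: softmax action selection with an adaptively regulated inverse temperature.
      def softmax_choice(values, beta, rng):
          p = np.exp(beta * (values - values.max()))
          p /= p.sum()
          return rng.choice(len(values), p=p)

      def adapt_beta(beta, reward_error, eta=0.5, beta_min=0.5, beta_max=10.0):
          # move toward exploitation when prediction errors are small, toward
          # exploration after surprises (illustrative threshold of 0.2)
          target = beta_max if abs(reward_error) < 0.2 else beta_min
          return beta + eta * (target - beta)

      rng = np.random.default_rng(1)
      beta = adapt_beta(2.0, reward_error=0.8)        # a surprising outcome lowers beta
      print("beta:", beta, "choice:", softmax_choice(np.array([0.2, 0.5, 0.1]), beta, rng))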

  1. Applying reinforcement learning to the weapon assignment problem in air defence

    CSIR Research Space (South Africa)

    Mouton, H

    2011-12-01

    Full Text Available The techniques investigated in this article were two methods from the machine-learning subfield of reinforcement learning (RL), namely a Monte Carlo (MC) control algorithm with exploring starts (MCES), and an off-policy temporal-difference (TD) learning...

  2. Automatic feature selection for model-based reinforcement learning in factored MDPs

    NARCIS (Netherlands)

    Kroon, M.; Whiteson, S.; Wani, M.A.; Kantardzic, M.; Palade, V.; Kurgan, L.; Qi, A.

    2009-01-01

    Feature selection is an important challenge in machine learning. Unfortunately, most methods for automating feature selection are designed for supervised learning tasks and are thus either inapplicable or impractical for reinforcement learning. This paper presents a new approach to feature selection

  3. Nonparametric bayesian reward segmentation for skill discovery using inverse reinforcement learning

    CSIR Research Space (South Africa)

    Ranchod, P

    2015-10-01

    Full Text Available We present a method for segmenting a set of unstructured demonstration trajectories to discover reusable skills using inverse reinforcement learning (IRL). Each skill is characterised by a latent reward function which the demonstrator is assumed...

  4. [The model of the reward choice basing on the theory of reinforcement learning].

    Science.gov (United States)

    Smirnitskaia, I A; Frolov, A A; Merzhanova, G Kh

    2007-01-01

    We developed a model of the alimentary instrumental conditioned bar-pressing reflex for cats making a choice between either an immediate small reinforcement ("impulsive behavior") or a delayed, more valuable reinforcement ("self-control behavior"). Our model is based on reinforcement learning theory. We emulated the contribution of dopamine by the discount coefficient of this theory (a subjective decrease in the value of a delayed reinforcement). The results of computer simulation showed that "cats" with a large discount coefficient demonstrated "self-control behavior", while a small discount coefficient was associated with "impulsive behavior". These data are in agreement with the experimental data indicating that impulsive behavior is due to a decreased amount of dopamine in the striatum.
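
    The mechanism the model attributes to dopamine can be reproduced with a two-line calculation: compare the discounted value of the immediate small reward with that of the delayed, more valuable one under different discount coefficients. The reward sizes and delay below are arbitrary illustrations.

      # Sketch: choice between an immediate small reward and a delayed larger reward under
      # exponential discounting. A large discount coefficient (weak discounting) favours the
      # delayed option ("self-control"); a small coefficient favours the immediate one
      # ("impulsive"), mirroring the simulation result described above.
      small, large, delay = 1.0, 3.0, 4

      for gamma in (0.5, 0.9):
          immediate, delayed = small, large * gamma ** delay
          choice = "self-control (wait)" if delayed > immediate else "impulsive (take now)"
          print(f"gamma={gamma}: immediate={immediate:.2f}, delayed={delayed:.2f} -> {choice}")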

  5. Knowledge-Based Reinforcement Learning for Data Mining

    Science.gov (United States)

    Kudenko, Daniel; Grzes, Marek

    Data Mining is the process of extracting patterns from data. Two general avenues of research in the intersecting areas of agents and data mining can be distinguished. The first approach is concerned with mining an agent’s observation data in order to extract patterns, categorize environment states, and/or make predictions of future states. In this setting, data is normally available as a batch, and the agent’s actions and goals are often independent of the data mining task. The data collection is mainly considered as a side effect of the agent’s activities. Machine learning techniques applied in such situations fall into the class of supervised learning. In contrast, the second scenario occurs where an agent is actively performing the data mining, and is responsible for the data collection itself. For example, a mobile network agent is acquiring and processing data (where the acquisition may incur a certain cost), or a mobile sensor agent is moving in a (perhaps hostile) environment, collecting and processing sensor readings. In these settings, the tasks of the agent and the data mining are highly intertwined and interdependent (or even identical). Supervised learning is not a suitable technique for these cases. Reinforcement Learning (RL) enables an agent to learn from experience (in form of reward and punishment for explorative actions) and adapt to new situations, without a teacher. RL is an ideal learning technique for these data mining scenarios, because it fits the agent paradigm of continuous sensing and acting, and the RL agent is able to learn to make decisions on the sampling of the environment which provides the data. Nevertheless, RL still suffers from scalability problems, which have prevented its successful use in many complex real-world domains. The more complex the tasks, the longer it takes a reinforcement learning algorithm to converge to a good solution. For many real-world tasks, human expert knowledge is available. For example, human

  6. [The Identification of Lettuce Varieties by Using Unsupervised Possibilistic Fuzzy Learning Vector Quantization and Near Infrared Spectroscopy].

    Science.gov (United States)

    Wu, Xiao-hong; Cai, Pei-qiang; Wu, Bin; Sun, Jun; Ji, Gang

    2016-03-01

    To solve the noise sensitivity problem of fuzzy learning vector quantization (FLVQ), unsupervised possibilistic fuzzy learning vector quantization (UPFLVQ) was proposed based on unsupervised possibilistic fuzzy clustering (UPFC). UPFLVQ uses the fuzzy membership values and typicality values of UPFC to update the learning rate of the learning vector quantization network and its cluster centers. UPFLVQ is an unsupervised machine learning algorithm and can be applied to classification without labeled training samples. UPFLVQ was used in the identification of lettuce varieties by near infrared spectroscopy (NIS). Short wave and long wave near infrared spectra of three types of lettuces were collected by a FieldSpec@3 portable spectrometer in the wavelength range of 350-2500 nm. When the near infrared spectra were compressed by principal component analysis (PCA), the first three principal components explained 97.50% of the total variance in the near infrared spectra. After fuzzy c-means (FCM) clustering was performed and its cluster centers were taken as the initial cluster centers of UPFLVQ, UPFLVQ could classify lettuce varieties with the terminal fuzzy membership values and typicality values. The experimental results showed that UPFLVQ together with NIS provides an effective method for the identification of lettuce varieties, with advantages such as fast testing, a high accuracy rate, and non-destructive characteristics. UPFLVQ is a clustering algorithm combining UPFC and FLVQ, and it does not require any labeled training samples for the identification of lettuce varieties by NIS. It is suitable for linearly separable data clustering and provides a novel method for fast and nondestructive identification of lettuce varieties.

  7. A New Fuzzy Stacked Generalization Technique for Deep learning and Analysis of its Performance

    CERN Document Server

    Ozay, Mete

    2012-01-01

    We propose a robust Fuzzy Stacked Generalization (FSG) technique for deep learning, which assures a better performance than that of the individual classifiers. FSG aggregates a set of fuzzy k-Nearest Neighbor (k-NN) classifiers in a two-level hierarchy. We make a thorough analysis to investigate the learning mechanism of the suggested deep learning architecture and analyze its performance. We suggest two hypotheses to boost the performance of the suggested architecture and show that the success of the FSG highly depends on how the individual classifiers share the learning of the samples in the training set. Rather than the power of the individual base-layer classifiers, diversity and cooperation of the classifiers become an important issue to improve the overall performance of the proposed FSG. A weak classifier may boost the overall performance more than a strong classifier, if it is capable of recognizing the samples which are not recognized by the rest of the classifiers. Therefore, the problem of designing a d...
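
    A minimal two-level stacking sketch in the spirit of FSG (Python, using synthetic data and a hand-rolled distance-weighted fuzzy k-NN; this is not the authors' implementation, and a proper version would compute the base-level memberships with cross-validation folds rather than in-sample): each base learner produces class-membership vectors on one feature subset, and the meta level classifies the concatenated membership vectors.

      import numpy as np

      rng = np.random.default_rng(0)

      def fuzzy_knn_memberships(X_train, y_train, X_query, k=5, m=2.0):
          """Soft class memberships from distance-weighted k nearest neighbours."""
          n_classes = int(y_train.max()) + 1
          out = np.zeros((len(X_query), n_classes))
          for i, x in enumerate(X_query):
              d = np.linalg.norm(X_train - x, axis=1)
              idx = np.argsort(d)[:k]
              w = 1.0 / (d[idx] ** (2.0 / (m - 1.0)) + 1e-12)
              for j, wj in zip(idx, w):
                  out[i, y_train[j]] += wj
              out[i] /= out[i].sum()
          return out

      # Two classes and two feature subsets ("views"); each view feeds one base learner.
      X = rng.normal(size=(200, 4))
      y = (X[:, 0] + X[:, 2] > 0).astype(int)
      train, test = slice(0, 150), slice(150, None)

      views = [X[:, :2], X[:, 2:]]
      meta_features = np.hstack([fuzzy_knn_memberships(v[train], y[train], v)
                                 for v in views])    # base-level membership vectors
      pred = fuzzy_knn_memberships(meta_features[train], y[train],
                                   meta_features[test]).argmax(axis=1)
      print("meta-level accuracy:", (pred == y[test]).mean())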

  8. Dissociating error-based and reinforcement-based loss functions during sensorimotor learning.

    Science.gov (United States)

    Cashaback, Joshua G A; McGregor, Heather R; Mohatarem, Ayman; Gribble, Paul L

    2017-07-01

    It has been proposed that the sensorimotor system uses a loss (cost) function to evaluate potential movements in the presence of random noise. Here we test this idea in the context of both error-based and reinforcement-based learning. In a reaching task, we laterally shifted a cursor relative to true hand position using a skewed probability distribution. This skewed probability distribution had its mean and mode separated, allowing us to dissociate the optimal predictions of an error-based loss function (corresponding to the mean of the lateral shifts) and a reinforcement-based loss function (corresponding to the mode). We then examined how the sensorimotor system uses error feedback and reinforcement feedback, in isolation and combination, when deciding where to aim the hand during a reach. We found that participants compensated differently to the same skewed lateral shift distribution depending on the form of feedback they received. When provided with error feedback, participants compensated based on the mean of the skewed noise. When provided with reinforcement feedback, participants compensated based on the mode. Participants receiving both error and reinforcement feedback continued to compensate based on the mean while repeatedly missing the target, despite receiving auditory, visual and monetary reinforcement feedback that rewarded hitting the target. Our work shows that reinforcement-based and error-based learning are separable and can occur independently. Further, when error and reinforcement feedback are in conflict, the sensorimotor system heavily weights error feedback over reinforcement feedback.
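
    A toy numerical illustration of the dissociation tested above (Python sketch; the skewed distribution below is invented, not the authors' data): for a skewed distribution of lateral cursor shifts, a squared-error (error-based) loss is minimised by aiming opposite the mean shift, whereas a hit/miss (reinforcement-based) loss is minimised by aiming opposite the mode.

      import numpy as np

      rng = np.random.default_rng(1)
      # Hypothetical skewed shift distribution (cm): mode near -0.5, mean near +0.5.
      shifts = rng.gamma(shape=2.0, scale=1.0, size=100_000) - 1.5

      mean_shift = shifts.mean()
      counts, edges = np.histogram(shifts, bins=200)          # crude mode estimate
      mode_shift = 0.5 * (edges[np.argmax(counts)] + edges[np.argmax(counts) + 1])

      print(f"aim point under an error-based loss        : {-mean_shift:+.2f} cm")
      print(f"aim point under a reinforcement-based loss : {-mode_shift:+.2f} cm")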

  9. The cerebellum: A neural system for the study of reinforcement learning

    Directory of Open Access Journals (Sweden)

    Rodney A. Swain

    2011-03-01

    Full Text Available In its strictest application, the term reinforcement learning refers to a computational approach to learning in which an agent (often a machine) interacts with a mutable environment to maximize reward through trial and error. The approach borrows essentials from several fields, most notably Computer Science, Behavioral Neuroscience, and Psychology. At the most basic level, a neural system capable of mediating reinforcement learning must be able to acquire sensory information about the external environment and internal milieu (either directly or through connectivities with other brain regions), must be able to select a behavior to be executed, and must be capable of providing evaluative feedback about the success of that behavior. Given that Psychology informs us that reinforcers, both positive and negative, are stimuli or consequences that increase the probability that the immediately antecedent behavior will be repeated and that reinforcer strength or viability is modulated by the organism’s past experience with the reinforcer, its affect, and even the state of its muscles (e.g., eyes open or closed); it is the case that any neural system that supports reinforcement learning must also be sensitive to these same considerations. Once learning is established, such a neural system must finally be able to maintain continued response expression and prevent response drift. In this report, we examine both historical and recent evidence that the cerebellum satisfies all of these requirements. While we report evidence from a variety of learning paradigms, the majority of our discussion will focus on classical conditioning of the rabbit eye blink response as an ideal model system for the study of reinforcement and reinforcement learning.

  10. The cerebellum: a neural system for the study of reinforcement learning.

    Science.gov (United States)

    Swain, Rodney A; Kerr, Abigail L; Thompson, Richard F

    2011-01-01

    In its strictest application, the term "reinforcement learning" refers to a computational approach to learning in which an agent (often a machine) interacts with a mutable environment to maximize reward through trial and error. The approach borrows essentials from several fields, most notably Computer Science, Behavioral Neuroscience, and Psychology. At the most basic level, a neural system capable of mediating reinforcement learning must be able to acquire sensory information about the external environment and internal milieu (either directly or through connectivities with other brain regions), must be able to select a behavior to be executed, and must be capable of providing evaluative feedback about the success of that behavior. Given that Psychology informs us that reinforcers, both positive and negative, are stimuli or consequences that increase the probability that the immediately antecedent behavior will be repeated and that reinforcer strength or viability is modulated by the organism's past experience with the reinforcer, its affect, and even the state of its muscles (e.g., eyes open or closed); it is the case that any neural system that supports reinforcement learning must also be sensitive to these same considerations. Once learning is established, such a neural system must finally be able to maintain continued response expression and prevent response drift. In this report, we examine both historical and recent evidence that the cerebellum satisfies all of these requirements. While we report evidence from a variety of learning paradigms, the majority of our discussion will focus on classical conditioning of the rabbit eye blink response as an ideal model system for the study of reinforcement and reinforcement learning.

  11. $QD$-Learning: A Collaborative Distributed Strategy for Multi-Agent Reinforcement Learning Through Consensus + Innovations

    CERN Document Server

    Kar, Soummya; Poor, H Vincent

    2012-01-01

    The paper considers a class of multi-agent Markov decision processes (MDPs), in which the network agents respond differently (as manifested by the instantaneous one-stage random costs) to a global controlled state and the control actions of a remote controller. The paper investigates a distributed reinforcement learning setup with no prior information on the global state transition and local agent cost statistics. Specifically, with the agents' objective consisting of minimizing a network-averaged infinite horizon discounted cost, the paper proposes a distributed version of $Q$-learning, $\mathcal{QD}$-learning, in which the network agents collaborate by means of local processing and mutual information exchange over a sparse (possibly stochastic) communication network to achieve the network goal. Under the assumption that each agent is only aware of its local online cost data and the inter-agent communication network is weakly connected, the proposed distributed scheme is almost surely (a.s.) shown to ...
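
    The abstract describes an update that mixes a consensus term over neighbouring agents with a local temporal-difference "innovation"; the schematic Python sketch below shows that general form only (it is inferred from the abstract, not the paper's exact algorithm, and ignores the time-varying step-size conditions the analysis relies on).

      import numpy as np

      def qd_step(Q, costs, s, a, s_next, neighbours, alpha, beta, gamma=0.95):
          """Q: list of per-agent Q tables of shape (n_states, n_actions)."""
          Q_new = [q.copy() for q in Q]
          for i, q in enumerate(Q):
              consensus = sum(q[s, a] - Q[j][s, a] for j in neighbours[i])
              innovation = costs[i] + gamma * q[s_next].min() - q[s, a]  # cost-minimising TD term
              Q_new[i][s, a] = q[s, a] - beta * consensus + alpha * innovation
          return Q_new

      # Hypothetical 3-agent ring network, 4 states, 2 actions.
      Q = [np.zeros((4, 2)) for _ in range(3)]
      neighbours = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
      Q = qd_step(Q, costs=[1.0, 0.5, 2.0], s=0, a=1, s_next=3,
                  neighbours=neighbours, alpha=0.1, beta=0.05)
      print(Q[0][0, 1], Q[1][0, 1], Q[2][0, 1])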

  12. Multi-agent reinforcement learning with cooperation based on eligibility traces

    Institute of Scientific and Technical Information of China (English)

    杨玉君; 程君实; 陈佳品

    2004-01-01

    Reinforcement learning has been widely applied in multi-agent systems in recent years. In a multi-agent system, an agent cooperates with other agents to accomplish the given task, and one agent's behavior usually affects the others' behaviors. In traditional reinforcement learning, one agent only takes the others' locations into account, so it is difficult to consider the others' behavior, which decreases the learning efficiency. This paper proposes multi-agent reinforcement learning with cooperation based on eligibility traces, i.e. one agent estimates another agent's behavior from that agent's eligibility traces. The simulation results prove the validity of the proposed learning method.

  13. A modified Adaptive Wavelet PID Control Based on Reinforcement Learning for Wind Energy Conversion System Control

    Directory of Open Access Journals (Sweden)

    REZAZADEH, A.

    2010-05-01

    Full Text Available Nonlinear characteristics of wind turbines and electric generators necessitate complicated and nonlinear control of grid connected Wind Energy Conversion Systems (WECS). This paper proposes a modified self-tuning PID control strategy, using reinforcement learning, for WECS control. The controller employs Actor-Critic learning in order to tune the PID parameters adaptively. This Actor-Critic learning is a special kind of reinforcement learning that uses a single wavelet neural network to approximate the policy function of the Actor and the value function of the Critic simultaneously. These controllers are used to control a typical WECS in noiseless and noisy conditions, and the results are compared with an adaptive Radial Basis Function (RBF) PID control based on reinforcement learning and a conventional PID control. Practical emulated results prove the capability and the robustness of the suggested controller versus the other PID controllers in controlling the WECS. The ability of the presented controller is tested on an experimental setup.
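
    A structural skeleton of a self-tuning PID of the kind described above (Python sketch; the wavelet-network Actor-Critic itself is not reproduced, and the gain increments passed to tune() are placeholders for what the actor, scaled by the critic's TD error, would supply):

      class AdaptivePID:
          def __init__(self, kp, ki, kd, dt):
              self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
              self.integral = 0.0
              self.prev_error = 0.0

          def control(self, error):
              # Standard PID law on the tracking error.
              self.integral += error * self.dt
              derivative = (error - self.prev_error) / self.dt
              self.prev_error = error
              return self.kp * error + self.ki * self.integral + self.kd * derivative

          def tune(self, dkp, dki, dkd):
              # In an Actor-Critic scheme the actor would output these increments;
              # here they are supplied directly as placeholders.
              self.kp += dkp
              self.ki += dki
              self.kd += dkd

      pid = AdaptivePID(kp=1.0, ki=0.1, kd=0.05, dt=0.01)
      u = pid.control(error=0.3)             # control action for the current error
      pid.tune(dkp=0.02, dki=0.0, dkd=0.0)   # hypothetical learning update
      print(u)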

  14. Experimental analysis of simulated reinforcement learning control for active and passive building thermal storage inventory

    Energy Technology Data Exchange (ETDEWEB)

    Liu, S. [Architectural Engineering, University of Nebraska-Lincoln, PKI 243, Omaha, NE (United States); Henze, G. P. [Architectural Engineering, University of Nebraska-Lincoln, PKI 203D, Omaha, NE (United States)

    2006-07-01

    This paper is the first part of a two-part investigation of a novel approach to optimally control commercial building passive and active thermal storage inventory. The proposed building control approach is based on simulated reinforcement learning, which is a hybrid control scheme that combines features of model-based optimal control and model-free learning control. An experimental study was carried out to analyze the performance of a hybrid controller installed in a full-scale laboratory facility. The first part presents an overview of the project with an emphasis on the theoretical foundation. The motivation of the research will be introduced first, followed by a review of past work. A brief introduction of the theory is provided including classic reinforcement learning and its variation, so-called simulated reinforcement learning, which constitutes the basic architecture of the hybrid learning controller. A detailed discussion of the experimental results will be presented in the companion paper. (author)

  15. Reinforcement Learning for Agents with Many Sensors and Actuators Acting in Categorizable Environments

    CERN Document Server

    Celaya, E; 10.1613/jair.1437

    2011-01-01

    In this paper, we confront the problem of applying reinforcement learning to agents that perceive the environment through many sensors and that can perform parallel actions using many actuators as is the case in complex autonomous robots. We argue that reinforcement learning can only be successfully applied to this case if strong assumptions are made on the characteristics of the environment in which the learning is performed, so that the relevant sensor readings and motor commands can be readily identified. The introduction of such assumptions leads to strongly-biased learning systems that can eventually lose the generality of traditional reinforcement-learning algorithms. In this line, we observe that, in realistic situations, the reward received by the robot depends only on a reduced subset of all the executed actions and that only a reduced subset of the sensor inputs (possibly different in each situation and for each action) are relevant to predict the reward. We formalize this property in the so called ...

  16. Integer-encoded massively parallel processing of fast-learning fuzzy ARTMAP neural networks

    Science.gov (United States)

    Bahr, Hubert A.; DeMara, Ronald F.; Georgiopoulos, Michael

    1997-04-01

    In this paper we develop techniques that are suitable for the parallel implementation of Fuzzy ARTMAP networks. Speedup and learning performance results are provided for execution on a DECmpp/Sx-1208 parallel processor consisting of a DEC RISC Workstation Front-End and MasPar MP-1 Back-End with 8,192 processors. Experiments of the parallel implementation were conducted on the Letters benchmark database developed by Frey and Slate. The results indicate a speedup on the order of 1000-fold which allows combined training and testing time of under four minutes.

  17. Optimized Assistive Human-Robot Interaction Using Reinforcement Learning.

    Science.gov (United States)

    Modares, Hamidreza; Ranatunga, Isura; Lewis, Frank L; Popa, Dan O

    2016-03-01

    An intelligent human-robot interaction (HRI) system with adjustable robot behavior is presented. The proposed HRI system assists the human operator to perform a given task with minimum workload demands and optimizes the overall human-robot system performance. Motivated by human factor studies, the presented control structure consists of two control loops. First, a robot-specific neuro-adaptive controller is designed in the inner loop to make the unknown nonlinear robot behave like a prescribed robot impedance model as perceived by a human operator. In contrast to existing neural network and adaptive impedance-based control methods, no information of the task performance or the prescribed robot impedance model parameters is required in the inner loop. Then, a task-specific outer-loop controller is designed to find the optimal parameters of the prescribed robot impedance model to adjust the robot's dynamics to the operator skills and minimize the tracking error. The outer loop includes the human operator, the robot, and the task performance details. The problem of finding the optimal parameters of the prescribed robot impedance model is transformed into a linear quadratic regulator (LQR) problem which minimizes the human effort and optimizes the closed-loop behavior of the HRI system for a given task. To obviate the requirement of the knowledge of the human model, integral reinforcement learning is used to solve the given LQR problem. Simulation results on an x - y table and a robot arm, and experimental implementation results on a PR2 robot confirm the suitability of the proposed method.

  18. Human dorsal striatal activity during choice discriminates reinforcement learning behavior from the gambler's fallacy.

    Science.gov (United States)

    Jessup, Ryan K; O'Doherty, John P

    2011-04-27

    Reinforcement learning theory has generated substantial interest in neurobiology, particularly because of the resemblance between phasic dopamine and reward prediction errors. Actor-critic theories have been adapted to account for the functions of the striatum, with parts of the dorsal striatum equated to the actor. Here, we specifically test whether the human dorsal striatum--as predicted by an actor-critic instantiation--is used on a trial-to-trial basis at the time of choice to choose in accordance with reinforcement learning theory, as opposed to a competing strategy: the gambler's fallacy. Using a partial-brain functional magnetic resonance imaging scanning protocol focused on the striatum and other ventral brain areas, we found that the dorsal striatum is more active when choosing consistent with reinforcement learning compared with the competing strategy. Moreover, an overlapping area of dorsal striatum along with the ventral striatum was found to be correlated with reward prediction errors at the time of outcome, as predicted by the actor-critic framework. These findings suggest that the same region of dorsal striatum involved in learning stimulus-response associations may contribute to the control of behavior during choice, thereby using those learned associations. Intriguingly, neither reinforcement learning nor the gambler's fallacy conformed to the optimal choice strategy on the specific decision-making task we used. Thus, the dorsal striatum may contribute to the control of behavior according to reinforcement learning even when the prescriptions of such an algorithm are suboptimal in terms of maximizing future rewards.

  19. Reinforcement learning and dopamine in schizophrenia: dimensions of symptoms or specific features of a disease group?

    Directory of Open Access Journals (Sweden)

    Lorenz eDeserno

    2013-12-01

    Full Text Available Abnormalities in reinforcement learning are a key finding in schizophrenia and have been proposed to be linked to elevated levels of dopamine neurotransmission. Behavioral deficits in reinforcement learning and their neural correlates may contribute to the formation of clinical characteristics of schizophrenia. The ability to form predictions about future outcomes is fundamental for environmental interactions and depends on neuronal teaching signals, like reward prediction errors. While aberrant prediction errors, that encode non-salient events as surprising, have been proposed to contribute to the formation of positive symptoms, a failure to build neural representations of decision values may result in negative symptoms. Here, we review behavioral and neuroimaging research in schizophrenia and focus on studies that implemented reinforcement learning models. In addition, we discuss studies that combined reinforcement learning with measures of dopamine. Thereby, we suggest how reinforcement learning abnormalities in schizophrenia may contribute to the formation of psychotic symptoms and may interact with cognitive deficits. These ideas point towards an interplay of more rigid versus flexible control over reinforcement learning. Pronounced deficits in the flexible or model-based domain may allow for a detailed characterization of well-established cognitive deficits in schizophrenia patients based on computational models of learning. Finally, we propose a framework based on the potentially crucial contribution of dopamine to dysfunctional reinforcement learning on the level of neural networks. Future research may strongly benefit from computational modeling but also requires further methodological improvement for clinical group studies. These research tools may help to improve our understanding of disease-specific mechanisms and may help to identify clinically relevant subgroups of the heterogeneous entity schizophrenia.

  20. Reinforcement learning and dopamine in schizophrenia: dimensions of symptoms or specific features of a disease group?

    Science.gov (United States)

    Deserno, Lorenz; Boehme, Rebecca; Heinz, Andreas; Schlagenhauf, Florian

    2013-12-23

    Abnormalities in reinforcement learning are a key finding in schizophrenia and have been proposed to be linked to elevated levels of dopamine neurotransmission. Behavioral deficits in reinforcement learning and their neural correlates may contribute to the formation of clinical characteristics of schizophrenia. The ability to form predictions about future outcomes is fundamental for environmental interactions and depends on neuronal teaching signals, like reward prediction errors. While aberrant prediction errors, that encode non-salient events as surprising, have been proposed to contribute to the formation of positive symptoms, a failure to build neural representations of decision values may result in negative symptoms. Here, we review behavioral and neuroimaging research in schizophrenia and focus on studies that implemented reinforcement learning models. In addition, we discuss studies that combined reinforcement learning with measures of dopamine. Thereby, we suggest how reinforcement learning abnormalities in schizophrenia may contribute to the formation of psychotic symptoms and may interact with cognitive deficits. These ideas point toward an interplay of more rigid versus flexible control over reinforcement learning. Pronounced deficits in the flexible or model-based domain may allow for a detailed characterization of well-established cognitive deficits in schizophrenia patients based on computational models of learning. Finally, we propose a framework based on the potentially crucial contribution of dopamine to dysfunctional reinforcement learning on the level of neural networks. Future research may strongly benefit from computational modeling but also requires further methodological improvement for clinical group studies. These research tools may help to improve our understanding of disease-specific mechanisms and may help to identify clinically relevant subgroups of the heterogeneous entity schizophrenia.

  1. Learning processes affecting human decision making: An assessment of reinforcer-selective Pavlovian-to-instrumental transfer following reinforcer devaluation.

    Science.gov (United States)

    Allman, Melissa J; DeLeon, Iser G; Cataldo, Michael F; Holland, Peter C; Johnson, Alexander W

    2010-07-01

    In reinforcer-selective transfer, Pavlovian stimuli that are predictive of specific outcomes bias performance toward responses associated with those outcomes. Although this phenomenon has been extensively examined in rodents, recent assessments have extended to humans. Using a stock market paradigm adults were trained to associate particular symbols and responses with particular currencies. During the first test, individuals showed a preference for responding on actions associated with the same outcome as that predicted by the presented stimulus (i.e., a reinforcer-selective transfer effect). In the second test of the experiment, one of the currencies was devalued. We found it notable that this served to reduce responses to those stimuli associated with the devalued currency. This finding is in contrast to that typically observed in rodent studies, and suggests that participants in this task represented the sensory features that differentiate the reinforcers and their value during reinforcer-selective transfer. These results are discussed in terms of implications for understanding associative learning processes in humans and the ability of reward-paired cues to direct adaptive and maladaptive behavior.

  2. FPGA implementation of neuro-fuzzy system with improved PSO learning.

    Science.gov (United States)

    Karakuzu, Cihan; Karakaya, Fuat; Çavuşlu, Mehmet Ali

    2016-07-01

    This paper presents the first hardware implementation of neuro-fuzzy system (NFS) with its metaheuristic learning ability on field programmable gate array (FPGA). Metaheuristic learning of NFS for all of its parameters is accomplished by using the improved particle swarm optimization (iPSO). As a second novelty, a new functional approach, which does not require any memory and multiplier usage, is proposed for the Gaussian membership functions of NFS. NFS and its learning using iPSO are implemented on Xilinx Virtex5 xc5vlx110-3ff1153 and efficiency of the proposed implementation tested on two dynamic system identification problems and licence plate detection problem as a practical application. Results indicate that proposed NFS implementation and membership function approximation is as effective as the other approaches available in the literature but requires less hardware resources.

  3. Reinforcement learning deficits in people with schizophrenia persist after extended trials.

    Science.gov (United States)

    Cicero, David C; Martin, Elizabeth A; Becker, Theresa M; Kerns, John G

    2014-12-30

    Previous research suggests that people with schizophrenia have difficulty learning from positive feedback and when learning needs to occur rapidly. However, they seem to have relatively intact learning from negative feedback when learning occurs gradually. Participants are typically given a limited number of acquisition trials to learn the reward contingencies and are then tested on what they learned. The current study examined whether participants with schizophrenia continue to display these deficits when given extra time to learn the contingencies. Participants with schizophrenia and matched healthy controls completed the Probabilistic Selection Task, which measures positive and negative feedback learning separately. Participants with schizophrenia showed a deficit in learning from both positive feedback and negative feedback. These reward learning deficits persisted even when people with schizophrenia were given extra time (up to 10 blocks of 60 trials) to learn the reward contingencies. These results suggest that the observed deficits cannot be attributed solely to slower learning and instead reflect a specific deficit in reinforcement learning.

  4. Dynamic Sensor Tasking for Space Situational Awareness via Reinforcement Learning

    Science.gov (United States)

    Linares, R.; Furfaro, R.

    2016-09-01

    This paper studies the Sensor Management (SM) problem for optical Space Object (SO) tracking. The tasking problem is formulated as a Markov Decision Process (MDP) and solved using Reinforcement Learning (RL). The RL problem is solved using the actor-critic policy gradient approach. The actor provides a policy which is random over actions and given by a parametric probability density function (pdf). The critic evaluates the policy by calculating the estimated total reward or the value function for the problem. The parameters of the policy action pdf are optimized using gradients with respect to the reward function. Both the critic and the actor are modeled using deep neural networks (multi-layer neural networks). The policy neural network takes the current state as input and outputs probabilities for each possible action. This policy is random, and can be evaluated by sampling random actions using the probabilities determined by the policy neural network's outputs. The critic approximates the total reward using a neural network. The estimated total reward is used to approximate the gradient of the policy network with respect to the network parameters. This approach is used to find the non-myopic optimal policy for tasking optical sensors to estimate SO orbits. The reward function is based on reducing the uncertainty for the overall catalog to below a user specified uncertainty threshold. This work uses a 30 km total position error for the uncertainty threshold. This work provides the RL method with a negative reward as long as any SO has a total position error above the uncertainty threshold. This penalizes policies that take longer to achieve the desired accuracy. A positive reward is provided when all SOs are below the catalog uncertainty threshold. An optimal policy is sought that takes actions to achieve the desired catalog uncertainty in minimum time. This work trains the policy in simulation by letting it task a single sensor to "learn" from its performance
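
    A toy stand-in for the reward shaping and stochastic policy described above (Python sketch; the paper's deep actor-critic and orbit dynamics are not reproduced, the numbers are invented, and the stateless softmax policy here only illustrates the policy-gradient mechanics, whereas the paper's policy network conditions on the current catalog state):

      import numpy as np

      rng = np.random.default_rng(2)
      n_objects, threshold = 4, 30.0          # 30 km threshold, as in the abstract
      theta = np.zeros(n_objects)             # policy parameters (one logit per object)
      alpha = 0.1

      for episode in range(200):
          sigma = rng.uniform(40.0, 120.0, n_objects)    # hypothetical initial errors (km)
          grads, rewards = [], []
          for t in range(25):
              p = np.exp(theta - theta.max()); p /= p.sum()
              a = rng.choice(n_objects, p=p)             # which object to observe
              sigma[a] *= 0.5                            # tasking the sensor shrinks that error
              grads.append(np.eye(n_objects)[a] - p)     # grad of log softmax
              rewards.append(1.0 if sigma.max() < threshold else -1.0)
          returns = np.cumsum(rewards[::-1])[::-1]       # undiscounted return-to-go
          theta += alpha * sum(g * G for g, G in zip(grads, returns)) / len(grads)

      print("learned preference over objects:", np.round(theta, 2))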

  5. Towards Vision-Based Deep Reinforcement Learning for Robotic Motion Control

    OpenAIRE

    Zhang, Fangyi; Leitner, Jürgen; Milford, Michael; Upcroft, Ben; Corke, Peter

    2015-01-01

    This paper introduces a machine learning based system for controlling a robotic manipulator with visual perception only. The capability to autonomously learn robot controllers solely from raw-pixel images and without any prior knowledge of configuration is shown for the first time. We build upon the success of recent deep reinforcement learning and develop a system for learning target reaching with a three-joint robot manipulator using external visual observation. A Deep Q Network (DQN) was d...

  6. Maximization of Learning Speed Due to Neuronal Redundancy in Reinforcement Learning

    Science.gov (United States)

    Takiyama, Ken

    2016-11-01

    Adaptable neural activity contributes to the flexibility of human behavior, which is optimized in situations such as motor learning and decision making. Although learning signals in motor learning and decision making are low-dimensional, neural activity, which is very high dimensional, must be modified to achieve optimal performance based on the low-dimensional signal, resulting in a severe credit-assignment problem. Despite this problem, the human brain contains a vast number of neurons, leaving an open question: what is the functional significance of the huge number of neurons? Here, I address this question by analyzing a redundant neural network with a reinforcement-learning algorithm in which the numbers of neurons and output units are N and M, respectively. Because many combinations of neural activity can generate the same output under the condition of N ≫ M, I refer to the index N - M as neuronal redundancy. Although greater neuronal redundancy makes the credit-assignment problem more severe, I demonstrate that a greater degree of neuronal redundancy facilitates learning speed. Thus, in an apparent contradiction of the credit-assignment problem, I propose the hypothesis that a functional role of a huge number of neurons or a huge degree of neuronal redundancy is to facilitate learning speed.

  7. A reinforcement learning approach to model interactions between landmarks and geometric cues during spatial learning.

    Science.gov (United States)

    Sheynikhovich, Denis; Arleo, Angelo

    2010-12-13

    In contrast to predictions derived from the associative learning theory, a number of behavioral studies suggested the absence of competition between geometric cues and landmarks in some experimental paradigms. In parallel to these studies, neurobiological experiments suggested the existence of separate independent memory systems which may not always interact according to classic associative principles. In this paper we attempt to combine these two lines of research by proposing a model of spatial learning that is based on the theory of multiple memory systems. In our model, a place-based locale strategy uses activities of modeled hippocampal place cells to drive navigation to a hidden goal, while a stimulus-response taxon strategy, presumably mediated by the dorso-lateral striatum, learns landmark-approaching behavior. A strategy selection network, proposed to reside in the prefrontal cortex, implements a simple reinforcement learning rule to switch behavioral strategies. The model is used to reproduce the results of a behavioral experiment in which an interaction between a landmark and geometric cues was studied. We show that this model, built on the basis of neurobiological data, can explain the lack of competition between the landmark and geometry, potentiation of geometry learning by the landmark, and blocking. Namely, we propose that the geometry potentiation is a consequence of cooperation between memory systems during learning, while blocking is due to competition between the memory systems during action selection.

  8. Oxytocin enhances amygdala-dependent, socially reinforced learning and emotional empathy in humans.

    Science.gov (United States)

    Hurlemann, René; Patin, Alexandra; Onur, Oezguer A; Cohen, Michael X; Baumgartner, Tobias; Metzler, Sarah; Dziobek, Isabel; Gallinat, Juergen; Wagner, Michael; Maier, Wolfgang; Kendrick, Keith M

    2010-04-07

    Oxytocin (OT) is becoming increasingly established as a prosocial neuropeptide in humans with therapeutic potential in treatment of social, cognitive, and mood disorders. However, the potential of OT as a general facilitator of human learning and empathy is unclear. The current double-blind experiments on healthy adult male volunteers investigated first whether treatment with intranasal OT enhanced learning performance on a feedback-guided item-category association task where either social (smiling and angry faces) or nonsocial (green and red lights) reinforcers were used, and second whether it increased either cognitive or emotional empathy measured by the Multifaceted Empathy Test. Further experiments investigated whether OT-sensitive behavioral components required a normal functional amygdala. Results in control groups showed that learning performance was improved when social rather than nonsocial reinforcement was used. Intranasal OT potentiated this social reinforcement advantage and greatly increased emotional, but not cognitive, empathy in response to both positive and negative valence stimuli. Interestingly, after OT treatment, emotional empathy responses in men were raised to levels similar to those found in untreated women. Two patients with selective bilateral damage to the amygdala (monozygotic twins with congenital Urbach-Wiethe disease) were impaired on both OT-sensitive aspects of these learning and empathy tasks, but performed normally on nonsocially reinforced learning and cognitive empathy. Overall these findings provide the first demonstration that OT can facilitate amygdala-dependent, socially reinforced learning and emotional empathy in men.

  9. Adolescent-specific patterns of behavior and neural activity during social reinforcement learning.

    Science.gov (United States)

    Jones, Rebecca M; Somerville, Leah H; Li, Jian; Ruberry, Erika J; Powers, Alisa; Mehta, Natasha; Dyke, Jonathan; Casey, B J

    2014-06-01

    Humans are sophisticated social beings. Social cues from others are exceptionally salient, particularly during adolescence. Understanding how adolescents interpret and learn from variable social signals can provide insight into the observed shift in social sensitivity during this period. The present study tested 120 participants between the ages of 8 and 25 years on a social reinforcement learning task where the probability of receiving positive social feedback was parametrically manipulated. Seventy-eight of these participants completed the task during fMRI scanning. Modeling trial-by-trial learning, children and adults showed higher positive learning rates than did adolescents, suggesting that adolescents demonstrated less differentiation in their reaction times for peers who provided more positive feedback. Forming expectations about receiving positive social reinforcement correlated with neural activity within the medial prefrontal cortex and ventral striatum across age. Adolescents, unlike children and adults, showed greater insular activity during positive prediction error learning and increased activity in the supplementary motor cortex and the putamen when receiving positive social feedback regardless of the expected outcome, suggesting that peer approval may motivate adolescents toward action. While different amounts of positive social reinforcement enhanced learning in children and adults, all positive social reinforcement equally motivated adolescents. Together, these findings indicate that sensitivity to peer approval during adolescence goes beyond simple reinforcement theory accounts and suggest possible explanations for how peers may motivate adolescent behavior.

  10. Learning Control of Fixed-Wing Unmanned Aerial Vehicles Using Fuzzy Neural Networks

    Directory of Open Access Journals (Sweden)

    Erdal Kayacan

    2017-01-01

    Full Text Available A learning control strategy is preferred for the control and guidance of a fixed-wing unmanned aerial vehicle to deal with lack of modeling and flight uncertainties. For learning the plant model as well as changing working conditions online, a fuzzy neural network (FNN) is used in parallel with a conventional P (proportional) controller. Among the learning algorithms in the literature, a derivative-free one, the sliding mode control (SMC) theory-based learning algorithm, is preferred as it has been proved to be computationally efficient in real-time applications. Its proven robustness and finite time converging nature make the learning algorithm appropriate for controlling an unmanned aerial vehicle as the computational power is always limited in unmanned aerial vehicles (UAVs). The parameter update rules and stability conditions of the learning are derived, and the proof of the stability of the learning algorithm is shown by using a candidate Lyapunov function. Intensive simulations are performed to illustrate the applicability of the proposed controller, which includes the tracking of a three-dimensional trajectory by the UAV subject to time-varying wind conditions. The simulation results show the efficiency of the proposed control algorithm, especially in real-time control systems, because of its computational efficiency.

  11. Temporal Memory Reinforcement Learning for the Autonomous Micro-mobile Robot Based-behavior

    Institute of Scientific and Technical Information of China (English)

    Yang Yujun(杨玉君); Cheng Junshi; Chen Jiapin; Li Xiaohai

    2004-01-01

    This paper presents temporal memory reinforcement learning for behavior-based autonomous micro-mobile robots. Human beings have a memory oblivion process, i.e. the earlier something is memorized, the earlier it is forgotten; only repeated things are remembered firmly. Enlightened by this, the robot need not memorize all the past states, which at the same time economizes the EMS memory space, which is limited in the MPU of our AMRobot. The proposed algorithm is an extension of Q-learning, which is an incremental reinforcement learning method. The simulation results have shown that the algorithm is valid.
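
    Since the method above is an extension of Q-learning, a plain tabular Q-learning update is sketched here for reference (Python; the temporal-memory/forgetting extension itself is not reproduced, and the state and action sizes are hypothetical):

      import numpy as np

      def q_update(Q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
          """One incremental Q-learning step: Q(s,a) += alpha * TD error."""
          td_target = reward + gamma * Q[s_next].max()
          Q[s, a] += alpha * (td_target - Q[s, a])
          return Q

      Q = np.zeros((5, 3))                  # hypothetical 5 states x 3 actions
      Q = q_update(Q, s=0, a=2, reward=1.0, s_next=1)
      print(Q[0, 2])                        # 0.1 after a single rewarded step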

  12. Batch Mode Reinforcement Learning based on the Synthesis of Artificial Trajectories.

    Science.gov (United States)

    Fonteneau, Raphael; Murphy, Susan A; Wehenkel, Louis; Ernst, Damien

    2013-09-01

    In this paper, we consider the batch mode reinforcement learning setting, where the central problem is to learn from a sample of trajectories a policy that satisfies or optimizes a performance criterion. We focus on the continuous state space case for which usual resolution schemes rely on function approximators either to represent the underlying control problem or to represent its value function. As an alternative to the use of function approximators, we rely on the synthesis of "artificial trajectories" from the given sample of trajectories, and show that this idea opens new avenues for designing and analyzing algorithms for batch mode reinforcement learning.

  13. Striatum-medial prefrontal cortex connectivity predicts developmental changes in reinforcement learning

    NARCIS (Netherlands)

    van den Bos, W.; Cohen, M.X.; Kahnt, T.; Crone, E.A.

    2012-01-01

    During development, children improve in learning from feedback to adapt their behavior. However, it is still unclear which neural mechanisms might underlie these developmental changes. In the current study, we used a reinforcement learning model to investigate neurodevelopmental changes in the repre

  14. When, What, and How Much to Reward in Reinforcement Learning-Based Models of Cognition

    NARCIS (Netherlands)

    Janssen, Christian P.; Gray, Wayne D.

    Reinforcement learning approaches to cognitive modeling represent task acquisition as learning to choose the sequence of steps that accomplishes the task while maximizing a reward. However, an apparently unrecognized problem for modelers is choosing when, what, and how much to reward; that is, when

  15. When, What, and How Much to Reward in Reinforcement Learning-Based Models of Cognition

    NARCIS (Netherlands)

    Janssen, Christian P.; Gray, Wayne D.

    2012-01-01

    Reinforcement learning approaches to cognitive modeling represent task acquisition as learning to choose the sequence of steps that accomplishes the task while maximizing a reward. However, an apparently unrecognized problem for modelers is choosing when, what, and how much to reward; that is, when

  16. Video Demo: Deep Reinforcement Learning for Coordination in Traffic Light Control

    NARCIS (Netherlands)

    van der Pol, E.; Oliehoek, F.A.; Bosse, T.; Bredeweg, B.

    2016-01-01

    This video demonstration contrasts two approaches to coordination in traffic light control using reinforcement learning: earlier work, based on a deconstruction of the state space into a linear combination of vehicle states, and our own approach based on the Deep Q-learning algorithm.

  17. When, What, and How Much to Reward in Reinforcement Learning-Based Models of Cognition

    Science.gov (United States)

    Janssen, Christian P.; Gray, Wayne D.

    2012-01-01

    Reinforcement learning approaches to cognitive modeling represent task acquisition as learning to choose the sequence of steps that accomplishes the task while maximizing a reward. However, an apparently unrecognized problem for modelers is choosing when, what, and how much to reward; that is, when (the moment: end of trial, subtask, or some other…

  18. An Evaluation of Pedagogical Tutorial Tactics for a Natural Language Tutoring System: A Reinforcement Learning Approach

    Science.gov (United States)

    Chi, Min; VanLehn, Kurt; Litman, Diane; Jordan, Pamela

    2011-01-01

    Pedagogical strategies are policies for a tutor to decide the next action when there are multiple actions available. When the content is controlled to be the same across experimental conditions, there has been little evidence that tutorial decisions have an impact on students' learning. In this paper, we applied Reinforcement Learning (RL) to…

  19. Adaptive Design of Role Differentiation by Division of Reward Function in Multi-Agent Reinforcement Learning

    Science.gov (United States)

    Taniguchi, Tadahiro; Tabuchi, Kazuma; Sawaragi, Tetsuo

    There are several problems which discourage an organization from achieving tasks, e.g., partial observation, credit assignment, and concurrent learning in multi-agent reinforcement learning. In many conventional approaches, each agent estimates hidden states, e.g., sensor inputs, positions, and policies of other agents, and reduces the uncertainty in the partially-observable Markov decision process (POMDP), which partially solves the multi-agent reinforcement learning problem. In contrast, people reduce uncertainty in human organizations in the real world by autonomously dividing the roles played by individual agents. In a framework of reinforcement learning, roles are mainly represented by goals for individual agents. This paper presents a method for generating internal rewards from manager agents to worker agents. It also explicitly divides the roles, which enables a POMDP task for each agent to be transformed into a simple MDP task under certain conditions. Several situational experiments are also described and the validity of the proposed method is evaluated.

  20. Learning control of inverted pendulum system by neural network driven fuzzy reasoning: The learning function of NN-driven fuzzy reasoning under changes of reasoning environment

    Science.gov (United States)

    Hayashi, Isao; Nomura, Hiroyoshi; Wakami, Noboru

    1991-01-01

    Whereas conventional fuzzy reasoning suffers from tuning problems, namely the lack of systematic designs for membership functions and inference rules, a neural network driven fuzzy reasoning (NDF) capable of determining membership functions by neural networks is formulated. In the antecedent parts of the neural network driven fuzzy reasoning, the optimum membership function is determined by a neural network, while in the consequent parts, the amount of control for each rule is determined by several other neural networks. By applying the neural network driven fuzzy reasoning algorithm, inference rules for making a pendulum stand up from its lowest suspended point are determined, verifying the usefulness of the algorithm.
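
    A structural sketch of the architecture described above (Python, forward pass only with random untrained weights; the paper's training procedure and the pendulum plant are not reproduced, and the network sizes are invented): one network plays the antecedent role by producing rule membership degrees, one small network per rule produces that rule's control amount, and the final control is the membership-weighted mean.

      import numpy as np

      rng = np.random.default_rng(3)
      n_inputs, n_rules, n_hidden = 2, 3, 5

      def mlp(x, W1, W2, squash):
          return squash(np.tanh(x @ W1) @ W2)

      # Antecedent network: state -> membership degree of each rule (softmax).
      Wa1 = rng.normal(size=(n_inputs, n_hidden))
      Wa2 = rng.normal(size=(n_hidden, n_rules))
      # Consequent networks: state -> scalar control amount for each rule.
      Wc = [(rng.normal(size=(n_inputs, n_hidden)), rng.normal(size=(n_hidden, 1)))
            for _ in range(n_rules)]

      def control(state):
          logits = mlp(state, Wa1, Wa2, squash=lambda z: z)
          mu = np.exp(logits - logits.max()); mu /= mu.sum()            # memberships
          u_rules = np.array([mlp(state, W1, W2, squash=np.tanh)[0] for W1, W2 in Wc])
          return float(mu @ u_rules)                                    # weighted mean

      print(control(np.array([0.1, -0.3])))   # control for a hypothetical state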

  1. Distributed adaptive fuzzy iterative learning control of coordination problems for higher order multi-agent systems

    Science.gov (United States)

    Li, Jinsha; Li, Junmin

    2016-07-01

    In this paper, an adaptive fuzzy iterative learning control scheme is proposed for coordination problems of Mth order (M ≥ 2) distributed multi-agent systems. Every follower agent has a higher order integrator with unknown nonlinear dynamics and input disturbance. The dynamics of the leader are a higher order nonlinear system and are only available to a portion of the follower agents. With distributed initial state learning, the unified distributed protocols, combining time-domain and iteration-domain adaptive laws, guarantee that the follower agents track the leader uniformly on [0, T]. Then, the proposed algorithm is extended to achieve formation control. A numerical example and a multiple robotic system are provided to demonstrate the performance of the proposed approach.

  2. AVALIAÇÃO DA QUALIDADE DO E-LEARNING: USO DO FUZZY SERVQUAL

    Directory of Open Access Journals (Sweden)

    Nara Medianeira Stefano

    2017-06-01

    Full Text Available This article aimed to evaluate the quality of e-learning in a company that uses it as a training tool for its employees. For this purpose, the SERVQUAL scale integrated with fuzzy set theory (FSERVQUAL) was used. A research instrument (expectation versus perception) with six criteria and twenty-two subcriteria was developed and validated. The results showed that, in general, there are gaps that can be improved, mainly with respect to the behavioral attitudes of the instructor. In this way, the study is expected to help improve the e-learning used in the company and thus have employees who are more motivated to perform their tasks.
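
    A generic sketch of how a fuzzy SERVQUAL gap can be computed (Python; the article's linguistic scale, criteria weights, and aggregation are not reproduced, and the ratings below are invented): expectation and perception are modelled as triangular fuzzy numbers (l, m, u), the gap is perception minus expectation, and the fuzzy gap is defuzzified with the simple centroid (l + m + u) / 3.

      def tfn_subtract(p, e):
          """(l, m, u) difference of two triangular fuzzy numbers: P (-) E."""
          return (p[0] - e[2], p[1] - e[1], p[2] - e[0])

      def defuzzify(t):
          return sum(t) / 3.0

      # Hypothetical ratings on a 1-7 linguistic scale for one subcriterion.
      expectation = (5.0, 6.0, 7.0)   # "high" expectation
      perception = (3.0, 4.0, 5.0)    # "moderate" perceived quality

      gap = tfn_subtract(perception, expectation)
      print("fuzzy gap:", gap, " crisp gap:", defuzzify(gap))   # negative gap = quality shortfall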

  3. A Framework for Hierarchical Perception-Action Learning Utilizing Fuzzy Reasoning.

    Science.gov (United States)

    Windridge, David; Felsberg, Michael; Shaukat, Affan

    2013-02-01

    Perception-action (P-A) learning is an approach to cognitive system building that seeks to reduce the complexity associated with conventional environment-representation/action-planning approaches. Instead, actions are directly mapped onto the perceptual transitions that they bring about, eliminating the need for intermediate representation and significantly reducing training requirements. We here set out a very general learning framework for cognitive systems in which online learning of the P-A mapping may be conducted within a symbolic processing context, so that complex contextual reasoning can influence the P-A mapping. In utilizing a variational calculus approach to define a suitable objective function, the P-A mapping can be treated as an online learning problem via gradient descent using partial derivatives. Our central theoretical result is to demonstrate top-down modulation of low-level perceptual confidences via the Jacobian of the higher levels of a subsumptive P-A hierarchy. Thus, the separation of the Jacobian as a multiplying factor between levels within the objective function naturally enables the integration of abstract symbolic manipulation in the form of fuzzy deductive logic into the P-A mapping learning. We experimentally demonstrate that the resulting framework achieves significantly better accuracy than using P-A learning without top-down modulation. We also demonstrate that it permits novel forms of context-dependent multilevel P-A mapping, applying the mechanism in the context of an intelligent driver assistance system.

  4. A Classification Model and an Open E-Learning System Based on Intuitionistic Fuzzy Sets for Instructional Design Concepts

    Science.gov (United States)

    Güyer, Tolga; Aydogdu, Seyhmus

    2016-01-01

    This study suggests a classification model and an e-learning system based on this model for all instructional theories, approaches, models, strategies, methods, and technics being used in the process of instructional design that constitutes a direct or indirect resource for educational technology based on the theory of intuitionistic fuzzy sets…

  5. The drift diffusion model as the choice rule in reinforcement learning.

    Science.gov (United States)

    Pedersen, Mads Lund; Frank, Michael J; Biele, Guido

    2016-12-13

    Current reinforcement-learning models often assume simplified decision processes that do not fully reflect the dynamic complexities of choice processes. Conversely, sequential-sampling models of decision making account for both choice accuracy and response time, but assume that decisions are based on static decision values. To combine these two computational models of decision making and learning, we implemented reinforcement-learning models in which the drift diffusion model describes the choice process, thereby capturing both within- and across-trial dynamics. To exemplify the utility of this approach, we quantitatively fit data from a common reinforcement-learning paradigm using hierarchical Bayesian parameter estimation, and compared model variants to determine whether they could capture the effects of stimulant medication in adult patients with attention-deficit hyperactivity disorder (ADHD). The model with the best relative fit provided a good description of the learning process, choices, and response times. A parameter recovery experiment showed that the hierarchical Bayesian modeling approach enabled accurate estimation of the model parameters. The model approach described here, using simultaneous estimation of reinforcement-learning and drift diffusion model parameters, shows promise for revealing new insights into the cognitive and neural mechanisms of learning and decision making, as well as the alteration of such processes in clinical groups.
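
    A minimal sketch of this model class (Python; not the authors' hierarchical Bayesian implementation, and all parameter values and reward probabilities are invented): the drift rate of a two-boundary diffusion process is set by the difference of the learned values, the boundary that is hit gives the choice and the crossing time gives the response time, and the chosen option's value is then updated with a delta rule.

      import numpy as np

      rng = np.random.default_rng(4)
      Q = np.array([0.5, 0.5])                 # values of options A and B
      alpha, scaling, bound, dt, ndt = 0.1, 3.0, 1.0, 0.001, 0.3

      def ddm_choice(v):
          """Simulate one diffusion-to-bound decision with drift v."""
          x, t = 0.0, 0.0
          while abs(x) < bound:
              x += v * dt + rng.normal(0.0, np.sqrt(dt))
              t += dt
          return (0 if x >= bound else 1), t + ndt   # (choice, response time)

      true_p = np.array([0.8, 0.2])            # hypothetical reward probabilities
      for trial in range(200):
          choice, rt = ddm_choice(scaling * (Q[0] - Q[1]))
          reward = float(rng.random() < true_p[choice])
          Q[choice] += alpha * (reward - Q[choice])   # delta-rule value update

      print("learned values:", np.round(Q, 2))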

  6. Short-term memory traces for action bias in human reinforcement learning.

    Science.gov (United States)

    Bogacz, Rafal; McClure, Samuel M; Li, Jian; Cohen, Jonathan D; Montague, P Read

    2007-06-11

    Recent experimental and theoretical work on reinforcement learning has shed light on the neural bases of learning from rewards and punishments. One fundamental problem in reinforcement learning is the credit assignment problem, or how to properly assign credit to actions that lead to reward or punishment following a delay. Temporal difference learning solves this problem, but its efficiency can be significantly improved by the addition of eligibility traces (ET). In essence, ETs function as decaying memories of previous choices that are used to scale synaptic weight changes. It has been shown in theoretical studies that ETs spanning a number of actions may improve the performance of reinforcement learning. However, it remains an open question whether including ETs that persist over sequences of actions allows reinforcement learning models to better fit empirical data regarding the behaviors of humans and other animals. Here, we report an experiment in which human subjects performed a sequential economic decision game in which the long-term optimal strategy differed from the strategy that leads to the greatest short-term return. We demonstrate that human subjects' performance in the task is significantly affected by the time between choices in a surprising and seemingly counterintuitive way. However, this behavior is naturally explained by a temporal difference learning model which includes ETs persisting across actions. Furthermore, we review recent findings that suggest that short-term synaptic plasticity in dopamine neurons may provide a realistic biophysical mechanism for producing ETs that persist on a timescale consistent with behavioral observations.
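
    For reference, the generic eligibility-trace mechanism the abstract builds on can be sketched as follows (Python, a SARSA(lambda)-style update on an invented tabular problem; this is the textbook mechanism, not the authors' task model): traces mark recently visited state-action pairs and scale how much of the current TD error is credited back to them.

      import numpy as np

      n_states, n_actions = 6, 2
      Q = np.zeros((n_states, n_actions))
      E = np.zeros_like(Q)                      # eligibility traces
      alpha, gamma, lam = 0.1, 0.95, 0.8

      def sarsa_lambda_step(s, a, r, s_next, a_next):
          global E
          delta = r + gamma * Q[s_next, a_next] - Q[s, a]   # TD error
          E[s, a] += 1.0                                     # accumulate trace
          Q[:] += alpha * delta * E                          # credit all traced pairs
          E *= gamma * lam                                   # decay traces

      # One hypothetical transition: every state-action pair with a non-zero trace,
      # not just (s=2, a=1), would share in this update.
      sarsa_lambda_step(s=2, a=1, r=1.0, s_next=3, a_next=0)
      print(Q[2, 1])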

  7. Bridging the Gap between Reinforcement Learning and Knowledge Representation: A Logical Off- and On-Policy Framework

    CERN Document Server

    Saad, Emad

    2010-01-01

    Knowledge representation is an important issue in reinforcement learning. In this paper, we bridge the gap between reinforcement learning and knowledge representation by providing a rich knowledge representation framework, based on normal logic programs with answer set semantics, that is capable of solving model-free reinforcement learning problems for more complex domains and exploits domain-specific knowledge. We prove the correctness of our approach. We show that the complexity of finding an offline and online policy for a model-free reinforcement learning problem in our approach is NP-complete. Moreover, we show that any model-free reinforcement learning problem in an MDP environment can be encoded as a SAT problem. The importance of that is model-free reinforcement

  8. Genetic triple dissociation reveals multiple roles for dopamine in reinforcement learning.

    Science.gov (United States)

    Frank, Michael J; Moustafa, Ahmed A; Haughey, Heather M; Curran, Tim; Hutchison, Kent E

    2007-10-09

    What are the genetic and neural components that support adaptive learning from positive and negative outcomes? Here, we show with genetic analyses that three independent dopaminergic mechanisms contribute to reward and avoidance learning in humans. A polymorphism in the DARPP-32 gene, associated with striatal dopamine function, predicted relatively better probabilistic reward learning. Conversely, the C957T polymorphism of the DRD2 gene, associated with striatal D2 receptor function, predicted the degree to which participants learned to avoid choices that had been probabilistically associated with negative outcomes. The Val/Met polymorphism of the COMT gene, associated with prefrontal cortical dopamine function, predicted participants' ability to rapidly adapt behavior on a trial-to-trial basis. These findings support a neurocomputational dissociation between striatal and prefrontal dopaminergic mechanisms in reinforcement learning. Computational maximum likelihood analyses reveal independent gene effects on three reinforcement learning parameters that can explain the observed dissociations.

  9. EFFICIENT SPECTRUM UTILIZATION IN COGNITIVE RADIO THROUGH REINFORCEMENT LEARNING

    Directory of Open Access Journals (Sweden)

    Dhananjay Kumar

    2013-09-01

    Full Text Available Machine learning schemes can be employed in cognitive radio systems to intelligently locate spectrum holes with some knowledge about the operating environment. In this paper, we formulate a variation of the Actor Critic Learning algorithm known as the Continuous Actor Critic Learning Automaton (CACLA) and compare this scheme with the Actor Critic Learning scheme and an existing Q-learning scheme. Simulation results show that our CACLA scheme has a lower execution time and achieves higher throughput compared to the other two schemes.
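
    The CACLA rule mentioned above has a compact core: the critic tracks expected reward, and the actor's continuous output is pulled toward an explored action only when the temporal-difference error is positive. The following single-state sketch illustrates that core idea only; the toy reward function stands in for channel quality and is not part of the paper's cognitive-radio formulation.

        import random

        # Minimal single-state sketch of the CACLA idea: the critic tracks expected reward V,
        # and the actor's continuous output moves toward an explored action only when that
        # action produced a positive temporal-difference error.
        def reward(a):                     # toy reward, peaked at a = 0.7 (stand-in for channel quality)
            return -(a - 0.7) ** 2

        actor, V = 0.0, 0.0                # actor output and critic estimate
        ALPHA_A, ALPHA_V, SIGMA = 0.05, 0.1, 0.1

        for t in range(2000):
            a = actor + random.gauss(0.0, SIGMA)   # Gaussian exploration around the actor output
            r = reward(a)
            delta = r - V                          # TD error (no successor state in this toy setting)
            V += ALPHA_V * delta
            if delta > 0:                          # CACLA: update the actor only on positive TD error
                actor += ALPHA_A * (a - actor)

        print(round(actor, 2))                     # drifts toward 0.7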

  10. Deficits in reinforcement learning but no link to apathy in patients with schizophrenia.

    Science.gov (United States)

    Hartmann-Riemer, Matthias N; Aschenbrenner, Steffen; Bossert, Magdalena; Westermann, Celina; Seifritz, Erich; Tobler, Philippe N; Weisbrod, Matthias; Kaiser, Stefan

    2017-01-10

    Negative symptoms in schizophrenia have been linked to selective reinforcement learning deficits in the context of gains combined with intact loss-avoidance learning. Fundamental mechanisms of reinforcement learning and choice are prediction error signaling and the precise representation of reward value for future decisions. It is unclear which of these mechanisms contribute to the impairments in learning from positive outcomes observed in schizophrenia. A recent study suggested that patients with severe apathy symptoms show deficits in the representation of expected value. Considering the fundamental relevance for the understanding of these symptoms, we aimed to assess the stability of these findings across studies. Sixty-four patients with schizophrenia and 19 healthy control participants performed a probabilistic reward learning task. They had to associate stimuli with gain or loss-avoidance. In a transfer phase participants indicated valuation of the previously learned stimuli by choosing among them. Patients demonstrated an overall impairment in learning compared to healthy controls. No effects of apathy symptoms on task indices were observed. However, patients with schizophrenia learned better in the context of loss-avoidance than in the context of gain. Earlier findings were thus partially replicated. Further studies are needed to clarify the mechanistic link between negative symptoms and reinforcement learning.

  11. Study and application of reinforcement learning based on DAI in cooperative strategy of robot soccer

    Institute of Scientific and Technical Information of China (English)

    GUO Qi; ZHANG Da-zhi; YANG Yong-tian

    2009-01-01

    A dynamic cooperation model of multi-agent is established by combining reinforcement learning with distributed artificial intelligence (DAI), in which the concept of individual optimization loses its meaning because of the dependence of repayment on each agent itself and the choice of other agents. Utilizing the idea of DAI, the intellectual unit of each robot and the change of task and environment, each agent can make decisions independently and finish various complicated tasks by communication and reciprocation between each other. The method is superior to other reinforcement learning methods commonly used in the multi-agent system. It can improve the convergence velocity of reinforcement learning, decrease requirements of computer memory, and enhance the capability of computing and logical ratiocinating for agent. The result of a simulated robot soccer match proves that the proposed cooperative strategy is valid.

  12. Neural correlates of reinforcement learning and social preferences in competitive bidding.

    Science.gov (United States)

    van den Bos, Wouter; Talwar, Arjun; McClure, Samuel M

    2013-01-30

    In competitive social environments, people often deviate from what rational choice theory prescribes, resulting in losses or suboptimal monetary gains. We investigate how competition affects learning and decision-making in a common value auction task. During the experiment, groups of five human participants were simultaneously scanned using MRI while playing the auction task. We first demonstrate that bidding is well characterized by reinforcement learning with biased reward representations dependent on social preferences. Indicative of reinforcement learning, we found that estimated trial-by-trial prediction errors correlated with activity in the striatum and ventromedial prefrontal cortex. Additionally, we found that individual differences in social preferences were related to activity in the temporal-parietal junction and anterior insula. Connectivity analyses suggest that monetary and social value signals are integrated in the ventromedial prefrontal cortex and striatum. Based on these results, we argue for a novel mechanistic account for the integration of reinforcement history and social preferences in competitive decision-making.

  13. Partial reinforcement and context switch effects in human predictive learning.

    Science.gov (United States)

    Abad, María J F; Ramos-Alvarez, Manuel M; Rosas, Juan M

    2009-01-01

    Human participants were trained in a trial-by-trial contingency judgements task in which they had to predict the probability of an outcome (diarrhoea) following different cues (food names) in different contexts (restaurants). Cue P was paired with the outcome on half of the trials (partial reinforcement), while cue C was paired with the outcome on all the trials (continuous reinforcement), both cues in Context A. Test was conducted in both Context A and a different but equally familiar context (B). Context change decreased judgements to C, but not to P (Experiment 1). This effect was found only in the cue trained in the context where a different cue was partially reinforced (Experiment 2). Context switch effects disappeared when different cues received partial reinforcement in both contexts of training (Experiment 3). The implications of these results for an explanation of context switch effects in terms of ambiguity in the meaning of the cues prompting attention to the context (e.g., Bouton, 1997) are discussed.

  14. Adversarial Reinforcement Learning in a Cyber Security Simulation

    NARCIS (Netherlands)

    Elderman, Richard; Pater, Leon; Thie, Albert; Drugan, Madalina; Wiering, Marco

    2017-01-01

    This paper focuses on cyber-security simulations in networks modeled as a Markov game with incomplete information and stochastic elements. The resulting game is an adversarial sequential decision making problem played with two agents, the attacker and defender. The two agents pit one reinforcement l

  15. Antipsychotic dose modulates behavioral and neural responses to feedback during reinforcement learning in schizophrenia.

    Science.gov (United States)

    Insel, Catherine; Reinen, Jenna; Weber, Jochen; Wager, Tor D; Jarskog, L Fredrik; Shohamy, Daphna; Smith, Edward E

    2014-03-01

    Schizophrenia is characterized by an abnormal dopamine system, and dopamine blockade is the primary mechanism of antipsychotic treatment. Consistent with the known role of dopamine in reward processing, prior research has demonstrated that patients with schizophrenia exhibit impairments in reward-based learning. However, it remains unknown how treatment with antipsychotic medication impacts the behavioral and neural signatures of reinforcement learning in schizophrenia. The goal of this study was to examine whether antipsychotic medication modulates behavioral and neural responses to prediction error coding during reinforcement learning. Patients with schizophrenia completed a reinforcement learning task while undergoing functional magnetic resonance imaging. The task consisted of two separate conditions in which participants accumulated monetary gain or avoided monetary loss. Behavioral results indicated that antipsychotic medication dose was associated with altered behavioral approaches to learning, such that patients taking higher doses of medication showed increased sensitivity to negative reinforcement. Higher doses of antipsychotic medication were also associated with higher learning rates (LRs), suggesting that medication enhanced sensitivity to trial-by-trial feedback. Neuroimaging data demonstrated that antipsychotic dose was related to differences in neural signatures of feedback prediction error during the loss condition. Specifically, patients taking higher doses of medication showed attenuated prediction error responses in the striatum and the medial prefrontal cortex. These findings indicate that antipsychotic medication treatment may influence motivational processes in patients with schizophrenia.

  16. Integration of reinforcement learning and optimal decision-making theories of the basal ganglia.

    Science.gov (United States)

    Bogacz, Rafal; Larsen, Tobias

    2011-04-01

    This article seeks to integrate two sets of theories describing action selection in the basal ganglia: reinforcement learning theories describing learning which actions to select to maximize reward and decision-making theories proposing that the basal ganglia selects actions on the basis of sensory evidence accumulated in the cortex. In particular, we present a model that integrates the actor-critic model of reinforcement learning and a model assuming that the cortico-basal-ganglia circuit implements a statistically optimal decision-making procedure. The values of cortico-striatal weights required for optimal decision making in our model differ from those provided by standard reinforcement learning models. Nevertheless, we show that an actor-critic model converges to the weights required for optimal decision making when biologically realistic limits on synaptic weights are introduced. We also describe the model's predictions concerning reaction times and neural responses during learning, and we discuss directions required for further integration of reinforcement learning and optimal decision-making theories.

  17. A FUZZY ONTOLOGY LEARNING METHOD BASED ON FUZZY FORMAL CONCEPT ANALYSIS

    Institute of Scientific and Technical Information of China (English)

    马迪; 李冠宇

    2014-01-01

    Fuzzy ontology is an important tool for dealing with fuzzy information in the semantic Web, while fuzzy ontology learning is an effective method for constructing a fuzzy ontology; it has therefore gradually become a focus of current ontology research. As another graph-structured manifestation of fuzzy ontology, the construction and evolution of the fuzzy concept lattice have also increasingly attracted scholars' attention. Fuzzy formal concept analysis is a new model which employs a fuzzy formal context to represent formal concepts; it is the integration of fuzzy set theory and formal concept analysis, and its major manifestation is the fuzzy concept lattice. This fuzzy concept hierarchical structure is an effective tool for data analysis and rule extraction and supports inter-concept similarity calculation. With the intention of acquiring fuzzy concepts and fuzzy concept relations from domain documents, in this paper we propose a fuzzy ontology learning method which is based on fuzzy formal concept analysis, and add them to the fuzzy concept lattice transformed from the source fuzzy

  18. Multi-agent reinforcement learning based on policies of global objective

    Institute of Scientific and Technical Information of China (English)

    2005-01-01

    In general-sum games, taking all agents' collective rationality into account, we define the agents' global objective and propose a novel multi-agent reinforcement learning (RL) algorithm based on a global policy. In each learning step, all agents commit to selecting the global policy to achieve the global goal. We prove that this learning algorithm converges given certain restrictions on the stage games of learned Q values, and show that it has considerably lower computational time complexity than previously developed multi-agent learning algorithms for general-sum games. An example is analyzed to show the algorithm's merits.
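
    The abstract gives no implementation details, but the "global objective" idea can be illustrated with a hedged sketch: in a two-agent repeated matrix game, each agent keeps Q-values over joint actions, and the joint action maximizing the sum of both agents' values is selected. The payoff matrix and parameters below are invented for the example and are not taken from the paper.

        import random

        # Hedged sketch of selecting actions by a global objective: the joint action is chosen
        # to maximize the SUM of both agents' Q-values (collective rationality) rather than
        # each agent's own value. Payoffs below are illustrative only.
        ACTIONS = [0, 1]
        PAYOFF = {(0, 0): (3, 3), (0, 1): (0, 4), (1, 0): (4, 0), (1, 1): (1, 1)}

        Q = [{ja: 0.0 for ja in PAYOFF} for _ in range(2)]   # one Q-table per agent, over joint actions
        ALPHA, EPS = 0.1, 0.1

        for t in range(5000):
            if random.random() < EPS:
                ja = (random.choice(ACTIONS), random.choice(ACTIONS))
            else:                                            # global policy: maximize summed value
                ja = max(PAYOFF, key=lambda a: Q[0][a] + Q[1][a])
            r = PAYOFF[ja]
            for i in range(2):                               # stateless Q-update per agent
                Q[i][ja] += ALPHA * (r[i] - Q[i][ja])

        print(max(PAYOFF, key=lambda a: Q[0][a] + Q[1][a]))  # (0, 0), the welfare-maximizing joint action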

  19. Prune-able fuzzy ART neural architecture for robot map learning and navigation in dynamic environments.

    Science.gov (United States)

    Araújo, Rui

    2006-09-01

    Mobile robots must be able to build their own maps to navigate in unknown worlds. Expanding a previously proposed method based on the fuzzy ART neural architecture (FARTNA), this paper introduces a new online method for learning maps of unknown dynamic worlds. For this purpose the new Prune-able fuzzy adaptive resonance theory neural architecture (PAFARTNA) is introduced. It extends the FARTNA self-organizing neural network with novel mechanisms that provide important dynamic adaptation capabilities. Relevant PAFARTNA properties are formulated and demonstrated. A method is proposed for the perception of object removals, and then integrated with PAFARTNA. The proposed methods are integrated into a navigation architecture. With the new navigation architecture the mobile robot is able to navigate in changing worlds, and a degree of optimality is maintained, associated to a shortest path planning approach implemented in real-time over the underlying global world model. Experimental results obtained with a Nomad 200 robot are presented demonstrating the feasibility and effectiveness of the proposed methods.

  20. Reinforcement learning control with approximation of time-dependent agent dynamics

    Science.gov (United States)

    Kirkpatrick, Kenton Conrad

    Reinforcement Learning has received a lot of attention over the years for systems ranging from static game playing to dynamic system control. Using Reinforcement Learning for control of dynamical systems provides the benefit of learning a control policy without needing a model of the dynamics. This opens the possibility of controlling systems for which the dynamics are unknown, but Reinforcement Learning methods like Q-learning do not explicitly account for time. In dynamical systems, time-dependent characteristics can have a significant effect on the control of the system, so it is necessary to account for system time dynamics while not having to rely on a predetermined model for the system. In this dissertation, algorithms are investigated for expanding the Q-learning algorithm to account for the learning of sampling rates and dynamics approximations. For determining a proper sampling rate, it is desired to find the largest sample time that still allows the learning agent to control the system to goal achievement. An algorithm called Sampled-Data Q-learning is introduced for determining both this sample time and the control policy associated with that sampling rate. Results show that the algorithm is capable of achieving a desired sampling rate that allows for system control while not sampling "as fast as possible". Determining an approximation of an agent's dynamics can be beneficial for the control of hierarchical multiagent systems by allowing a high-level supervisor to use the dynamics approximations for task allocation decisions. To this end, algorithms are investigated for learning first- and second-order dynamics approximations. These algorithms are respectively called First-Order Dynamics Learning and Second-Order Dynamics Learning. The dynamics learning algorithms are evaluated on several examples that show their capability to learn accurate approximations of state dynamics. All of these algorithms are then evaluated on hierarchical multiagent systems

  1. Hippocampal lesions facilitate instrumental learning with delayed reinforcement but induce impulsive choice in rats

    Directory of Open Access Journals (Sweden)

    Cheung Timothy HC

    2005-05-01

    Full Text Available Abstract Background Animals must frequently act to influence the world even when the reinforcing outcomes of their actions are delayed. Learning with action-outcome delays is a complex problem, and little is known of the neural mechanisms that bridge such delays. When outcomes are delayed, they may be attributed to (or associated with) the action that caused them, or mistakenly attributed to other stimuli, such as the environmental context. Consequently, animals that are poor at forming context-outcome associations might learn action-outcome associations better with delayed reinforcement than normal animals. The hippocampus contributes to the representation of environmental context, being required for aspects of contextual conditioning. We therefore hypothesized that animals with hippocampal lesions would be better than normal animals at learning to act on the basis of delayed reinforcement. We tested the ability of hippocampal-lesioned rats to learn a free-operant instrumental response using delayed reinforcement, and what is potentially a related ability – the ability to exhibit self-controlled choice, or to sacrifice an immediate, small reward in order to obtain a delayed but larger reward. Results Rats with sham or excitotoxic hippocampal lesions acquired an instrumental response with different delays (0, 10, or 20 s) between the response and reinforcer delivery. These delays retarded learning in normal rats. Hippocampal-lesioned rats responded slightly less than sham-operated controls in the absence of delays, but they became better at learning (relative to shams) as the delays increased; delays impaired learning less in hippocampal-lesioned rats than in shams. In contrast, lesioned rats exhibited impulsive choice, preferring an immediate, small reward to a delayed, larger reward, even though they preferred the large reward when it was not delayed. Conclusion These results support the view that the hippocampus hinders action-outcome learning

  2. Reinforcement Learning Based LVQ Clustering Approach

    Institute of Scientific and Technical Information of China (English)

    程小平; 邱玉辉

    2002-01-01

    A reinforcement clustering framework composed of Bernoulli stochastic neural units is proposed in this paper. A reinforcement learning mechanism is introduced to LVQ clustering problems. The related algorithm, LVQ-R, is developed and its properties are analyzed in detail. The authors conclude that reinforcement learning can also be introduced to other on-line competitive clustering methods. Experiments show that LVQ-R has better performance than the original LVQ approach.

  3. The Multi-Agent System Based on Reinforcement Learning

    Institute of Scientific and Technical Information of China (English)

    唐文彬; 朱淼良

    2003-01-01

    Reinforcement learning allows agents that have no knowledge of an environment to cooperate with each other more effectively. This paper presents an approach for developing multi-agent reinforcement learning systems based on an equation principle. The experiments show that agents can produce the desired behavior under all kinds of situations.

  4. Recurrent fuzzy neural network by using feedback error learning approaches for LFC in interconnected power system

    Energy Technology Data Exchange (ETDEWEB)

    Sabahi, Kamel; Teshnehlab, Mohammad; Shoorhedeli, Mahdi Aliyari [Department of Electrical Engineering, K.N. Toosi University of Technology, Intelligent System Lab, Tehran (Iran)

    2009-04-15

    In this study, a new adaptive controller based on modified feedback error learning (FEL) approaches is proposed for load frequency control (LFC) problem. The FEL strategy consists of intelligent and conventional controllers in feedforward and feedback paths, respectively. In this strategy, a conventional feedback controller (CFC), i.e. proportional, integral and derivative (PID) controller, is essential to guarantee global asymptotic stability of the overall system; and an intelligent feedforward controller (INFC) is adopted to learn the inverse of the controlled system. Therefore, when the INFC learns the inverse of controlled system, the tracking of reference signal is done properly. Generally, the CFC is designed at nominal operating conditions of the system and, therefore, fails to provide the best control performance as well as global stability over a wide range of changes in the operating conditions of the system. So, in this study a supervised controller (SC), a lookup table based controller, is addressed for tuning of the CFC. During abrupt changes of the power system parameters, the SC adjusts the PID parameters according to these operating conditions. Moreover, for improving the performance of overall system, a recurrent fuzzy neural network (RFNN) is adopted in INFC instead of the conventional neural network, which was used in past studies. The proposed FEL controller has been compared with the conventional feedback error learning controller (CFEL) and the PID controller through some performance indices. (author)

  5. Extraversion differentiates between model-based and model-free strategies in a reinforcement learning task.

    Science.gov (United States)

    Skatova, Anya; Chan, Patricia A; Daw, Nathaniel D

    2013-01-01

    Prominent computational models describe a neural mechanism for learning from reward prediction errors, and it has been suggested that variations in this mechanism are reflected in personality factors such as trait extraversion. However, although trait extraversion has been linked to improved reward learning, it is not yet known whether this relationship is selective for the particular computational strategy associated with error-driven learning, known as model-free reinforcement learning, vs. another strategy, model-based learning, which the brain is also known to employ. In the present study we test this relationship by examining whether humans' scores on an extraversion scale predict individual differences in the balance between model-based and model-free learning strategies in a sequentially structured decision task designed to distinguish between them. In previous studies with this task, participants have shown a combination of both types of learning, but with substantial individual variation in the balance between them. In the current study, extraversion predicted worse behavior across both sorts of learning. However, the hypothesis that extraverts would be selectively better at model-free reinforcement learning held up among a subset of the more engaged participants, and overall, higher task engagement was associated with a more selective pattern by which extraversion predicted better model-free learning. The findings indicate a relationship between a broad personality orientation and detailed computational learning mechanisms. Results like those in the present study suggest an intriguing and rich relationship between core neuro-computational mechanisms and broader life orientations and outcomes.

  6. Reinforcing communication skills while registered nurses simultaneously learn course content: a response to learning needs.

    Science.gov (United States)

    DeSimone, B B

    1994-01-01

    This article describes the implementation and evaluation of Integrated Skills Reinforcement (ISR) in a baccalaureate nursing course entitled "Principles of Health Assessment" for 15 registered nurse students. ISR is a comprehensive teaching-learning approach that simultaneously reinforces student writing, reading, speaking, and listening skills while they learn course content. The purpose of this study was to assess the influence of ISR on writing skills and student satisfaction. A learner's guide and teacher's guide, created in advance by the teacher, described specific language activities and assignments that were implemented throughout the ISR course. During each class, the teacher promoted discussion, collaboration, and co-inquiry among students, using course content as the vehicle of exchange. Writing was assessed at the beginning and end of the course. The influence of ISR on the content, organization, sentence structure, tone, and strength of position of student writing was analyzed. Writing samples were scored by an independent evaluator trained in methods of holistic scoring. Ninety-three per cent (14 of 15 students) achieved writing growth from .5 to 1.5 points on a scale of 6 points. Student response to both the ISR approach and specific ISR activities was assessed by teacher-created surveys administered at the middle-end of the course. One hundred per cent of the students at the end of this project agreed that the ISR activities, specifically the writing and reading activities, helped them better understand the course content. These responses differed from evaluations written by the same students at the middle of the course. The ISR approach fostered analysis and communication through active collaboration, behaviors cited as critical for effective participation of nurses in today's complex health care environment.

  7. Application of self-learning proportion-integral fuzzy control to jig separator in the Ombilin coal mine, Indonesia

    Energy Technology Data Exchange (ETDEWEB)

    A. Tayaoka; K. Yoshino; Y. Jinnouchi; Y. Kubo; K. Okada; K. Suzuki [Kitakyushu National College of Technology (Japan)

    2005-07-01

    The coal preparation plant uses a variety of equipment, and one of the most important of them is the gravity-based separator. A proportional control system is used for controlling the tailing discharger of the separator. The operator manually adjusts the proportional control parameters while checking the tailings discharged from the separator outlet. Normally, however, these adjustments have to be made on a trial and error basis and are consequently very difficult to perform. To resolve this problem, we propose, as a result of this study, a proportional integral (PI) control system that features a self-learning function based on fuzzy logic. Normally, when fuzzy logic is used, it is necessary to determine the parameters for the fuzzy rules. In the proposed system, these fuzzy rule parameters are adjusted automatically and new rules are added when necessary. Learning with this structure of logic reasoning is accomplished by the steepest descent method. With this method, it is possible to generate rules even when there is no information available about the parameters at all. This study tries to model the separator jig system to investigate and develop a control system for it. Furthermore, we have demonstrated the validity of the proposed system, which is applied to the coal preparation plant in the Ombilin Coal Mine, on the basis of a number of experimental results. 6 refs., 11 figs.
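
    As a rough illustration of steepest-descent tuning of fuzzy rule parameters (not the plant model, rule base, or PI structure used in the Ombilin system), the sketch below adapts the singleton consequents of a two-rule fuzzy controller driving a first-order lag plant, under the simplifying assumption that the plant gain is positive.

        # Stripped-down illustration of steepest-descent tuning of fuzzy rule consequents.
        # Plant: first-order lag y[k+1] = 0.9*y[k] + 0.1*u[k]. Controller: two rules on the
        # tracking error ("error negative" / "error positive") with adjustable singleton
        # outputs. The update assumes a positive plant gain; the real rule base is richer.
        def memberships(e):                       # ramp membership functions on [-1, 1]
            pos = min(max((e + 1.0) / 2.0, 0.0), 1.0)
            return {"neg": 1.0 - pos, "pos": pos}

        w = {"neg": 0.0, "pos": 0.0}              # singleton consequents to be learned
        ETA, setpoint, y = 0.5, 1.0, 0.0

        for k in range(300):
            e = setpoint - y
            mu = memberships(e)
            u = sum(mu[r] * w[r] for r in w) / sum(mu.values())   # weighted-average defuzzification
            # steepest descent on 0.5*e^2: dE/dw_r is proportional to -e * mu_r (plant gain sign assumed +1)
            for r in w:
                w[r] += ETA * e * mu[r] / sum(mu.values())
            y = 0.9 * y + 0.1 * u                  # plant response

        print(round(y, 3), {r: round(v, 2) for r, v in w.items()})   # output settles near the setpoint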

  8. Improving Multi agent Systems Based on Reinforcement Learning and Case Base Reasoning

    Directory of Open Access Journals (Sweden)

    Sara Esfandiari

    2012-01-01

    Full Text Available In this paper, a new algorithm based on case-based reasoning and reinforcement learning is proposed to increase the convergence rate of Selfish Q-Learning algorithms in multi-agent systems. In the proposed method, we investigate how to improve action selection in the reinforcement learning (RL) algorithm: a new combined model using case-based reasoning and a new optimized selection function is proposed to choose the action, which increases the convergence rate of algorithms based on Selfish Q-learning. The algorithm has been used for solving cooperative Markov games as one of the models of Markov-based multi-agent systems. The results of experiments on two test grounds have shown that the proposed algorithm performs better than existing algorithms in terms of the speed and accuracy of reaching the optimal policy.

  9. An Improved Reinforcement Learning Algorithm for Cooperative Behaviors of Mobile Robots

    Directory of Open Access Journals (Sweden)

    Yong Song

    2014-01-01

    Full Text Available Reinforcement learning for multirobot systems becomes very slow as the number of robots increases, resulting in an exponential growth of the state space. A sequential Q-learning algorithm based on knowledge sharing is presented. The rule repository of robot behaviors is first initialized in the process of reinforcement learning. Mobile robots obtain the present environmental state through their sensors. The state is then matched against the database to determine whether a relevant behavior rule has already been stored. If the rule is present, an action is chosen in accordance with the stored knowledge and the matching weight is refined; otherwise, a new rule is appended to the database. The robots learn according to a given sequence and share the behavior database. We examine the algorithm on a multirobot following-surrounding behavior and find that the improved algorithm effectively accelerates convergence.
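
    A much-simplified sketch of sequential learning with knowledge sharing: robots take turns learning on a toy corridor task while reading and writing a single shared value table. The paper's explicit rule repository with matching weights is replaced here by a shared Q-table purely to illustrate the shared-knowledge idea; the task and parameters are invented for the example.

        import random

        # Robots learn one after another (sequentially, not simultaneously) and all share
        # the same knowledge base, so later robots start from what earlier robots learned.
        GOAL, ACTIONS = 6, (-1, 1)
        Q = {}                                               # shared knowledge base: (state, action) -> value

        def q(s, a):
            return Q.get((s, a), 0.0)

        def greedy(s):                                       # greedy action with random tie-breaking
            best = max(q(s, a) for a in ACTIONS)
            return random.choice([a for a in ACTIONS if q(s, a) == best])

        def run_episode(eps=0.1, alpha=0.5, gamma=0.9):
            s, steps = 0, 0
            while s != GOAL and steps < 200:
                a = random.choice(ACTIONS) if random.random() < eps else greedy(s)
                s2 = min(max(s + a, 0), GOAL)
                r = 1.0 if s2 == GOAL else 0.0
                Q[(s, a)] = q(s, a) + alpha * (r + gamma * max(q(s2, b) for b in ACTIONS) - q(s, a))
                s, steps = s2, steps + 1
            return steps

        for robot in range(3):                               # sequential learning over the shared table
            lengths = [run_episode() for _ in range(30)]
            print(f"robot {robot}: first episode {lengths[0]} steps, last {lengths[-1]} steps")

    Later robots typically reach the goal quickly even on their first episode, which is the effect the shared knowledge base is meant to produce.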

  10. Reinforcement function design and bias for efficient learning in mobile robots

    Energy Technology Data Exchange (ETDEWEB)

    Touzet, C. [Oak Ridge National Lab., TN (United States). Computer Science and Mathematics Div.; Santos, J.M. [Univ. of Buenos Aires (Argentina). Dept. Computacion

    1998-06-01

    The main paradigm in sub-symbolic learning robot domain is the reinforcement learning method. Various techniques have been developed to deal with the memorization/generalization problem, demonstrating the superior ability of artificial neural network implementations. In this paper, the authors address the issue of designing the reinforcement so as to optimize the exploration part of the learning. They also present and summarize works relative to the use of bias intended to achieve the effective synthesis of the desired behavior. Demonstrative experiments involving a self-organizing map implementation of the Q-learning and real mobile robots (Nomad 200 and Khepera) in a task of obstacle avoidance behavior synthesis are described. 3 figs., 5 tabs.

  11. The limits and robustness of reinforcement learning in Lewis signalling games

    Science.gov (United States)

    Catteeuw, David; Manderick, Bernard

    2014-04-01

    Lewis signalling games are a standard model to study the emergence of language. We introduce win-stay/lose-inaction, a random process that only updates behaviour on success and never deviates from what was once successful, prove that it always ends up in a state of optimal communication in all Lewis signalling games, and predict the number of interactions it needs to do so: N^3 interactions for Lewis signalling games with N equiprobable types. We show three reinforcement learning algorithms (Roth-Erev learning, Q-learning, and Learning Automata) that can imitate win-stay/lose-inaction and can even cope with errors in Lewis signalling games.
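
    Win-stay/lose-inaction is simple enough to state directly in code: behaviour is updated only after a success and is never changed once it has succeeded. The sketch below simulates it for a Lewis signalling game with N equiprobable types and counts the interactions needed to reach optimal communication; the abstract's N^3 prediction describes the expected order of this count. The simulation setup is an illustration, not the authors' code.

        import random

        # Win-stay/lose-inaction in a Lewis signalling game with n equiprobable types:
        # mappings are fixed on success and never abandoned; failures change nothing.
        def simulate(n):
            sender = {}                       # type   -> signal, fixed after first success
            receiver = {}                     # signal -> act,    fixed after first success
            t = 0
            while True:
                t += 1
                world = random.randrange(n)
                signal = sender.get(world, random.randrange(n))
                act = receiver.get(signal, random.randrange(n))
                if act == world:              # success: keep (and never abandon) what worked
                    sender[world] = signal
                    receiver[signal] = act
                # optimal communication: every type mapped and decoded correctly
                if len(sender) == n and all(receiver.get(sender[w]) == w for w in range(n)):
                    return t

        n = 4
        runs = [simulate(n) for _ in range(200)]
        print(sum(runs) / len(runs))          # on the order of n**3 interactions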

  12. Experiments with Online Reinforcement Learning in Real-Time Strategy Games

    DEFF Research Database (Denmark)

    Toftgaard Andersen, Kresten; Zeng, Yifeng; Dahl Christensen, Dennis

    2009-01-01

    Real-time strategy (RTS) games provide a challenging platform to implement online reinforcement learning (RL) techniques in a real application. Computer, as one game player, monitors opponents' (human or other computers) strategies and then updates its own policy using RL methods. In this article...

  13. Critical factors in the empirical performance of temporal difference and evolutionary methods for reinforcement learning

    NARCIS (Netherlands)

    Whiteson, S.; Taylor, M.E.; Stone, P.

    2010-01-01

    Temporal difference and evolutionary methods are two of the most common approaches to solving reinforcement learning problems. However, there is little consensus on their relative merits and there have been few empirical studies that directly compare their performance. This article aims to address

  14. Life Span Differences in Electrophysiological Correlates of Monitoring Gains and Losses during Probabilistic Reinforcement Learning

    Science.gov (United States)

    Hammerer, Dorothea; Li, Shu-Chen; Muller, Viktor; Lindenberger, Ulman

    2011-01-01

    By recording the feedback-related negativity (FRN) in response to gains and losses, we investigated the contribution of outcome monitoring mechanisms to age-associated differences in probabilistic reinforcement learning. Specifically, we assessed the difference of the monitoring reactions to gains and losses to investigate the monitoring of…

  15. Critical factors in the empirical performance of temporal difference and evolutionary methods for reinforcement learning

    NARCIS (Netherlands)

    Whiteson, S.; Taylor, M.E.; Stone, P.

    2010-01-01

    Temporal difference and evolutionary methods are two of the most common approaches to solving reinforcement learning problems. However, there is little consensus on their relative merits and there have been few empirical studies that directly compare their performance. This article aims to address t

  16. Critical factors in the empirical performance of temporal difference and evolutionary methods for reinforcement learning

    NARCIS (Netherlands)

    Whiteson, S.; Taylor, M.E.; Stone, P.

    2010-01-01

    Temporal difference and evolutionary methods are two of the most common approaches to solving reinforcement learning problems. However, there is little consensus on their relative merits and there have been few empirical studies that directly compare their performance. This article aims to address t

  17. Your Classroom as an Experiment in Education: The Reinforcement Theory of Learning

    Science.gov (United States)

    Fuller, Robert G.

    1976-01-01

    Presents the reinforcement theory of learning and explains how it relates to the Keller Plan of instruction. Advocates the use of the Keller Plan as an alternative to the lecture-demonstration system and provides an annotated bibliography of pertinent material. (GS)

  18. Bayes factors for reinforcement-learning models of the Iowa Gambling Task

    NARCIS (Netherlands)

    Steingroever, H.; Wetzels, R.; Wagenmakers, E.-J.

    2016-01-01

    The psychological processes that underlie performance on the Iowa gambling task (IGT) are often isolated with the help of reinforcement-learning (RL) models. The most popular method to compare RL models is the BIC post hoc fit criterion—a criterion that considers goodness-of-fit relative to model co

  19. Complexity of stochastic branch and bound methods for belief tree search in Bayesian reinforcement learning

    NARCIS (Netherlands)

    Dimitrakakis, C.; Filipe, J.; Fred, A.; Sharp, B.

    2010-01-01

    There has been a lot of recent work on Bayesian methods for reinforcement learning exhibiting near-optimal online performance. The main obstacle facing such methods is that in most problems of interest, the optimal solution involves planning in an infinitely large tree. However, it is possible to ob

  20. Complexity of stochastic branch and bound for belief tree search in Bayesian reinforcement learning

    NARCIS (Netherlands)

    Dimitrakakis, C.

    2009-01-01

    There has been a lot of recent work on Bayesian methods for reinforcement learning exhibiting near-optimal online performance. The main obstacle facing such methods is that in most problems of interest, the optimal solution involves planning in an infinitely large tree. However, it is possible to ob

  1. Adolescent-specific patterns of behavior and neural activity during social reinforcement learning

    OpenAIRE

    Jones, Rebecca M.; Somerville, Leah; Li, Jian; Ruberry, Erika J.; Powers, Alisa; Mehta, Natasha; Dyke, Jonathan; Casey, B. J.

    2014-01-01

    Humans are sophisticated social beings. Social cues from others are exceptionally salient, particularly during adolescence. Understanding how adolescents interpret and learn from variable social signals can provide insight into the observed shift in social sensitivity during this period. The current study tested 120 participants between the ages of 8 and 25 years on a social reinforcement learning task where the probability of receiving positive social feedback was parametrically manipulated....

  2. Utilising reinforcement learning to develop strategies for driving auditory neural implants

    Science.gov (United States)

    Lee, Geoffrey W.; Zambetta, Fabio; Li, Xiaodong; Paolini, Antonio G.

    2016-08-01

    Objective. In this paper we propose a novel application of reinforcement learning to the area of auditory neural stimulation. We aim to develop a simulation environment which is based on real neurological responses to auditory and electrical stimulation in the cochlear nucleus (CN) and inferior colliculus (IC) of an animal model. Using this simulator we implement closed loop reinforcement learning algorithms to determine which methods are most effective at learning effective acoustic neural stimulation strategies. Approach. By recording a comprehensive set of acoustic frequency presentations and neural responses from a set of animals we created a large database of neural responses to acoustic stimulation. Extensive electrical stimulation in the CN and the recording of neural responses in the IC provides a mapping of how the auditory system responds to electrical stimuli. The combined dataset is used as the foundation for the simulator, which is used to implement and test learning algorithms. Main results. Reinforcement learning, utilising a modified n-Armed Bandit solution, is implemented to demonstrate the model’s function. We show the ability to effectively learn stimulation patterns which mimic the cochlea’s ability to convert acoustic frequencies to neural activity. Learning effective replication using neural stimulation takes less than 20 min under continuous testing. Significance. These results show the utility of reinforcement learning in the field of neural stimulation. These results can be coupled with existing sound processing technologies to develop new auditory prosthetics that are adaptable to the recipient's current auditory pathway. The same process can theoretically be abstracted to other sensory and motor systems to develop similar electrical replication of neural signals.
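
    The paper's stimulation-strategy learner is described as a modified n-armed-bandit solution; the sketch below shows only a standard epsilon-greedy bandit in which each arm stands for a candidate stimulation pattern and the reward is a placeholder for how closely the evoked response matches the target response. All names and numbers are illustrative, not taken from the study.

        import random

        # Standard epsilon-greedy n-armed bandit, as a stand-in for selecting among candidate
        # stimulation patterns. The reward is a placeholder for the similarity between the
        # evoked neural response and the target (acoustically driven) response.
        N_ARMS, EPS = 8, 0.1
        true_quality = [random.random() for _ in range(N_ARMS)]   # hidden value of each pattern

        def pull(arm):                                            # noisy observed similarity score
            return true_quality[arm] + random.gauss(0.0, 0.05)

        value = [0.0] * N_ARMS
        count = [0] * N_ARMS

        for t in range(2000):
            arm = random.randrange(N_ARMS) if random.random() < EPS else value.index(max(value))
            r = pull(arm)
            count[arm] += 1
            value[arm] += (r - value[arm]) / count[arm]           # incremental sample-average update

        print(value.index(max(value)), true_quality.index(max(true_quality)))   # usually agree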

  3. Active-learning strategies: the use of a game to reinforce learning in nursing education. A case study.

    Science.gov (United States)

    Boctor, Lisa

    2013-03-01

    The majority of nursing students are kinesthetic learners, preferring a hands-on, active approach to education. Research shows that active-learning strategies can increase student learning and satisfaction. This study looks at the use of one active-learning strategy, a Jeopardy-style game, 'Nursopardy', to reinforce Fundamentals of Nursing material, aiding in students' preparation for a standardized final exam. The game was created keeping students' varied learning styles and the NCLEX blueprint in mind. The blueprint was used to create 5 categories, with 26 total questions. Student survey results, using a five-point Likert scale, showed that they did find this learning method enjoyable and beneficial to learning. More research is recommended regarding learning outcomes, when using active-learning strategies, such as games. Copyright © 2012 Elsevier Ltd. All rights reserved.

  4. Evaluation framework based on fuzzy measured method in adaptive learning systems

    Directory of Open Access Journals (Sweden)

    Houda Zouari Ounaies

    2008-01-01

    Full Text Available Currently, e-learning systems are mainly web-based applications that serve a wide range of users all over the world. Fitting learners' needs is considered a key issue in guaranteeing the success of these systems. Much research has focused on providing adaptive systems. Nevertheless, evaluation of adaptivity is still in an exploratory phase. Adaptation methods are a basic factor in guaranteeing effective adaptation; this issue is referred to as meta-adaptation in numerous studies. In our research on the development of an evaluation framework for adaptive web-based learning systems, assessment of the adaptation method is a fundamental aspect. Measures that express the features of adaptive systems are significantly lacking and need to be explored. Consequently, we propose a three-fold approach. Firstly, specific adaptation measurement criteria are suggested. Secondly, experts and learners assess these criteria, considering both the current learning situation and similar past experiences. Finally, fuzzy group decision making theory is adopted to integrate the different perceptions of the adaptive system.
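
    The abstract does not specify the aggregation operator, so the following is only a generic sketch of fuzzy group assessment: each rater scores each adaptation criterion with a triangular fuzzy number, ratings are combined by a weighted component-wise mean, and the result is defuzzified by its centroid. Criteria, weights, and scores are invented for the example; the paper's exact operator may differ.

        # Generic sketch of aggregating expert and learner ratings of adaptation criteria.
        # Each rating is a triangular fuzzy number (low, mode, high) on a 0-10 scale; the
        # group rating is the weighted component-wise mean, defuzzified by its centroid.
        ratings = {
            "accuracy_of_learner_model": {"expert": (6, 7, 8), "learner": (4, 6, 7)},
            "timeliness_of_adaptation":  {"expert": (7, 8, 9), "learner": (5, 7, 9)},
        }
        weights = {"expert": 0.6, "learner": 0.4}

        def aggregate(scores):
            # weighted mean of each component (low, mode, high) across raters
            return tuple(sum(weights[who] * scores[who][i] for who in scores) for i in range(3))

        def centroid(tfn):
            # centroid defuzzification of a triangular fuzzy number
            return sum(tfn) / 3.0

        for criterion, scores in ratings.items():
            fuzzy = aggregate(scores)
            print(criterion, tuple(round(x, 1) for x in fuzzy), round(centroid(fuzzy), 2))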

  5. A transfer learning framework for traffic video using neuro-fuzzy approach

    Indian Academy of Sciences (India)

    P M ASHOK KUMAR; V VAIDEHI

    2017-09-01

    One of the main challenges in a Traffic Anomaly Detection (TAD) system is the ability to deal with unknown target scenes; as a result, the TAD system performs worse at detecting anomalies. This paper introduces a novelty in the form of Adaptive Neuro-Fuzzy Inference System-Lossy-Count-based Topic Extraction (ANFIS-LCTE) for classification of anomalies in source and target traffic scenes. The process of transforming the input variables, learning the semantic rules in the source scene and transferring the model to the target scene achieves the transfer learning property. The proposed ANFIS-LCTE transfer learning model consists of four steps. (1) Low-level visual items are extracted only for motion regions using the optical flow technique. (2) Temporal transactions are created by aggregating visual items for each set of frames. (3) An LCTE is applied to each set of temporal transactions to extract latent sequential topics. (4) ANFIS training is done with the back-propagation gradient descent method. The proposed ANFIS model framework is tested on a standard dataset and performance is evaluated in terms of training performance and classification accuracies. Experimental results confirm that the proposed ANFIS-LCTE approach performs well in both source and target datasets.

  6. Gaze-contingent reinforcement learning reveals incentive value of social signals in young children and adults.

    Science.gov (United States)

    Vernetti, Angélina; Smith, Tim J; Senju, Atsushi

    2017-03-15

    While numerous studies have demonstrated that infants and adults preferentially orient to social stimuli, it remains unclear as to what drives such preferential orienting. It has been suggested that the learned association between social cues and subsequent reward delivery might shape such social orienting. Using a novel, spontaneous indication of reinforcement learning (with the use of a gaze contingent reward-learning task), we investigated whether children and adults' orienting towards social and non-social visual cues can be elicited by the association between participants' visual attention and a rewarding outcome. Critically, we assessed whether the engaging nature of the social cues influences the process of reinforcement learning. Both children and adults learned to orient more often to the visual cues associated with reward delivery, demonstrating that cue-reward association reinforced visual orienting. More importantly, when the reward-predictive cue was social and engaging, both children and adults learned the cue-reward association faster and more efficiently than when the reward-predictive cue was social but non-engaging. These new findings indicate that social engaging cues have a positive incentive value. This could possibly be because they usually coincide with positive outcomes in real life, which could partly drive the development of social orienting.

  7. Experimental analysis of simulated reinforcement learning control for active and passive building thermal storage inventory

    Energy Technology Data Exchange (ETDEWEB)

    Liu, S. [Architectural Engineering, University of Nebraska-Lincoln, PKI 243, Omaha, NE (United States); Henze, G. P. [Architectural Engineering, University of Nebraska-Lincoln, PKI 203D, Omaha, NE (United States)

    2006-07-01

    This paper is the second part of a two-part investigation of a novel approach to optimally control commercial building passive and active thermal storage inventory. The proposed building control approach is based on simulated reinforcement learning, which is a hybrid control scheme that combines features of model-based optimal control and model-free learning control. An experimental study was carried out to analyze the performance of a hybrid controller installed in a full-scale laboratory facility. The first paper introduced the theoretical foundation of this investigation including the fundamental theory of reinforcement learning control. This companion paper presents a discussion and analysis of the experimental results. The results confirm the feasibility of the proposed control approach. Operating cost savings were attained with the proposed control approach compared with conventional building control; however, the savings are lower than for the case of model-based predictive optimal control. As for the case of model-based predictive control, the performance of the hybrid controller is largely affected by the quality of the training model, and extensive real-time learning is required for the learning controller to eliminate any false cues it receives during the initial training period. Nevertheless, compared with standard reinforcement learning, the proposed hybrid controller is much more readily implemented in a commercial building. (author)

  8. Learning and altering behaviours by reinforcement: Neurocognitive differences between children and adults

    Directory of Open Access Journals (Sweden)

    E. Shephard

    2014-01-01

    Full Text Available This study examined neurocognitive differences between children and adults in the ability to learn and adapt simple stimulus–response associations through feedback. Fourteen typically developing children (mean age = 10.2) and 15 healthy adults (mean age = 25.5) completed a simple task in which they learned to associate visually presented stimuli with manual responses based on performance feedback (acquisition phase), and then reversed and re-learned those associations following an unexpected change in reinforcement contingencies (reversal phase). Electrophysiological activity was recorded throughout task performance. We found no group differences in learning-related changes in performance (reaction time, accuracy) or in the amplitude of event-related potentials (ERPs) associated with stimulus processing (P3 ERP) or feedback processing (feedback-related negativity; FRN) during the acquisition phase. However, children's performance was significantly more disrupted by the reversal than adults and FRN amplitudes were significantly modulated by the reversal phase in children but not adults. These findings indicate that children have specific difficulties with reinforcement learning when acquired behaviours must be altered. This may be caused by the added demands on immature executive functioning, specifically response monitoring, created by the requirement to reverse the associations, or a developmental difference in the way in which children and adults approach reinforcement learning.

  9. Somatic and Reinforcement-Based Plasticity in the Initial Stages of Human Motor Learning.

    Science.gov (United States)

    Sidarta, Ananda; Vahdat, Shahabeddin; Bernardi, Nicolò F; Ostry, David J

    2016-11-16

    As one learns to dance or play tennis, the desired somatosensory state is typically unknown. Trial and error is important as motor behavior is shaped by successful and unsuccessful movements. As an experimental model, we designed a task in which human participants make reaching movements to a hidden target and receive positive reinforcement when successful. We identified somatic and reinforcement-based sources of plasticity on the basis of changes in functional connectivity using resting-state fMRI before and after learning. The neuroimaging data revealed reinforcement-related changes in both motor and somatosensory brain areas in which a strengthening of connectivity was related to the amount of positive reinforcement during learning. Areas of prefrontal cortex were similarly altered in relation to reinforcement, with connectivity between sensorimotor areas of putamen and the reward-related ventromedial prefrontal cortex strengthened in relation to the amount of successful feedback received. In other analyses, we assessed connectivity related to changes in movement direction between trials, a type of variability that presumably reflects exploratory strategies during learning. We found that connectivity in a network linking motor and somatosensory cortices increased with trial-to-trial changes in direction. Connectivity varied as well with the change in movement direction following incorrect movements. Here the changes were observed in a somatic memory and decision making network involving ventrolateral prefrontal cortex and second somatosensory cortex. Our results point to the idea that the initial stages of motor learning are not wholly motor but rather involve plasticity in somatic and prefrontal networks related both to reward and exploration.

  10. Fuzzy Control Tutorial

    DEFF Research Database (Denmark)

    Dotoli, M.; Jantzen, Jan

    1999-01-01

    The tutorial concerns automatic control of an inverted pendulum, especially rule-based control by means of fuzzy logic. A ball balancer, implemented in a software simulator in Matlab, is used as a practical case study. The objectives of the tutorial are to teach the basics of fuzzy control, and to show how to apply fuzzy logic in automatic control. The tutorial is distance learning, where students interact one-to-one with the teacher using e-mail.
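
    For readers unfamiliar with the rule-based fuzzy control the tutorial teaches, the following sketch evaluates a small two-input rule table with triangular membership functions and weighted-average defuzzification. The rule consequents and input scaling are illustrative only and are not taken from the tutorial or its ball-balancer model.

        # Minimal two-input fuzzy rule evaluation with triangular membership functions
        # and weighted-average defuzzification. Rules and scalings are illustrative only.
        def tri(x, a, b, c):
            """Triangular membership with peak at b and feet at a and c."""
            if x <= a or x >= c:
                return 0.0
            return (x - a) / (b - a) if x < b else (c - x) / (c - b)

        def fuzzy_control(angle, angular_velocity):
            # fuzzify both inputs into Negative / Zero / Positive sets on [-1, 1]
            sets = {"N": (-2.0, -1.0, 0.0), "Z": (-1.0, 0.0, 1.0), "P": (0.0, 1.0, 2.0)}
            mu_e = {k: tri(angle, *v) for k, v in sets.items()}
            mu_de = {k: tri(angular_velocity, *v) for k, v in sets.items()}
            # rule consequents as singleton control actions
            rules = {("N", "N"): -1.0, ("N", "Z"): -0.5, ("N", "P"): 0.0,
                     ("Z", "N"): -0.5, ("Z", "Z"): 0.0, ("Z", "P"): 0.5,
                     ("P", "N"): 0.0, ("P", "Z"): 0.5, ("P", "P"): 1.0}
            num = den = 0.0
            for (e_set, de_set), action in rules.items():
                w = min(mu_e[e_set], mu_de[de_set])        # rule firing strength (min as AND)
                num += w * action
                den += w
            return num / den if den else 0.0

        print(fuzzy_control(0.3, -0.1))                    # small positive corrective action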

  11. Fuzzy Control Tutorial

    DEFF Research Database (Denmark)

    Dotoli, M.; Jantzen, Jan

    1999-01-01

    The tutorial concerns automatic control of an inverted pendulum, especially rule-based control by means of fuzzy logic. A ball balancer, implemented in a software simulator in Matlab, is used as a practical case study. The objectives of the tutorial are to teach the basics of fuzzy control, and to show how to apply fuzzy logic in automatic control. The tutorial is distance learning, where students interact one-to-one with the teacher using e-mail.

  12. Cerebellar and prefrontal cortex contributions to adaptation, strategies, and reinforcement learning.

    Science.gov (United States)

    Taylor, Jordan A; Ivry, Richard B

    2014-01-01

    Traditionally, motor learning has been studied as an implicit learning process, one in which movement errors are used to improve performance in a continuous, gradual manner. The cerebellum figures prominently in this literature given well-established ideas about the role of this system in error-based learning and the production of automatized skills. Recent developments have brought into focus the relevance of multiple learning mechanisms for sensorimotor learning. These include processes involving repetition, reinforcement learning, and strategy utilization. We examine these developments, considering their implications for understanding cerebellar function and how this structure interacts with other neural systems to support motor learning. Converging lines of evidence from behavioral, computational, and neuropsychological studies suggest a fundamental distinction between processes that use error information to improve action execution or action selection. While the cerebellum is clearly linked to the former, its role in the latter remains an open question.

  13. Cognitively inspired reinforcement learning architecture and its application to giant-swing motion control.

    Science.gov (United States)

    Uragami, Daisuke; Takahashi, Tatsuji; Matsuo, Yoshiki

    2014-02-01

    Many algorithms and methods in artificial intelligence or machine learning were inspired by human cognition. As a mechanism to handle the exploration-exploitation dilemma in reinforcement learning, the loosely symmetric (LS) value function that models the causal intuition of humans was proposed (Shinohara et al., 2007). While LS shows the highest correlation with causal induction by humans, it has been reported that it works effectively in multi-armed bandit problems, which form the simplest class of tasks representing the dilemma. However, the scope of application of LS was limited to reinforcement learning problems that have K actions with only one state (K-armed bandit problems). This study proposes an LS-Q learning architecture that can deal with general reinforcement learning tasks with multiple states and delayed reward. We tested the learning performance of the new architecture on giant-swing robot motion learning, where the uncertainty and unknown-ness of the environment are huge. In the test, no help from ready-made internal models or function approximation of the state space was given. The simulations showed that while the ordinary Q-learning agent does not reach giant-swing motion because of stagnant loops (local optima with low rewards), LS-Q escapes such loops and acquires the giant-swing. It is confirmed that the smaller the number of states (in other words, the more coarse-grained the division of states and the more incomplete the state observation), the better LS-Q performs in comparison with Q-learning. We also showed that the high performance of LS-Q depends comparatively little on parameter tuning and learning time. This suggests that the proposed method, inspired by human cognition, works adaptively in real environments.

  14. User/Tutor Optimal Learning Path in E-Learning Using Comprehensive Neuro-Fuzzy Approach

    Science.gov (United States)

    Fazlollahtabar, Hamed; Mahdavi, Iraj

    2009-01-01

    Internet evolution has affected all industrial, commercial, and especially learning activities in the new context of e-learning. Due to cost, time, or flexibility e-learning has been adopted by participators as an alternative training method. By development of computer-based devices and new methods of teaching, e-learning has emerged. The…

  15. User/Tutor Optimal Learning Path in E-Learning Using Comprehensive Neuro-Fuzzy Approach

    Science.gov (United States)

    Fazlollahtabar, Hamed; Mahdavi, Iraj

    2009-01-01

    Internet evolution has affected all industrial, commercial, and especially learning activities in the new context of e-learning. Due to cost, time, or flexibility e-learning has been adopted by participators as an alternative training method. By development of computer-based devices and new methods of teaching, e-learning has emerged. The…

  16. Autonomous Inter-Task Transfer in Reinforcement Learning Domains

    Science.gov (United States)

    2008-08-01

    This dissertation addresses inter-task transfer in reinforcement learning, in which source tasks are learned in order to assist learning in a target task. Skills are extracted using the ILP engine Aleph [Srinivasan, 2001], scored by the F1 measure (the harmonic mean of precision and recall).

  17. Stress affects instrumental learning based on positive or negative reinforcement in interaction with personality in domestic horses.

    Science.gov (United States)

    Valenchon, Mathilde; Lévy, Frédéric; Moussu, Chantal; Lansade, Léa

    2017-01-01

    The present study investigated how stress affects instrumental learning performance in horses (Equus caballus) depending on the type of reinforcement. Horses were assigned to four groups (N = 15 per group); each group received training with negative or positive reinforcement in the presence or absence of stressors unrelated to the learning task. The instrumental learning task consisted of the horse entering one of two compartments at the appearance of a visual signal given by the experimenter. In the absence of stressors unrelated to the task, learning performance did not differ between negative and positive reinforcements. The presence of stressors unrelated to the task (exposure to novel and sudden stimuli) impaired learning performance. Interestingly, this learning deficit was smaller when the negative reinforcement was used. The negative reinforcement, considered as a stressor related to the task, could have counterbalanced the impact of the extrinsic stressor by focusing attention toward the learning task. In addition, learning performance appears to differ between certain dimensions of personality depending on the presence of stressors and the type of reinforcement. These results suggest that when negative reinforcement is used (i.e. stressor related to the task), the most fearful horses may be the best performers in the absence of stressors but the worst performers when stressors are present. On the contrary, when positive reinforcement is used, the most fearful horses appear to be consistently the worst performers, with and without exposure to stressors unrelated to the learning task. This study is the first to demonstrate in ungulates that stress affects learning performance differentially according to the type of reinforcement and in interaction with personality. It provides fundamental and applied perspectives in the understanding of the relationships between personality and training abilities.

  18. Fuzzy contractibility

    OpenAIRE

    GÜNER, Erdal

    2007-01-01

    Abstract. In this paper, firstly some fundamental concepts relating to fuzzy topological spaces are included. Secondly, the fuzzy connected set is introduced. Finally, defining the fuzzy contractible space, it is shown that X is a fuzzy contractible space if and only if X is fuzzy homotopy equivalent to a fuzzy single-point space.

  19. A neural learning approach for adaptive image restoration using a fuzzy model-based network architecture.

    Science.gov (United States)

    Wong, H S; Guan, L

    2001-01-01

    We address the problem of adaptive regularization in image restoration by adopting a neural-network learning approach. Instead of explicitly specifying the local regularization parameter values, they are regarded as network weights which are then modified through the supply of appropriate training examples. The desired response of the network is in the form of a gray level value estimate of the current pixel using weighted order statistic (WOS) filter. However, instead of replacing the previous value with this estimate, this is used to modify the network weights, or equivalently, the regularization parameters such that the restored gray level value produced by the network is closer to this desired response. In this way, the single WOS estimation scheme can allow appropriate parameter values to emerge under different noise conditions, rather than requiring their explicit selection in each occasion. In addition, we also consider the separate regularization of edges and textures due to their different noise masking capabilities. This in turn requires discriminating between these two feature types. Due to the inability of conventional local variance measures to distinguish these two high variance features, we propose the new edge-texture characterization (ETC) measure which performs this discrimination based on a scalar value only. This is then incorporated into a fuzzified form of the previous neural network which determines the degree of membership of each high variance pixel in two fuzzy sets, the EDGE and TEXTURE fuzzy sets, from the local ETC value, and then evaluates the appropriate regularization parameter by appropriately combining these two membership function values.
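
    The last step described above (mapping a local ETC value to EDGE/TEXTURE memberships and then to a regularization parameter) can be illustrated with a small sketch. The ramp-shaped membership functions, their breakpoints, and the two lambda values below are illustrative placeholders, not the parameters used in the paper.

```python
import numpy as np

def edge_membership(etc, lo=0.3, hi=0.7):
    """Hypothetical ramp membership of the EDGE fuzzy set over a scalar ETC value."""
    return np.clip((etc - lo) / (hi - lo), 0.0, 1.0)

def texture_membership(etc, lo=0.3, hi=0.7):
    """Complementary membership of the TEXTURE fuzzy set."""
    return 1.0 - edge_membership(etc, lo, hi)

def regularization_parameter(etc, lam_edge=0.01, lam_texture=0.1):
    """Weighted-average combination of the two memberships into one
    regularization weight (the two lambda values are illustrative only)."""
    mu_e, mu_t = edge_membership(etc), texture_membership(etc)
    return (mu_e * lam_edge + mu_t * lam_texture) / (mu_e + mu_t + 1e-12)

print(regularization_parameter(0.9))  # edge-like pixel
print(regularization_parameter(0.1))  # texture-like pixel
```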

  20. A model of reward choice based on the theory of reinforcement learning.

    Science.gov (United States)

    Smirnitskaya, I A; Frolov, A A; Merzhanova, G Kh

    2008-03-01

    A model explaining behavioral "impulsivity" and "self-control" is proposed on the basis of the theory of reinforcement learning. The discount coefficient gamma, which in this theory accounts for the subjective reduction in the value of a delayed reinforcement, is identified with the overall level of dopaminergic neuron activity which, according to published data, also determines the behavioral variant. Computer modeling showed that high values of gamma are characteristic of predominantly "self-controlled" subjects, while smaller values of gamma are characteristic of "impulsive" subjects.
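
    The role of the discount coefficient gamma can be made concrete with a toy comparison between a small immediate reward and a larger delayed one; the reward sizes, delay, and gamma values below are illustrative only.

```python
def discounted_value(reward, delay, gamma):
    """Subjective value of a reward delivered after `delay` time steps."""
    return (gamma ** delay) * reward

for gamma in (0.6, 0.95):  # low gamma ~ "impulsive", high gamma ~ "self-controlled"
    now = discounted_value(1.0, 0, gamma)      # small immediate reward
    later = discounted_value(3.0, 5, gamma)    # larger reward after 5 steps
    choice = "delayed" if later > now else "immediate"
    print(f"gamma={gamma}: immediate={now:.2f}, delayed={later:.2f} -> chooses {choice}")
```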

  1. Bi-directional effect of increasing doses of baclofen on reinforcement learning

    Directory of Open Access Journals (Sweden)

    Jean eTerrier

    2011-07-01

    Full Text Available In rodents as well as in humans, efficient reinforcement learning depends on dopamine (DA) released from ventral tegmental area (VTA) neurons. It has been shown that in brain slices of mice, GABA(B)-receptor agonists at low concentrations increase the firing frequency of VTA-DA neurons, while high concentrations reduce the firing frequency. It remains however elusive whether baclofen can modulate reinforcement learning. Here, in a double blind study in 34 healthy human volunteers, we tested the effects of a low and a high concentration of oral baclofen in a gambling task associated with monetary reward. A low (20 mg) dose of baclofen increased the efficiency of reward-associated learning but had no effect on the avoidance of monetary loss. A high (50 mg) dose of baclofen on the other hand did not affect the learning curve. At the end of the task, subjects who received 20 mg baclofen p.o. were more accurate in choosing the symbol linked to the highest probability of earning money compared to the control group (89.55±1.39% vs 81.07±1.55%, p=0.002). Our results support a model where baclofen, at low concentrations, causes a disinhibition of DA neurons, increases DA levels and thus facilitates reinforcement learning.

  2. Bi-directional effect of increasing doses of baclofen on reinforcement learning.

    Science.gov (United States)

    Terrier, Jean; Ort, Andres; Yvon, Cédric; Saj, Arnaud; Vuilleumier, Patrik; Lüscher, Christian

    2011-01-01

    In rodents as well as in humans, efficient reinforcement learning depends on dopamine (DA) released from ventral tegmental area (VTA) neurons. It has been shown that in brain slices of mice, GABA(B)-receptor agonists at low concentrations increase the firing frequency of VTA-DA neurons, while high concentrations reduce the firing frequency. It remains however elusive whether baclofen can modulate reinforcement learning in humans. Here, in a double-blind study in 34 healthy human volunteers, we tested the effects of a low and a high concentration of oral baclofen, a high affinity GABA(B)-receptor agonist, in a gambling task associated with monetary reward. A low (20 mg) dose of baclofen increased the efficiency of reward-associated learning but had no effect on the avoidance of monetary loss. A high (50 mg) dose of baclofen on the other hand did not affect the learning curve. At the end of the task, subjects who received 20 mg baclofen p.o. were more accurate in choosing the symbol linked to the highest probability of earning money compared to the control group (89.55 ± 1.39 vs. 81.07 ± 1.55%, p = 0.002). Our results support a model where baclofen, at low concentrations, causes a disinhibition of DA neurons, increases DA levels and thus facilitates reinforcement learning.

  3. A cholinergic feedback circuit to regulate striatal population uncertainty and optimize reinforcement learning.

    Science.gov (United States)

    Franklin, Nicholas T; Frank, Michael J

    2015-12-25

    Convergent evidence suggests that the basal ganglia support reinforcement learning by adjusting action values according to reward prediction errors. However, adaptive behavior in stochastic environments requires the consideration of uncertainty to dynamically adjust the learning rate. We consider how cholinergic tonically active interneurons (TANs) may endow the striatum with such a mechanism in computational models spanning three Marr's levels of analysis. In the neural model, TANs modulate the excitability of spiny neurons, their population response to reinforcement, and hence the effective learning rate. Long TAN pauses facilitated robustness to spurious outcomes by increasing divergence in synaptic weights between neurons coding for alternative action values, whereas short TAN pauses facilitated stochastic behavior but increased responsiveness to change-points in outcome contingencies. A feedback control system allowed TAN pauses to be dynamically modulated by uncertainty across the spiny neuron population, allowing the system to self-tune and optimize performance across stochastic environments.
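
    As a rough sketch of the core idea (a learning rate that scales with a running estimate of outcome uncertainty), the following toy value update is a simplified stand-in for the authors' neural and feedback-control models; all constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
q, uncertainty, base_alpha = 0.5, 0.5, 0.5      # value, uncertainty estimate, base rate

for t in range(200):
    p_reward = 0.8 if t < 100 else 0.2          # change-point halfway through
    r = float(rng.random() < p_reward)
    delta = r - q                               # reward prediction error
    alpha = base_alpha * uncertainty            # higher uncertainty -> faster learning
    q += alpha * delta
    uncertainty += 0.1 * (delta ** 2 - uncertainty)   # running average of squared errors

print(round(q, 2), round(uncertainty, 2))
```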

  4. Reinforced AdaBoost learning for object detection with local pattern representations.

    Science.gov (United States)

    Lee, Younghyun; Han, David K; Ko, Hanseok

    2013-01-01

    A reinforced AdaBoost learning algorithm is proposed for object detection with local pattern representations. In implementing AdaBoost learning, the proposed algorithm employs an exponential criterion as a cost function and Newton's method for its optimization. In particular, we introduce an optimal selection of weak classifiers minimizing the cost function and derive the reinforced predictions based on a judicial confidence estimate to determine the classification results. The weak classifier of the proposed method produces real-valued predictions while that of the conventional AdaBoost method produces integer valued predictions of +1 or -1. Hence, in the conventional learning algorithms, the entire sample weights are updated by the same rate. On the contrary, the proposed learning algorithm allows the sample weights to be updated individually depending on the confidence level of each weak classifier prediction, thereby reducing the number of weak classifier iterations for convergence. Experimental classification performance on human face and license plate images confirm that the proposed method requires smaller number of weak classifiers than the conventional learning algorithm, resulting in higher learning and faster classification rates. An object detector implemented based on the proposed learning algorithm yields better performance in field tests in terms of higher detection rate with lower false positives than that of the conventional learning algorithm.
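
    The key mechanism (sample weights updated individually according to real-valued, confidence-rated weak-classifier outputs) resembles the exponential re-weighting of confidence-rated AdaBoost; the sketch below shows that re-weighting step only, not the paper's Newton-step derivation or weak-classifier selection.

```python
import numpy as np

def reweight(sample_weights, confidences, labels):
    """Exponential re-weighting with real-valued (confidence-rated) predictions.

    confidences: weak-classifier outputs f(x), sign = predicted class, |f| = confidence
    labels:      true labels in {-1, +1}
    Each weight changes by an amount set by its own margin y*f(x), unlike the
    binary +1/-1 case where all weights move at the same rate.
    """
    w = sample_weights * np.exp(-labels * confidences)
    return w / w.sum()

w = np.full(4, 0.25)
f = np.array([2.0, 0.1, -1.5, -0.2])   # confident and unsure predictions
y = np.array([1, 1, 1, -1])
print(reweight(w, f, y))               # confidently correct samples are down-weighted most
```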

  5. Robot-assisted motor training: assistance decreases exploration during reinforcement learning.

    Science.gov (United States)

    Sans-Muntadas, Albert; Duarte, Jaime E; Reinkensmeyer, David J

    2014-01-01

    Reinforcement learning (RL) is a form of motor learning that robotic therapy devices could potentially manipulate to promote neurorehabilitation. We developed a system that requires trainees to use RL to learn a predefined target movement. The system provides higher rewards for movements that are more similar to the target movement. We also developed a novel algorithm that rewards trainees of different abilities with comparable reward sizes. This algorithm measures a trainee's performance relative to their best performance, rather than relative to an absolute target performance, to determine reward. We hypothesized this algorithm would permit subjects who cannot normally achieve high reward levels to do so while still learning. In an experiment with 21 unimpaired human subjects, we found that all subjects quickly learned to make a first target movement with and without the reward equalization. However, artificially increasing reward decreased the subjects' tendency to engage in exploration and therefore slowed learning, particularly when we changed the target movement. An anti-slacking watchdog algorithm further slowed learning. These results suggest that robotic algorithms that assist trainees in achieving rewards or in preventing slacking might, over time, discourage the exploration needed for reinforcement learning.
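
    The reward-equalization idea (scoring a trainee against their own best performance rather than an absolute target) can be sketched as follows, assuming a scalar movement-error score; the sigmoid mapping and scale are hypothetical, not the authors' algorithm.

```python
import math

def relative_reward(error, best_error, scale=1.0):
    """Reward in (0, 1): high when the current error approaches or beats the
    trainee's personal best, regardless of the absolute error level."""
    if math.isinf(best_error):
        return 0.5                      # no baseline on the first attempt
    return 1.0 / (1.0 + math.exp(-scale * (best_error - error)))

best = float("inf")
for error in [10.0, 8.0, 9.0, 7.5, 7.7]:
    r = relative_reward(error, best)    # score against the previous personal best
    best = min(best, error)
    print(f"error={error:4.1f}  reward={r:.2f}")
```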

  6. TEXPLORE temporal difference reinforcement learning for robots and time-constrained domains

    CERN Document Server

    Hester, Todd

    2013-01-01

    This book presents and develops new reinforcement learning methods that enable fast and robust learning on robots in real-time. Robots have the potential to solve many problems in society, because of their ability to work in dangerous places doing necessary jobs that no one wants or is able to do. One barrier to their widespread deployment is that they are mainly limited to tasks where it is possible to hand-program behaviors for every situation that may be encountered. For robots to meet their potential, they need methods that enable them to learn and adapt to novel situations that they were not programmed for. Reinforcement learning (RL) is a paradigm for learning sequential decision making processes and could solve the problems of learning and adaptation on robots. This book identifies four key challenges that must be addressed for an RL algorithm to be practical for robotic control tasks. These RL for Robotics Challenges are: 1) it must learn in very few samples; 2) it must learn in domains with continuou...

  7. The effects of aging on the interaction between reinforcement learning and attention.

    Science.gov (United States)

    Radulescu, Angela; Daniel, Reka; Niv, Yael

    2016-11-01

    Reinforcement learning (RL) in complex environments relies on selective attention to uncover those aspects of the environment that are most predictive of reward. Whereas previous work has focused on age-related changes in RL, it is not known whether older adults learn differently from younger adults when selective attention is required. In 2 experiments, we examined how aging affects the interaction between RL and selective attention. Younger and older adults performed a learning task in which only 1 stimulus dimension was relevant to predicting reward, and within it, 1 "target" feature was the most rewarding. Participants had to discover this target feature through trial and error. In Experiment 1, stimuli varied on 1 or 3 dimensions and participants received hints that revealed the target feature, the relevant dimension, or gave no information. Group-related differences in accuracy and RTs differed systematically as a function of the number of dimensions and the type of hint available. In Experiment 2 we used trial-by-trial computational modeling of the learning process to test for age-related differences in learning strategies. Behavior of both young and older adults was explained well by a reinforcement-learning model that uses selective attention to constrain learning. However, the model suggested that older adults restricted their learning to fewer features, employing more focused attention than younger adults. Furthermore, this difference in strategy predicted age-related deficits in accuracy. We discuss these results suggesting that a narrower filter of attention may reflect an adaptation to the reduced capabilities of the reinforcement learning system.

  8. Spiking neural networks with different reinforcement learning (RL) schemes in a multiagent setting.

    Science.gov (United States)

    Christodoulou, Chris; Cleanthous, Aristodemos

    2010-12-31

    This paper investigates the effectiveness of spiking agents when trained with reinforcement learning (RL) in a challenging multiagent task. In particular, it explores learning through reward-modulated spike-timing dependent plasticity (STDP) and compares it to reinforcement of stochastic synaptic transmission in the general-sum game of the Iterated Prisoner's Dilemma (IPD). More specifically, a computational model is developed where we implement two spiking neural networks as two "selfish" agents learning simultaneously but independently, competing in the IPD game. The purpose of our system (or collective) is to maximise its accumulated reward in the presence of reward-driven competing agents within the collective. This can only be achieved when the agents engage in a behaviour of mutual cooperation during the IPD. Previously, we successfully applied reinforcement of stochastic synaptic transmission to the IPD game. The current study utilises reward-modulated STDP with eligibility trace and results show that the system managed to exhibit the desired behaviour by establishing mutual cooperation between the agents. It is noted that the cooperative outcome was attained after a relatively short learning period which enhanced the accumulation of reward by the system. As in our previous implementation, the successful application of the learning algorithm to the IPD becomes possible only after we extended it with additional global reinforcement signals in order to enhance competition at the neuronal level. Moreover it is also shown that learning is enhanced (as indicated by an increased IPD cooperative outcome) through: (i) strong memory for each agent (regulated by a high eligibility trace time constant) and (ii) firing irregularity produced by equipping the agents' LIF neurons with a partial somatic reset mechanism.
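
    A minimal sketch of reward-modulated STDP with an eligibility trace is shown below: pairwise STDP increments accumulate in a decaying trace, and the weight only changes when a global reward signal arrives. Time constants, amplitudes, and the event-based time step are illustrative simplifications of such models, not the paper's network.

```python
import numpy as np

tau_e, lr = 50.0, 0.01                 # eligibility time constant and learning rate
w, trace = 0.5, 0.0                    # synaptic weight and eligibility trace

def stdp(dt, a_plus=1.0, a_minus=1.2, tau=20.0):
    """Pair-based STDP kernel; dt = t_post - t_pre in ms."""
    return a_plus * np.exp(-dt / tau) if dt > 0 else -a_minus * np.exp(dt / tau)

# toy sequence of spike-timing differences; reward only arrives with the last pairing
events = [(+5, 0.0), (+8, 0.0), (-12, 0.0), (+3, 1.0)]
for dt, reward in events:
    trace += stdp(dt)                  # STDP increments accumulate in the trace
    trace *= np.exp(-1.0 / tau_e)      # slow decay (one step per event, a simplification)
    w += lr * reward * trace           # weight changes only when reward is delivered

print(round(float(w), 4), round(float(trace), 4))
```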

  9. Reinforcement Learning for Constrained Energy Trading Games With Incomplete Information.

    Science.gov (United States)

    Wang, Huiwei; Huang, Tingwen; Liao, Xiaofeng; Abu-Rub, Haitham; Chen, Guo

    2017-10-01

    This paper considers the problem of designing adaptive learning algorithms to seek the Nash equilibrium (NE) of the constrained energy trading game among individually strategic players with incomplete information. In this game, each player uses the learning automaton scheme to generate the action probability distribution based on his/her private information for maximizing his own averaged utility. It is shown that if one of admissible mixed-strategies converges to the NE with probability one, then the averaged utility and trading quantity almost surely converge to their expected ones, respectively. For the given discontinuous pricing function, the utility function has already been proved to be upper semicontinuous and payoff secure which guarantee the existence of the mixed-strategy NE. By the strict diagonal concavity of the regularized Lagrange function, the uniqueness of NE is also guaranteed. Finally, an adaptive learning algorithm is provided to generate the strategy probability distribution for seeking the mixed-strategy NE.

  10. Agnostic System Identification for Model-Based Reinforcement Learning

    CERN Document Server

    Ross, Stephane

    2012-01-01

    A fundamental problem in control is to learn a model of a system from observations that is useful for controller synthesis. To provide good performance guarantees, existing methods must assume that the real system is in the class of models considered during learning. We present an iterative method with strong guarantees even in the agnostic case where the system is not in the class. In particular, we show that any no-regret online learning algorithm can be used to obtain a near-optimal policy, provided some model achieves low training error and access to a good exploration distribution. Our approach applies to both discrete and continuous domains. We demonstrate its efficacy and scalability on a challenging helicopter domain from the literature.

  11. Homeostatic reinforcement learning for integrating reward collection and physiological stability.

    Science.gov (United States)

    Keramati, Mehdi; Gutkin, Boris

    2014-12-02

    Efficient regulation of internal homeostasis and defending it against perturbations requires adaptive behavioral strategies. However, the computational principles mediating the interaction between homeostatic and associative learning processes remain undefined. Here we use a definition of primary rewards, as outcomes fulfilling physiological needs, to build a normative theory showing how learning motivated behaviors may be modulated by internal states. Within this framework, we mathematically prove that seeking rewards is equivalent to the fundamental objective of physiological stability, defining the notion of physiological rationality of behavior. We further suggest a formal basis for temporal discounting of rewards by showing that discounting motivates animals to follow the shortest path in the space of physiological variables toward the desired setpoint. We also explain how animals learn to act predictively to preclude prospective homeostatic challenges, and several other behavioral patterns. Finally, we suggest a computational role for interaction between hypothalamus and the brain reward system.
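
    One simple way to formalize the central idea, a primary reward defined as the reduction of a drive function that measures distance from the homeostatic setpoint H*, is shown below; the particular norm is a generic placeholder rather than the authors' exact functional form.

```latex
D(H_t) = \lVert H^{*} - H_t \rVert, \qquad r_t = D(H_t) - D(H_{t+1})
```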

  12. Heart Disease Diagnosis Utilizing Hybrid Fuzzy Wavelet Neural Network and Teaching Learning Based Optimization Algorithm

    Directory of Open Access Journals (Sweden)

    Jamal Salahaldeen Majeed Alneamy

    2014-01-01

    Full Text Available Among the various diseases that threaten human life is heart disease. This disease is considered to be one of the leading causes of death in the world. Actually, the medical diagnosis of heart disease is a complex task and must be made in an accurate manner. Therefore, software has been developed based on advanced computer technologies to assist doctors in the diagnostic process. This paper intends to use the hybrid teaching learning based optimization (TLBO) algorithm and fuzzy wavelet neural network (FWNN) for heart disease diagnosis. The TLBO algorithm is applied to enhance the performance of the FWNN. The hybrid TLBO algorithm with FWNN is used to classify the Cleveland heart disease dataset obtained from the University of California at Irvine (UCI) machine learning repository. The performance of the proposed method (TLBO_FWNN) is estimated using K-fold cross validation based on mean square error (MSE), classification accuracy, and the execution time. The experimental results show that TLBO_FWNN has an effective performance for diagnosing heart disease with 90.29% accuracy and superior performance compared to other methods in the literature.
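
    The evaluation protocol (K-fold cross-validation scored by MSE and classification accuracy) can be sketched as below, with synthetic data and a trivial nearest-centroid classifier standing in for the tuned FWNN; none of this reproduces the paper's model or dataset.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = (X[:, 0] + 0.3 * rng.normal(size=100) > 0).astype(float)   # synthetic binary labels

def fit_predict(X_tr, y_tr, X_te):
    """Stand-in classifier (nearest class centroid) in place of the tuned FWNN."""
    c0, c1 = X_tr[y_tr == 0].mean(axis=0), X_tr[y_tr == 1].mean(axis=0)
    d0 = np.linalg.norm(X_te - c0, axis=1)
    d1 = np.linalg.norm(X_te - c1, axis=1)
    return (d1 < d0).astype(float)

k = 5
folds = np.array_split(rng.permutation(len(X)), k)
mse, acc = [], []
for i in range(k):
    test = folds[i]
    train = np.concatenate([folds[j] for j in range(k) if j != i])
    pred = fit_predict(X[train], y[train], X[test])
    mse.append(np.mean((pred - y[test]) ** 2))
    acc.append(np.mean(pred == y[test]))

print(f"MSE={np.mean(mse):.3f}  accuracy={np.mean(acc):.1%}")
```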

  13. Urban Traffic Control Using Adjusted Reinforcement Learning in a Multi-agent System

    Directory of Open Access Journals (Sweden)

    Mahshid Helali Moghadam

    2013-09-01

    Full Text Available Dynamism, continuous changes of states and the necessity to respond quickly are the specific characteristics of the environment in a traffic control system. Proposing an appropriate and flexible strategy to meet the existing requirements is always an important issue in traffic control. This study presents an adaptive approach to control urban traffic using multi-agent systems and reinforcement learning augmented by an adjusting pre-learning stage. In this approach, the agent primarily uses some statistical traffic data and then uses traffic engineering theories to compute appropriate values of the traffic parameters. Having these primary values, the agents start the reinforcement learning based on the basic calculated information. The proposed approach first finds the approximate optimal zone for traffic parameters based on traffic engineering theories. Then, using an appropriate reinforcement learning method, it tries to exploit the best point according to different conditions. This approach was implemented on a network in traffic simulator software. The network was composed of six four-phase intersections and 17 two-lane streets. In the simulation, pedestrians were not considered in the system. The load of the network is defined in terms of Origin-Destination matrices whose entries represent the number of trips from an origin to a destination as a function of time. The simulation ran for five hours and an average traffic volume was used. According to the simulation results, the proposed approach behaved adaptively in different conditions and had better performance than the theory-based fixed-time control.

  14. The left hemisphere learns what is right: Hemispatial reward learning depends on reinforcement learning processes in the contralateral hemisphere.

    Science.gov (United States)

    Aberg, Kristoffer Carl; Doell, Kimberly Crystal; Schwartz, Sophie

    2016-08-01

    Orienting biases refer to consistent, trait-like direction of attention or locomotion toward one side of space. Recent studies suggest that such hemispatial biases may determine how well people memorize information presented in the left or right hemifield. Moreover, lesion studies indicate that learning rewarded stimuli in one hemispace depends on the integrity of the contralateral striatum. However, the exact neural and computational mechanisms underlying the influence of individual orienting biases on reward learning remain unclear. Because reward-based behavioural adaptation depends on the dopaminergic system and prediction error (PE) encoding in the ventral striatum, we hypothesized that hemispheric asymmetries in dopamine (DA) function may determine individual spatial biases in reward learning. To test this prediction, we acquired fMRI in 33 healthy human participants while they performed a lateralized reward task. Learning differences between hemispaces were assessed by presenting stimuli, assigned to different reward probabilities, to the left or right of central fixation, i.e. presented in the left or right visual hemifield. Hemispheric differences in DA function were estimated through differential fMRI responses to positive vs. negative feedback in the left vs. right ventral striatum, and a computational approach was used to identify the neural correlates of PEs. Our results show that spatial biases favoring reward learning in the right (vs. left) hemifield were associated with increased reward responses in the left hemisphere and relatively better neural encoding of PEs for stimuli presented in the right (vs. left) hemifield. These findings demonstrate that trait-like spatial biases implicate hemisphere-specific learning mechanisms, with individual differences between hemispheres contributing to reinforcing spatial biases.

  15. Measuring reinforcement learning and motivation constructs in experimental animals: relevance to the negative symptoms of schizophrenia.

    Science.gov (United States)

    Markou, Athina; Salamone, John D; Bussey, Timothy J; Mar, Adam C; Brunner, Daniela; Gilmour, Gary; Balsam, Peter

    2013-11-01

    The present review article summarizes and expands upon the discussions that were initiated during a meeting of the Cognitive Neuroscience Treatment Research to Improve Cognition in Schizophrenia (CNTRICS; http://cntrics.ucdavis.edu) meeting. A major goal of the CNTRICS meeting was to identify experimental procedures and measures that can be used in laboratory animals to assess psychological constructs that are related to the psychopathology of schizophrenia. The issues discussed in this review reflect the deliberations of the Motivation Working Group of the CNTRICS meeting, which included most of the authors of this article as well as additional participants. After receiving task nominations from the general research community, this working group was asked to identify experimental procedures in laboratory animals that can assess aspects of reinforcement learning and motivation that may be relevant for research on the negative symptoms of schizophrenia, as well as other disorders characterized by deficits in reinforcement learning and motivation. The tasks described here that assess reinforcement learning are the Autoshaping Task, Probabilistic Reward Learning Tasks, and the Response Bias Probabilistic Reward Task. The tasks described here that assess motivation are Outcome Devaluation and Contingency Degradation Tasks and Effort-Based Tasks. In addition to describing such methods and procedures, the present article provides a working vocabulary for research and theory in this field, as well as an industry perspective about how such tasks may be used in drug discovery. It is hoped that this review can aid investigators who are conducting research in this complex area, promote translational studies by highlighting shared research goals and fostering a common vocabulary across basic and clinical fields, and facilitate the development of medications for the treatment of symptoms mediated by reinforcement learning and motivational deficits.

  16. Goal-directed and habit-like modulations of stimulus processing during reinforcement learning.

    Science.gov (United States)

    Luque, David; Beesley, Tom; Morris, Richard; Jack, Bradley N; Griffiths, Oren; Whitford, Thomas; Le Pelley, Mike E

    2017-02-13

    Recent research has shown that perceptual processing of stimuli previously associated with high-value rewards is automatically prioritized, even when rewards are no longer available. It has been hypothesized that such reward-related modulation of stimulus salience is conceptually similar to an 'attentional habit'. Recording event-related potentials in humans during a reinforcement learning task, we show strong evidence in favor of this hypothesis. Resistance to outcome devaluation (the defining feature of a habit) was shown by the stimulus-locked P1 component, reflecting activity in the extrastriate visual cortex. Analysis at longer latencies revealed a positive component (corresponding to the P3b, from 550 to 700 ms) sensitive to outcome devaluation. Thus, distinct spatio-temporal patterns of brain activity were observed corresponding to habitual and goal-directed processes. These results demonstrate that reinforcement learning engages both attentional habits and goal-directed processes in parallel. Consequences for brain and computational models of reinforcement learning are discussed. Significance statement: The human attentional network adapts in order to detect stimuli that predict important rewards. A recent hypothesis suggests that the visual cortex automatically prioritizes reward-related stimuli, driven by cached representations of reward value, i.e., stimulus-response habits. Alternatively, the neural system may track the current value of the predicted outcome. Our results demonstrate for the first time that visual cortex activity is increased for reward-related stimuli even when the rewarding event is temporarily devalued. In contrast, longer-latency brain activity was specifically sensitive to transient changes in reward value. Therefore, we show that both habit-like attention and goal-directed processes occur in the same learning episode at different latencies. This result has important consequences for computational models of reinforcement learning.

  17. Extraversion differentiates between model-based and model-free strategies in a reinforcement learning task

    Directory of Open Access Journals (Sweden)

    Anya eSkatova

    2013-09-01

    Full Text Available Prominent computational models describe a neural mechanism for learning from reward prediction errors, and it has been suggested that variations in this mechanism are reflected in personality factors such as trait extraversion. However, although trait extraversion has been linked to improved reward learning, it is not yet known whether this relationship is selective for the particular computational strategy associated with error-driven learning, known as model-free reinforcement learning, versus another strategy, model-based learning, which the brain is also known to employ. In the present study we test this relationship by examining whether humans’ scores on an extraversion scale predict individual differences in the balance between model-based and model-free learning strategies in a sequentially structured decision task designed to distinguish between them. In previous studies with this task, participants have shown a combination of both types of learning, but with substantial individual variation in the balance between them. In the current study, extraversion predicted worse behavior across both sorts of learning. However, the hypothesis that extraverts would be selectively better at model-free reinforcement learning held up among a subset of the more engaged participants, and overall, higher task engagement was associated with a more selective pattern by which extraversion predicted better model-free learning. The findings indicate a relationship between a broad personality orientation and detailed computational learning mechanisms. Results like those in the present study suggest an intriguing and rich relationship between core neuro-computational mechanisms and broader life orientations and outcomes.

  18. Reinforcement learning on slow features of high-dimensional input streams.

    Science.gov (United States)

    Legenstein, Robert; Wilbert, Niko; Wiskott, Laurenz

    2010-08-19

    Humans and animals are able to learn complex behaviors based on a massive stream of sensory information from different modalities. Early animal studies have identified learning mechanisms that are based on reward and punishment such that animals tend to avoid actions that lead to punishment whereas rewarded actions are reinforced. However, most algorithms for reward-based learning are only applicable if the dimensionality of the state-space is sufficiently small or its structure is sufficiently simple. Therefore, the question arises how the problem of learning on high-dimensional data is solved in the brain. In this article, we propose a biologically plausible generic two-stage learning system that can directly be applied to raw high-dimensional input streams. The system is composed of a hierarchical slow feature analysis (SFA) network for preprocessing and a simple neural network on top that is trained based on rewards. We demonstrate by computer simulations that this generic architecture is able to learn quite demanding reinforcement learning tasks on high-dimensional visual input streams in a time that is comparable to the time needed when an explicit highly informative low-dimensional state-space representation is given instead of the high-dimensional visual input. The learning speed of the proposed architecture in a task similar to the Morris water maze task is comparable to that found in experimental studies with rats. This study thus supports the hypothesis that slowness learning is one important unsupervised learning principle utilized in the brain to form efficient state representations for behavioral learning.

  19. Reinforcement learning on slow features of high-dimensional input streams.

    Directory of Open Access Journals (Sweden)

    Robert Legenstein

    Full Text Available Humans and animals are able to learn complex behaviors based on a massive stream of sensory information from different modalities. Early animal studies have identified learning mechanisms that are based on reward and punishment such that animals tend to avoid actions that lead to punishment whereas rewarded actions are reinforced. However, most algorithms for reward-based learning are only applicable if the dimensionality of the state-space is sufficiently small or its structure is sufficiently simple. Therefore, the question arises how the problem of learning on high-dimensional data is solved in the brain. In this article, we propose a biologically plausible generic two-stage learning system that can directly be applied to raw high-dimensional input streams. The system is composed of a hierarchical slow feature analysis (SFA) network for preprocessing and a simple neural network on top that is trained based on rewards. We demonstrate by computer simulations that this generic architecture is able to learn quite demanding reinforcement learning tasks on high-dimensional visual input streams in a time that is comparable to the time needed when an explicit highly informative low-dimensional state-space representation is given instead of the high-dimensional visual input. The learning speed of the proposed architecture in a task similar to the Morris water maze task is comparable to that found in experimental studies with rats. This study thus supports the hypothesis that slowness learning is one important unsupervised learning principle utilized in the brain to form efficient state representations for behavioral learning.

  20. The Virtual Learning Commons: Supporting the Fuzzy Front End of Scientific Research with Emerging Technologies

    Science.gov (United States)

    Pennington, D. D.; Gandara, A.; Gris, I.

    2012-12-01

    The Virtual Learning Commons (VLC), funded by the National Science Foundation Office of Cyberinfrastructure CI-Team Program, is a combination of Semantic Web, mash up, and social networking tools that supports knowledge sharing and innovation across scientific disciplines in research and education communities and networks. The explosion of scientific resources (data, models, algorithms, tools, and cyberinfrastructure) challenges the ability of researchers to be aware of resources that might benefit them. Even when aware, it can be difficult to understand enough about those resources to become potential adopters or re-users. Often scientific data and emerging technologies have little documentation, especially about the context of their use. The VLC tackles this challenge by providing mechanisms for individuals and groups of researchers to organize Web resources into virtual collections, and engage each other around those collections in order to a) learn about potentially relevant resources that are available; b) design research that leverages those resources; and c) develop initial work plans. The VLC aims to support the "fuzzy front end" of innovation, where novel ideas emerge and there is the greatest potential for impact on research design. It is during the fuzzy front end that conceptual collisions across disciplines and exposure to diverse perspectives provide opportunity for creative thinking that can lead to inventive outcomes. The VLC integrates Semantic Web functionality for structuring distributed information, mash up functionality for retrieving and displaying information, and social media for discussing/rating information. We are working to provide three views of information that support researchers in different ways: 1. Innovation Marketplace: supports users as they try to understand what research is being conducted, who is conducting it, where they are located, and who they collaborate with; 2. Conceptual Mapper: supports users as they organize their

  1. Stress Modulates Reinforcement Learning in Younger and Older Adults

    OpenAIRE

    Lighthall, Nichole R.; Gorlick, Marissa A.; Schoeke, Andrej; Frank, Michael J.; Mather, Mara

    2012-01-01

    Animal research and human neuroimaging studies indicate that stress increases dopamine levels in brain regions involved in reward processing and stress also appears to increase the attractiveness of addictive drugs. The current study tested the hypothesis that stress increases reward salience, leading to more effective learning about positive than negative outcomes in a probabilistic selection task. Changes to dopamine pathways with age raise the question of whether stress effects on incentiv...

  2. Prediction error in reinforcement learning: a meta-analysis of neuroimaging studies.

    Science.gov (United States)

    Garrison, Jane; Erdeniz, Burak; Done, John

    2013-08-01

    Activation likelihood estimation (ALE) meta-analyses were used to examine the neural correlates of prediction error in reinforcement learning. The findings are interpreted in the light of current computational models of learning and action selection. In this context, particular consideration is given to the comparison of activation patterns from studies using instrumental and Pavlovian conditioning, and where reinforcement involved rewarding or punishing feedback. The striatum was the key brain area encoding for prediction error, with activity encompassing dorsal and ventral regions for instrumental and Pavlovian reinforcement alike, a finding which challenges the functional separation of the striatum into a dorsal 'actor' and a ventral 'critic'. Prediction error activity was further observed in diverse areas of predominantly anterior cerebral cortex including medial prefrontal cortex and anterior cingulate cortex. Distinct patterns of prediction error activity were found for studies using rewarding and aversive reinforcers; reward prediction errors were observed primarily in the striatum while aversive prediction errors were found more widely including insula and habenula.
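
    The reward prediction error examined across these studies is usually formalized as a temporal-difference error that nudges the value estimate; this is the standard textbook form rather than the parameterization of any single study in the meta-analysis.

```latex
\delta_t = r_t + \gamma\, V(s_{t+1}) - V(s_t), \qquad V(s_t) \leftarrow V(s_t) + \alpha\, \delta_t
```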

  3. Genetic Learning of Fuzzy Expert Systems for Decision Support in the Automated Process of Wooden Boards Cutting

    Directory of Open Access Journals (Sweden)

    Yaroslav MATSYSHYN

    2014-03-01

    Full Text Available Sawing solid wood (lumber, wooden boards) into blanks is an important technological operation that strongly influences the efficiency of the woodworking industry as a whole. Selecting a rational variant of lumber cutting is a complex multicriteria problem with many stochastic factors, characterized by incomplete information and fuzzy attributes. Because of these properties, the automatic optimizing cross-cut saws currently in use do not always make rational use of the wood raw material, and since the optimization algorithms of these saws function as a "black box", they cannot be improved. The task of developing a new approach to optimal cross-cutting that takes into account the stochastic properties of wood as a material of biological origin is therefore topical. Here we propose a new approach to the problem of optimal lumber cutting under conditions of uncertainty in lumber quantity and fuzziness in the lengths of defect-free areas. To account for these conditions, we applied methods of fuzzy set theory and used a genetic algorithm to simulate the process of human learning in carrying out the technological operation. Thus, the rule of behavior for each successive defect-free area is defined in a fuzzy expert system that can be configured for specific production tasks using a genetic algorithm. The authors' implementation of the genetic algorithm is used to tune the parameters of the fuzzy expert system. The working capacity of the developed system was verified on simulated and real-world data. Implementation of this approach will make it suitable for the control of automated or fully automatic optimizing cross-cutting of solid wood.
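
    A minimal sketch of the learning component, a genetic algorithm tuning the parameters of a fuzzy rule base, is given below; the two-parameter candidate, the placeholder fitness function, and all GA settings are hypothetical stand-ins for the authors' system, which would score simulated cutting yield instead.

```python
import numpy as np

rng = np.random.default_rng(2)

def fitness(params):
    """Placeholder fitness for a pair of membership-function breakpoints (a, b);
    a real system would instead score the simulated yield of the cutting rules."""
    a, b = params
    return -((a - 0.3) ** 2 + (b - 0.8) ** 2)

pop = rng.random((20, 2))                        # 20 candidate (a, b) pairs in [0, 1]
for _ in range(100):
    scores = np.array([fitness(p) for p in pop])
    parents = pop[np.argsort(scores)[-10:]]      # selection: the best half survives
    moms = parents[rng.integers(0, 10, 10)]
    dads = parents[rng.integers(0, 10, 10)]
    mask = rng.random((10, 2)) < 0.5             # uniform crossover
    children = np.where(mask, moms, dads)
    children += rng.normal(0.0, 0.05, children.shape)   # mutation
    pop = np.vstack([parents, np.clip(children, 0.0, 1.0)])

best = pop[np.argmax([fitness(p) for p in pop])]
print(np.round(best, 3))                         # should approach (0.3, 0.8)
```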

  4. NNF and NNPrF—Fuzzy Petri Nets Based on Neural Network for Knowledge Representation,Reasoning and Learning

    Institute of Scientific and Technical Information of China (English)

    周奕; 吴时霖

    1996-01-01

    This paper proposes NNF, a fuzzy Petri net system based on neural networks for propositional logic representation, and gives the formal definition of NNF. For the NNF model, a forward reasoning algorithm, a backward reasoning algorithm and a knowledge learning algorithm are discussed based on the weight training algorithm of neural networks, the back-propagation algorithm. Thus NNF is endowed with the ability to learn a rule. The paper concludes with a discussion on extending NNF to predicate logic, forming NNPrF, and proposing the formal definition and a reasoning algorithm of NNPrF.

  5. Fuzzeval: A Fuzzy Controller-Based Approach in Adaptive Learning for Backgammon Game

    DEFF Research Database (Denmark)

    Heinze, Mikael; Ortiz-Arroyo, Daniel; Larsen, Henrik Legind

    2005-01-01

    In this paper we investigate the effectiveness of applying fuzzy controllers to create strong computer player programs in the domain of backgammon. Fuzzeval, our proposed mechanism, consists of a fuzzy controller that dynamically evaluates the perceived strength of the board configurations it re-...

  6. Selection for Reinforcement-Free Learning Ability as an Organizing Factor in the Evolution of Cognition

    Directory of Open Access Journals (Sweden)

    Solvi Arnold

    2013-01-01

    Full Text Available This research explores the relation between environmental structure and neurocognitive structure. We hypothesize that selection pressure on abilities for efficient learning (especially in settings with limited or no reward information translates into selection pressure on correspondence relations between neurocognitive and environmental structure, since such correspondence allows for simple changes in the environment to be handled with simple learning updates in neurocognitive structure. We present a model in which a simple form of reinforcement-free learning is evolved in neural networks using neuromodulation and analyze the effect this selection for learning ability has on the virtual species' neural organization. We find a higher degree of organization than in a control population evolved without learning ability and discuss the relation between the observed neural structure and the environmental structure. We discuss our findings in the context of the environmental complexity thesis, the Baldwin effect, and other interactions between adaptation processes.

  7. Channel Decision in Cognitive Radio Enabled Sensor Networks: A Reinforcement Learning Approach

    Directory of Open Access Journals (Sweden)

    Joshua Abolarinwa

    2015-08-01

    Full Text Available Recent advancements in the field of cognitive radio technology have paved the way for cognitive radio-based wireless sensor networks, which have been tipped to be the next generation of sensor networks. Spectrum sensing and energy-efficient channel access are two important operations in such a network. In this paper, we propose the use of the machine learning and decision-making capability of reinforcement learning to address the problem of energy efficiency associated with channel access in cognitive radio-aided sensor networks. A simple learning algorithm was developed to improve network parameters such as secondary-user throughput and channel availability in relation to the sensing time. Compared with channel-access schemes without intelligent learning, such as random channel assignment and dynamic channel assignment, the learning algorithm produced better simulated performance in terms of throughput, energy efficiency and other quality-of-service requirements of the network application.
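
    A minimal sketch of reinforcement-learning-driven channel selection, framed as an epsilon-greedy bandit over channels with reward for a successful (free-channel) access, is shown below; the channel-availability probabilities and the reward definition are illustrative, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(3)
p_free = np.array([0.2, 0.6, 0.9])   # illustrative per-channel availability
q = np.zeros(3)                      # estimated value of accessing each channel
counts = np.zeros(3)
eps = 0.1

for t in range(2000):
    ch = rng.integers(3) if rng.random() < eps else int(np.argmax(q))
    reward = float(rng.random() < p_free[ch])    # 1 if the chosen channel was free
    counts[ch] += 1
    q[ch] += (reward - q[ch]) / counts[ch]       # incremental mean update

print(np.round(q, 2), int(np.argmax(q)))         # the node settles on the best channel
```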

  8. The Study of Reinforcement Learning for Traffic Self-Adaptive Control under Multiagent Markov Game Environment

    Directory of Open Access Journals (Sweden)

    Lun-Hui Xu

    2013-01-01

    Full Text Available The urban traffic self-adaptive control problem is dynamic and uncertain, so the states of the traffic environment are hard to observe. An efficient agent that controls a single intersection can be discovered automatically via multiagent reinforcement learning. However, in the majority of the previous works on this approach, each agent needed perfectly observed information when interacting with the environment and learned individually with less efficient coordination. This study casts traffic self-adaptive control as a multiagent Markov game problem. The design employs a traffic signal control agent (TSCA) for each signalized intersection that coordinates with neighboring TSCAs. A mathematical model for the TSCAs' interaction is built based on a nonzero-sum Markov game, which has been applied to let TSCAs learn how to cooperate. A multiagent Markov game reinforcement learning approach is constructed on the basis of single-agent Q-learning. This method lets each TSCA learn to update its Q-values under the joint actions and imperfect information. The convergence of the proposed algorithm is analyzed theoretically. The simulation results show that the proposed method is convergent and effective in a realistic traffic self-adaptive control setting.

  9. Kernel dynamic policy programming: Applicable reinforcement learning to robot systems with high dimensional states.

    Science.gov (United States)

    Cui, Yunduan; Matsubara, Takamitsu; Sugimoto, Kenji

    2017-06-29

    We propose a new value function approach for model-free reinforcement learning in Markov decision processes involving high dimensional states that addresses the issues of brittleness and intractable computational complexity, therefore rendering value-function-based reinforcement learning algorithms applicable to high dimensional systems. Our new algorithm, Kernel Dynamic Policy Programming (KDPP), smoothly updates the value function in accordance with the Kullback-Leibler divergence between current and updated policies. Stabilizing the learning in this manner enables the application of the kernel trick to value function approximation, which greatly reduces computational requirements for learning in high dimensional state spaces. The performance of KDPP against other kernel trick based value function approaches is first investigated in a simulated n-DOF manipulator reaching task, where only KDPP efficiently learned a viable policy at n=40. As an application to a real world high dimensional robot system, KDPP successfully learned the task of unscrewing a bottle cap via a Pneumatic Artificial Muscle (PAM) driven robotic hand with tactile sensors; a system with a state space of 32 dimensions, while given limited samples and with ordinary computing resources. Copyright © 2017 Elsevier Ltd. All rights reserved.

  10. Single photon in hierarchical architecture for physical reinforcement learning: Photon intelligence

    CERN Document Server

    Naruse, Makoto; Drezet, Aurélien; Huant, Serge; Hori, Hirokazu; Kim, Song-Ju

    2016-01-01

    Understanding and using natural processes for intelligent functionalities, referred to as natural intelligence, has recently attracted interest from a variety of fields, including post-silicon computing for artificial intelligence and decision making in the behavioural sciences. In a past study, we successfully used the wave-particle duality of single photons to solve the two-armed bandit problem, which constitutes the foundation of reinforcement learning and decision making. In this study, we propose and confirm a hierarchical architecture for single-photon-based reinforcement learning and decision making that verifies the scalability of the principle. Specifically, the four-armed bandit problem is solved given zero prior knowledge in a two-layer hierarchical architecture, where polarization is autonomously adapted in order to effect adequate decision making using single-photon measurements. In the hierarchical structure, the notion of layer-dependent decisions emerges. The optimal solutions in the coarse la...

  11. A closed-loop algorithm to detect human face using color and reinforcement learning

    Institute of Scientific and Technical Information of China (English)

    吴东晖; 叶秀清; 顾伟康

    2002-01-01

    A closed-loop algorithm to detect human face using color information and reinforcement learning is presented in this paper. By using a skin-color selector, the regions with color "like" that of human skin are selected as candidates for human face. In the next stage, the candidates are matched with a face model and given an evaluation of the match degree by the matching module. And if the evaluation of the match result is too low, a reinforcement learning stage will start to search the best parameters of the skin-color selector. It has been tested using many photos of various ethnic groups under various lighting conditions, such as different light source, high light and shadow. And the experiment result proved that this algorithm is robust to the varying lighting conditions and personal conditions.

  12. A closed-loop algorithm to detect human face using color and reinforcement learning

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    A closed-loop algorithm to detect human face using color information and reinforcement learning is presented in this paper. By using a skin-color selector, the regions with color “like" that of human skin are selected as candidates for human face. In the next stage, the candidates are matched with a face model and given an evaluation of the match degree by the matching module. And if the evaluation of the match result is too low, a reinforcement learning stage will start to search the best parameters of the skin-color selector. It has been tested using many photos of various ethnic groups under various lighting conditions, such as different light source, high light and shadow. And the experiment result proved that this algorithm is robust to the varying lighting conditions and personal conditions.

  13. Genetic Scheduling and Reinforcement Learning in Multirobot Systems for Intelligent Warehouses

    Directory of Open Access Journals (Sweden)

    Jiajia Dou

    2015-01-01

    Full Text Available A new hybrid solution is presented to improve the efficiency of intelligent warehouses with multirobot systems, where the genetic algorithm (GA based task scheduling is combined with reinforcement learning (RL based path planning for mobile robots. Reinforcement learning is an effective approach to search for a collision-free path in unknown dynamic environments. Genetic algorithm is a simple but splendid evolutionary search method that provides very good solutions for task allocation. In order to achieve higher efficiency of the intelligent warehouse system, we design a new solution by combining these two techniques and provide an effective and alternative way compared with other state-of-the-art methods. Simulation results demonstrate the effectiveness of the proposed approach regarding the optimization of travel time and overall efficiency of the intelligent warehouse system.

  14. Reinforcement learning and counterfactual reasoning explain adaptive behavior in a changing environment.

    Science.gov (United States)

    Zhang, Yunfeng; Paik, Jaehyon; Pirolli, Peter

    2015-04-01

    Animals routinely adapt to changes in the environment in order to survive. Though reinforcement learning may play a role in such adaptation, it is not clear that it is the only mechanism involved, as it is not well suited to producing rapid, relatively immediate changes in strategies in response to environmental changes. This research proposes that counterfactual reasoning might be an additional mechanism that facilitates change detection. An experiment is conducted in which a task state changes over time and the participants had to detect the changes in order to perform well and gain monetary rewards. A cognitive model is constructed that incorporates reinforcement learning with counterfactual reasoning to help quickly adjust the utility of task strategies in response to changes. The results show that the model can accurately explain human data and that counterfactual reasoning is key to reproducing the various effects observed in this change detection paradigm.

  15. Ventral tegmental area neurons in learned appetitive behavior and positive reinforcement.

    Science.gov (United States)

    Fields, Howard L; Hjelmstad, Gregory O; Margolis, Elyssa B; Nicola, Saleem M

    2007-01-01

    Ventral tegmental area (VTA) neuron firing precedes behaviors elicited by reward-predictive sensory cues and scales with the magnitude and unpredictability of received rewards. These patterns are consistent with roles in the performance of learned appetitive behaviors and in positive reinforcement, respectively. The VTA includes subpopulations of neurons with different afferent connections, neurotransmitter content, and projection targets. Because the VTA and substantia nigra pars compacta are the sole sources of striatal and limbic forebrain dopamine, measurements of dopamine release and manipulations of dopamine function have provided critical evidence supporting a VTA contribution to these functions. However, the VTA also sends GABAergic and glutamatergic projections to the nucleus accumbens and prefrontal cortex. Furthermore, VTA-mediated but dopamine-independent positive reinforcement has been demonstrated. Consequently, identifying the neurotransmitter content and projection target of VTA neurons recorded in vivo will be critical for determining their contribution to learned appetitive behaviors.

  16. An information-theoretic analysis of return maximization in reinforcement learning.

    Science.gov (United States)

    Iwata, Kazunori

    2011-12-01

    We present a general analysis of return maximization in reinforcement learning. This analysis does not require assumptions of Markovianity, stationarity, and ergodicity for the stochastic sequential decision processes of reinforcement learning. Instead, our analysis assumes the asymptotic equipartition property fundamental to information theory, providing a substantially different view from that in the literature. As our main results, we show that return maximization is achieved by the overlap of typical and best sequence sets, and we present a class of stochastic sequential decision processes with the necessary condition for return maximization. We also describe several examples of best sequences in terms of return maximization in the class of stochastic sequential decision processes, which satisfy the necessary condition.

  17. A Clustering and SVM Regression Learning-Based Spatiotemporal Fuzzy Logic Controller with Interpretable Structure for Spatially Distributed Systems

    Directory of Open Access Journals (Sweden)

    Xian-xia Zhang

    2012-01-01

    Full Text Available Many industrial processes and physical systems are spatially distributed systems. Recently, a novel 3-D FLC was developed for such systems. The previous study on the 3-D FLC was concentrated on an expert knowledge-based approach. However, in most situations, we may lack the expert knowledge, while input-output data sets hidden with effective control laws are usually available. Under such circumstances, a data-driven approach could be a very effective way to design the 3-D FLC. In this study, we aim at developing a new 3-D FLC design methodology based on clustering and support vector machine (SVM) regression. The design consists of three parts: initial rule generation, rule-base simplification, and parameter learning. Firstly, the initial rules are extracted by a nearest neighborhood clustering algorithm with the Frobenius norm as a distance. Secondly, the initial rule-base is simplified by merging similar 3-D fuzzy sets and similar 3-D fuzzy rules based on a similarity measure technique. Thirdly, the consequent parameters are learned by a linear SVM regression algorithm. Additionally, the universal approximation capability of the proposed 3-D fuzzy system is discussed. Finally, the control of a catalytic packed-bed reactor is taken as an application to demonstrate the effectiveness of the proposed 3-D FLC design.
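
    A rough, non-spatial sketch of the three-part pipeline (nearest-neighborhood rule extraction, then linear SVM regression of consequent parameters; the rule-base simplification step is skipped) is given below. The data, clustering radius, and Gaussian firing strengths are placeholders, and the 3-D fuzzy machinery of the paper is not reproduced.

```python
import numpy as np
from sklearn.svm import LinearSVR

rng = np.random.default_rng(4)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.05 * rng.normal(size=200)

# 1) initial rule generation: nearest-neighborhood clustering of the inputs
radius, centers = 0.5, []
for x in X:
    if all(np.linalg.norm(x - c) > radius for c in centers):
        centers.append(x)                        # each center seeds one rule antecedent
centers = np.array(centers)

# 2) rule-base simplification (merging similar sets/rules) is skipped in this sketch

# 3) consequent learning: linear SVM regression on normalized rule firing strengths
sigma = 0.4
firing = np.exp(-np.linalg.norm(X[:, None, :] - centers[None], axis=2) ** 2 / sigma ** 2)
firing /= firing.sum(axis=1, keepdims=True)
model = LinearSVR(C=1.0, max_iter=10000).fit(firing, y)

pred = model.predict(firing)
print(len(centers), round(float(np.mean((pred - y) ** 2)), 4))
```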

  18. A Flexible Mechanism of Rule Selection Enables Rapid Feature-Based Reinforcement Learning.

    Science.gov (United States)

    Balcarras, Matthew; Womelsdorf, Thilo

    2016-01-01

    Learning in a new environment is influenced by prior learning and experience. Correctly applying a rule that maps a context to stimuli, actions, and outcomes enables faster learning and better outcomes than relying on learning strategies that are ignorant of task structure. However, it is often difficult to know when and how to apply learned rules in new contexts. In our study we explored how subjects employ different strategies for learning the relationship between stimulus features and positive outcomes in a probabilistic task context. We test the hypothesis that task-naive subjects will show enhanced learning of feature-specific reward associations by switching to the use of an abstract rule that associates stimuli by feature type and restricts selections to that dimension. To test this hypothesis we designed a decision-making task in which subjects receive probabilistic feedback following choices between pairs of stimuli. In the task, trials are grouped into two contexts by blocks: in one type of block there is no unique relationship between a specific feature dimension (stimulus shape or color) and positive outcomes, and following an un-cued transition, alternating blocks have outcomes that are linked to either stimulus shape or color. Two-thirds of subjects (n = 22/32) exhibited behavior that was best fit by a hierarchical feature-rule model. Supporting the predicted model mechanism, these subjects showed significantly enhanced performance in feature-reward blocks and rapidly switched their choice strategy to abstract feature rules when reward contingencies changed. The choice behavior of other subjects (n = 10/32) was fit by a range of alternative reinforcement learning models representing strategies that do not benefit from applying previously learned rules. In summary, these results show that untrained subjects are capable of flexibly shifting between behavioral rules by leveraging simple model-free reinforcement learning and context
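
    For contrast with the hierarchical feature-rule model, a minimal Python sketch of the simpler feature-based value learning that the alternative models embody is shown below: values attach to stimulus features rather than to whole stimuli and are updated with a shared prediction error. The feature names, learning rate, softmax temperature, and reward probabilities are illustrative assumptions, not the task's actual parameters.

import numpy as np

rng = np.random.default_rng(2)

alpha, beta = 0.3, 5.0                      # learning rate, softmax inverse temperature
V = {"red": 0.0, "green": 0.0, "square": 0.0, "circle": 0.0}

def choose(stim_a, stim_b):
    """Softmax choice between two stimuli, each described by a tuple of features."""
    qa, qb = sum(V[f] for f in stim_a), sum(V[f] for f in stim_b)
    p_a = 1.0 / (1.0 + np.exp(-beta * (qa - qb)))
    return stim_a if rng.random() < p_a else stim_b

# Block in which only color predicts reward: choosing "red" pays off with p = 0.8.
for _ in range(200):
    a, b = ("red", "square"), ("green", "circle")
    chosen = choose(a, b)
    reward = float(rng.random() < (0.8 if "red" in chosen else 0.2))
    prediction = sum(V[f] for f in chosen)
    for f in chosen:                        # prediction error shared across features
        V[f] += alpha * (reward - prediction) / len(chosen)

print({f: round(v, 2) for f, v in V.items()})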

  19. A flexible mechanism of rule selection enables rapid feature-based reinforcement learning

    Directory of Open Access Journals (Sweden)

    Matthew eBalcarras

    2016-03-01

    Full Text Available Learning in a new environment is influenced by prior learning and experience. Correctly applying a rule that maps a context to stimuli, actions, and outcomes enables faster learning and better outcomes than relying on learning strategies that are ignorant of task structure. However, it is often difficult to know when and how to apply learned rules in new contexts. In our study we explored how subjects employ different strategies for learning the relationship between stimulus features and positive outcomes in a probabilistic task context. We test the hypothesis that task-naive subjects will show enhanced learning of feature-specific reward associations by switching to the use of an abstract rule that associates stimuli by feature type and restricts selections to that dimension. To test this hypothesis we designed a decision-making task in which subjects receive probabilistic feedback following choices between pairs of stimuli. In the task, trials are grouped into two contexts by blocks: in one type of block there is no unique relationship between a specific feature dimension (stimulus shape or colour) and positive outcomes, and following an un-cued transition, alternating blocks have outcomes that are linked to either stimulus shape or colour. Two-thirds of subjects (n = 22/32) exhibited behaviour that was best fit by a hierarchical feature-rule model. Supporting the predicted model mechanism, these subjects showed significantly enhanced performance in feature-reward blocks and rapidly switched their choice strategy to abstract feature rules when reward contingencies changed. The choice behaviour of other subjects (n = 10/32) was fit by a range of alternative reinforcement learning models representing strategies that do not benefit from applying previously learned rules. In summary, these results show that untrained subjects are capable of flexibly shifting between behavioural rules by leveraging simple model-free reinforcement

  20. Scaled Free-Energy Based Reinforcement Learning for Robust and Efficient Learning in High-Dimensional State Spaces

    Directory of Open Access Journals (Sweden)

    Stefan eElfwing

    2013-02-01

    Full Text Available Free-energy based reinforcement learning was proposed for learning in high-dimensional state and action spaces, which cannot be handled by standard function approximation methods. In this study, we propose a scaled version of free-energy based reinforcement learning to achieve more robust and more efficient learning performance. The action-value function is approximated by the negative free energy of a restricted Boltzmann machine, divided by a constant scaling factor that is related to the size of the Boltzmann machine (in this study, the square root of the number of state nodes). Our first task is a digit floor gridworld task, where the states are represented by images of handwritten digits from the MNIST data set. The purpose of the task is to investigate the proposed method's ability, through the extraction of task-relevant features in the hidden layer, to cluster images of the same digit and to cluster images of different digits that correspond to states with the same optimal action. We also test the method's robustness with respect to different exploration schedules, i.e., different settings of the initial temperature and the temperature discount rate in softmax action selection. Our second task is a robot visual navigation task, in which the robot can learn its position from the different colors of the lower parts of four landmarks and can infer the correct corner goal area from the color of the upper parts of the landmarks. The state space consists of binarized camera images with, at most, nine different colors, which is equal to 6642 binary states. For both tasks, the learning performance is compared with standard free-energy based reinforcement learning and with function approximation in which the action-value function is approximated by a two-layered feedforward neural network.
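
    The central quantity, an action value given by the negative free energy of a restricted Boltzmann machine divided by a scaling factor (here the square root of the number of state nodes), can be sketched in Python as follows. The network sizes, random initial weights, and softmax temperature are illustrative assumptions rather than the paper's settings, and no learning update is shown.

import numpy as np

rng = np.random.default_rng(3)

# RBM over concatenated (state, action) visible units; Q(s, a) = -F(s, a) / sqrt(n_state).
n_state, n_action, n_hidden = 64, 4, 16
W = 0.01 * rng.normal(size=(n_state + n_action, n_hidden))
b_vis = np.zeros(n_state + n_action)
b_hid = np.zeros(n_hidden)
scale = np.sqrt(n_state)                    # the constant scaling factor

def q_value(state, action_index):
    """Scaled negative free energy of the RBM for a binary state and a one-hot action."""
    v = np.concatenate([state, np.eye(n_action)[action_index]])
    free_energy = -v @ b_vis - np.sum(np.log1p(np.exp(b_hid + v @ W)))
    return -free_energy / scale

def softmax_action(state, temperature=0.1):
    q = np.array([q_value(state, a) for a in range(n_action)])
    prob = np.exp((q - q.max()) / temperature)
    prob /= prob.sum()
    return rng.choice(n_action, p=prob)

state = rng.integers(0, 2, size=n_state).astype(float)   # a binarized "camera image"
print("chosen action:", softmax_action(state))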