WorldWideScience

Sample records for robotic reinforcement learning

  1. Framework for robot skill learning using reinforcement learning

    Science.gov (United States)

    Wei, Yingzi; Zhao, Mingyang

    2003-09-01

    Robot skill acquisition is a process similar to human skill learning. Reinforcement learning (RL) is an on-line actor-critic method by which a robot can develop its skills. The reinforcement function is the critical component, since it evaluates actions and guides the learning process. We present an augmented reward function that provides a new way for the RL controller to incorporate prior knowledge and experience. A difference form of the augmented reward function is also considered carefully. The additional reward, beyond the conventional reward, provides more heuristic information for RL. In this paper, we present a strategy for the task of complex skill learning: an automatic robot-shaping policy decomposes the complex skill into a hierarchical learning process. A new form of value function is introduced to attain smooth motion switching swiftly. We present a formal but practical framework for robot skill learning and illustrate with an example the utility of the method for learning skilled robot control on-line.
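
    A minimal sketch of the augmented-reward idea in a difference form is given below. The heuristic function, its weight, and the goal coordinates are hypothetical placeholders for illustration, not the authors' formulation.

        # Illustrative sketch only (not the authors' implementation): a conventional
        # task reward is augmented with a difference-form heuristic term that injects
        # prior knowledge into the learning signal.

        BETA = 0.5  # weight of the heuristic term (assumed value)

        def heuristic(state):
            """Hypothetical prior-knowledge score, e.g. negative distance to a goal."""
            goal = (0.0, 0.0)
            return -((state[0] - goal[0]) ** 2 + (state[1] - goal[1]) ** 2) ** 0.5

        def augmented_reward(conventional_reward, state, next_state):
            """Conventional reward plus a weighted difference of heuristic values."""
            return conventional_reward + BETA * (heuristic(next_state) - heuristic(state))

        # Example: a step that moves closer to the goal earns an extra heuristic bonus.
        r = augmented_reward(0.0, state=(3.0, 4.0), next_state=(2.0, 3.0))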

  2. Decentralized Reinforcement Learning of robot behaviors

    NARCIS (Netherlands)

    Leottau, David L.; Ruiz-del-Solar, Javier; Babuska, R.

    2018-01-01

    A multi-agent methodology is proposed for Decentralized Reinforcement Learning (DRL) of individual behaviors in problems where multi-dimensional action spaces are involved. When using this methodology, sub-tasks are learned in parallel by individual agents working toward a common goal. In

  3. Emotion in reinforcement learning agents and robots : A survey

    NARCIS (Netherlands)

    Moerland, T.M.; Broekens, D.J.; Jonker, C.M.

    2018-01-01

    This article provides the first survey of computational models of emotion in reinforcement learning (RL) agents. The survey focuses on agent/robot emotions, and mostly ignores human user emotions. Emotions are recognized as functional in decision-making by influencing motivation and action

  4. Optimized Assistive Human-Robot Interaction Using Reinforcement Learning.

    Science.gov (United States)

    Modares, Hamidreza; Ranatunga, Isura; Lewis, Frank L; Popa, Dan O

    2016-03-01

    An intelligent human-robot interaction (HRI) system with adjustable robot behavior is presented. The proposed HRI system assists the human operator to perform a given task with minimum workload demands and optimizes the overall human-robot system performance. Motivated by human factor studies, the presented control structure consists of two control loops. First, a robot-specific neuro-adaptive controller is designed in the inner loop to make the unknown nonlinear robot behave like a prescribed robot impedance model as perceived by a human operator. In contrast to existing neural network and adaptive impedance-based control methods, no information of the task performance or the prescribed robot impedance model parameters is required in the inner loop. Then, a task-specific outer-loop controller is designed to find the optimal parameters of the prescribed robot impedance model to adjust the robot's dynamics to the operator skills and minimize the tracking error. The outer loop includes the human operator, the robot, and the task performance details. The problem of finding the optimal parameters of the prescribed robot impedance model is transformed into a linear quadratic regulator (LQR) problem which minimizes the human effort and optimizes the closed-loop behavior of the HRI system for a given task. To obviate the requirement of the knowledge of the human model, integral reinforcement learning is used to solve the given LQR problem. Simulation results on an x - y table and a robot arm, and experimental implementation results on a PR2 robot confirm the suitability of the proposed method.
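
    For context, the outer loop above reduces to a linear quadratic regulator problem. The sketch below shows the standard model-based LQR solution via the continuous-time algebraic Riccati equation; the system matrices are invented placeholders, and the point of integral reinforcement learning in the paper is precisely to obtain such a solution online without knowing this model.

        # Model-based LQR baseline (illustrative only): the integral RL approach in the
        # paper obtains the same kind of solution online without knowing these matrices.
        import numpy as np
        from scipy.linalg import solve_continuous_are

        # Hypothetical linearized error dynamics x_dot = A x + B u (placeholder values).
        A = np.array([[0.0, 1.0],
                      [0.0, -0.5]])
        B = np.array([[0.0],
                      [1.0]])
        Q = np.diag([10.0, 1.0])   # weight on tracking error and its rate
        R = np.array([[0.1]])      # weight on control (human/robot) effort

        P = solve_continuous_are(A, B, Q, R)   # solves A'P + PA - PBR^{-1}B'P + Q = 0
        K = np.linalg.inv(R) @ B.T @ P         # optimal state-feedback gain, u = -K x
        print("LQR gain:", K)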

  5. Reinforcement Learning on autonomous humanoid robots

    NARCIS (Netherlands)

    Schuitema, E.

    2012-01-01

    Service robots have the potential to be of great value in households, health care and other labor intensive environments. However, these environments are typically unique, not very structured and frequently changing, which makes it difficult to make service robots robust and versatile through manual

  6. Emotion in reinforcement learning agents and robots: A survey

    OpenAIRE

    Moerland, T.M.; Broekens, D.J.; Jonker, C.M.

    2018-01-01

    This article provides the first survey of computational models of emotion in reinforcement learning (RL) agents. The survey focuses on agent/robot emotions, and mostly ignores human user emotions. Emotions are recognized as functional in decision-making by influencing motivation and action selection. Therefore, computational emotion models are usually grounded in the agent's decision making architecture, of which RL is an important subclass. Studying emotions in RL-based agents is useful for ...

  7. Multiagent Reinforcement Learning with Regret Matching for Robot Soccer

    Directory of Open Access Journals (Sweden)

    Qiang Liu

    2013-01-01

    This paper proposes a novel multiagent reinforcement learning (MARL) algorithm, Nash-Q learning with regret matching, in which regret matching is used to speed up the well-known MARL algorithm Nash-Q learning. Choosing a suitable action-selection strategy that balances exploration and exploitation is critical to enhancing the online learning ability of Nash-Q learning. In a Markov game, the joint actions of agents adopting the regret matching algorithm converge to a set of no-regret points, which can be viewed as a coarse correlated equilibrium that in essence includes the Nash equilibrium. It can therefore be inferred that regret matching guides exploration of the state-action space, so that the convergence rate of the Nash-Q learning algorithm is increased. Simulation results on robot soccer validate that, compared to the original Nash-Q learning algorithm, using regret matching during the learning phase of Nash-Q learning gives excellent online learning ability and results in significantly better performance in terms of scores, average reward and policy convergence.
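
    The regret-matching rule itself is simple to state in code. The sketch below is a generic single-state illustration, not the paper's integration with Nash-Q learning: actions with larger positive cumulative regret are played with proportionally higher probability.

        # Regret matching for action selection (generic single-state sketch, not the
        # paper's Nash-Q integration).
        import random

        def regret_matching_policy(cum_regret):
            positive = [max(r, 0.0) for r in cum_regret]
            total = sum(positive)
            n = len(cum_regret)
            if total <= 0.0:
                return [1.0 / n] * n          # no positive regret: play uniformly
            return [p / total for p in positive]

        def update_regret(cum_regret, payoffs, chosen):
            """Accumulate how much better each action would have done than the chosen one."""
            for a, u in enumerate(payoffs):
                cum_regret[a] += u - payoffs[chosen]

        # Hypothetical round with counterfactual payoffs for three actions.
        cum_regret = [0.0, 0.0, 0.0]
        payoffs = [1.0, 0.0, 0.5]
        probs = regret_matching_policy(cum_regret)
        chosen = random.choices(range(3), weights=probs)[0]
        update_regret(cum_regret, payoffs, chosen)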

  8. Safe robot execution in model-based reinforcement learning

    OpenAIRE

    Martínez Martínez, David; Alenyà Ribas, Guillem; Torras, Carme

    2015-01-01

    Task learning in robotics requires repeatedly executing the same actions in different states to learn the model of the task. However, in real-world domains, there are usually sequences of actions that, if executed, may produce unrecoverable errors (e.g. breaking an object). Robots should avoid repeating such errors when learning, and thus explore the state space in a more intelligent way. This requires identifying dangerous action effects to avoid including such actions in the generated plans...

  9. Challenges in adapting imitation and reinforcement learning to compliant robots

    Directory of Open Access Journals (Sweden)

    Calinon Sylvain

    2011-12-01

    The range of tasks that robots are forecast to accomplish is increasing exponentially. (Re)programming these robots becomes a critical issue for their commercialization and for their application to real-world scenarios in which users without expertise in robotics wish to adapt the robot to their needs. This paper addresses the problem of designing user-friendly human-robot interfaces to transfer skills in a fast and efficient manner. It presents recent work conducted by the Learning and Interaction group at ADVR-IIT, ranging from skill acquisition through kinesthetic teaching to self-refinement strategies initiated from demonstrations. Our group has started to explore imitation and exploration strategies that can take advantage of the compliant capabilities of recent robot hardware and control architectures.

  10. Intrinsically motivated reinforcement learning for human-robot interaction in the real-world.

    Science.gov (United States)

    Qureshi, Ahmed Hussain; Nakamura, Yutaka; Yoshikawa, Yuichiro; Ishiguro, Hiroshi

    2018-03-26

    For natural social human-robot interaction, it is essential for a robot to learn human-like social skills. However, learning such skills is notoriously hard due to the limited availability of direct instructions from people to teach a robot. In this paper, we propose an intrinsically motivated reinforcement learning framework in which an agent obtains intrinsic motivation-based rewards through an action-conditional predictive model. Using the proposed method, the robot learned social skills from human-robot interaction experiences gathered in real, uncontrolled environments. The results indicate that the robot not only acquired human-like social skills but also made more human-like decisions, on a test dataset, than a robot which received direct rewards for task achievement. Copyright © 2018 Elsevier Ltd. All rights reserved.

  11. Temporal Memory Reinforcement Learning for the Autonomous Micro-mobile Robot Based-behavior

    Institute of Scientific and Technical Information of China (English)

    Yang Yujun(杨玉君); Cheng Junshi; Chen Jiapin; Li Xiaohai

    2004-01-01

    This paper presents temporal memory reinforcement learning for behavior-based autonomous micro-mobile robots. Human beings have a memory-forgetting process: what is memorized earlier is forgotten earlier, and only things that are repeated are remembered firmly. Inspired by this, the robot need not memorize all of its past states; this also economizes the EMS memory space, which is limited in the MPU of our AMRobot. The proposed algorithm is an extension of Q-learning, an incremental reinforcement learning method. Simulation results show that the algorithm is valid.

  12. TEXPLORE temporal difference reinforcement learning for robots and time-constrained domains

    CERN Document Server

    Hester, Todd

    2013-01-01

    This book presents and develops new reinforcement learning methods that enable fast and robust learning on robots in real-time. Robots have the potential to solve many problems in society, because of their ability to work in dangerous places doing necessary jobs that no one wants or is able to do. One barrier to their widespread deployment is that they are mainly limited to tasks where it is possible to hand-program behaviors for every situation that may be encountered. For robots to meet their potential, they need methods that enable them to learn and adapt to novel situations that they were not programmed for. Reinforcement learning (RL) is a paradigm for learning sequential decision making processes and could solve the problems of learning and adaptation on robots. This book identifies four key challenges that must be addressed for an RL algorithm to be practical for robotic control tasks. These RL for Robotics Challenges are: 1) it must learn in very few samples; 2) it must learn in domains with continuou...

  13. Vision-based Navigation and Reinforcement Learning Path Finding for Social Robots

    OpenAIRE

    Pérez Sala, Xavier

    2010-01-01

    We propose a robust system for automatic Robot Navigation in uncontrolled environments. The system is composed of three main modules: the Artificial Vision module, the Reinforcement Learning module, and the behavior control module. The aim of the system is to allow a robot to automatically find a path that arrives at a prefixed goal. Turn and straight movements in uncontrolled environments are automatically estimated and controlled using the proposed modules. The Artificial Vi...

  14. Intrinsic interactive reinforcement learning - Using error-related potentials for real world human-robot interaction.

    Science.gov (United States)

    Kim, Su Kyoung; Kirchner, Elsa Andrea; Stefes, Arne; Kirchner, Frank

    2017-12-14

    Reinforcement learning (RL) enables a robot to learn its optimal behavioral strategy in dynamic environments based on feedback. Explicit human feedback during robot RL is advantageous, since an explicit reward function can be easily adapted. However, it is very demanding and tiresome for a human to continuously and explicitly generate feedback. Therefore, the development of implicit approaches is of high relevance. In this paper, we used an error-related potential (ErrP), an event-related activity in the human electroencephalogram (EEG), as intrinsically generated implicit feedback (rewards) for RL. Initially, we validated our approach with seven subjects in a simulated robot learning scenario. ErrPs were detected online in single trials with a balanced accuracy (bACC) of 91%, which was sufficient to learn to recognize gestures and the correct mapping between human gestures and robot actions in parallel. Finally, we validated our approach in a real robot scenario, in which seven subjects freely chose gestures and the real robot correctly learned the mapping between gestures and actions (ErrP detection: 90% bACC). In this paper, we demonstrated that intrinsically generated EEG-based human feedback in RL can successfully be used to implicitly improve gesture-based robot control during human-robot interaction. We call our approach intrinsic interactive RL.

  15. Bio-robots automatic navigation with graded electric reward stimulation based on Reinforcement Learning.

    Science.gov (United States)

    Zhang, Chen; Sun, Chao; Gao, Liqiang; Zheng, Nenggan; Chen, Weidong; Zheng, Xiaoxiang

    2013-01-01

    Bio-robots based on brain-computer interfaces (BCI) suffer from a lack of consideration of the characteristics of the animals in navigation. This paper proposes a new method for bio-robots' automatic navigation that combines a reward-generating algorithm based on Reinforcement Learning (RL) with the learning intelligence of the animals. Given a graded electrical reward, the animal, e.g. the rat, seeks the maximum reward while exploring an unknown environment. Since the rat has excellent spatial recognition, the rat-robot and the RL algorithm can converge to an optimal route by co-learning. This work provides significant inspiration for the practical development of bio-robots' navigation with hybrid intelligence.

  16. Reinforcement function design and bias for efficient learning in mobile robots

    International Nuclear Information System (INIS)

    Touzet, C.; Santos, J.M.

    1998-01-01

    The main paradigm in the sub-symbolic learning-robot domain is reinforcement learning. Various techniques have been developed to deal with the memorization/generalization problem, demonstrating the superior ability of artificial neural network implementations. In this paper, the authors address the issue of designing the reinforcement so as to optimize the exploration part of the learning. They also present and summarize work on the use of bias intended to achieve effective synthesis of the desired behavior. Demonstrative experiments involving a self-organizing-map implementation of Q-learning and real mobile robots (Nomad 200 and Khepera) in an obstacle-avoidance behavior-synthesis task are described. 3 figs., 5 tabs

  17. Performance Comparison of Two Reinforcement Learning Algorithms for Small Mobile Robots

    Czech Academy of Sciences Publication Activity Database

    Neruda, Roman; Slušný, Stanislav

    2009-01-01

    Vol. 2, No. 1 (2009), pp. 59-68. ISSN 2005-4297. R&D Projects: GA MŠk(CZ) 1M0567. Grant - others: GA UK(CZ) 7637/2007. Institutional research plan: CEZ:AV0Z10300504. Keywords: reinforcement learning * mobile robots * intelligent agents. Subject RIV: IN - Informatics, Computer Science. http://www.sersc.org/journals/IJCA/vol2_no1/7.pdf

  18. Reinforcement learning for a biped robot based on a CPG-actor-critic method.

    Science.gov (United States)

    Nakamura, Yutaka; Mori, Takeshi; Sato, Masa-aki; Ishii, Shin

    2007-08-01

    Animals' rhythmic movements, such as locomotion, are considered to be controlled by neural circuits called central pattern generators (CPGs), which generate oscillatory signals. Motivated by this biological mechanism, studies have been conducted on the rhythmic movements controlled by CPG. As an autonomous learning framework for a CPG controller, we propose in this article a reinforcement learning method we call the "CPG-actor-critic" method. This method introduces a new architecture to the actor, and its training is roughly based on a stochastic policy gradient algorithm presented recently. We apply this method to an automatic acquisition problem of control for a biped robot. Computer simulations show that training of the CPG can be successfully performed by our method, thus allowing the biped robot to not only walk stably but also adapt to environmental changes.
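
    For readers unfamiliar with CPGs, a minimal coupled phase-oscillator sketch is given below; it is an illustrative stand-in, not the paper's CPG-actor-critic architecture. An RL actor would tune parameters such as the frequency or the coupling strength.

        # Minimal coupled phase-oscillator CPG (illustrative stand-in, not the paper's
        # CPG-actor-critic). Two oscillators kept in anti-phase produce alternating
        # rhythmic commands, e.g. for left/right hip joints.
        import math

        def cpg_rollout(steps=200, dt=0.01, omega=2 * math.pi, k=2.0):
            phi = [0.0, math.pi]                      # anti-phase initial condition
            outputs = []
            for _ in range(steps):
                d0 = omega + k * math.sin(phi[1] - phi[0] - math.pi)
                d1 = omega + k * math.sin(phi[0] - phi[1] - math.pi)
                phi[0] += dt * d0
                phi[1] += dt * d1
                outputs.append((math.sin(phi[0]), math.sin(phi[1])))   # joint commands
            return outputs

        # An RL actor would adapt parameters such as omega or k from the walking reward.
        trajectory = cpg_rollout()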

  19. Study and Application of Reinforcement Learning in Cooperative Strategy of the Robot Soccer Based on BDI Model

    Directory of Open Access Journals (Sweden)

    Wu Bo-ying

    2009-11-01

    A dynamic multi-agent cooperation model is formed by combining reinforcement learning with the BDI model. In this model, the concept of individual optimization loses its meaning, because the payoff of each agent depends not only on itself but also on the choices of the other agents. All agents pursue a common optimal solution and try to realize the united intention as a whole as far as possible. The robot moves to its goal depending on the present positions of the other robots that cooperate with it and the present position of the ball. One of these cooperating robots is controlled by a human with a joystick. In this way, the agent can be made to visit each state-action pair as frequently as possible when choosing movements, shortening the search of the movement space and improving the convergence speed of reinforcement learning. The validity of the proposed cooperative strategy for robot soccer has been demonstrated by combining theoretical analysis with simulated robot soccer matches (11 vs. 11).

  20. Reaching control of a full-torso, modelled musculoskeletal robot using muscle synergies emergent under reinforcement learning

    International Nuclear Information System (INIS)

    Diamond, A; Holland, O E

    2014-01-01

    ‘Anthropomimetic’ robots mimic both human morphology and internal structure—skeleton, muscles, compliance and high redundancy—thus presenting a formidable challenge to conventional control. Here we derive a novel controller for this class of robot which learns effective reaching actions through the sustained activation of weighted muscle synergies, an approach which draws upon compelling, recent evidence from animal and human studies, but is almost unexplored to date in the musculoskeletal robot literature. Since the effective synergy patterns for a given robot will be unknown, we derive a reinforcement-learning approach intended to allow their emergence, in particular those patterns aiding linearization of control. Using an extensive physics-based model of the anthropomimetic ECCERobot, we find that effective reaching actions can be learned comprising only two sequential motor co-activation patterns, each controlled by just a single common driving signal. Factor analysis shows the emergent muscle co-activations can be largely reconstructed using weighted combinations of only 13 common fragments. Testing these ‘candidate’ synergies as drivable units, the same controller now learns the reaching task both faster and better. (paper)

  1. Assist-as-needed robotic trainer based on reinforcement learning and its application to dart-throwing.

    Science.gov (United States)

    Obayashi, Chihiro; Tamei, Tomoya; Shibata, Tomohiro

    2014-05-01

    This paper proposes a novel robotic trainer for motor skill learning. It is user-adaptive, inspired by the assist-as-needed principle well known in the field of physical therapy. Most previous studies in the field of robotic assistance of motor skill learning have used predetermined desired trajectories, and it has not been examined intensively whether these trajectories were optimal for each user. Furthermore, the guidance hypothesis states that humans tend to rely too much on external assistive feedback, resulting in interference with the internal feedback necessary for motor skill learning. A few studies have proposed systems that adjust their assistive strength according to the user's performance in order to prevent the user from relying too much on robotic assistance. There are, however, problems with these studies, in that a physical model of the user's motor system is required, which is inherently difficult to construct. In this paper, we propose a framework for a robotic trainer that is user-adaptive and that requires neither a specific desired trajectory nor a physical model of the user's motor system, and we achieve this using model-free reinforcement learning. We chose dart-throwing as an example motor-learning task as it is one of the simplest throwing tasks, and its performance can be easily and quantitatively measured. Training experiments with novices, aiming at maximizing the score with the darts and minimizing the physical robotic assistance, demonstrate the feasibility and plausibility of the proposed framework. Copyright © 2014 Elsevier Ltd. All rights reserved.

  2. Algorithms for Reinforcement Learning

    CERN Document Server

    Szepesvari, Csaba

    2010-01-01

    Reinforcement learning is a learning paradigm concerned with learning to control a system so as to maximize a numerical performance measure that expresses a long-term objective. What distinguishes reinforcement learning from supervised learning is that only partial feedback is given to the learner about the learner's predictions. Further, the predictions may have long term effects through influencing the future state of the controlled system. Thus, time plays a special role. The goal in reinforcement learning is to develop efficient learning algorithms, as well as to understand the algorithms'
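
    As a concrete reference point for the value-based methods such a text covers, a minimal tabular Q-learning loop on a toy corridor task is sketched below; the environment and constants are illustrative assumptions.

        # Minimal tabular Q-learning on a toy corridor (all constants and the
        # environment are illustrative assumptions, not from the book).
        import random
        from collections import defaultdict

        ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.3
        N_STATES = 6                      # states 0..5, goal (reward 1) at state 5
        ACTIONS = [-1, +1]
        Q = defaultdict(float)

        def step(state, action):
            nxt = max(0, min(N_STATES - 1, state + action))
            done = nxt == N_STATES - 1
            return nxt, (1.0 if done else 0.0), done

        for episode in range(200):
            s, done = 0, False
            for _ in range(100):                       # cap episode length
                if done:
                    break
                if random.random() < EPSILON:
                    a = random.choice(ACTIONS)         # explore
                else:                                  # exploit, random tie-break
                    a = max(ACTIONS, key=lambda b: (Q[(s, b)], random.random()))
                s_next, r, done = step(s, a)
                target = r + (0.0 if done else GAMMA * max(Q[(s_next, b)] for b in ACTIONS))
                Q[(s, a)] += ALPHA * (target - Q[(s, a)])   # temporal-difference update
                s = s_next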

  3. Reinforcement learning in computer vision

    Science.gov (United States)

    Bernstein, A. V.; Burnaev, E. V.

    2018-04-01

    Nowadays, machine learning has become one of the basic technologies used in solving various computer vision tasks such as feature detection, image segmentation, object recognition and tracking. In many applications, complex systems such as robots are equipped with visual sensors from which they learn the state of the surrounding environment by solving corresponding computer vision tasks. Solutions of these tasks are used for making decisions about possible future actions. It is not surprising that when solving computer vision tasks we should take into account special aspects of their subsequent application in model-based predictive control. Reinforcement learning is one of the modern machine learning technologies in which learning is carried out through interaction with the environment. In recent years, reinforcement learning has been used both for solving applied tasks such as processing and analysis of visual information, and for solving specific computer vision problems such as filtering, extracting image features, localizing objects in scenes, and many others. The paper briefly describes reinforcement learning and its use for solving computer vision problems.

  4. Reinforcement Learning State-of-the-Art

    CERN Document Server

    Wiering, Marco

    2012-01-01

    Reinforcement learning encompasses both a science of adaptive behavior of rational beings in uncertain environments and a computational methodology for finding optimal behaviors for challenging problems in control, optimization and adaptive behavior of intelligent agents. As a field, reinforcement learning has progressed tremendously in the past decade. The main goal of this book is to present an up-to-date series of survey articles on the main contemporary sub-fields of reinforcement learning. This includes surveys on partially observable environments, hierarchical task decompositions, relational knowledge representation and predictive state representations. Furthermore, topics such as transfer, evolutionary methods and continuous spaces in reinforcement learning are surveyed. In addition, several chapters review reinforcement learning methods in robotics, in games, and in computational neuroscience. In total seventeen different subfields are presented by mostly young experts in those areas, and together the...

  5. Morphology Independent Learning in Modular Robots

    DEFF Research Database (Denmark)

    Christensen, David Johan; Bordignon, Mirko; Schultz, Ulrik Pagh

    2009-01-01

    Hand-coding locomotion controllers for modular robots is difficult due to their polymorphic nature. Instead, we propose to use a simple and distributed reinforcement learning strategy. ATRON modules with identical controllers can be assembled in any configuration. To optimize the robot's locomotion speed, its modules independently and in parallel adjust their behavior based on a single global reward signal. In simulation, we study the learning strategy's performance on different robot configurations. On the physical platform, we perform learning experiments with ATRON robots learning to move as fast...

  6. Manifold Regularized Reinforcement Learning.

    Science.gov (United States)

    Li, Hongliang; Liu, Derong; Wang, Ding

    2018-04-01

    This paper introduces a novel manifold regularized reinforcement learning scheme for continuous Markov decision processes. Smooth feature representations for value function approximation can be automatically learned using the unsupervised manifold regularization method. The learned features are data-driven, and can be adapted to the geometry of the state space. Furthermore, the scheme provides a direct basis representation extension for novel samples during policy learning and control. The performance of the proposed scheme is evaluated on two benchmark control tasks, i.e., the inverted pendulum and the energy storage problem. Simulation results illustrate the concepts of the proposed scheme and show that it can obtain excellent performance.

  7. Fiber-Reinforced Origamic Robotic Actuator.

    Science.gov (United States)

    Yi, Juan; Chen, Xiaojiao; Song, Chaoyang; Wang, Zheng

    2018-02-01

    A novel pneumatic soft linear actuator, the Fiber-reinforced Origamic Robotic Actuator (FORA), is proposed with significant improvements over the popular McKibben-type actuators, offering nearly doubled motion range, a substantially improved force profile, and significantly lower actuation pressure. The desirable feature set is made possible by a novel soft origamic chamber that expands radially while contracting axially when pressurized. Combining this new origamic chamber with a reinforcing fiber mesh, FORA generates very high traction force (over 150 N) and very large contractile motion (over 50%) at very low input pressure (100 kPa). We developed quasi-static analytical models both to characterize the motion and forces and to serve as guidelines for actuator design. Fabrication of FORA mostly involves consumer-grade three-dimensional (3D) printing. We provide a detailed list of materials and dimensions. Fabricated FORAs were tested on a dedicated platform against commercially available pneumatic artificial muscles from Shadow and Festo to showcase their superior performance and to validate the analytical models, with very good agreement. Finally, a robotic joint driven by two antagonistic FORAs was developed to showcase the benefits of the performance improvements. With its simple structure, fully characterized mechanism, easy fabrication procedure, and highly desirable performance, FORA can be easily customized to application requirements and fabricated by anyone with access to a 3D printer. This will pave the way to the wider adoption and application of soft robotic systems.

  8. The Reinforcement Learning Competition 2014

    OpenAIRE

    Dimitrakakis, Christos; Li, Guangliang; Tziortziotis, Nikoalos

    2014-01-01

    Reinforcement learning is one of the most general problems in artificial intelligence. It has been used to model problems in automated experiment design, control, economics, game playing, scheduling and telecommunications. The aim of the reinforcement learning competition is to encourage the development of very general learning agents for arbitrary reinforcement learning problems and to provide a test-bed for the unbiased evaluation of algorithms.

  9. Episodic reinforcement learning control approach for biped walking

    Directory of Open Access Journals (Sweden)

    Katić Duško

    2012-01-01

    This paper presents a hybrid dynamic control approach to the realization of humanoid biped robotic walking, focusing on policy-gradient episodic reinforcement learning with fuzzy evaluative feedback. The proposed controller structure involves two feedback loops: a conventional computed-torque controller and an episodic reinforcement learning controller. The reinforcement learning part includes fuzzy information about Zero-Moment-Point errors. Simulation tests using a medium-size 36-DOF humanoid robot, MEXONE, were performed to demonstrate the effectiveness of our method.

  10. Reinforcement learning: Solving two case studies

    Science.gov (United States)

    Duarte, Ana Filipa; Silva, Pedro; dos Santos, Cristina Peixoto

    2012-09-01

    Reinforcement Learning algorithms offer interesting features for the control of autonomous systems, such as the ability to learn from direct interaction with the environment, and the use of a simple reward signal, as opposed to the input-output pairs used in classic supervised learning. The reward signal indicates the success or failure of the actions executed by the agent in the environment. In this work, RL algorithms applied to two case studies are described: the Crawler robot and the widely known inverted pendulum. We explore RL capabilities to autonomously learn a basic locomotion pattern in the Crawler, and approach the balancing problem of biped locomotion using the inverted pendulum.

  11. Reinforcement Learning Approach to Generate Goal-directed Locomotion of a Snake-Like Robot with Screw-Drive Units

    DEFF Research Database (Denmark)

    Chatterjee, Sromona; Nachstedt, Timo; Tamosiunaite, Minija

    2014-01-01

    In this paper we apply a policy improvement algorithm called Policy Improvement using Path Integrals (PI2) to generate goal-directed locomotion of a complex snake-like robot with screw-drive units. PI2 is numerically simple and has an ability to deal with high-dimensional systems. Here...

  12. Robots Learn Writing

    Directory of Open Access Journals (Sweden)

    Huan Tan

    2012-01-01

    This paper proposes a general method for robots to learn motions and corresponding semantic knowledge simultaneously. A modified ISOMAP algorithm is used to convert the sampled 6D vectors of joint angles into 2D trajectories, and the required movements for writing numbers are learned from this modified ISOMAP-based model. Using this algorithm, the knowledge models are established. Learned motion and knowledge models are stored in a 2D latent space. The Gaussian Process (GP) method is used to model and represent these models. Practical experiments are carried out on a humanoid robot, named ISAC, to learn the semantic representations of numbers and the movements of writing numbers through imitation, and to verify the effectiveness of this framework. At the learning stage, ISAC not only learns the dynamics of the movement required to write the numbers, but also learns the semantic meaning of the numbers, related to the writing movements, from the same data set. Given speech commands, ISAC recognizes the words and generates the corresponding motion trajectories to write the numbers. This imitation-learning method is implemented on a cognitive architecture to provide robust cognitive information processing.

  13. Learning motor skills from algorithms to robot experiments

    CERN Document Server

    Kober, Jens

    2014-01-01

    This book presents the state of the art in reinforcement learning applied to robotics, both in terms of novel algorithms and applications. It discusses recent approaches that allow robots to learn motor skills, and presents tasks that need to take into account the dynamic behavior of the robot and its environment, where a kinematic movement plan is not sufficient. The book illustrates a method that learns to generalize parameterized motor plans, obtained by imitation or reinforcement learning, by adapting a small set of global parameters, together with appropriate kernel-based reinforcement learning algorithms. The presented applications explore highly dynamic tasks and exhibit a very efficient learning process. All proposed approaches have been extensively validated with benchmark tasks, in simulation, and on real robots. These tasks correspond to sports and games, but the presented techniques are also applicable to more mundane household tasks. The book is based on the first author’s doctoral thesis, which wo...

  14. Deep Reinforcement Learning: An Overview

    OpenAIRE

    Li, Yuxi

    2017-01-01

    We give an overview of recent exciting achievements of deep reinforcement learning (RL). We discuss six core elements, six important mechanisms, and twelve applications. We start with background of machine learning, deep learning and reinforcement learning. Next we discuss core RL elements, including value function, in particular, Deep Q-Network (DQN), policy, reward, model, planning, and exploration. After that, we discuss important mechanisms for RL, including attention and memory, unsuperv...

  15. Construction of multi-agent mobile robots control system in the problem of persecution with using a modified reinforcement learning method based on neural networks

    Science.gov (United States)

    Patkin, M. L.; Rogachev, G. N.

    2018-02-01

    A method for constructing a multi-agent control system for mobile robots based on reinforcement learning with deep neural networks is considered. The control system is synthesized with reinforcement learning and a modified Actor-Critic method, in which the Actor module is divided into an Action Actor and a Communication Actor in order to simultaneously control the mobile robots and communicate with partners. Communication is carried out by sending partners, at each step, a vector of real numbers that is appended to the observation vector and affects behaviour. The functions of the Actors and the Critic are approximated by deep neural networks. The Critic's value function is trained using the TD-error method and the Actor's function using DDPG. The Communication Actor's neural network is trained through gradients received from partner agents. An environment with cooperative multi-agent interaction was developed, and a computer simulation applying this method to the control problem of two robots pursuing two goals was carried out.

  16. Evolutionary computation for reinforcement learning

    NARCIS (Netherlands)

    Whiteson, S.; Wiering, M.; van Otterlo, M.

    2012-01-01

    Algorithms for evolutionary computation, which simulate the process of natural selection to solve optimization problems, are an effective tool for discovering high-performing reinforcement-learning policies. Because they can automatically find good representations, handle continuous action spaces,

  17. Machine Learning for Robotic Vision

    OpenAIRE

    Drummond, Tom

    2018-01-01

    Machine learning is a crucial enabling technology for robotics, in particular for unlocking the capabilities afforded by visual sensing. This talk will present research within Prof Drummond’s lab that explores how machine learning can be developed and used within the context of Robotic Vision.

  18. Reinforcement learning in supply chains.

    Science.gov (United States)

    Valluri, Annapurna; North, Michael J; Macal, Charles M

    2009-10-01

    Effective management of supply chains creates value and can strategically position companies. In practice, human beings have been found to be both surprisingly successful and disappointingly inept at managing supply chains. The related fields of cognitive psychology and artificial intelligence have postulated a variety of potential mechanisms to explain this behavior. One of the leading candidates is reinforcement learning. This paper applies agent-based modeling to investigate the comparative behavioral consequences of three simple reinforcement learning algorithms in a multi-stage supply chain. For the first time, our findings show that the specific algorithm that is employed can have dramatic effects on the results obtained. Reinforcement learning is found to be valuable in multi-stage supply chains with several learning agents, as independent agents can learn to coordinate their behavior. However, learning in multi-stage supply chains using these postulated approaches from cognitive psychology and artificial intelligence takes extremely long time periods to achieve stability, which raises questions about their ability to explain behavior in real supply chains. The fact that it takes thousands of periods for agents to learn in this simple multi-agent setting provides new evidence that real-world decision makers are unlikely to be using strict reinforcement learning in practice.

  19. Adaptive representations for reinforcement learning

    NARCIS (Netherlands)

    Whiteson, S.

    2010-01-01

    This book presents new algorithms for reinforcement learning, a form of machine learning in which an autonomous agent seeks a control policy for a sequential decision task. Since current methods typically rely on manually designed solution representations, agents that automatically adapt their own

  20. Finding intrinsic rewards by embodied evolution and constrained reinforcement learning.

    Science.gov (United States)

    Uchibe, Eiji; Doya, Kenji

    2008-12-01

    Understanding the design principle of reward functions is a substantial challenge both in artificial intelligence and neuroscience. Successful acquisition of a task usually requires not only rewards for goals, but also for intermediate states to promote effective exploration. This paper proposes a method for designing 'intrinsic' rewards of autonomous agents by combining constrained policy gradient reinforcement learning and embodied evolution. To validate the method, we use Cyber Rodent robots, in which collision avoidance, recharging from battery packs, and 'mating' by software reproduction are three major 'extrinsic' rewards. We show in hardware experiments that the robots can find appropriate 'intrinsic' rewards for the vision of battery packs and other robots to promote approach behaviors.

  1. Learning ROS for robotics programming

    CERN Document Server

    Martinez, Aaron

    2013-01-01

    The book takes an easy-to-follow and engaging tutorial approach, providing a practical and comprehensive way to learn ROS. If you are a robotics enthusiast who wants to learn how to build and program your own robots in an easy-to-develop, maintainable and shareable way, "Learning ROS for Robotics Programming" is for you. In order to make the most of the book, you should have some C++ programming background, knowledge of GNU/Linux systems, and computer science in general. No previous background in ROS is required, since this book provides all the skills required. It is also advisable to hav

  2. Learning for intelligent mobile robots

    Science.gov (United States)

    Hall, Ernest L.; Liao, Xiaoqun; Alhaj Ali, Souma M.

    2003-10-01

    Unlike intelligent industrial robots, which often work in a structured factory setting, intelligent mobile robots must often operate in an unstructured environment cluttered with obstacles and with many possible action paths. However, such machines have many potential applications in medicine, defense, industry and even the home that make their study important. Sensors such as vision are needed. However, in many applications some form of learning is also required. The purpose of this paper is to present a discussion of recent technical advances in learning for intelligent mobile robots. During the past 20 years, the use of intelligent industrial robots that are equipped not only with motion control systems but also with sensors such as cameras, laser scanners, or tactile sensors that permit adaptation to a changing environment has increased dramatically. However, relatively little has been done concerning learning. Adaptive and robust control permits one to achieve point-to-point and controlled-path operation in a changing environment. This problem can be solved with a learning control. In the unstructured environment, the terrain and consequently the load on the robot's motors are constantly changing. Learning the parameters of a proportional-integral-derivative (PID) controller and an artificial neural network provides adaptive and robust control. Learning may also be used for path following. Simulations that include learning may be conducted to see if a robot can learn its way through a cluttered array of obstacles. If a situation is performed repetitively, then learning can also be used in the actual application. To reach an even higher degree of autonomous operation, a new level of learning is required. Recently, learning theories such as the adaptive critic have been proposed. In this type of learning a critic provides a grade to the controller of an action module such as a robot. The creative control process is used that is "beyond the adaptive critic." A

  3. Learning robotics using Python

    CERN Document Server

    Joseph, Lentin

    2015-01-01

    If you are an engineer, a researcher, or a hobbyist, and you are interested in robotics and want to build your own robot, this book is for you. Readers are assumed to be new to robotics but should have experience with Python.

  4. Rational and Mechanistic Perspectives on Reinforcement Learning

    Science.gov (United States)

    Chater, Nick

    2009-01-01

    This special issue describes important recent developments in applying reinforcement learning models to capture neural and cognitive function. But reinforcement learning, as a theoretical framework, can apply at two very different levels of description: "mechanistic" and "rational." Reinforcement learning is often viewed in mechanistic terms--as…

  5. A Differentiable Physics Engine for Deep Learning in Robotics

    OpenAIRE

    Degrave, Jonas; Hermans, Michiel; Dambre, Joni; wyffels, Francis

    2016-01-01

    One of the most important fields in robotics is the optimization of controllers. Currently, robots are treated as a black box in this optimization process, which is the reason why derivative-free optimization methods such as evolutionary algorithms or reinforcement learning are omnipresent. We propose an implementation of a modern physics engine, which has the ability to differentiate control parameters. This has been implemented on both CPU and GPU. We show how this speeds up the optimizatio...

  6. Robot learning from human teachers

    CERN Document Server

    Chernova, Sonia

    2014-01-01

    Learning from Demonstration (LfD) explores techniques for learning a task policy from examples provided by a human teacher. The field of LfD has grown into an extensive body of literature over the past 30 years, with a wide variety of approaches for encoding human demonstrations and modeling skills and tasks. Additionally, we have recently seen a focus on gathering data from non-expert human teachers (i.e., domain experts but not robotics experts). In this book, we provide an introduction to the field with a focus on the unique technical challenges associated with designing robots that learn f

  7. Robot learning and error correction

    Science.gov (United States)

    Friedman, L.

    1977-01-01

    A model of robot learning is described that associates previously unknown perceptions with the sensed known consequences of robot actions. For these actions, both the categories of outcomes and the corresponding sensory patterns are incorporated in a knowledge base by the system designer. Thus the robot is able to predict the outcome of an action and compare the expectation with the experience. New knowledge about what to expect in the world may then be incorporated by the robot in a pre-existing structure whether it detects accordance or discrepancy between a predicted consequence and experience. Errors committed during plan execution are detected by the same type of comparison process and learning may be applied to avoiding the errors.

  8. Belief reward shaping in reinforcement learning

    CSIR Research Space (South Africa)

    Marom, O

    2018-02-01

    Full Text Available A key challenge in many reinforcement learning problems is delayed rewards, which can significantly slow down learning. Although reward shaping has previously been introduced to accelerate learning by bootstrapping an agent with additional...
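
    The classical baseline that such shaping schemes build on is potential-based shaping, sketched below; the potential function is a hypothetical example, and the belief-based variant of the cited work is not reproduced here.

        # Standard potential-based shaping (illustrative; the belief-based variant in
        # the cited work is not reproduced). The shaped reward r + gamma*phi(s') - phi(s)
        # adds denser feedback without changing the optimal policy.
        GAMMA = 0.99

        def phi(state):
            """Hypothetical potential: higher for states closer to a goal at 10."""
            return -abs(10 - state)

        def shaped_reward(r, s, s_next):
            return r + GAMMA * phi(s_next) - phi(s)

        # Example: a transition from state 3 to state 4 gains a small shaping bonus.
        print(shaped_reward(0.0, 3, 4))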

  9. Memristive device based learning for navigation in robots.

    Science.gov (United States)

    Sarim, Mohammad; Kumar, Manish; Jha, Rashmi; Minai, Ali A

    2017-11-08

    Biomimetic robots have gained attention recently for various applications ranging from resource hunting to search and rescue operations during disasters. Biological species are known to intuitively learn from the environment, gather and process data, and make appropriate decisions. Such sophisticated computing capabilities are difficult to achieve in robots, especially in real time with ultra-low energy consumption. Here, we present a novel memristive-device-based learning architecture for robots. Two-terminal memristive devices with resistive switching of an oxide layer are modeled in a crossbar array to develop a neuromorphic platform that can impart active real-time learning capabilities to a robot. This approach is validated by navigating a robot vehicle in an unknown environment with randomly placed obstacles. Further, the proposed scheme is compared with reinforcement-learning-based algorithms using local and global knowledge of the environment. The simulation as well as experimental results corroborate the validity and potential of the proposed learning scheme for robots. The results also show that our learning scheme approaches an optimal solution for some environment layouts in robot navigation.

  10. Autonomous Motion Learning for Intra-Vehicular Activity Space Robot

    Science.gov (United States)

    Watanabe, Yutaka; Yairi, Takehisa; Machida, Kazuo

    Space robots will be needed in future space missions. So far, many types of space robots have been developed; in particular, Intra-Vehicular Activity (IVA) space robots that support human activities should be developed to reduce human risks in space. In this paper, we study a motion learning method for an IVA space robot with a multi-link mechanism. The advantage is that this space robot moves using the reaction forces of the multi-link mechanism and contact forces from the wall, as in an astronaut's space walk, rather than using propulsion. The control approach is based on reinforcement learning with the actor-critic algorithm. We demonstrate the effectiveness of this approach using a 5-link space robot model in simulation. First, we simulate the space robot learning motion control, including a contact phase, in the two-dimensional case. Next, we simulate the space robot learning motion control while changing its base attitude in the three-dimensional case.

  11. Reinforcement learning or active inference?

    Science.gov (United States)

    Friston, Karl J; Daunizeau, Jean; Kiebel, Stefan J

    2009-07-29

    This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and sampling of the environment to minimize their free-energy. Such agents learn causal structure in the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming; namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain.

  12. Reinforcement learning or active inference?

    Directory of Open Access Journals (Sweden)

    Karl J Friston

    2009-07-01

    This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and sampling of the environment to minimize their free-energy. Such agents learn causal structure in the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming; namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain.

  13. Functional Contour-following via Haptic Perception and Reinforcement Learning.

    Science.gov (United States)

    Hellman, Randall B; Tekin, Cem; van der Schaar, Mihaela; Santos, Veronica J

    2018-01-01

    Many tasks involve the fine manipulation of objects despite limited visual feedback. In such scenarios, tactile and proprioceptive feedback can be leveraged for task completion. We present an approach for real-time haptic perception and decision-making for a haptics-driven, functional contour-following task: the closure of a ziplock bag. This task is challenging for robots because the bag is deformable, transparent, and visually occluded by artificial fingertip sensors that are also compliant. A deep neural net classifier was trained to estimate the state of a zipper within a robot's pinch grasp. A Contextual Multi-Armed Bandit (C-MAB) reinforcement learning algorithm was implemented to maximize cumulative rewards by balancing exploration versus exploitation of the state-action space. The C-MAB learner outperformed a benchmark Q-learner by more efficiently exploring the state-action space while learning a hard-to-code task. The learned C-MAB policy was tested with novel ziplock bag scenarios and contours (wire, rope). Importantly, this work contributes to the development of reinforcement learning approaches that account for limited resources such as hardware life and researcher time. As robots are used to perform complex, physically interactive tasks in unstructured or unmodeled environments, it becomes important to develop methods that enable efficient and effective learning with physical testbeds.
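
    A generic contextual bandit learner with epsilon-greedy exploration is sketched below to illustrate the exploration/exploitation balance described above; it is not the exact C-MAB algorithm of the paper, and the contexts and actions are hypothetical.

        # Generic epsilon-greedy contextual bandit (illustrative; not the exact C-MAB
        # algorithm of the paper). One running mean reward is kept per (context, action).
        import random
        from collections import defaultdict

        EPSILON = 0.1
        values = defaultdict(float)    # estimated reward of (context, action)
        counts = defaultdict(int)

        def select_action(context, actions):
            if random.random() < EPSILON:
                return random.choice(actions)                        # explore
            return max(actions, key=lambda a: values[(context, a)])  # exploit

        def update(context, action, reward):
            key = (context, action)
            counts[key] += 1
            values[key] += (reward - values[key]) / counts[key]      # incremental mean

        # Hypothetical usage: contexts could be discretized haptic states of the zipper.
        ctx, acts = "zipper_aligned", ["move_along", "regrasp", "pause"]
        a = select_action(ctx, acts)
        update(ctx, a, reward=1.0)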

  14. Switching Reinforcement Learning for Continuous Action Space

    Science.gov (United States)

    Nagayoshi, Masato; Murao, Hajime; Tamaki, Hisashi

    Reinforcement Learning (RL) attracts much attention as a technique for realizing computational intelligence, such as adaptive and autonomous decentralized systems. In general, however, it is not easy to put RL into practical use. One difficulty is the problem of designing a suitable action space for an agent, i.e., satisfying two requirements in trade-off: (i) to keep the characteristics (or structure) of the original search space as much as possible in order to seek strategies that lie close to the optimal, and (ii) to reduce the search space as much as possible in order to expedite the learning process. In order to design a suitable action space adaptively, we propose a switching RL model that mimics the process of an infant's motor development, in which gross motor skills develop before fine motor skills. A method for switching controllers is then constructed by introducing and referring to the “entropy”. Further, through computational experiments using robot navigation problems with one- and two-dimensional continuous action spaces, the validity of the proposed method has been confirmed.

  15. Robotic inspection of fiber reinforced composites using phased array UT

    Science.gov (United States)

    Stetson, Jeffrey T.; De Odorico, Walter

    2014-02-01

    Ultrasound is the current NDE method of choice to inspect large fiber reinforced airframe structures. Over the last 15 years Cartesian based scanning machines using conventional ultrasound techniques have been employed by all airframe OEMs and their top tier suppliers to perform these inspections. Technical advances in both computing power and commercially available, multi-axis robots now facilitate a new generation of scanning machines. These machines use multiple end effector tools taking full advantage of phased array ultrasound technologies yielding substantial improvements in inspection quality and productivity. This paper outlines the general architecture for these new robotic scanning systems as well as details the variety of ultrasonic techniques available for use with them including advances such as wide area phased array scanning and sound field adaptation for non-flat, non-parallel surfaces.

  16. Decision Making in Reinforcement Learning Using a Modified Learning Space Based on the Importance of Sensors

    Directory of Open Access Journals (Sweden)

    Yasutaka Kishima

    2013-01-01

    Many studies have been conducted on the application of reinforcement learning (RL) to robots. A general-purpose robot has redundant sensors or actuators, because it is difficult to anticipate the environment the robot will face and the task it must execute. In this case, the Q-space used in RL contains redundancy, so the robot needs much time to learn a given task. In this study, we focus on the importance of sensors with regard to a robot's performance of a particular task; the sensors that are applicable to a task differ according to the task. By using the importance of the sensors, we try to adjust the number of states assigned to each sensor and to reduce the size of the Q-space. In this paper, we define the measure of importance of a sensor for a task as the correlation between the value of the sensor and the reward. The robot calculates the importance of its sensors and makes the Q-space smaller. We propose a method which reduces the learning space and construct a learning system by incorporating it into RL. In this paper, we confirm the effectiveness of our proposed system with an experimental robot.
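
    A sketch of the sensor-importance idea follows, using the abstract's definition of importance as the correlation between each sensor's value and the reward; the mapping from importance to a number of discretization bins and the data are illustrative assumptions.

        # Sensor importance as |correlation(sensor, reward)|, following the abstract's
        # definition; the bin assignment and the data below are illustrative assumptions.
        import numpy as np

        def sensor_importance(sensor_log, reward_log):
            """sensor_log: (T, n_sensors) array; reward_log: (T,) array."""
            return np.array([abs(np.corrcoef(sensor_log[:, i], reward_log)[0, 1])
                             for i in range(sensor_log.shape[1])])

        def bins_per_sensor(importance, min_bins=2, max_bins=10):
            scaled = importance / (importance.max() + 1e-12)
            return (min_bins + np.round(scaled * (max_bins - min_bins))).astype(int)

        # Hypothetical log of 3 sensors over 100 steps; sensor 0 correlates with reward.
        rng = np.random.default_rng(0)
        sensors = rng.normal(size=(100, 3))
        reward = 0.8 * sensors[:, 0] + rng.normal(scale=0.1, size=100)
        print(bins_per_sensor(sensor_importance(sensors, reward)))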

  17. Grounding the meanings in sensorimotor behavior using reinforcement learning

    Directory of Open Access Journals (Sweden)

    Igor Farkaš

    2012-02-01

    The recent outburst of interest in cognitive developmental robotics is fueled by the ambition to propose ecologically plausible mechanisms of how, among other things, a learning agent/robot could ground linguistic meanings in its sensorimotor behaviour. Along this stream, we propose a model that allows the simulated iCub robot to learn the meanings of actions (point, touch and push) oriented towards objects in the robot's peripersonal space. In our experiments, the iCub learns to execute motor actions and comment on them. Architecturally, the model is composed of three neural-network-based modules that are trained in different ways. The first module, a two-layer perceptron, is trained by back-propagation to attend to the target position in the visual scene, given the low-level visual information and the feature-based target information. The second module, having the form of an actor-critic architecture, is the most distinguishing part of our model, and is trained by a continuous version of reinforcement learning to execute actions as sequences, based on a linguistic command. The third module, an echo-state network, is trained to provide the linguistic description of the executed actions. The trained model generalises well in the case of novel action-target combinations with randomised initial arm positions. It can also promptly adapt its behaviour if the action/target suddenly changes during motor execution.

  18. Morphology Independent Learning in Modular Robots

    DEFF Research Database (Denmark)

    Christensen, David Johan; Bordignon, Mirko; Schultz, Ulrik Pagh

    2009-01-01

    speed its modules independently and in parallel adjust their behavior based on a single global reward signal. In simulation, we study the learning strategy’s performance on different robot configurations. On the physical platform, we perform learning experiments with ATRON robots learning to move as fast...

  20. Reinforcement Learning in Repeated Portfolio Decisions

    OpenAIRE

    Diao, Linan; Rieskamp, Jörg

    2011-01-01

    How do people make investment decisions when they receive outcome feedback? We examined how well the standard mean-variance model and two reinforcement models predict people's portfolio decisions. The basic reinforcement model predicts a learning process that relies solely on the portfolio's overall return, whereas the proposed extended reinforcement model also takes the risk and covariance of the investments into account. The experimental results illustrate that people reacted sensitively to...

  1. Reinforcement learning improves behaviour from evaluative feedback

    Science.gov (United States)

    Littman, Michael L.

    2015-05-01

    Reinforcement learning is a branch of machine learning concerned with using experience gained through interacting with the world and evaluative feedback to improve a system's ability to make behavioural decisions. It has been called the artificial intelligence problem in a microcosm because learning algorithms must act autonomously to perform well and achieve their goals. Partly driven by the increasing availability of rich data, recent years have seen exciting advances in the theory and practice of reinforcement learning, including developments in fundamental technical areas such as generalization, planning, exploration and empirical methodology, leading to increasing applicability to real-life problems.

  2. Learning to trade via direct reinforcement.

    Science.gov (United States)

    Moody, J; Saffell, M

    2001-01-01

    We present methods for optimizing portfolios, asset allocations, and trading systems based on direct reinforcement (DR). In this approach, investment decision-making is viewed as a stochastic control problem, and strategies are discovered directly. We present an adaptive algorithm called recurrent reinforcement learning (RRL) for discovering investment policies. The need to build forecasting models is eliminated, and better trading performance is obtained. The direct reinforcement approach differs from dynamic programming and reinforcement algorithms such as TD-learning and Q-learning, which attempt to estimate a value function for the control problem. We find that the RRL direct reinforcement framework enables a simpler problem representation, avoids Bellman's curse of dimensionality and offers compelling advantages in efficiency. We demonstrate how direct reinforcement can be used to optimize risk-adjusted investment returns (including the differential Sharpe ratio), while accounting for the effects of transaction costs. In extensive simulation work using real financial data, we find that our approach based on RRL produces better trading strategies than systems utilizing Q-learning (a value function method). Real-world applications include an intra-daily currency trader and a monthly asset allocation system for the S&P 500 Stock Index and T-Bills.
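    As a hedged illustration of the kind of risk-adjusted objective mentioned in this record, the sketch below computes a differential Sharpe ratio from exponential moving averages of the return, in the spirit of the recurrent reinforcement learning literature. The class name, variable names, and default adaptation rate are assumptions, not the authors' exact code.

```python
class DifferentialSharpe:
    """Incremental (differential) Sharpe ratio based on exponential
    moving averages of the return and the squared return."""

    def __init__(self, eta=0.01):
        self.eta = eta      # adaptation rate of the moving averages
        self.A = 0.0        # EMA of returns
        self.B = 0.0        # EMA of squared returns

    def update(self, r):
        """Return the differential Sharpe ratio for the latest return r."""
        dA = r - self.A
        dB = r * r - self.B
        denom = (self.B - self.A ** 2) ** 1.5
        dsr = 0.0 if denom <= 1e-12 else (self.B * dA - 0.5 * self.A * dB) / denom
        # move the averages only after computing the instantaneous measure
        self.A += self.eta * dA
        self.B += self.eta * dB
        return dsr
```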

  3. Using a board game to reinforce learning.

    Science.gov (United States)

    Yoon, Bona; Rodriguez, Leslie; Faselis, Charles J; Liappis, Angelike P

    2014-03-01

    Experiential gaming strategies offer a variation on traditional learning. A board game was used to present synthesized content of fundamental catheter care concepts and reinforce evidence-based practices relevant to nursing. Board games are innovative educational tools that can enhance active learning. Copyright 2014, SLACK Incorporated.

  4. Embedded Incremental Feature Selection for Reinforcement Learning

    Science.gov (United States)

    2012-05-01

    Prior to this work, feature selection for reinforcement learning has focused on linear value function approximation (Kolter and Ng, 2009; Parr et al. ...).

  5. Online reinforcement learning control for aerospace systems

    NARCIS (Netherlands)

    Zhou, Y.

    2018-01-01

    Reinforcement Learning (RL) methods are relatively new in the field of aerospace guidance, navigation, and control. This dissertation aims to exploit RL methods to improve the autonomy and online learning of aerospace systems with respect to the a priori unknown system and environment, dynamical

  6. Reinforcement Learning in Continuous Action Spaces

    NARCIS (Netherlands)

    Hasselt, H. van; Wiering, M.A.

    2007-01-01

    Considerable research has been done on reinforcement learning in continuous environments, but research on problems where the actions can also be chosen from a continuous space is much more limited. We present a new class of algorithms named Continuous Actor Critic Learning Automaton (CACLA)
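    A minimal sketch of a CACLA-style update with linear function approximation over a state feature vector: the critic performs an ordinary TD(0) update, and the actor is moved toward the action actually taken only when the temporal-difference error is positive. The feature construction, learning rates, and Gaussian exploration noise are illustrative assumptions.

```python
import numpy as np

class CACLA:
    """Sketch of a Continuous Actor Critic Learning Automaton agent
    with a scalar continuous action and linear features."""

    def __init__(self, n_features, alpha=0.1, beta=0.1, gamma=0.99, sigma=0.1):
        self.v = np.zeros(n_features)   # critic weights, V(s) = v . phi(s)
        self.w = np.zeros(n_features)   # actor weights,  a(s) = w . phi(s)
        self.alpha, self.beta, self.gamma, self.sigma = alpha, beta, gamma, sigma

    def act(self, phi):
        # exploratory action: actor output plus Gaussian noise
        return float(self.w @ phi) + np.random.normal(0.0, self.sigma)

    def update(self, phi, action, reward, phi_next, done):
        v_s = self.v @ phi
        v_next = 0.0 if done else self.v @ phi_next
        delta = reward + self.gamma * v_next - v_s
        self.v += self.alpha * delta * phi                        # critic: TD(0)
        if delta > 0:                                             # actor: only on positive TD error,
            self.w += self.beta * (action - self.w @ phi) * phi   # move toward the executed action
```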

  7. Value learning through reinforcement : The basics of dopamine and reinforcement learning

    NARCIS (Netherlands)

    Daw, N.D.; Tobler, P.N.; Glimcher, P.W.; Fehr, E.

    2013-01-01

    This chapter provides an overview of reinforcement learning and temporal difference learning and relates these topics to the firing properties of midbrain dopamine neurons. First, we review the Rescorla-Wagner learning rule and basic learning phenomena, such as blocking, which the rule explains. Then
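    A short sketch of the Rescorla-Wagner rule mentioned in this record, with the cue and reinforcer salience parameters collapsed into a single learning rate (an assumption made for brevity); the trial values below are illustrative and demonstrate blocking.

```python
import numpy as np

def rescorla_wagner(trials, alpha=0.3, lam=1.0, n_cues=2):
    """Rescorla-Wagner rule: dV_i = alpha * x_i * (lam*US - sum_j V_j x_j).
    `trials` is a list of (present_cues, us) pairs."""
    V = np.zeros(n_cues)
    for x, us in trials:
        x = np.asarray(x, dtype=float)
        delta = lam * us - V @ x          # prediction error
        V += alpha * delta * x            # only cues present on the trial are updated
    return V

# Blocking demo: cue A is trained alone, then A and B are presented together.
# B gains little associative strength because A already predicts the US.
phase1 = [([1, 0], 1)] * 20
phase2 = [([1, 1], 1)] * 20
print(rescorla_wagner(phase1 + phase2))
```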

  8. Robot Learning a New Subfield? The Robolearn-96 Workshop

    OpenAIRE

    Hexmoor, Henry; Meeden, Lisa; Murphy, Robin R.

    1997-01-01

    This article posits the idea of robot learning as a new subfield. The results of the Robolearn-96 Workshop provide evidence that learning in modern robotics is distinct from traditional machine learning. The article examines the role of robotics in the social and natural sciences and the potential impact of learning on robotics, generating both a continuum of research issues and a description of the divergent terminology, target domains, and standards of proof associated with robot learning. ...

  9. Serendipitous Offline Learning in a Neuromorphic Robot

    Directory of Open Access Journals (Sweden)

    Terrence C Stewart

    2016-02-01

    Full Text Available We demonstrate a hybrid neuromorphic learning paradigm that learns complex sensorimotor mappings based on a small set of hard-coded reflex behaviours. A mobile robot is first controlled by a basic set of reflexive hand-designed behaviours. All sensor data is provided via a spike-based silicon retina camera (eDVS), and all control is implemented via spiking neurons simulated on neuromorphic hardware (SpiNNaker). Given this control system, the robot is capable of simple obstacle avoidance and random exploration. To train the robot to perform more complex tasks, we observe the robot and find instances where the robot accidentally performs the desired action. Data recorded from the robot during these times is then used to update the neural control system, increasing the likelihood of the robot performing that task in the future, given a similar sensor state. As an example application of this general-purpose method of training, we demonstrate the robot learning to respond to novel sensory stimuli (a mirror by turning right if it is present at an intersection, and otherwise turning left. In general, this system can learn arbitrary relations between sensory input and motor behaviour.

  10. Exploration and Learning for Cognitive Robots

    NARCIS (Netherlands)

    Rudinac, M.

    2013-01-01

    Before a future with household robots is really feasible, those robots need to be easily adaptable to novel environments and users, be able to apply previously acquired knowledge, and able to learn from perceiving and interacting with the world and users around them. This thesis proposes a cognitive

  11. Towards autonomous neuroprosthetic control using Hebbian reinforcement learning.

    Science.gov (United States)

    Mahmoudi, Babak; Pohlmeyer, Eric A; Prins, Noeline W; Geng, Shijia; Sanchez, Justin C

    2013-12-01

    Our goal was to design an adaptive neuroprosthetic controller that could learn the mapping from neural states to prosthetic actions and automatically adjust adaptation using only a binary evaluative feedback as a measure of desirability/undesirability of performance. Hebbian reinforcement learning (HRL) in a connectionist network was used for the design of the adaptive controller. The method combines the efficiency of supervised learning with the generality of reinforcement learning. The convergence properties of this approach were studied using both closed-loop control simulations and open-loop simulations that used primate neural data from robot-assisted reaching tasks. The HRL controller was able to perform classification and regression tasks using its episodic and sequential learning modes, respectively. In our experiments, the HRL controller quickly achieved convergence to an effective control policy, followed by robust performance. The controller also automatically stopped adapting the parameters after converging to a satisfactory control policy. Additionally, when the input neural vector was reorganized, the controller resumed adaptation to maintain performance. By estimating an evaluative feedback directly from the user, the HRL control algorithm may provide an efficient method for autonomous adaptation of neuroprosthetic systems. This method may enable the user to teach the controller the desired behavior using only a simple feedback signal.
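    A generic sketch of a reward-modulated Hebbian weight update driven by a binary evaluative signal, in the spirit of the controller described in this record; it is not the paper's specific HRL architecture, and the layer structure, learning rate, and variable names are assumptions.

```python
import numpy as np

def hebbian_rl_step(W, x, y, feedback, lr=0.05):
    """One reward-modulated Hebbian update in a single-layer network.

    W        : weight matrix (n_out, n_in) mapping neural state to action
    x        : input (neural state) vector
    y        : output (action) vector actually produced
    feedback : binary evaluative signal, +1 desirable / -1 undesirable
    """
    # the Hebbian outer-product term is strengthened or weakened
    # according to the sign of the evaluative feedback
    W += lr * feedback * np.outer(y, x)
    return W
```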

  12. Reinforcement Learning in Autism Spectrum Disorder

    Directory of Open Access Journals (Sweden)

    Manuela Schuetze

    2017-11-01

    Full Text Available Early behavioral interventions are recognized as integral to standard care in autism spectrum disorder (ASD), and often focus on reinforcing desired behaviors (e.g., eye contact) and reducing the presence of atypical behaviors (e.g., echoing others' phrases). However, the efficacy of these programs is mixed. Reinforcement learning relies on neurocircuitry that has been reported to be atypical in ASD: prefrontal-subcortical circuits, amygdala, brainstem, and cerebellum. Thus, early behavioral interventions rely on neurocircuitry that may function atypically in at least a subset of individuals with ASD. Recent work has investigated physiological, behavioral, and neural responses to reinforcers to uncover differences in motivation and learning in ASD. We will synthesize this work to identify promising avenues for future research that ultimately can be used to enhance the efficacy of early intervention.

  13. Autonomous reinforcement learning with experience replay.

    Science.gov (United States)

    Wawrzyński, Paweł; Tanwani, Ajay Kumar

    2013-05-01

    This paper considers the issues of efficiency and autonomy that are required to make reinforcement learning suitable for real-life control tasks. A real-time reinforcement learning algorithm is presented that repeatedly adjusts the control policy with the use of previously collected samples, and autonomously estimates the appropriate step-sizes for the learning updates. The algorithm is based on the actor-critic with experience replay, whose step-sizes are determined on-line by an enhanced fixed-point algorithm for on-line neural network training. An experimental study with a simulated octopus arm and half-cheetah demonstrates the feasibility of the proposed algorithm to solve difficult learning control problems in an autonomous way within a reasonably short time. Copyright © 2012 Elsevier Ltd. All rights reserved.
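    A minimal sketch of the experience-replay mechanism on which this record's algorithm builds: past transitions are stored and repeatedly resampled so each sample can drive more than one policy adjustment. The capacity, batch size, and class name are illustrative assumptions; the step-size adaptation of the paper is not reproduced here.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer of past transitions, reused for repeated
    actor and critic updates instead of using each sample only once."""

    def __init__(self, capacity=50_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # draw a random minibatch of stored transitions
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))
```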

  14. Efficient abstraction selection in reinforcement learning

    NARCIS (Netherlands)

    Seijen, H. van; Whiteson, S.; Kester, L.

    2013-01-01

    This paper introduces a novel approach for abstraction selection in reinforcement learning problems modelled as factored Markov decision processes (MDPs), for which a state is described via a set of state components. In abstraction selection, an agent must choose an abstraction from a set of

  15. Reinforcement Learning for a New Piano Mover

    Directory of Open Access Journals (Sweden)

    Yuko Ishiwaka

    2005-08-01

    Full Text Available We attempt to achieve cooperative behavior of autonomous decentralized agents constructed via Q-learning, which is a type of reinforcement learning. As such, in the present paper, we examine the piano mover's problem. We propose a multi-agent architecture that has a training agent, learning agents, and an intermediate agent. The learning agents are heterogeneous and can communicate with each other. The movement of an object with these three kinds of agent depends on the composition of the actions of the learning agents. By learning its own shape through the learning agents, the object is expected to avoid obstacles. We simulate the proposed method in a two-dimensional continuous world. Results obtained in the present investigation reveal the effectiveness of the proposed method.
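    For reference, a minimal tabular Q-learning update of the kind underlying the agents in this record; the table shape, transition, and parameter values are illustrative assumptions.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """Single tabular Q-learning update."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

# Example: 10 states x 4 actions, one update for a transition 2 -> 3 under action 1
Q = np.zeros((10, 4))
Q = q_learning_update(Q, s=2, a=1, r=1.0, s_next=3)
```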

  16. Reinforcement Learning Based Artificial Immune Classifier

    Directory of Open Access Journals (Sweden)

    Mehmet Karakose

    2013-01-01

    Full Text Available One of the widely used methods for classification, which is a decision-making process, is artificial immune systems. Artificial immune systems, based on the natural immune system, can be successfully applied for classification, optimization, recognition, and learning in real-world problems. In this study, a reinforcement learning based artificial immune classifier is proposed as a new approach. This approach uses reinforcement learning to find better antibodies with immune operators. The proposed approach offers several advantages over other methods in the literature, such as effectiveness, fewer memory cells, high accuracy, speed, and data adaptability. The performance of the proposed approach is demonstrated by simulation and experimental results using real data in Matlab and FPGA. Some benchmark data and remote image data are used for the experimental results. Comparative results with a supervised/unsupervised artificial immune system, a negative selection classifier, and a resource-limited artificial immune classifier are given to demonstrate the effectiveness of the proposed method.

  17. Social Influence as Reinforcement Learning

    Science.gov (United States)

    2016-01-13

    a brain region associated with motivation and reward learning. Further, individuals' level of striatal activity in response to consensus tracks...

  18. Beyond adaptive-critic creative learning for intelligent mobile robots

    Science.gov (United States)

    Liao, Xiaoqun; Cao, Ming; Hall, Ernest L.

    2001-10-01

    Intelligent industrial and mobile robots may be considered proven technology in structured environments. Teach programming and supervised learning methods permit solutions to a variety of applications. However, we believe that extending the operation of these machines to more unstructured environments requires a new learning method. Both unsupervised learning and reinforcement learning are potential candidates for these new tasks. The adaptive critic method has been shown to provide useful approximations or even optimal control policies for non-linear systems. The purpose of this paper is to explore the use of new learning methods that go beyond the adaptive critic method for unstructured environments. The adaptive critic is a form of reinforcement learning. A critic element provides only high-level grading corrections to a cognition module that controls the action module. In the proposed system the critic's grades are modeled and forecasted, so that an anticipated set of sub-grades is available to the cognition module. The forecasted grades are interpolated and are available on the time scale needed by the action module. The success of the system is highly dependent on the accuracy of the forecasted grades and the adaptability of the action module. Examples from the guidance of a mobile robot are provided to illustrate the method for simple line following and for the more complex navigation and control in an unstructured environment. The theory presented here, which goes beyond the adaptive critic, may be called creative theory. Creative theory is a form of learning that models the highest level of human learning - imagination. The application of creative theory appears to extend not only to mobile robots but also to many other forms of human endeavor, such as educational learning and business forecasting. Reinforcement learning such as the adaptive critic may be applied to known problems to aid in the discovery of their solutions. The significance of creative theory is that it

  19. An architecture for an autonomous learning robot

    Science.gov (United States)

    Tillotson, Brian

    1988-01-01

    An autonomous learning device must solve the example bounding problem, i.e., it must divide the continuous universe into discrete examples from which to learn. We describe an architecture which incorporates an example bounder for learning. The architecture is implemented in the GPAL program. An example run with a real mobile robot shows that the program learns and uses new causal, qualitative, and quantitative relationships.

  20. Stochastic abstract policies: generalizing knowledge to improve reinforcement learning.

    Science.gov (United States)

    Koga, Marcelo L; Freire, Valdinei; Costa, Anna H R

    2015-01-01

    Reinforcement learning (RL) enables an agent to learn behavior by acquiring experience through trial-and-error interactions with a dynamic environment. However, knowledge is usually built from scratch and learning to behave may take a long time. Here, we improve the learning performance by leveraging prior knowledge; that is, the learner shows proper behavior from the beginning of a target task, using the knowledge from a set of known, previously solved, source tasks. In this paper, we argue that building stochastic abstract policies that generalize over past experiences is an effective way to provide such improvement and this generalization outperforms the current practice of using a library of policies. We achieve this by contributing a new algorithm, AbsProb-PI-multiple, and a framework for transferring knowledge represented as a stochastic abstract policy to new RL tasks. Stochastic abstract policies offer an effective way to encode knowledge because the abstraction they provide not only generalizes solutions but also facilitates extracting the similarities among tasks. We perform experiments in a robotic navigation environment and analyze the agent's behavior throughout the learning process and also assess the transfer ratio for different amounts of source tasks. We compare our method with the transfer of a library of policies, and experiments show that the use of a generalized policy produces better results by more effectively guiding the agent when learning a target task.

  1. Reinforcement Learning and Savings Behavior.

    Science.gov (United States)

    Choi, James J; Laibson, David; Madrian, Brigitte C; Metrick, Andrew

    2009-12-01

    We show that individual investors over-extrapolate from their personal experience when making savings decisions. Investors who experience particularly rewarding outcomes from saving in their 401(k)-a high average and/or low variance return-increase their 401(k) savings rate more than investors who have less rewarding experiences with saving. This finding is not driven by aggregate time-series shocks, income effects, rational learning about investing skill, investor fixed effects, or time-varying investor-level heterogeneity that is correlated with portfolio allocations to stock, bond, and cash asset classes. We discuss implications for the equity premium puzzle and interventions aimed at improving household financial outcomes.

  2. Reinforcement learning for microgrid energy management

    International Nuclear Information System (INIS)

    Kuznetsova, Elizaveta; Li, Yan-Fu; Ruiz, Carlos; Zio, Enrico; Ault, Graham; Bell, Keith

    2013-01-01

    We consider a microgrid for energy distribution, with a local consumer, a renewable generator (wind turbine) and a storage facility (battery), connected to the external grid via a transformer. We propose a 2 steps-ahead reinforcement learning algorithm to plan the battery scheduling, which plays a key role in the achievement of the consumer goals. The underlying framework is one of multi-criteria decision-making by an individual consumer who has the goals of increasing the utilization rate of the battery during high electricity demand (so as to decrease the electricity purchase from the external grid) and increasing the utilization rate of the wind turbine for local use (so as to increase the consumer independence from the external grid). Predictions of available wind power feed the reinforcement learning algorithm for selecting the optimal battery scheduling actions. The embedded learning mechanism enhances the consumer's knowledge about the optimal actions for battery scheduling under different time-dependent environmental conditions. The developed framework gives intelligent consumers the capability to learn the stochastic environment and make use of the experience to select optimal energy management actions. - Highlights: • A consumer exploits a 2 steps-ahead reinforcement learning for battery scheduling. • The Q-learning based mechanism is fed by the predictions of available wind power. • Wind speed state evolutions are modeled with a Markov chain model. • Optimal scheduling actions are learned through the occurrence of similar scenarios. • The consumer manifests a continuous enhancement of his knowledge about optimal actions
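    A generic sketch combining the two ingredients highlighted in this record, a Markov chain over discrete wind states and a Q-learning battery scheduler; it is not the paper's 2 steps-ahead algorithm, and the transition matrix, state sizes, reward, and parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Wind speed evolves as a small Markov chain over three discrete states.
P_wind = np.array([[0.7, 0.3, 0.0],
                   [0.2, 0.6, 0.2],
                   [0.0, 0.3, 0.7]])

def next_wind(state):
    return rng.choice(len(P_wind), p=P_wind[state])

# Q-table over (wind state, battery level) with actions
# 0 = idle, 1 = charge, 2 = discharge; the reward would come from a
# user-defined cost model, which is not reproduced here.
Q = np.zeros((3, 5, 3))

def choose_action(wind, battery, eps=0.1):
    if rng.random() < eps:
        return int(rng.integers(3))
    return int(np.argmax(Q[wind, battery]))

def q_update(wind, battery, action, reward, wind2, battery2, alpha=0.1, gamma=0.9):
    best_next = np.max(Q[wind2, battery2])
    Q[wind, battery, action] += alpha * (reward + gamma * best_next
                                         - Q[wind, battery, action])

# one illustrative transition with a made-up reward
w, b = 0, 2
a = choose_action(w, b)
w2 = next_wind(w)
b2 = max(0, min(4, b + (1 if a == 1 else -1 if a == 2 else 0)))
q_update(w, b, a, reward=0.5, wind2=w2, battery2=b2)
```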

  4. Online constrained model-based reinforcement learning

    CSIR Research Space (South Africa)

    Van Niekerk, B

    2017-08-01

    Full Text Available Online Constrained Model-based Reinforcement Learning (van Niekerk, Damianou, Rosman). Using direct multiple shooting (Bock and Plitt, 1984), problem (1) can be transformed into a structured nonlinear program (NLP). First, the time horizon [t0, t0 + T] is partitioned into N equal subintervals [tk, tk+1] for k = 0...

  5. Enriching behavioral ecology with reinforcement learning methods.

    Science.gov (United States)

    Frankenhuis, Willem E; Panchanathan, Karthik; Barto, Andrew G

    2018-02-13

    This article focuses on the division of labor between evolution and development in solving sequential, state-dependent decision problems. Currently, behavioral ecologists tend to use dynamic programming methods to study such problems. These methods are successful at predicting animal behavior in a variety of contexts. However, they depend on a distinct set of assumptions. Here, we argue that behavioral ecology will benefit from drawing more than it currently does on a complementary collection of tools, called reinforcement learning methods. These methods allow for the study of behavior in highly complex environments, which conventional dynamic programming methods do not feasibly address. In addition, reinforcement learning methods are well-suited to studying how biological mechanisms solve developmental and learning problems. For instance, we can use them to study simple rules that perform well in complex environments. Or to investigate under what conditions natural selection favors fixed, non-plastic traits (which do not vary across individuals), cue-driven-switch plasticity (innate instructions for adaptive behavioral development based on experience), or developmental selection (the incremental acquisition of adaptive behavior based on experience). If natural selection favors developmental selection, which includes learning from environmental feedback, we can also make predictions about the design of reward systems. Our paper is written in an accessible manner and for a broad audience, though we believe some novel insights can be drawn from our discussion. We hope our paper will help advance the emerging bridge connecting the fields of behavioral ecology and reinforcement learning. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.

  6. Experiential Learning of Robotics Fundamentals Based on a Case Study of Robot-Assisted Stereotactic Neurosurgery

    Science.gov (United States)

    Faria, Carlos; Vale, Carolina; Machado, Toni; Erlhagen, Wolfram; Rito, Manuel; Monteiro, Sérgio; Bicho, Estela

    2016-01-01

    Robotics has been playing an important role in modern surgery, especially in procedures that require extreme precision, such as neurosurgery. This paper addresses the challenge of teaching robotics to undergraduate engineering students, through an experiential learning project of robotics fundamentals based on a case study of robot-assisted…

  7. The Robotic Decathlon: Project-Based Learning Labs and Curriculum Design for an Introductory Robotics Course

    Science.gov (United States)

    Cappelleri, D. J.; Vitoroulis, N.

    2013-01-01

    This paper presents a series of novel project-based learning labs for an introductory robotics course that are developed into a semester-long Robotic Decathlon. The last three events of the Robotic Decathlon are used as three final one-week-long project tasks; these replace a previous course project that was a semester-long robotics competition.…

  8. Manifold learning in machine vision and robotics

    Science.gov (United States)

    Bernstein, Alexander

    2017-02-01

    Smart algorithms are used in machine vision and robotics to organize or extract high-level information from the available data. Nowadays, machine learning is an essential and ubiquitous tool for automatically extracting patterns or regularities from data (images in machine vision; camera, laser, and sonar sensor data in robotics) in order to solve various subject-oriented tasks such as understanding and classification of image content, navigation of mobile autonomous robots in uncertain environments, robot manipulation in medical robotics and computer-assisted surgery, and others. Usually such data have high dimensionality; however, due to various dependencies between their components and constraints caused by physical reasons, all "feasible and usable data" occupy only a very small part of the high-dimensional "observation space" and have a smaller intrinsic dimensionality. The generally accepted model of such data is the manifold model, according to which the data lie on or near an unknown manifold (surface) of lower dimensionality embedded in an ambient high-dimensional observation space; as a rule, real-world high-dimensional data obtained from "natural" sources fit this model. The use of manifold learning techniques in machine vision and robotics, which discover a low-dimensional structure in high-dimensional data and yield effective algorithms for solving a large number of subject-oriented tasks, is the content of the conference plenary speech, some topics of which are covered in this paper.
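    A small generic illustration of the manifold model discussed in this record (not taken from the paper): data sampled from a 2-D manifold embedded in 3-D is recovered with the Isomap estimator from scikit-learn; the dataset, neighbourhood size, and target dimension are assumptions.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, _ = make_swiss_roll(n_samples=1000, noise=0.05)   # ambient dimension 3
embedding = Isomap(n_neighbors=10, n_components=2)   # intrinsic dimension 2
X_low = embedding.fit_transform(X)
print(X.shape, "->", X_low.shape)                    # (1000, 3) -> (1000, 2)
```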

  9. Educational resources and tools for robotic learning

    Directory of Open Access Journals (Sweden)

    Pablo Gil Vazquez

    2012-07-01

    Full Text Available This paper discusses different teaching experiences whose aim is the learning of robotics at the university. These experiences are reflected in the development of several robotics courses and subjects at the University of Alicante. The authors have created various educational platforms or have used freely distributed, open-source tools for the implementation of these courses. The main objective of these courses is to teach the design and implementation of robotic solutions to various problems, covering not only the control, programming and handling of robots but also the assembly, building and programming of educational mini-robots. On the one hand, new teaching tools such as simulators and virtual labs are used, which make the learning of robot arms more flexible. On the other hand, competitions are used to motivate students: by building and programming low-cost mini-robots, the students put the skills they have learned into action.

  10. SCAFFOLDING AND REINFORCEMENT: USING DIGITAL LOGBOOKS IN LEARNING VOCABULARY

    OpenAIRE

    Khalifa, Salma Hasan Almabrouk; Shabdin, Ahmad Affendi

    2016-01-01

    Reinforcement and scaffolding are tested approaches to enhance learning achievements. Keeping a record of the learning process as well as the newly learned words functions as scaffolding to help learners build a comprehensive vocabulary. Similarly, repetitive learning of new words reinforces permanent learning for long-term memory. Paper-based logbooks may prove to be good records of the learning process, but if learners use digital logbooks, the results may be even better. Digital logbooks wit...

  11. Robots show us how to teach them: feedback from robots shapes tutoring behavior during action learning.

    Science.gov (United States)

    Vollmer, Anna-Lisa; Mühlig, Manuel; Steil, Jochen J; Pitsch, Karola; Fritsch, Jannik; Rohlfing, Katharina J; Wrede, Britta

    2014-01-01

    Robot learning by imitation requires the detection of a tutor's action demonstration and its relevant parts. Current approaches implicitly assume a unidirectional transfer of knowledge from tutor to learner. The presented work challenges this predominant assumption based on an extensive user study with an autonomously interacting robot. We show that by providing feedback, a robot learner influences the human tutor's movement demonstrations in the process of action learning. We argue that the robot's feedback strongly shapes how tutors signal what is relevant to an action and thus advocate a paradigm shift in robot action learning research toward truly interactive systems learning in and benefiting from interaction.

  12. ROBOT LEARNING OF OBJECT MANIPULATION TASK ACTIONS FROM HUMAN DEMONSTRATIONS

    Directory of Open Access Journals (Sweden)

    Maria Kyrarini

    2017-08-01

    Full Text Available Robot learning from demonstration is a method which enables robots to learn in a similar way as humans. In this paper, a framework that enables robots to learn from multiple human demonstrations via kinesthetic teaching is presented. The subject of learning is a high-level sequence of actions, as well as the low-level trajectories necessary to be followed by the robot to perform the object manipulation task. The multiple human demonstrations are recorded and only the most similar demonstrations are selected for robot learning. The high-level learning module identifies the sequence of actions of the demonstrated task. Using Dynamic Time Warping (DTW) and a Gaussian Mixture Model (GMM), the model of the demonstrated trajectories is learned. The learned trajectory is generated by Gaussian mixture regression (GMR) from the learned Gaussian mixture model. In the online working phase, the sequence of actions is identified and experimental results show that the robot performs the learned task successfully.
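    A minimal sketch of the demonstration-selection step mentioned in this record: a classic dynamic-time-warping distance between 1-D trajectories is used to rank demonstrations by mutual similarity. The paper's multivariate setup and GMM/GMR stages are not reproduced; the toy trajectories below are assumptions.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic-time-warping distance between two 1-D trajectories."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Rank demonstrations by their average DTW distance to all others and
# keep the most mutually similar ones for learning.
demos = [np.sin(np.linspace(0, 3, 50)),
         np.sin(np.linspace(0, 3, 60)) + 0.05,
         np.linspace(0, 1, 50)]                      # an outlier demonstration
avg = [np.mean([dtw_distance(d, e) for e in demos if e is not d]) for d in demos]
print(np.argsort(avg))   # indices sorted from most to least similar
```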

  13. Human demonstrations for fast and safe exploration in reinforcement learning

    NARCIS (Netherlands)

    Schonebaum, G.K.; Junell, J.L.; van Kampen, E.

    2017-01-01

    Reinforcement learning is a promising framework for controlling complex vehicles with a high level of autonomy, since it does not need a dynamic model of the vehicle, and it is able to adapt to changing conditions. When learning from scratch, the performance of a reinforcement learning controller

  14. Reinforcement learning account of network reciprocity.

    Science.gov (United States)

    Ezaki, Takahiro; Masuda, Naoki

    2017-01-01

    Evolutionary game theory predicts that cooperation in social dilemma games is promoted when agents are connected as a network. However, when networks are fixed over time, humans do not necessarily show enhanced mutual cooperation. Here we show that reinforcement learning (specifically, the so-called Bush-Mosteller model) approximately explains the experimentally observed network reciprocity and the lack thereof in a parameter region spanned by the benefit-to-cost ratio and the node's degree. Thus, we significantly extend previously obtained numerical results.
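    A generic sketch of the Bush-Mosteller update referred to in this record: the probability of cooperating moves toward repeating an action that yielded a satisfying payoff and away from one that did not. The aspiration level, learning rate, and function form are illustrative assumptions, not the authors' parameterization.

```python
def bush_mosteller(p, cooperated, payoff, aspiration=1.0, beta=0.2):
    """One Bush-Mosteller update of the probability of cooperating.

    p           : current probability of cooperating
    cooperated  : True if the agent cooperated this round
    payoff      : payoff received this round
    aspiration  : payoff level separating satisfying from unsatisfying outcomes
    beta        : learning rate
    """
    satisfied = payoff >= aspiration
    if cooperated:
        # satisfying cooperation reinforces cooperating; unsatisfying weakens it
        return p + beta * (1 - p) if satisfied else p - beta * p
    else:
        # satisfying defection reinforces defecting (p falls);
        # unsatisfying defection pushes the agent back toward cooperation
        return p - beta * p if satisfied else p + beta * (1 - p)
```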

  16. Robotic Cooperative Learning Promotes Student STEM Interest

    Science.gov (United States)

    Mosley, Pauline; Ardito, Gerald; Scollins, Lauren

    2016-01-01

    The principal purpose of this investigation is to study the effect of robotic cooperative learning methodologies on middle school students' critical thinking and STEM interest. The semi-experimental inquiry consisted of ninety-four sixth-grade students (forty-nine in the experimental group, forty-five in the control group), chosen…

  17. Why Robots Should Be Social: Enhancing Machine Learning through Social Human-Robot Interaction.

    Science.gov (United States)

    de Greeff, Joachim; Belpaeme, Tony

    2015-01-01

    Social learning is a powerful method for cultural propagation of knowledge and skills relying on a complex interplay of learning strategies, social ecology and the human propensity for both learning and tutoring. Social learning has the potential to be an equally potent learning strategy for artificial systems and robots in specific. However, given the complexity and unstructured nature of social learning, implementing social machine learning proves to be a challenging problem. We study one particular aspect of social machine learning: that of offering social cues during the learning interaction. Specifically, we study whether people are sensitive to social cues offered by a learning robot, in a similar way to children's social bids for tutoring. We use a child-like social robot and a task in which the robot has to learn the meaning of words. For this a simple turn-based interaction is used, based on language games. Two conditions are tested: one in which the robot uses social means to invite a human teacher to provide information based on what the robot requires to fill gaps in its knowledge (i.e. expression of a learning preference); the other in which the robot does not provide social cues to communicate a learning preference. We observe that conveying a learning preference through the use of social cues results in better and faster learning by the robot. People also seem to form a "mental model" of the robot, tailoring the tutoring to the robot's performance as opposed to using simply random teaching. In addition, the social learning shows a clear gender effect with female participants being responsive to the robot's bids, while male teachers appear to be less receptive. This work shows how additional social cues in social machine learning can result in people offering better quality learning input to artificial systems, resulting in improved learning performance.

  19. Curiosity driven reinforcement learning for motion planning on humanoids

    Science.gov (United States)

    Frank, Mikhail; Leitner, Jürgen; Stollenga, Marijn; Förster, Alexander; Schmidhuber, Jürgen

    2014-01-01

    Most previous work on artificial curiosity (AC) and intrinsic motivation focuses on basic concepts and theory. Experimental results are generally limited to toy scenarios, such as navigation in a simulated maze, or control of a simple mechanical system with one or two degrees of freedom. To study AC in a more realistic setting, we embody a curious agent in the complex iCub humanoid robot. Our novel reinforcement learning (RL) framework consists of a state-of-the-art, low-level, reactive control layer, which controls the iCub while respecting constraints, and a high-level curious agent, which explores the iCub's state-action space through information gain maximization, learning a world model from experience, controlling the actual iCub hardware in real-time. To the best of our knowledge, this is the first ever embodied, curious agent for real-time motion planning on a humanoid. We demonstrate that it can learn compact Markov models to represent large regions of the iCub's configuration space, and that the iCub explores intelligently, showing interest in its physical constraints as well as in objects it finds in its environment. PMID:24432001

  20. Fable II: Design of a Modular Robot for Creative Learning

    DEFF Research Database (Denmark)

    Pacheco, Moises; Fogh, Rune; Lund, Henrik Hautop

    2015-01-01

    Robotic systems have a high potential for creative learning if they are flexible, accessible and engaging for the user in the experimental process of building and programming robots. In this paper we describe the Fable modular robotic system for creative learning which we develop to enable and mo...

  1. Ensemble Network Architecture for Deep Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Xi-liang Chen

    2018-01-01

    Full Text Available The popular deep Q-learning algorithm is known to be unstable because of oscillations in the Q-values and the overestimation of action values under certain conditions. These issues tend to adversely affect performance. In this paper, we develop an ensemble network architecture for deep reinforcement learning which is based on value function approximation. The temporal ensemble stabilizes the training process by reducing the variance of the target approximation error, and the ensemble of target values reduces the overestimation and improves performance by estimating more accurate Q-values. Our results show that this architecture leads to statistically significantly better value estimation and more stable, better performance on several classical control tasks in the OpenAI Gym environment.
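    A generic sketch of the ensemble-target idea described in this record: the greedy value of the next state is averaged over several target networks before forming the TD target, which damps both the variance and the overestimation of any single network. This is not the paper's exact architecture; the shapes and parameters are assumptions.

```python
import numpy as np

def ensemble_td_target(reward, next_q_per_net, gamma=0.99, done=False):
    """TD target built from an ensemble of target networks.

    next_q_per_net : array of shape (K, n_actions) with each target
                     network's Q-values for the next state.
    """
    if done:
        return reward
    greedy_values = np.max(next_q_per_net, axis=1)   # one greedy value per network
    return reward + gamma * float(np.mean(greedy_values))
```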

  2. Optimizing Chemical Reactions with Deep Reinforcement Learning.

    Science.gov (United States)

    Zhou, Zhenpeng; Li, Xiaocheng; Zare, Richard N

    2017-12-27

    Deep reinforcement learning was employed to optimize chemical reactions. Our model iteratively records the results of a chemical reaction and chooses new experimental conditions to improve the reaction outcome. This model outperformed a state-of-the-art blackbox optimization algorithm by using 71% fewer steps on both simulations and real reactions. Furthermore, we introduced an efficient exploration strategy by drawing the reaction conditions from certain probability distributions, which resulted in an improvement on regret from 0.062 to 0.039 compared with a deterministic policy. Combining the efficient exploration policy with accelerated microdroplet reactions, optimal reaction conditions were determined in 30 min for the four reactions considered, and a better understanding of the factors that control microdroplet reactions was reached. Moreover, our model showed a better performance after training on reactions with similar or even dissimilar underlying mechanisms, which demonstrates its learning ability.

  3. Vicarious reinforcement learning signals when instructing others.

    Science.gov (United States)

    Apps, Matthew A J; Lesage, Elise; Ramnani, Narender

    2015-02-18

    Reinforcement learning (RL) theory posits that learning is driven by discrepancies between the predicted and actual outcomes of actions (prediction errors [PEs]). In social environments, learning is often guided by similar RL mechanisms. For example, teachers monitor the actions of students and provide feedback to them. This feedback evokes PEs in students that guide their learning. We report the first study that investigates the neural mechanisms that underpin RL signals in the brain of a teacher. Neurons in the anterior cingulate cortex (ACC) signal PEs when learning from the outcomes of one's own actions but also signal information when outcomes are received by others. Does a teacher's ACC signal PEs when monitoring a student's learning? Using fMRI, we studied brain activity in human subjects (teachers) as they taught a confederate (student) action-outcome associations by providing positive or negative feedback. We examined activity time-locked to the students' responses, when teachers infer student predictions and know actual outcomes. We fitted a RL-based computational model to the behavior of the student to characterize their learning, and examined whether a teacher's ACC signals when a student's predictions are wrong. In line with our hypothesis, activity in the teacher's ACC covaried with the PE values in the model. Additionally, activity in the teacher's insula and ventromedial prefrontal cortex covaried with the predicted value according to the student. Our findings highlight that the ACC signals PEs vicariously for others' erroneous predictions, when monitoring and instructing their learning. These results suggest that RL mechanisms, processed vicariously, may underpin and facilitate teaching behaviors. Copyright © 2015 Apps et al.

  4. Optimizing microstimulation using a reinforcement learning framework.

    Science.gov (United States)

    Brockmeier, Austin J; Choi, John S; Distasio, Marcello M; Francis, Joseph T; Príncipe, José C

    2011-01-01

    The ability to provide sensory feedback is desired to enhance the functionality of neuroprosthetics. Somatosensory feedback provides closed-loop control to the motor system, which is lacking in feedforward neuroprosthetics. In the case of existing somatosensory function, a template of the natural response can be used as a template of the desired response elicited by electrical microstimulation. In the case of no initial training data, microstimulation parameters that produce responses close to the template must be selected in an online manner. We propose using reinforcement learning as a framework to balance the exploration of the parameter space and the continued selection of promising parameters for further stimulation. This approach avoids an explicit model of the neural response from stimulation. We explore a preliminary architecture--treating the task as a k-armed bandit--using offline data recorded for natural touch and thalamic microstimulation, and we examine the method's efficiency in exploring the parameter space while concentrating on promising parameter forms. The best matching stimulation parameters, from k = 68 different forms, are selected by the reinforcement learning algorithm consistently after 334 realizations.
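    A minimal epsilon-greedy k-armed bandit sketch of the framing used in this record, where each arm is one candidate stimulation parameter set and the reward is its similarity to the desired response template; the similarity function, epsilon, and step count are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

def epsilon_greedy_bandit(similarity_of, k=68, steps=500, eps=0.1):
    """Generic epsilon-greedy k-armed bandit over stimulation parameter sets.

    similarity_of : user-supplied function mapping an arm index to a
                    reward (similarity to the response template).
    """
    estimates = np.zeros(k)   # running mean reward per arm
    counts = np.zeros(k)
    for _ in range(steps):
        arm = int(rng.integers(k)) if rng.random() < eps else int(np.argmax(estimates))
        reward = similarity_of(arm)
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return int(np.argmax(estimates))

# Toy usage: arm 17 happens to match the (made-up) template best.
best = epsilon_greedy_bandit(lambda arm: 1.0 - abs(arm - 17) / 68 + 0.05 * rng.standard_normal())
print(best)
```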

  5. Impedance learning for robotic contact tasks using natural actor-critic algorithm.

    Science.gov (United States)

    Kim, Byungchan; Park, Jooyoung; Park, Shinsuk; Kang, Sungchul

    2010-04-01

    Compared with their robotic counterparts, humans excel at various tasks by using their ability to adaptively modulate arm impedance parameters. This ability allows us to successfully perform contact tasks even in uncertain environments. This paper considers a learning strategy of motor skill for robotic contact tasks based on a human motor control theory and machine learning schemes. Our robot learning method employs impedance control based on the equilibrium point control theory and reinforcement learning to determine the impedance parameters for contact tasks. A recursive least-square filter-based episodic natural actor-critic algorithm is used to find the optimal impedance parameters. The effectiveness of the proposed method was tested through dynamic simulations of various contact tasks. The simulation results demonstrated that the proposed method optimizes the performance of the contact tasks in uncertain conditions of the environment.

  6. Robot technologies, autism and designs for learning

    DEFF Research Database (Denmark)

    Hansbøl, Mikala

    2015-01-01

    technologies involves several very different educational approaches to supporting young people’s learning and development. The paper discusses how robot technologies as learning resources have been related to the field of autism and education, and argues for a need to further expand the areas of application...... in the future, with a focus on children and young people diagnosed with autism spectrum disorders, their ICT interests and engagement in innovative and creative learning. The paper draws on international research and examples from the author’s own research into education for children and young people diagnosed...... with autism spectrum disorders, drawing on teachers’ and the students’ interests in working with ICT (e.g. robot technology)....

  7. What can we learn from Robots

    DEFF Research Database (Denmark)

    Clausen, Thea Juhl Roloff

    2015-01-01

    Robot technology is going to be a big part of the future, but robotics is also part of the present. We ask the question, "What can we use this technology for, and does it demand a new way to think about digital literacy?" One thing we do know is that we must start focusing on the needs the future indicates..... With robot technology seen in a play-and-learning perspective, the focus will be on being a co-producer and being experimental. We must have an understanding of programming languages, an understanding of what happens behind the interface, and of how to use this in an innovative perspective to help shape...

  8. Reinforcement Learning for Ramp Control: An Analysis of Learning Parameters

    Directory of Open Access Journals (Sweden)

    Chao Lu

    2016-08-01

    Full Text Available Reinforcement Learning (RL) has been proposed to deal with ramp control problems under dynamic traffic conditions; however, there is a lack of sufficient research on the behaviour and impacts of different learning parameters. This paper describes a ramp control agent based on the RL mechanism and thoroughly analyzes the influence of three learning parameters, namely the learning rate, discount rate and action selection parameter, on the algorithm's performance. Two indices for the learning speed and convergence stability were used to measure the algorithm performance, based on which a series of simulation-based experiments were designed and conducted using a macroscopic traffic flow model. Simulation results showed that, compared with the discount rate, the learning rate and action selection parameter had more remarkable impacts on the algorithm performance. Based on the analysis, some suggestions about how to select suitable parameter values that can achieve superior performance were provided.

  9. Reusable Reinforcement Learning via Shallow Trails.

    Science.gov (United States)

    Yu, Yang; Chen, Shi-Yong; Da, Qing; Zhou, Zhi-Hua

    2018-06-01

    Reinforcement learning has shown great success in helping learning agents accomplish tasks autonomously from environment interactions. Meanwhile, in many real-world applications, an agent needs to accomplish not only a fixed task but also a range of tasks. For this goal, an agent can learn a metapolicy over a set of training tasks that are drawn from an underlying distribution. By maximizing the total reward summed over all the training tasks, the metapolicy can then be reused in accomplishing test tasks from the same distribution. However, in practice, we face two major obstacles to training and reusing metapolicies well. First, how to identify tasks that are unrelated or even opposed to each other, in order to avoid their mutual interference during training. Second, how to characterize task features, according to which a metapolicy can be reused. In this paper, we propose the MetA-Policy LEarning (MAPLE) approach that overcomes the two difficulties by introducing the shallow trail. It probes a task by running a roughly trained policy. Using the rewards of the shallow trail, MAPLE automatically groups similar tasks. Moreover, when the task parameters are unknown, the rewards of the shallow trail also serve as task features. Empirical studies on several controlling tasks verify that MAPLE can train metapolicies well and receives high reward on test tasks.

  10. A reinforcement learning model of joy, distress, hope and fear

    Science.gov (United States)

    Broekens, Joost; Jacobs, Elmer; Jonker, Catholijn M.

    2015-07-01

    In this paper we computationally study the relation between adaptive behaviour and emotion. Using the reinforcement learning framework, we propose that the learned state utility models fear (negative) and hope (positive), based on the fact that both signals are about anticipation of loss or gain. Further, we propose that joy/distress is a signal similar to the error signal. We present agent-based simulation experiments that show that this model replicates psychological and behavioural dynamics of emotion. This work distinguishes itself by assessing the dynamics of emotion in an adaptive agent framework - coupling it to the literature on habituation, development, extinction and hope theory. Our results support the idea that the function of emotion is to provide a complex feedback signal for an organism to adapt its behaviour. Our work is relevant for understanding the relation between emotion and adaptation in animals, as well as for human-robot interaction, in particular how emotional signals can be used to communicate between adaptive agents and humans.
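    A tiny sketch of the mapping described in this record, state value (anticipation of gain or loss) giving hope/fear and the temporal-difference error giving joy/distress; the thresholds and labels are illustrative assumptions, not the paper's exact formulation.

```python
def emotion_signals(v_state, td_error):
    """Map RL quantities to coarse emotion labels."""
    hope_fear = "hope" if v_state > 0 else "fear" if v_state < 0 else "neutral"
    joy_distress = "joy" if td_error > 0 else "distress" if td_error < 0 else "neutral"
    return hope_fear, joy_distress

print(emotion_signals(v_state=0.4, td_error=-0.2))   # ('hope', 'distress')
```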

  11. Punishment Insensitivity and Impaired Reinforcement Learning in Preschoolers

    Science.gov (United States)

    Briggs-Gowan, Margaret J.; Nichols, Sara R.; Voss, Joel; Zobel, Elvira; Carter, Alice S.; McCarthy, Kimberly J.; Pine, Daniel S.; Blair, James; Wakschlag, Lauren S.

    2014-01-01

    Background: Youth and adults with psychopathic traits display disrupted reinforcement learning. Advances in measurement now enable examination of this association in preschoolers. The current study examines relations between reinforcement learning in preschoolers and parent ratings of reduced responsiveness to socialization, conceptualized as a…

  12. Continuous residual reinforcement learning for traffic signal control optimization

    NARCIS (Netherlands)

    Aslani, Mohammad; Seipel, Stefan; Wiering, Marco

    2018-01-01

    Traffic signal control can be naturally regarded as a reinforcement learning problem. Unfortunately, it is one of the most difficult classes of reinforcement learning problems owing to its large state space. A straightforward approach to address this challenge is to control traffic signals based on

  13. Reinforcement learning in continuous state and action spaces

    NARCIS (Netherlands)

    H. P. van Hasselt (Hado); M.A. Wiering; M. van Otterlo

    2012-01-01

    Many traditional reinforcement-learning algorithms have been designed for problems with small finite state and action spaces. Learning in such discrete problems can be difficult, due to noise and delayed reinforcements. However, many real-world problems have continuous state or action

  15. Tank War Using Online Reinforcement Learning

    DEFF Research Database (Denmark)

    Toftgaard Andersen, Kresten; Zeng, Yifeng; Dahl Christensen, Dennis

    2009-01-01

    Real-Time Strategy (RTS) games provide a challenging platform for implementing online reinforcement learning (RL) techniques in a real application. The computer, as one player, monitors its opponents' (human or other computer) strategies and then updates its own policy using RL methods. In this paper, we propose a multi-layer framework for implementing online RL in an RTS game. The framework significantly reduces the RL computational complexity by decomposing the state space in a hierarchical manner. We implement the RTS game Tank General and perform a thorough test of the proposed framework. The results show the effectiveness of the proposed framework and shed light on relevant issues in using RL in RTS games.

  16. Phasic dopamine as a prediction error of intrinsic and extrinsic reinforcements driving both action acquisition and reward maximization: a simulated robotic study.

    Science.gov (United States)

    Mirolli, Marco; Santucci, Vieri G; Baldassarre, Gianluca

    2013-03-01

    An important issue of recent neuroscientific research is to understand the functional role of the phasic release of dopamine in the striatum, and in particular its relation to reinforcement learning. The literature is split between two alternative hypotheses: one considers phasic dopamine as a reward prediction error similar to the computational TD-error, whose function is to guide an animal to maximize future rewards; the other holds that phasic dopamine is a sensory prediction error signal that lets the animal discover and acquire novel actions. In this paper we propose an original hypothesis that integrates these two contrasting positions: according to our view phasic dopamine represents a TD-like reinforcement prediction error learning signal determined by both unexpected changes in the environment (temporary, intrinsic reinforcements) and biological rewards (permanent, extrinsic reinforcements). Accordingly, dopamine plays the functional role of driving both the discovery and acquisition of novel actions and the maximization of future rewards. To validate our hypothesis we perform a series of experiments with a simulated robotic system that has to learn different skills in order to get rewards. We compare different versions of the system in which we vary the composition of the learning signal. The results show that only the system reinforced by both extrinsic and intrinsic reinforcements is able to reach high performance in sufficiently complex conditions. Copyright © 2013 Elsevier Ltd. All rights reserved.
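
    The learning signal described above can be summarized with a small numerical sketch. The snippet below is not the authors' simulated robotic setup; it is a minimal TD(0) illustration in which the reinforcement combines a permanent extrinsic reward with a temporary intrinsic (novelty) reward that decays as the triggering event becomes predicted. The chain environment, decay rule, and constants are illustrative assumptions.

    import numpy as np

    # Minimal sketch: TD(0) value learning where the reinforcement signal combines
    # a permanent extrinsic reward with a temporary intrinsic reward (novelty bonus).
    n_states, gamma, alpha = 5, 0.9, 0.1
    V = np.zeros(n_states)                 # state values
    novelty = np.ones(n_states)            # intrinsic reward, decays with visits
    extrinsic = np.zeros(n_states)
    extrinsic[-1] = 1.0                    # permanent reward in the final state

    for episode in range(200):
        for s in range(n_states - 1):
            s_next = s + 1                 # simple chain environment
            r = extrinsic[s_next] + novelty[s_next]   # combined reinforcement
            td_error = r + gamma * V[s_next] - V[s]   # "dopamine-like" learning signal
            V[s] += alpha * td_error
            novelty[s_next] *= 0.95        # intrinsic reward is temporary

    print(np.round(V, 3))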

  17. Reinforcement learning in complementarity game and population dynamics.

    Science.gov (United States)

    Jost, Jürgen; Li, Wei

    2014-02-01

    We systematically test and compare different reinforcement learning schemes in a complementarity game [J. Jost and W. Li, Physica A 345, 245 (2005)] played between members of two populations. More precisely, we study the Roth-Erev, Bush-Mosteller, and SoftMax reinforcement learning schemes. A modified version of Roth-Erev with a power exponent of 1.5, as opposed to 1 in the standard version, performs best. We also compare these reinforcement learning strategies with evolutionary schemes. This gives insight into aspects like the issue of quick adaptation as opposed to systematic exploration or the role of learning rates.
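
    A compact sketch of a modified Roth-Erev scheme of the kind compared here might look as follows. The payoff structure of the complementarity game is not reproduced, and the way the exponent enters (applied to the propensities when forming choice probabilities) is an assumption made for illustration.

    import numpy as np

    def roth_erev_choice(propensities, power=1.5, rng=None):
        """Choose an action with probability proportional to propensity**power.
        power=1.0 recovers the standard Roth-Erev rule; 1.5 is the modified version."""
        rng = rng or np.random.default_rng()
        weights = np.asarray(propensities, dtype=float) ** power
        probs = weights / weights.sum()
        return rng.choice(len(propensities), p=probs), probs

    def roth_erev_update(propensities, action, payoff, forgetting=0.0):
        """Reinforce the chosen action by the received payoff (with optional forgetting)."""
        propensities = (1.0 - forgetting) * np.asarray(propensities, dtype=float)
        propensities[action] += payoff
        return propensities

    # Toy usage: two actions, action 1 pays more on average.
    rng = np.random.default_rng(1)
    q = np.ones(2)
    for _ in range(500):
        a, _ = roth_erev_choice(q, power=1.5, rng=rng)
        payoff = rng.normal(1.0 if a == 1 else 0.5, 0.1)
        q = roth_erev_update(q, a, max(payoff, 0.0))
    print(q / q.sum())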

  18. Can young children learn words from a robot?

    OpenAIRE

    Moriguchi, Yusuke; Kanda, Takayuki; Ishiguro, Hiroshi; Shimada, Yoko; Itakura, Shoji

    2011-01-01

    Young children generally learn words from other people. Recent research has shown that children can learn new actions and skills from nonhuman agents. This study examines whether young children could learn words from a robot. Preschool children were shown a video in which either a woman (human condition) or a mechanical robot (robot condition) labeled novel objects. Then the children were asked to select the objects according to the names used in the video. The results revealed that children ...

  19. Robot soccer action selection based on Q learning

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    This paper studies robot soccer action selection based on Q-learning. The robots learn to activate a particular behavior given their current situation and reward signal. We adopt neural networks to implement Q-learning for their generalization properties and limited computer memory requirements.
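
    The record above combines Q-learning with a neural-network approximator. As a rough sketch of the same idea with a lighter approximator, the snippet below uses semi-gradient Q-learning with a linear function over state features; the feature encoding, toy environment, and reward are invented placeholders, not the soccer setup of the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    n_features, n_actions = 8, 4            # e.g. 4 high-level behaviours to select from
    W = np.zeros((n_actions, n_features))   # one linear Q-approximator per action
    alpha, gamma, epsilon = 0.05, 0.95, 0.1

    def q_values(phi):
        return W @ phi                      # Q(s, a) = w_a . phi(s)

    def select_action(phi):
        if rng.random() < epsilon:          # epsilon-greedy exploration
            return int(rng.integers(n_actions))
        return int(np.argmax(q_values(phi)))

    def update(phi, a, r, phi_next, done):
        target = r if done else r + gamma * np.max(q_values(phi_next))
        td_error = target - q_values(phi)[a]
        W[a] += alpha * td_error * phi      # semi-gradient Q-learning step

    # Dummy interaction loop with a random "situation" generator standing in for the game.
    phi = rng.random(n_features)
    for step in range(1000):
        a = select_action(phi)
        r = float(a == np.argmax(phi[:n_actions]))   # toy reward signal
        phi_next = rng.random(n_features)
        update(phi, a, r, phi_next, done=False)
        phi = phi_next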

  20. Accelerating Multiagent Reinforcement Learning by Equilibrium Transfer.

    Science.gov (United States)

    Hu, Yujing; Gao, Yang; An, Bo

    2015-07-01

    An important approach in multiagent reinforcement learning (MARL) is equilibrium-based MARL, which adopts equilibrium solution concepts in game theory and requires agents to play equilibrium strategies at each state. However, most existing equilibrium-based MARL algorithms cannot scale due to a large number of computationally expensive equilibrium computations (e.g., computing Nash equilibria is PPAD-hard) during learning. For the first time, this paper finds that during the learning process of equilibrium-based MARL, the one-shot games corresponding to each state's successive visits often have the same or similar equilibria (for some states more than 90% of games corresponding to successive visits have similar equilibria). Inspired by this observation, this paper proposes to use equilibrium transfer to accelerate equilibrium-based MARL. The key idea of equilibrium transfer is to reuse previously computed equilibria when each agent has a small incentive to deviate. By introducing transfer loss and transfer condition, a novel framework called equilibrium transfer-based MARL is proposed. We prove that although equilibrium transfer brings transfer loss, equilibrium-based MARL algorithms can still converge to an equilibrium policy under certain assumptions. Experimental results in widely used benchmarks (e.g., grid world game, soccer game, and wall game) show that the proposed framework: 1) not only significantly accelerates equilibrium-based MARL (up to 96.7% reduction in learning time), but also achieves higher average rewards than algorithms without equilibrium transfer and 2) scales significantly better than algorithms without equilibrium transfer when the state/action space grows and the number of agents increases.
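
    The transfer condition described above can be stated concretely: a previously computed joint strategy is reused at a new state if no agent can gain more than a small epsilon by deviating unilaterally. The check below is a hedged two-player sketch; the payoff matrices, the epsilon value, and the surrounding MARL loop are placeholders, not the paper's algorithm.

    import numpy as np

    def deviation_gains(A, B, x, y):
        """Max unilateral gain for each player from mixed profile (x, y) in a bimatrix game.
        A, B are payoff matrices for players 1 and 2; x, y are their mixed strategies."""
        v1, v2 = x @ A @ y, x @ B @ y                # current expected payoffs
        gain1 = np.max(A @ y) - v1                   # best pure deviation for player 1
        gain2 = np.max(x @ B) - v2                   # best pure deviation for player 2
        return gain1, gain2

    def can_transfer(A, B, x, y, eps=0.05):
        """Reuse the old equilibrium if the profile is an eps-equilibrium of the new game."""
        return max(deviation_gains(A, B, x, y)) <= eps

    # Toy usage: a previously computed equilibrium of a similar matching-pennies-like game.
    old_eq = (np.array([0.5, 0.5]), np.array([0.5, 0.5]))
    A_new = np.array([[1.0, -1.0], [-1.0, 1.0]]) + 0.01   # slightly perturbed payoffs
    B_new = -A_new
    print(can_transfer(A_new, B_new, *old_eq))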

  1. Locomotion training of legged robots using hybrid machine learning techniques

    Science.gov (United States)

    Simon, William E.; Doerschuk, Peggy I.; Zhang, Wen-Ran; Li, Andrew L.

    1995-01-01

    In this study, artificial neural networks and fuzzy logic are used to control the jumping behavior of a three-link uniped robot. The biped locomotion control problem is an increment of the uniped locomotion control. Study of legged locomotion dynamics indicates that a hierarchical controller is required to control the behavior of a legged robot. A structured control strategy is suggested which includes navigator, motion planner, biped coordinator and uniped controllers. A three-link uniped robot simulation is developed to be used as the plant. Neurocontrollers were trained both online and offline. In the case of on-line training, a reinforcement learning technique was used to train the neurocontroller to make the robot jump to a specified height. After several hundred iterations of training, the plant output achieved an accuracy of 7.4%. However, when jump distance and body angular momentum were also included in the control objectives, training time became impractically long. In the case of off-line training, a three-layered backpropagation (BP) network was first used with three inputs, three outputs and 15 to 40 hidden nodes. Pre-generated data were presented to the network with a learning rate as low as 0.003 in order to reach convergence. The low learning rate required for convergence resulted in a very slow training process which took weeks to learn 460 examples. After training, performance of the neurocontroller was rather poor. Consequently, the BP network was replaced by a Cerebellar Model Articulation Controller (CMAC) network. Subsequent experiments described in this document show that the CMAC network is more suitable to the solution of uniped locomotion control problems in terms of both learning efficiency and performance. A new approach is introduced in this report, viz., a self-organizing multiagent cerebellar model for fuzzy-neural control of uniped locomotion is suggested to improve training efficiency. This is currently being evaluated for a possible

  2. Associative learning for a robot intelligence

    CERN Document Server

    Andreae, John H

    1998-01-01

    The explanation of brain functioning in terms of the association of ideas has been popular since the 17th century. Recently, however, the process of association has been dismissed as computationally inadequate by prominent cognitive scientists. In this book, a sharper definition of the term "association" is used to revive the process by showing that associative learning can indeed be computationally powerful. Within an appropriate organization, associative learning can be embodied in a robot to realize a human-like intelligence, which sets its own goals, exhibits unique unformalizable behaviour

  3. What Can Reinforcement Learning Teach Us About Non-Equilibrium Quantum Dynamics

    Science.gov (United States)

    Bukov, Marin; Day, Alexandre; Sels, Dries; Weinberg, Phillip; Polkovnikov, Anatoli; Mehta, Pankaj

    Equilibrium thermodynamics and statistical physics are the building blocks of modern science and technology. Yet, our understanding of thermodynamic processes away from equilibrium is largely missing. In this talk, I will reveal the potential of what artificial intelligence can teach us about the complex behaviour of non-equilibrium systems. Specifically, I will discuss the problem of finding optimal drive protocols to prepare a desired target state in quantum mechanical systems by applying ideas from Reinforcement Learning [one can think of Reinforcement Learning as the study of how an agent (e.g. a robot) can learn and perfect a given policy through interactions with an environment.]. The driving protocols learnt by our agent suggest that the non-equilibrium world features possibilities easily defying intuition based on equilibrium physics.

  4. Online human training of a myoelectric prosthesis controller via actor-critic reinforcement learning.

    Science.gov (United States)

    Pilarski, Patrick M; Dawson, Michael R; Degris, Thomas; Fahimi, Farbod; Carey, Jason P; Sutton, Richard S

    2011-01-01

    As a contribution toward the goal of adaptable, intelligent artificial limbs, this work introduces a continuous actor-critic reinforcement learning method for optimizing the control of multi-function myoelectric devices. Using a simulated upper-arm robotic prosthesis, we demonstrate how it is possible to derive successful limb controllers from myoelectric data using only a sparse human-delivered training signal, without requiring detailed knowledge about the task domain. This reinforcement-based machine learning framework is well suited for use by both patients and clinical staff, and may be easily adapted to different application domains and the needs of individual amputees. To our knowledge, this is the first myoelectric control approach that facilitates the online learning of new amputee-specific motions based only on a one-dimensional (scalar) feedback signal provided by the user of the prosthesis. © 2011 IEEE
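
    As a structural sketch of the kind of continuous actor-critic update mentioned above (not the authors' prosthesis controller), the snippet below runs a Gaussian-policy actor and a TD critic on a one-dimensional toy task where a scalar reward stands in for the sparse human-delivered training signal; the task, features, and constants are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    theta_mu, sigma = 0.0, 0.3          # actor: Gaussian policy N(theta_mu, sigma^2)
    v = 0.0                             # critic value for the (single) state
    alpha_actor, alpha_critic = 0.05, 0.1
    target = 0.7                        # unknown desired actuation level

    for step in range(2000):
        a = rng.normal(theta_mu, sigma)             # sample a continuous action
        r = -abs(a - target)                        # scalar evaluative feedback
        td_error = r - v                            # one-step TD error (bandit-style)
        v += alpha_critic * td_error                # critic update
        grad_log_pi = (a - theta_mu) / sigma**2     # d/d(mu) of log N(a | mu, sigma)
        theta_mu += alpha_actor * td_error * grad_log_pi   # actor update

    print(round(theta_mu, 3))   # should approach the target actuation level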

  5. A Project-based Learning approach for teaching Robotics to ...

    African Journals Online (AJOL)

    In this research we used a project-based learning approach to teach robotics basics to undergraduate business computing students. The course coverage includes basic electronics, robot construction and programming using arduino. Students developed and tested a robot prototype. The project was evaluated using a ...

  6. Robotic Mitral Valve Repair: The Learning Curve.

    Science.gov (United States)

    Goodman, Avi; Koprivanac, Marijan; Kelava, Marta; Mick, Stephanie L; Gillinov, A Marc; Rajeswaran, Jeevanantham; Brzezinski, Anna; Blackstone, Eugene H; Mihaljevic, Tomislav

    Adoption of robotic mitral valve surgery has been slow, likely in part because of its perceived technical complexity and a poorly understood learning curve. We sought to correlate changes in technical performance and outcome with surgeon experience in the "learning curve" part of our series. From 2006 to 2011, two surgeons undertook robotically assisted mitral valve repair in 458 patients (intent-to-treat); 404 procedures were completed entirely robotically (as-treated). Learning curves were constructed by modeling surgical sequence number semiparametrically with flexible penalized spline smoothing best-fit curves. Operative efficiency, reflecting technical performance, improved for (1) operating room time for case 1 to cases 200 (early experience) and 400 (later experience), from 414 to 364 to 321 minutes (12% and 22% decrease, respectively), (2) cardiopulmonary bypass time, from 148 to 102 to 91 minutes (31% and 39% decrease), and (3) myocardial ischemic time, from 119 to 75 to 68 minutes (37% and 43% decrease). Composite postoperative complications, reflecting safety, decreased from 17% to 6% to 2% (63% and 85% decrease). Intensive care unit stay decreased from 32 to 28 to 24 hours (13% and 25% decrease). Postoperative stay fell from 5.2 to 4.5 to 3.8 days (13% and 27% decrease). There were no in-hospital deaths. Predischarge mitral regurgitation of less than 2+, reflecting effectiveness, was achieved in 395 (97.8%), without correlation to experience; return-to-work times did not change substantially with experience. Technical efficiency of robotic mitral valve repair improves with experience and permits its safe and effective conduct.

  7. Solution to reinforcement learning problems with artificial potential field

    Institute of Scientific and Technical Information of China (English)

    XIE Li-juan; XIE Guang-rong; CHEN Huan-wen; LI Xiao-li

    2008-01-01

    A novel method was designed to solve reinforcement learning problems with an artificial potential field. Firstly, a reinforcement learning problem was transformed into a path planning problem by using an artificial potential field (APF), which was a very appropriate way to model a reinforcement learning problem. Secondly, a new APF algorithm was proposed to overcome the local minimum problem in potential field methods with a virtual water-flow concept. The performance of this new method was tested on a gridworld problem named key-and-door maze. The experimental results show that within 45 trials, good and deterministic policies are found in almost all simulations. In comparison with WIERING's HQ-learning system, which needs 20 000 trials for a stable solution, the proposed new method can obtain an optimal and stable policy far more quickly than HQ-learning. Therefore, the new method is simple and effective in giving an optimal solution to the reinforcement learning problem.
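
    A minimal sketch of the attractive/repulsive potential field idea is given below; it omits the paper's virtual water-flow extension for escaping local minima, and the grid coordinates, gains, and obstacle layout are made up for illustration.

    import numpy as np

    def potential(pos, goal, obstacles, k_att=1.0, k_rep=5.0, d0=2.0):
        """Attractive quadratic well at the goal plus short-range repulsion near obstacles."""
        pos, goal = np.asarray(pos, float), np.asarray(goal, float)
        u = 0.5 * k_att * np.sum((pos - goal) ** 2)
        for obs in obstacles:
            d = max(np.linalg.norm(pos - np.asarray(obs, float)), 1e-6)
            if d < d0:
                u += 0.5 * k_rep * (1.0 / d - 1.0 / d0) ** 2
        return u

    def greedy_step(pos, goal, obstacles):
        """Move to the neighbouring grid cell with the lowest potential."""
        moves = [(0, 1), (0, -1), (1, 0), (-1, 0)]
        candidates = [(potential(np.add(pos, m), goal, obstacles), tuple(np.add(pos, m))) for m in moves]
        return min(candidates)[1]

    # Toy usage on a small grid with one obstacle between start and goal.
    pos, goal, obstacles = (0, 0), (6, 6), [(3, 3)]
    for _ in range(15):
        pos = greedy_step(pos, goal, obstacles)
        if tuple(pos) == goal:
            break
    print(pos)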

  8. Tunnel Ventilation Control Using Reinforcement Learning Methodology

    Science.gov (United States)

    Chu, Baeksuk; Kim, Dongnam; Hong, Daehie; Park, Jooyoung; Chung, Jin Taek; Kim, Tae-Hyung

    The main purpose of a tunnel ventilation system is to maintain the CO pollutant concentration and VI (visibility index) under an adequate level to provide drivers with a comfortable and safe driving environment. Moreover, it is necessary to minimize the power consumption used to operate the ventilation system. To achieve these objectives, the control algorithm used in this research is the reinforcement learning (RL) method. RL is goal-directed learning of a mapping from situations to actions without relying on exemplary supervision or complete models of the environment. The goal of RL is to maximize a reward, which is evaluative feedback from the environment. In constructing the reward for the tunnel ventilation system, the two objectives listed above are included, that is, maintaining an adequate level of pollutants and minimizing power consumption. An RL algorithm based on an actor-critic architecture and a gradient-following algorithm is applied to the tunnel ventilation system. Simulation results obtained with real data collected from an existing tunnel ventilation system, together with real experimental verification, are provided in this paper. It is confirmed that with the suggested controller the pollutant level inside the tunnel was well maintained under the allowable limit and energy consumption was improved compared to the conventional control scheme.
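
    The two objectives folded into the reward can be written down directly. The following is a hedged sketch of such a reward function; the limits, weights, and penalty form are invented for illustration and are not the values used in the study.

    def ventilation_reward(co_ppm, visibility_index, fan_power_kw,
                           co_limit=25.0, vi_limit=0.5, power_weight=0.01):
        """Negative reward (cost) that penalizes CO and VI above their allowable limits
        and, with a smaller weight, the power drawn by the ventilation fans."""
        co_penalty = max(0.0, co_ppm - co_limit)             # only penalize above the limit
        vi_penalty = max(0.0, visibility_index - vi_limit)   # VI kept under an adequate level
        return -(co_penalty + 10.0 * vi_penalty + power_weight * fan_power_kw)

    # Toy usage: acceptable air quality with high fan power vs. a polluted state with low power.
    print(ventilation_reward(co_ppm=18.0, visibility_index=0.3, fan_power_kw=120.0))
    print(ventilation_reward(co_ppm=32.0, visibility_index=0.7, fan_power_kw=40.0))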

  9. Flexible Heuristic Dynamic Programming for Reinforcement Learning in Quadrotors

    NARCIS (Netherlands)

    Helmer, Alexander; de Visser, C.C.; van Kampen, E.

    2018-01-01

    Reinforcement learning is a paradigm for learning decision-making tasks from interaction with the environment. Function approximators solve a part of the curse of dimensionality when learning in high-dimensional state and/or action spaces. It can be a time-consuming process to learn a good policy in

  10. Personalizing a Service Robot by Learning Human Habits from Behavioral Footprints

    Directory of Open Access Journals (Sweden)

    Kun Li

    2015-03-01

    For a domestic personal robot, personalized services are as important as predesigned tasks, because the robot needs to adjust the home state based on the operator's habits. An operator's habits are composed of cues, behaviors, and rewards. This article introduces behavioral footprints to describe the operator's behaviors in a house, and applies the inverse reinforcement learning technique to extract the operator's habits, represented by a reward function. We implemented the proposed approach with a mobile robot on indoor temperature adjustment, and compared this approach with a baseline method that recorded all the cues and behaviors of the operator. The result shows that the proposed approach allows the robot to reveal the operator's habits accurately and adjust the environment state accordingly.

  11. Approaches to probabilistic model learning for mobile manipulation robots

    CERN Document Server

    Sturm, Jürgen

    2013-01-01

    Mobile manipulation robots are envisioned to provide many useful services both in domestic environments as well as in the industrial context. Examples include domestic service robots that implement large parts of the housework, and versatile industrial assistants that provide automation, transportation, inspection, and monitoring services. The challenge in these applications is that the robots have to function under changing, real-world conditions, be able to deal with considerable amounts of noise and uncertainty, and operate without the supervision of an expert. This book presents novel learning techniques that enable mobile manipulation robots, i.e., mobile platforms with one or more robotic manipulators, to autonomously adapt to new or changing situations. The approaches presented in this book cover the following topics: (1) learning the robot's kinematic structure and properties using actuation and visual feedback, (2) learning about articulated objects in the environment in which the robot is operating,...

  12. Learning to reach by reinforcement learning using a receptive field based function approximation approach with continuous actions.

    Science.gov (United States)

    Tamosiunaite, Minija; Asfour, Tamim; Wörgötter, Florentin

    2009-03-01

    Reinforcement learning methods can be used in robotics applications, especially for specific target-oriented problems, for example the reward-based recalibration of goal-directed actions. To this end, relatively large and continuous state-action spaces still need to be handled efficiently. The goal of this paper is, thus, to develop a novel, rather simple method which uses reinforcement learning with function approximation in conjunction with different reward strategies for solving such problems. For testing our method, we use a four-degree-of-freedom reaching problem in 3D space simulated by a two-joint robot arm system with two DOF each. Function approximation is based on 4D, overlapping kernels (receptive fields), and the state-action space contains about 10,000 of these. Different types of reward structures are compared, for example, reward-on-touching-only against reward-on-approach. Furthermore, forbidden joint configurations are punished. A continuous action space is used. In spite of the rather large number of states and the continuous action space, these reward/punishment strategies allow the system to find a good solution usually within about 20 trials. The efficiency of our method demonstrated in this test scenario suggests that it might be possible to use it on a real robot for problems where mixed rewards can be defined, in situations where other types of learning might be difficult.
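
    The kernel-based value approximation described above can be sketched compactly: Q(s, a) is a weighted sum of overlapping Gaussian receptive fields placed over the joint state-action space, and the weights are moved along a TD error. The dimensions, kernel widths, and update constants below are illustrative, not those of the reaching experiment.

    import numpy as np

    rng = np.random.default_rng(0)
    dim = 4 + 2                               # e.g. 4 state dims + 2 action dims (illustrative)
    n_kernels = 200
    centers = rng.uniform(-1, 1, size=(n_kernels, dim))   # overlapping receptive fields
    width = 0.4
    w = np.zeros(n_kernels)

    def features(s, a):
        x = np.concatenate([s, a])
        d2 = np.sum((centers - x) ** 2, axis=1)
        return np.exp(-d2 / (2 * width ** 2))             # Gaussian kernel activations

    def q(s, a):
        return w @ features(s, a)

    def td_update(s, a, r, q_next, alpha=0.1, gamma=0.95):
        """Move kernel weights along the TD error for the visited (state, action)."""
        global w
        phi = features(s, a)
        w += alpha * (r + gamma * q_next - q(s, a)) * phi

    # Toy usage with random states/actions standing in for the robot-arm simulation.
    s, a = rng.uniform(-1, 1, 4), rng.uniform(-1, 1, 2)
    td_update(s, a, r=1.0, q_next=0.0)
    print(round(q(s, a), 3))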

  13. Gaussian Processes for Data-Efficient Learning in Robotics and Control.

    Science.gov (United States)

    Deisenroth, Marc Peter; Fox, Dieter; Rasmussen, Carl Edward

    2015-02-01

    Autonomous learning has been a promising direction in control and robotics for more than a decade, since data-driven learning reduces the amount of engineering knowledge that is otherwise required. However, autonomous reinforcement learning (RL) approaches typically require many interactions with the system to learn controllers, which is a practical limitation in real systems, such as robots, where many interactions can be impractical and time consuming. To address this problem, current learning approaches typically require task-specific knowledge in the form of expert demonstrations, realistic simulators, pre-shaped policies, or specific knowledge about the underlying dynamics. In this paper, we follow a different approach and speed up learning by extracting more information from data. In particular, we learn a probabilistic, non-parametric Gaussian process transition model of the system. By explicitly incorporating model uncertainty into long-term planning and controller learning, our approach reduces the effects of model errors, a key problem in model-based learning. Compared to state-of-the-art RL, our model-based policy search method achieves an unprecedented speed of learning. We demonstrate its applicability to autonomous learning in real robot and control tasks.
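
    As a rough, self-contained illustration of the kind of probabilistic transition model used in this line of work, the snippet below fits a Gaussian process with a squared-exponential kernel to (state, action) -> next-state data and returns a predictive mean and variance. Hyperparameters are fixed by hand rather than optimized, and the toy dynamics are invented, unlike in the full method.

    import numpy as np

    def sq_exp_kernel(A, B, lengthscale=1.0, signal_var=1.0):
        d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
        return signal_var * np.exp(-0.5 * d2 / lengthscale**2)

    def gp_fit_predict(X, y, X_star, noise_var=1e-2):
        """GP regression: predictive mean and variance of y at X_star given data (X, y)."""
        K = sq_exp_kernel(X, X) + noise_var * np.eye(len(X))
        K_star = sq_exp_kernel(X_star, X)
        alpha = np.linalg.solve(K, y)
        mean = K_star @ alpha
        v = np.linalg.solve(K, K_star.T)
        var = np.diag(sq_exp_kernel(X_star, X_star)) - np.sum(K_star * v.T, axis=1)
        return mean, var

    # Toy usage: inputs are (state, action) pairs, target is the next state (1-D here).
    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(30, 2))                               # columns: state, action
    y = 0.9 * X[:, 0] + 0.3 * X[:, 1] + 0.05 * rng.normal(size=30)     # unknown dynamics
    X_star = np.array([[0.2, 0.5], [0.8, -0.4]])
    mean, var = gp_fit_predict(X, y, X_star)
    print(np.round(mean, 3), np.round(var, 4))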

  14. Efficient collective swimming by harnessing vortices through deep reinforcement learning.

    Science.gov (United States)

    Verma, Siddhartha; Novati, Guido; Koumoutsakos, Petros

    2018-06-05

    Fish in schooling formations navigate complex flow fields replete with mechanical energy in the vortex wakes of their companions. Their schooling behavior has been associated with evolutionary advantages including energy savings, yet the underlying physical mechanisms remain unknown. We show that fish can improve their sustained propulsive efficiency by placing themselves in appropriate locations in the wake of other swimmers and intercepting judiciously their shed vortices. This swimming strategy leads to collective energy savings and is revealed through a combination of high-fidelity flow simulations with a deep reinforcement learning (RL) algorithm. The RL algorithm relies on a policy defined by deep, recurrent neural nets, with long short-term memory cells, that are essential for capturing the unsteadiness of the two-way interactions between the fish and the vortical flow field. Surprisingly, we find that swimming in line with a leader is not associated with energetic benefits for the follower. Instead, "smart swimmer(s)" place themselves at off-center positions with respect to the axis of the leader(s) and deform their body to synchronize with the momentum of the oncoming vortices, thus enhancing their swimming efficiency at no cost to the leader(s). The results confirm that fish may harvest energy deposited in vortices and support the conjecture that swimming in formation is energetically advantageous. Moreover, this study demonstrates that deep RL can produce navigation algorithms for complex unsteady and vortical flow fields, with promising implications for energy savings in autonomous robotic swarms.

  15. An Analysis of Stochastic Game Theory for Multiagent Reinforcement Learning

    National Research Council Canada - National Science Library

    Bowling, Michael

    2000-01-01

    .... In this paper we contribute a comprehensive presentation of the relevant techniques for solving stochastic games from both the game theory community and reinforcement learning communities. We examine the assumptions and limitations of these algorithms, and identify similarities between these algorithms, single agent reinforcement learners, and basic game theory techniques.

  16. Lung Nodule Detection via Deep Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Issa Ali

    2018-04-01

    Lung cancer is the most common cause of cancer-related death globally. As a preventive measure, the United States Preventive Services Task Force (USPSTF) recommends annual screening of high-risk individuals with low-dose computed tomography (CT). The resulting volume of CT scans from millions of people will pose a significant challenge for radiologists to interpret. To fill this gap, computer-aided detection (CAD) algorithms may prove to be the most promising solution. A crucial first step in the analysis of lung cancer screening results using CAD is the detection of pulmonary nodules, which may represent early-stage lung cancer. The objective of this work is to develop and validate a reinforcement learning model based on deep artificial neural networks for early detection of lung nodules in thoracic CT images. Inspired by the AlphaGo system, our deep learning algorithm takes a raw CT image as input and views it as a collection of states, and outputs a classification of whether a nodule is present or not. The dataset used to train our model is the LIDC/IDRI database hosted by the lung nodule analysis (LUNA) challenge. In total, there are 888 CT scans with annotations based on agreement from at least three out of four radiologists. As a result, there are 590 individuals having one or more nodules, and 298 having none. Our training results yielded an overall accuracy of 99.1% [sensitivity 99.2%, specificity 99.1%, positive predictive value (PPV) 99.1%, negative predictive value (NPV) 99.2%]. In our test, the results yielded an overall accuracy of 64.4% (sensitivity 58.9%, specificity 55.3%, PPV 54.2%, and NPV 60.0%). These early results show promise in solving the major issue of false positives in CT screening of lung nodules, and may help avoid unnecessary follow-up tests and expenditures.

  17. Adaptive learning fuzzy control of a mobile robot

    International Nuclear Information System (INIS)

    Tsukada, Akira; Suzuki, Katsuo; Fujii, Yoshio; Shinohara, Yoshikuni

    1989-11-01

    In this report, the problem of constructing a fuzzy controller for a mobile robot to move autonomously along a given reference direction curve is studied, with control rules generated and acquired through an adaptive learning process. An adaptive learning fuzzy controller has been developed for a mobile robot. Good properties of the controller are shown through travelling experiments with the mobile robot. (author)

  18. Robot Competence Development by Constructive Learning

    Science.gov (United States)

    Meng, Q.; Lee, M. H.; Hinde, C. J.

    This paper presents a constructive learning approach for developing sensor-motor mapping in autonomous systems. The system’s adaptation to environment changes is discussed, and three methods are proposed to deal with long-term and short-term changes. The proposed constructive learning allows autonomous systems to develop network topology and adjust network parameters. The approach is supported by findings from psychology and neuroscience, especially on infants' cognitive development at early stages. A growing radial basis function network is introduced as a computational substrate for sensory-motor mapping learning. Experiments are conducted on a robot eye/hand coordination testbed, and the results show the incremental development of sensory-motor mapping and its adaptation to changes such as in tool use.

  19. Neural Basis of Reinforcement Learning and Decision Making

    Science.gov (United States)

    Lee, Daeyeol; Seo, Hyojung; Jung, Min Whan

    2012-01-01

    Reinforcement learning is an adaptive process in which an animal utilizes its previous experience to improve the outcomes of future choices. Computational theories of reinforcement learning play a central role in the newly emerging areas of neuroeconomics and decision neuroscience. In this framework, actions are chosen according to their value functions, which describe how much future reward is expected from each action. Value functions can be adjusted not only through reward and penalty, but also by the animal’s knowledge of its current environment. Studies have revealed that a large proportion of the brain is involved in representing and updating value functions and using them to choose an action. However, how the nature of a behavioral task affects the neural mechanisms of reinforcement learning remains incompletely understood. Future studies should uncover the principles by which different computational elements of reinforcement learning are dynamically coordinated across the entire brain. PMID:22462543

  20. Integrating distributed Bayesian inference and reinforcement learning for sensor management

    NARCIS (Netherlands)

    Grappiolo, C.; Whiteson, S.; Pavlin, G.; Bakker, B.

    2009-01-01

    This paper introduces a sensor management approach that integrates distributed Bayesian inference (DBI) and reinforcement learning (RL). DBI is implemented using distributed perception networks (DPNs), a multiagent approach to performing efficient inference, while RL is used to automatically

  1. Applying reinforcement learning to the weapon assignment problem

    African Journals Online (AJOL)

    ismith

    Carlo (MC) control algorithm with exploring starts (MCES), and an off-policy ..... closest to the threat should fire (that weapon also had the highest probability to ... Monte Carlo ..... “Reinforcement learning: Theory, methods and application to.

  2. Exploiting Best-Match Equations for Efficient Reinforcement Learning

    NARCIS (Netherlands)

    van Seijen, Harm; Whiteson, Shimon; van Hasselt, Hado; Wiering, Marco

    This article presents and evaluates best-match learning, a new approach to reinforcement learning that trades off the sample efficiency of model-based methods with the space efficiency of model-free methods. Best-match learning works by approximating the solution to a set of best-match equations,

  3. Learning from demonstration: Teaching a myoelectric prosthesis with an intact limb via reinforcement learning.

    Science.gov (United States)

    Vasan, Gautham; Pilarski, Patrick M

    2017-07-01

    Prosthetic arms should restore and extend the capabilities of someone with an amputation. They should move naturally and be able to perform elegant, coordinated movements that approximate those of a biological arm. Despite these objectives, the control of modern-day prostheses is often nonintuitive and taxing. Existing devices and control approaches do not yet give users the ability to effect highly synergistic movements during their daily-life control of a prosthetic device. As a step towards improving the control of prosthetic arms and hands, we introduce an intuitive approach to training a prosthetic control system that helps a user achieve hard-to-engineer control behaviours. Specifically, we present an actor-critic reinforcement learning method that for the first time promises to allow someone with an amputation to use their non-amputated arm to teach their prosthetic arm how to move through a wide range of coordinated motions and grasp patterns. We evaluate our method during the myoelectric control of a multi-joint robot arm by non-amputee users, and demonstrate that by using our approach a user can train their arm to perform simultaneous gestures and movements in all three degrees of freedom in the robot's hand and wrist based only on information sampled from the robot and the user's above-elbow myoelectric signals. Our results indicate that this learning-from-demonstration paradigm may be well suited to use by both patients and clinicians with minimal technical knowledge, as it allows a user to personalize the control of his or her prosthesis without having to know the underlying mechanics of the prosthetic limb. These preliminary results also suggest that our approach may extend in a straightforward way to next-generation prostheses with precise finger and wrist control, such that these devices may someday allow users to perform fluid and intuitive movements like playing the piano, catching a ball, and comfortably shaking hands.

  4. Safe Exploration of State and Action Spaces in Reinforcement Learning

    OpenAIRE

    Garcia, Javier; Fernandez, Fernando

    2014-01-01

    In this paper, we consider the important problem of safe exploration in reinforcement learning. While reinforcement learning is well-suited to domains with complex transition dynamics and high-dimensional state-action spaces, an additional challenge is posed by the need for safe and efficient exploration. Traditional exploration techniques are not particularly useful for solving dangerous tasks, where the trial and error process may lead to the selection of actions whose execution in some sta...

  5. Continuing Robot Skill Learning after Demonstration with Human Feedback

    Directory of Open Access Journals (Sweden)

    Argall Brenna D.

    2011-12-01

    Though demonstration-based approaches have been successfully applied to learning a variety of robot behaviors, there do exist some limitations. The ability to continue learning after demonstration, based on execution experience with the learned policy, has therefore proven to be an asset to many demonstration-based learning systems. This paper discusses important considerations for interfaces that provide feedback to adapt and improve demonstrated behaviors. Feedback interfaces developed for two robots with very different motion capabilities - a wheeled mobile robot and a high degree-of-freedom humanoid - are highlighted.

  6. Learning Semantics of Gestural Instructions for Human-Robot Collaboration

    Science.gov (United States)

    Shukla, Dadhichi; Erkent, Özgür; Piater, Justus

    2018-01-01

    Designed to work safely alongside humans, collaborative robots need to be capable partners in human-robot teams. Besides having key capabilities like detecting gestures, recognizing objects, grasping them, and handing them over, these robots need to seamlessly adapt their behavior for efficient human-robot collaboration. In this context we present the fast, supervised Proactive Incremental Learning (PIL) framework for learning associations between human hand gestures and the intended robotic manipulation actions. With the proactive aspect, the robot is competent to predict the human's intent and perform an action without waiting for an instruction. The incremental aspect enables the robot to learn associations on the fly while performing a task. It is a probabilistic, statistically-driven approach. As a proof of concept, we focus on a table assembly task where the robot assists its human partner. We investigate how the accuracy of gesture detection affects the number of interactions required to complete the task. We also conducted a human-robot interaction study with non-roboticist users comparing a proactive with a reactive robot that waits for instructions. PMID:29615888

  7. Learning Semantics of Gestural Instructions for Human-Robot Collaboration.

    Science.gov (United States)

    Shukla, Dadhichi; Erkent, Özgür; Piater, Justus

    2018-01-01

    Designed to work safely alongside humans, collaborative robots need to be capable partners in human-robot teams. Besides having key capabilities like detecting gestures, recognizing objects, grasping them, and handing them over, these robots need to seamlessly adapt their behavior for efficient human-robot collaboration. In this context we present the fast, supervised Proactive Incremental Learning (PIL) framework for learning associations between human hand gestures and the intended robotic manipulation actions. With the proactive aspect, the robot is competent to predict the human's intent and perform an action without waiting for an instruction. The incremental aspect enables the robot to learn associations on the fly while performing a task. It is a probabilistic, statistically-driven approach. As a proof of concept, we focus on a table assembly task where the robot assists its human partner. We investigate how the accuracy of gesture detection affects the number of interactions required to complete the task. We also conducted a human-robot interaction study with non-roboticist users comparing a proactive with a reactive robot that waits for instructions.

  8. Effect of reinforcement learning on coordination of multiagent systems

    Science.gov (United States)

    Bukkapatnam, Satish T. S.; Gao, Greg

    2000-12-01

    For effective coordination of distributed environments involving multiagent systems, the learning ability of each agent in the environment plays a crucial role. In this paper, we develop a simple group learning method based on reinforcement, and study its effect on coordination through application to a supply chain procurement scenario involving a computer manufacturer. Here, all parties are represented by self-interested, autonomous agents, each capable of performing specific simple tasks. They negotiate with each other to perform complex tasks and thus coordinate supply chain procurement. Reinforcement learning is intended to enable each agent to reach the best negotiable price within the shortest possible time. Our simulations of the application scenario under different learning strategies reveal the positive effects of reinforcement learning on the agents' as well as the system's performance.

  9. Instructional control of reinforcement learning: a behavioral and neurocomputational investigation.

    Science.gov (United States)

    Doll, Bradley B; Jacobs, W Jake; Sanfey, Alan G; Frank, Michael J

    2009-11-24

    Humans learn how to behave directly through environmental experience and indirectly through rules and instructions. Behavior analytic research has shown that instructions can control behavior, even when such behavior leads to sub-optimal outcomes (Hayes, S. (Ed.). 1989. Rule-governed behavior: cognition, contingencies, and instructional control. Plenum Press.). Here we examine the control of behavior through instructions in a reinforcement learning task known to depend on striatal dopaminergic function. Participants selected between probabilistically reinforced stimuli, and were (incorrectly) told that a specific stimulus had the highest (or lowest) reinforcement probability. Despite experience to the contrary, instructions drove choice behavior. We present neural network simulations that capture the interactions between instruction-driven and reinforcement-driven behavior via two potential neural circuits: one in which the striatum is inaccurately trained by instruction representations coming from prefrontal cortex/hippocampus (PFC/HC), and another in which the striatum learns the environmentally based reinforcement contingencies, but is "overridden" at decision output. Both models capture the core behavioral phenomena but, because they differ fundamentally on what is learned, make distinct predictions for subsequent behavioral and neuroimaging experiments. Finally, we attempt to distinguish between the proposed computational mechanisms governing instructed behavior by fitting a series of abstract "Q-learning" and Bayesian models to subject data. The best-fitting model supports one of the neural models, suggesting the existence of a "confirmation bias" in which the PFC/HC system trains the reinforcement system by amplifying outcomes that are consistent with instructions while diminishing inconsistent outcomes.
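
    The model-fitting step mentioned at the end can be illustrated with a small sketch: simulate (or load) two-choice data, then recover the learning rate and softmax temperature of a basic Q-learning model by maximizing the choice log-likelihood over a grid. The data generation and parameter grid here are invented for illustration and are not the authors' fitting procedure.

    import numpy as np

    rng = np.random.default_rng(0)

    def simulate(alpha=0.3, beta=4.0, p_reward=(0.7, 0.3), n_trials=300):
        q, choices, rewards = np.zeros(2), [], []
        for _ in range(n_trials):
            p1 = 1.0 / (1.0 + np.exp(-beta * (q[1] - q[0])))   # softmax over two options
            c = int(rng.random() < p1)
            r = float(rng.random() < p_reward[c])
            q[c] += alpha * (r - q[c])                          # Q-learning update
            choices.append(c); rewards.append(r)
        return np.array(choices), np.array(rewards)

    def log_likelihood(alpha, beta, choices, rewards):
        q, ll = np.zeros(2), 0.0
        for c, r in zip(choices, rewards):
            p1 = 1.0 / (1.0 + np.exp(-beta * (q[1] - q[0])))
            ll += np.log(p1 if c == 1 else 1.0 - p1)
            q[c] += alpha * (r - q[c])
        return ll

    choices, rewards = simulate()
    alphas, betas = np.linspace(0.05, 0.95, 19), np.linspace(0.5, 10.0, 20)
    ll = np.array([[log_likelihood(a, b, choices, rewards) for b in betas] for a in alphas])
    i, j = np.unravel_index(np.argmax(ll), ll.shape)
    print(f"best alpha={alphas[i]:.2f}, beta={betas[j]:.2f}")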

  10. A line follower robot implementation using Lego's Mindstorms Kit and Q-Learning

    Directory of Open Access Journals (Sweden)

    Hector-Gabriel Acosta-Mesa

    2012-03-01

    A common problem when working with mobile robots is that the programming phase can be a long, expensive, and demanding process for programmers. Reinforcement learning algorithms offer one of the most general frameworks in machine learning. This work presents an approach using the Q-Learning algorithm on a Lego robot so that it learns by itself how to follow a black line drawn on a white surface, using Matlab [5] as the programming environment.
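
    A compact tabular Q-learning loop of the kind described (sensor readings discretized into a few states, steering actions, reward for staying on the line) might look as follows in Python rather than Matlab; the simulated sensor model and reward are stand-ins for the Lego hardware.

    import numpy as np

    rng = np.random.default_rng(0)
    # States: 0 = line under left sensor, 1 = line centred, 2 = line under right sensor.
    # Actions: 0 = steer left, 1 = go straight, 2 = steer right.
    Q = np.zeros((3, 3))
    alpha, gamma, epsilon = 0.2, 0.9, 0.1

    def step(state, action):
        """Toy line dynamics: the correct steering action recentres the line."""
        correct = {0: 0, 1: 1, 2: 2}[state]        # e.g. line on the left -> steer left
        next_state = 1 if action == correct else int(rng.integers(3))
        reward = 1.0 if next_state == 1 else -0.1
        return next_state, reward

    state = 1
    for t in range(3000):
        action = int(rng.integers(3)) if rng.random() < epsilon else int(np.argmax(Q[state]))
        next_state, reward = step(state, action)
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state

    print(np.argmax(Q, axis=1))   # learned steering policy per sensor state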

  11. Adversarial Reinforcement Learning in a Cyber Security Simulation

    OpenAIRE

    Elderman, Richard; Pater, Leon; Thie, Albert; Drugan, Madalina; Wiering, Marco

    2017-01-01

    This paper focuses on cyber-security simulations in networks modeled as a Markov game with incomplete information and stochastic elements. The resulting game is an adversarial sequential decision making problem played with two agents, the attacker and defender. The two agents pit one reinforcement learning technique, like neural networks, Monte Carlo learning and Q-learning, against each other and examine their effectiveness against learning opponents. The results showed that Monte Carlo lear...

  12. Multi-agent machine learning a reinforcement approach

    CERN Document Server

    Schwartz, H M

    2014-01-01

    The book begins with a chapter on traditional methods of supervised learning, covering recursive least squares learning, mean square error methods, and stochastic approximation. Chapter 2 covers single agent reinforcement learning. Topics include learning value functions, Markov games, and TD learning with eligibility traces. Chapter 3 discusses two player games including two player matrix games with both pure and mixed strategies. Numerous algorithms and examples are presented. Chapter 4 covers learning in multi-player games, stochastic games, and Markov games, focusing on learning multi-pla

  13. Applications of Deep Learning and Reinforcement Learning to Biological Data.

    Science.gov (United States)

    Mahmud, Mufti; Kaiser, Mohammed Shamim; Hussain, Amir; Vassanelli, Stefano

    2018-06-01

    Rapid advances in hardware-based technologies during the past decades have opened up new possibilities for life scientists to gather multimodal data in various application domains, such as omics, bioimaging, medical imaging, and (brain/body)-machine interfaces. These have generated novel opportunities for development of dedicated data-intensive machine learning techniques. In particular, recent research in deep learning (DL), reinforcement learning (RL), and their combination (deep RL) promise to revolutionize the future of artificial intelligence. The growth in computational power accompanied by faster and increased data storage, and declining computing costs have already allowed scientists in various fields to apply these techniques on data sets that were previously intractable owing to their size and complexity. This paper provides a comprehensive survey on the application of DL, RL, and deep RL techniques in mining biological data. In addition, we compare the performances of DL techniques when applied to different data sets across various application domains. Finally, we outline open issues in this challenging research area and discuss future development perspectives.

  14. Social Learning, Reinforcement and Crime: Evidence from Three European Cities

    Science.gov (United States)

    Tittle, Charles R.; Antonaccio, Olena; Botchkovar, Ekaterina

    2012-01-01

    This study reports a cross-cultural test of Social Learning Theory using direct measures of social learning constructs and focusing on the causal structure implied by the theory. Overall, the results strongly confirm the main thrust of the theory. Prior criminal reinforcement and current crime-favorable definitions are highly related in all three…

  15. Systems control with generalized probabilistic fuzzy-reinforcement learning

    NARCIS (Netherlands)

    Hinojosa, J.; Nefti, S.; Kaymak, U.

    2011-01-01

    Reinforcement learning (RL) is a valuable learning method when the systems require a selection of control actions whose consequences emerge over long periods for which input-output data are not available. In most combinations of fuzzy systems and RL, the environment is considered to be

  16. Human-level control through deep reinforcement learning

    Science.gov (United States)

    Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A.; Veness, Joel; Bellemare, Marc G.; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K.; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis

    2015-02-01

    The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.

  17. Human-level control through deep reinforcement learning.

    Science.gov (United States)

    Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A; Veness, Joel; Bellemare, Marc G; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis

    2015-02-26

    The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
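
    The agent described in these two records combines a deep convolutional Q-network, experience replay, and a periodically updated target network. The snippet below is only a structural sketch of that training loop, with a linear Q-function and a toy transition generator in place of the Atari emulator and the deep network; all sizes and constants are placeholders.

    import random
    from collections import deque
    import numpy as np

    rng = np.random.default_rng(0)
    n_features, n_actions = 16, 4
    W = np.zeros((n_actions, n_features))       # online Q "network" (linear stand-in)
    W_target = W.copy()                         # target network, updated periodically
    replay = deque(maxlen=10_000)               # experience replay buffer
    alpha, gamma, batch_size, sync_every = 0.01, 0.99, 32, 250

    def q(weights, s):
        return weights @ s

    def toy_transition(s, a):
        """Placeholder for the emulator: random next observation and a toy reward."""
        s_next = rng.random(n_features)
        r = float(a == np.argmax(s[:n_actions]))
        return s_next, r, False

    s = rng.random(n_features)
    for t in range(5000):
        a = int(rng.integers(n_actions)) if rng.random() < 0.1 else int(np.argmax(q(W, s)))
        s_next, r, done = toy_transition(s, a)
        replay.append((s, a, r, s_next, done))
        s = s_next
        if len(replay) >= batch_size:
            for s_b, a_b, r_b, s_n, d_b in random.sample(list(replay), batch_size):
                target = r_b if d_b else r_b + gamma * np.max(q(W_target, s_n))
                W[a_b] += alpha * (target - q(W, s_b)[a_b]) * s_b   # gradient step on the TD loss
        if t % sync_every == 0:
            W_target = W.copy()

    print(np.round(np.max(W, axis=1), 3))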

  18. A Semi-Open Learning Environment for Mobile Robotics

    Directory of Open Access Journals (Sweden)

    Enrique Sucar

    2007-05-01

    We have developed a semi-open learning environment for mobile robotics, for learning through free exploration but with specific performance criteria that guide the learning process. The environment includes virtual and remote robotics laboratories, and an intelligent virtual assistant that guides the students in using the labs. A series of experiments in the virtual and remote labs is designed to teach the basics of mobile robotics gradually. Each experiment considers exploration and performance aspects, which are evaluated by the virtual assistant, giving feedback to the user. The virtual laboratory has been incorporated into a course on mobile robotics and used by a group of students. A preliminary evaluation shows that the intelligent tutor combined with the virtual laboratory can improve the learning process.

  19. Neural Behavior Chain Learning of Mobile Robot Actions

    Directory of Open Access Journals (Sweden)

    Lejla Banjanovic-Mehmedovic

    2012-01-01

    This paper presents a visual/motor behavior learning approach based on neural networks. We propose the Behavior Chain Model (BCM) in order to create a way of behavior learning. Our behavior-based system evolution task is a mobile robot detecting a target and driving/acting towards it. First, the mapping relations between the image feature domain of the object and the robot action domain are derived. Second, a multilayer neural network for offline learning of the mapping relations is used. This learning structure, obtained through the neural network training process, represents a connection between the visual perceptions and the motor sequence of actions needed to grip a target. Last, using behavior learning through an observed action chain, we can predict mobile robot behavior for a variety of similar tasks in a similar environment. Prediction results suggest that the methodology is adequate and could serve as an idea for designing different kinds of mobile robot behaviour assistance.

  20. Service Oriented Robotic Architecture for Space Robotics: Design, Testing, and Lessons Learned

    Science.gov (United States)

    Fluckiger, Lorenzo Jean Marc E; Utz, Hans Heinrich

    2013-01-01

    This paper presents the lessons learned from six years of experiments with planetary rover prototypes running the Service Oriented Robotic Architecture (SORA) developed by the Intelligent Robotics Group (IRG) at the NASA Ames Research Center. SORA relies on proven software engineering methods and technologies applied to space robotics. Based on a Service Oriented Architecture and robust middleware, SORA encompasses on-board robot control and a full suite of software tools necessary for remotely operated exploration missions. SORA has been field tested in numerous scenarios of robotic lunar and planetary exploration. The experiments conducted by IRG with SORA exercise a large set of the constraints encountered in space applications: remote robotic assets, flight relevant science instruments, distributed operations, high network latencies and unreliable or intermittent communication links. In this paper, we present the results of these field tests in regard to the developed architecture, and discuss its benefits and limitations.

  1. A Contest-Oriented Project for Learning Intelligent Mobile Robots

    Science.gov (United States)

    Huang, Hsin-Hsiung; Su, Juing-Huei; Lee, Chyi-Shyong

    2013-01-01

    A contest-oriented project for undergraduate students to learn implementation skills and theories related to intelligent mobile robots is presented in this paper. The project, related to Micromouse, Robotrace (Robotrace is the title of Taiwanese and Japanese robot races), and line-maze contests, was developed by the embedded control system research…

  2. Social Cognition as Reinforcement Learning: Feedback Modulates Emotion Inference.

    Science.gov (United States)

    Zaki, Jamil; Kallman, Seth; Wimmer, G Elliott; Ochsner, Kevin; Shohamy, Daphna

    2016-09-01

    Neuroscientific studies of social cognition typically employ paradigms in which perceivers draw single-shot inferences about the internal states of strangers. Real-world social inference features much different parameters: People often encounter and learn about particular social targets (e.g., friends) over time and receive feedback about whether their inferences are correct or incorrect. Here, we examined this process and, more broadly, the intersection between social cognition and reinforcement learning. Perceivers were scanned using fMRI while repeatedly encountering three social targets who produced conflicting visual and verbal emotional cues. Perceivers guessed how targets felt and received feedback about whether they had guessed correctly. Visual cues reliably predicted one target's emotion, verbal cues predicted a second target's emotion, and neither reliably predicted the third target's emotion. Perceivers successfully used this information to update their judgments over time. Furthermore, trial-by-trial learning signals-estimated using two reinforcement learning models-tracked activity in ventral striatum and ventromedial pFC, structures associated with reinforcement learning, and regions associated with updating social impressions, including TPJ. These data suggest that learning about others' emotions, like other forms of feedback learning, relies on domain-general reinforcement mechanisms as well as domain-specific social information processing.
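
    The trial-by-trial learning signals referred to here come from standard reinforcement-learning models; a minimal version is a delta-rule update of how much each cue type (visual vs. verbal) should be trusted for a given target. The sketch below is an illustrative reconstruction with invented reliabilities, not the models fitted in the study.

    import numpy as np

    rng = np.random.default_rng(0)
    alpha = 0.15
    # Cue reliability weights per target: columns = (visual cue, verbal cue).
    w = np.full((3, 2), 0.5)
    true_reliability = np.array([[0.9, 0.1],   # target 0: visual cues are diagnostic
                                 [0.1, 0.9],   # target 1: verbal cues are diagnostic
                                 [0.5, 0.5]])  # target 2: neither is reliable

    for trial in range(600):
        target = rng.integers(3)
        cue = rng.integers(2)                         # which cue the perceiver relies on
        correct = float(rng.random() < true_reliability[target, cue])
        prediction_error = correct - w[target, cue]   # feedback-driven learning signal
        w[target, cue] += alpha * prediction_error    # delta-rule / Rescorla-Wagner update

    print(np.round(w, 2))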

  3. Can model-free reinforcement learning explain deontological moral judgments?

    Science.gov (United States)

    Ayars, Alisabeth

    2016-05-01

    Dual-systems frameworks propose that moral judgments are derived from both an immediate emotional response, and controlled/rational cognition. Recently Cushman (2013) proposed a new dual-system theory based on model-free and model-based reinforcement learning. Model-free learning attaches values to actions based on their history of reward and punishment, and explains some deontological, non-utilitarian judgments. Model-based learning involves the construction of a causal model of the world and allows for far-sighted planning; this form of learning fits well with utilitarian considerations that seek to maximize certain kinds of outcomes. I present three concerns regarding the use of model-free reinforcement learning to explain deontological moral judgment. First, many actions that humans find aversive from model-free learning are not judged to be morally wrong. Moral judgment must require something in addition to model-free learning. Second, there is a dearth of evidence for central predictions of the reinforcement account-e.g., that people with different reinforcement histories will, all else equal, make different moral judgments. Finally, to account for the effect of intention within the framework requires certain assumptions which lack support. These challenges are reasonable foci for future empirical/theoretical work on the model-free/model-based framework. Copyright © 2016 Elsevier B.V. All rights reserved.

  4. Time representation in reinforcement learning models of the basal ganglia

    Directory of Open Access Journals (Sweden)

    Samuel Joseph Gershman

    2014-01-01

    Full Text Available Reinforcement learning models have been influential in understanding many aspects of basal ganglia function, from reward prediction to action selection. Time plays an important role in these models, but there is still no theoretical consensus about what kind of time representation is used by the basal ganglia. We review several theoretical accounts and their supporting evidence. We then discuss the relationship between reinforcement learning models and the timing mechanisms that have been attributed to the basal ganglia. We hypothesize that a single computational system may underlie both reinforcement learning and interval timing—the perception of duration in the range of seconds to hours. This hypothesis, which extends earlier models by incorporating a time-sensitive action selection mechanism, may have important implications for understanding disorders like Parkinson's disease in which both decision making and timing are impaired.

  5. Students Learn Programming Faster through Robotic Simulation

    Science.gov (United States)

    Liu, Allison; Newsom, Jeff; Schunn, Chris; Shoop, Robin

    2013-01-01

    Schools everywhere are using robotics education to engage kids in applied science, technology, engineering, and mathematics (STEM) activities, but teaching programming can be challenging due to lack of resources. This article reports on using Robot Virtual Worlds (RVW) and curriculum available on the Internet to teach robot programming. It also…

  6. Evolutionary online behaviour learning and adaptation in real robots.

    Science.gov (United States)

    Silva, Fernando; Correia, Luís; Christensen, Anders Lyhne

    2017-07-01

    Online evolution of behavioural control on real robots is an open-ended approach to autonomous learning and adaptation: robots have the potential to automatically learn new tasks and to adapt to changes in environmental conditions, or to failures in sensors and/or actuators. However, studies have so far almost exclusively been carried out in simulation because evolution in real hardware has required several days or weeks to produce capable robots. In this article, we successfully evolve neural network-based controllers in real robotic hardware to solve two single-robot tasks and one collective robotics task. Controllers are evolved either from random solutions or from solutions pre-evolved in simulation. In all cases, capable solutions are found in a timely manner (1 h or less). Results show that more accurate simulations may lead to higher-performing controllers, and that completing the optimization process in real robots is meaningful, even if solutions found in simulation differ from solutions in reality. We furthermore demonstrate for the first time the adaptive capabilities of online evolution in real robotic hardware, including robots able to overcome faults injected in the motors of multiple units simultaneously, and to modify their behaviour in response to changes in the task requirements. We conclude by assessing the contribution of each algorithmic component on the performance of the underlying evolutionary algorithm.

  7. Adaptive Trajectory Tracking Control using Reinforcement Learning for Quadrotor

    Directory of Open Access Journals (Sweden)

    Wenjie Lou

    2016-02-01

    Full Text Available Inaccurate system parameters and unpredicted external disturbances affect the performance of non-linear controllers. In this paper, a new adaptive control algorithm under the reinforcement learning framework is proposed to stabilize a quadrotor helicopter. Based on a command-filtered non-linear control algorithm, adaptive elements are added and learned by policy-search methods. To predict the inaccurate system parameters, a new kernel-based regression learning method is provided. In addition, Policy learning by Weighting Exploration with the Returns (PoWER) and Return Weighted Regression (RWR) are utilized to learn the appropriate parameters for the adaptive elements in order to cancel the effect of external disturbance. Furthermore, numerical simulations under several conditions are performed, and the ability of adaptive trajectory-tracking control with reinforcement learning is demonstrated.
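
    As a rough illustration of the return-weighted parameter updates that PoWER/RWR-style policy search performs, the sketch below perturbs policy parameters, evaluates rollouts, and averages the perturbations weighted by their returns. The toy objective and constants are assumptions; this is not the paper's quadrotor controller.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(3)                 # parameters of the adaptive control elements

def rollout_return(params):
    # Stand-in for a simulated rollout: return peaks at an unknown optimum.
    optimum = np.array([1.0, -0.5, 2.0])
    return np.exp(-np.sum((params - optimum) ** 2))

for iteration in range(200):
    eps = rng.normal(scale=0.3, size=(10, 3))       # exploration perturbations
    returns = np.array([rollout_return(theta + e) for e in eps])
    # Return-weighted average of the perturbations (PoWER-style update)
    theta = theta + (returns[:, None] * eps).sum(axis=0) / (returns.sum() + 1e-12)

print("learned parameters:", np.round(theta, 2))
```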

  8. The Computational Development of Reinforcement Learning during Adolescence.

    Directory of Open Access Journals (Sweden)

    Stefano Palminteri

    2016-06-01

    Full Text Available Adolescence is a period of life characterised by changes in learning and decision-making. Learning and decision-making do not rely on a unitary system, but instead require the coordination of different cognitive processes that can be mathematically formalised as dissociable computational modules. Here, we aimed to trace the developmental time-course of the computational modules responsible for learning from reward or punishment, and learning from counterfactual feedback. Adolescents and adults carried out a novel reinforcement learning paradigm in which participants learned the association between cues and probabilistic outcomes, where the outcomes differed in valence (reward versus punishment) and feedback was either partial or complete (either the outcome of the chosen option only, or the outcomes of both the chosen and unchosen option, were displayed). Computational strategies changed during development: whereas adolescents' behaviour was better explained by a basic reinforcement learning algorithm, adults' behaviour integrated increasingly complex computational features, namely a counterfactual learning module (enabling enhanced performance in the presence of complete feedback) and a value contextualisation module (enabling symmetrical reward and punishment learning). Unlike adults, adolescent performance did not benefit from counterfactual (complete) feedback. In addition, while adults learned symmetrically from both reward and punishment, adolescents learned from reward but were less likely to learn from punishment. This tendency to rely on rewards and not to consider alternative consequences of actions might contribute to our understanding of decision-making in adolescence.
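
    The counterfactual learning module described above can be pictured as an extra update applied to the unchosen option when complete feedback is available. The snippet below is a minimal illustration with assumed reward probabilities and learning rate, not the study's fitted model.

```python
import random

alpha = 0.3
Q = {"A": 0.0, "B": 0.0}           # option values for one cue pair
p_reward = {"A": 0.75, "B": 0.25}  # assumed outcome probabilities

def outcome(option):
    return 1.0 if random.random() < p_reward[option] else 0.0

for trial in range(500):
    chosen = "A" if Q["A"] >= Q["B"] else "B"
    unchosen = "B" if chosen == "A" else "A"
    r_chosen, r_unchosen = outcome(chosen), outcome(unchosen)
    # Basic (factual) module: update only the chosen option
    Q[chosen] += alpha * (r_chosen - Q[chosen])
    # Counterfactual module (complete feedback): also learn from the foregone outcome
    Q[unchosen] += alpha * (r_unchosen - Q[unchosen])

print({k: round(v, 2) for k, v in Q.items()})
```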

  9. Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening

    OpenAIRE

    He, Frank S.; Liu, Yang; Schwing, Alexander G.; Peng, Jian

    2016-01-01

    We propose a novel training algorithm for reinforcement learning which combines the strength of deep Q-learning with a constrained optimization approach to tighten optimality and encourage faster reward propagation. Our novel technique makes deep reinforcement learning more practical by drastically reducing the training time. We evaluate the performance of our approach on the 49 games of the challenging Arcade Learning Environment, and report significant improvements in both training time and...

  10. Interactive language learning by robots: the transition from babbling to word forms.

    Science.gov (United States)

    Lyon, Caroline; Nehaniv, Chrystopher L; Saunders, Joe

    2012-01-01

    The advent of humanoid robots has enabled a new approach to investigating the acquisition of language, and we report on the development of robots able to acquire rudimentary linguistic skills. Our work focuses on early stages analogous to some characteristics of a human child of about 6 to 14 months, the transition from babbling to first word forms. We investigate one mechanism among many that may contribute to this process, a key factor being the sensitivity of learners to the statistical distribution of linguistic elements. As well as being necessary for learning word meanings, the acquisition of anchor word forms facilitates the segmentation of an acoustic stream through other mechanisms. In our experiments some salient one-syllable word forms are learnt by a humanoid robot in real-time interactions with naive participants. Words emerge from random syllabic babble through a learning process based on a dialogue between the robot and the human participant, whose speech is perceived by the robot as a stream of phonemes. Numerous ways of representing the speech as syllabic segments are possible. Furthermore, the pronunciation of many words in spontaneous speech is variable. However, in line with research elsewhere, we observe that salient content words are more likely than function words to have consistent canonical representations; thus their relative frequency increases, as does their influence on the learner. Variable pronunciation may contribute to early word form acquisition. The importance of contingent interaction in real-time between teacher and learner is reflected by a reinforcement process, with variable success. The examination of individual cases may be more informative than group results. Nevertheless, word forms are usually produced by the robot after a few minutes of dialogue, employing a simple, real-time, frequency dependent mechanism. This work shows the potential of human-robot interaction systems in studies of the dynamics of early language

  11. Reinforcement and inference in cross-situational word learning.

    Science.gov (United States)

    Tilles, Paulo F C; Fontanari, José F

    2013-01-01

    Cross-situational word learning is based on the notion that a learner can determine the referent of a word by finding something in common across many observed uses of that word. Here we propose an adaptive learning algorithm that contains a parameter that controls the strength of the reinforcement applied to associations between concurrent words and referents, and a parameter that regulates inference, which includes built-in biases, such as mutual exclusivity, and information of past learning events. By adjusting these parameters so that the model predictions agree with data from representative experiments on cross-situational word learning, we were able to explain the learning strategies adopted by the participants of those experiments in terms of a trade-off between reinforcement and inference. These strategies can vary wildly depending on the conditions of the experiments. For instance, for fast mapping experiments (i.e., the correct referent could, in principle, be inferred in a single observation) inference is prevalent, whereas for segregated contextual diversity experiments (i.e., the referents are separated in groups and are exhibited with members of their groups only) reinforcement is predominant. Other experiments are explained with more balanced doses of reinforcement and inference.

  12. Learning Spatial Object Localization from Vision on a Humanoid Robot

    Directory of Open Access Journals (Sweden)

    Jürgen Leitner

    2012-12-01

    Full Text Available We present a combined machine learning and computer vision approach for robots to localize objects. It allows our iCub humanoid to quickly learn to provide accurate 3D position estimates (in the centimetre range) of objects seen. Biologically inspired approaches, such as Artificial Neural Networks (ANN) and Genetic Programming (GP), are trained to provide these position estimates using the two cameras and the joint encoder readings. No camera calibration or explicit knowledge of the robot's kinematic model is needed. We find that ANN and GP are not just faster and of lower complexity than traditional techniques, but also learn without the need for extensive calibration procedures. In addition, the approach localizes objects robustly when they are placed in the robot's workspace at arbitrary positions, even while the robot is moving its torso, head and eyes.

  13. Flow Navigation by Smart Microswimmers via Reinforcement Learning

    Science.gov (United States)

    Colabrese, Simona; Biferale, Luca; Celani, Antonio; Gustavsson, Kristian

    2017-11-01

    We have numerically modeled active particles which are able to acquire some limited knowledge of the fluid environment from simple mechanical cues and exert a control on their preferred steering direction. We show that those swimmers can learn effective strategies just by experience, using a reinforcement learning algorithm. As an example, we focus on smart gravitactic swimmers. These are active particles whose task is to reach the highest altitude within some time horizon, exploiting the underlying flow whenever possible. The reinforcement learning algorithm allows particles to learn effective strategies even in difficult situations when, in the absence of control, they would end up being trapped by flow structures. These strategies are highly nontrivial and cannot be easily guessed in advance. This work paves the way towards the engineering of smart microswimmers that solve difficult navigation problems. ERC AdG NewTURB 339032.
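
    A minimal tabular Q-learning loop of the kind such smart swimmers could use is sketched below, with discretized flow cues as states and preferred steering directions as actions. The toy dynamics and reward are assumptions standing in for the paper's flow simulations.

```python
import random

states = ["vortex_left", "vortex_right", "quiet"]
actions = ["steer_left", "steer_up", "steer_right"]
alpha, gamma, epsilon = 0.1, 0.9, 0.1
Q = {(s, a): 0.0 for s in states for a in actions}

def environment(state, action):
    """Assumed toy dynamics: steering against the local vortex gains altitude."""
    if state == "vortex_left" and action == "steer_right":
        reward = 1.0
    elif state == "vortex_right" and action == "steer_left":
        reward = 1.0
    elif state == "quiet" and action == "steer_up":
        reward = 0.5
    else:
        reward = -0.2            # swimmer gets trapped / loses altitude
    return random.choice(states), reward

state = random.choice(states)
for _ in range(20000):
    if random.random() < epsilon:
        action = random.choice(actions)                      # explore
    else:
        action = max(actions, key=lambda a: Q[(state, a)])   # exploit
    next_state, reward = environment(state, action)
    target = reward + gamma * max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (target - Q[(state, action)])
    state = next_state

for s in states:
    print(s, "->", max(actions, key=lambda a: Q[(s, a)]))
```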

  14. Reinforcement and Systemic Machine Learning for Decision Making

    CERN Document Server

    Kulkarni, Parag

    2012-01-01

    Reinforcement and Systemic Machine Learning for Decision Making. There are always difficulties in making machines that learn from experience. Complete information is not always available, or it becomes available in bits and pieces over a period of time. With respect to systemic learning, there is a need to understand the impact of decisions and actions on a system over that period of time. This book takes a holistic approach to addressing that need and presents a new paradigm, creating new learning applications and, ultimately, more intelligent machines. The first book of its kind in this new an

  15. Reinforcement Learning for Online Control of Evolutionary Algorithms

    NARCIS (Netherlands)

    Eiben, A.; Horvath, Mark; Kowalczyk, Wojtek; Schut, Martijn

    2007-01-01

    The research reported in this paper is concerned with assessing the usefulness of reinforcement learning (RL) for on-line calibration of parameters in evolutionary algorithms (EA). We run an RL procedure and the EA simultaneously, and the RL changes the EA parameters on-the-fly. We

  16. Perceptual learning rules based on reinforcers and attention

    NARCIS (Netherlands)

    Roelfsema, Pieter R.; van Ooyen, Arjen; Watanabe, Takeo

    2010-01-01

    How does the brain learn those visual features that are relevant for behavior? In this article, we focus on two factors that guide plasticity of visual representations. First, reinforcers cause the global release of diffusive neuromodulatory signals that gate plasticity. Second, attentional feedback

  17. Traffic light control by multiagent reinforcement learning systems

    NARCIS (Netherlands)

    Bakker, B.; Whiteson, S.; Kester, L.; Groen, F.C.A.; Babuška, R.; Groen, F.C.A.

    2010-01-01

    Traffic light control is one of the main means of controlling road traffic. Improving traffic control is important because it can lead to higher traffic throughput and reduced traffic congestion. This chapter describes multiagent reinforcement learning techniques for automatic optimization of

  18. Traffic Light Control by Multiagent Reinforcement Learning Systems

    NARCIS (Netherlands)

    Bakker, B.; Whiteson, S.; Kester, L.J.H.M.; Groen, F.C.A.

    2010-01-01

    Traffic light control is one of the main means of controlling road traffic. Improving traffic control is important because it can lead to higher traffic throughput and reduced traffic congestion. This chapter describes multiagent reinforcement learning techniques for automatic optimization of

  19. Optimal Control via Reinforcement Learning with Symbolic Policy Approximation

    NARCIS (Netherlands)

    Kubalìk, Jiřì; Alibekov, Eduard; Babuska, R.; Dochain, Denis; Henrion, Didier; Peaucelle, Dimitri

    2017-01-01

    Model-based reinforcement learning (RL) algorithms can be used to derive optimal control laws for nonlinear dynamic systems. With continuous-valued state and input variables, RL algorithms have to rely on function approximators to represent the value function and policy mappings. This paper

  20. Joy, Distress, Hope, and Fear in Reinforcement Learning (Extended Abstract)

    NARCIS (Netherlands)

    Jacobs, E.J.; Broekens, J.; Jonker, C.M.

    2014-01-01

    In this paper we present a mapping between joy, distress, hope and fear, and Reinforcement Learning primitives. Joy / distress is a signal that is derived from the RL update signal, while hope/fear is derived from the utility of the current state. Agent-based simulation experiments replicate

  1. Reinforcement Learning Explains Conditional Cooperation and Its Moody Cousin.

    Directory of Open Access Journals (Sweden)

    Takahiro Ezaki

    2016-07-01

    Full Text Available Direct reciprocity, or repeated interaction, is a main mechanism to sustain cooperation under social dilemmas involving two individuals. For larger groups and networks, which are probably more relevant to understanding and engineering our society, experiments employing repeated multiplayer social dilemma games have suggested that humans often show conditional cooperation behavior and its moody variant. The mechanisms underlying these behaviors remain largely unclear. Here we provide a proximate account of this behavior by showing that individuals adopting a type of reinforcement learning, called aspiration learning, phenomenologically behave as conditional cooperators. By definition, individuals are satisfied if and only if the obtained payoff is larger than a fixed aspiration level. They reinforce actions that have resulted in satisfactory outcomes and anti-reinforce those yielding unsatisfactory outcomes. The results obtained in the present study are general in that they explain extant experimental results obtained for both so-called moody and non-moody conditional cooperation, prisoner's dilemma and public goods games, and well-mixed groups and networks. Different from previous theory, individuals are assumed to have no access to information about what other individuals are doing, such that they cannot explicitly use conditional cooperation rules. In this sense, myopic aspiration learning, in which the unconditional propensity of cooperation is modulated in every discrete time step, explains the conditional behavior of humans. Aspiration learners showing (moody) conditional cooperation obeyed a noisy GRIM-like strategy. This is different from Pavlov, a reinforcement learning strategy promoting mutual cooperation in two-player situations.
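
    The aspiration-learning rule summarized above (reinforce satisfactory actions, anti-reinforce unsatisfactory ones) can be written in a few lines. The sketch below applies it to a two-player Prisoner's Dilemma with assumed payoffs, aspiration level, and learning rate.

```python
import random

R, S, T, P = 3.0, 0.0, 5.0, 1.0     # standard PD payoffs (assumed values)
aspiration = 2.0                     # fixed aspiration level
alpha = 0.1                          # learning rate

def payoff(my_move, other_move):
    table = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}
    return table[(my_move, other_move)]

p_coop = [0.5, 0.5]                  # each player's propensity to cooperate
for round_ in range(2000):
    moves = ["C" if random.random() < p else "D" for p in p_coop]
    for i in (0, 1):
        satisfied = payoff(moves[i], moves[1 - i]) >= aspiration
        target = 1.0 if moves[i] == "C" else 0.0
        if satisfied:                # reinforce the action just taken
            p_coop[i] += alpha * (target - p_coop[i])
        else:                        # anti-reinforce it (shift towards the other action)
            p_coop[i] += alpha * ((1.0 - target) - p_coop[i])

print([round(p, 2) for p in p_coop])
```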

  2. Neurofeedback in Learning Disabled Children: Visual versus Auditory Reinforcement.

    Science.gov (United States)

    Fernández, Thalía; Bosch-Bayard, Jorge; Harmony, Thalía; Caballero, María I; Díaz-Comas, Lourdes; Galán, Lídice; Ricardo-Garcell, Josefina; Aubert, Eduardo; Otero-Ojeda, Gloria

    2016-03-01

    Children with learning disabilities (LD) frequently have an EEG characterized by an excess of theta and a deficit of alpha activities. Neurofeedback (NFB) using an auditory stimulus as a reinforcer has proven to be a useful tool to treat LD children by positively reinforcing decreases in the theta/alpha ratio. The aim of the present study was to optimize the NFB procedure by comparing the efficacy of visual (with eyes open) versus auditory (with eyes closed) reinforcers. Twenty LD children with an abnormally high theta/alpha ratio were randomly assigned to the Auditory or the Visual group, where a 500 Hz tone or a visual stimulus (a white square), respectively, was used as a positive reinforcer when the value of the theta/alpha ratio was reduced. Both groups showed signs consistent with EEG maturation, but only the Auditory Group showed behavioral/cognitive improvements. In conclusion, the auditory reinforcer was more efficacious in reducing the theta/alpha ratio, and it improved the cognitive abilities more than the visual reinforcer.

  3. Brain-Machine Interface control of a robot arm using actor-critic reinforcement learning.

    Science.gov (United States)

    Pohlmeyer, Eric A; Mahmoudi, Babak; Geng, Shijia; Prins, Noeline; Sanchez, Justin C

    2012-01-01

    Here we demonstrate how a marmoset monkey can use a reinforcement learning (RL) Brain-Machine Interface (BMI) to effectively control the movements of a robot arm for a reaching task. In this work, an actor-critic RL algorithm used neural ensemble activity in the monkey's motor cortex to control the robot movements during a two-target decision task. This novel approach to decoding offers unique advantages for BMI control applications. Compared to supervised learning decoding methods, the actor-critic RL algorithm does not require an explicit set of training data to create a static control model; rather, it incrementally adapts the model parameters according to its current performance, in this case requiring only a very basic feedback signal. We show how this algorithm achieved high performance (94%) in mapping the monkey's neural states to robot actions, and needed to experience only a few trials before obtaining accurate real-time control of the robot arm. Since RL methods responsively adapt and adjust their parameters, they can provide a method to create BMIs that are robust against perturbations caused by changes in either the neural input space or the output actions they generate under different task requirements or goals.
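
    The actor-critic decoder described here can be pictured, in outline, as a critic tracking state values and an actor adjusting action preferences with the same feedback-driven error. The sketch below uses synthetic two-target "neural states" and a made-up reward scheme purely for illustration; it is not the study's decoder.

```python
import math
import random

states = ["target_left", "target_right"]
actions = ["move_left", "move_right"]
alpha_actor, alpha_critic = 0.2, 0.2
prefs = {(s, a): 0.0 for s in states for a in actions}   # actor preferences
V = {s: 0.0 for s in states}                             # critic value estimates

def softmax_choice(s):
    exps = {a: math.exp(prefs[(s, a)]) for a in actions}
    z = sum(exps.values())
    r, cum = random.random() * z, 0.0
    for a, e in exps.items():
        cum += e
        if r <= cum:
            return a
    return actions[-1]

for trial in range(3000):
    s = random.choice(states)                  # decoded "neural state" for this trial
    a = softmax_choice(s)
    # Very basic feedback signal: +1 if the arm moved toward the cued target
    r = 1.0 if a.split("_")[1] == s.split("_")[1] else -1.0
    td_error = r - V[s]                        # single-step task, no bootstrapping
    V[s] += alpha_critic * td_error            # critic update
    prefs[(s, a)] += alpha_actor * td_error    # actor update

print({s: max(actions, key=lambda a: prefs[(s, a)]) for s in states})
```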

  4. Project InterActions: A Multigenerational Robotic Learning Environment

    Science.gov (United States)

    Bers, Marina U.

    2007-12-01

    This paper presents Project InterActions, a series of 5-week workshops in which very young learners (4- to 7-year-old children) and their parents come together to build and program a personally meaningful robotic project in the context of a multigenerational robotics-based community of practice. The goal of these family workshops is to teach both parents and children about the mechanical and programming aspects involved in robotics, as well as to initiate them in a learning trajectory with and about technology. Results from this project address different ways in which parents and children learn together and provide insights into how to develop educational interventions that would educate parents, as well as children, in new domains of knowledge and skills such as robotics and new technologies.

  5. Structure identification in fuzzy inference using reinforcement learning

    Science.gov (United States)

    Berenji, Hamid R.; Khedkar, Pratap

    1993-01-01

    In our previous work on the GARIC architecture, we have shown that the system can start with surface structure of the knowledge base (i.e., the linguistic expression of the rules) and learn the deep structure (i.e., the fuzzy membership functions of the labels used in the rules) by using reinforcement learning. Assuming the surface structure, GARIC refines the fuzzy membership functions used in the consequents of the rules using a gradient descent procedure. This hybrid fuzzy logic and reinforcement learning approach can learn to balance a cart-pole system and to backup a truck to its docking location after a few trials. In this paper, we discuss how to do structure identification using reinforcement learning in fuzzy inference systems. This involves identifying both surface as well as deep structure of the knowledge base. The term set of fuzzy linguistic labels used in describing the values of each control variable must be derived. In this process, splitting a label refers to creating new labels which are more granular than the original label and merging two labels creates a more general label. Splitting and merging of labels directly transform the structure of the action selection network used in GARIC by increasing or decreasing the number of hidden layer nodes.

  6. A multiplicative reinforcement learning model capturing learning dynamics and interindividual variability in mice

    OpenAIRE

    Bathellier, Brice; Tee, Sui Poh; Hrovat, Christina; Rumpel, Simon

    2013-01-01

    Learning speed can strongly differ across individuals. This is seen in humans and animals. Here, we measured learning speed in mice performing a discrimination task and developed a theoretical model based on the reinforcement learning framework to account for differences between individual mice. We found that, when using a multiplicative learning rule, the starting connectivity values of the model strongly determine the shape of learning curves. This is in contrast to current learning models ...

  7. Distributed Economic Dispatch in Microgrids Based on Cooperative Reinforcement Learning.

    Science.gov (United States)

    Liu, Weirong; Zhuang, Peng; Liang, Hao; Peng, Jun; Huang, Zhiwu

    2018-06-01

    Microgrids incorporating distributed generation (DG) units and energy storage (ES) devices are expected to play increasingly important roles in future power systems. Yet, achieving efficient distributed economic dispatch in microgrids is a challenging issue due to the randomness and nonlinear characteristics of DG units and loads. This paper proposes a cooperative reinforcement learning algorithm for distributed economic dispatch in microgrids. Using a learning algorithm avoids the difficulty of stochastic modeling and high computational complexity. In the cooperative reinforcement learning algorithm, function approximation is leveraged to deal with the large and continuous state spaces, and a diffusion strategy is incorporated to coordinate the actions of DG units and ES devices. Based on the proposed algorithm, each node in the microgrid only needs to communicate with its local neighbors, without relying on any centralized controller. Algorithm convergence is analyzed, and simulations based on real-world meteorological and load data are conducted to validate the performance of the proposed algorithm.

  8. Knowledge transfer for learning robot models via local procrustes analysis

    CSIR Research Space (South Africa)

    Makondo, N

    2015-11-01

    Full Text Available Learning of robot kinematic and dynamic models from data has attracted much interest recently as an alternative to manually defined models. However, the amount of data required to learn these models becomes large when the number of degrees...

  9. Learning feedforward controller for a mobile robot vehicle

    NARCIS (Netherlands)

    Starrenburg, J.G.; Starrenburg, J.G.; van Luenen, W.T.C.; van Luenen, W.T.C.; Oelen, W.; Oelen, W.; van Amerongen, J.

    1996-01-01

    This paper describes the design and realisation of an on-line learning pose-tracking controller for a three-wheeled mobile robot vehicle. The controller consists of two components. The first is a constant-gain feedback component, designed on the basis of a second-order model. The second is a learning

  10. Software for project-based learning of robot motion planning

    Science.gov (United States)

    Moll, Mark; Bordeaux, Janice; Kavraki, Lydia E.

    2013-12-01

    Motion planning is a core problem in robotics concerned with finding feasible paths for a given robot. Motion planning algorithms perform a search in the high-dimensional continuous space of robot configurations and exemplify many of the core algorithmic concepts of search algorithms and associated data structures. Motion planning algorithms can be explained in a simplified two-dimensional setting, but this masks many of the subtleties and complexities of the underlying problem. We have developed software for project-based learning of motion planning that enables deep learning. The projects that we have developed allow advanced undergraduate students and graduate students to reflect on the performance of existing textbook algorithms and their own variations on such algorithms. Formative assessment has been conducted at three institutions. The core of the software used for this teaching module is also used within the Robot Operating System, a platform widely adopted by the robotics research community. This allows for transfer of knowledge and skills to robotics research projects involving a large variety of robot hardware platforms.

  11. Optimal and Autonomous Control Using Reinforcement Learning: A Survey.

    Science.gov (United States)

    Kiumarsi, Bahare; Vamvoudakis, Kyriakos G; Modares, Hamidreza; Lewis, Frank L

    2018-06-01

    This paper reviews the current state of the art on reinforcement learning (RL)-based feedback control solutions to optimal regulation and tracking of single and multiagent systems. Existing RL solutions to optimal control problems, as well as to graphical games, will be reviewed. RL methods learn the solution to optimal control and game problems online, using measured data along the system trajectories. We discuss Q-learning and the integral RL algorithm as core algorithms for discrete-time (DT) and continuous-time (CT) systems, respectively. Moreover, we discuss a new direction of off-policy RL for both CT and DT systems. Finally, we review several applications.

  12. Challenges in the Verification of Reinforcement Learning Algorithms

    Science.gov (United States)

    Van Wesel, Perry; Goodloe, Alwyn E.

    2017-01-01

    Machine learning (ML) is increasingly being applied to a wide array of domains from search engines to autonomous vehicles. These algorithms, however, are notoriously complex and hard to verify. This work looks at the assumptions underlying machine learning algorithms as well as some of the challenges in trying to verify ML algorithms. Furthermore, we focus on the specific challenges of verifying reinforcement learning algorithms. These are highlighted using a specific example. Ultimately, we do not offer a solution to the complex problem of ML verification, but point out possible approaches for verification and interesting research opportunities.

  13. Amygdala and ventral striatum make distinct contributions to reinforcement learning

    Science.gov (United States)

    Costa, Vincent D.; Monte, Olga Dal; Lucas, Daniel R.; Murray, Elisabeth A.; Averbeck, Bruno B.

    2016-01-01

    Reinforcement learning (RL) theories posit that dopaminergic signals are integrated within the striatum to associate choices with outcomes. Often overlooked is that the amygdala also receives dopaminergic input and is involved in Pavlovian processes that influence choice behavior. To determine the relative contributions of the ventral striatum (VS) and amygdala to appetitive RL, we tested rhesus macaques with VS or amygdala lesions on deterministic and stochastic versions of a two-arm bandit reversal learning task. When learning was characterized with an RL model, amygdala lesions caused general decreases, relative to controls, in learning from positive feedback and in choice consistency. By comparison, VS lesions only affected learning in the stochastic task. Moreover, the VS lesions hastened the monkeys' choice reaction times, which emphasized a speed-accuracy tradeoff that accounted for errors in deterministic learning. These results update standard accounts of RL by emphasizing distinct contributions of the amygdala and VS to RL. PMID:27720488

  14. A Neuro-Control Design Based on Fuzzy Reinforcement Learning

    DEFF Research Database (Denmark)

    Katebi, S.D.; Blanke, M.

    This paper describes a neuro-control fuzzy critic design procedure based on reinforcement learning. An important component of the proposed intelligent control configuration is the fuzzy credit assignment unit, which acts as a critic and, through fuzzy implications, provides adjustment mechanisms. The fuzzy credit assignment unit comprises a fuzzy system with the appropriate fuzzification, knowledge base and defuzzification components. When an external reinforcement signal (a failure signal) is received, sequences of control actions are evaluated and modified by the action applier unit. The desirable ones instruct the neuro-control unit to adjust its weights and are simultaneously stored in the memory unit during the training phase. In response to the internal reinforcement signal (set point threshold deviation), the stored information is retrieved by the action applier unit and utilized for re

  15. Projective Simulation compared to reinforcement learning

    OpenAIRE

    Bjerland, Øystein Førsund

    2015-01-01

    This thesis explores the model of projective simulation (PS), a novel approach for an artificial intelligence (AI) agent. The model of PS learns by interacting with the environment it is situated in, and allows for simulating actions before real action is taken. The action selection is based on a random walk through the episodic & compositional memory (ECM), which is a network of clips that represent previous experienced percepts. The network takes percepts as inpu...

  16. Reinforcement Learning Applications to Combat Identification

    Science.gov (United States)

    2017-03-01

    ruleset in an effort to mimic simplistic cognitive decision-making of a TAO/MC and establishes parameters for the experimentation. Also, there is an…the process knowledge and decision-making abilities of the human decision maker. "[A] cognitive architecture provides the fixed processes and…have bearing on decisions to affect the learning rate in an operational implementation. 3. Cognitive Functions in CID The translation of the

  17. Reinforcement learning agents providing advice in complex video games

    Science.gov (United States)

    Taylor, Matthew E.; Carboni, Nicholas; Fachantidis, Anestis; Vlahavas, Ioannis; Torrey, Lisa

    2014-01-01

    This article introduces a teacher-student framework for reinforcement learning, synthesising and extending material that appeared in conference proceedings [Torrey, L., & Taylor, M. E. (2013). Teaching on a budget: Agents advising agents in reinforcement learning. Proceedings of the international conference on autonomous agents and multiagent systems] and in a non-archival workshop paper [Carboni, N., & Taylor, M. E. (2013, May). Preliminary results for 1 vs. 1 tactics in StarCraft. Proceedings of the adaptive and learning agents workshop (at AAMAS-13)]. In this framework, a teacher agent instructs a student agent by suggesting actions the student should take as it learns. However, the teacher may only give such advice a limited number of times. We present several novel algorithms that teachers can use to budget their advice effectively, and we evaluate them in two complex video games: StarCraft and Pac-Man. Our results show that the same amount of advice, given at different moments, can have different effects on student learning, and that teachers can significantly affect student learning even when students use different learning methods and state representations.

  18. Speeding up the learning of robot kinematics through function decomposition.

    Science.gov (United States)

    Ruiz de Angulo, Vicente; Torras, Carme

    2005-11-01

    The main drawback of using neural networks or other example-based learning procedures to approximate the inverse kinematics (IK) of robot arms is the high number of training samples (i.e., robot movements) required to attain an acceptable precision. We propose here a trick, valid for most industrial robots, that greatly reduces the number of movements needed to learn or relearn the IK to a given accuracy. This trick consists in expressing the IK as a composition of learnable functions, each having half the dimensionality of the original mapping. Off-line and on-line training schemes to learn these component functions are also proposed. Experimental results obtained by using nearest neighbors and parameterized self-organizing map, with and without the decomposition, show that the time savings granted by the proposed scheme grow polynomially with the precision required.
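
    The benefit of the proposed decomposition comes from the exponential growth of required samples with input dimension. The short calculation below illustrates that argument for a hypothetical 6-DOF arm and an assumed grid resolution; the numbers are illustrative, not the paper's experimental counts.

```python
def samples_needed(dim, points_per_axis=10):
    """Samples for a uniform grid covering a dim-dimensional input space."""
    return points_per_axis ** dim

full_dim = 6                      # e.g., a 6-DOF arm learned as one mapping
half_dim = full_dim // 2          # each component function in the decomposition

direct = samples_needed(full_dim)
decomposed = 2 * samples_needed(half_dim)
print(f"direct IK learning:      {direct:,} samples")
print(f"two half-dim functions:  {decomposed:,} samples")
print(f"reduction factor:        {direct / decomposed:.0f}x")
```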

  19. Reinforcement learning for optimal control of low exergy buildings

    International Nuclear Information System (INIS)

    Yang, Lei; Nagy, Zoltan; Goffin, Philippe; Schlueter, Arno

    2015-01-01

    Highlights: • Implementation of reinforcement learning control for LowEx Building systems. • Learning allows adaptation to the local environment without prior knowledge. • Presentation of reinforcement learning control for real-life applications. • Discussion of the applicability for real-life situations. - Abstract: Over a third of the anthropogenic greenhouse gas (GHG) emissions stem from cooling and heating buildings, due to their fossil-fuel-based operation. Low exergy building systems are a promising approach to reduce energy consumption as well as GHG emissions. They consist of renewable energy technologies, such as PV, PV/T and heat pumps. Since careful tuning of parameters is required, a manual setup may result in sub-optimal operation. A model predictive control approach is unnecessarily complex due to the required model identification. Therefore, in this work we present a reinforcement learning control (RLC) approach. The studied building consists of a PV/T array for solar heat and electricity generation, as well as geothermal heat pumps. We present RLC for the PV/T array and for the full building model. Two methods, Tabular Q-learning and Batch Q-learning with Memory Replay, are implemented with real building settings and actual weather conditions in a Matlab/Simulink framework. The performance is evaluated against standard rule-based control (RBC). We investigated different neural network structures and found that some outperformed RBC already during the learning phase. Overall, every RLC strategy for PV/T outperformed RBC by over 10% after the third year. Likewise, for the full building, RLC outperforms RBC in terms of meeting the heating demand, maintaining the optimal operation temperature and compensating more effectively for ground heat. This makes it possible to reduce the engineering costs associated with the setup of these systems, as well as to decrease the return-on-investment period, both of which are necessary to create a sustainable, zero-emission building
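
    Tabular Q-learning, one of the two methods used in this work, is straightforward to sketch for a thermostat-like sub-problem. The toy room model, comfort target, and penalties below are assumptions for illustration and are far simpler than the Matlab/Simulink building model used in the study.

```python
import random

temps = list(range(15, 26))                 # discretized room temperature states (°C)
actions = ["heat_off", "heat_on"]
alpha, gamma, epsilon = 0.1, 0.95, 0.1
Q = {(t, a): 0.0 for t in temps for a in actions}

def step(temp, action):
    drift = 1 if action == "heat_on" else -1          # toy thermal dynamics
    next_temp = min(max(temp + drift, temps[0]), temps[-1])
    comfort = -abs(next_temp - 21)                    # penalty for leaving 21 °C
    energy = -0.3 if action == "heat_on" else 0.0     # penalty for using the heat pump
    return next_temp, comfort + energy

temp = random.choice(temps)
for _ in range(50000):
    if random.random() < epsilon:
        action = random.choice(actions)
    else:
        action = max(actions, key=lambda a: Q[(temp, a)])
    next_temp, reward = step(temp, action)
    best_next = max(Q[(next_temp, a)] for a in actions)
    Q[(temp, action)] += alpha * (reward + gamma * best_next - Q[(temp, action)])
    temp = next_temp

policy = {t: max(actions, key=lambda a: Q[(t, a)]) for t in temps}
print(policy)
```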

  20. Pleasurable music affects reinforcement learning according to the listener

    Science.gov (United States)

    Gold, Benjamin P.; Frank, Michael J.; Bogert, Brigitte; Brattico, Elvira

    2013-01-01

    Mounting evidence links the enjoyment of music to brain areas implicated in emotion and the dopaminergic reward system. In particular, dopamine release in the ventral striatum seems to play a major role in the rewarding aspect of music listening. Striatal dopamine also influences reinforcement learning, such that subjects with greater dopamine efficacy learn better to approach rewards while those with lesser dopamine efficacy learn better to avoid punishments. In this study, we explored the practical implications of musical pleasure through its ability to facilitate reinforcement learning via non-pharmacological dopamine elicitation. Subjects from a wide variety of musical backgrounds chose a pleasurable and a neutral piece of music from an experimenter-compiled database, and then listened to one or both of these pieces (according to pseudo-random group assignment) as they performed a reinforcement learning task dependent on dopamine transmission. We assessed musical backgrounds as well as typical listening patterns with the new Helsinki Inventory of Music and Affective Behaviors (HIMAB), and separately investigated behavior for the training and test phases of the learning task. Subjects with more musical experience trained better with neutral music and tested better with pleasurable music, while those with less musical experience exhibited the opposite effect. HIMAB results regarding listening behaviors and subjective music ratings indicate that these effects arose from different listening styles: namely, more affective listening in non-musicians and more analytical listening in musicians. In conclusion, musical pleasure was able to influence task performance, and the shape of this effect depended on group and individual factors. These findings have implications in affective neuroscience, neuroaesthetics, learning, and music therapy. PMID:23970875

  1. Multiagent cooperation and competition with deep reinforcement learning.

    Directory of Open Access Journals (Sweden)

    Ardi Tampuu

    Full Text Available Evolution of cooperation and competition can appear when multiple adaptive agents share a biological, social, or technological niche. In the present work we study how cooperation and competition emerge between autonomous agents that learn by reinforcement while using only their raw visual input as the state representation. In particular, we extend the Deep Q-Learning framework to multiagent environments to investigate the interaction between two learning agents in the well-known video game Pong. By manipulating the classical rewarding scheme of Pong we show how competitive and collaborative behaviors emerge. We also describe the progression from competitive to collaborative behavior when the incentive to cooperate is increased. Finally we show how learning by playing against another adaptive agent, instead of against a hard-wired algorithm, results in more robust strategies. The present work shows that Deep Q-Networks can become a useful tool for studying decentralized learning of multiagent systems coping with high-dimensional environments.

  2. Multiagent cooperation and competition with deep reinforcement learning.

    Science.gov (United States)

    Tampuu, Ardi; Matiisen, Tambet; Kodelja, Dorian; Kuzovkin, Ilya; Korjus, Kristjan; Aru, Juhan; Aru, Jaan; Vicente, Raul

    2017-01-01

    Evolution of cooperation and competition can appear when multiple adaptive agents share a biological, social, or technological niche. In the present work we study how cooperation and competition emerge between autonomous agents that learn by reinforcement while using only their raw visual input as the state representation. In particular, we extend the Deep Q-Learning framework to multiagent environments to investigate the interaction between two learning agents in the well-known video game Pong. By manipulating the classical rewarding scheme of Pong we show how competitive and collaborative behaviors emerge. We also describe the progression from competitive to collaborative behavior when the incentive to cooperate is increased. Finally we show how learning by playing against another adaptive agent, instead of against a hard-wired algorithm, results in more robust strategies. The present work shows that Deep Q-Networks can become a useful tool for studying decentralized learning of multiagent systems coping with high-dimensional environments.

  3. Multiagent cooperation and competition with deep reinforcement learning

    Science.gov (United States)

    Kodelja, Dorian; Kuzovkin, Ilya; Korjus, Kristjan; Aru, Juhan; Aru, Jaan; Vicente, Raul

    2017-01-01

    Evolution of cooperation and competition can appear when multiple adaptive agents share a biological, social, or technological niche. In the present work we study how cooperation and competition emerge between autonomous agents that learn by reinforcement while using only their raw visual input as the state representation. In particular, we extend the Deep Q-Learning framework to multiagent environments to investigate the interaction between two learning agents in the well-known video game Pong. By manipulating the classical rewarding scheme of Pong we show how competitive and collaborative behaviors emerge. We also describe the progression from competitive to collaborative behavior when the incentive to cooperate is increased. Finally we show how learning by playing against another adaptive agent, instead of against a hard-wired algorithm, results in more robust strategies. The present work shows that Deep Q-Networks can become a useful tool for studying decentralized learning of multiagent systems coping with high-dimensional environments. PMID:28380078

  4. Multiagent Reinforcement Learning Dynamic Spectrum Access in Cognitive Radios

    Directory of Open Access Journals (Sweden)

    Wu Chun

    2014-02-01

    Full Text Available A multiuser independent Q-learning method which does not require information interaction is proposed for multiuser dynamic spectrum access in cognitive radios. The method adopts a self-learning paradigm, in which each CR user performs reinforcement learning only by observing an individual performance reward, without spending communication resources on information interaction with others. The reward is defined suitably to represent channel quality and channel conflict status. A learning strategy of sufficient exploration, preference for good channels, and punishment for channel conflicts is designed to implement multiuser dynamic spectrum access. For the two-user, two-channel scenario, a fast learning algorithm is proposed and its convergence to the maximal total reward is proved. The simulation results show that, with the proposed method, the CR system converges to a Nash equilibrium with high probability and achieves a high total reward.
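
    The self-learning paradigm described above can be sketched as stateless, independent Q-learning in the two-user, two-channel case, with a reward that reflects channel quality and punishes conflicts. All numbers below are assumed for illustration; the paper's fast algorithm and convergence proof are not reproduced.

```python
import random

users, channels = [0, 1], [0, 1]
quality = {0: 0.6, 1: 1.0}                 # assumed per-channel quality reward
alpha, epsilon = 0.1, 0.2
Q = {u: {c: 0.0 for c in channels} for u in users}   # each user learns alone

for slot in range(20000):
    picks = {}
    for u in users:
        if random.random() < epsilon:
            picks[u] = random.choice(channels)          # exploration breaks symmetry
        else:
            picks[u] = max(channels, key=lambda c: Q[u][c])
    for u in users:
        c = picks[u]
        conflict = any(picks[v] == c for v in users if v != u)
        reward = -1.0 if conflict else quality[c]       # punish collisions
        Q[u][c] += alpha * (reward - Q[u][c])           # no information exchange
    epsilon = max(0.01, epsilon * 0.9995)               # gradually exploit more

print({u: max(channels, key=lambda c: Q[u][c]) for u in users})
```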

  5. Authoring Robot-Assisted Instructional Materials for Improving Learning Performance and Motivation in EFL Classrooms

    Science.gov (United States)

    Hong, Zeng-Wei; Huang, Yueh-Min; Hsu, Marie; Shen, Wei-Wei

    2016-01-01

    Anthropomorphized robots are regarded as beneficial tools in education due to their capabilities of improving teaching effectiveness and learning motivation. Therefore, one major trend of research, known as Robot- Assisted Language Learning (RALL), is trying to develop robots to support teaching and learning English as a foreign language (EFL). As…

  6. Simulation-based optimization: parametric optimization techniques and reinforcement learning

    CERN Document Server

    Gosavi, Abhijit

    2003-01-01

    Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning introduces the evolving area of simulation-based optimization. The book's objective is two-fold: (1) It examines the mathematical governing principles of simulation-based optimization, thereby providing the reader with the ability to model relevant real-life problems using these techniques. (2) It outlines the computational technology underlying these methods. Taken together these two aspects demonstrate that the mathematical and computational methods discussed in this book do work. Broadly speaking, the book has two parts: (1) parametric (static) optimization and (2) control (dynamic) optimization. Some of the book's special features are: *An accessible introduction to reinforcement learning and parametric-optimization techniques. *A step-by-step description of several algorithms of simulation-based optimization. *A clear and simple introduction to the methodology of neural networks. *A gentle introduction to converg...

  7. Reinforcement active learning in the vibrissae system: optimal object localization.

    Science.gov (United States)

    Gordon, Goren; Dorfman, Nimrod; Ahissar, Ehud

    2013-01-01

    Rats move their whiskers to acquire information about their environment. It has been observed that they palpate novel objects and objects they are required to localize in space. We analyze whisker-based object localization using two complementary paradigms, namely, active learning and intrinsic-reward reinforcement learning. Active learning algorithms select the next training samples according to the hypothesized solution in order to better discriminate between correct and incorrect labels. Intrinsic-reward reinforcement learning uses prediction errors as the reward to an actor-critic design, such that behavior converges to the one that optimizes the learning process. We show that in the context of object localization, the two paradigms result in palpation whisking as their respective optimal solution. These results suggest that rats may employ principles of active learning and/or intrinsic reward in tactile exploration and can guide future research to seek the underlying neuronal mechanisms that implement them. Furthermore, these paradigms are easily transferable to biomimetic whisker-based artificial sensors and can improve the active exploration of their environment. Copyright © 2012 Elsevier Ltd. All rights reserved.

  8. Human reinforcement learning subdivides structured action spaces by learning effector-specific values

    OpenAIRE

    Gershman, Samuel J.; Pesaran, Bijan; Daw, Nathaniel D.

    2009-01-01

    Humans and animals are endowed with a large number of effectors. Although this enables great behavioral flexibility, it presents an equally formidable reinforcement learning problem of discovering which actions are most valuable, due to the high dimensionality of the action space. An unresolved question is how neural systems for reinforcement learning – such as prediction error signals for action valuation associated with dopamine and the striatum – can cope with this “curse of dimensionality...

  9. Identifying Factors Reinforcing Robotization: Interactive Forces of Employment, Working Hour and Wage

    OpenAIRE

    Joonmo Cho; Jinha Kim

    2018-01-01

    Unlike previous studies on robotization, which approach the future on the basis of cutting-edge technologies and adopt a framework in which robotization is treated as an exogenous variable, this study considers robotization to occur endogenously and uses it as a dependent variable for an objective examination of the effect of robotization on the labor market. To this end, a robotization indicator is created based on the actual number of industrial robots currently deployed in workplaces, and a

  10. Learning and Chaining of Motor Primitives for Goal-directed Locomotion of a Snake-Like Robot with Screw-Drive Units

    DEFF Research Database (Denmark)

    Chatterjee, Sromona; Nachstedt, Timo; Tamosiunaite, Minija

    2015-01-01

    …goal-directed locomotion for the robot. The behavioural primitives of the robot are generated using a reinforcement learning approach called "Policy Improvement with Path Integrals" (PI2). PI2 is numerically simple and has the ability to deal with high-dimensional systems. Here, PI2 is used to learn the robot's motor controls by finding proper locomotion control parameters, like joint angles and screw-drive unit velocities, in a coordinated manner for different goals. Thus, it is able to generate a large repertoire of motor primitives, which are selectively stored to form a primitive library. The learning process

  11. Reinforcement learning produces dominant strategies for the Iterated Prisoner's Dilemma.

    Directory of Open Access Journals (Sweden)

    Marc Harper

    Full Text Available We present tournament results and several powerful strategies for the Iterated Prisoner's Dilemma created using reinforcement learning techniques (evolutionary and particle swarm algorithms). These strategies are trained to perform well against a corpus of over 170 distinct opponents, including many well-known and classic strategies. All the trained strategies win standard tournaments against the total collection of other opponents. The trained strategies and one particular human-designed strategy are also the top performers in noisy tournaments.

  12. Intranasal oxytocin enhances socially-reinforced learning in rhesus monkeys

    Directory of Open Access Journals (Sweden)

    Lisa A Parr

    2014-09-01

    Full Text Available There are currently no drugs approved for the treatment of social deficits associated with autism spectrum disorders (ASD). One hypothesis for these deficits is that individuals with ASD lack the motivation to attend to social cues because those cues are not implicitly rewarding. Therefore, any drug that could enhance the rewarding quality of social stimuli could have a profound impact on the treatment of ASD and other social disorders. Oxytocin (OT) is a neuropeptide that has been effective in enhancing social cognition and social reward in humans. The present study examined the ability of OT to selectively enhance learning after social compared to nonsocial reward in rhesus monkeys, an important species for modeling the neurobiology of social behavior in humans. Monkeys were required to learn an implicit visual matching task after receiving either intranasal (IN) OT or placebo (saline). Correct trials were rewarded with the presentation of positive and negative social (play faces/threat faces) or nonsocial (banana/cage locks) stimuli, plus food. Incorrect trials were not rewarded. Results demonstrated a strong effect of socially-reinforced learning: monkeys performed significantly better when reinforced with social versus nonsocial stimuli. Additionally, socially-reinforced learning was significantly better and occurred faster after IN-OT compared to placebo treatment. Performance in the IN-OT, but not placebo, condition was also significantly better when the reinforcement stimuli were emotionally positive compared to negative facial expressions. These data support the hypothesis that OT may function to enhance prosocial behavior in primates by increasing the rewarding quality of emotionally positive, social images compared to emotionally negative or nonsocial images. These data also support the use of the rhesus monkey as a model for exploring the neurobiological basis of social behavior and its impairment.

  13. Reinforcement learning produces dominant strategies for the Iterated Prisoner's Dilemma.

    Science.gov (United States)

    Harper, Marc; Knight, Vincent; Jones, Martin; Koutsovoulos, Georgios; Glynatsi, Nikoleta E; Campbell, Owen

    2017-01-01

    We present tournament results and several powerful strategies for the Iterated Prisoner's Dilemma created using reinforcement learning techniques (evolutionary and particle swarm algorithms). These strategies are trained to perform well against a corpus of over 170 distinct opponents, including many well-known and classic strategies. All the trained strategies win standard tournaments against the total collection of other opponents. The trained strategies and one particular human-designed strategy are also the top performers in noisy tournaments.

  14. A learning-based semi-autonomous controller for robotic exploration of unknown disaster scenes while searching for victims.

    Science.gov (United States)

    Doroodgar, Barzin; Liu, Yugang; Nejat, Goldie

    2014-12-01

    Semi-autonomous control schemes can address the limitations of both teleoperation and fully autonomous robotic control of rescue robots in disaster environments by allowing a human operator to cooperate with a rescue robot and share such tasks as navigation, exploration, and victim identification. In this paper, we present a unique hierarchical reinforcement learning (HRL)-based semi-autonomous control architecture for rescue robots operating in cluttered and unknown urban search and rescue (USAR) environments. The aim of the controller is to enable a rescue robot to continuously learn from its own experiences in an environment in order to improve its overall performance in the exploration of unknown disaster scenes. A direction-based exploration technique is integrated into the controller to expand the search area of the robot via the classification of regions and of the rubble piles within these regions. Both simulations and physical experiments in USAR-like environments verify the robustness of the proposed HRL-based semi-autonomous controller to unknown cluttered scenes of different sizes and with varying types of configurations.

  15. Modeling Avoidance in Mood and Anxiety Disorders Using Reinforcement Learning.

    Science.gov (United States)

    Mkrtchian, Anahit; Aylward, Jessica; Dayan, Peter; Roiser, Jonathan P; Robinson, Oliver J

    2017-10-01

    Serious and debilitating symptoms of anxiety are the most common mental health problem worldwide, accounting for around 5% of all adult years lived with disability in the developed world. Avoidance behavior, for instance avoiding social situations for fear of embarrassment, is a core feature of such anxiety. However, as for many other psychiatric symptoms, the biological mechanisms underlying avoidance remain unclear. Reinforcement learning models provide formal and testable characterizations of the mechanisms of decision making; here, we examine avoidance in these terms. A total of 101 healthy participants and individuals with mood and anxiety disorders completed an approach-avoidance go/no-go task under stress induced by threat of unpredictable shock. We show an increased reliance in the mood and anxiety group on a parameter of our reinforcement learning model that characterizes a prepotent (Pavlovian) bias to withhold responding in the face of negative outcomes. This was particularly the case when the mood and anxiety group was under stress. This formal description of avoidance within the reinforcement learning framework provides a new means of linking clinical symptoms with biophysically plausible models of neural circuitry and, as such, takes us closer to a mechanistic understanding of mood and anxiety disorders. Copyright © 2017 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.
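
    The Pavlovian-bias parameter referred to above is commonly formalized as a stimulus-value term added to the instrumental "go" weight, so that aversive contexts suppress responding. The sketch below follows that common formulation with assumed parameter values; it is not the study's fitted model.

```python
import math
import random

alpha, pi_bias, go_bias = 0.2, 0.5, 0.1   # learning rate, Pavlovian bias, go bias
Q = {"go": 0.0, "nogo": 0.0}              # instrumental action values for one aversive cue
V = 0.0                                   # Pavlovian (stimulus) value of the cue

def p_go():
    # The Pavlovian term couples a negative stimulus value to response suppression
    w_go = Q["go"] + go_bias + pi_bias * V
    return 1.0 / (1.0 + math.exp(-(w_go - Q["nogo"])))

for trial in range(500):
    act = "go" if random.random() < p_go() else "nogo"
    # Assumed contingency: acting avoids the aversive outcome 80% of the time
    outcome = 0.0 if (act == "go" and random.random() < 0.8) else -1.0
    Q[act] += alpha * (outcome - Q[act])  # instrumental update
    V += alpha * (outcome - V)            # Pavlovian update

print("P(go) after learning:", round(p_go(), 2))
```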

  16. Medial prefrontal cortex and the adaptive regulation of reinforcement learning parameters.

    Science.gov (United States)

    Khamassi, Mehdi; Enel, Pierre; Dominey, Peter Ford; Procyk, Emmanuel

    2013-01-01

    Converging evidence suggests that the medial prefrontal cortex (MPFC) is involved in feedback categorization, performance monitoring, and task monitoring, and may contribute to the online regulation of reinforcement learning (RL) parameters that would affect decision-making processes in the lateral prefrontal cortex (LPFC). Previous neurophysiological experiments have shown MPFC activities encoding error likelihood, uncertainty, and reward volatility, as well as neural responses categorizing different types of feedback, for instance, distinguishing between choice errors and execution errors. Rushworth and colleagues have proposed that the involvement of MPFC in tracking the volatility of the task could contribute to the regulation of one of the RL parameters, the learning rate. We extend this hypothesis by proposing that MPFC could contribute to the regulation of other RL parameters such as the exploration rate and default action values in case of task shifts. Here, we analyze the sensitivity to RL parameters of behavioral performance in two monkey decision-making tasks, one with a deterministic reward schedule and the other with a stochastic one. We show that there exist optimal parameter values specific to each of these tasks, that need to be found for optimal performance and that are usually hand-tuned in computational models. In contrast, automatic online regulation of these parameters using some heuristics can help produce a good, although non-optimal, behavioral performance in each task. We finally describe our computational model of MPFC-LPFC interaction used for online regulation of the exploration rate and its application to a human-robot interaction scenario. There, unexpected uncertainties are produced by the human introducing cued task changes or by cheating. The model enables the robot to autonomously learn to reset exploration in response to such uncertain cues and events. The combined results provide concrete evidence specifying how prefrontal
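    The general idea of online regulation of an RL parameter can be sketched with a toy bandit in which the softmax inverse temperature (i.e., the exploration rate) is adjusted from a running estimate of reward-prediction-error magnitude, so that a task shift transiently increases exploration. The specific heuristic and constants below are assumptions chosen for illustration, not the authors' MPFC-LPFC model.

```python
import numpy as np

def run_bandit(n_arms=4, n_trials=500, alpha=0.1, rng=np.random.default_rng(1)):
    """Toy bandit with an exploration rate regulated online from volatility."""
    Q = np.zeros(n_arms)
    volatility = 0.5                  # running average of |RPE|
    beta = 2.0                        # softmax inverse temperature
    true_p = rng.random(n_arms)       # hidden reward probabilities
    for t in range(n_trials):
        if t == n_trials // 2:        # task shift: re-draw reward probabilities
            true_p = rng.random(n_arms)
        probs = np.exp(beta * Q)
        probs /= probs.sum()
        a = rng.choice(n_arms, p=probs)
        r = float(rng.random() < true_p[a])
        rpe = r - Q[a]
        Q[a] += alpha * rpe
        volatility += 0.05 * (abs(rpe) - volatility)
        beta = 5.0 / (1.0 + 4.0 * volatility)   # high volatility -> more exploration
    return beta

print(run_bandit())   # inverse temperature recovered after the task shift
```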

  17. Structured Kernel Subspace Learning for Autonomous Robot Navigation.

    Science.gov (United States)

    Kim, Eunwoo; Choi, Sungjoon; Oh, Songhwai

    2018-02-14

    This paper considers two important problems for autonomous robot navigation in a dynamic environment, where the goal is to predict pedestrian motion and control a robot with the prediction for safe navigation. While there are several methods for predicting the motion of a pedestrian and controlling a robot to avoid incoming pedestrians, it is still difficult to navigate safely in a dynamic environment due to challenges such as the varying quality and complexity of training data with unwanted noise. This paper addresses these challenges simultaneously by proposing a robust kernel subspace learning algorithm based on recent advances in nuclear-norm and l1-norm minimization. We model the motion of a pedestrian and the robot controller using Gaussian processes. The proposed method efficiently approximates a kernel matrix used in Gaussian process regression by learning a low-rank structured matrix (symmetric and positive semi-definite) to find an orthogonal basis, which eliminates the effects of erroneous and inconsistent data. Based on structured kernel subspace learning, we propose a robust motion model and motion controller for safe navigation in dynamic environments. We evaluate the proposed robust kernel learning in various tasks, including regression, motion prediction, and motion control problems, and demonstrate that the proposed learning-based systems are robust against outliers and outperform existing regression and navigation methods.
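    The structural idea at the core of the method, replacing a noisy kernel matrix by a low-rank, symmetric positive semi-definite approximation whose eigenvectors form an orthogonal basis, can be sketched as follows. The paper obtains this structure through nuclear-norm and l1-norm minimization for robustness to outliers; the plain eigenvalue truncation used here is only a simpler stand-in for that step.

```python
import numpy as np

def rbf_kernel(X, lengthscale=1.0):
    """Squared-exponential (RBF) kernel matrix for the rows of X."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def lowrank_psd_approx(K, rank):
    """Rank-limited symmetric PSD approximation of K via eigendecomposition."""
    vals, vecs = np.linalg.eigh(K)
    idx = np.argsort(vals)[::-1][:rank]
    vals = np.clip(vals[idx], 0.0, None)   # enforce positive semi-definiteness
    U = vecs[:, idx]                       # orthogonal basis of the subspace
    return (U * vals) @ U.T

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2))
noise = 0.05 * rng.standard_normal((50, 50))
K_noisy = rbf_kernel(X) + 0.5 * (noise + noise.T)   # symmetric perturbation
K_hat = lowrank_psd_approx(K_noisy, rank=10)
print(np.linalg.norm(rbf_kernel(X) - K_hat) / np.linalg.norm(rbf_kernel(X)))
```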

  18. Effect of Robotics-Enhanced Inquiry-Based Learning in Elementary Science Education in South Korea

    Science.gov (United States)

    Park, Jungho

    2015-01-01

    Much research has been conducted in educational robotics, a new instructional technology, for K-12 education. However, there are arguments about the effect of robotics and limited empirical evidence to investigate the impact of robotics on science learning. Also, most robotics studies were carried out in an informal educational setting. This study…

  19. Reinforcement Learning in the Game of Othello: Learning Against a Fixed Opponent and Learning from Self-Play

    NARCIS (Netherlands)

    van der Ree, Michiel; Wiering, Marco

    2013-01-01

    This paper compares three strategies in using reinforcement learning algorithms to let an artificial agent learn to play the game of Othello. The three strategies that are compared are: learning by self-play, learning from playing against a fixed opponent, and learning from playing against a fixed

  20. Spatial Ability Learning through Educational Robotics

    Science.gov (United States)

    Julià, Carme; Antolí, Juan Òscar

    2016-01-01

    Several authors insist on the importance of students' acquisition of spatial abilities and visualization in order to have academic success in areas such as science, technology or engineering. This paper proposes to discuss and analyse the use of educational robotics to develop spatial abilities in 12 year old students. First of all, a course to…

  1. Memetic Engineering as a Basis for Learning in Robotic Communities

    Science.gov (United States)

    Truszkowski, Walter F.; Rouff, Christopher; Akhavannik, Mohammad H.

    2014-01-01

    This paper represents a new contribution to the growing literature on memes. While most memetic thought has been focused on its implications on humans, this paper speculates on the role that memetics can have on robotic communities. Though speculative, the concepts are based on proven advanced multi agent technology work done at NASA - Goddard Space Flight Center and Lockheed Martin. The paper is composed of the following sections : 1) An introductory section which gently leads the reader into the realm of memes. 2) A section on memetic engineering which addresses some of the central issues with robotic learning via memes. 3) A section on related work which very concisely identifies three other areas of memetic applications, i.e., news, psychology, and the study of human behaviors. 4) A section which discusses the proposed approach for realizing memetic behaviors in robots and robotic communities. 5) A section which presents an exploration scenario for a community of robots working on Mars. 6) A final section which discusses future research which will be required to realize a comprehensive science of robotic memetics.

  2. Learning probabilistic features for robotic navigation using laser sensors.

    Directory of Open Access Journals (Sweden)

    Fidel Aznar

    Full Text Available SLAM is a popular task used by robots and autonomous vehicles to build a map of an unknown environment and, at the same time, to determine their location within the map. This paper describes a SLAM-based, probabilistic robotic system able to learn the essential features of different parts of its environment. Some previous SLAM implementations had computational complexities ranging from O(N log N) to O(N²), where N is the number of map features. Unlike these methods, our approach reduces the computational complexity to O(N) by using a model to fuse the information from the sensors after applying the Bayesian paradigm. Once the training process is completed, the robot identifies and locates those areas that potentially match the sections that have been previously learned. After the training, the robot navigates and extracts a three-dimensional map of the environment using a single laser sensor. Thus, it perceives different sections of its world. In addition, in order to make our system able to be used in a low-cost robot, low-complexity algorithms that can be easily implemented on embedded processors or microcontrollers are used.

  3. Learning probabilistic features for robotic navigation using laser sensors.

    Science.gov (United States)

    Aznar, Fidel; Pujol, Francisco A; Pujol, Mar; Rizo, Ramón; Pujol, María-José

    2014-01-01

    SLAM is a popular task used by robots and autonomous vehicles to build a map of an unknown environment and, at the same time, to determine their location within the map. This paper describes a SLAM-based, probabilistic robotic system able to learn the essential features of different parts of its environment. Some previous SLAM implementations had computational complexities ranging from O(N log N) to O(N²), where N is the number of map features. Unlike these methods, our approach reduces the computational complexity to O(N) by using a model to fuse the information from the sensors after applying the Bayesian paradigm. Once the training process is completed, the robot identifies and locates those areas that potentially match the sections that have been previously learned. After the training, the robot navigates and extracts a three-dimensional map of the environment using a single laser sensor. Thus, it perceives different sections of its world. In addition, in order to make our system able to be used in a low-cost robot, low-complexity algorithms that can be easily implemented on embedded processors or microcontrollers are used.

  4. Learning alternative movement coordination patterns using reinforcement feedback.

    Science.gov (United States)

    Lin, Tzu-Hsiang; Denomme, Amber; Ranganathan, Rajiv

    2018-05-01

    One of the characteristic features of the human motor system is redundancy-i.e., the ability to achieve a given task outcome using multiple coordination patterns. However, once participants settle on using a specific coordination pattern, the process of learning to use a new alternative coordination pattern to perform the same task is still poorly understood. Here, using two experiments, we examined this process of how participants shift from one coordination pattern to another using different reinforcement schedules. Participants performed a virtual reaching task, where they moved a cursor to different targets positioned on the screen. Our goal was to make participants use a coordination pattern with greater trunk motion, and to this end, we provided reinforcement by making the cursor disappear if the trunk motion during the reach did not cross a specified threshold value. In Experiment 1, we compared two reinforcement schedules in two groups of participants-an abrupt group, where the threshold was introduced immediately at the beginning of practice; and a gradual group, where the threshold was introduced gradually with practice. Results showed that both abrupt and gradual groups were effective in shifting their coordination patterns to involve greater trunk motion, but the abrupt group showed greater retention when the reinforcement was removed. In Experiment 2, we examined the basis of this advantage in the abrupt group using two additional control groups. Results showed that the advantage of the abrupt group was because of a greater number of practice trials with the desired coordination pattern. Overall, these results show that reinforcement can be successfully used to shift coordination patterns, which has potential in the rehabilitation of movement disorders.

  5. Fast Conflict Resolution Based on Reinforcement Learning in Multi-agent System

    Institute of Scientific and Technical Information of China (English)

    PIAO Songhao; HONG Bingrong; CHU Haitao

    2004-01-01

    In a multi-agent system where each agent has a different goal (even when the team of agents has the same goal), agents must be able to resolve conflicts arising in the process of achieving their goals. Many researchers have presented methods for conflict resolution, e.g., Reinforcement learning (RL), but conventional RL requires a large computation cost because every agent must learn; at the same time, the overlap of actions selected by each agent results in local conflicts. Therefore, in this paper we propose a novel method to solve these problems. In order to deal with the conflict within the multi-agent system, the concept of a potential-field-function-based Action Selection Priority Level (ASPL) is introduced. In this method, all environmental factors that may influence the priority are efficiently computed with the potential field function, so the priority to access the local resource can be decided rapidly. By avoiding the complex coordination mechanisms used in general multi-agent systems, conflicts in the multi-agent system are settled more efficiently. Our system consists of an RL with ASPL module and a generalized rules module. Using ASPL, the RL module chooses a proper cooperative behavior, and the generalized rules module can accelerate the learning process. By applying the proposed method to Robot Soccer, the learning process can be accelerated. The results of simulation and real experiments indicate the effectiveness of the method.
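    The sketch below shows how a potential-field-style priority can be computed and compared cheaply so that only the highest-priority agent claims a shared resource, in the spirit of the ASPL idea. The particular attraction and repulsion terms and their weights are hypothetical; the abstract does not give the paper's exact potential function.

```python
import math

def priority(agent_pos, ball_pos, teammates, w_dist=1.0, w_crowd=0.5):
    """Illustrative potential-field priority: attraction to the shared resource
    (the ball) minus repulsion from nearby teammates."""
    d_ball = math.dist(agent_pos, ball_pos)
    crowding = sum(1.0 / (1e-6 + math.dist(agent_pos, p)) for p in teammates)
    return -w_dist * d_ball - w_crowd * crowding

agents = {"A": (1.0, 2.0), "B": (3.0, 1.0), "C": (0.5, 0.5)}
ball = (1.0, 1.0)
scores = {name: priority(pos, ball, [p for n, p in agents.items() if n != name])
          for name, pos in agents.items()}
print(max(scores, key=scores.get), "has the highest priority to take the ball")
```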

  6. Manufacturing Scheduling Using Colored Petri Nets and Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Maria Drakaki

    2017-02-01

    Full Text Available Agent-based intelligent manufacturing control systems are capable of efficiently responding and adapting to environmental changes. Manufacturing system adaptation and evolution can be addressed with learning mechanisms that increase the intelligence of agents. In this paper, a manufacturing scheduling method is presented based on Timed Colored Petri Nets (CTPNs) and reinforcement learning (RL). CTPNs model the manufacturing system and implement the scheduling. In the search for an optimal solution, a scheduling agent uses RL, in particular the Q-learning algorithm. A warehouse order-picking scheduling problem is presented as a case study to illustrate the method. The proposed scheduling method is compared to existing methods. Simulation and state space results are used to evaluate performance and identify system properties.

  7. An Improved Reinforcement Learning System Using Affective Factors

    Directory of Open Access Journals (Sweden)

    Takashi Kuremoto

    2013-07-01

    Full Text Available As a powerful and intelligent machine learning method, reinforcement learning (RL) has been widely used in many fields such as game theory, adaptive control, multi-agent systems, nonlinear forecasting, and so on. The main contribution of this technique is its exploration and exploitation approaches to find the optimal solution or a semi-optimal solution of goal-directed problems. However, when RL is applied to multi-agent systems (MASs), problems such as the “curse of dimensionality”, the “perceptual aliasing problem”, and uncertainty of the environment constitute high hurdles to RL. Meanwhile, although RL is inspired by behavioral psychology and reward/punishment from the environment is used, higher mental factors such as affects, emotions, and motivations are rarely adopted in the learning procedure of RL. In this paper, to address these challenges of agent learning in MASs, we propose a computational motivation function, which adopts the two principal affective factors “Arousal” and “Pleasure” of Russell’s circumplex model of affects, to improve the learning performance of a conventional RL algorithm named Q-learning (QL). Compared with the conventional QL, computer simulations of pursuit problems with static and dynamic preys were carried out, and the results showed that the proposed method gives agents a faster and more stable learning performance.
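    A minimal sketch of how an affective signal might enter a Q-learning update is shown below: an internal motivation value, built from an arousal term that tracks surprise and a pleasure term that tracks recent reward, is added to the external reward before the temporal-difference update. The combination rule and coefficients are illustrative assumptions, not the paper's motivation function.

```python
import numpy as np

def affective_q_update(Q, s, a, r, s_next, affect, alpha=0.1, gamma=0.95):
    """One Q-learning step with an added motivation signal derived from affect."""
    arousal, pleasure = affect
    motivation = 0.5 * pleasure + 0.5 * arousal        # hypothetical combination
    td = (r + motivation) + gamma * np.max(Q[s_next]) - Q[s, a]
    Q[s, a] += alpha * td
    arousal += 0.1 * (abs(td) - arousal)               # arousal follows surprise
    pleasure += 0.1 * (r - pleasure)                   # pleasure follows reward
    return Q, (arousal, pleasure)

Q = np.zeros((5, 2))                                   # 5 toy states, 2 actions
affect = (0.0, 0.0)
Q, affect = affective_q_update(Q, s=0, a=1, r=1.0, s_next=2, affect=affect)
print(Q[0, 1], affect)
```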

  8. Experiential Learning of Electronics Subject Matter in Middle School Robotics Courses

    Science.gov (United States)

    Rihtaršic, David; Avsec, Stanislav; Kocijancic, Slavko

    2016-01-01

    The purpose of this paper is to investigate whether the experiential learning of electronics subject matter is effective in the middle school open learning of robotics. Electronics is often ignored in robotics courses. Since robotics courses are typically comprised of computer-related subjects, and mechanical and electrical engineering, these…

  9. Future Challenges of Robotics and Artificial Intelligence in Nursing: What Can We Learn from Monsters in Popular Culture?

    Science.gov (United States)

    Erikson, Henrik; Salzmann-Erikson, Martin

    It is highly likely that artificial intelligence (AI) will be implemented in nursing robotics in various forms, both in medical and surgical robotic instruments and as different types of droids and humanoids, physical reinforcements, and animal/pet robots. Exploring and discussing AI and robotics in nursing and health care before these tools become commonplace is of great importance. We propose that monsters in popular culture might be studied with the hope of learning about situations and relationships that generate empathic capacities in their monstrous existences. The aim of the article is to introduce the theoretical framework and assumptions behind this idea. Both robots and monsters are posthuman creations. The knowledge we present here gives ideas about how nursing science can address the postmodern, technologic, and global world to come. Monsters therefore serve as an entrance to explore technologic innovations such as AI. Analyzing when and why monsters step out of character can provide important insights into the conceptualization of caring and nursing as a science, which is important for discussing these empathic protocols, as well as more general insight into human knowledge. The relationship between caring, monsters, robotics, and AI is not as farfetched as it might seem at first glance.

  10. Optimal control in microgrid using multi-agent reinforcement learning.

    Science.gov (United States)

    Li, Fu-Dong; Wu, Min; He, Yong; Chen, Xin

    2012-11-01

    This paper presents an improved reinforcement learning method to minimize electricity costs on the premise of satisfying the power balance and generation limits of units in a microgrid with grid-connected mode. Firstly, the microgrid control requirements are analyzed and the objective function of optimal control for the microgrid is proposed. Then, a state variable "Average Electricity Price Trend", which is used to express the most probable transitions of the system, is developed so as to reduce the complexity and randomness of the microgrid, and a multi-agent architecture including agents, state variables, action variables and reward function is formulated. Furthermore, dynamic hierarchical reinforcement learning, based on the change rate of a key state variable, is established to carry out optimal policy exploration. The analysis shows that the proposed method is beneficial for handling the problem of the "curse of dimensionality" and speeds up learning in an unknown large-scale world. Finally, the simulation results under JADE (Java Agent Development Framework) demonstrate the validity of the presented method in optimal control for a microgrid with grid-connected mode. Copyright © 2012 ISA. Published by Elsevier Ltd. All rights reserved.

  11. Robot Learning from Demonstration: A Task-level Planning Approach

    Directory of Open Access Journals (Sweden)

    Staffan Ekvall

    2008-09-01

    Full Text Available In this paper, we deal with the problem of learning by demonstration, task-level learning and planning for robotic applications that involve object manipulation. Preprogramming robots for the execution of complex domestic tasks such as setting a dinner table is of little use, since the same order of subtasks may not be applicable at run time due to the changed state of the world. In our approach, we aim to learn the goal of the task and use a task planner to reach the goal given different initial states of the world. For some tasks, there are underlying constraints that must be fulfilled, and knowing just the final goal is not sufficient. We propose two techniques for constraint identification. In the first case, the teacher can directly instruct the system about the underlying constraints. In the second case, the constraints are identified by the robot itself based on multiple observations. The constraints are then considered in the planning phase, allowing the task to be executed without violating any of them. We evaluate our work on a real robot performing pick-and-place tasks.

  12. A Bayesian Developmental Approach to Robotic Goal-Based Imitation Learning.

    Directory of Open Access Journals (Sweden)

    Michael Jae-Yoon Chung

    Full Text Available A fundamental challenge in robotics today is building robots that can learn new skills by observing humans and imitating human actions. We propose a new Bayesian approach to robotic learning by imitation inspired by the developmental hypothesis that children use self-experience to bootstrap the process of intention recognition and goal-based imitation. Our approach allows an autonomous agent to: (i) learn probabilistic models of actions through self-discovery and experience, (ii) utilize these learned models for inferring the goals of human actions, and (iii) perform goal-based imitation for robotic learning and human-robot collaboration. Such an approach allows a robot to leverage its increasing repertoire of learned behaviors to interpret increasingly complex human actions and use the inferred goals for imitation, even when the robot has very different actuators from humans. We demonstrate our approach using two different scenarios: (i) a simulated robot that learns human-like gaze following behavior, and (ii) a robot that learns to imitate human actions in a tabletop organization task. In both cases, the agent learns a probabilistic model of its own actions, and uses this model for goal inference and goal-based imitation. We also show that the robotic agent can use its probabilistic model to seek human assistance when it recognizes that its inferred actions are too uncertain, risky, or impossible to perform, thereby opening the door to human-robot collaboration.

  13. Reinforcement Learning Based on the Bayesian Theorem for Electricity Markets Decision Support

    DEFF Research Database (Denmark)

    Sousa, Tiago; Pinto, Tiago; Praca, Isabel

    2014-01-01

    This paper presents the applicability of a reinforcement learning algorithm based on the application of the Bayesian theorem of probability. The proposed reinforcement learning algorithm is an advantageous and indispensable tool for ALBidS (Adaptive Learning strategic Bidding System), a multi...

  14. Reinforcement Learning Based Web Service Compositions for Mobile Business

    Science.gov (United States)

    Zhou, Juan; Chen, Shouming

    In this paper, we propose a new solution to Reactive Web Service Composition, via modeling with Reinforcement Learning, and introducing modified (alterable) QoS variables into the model as elements in the Markov Decision Process tuple. Moreover, we give an example of Reactive-WSC-based mobile banking to demonstrate the intrinsic capability of the solution in question of obtaining the optimized service composition, characterized by (alterable) target QoS variable sets with optimized values. Consequently, we come to the conclusion that the solution has strong potential for boosting customer experience and quality of service in Web Services, and in applications in the whole electronic commerce and business sector.

  15. Space Objects Maneuvering Detection and Prediction via Inverse Reinforcement Learning

    Science.gov (United States)

    Linares, R.; Furfaro, R.

    This paper determines the behavior of Space Objects (SOs) using inverse Reinforcement Learning (RL) to estimate the reward function that each SO is using for control. The approach discussed in this work can be used to analyze maneuvering of SOs from observational data. The inverse RL problem is solved using the Feature Matching approach. This approach determines the optimal reward function that a SO is using while maneuvering by assuming that the observed trajectories are optimal with respect to the SO's own reward function. This paper uses estimated orbital elements data to determine the behavior of SOs in a data-driven fashion.
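    The feature-matching idea can be sketched as an iterative adjustment of reward weights until the feature expectations of the behavior they induce match those of the observed trajectories. The rollout_features callable standing in for the full planning/propagation step, and the tiny synthetic check, are assumptions made only for illustration.

```python
import numpy as np

def feature_matching_irl(mu_expert, rollout_features, n_iter=50, eta=0.1,
                         rng=np.random.default_rng(0)):
    """Toy feature-matching inverse RL: find reward weights w such that the
    feature expectations of the induced behavior approach the observed ones.
    `rollout_features(w)` must return the feature-expectation vector of the
    behavior that is (approximately) optimal for the reward r(s) = w . phi(s)."""
    w = rng.standard_normal(mu_expert.shape)
    for _ in range(n_iter):
        mu_policy = rollout_features(w)
        w += eta * (mu_expert - mu_policy)   # move toward matching the features
        w /= max(np.linalg.norm(w), 1e-8)    # keep the weights bounded
    return w

# Tiny synthetic check: the "planner" just pushes features toward tanh(3w).
mu_expert = np.array([0.8, -0.2, 0.1])
print(feature_matching_irl(mu_expert, lambda w: np.tanh(3 * w)))
```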

  16. An iterative learning controller for nonholonomic mobile robots

    International Nuclear Information System (INIS)

    Oriolo, G.; Panzieri, S.; Ulivi, G.

    1998-01-01

    The authors present an iterative learning controller that applies to nonholonomic mobile robots, as well as other systems that can be put in chained form. The learning algorithm exploits the fact that chained-form systems are linear under piecewise-constant inputs. The proposed control scheme requires the execution of a small number of experiments to drive the system to the desired state in finite time, with nice convergence and robustness properties with respect to modeling inaccuracies as well as disturbances. To avoid the necessity of exactly reinitializing the system at each iteration, the basic method is modified so as to obtain a cyclic controller, by which the system is cyclically steered through an arbitrary sequence of states. As a case study, a carlike mobile robot is considered. Both simulation and experimental results are reported to show the performance of the method.
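    The trial-to-trial learning mechanism can be illustrated with a textbook P-type iterative learning control update on a scalar linear system, as sketched below: the same finite-time task is repeated, and the whole input profile is corrected between trials from the previous trial's tracking error. This is not the chained-form controller of the paper, only an illustration of the iteration principle.

```python
import numpy as np

def p_type_ilc(a=0.9, b=0.5, T=50, n_trials=30, gamma=0.8):
    """P-type ILC on x[t+1] = a*x[t] + b*u[t]:  u_{k+1}(t) = u_k(t) + gamma*e_k(t+1)."""
    ref = np.sin(np.linspace(0, np.pi, T + 1))    # reference trajectory to track
    u = np.zeros(T)
    for _ in range(n_trials):
        x = np.zeros(T + 1)                       # exact re-initialization each trial
        for t in range(T):
            x[t + 1] = a * x[t] + b * u[t]
        e = ref - x                               # tracking error of this trial
        u = u + gamma * e[1:]                     # learning update for the next trial
    return np.max(np.abs(e))                      # final-trial tracking error

print(p_type_ilc())   # converges since |1 - gamma*b| < 1
```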

  17. Cognitive-Developmental Learning for a Humanoid Robot: A Caregiver's Gift

    National Research Council Canada - National Science Library

    Arsenio, Artur M

    2004-01-01

    The goal of this work is to build a cognitive system for the humanoid robot, Cog, that exploits human caregivers as catalysts to perceive and learn about actions, objects, scenes, people, and the robot itself...

  18. Tackling Error Propagation through Reinforcement Learning: A Case of Greedy Dependency Parsing

    OpenAIRE

    Le, Minh; Fokkens, Antske

    2017-01-01

    Error propagation is a common problem in NLP. Reinforcement learning explores erroneous states during training and can therefore be more robust when mistakes are made early in a process. In this paper, we apply reinforcement learning to greedy dependency parsing which is known to suffer from error propagation. Reinforcement learning improves accuracy of both labeled and unlabeled dependencies of the Stanford Neural Dependency Parser, a high performance greedy parser, while maintaining its eff...

  19. Autonomous learning in humanoid robotics through mental imagery.

    Science.gov (United States)

    Di Nuovo, Alessandro G; Marocco, Davide; Di Nuovo, Santo; Cangelosi, Angelo

    2013-05-01

    In this paper we focus on modeling autonomous learning to improve the performance of a humanoid robot through a modular artificial neural network architecture. A model of a neural controller is presented, which allows the humanoid robot iCub to autonomously improve its sensorimotor skills. This is achieved by endowing the neural controller with a secondary neural system that, by exploiting the sensorimotor skills already acquired by the robot, is able to generate additional imaginary examples that can be used by the controller itself to improve performance through simulated mental training. Results and analysis presented in the paper provide evidence of the viability of the proposed approach and help to clarify the rationale behind the chosen model and its implementation. Copyright © 2012 Elsevier Ltd. All rights reserved.

  20. A Game Theoretic Approach to Swarm Robotics

    Directory of Open Access Journals (Sweden)

    S. N. Givigi

    2006-01-01

    Full Text Available In this article, we discuss some techniques for achieving swarm intelligent robots through the use of traits of personality. Traits of personality are characteristics of each robot that, altogether, define the robot's behaviours. We discuss the use of evolutionary psychology to select a set of traits of personality that will evolve due to a learning process based on reinforcement learning. The use of Game Theory is introduced, and some simulations showing its potential are reported.

  1. From free energy to expected energy: Improving energy-based value function approximation in reinforcement learning.

    Science.gov (United States)

    Elfwing, Stefan; Uchibe, Eiji; Doya, Kenji

    2016-12-01

    Free-energy based reinforcement learning (FERL) was proposed for learning in high-dimensional state and action spaces. However, the FERL method only really works well with binary, or close to binary, state input, where the number of active states is fewer than the number of non-active states. In the FERL method, the value function is approximated by the negative free energy of a restricted Boltzmann machine (RBM). In our earlier study, we demonstrated that the performance and the robustness of the FERL method can be improved by scaling the free energy by a constant that is related to the size of the network. In this study, we propose that RBM function approximation can be further improved by approximating the value function by the negative expected energy (EERL), instead of the negative free energy, as well as being able to handle continuous state input. We validate our proposed method by demonstrating that EERL: (1) outperforms FERL, as well as standard neural network and linear function approximation, for three versions of a gridworld task with high-dimensional image state input; (2) achieves new state-of-the-art results in stochastic SZ-Tetris in both model-free and model-based learning settings; and (3) significantly outperforms FERL and standard neural network function approximation for a robot navigation task with raw and noisy RGB images as state input and a large number of actions. Copyright © 2016 The Author(s). Published by Elsevier Ltd. All rights reserved.
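    For a binary restricted Boltzmann machine with the usual energy function, the two value approximations can be written down directly, as sketched below: the negative free energy used in FERL versus the negative expected energy proposed here. The random, untrained weights and the omission of the scaling constant mentioned in the abstract are simplifications for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neg_free_energy(x, W, b, c):
    """-F(x): value approximation used in FERL for a binary RBM with
    visible input x, weights W, visible bias b and hidden bias c."""
    h_in = c + W.T @ x
    return b @ x + np.sum(np.log1p(np.exp(h_in)))

def neg_expected_energy(x, W, b, c):
    """-<E(x,h)>: value approximation proposed in EERL, taking the expectation
    of the energy under the hidden-unit posterior instead of the free energy."""
    h_in = c + W.T @ x
    p_h = sigmoid(h_in)              # posterior means of the hidden units
    return b @ x + p_h @ h_in

rng = np.random.default_rng(0)
n_visible, n_hidden = 12, 6
W = 0.1 * rng.standard_normal((n_visible, n_hidden))
b = 0.1 * rng.standard_normal(n_visible)
c = 0.1 * rng.standard_normal(n_hidden)
x = rng.integers(0, 2, n_visible).astype(float)   # binary state-action input
print(neg_free_energy(x, W, b, c), neg_expected_energy(x, W, b, c))
```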

  2. Human likeness: cognitive and affective factors affecting adoption of robot-assisted learning systems

    Science.gov (United States)

    Yoo, Hosun; Kwon, Ohbyung; Lee, Namyeon

    2016-07-01

    With advances in robot technology, interest in robotic e-learning systems has increased. In some laboratories, experiments are being conducted with humanoid robots as artificial tutors because of their likeness to humans, the rich possibilities of using this type of media, and the multimodal interaction capabilities of these robots. The robot-assisted learning system, a special type of e-learning system, aims to increase the learner's concentration, pleasure, and learning performance dramatically. However, very few empirical studies have examined the effect on learning performance of incorporating humanoid robot technology into e-learning systems or people's willingness to accept or adopt robot-assisted learning systems. In particular, human likeness, the essential characteristic of humanoid robots as compared with conventional e-learning systems, has not been discussed in a theoretical context. Hence, the purpose of this study is to propose a theoretical model to explain the process of adoption of robot-assisted learning systems. In the proposed model, human likeness is conceptualized as a combination of media richness, multimodal interaction capabilities, and para-social relationships; these factors are considered as possible determinants of the degree to which human cognition and affection are related to the adoption of robot-assisted learning systems.

  3. Human reinforcement learning subdivides structured action spaces by learning effector-specific values.

    Science.gov (United States)

    Gershman, Samuel J; Pesaran, Bijan; Daw, Nathaniel D

    2009-10-28

    Humans and animals are endowed with a large number of effectors. Although this enables great behavioral flexibility, it presents an equally formidable reinforcement learning problem of discovering which actions are most valuable because of the high dimensionality of the action space. An unresolved question is how neural systems for reinforcement learning-such as prediction error signals for action valuation associated with dopamine and the striatum-can cope with this "curse of dimensionality." We propose a reinforcement learning framework that allows for learned action valuations to be decomposed into effector-specific components when appropriate to a task, and test it by studying to what extent human behavior and blood oxygen level-dependent (BOLD) activity can exploit such a decomposition in a multieffector choice task. Subjects made simultaneous decisions with their left and right hands and received separate reward feedback for each hand movement. We found that choice behavior was better described by a learning model that decomposed the values of bimanual movements into separate values for each effector, rather than a traditional model that treated the bimanual actions as unitary with a single value. A decomposition of value into effector-specific components was also observed in value-related BOLD signaling, in the form of lateralized biases in striatal correlates of prediction error and anticipatory value correlates in the intraparietal sulcus. These results suggest that the human brain can use decomposed value representations to "divide and conquer" reinforcement learning over high-dimensional action spaces.

  4. Robotics

    Science.gov (United States)

    Popov, E. P.; Iurevich, E. I.

    The history and the current status of robotics are reviewed, as are the design, operation, and principal applications of industrial robots. Attention is given to programmable robots, robots with adaptive control and elements of artificial intelligence, and remotely controlled robots. The applications of robots discussed include mechanical engineering, cargo handling during transportation and storage, mining, and metallurgy. The future prospects of robotics are briefly outlined.

  5. Reversal Learning Task in Children with Autism Spectrum Disorder: A Robot-Based Approach

    Science.gov (United States)

    Costescu, Cristina A.; Vanderborght, Bram; David, Daniel O.

    2015-01-01

    Children with autism spectrum disorder (ASD) engage in highly perseverative and inflexible behaviours. Technological tools, such as robots, received increased attention as social reinforces and/or assisting tools for improving the performance of children with ASD. The aim of our study is to investigate the role of the robotic toy Keepon in a…

  6. Reinforcement Learning in Distributed Domains: Beyond Team Games

    Science.gov (United States)

    Wolpert, David H.; Sill, Joseph; Turner, Kagan

    2000-01-01

    Distributed search algorithms are crucial in dealing with large optimization problems, particularly when a centralized approach is not only impractical but infeasible. Many machine learning concepts have been applied to search algorithms in order to improve their effectiveness. In this article we present an algorithm that blends Reinforcement Learning (RL) and hill climbing directly, by using the RL signal to guide the exploration step of a hill climbing algorithm. We apply this algorithm to the domain of a constellation of communication satellites, where the goal is to minimize the loss of importance-weighted data. We introduce the concept of 'ghost' traffic, where correctly setting this traffic induces the satellites to act to optimize the world utility. Our results indicate that the bi-utility search introduced in this paper outperforms both traditional hill climbing algorithms and distributed RL approaches such as team games.

  7. Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation.

    Science.gov (United States)

    Kato, Ayaka; Morita, Kenji

    2016-10-01

    It has been suggested that dopamine (DA) represents the reward-prediction-error (RPE) defined in reinforcement learning and therefore DA responds to unpredicted but not predicted reward. However, recent studies have found sustained DA responses towards predictable reward in tasks involving self-paced behavior, and suggested that this response represents a motivational signal. We have previously shown that RPE can be sustained if there is decay/forgetting of learned values, which can be implemented as decay of the synaptic strengths storing learned values. This account, however, did not explain the suggested link between tonic/sustained DA and motivation. In the present work, we explored the motivational effects of the value-decay in self-paced approach behavior, modeled as a series of 'Go' or 'No-Go' selections towards a goal. Through simulations, we found that the value-decay can enhance motivation, specifically, facilitate fast goal-reaching, albeit counterintuitively. Mathematical analyses revealed that the underlying potential mechanisms are twofold: (1) decay-induced sustained RPE creates a gradient of 'Go' values towards a goal, and (2) value contrasts between 'Go' and 'No-Go' are generated because while chosen values are continually updated, unchosen values simply decay. Our model provides potential explanations for the key experimental findings that suggest DA's roles in motivation: (i) slowdown of behavior by post-training blockade of DA signaling, (ii) observations that DA blockade severely impairs effortful actions to obtain rewards while largely sparing seeking of easily obtainable rewards, and (iii) relationships between the reward amount, the level of motivation reflected in the speed of behavior, and the average level of DA. These results indicate that reinforcement learning with value-decay, or forgetting, provides a parsimonious mechanistic account for DA's roles in value-learning and motivation. Our results also suggest that when biological systems for value-learning
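    A minimal sketch of the value-decay mechanism is shown below: tabular value learning on a linear track where all stored values decay multiplicatively at every step, so that reward-prediction errors along the path to the goal remain positive even after extensive training. The track task, constants, and decay form are illustrative assumptions rather than the authors' exact model.

```python
import numpy as np

def decay_value_learning(n_states=8, alpha=0.5, gamma=0.97, decay=0.99,
                         n_episodes=500):
    """TD value learning toward a goal with multiplicative forgetting of values."""
    V = np.zeros(n_states + 1)                      # state values along the track
    rpe_trace = []
    for _ in range(n_episodes):
        for s in range(n_states):
            r = 1.0 if s == n_states - 1 else 0.0   # reward only at the goal
            rpe = r + gamma * V[s + 1] - V[s]
            V[s] += alpha * rpe
            V *= decay                              # forgetting of all stored values
            rpe_trace.append(rpe)
    last_episode = np.array(rpe_trace[-n_states:])
    return V, float(np.mean(last_episode > 0.01))   # fraction of sustained RPEs

V, sustained = decay_value_learning()
print(np.round(V, 3), sustained)
```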

  8. Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation.

    Directory of Open Access Journals (Sweden)

    Ayaka Kato

    2016-10-01

    Full Text Available It has been suggested that dopamine (DA) represents the reward-prediction-error (RPE) defined in reinforcement learning and therefore DA responds to unpredicted but not predicted reward. However, recent studies have found sustained DA responses towards predictable reward in tasks involving self-paced behavior, and suggested that this response represents a motivational signal. We have previously shown that RPE can be sustained if there is decay/forgetting of learned values, which can be implemented as decay of the synaptic strengths storing learned values. This account, however, did not explain the suggested link between tonic/sustained DA and motivation. In the present work, we explored the motivational effects of the value-decay in self-paced approach behavior, modeled as a series of 'Go' or 'No-Go' selections towards a goal. Through simulations, we found that the value-decay can enhance motivation, specifically, facilitate fast goal-reaching, albeit counterintuitively. Mathematical analyses revealed that the underlying potential mechanisms are twofold: (1) decay-induced sustained RPE creates a gradient of 'Go' values towards a goal, and (2) value contrasts between 'Go' and 'No-Go' are generated because while chosen values are continually updated, unchosen values simply decay. Our model provides potential explanations for the key experimental findings that suggest DA's roles in motivation: (i) slowdown of behavior by post-training blockade of DA signaling, (ii) observations that DA blockade severely impairs effortful actions to obtain rewards while largely sparing seeking of easily obtainable rewards, and (iii) relationships between the reward amount, the level of motivation reflected in the speed of behavior, and the average level of DA. These results indicate that reinforcement learning with value-decay, or forgetting, provides a parsimonious mechanistic account for DA's roles in value-learning and motivation. Our results also suggest that when biological systems

  9. Reinforcement learning for dpm of embedded visual sensor nodes

    International Nuclear Information System (INIS)

    Khani, U.; Sadhayo, I. H.

    2014-01-01

    This paper proposes an RL (Reinforcement Learning) based DPM (Dynamic Power Management) technique to learn timeout policies during the operation of a visual sensor node that has multiple power/performance states. As opposed to the widely used static timeout policies, our proposed DPM policy, which is also referred to as OLTP (Online Learning of Time out Policies), learns to dynamically change the timeout decisions in the different node states, including the non-operational states. The selection of timeout values in the different power/performance states of a visual sensing platform is based on workload estimates derived from an ML-ANN (Multi-Layer Artificial Neural Network) and an objective function given by weighted performance and power parameters. The DPM approach is also able to dynamically adjust the power-performance weights online to satisfy a given constraint of either power consumption or performance. Results show that the proposed learning algorithm explores the power-performance tradeoff with non-stationary workload and outperforms other DPM policies. It also performs the online adjustment of the tradeoff parameters in order to meet a user-specified constraint. (author)

  10. An appraisal of the learning curve in robotic general surgery.

    Science.gov (United States)

    Pernar, Luise I M; Robertson, Faith C; Tavakkoli, Ali; Sheu, Eric G; Brooks, David C; Smink, Douglas S

    2017-11-01

    Robotic-assisted surgery is used with increasing frequency in general surgery for a variety of applications. In spite of this increase in usage, the learning curve is not yet defined. This study reviews the literature on the learning curve in robotic general surgery to inform adopters of the technology. PubMed and EMBASE searches yielded 3690 abstracts published between July 1986 and March 2016. The abstracts were evaluated based on the following inclusion criteria: written in English, reporting original work, focus on general surgery operations, and with explicit statistical methods. Twenty-six full-length articles were included in final analysis. The articles described the learning curves in colorectal (9 articles, 35%), foregut/bariatric (8, 31%), biliary (5, 19%), and solid organ (4, 15%) surgery. Eighteen of 26 (69%) articles report single-surgeon experiences. Time was used as a measure of the learning curve in all studies (100%); outcomes were examined in 10 (38%). In 12 studies (46%), the authors identified three phases of the learning curve. Numbers of cases needed to achieve plateau performance were wide-ranging but overlapping for different kinds of operations: 19-128 cases for colorectal, 8-95 for foregut/bariatric, 20-48 for biliary, and 10-80 for solid organ surgery. Although robotic surgery is increasingly utilized in general surgery, the literature provides few guidelines on the learning curve for adoption. In this heterogeneous sample of reviewed articles, the number of cases needed to achieve plateau performance varies by case type and the learning curve may have multiple phases as surgeons add more complex cases to their case mix with growing experience. Time is the most common determinant for the learning curve. The literature lacks a uniform assessment of outcomes and complications, which would arguably reflect expertise in a more meaningful way than time to perform the operation alone.

  11. Explicit and implicit reinforcement learning across the psychosis spectrum.

    Science.gov (United States)

    Barch, Deanna M; Carter, Cameron S; Gold, James M; Johnson, Sheri L; Kring, Ann M; MacDonald, Angus W; Pizzagalli, Diego A; Ragland, J Daniel; Silverstein, Steven M; Strauss, Milton E

    2017-07-01

    Motivational and hedonic impairments are core features of a variety of types of psychopathology. An important aspect of motivational function is reinforcement learning (RL), including implicit (i.e., outside of conscious awareness) and explicit (i.e., including explicit representations about potential reward associations) learning, as well as both positive reinforcement (learning about actions that lead to reward) and punishment (learning to avoid actions that lead to loss). Here we present data from paradigms designed to assess both positive and negative components of both implicit and explicit RL, examine performance on each of these tasks among individuals with schizophrenia, schizoaffective disorder, and bipolar disorder with psychosis, and examine their relative relationships to specific symptom domains transdiagnostically. None of the diagnostic groups differed significantly from controls on the implicit RL tasks in either bias toward a rewarded response or bias away from a punished response. However, on the explicit RL task, both the individuals with schizophrenia and schizoaffective disorder performed significantly worse than controls, but the individuals with bipolar did not. Worse performance on the explicit RL task, but not the implicit RL task, was related to worse motivation and pleasure symptoms across all diagnostic categories. Performance on explicit RL, but not implicit RL, was related to working memory, which accounted for some of the diagnostic group differences. However, working memory did not account for the relationship of explicit RL to motivation and pleasure symptoms. These findings suggest transdiagnostic relationships across the spectrum of psychotic disorders between motivation and pleasure impairments and explicit RL. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  12. Identifying Factors Reinforcing Robotization: Interactive Forces of Employment, Working Hour and Wage

    Directory of Open Access Journals (Sweden)

    Joonmo Cho

    2018-02-01

    Full Text Available Unlike previous studies on robotization, which approach the future based on cutting-edge technologies and adopt a framework where robotization is considered an exogenous variable, this study considers that robotization occurs endogenously and uses it as a dependent variable for an objective examination of the effect of robotization on the labor market. To this end, a robotization indicator is created based on the actual number of industrial robots currently deployed in workplaces, and a multiple regression analysis is performed using the robotization indicator and labor variables such as employment, working hours, and wages. The results of the multiple regression considering the triangular relationship of employment, working hours, and wages show that job destruction due to robotization is not too remarkable yet. Our results show a complementary relation between employment and robotization, but a substituting relation between working hours and robotization. The results also demonstrate the effects of unions, the size of the company, and the proportion of production workers and simple labor workers, among other factors. These findings indicate that the degree of robotization may vary with many factors of the labor market. Limitations of this study and implications for future research are also discussed.

  13. Characterizing Reinforcement Learning Methods through Parameterized Learning Problems

    Science.gov (United States)

    2011-06-03

    extraneous. The agent could potentially adapt these representational aspects by applying methods from feature selection (Kolter and Ng, 2009; Petrik et al.).

  14. Toward cognitive robotics

    Science.gov (United States)

    Laird, John E.

    2009-05-01

    Our long-term goal is to develop autonomous robotic systems that have the cognitive abilities of humans, including communication, coordination, adapting to novel situations, and learning through experience. Our approach rests on the recent integration of the Soar cognitive architecture with both virtual and physical robotic systems. Soar has been used to develop a wide variety of knowledge-rich agents for complex virtual environments, including distributed training environments and interactive computer games. For development and testing in robotic virtual environments, Soar interfaces to a variety of robotic simulators and a simple mobile robot. We have recently made significant extensions to Soar that add new memories and new non-symbolic reasoning to Soar's original symbolic processing, which should significantly improve Soar's abilities for control of robots. These extensions include episodic memory, semantic memory, reinforcement learning, and mental imagery. Episodic memory and semantic memory support the learning and recalling of prior events and situations as well as facts about the world. Reinforcement learning provides the ability of the system to tune its procedural knowledge - knowledge about how to do things. Mental imagery supports the use of diagrammatic and visual representations that are critical to support spatial reasoning. We speculate on the future of unmanned systems and the need for cognitive robotics to support dynamic instruction and taskability.

  15. Self-Paced Prioritized Curriculum Learning With Coverage Penalty in Deep Reinforcement Learning.

    Science.gov (United States)

    Ren, Zhipeng; Dong, Daoyi; Li, Huaxiong; Chen, Chunlin

    2018-06-01

    In this paper, a new training paradigm is proposed for deep reinforcement learning using self-paced prioritized curriculum learning with coverage penalty. The proposed deep curriculum reinforcement learning (DCRL) makes the most of experience replay by adaptively selecting appropriate transitions from replay memory based on the complexity of each transition. The criteria of complexity in DCRL consist of self-paced priority as well as coverage penalty. The self-paced priority reflects the relationship between the temporal-difference error and the difficulty of the current curriculum, for sample efficiency. The coverage penalty is taken into account for sample diversity. In comparison with the deep Q network (DQN) and prioritized experience replay (PER) methods, the DCRL algorithm is evaluated on Atari 2600 games, and the experimental results show that DCRL outperforms DQN and PER on most of these games. Further results show that the proposed curriculum training paradigm of DCRL is also applicable and effective for other memory-based deep reinforcement learning approaches, such as double DQN and dueling network. All the experimental results demonstrate that DCRL can achieve improved training efficiency and robustness for deep reinforcement learning.
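    The two replay criteria described in the abstract can be sketched as a sampling distribution over the replay memory: a self-paced term that prefers transitions whose TD-error magnitude matches the current curriculum difficulty, minus a coverage penalty on transitions that have already been replayed often. The functional forms and constants below are assumptions made for illustration; the paper's definitions may differ.

```python
import numpy as np

def dcrl_sampling_scores(td_errors, replay_counts, difficulty=1.0,
                         lam=0.5, eps=1e-3):
    """Illustrative DCRL-style replay scores combining self-paced priority
    with a coverage penalty, normalized into a sampling distribution."""
    self_paced = np.exp(-((np.abs(td_errors) - difficulty) ** 2))
    coverage_penalty = lam * np.log1p(replay_counts)
    scores = np.maximum(self_paced - coverage_penalty, eps)
    return scores / scores.sum()

rng = np.random.default_rng(0)
td = rng.standard_normal(1000)              # stored TD errors of transitions
counts = rng.integers(0, 20, 1000)          # how often each one was replayed
p = dcrl_sampling_scores(td, counts, difficulty=0.8)
batch = rng.choice(1000, size=32, p=p, replace=False)   # minibatch indices
print(batch[:8])
```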

  16. Learning domain abstractions for long lived robots

    CSIR Research Space (South Africa)

    Rosman, Benjamin S

    2014-06-01

    Full Text Available the ability to continually learn from a lifetime of experience. Key to this is the ability to generalise from experiences and form representations which facilitate faster learning of new tasks, as well as the transfer of knowledge between different situations...

  17. Deep reinforcement learning for automated radiation adaptation in lung cancer.

    Science.gov (United States)

    Tseng, Huan-Hsin; Luo, Yi; Cui, Sunan; Chien, Jen-Tzung; Ten Haken, Randall K; Naqa, Issam El

    2017-12-01

    To investigate deep reinforcement learning (DRL) based on historical treatment plans for developing automated radiation adaptation protocols for non-small cell lung cancer (NSCLC) patients that aim to maximize tumor local control at reduced rates of radiation pneumonitis grade 2 (RP2). In a retrospective population of 114 NSCLC patients who received radiotherapy, a three-component neural-network framework was developed for deep reinforcement learning of dose fractionation adaptation. Large-scale patient characteristics included clinical, genetic, and imaging radiomics features in addition to tumor and lung dosimetric variables. First, a generative adversarial network (GAN) was employed to learn patient population characteristics necessary for DRL training from a relatively limited sample size. Second, a radiotherapy artificial environment (RAE) was reconstructed by a deep neural network (DNN) utilizing both original and synthetic data (by GAN) to estimate the transition probabilities for adaptation of personalized radiotherapy patients' treatment courses. Third, a deep Q-network (DQN) was applied to the RAE for choosing the optimal dose in a response-adapted treatment setting. This multicomponent reinforcement learning approach was benchmarked against real clinical decisions that were applied in an adaptive dose escalation clinical protocol, in which 34 patients were treated based on avid PET signal in the tumor and constrained by a 17.2% normal tissue complication probability (NTCP) limit for RP2. The uncomplicated cure probability (P+) was used as a baseline reward function in the DRL. Taking our adaptive dose escalation protocol as a blueprint for the proposed DRL (GAN + RAE + DQN) architecture, we obtained an automated dose adaptation estimate for use at ∼2/3 of the way into the radiotherapy treatment course. By letting the DQN component freely control the estimated adaptive dose per fraction (ranging from 1-5 Gy), the DRL automatically favored dose

  18. "Notice of Violation of IEEE Publication Principles" Multiobjective Reinforcement Learning: A Comprehensive Overview.

    Science.gov (United States)

    Liu, Chunming; Xu, Xin; Hu, Dewen

    2013-04-29

    Reinforcement learning is a powerful mechanism for enabling agents to learn in an unknown environment, and most reinforcement learning algorithms aim to maximize some numerical value, which represents only one long-term objective. However, multiple long-term objectives are exhibited in many real-world decision and control problems; therefore, recently, there has been growing interest in solving multiobjective reinforcement learning (MORL) problems with multiple conflicting objectives. The aim of this paper is to present a comprehensive overview of MORL. In this paper, the basic architecture, research topics, and naive solutions of MORL are introduced at first. Then, several representative MORL approaches and some important directions of recent research are reviewed. The relationships between MORL and other related research are also discussed, which include multiobjective optimization, hierarchical reinforcement learning, and multi-agent reinforcement learning. Finally, research challenges and open problems of MORL techniques are highlighted.

  19. Reinforcement Learning for Routing in Cognitive Radio Ad Hoc Networks

    Directory of Open Access Journals (Sweden)

    Hasan A. A. Al-Rawi

    2014-01-01

    Full Text Available Cognitive radio (CR) enables unlicensed users (or secondary users, SUs) to sense for and exploit underutilized licensed spectrum owned by the licensed users (or primary users, PUs). Reinforcement learning (RL) is an artificial intelligence approach that enables a node to observe, learn, and make appropriate decisions on action selection in order to maximize network performance. Routing enables a source node to search for a least-cost route to its destination node. While there have been increasing efforts to enhance the traditional RL approach for routing in wireless networks, this research area remains largely unexplored in the domain of routing in CR networks. This paper applies RL in routing and investigates the effects of various features of RL (i.e., reward function, exploitation, and exploration, as well as learning rate) through simulation. New approaches and recommendations are proposed to enhance the features in order to improve the network performance brought about by RL to routing. Simulation results show that the RL parameters of the reward function, exploitation, and exploration, as well as learning rate, must be well regulated, and that the new approaches proposed in this paper improve SUs’ network performance without significantly jeopardizing PUs’ network performance, specifically SUs’ interference to PUs.

  20. Learning to locate an odour source with a mobile robot

    OpenAIRE

    Duckett, T.; Axelsson, M.; Saffiotti, A.

    2001-01-01

    We address the problem of enabling a mobile robot to locate a stationary odour source using an electronic nose constructed from gas sensors. On the hardware side, we use a stereo nose architecture consisting of two parallel chambers, each containing an identical set of sensors. On the software side, we use a recurrent artificial neural network to learn the direction to a stationary source from a time series of sensor readings. This contrasts with previous approaches, that rely on the existenc...

  1. Mobile Robot Navigation Based on Q-Learning Technique

    Directory of Open Access Journals (Sweden)

    Lazhar Khriji

    2011-03-01

    Full Text Available This paper shows how a Q-learning approach can be used successfully to deal with the problem of mobile robot navigation. In real situations where a large number of obstacles are involved, the normal Q-learning approach would encounter two major problems due to an excessively large state space. First, learning the Q-values in tabular form may be infeasible because of the excessive amount of memory needed to store the table. Second, rewards in the state space may be so sparse that with random exploration they will only be discovered extremely slowly. In this paper, we propose a navigation approach for mobile robots, in which prior knowledge is used within Q-learning. We address the issue of individual behavior design using fuzzy logic. The behavior-based navigation strategy reduces the complexity of the navigation problem by dividing it into small actions that are easier to design and implement. The Q-Learning algorithm is applied to coordinate between these behaviors, which greatly reduces learning convergence time. Simulation and experimental results confirm the convergence to the desired results in terms of saved time and computational resources.
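
    The record combines hand-designed behaviors with Q-learning for coordination. A minimal sketch of that coordination idea follows: tabular Q-learning selects which of two pre-designed behaviors to activate in each coarse state. The two-bit state, the toy behavior simulators, and the reward values are illustrative assumptions, not the paper's fuzzy-logic behaviors.

    # Hedged sketch of behaviour coordination with tabular Q-learning: instead of
    # learning low-level motor commands, the agent learns which pre-designed
    # behaviour ("go_to_goal" or "avoid_obstacle") to activate in each coarse
    # state.  The 2-bit state (obstacle ahead?, goal ahead?) is an assumption.
    import random

    BEHAVIOURS = ("go_to_goal", "avoid_obstacle")

    def simulate(state, behaviour):
        obstacle, goal_ahead = state
        if behaviour == "go_to_goal":
            if obstacle:            # walked into the obstacle
                return (1, goal_ahead), -1.0, False
            return (random.randint(0, 1), 1), (1.0 if goal_ahead else 0.0), goal_ahead == 1
        else:                       # avoid_obstacle: sidestep, small cost
            return (0, random.randint(0, 1)), -0.1, False

    def learn(episodes=3000, alpha=0.1, gamma=0.9, eps=0.1):
        Q = {(o, g): [0.0, 0.0] for o in (0, 1) for g in (0, 1)}
        for _ in range(episodes):
            state, done, steps = (random.randint(0, 1), 0), False, 0
            while not done and steps < 50:
                a = random.randrange(2) if random.random() < eps else \
                    max(range(2), key=lambda i: Q[state][i])
                nxt, r, done = simulate(state, BEHAVIOURS[a])
                Q[state][a] += alpha * (r + gamma * max(Q[nxt]) - Q[state][a])
                state, steps = nxt, steps + 1
        return Q

    for s, q in learn().items():
        print(s, [round(v, 2) for v in q])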

  2. Interactive Rhythm Learning System by Combining Tablet Computers and Robots

    Directory of Open Access Journals (Sweden)

    Chien-Hsing Chou

    2017-03-01

    Full Text Available This study proposes a percussion learning device that combines tablet computers and robots. This device comprises two systems: a rhythm teaching system, in which users can compose and practice rhythms by using a tablet computer, and a robot performance system. First, teachers compose the rhythm training contents on the tablet computer. Then, the learners practice these percussion exercises by using the tablet computer and a small drum set. The teaching system provides a new and user-friendly score editing interface for composing a rhythm exercise. It also provides a rhythm rating function to facilitate percussion training for children and improve the stability of rhythmic beating. To encourage children to practice percussion exercises, a robotic performance system is used to interact with the children; this system can perform percussion exercises for students to listen to and then help them practice the exercise. This interaction enhances children’s interest and motivation to learn and practice rhythm exercises. The results of the experimental course and field trials reveal that the proposed system not only increases students’ interest and efficiency in learning but also helps them in understanding musical rhythms through interaction and composing simple rhythms.

  3. Serendipitous offline learning in a neuromorphic robot

    CSIR Research Space (South Africa)

    Stewart, TC

    2016-02-01

    Full Text Available ) by turning right if it is present at an intersection, and otherwise turning left. In general, this system can learn arbitrary relations between sensory input and motor behavior....

  4. The Development of a Robot-Based Learning Companion: A User-Centered Design Approach

    Science.gov (United States)

    Hsieh, Yi-Zeng; Su, Mu-Chun; Chen, Sherry Y.; Chen, Gow-Dong

    2015-01-01

    A computer-vision-based method is widely employed to support the development of a variety of applications. In this vein, this study uses a computer-vision-based method to develop a playful learning system, which is a robot-based learning companion named RobotTell. Unlike existing playful learning systems, a user-centered design (UCD) approach is…

  5. Cross-Situational Learning with Bayesian Generative Models for Multimodal Category and Word Learning in Robots

    Directory of Open Access Journals (Sweden)

    Akira Taniguchi

    2017-12-01

    Full Text Available In this paper, we propose a Bayesian generative model that can form multiple categories based on each sensory-channel and can associate words with any of the four sensory-channels (action, position, object, and color). This paper focuses on cross-situational learning using the co-occurrence between words and information of sensory-channels in complex situations rather than conventional situations of cross-situational learning. We conducted a learning scenario using a simulator and a real humanoid iCub robot. In the scenario, a human tutor provided a sentence that describes an object of visual attention and an accompanying action to the robot. The scenario was set as follows: the number of words per sensory-channel was three or four, and the number of trials for learning was 20 and 40 for the simulator and 25 and 40 for the real robot. The experimental results showed that the proposed method was able to estimate the multiple categorizations and to learn the relationships between multiple sensory-channels and words accurately. In addition, we conducted an action generation task and an action description task based on word meanings learned in the cross-situational learning scenario. The experimental results showed that the robot could successfully use the word meanings learned by using the proposed method.

  6. Learning-based identification and iterative learning control of direct-drive robots

    NARCIS (Netherlands)

    Bukkems, B.H.M.; Kostic, D.; Jager, de A.G.; Steinbuch, M.

    2005-01-01

    A combination of model-based and Iterative Learning Control is proposed as a method to achieve high-quality motion control of direct-drive robots in repetitive motion tasks. We include both model-based and learning components in the total control law, as their individual properties influence the

  7. Learning User Preferences in Ubiquitous Systems: A User Study and a Reinforcement Learning Approach

    OpenAIRE

    Zaidenberg , Sofia; Reignier , Patrick; Mandran , Nadine

    2010-01-01

    International audience; Our study concerns a virtual assistant, proposing services to the user based on its current perceived activity and situation (ambient intelligence). Instead of asking the user to define his preferences, we acquire them automatically using a reinforcement learning approach. Experiments showed that our system succeeded in learning user preferences. In order to validate the relevance and usability of such a system, we have first conducted a user study. 26 non-expert s...

  8. Learning to Run challenge solutions: Adapting reinforcement learning methods for neuromusculoskeletal environments

    OpenAIRE

    Kidziński, Łukasz; Mohanty, Sharada Prasanna; Ong, Carmichael; Huang, Zhewei; Zhou, Shuchang; Pechenko, Anton; Stelmaszczyk, Adam; Jarosik, Piotr; Pavlov, Mikhail; Kolesnikov, Sergey; Plis, Sergey; Chen, Zhibo; Zhang, Zhizheng; Chen, Jiale; Shi, Jun

    2018-01-01

    In the NIPS 2017 Learning to Run challenge, participants were tasked with building a controller for a musculoskeletal model to make it run as fast as possible through an obstacle course. Top participants were invited to describe their algorithms. In this work, we present eight solutions that used deep reinforcement learning approaches, based on algorithms such as Deep Deterministic Policy Gradient, Proximal Policy Optimization, and Trust Region Policy Optimization. Many solutions use similar ...

  9. Tackling Error Propagation through Reinforcement Learning: A Case of Greedy Dependency Parsing

    NARCIS (Netherlands)

    Le, M.N.; Fokkens, A.S.

    Error propagation is a common problem in NLP. Reinforcement learning explores erroneous states during training and can therefore be more robust when mistakes are made early in a process. In this paper, we apply reinforcement learning to greedy dependency parsing which is known to suffer from error

  10. Training and learning robotic surgery, time for a more structured approach: a systematic review

    NARCIS (Netherlands)

    Schreuder, H. W. R.; Wolswijk, R.; Zweemer, R. P.; Schijven, M. P.; Verheijen, R. H. M.

    2012-01-01

    Background: Robotic assisted laparoscopic surgery is growing rapidly and there is an increasing need for a structured approach to train future robotic surgeons. Objectives: To review the literature on training and learning strategies for robotic assisted laparoscopic surgery. Search strategy: A

  11. Embodied Computation: An Active-Learning Approach to Mobile Robotics Education

    Science.gov (United States)

    Riek, L. D.

    2013-01-01

    This paper describes a newly designed upper-level undergraduate and graduate course, Autonomous Mobile Robots. The course employs active, cooperative, problem-based learning and is grounded in the fundamental computational problems in mobile robotics defined by Dudek and Jenkin. Students receive a broad survey of robotics through lectures, weekly…

  12. Framework for Educational Robotics: A Multiphase Approach to Enhance User Learning in a Competitive Arena

    Science.gov (United States)

    Lye, Ngit Chan; Wong, Kok Wai; Chiou, Andrew

    2013-01-01

    Educational robotics involves using robots as an educational tool to provide a long-term, progressive learning activity that caters to different age groups. The current concern is that using robots in education should not be an instance of a one-off project for the sole purpose of participating in a competitive event. Instead, it should be a…

  13. What Pupils Can Learn from Working with Robotic Direct Manipulation Environments

    Science.gov (United States)

    Slangen, Lou; van Keulen, Hanno; Gravemeijer, Koeno

    2011-01-01

    This study investigates what pupils aged 10-12 can learn from working with robots, assuming that understanding robotics is a sign of technological literacy. We conducted cognitive and conceptual analysis to develop a frame of reference for determining pupils' understanding of robotics. Four perspectives were distinguished with increasing…

  14. What pupils can learn from working with robotic direct manipulation environments

    NARCIS (Netherlands)

    Lou Slangen; Hanno van Keulen; Koeno Gravemeijer

    2011-01-01

    This study investigates what pupils aged 10-12 can learn from working with robots, assuming that understanding robotics is a sign of technological literacy. We conducted cognitive and conceptual analysis to develop a frame of reference for determining pupils' understanding of robotics. Four

  15. What pupils can learn from working with robotic direct manipulation environments

    NARCIS (Netherlands)

    Lou Slangen; Hanno van Keulen; Koeno Gravemeijer

    2010-01-01

    This study investigates what pupils aged 10-12 can learn from working with robots, assuming that understanding robotics is a sign of technological literacy. We conducted cognitive and conceptual analysis to develop a frame of reference for determining pupils' understanding of robotics. Four

  16. Working Memory and Reinforcement Schedule Jointly Determine Reinforcement Learning in Children: Potential Implications for Behavioral Parent Training

    Directory of Open Access Journals (Sweden)

    Elien Segers

    2018-03-01

    Full Text Available Introduction: Behavioral Parent Training (BPT) is often provided for childhood psychiatric disorders. These disorders have been shown to be associated with working memory impairments. BPT is based on operant learning principles, yet how operant principles shape behavior (through the partial reinforcement (PRF) extinction effect, i.e., greater resistance to extinction that is created when behavior is reinforced partially rather than continuously) and the potential role of working memory therein is scarcely studied in children. This study explored the PRF extinction effect and the role of working memory therein using experimental tasks in typically developing children. Methods: Ninety-seven children (age 6–10) completed a working memory task and an operant learning task, in which children acquired a response-sequence rule under either continuous or PRF (120 trials), followed by an extinction phase (80 trials). Data of 88 children were used for analysis. Results: The PRF extinction effect was confirmed: We observed slower acquisition and extinction in the PRF condition as compared to the continuous reinforcement (CRF) condition. Working memory was negatively related to acquisition but not extinction performance. Conclusion: Both reinforcement contingencies and working memory relate to acquisition performance. Potential implications for BPT are that decreasing working memory load may enhance the chance of optimally learning through reinforcement.

  17. Task path planning, scheduling and learning for free-ranging robot systems

    Science.gov (United States)

    Wakefield, G. Steve

    1987-01-01

    The development of robotics applications for space operations is often restricted by the limited movement available to guided robots. Free ranging robots can offer greater flexibility than physically guided robots in these applications. Presented here is an object oriented approach to path planning and task scheduling for free-ranging robots that allows the dynamic determination of paths based on the current environment. The system also provides task learning for repetitive jobs. This approach provides a basis for the design of free-ranging robot systems which are adaptable to various environments and tasks.

  18. How we learn to make decisions: rapid propagation of reinforcement learning prediction errors in humans.

    Science.gov (United States)

    Krigolson, Olav E; Hassall, Cameron D; Handy, Todd C

    2014-03-01

    Our ability to make decisions is predicated upon our knowledge of the outcomes of the actions available to us. Reinforcement learning theory posits that actions followed by a reward or punishment acquire value through the computation of prediction errors-discrepancies between the predicted and the actual reward. A multitude of neuroimaging studies have demonstrated that rewards and punishments evoke neural responses that appear to reflect reinforcement learning prediction errors [e.g., Krigolson, O. E., Pierce, L. J., Holroyd, C. B., & Tanaka, J. W. Learning to become an expert: Reinforcement learning and the acquisition of perceptual expertise. Journal of Cognitive Neuroscience, 21, 1833-1840, 2009; Bayer, H. M., & Glimcher, P. W. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47, 129-141, 2005; O'Doherty, J. P. Reward representations and reward-related learning in the human brain: Insights from neuroimaging. Current Opinion in Neurobiology, 14, 769-776, 2004; Holroyd, C. B., & Coles, M. G. H. The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109, 679-709, 2002]. Here, we used the brain ERP technique to demonstrate that not only do rewards elicit a neural response akin to a prediction error but also that this signal rapidly diminished and propagated to the time of choice presentation with learning. Specifically, in a simple, learnable gambling task, we show that novel rewards elicited a feedback error-related negativity that rapidly decreased in amplitude with learning. Furthermore, we demonstrate the existence of a reward positivity at choice presentation, a previously unreported ERP component that has a similar timing and topography as the feedback error-related negativity that increased in amplitude with learning. The pattern of results we observed mirrored the output of a computational model that we implemented to compute reward
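
    The record compares ERP amplitudes with the output of a computational model that computes reward prediction errors. A hedged delta-rule sketch of such a model is shown below: the prediction error at feedback shrinks with learning while the value signal available at choice presentation grows. The learning rate, reward probabilities, and trial count are illustrative assumptions.

    # Hedged sketch of a simple prediction-error learner of the kind such studies
    # compare ERP amplitudes against: option values are updated with a delta rule,
    # so the feedback prediction error shrinks while the value signal available at
    # choice time grows.  All parameters are illustrative assumptions.
    import random

    def simulate(trials=200, alpha=0.2, p_reward=(0.8, 0.2)):
        values = [0.0, 0.0]                      # learned value of the two options
        feedback_pe, choice_value = [], []
        for _ in range(trials):
            choice = max(range(2), key=lambda i: values[i])   # greedy choice
            choice_value.append(values[choice])  # signal available at choice time
            reward = 1.0 if random.random() < p_reward[choice] else 0.0
            pe = reward - values[choice]         # prediction error at feedback
            feedback_pe.append(abs(pe))
            values[choice] += alpha * pe
        return feedback_pe, choice_value

    pe, cv = simulate()
    print("early feedback PE %.2f -> late %.2f" % (sum(pe[:20]) / 20, sum(pe[-20:]) / 20))
    print("early choice value %.2f -> late %.2f" % (sum(cv[:20]) / 20, sum(cv[-20:]) / 20))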

  19. Memory Transformation Enhances Reinforcement Learning in Dynamic Environments.

    Science.gov (United States)

    Santoro, Adam; Frankland, Paul W; Richards, Blake A

    2016-11-30

    Over the course of systems consolidation, there is a switch from a reliance on detailed episodic memories to generalized schematic memories. This switch is sometimes referred to as "memory transformation." Here we demonstrate a previously unappreciated benefit of memory transformation, namely, its ability to enhance reinforcement learning in a dynamic environment. We developed a neural network that is trained to find rewards in a foraging task where reward locations are continuously changing. The network can use memories for specific locations (episodic memories) and statistical patterns of locations (schematic memories) to guide its search. We find that switching from an episodic to a schematic strategy over time leads to enhanced performance due to the tendency for the reward location to be highly correlated with itself in the short-term, but regress to a stable distribution in the long-term. We also show that the statistics of the environment determine the optimal utilization of both types of memory. Our work recasts the theoretical question of why memory transformation occurs, shifting the focus from the avoidance of memory interference toward the enhancement of reinforcement learning across multiple timescales. As time passes, memories transform from a highly detailed state to a more gist-like state, in a process called "memory transformation." Theories of memory transformation speak to its advantages in terms of reducing memory interference, increasing memory robustness, and building models of the environment. However, the role of memory transformation from the perspective of an agent that continuously acts and receives reward in its environment is not well explored. In this work, we demonstrate a view of memory transformation that defines it as a way of optimizing behavior across multiple timescales. Copyright © 2016 the authors 0270-6474/16/3612228-15$15.00/0.

  20. Lessons learned over a decade of pediatric robotic ureteral reimplantation

    Directory of Open Access Journals (Sweden)

    Minki Baek

    2017-01-01

    Full Text Available The da Vinci robotic system has improved surgeon dexterity, ergonomics, and visualization to allow for a minimally invasive option for complex reconstructive procedures in children. Over the past decade, robot-assisted laparoscopic ureteral reimplantation (RALUR) has become a viable minimally invasive surgical option for pediatric vesicoureteral reflux (VUR). However, higher-than-expected complication rates and suboptimal reflux resolution rates at some centers have also been reported. The heterogeneity of surgical outcomes may arise from the inherent and underestimated complexity of the RALUR procedure that may justify its reclassification as a complex reconstructive procedure, especially for robotic surgeons early in their learning curve. Currently, no consensus exists on the role of RALUR for the surgical management of VUR. High success rates and low major complication rates are the expected norm for the current gold standard surgical option of open ureteral reimplantation. Similar to how robot-assisted laparoscopic surgery has gradually replaced open surgery as the most utilized option for prostatectomy in prostate cancer patients, RALUR may become a more widely utilized surgical option in children with VUR if standardized surgical techniques that have been associated with optimal outcomes can be adopted during the second decade of RALUR. A future standard of RALUR for children with VUR whose parents seek a minimally invasive surgical option can arise if widespread achievement of high success rates and low major complication rates can be obtained, similar to the replacement of open surgery with robot-assisted laparoscopic radical prostatectomy as the new standard for men with prostate cancer.

  1. GA-based fuzzy reinforcement learning for control of a magnetic bearing system.

    Science.gov (United States)

    Lin, C T; Jou, C P

    2000-01-01

    This paper proposes a TD (temporal difference) and GA (genetic algorithm)-based reinforcement (TDGAR) learning method and applies it to the control of a real magnetic bearing system. The TDGAR learning scheme is a new hybrid GA, which integrates the TD prediction method and the GA to perform the reinforcement learning task. The TDGAR learning system is composed of two integrated feedforward networks. One neural network acts as a critic network to guide the learning of the other network (the action network) which determines the outputs (actions) of the TDGAR learning system. The action network can be a normal neural network or a neural fuzzy network. Using the TD prediction method, the critic network can predict the external reinforcement signal and provide a more informative internal reinforcement signal to the action network. The action network uses the GA to adapt itself according to the internal reinforcement signal. The key concept of the TDGAR learning scheme is to formulate the internal reinforcement signal as the fitness function for the GA such that the GA can evaluate the candidate solutions (chromosomes) regularly, even during periods without external feedback from the environment. This enables the GA to proceed to new generations regularly without waiting for the arrival of the external reinforcement signal. This can usually accelerate the GA learning since a reinforcement signal may only be available at a time long after a sequence of actions has occurred in the reinforcement learning problem. The proposed TDGAR learning system has been used to control an active magnetic bearing (AMB) system in practice. A systematic design procedure is developed to achieve successful integration of all the subsystems including magnetic suspension, mechanical structure, and controller training. The results show that the TDGAR learning scheme can successfully find a neural controller or a neural fuzzy controller for a self-designed magnetic bearing system.
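
    The central TDGAR idea is that a critic converts sparse external reinforcement into a dense internal signal that serves as the GA's fitness function, so generations can proceed without waiting for external feedback. The sketch below illustrates only that idea: the one-dimensional toy plant, the hand-written critic, and the GA settings are illustrative assumptions, not the paper's neural or neuro-fuzzy networks.

    # Hedged sketch of the key TDGAR idea: a critic turns a sparse external reward
    # into a dense internal reinforcement signal, and a GA evaluates candidate
    # action policies (chromosomes) against that internal signal instead of
    # waiting for the external one.  Plant, critic, and GA settings are assumed.
    import random

    def critic(x):
        """Stand-in for the TD-trained critic: predicts how promising state x is."""
        return -abs(x)

    def rollout(gains, steps=50):
        """Simulate a toy 1-D plant x' = x + 0.1*(x + u) with u = -k1*x - k2*x^3."""
        x, internal = 1.0, 0.0
        for _ in range(steps):
            u = -gains[0] * x - gains[1] * (x ** 3)
            x = x + 0.1 * (x + u)
            internal += critic(x)                    # dense internal reinforcement
        external = 1.0 if abs(x) < 0.05 else 0.0     # sparse external reward
        return internal, external

    def ga(pop_size=30, generations=40, sigma=0.3):
        pop = [[random.uniform(0, 3), random.uniform(0, 3)] for _ in range(pop_size)]
        for _ in range(generations):
            scored = sorted(pop, key=lambda g: rollout(g)[0], reverse=True)
            elite = scored[: pop_size // 3]
            pop = [[g + random.gauss(0, sigma) for g in random.choice(elite)]
                   for _ in range(pop_size)]
            pop[: len(elite)] = elite                # keep elites unchanged
        best = max(pop, key=lambda g: rollout(g)[0])
        return best, rollout(best)

    print(ga())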

  2. Reinforcement learning techniques for controlling resources in power networks

    Science.gov (United States)

    Kowli, Anupama Sunil

    As power grids transition towards increased reliance on renewable generation, energy storage and demand response resources, an effective control architecture is required to harness the full functionalities of these resources. There is a critical need for control techniques that recognize the unique characteristics of the different resources and exploit the flexibility afforded by them to provide ancillary services to the grid. The work presented in this dissertation addresses these needs. Specifically, new algorithms are proposed, which allow control synthesis in settings wherein the precise distribution of the uncertainty and its temporal statistics are not known. These algorithms are based on recent developments in Markov decision theory, approximate dynamic programming and reinforcement learning. They impose minimal assumptions on the system model and allow the control to be "learned" based on the actual dynamics of the system. Furthermore, they can accommodate complex constraints such as capacity and ramping limits on generation resources, state-of-charge constraints on storage resources, comfort-related limitations on demand response resources and power flow limits on transmission lines. Numerical studies demonstrating applications of these algorithms to practical control problems in power systems are discussed. Results demonstrate how the proposed control algorithms can be used to improve the performance and reduce the computational complexity of the economic dispatch mechanism in a power network. We argue that the proposed algorithms are eminently suitable to develop operational decision-making tools for large power grids with many resources and many sources of uncertainty.

  3. A Reinforcement Learning Framework for Spiking Networks with Dynamic Synapses

    Directory of Open Access Journals (Sweden)

    Karim El-Laithy

    2011-01-01

    Full Text Available An integration of both the Hebbian-based and reinforcement learning (RL) rules is presented for dynamic synapses. The proposed framework permits the Hebbian rule to update the hidden synaptic model parameters regulating the synaptic response rather than the synaptic weights. This is performed using both the value and the sign of the temporal difference in the reward signal after each trial. Applying this framework, a spiking network with spike-timing-dependent synapses is tested to learn the exclusive-OR computation on a temporally coded basis. Reward values are calculated from the distance between the output spike train of the network and a reference target one. Results show that the network is able to capture the required dynamics and that the proposed framework can indeed reveal an integrated version of Hebbian and RL. The proposed framework is tractable and less computationally expensive. The framework is applicable to a wide class of synaptic models and is not restricted to the used neural representation. This generality, along with the reported results, supports adopting the introduced approach to benefit from the biologically plausible synaptic models in a wide range of intuitive signal processing.

  4. Reinforcement Learning for Predictive Analytics in Smart Cities

    Directory of Open Access Journals (Sweden)

    Kostas Kolomvatsos

    2017-06-01

    Full Text Available The digitization of our lives causes a shift in the data production as well as in the required data management. Numerous nodes are capable of producing huge volumes of data in our everyday activities. Sensors, personal smart devices as well as the Internet of Things (IoT) paradigm lead to a vast infrastructure that covers all the aspects of activities in modern societies. In most cases, the critical issue for public authorities (usually local, like municipalities) is the efficient management of data towards the support of novel services. The reason is that analytics provided on top of the collected data could help in the delivery of new applications that will facilitate citizens’ lives. However, the provision of analytics demands intelligent techniques for the underlying data management. The best-known technique is the separation of huge volumes of data into a number of parts and their parallel management to limit the required time for the delivery of analytics. Afterwards, analytics requests in the form of queries could be realized and derive the necessary knowledge for supporting intelligent applications. In this paper, we define the concept of a Query Controller (QC) that receives queries for analytics and assigns each of them to a processor placed in front of each data partition. We discuss an intelligent process for query assignments that adopts Machine Learning (ML). We adopt two learning schemes, i.e., Reinforcement Learning (RL) and clustering. We report on the comparison of the two schemes and elaborate on their combination. Our aim is to provide an efficient framework to support the decision making of the QC that should swiftly select the appropriate processor for each query. We provide mathematical formulations for the discussed problem and present simulation results. Through a comprehensive experimental evaluation, we reveal the advantages of the proposed models and describe the outcome results while comparing them with a
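
    The Query Controller (QC) assigns each incoming analytics query to one of several processors. A hedged sketch of a simple RL treatment of that assignment follows: an epsilon-greedy choice per query with reward equal to the negative observed latency. The simulated processor latencies and the update rule are illustrative assumptions, not the paper's exact RL or clustering formulation.

    # Hedged sketch of the query-assignment idea: the controller picks a processor
    # for each incoming query with an epsilon-greedy rule and updates a running
    # value estimate from the observed response time (reward = -latency).  The
    # simulated processor latencies are illustrative assumptions, not real data.
    import random

    PROCESSOR_SPEED = [1.0, 0.6, 1.4]          # assumed mean latencies per partition

    def observe_latency(processor):
        return random.expovariate(1.0 / PROCESSOR_SPEED[processor])

    def assign_queries(n_queries=5000, alpha=0.1, eps=0.1):
        value = [0.0] * len(PROCESSOR_SPEED)   # running estimate of -latency
        counts = [0] * len(PROCESSOR_SPEED)
        for _ in range(n_queries):
            if random.random() < eps:
                p = random.randrange(len(value))
            else:
                p = max(range(len(value)), key=lambda i: value[i])
            reward = -observe_latency(p)
            value[p] += alpha * (reward - value[p])
            counts[p] += 1
        return value, counts

    values, counts = assign_queries()
    print("estimated -latency per processor:", [round(v, 2) for v in values])
    print("assignment counts:", counts)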

  5. Longitudinal investigation on learned helplessness tested under negative and positive reinforcement involving stimulus control.

    Science.gov (United States)

    Oliveira, Emileane C; Hunziker, Maria Helena

    2014-07-01

    In this study, we investigated whether (a) animals demonstrating the learned helplessness effect during an escape contingency also show learning deficits under positive reinforcement contingencies involving stimulus control and (b) the exposure to positive reinforcement contingencies eliminates the learned helplessness effect under an escape contingency. Rats were initially exposed to controllable (C), uncontrollable (U) or no (N) shocks. After 24h, they were exposed to 60 escapable shocks delivered in a shuttlebox. In the following phase, we selected from each group the four subjects that presented the most typical group pattern: no escape learning (learned helplessness effect) in Group U and escape learning in Groups C and N. All subjects were then exposed to two phases: (1) positive reinforcement for lever pressing under a multiple FR/Extinction schedule and (2) a re-test under negative reinforcement (escape). A fourth group (n=4) was exposed only to the positive reinforcement sessions. All subjects showed discrimination learning under the multiple schedule. In the escape re-test, the learned helplessness effect was maintained for three of the animals in Group U. These results suggest that the learned helplessness effect did not extend to discriminative behavior that is positively reinforced and that the learned helplessness effect did not revert for most subjects after exposure to positive reinforcement. We discuss some theoretical implications as related to learned helplessness as an effect restricted to aversive contingencies and to the absence of reversion after positive reinforcement. Copyright © 2014. Published by Elsevier B.V.

  6. Reinforcement learning controller design for affine nonlinear discrete-time systems using online approximators.

    Science.gov (United States)

    Yang, Qinmin; Jagannathan, Sarangapani

    2012-04-01

    In this paper, reinforcement learning state- and output-feedback-based adaptive critic controller designs are proposed by using the online approximators (OLAs) for general multi-input multi-output affine unknown nonlinear discrete-time systems in the presence of bounded disturbances. The proposed controller design has two entities: an action network that is designed to produce an optimal signal and a critic network that evaluates the performance of the action network. The critic estimates the cost-to-go function, which is tuned online using recursive equations derived from heuristic dynamic programming. Here, neural networks (NNs) are used both for the action and critic whereas any OLAs, such as radial basis functions, splines, fuzzy logic, etc., can be utilized. For the output-feedback counterpart, an additional NN is designated as the observer to estimate the unavailable system states, and thus, the separation principle is not required. The NN weight tuning laws for the controller schemes are also derived while ensuring uniform ultimate boundedness of the closed-loop system using Lyapunov theory. Finally, the effectiveness of the two controllers is tested in simulation on a pendulum balancing system and a two-link robotic arm system.
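
    The controller described above pairs an action network with a critic that estimates the cost-to-go. A heavily simplified sketch of that structure on a scalar affine plant is shown below; the plant parameters, quadratic cost, and plain gradient updates are illustrative assumptions, and the paper's Lyapunov-based NN weight-tuning laws are not reproduced.

    # Hedged sketch of the action/critic structure: a critic estimates the
    # cost-to-go of the current state, and the actor is adjusted to reduce the
    # one-step cost plus the critic's estimate of the next state's cost.  The
    # scalar affine plant and quadratic cost are illustrative assumptions.
    import random

    A, B = 0.9, 0.2                      # assumed scalar affine plant x' = A x + B u

    def run(episodes=500, steps=30, gamma=0.95, lr_a=0.01, lr_c=0.05):
        w_c = 0.0                        # critic:  V(x) ~= w_c * x^2
        k = 0.0                          # actor:   u = -k * x
        for _ in range(episodes):
            x = random.uniform(-1, 1)
            for _ in range(steps):
                u = -k * x
                cost = x ** 2 + 0.1 * u ** 2
                x2 = A * x + B * u + random.gauss(0, 0.01)
                # critic: temporal-difference error on the cost-to-go estimate
                delta = cost + gamma * w_c * x2 ** 2 - w_c * x ** 2
                w_c += lr_c * delta * x ** 2
                # actor: gradient of (control cost + gamma*V(next)) w.r.t. gain k
                grad_k = 0.2 * u * (-x) + gamma * w_c * 2 * x2 * B * (-x)
                k -= lr_a * grad_k
                x = x2
        return k, w_c

    print("learned feedback gain and critic weight:", run())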

  7. Reinforcement Learning Based Novel Adaptive Learning Framework for Smart Grid Prediction

    Directory of Open Access Journals (Sweden)

    Tian Li

    2017-01-01

    Full Text Available The smart grid is a promising infrastructure to supply electricity demand to end users in a safe and reliable manner. With the rapid increase of the share of renewable energy and controllable loads in the smart grid, the operation uncertainty of the smart grid has increased rapidly in recent years. Accurate forecasting is essential for the safe and economical operation of the smart grid. However, most existing forecasting methods are unsuitable for the smart grid because they cannot adapt to its varying operational conditions. In this paper, reinforcement learning is first exploited to develop an online learning framework for the smart grid. With the capability of multi-time-scale resolution, a wavelet neural network has been adopted in the online learning framework to yield a reinforcement learning and wavelet neural network (RLWNN) based adaptive learning scheme. The simulations on two typical prediction problems in the smart grid, including wind power prediction and load forecast, validate the effectiveness and the scalability of the proposed RLWNN-based learning framework and algorithm.

  8. The combination of appetitive and aversive reinforcers and the nature of their interaction during auditory learning.

    Science.gov (United States)

    Ilango, A; Wetzel, W; Scheich, H; Ohl, F W

    2010-03-31

    Learned changes in behavior can be elicited by either appetitive or aversive reinforcers. It is, however, not clear whether the two types of motivation, (approaching appetitive stimuli and avoiding aversive stimuli) drive learning in the same or different ways, nor is their interaction understood in situations where the two types are combined in a single experiment. To investigate this question we have developed a novel learning paradigm for Mongolian gerbils, which not only allows rewards and punishments to be presented in isolation or in combination with each other, but also can use these opposite reinforcers to drive the same learned behavior. Specifically, we studied learning of tone-conditioned hurdle crossing in a shuttle box driven by either an appetitive reinforcer (brain stimulation reward) or an aversive reinforcer (electrical footshock), or by a combination of both. Combination of the two reinforcers potentiated speed of acquisition, led to maximum possible performance, and delayed extinction as compared to either reinforcer alone. Additional experiments, using partial reinforcement protocols and experiments in which one of the reinforcers was omitted after the animals had been previously trained with the combination of both reinforcers, indicated that appetitive and aversive reinforcers operated together but acted in different ways: in this particular experimental context, punishment appeared to be more effective for initial acquisition and reward more effective to maintain a high level of conditioned responses (CRs). The results imply that learning mechanisms in problem solving were maximally effective when the initial punishment of mistakes was combined with the subsequent rewarding of correct performance. Copyright 2010 IBRO. Published by Elsevier Ltd. All rights reserved.

  9. Dopamine-Dependent Reinforcement of Motor Skill Learning: Evidence from Gilles de la Tourette Syndrome

    Science.gov (United States)

    Palminteri, Stefano; Lebreton, Mael; Worbe, Yulia; Hartmann, Andreas; Lehericy, Stephane; Vidailhet, Marie; Grabli, David; Pessiglione, Mathias

    2011-01-01

    Reinforcement learning theory has been extensively used to understand the neural underpinnings of instrumental behaviour. A central assumption surrounds dopamine signalling reward prediction errors, so as to update action values and ensure better choices in the future. However, educators may share the intuitive idea that reinforcements not only…

  10. Off-policy reinforcement learning for H∞ control design.

    Science.gov (United States)

    Luo, Biao; Wu, Huai-Ning; Huang, Tingwen

    2015-01-01

    The H∞ control design problem is considered for nonlinear systems with an unknown internal system model. It is known that the nonlinear H∞ control problem can be transformed into solving the so-called Hamilton-Jacobi-Isaacs (HJI) equation, which is a nonlinear partial differential equation that is generally impossible to solve analytically. Even worse, model-based approaches cannot be used for approximately solving the HJI equation when an accurate system model is unavailable or costly to obtain in practice. To overcome these difficulties, an off-policy reinforcement learning (RL) method is introduced to learn the solution of the HJI equation from real system data instead of a mathematical system model, and its convergence is proved. In the off-policy RL method, the system data can be generated with arbitrary policies rather than the evaluating policy, which is extremely important and promising for practical systems. For implementation purposes, a neural network (NN)-based actor-critic structure is employed and a least-square NN weight update algorithm is derived based on the method of weighted residuals. Finally, the developed NN-based off-policy RL method is tested on a linear F16 aircraft plant, and further applied to a rotational/translational actuator system.

  11. Human-robot cooperative movement training: Learning a novel sensory motor transformation during walking with robotic assistance-as-needed

    Directory of Open Access Journals (Sweden)

    Benitez Raul

    2007-03-01

    Full Text Available Abstract Background A prevailing paradigm of physical rehabilitation following neurologic injury is to "assist-as-needed" in completing desired movements. Several research groups are attempting to automate this principle with robotic movement training devices and patient cooperative algorithms that encourage voluntary participation. These attempts are currently not based on computational models of motor learning. Methods Here we assume that motor recovery from a neurologic injury can be modelled as a process of learning a novel sensory motor transformation, which allows us to study a simplified experimental protocol amenable to mathematical description. Specifically, we use a robotic force field paradigm to impose a virtual impairment on the left leg of unimpaired subjects walking on a treadmill. We then derive an "assist-as-needed" robotic training algorithm to help subjects overcome the virtual impairment and walk normally. The problem is posed as an optimization of performance error and robotic assistance. The optimal robotic movement trainer becomes an error-based controller with a forgetting factor that bounds kinematic errors while systematically reducing its assistance when those errors are small. As humans have a natural range of movement variability, we introduce an error weighting function that causes the robotic trainer to disregard this variability. Results We experimentally validated the controller with ten unimpaired subjects by demonstrating how it helped the subjects learn the novel sensory motor transformation necessary to counteract the virtual impairment, while also preventing them from experiencing large kinematic errors. The addition of the error weighting function allowed the robot assistance to fade to zero even though the subjects' movements were variable. We also show that in order to assist-as-needed, the robot must relax its assistance at a rate faster than that of the learning human. Conclusion The assist

  12. Human-robot cooperative movement training: learning a novel sensory motor transformation during walking with robotic assistance-as-needed.

    Science.gov (United States)

    Emken, Jeremy L; Benitez, Raul; Reinkensmeyer, David J

    2007-03-28

    A prevailing paradigm of physical rehabilitation following neurologic injury is to "assist-as-needed" in completing desired movements. Several research groups are attempting to automate this principle with robotic movement training devices and patient cooperative algorithms that encourage voluntary participation. These attempts are currently not based on computational models of motor learning. Here we assume that motor recovery from a neurologic injury can be modelled as a process of learning a novel sensory motor transformation, which allows us to study a simplified experimental protocol amenable to mathematical description. Specifically, we use a robotic force field paradigm to impose a virtual impairment on the left leg of unimpaired subjects walking on a treadmill. We then derive an "assist-as-needed" robotic training algorithm to help subjects overcome the virtual impairment and walk normally. The problem is posed as an optimization of performance error and robotic assistance. The optimal robotic movement trainer becomes an error-based controller with a forgetting factor that bounds kinematic errors while systematically reducing its assistance when those errors are small. As humans have a natural range of movement variability, we introduce an error weighting function that causes the robotic trainer to disregard this variability. We experimentally validated the controller with ten unimpaired subjects by demonstrating how it helped the subjects learn the novel sensory motor transformation necessary to counteract the virtual impairment, while also preventing them from experiencing large kinematic errors. The addition of the error weighting function allowed the robot assistance to fade to zero even though the subjects' movements were variable. We also show that in order to assist-as-needed, the robot must relax its assistance at a rate faster than that of the learning human. The assist-as-needed algorithm proposed here can limit error during the learning of a
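
    The controller described in both records above is error-based with a forgetting factor and an error-weighting (deadband) term, so assistance fades when errors stay within natural movement variability. A hedged sketch of such an update law on a scalar tracking error is given below; the gains, forgetting factor, deadband width, and the simple model of human adaptation are illustrative assumptions.

    # Hedged sketch of an "assist-as-needed" update law of the type described:
    # robot assistance grows with (weighted) tracking error and decays through a
    # forgetting factor, so it fades to zero once the human keeps the error inside
    # a normal-variability deadband.  All gains and the error model are assumed.
    import random

    def error_weight(e, deadband=0.02):
        """Ignore errors within natural movement variability."""
        return 0.0 if abs(e) <= deadband else e

    def simulate(steps=200, forget=0.9, gain=0.5, learn_rate=0.15):
        assistance, human_comp = 0.0, 0.0      # robot force and human compensation
        impairment = 1.0                        # virtual force-field disturbance
        trace = []
        for _ in range(steps):
            # tracking error: disturbance minus what human + robot currently cancel
            error = impairment - human_comp - assistance + random.gauss(0, 0.01)
            # assist-as-needed law: forgetting factor * previous + gain * weighted error
            assistance = forget * assistance + gain * error_weight(error)
            # the human slowly learns to cancel the disturbance as well
            human_comp += learn_rate * error
            trace.append((round(error, 3), round(assistance, 3)))
        return trace

    for step in simulate()[::40]:
        print("error, assistance:", step)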

  13. Unimodal Learning Enhances Crossmodal Learning in Robotic Audio-Visual Tracking

    DEFF Research Database (Denmark)

    Shaikh, Danish; Bodenhagen, Leon; Manoonpong, Poramate

    2017-01-01

    Crossmodal sensory integration is a fundamental feature of the brain that aids in forming a coherent and unified representation of observed events in the world. Spatiotemporally correlated sensory stimuli brought about by rich sensorimotor experiences drive the development of crossmodal integrat...... a non-holonomic robotic agent towards a moving audio-visual target. Simulation results demonstrate that unimodal learning enhances crossmodal learning and improves both the overall accuracy and precision of multisensory orientation response.

  14. Unimodal Learning Enhances Crossmodal Learning in Robotic Audio-Visual Tracking

    DEFF Research Database (Denmark)

    Shaikh, Danish; Bodenhagen, Leon; Manoonpong, Poramate

    2018-01-01

    Crossmodal sensory integration is a fundamental feature of the brain that aids in forming a coherent and unified representation of observed events in the world. Spatiotemporally correlated sensory stimuli brought about by rich sensorimotor experiences drive the development of crossmodal integrat...... a non-holonomic robotic agent towards a moving audio-visual target. Simulation results demonstrate that unimodal learning enhances crossmodal learning and improves both the overall accuracy and precision of multisensory orientation response.

  15. Joint Extraction of Entities and Relations Using Reinforcement Learning and Deep Learning

    Directory of Open Access Journals (Sweden)

    Yuntian Feng

    2017-01-01

    Full Text Available We use both reinforcement learning and deep learning to simultaneously extract entities and relations from unstructured texts. For reinforcement learning, we model the task as a two-step decision process. Deep learning is used to automatically capture the most important information from unstructured texts, which represent the state in the decision process. By designing the reward function per step, our proposed method can pass the information of entity extraction to relation extraction and obtain feedback in order to extract entities and relations simultaneously. Firstly, we use bidirectional LSTM to model the context information, which realizes preliminary entity extraction. On the basis of the extraction results, attention based method can represent the sentences that include target entity pair to generate the initial state in the decision process. Then we use Tree-LSTM to represent relation mentions to generate the transition state in the decision process. Finally, we employ Q-Learning algorithm to get control policy π in the two-step decision process. Experiments on ACE2005 demonstrate that our method attains better performance than the state-of-the-art method and gets a 2.4% increase in recall-score.

  16. Joint Extraction of Entities and Relations Using Reinforcement Learning and Deep Learning.

    Science.gov (United States)

    Feng, Yuntian; Zhang, Hongjun; Hao, Wenning; Chen, Gang

    2017-01-01

    We use both reinforcement learning and deep learning to simultaneously extract entities and relations from unstructured texts. For reinforcement learning, we model the task as a two-step decision process. Deep learning is used to automatically capture the most important information from unstructured texts, which represent the state in the decision process. By designing the reward function per step, our proposed method can pass the information of entity extraction to relation extraction and obtain feedback in order to extract entities and relations simultaneously. Firstly, we use bidirectional LSTM to model the context information, which realizes preliminary entity extraction. On the basis of the extraction results, attention based method can represent the sentences that include target entity pair to generate the initial state in the decision process. Then we use Tree-LSTM to represent relation mentions to generate the transition state in the decision process. Finally, we employ Q-Learning algorithm to get control policy π in the two-step decision process. Experiments on ACE2005 demonstrate that our method attains better performance than the state-of-the-art method and gets a 2.4% increase in recall-score.

  17. Knowledge-Based Reinforcement Learning for Data Mining

    Science.gov (United States)

    Kudenko, Daniel; Grzes, Marek

    Data Mining is the process of extracting patterns from data. Two general avenues of research in the intersecting areas of agents and data mining can be distinguished. The first approach is concerned with mining an agent’s observation data in order to extract patterns, categorize environment states, and/or make predictions of future states. In this setting, data is normally available as a batch, and the agent’s actions and goals are often independent of the data mining task. The data collection is mainly considered as a side effect of the agent’s activities. Machine learning techniques applied in such situations fall into the class of supervised learning. In contrast, the second scenario occurs where an agent is actively performing the data mining, and is responsible for the data collection itself. For example, a mobile network agent is acquiring and processing data (where the acquisition may incur a certain cost), or a mobile sensor agent is moving in a (perhaps hostile) environment, collecting and processing sensor readings. In these settings, the tasks of the agent and the data mining are highly intertwined and interdependent (or even identical). Supervised learning is not a suitable technique for these cases. Reinforcement Learning (RL) enables an agent to learn from experience (in form of reward and punishment for explorative actions) and adapt to new situations, without a teacher. RL is an ideal learning technique for these data mining scenarios, because it fits the agent paradigm of continuous sensing and acting, and the RL agent is able to learn to make decisions on the sampling of the environment which provides the data. Nevertheless, RL still suffers from scalability problems, which have prevented its successful use in many complex real-world domains. The more complex the tasks, the longer it takes a reinforcement learning algorithm to converge to a good solution. For many real-world tasks, human expert knowledge is available. For example, human

  18. Robot Grasp Learning by Demonstration without Predefined Rules

    Directory of Open Access Journals (Sweden)

    César Fernández

    2011-12-01

    Full Text Available A learning-based approach to autonomous robot grasping is presented. Pattern recognition techniques are used to measure the similarity between a set of previously stored example grasps and all the possible candidate grasps for a new object. Two sets of features are defined in order to characterize grasps: point attributes describe the surroundings of a contact point; point-set attributes describe the relationship between the set of n contact points (assuming an n-fingered robot gripper is used). In the experiments performed, the nearest neighbour classifier outperforms other approaches like multilayer perceptrons, radial basis functions or decision trees, in terms of classification accuracy, while computational load is not excessive for a real time application (a grasp is fully synthesized in 0.2 seconds). The results obtained on a synthetic database show that the proposed system is able to imitate the grasping behaviour of the user (e.g. the system learns to grasp a mug by its handle). All the code has been made available for testing purposes.
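
    The record reports that a nearest-neighbour classifier over grasp features outperformed other classifiers. Below is a hedged sketch of that idea: candidate grasps are described by a short feature vector and scored by their distance to stored example grasps. The four toy features and the example data are illustrative assumptions, not the paper's point and point-set attributes.

    # Hedged sketch of the nearest-neighbour idea in the record: candidate grasps
    # are described by a fixed-length feature vector (four toy attributes standing
    # in for the point and point-set attributes) and labelled by their distance to
    # stored example grasps.  Features and example data are assumed, not real.
    import math

    # (curvature_at_contact, surface_normal_alignment, finger_spread, force_closure_score)
    EXAMPLES = [
        ((0.2, 0.9, 0.5, 0.8), "good"),
        ((0.1, 0.8, 0.6, 0.9), "good"),
        ((0.7, 0.3, 0.2, 0.2), "bad"),
        ((0.8, 0.2, 0.9, 0.1), "bad"),
    ]

    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def classify(candidate, k=1):
        ranked = sorted(EXAMPLES, key=lambda ex: distance(candidate, ex[0]))
        labels = [label for _, label in ranked[:k]]
        return max(set(labels), key=labels.count)

    def best_grasp(candidates):
        """Pick the candidate closest to any stored 'good' example."""
        good = [f for f, label in EXAMPLES if label == "good"]
        return min(candidates, key=lambda c: min(distance(c, g) for g in good))

    candidates = [(0.25, 0.85, 0.55, 0.75), (0.75, 0.25, 0.3, 0.15)]
    print(classify(candidates[0]), classify(candidates[1]))
    print("selected grasp:", best_grasp(candidates))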

  19. Learning Robotics in a Science Museum Theatre Play: Investigation of Learning Outcomes, Contexts and Experiences

    Science.gov (United States)

    Peleg, Ran; Baram-Tsabari, Ayelet

    2017-12-01

    Theatre is often introduced into science museums to enhance visitor experience. While learning in museum exhibitions has received considerable research attention, learning from museum theatre has not. The goal of this exploratory study was to investigate the potential educational role of a science museum theatre play. The study aimed to investigate (1) cognitive learning outcomes of the play, (2) how these outcomes interact with different viewing contexts and (3) experiential learning outcomes through the theatrical experience. The play 'Robot and I', addressing principles in robotics, was commissioned by a science museum. Data consisted of 391 questionnaires and interviews with 47 children and 20 parents. Findings indicate that explicit but not implicit learning goals were decoded successfully. There was little synergy between learning outcomes of the play and an exhibition on robotics, demonstrating the effect of two different physical contexts. Interview data revealed that prior knowledge, experience and interest played a major role in children's understanding of the play. Analysis of the theatrical experience showed that despite strong identification with the child protagonist, children often doubted the protagonist's knowledge, jeopardizing integration of scientific content. The study extends the empirical knowledge and theoretical thinking on museum theatre to better support claims of its virtues and respond to their criticism.

  20. Nonparametric bayesian reward segmentation for skill discovery using inverse reinforcement learning

    CSIR Research Space (South Africa)

    Ranchod, P

    2015-10-01

    Full Text Available We present a method for segmenting a set of unstructured demonstration trajectories to discover reusable skills using inverse reinforcement learning (IRL). Each skill is characterised by a latent reward function which the demonstrator is assumed...

  1. Perception-based Co-evolutionary Reinforcement Learning for UAV Sensor Allocation

    National Research Council Canada - National Science Library

    Berenji, Hamid

    2003-01-01

    .... A Perception-based reasoning approach based on co-evolutionary reinforcement learning was developed for jointly addressing sensor allocation on each individual UAV and allocation of a team of UAVs...

  2. Applying reinforcement learning to the weapon assignment problem in air defence

    CSIR Research Space (South Africa)

    Mouton, H

    2011-12-01

    Full Text Available The techniques investigated in this article were two methods from the machine-learning subfield of reinforcement learning (RL), namely a Monte Carlo (MC) control algorithm with exploring starts (MCES), and an off-policy temporal-difference (TD) learning...

  3. Robotics

    International Nuclear Information System (INIS)

    Scheide, A.W.

    1983-01-01

    This article reviews some of the technical areas and history associated with robotics, provides information relative to the formation of a Robotics Industry Committee within the Industry Applications Society (IAS), and describes how all activities relating to robotics will be coordinated within the IEEE. Industrial robots are being used for material handling, processes such as coating and arc welding, and some mechanical and electronics assembly. An industrial robot is defined as a programmable, multifunctional manipulator designed to move material, parts, tools, or specialized devices through variable programmed motions for a variety of tasks. The initial focus of the Robotics Industry Committee will be on the application of robotics systems to the various industries that are represented within the IAS

  4. Lessons Learned in Designing User-configurable Modular Robotics

    DEFF Research Database (Denmark)

    Lund, Henrik Hautop

    2013-01-01

    User-configurable robotics allows users to easily configure robotic systems to perform task-fulfilling behaviors as desired by the users. With a user-configurable robotic system, the user can easily modify the physical and functional aspect in terms of hardware and software components of a robotic...... with the semi-autonomous components of the user-configurable robotic system in interaction with the given environment. Components constituting such a user-configurable robotic system can be characterized as modules in a modular robotic system. Several factors in the definition and implementation

  5. A Case-Study for Life-Long Learning and Adaptation in Cooperative Robot Teams

    International Nuclear Information System (INIS)

    Parker, L.E.

    1999-01-01

    While considerable progress has been made in recent years toward the development of multi-robot teams, much work remains to be done before these teams are used widely in real-world applications. Two particular needs toward this end are the development of mechanisms that enable robot teams to generate cooperative behaviors on their own, and the development of techniques that allow these teams to autonomously adapt their behavior over time as the environment or the robot team changes. This paper proposes the use of the Cooperative Multi-Robot Observation of Multiple Moving Targets (CMOMMT) application as a rich domain for studying the issues of multi-robot learning and adaptation. After discussing the need for learning and adaptation in multi-robot teams, this paper describes the CMOMMT application and its relevance to multi-robot learning. We discuss the results of the previously-developed, hand-generated algorithm for CMOMMT and the potential for learning that was discovered from the hand-generated approach. We then describe the early work that has been done (by us and others) to generate multi-robot learning techniques for the CMOMMT application, as well as our ongoing research to develop approaches that give performance as good as, or better than, the hand-generated approach. The ultimate goal of this research is to develop techniques for multi-robot learning and adaptation in the CMOMMT application domain that will generalize to cooperative robot applications in other domains, thus making the practical use of multi-robot teams in a wide variety of real-world applications much closer to reality

  6. Place preference and vocal learning rely on distinct reinforcers in songbirds.

    Science.gov (United States)

    Murdoch, Don; Chen, Ruidong; Goldberg, Jesse H

    2018-04-30

    In reinforcement learning (RL) agents are typically tasked with maximizing a single objective function such as reward. But it remains poorly understood how agents might pursue distinct objectives at once. In machines, multiobjective RL can be achieved by dividing a single agent into multiple sub-agents, each of which is shaped by agent-specific reinforcement, but it remains unknown if animals adopt this strategy. Here we use songbirds to test if navigation and singing, two behaviors with distinct objectives, can be differentially reinforced. We demonstrate that strobe flashes aversively condition place preference but not song syllables. Brief noise bursts aversively condition song syllables but positively reinforce place preference. Thus distinct behavior-generating systems, or agencies, within a single animal can be shaped by correspondingly distinct reinforcement signals. Our findings suggest that spatially segregated vocal circuits can solve a credit assignment problem associated with multiobjective learning.

  7. A Reinforcement-Based Learning Paradigm Increases Anatomical Learning and Retention—A Neuroeducation Study

    Science.gov (United States)

    Anderson, Sarah J.; Hecker, Kent G.; Krigolson, Olave E.; Jamniczky, Heather A.

    2018-01-01

    In anatomy education, a key hurdle to engaging in higher-level discussion in the classroom is recognizing and understanding the extensive terminology used to identify and describe anatomical structures. Given the time-limited classroom environment, seeking methods to impart this foundational knowledge to students in an efficient manner is essential. Just-in-Time Teaching (JiTT) methods incorporate pre-class exercises (typically online) meant to establish foundational knowledge in novice learners so subsequent instructor-led sessions can focus on deeper, more complex concepts. Determining how best to design and assess pre-class exercises requires a detailed examination of learning and retention in an applied educational context. Here we used electroencephalography (EEG) as a quantitative dependent variable to track learning and examine the efficacy of JiTT activities to teach anatomy. Specifically, we examined changes in the amplitude of the N250 and reward positivity event-related brain potential (ERP) components alongside behavioral performance as novice students participated in a series of computerized reinforcement-based learning modules to teach neuroanatomical structures. We found that as students learned to identify anatomical structures, the amplitude of the N250 increased and reward positivity amplitude decreased in response to positive feedback. Both on a retention and transfer exercise when learners successfully remembered and translated their knowledge to novel images, the amplitude of the reward positivity remained decreased compared to early learning. Our findings suggest ERPs can be used as a tool to track learning, retention, and transfer of knowledge and that employing the reinforcement learning paradigm is an effective educational approach for developing anatomical expertise. PMID:29467638

  8. A Reinforcement-Based Learning Paradigm Increases Anatomical Learning and Retention-A Neuroeducation Study.

    Science.gov (United States)

    Anderson, Sarah J; Hecker, Kent G; Krigolson, Olave E; Jamniczky, Heather A

    2018-01-01

    In anatomy education, a key hurdle to engaging in higher-level discussion in the classroom is recognizing and understanding the extensive terminology used to identify and describe anatomical structures. Given the time-limited classroom environment, seeking methods to impart this foundational knowledge to students in an efficient manner is essential. Just-in-Time Teaching (JiTT) methods incorporate pre-class exercises (typically online) meant to establish foundational knowledge in novice learners so subsequent instructor-led sessions can focus on deeper, more complex concepts. Determining how best to design and assess pre-class exercises requires a detailed examination of learning and retention in an applied educational context. Here we used electroencephalography (EEG) as a quantitative dependent variable to track learning and examine the efficacy of JiTT activities to teach anatomy. Specifically, we examined changes in the amplitude of the N250 and reward positivity event-related brain potential (ERP) components alongside behavioral performance as novice students participated in a series of computerized reinforcement-based learning modules to teach neuroanatomical structures. We found that as students learned to identify anatomical structures, the amplitude of the N250 increased and reward positivity amplitude decreased in response to positive feedback. On both a retention and a transfer exercise, when learners successfully remembered and translated their knowledge to novel images, the amplitude of the reward positivity remained decreased compared to early learning. Our findings suggest ERPs can be used as a tool to track learning, retention, and transfer of knowledge and that employing the reinforcement learning paradigm is an effective educational approach for developing anatomical expertise.

  9. A Reinforcement-Based Learning Paradigm Increases Anatomical Learning and Retention—A Neuroeducation Study

    Directory of Open Access Journals (Sweden)

    Sarah J. Anderson

    2018-02-01

    Full Text Available In anatomy education, a key hurdle to engaging in higher-level discussion in the classroom is recognizing and understanding the extensive terminology used to identify and describe anatomical structures. Given the time-limited classroom environment, seeking methods to impart this foundational knowledge to students in an efficient manner is essential. Just-in-Time Teaching (JiTT) methods incorporate pre-class exercises (typically online) meant to establish foundational knowledge in novice learners so subsequent instructor-led sessions can focus on deeper, more complex concepts. Determining how best to design and assess pre-class exercises requires a detailed examination of learning and retention in an applied educational context. Here we used electroencephalography (EEG) as a quantitative dependent variable to track learning and examine the efficacy of JiTT activities to teach anatomy. Specifically, we examined changes in the amplitude of the N250 and reward positivity event-related brain potential (ERP) components alongside behavioral performance as novice students participated in a series of computerized reinforcement-based learning modules to teach neuroanatomical structures. We found that as students learned to identify anatomical structures, the amplitude of the N250 increased and reward positivity amplitude decreased in response to positive feedback. On both a retention and a transfer exercise, when learners successfully remembered and translated their knowledge to novel images, the amplitude of the reward positivity remained decreased compared to early learning. Our findings suggest ERPs can be used as a tool to track learning, retention, and transfer of knowledge and that employing the reinforcement learning paradigm is an effective educational approach for developing anatomical expertise.

  10. HCBPM: An Idea toward a Social Learning Environment for Humanoid Robot

    Directory of Open Access Journals (Sweden)

    Fady Alnajjar

    2010-01-01

    Full Text Available To advance robotics toward real-world applications, a growing body of research has focused on the development of control systems for humanoid robots in recent years. Several approaches have been proposed to support the learning stage of such controllers, where the robot can learn new behaviors by observing and/or receiving direct guidance from a human or even another robot. These approaches require dynamic learning and memorization techniques, which the robot can use to reform and update its internal systems continuously while learning new behaviors. Against this background, this study investigates a new approach to the development of an incremental learning and memorization model. This approach was inspired by the principles of neuroscience, and the developed model was named “Hierarchical Constructive Backpropagation with Memory” (HCBPM). The validity of the model was tested by teaching a humanoid robot to recognize a group of objects through natural interaction. The experimental results indicate that the proposed model efficiently enhances real-time machine learning in general and can be used to establish an environment suitable for social learning between the robot and the user in particular.

  11. Identification of animal behavioral strategies by inverse reinforcement learning.

    Directory of Open Access Journals (Sweden)

    Shoichiro Yamaguchi

    2018-05-01

    Full Text Available Animals are able to reach a desired state in an environment by controlling various behavioral patterns. Identification of the behavioral strategy used for this control is important for understanding animals' decision-making and is fundamental to dissect information processing done by the nervous system. However, methods for quantifying such behavioral strategies have not been fully established. In this study, we developed an inverse reinforcement-learning (IRL) framework to identify an animal's behavioral strategy from behavioral time-series data. We applied this framework to C. elegans thermotactic behavior; after cultivation at a constant temperature with or without food, fed worms prefer, while starved worms avoid the cultivation temperature on a thermal gradient. Our IRL approach revealed that the fed worms used both the absolute temperature and its temporal derivative and that their behavior involved two strategies: directed migration (DM) and isothermal migration (IM). With DM, worms efficiently reached specific temperatures, which explains their thermotactic behavior when fed. With IM, worms moved along a constant temperature, which reflects isothermal tracking, well-observed in previous studies. In contrast to fed animals, starved worms escaped the cultivation temperature using only the absolute, but not the temporal derivative of temperature. We also investigated the neural basis underlying these strategies, by applying our method to thermosensory neuron-deficient worms. Thus, our IRL-based approach is useful in identifying animal strategies from behavioral time-series data and could be applied to a wide range of behavioral studies, including decision-making, in other organisms.

  12. Fuzzy control in robot-soccer, evolutionary learning in the first layer of control

    Directory of Open Access Journals (Sweden)

    Peter J Thomas

    2003-02-01

    Full Text Available In this paper, an evolutionary algorithm is developed to learn a fuzzy knowledge base for controlling a soccer-playing micro-robot, from any configuration belonging to a grid of initial configurations, so that it hits the ball along the ball-to-goal line of sight. The knowledge base uses a relative co-ordinate system that includes the left and right wheel velocities of the robot. Final path positions allow the robot to face the ball either forward or in reverse and take the robot's physical dimensions into account.

  13. A Predictive Study of Learner Attitudes toward Open Learning in a Robotics Class

    Science.gov (United States)

    Avsec, Stanislav; Rihtarsic, David; Kocijancic, Slavko

    2014-01-01

    Open learning (OL) strives to transform teaching and learning by applying learning science and emerging technologies to increase student success, improve learning productivity, and lower barriers to access. OL of robotics has a significant growth rate in secondary and/or high schools, but failures exist. Little is known about why many users stop…

  14. Dynamic Sensor Tasking for Space Situational Awareness via Reinforcement Learning

    Science.gov (United States)

    Linares, R.; Furfaro, R.

    2016-09-01

    This paper studies the Sensor Management (SM) problem for optical Space Object (SO) tracking. The tasking problem is formulated as a Markov Decision Process (MDP) and solved using Reinforcement Learning (RL). The RL problem is solved using the actor-critic policy gradient approach. The actor provides a policy which is random over actions and given by a parametric probability density function (pdf). The critic evaluates the policy by calculating the estimated total reward or the value function for the problem. The parameters of the policy action pdf are optimized using gradients with respect to the reward function. Both the critic and the actor are modeled using deep neural networks (multi-layer neural networks). The policy neural network takes the current state as input and outputs probabilities for each possible action. This policy is random, and can be evaluated by sampling random actions using the probabilities determined by the policy neural network's outputs. The critic approximates the total reward using a neural network. The estimated total reward is used to approximate the gradient of the policy network with respect to the network parameters. This approach is used to find the non-myopic optimal policy for tasking optical sensors to estimate SO orbits. The reward function is based on reducing the uncertainty for the overall catalog to below a user specified uncertainty threshold. This work uses a 30 km total position error for the uncertainty threshold. This work provides the RL method with a negative reward as long as any SO has a total position error above the uncertainty threshold. This penalizes policies that take longer to achieve the desired accuracy. A positive reward is provided when all SOs are below the catalog uncertainty threshold. An optimal policy is sought that takes actions to achieve the desired catalog uncertainty in minimum time. This work trains the policy in simulation by letting it task a single sensor to "learn" from its performance
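
    The record above names the actor-critic policy-gradient family only at a high level. As a rough, hedged illustration of that family (not the paper's implementation), the following sketch runs a tabular actor-critic on an invented five-state chain; the environment, reward values, and learning rates are assumptions made purely for the example.

    import numpy as np

    # Minimal tabular actor-critic sketch on an invented 5-state chain; the task,
    # rewards, and learning rates are illustrative assumptions, not from the paper.
    rng = np.random.default_rng(0)
    n_states, n_actions = 5, 2
    prefs = np.zeros((n_states, n_actions))   # actor: softmax action preferences
    values = np.zeros(n_states)               # critic: state-value estimates
    alpha_actor, alpha_critic, gamma = 0.1, 0.2, 0.95

    def softmax(x):
        z = np.exp(x - x.max())
        return z / z.sum()

    def step(state, action):
        """Toy dynamics: action 1 moves right, action 0 stays; reward at the last state."""
        next_state = min(state + action, n_states - 1)
        reward = 1.0 if next_state == n_states - 1 else -0.01
        return next_state, reward, next_state == n_states - 1

    for episode in range(500):
        state, done = 0, False
        while not done:
            probs = softmax(prefs[state])
            action = rng.choice(n_actions, p=probs)
            next_state, reward, done = step(state, action)
            # TD error: the critic's evaluation of how good the actor's move was.
            target = reward + (0.0 if done else gamma * values[next_state])
            td_error = target - values[state]
            values[state] += alpha_critic * td_error
            # Policy-gradient update for the softmax actor, scaled by the TD error.
            grad = -probs
            grad[action] += 1.0
            prefs[state] += alpha_actor * td_error * grad
            state = next_state

    print("learned state values:", np.round(values, 2))

    In the record above both the actor and the critic are deep networks and the reward is tied to catalog uncertainty, but the TD-error-driven structure of the updates is the same general idea.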

  15. Robotics

    Energy Technology Data Exchange (ETDEWEB)

    Lorino, P; Altwegg, J M

    1985-05-01

    This article, which is aimed at the general reader, examines latest developments in, and the role of, modern robotics. The 7 main sections are sub-divided into 27 papers presented by 30 authors. The sections are as follows: 1) The role of robotics, 2) Robotics in the business world and what it can offer, 3) Study and development, 4) Utilisation, 5) Wages, 6) Conditions for success, and 7) Technological dynamics.

  16. A Combination of Machine Learning and Cerebellar Models for the Motor Control and Learning of a Modular Robot

    DEFF Research Database (Denmark)

    Baira Ojeda, Ismael; Tolu, Silvia; Pacheco, Moises

    2017-01-01

    We scaled up a bio-inspired control architecture for the motor control and motor learning of a real modular robot. In our approach, the Locally Weighted Projection Regression algorithm (LWPR) and a cerebellar microcircuit coexist, forming a Unit Learning Machine. The LWPR optimizes the input space... and learns the internal model of a single robot module to command the robot to follow a desired trajectory with its end-effector. The cerebellar microcircuit refines the LWPR output delivering corrective commands. We contrasted distinct cerebellar circuits including analytical models and spiking models...

  17. A reward optimization method based on action subrewards in hierarchical reinforcement learning.

    Science.gov (United States)

    Fu, Yuchen; Liu, Quan; Ling, Xionghong; Cui, Zhiming

    2014-01-01

    Reinforcement learning (RL) is a kind of interactive learning method. Its main characteristics are "trial and error" and "related reward." A hierarchical reinforcement learning method based on action subrewards is proposed to address the "curse of dimensionality," in which the state space grows exponentially with the number of features and convergence is slow. The method can reduce the state space greatly and choose actions purposefully and efficiently so as to optimize the reward function and speed up convergence. Applied to online learning in the game of Tetris, the experimental results show that the convergence speed of the algorithm is clearly improved by the new method, which combines a hierarchical reinforcement learning algorithm with action subrewards. The "curse of dimensionality" problem is also alleviated to a certain extent by the hierarchical method. Performance under different parameters is compared and analyzed as well.
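
    The abstract describes the idea of composing the reward from action subrewards but gives no formulas, so the snippet below is only a speculative sketch of how subrewards might be summed inside an otherwise ordinary Q-learning update; the one-dimensional grid task and the particular subreward terms are invented for illustration.

    import numpy as np

    # Sketch: Q-learning on a 1-D grid where the scalar reward is assembled from
    # action subrewards (progress toward the goal, an effort penalty, a goal bonus).
    # The environment and the subreward terms are illustrative, not from the paper.
    rng = np.random.default_rng(1)
    n_states, n_actions, goal = 10, 2, 9          # actions: 0 = left, 1 = right
    Q = np.zeros((n_states, n_actions))
    alpha, gamma, epsilon = 0.1, 0.95, 0.1

    def subrewards(state, next_state):
        progress = abs(goal - state) - abs(goal - next_state)   # subreward 1: progress
        effort = -0.05                                          # subreward 2: step cost
        bonus = 1.0 if next_state == goal else 0.0              # subreward 3: goal bonus
        return progress + effort + bonus

    for episode in range(300):
        state = 0
        while state != goal:
            if rng.random() < epsilon:
                action = rng.integers(n_actions)
            else:
                action = int(np.argmax(Q[state]))
            next_state = max(0, min(n_states - 1, state + (1 if action == 1 else -1)))
            r = subrewards(state, next_state)
            target = r + gamma * np.max(Q[next_state]) * (next_state != goal)
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state

    print("greedy policy:", np.argmax(Q, axis=1))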

  18. Robotic Fish to Aid Animal Behavior Studies and Informal Science Learning

    Science.gov (United States)

    Phamduy, Paul

    The application of robotic fish in the fields of animal behavior and informal science learning is new and relatively untapped. In the context of animal behavior studies, robotic fish offer a consistent and customizable stimulus that could help dissect the determinants of social behavior. In the realm of informal science learning, robotic fish are gaining momentum because of their potential to educate the general public simultaneously on fish physiology and underwater robotics. In this dissertation, the design and development of a number of robotic fish platforms and prototypes and their application in animal behavioral studies and informal science learning settings are presented. Robotic platforms for animal behavioral studies focused on the use of replica or same-scale prototypes. A novel robotic fish platform, featuring a three-dimensional swimming multi-linked robotic fish, was developed with three control modes varying in the level of robot autonomy offered. This platform was deployed at numerous science festivals and science centers to obtain data on visitor engagement and experience.

  19. Enhancement of Online Robotics Learning Using Real-Time 3D Visualization Technology

    OpenAIRE

    Richard Chiou; Yongjin (James) Kwon; Tzu-Liang (Bill) Tseng; Robin Kizirian; Yueh-Ting Yang

    2010-01-01

    This paper discusses a real-time e-Lab Learning system based on the integration of 3D visualization technology with a remote robotic laboratory. With the emergence and development of the Internet field, online learning is proving to play a significant role in the upcoming era. In an effort to enhance Internet-based learning of robotics and keep up with the rapid progression of technology, a 3-Dimensional scheme of viewing the robotic laboratory has been introduced in addition to the remote c...

  20. Inquiry learning with a social robot: can you explain that to me?

    NARCIS (Netherlands)

    Wijnen, Frances Martine; Charisi, Vasiliki; Davison, Daniel Patrick; van der Meij, Jan; Reidsma, Dennis; Evers, Vanessa; Heerink, M.; de Jong, M.

    2015-01-01

    This paper presents preliminary results of a study which assesses the impact a social robot might have on the verbalization of a child’s internal reasoning and knowledge while working on a learning task. In a comparative experiment we offered children the context of either a social robot or an

  1. Adolescent-specific patterns of behavior and neural activity during social reinforcement learning.

    Science.gov (United States)

    Jones, Rebecca M; Somerville, Leah H; Li, Jian; Ruberry, Erika J; Powers, Alisa; Mehta, Natasha; Dyke, Jonathan; Casey, B J

    2014-06-01

    Humans are sophisticated social beings. Social cues from others are exceptionally salient, particularly during adolescence. Understanding how adolescents interpret and learn from variable social signals can provide insight into the observed shift in social sensitivity during this period. The present study tested 120 participants between the ages of 8 and 25 years on a social reinforcement learning task where the probability of receiving positive social feedback was parametrically manipulated. Seventy-eight of these participants completed the task during fMRI scanning. Modeling trial-by-trial learning, children and adults showed higher positive learning rates than did adolescents, suggesting that adolescents demonstrated less differentiation in their reaction times for peers who provided more positive feedback. Forming expectations about receiving positive social reinforcement correlated with neural activity within the medial prefrontal cortex and ventral striatum across age. Adolescents, unlike children and adults, showed greater insular activity during positive prediction error learning and increased activity in the supplementary motor cortex and the putamen when receiving positive social feedback regardless of the expected outcome, suggesting that peer approval may motivate adolescents toward action. While different amounts of positive social reinforcement enhanced learning in children and adults, all positive social reinforcement equally motivated adolescents. Together, these findings indicate that sensitivity to peer approval during adolescence goes beyond simple reinforcement theory accounts and suggest possible explanations for how peers may motivate adolescent behavior.
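
    The trial-by-trial modeling mentioned above typically refers to fitting a delta-rule (Rescorla-Wagner-style) learner to behavior; the study's actual model specification and parameter values are not reproduced here. The sketch below is only a generic learner of that kind, with separate learning rates for positive and negative prediction errors and made-up feedback probabilities.

    import numpy as np

    # Generic delta-rule (Rescorla-Wagner) learner with separate learning rates for
    # positive and negative prediction errors. Feedback probabilities are invented.
    rng = np.random.default_rng(2)
    p_feedback = {"mostly_positive": 0.8, "neutral": 0.5, "mostly_negative": 0.2}
    alpha_pos, alpha_neg = 0.30, 0.15     # hypothetical learning rates
    expectations = {cue: 0.5 for cue in p_feedback}

    for trial in range(200):
        cue = rng.choice(list(p_feedback))
        outcome = 1.0 if rng.random() < p_feedback[cue] else 0.0
        prediction_error = outcome - expectations[cue]
        # Asymmetric update: learn faster from positive than from negative surprises.
        rate = alpha_pos if prediction_error > 0 else alpha_neg
        expectations[cue] += rate * prediction_error

    for cue, value in expectations.items():
        print(f"{cue}: learned expectation {value:.2f}")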

  2. Video Demo: Deep Reinforcement Learning for Coordination in Traffic Light Control

    NARCIS (Netherlands)

    van der Pol, E.; Oliehoek, F.A.; Bosse, T.; Bredeweg, B.

    2016-01-01

    This video demonstration contrasts two approaches to coordination in traffic light control using reinforcement learning: earlier work, based on a deconstruction of the state space into a linear combination of vehicle states, and our own approach based on the Deep Q-learning algorithm.

  3. Software for Project-Based Learning of Robot Motion Planning

    Science.gov (United States)

    Moll, Mark; Bordeaux, Janice; Kavraki, Lydia E.

    2013-01-01

    Motion planning is a core problem in robotics concerned with finding feasible paths for a given robot. Motion planning algorithms perform a search in the high-dimensional continuous space of robot configurations and exemplify many of the core algorithmic concepts of search algorithms and associated data structures. Motion planning algorithms can…
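
    As a concrete, minimal example of the sampling-based search such algorithms perform (not code from the described software), here is a bare-bones RRT in a two-dimensional configuration space; the obstacle, step size, and sampling budget are arbitrary choices for illustration, and a real planner would add proper edge collision checking, goal biasing, and path smoothing.

    import math
    import random

    # Bare-bones RRT in a 2-D unit-square configuration space with one circular obstacle.
    # Parameters (step size, obstacle, iteration budget) are arbitrary illustrative choices.
    random.seed(3)
    START, GOAL = (0.05, 0.05), (0.95, 0.95)
    OBSTACLE, RADIUS = (0.5, 0.5), 0.2
    STEP, GOAL_TOL = 0.05, 0.05

    def collision_free(p):
        return math.dist(p, OBSTACLE) > RADIUS

    def steer(from_p, to_p):
        d = math.dist(from_p, to_p)
        if d <= STEP:
            return to_p
        t = STEP / d
        return (from_p[0] + t * (to_p[0] - from_p[0]),
                from_p[1] + t * (to_p[1] - from_p[1]))

    nodes = [START]
    parent = {START: None}
    for _ in range(5000):
        sample = (random.random(), random.random())
        nearest = min(nodes, key=lambda n: math.dist(n, sample))
        new = steer(nearest, sample)
        if not collision_free(new):
            continue
        nodes.append(new)
        parent[new] = nearest
        if math.dist(new, GOAL) < GOAL_TOL:
            # Reconstruct the path by walking parent pointers back to the start.
            path, node = [], new
            while node is not None:
                path.append(node)
                node = parent[node]
            print(f"path found with {len(path)} waypoints after {len(nodes)} samples")
            break
    else:
        print("no path found within the sampling budget")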

  4. Learning to Program with Personal Robots: Influences on Student Motivation

    Science.gov (United States)

    McGill, Monica M.

    2012-01-01

    One of the goals of using robots in introductory programming courses is to increase motivation among learners. There have been several types of robots that have been used extensively in the classroom to teach a variety of computer science concepts. A more recently introduced robot designed to teach programming to novice students is the Institute…

  5. Pragmatically Framed Cross-Situational Noun Learning Using Computational Reinforcement Models.

    Science.gov (United States)

    Najnin, Shamima; Banerjee, Bonny

    2018-01-01

    Cross-situational learning and social pragmatic theories are prominent mechanisms for learning word meanings (i.e., word-object pairs). In this paper, the role of reinforcement is investigated for early word-learning by an artificial agent. When exposed to a group of speakers, the agent comes to understand an initial set of vocabulary items belonging to the language used by the group. Both cross-situational learning and social pragmatic theory are taken into account. As social cues, joint attention and prosodic cues in caregiver's speech are considered. During agent-caregiver interaction, the agent selects a word from the caregiver's utterance and learns the relations between that word and the objects in its visual environment. The "novel words to novel objects" language-specific constraint is assumed for computing rewards. The models are learned by maximizing the expected reward using reinforcement learning algorithms [i.e., table-based algorithms: Q-learning, SARSA, SARSA-λ, and neural network-based algorithms: Q-learning for neural network (Q-NN), neural-fitted Q-network (NFQ), and deep Q-network (DQN)]. Neural network-based reinforcement learning models are chosen over table-based models for better generalization and quicker convergence. Simulations are carried out using mother-infant interaction CHILDES dataset for learning word-object pairings. Reinforcement is modeled in two cross-situational learning cases: (1) with joint attention (Attentional models), and (2) with joint attention and prosodic cues (Attentional-prosodic models). Attentional-prosodic models manifest superior performance to Attentional ones for the task of word-learning. The Attentional-prosodic DQN outperforms existing word-learning models for the same task.
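
    Of the algorithms listed in this record, tabular SARSA is the simplest to sketch. The toy word-object pairing task below, its +1/-1 reward, and all parameters are invented for illustration and are far simpler than the CHILDES-based setup the paper describes; with a discount of zero the one-step update degenerates into a contextual-bandit rule.

    import numpy as np

    # Tiny tabular SARSA sketch for associating words with objects.
    # The vocabulary, scenes, and +1/-1 reward scheme are invented for illustration.
    rng = np.random.default_rng(4)
    words = ["ball", "cup", "dog"]
    objects = ["ball", "cup", "dog"]
    Q = np.zeros((len(words), len(objects)))
    alpha, gamma, epsilon = 0.2, 0.0, 0.1   # gamma = 0: each pairing is a one-step episode

    def choose(word_idx):
        if rng.random() < epsilon:
            return rng.integers(len(objects))
        return int(np.argmax(Q[word_idx]))

    for trial in range(1000):
        w = rng.integers(len(words))          # caregiver utters a word
        a = choose(w)                         # agent picks a candidate referent
        reward = 1.0 if objects[a] == words[w] else -1.0
        # One-step SARSA update (the next state's Q term vanishes because gamma = 0).
        Q[w, a] += alpha * (reward - Q[w, a])

    for i, word in enumerate(words):
        print(word, "->", objects[int(np.argmax(Q[i]))])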

  6. A non-linear manifold alignment approach to robot learning from demonstrations

    CSIR Research Space (South Africa)

    Makondo, Ndivhuwo

    2018-04-01

    Full Text Available with potentially different, but unknown, kinematics from humans. This paper proposes a method that enables robots with unknown kinematics to learn skills from demonstrations. Our proposed method requires a motion trajectory obtained from human demonstrations via a...

  7. A Multi-Agent Control Architecture for a Robotic Wheelchair

    Directory of Open Access Journals (Sweden)

    C. Galindo

    2006-01-01

    Full Text Available Assistant robots like robotic wheelchairs can perform effective and valuable work in our daily lives. However, they may eventually need external help from humans in the robot environment (particularly the driver, in the case of a wheelchair) to accomplish safely and efficiently some tasks that are tricky for current technology, e.g. opening a locked door, traversing a crowded area, etc. This article proposes a control architecture for assistant robots designed under a multi-agent perspective that facilitates the participation of humans in the robotic system and improves the overall performance of the robot as well as its dependability. Within our design, agents have their own intentions and beliefs, have different abilities (that include algorithmic behaviours and human skills), and also learn autonomously the most convenient method to carry out their actions through reinforcement learning. The proposed architecture is illustrated with a real assistant robot: a robotic wheelchair that provides mobility to impaired or elderly people.

  8. E-Learning System for Learning Virtual Circuit Making with a Microcontroller and Programming to Control a Robot

    Science.gov (United States)

    Takemura, Atsushi

    2015-01-01

    This paper proposes a novel e-Learning system for learning electronic circuit making and programming a microcontroller to control a robot. The proposed e-Learning system comprises a virtual-circuit-making function for the construction of circuits with a versatile, Arduino microcontroller and an educational system that can simulate behaviors of…

  9. Active Learning Environments with Robotic Tangibles: Children's Physical and Virtual Spatial Programming Experiences

    Science.gov (United States)

    Burleson, Winslow S.; Harlow, Danielle B.; Nilsen, Katherine J.; Perlin, Ken; Freed, Natalie; Jensen, Camilla Nørgaard; Lahey, Byron; Lu, Patrick; Muldner, Kasia

    2018-01-01

    As computational thinking becomes increasingly important for children to learn, we must develop interfaces that leverage the ways that young children learn to provide opportunities for them to develop these skills. Active Learning Environments with Robotic Tangibles (ALERT) and Robopad, an analogous on-screen virtual spatial programming…

  10. The drift diffusion model as the choice rule in reinforcement learning.

    Science.gov (United States)

    Pedersen, Mads Lund; Frank, Michael J; Biele, Guido

    2017-08-01

    Current reinforcement-learning models often assume simplified decision processes that do not fully reflect the dynamic complexities of choice processes. Conversely, sequential-sampling models of decision making account for both choice accuracy and response time, but assume that decisions are based on static decision values. To combine these two computational models of decision making and learning, we implemented reinforcement-learning models in which the drift diffusion model describes the choice process, thereby capturing both within- and across-trial dynamics. To exemplify the utility of this approach, we quantitatively fit data from a common reinforcement-learning paradigm using hierarchical Bayesian parameter estimation, and compared model variants to determine whether they could capture the effects of stimulant medication in adult patients with attention-deficit hyperactivity disorder (ADHD). The model with the best relative fit provided a good description of the learning process, choices, and response times. A parameter recovery experiment showed that the hierarchical Bayesian modeling approach enabled accurate estimation of the model parameters. The model approach described here, using simultaneous estimation of reinforcement-learning and drift diffusion model parameters, shows promise for revealing new insights into the cognitive and neural mechanisms of learning and decision making, as well as the alteration of such processes in clinical groups.
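
    A minimal sketch of the combination described above, under assumed parameters: option values are learned with a delta rule, and each choice (together with its response time) is generated by simulating a drift diffusion process whose drift rate scales with the value difference. The reward probabilities, drift scaling, threshold, and noise level are invented, and the sketch omits the hierarchical Bayesian fitting the paper uses.

    import numpy as np

    # Sketch: delta-rule value learning where each choice is simulated by a drift
    # diffusion process whose drift rate scales with the learned value difference.
    # All parameters (drift scaling, threshold, noise, reward probabilities) are invented.
    rng = np.random.default_rng(5)
    p_reward = [0.8, 0.2]                 # two options with different reward probabilities
    values = np.array([0.5, 0.5])
    alpha, drift_scale, threshold, dt, noise = 0.1, 2.0, 1.0, 0.01, 1.0

    def ddm_choice(v):
        """Accumulate evidence for option 0 vs option 1; return (choice, response time)."""
        drift = drift_scale * (v[0] - v[1])
        evidence, t = 0.0, 0.0
        while abs(evidence) < threshold:
            evidence += drift * dt + noise * np.sqrt(dt) * rng.normal()
            t += dt
        return (0 if evidence > 0 else 1), t

    rts = []
    for trial in range(500):
        choice, rt = ddm_choice(values)
        reward = 1.0 if rng.random() < p_reward[choice] else 0.0
        values[choice] += alpha * (reward - values[choice])   # delta-rule update
        rts.append(rt)

    print("learned values:", np.round(values, 2), "mean RT:", round(float(np.mean(rts)), 2))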

  11. EFFICIENT SPECTRUM UTILIZATION IN COGNITIVE RADIO THROUGH REINFORCEMENT LEARNING

    Directory of Open Access Journals (Sweden)

    Dhananjay Kumar

    2013-09-01

    Full Text Available Machine learning schemes can be employed in cognitive radio systems to intelligently locate the spectrum holes with some knowledge about the operating environment. In this paper, we formulate a variation of the Actor Critic Learning algorithm known as Continuous Actor Critic Learning Automaton (CACLA) and compare this scheme with the Actor Critic Learning scheme and the existing Q-learning scheme. Simulation results show that our CACLA scheme has lesser execution time and achieves higher throughput compared to the other two schemes.
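
    The defining CACLA rule is that the actor is pulled toward the explored action only when the temporal-difference error is positive. The following is a toy, single-step illustration of that rule with a linear actor and critic on an invented one-dimensional "reach the target" task; it is not the cognitive-radio formulation used in the paper, and all parameters are assumptions.

    import numpy as np

    # Sketch of the CACLA idea: a critic learns the expected return of a state, the
    # actor proposes a continuous action, Gaussian noise explores around it, and the
    # actor is updated toward the executed action only when the TD error is positive.
    rng = np.random.default_rng(6)
    actor = np.zeros(2)    # linear actor: action = actor[0] * state + actor[1]
    critic = np.zeros(2)   # linear critic: V(state) = critic[0] * state + critic[1]
    alpha_a, alpha_c, sigma, target = 0.05, 0.05, 0.3, 0.7

    for trial in range(5000):
        state = rng.uniform(-1, 1)
        features = np.array([state, 1.0])
        proposed = float(actor @ features)               # actor's deterministic proposal
        action = proposed + sigma * rng.normal()         # exploration
        reward = -abs((state + 0.5 * action) - target)   # closer to the target is better
        td_error = reward - float(critic @ features)     # one-step (terminal) TD error
        critic += alpha_c * td_error * features
        if td_error > 0:                                 # CACLA: update actor only on improvement
            actor += alpha_a * (action - proposed) * features

    # The learned actor should roughly approximate action = 2 * (target - state).
    print("actor weights:", np.round(actor, 2))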

  12. Robot education peers in a situated primary school study: Personalisation promotes child learning.

    Science.gov (United States)

    Baxter, Paul; Ashurst, Emily; Read, Robin; Kennedy, James; Belpaeme, Tony

    2017-01-01

    The benefit of social robots to support child learning in an educational context over an extended period of time is evaluated. Specifically, the effect of personalisation and adaptation of robot social behaviour is assessed. Two autonomous robots were embedded within two matched classrooms of a primary school for a continuous two week period without experimenter supervision to act as learning companions for the children for familiar and novel subjects. Results suggest that while children in both personalised and non-personalised conditions learned, there was increased child learning of a novel subject exhibited when interacting with a robot that personalised its behaviours, with indications that this benefit extended to other class-based performance. Additional evidence was obtained suggesting that there is increased acceptance of the personalised robot peer over a non-personalised version. These results provide the first evidence in support of peer-robot behavioural personalisation having a positive influence on learning when embedded in a learning environment for an extended period of time.

  13. Robot education peers in a situated primary school study: Personalisation promotes child learning.

    Directory of Open Access Journals (Sweden)

    Paul Baxter

    Full Text Available The benefit of social robots to support child learning in an educational context over an extended period of time is evaluated. Specifically, the effect of personalisation and adaptation of robot social behaviour is assessed. Two autonomous robots were embedded within two matched classrooms of a primary school for a continuous two week period without experimenter supervision to act as learning companions for the children for familiar and novel subjects. Results suggest that while children in both personalised and non-personalised conditions learned, there was increased child learning of a novel subject exhibited when interacting with a robot that personalised its behaviours, with indications that this benefit extended to other class-based performance. Additional evidence was obtained suggesting that there is increased acceptance of the personalised robot peer over a non-personalised version. These results provide the first evidence in support of peer-robot behavioural personalisation having a positive influence on learning when embedded in a learning environment for an extended period of time.

  14. Robot education peers in a situated primary school study: Personalisation promotes child learning

    Science.gov (United States)

    Ashurst, Emily; Read, Robin; Kennedy, James; Belpaeme, Tony

    2017-01-01

    The benefit of social robots to support child learning in an educational context over an extended period of time is evaluated. Specifically, the effect of personalisation and adaptation of robot social behaviour is assessed. Two autonomous robots were embedded within two matched classrooms of a primary school for a continuous two week period without experimenter supervision to act as learning companions for the children for familiar and novel subjects. Results suggest that while children in both personalised and non-personalised conditions learned, there was increased child learning of a novel subject exhibited when interacting with a robot that personalised its behaviours, with indications that this benefit extended to other class-based performance. Additional evidence was obtained suggesting that there is increased acceptance of the personalised robot peer over a non-personalised version. These results provide the first evidence in support of peer-robot behavioural personalisation having a positive influence on learning when embedded in a learning environment for an extended period of time. PMID:28542648

  15. Design based action research in the world of robot technology and learning

    DEFF Research Database (Denmark)

    Majgaard, Gunver

    2010-01-01

    Why is the design-based action research method important in the world of robot technology and learning? The article explores how action research and interaction-driven design can be used in the development of educational robot-technology tools. The actual case is the development of “Fraction Battle...”, which is about learning fractions in primary school. The technology is based on robot technology. An outdoor digital playground is taken into the classroom and then redesigned. The article argues that interaction design takes precedence over technology- or goal-driven design for the development... of educational tools...

  16. Manifold traversing as a model for learning control of autonomous robots

    Science.gov (United States)

    Szakaly, Zoltan F.; Schenker, Paul S.

    1992-01-01

    This paper describes a recipe for the construction of control systems that support complex machines such as multi-limbed/multi-fingered robots. The robot has to execute a task under varying environmental conditions and it has to react reasonably when previously unknown conditions are encountered. Its behavior should be learned and/or trained as opposed to being programmed. The paper describes one possible method for organizing the data that the robot has learned by various means. This framework can accept useful operator input even if it does not fully specify what to do, and can combine knowledge from autonomous, operator assisted and programmed experiences.

  17. A neural network-based exploratory learning and motor planning system for co-robots

    Directory of Open Access Journals (Sweden)

    Byron V Galbraith

    2015-07-01

    Full Text Available Collaborative robots, or co-robots, are semi-autonomous robotic agents designed to work alongside humans in shared workspaces. To be effective, co-robots require the ability to respond and adapt to dynamic scenarios encountered in natural environments. One way to achieve this is through exploratory learning, or learning by doing, an unsupervised method in which co-robots are able to build an internal model for motor planning and coordination based on real-time sensory inputs. In this paper, we present an adaptive neural network-based system for co-robot control that employs exploratory learning to achieve the coordinated motor planning needed to navigate toward, reach for, and grasp distant objects. To validate this system we used the 11-degrees-of-freedom RoPro Calliope mobile robot. Through motor babbling of its wheels and arm, the Calliope learned how to relate visual and proprioceptive information to achieve hand-eye-body coordination. By continually evaluating sensory inputs and externally provided goal directives, the Calliope was then able to autonomously select the appropriate wheel and joint velocities needed to perform its assigned task, such as following a moving target or retrieving an indicated object.

  18. Enhancement of Online Robotics Learning Using Real-Time 3D Visualization Technology

    Directory of Open Access Journals (Sweden)

    Richard Chiou

    2010-06-01

    Full Text Available This paper discusses a real-time e-Lab Learning system based on the integration of 3D visualization technology with a remote robotic laboratory. With the emergence and development of the Internet field, online learning is proving to play a significant role in the upcoming era. In an effort to enhance Internet-based learning of robotics and keep up with the rapid progression of technology, a 3-Dimensional scheme of viewing the robotic laboratory has been introduced in addition to the remote controlling of the robots. The uniqueness of the project lies in making this process Internet-based, with the robot operated remotely and visualized in 3D. This 3D system approach provides the students with a more realistic feel of the 3D robotic laboratory even though they are working remotely. As a result, the 3D visualization technology has been tested as part of a laboratory in the MET 205 Robotics and Mechatronics class and has received positive feedback from most of the students. This type of research has introduced a new level of realism and visual communications to online laboratory learning in a remote classroom.

  19. A neural network-based exploratory learning and motor planning system for co-robots.

    Science.gov (United States)

    Galbraith, Byron V; Guenther, Frank H; Versace, Massimiliano

    2015-01-01

    Collaborative robots, or co-robots, are semi-autonomous robotic agents designed to work alongside humans in shared workspaces. To be effective, co-robots require the ability to respond and adapt to dynamic scenarios encountered in natural environments. One way to achieve this is through exploratory learning, or "learning by doing," an unsupervised method in which co-robots are able to build an internal model for motor planning and coordination based on real-time sensory inputs. In this paper, we present an adaptive neural network-based system for co-robot control that employs exploratory learning to achieve the coordinated motor planning needed to navigate toward, reach for, and grasp distant objects. To validate this system we used the 11-degrees-of-freedom RoPro Calliope mobile robot. Through motor babbling of its wheels and arm, the Calliope learned how to relate visual and proprioceptive information to achieve hand-eye-body coordination. By continually evaluating sensory inputs and externally provided goal directives, the Calliope was then able to autonomously select the appropriate wheel and joint velocities needed to perform its assigned task, such as following a moving target or retrieving an indicated object.

  20. Adversarial Reinforcement Learning in a Cyber Security Simulation

    NARCIS (Netherlands)

    Elderman, Richard; Pater, Leon; Thie, Albert; Drugan, Madalina; Wiering, Marco

    2017-01-01

    This paper focuses on cyber-security simulations in networks modeled as a Markov game with incomplete information and stochastic elements. The resulting game is an adversarial sequential decision making problem played with two agents, the attacker and defender. The two agents pit one reinforcement

  1. Training in robotics: The learning curve and contemporary concepts in training.

    Science.gov (United States)

    Bach, Christian; Miernik, Arkadiusz; Schönthaler, Martin

    2014-03-01

    To define the learning curve of robot-assisted laparoscopic surgery for prostatectomy (RALP) and upper tract procedures, and show the differences between the classical approach to training and the new concept of parallel learning. This mini-review is based on the results of a Medline search using the keywords 'da Vinci', 'robot-assisted laparoscopic surgery', 'training', 'teaching' and 'learning curve'. For RALP and robot-assisted upper tract surgery, a learning curve of 8-150 procedures is quoted, with most articles proposing that 30-40 cases are needed to carry out the procedure safely. There is no consensus about which endpoints should be measured. In the traditional proctored training model, the surgeon learns the procedure linearly, following the sequential order of the surgical steps. A more recent approach is to specify the relative difficulty of each step and to train the surgeon simultaneously in several steps of equal difficulty. The entire procedure is only performed after all the steps are mastered in a timely manner. Recently, a 'warm-up' before robotic surgery has been shown to be beneficial for successful surgery in the operating room. There is no clear definition of the duration of the effective learning curve for RALP and robotic upper tract surgery. The concept of stepwise, parallel learning has the potential to accelerate the learning process and to make sure that initial cases are not too long. It can also be assumed that a preoperative 'warm up' could help significantly to improve the progress of the trainee.

  2. Neural correlates of reinforcement learning and social preferences in competitive bidding.

    Science.gov (United States)

    van den Bos, Wouter; Talwar, Arjun; McClure, Samuel M

    2013-01-30

    In competitive social environments, people often deviate from what rational choice theory prescribes, resulting in losses or suboptimal monetary gains. We investigate how competition affects learning and decision-making in a common value auction task. During the experiment, groups of five human participants were simultaneously scanned using MRI while playing the auction task. We first demonstrate that bidding is well characterized by reinforcement learning with biased reward representations dependent on social preferences. Indicative of reinforcement learning, we found that estimated trial-by-trial prediction errors correlated with activity in the striatum and ventromedial prefrontal cortex. Additionally, we found that individual differences in social preferences were related to activity in the temporal-parietal junction and anterior insula. Connectivity analyses suggest that monetary and social value signals are integrated in the ventromedial prefrontal cortex and striatum. Based on these results, we argue for a novel mechanistic account for the integration of reinforcement history and social preferences in competitive decision-making.

  3. A Robust Cooperated Control Method with Reinforcement Learning and Adaptive H∞ Control

    Science.gov (United States)

    Obayashi, Masanao; Uchiyama, Shogo; Kuremoto, Takashi; Kobayashi, Kunikazu

    This study proposes a robust cooperated control method that combines reinforcement learning with robust control. A remarkable characteristic of reinforcement learning is that it does not require a model formula; however, it does not guarantee the stability of the system. Robust control, on the other hand, guarantees stability and robustness, but it requires a model formula. We employ both the actor-critic method, a kind of reinforcement learning that controls continuous-valued actions with a minimal amount of computation, and traditional robust control, that is, H∞ control. The proposed method was compared with the conventional control method, in which only the actor-critic is used, through computer simulation of controlling the angle and position of a crane system, and the simulation results showed the effectiveness of the proposed method.

  4. Learning Similar Actions by Reinforcement or Sensory-Prediction Errors Rely on Distinct Physiological Mechanisms.

    Science.gov (United States)

    Uehara, Shintaro; Mawase, Firas; Celnik, Pablo

    2017-09-14

    Humans can acquire knowledge of new motor behavior via different forms of learning. The two forms most commonly studied have been the development of internal models based on sensory-prediction errors (error-based learning) and success-based feedback (reinforcement learning). Human behavioral studies suggest these are distinct learning processes, though the neurophysiological mechanisms that are involved have not been characterized. Here, we evaluated physiological markers from the cerebellum and the primary motor cortex (M1) using noninvasive brain stimulations while healthy participants trained finger-reaching tasks. We manipulated the extent to which subjects rely on error-based or reinforcement by providing either vector or binary feedback about task performance. Our results demonstrated a double dissociation where learning the task mainly via error-based mechanisms leads to cerebellar plasticity modifications but not long-term potentiation (LTP)-like plasticity changes in M1; while learning a similar action via reinforcement mechanisms elicited M1 LTP-like plasticity but not cerebellar plasticity changes. Our findings indicate that learning complex motor behavior is mediated by the interplay of different forms of learning, weighing distinct neural mechanisms in M1 and the cerebellum. Our study provides insights for designing effective interventions to enhance human motor learning.

  5. Robotic Literacy Learning Companions: Exploring Student Engagement with a Humanoid Robot in an Afterschool Literacy Program

    Science.gov (United States)

    Levchak, Sofia

    2016-01-01

    This study was an investigation of the use of a NAO humanoid robot as an effective tool for engaging readers in an afterschool program as well as to find if increasing engagement using a humanoid robot would affect students' reading comprehension when compared to traditional forms of instruction. The targeted population of this study was…

  6. Real-time Stereoscopic 3D for E-Robotics Learning

    Directory of Open Access Journals (Sweden)

    Richard Y. Chiou

    2011-02-01

    Full Text Available Following the design and testing of a successful 3-Dimensional surveillance system, this 3D scheme has been implemented in online robotics learning at Drexel University. A real-time application, utilizing robot controllers, programmable logic controllers and sensors, has been developed in the “MET 205 Robotics and Mechatronics” class to provide the students with a better robotic education. The integration of the 3D system allows the students to precisely program the robot and execute functions remotely. Upon the students’ recommendation, polarization has been chosen to be the main platform behind the 3D robotic system. Stereoscopic calculations are carried out for calibration purposes to display the images with the highest possible comfort-level and 3D effect. The calculations are further validated by comparing the results with students’ evaluations. Due to the Internet-based feature, multiple clients have the opportunity to perform the online automation development. In the future, students in different universities will be able to cross-control robotic components of different types around the world. With the development of this 3D E-Robotics interface, automation resources and robotic learning can be shared and enriched regardless of location.

  7. Self-learning fuzzy logic controllers based on reinforcement

    International Nuclear Information System (INIS)

    Wang, Z.; Shao, S.; Ding, J.

    1996-01-01

    This paper proposes a new method for learning and tuning Fuzzy Logic Controllers. The self-learning scheme in this paper is composed of a Bucket-Brigade algorithm and a Genetic Algorithm. The proposed method is tested on the cart-pole system. Simulation results show that our approach has good learning and control performance.

  8. Detecting and Classifying Human Touches in a Social Robot Through Acoustic Sensing and Machine Learning.

    Science.gov (United States)

    Alonso-Martín, Fernando; Gamboa-Montero, Juan José; Castillo, José Carlos; Castro-González, Álvaro; Salichs, Miguel Ángel

    2017-05-16

    An important aspect in Human-Robot Interaction is responding to different kinds of touch stimuli. To date, several technologies have been explored to determine how a touch is perceived by a social robot, usually placing a large number of sensors throughout the robot's shell. In this work, we introduce a novel approach, where the audio acquired from contact microphones located in the robot's shell is processed using machine learning techniques to distinguish between different types of touches. The system is able to determine when the robot is touched (touch detection), and to ascertain the kind of touch performed among a set of possibilities: stroke, tap, slap, and tickle (touch classification). This proposal is cost-effective: because a single microphone is enough to cover each solid part of the robot, just a few microphones can cover the whole robot's shell. Besides, it is easy to install and configure, as it just requires a contact surface on the robot's shell to attach the microphone to, which is then plugged into the robot's computer. Results show high accuracy scores in touch gesture recognition. The testing phase revealed that Logistic Model Trees achieved the best performance, with an F-score of 0.81. The dataset was built with information from 25 participants performing a total of 1981 touch gestures.
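
    The classifier named in the record, Logistic Model Trees, is a Weka algorithm and is not reproduced here; as a stand-in sketch of the overall audio-to-label pipeline, the snippet below generates synthetic "contact microphone" clips, extracts a few simple energy and spectral features, and trains a scikit-learn logistic regression. The synthetic clips, the feature set, and the classifier choice are all assumptions for illustration, not the paper's setup.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score
    from sklearn.model_selection import train_test_split

    # Stand-in sketch for audio-based touch classification on synthetic clips.
    rng = np.random.default_rng(7)

    def synth_clip(kind, n=1024):
        t = np.arange(n)
        if kind == "tap":      # short, sharp burst
            return np.exp(-t / 30.0) * rng.normal(size=n)
        if kind == "slap":     # louder, longer burst
            return 3.0 * np.exp(-t / 120.0) * rng.normal(size=n)
        return 0.5 * rng.normal(size=n)   # "stroke": sustained low-level noise

    def features(clip):
        spectrum = np.abs(np.fft.rfft(clip))
        centroid = (spectrum * np.arange(len(spectrum))).sum() / spectrum.sum()
        return [clip.max(), np.sqrt((clip ** 2).mean()), centroid]   # peak, RMS, centroid

    kinds = ["tap", "slap", "stroke"]
    X, y = [], []
    for label, kind in enumerate(kinds):
        for _ in range(200):
            X.append(features(synth_clip(kind)))
            y.append(label)

    X_train, X_test, y_train, y_test = train_test_split(np.array(X), np.array(y), random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("macro F1 on synthetic data:",
          round(f1_score(y_test, clf.predict(X_test), average="macro"), 2))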

  9. Experiments with Online Reinforcement Learning in Real-Time Strategy Games

    DEFF Research Database (Denmark)

    Toftgaard Andersen, Kresten; Zeng, Yifeng; Dahl Christensen, Dennis

    2009-01-01

    Real-time strategy (RTS) games provide a challenging platform to implement online reinforcement learning (RL) techniques in a real application. Computer, as one game player, monitors opponents' (human or other computers) strategies and then updates its own policy using RL methods. In this article..., we first examine the suitability of applying the online RL in various computer games. Reinforcement learning application depends on both RL complexity and the game features. We then propose a multi-layer framework for implementing online RL in an RTS game. The framework significantly reduces RL... the effectiveness of our proposed framework and shed light on relevant issues in using online RL in RTS games...

  10. A Model to Explain the Emergence of Reward Expectancy neurons using Reinforcement Learning and Neural Network

    OpenAIRE

    Ishii, Shinya; Shidara, Munetaka; Shibata, Katsunari

    2006-01-01

    In an experiment of multi-trial task to obtain a reward, reward expectancy neurons, which responded only in the non-reward trials that are necessary to advance toward the reward, have been observed in the anterior cingulate cortex of monkeys. In this paper, to explain the emergence of the reward expectancy neuron in terms of reinforcement learning theory, a model that consists of a recurrent neural network trained based on reinforcement learning is proposed. The analysis of the hi...

  11. Neural prediction errors reveal a risk-sensitive reinforcement-learning process in the human brain.

    Science.gov (United States)

    Niv, Yael; Edlund, Jeffrey A; Dayan, Peter; O'Doherty, John P

    2012-01-11

    Humans and animals are exquisitely, though idiosyncratically, sensitive to risk or variance in the outcomes of their actions. Economic, psychological, and neural aspects of this are well studied when information about risk is provided explicitly. However, we must normally learn about outcomes from experience, through trial and error. Traditional models of such reinforcement learning focus on learning about the mean reward value of cues and ignore higher order moments such as variance. We used fMRI to test whether the neural correlates of human reinforcement learning are sensitive to experienced risk. Our analysis focused on anatomically delineated regions of a priori interest in the nucleus accumbens, where blood oxygenation level-dependent (BOLD) signals have been suggested as correlating with quantities derived from reinforcement learning. We first provide unbiased evidence that the raw BOLD signal in these regions corresponds closely to a reward prediction error. We then derive from this signal the learned values of cues that predict rewards of equal mean but different variance and show that these values are indeed modulated by experienced risk. Moreover, a close neurometric-psychometric coupling exists between the fluctuations of the experience-based evaluations of risky options that we measured neurally and the fluctuations in behavioral risk aversion. This suggests that risk sensitivity is integral to human learning, illuminating economic models of choice, neuroscientific models of affective learning, and the workings of the underlying neural mechanisms.
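
    One standard way to make a prediction-error learner risk-sensitive, in the spirit of the account above, is to weight positive and negative prediction errors asymmetrically, so that two cues with equal mean reward but different variance acquire different values. The sketch below does exactly that with invented cue payoffs and parameters; it is a generic illustration, not the model fitted in the study.

    import numpy as np

    # Risk-sensitive delta-rule learner: positive and negative prediction errors are
    # weighted differently, so two cues with equal mean reward but different variance
    # end up with different learned values. Cue payoffs and parameters are invented.
    rng = np.random.default_rng(8)
    cues = {"low_risk": (1.0, 0.1), "high_risk": (1.0, 1.0)}  # (mean, std) of reward
    values = {cue: 0.0 for cue in cues}
    alpha, kappa = 0.1, 0.5   # kappa < 1 down-weights positive prediction errors

    for trial in range(5000):
        cue = rng.choice(list(cues))
        mean, std = cues[cue]
        reward = rng.normal(mean, std)
        delta = reward - values[cue]
        weight = kappa if delta > 0 else 1.0   # risk-averse: losses loom larger
        values[cue] += alpha * weight * delta

    for cue, v in values.items():
        print(f"{cue}: learned value {v:.2f}")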

  12. Cardiac Concomitants of Feedback and Prediction Error Processing in Reinforcement Learning

    Science.gov (United States)

    Kastner, Lucas; Kube, Jana; Villringer, Arno; Neumann, Jane

    2017-01-01

    Successful learning hinges on the evaluation of positive and negative feedback. We assessed differential learning from reward and punishment in a monetary reinforcement learning paradigm, together with cardiac concomitants of positive and negative feedback processing. On the behavioral level, learning from reward resulted in more advantageous behavior than learning from punishment, suggesting a differential impact of reward and punishment on successful feedback-based learning. On the autonomic level, learning and feedback processing were closely mirrored by phasic cardiac responses on a trial-by-trial basis: (1) Negative feedback was accompanied by faster and prolonged heart rate deceleration compared to positive feedback. (2) Cardiac responses shifted from feedback presentation at the beginning of learning to stimulus presentation later on. (3) Most importantly, the strength of phasic cardiac responses to the presentation of feedback correlated with the strength of prediction error signals that alert the learner to the necessity for behavioral adaptation. Considering participants' weight status and gender revealed obesity-related deficits in learning to avoid negative consequences and less consistent behavioral adaptation in women compared to men. In sum, our results provide strong new evidence for the notion that during learning phasic cardiac responses reflect an internal value and feedback monitoring system that is sensitive to the violation of performance-based expectations. Moreover, inter-individual differences in weight status and gender may affect both behavioral and autonomic responses in reinforcement-based learning. PMID:29163004

  13. Cardiac Concomitants of Feedback and Prediction Error Processing in Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Lucas Kastner

    2017-10-01

    Full Text Available Successful learning hinges on the evaluation of positive and negative feedback. We assessed differential learning from reward and punishment in a monetary reinforcement learning paradigm, together with cardiac concomitants of positive and negative feedback processing. On the behavioral level, learning from reward resulted in more advantageous behavior than learning from punishment, suggesting a differential impact of reward and punishment on successful feedback-based learning. On the autonomic level, learning and feedback processing were closely mirrored by phasic cardiac responses on a trial-by-trial basis: (1) Negative feedback was accompanied by faster and prolonged heart rate deceleration compared to positive feedback. (2) Cardiac responses shifted from feedback presentation at the beginning of learning to stimulus presentation later on. (3) Most importantly, the strength of phasic cardiac responses to the presentation of feedback correlated with the strength of prediction error signals that alert the learner to the necessity for behavioral adaptation. Considering participants' weight status and gender revealed obesity-related deficits in learning to avoid negative consequences and less consistent behavioral adaptation in women compared to men. In sum, our results provide strong new evidence for the notion that during learning phasic cardiac responses reflect an internal value and feedback monitoring system that is sensitive to the violation of performance-based expectations. Moreover, inter-individual differences in weight status and gender may affect both behavioral and autonomic responses in reinforcement-based learning.

  14. Learning compliant manipulation through kinesthetic and tactile human-robot interaction.

    Science.gov (United States)

    Kronander, Klas; Billard, Aude

    2014-01-01

    Robot Learning from Demonstration (RLfD) has been identified as a key element for making robots useful in daily lives. A wide range of techniques has been proposed for deriving a task model from a set of demonstrations of the task. Most previous works use learning to model the kinematics of the task, and for autonomous execution the robot then relies on a stiff position controller. While many tasks can and have been learned this way, there are tasks in which controlling the position alone is insufficient to achieve the goals of the task. These are typically tasks that involve contact or require a specific response to physical perturbations. The question of how to adjust the compliance to suit the need of the task has not yet been fully treated in Robot Learning from Demonstration. In this paper, we address this issue and present interfaces that allow a human teacher to indicate compliance variations by physically interacting with the robot during task execution. We validate our approach in two different experiments on the 7 DoF Barrett WAM and KUKA LWR robot manipulators. Furthermore, we conduct a user study to evaluate the usability of our approach from a non-roboticist's perspective.

  15. The Impact of Robot Tutor Nonverbal Social Behavior on Child Learning

    Directory of Open Access Journals (Sweden)

    James Kennedy

    2017-04-01

    Full Text Available Several studies have indicated that interacting with social robots in educational contexts may lead to greater learning than interactions with computers or virtual agents. As such, an increasing amount of social human–robot interaction research is being conducted in the learning domain, particularly with children. However, it is unclear precisely what social behavior a robot should employ in such interactions. Inspiration can be taken from human–human studies; this often leads to an assumption that the more social behavior an agent utilizes, the better the learning outcome will be. We apply a nonverbal behavior metric to a series of studies in which children are taught how to identify prime numbers by a robot with various behavioral manipulations. We find a trend, which generally agrees with the pedagogy literature, but also that overt nonverbal behavior does not account for all learning differences. We discuss the impact of novelty, child expectations, and responses to social cues to further the understanding of the relationship between robot social behavior and learning. We suggest that the combination of nonverbal behavior and social cue congruency is necessary to facilitate learning.

  16. Robotic Motion Learning Framework to Promote Social Engagement

    Directory of Open Access Journals (Sweden)

    Rachael Burns

    2018-02-01

    Full Text Available Imitation is a powerful component of communication between people, and it poses an important implication in improving the quality of interaction in the field of human–robot interaction (HRI. This paper discusses a novel framework designed to improve human–robot interaction through robotic imitation of a participant’s gestures. In our experiment, a humanoid robotic agent socializes with and plays games with a participant. For the experimental group, the robot additionally imitates one of the participant’s novel gestures during a play session. We hypothesize that the robot’s use of imitation will increase the participant’s openness towards engaging with the robot. Experimental results from a user study of 12 subjects show that post-imitation, experimental subjects displayed a more positive emotional state, had higher instances of mood contagion towards the robot, and interpreted the robot to have a higher level of autonomy than their control group counterparts did. These results point to an increased participant interest in engagement fueled by personalized imitation during interaction.

  17. Adaptive Strategy for Online Gait Learning Evaluated on the Polymorphic Robotic LocoKit

    DEFF Research Database (Denmark)

    Christensen, David Johan; Larsen, Jørgen Christian; Stoy, Kasper

    2012-01-01

    This paper presents experiments with a morphology-independent, life-long strategy for online learning of locomotion gaits, performed on a quadruped robot constructed from the LocoKit modular robot. The learning strategy applies a stochastic optimization algorithm to optimize eight open parameters...... of a central pattern generator based gait implementation. We observe that the strategy converges in roughly ten minutes to gaits of similar or higher velocity than a manually designed gait and that the strategy readapts in the event of failed actuators. In future work we plan to study co-learning...
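
    The record above describes the strategy only at a high level; as a rough illustration, the sketch below implements a generic (1+1)-style stochastic hill climber over eight gait parameters, with a noisy surrogate standing in for the hardware velocity measurement (all names and values are hypothetical, not the LocoKit code):

```python
import random

NUM_PARAMS = 8  # the eight open CPG parameters mentioned in the abstract
TARGET = [random.uniform(-1.0, 1.0) for _ in range(NUM_PARAMS)]  # stand-in optimum

def rollout_velocity(params):
    """Stand-in for a hardware trial: on the real robot this would run the
    CPG-based gait for a fixed time and return the measured forward velocity."""
    error = sum((p - t) ** 2 for p, t in zip(params, TARGET))
    return -error + random.gauss(0.0, 0.01)  # noisy 'velocity' surrogate

def online_gait_learning(initial_params, sigma=0.05, trials=300):
    """(1+1)-style stochastic hill climbing: keep a random perturbation only
    if the measured velocity improves, so the search can also re-adapt when
    an actuator fails and the previous optimum is no longer valid."""
    best, best_v = list(initial_params), rollout_velocity(initial_params)
    for _ in range(trials):
        candidate = [p + random.gauss(0.0, sigma) for p in best]
        v = rollout_velocity(candidate)
        if v > best_v:
            best, best_v = candidate, v
    return best, best_v

params, velocity = online_gait_learning([0.0] * NUM_PARAMS)
```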

  18. Three-dimensional neural net for learning visuomotor coordination of a robot arm.

    Science.gov (United States)

    Martinetz, T M; Ritter, H J; Schulten, K J

    1990-01-01

    An extension of T. Kohonen's (1982) self-organizing mapping algorithm together with an error-correction scheme based on the Widrow-Hoff learning rule is applied to develop a learning algorithm for the visuomotor coordination of a simulated robot arm. Learning occurs by a sequence of trial movements without the need for an external teacher. Using input signals from a pair of cameras, the closed robot arm system is able to reduce its positioning error to about 0.3% of the linear dimensions of its work space. This is achieved by choosing the connectivity of a three-dimensional lattice consisting of the units of the neural net.
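
    As a much-reduced sketch of the idea (not the original algorithm): the lattice below is fixed rather than self-organizing, neighborhood cooperation is omitted, and the arm/camera loop is replaced by a random linear stand-in, but it shows how a winner unit's output weights can be corrected with a Widrow-Hoff (delta-rule) update after each trial movement:

```python
import numpy as np

rng = np.random.default_rng(0)

# A 5x5x5 lattice of units tiling a normalized (simulated) camera input space.
grid = np.stack(np.meshgrid(*[np.linspace(0.0, 1.0, 5)] * 3, indexing="ij"),
                axis=-1).reshape(-1, 3)
out_weights = rng.normal(0.0, 0.1, grid.shape)   # per-unit joint-angle outputs

# Random linear stand-in for the unknown camera-to-joint mapping to be learned.
true_map = rng.normal(0.0, 1.0, (3, 3))

for _ in range(5000):                             # trial movements, no teacher
    cam = rng.uniform(0.0, 1.0, 3)                # target position seen by cameras
    winner = np.argmin(np.sum((grid - cam) ** 2, axis=1))
    joints = out_weights[winner]                  # move the arm with the winner's output
    error = true_map @ cam - joints               # residual positioning error
    out_weights[winner] += 0.5 * error            # Widrow-Hoff (delta-rule) correction
```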

  19. Behavioral similarity measurement based on image processing for robots that use imitative learning

    Science.gov (United States)

    Sterpin B., Dante G.; Martinez S., Fernando; Jacinto G., Edwar

    2017-02-01

    In the field of artificial societies, particularly those based on memetics, imitative behavior is essential for the development of cultural evolution. Applying this concept to robotics through imitative learning, a robot can acquire behavioral patterns from another robot. Assuming that the learning process must involve an instructor and at least one apprentice, obtaining a quantitative measurement of their behavioral similarity would be potentially useful, especially in artificial social systems focused on cultural evolution. In this paper the motor behavior of both kinds of robots, for two simple tasks, is represented by 2D binary images, which are processed in order to measure their behavioral similarity. The results shown here were obtained by comparing several similarity measurement methods for binary images.
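
    The abstract compares several binary-image similarity measures without naming a single formula; the sketch below uses one common choice, the Jaccard (intersection-over-union) coefficient, to score how closely an apprentice's trajectory image matches the instructor's (data and dimensions are purely illustrative):

```python
import numpy as np

def behavioral_similarity(img_a, img_b):
    """Jaccard (intersection-over-union) similarity between two binary images
    encoding the instructor's and the apprentice's motor trajectories."""
    a, b = img_a.astype(bool), img_b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0                  # both images empty: treat as identical
    return np.logical_and(a, b).sum() / union

instructor = np.zeros((64, 64), dtype=np.uint8)
apprentice = np.zeros((64, 64), dtype=np.uint8)
instructor[10:30, 20] = 1           # toy trajectories drawn as pixel paths
apprentice[12:30, 20] = 1
print(behavioral_similarity(instructor, apprentice))
```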

  20. Designing a Robot Teaching Assistant for Enhancing and Sustaining Learning Motivation

    Science.gov (United States)

    Hung, I-Chun; Chao, Kuo-Jen; Lee, Ling; Chen, Nian-Shing

    2013-01-01

    Although many researchers have pointed out that educational robots can stimulate learners' learning motivation, this motivation is hard to sustain and decreases rapidly over time if the amusement and interaction come merely from the new technology itself without incorporating instructional strategies. Many researchers have…

  1. Robotics

    Indian Academy of Sciences (India)

    netic induction to detect an object. The development of ... end effector, inclination of object, magnetic and electric fields, etc. The sensors described ... In the case of a robot, the various actuators and motors have to be modelled. The major ...

  2. Critical factors in the empirical performance of temporal difference and evolutionary methods for reinforcement learning

    NARCIS (Netherlands)

    Whiteson, S.; Taylor, M.E.; Stone, P.

    2010-01-01

    Temporal difference and evolutionary methods are two of the most common approaches to solving reinforcement learning problems. However, there is little consensus on their relative merits and there have been few empirical studies that directly compare their performance. This article aims to address

  3. IMPLEMENTATION OF MULTIAGENT REINFORCEMENT LEARNING MECHANISM FOR OPTIMAL ISLANDING OPERATION OF DISTRIBUTION NETWORK

    DEFF Research Database (Denmark)

    Saleem, Arshad; Lind, Morten

    2008-01-01

    among electric power utilities to utilize modern information and communication technologies (ICT) in order to improve the automation of the distribution system. In this paper we present our work for the implementation of a dynamic multi-agent based distributed reinforcement learning mechanism...

  4. Starting a robotic program in general thoracic surgery: why, how, and lessons learned.

    Science.gov (United States)

    Cerfolio, Robert J; Bryant, Ayesha S; Minnich, Douglas J

    2011-06-01

    We report our experience in starting a robotic program in thoracic surgery. We retrospectively reviewed our experience in starting a robotic program in general thoracic surgery on a consecutive series of patients. Between February 2009 and September 2010, 150 patients underwent robotic operations. Types of procedures were lobectomy in 62, thymectomy in 30, and benign esophageal procedures in 6. No thymectomy or esophageal procedures required conversion. One conversion was needed for suspected bleeding for a mediastinal mass. Twelve patients were converted for lobectomy (none for bleeding, 1 in the last 24). Median operative time for robotic thymectomy was 119 minutes, and median length of stay was 1 day. The median time for robotic lobectomy was 185 minutes, and median length of stay was 2 days. There were no operative deaths. Morbidity occurred in 23 patients (15%). All patients with cancer had R0 resections and resection of all visible mediastinal and hilar lymph nodes. Robotic surgery is safe and oncologically sound. It requires training of the entire operating room team. The learning curve is steep, involving port placement, availability of the proper instrumentation, use of the correct robotic arms, and proper patient positioning. The robot provides an ideal surgical approach for thymectomy and other mediastinal tumors. Its advantage over thoracoscopy for pulmonary resection is unproven; however, we believe complete thoracic lymph node dissection and teaching is easier. Importantly, defined credentialing for surgeons and cost analysis studies are needed. Copyright © 2011 The Society of Thoracic Surgeons. Published by Elsevier Inc. All rights reserved.

  5. Energy Management Strategy for a Hybrid Electric Vehicle Based on Deep Reinforcement Learning

    OpenAIRE

    Yue Hu; Weimin Li; Kun Xu; Taimoor Zahid; Feiyan Qin; Chenming Li

    2018-01-01

    An energy management strategy (EMS) is important for hybrid electric vehicles (HEVs) since it plays a decisive role in the performance of the vehicle. However, the variation of future driving conditions deeply influences the effectiveness of the EMS. Most existing EMS methods simply follow predefined rules that are not adaptive to different driving conditions online. Therefore, it is useful for the EMS to learn from the environment or driving cycle. In this paper, a deep reinforcement learn...

  6. Adolescent-specific patterns of behavior and neural activity during social reinforcement learning

    OpenAIRE

    Jones, Rebecca M.; Somerville, Leah H.; Li, Jian; Ruberry, Erika J.; Powers, Alisa; Mehta, Natasha; Dyke, Jonathan; Casey, BJ

    2014-01-01

    Humans are sophisticated social beings. Social cues from others are exceptionally salient, particularly during adolescence. Understanding how adolescents interpret and learn from variable social signals can provide insight into the observed shift in social sensitivity during this period. The current study tested 120 participants between the ages of 8 and 25 years on a social reinforcement learning task where the probability of receiving positive social feedback was parametrically manipulated....

  7. Arousal regulation and affective adaptation to human responsiveness by a robot that explores and learns a novel environment.

    Science.gov (United States)

    Hiolle, Antoine; Lewis, Matthew; Cañamero, Lola

    2014-01-01

    In the context of our work in developmental robotics regarding robot-human caregiver interactions, in this paper we investigate how a "baby" robot that explores and learns novel environments can adapt its affective regulatory behavior of soliciting help from a "caregiver" to the preferences shown by the caregiver in terms of varying responsiveness. We build on two strands of previous work that assessed independently (a) the differences between two "idealized" robot profiles-a "needy" and an "independent" robot-in terms of their use of a caregiver as a means to regulate the "stress" (arousal) produced by the exploration and learning of a novel environment, and (b) the effects on the robot behaviors of two caregiving profiles varying in their responsiveness-"responsive" and "non-responsive"-to the regulatory requests of the robot. Going beyond previous work, in this paper we (a) assess the effects that the varying regulatory behavior of the two robot profiles has on the exploratory and learning patterns of the robots; (b) bring together the two strands previously investigated in isolation and take a step further by endowing the robot with the capability to adapt its regulatory behavior along the "needy" and "independent" axis as a function of the varying responsiveness of the caregiver; and (c) analyze the effects that the varying regulatory behavior has on the exploratory and learning patterns of the adaptive robot.

  8. Active-learning strategies: the use of a game to reinforce learning in nursing education. A case study.

    Science.gov (United States)

    Boctor, Lisa

    2013-03-01

    The majority of nursing students are kinesthetic learners, preferring a hands-on, active approach to education. Research shows that active-learning strategies can increase student learning and satisfaction. This study looks at the use of one active-learning strategy, a Jeopardy-style game, 'Nursopardy', to reinforce Fundamentals of Nursing material, aiding in students' preparation for a standardized final exam. The game was created keeping students' varied learning styles and the NCLEX blueprint in mind. The blueprint was used to create 5 categories, with 26 total questions. Student survey results, using a five-point Likert scale, showed that students found this learning method enjoyable and beneficial to learning. More research is recommended regarding learning outcomes when using active-learning strategies, such as games. Copyright © 2012 Elsevier Ltd. All rights reserved.

  9. Learning in robotic manipulation: The role of dimensionality reduction in policy search methods. Comment on "Hand synergies: Integration of robotics and neuroscience for understanding the control of biological and artificial hands" by Marco Santello et al.

    Science.gov (United States)

    Ficuciello, Fanny; Siciliano, Bruno

    2016-07-01

    learning into control naturally leads to relaxing the above requirements through the adoption of coordinated motion patterns and sensory-motor synergies as useful tools leading to a problem of reduced dimension. To this purpose, model-based control strategies relying on synergistic models of manipulation activities learned from human experience can be integrated with real-time learning from actions strategies [5]. In [6] a classification of learning strategies for robotics is provided, while the difference between imitation learning and reinforcement learning (RL) is highlighted in [7]. From recent research in the field [8,9], it seems that RL represents the future toward autonomous and intelligent robots since it provides learning capabilities like those of humans, i.e. based on exploration and trial-and-error policies. In this context, suitable policy search methods to be implemented in a synergy-based framework are to be sought in order to reduce the search space dimension while guaranteeing the convergence and efficiency of the learning algorithm.

  10. Gaze-contingent reinforcement learning reveals incentive value of social signals in young children and adults.

    Science.gov (United States)

    Vernetti, Angélina; Smith, Tim J; Senju, Atsushi

    2017-03-15

    While numerous studies have demonstrated that infants and adults preferentially orient to social stimuli, it remains unclear as to what drives such preferential orienting. It has been suggested that the learned association between social cues and subsequent reward delivery might shape such social orienting. Using a novel, spontaneous indication of reinforcement learning (with the use of a gaze contingent reward-learning task), we investigated whether children and adults' orienting towards social and non-social visual cues can be elicited by the association between participants' visual attention and a rewarding outcome. Critically, we assessed whether the engaging nature of the social cues influences the process of reinforcement learning. Both children and adults learned to orient more often to the visual cues associated with reward delivery, demonstrating that cue-reward association reinforced visual orienting. More importantly, when the reward-predictive cue was social and engaging, both children and adults learned the cue-reward association faster and more efficiently than when the reward-predictive cue was social but non-engaging. These new findings indicate that social engaging cues have a positive incentive value. This could possibly be because they usually coincide with positive outcomes in real life, which could partly drive the development of social orienting. © 2017 The Authors.

  11. Flat vs. Expressive Storytelling: Young Children's Learning and Retention of a Social Robot's Narrative.

    Science.gov (United States)

    Kory Westlund, Jacqueline M; Jeong, Sooyeon; Park, Hae W; Ronfard, Samuel; Adhikari, Aradhana; Harris, Paul L; DeSteno, David; Breazeal, Cynthia L

    2017-01-01

    Prior research with preschool children has established that dialogic or active book reading is an effective method for expanding young children's vocabulary. In this exploratory study, we asked whether similar benefits are observed when a robot engages in dialogic reading with preschoolers. Given the established effectiveness of active reading, we also asked whether this effectiveness was critically dependent on the expressive characteristics of the robot. For approximately half the children, the robot's active reading was expressive; the robot's voice included a wide range of intonation and emotion (Expressive). For the remaining children, the robot read and conversed with a flat voice, which sounded similar to a classic text-to-speech engine and had little dynamic range (Flat). The robot's movements were kept constant across conditions. We performed a verification study using Amazon Mechanical Turk (AMT) to confirm that the Expressive robot was viewed as significantly more expressive, more emotional, and less passive than the Flat robot. We invited 45 preschoolers with an average age of 5 years who were either English Language Learners (ELL), bilingual, or native English speakers to engage in the reading task with the robot. The robot narrated a story from a picture book, using active reading techniques and including a set of target vocabulary words in the narration. Children were post-tested on the vocabulary words and were also asked to retell the story to a puppet. A subset of 34 children performed a second story retelling 4-6 weeks later. Children reported liking and learning from the robot a similar amount in the Expressive and Flat conditions. However, as compared to children in the Flat condition, children in the Expressive condition were more concentrated and engaged as indexed by their facial expressions; they emulated the robot's story more in their story retells; and they told longer stories during their delayed retelling. Furthermore, children who

  12. A Scalable Neuro-inspired Robot Controller Integrating a Machine Learning Algorithm and a Spiking Cerebellar-like Network

    DEFF Research Database (Denmark)

    Baira Ojeda, Ismael; Tolu, Silvia; Lund, Henrik Hautop

    2017-01-01

    Combining the Fable robot, a modular robot, with a neuro-inspired controller, we present the proof of principle of a system that can scale to several neurally controlled compliant modules. The motor control and learning of a robot module are carried out by a Unit Learning Machine (ULM) that embeds...... the Locally Weighted Projection Regression algorithm (LWPR) and a spiking cerebellar-like microcircuit. The LWPR guarantees both an optimized representation of the input space and the learning of the dynamic internal model (IM) of the robot. However, the cerebellar-like sub-circuit integrates LWPR input...

  13. High and low temperatures have unequal reinforcing properties in Drosophila spatial learning.

    Science.gov (United States)

    Zars, Melissa; Zars, Troy

    2006-07-01

    Small insects regulate their body temperature solely through behavior. Thus, sensing environmental temperature and implementing an appropriate behavioral strategy can be critical for survival. The fly Drosophila melanogaster prefers 24 degrees C, avoiding higher and lower temperatures when tested on a temperature gradient. Furthermore, temperatures above 24 degrees C have negative reinforcing properties. In contrast, we found that flies have a preference in operant learning experiments for a low-temperature-associated position rather than the 24 degrees C alternative in the heat-box. Two additional differences between high- and low-temperature reinforcement, i.e., temperatures above and below 24 degrees C, were found. Temperatures equally above and below 24 degrees C did not reinforce equally and only high temperatures supported increased memory performance with reversal conditioning. Finally, low- and high-temperature reinforced memories are similarly sensitive to two genetic mutations. Together these results indicate the qualitative meaning of temperatures below 24 degrees C depends on the dynamics of the temperatures encountered and that the reinforcing effects of these temperatures depend on at least some common genetic components. Conceptualizing these results using the Wolf-Heisenberg model of operant conditioning, we propose the maximum difference in experienced temperatures determines the magnitude of the reinforcement input to a conditioning circuit.

  14. Cerebellar and prefrontal cortex contributions to adaptation, strategies, and reinforcement learning.

    Science.gov (United States)

    Taylor, Jordan A; Ivry, Richard B

    2014-01-01

    Traditionally, motor learning has been studied as an implicit learning process, one in which movement errors are used to improve performance in a continuous, gradual manner. The cerebellum figures prominently in this literature given well-established ideas about the role of this system in error-based learning and the production of automatized skills. Recent developments have brought into focus the relevance of multiple learning mechanisms for sensorimotor learning. These include processes involving repetition, reinforcement learning, and strategy utilization. We examine these developments, considering their implications for understanding cerebellar function and how this structure interacts with other neural systems to support motor learning. Converging lines of evidence from behavioral, computational, and neuropsychological studies suggest a fundamental distinction between processes that use error information to improve action execution or action selection. While the cerebellum is clearly linked to the former, its role in the latter remains an open question. © 2014 Elsevier B.V. All rights reserved.

  15. Learning from Analogies between Robotic World and Natural Phenomena

    Science.gov (United States)

    Verner, Igor M.; Cuperman, Dan

    This paper proposes an approach which combines robotics and science education through the development of robotic models and inquiry into natural phenomena. The robotic models are constructed using the PicoCricket kit. The approach is implemented and evaluated in the framework of teacher training courses for Technion students given in connection with outreach courses for middle school and high school students. The educational study indicated that the proposed approach facilitated acquisition of both technology and science concepts and inspired analogical reasoning and cross-disciplinary connections between the two domains.

  16. Confirmation bias in human reinforcement learning: Evidence from counterfactual feedback processing

    Science.gov (United States)

    Lefebvre, Germain; Blakemore, Sarah-Jayne

    2017-01-01

    Previous studies suggest that factual learning, that is, learning from obtained outcomes, is biased, such that participants preferentially take into account positive, as compared to negative, prediction errors. However, whether or not the prediction error valence also affects counterfactual learning, that is, learning from forgone outcomes, is unknown. To address this question, we analysed the performance of two groups of participants on reinforcement learning tasks using a computational model that was adapted to test if prediction error valence influences learning. We carried out two experiments: in the factual learning experiment, participants learned from partial feedback (i.e., the outcome of the chosen option only); in the counterfactual learning experiment, participants learned from complete feedback information (i.e., the outcomes of both the chosen and unchosen option were displayed). In the factual learning experiment, we replicated previous findings of a valence-induced bias, whereby participants learned preferentially from positive, relative to negative, prediction errors. In contrast, for counterfactual learning, we found the opposite valence-induced bias: negative prediction errors were preferentially taken into account, relative to positive ones. When considering valence-induced bias in the context of both factual and counterfactual learning, it appears that people tend to preferentially take into account information that confirms their current choice. PMID:28800597

  17. Confirmation bias in human reinforcement learning: Evidence from counterfactual feedback processing.

    Science.gov (United States)

    Palminteri, Stefano; Lefebvre, Germain; Kilford, Emma J; Blakemore, Sarah-Jayne

    2017-08-01

    Previous studies suggest that factual learning, that is, learning from obtained outcomes, is biased, such that participants preferentially take into account positive, as compared to negative, prediction errors. However, whether or not the prediction error valence also affects counterfactual learning, that is, learning from forgone outcomes, is unknown. To address this question, we analysed the performance of two groups of participants on reinforcement learning tasks using a computational model that was adapted to test if prediction error valence influences learning. We carried out two experiments: in the factual learning experiment, participants learned from partial feedback (i.e., the outcome of the chosen option only); in the counterfactual learning experiment, participants learned from complete feedback information (i.e., the outcomes of both the chosen and unchosen option were displayed). In the factual learning experiment, we replicated previous findings of a valence-induced bias, whereby participants learned preferentially from positive, relative to negative, prediction errors. In contrast, for counterfactual learning, we found the opposite valence-induced bias: negative prediction errors were preferentially taken into account, relative to positive ones. When considering valence-induced bias in the context of both factual and counterfactual learning, it appears that people tend to preferentially take into account information that confirms their current choice.
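
    A minimal sketch of the kind of valence-asymmetric update the computational model tests (the exact model and parameter values in the paper differ; the numbers below are hypothetical): a Rescorla-Wagner update whose learning rate depends on the sign of the prediction error, with the asymmetry reversed for chosen (factual) versus unchosen (counterfactual) outcomes:

```python
def rw_update(q, outcome, alpha_plus, alpha_minus):
    """One Rescorla-Wagner update with separate learning rates for positive
    and negative prediction errors."""
    delta = outcome - q                        # prediction error
    alpha = alpha_plus if delta >= 0 else alpha_minus
    return q + alpha * delta

# Illustrative asymmetries consistent with the abstract (values hypothetical):
q_chosen = rw_update(0.5, 1.0, alpha_plus=0.30, alpha_minus=0.15)    # factual: positive PE weighted more
q_unchosen = rw_update(0.5, 0.0, alpha_plus=0.15, alpha_minus=0.30)  # counterfactual: negative PE weighted more
```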

  18. Self-supervised learning as an enabling technology for future space exploration robots: ISS experiments on monocular distance learning

    Science.gov (United States)

    van Hecke, Kevin; de Croon, Guido C. H. E.; Hennes, Daniel; Setterfield, Timothy P.; Saenz-Otero, Alvar; Izzo, Dario

    2017-11-01

    Although machine learning holds an enormous promise for autonomous space robots, it is currently not employed because of the inherent uncertain outcome of learning processes. In this article we investigate a learning mechanism, Self-Supervised Learning (SSL), which is very reliable and hence an important candidate for real-world deployment even on safety-critical systems such as space robots. To demonstrate this reliability, we introduce a novel SSL setup that allows a stereo vision equipped robot to cope with the failure of one of its cameras. The setup learns to estimate average depth using a monocular image, by using the stereo vision depths from the past as trusted ground truth. We present preliminary results from an experiment on the International Space Station (ISS) performed with the MIT/NASA SPHERES VERTIGO satellite. The presented experiments were performed on October 8th, 2015 on board the ISS. The main goals were (1) data gathering, and (2) navigation based on stereo vision. First the astronaut Kimiya Yui moved the satellite around the Japanese Experiment Module to gather stereo vision data for learning. Subsequently, the satellite freely explored the space in the module based on its (trusted) stereo vision system and a pre-programmed exploration behavior, while simultaneously performing the self-supervised learning of monocular depth estimation on board. The two main goals were successfully achieved, representing the first online learning robotic experiments in space. These results lay the groundwork for a follow-up experiment in which the satellite will use the learned single-camera depth estimation for autonomous exploration in the ISS, and are an advancement towards future space robots that continuously improve their navigation capabilities over time, even in harsh and completely unknown space environments.
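
    The core self-supervised idea, stripped of the actual vision pipeline, is a regression from monocular image features to the average depth reported by the trusted stereo system; the sketch below uses synthetic features and a plain ridge regression as a stand-in for the learner used on the satellite (all details are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins: 64-dim appearance features from the monocular image and the
# average depth computed by the (still working) stereo pipeline for past frames.
features = rng.normal(size=(500, 64))
true_w = rng.normal(size=64)
stereo_depth = features @ true_w + rng.normal(scale=0.1, size=500)

# Self-supervised regression: the stereo output serves as the training target.
lam = 1e-2                                    # ridge regularization strength
w = np.linalg.solve(features.T @ features + lam * np.eye(64),
                    features.T @ stereo_depth)

# After a camera failure, depth is estimated from the monocular features alone.
mono_estimate = features[:5] @ w
```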

  19. Frontostriatal development and probabilistic reinforcement learning during adolescence.

    Science.gov (United States)

    DePasque, Samantha; Galván, Adriana

    2017-09-01

    Adolescence has traditionally been viewed as a period of vulnerability to increased risk-taking and adverse outcomes, which have been linked to neurobiological maturation of the frontostriatal reward system. However, growing research on the role of developmental changes in the adolescent frontostriatal system in facilitating learning will provide a more nuanced view of adolescence. In this review, we discuss the implications of existing research on this topic for learning during adolescence, and suggest that the very neural changes that render adolescents vulnerable to social pressure and risky decision making may also stand to play a role in scaffolding the ability to learn from rewards and from performance-related feedback. Copyright © 2017 Elsevier Inc. All rights reserved.

  20. Reinforcement Learning for Constrained Energy Trading Games With Incomplete Information.

    Science.gov (United States)

    Wang, Huiwei; Huang, Tingwen; Liao, Xiaofeng; Abu-Rub, Haitham; Chen, Guo

    2017-10-01

    This paper considers the problem of designing adaptive learning algorithms to seek the Nash equilibrium (NE) of the constrained energy trading game among individually strategic players with incomplete information. In this game, each player uses the learning automaton scheme to generate the action probability distribution based on his/her private information for maximizing his own averaged utility. It is shown that if one of the admissible mixed strategies converges to the NE with probability one, then the averaged utility and trading quantity almost surely converge to their expected values, respectively. For the given discontinuous pricing function, the utility function has already been proved to be upper semicontinuous and payoff secure, which guarantees the existence of the mixed-strategy NE. By the strict diagonal concavity of the regularized Lagrange function, the uniqueness of the NE is also guaranteed. Finally, an adaptive learning algorithm is provided to generate the strategy probability distribution for seeking the mixed-strategy NE.
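
    The abstract does not give the exact automaton scheme, so the sketch below shows a generic linear reward-inaction (L_R-I) learning automaton: the action probability vector is nudged toward the sampled action in proportion to a normalized payoff, which is the basic mechanism such NE-seeking algorithms build on (payoffs and actions here are toy stand-ins, not the paper's constrained trading model):

```python
import random

def lri_step(probs, payoff_fn, beta=0.05):
    """One linear reward-inaction (L_R-I) step: sample an action from the
    current probability vector, observe a payoff normalized to [0, 1], and
    shift probability mass toward the sampled action in proportion to it."""
    i = random.choices(range(len(probs)), weights=probs)[0]
    payoff = payoff_fn(i)                      # market / environment response
    return [p + beta * payoff * ((1.0 if j == i else 0.0) - p)
            for j, p in enumerate(probs)]

# Toy stand-in for the trading environment: action 1 pays better on average.
payoff = lambda a: random.random() * (0.4 if a == 0 else 0.9)
p = [0.5, 0.5]
for _ in range(2000):
    p = lri_step(p, payoff)                    # p drifts toward the better action
```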

  1. Understanding Human Hand Gestures for Learning Robot Pick-and-Place Tasks

    Directory of Open Access Journals (Sweden)

    Hsien-I Lin

    2015-05-01

    Full Text Available Programming robots by human demonstration is an intuitive approach, especially by gestures. Because robot pick-and-place tasks are widely used in industrial factories, this paper proposes a framework to learn robot pick-and-place tasks by understanding human hand gestures. The proposed framework is composed of the module of gesture recognition and the module of robot behaviour control. For the module of gesture recognition, transport empty (TE), transport loaded (TL), grasp (G), and release (RL) from Gilbreth's therbligs are the hand gestures to be recognized. A convolution neural network (CNN) is adopted to recognize these gestures from a camera image. To achieve robust performance, the skin model by a Gaussian mixture model (GMM) is used to filter out non-skin colours of an image, and the calibration of position and orientation is applied to obtain the neutral hand pose before the training and testing of the CNN. For the module of robot behaviour control, the corresponding robot motion primitives to TE, TL, G, and RL, respectively, are implemented in the robot. To manage the primitives in the robot system, a behaviour-based programming platform based on the Extensible Agent Behavior Specification Language (XABSL) is adopted. Because the XABSL provides the flexibility and re-usability of the robot primitives, the hand motion sequence from the module of gesture recognition can be easily used in the XABSL programming platform to implement the robot pick-and-place tasks. The experimental evaluation of seven subjects performing seven hand gestures showed that the average recognition rate was 95.96%. Moreover, by the XABSL programming platform, the experiment showed the cube-stacking task was easily programmed by human demonstration.

  2. Children and Robots Learning to Play Hide and Seek

    National Research Council Canada - National Science Library

    Trafton, J. G; Schultz, Alan C; Perznowski, Dennis; Bugajska, Magdalena D; Adams, William; Cassimatis, Nicholas L; Brock, Derek P

    2006-01-01

    ...., containment, under) and use that information to play a credible game of hide and seek. They model this hypothesis within the ACT-R cognitive architecture and put the model on a robot, which is able to mimic the child's hiding behavior. They also take the "hiding" model and use it as the basis for a "seeking" model. They suggest that using the same representations and procedures that a person uses allows better interaction between the human and robotic system.

  3. Bi-directional effect of increasing doses of baclofen on reinforcement learning

    Directory of Open Access Journals (Sweden)

    Jean eTerrier

    2011-07-01

    Full Text Available In rodents as well as in humans, efficient reinforcement learning depends on dopamine (DA) released from ventral tegmental area (VTA) neurons. It has been shown that in brain slices of mice, GABAB-receptor agonists at low concentrations increase the firing frequency of VTA-DA neurons, while high concentrations reduce the firing frequency. It remains elusive, however, whether baclofen can modulate reinforcement learning. Here, in a double-blind study in 34 healthy human volunteers, we tested the effects of a low and a high concentration of oral baclofen in a gambling task associated with monetary reward. A low (20 mg) dose of baclofen increased the efficiency of reward-associated learning but had no effect on the avoidance of monetary loss. A high (50 mg) dose of baclofen, on the other hand, did not affect the learning curve. At the end of the task, subjects who received 20 mg baclofen p.o. were more accurate in choosing the symbol linked to the highest probability of earning money compared to the control group (89.55±1.39% vs 81.07±1.55%, p=0.002). Our results support a model where baclofen, at low concentrations, causes a disinhibition of DA neurons, increases DA levels and thus facilitates reinforcement learning.

  4. Spared internal but impaired external reward prediction error signals in major depressive disorder during reinforcement learning.

    Science.gov (United States)

    Bakic, Jasmina; Pourtois, Gilles; Jepma, Marieke; Duprat, Romain; De Raedt, Rudi; Baeken, Chris

    2017-01-01

    Major depressive disorder (MDD) creates debilitating effects on a wide range of cognitive functions, including reinforcement learning (RL). In this study, we sought to assess whether reward processing as such, or alternatively the complex interplay between motivation and reward, might potentially account for the abnormal reward-based learning in MDD. A total of 35 treatment-resistant MDD patients and 44 age-matched healthy controls (HCs) performed a standard probabilistic learning task. RL was titrated using behavioral, computational modeling and event-related brain potentials (ERPs) data. MDD patients showed a learning rate comparable to that of HCs. However, they showed decreased lose-shift responses as well as blunted subjective evaluations of the reinforcers used during the task, relative to HCs. Moreover, MDD patients showed normal internal (at the level of error-related negativity, ERN) but abnormal external (at the level of feedback-related negativity, FRN) reward prediction error (RPE) signals during RL, selectively when additional efforts had to be made to establish learning. Collectively, these results lend support to the assumption that MDD does not impair reward processing per se during RL. Instead, it seems to alter the processing of the emotional value of (external) reinforcers during RL, when additional intrinsic motivational processes have to be engaged. © 2016 Wiley Periodicals, Inc.

  5. DYNAMIC AND INCREMENTAL EXPLORATION STRATEGY IN FUSION ADAPTIVE RESONANCE THEORY FOR ONLINE REINFORCEMENT LEARNING

    Directory of Open Access Journals (Sweden)

    Budhitama Subagdja

    2016-06-01

    Full Text Available One of the fundamental challenges in reinforcement learning is to set up a proper balance between exploration and exploitation to obtain the maximum cumulative reward in the long run. Most protocols for exploration bound the overall values to a convergent level of performance. If new knowledge is inserted or the environment is suddenly changed, the issue becomes more intricate as the exploration must compromise the pre-existing knowledge. This paper presents a type of multi-channel adaptive resonance theory (ART) neural network model called fusion ART which serves as a fuzzy approximator for reinforcement learning with inherent features that can regulate the exploration strategy. This intrinsic regulation is driven by the condition of the knowledge learnt so far by the agent. The model offers a stable but incremental reinforcement learning that can involve prior rules as bootstrap knowledge for guiding the agent to select the right action. Experiments in obstacle avoidance and navigation tasks demonstrate that in the configuration of learning wherein the agent learns from scratch, the inherent exploration model in fusion ART is comparable to the basic E-greedy policy. On the other hand, the model is demonstrated to deal with prior knowledge and strike a balance between exploration and exploitation.

  6. Adaptive Load Balancing of Parallel Applications with Multi-Agent Reinforcement Learning on Heterogeneous Systems

    Directory of Open Access Journals (Sweden)

    Johan Parent

    2004-01-01

    Full Text Available We report on the improvements that can be achieved by applying machine learning techniques, in particular reinforcement learning, for the dynamic load balancing of parallel applications. The applications being considered in this paper are coarse-grained, data-intensive applications. Such applications put high pressure on the interconnect of the hardware. Synchronization and load balancing in complex, heterogeneous networks need fast, flexible, adaptive load balancing algorithms. Viewing a parallel application as a one-state coordination game in the framework of multi-agent reinforcement learning, and by using a recently introduced multi-agent exploration technique, we are able to improve upon the classic job farming approach. The improvements are achieved with limited computation and communication overhead.

  7. Imitation learning of Non-Linear Point-to-Point Robot Motions using Dirichlet Processes

    DEFF Research Database (Denmark)

    Krüger, Volker; Tikhanoff, Vadim; Natale, Lorenzo

    2012-01-01

    In this paper we discuss the use of the infinite Gaussian mixture model and Dirichlet processes for learning robot movements from demonstrations. Starting point of this work is an earlier paper where the authors learn a non-linear dynamic robot movement model from a small number of observations....... The model in that work is learned using a classical finite Gaussian mixture model (FGMM) where the Gaussian mixtures are appropriately constrained. The problem with this approach is that one needs to make a good guess for how many mixtures the FGMM should use. In this work, we generalize this approach...... our algorithm on the same data that was used in [5], where the authors use motion capture devices to record the demonstrations. As further validation we test our approach on novel data acquired on our iCub in a different demonstration scenario in which the robot is physically driven by the human...
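
    As a practical stand-in for the infinite Gaussian mixture described above (not the authors' implementation), scikit-learn's truncated Dirichlet-process mixture lets the data prune unused components instead of requiring a guess for the number of mixtures; the demonstration data here are toy values:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)

# Toy demonstration data: 2-D samples drawn from two motion phases.
demos = np.vstack([rng.normal([0.0, 0.0], 0.1, size=(100, 2)),
                   rng.normal([1.0, 0.5], 0.1, size=(100, 2))])

# Truncated Dirichlet-process mixture: the component cap is generous and the
# model prunes unused components, so no guess of the mixture count is needed.
dpgmm = BayesianGaussianMixture(
    n_components=20,                                   # upper bound, not a guess
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="full",
    max_iter=500,
    random_state=0,
).fit(demos)

active_components = int((dpgmm.weights_ > 1e-2).sum())  # effectively used mixtures
```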

  8. Research on Open-Closed-Loop Iterative Learning Control with Variable Forgetting Factor of Mobile Robots

    Directory of Open Access Journals (Sweden)

    Hongbin Wang

    2016-01-01

    Full Text Available We propose an iterative learning control (ILC) algorithm that is developed using a variable forgetting factor to control a mobile robot. The proposed algorithm can be categorized as an open-closed-loop iterative learning control, which produces control instructions by using both previous and current data. However, introducing a variable forgetting factor can weaken the former control output and its variance in the control law while strengthening the robustness of the iterative learning control. If it is applied to the mobile robot, this will reduce position errors in robot trajectory tracking control effectively. In this work, we show that the proposed algorithm guarantees tracking error bound convergence to a small neighborhood of the origin under the condition of state disturbances, output measurement noises, and fluctuation of system dynamics. By using simulation, we demonstrate that the controller is effective in realizing perfect tracking.
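
    The abstract does not reproduce the control law; one common form of an open-closed-loop ILC update with a variable forgetting factor, written here purely for illustration, is:

```latex
% u_k: control input at iteration k;  e_k(t) = y_d(t) - y_k(t): tracking error;
% L_1: open-loop (previous-iteration) gain;  L_2: closed-loop (current-iteration) gain;
% \lambda_k \in (0,1): forgetting factor varied over iterations.
u_{k+1}(t) = (1 - \lambda_k)\, u_k(t) + L_1\, e_k(t+1) + L_2\, e_{k+1}(t)
```

    A larger forgetting factor down-weights the stored control from earlier iterations, which matches the abstract's description of weakening the former control output while strengthening robustness.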

  9. Robots Learn to Recognize Individuals from Imitative Encounters with People and Avatars

    Science.gov (United States)

    Boucenna, Sofiane; Cohen, David; Meltzoff, Andrew N.; Gaussier, Philippe; Chetouani, Mohamed

    2016-02-01

    Prior to language, human infants are prolific imitators. Developmental science grounds infant imitation in the neural coding of actions, and highlights the use of imitation for learning from and about people. Here, we used computational modeling and a robot implementation to explore the functional value of action imitation. We report 3 experiments using a mutual imitation task between robots, adults, typically developing children, and children with Autism Spectrum Disorder. We show that a particular learning architecture - specifically one combining artificial neural nets for (i) extraction of visual features, (ii) the robot’s motor internal state, (iii) posture recognition, and (iv) novelty detection - is able to learn from an interactive experience involving mutual imitation. This mutual imitation experience allowed the robot to recognize the interactive agent in a subsequent encounter. These experiments using robots as tools for modeling human cognitive development, based on developmental theory, confirm the promise of developmental robotics. Additionally, findings illustrate how person recognition may emerge through imitative experience, intercorporeal mapping, and statistical learning.

  10. Instructional control of reinforcement learning: A behavioral and neurocomputational investigation

    NARCIS (Netherlands)

    Doll, B.B.; Jacobs, W.J.; Sanfey, A.G.; Frank, M.J.

    2009-01-01

    Humans learn how to behave directly through environmental experience and indirectly through rules and instructions. Behavior analytic research has shown that instructions can control behavior, even when such behavior leads to sub-optimal outcomes (Hayes, S (Ed) 1989. Rule-governed behavior:

  11. Correction of Visual Perception Based on Neuro-Fuzzy Learning for the Humanoid Robot TEO

    Directory of Open Access Journals (Sweden)

    Juan Hernandez-Vicen

    2018-03-01

    Full Text Available New applications related to robotic manipulation or transportation tasks, with or without physical grasping, are continuously being developed. To perform these activities, the robot takes advantage of different kinds of perceptions. One of the key perceptions in robotics is vision. However, some problems related to image processing make the application of visual information within robot control algorithms difficult. Camera-based systems have inherent errors that affect the quality and reliability of the information obtained. The need to correct image distortion slows down image parameter computation, which decreases the performance of control algorithms. In this paper, a new approach to correcting several sources of visual distortion on images in only one computing step is proposed. The goal of this system/algorithm is the computation of the tilt angle of an object transported by a robot, minimizing image inherent errors and increasing computing speed. After capturing the image, the computer system extracts the angle using a Fuzzy filter that corrects at the same time all possible distortions, obtaining the real angle in only one processing step. This filter has been developed by means of Neuro-Fuzzy learning techniques, using datasets with information obtained from real experiments. In this way, the computing time has been decreased and the performance of the application has been improved. The resulting algorithm has been tried out experimentally in robot transportation tasks in the humanoid robot TEO (Task Environment Operator) from the University Carlos III of Madrid.

  12. Closed-loop adaptation of neurofeedback based on mental effort facilitates reinforcement learning of brain self-regulation.

    Science.gov (United States)

    Bauer, Robert; Fels, Meike; Royter, Vladislav; Raco, Valerio; Gharabaghi, Alireza

    2016-09-01

    Considering self-rated mental effort during neurofeedback may improve training of brain self-regulation. Twenty-one healthy, right-handed subjects performed kinesthetic motor imagery of opening their left hand, while threshold-based classification of beta-band desynchronization resulted in proprioceptive robotic feedback. The experiment consisted of two blocks in a cross-over design. The participants rated their perceived mental effort nine times per block. In the adaptive block, the threshold was adjusted on the basis of these ratings whereas adjustments were carried out at random in the other block. Electroencephalography was used to examine the cortical activation patterns during the training sessions. The perceived mental effort was correlated with the difficulty threshold of neurofeedback training. Adaptive threshold-setting reduced mental effort and increased the classification accuracy and positive predictive value. This was paralleled by an inter-hemispheric cortical activation pattern in low frequency bands connecting the right frontal and left parietal areas. Optimal balance of mental effort was achieved at thresholds significantly higher than maximum classification accuracy. Rating of mental effort is a feasible approach for effective threshold-adaptation during neurofeedback training. Closed-loop adaptation of the neurofeedback difficulty level facilitates reinforcement learning of brain self-regulation. Copyright © 2016 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.

  13. Autonomous Inter-Task Transfer in Reinforcement Learning Domains

    Science.gov (United States)

    2008-08-01

    Mountain Car. However, because the source task uses a car with a motor more than twice as powerful as in the 3D task, the transition function learned in... Examples of differences between source and target tasks include a more powerful car motor, a changed surface friction of the hill, a changed range of the state variables, and a changed car starting position...

  14. Stress Modulates Reinforcement Learning in Younger and Older Adults

    OpenAIRE

    Lighthall, Nichole R.; Gorlick, Marissa A.; Schoeke, Andrej; Frank, Michael J.; Mather, Mara

    2012-01-01

    Animal research and human neuroimaging studies indicate that stress increases dopamine levels in brain regions involved in reward processing and stress also appears to increase the attractiveness of addictive drugs. The current study tested the hypothesis that stress increases reward salience, leading to more effective learning about positive than negative outcomes in a probabilistic selection task. Changes to dopamine pathways with age raise the question of whether stress effects on incentiv...

  15. Learning to Automatically Detect Features for Mobile Robots Using Second-Order Hidden Markov Models

    Directory of Open Access Journals (Sweden)

    Olivier Aycard

    2004-12-01

    Full Text Available In this paper, we propose a new method based on Hidden Markov Models to interpret temporal sequences of sensor data from mobile robots to automatically detect features. Hidden Markov Models have been used for a long time in pattern recognition, especially in speech recognition. Their main advantage over other methods (such as neural networks) is their ability to model noisy temporal signals of variable length. We show in this paper that this approach is well suited for interpretation of temporal sequences of mobile-robot sensor data. We present two distinct experiments and results: the first one in an indoor environment where a mobile robot learns to detect features like open doors or T-intersections, the second one in an outdoor environment where a different mobile robot has to identify situations like climbing a hill or crossing a rock.
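
    The paper uses second-order HMMs; as an approximate, runnable sketch with standard first-order Gaussian HMMs (via the hmmlearn library, with toy 1-D sequences standing in for real range data), one model is trained per feature class and new sequences are labelled by maximum likelihood:

```python
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(0)

# Toy 1-D sensor sequences for two place classes (values purely illustrative).
door_seqs = [rng.normal(2.0, 0.3, size=(30, 1)) for _ in range(20)]
corridor_seqs = [rng.normal(0.8, 0.2, size=(30, 1)) for _ in range(20)]

def fit_class_hmm(seqs, n_states=3):
    """Train one Gaussian HMM per feature class on its labelled sequences."""
    X = np.vstack(seqs)
    lengths = [len(s) for s in seqs]
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag",
                            n_iter=50, random_state=0)
    return model.fit(X, lengths)

models = {"open_door": fit_class_hmm(door_seqs),
          "corridor": fit_class_hmm(corridor_seqs)}

# A new sensor sequence is labelled by the class HMM with the highest likelihood.
new_seq = rng.normal(1.9, 0.3, size=(30, 1))
label = max(models, key=lambda name: models[name].score(new_seq))
```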

  16. Dissociable neural representations of reinforcement and belief prediction errors underlie strategic learning.

    Science.gov (United States)

    Zhu, Lusha; Mathewson, Kyle E; Hsu, Ming

    2012-01-31

    Decision-making in the presence of other competitive intelligent agents is fundamental for social and economic behavior. Such decisions require agents to behave strategically, where in addition to learning about the rewards and punishments available in the environment, they also need to anticipate and respond to actions of others competing for the same rewards. However, whereas we know much about strategic learning at both theoretical and behavioral levels, we know relatively little about the underlying neural mechanisms. Here, we show using a multi-strategy competitive learning paradigm that strategic choices can be characterized by extending the reinforcement learning (RL) framework to incorporate agents' beliefs about the actions of their opponents. Furthermore, using this characterization to generate putative internal values, we used model-based functional magnetic resonance imaging to investigate neural computations underlying strategic learning. We found that the distinct notions of prediction errors derived from our computational model are processed in a partially overlapping but distinct set of brain regions. Specifically, we found that the RL prediction error was correlated with activity in the ventral striatum. In contrast, activity in the ventral striatum, as well as the rostral anterior cingulate (rACC), was correlated with a previously uncharacterized belief-based prediction error. Furthermore, activity in rACC reflected individual differences in degree of engagement in belief learning. These results suggest a model of strategic behavior where learning arises from interaction of dissociable reinforcement and belief-based inputs.
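
    The exact model specification is not given in the abstract; the sketch below only illustrates the two error signals it distinguishes, a standard reinforcement prediction error on the chosen action's value and a belief prediction error on the estimated opponent strategy (parameters and update forms are illustrative):

```python
import numpy as np

def strategic_learning_step(q, belief, my_action, opp_action, reward,
                            alpha=0.1, eta=0.2):
    """One trial of a belief-augmented RL scheme (form illustrative): a standard
    reinforcement prediction error updates the chosen action's value, while a
    separate belief prediction error updates the estimate of the opponent's
    mixed strategy; the study links these two signals to distinct brain regions."""
    rl_pe = reward - q[my_action]            # reinforcement prediction error
    q[my_action] += alpha * rl_pe

    observed = np.zeros_like(belief)
    observed[opp_action] = 1.0               # what the opponent actually played
    belief_pe = observed - belief            # belief prediction error
    belief += eta * belief_pe
    return rl_pe, belief_pe

q = np.zeros(2)                              # values of my two actions
belief = np.full(2, 0.5)                     # estimated opponent strategy
rl_pe, belief_pe = strategic_learning_step(q, belief, my_action=0,
                                           opp_action=1, reward=0.0)
```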

  17. Measuring reinforcement learning and motivation constructs in experimental animals: relevance to the negative symptoms of schizophrenia

    Science.gov (United States)

    Markou, Athina; Salamone, John D.; Bussey, Timothy; Mar, Adam; Brunner, Daniela; Gilmour, Gary; Balsam, Peter

    2013-01-01

    The present review article summarizes and expands upon the discussions that were initiated during a meeting of the Cognitive Neuroscience Treatment Research to Improve Cognition in Schizophrenia (CNTRICS; http://cntrics.ucdavis.edu). A major goal of the CNTRICS meeting was to identify experimental procedures and measures that can be used in laboratory animals to assess psychological constructs that are related to the psychopathology of schizophrenia. The issues discussed in this review reflect the deliberations of the Motivation Working Group of the CNTRICS meeting, which included most of the authors of this article as well as additional participants. After receiving task nominations from the general research community, this working group was asked to identify experimental procedures in laboratory animals that can assess aspects of reinforcement learning and motivation that may be relevant for research on the negative symptoms of schizophrenia, as well as other disorders characterized by deficits in reinforcement learning and motivation. The tasks described here that assess reinforcement learning are the Autoshaping Task, Probabilistic Reward Learning Tasks, and the Response Bias Probabilistic Reward Task. The tasks described here that assess motivation are Outcome Devaluation and Contingency Degradation Tasks and Effort-Based Tasks. In addition to describing such methods and procedures, the present article provides a working vocabulary for research and theory in this field, as well as an industry perspective about how such tasks may be used in drug discovery. It is hoped that this review can aid investigators who are conducting research in this complex area, promote translational studies by highlighting shared research goals and fostering a common vocabulary across basic and clinical fields, and facilitate the development of medications for the treatment of symptoms mediated by reinforcement learning and motivational deficits. PMID:23994273

  18. Measuring reinforcement learning and motivation constructs in experimental animals: relevance to the negative symptoms of schizophrenia.

    Science.gov (United States)

    Markou, Athina; Salamone, John D; Bussey, Timothy J; Mar, Adam C; Brunner, Daniela; Gilmour, Gary; Balsam, Peter

    2013-11-01

    The present review article summarizes and expands upon the discussions that were initiated during a meeting of the Cognitive Neuroscience Treatment Research to Improve Cognition in Schizophrenia (CNTRICS; http://cntrics.ucdavis.edu) meeting. A major goal of the CNTRICS meeting was to identify experimental procedures and measures that can be used in laboratory animals to assess psychological constructs that are related to the psychopathology of schizophrenia. The issues discussed in this review reflect the deliberations of the Motivation Working Group of the CNTRICS meeting, which included most of the authors of this article as well as additional participants. After receiving task nominations from the general research community, this working group was asked to identify experimental procedures in laboratory animals that can assess aspects of reinforcement learning and motivation that may be relevant for research on the negative symptoms of schizophrenia, as well as other disorders characterized by deficits in reinforcement learning and motivation. The tasks described here that assess reinforcement learning are the Autoshaping Task, Probabilistic Reward Learning Tasks, and the Response Bias Probabilistic Reward Task. The tasks described here that assess motivation are Outcome Devaluation and Contingency Degradation Tasks and Effort-Based Tasks. In addition to describing such methods and procedures, the present article provides a working vocabulary for research and theory in this field, as well as an industry perspective about how such tasks may be used in drug discovery. It is hoped that this review can aid investigators who are conducting research in this complex area, promote translational studies by highlighting shared research goals and fostering a common vocabulary across basic and clinical fields, and facilitate the development of medications for the treatment of symptoms mediated by reinforcement learning and motivational deficits. Copyright © 2013 Elsevier

  19. Humanoid Cognitive Robots That Learn by Imitating: Implications for Consciousness Studies

    Directory of Open Access Journals (Sweden)

    James A. Reggia

    2018-01-01

    Full Text Available While the concept of a conscious machine is intriguing, producing such a machine remains controversial and challenging. Here, we describe how our work on creating a humanoid cognitive robot that learns to perform tasks via imitation learning relates to this issue. Our discussion is divided into three parts. First, we summarize our previous framework for advancing the understanding of the nature of phenomenal consciousness. This framework is based on identifying computational correlates of consciousness. Second, we describe a cognitive robotic system that we recently developed that learns to perform tasks by imitating human-provided demonstrations. This humanoid robot uses cause–effect reasoning to infer a demonstrator’s intentions in performing a task, rather than just imitating the observed actions verbatim. In particular, its cognitive components center on top-down control of a working memory that retains the explanatory interpretations that the robot constructs during learning. Finally, we describe our ongoing work that is focused on converting our robot’s imitation learning cognitive system into purely neurocomputational form, including both its low-level cognitive neuromotor components, its use of working memory, and its causal reasoning mechanisms. Based on our initial results, we argue that the top-down cognitive control of working memory, and in particular its gating mechanisms, is an important potential computational correlate of consciousness in humanoid robots. We conclude that developing high-level neurocognitive control systems for cognitive robots and using them to search for computational correlates of consciousness provides an important approach to advancing our understanding of consciousness, and that it provides a credible and achievable route to ultimately developing a phenomenally conscious machine.

  20. Representation Learning of Logic Words by an RNN: From Word Sequences to Robot Actions

    Directory of Open Access Journals (Sweden)

    Tatsuro Yamada

    2017-12-01

    Full Text Available An important characteristic of human language is compositionality. We can efficiently express a wide variety of real-world situations, events, and behaviors by compositionally constructing the meaning of a complex expression from a finite number of elements. Previous studies have analyzed how machine-learning models, particularly neural networks, can learn from experience to represent compositional relationships between language and robot actions with the aim of understanding the symbol grounding structure and achieving intelligent communicative agents. Such studies have mainly dealt with the words (nouns, adjectives, and verbs) that directly refer to real-world matters. In addition to these words, the current study deals with logic words, such as “not,” “and,” and “or” simultaneously. These words do not refer directly to the real world, but are logical operators that contribute to the construction of meaning in sentences. In human–robot communication, these words may be used often. The current study builds a recurrent neural network model with long short-term memory units and trains it to learn to translate sentences including logic words into robot actions. We investigate what kind of compositional representations, which mediate sentences and robot actions, emerge as the network's internal states via the learning process. Analysis after learning shows that referential words are merged with visual information and the robot's own current state, and the logical words are represented by the model in accordance with their functions as logical operators. Words such as “true,” “false,” and “not” work as non-linear transformations to encode orthogonal phrases into the same area in a memory cell state space. The word “and,” which required a robot to lift up both its hands, worked as if it was a universal quantifier. The word “or,” which required action generation that looked apparently random, was represented as an

  1. Reinforcement learning on slow features of high-dimensional input streams.

    Directory of Open Access Journals (Sweden)

    Robert Legenstein

    Full Text Available Humans and animals are able to learn complex behaviors based on a massive stream of sensory information from different modalities. Early animal studies have identified learning mechanisms that are based on reward and punishment such that animals tend to avoid actions that lead to punishment whereas rewarded actions are reinforced. However, most algorithms for reward-based learning are only applicable if the dimensionality of the state-space is sufficiently small or its structure is sufficiently simple. Therefore, the question arises how the problem of learning on high-dimensional data is solved in the brain. In this article, we propose a biologically plausible generic two-stage learning system that can directly be applied to raw high-dimensional input streams. The system is composed of a hierarchical slow feature analysis (SFA network for preprocessing and a simple neural network on top that is trained based on rewards. We demonstrate by computer simulations that this generic architecture is able to learn quite demanding reinforcement learning tasks on high-dimensional visual input streams in a time that is comparable to the time needed when an explicit highly informative low-dimensional state-space representation is given instead of the high-dimensional visual input. The learning speed of the proposed architecture in a task similar to the Morris water maze task is comparable to that found in experimental studies with rats. This study thus supports the hypothesis that slowness learning is one important unsupervised learning principle utilized in the brain to form efficient state representations for behavioral learning.
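
    The two-stage architecture described above can be illustrated with a minimal linear slow feature analysis (SFA) preprocessor: whiten the input, then keep the directions in which the whitened signal changes most slowly over time. The sketch below (plain NumPy on hypothetical data, not the paper's hierarchical SFA network) extracts such slow features, which could then feed a conventional low-dimensional reinforcement learner.

    ```python
    import numpy as np

    def fit_linear_sfa(X, n_slow=2):
        """Minimal linear SFA on a time series X of shape (T, d):
        centre, whiten, then take the directions whose temporal derivative
        has the smallest variance (the 'slowest' features)."""
        mean = X.mean(axis=0)
        Xc = X - mean
        # Whitening via eigendecomposition of the covariance matrix
        eva, eve = np.linalg.eigh(np.cov(Xc, rowvar=False))
        keep = eva > 1e-10
        W_white = eve[:, keep] / np.sqrt(eva[keep])
        Z = Xc @ W_white
        # Slowest directions = smallest-eigenvalue directions of the derivative covariance
        dZ = np.diff(Z, axis=0)
        deva, deve = np.linalg.eigh(np.cov(dZ, rowvar=False))
        W_slow = deve[:, :n_slow]                 # eigh returns eigenvalues in ascending order
        return mean, W_white @ W_slow

    # Hypothetical high-dimensional stream driven by one slowly varying latent variable
    T, d = 2000, 50
    slow = np.sin(np.linspace(0, 4 * np.pi, T))
    X = np.outer(slow, np.random.randn(d)) + 0.3 * np.random.randn(T, d)
    mean, W = fit_linear_sfa(X)
    features = (X - mean) @ W                     # low-dimensional state for the RL stage
    print(features.shape)
    ```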

  2. Learning curve for robotic-assisted surgery for rectal cancer: use of the cumulative sum method.

    Science.gov (United States)

    Yamaguchi, Tomohiro; Kinugasa, Yusuke; Shiomi, Akio; Sato, Sumito; Yamakawa, Yushi; Kagawa, Hiroyasu; Tomioka, Hiroyuki; Mori, Keita

    2015-07-01

    Few data are available to assess the learning curve for robotic-assisted surgery for rectal cancer. The aim of the present study was to evaluate the learning curve for robotic-assisted surgery for rectal cancer by a surgeon at a single institute. From December 2011 to August 2013, a total of 80 consecutive patients who underwent robotic-assisted surgery for rectal cancer performed by the same surgeon were included in this study. The learning curve was analyzed using the cumulative sum method. This method was used for all 80 cases, taking into account operative time. Operative procedures included anterior resections in 6 patients, low anterior resections in 46 patients, intersphincteric resections in 22 patients, and abdominoperineal resections in 6 patients. Lateral lymph node dissection was performed in 28 patients. Median operative time was 280 min (range 135-683 min), and median blood loss was 17 mL (range 0-690 mL). No postoperative complications of Clavien-Dindo classification Grade III or IV were encountered. We arranged operative times and calculated cumulative sum values, allowing differentiation of three phases: phase I, Cases 1-25; phase II, Cases 26-50; and phase III, Cases 51-80. Our data suggested three phases of the learning curve in robotic-assisted surgery for rectal cancer. The first 25 cases formed the learning phase.
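
    The cumulative sum technique used above can be reproduced with a few lines of code: the running sum of each case's deviation from the overall mean operative time makes the change-points between learning phases visible as peaks and slope changes. The sketch below uses made-up operative times purely for illustration.

    ```python
    import numpy as np

    def cusum(values):
        """CUSUM_i = sum over the first i cases of (x_j - overall mean).
        A rising segment means cases slower than average (learning phase);
        a peak followed by a decline marks the transition to proficiency."""
        x = np.asarray(values, dtype=float)
        return np.cumsum(x - x.mean())

    # Hypothetical operative times (minutes) for 80 consecutive cases
    rng = np.random.default_rng(1)
    times = np.concatenate([rng.normal(330, 60, 25),   # early, slower cases
                            rng.normal(280, 50, 25),
                            rng.normal(250, 40, 30)])
    curve = cusum(times)
    print("CUSUM peaks at case", int(np.argmax(curve)) + 1)
    ```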

  3. Does Prior Laparoscopic and Open Surgery Experience Have Any Impact on Learning Curve in Transition to Robotic Surgery?

    Directory of Open Access Journals (Sweden)

    Cüneyt Adayener

    2016-12-01

    Full Text Available It has been 15 years since the Food and Drug Administration approved the Da Vinci® robotic surgery system. Robotic applications are being used extensively in urology, particularly in radical prostatectomy. Like all high-tech products, this system has a high cost and a steep learning curve, which prevent it from becoming widespread. There are various studies on the effect of open surgery or laparoscopy experience on the learning curve of robotic surgery. Analyzing these interactions well will provide valuable information on making the training period of the robotic system more efficient.

  4. Use of robotics as a learning aid for disabled children

    Directory of Open Access Journals (Sweden)

    Teodiano Freire Bastos

    2012-05-01

    Full Text Available Severely disabled children have little chance of environmental and social exploration and discovery, and this lack of interaction and independence may lead to the idea that they are unable to do anything by themselves. To help these children in this situation, educational robotics can offer an aid, since it can give them a certain degree of independence in exploring the environment. The system developed in this work allows the child to transmit commands to a robot. Sensors placed on the child’s body obtain information from head movements or muscle signals to command the robot to carry out tasks. With the use of this system, disabled children achieve better cognitive development and social interaction, balancing to a certain extent the negative effects of their disabilities.

  5. What do we learn about development from baby robots?

    Science.gov (United States)

    Oudeyer, Pierre-Yves

    2017-01-01

    Understanding infant development is one of the great scientific challenges of contemporary science. In addressing this challenge, robots have proven useful as they allow experimenters to model the developing brain and body and understand the processes by which new patterns emerge in sensorimotor, cognitive, and social domains. Robotics also complements traditional experimental methods in psychology and neuroscience, where only a few variables can be studied at the same time. Moreover, work with robots has enabled researchers to systematically explore the role of the body in shaping the development of skill. All told, this work has shed new light on development as a complex dynamical system. WIREs Cogn Sci 2017, 8:e1395. doi: 10.1002/wcs.1395 For further resources related to this article, please visit the WIREs website. © 2016 Wiley Periodicals, Inc.

  6. Incremental active learning of sensorimotor models in developmental robotics

    OpenAIRE

    Ribes Sanz, Arturo

    2015-01-01

    The rapid evolution of robotics is fostering the emergence of new fields related to robotics. Inspired by ideas from developmental psychology, developmental robotics is a new field that aims to provide robots with capabilities that allow them to learn in an open-ended manner throughout their lives. There are situations in which engineers or designers cannot foresee all the possible problems that a robot may encounter. Just as the number of ta...

  7. A Review of the Relationship between Novelty, Intrinsic Motivation and Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Siddique Nazmul

    2017-11-01

    Full Text Available This paper presents a review of the tri-partite relationship between novelty, intrinsic motivation and reinforcement learning. The paper first presents a literature survey on novelty and the different computational models of novelty detection, with a specific focus on the features of stimuli that trigger a hedonic value for generating a novelty signal. It then presents an overview of intrinsic motivation and investigations into different models, with the aim of exploring deeper relationships between specific features of a novelty signal and their effect on intrinsic motivation in producing a reward function. Finally, it surveys reinforcement learning, its different models and their functional relationship with intrinsic motivation.

  8. Online Pedagogical Tutorial Tactics Optimization Using Genetic-Based Reinforcement Learning.

    Science.gov (United States)

    Lin, Hsuan-Ta; Lee, Po-Ming; Hsiao, Tzu-Chien

    2015-01-01

    Tutorial tactics are policies for an Intelligent Tutoring System (ITS) to decide the next action when there are multiple actions available. Recent research has demonstrated that when the learning contents were controlled so as to be the same, different tutorial tactics would make a difference in students' learning gains. However, the Reinforcement Learning (RL) techniques that were used in previous studies to induce tutorial tactics are insufficient when encountering large problems and hence were applied in an offline manner. Therefore, we introduced a Genetic-Based Reinforcement Learning (GBML) approach to induce tutorial tactics in an online-learning manner without relying on any preexisting dataset. The introduced method can learn a set of rules from the environment in a manner similar to RL. It includes a genetic-based optimizer for the rule discovery task, generating new rules from the old ones. This increases the scalability of an RL learner for larger problems. The results support our hypothesis about the capability of the GBML method to induce tutorial tactics. This suggests that the GBML method should be favorable for developing real-world ITS applications in the domain of tutorial tactics induction.

  9. The Study of Reinforcement Learning for Traffic Self-Adaptive Control under Multiagent Markov Game Environment

    Directory of Open Access Journals (Sweden)

    Lun-Hui Xu

    2013-01-01

    Full Text Available The urban traffic self-adaptive control problem is dynamic and uncertain, so the states of the traffic environment are hard to observe. An efficient agent that controls a single intersection can be discovered automatically via multiagent reinforcement learning. However, in the majority of previous works on this approach, each agent needed perfectly observed information when interacting with the environment and learned individually, with less efficient coordination. This study casts traffic self-adaptive control as a multiagent Markov game problem. The design employs a traffic signal control agent (TSCA) for each signalized intersection that coordinates with neighboring TSCAs. A mathematical model for the TSCAs’ interaction is built based on a nonzero-sum Markov game, which is applied to let the TSCAs learn how to cooperate. A multiagent Markov game reinforcement learning approach is constructed on the basis of single-agent Q-learning. This method lets each TSCA learn to update its Q-values under joint actions and imperfect information. The convergence of the proposed algorithm is analyzed theoretically. The simulation results show that the proposed method is convergent and effective in a realistic traffic self-adaptive control setting.
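
    As a rough illustration of the Q-learning-with-joint-actions idea described above, the sketch below (a simplified, assumed formulation rather than the authors' exact algorithm) keeps a Q-table indexed by the local state, the agent's own signal action, and a neighbouring agent's action, and applies the standard temporal-difference update.

    ```python
    import random
    from collections import defaultdict

    ACTIONS = ["extend_green", "switch_phase"]      # simplified signal actions
    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

    Q = defaultdict(float)                          # key: (state, my_action, neighbour_action)

    def choose_action(state, neighbour_action):
        """Epsilon-greedy choice given the neighbour's (observed or predicted) action."""
        if random.random() < EPSILON:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q[(state, a, neighbour_action)])

    def update(state, my_a, nb_a, reward, next_state, next_nb_a):
        """Q-learning update over the joint action of this agent and its neighbour."""
        best_next = max(Q[(next_state, a, next_nb_a)] for a in ACTIONS)
        td_error = reward + GAMMA * best_next - Q[(state, my_a, nb_a)]
        Q[(state, my_a, nb_a)] += ALPHA * td_error

    # Example transition: short queue, we extend green, neighbour switches, queue shrinks.
    update("queue_short", "extend_green", "switch_phase", reward=1.0,
           next_state="queue_empty", next_nb_a="extend_green")
    print(choose_action("queue_short", "switch_phase"))
    ```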

  10. Optimal Search Strategy of Robotic Assembly Based on Neural Vibration Learning

    Directory of Open Access Journals (Sweden)

    Lejla Banjanovic-Mehmedovic

    2011-01-01

    Full Text Available This paper presents the implementation of an optimal search strategy (OSS) in the verification of an assembly process based on neural vibration learning. The application problem is the complex robot assembly of miniature parts, exemplified by mating the gears of a multistage planetary speed reducer. Assembly of the tube over the planetary gears was identified as the most difficult part of the overall assembly. The favourable influence of vibration and rotational movement on the compensation of tolerances was also observed. With the proposed neural-network-based learning algorithm, it is possible to find an extended scope of the vibration state parameters. Using an optimal search strategy based on the minimal-distance path between vibration parameter stage sets (amplitude and frequencies of the robot's gripper vibration) and a recovery parameter algorithm, we can improve the robot assembly behaviour, that is, allow the fastest possible way of mating. We have verified by using simulation programs that the search strategy is suitable for situations with unexpected events due to uncertainties.

  11. Multichannel sound reinforcement systems at work in a learning environment

    Science.gov (United States)

    Malek, John; Campbell, Colin

    2003-04-01

    Many people have experienced the entertaining benefits of a surround sound system, either in their own home or in a movie theater, but another application exists for multichannel sound that has for the most part gone unused. This is the application of multichannel sound systems to the learning environment. By incorporating a 7.1 surround processor and a touch panel interface programmable control system, the main lecture hall at the University of Michigan Taubman College of Architecture and Urban Planning has been converted from an ordinary lecture hall to a working audiovisual laboratory. The multichannel sound system is used in a wide variety of experiments, including exposure to sounds to test listeners' aural perception of the tonal characteristics of varying pitch, reverberation, speech transmission index, and sound-pressure level. The touch panel's custom interface allows a variety of user groups to control different parts of the AV system and provides preset capability that allows for numerous system configurations.

  12. TensorFlow Agents: Efficient Batched Reinforcement Learning in TensorFlow

    OpenAIRE

    Hafner, Danijar; Davidson, James; Vanhoucke, Vincent

    2017-01-01

    We introduce TensorFlow Agents, an efficient infrastructure paradigm for building parallel reinforcement learning algorithms in TensorFlow. We simulate multiple environments in parallel, and group them to perform the neural network computation on a batch rather than individual observations. This allows the TensorFlow execution engine to parallelize computation, without the need for manual synchronization. Environments are stepped in separate Python processes to progress them in parallel witho...
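
    The core idea, stepping many environments together so the policy network is evaluated once per batch rather than once per environment, can be sketched in a few lines. The toy wrapper below is a generic illustration in plain Python/NumPy and is not the TensorFlow Agents API; the further step of running each environment in its own Python process is omitted for brevity.

    ```python
    import numpy as np

    class ToyEnv:
        """Stand-in environment with a minimal reset/step interface (illustrative only)."""
        def reset(self):
            self.state = np.zeros(4)
            return self.state
        def step(self, action):
            self.state = self.state + 0.01 * np.random.randn(4)
            return self.state, float(action), False        # observation, reward, done

    class BatchedEnv:
        """Steps several environments and stacks their outputs, so a policy can be
        computed on the whole batch of observations in a single forward pass."""
        def __init__(self, envs):
            self.envs = envs
        def reset(self):
            return np.stack([env.reset() for env in self.envs])
        def step(self, actions):
            obs, rewards, dones = zip(*(env.step(a) for env, a in zip(self.envs, actions)))
            return np.stack(obs), np.array(rewards), np.array(dones)

    batch = BatchedEnv([ToyEnv() for _ in range(8)])
    observations = batch.reset()                           # shape (8, 4)
    observations, rewards, dones = batch.step(np.ones(8))
    print(observations.shape, rewards.shape)
    ```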

  13. Skill Learning for Intelligent Robot by Perception-Action Integration: A View from Hierarchical Temporal Memory

    Directory of Open Access Journals (Sweden)

    Xinzheng Zhang

    2017-01-01

    Full Text Available Learning skills autonomously through interactions with the environment is a crucial ability for an intelligent robot. Perception-action integration, or the sensorimotor cycle, an important issue in imitation learning, is a natural mechanism that does not require a complex programming process. Recently, neurocomputing models and developmental intelligence methods have been considered a new trend for implementing robot skill learning. In this paper, based on research on the human brain neocortex model, we present a skill learning method using a perception-action integration strategy from the perspective of hierarchical temporal memory (HTM) theory. The sequential sensor data representing a certain skill, captured by an RGB-D camera, are received and then encoded as a sequence of Sparse Distributed Representation (SDR) vectors. The sequential SDR vectors are treated as the inputs of the perception-action HTM. The HTM learns sequences of SDRs and makes predictions of what the next input SDR will be. It stores the transitions between the currently perceived sensor data and the next predicted actions. We evaluated the performance of the proposed framework for learning a hand-shaking skill on a humanoid NAO robot. The experimental results show that the skill learning method designed in this paper is promising.
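
    To make the SDR encoding step above concrete, the snippet below shows a simple HTM-style scalar encoder: each sensor reading becomes a sparse binary vector in which a small contiguous block of active bits slides with the value. The parameters and the joint-angle example are hypothetical.

    ```python
    import numpy as np

    def scalar_to_sdr(value, v_min, v_max, size=128, active_bits=8):
        """Encode a scalar as a sparse binary vector: `active_bits` ones whose
        position within the vector tracks where the value lies in [v_min, v_max]."""
        value = float(np.clip(value, v_min, v_max))
        start = int(round((value - v_min) / (v_max - v_min) * (size - active_bits)))
        sdr = np.zeros(size, dtype=np.uint8)
        sdr[start:start + active_bits] = 1
        return sdr

    # Hypothetical joint angle (radians) taken from one frame of the demonstration
    sdr = scalar_to_sdr(0.3, v_min=-1.0, v_max=1.0)
    print("active bits:", np.flatnonzero(sdr))
    ```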

  14. Concurrent Unimodal Learning Enhances Multisensory Responses of Bi-Directional Crossmodal Learning in Robotic Audio-Visual Tracking

    DEFF Research Database (Denmark)

    Shaikh, Danish; Bodenhagen, Leon; Manoonpong, Poramate

    2018-01-01

    modalities to independently update modality-specific neural weights on a moment-by-moment basis, in response to dynamic changes in noisy sensory stimuli. The circuit is embodied as a non-holonomic robotic agent that must orient itself towards a moving audio-visual target. The circuit continuously learns the best...

  15. Learning with Charlie: A robot buddy for children with diabetes

    NARCIS (Netherlands)

    Blanson Henkemans, O.A.; Pal, S.M. van der; Werner, I.; Looije, R.; Neericnx, M.A.

    2017-01-01

    Children with type 1 diabetes mellitus (T1DM) have a need for social, cognitive and affective support for self-management. The PAL project develops a social robot and its avatar. The aim is to assist the child, health care professional and parents to jointly perform diabetes management. Diabetes

  16. Learning Preference Models for Autonomous Mobile Robots in Complex Domains

    Science.gov (United States)

    2010-12-01


  17. Supporting Robotics Education in STEM with Learning Analytics

    DEFF Research Database (Denmark)

    Spikol, Daniel; Friesel, Anna; Ehrenberg, H.

    of instructional practices that center on collaborative, hands-on engineering design problems. The use of hands-on engineering design problems for robotics in classroom teaching is facilitated by physical computing kits, such as programmable microcontrollers like Arduino, and other platforms. These kits provide...

  18. Iterative learning control with sampled-data feedback for robot manipulators

    Directory of Open Access Journals (Sweden)

    Delchev Kamen

    2014-09-01

    Full Text Available This paper deals with improving the stability of sampled-data (SD) feedback control for nonlinear multiple-input multiple-output time-varying systems, such as robotic manipulators, by incorporating an off-line model-based nonlinear iterative learning controller. The proposed scheme of nonlinear iterative learning control (NILC) with SD feedback is applicable to a large class of robots, because sampled-data feedback is required for model-based feedback controllers, especially for robotic manipulators with complicated dynamics (6 or 7 DOF, or more), while the feedforward control from the off-line iterative learning controller is assumed to be continuous. The robustness and convergence of the proposed NILC law with SD feedback are proven, and the derived sufficient condition for convergence is the same as the condition for an NILC with a continuous feedback control input. For the presented NILC algorithm applied to a virtual PUMA 560 robot, simulation results are presented in order to verify the convergence and applicability of the proposed learning controller with the SD feedback controller attached.
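
    The feedforward part of an iterative learning controller can be illustrated with the classic P-type update u_{k+1}(t) = u_k(t) + L·e_k(t+1), applied trial after trial to the same reference. The sketch below runs it on a toy first-order discrete plant; it is a generic ILC example, not the paper's model-based NILC law or its sampled-data feedback loop.

    ```python
    import numpy as np

    # Toy plant: y[t+1] = a*y[t] + b*u[t]
    a, b, T = 0.8, 0.5, 50
    gain = 0.8                                   # learning gain L; converges since |1 - L*b| < 1
    y_ref = np.sin(np.linspace(0, np.pi, T))     # desired trajectory, repeated every trial
    u = np.zeros(T)                              # feedforward input refined across trials

    for trial in range(20):
        y = np.zeros(T)
        for t in range(T - 1):
            y[t + 1] = a * y[t] + b * u[t]
        e = y_ref - y
        u[:-1] += gain * e[1:]                   # P-type ILC update: u_{k+1}(t) = u_k(t) + L*e_k(t+1)
        if trial % 5 == 0 or trial == 19:
            print(f"trial {trial:2d}  max |error| = {np.abs(e).max():.5f}")
    ```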

  19. Measuring and Modelling Delays in Robot Manipulators for Temporally Precise Control using Machine Learning

    DEFF Research Database (Denmark)

    Andersen, Thomas Timm; Amor, Heni Ben; Andersen, Nils Axel

    2015-01-01

    and separate. In this paper, we present a data-driven methodology for separating and modelling inherent delays during robot control. We show how both actuation and response delays can be modelled using modern machine learning methods. The resulting models can be used to predict the delays as well...

  20. A Robot-Partner for Preschool Children Learning English Using Socio-Cognitive Conflict

    Science.gov (United States)

    Mazzoni, Elvis; Benvenuti, Martina

    2015-01-01

    This paper presents an exploratory study in which a humanoid robot (MecWilly) acted as a partner to preschool children, helping them to learn English words. In order to use the Socio-Cognitive Conflict paradigm to induce the knowledge acquisition process, we designed a playful activity in which children worked in pairs with another child or with…

  1. An e-Learning System with MR for Experiments Involving Circuit Construction to Control a Robot

    Science.gov (United States)

    Takemura, Atsushi

    2016-01-01

    This paper proposes a novel e-Learning system for technological experiments involving electronic circuit construction and robot motion control, which are necessary in the field of technology. The proposed system performs automated recognition of circuit images transmitted from individual learners and automatically supplies the learner with…

  2. Anticipatory Driving for a Robot-Car Based on Supervised Learning

    DEFF Research Database (Denmark)

    Markelic, I.; Kulvicius, Tomas; Tamosiunaite, M.

    2009-01-01

    Using look-ahead information and making plans improves human driving. We therefore propose that autonomously driving systems should also possess such abilities. We adapt a machine learning approach, where the system, a car-like robot, is trained by an experienced driver by correlating visual...

  3. Action understanding and imitation learning in a robot-human task

    NARCIS (Netherlands)

    Erlhagen, W.; Mukovskiy, A.; Bicho, E.; Panin, G.; Kiss, C.; Knoll, A.; Schie, H.T. van; Bekkering, H.

    2005-01-01

    We report results of an interdisciplinary project which aims at endowing a real robot system with the capacity for learning by goal-directed imitation. The control architecture is biologically inspired as it reflects recent experimental findings in action observation/execution studies. We test its

  4. Are Commercial "Personal Robots" Ready for Language Learning? Focus on Second Language Speech

    Science.gov (United States)

    Moussalli, Souheila; Cardoso, Walcir

    2016-01-01

    Today's language classrooms are challenged by limited classroom time and a lack of input and output practice in a stress-free environment (Hsu, 2015). The use of commercial, readily available tools such as Personal Robots (PRs; e.g. Amazon's Echo, Jibo) might promote language learning by freeing up class time, allowing for a more focused…

  5. Robotic Construction Kits as Computational Manipulatives for Learning in the STEM Disciplines

    Science.gov (United States)

    Sullivan, Florence R.; Heffernan, John

    2016-01-01

    This article presents a systematic review of research related to the use of robotics construction kits (RCKs) in P-12 learning in the STEM disciplines for typically developing children. The purpose of this review is to configure primarily qualitative and mixed methods findings from studies meeting our selection and quality criterion to answer the…

  6. Learning-based position control of a closed-kinematic chain robot end-effector

    Science.gov (United States)

    Nguyen, Charles C.; Zhou, Zhen-Lei

    1990-01-01

    A trajectory control scheme whose design is based on learning theory is presented for a six-degree-of-freedom (DOF) robot end-effector built to study robotic assembly of NASA hardware in space. The control scheme consists of two control systems: the feedback control system and the learning control system. The feedback control system is designed using the concept of linearization about a selected operating point and the method of pole placement, so that the closed-loop linearized system is stabilized. The learning control scheme, consisting of PD-type learning controllers, provides additional inputs to improve the end-effector performance after each trial. Experimental studies performed on a 2-DOF end-effector built at CUA for three tracking cases show that actual trajectories approach desired trajectories as the number of trials increases. The tracking errors are substantially reduced after only five trials.

  7. Evidence of Self-Directed Learning on a High School Robotics Team

    Directory of Open Access Journals (Sweden)

    Nathan R. Dolenc

    2014-12-01

    Full Text Available Self-directed learning is described as an individual taking the initiative to engage in a learning experience while assuming responsibility to follow through to its conclusion. Robotics competitions are examples of informal environments that can facilitate self-directed learning. This study examined how mentor involvement, student behavior, and physical workspace contributed to self-directed learning on one robotics competition team. How did mentors transfer responsibility to students? How did students respond to managing a team? Are the physical attributes of a workspace important? The mentor, student, and workplace factors captured in the research showed mentors wanting students to do the work, students assuming leadership roles, and the limited workspace having a positive effect on student productivity.

  8. Nonparametric Online Learning Control for Soft Continuum Robot: An Enabling Technique for Effective Endoscopic Navigation

    Science.gov (United States)

    Lee, Kit-Hang; Fu, Denny K.C.; Leong, Martin C.W.; Chow, Marco; Fu, Hing-Choi; Althoefer, Kaspar; Sze, Kam Yim; Yeung, Chung-Kwong

    2017-01-01

    Bioinspired robotic structures comprising soft actuation units have attracted increasing research interest. Taking advantage of their inherent compliance, soft robots can ensure safe interaction with external environments, provided that precise and effective manipulation can be achieved. Endoscopy is a typical application. However, previous model-based control approaches often require simplified geometric assumptions about the soft manipulator, which can be very inaccurate in the presence of unmodeled external interaction forces. In this study, we propose a generic control framework based on nonparametric, online, and local training to learn the inverse model directly, without prior knowledge of the robot's structural parameters. A detailed experimental evaluation was conducted on a soft robot prototype with control redundancy, performing trajectory tracking in dynamically constrained environments. An advanced element formulation from finite element analysis is employed to initialize the control policy, hence eliminating the need for random exploration in the robot's workspace. The proposed control framework enabled a soft fluid-driven continuum robot to follow a 3D trajectory precisely, even under dynamic external disturbance. Such enhanced control accuracy and adaptability would facilitate effective endoscopic navigation in complex and changing environments. PMID:29251567

  9. Diffusion of robotics into clinical practice in the United States: process, patient safety, learning curves, and the public health.

    Science.gov (United States)

    Mirheydar, Hossein S; Parsons, J Kellogg

    2013-06-01

    Robotic technology disseminated into urological practice without robust comparative effectiveness data. To review the diffusion of robotic surgery into urological practice. We performed a comprehensive literature review focusing on diffusion patterns, patient safety, learning curves, and comparative costs for robotic radical prostatectomy, partial nephrectomy, and radical cystectomy. Robotic urologic surgery diffused in patterns typical of novel technology spreading among practicing surgeons. Robust evidence-based data comparing outcomes of robotic to open surgery were sparse. Although initial Level 3 evidence for robotic prostatectomy observed complication outcomes similar to open prostatectomy, subsequent population-based Level 2 evidence noted an increased prevalence of adverse patient safety events and genitourinary complications among robotic patients during the early years of diffusion. Level 2 evidence indicated comparable to improved patient safety outcomes for robotic compared to open partial nephrectomy and cystectomy. Learning curve recommendations for robotic urologic surgery have drawn exclusively on Level 4 evidence and subjective, non-validated metrics. The minimum number of cases required to achieve competency for robotic prostatectomy has increased to unrealistically high levels. Most comparative cost-analyses have demonstrated that robotic surgery is significantly more expensive than open or laparoscopic surgery. Evidence-based data are limited but suggest an increased prevalence of adverse patient safety events for robotic prostatectomy early in the national diffusion period. Learning curves for robotic urologic surgery are subjective and based on non-validated metrics. The urological community should develop rigorous, evidence-based processes by which future technological innovations may diffuse in an organized and safe manner.

  10. Indirect iterative learning control for a discrete visual servo without a camera-robot model.

    Science.gov (United States)

    Jiang, Ping; Bamforth, Leon C A; Feng, Zuren; Baruch, John E F; Chen, YangQuan

    2007-08-01

    This paper presents a discrete learning controller for vision-guided robot trajectory imitation with no prior knowledge of the camera-robot model. A teacher demonstrates a desired movement in front of a camera, and then, the robot is tasked to replay it by repetitive tracking. The imitation procedure is considered as a discrete tracking control problem in the image plane, with an unknown and time-varying image Jacobian matrix. Instead of updating the control signal directly, as is usually done in iterative learning control (ILC), a series of neural networks are used to approximate the unknown Jacobian matrix around every sample point in the demonstrated trajectory, and the time-varying weights of local neural networks are identified through repetitive tracking, i.e., indirect ILC. This makes repetitive segmented training possible, and a segmented training strategy is presented to retain the training trajectories solely within the effective region for neural network approximation. However, a singularity problem may occur if an unmodified neural-network-based Jacobian estimation is used to calculate the robot end-effector velocity. A new weight modification algorithm is proposed which ensures invertibility of the estimation, thus circumventing the problem. Stability is further discussed, and the relationship between the approximation capability of the neural network and the tracking accuracy is obtained. Simulations and experiments are carried out to illustrate the validity of the proposed controller for trajectory imitation of robot manipulators with unknown time-varying Jacobian matrices.
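
    For context on the singularity issue mentioned above: a common, generic safeguard when inverting an estimated image Jacobian is damped least squares, which keeps the computed velocity bounded even when the estimate becomes near-singular. The snippet below shows that standard technique with made-up numbers; it is not the weight modification algorithm proposed in the paper.

    ```python
    import numpy as np

    def damped_velocity(J_hat, image_error, damping=0.01):
        """Damped least-squares (Levenberg-Marquardt style) mapping from an image-space
        error to a command velocity: v = J^T (J J^T + lambda*I)^{-1} e."""
        JJt = J_hat @ J_hat.T
        reg = JJt + damping * np.eye(JJt.shape[0])
        return J_hat.T @ np.linalg.solve(reg, image_error)

    J_hat = np.array([[1.0, 0.2, 0.0],      # hypothetical 2x3 image Jacobian estimate
                      [0.0, 0.9, 0.1]])
    e_img = np.array([5.0, -2.0])           # pixel error toward the demonstrated trajectory
    print(damped_velocity(J_hat, e_img))
    ```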

  11. An Architecture for Emotional and Context-Aware Associative Learning for Robot Companions

    OpenAIRE

    Rizzi Raymundo, C.; Johnson, C. G.; Vargas, P. A.

    2015-01-01

    This work proposes a theoretical architectural model based on the brain's fear learning system with the purpose of generating artificial fear conditioning at both stimuli and context abstraction levels in robot companions. The proposed architecture is inspired by the different brain regions involved in fear learning, here divided into four modules that work in an integrated and parallel manner: the sensory system, the amygdala system, the hippocampal system and the working memory. Each of the...

  12. Interaction learning for dynamic movement primitives used in cooperative robotic tasks

    DEFF Research Database (Denmark)

    Kulvicius, Tomas; Biehl, Martin; Aein, Mohamad Javad

    2013-01-01

    For several years, dynamic movement primitives (DMPs) have been attracting growing interest for flexible movement control in robotics. In this study we introduce sensory feedback together with a predictive learning mechanism which allows tightly coupled dual-agent systems... to learn an adaptive, sensor-driven interaction based on DMPs. The coupled conventional (no-sensors, no-learning) DMP system automatically equilibrates and can still be solved analytically, allowing us to derive conditions for stability. When adding adaptive sensor control we can show that both agents learn

  13. Lessons learned from the STS-120/ISS 10A robotics operations

    Science.gov (United States)

    Aziz, Sarmad

    2010-01-01

    analysis of the robotics operations executed during the STS-120 and 10A stage mission. The paper will highlight the unique challenges associated with the planning and execution of the P6 truss and Node 2 module relocation tasks, as well as the robotics issues faced during the planning and execution of the solar array repair space walk. The analysis will address the operational techniques and mission planning guideline used to deal with tight timelines, structural loads issues, ISS attitude control issues, and complex interdependencies between various ISS systems during the assembly and solar array repair operations. A discussion of the lessons learned from the planning and execution of these complex robotics tasks will also be presented.

  14. Reinforcement learning of targeted movement in a spiking neuronal model of motor cortex.

    Directory of Open Access Journals (Sweden)

    George L Chadderdon

    Full Text Available Sensorimotor control has traditionally been considered from a control theory perspective, without relation to neurobiology. In contrast, here we utilized a spiking-neuron model of motor cortex and trained it to perform a simple movement task, which consisted of rotating a single-joint "forearm" to a target. Learning was based on a reinforcement mechanism analogous to that of the dopamine system. This provided a global reward or punishment signal in response to decreasing or increasing distance from hand to target, respectively. Output was partially driven by Poisson motor babbling, creating stochastic movements that could then be shaped by learning. The virtual forearm consisted of a single segment rotated around an elbow joint, controlled by flexor and extensor muscles. The model consisted of 144 excitatory and 64 inhibitory event-based neurons, each with AMPA, NMDA, and GABA synapses. Proprioceptive cell input to this model encoded the 2 muscle lengths. Plasticity was only enabled in feedforward connections between input and output excitatory units, using spike-timing-dependent eligibility traces for synaptic credit or blame assignment. Learning resulted from a global 3-valued signal: reward (+1), no learning (0), or punishment (-1), corresponding to phasic increases, lack of change, or phasic decreases of dopaminergic cell firing, respectively. Successful learning only occurred when both reward and punishment were enabled. In this case, 5 target angles were learned successfully within 180 s of simulation time, with a median error of 8 degrees. Motor babbling allowed exploratory learning, but decreased the stability of the learned behavior, since the hand continued moving after reaching the target. Our model demonstrated that a global reinforcement signal, coupled with eligibility traces for synaptic plasticity, can train a spiking sensorimotor network to perform goal-directed motor behavior.

  15. Reinforcement learning of targeted movement in a spiking neuronal model of motor cortex.

    Science.gov (United States)

    Chadderdon, George L; Neymotin, Samuel A; Kerr, Cliff C; Lytton, William W

    2012-01-01

    Sensorimotor control has traditionally been considered from a control theory perspective, without relation to neurobiology. In contrast, here we utilized a spiking-neuron model of motor cortex and trained it to perform a simple movement task, which consisted of rotating a single-joint "forearm" to a target. Learning was based on a reinforcement mechanism analogous to that of the dopamine system. This provided a global reward or punishment signal in response to decreasing or increasing distance from hand to target, respectively. Output was partially driven by Poisson motor babbling, creating stochastic movements that could then be shaped by learning. The virtual forearm consisted of a single segment rotated around an elbow joint, controlled by flexor and extensor muscles. The model consisted of 144 excitatory and 64 inhibitory event-based neurons, each with AMPA, NMDA, and GABA synapses. Proprioceptive cell input to this model encoded the 2 muscle lengths. Plasticity was only enabled in feedforward connections between input and output excitatory units, using spike-timing-dependent eligibility traces for synaptic credit or blame assignment. Learning resulted from a global 3-valued signal: reward (+1), no learning (0), or punishment (-1), corresponding to phasic increases, lack of change, or phasic decreases of dopaminergic cell firing, respectively. Successful learning only occurred when both reward and punishment were enabled. In this case, 5 target angles were learned successfully within 180 s of simulation time, with a median error of 8 degrees. Motor babbling allowed exploratory learning, but decreased the stability of the learned behavior, since the hand continued moving after reaching the target. Our model demonstrated that a global reinforcement signal, coupled with eligibility traces for synaptic plasticity, can train a spiking sensorimotor network to perform goal-directed motor behavior.
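
    The plasticity rule summarized above, coincidence-driven eligibility traces gated by a global three-valued reinforcement signal, can be sketched in a few lines. The rate-based toy below is only meant to convey the structure of the update (trace decay, dopamine-like gating), not the spiking implementation or parameters of the model.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n_pre, n_post = 16, 4
    w = rng.uniform(0.0, 0.5, (n_post, n_pre))       # feedforward input-to-output weights
    elig = np.zeros_like(w)                           # per-synapse eligibility traces
    TAU, ETA = 0.9, 0.01                              # trace decay per step, learning rate

    def plasticity_step(pre_spikes, post_spikes, reinforcement):
        """pre/post_spikes: binary activity vectors; reinforcement in {-1, 0, +1}."""
        global elig, w
        elig = TAU * elig + np.outer(post_spikes, pre_spikes)   # tag recent coincidences
        w += ETA * reinforcement * elig                          # global signal converts tags to changes
        np.clip(w, 0.0, 1.0, out=w)

    # One step in which the hand moved closer to the target (reward = +1)
    plasticity_step(rng.integers(0, 2, n_pre), rng.integers(0, 2, n_post), reinforcement=+1)
    print(w.mean())
    ```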

  16. Neural mechanisms of reinforcement learning in unmedicated patients with major depressive disorder.

    Science.gov (United States)

    Rothkirch, Marcus; Tonn, Jonas; Köhler, Stephan; Sterzer, Philipp

    2017-04-01

    According to current concepts, major depressive disorder is strongly related to dysfunctional neural processing of motivational information, entailing impairments in reinforcement learning. While computational modelling can reveal the precise nature of neural learning signals, it has not been used to study learning-related neural dysfunctions in unmedicated patients with major depressive disorder so far. We thus aimed at comparing the neural coding of reward and punishment prediction errors, representing indicators of neural learning-related processes, between unmedicated patients with major depressive disorder and healthy participants. To this end, a group of unmedicated patients with major depressive disorder (n = 28) and a group of age- and sex-matched healthy control participants (n = 30) completed an instrumental learning task involving monetary gains and losses during functional magnetic resonance imaging. The two groups did not differ in their learning performance. Patients and control participants showed the same level of prediction error-related activity in the ventral striatum and the anterior insula. In contrast, neural coding of reward prediction errors in the medial orbitofrontal cortex was reduced in patients. Moreover, neural reward prediction error signals in the medial orbitofrontal cortex and ventral striatum showed negative correlations with anhedonia severity. Using a standard instrumental learning paradigm we found no evidence for an overall impairment of reinforcement learning in medication-free patients with major depressive disorder. Importantly, however, the attenuated neural coding of reward in the medial orbitofrontal cortex and the relation between anhedonia and reduced reward prediction error-signalling in the medial orbitofrontal cortex and ventral striatum likely reflect an impairment in experiencing pleasure from rewarding events as a key mechanism of anhedonia in major depressive disorder. © The Author (2017). Published by Oxford

  17. Learning and Correcting Robot Trajectory Keypoints from a Single Demonstration

    DEFF Research Database (Denmark)

    Juan, Iñigo Iturrate San; Østergaard, Esben Hallundbæk; Rytter, Martin

    2017-01-01

    Kinesthetic teaching provides an accessible way for non-experts to quickly and easily program a robot system by demonstration. A crucial aspect of this technique is to obtain an accurate approximation of the robot’s intended trajectory for the task, while filtering out spurious aspects of the demonstration. While several methods to this end have been proposed, they either rely on several demonstrations or on the user explicitly indicating relevant trajectory waypoints. We propose a method, based on the Douglas-Peucker line simplification algorithm, that is able to extract the notable points of a trajectory from a single demonstration. Additionally, by utilizing velocity information in the task space, the method is able to achieve a level of precision that is sufficient for industrial assembly tasks. Along with this, we present a user study that shows that our method enables non-expert robot users...
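
    The Douglas-Peucker simplification on which the method builds is compact enough to show directly: it keeps only those points whose perpendicular distance from the chord between a segment's endpoints exceeds a tolerance, recursing on the sub-segments. The sketch below is the textbook algorithm on a made-up 2-D path; the paper's contribution of incorporating task-space velocity information is not reproduced here.

    ```python
    import numpy as np

    def douglas_peucker(points, tol):
        """Classic recursive Douglas-Peucker line simplification (any dimension)."""
        points = np.asarray(points, dtype=float)
        if len(points) < 3:
            return points
        start, end = points[0], points[-1]
        chord = end - start
        norm = np.linalg.norm(chord)
        if norm == 0:
            dists = np.linalg.norm(points - start, axis=1)
        else:
            u = chord / norm
            proj = (points - start) @ u
            dists = np.linalg.norm(points - start - np.outer(proj, u), axis=1)
        idx = int(np.argmax(dists))
        if dists[idx] > tol:
            left = douglas_peucker(points[:idx + 1], tol)
            right = douglas_peucker(points[idx:], tol)
            return np.vstack([left[:-1], right])    # drop the duplicated split point
        return np.vstack([start, end])

    path = [(0, 0), (1, 0.05), (2, -0.05), (3, 0.1), (4, 3), (5, 6), (6, 6.05), (7, 6)]
    print(douglas_peucker(path, tol=0.5))
    ```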

  18. STDP-based behavior learning on the TriBot robot

    Science.gov (United States)

    Arena, P.; De Fiore, S.; Patané, L.; Pollino, M.; Ventura, C.

    2009-05-01

    This paper describes a correlation-based navigation algorithm, based on an unsupervised learning paradigm for spiking neural networks called Spike Timing Dependent Plasticity (STDP). This algorithm was implemented on a new bio-inspired hybrid mini-robot called TriBot to learn and increase its behavioral capabilities. Correlation-based algorithms have been found to explain many basic behaviors in simple animals. The main interesting consequence of STDP is that the system is able to learn high-level sensor features, based on a set of basic reflexes, depending on some low-level sensor inputs. TriBot is composed of 3 modules, the first two being identical and inspired by the Whegs hybrid robot. A peculiar characteristic of the robot is the innovative shape of the three-spoke appendages, which increases the stability of the structure. The last module is composed of two standard legs with 3 degrees of freedom each. Thanks to the cooperation among these modules, TriBot is able to cope with irregular terrains, overcoming potential deadlock situations, to climb obstacles that are high compared to its size, and to manipulate objects. Robot experiments are reported to demonstrate the potential and the effectiveness of the approach.
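
    For readers unfamiliar with STDP, the canonical pair-based rule behind such correlation learning is easy to state: the weight change depends exponentially on the time difference between pre- and postsynaptic spikes, with potentiation when the presynaptic spike comes first. The snippet below shows that generic textbook form with illustrative constants, not the specific parameters used on TriBot.

    ```python
    import numpy as np

    A_PLUS, A_MINUS = 0.05, 0.055       # potentiation / depression amplitudes (illustrative)
    TAU_PLUS, TAU_MINUS = 20.0, 20.0    # time constants in ms

    def stdp_dw(t_pre, t_post):
        """Pair-based STDP: dt > 0 (pre before post) potentiates, dt < 0 depresses."""
        dt = t_post - t_pre
        if dt >= 0:
            return A_PLUS * np.exp(-dt / TAU_PLUS)
        return -A_MINUS * np.exp(dt / TAU_MINUS)

    for dt in (-40, -10, 0, 10, 40):
        print(f"dt = {dt:+3d} ms  ->  dw = {stdp_dw(0, dt):+.4f}")
    ```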

  19. Blended learning for reinforcing dental pharmacology in the clinical years: A qualitative analysis.

    Science.gov (United States)

    Eachempati, Prashanti; Kiran Kumar, K S; Sumanth, K N

    2016-10-01

    Blended learning has become the method of choice in educational institutions because of its systematic integration of traditional classroom teaching and online components. This study aims to analyze students' reflections regarding blended learning in dental pharmacology. A cross-sectional study was conducted in the Faculty of Dentistry, Melaka-Manipal Medical College, among 3rd and 4th year BDS students. A total of 145 dental students, who consented, participated in the study. Students were divided into 14 groups. Nine online sessions followed by nine face-to-face discussions were held. Each session addressed topics related to oral lesions and orofacial pain with pharmacological applications. After each week, students were asked to reflect on blended learning. On completion of 9 weeks, reflections were collected and analyzed. Qualitative analysis was done using the thematic analysis model suggested by Braun and Clarke. Four main themes were identified, namely, merits of blended learning, skill in writing prescriptions for oral diseases, dosages of drugs, and identification of strengths and weaknesses. In general, the participants gave positive feedback regarding blended learning. Students felt more confident in drug selection and prescription writing. They could recollect the doses better after the online and face-to-face sessions. Most interestingly, the students reflected that they were able to identify their strengths and weaknesses after the blended learning sessions. The blended learning module was successfully implemented for reinforcing dental pharmacology. The results obtained in this study enable us to plan future comparative studies to assess the effectiveness of blended learning in dental pharmacology.

  20. Teaching Functional Patterns through Robotic Applications

    Directory of Open Access Journals (Sweden)

    J. Boender

    2016-11-01

    Full Text Available We present our approach to teaching functional programming to First Year Computer Science students at Middlesex University through projects in robotics. A holistic approach is taken to the curriculum, emphasising the connections between different subject areas. A key part of the students' learning is through practical projects that draw upon and integrate the taught material. To support these, we developed the Middlesex Robotic plaTfOrm (MIRTO), an open-source platform built using Raspberry Pi, Arduino, HUB-ee wheels and running Racket (a LISP dialect). In this paper we present the motivations for our choices and explain how a number of concepts of functional programming may be employed when programming robotic applications. We present some students' work with robotics projects: we consider the use of robotics projects to have been a success, both for their value in reinforcing students' understanding of programming concepts and for their value in motivating the students.

  1. Case Study Analyses of the Impact of Flipped Learning in Teaching Programming Robots

    Directory of Open Access Journals (Sweden)

    Majlinda Fetaji

    2016-11-01

    Full Text Available The focus of the research study was to investigate the benefits of flipped learning pedagogy for student learning in classes on programming robots, and to assess whether it has any advantages over traditional teaching methods in computer science. Learners' attitudes, motivation, and effectiveness when using the flipped classroom were assessed and compared with the traditional classroom. The research questions investigated were: “What kind of problems do we face when robotics classes are taught with traditional methods?” and “If we apply the flipped learning method, can we solve these problems?”. In order to analyze this, a case study experiment was conducted, and insights as well as recommendations are presented.

  2. Representation and Integration: Combining Robot Control, High-Level Planning, and Action Learning

    DEFF Research Database (Denmark)

    Petrick, Ronald; Kraft, Dirk; Mourao, Kira

    We describe an approach to integrated robot control, high-level planning, and action effect learning that attempts to overcome the representational difficulties that exist between these diverse areas. Our approach combines ideas from robot vision, knowledge-level planning, and connectionist machine learning, and focuses on the representational needs of these components. We also make use of a simple representational unit called an instantiated state transition fragment (ISTF) and a related structure called an object-action complex (OAC). The goal of this work is a general approach for inducing high-level action specifications, suitable for planning, from a robot’s interactions with the world. We present a detailed overview of our approach and show how it supports the learning of certain aspects of a high-level representation from low-level world state information.

  3. Reinforcement learning using a continuous time actor-critic framework with spiking neurons.

    Directory of Open Access Journals (Sweden)

    Nicolas Frémaux

    2013-04-01

    Full Text Available Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only been partially elucidated. On one hand, experimental evidence shows that the neuromodulator dopamine carries information about rewards and affects synaptic plasticity. On the other hand, the theory of reinforcement learning provides a framework for reward-based learning. Recent models of reward-modulated spike-timing-dependent plasticity have made first steps towards bridging the gap between the two approaches, but faced two problems. First, reinforcement learning is typically formulated in a discrete framework, ill-adapted to the description of natural situations. Second, biologically plausible models of reward-modulated spike-timing-dependent plasticity require precise calculation of the reward prediction error, yet it remains to be shown how this can be computed by neurons. Here we propose a solution to these problems by extending the continuous temporal difference (TD) learning of Doya (2000) to the case of spiking neurons in an actor-critic network operating in continuous time, and with continuous state and action representations. In our model, the critic learns to predict expected future rewards in real time. Its activity, together with actual rewards, conditions the delivery of a neuromodulatory TD signal to itself and to the actor, which is responsible for action choice. In simulations, we show that such an architecture can solve a Morris water-maze-like navigation task, in a number of trials consistent with reported animal performance. We also use our model to solve the acrobot and the cartpole problems, two complex motor control tasks. Our model provides a plausible way of computing reward prediction error in the brain. Moreover, the analytically derived learning rule is consistent with experimental evidence for dopamine-modulated spike-timing-dependent plasticity.
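
    For reference, the continuous-time temporal-difference error that the abstract refers to (Doya, 2000) can be written as follows; the exact normalization varies between formulations, so this is indicative rather than a transcription of the paper's equations.

    ```latex
    % Value of a state as exponentially discounted future reward (time constant \tau),
    % and the continuous-time TD error that is zero when the value estimate is consistent:
    V(t) = \int_{t}^{\infty} e^{-(s-t)/\tau}\, r(s)\, ds
    \qquad\Longrightarrow\qquad
    \delta(t) = r(t) - \frac{1}{\tau}\, V(t) + \dot{V}(t)
    ```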

  4. 'Proactive' use of cue-context congruence for building reinforcement learning's reward function.

    Science.gov (United States)

    Zsuga, Judit; Biro, Klara; Tajti, Gabor; Szilasi, Magdolna Emma; Papp, Csaba; Juhasz, Bela; Gesztelyi, Rudolf

    2016-10-28

    Reinforcement learning is a fundamental form of learning that may be formalized using the Bellman equation. Accordingly, an agent determines the state value as the sum of the immediate reward and the discounted value of future states. Thus the value of a state is determined by agent-related attributes (action set, policy, discount factor) and by the agent's knowledge of the environment, embodied by the reward function and by hidden environmental factors given by the transition probability. The central objective of reinforcement learning is to solve these two functions outside the agent's control, either using or not using a model. In the present paper, using the proactive model of reinforcement learning, we offer insight into how the brain creates simplified representations of the environment, and how these representations are organized to support the identification of relevant stimuli and actions. Furthermore, we identify neurobiological correlates of our model by suggesting that the reward and policy functions, attributes of the Bellman equation, are built by the orbitofrontal cortex (OFC) and the anterior cingulate cortex (ACC), respectively. Based on this, we propose that the OFC assesses cue-context congruence to activate the most relevant context frame. Furthermore, given the bidirectional neuroanatomical link between the OFC and model-free structures, we suggest that model-based input is incorporated into the reward prediction error (RPE) signal, and conversely that the RPE signal may be used to update the reward-related information of context frames and the policy underlying action selection in the OFC and ACC, respectively. Finally, clinical implications for cognitive behavioral interventions are discussed.
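
    The verbal description of the Bellman equation above corresponds to the standard state-value form for a policy π, shown here for reference in the usual discrete notation.

    ```latex
    % State value under policy \pi: expected immediate reward plus the discounted
    % value of the successor state, weighted by action and transition probabilities.
    V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)
                 \left[ R(s, a, s') + \gamma\, V^{\pi}(s') \right]
    ```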

  5. Visual Semantic Navigation Based on Deep Learning for Indoor Mobile Robots

    Directory of Open Access Journals (Sweden)

    Li Wang

    2018-01-01

    Full Text Available In order to improve the environmental perception ability of mobile robots during semantic navigation, a three-layer perception framework based on transfer learning is proposed, including a place recognition model, a rotation region recognition model, and a “side” recognition model. The first model is used to recognize different regions in rooms and corridors, the second is used to determine where the robot should rotate, and the third is used to decide which side of a corridor or aisle in the room the robot should walk on. Furthermore, the “side” recognition model can also correct the motion of the robot in real time, thereby guaranteeing accurate arrival at the specific target. Moreover, semantic navigation is accomplished using only one sensor (a camera). Several experiments are conducted in a real indoor environment, demonstrating the effectiveness and robustness of the proposed perception framework.

  6. Pragmatic Frames for Teaching and Learning in Human-Robot Interaction: Review and Challenges.

    Science.gov (United States)

    Vollmer, Anna-Lisa; Wrede, Britta; Rohlfing, Katharina J; Oudeyer, Pierre-Yves

    2016-01-01

    One of the big challenges in robotics today is to learn from human users that are inexperienced in interacting with robots but yet are often used to teach skills flexibly to other humans and to children in particular. A potential route toward natural and efficient learning and teaching in Human-Robot Interaction (HRI) is to leverage the social competences of humans and the underlying interactional mechanisms. In this perspective, this article discusses the importance of pragmatic frames as flexible interaction protocols that provide important contextual cues to enable learners to infer new action or language skills and teachers to convey these cues. After defining and discussing the concept of pragmatic frames, grounded in decades of research in developmental psychology, we study a selection of HRI work in the literature which has focused on learning-teaching interaction and analyze the interactional and learning mechanisms that were used in the light of pragmatic frames. This allows us to show that many of the works have already used in practice, but not always explicitly, basic elements of the pragmatic frames machinery. However, we also show that pragmatic frames have so far been used in a very restricted way as compared to how they are used in human-human interaction and argue that this has been an obstacle preventing robust natural multi-task learning and teaching in HRI. In particular, we explain that two central features of human pragmatic frames, mostly absent of existing HRI studies, are that (1) social peers use rich repertoires of frames, potentially combined together, to convey and infer multiple kinds of cues; (2) new frames can be learnt continually, building on existing ones, and guiding the interaction toward higher levels of complexity and expressivity. To conclude, we give an outlook on the future research direction describing the relevant key challenges that need to be solved for leveraging pragmatic frames for robot learning and teaching.

  7. A Predictive Study of Learner Attitudes Toward Open Learning in a Robotics Class

    Science.gov (United States)

    Avsec, Stanislav; Rihtarsic, David; Kocijancic, Slavko

    2014-10-01

    Open learning (OL) strives to transform teaching and learning by applying learning science and emerging technologies to increase student success, improve learning productivity, and lower barriers to access. OL of robotics has a significant growth rate in secondary and/or high schools, but failures exist. Little is known about why many users stop their OL after their initial experience. Previous research done under different task environments has suggested a variety of factors affecting user satisfaction with different types of OL. In this study, we tested a regression model for student satisfaction involving students' attitudes toward OL usage. A survey was conducted to investigate the critical factors affecting students' achievements and satisfaction in OL of robotics, with use of our own developed direct manipulation learning environment as the learning context. Multiple regression analyses were carried out to investigate how different facets of students' expectations and experiences are related to perceived learning achievements and course satisfaction. Descriptive statistics and analysis of variance were performed to determine the effect of predictor variables on student satisfaction. The results demonstrate that students have significantly positive perceptions toward using OL of robotics as a learning-assisted tool. Furthermore, behavioral intention to use OL is influenced by perceived usefulness and self-efficacy. The following five major categories of satisfaction factors with the OL course were revealed during analysis of the studies (effect sizes in parentheses): organization (0.69); implementation (0.61); professional content (0.53); interaction (0.43); self-efficacy (0.14). All these effect sizes were judged to be significant and large. The results also showed that learner-mentor/instructor interaction, learner-professional content interaction, and online and offline self-efficacy were good predictors of student satisfaction and course quality. Peer interactions and

  8. Retention of fundamental surgical skills learned in robot-assisted surgery.

    Science.gov (United States)

    Suh, Irene H; Mukherjee, Mukul; Shah, Bhavin C; Oleynikov, Dmitry; Siu, Ka-Chun

    2012-12-01

    Evaluation of the learning curve for robotic surgery has shown reduced errors and decreased task completion and training times compared with regular laparoscopic surgery. However, most training evaluations of robotic surgery have only addressed short-term retention after the completion of training. Our goal was to investigate the amount of surgical skills retained after 3 months of training with the da Vinci™ Surgical System. Seven medical students without any surgical experience were recruited. Participants were trained with a 4-day training program of robotic surgical skills and underwent a series of retention tests at 1 day, 1 week, 1 month, and 3 months post-training. Data analysis included time to task completion, speed, distance traveled, and movement curvature by the instrument tip. Performance of the participants was graded using the modified Objective Structured Assessment of Technical Skills (OSATS) for robotic surgery. Participants filled out a survey after each training session by answering a set of questions. Time to task completion and movement curvature decreased from pre- to post-training, and performance was retained at all the corresponding retention periods: 1 day, 1 week, 1 month, and 3 months. The modified OSATS showed improvement from pre-test to post-test and this improvement was maintained during all the retention periods. Participants increased in self-confidence and mastery in performing robotic surgical tasks after training. Our novel comprehensive training program improved robot-assisted surgical performance and learning. All trainees retained their fundamental surgical skills for 3 months after receiving the training program.

  9. Why Are There Developmental Stages in Language Learning? A Developmental Robotics Model of Language Development.

    Science.gov (United States)

    Morse, Anthony F; Cangelosi, Angelo

    2017-02-01

    Most theories of learning would predict a gradual acquisition and refinement of skills as learning progresses, and while some highlight exponential growth, this fails to explain why natural cognitive development typically progresses in stages. Models that do span multiple developmental stages typically have parameters to "switch" between stages. We argue that by taking an embodied view, the interaction between learning mechanisms, the resulting behavior of the agent, and the opportunities for learning that the environment provides can account for the stage-wise development of cognitive abilities. We summarize work relevant to this hypothesis and suggest two simple mechanisms that account for some developmental transitions: neural readiness focuses on changes in the neural substrate resulting from ongoing learning, and perceptual readiness focuses on the perceptual requirements for learning new tasks. Previous work has demonstrated these mechanisms in replications of a wide variety of infant language experiments, spanning multiple developmental stages. Here we piece this work together as a single model of ongoing learning with no parameter changes at all. The model, an instance of the Epigenetic Robotics Architecture (Morse et al 2010) embodied on the iCub humanoid robot, exhibits ongoing multi-stage development while learning pre-linguistic and then basic language skills. Copyright © 2016 Cognitive Science Society, Inc.

  10. A Combination of Machine Learning and Cerebellar-like Neural Networks for the Motor Control and Motor Learning of the Fable Modular Robot

    DEFF Research Database (Denmark)

    Baira Ojeda, Ismael; Tolu, Silvia; Pacheco, Moises

    2017-01-01

    We scaled up a bio-inspired control architecture for the motor control and motor learning of a real modular robot. In our approach, the Locally Weighted Projection Regression algorithm (LWPR) and a cerebellar microcircuit coexist, in the form of a Unit Learning Machine. The LWPR algorithm optimizes...... the input space and learns the internal model of a single robot module to command the robot to follow a desired trajectory with its end-effector. The cerebellar-like microcircuit refines the LWPR output delivering corrective commands. We contrasted distinct cerebellar-like circuits including analytical...

  11. A Plant Control Technology Using Reinforcement Learning Method with Automatic Reward Adjustment

    Science.gov (United States)

    Eguchi, Toru; Sekiai, Takaaki; Yamada, Akihiro; Shimizu, Satoru; Fukai, Masayuki

    A control technology using Reinforcement Learning (RL) and Radial Basis Function (RBF) Network has been developed to reduce environmental load substances exhausted from power and industrial plants. This technology consists of a statistical model using an RBF Network, which estimates plant characteristics with respect to environmental load substances, and an RL agent, which learns the control logic for the plants using the statistical model. In this technology, it is necessary to design an appropriate reward function, given to the agent immediately according to operating conditions and control goals, so that plants can be controlled flexibly. Therefore, we propose an automatic reward adjusting method of RL for plant control. This method adjusts the reward function automatically using information from the statistical model obtained during its learning process. In the simulations, it is confirmed that the proposed method can adjust the reward function adaptively for several test functions, and achieves robust control of a thermal power plant under changing operating conditions and control goals.
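
    The reward-shaping idea above can be illustrated with a small sketch. The function names, the emission target, and the way model confidence rescales the penalty weight are illustrative assumptions, not the authors' implementation:

      import numpy as np

      # Hypothetical reward whose penalty weight is re-tuned from the residuals of the
      # plant's statistical model, so the RL agent can follow changing operating conditions.
      def make_reward(target_emission, weight):
          def reward(predicted_emission, control_effort):
              # penalize deviation from the emission target and excessive control effort
              return (-weight * (predicted_emission - target_emission) ** 2
                      - 0.01 * control_effort ** 2)
          return reward

      def adjust_weight(model_residuals, base_weight=1.0):
          # shrink the penalty weight when the statistical model is currently unreliable
          confidence = 1.0 / (1.0 + np.mean(np.abs(model_residuals)))
          return base_weight * confidence

      residuals = np.array([0.05, 0.02, 0.08])   # recent model errors (assumed values)
      reward_fn = make_reward(target_emission=50.0, weight=adjust_weight(residuals))
      print(reward_fn(predicted_emission=55.0, control_effort=2.0))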

  12. Optimal medication dosing from suboptimal clinical examples: a deep reinforcement learning approach.

    Science.gov (United States)

    Nemati, Shamim; Ghassemi, Mohammad M; Clifford, Gari D

    2016-08-01

    Misdosing medications with sensitive therapeutic windows, such as heparin, can place patients at unnecessary risk, increase length of hospital stay, and lead to wasted hospital resources. In this work, we present a clinician-in-the-loop sequential decision making framework, which provides an individualized dosing policy adapted to each patient's evolving clinical phenotype. We employed retrospective data from the publicly available MIMIC II intensive care unit database, and developed a deep reinforcement learning algorithm that learns an optimal heparin dosing policy from sample dosing trials and their associated outcomes in large electronic medical records. Using separate training and testing datasets, our model was observed to be effective in proposing heparin doses that resulted in better expected outcomes than the clinical guidelines. Our results demonstrate that a sequential modeling approach, learned from retrospective data, could potentially be used at the bedside to derive individualized patient dosing policies.

  13. Off-policy integral reinforcement learning optimal tracking control for continuous-time chaotic systems

    International Nuclear Information System (INIS)

    Wei Qing-Lai; Song Rui-Zhuo; Xiao Wen-Dong; Sun Qiu-Ye

    2015-01-01

    This paper presents an off-policy integral reinforcement learning (IRL) algorithm to obtain the optimal tracking control of unknown chaotic systems. Off-policy IRL can learn the solution of the Hamilton–Jacobi–Bellman (HJB) equation from the system data generated by an arbitrary control. Moreover, off-policy IRL can be regarded as a direct learning method, which avoids the identification of system dynamics. In this paper, the performance index function is first given based on the system tracking error and control error. For solving the HJB equation, an off-policy IRL algorithm is proposed. It is proven that the iterative control makes the tracking error system asymptotically stable, and the iterative performance index function is convergent. Simulation study demonstrates the effectiveness of the developed tracking control method. (paper)
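
    For orientation, methods of this kind are usually set up around a tracking-error cost and its HJB equation; the generic affine-in-control form below uses assumed symbols (e for the tracking error, u_e for the control error, V for the value function) rather than the paper's exact notation:

      \[
        J\bigl(e(t)\bigr) = \int_{t}^{\infty} \bigl( e^{\top} Q\, e + u_{e}^{\top} R\, u_{e} \bigr)\, d\tau ,
        \qquad
        0 = \min_{u_{e}} \Bigl[ e^{\top} Q\, e + u_{e}^{\top} R\, u_{e}
              + \nabla V^{\top} \bigl( f(e) + g(e)\, u_{e} \bigr) \Bigr].
      \]

    The off-policy IRL step evaluates V along data generated by an arbitrary behavior control, which is why the drift dynamics f need not be identified.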

  14. An approach to robot SLAM based on incremental appearance learning with omnidirectional vision

    Science.gov (United States)

    Wu, Hua; Qin, Shi-Yin

    2011-03-01

    Localisation and mapping with an omnidirectional camera becomes more difficult as the landmark appearances change dramatically in the omnidirectional image. With conventional techniques, it is difficult to match the features of the landmark with the template. We present a novel robot simultaneous localisation and mapping (SLAM) algorithm with an omnidirectional camera, which uses incremental landmark appearance learning to provide posterior probability distribution for estimating the robot pose under a particle filtering framework. The major contribution of our work is to represent the posterior estimation of the robot pose by incremental probabilistic principal component analysis, which can be naturally incorporated into the particle filtering algorithm for robot SLAM. Moreover, the innovative method of this article allows the severely distorted landmark appearances viewed with an omnidirectional camera to be used for robot SLAM. The experimental results demonstrate that the localisation error is less than 1 cm in an indoor environment using five landmarks, and the location of the landmark appearances can be estimated within 5 pixels deviation from the ground truth in the omnidirectional image at a fairly fast speed.

  15. The Initial Development of Object Knowledge by a Learning Robot.

    Science.gov (United States)

    Modayil, Joseph; Kuipers, Benjamin

    2008-11-30

    We describe how a robot can develop knowledge of the objects in its environment directly from unsupervised sensorimotor experience. The object knowledge consists of multiple integrated representations: trackers that form spatio-temporal clusters of sensory experience, percepts that represent properties for the tracked objects, classes that support efficient generalization from past experience, and actions that reliably change object percepts. We evaluate how well this intrinsically acquired object knowledge can be used to solve externally specified tasks including object recognition and achieving goals that require both planning and continuous control.

  16. A Passive Learning Sensor Architecture for Multimodal Image Labeling: An Application for Social Robots

    Directory of Open Access Journals (Sweden)

    Marco A. Gutiérrez

    2017-02-01

    Full Text Available Object detection and classification have countless applications in human–robot interacting systems. It is a necessary skill for autonomous robots that perform tasks in household scenarios. Despite the great advances in deep learning and computer vision, social robots performing non-trivial tasks usually spend most of their time finding and modeling objects. Working in real scenarios means dealing with constant environment changes and relatively low-quality sensor data due to the distance at which objects are often found. Ambient intelligence systems equipped with different sensors can also benefit from the ability to find objects, enabling them to inform humans about their location. For these applications to succeed, systems need to detect the objects that may potentially contain other objects, working with relatively low-resolution sensor data. A passive learning architecture for sensors has been designed in order to take advantage of multimodal information, obtained using an RGB-D camera and trained semantic language models. The main contribution of the architecture lies in the improvement of the performance of the sensor under conditions of low resolution and high light variations using a combination of image labeling and word semantics. The tests performed on each of the stages of the architecture compare this solution with current research labeling techniques for the application of an autonomous social robot working in an apartment. The results obtained demonstrate that the proposed sensor architecture outperforms state-of-the-art approaches.

  17. Dynamic Learning from Adaptive Neural Control of Uncertain Robots with Guaranteed Full-State Tracking Precision

    Directory of Open Access Journals (Sweden)

    Min Wang

    2017-01-01

    Full Text Available A dynamic learning method is developed for an uncertain n-link robot with unknown system dynamics, achieving predefined performance attributes on the link angular position and velocity tracking errors. For a known nonsingular initial robotic condition, performance functions and unconstrained transformation errors are employed to prevent the violation of the full-state tracking error constraints. By combining two independent Lyapunov functions and radial basis function (RBF) neural network (NN) approximator, a novel and simple adaptive neural control scheme is proposed for the dynamics of the unconstrained transformation errors, which guarantees uniform ultimate boundedness of all the signals in the closed-loop system. In the steady-state control process, RBF NNs are verified to satisfy the partial persistent excitation (PE) condition. Subsequently, an appropriate state transformation is adopted to achieve the accurate convergence of neural weight estimates. The corresponding experienced knowledge on unknown robotic dynamics is stored in NNs with constant neural weight values. Using the stored knowledge, a static neural learning controller is developed to improve the full-state tracking performance. A comparative simulation study on a 2-link robot illustrates the effectiveness of the proposed scheme.

  18. Hierarchical HMM based learning of navigation primitives for cooperative robotic endovascular catheterization.

    Science.gov (United States)

    Rafii-Tari, Hedyeh; Liu, Jindong; Payne, Christopher J; Bicknell, Colin; Yang, Guang-Zhong

    2014-01-01

    Despite increased use of remote-controlled steerable catheter navigation systems for endovascular intervention, most current designs are based on master configurations which tend to alter natural operator tool interactions. This introduces problems to both ergonomics and shared human-robot control. This paper proposes a novel cooperative robotic catheterization system based on learning-from-demonstration. By encoding the higher-level structure of a catheterization task as a sequence of primitive motions, we demonstrate how to achieve prospective learning for complex tasks whilst incorporating subject-specific variations. A hierarchical Hidden Markov Model is used to model each movement primitive as well as their sequential relationship. This model is applied to generation of motion sequences, recognition of operator input, and prediction of future movements for the robot. The framework is validated by comparing catheter tip motions against the manual approach, showing significant improvements in the quality of catheterization. The results motivate the design of collaborative robotic systems that are intuitive to use, while reducing the cognitive workload of the operator.

  19. Pragmatic Frames for Teaching and Learning in Human–Robot Interaction: Review and Challenges

    Science.gov (United States)

    Vollmer, Anna-Lisa; Wrede, Britta; Rohlfing, Katharina J.; Oudeyer, Pierre-Yves

    2016-01-01

    One of the big challenges in robotics today is to learn from human users who are inexperienced in interacting with robots yet are often accustomed to teaching skills flexibly to other humans and to children in particular. A potential route toward natural and efficient learning and teaching in Human-Robot Interaction (HRI) is to leverage the social competences of humans and the underlying interactional mechanisms. In this perspective, this article discusses the importance of pragmatic frames as flexible interaction protocols that provide important contextual cues to enable learners to infer new action or language skills and teachers to convey these cues. After defining and discussing the concept of pragmatic frames, grounded in decades of research in developmental psychology, we study a selection of HRI work in the literature which has focused on learning–teaching interaction and analyze the interactional and learning mechanisms that were used in the light of pragmatic frames. This allows us to show that many of the works have already used in practice, but not always explicitly, basic elements of the pragmatic frames machinery. However, we also show that pragmatic frames have so far been used in a very restricted way as compared to how they are used in human–human interaction and argue that this has been an obstacle preventing robust natural multi-task learning and teaching in HRI. In particular, we explain that two central features of human pragmatic frames, mostly absent from existing HRI studies, are that (1) social peers use rich repertoires of frames, potentially combined together, to convey and infer multiple kinds of cues; (2) new frames can be learnt continually, building on existing ones, and guiding the interaction toward higher levels of complexity and expressivity. To conclude, we give an outlook on future research directions, describing the relevant key challenges that need to be solved for leveraging pragmatic frames for robot learning and teaching.

  20. Universal effect of dynamical reinforcement learning mechanism in spatial evolutionary games

    International Nuclear Information System (INIS)

    Zhang, Hai-Feng; Wu, Zhi-Xi; Wang, Bing-Hong

    2012-01-01

    One of the prototypical mechanisms in understanding the ubiquitous cooperation in social dilemma situations is the win–stay, lose–shift rule. In this work, a generalized win–stay, lose–shift learning model—a reinforcement learning model with dynamic aspiration level—is proposed to describe how humans adapt their social behaviors based on their social experiences. In the model, the players incorporate the information of the outcomes in previous rounds with time-dependent aspiration payoffs to regulate the probability of choosing cooperation. By investigating such a reinforcement learning rule in the spatial prisoner's dilemma game and public goods game, the most noteworthy finding is that moderate greediness (i.e., a moderate aspiration level) best favors the development and organization of collective cooperation. The generality of this observation is tested against different regulation strengths and different types of interaction network as well. We also make comparisons with two recently proposed models to highlight the importance of the mechanism of adaptive aspiration level in supporting cooperation in structured populations.
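
    A minimal sketch of an aspiration-based update in this spirit is given below; the prisoner's dilemma payoffs, adaptation rates, and the random neighbor are toy assumptions rather than the authors' exact rule:

      import random

      def pd_payoff(cooperate, neighbor_cooperates):
          # standard prisoner's dilemma payoffs: R=3, S=0, T=5, P=1
          if cooperate:
              return 3.0 if neighbor_cooperates else 0.0
          return 5.0 if neighbor_cooperates else 1.0

      p_coop, aspiration = 0.5, 2.0      # probability of cooperating, aspiration level
      alpha, beta = 0.2, 0.1             # learning rate, aspiration adaptation rate
      for _ in range(200):
          act = random.random() < p_coop
          payoff = pd_payoff(act, neighbor_cooperates=random.random() < 0.5)
          stimulus = payoff - aspiration                 # win (positive) or lose (negative)
          if act:                                        # reinforce or weaken cooperation
              p_coop += alpha * stimulus * ((1 - p_coop) if stimulus > 0 else p_coop)
          else:                                          # a rewarding defection pushes the other way
              p_coop -= alpha * stimulus * (p_coop if stimulus > 0 else (1 - p_coop))
          p_coop = min(max(p_coop, 0.01), 0.99)
          aspiration += beta * (payoff - aspiration)     # dynamic (time-dependent) aspiration
      print(round(p_coop, 2), round(aspiration, 2))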

  1. Reinforcement Learning with Autonomous Small Unmanned Aerial Vehicles in Cluttered Environments

    Science.gov (United States)

    Tran, Loc; Cross, Charles; Montague, Gilbert; Motter, Mark; Neilan, James; Qualls, Garry; Rothhaar, Paul; Trujillo, Anna; Allen, B. Danette

    2015-01-01

    We present ongoing work in the Autonomy Incubator at NASA Langley Research Center (LaRC) exploring the efficacy of a data set aggregation approach to reinforcement learning for small unmanned aerial vehicle (sUAV) flight in dense and cluttered environments with reactive obstacle avoidance. The goal is to learn an autonomous flight model using training experiences from a human piloting a sUAV around static obstacles. The training approach uses video data from a forward-facing camera that records the human pilot's flight. Various computer vision based features are extracted from the video relating to edge and gradient information. The recorded human-controlled inputs are used to train an autonomous control model that correlates the extracted feature vector to a yaw command. As part of the reinforcement learning approach, the autonomous control model is iteratively updated with feedback from a human agent who corrects undesired model output. This data-driven approach to autonomous obstacle avoidance is explored in simulated forest environments, furthering research on autonomous flight under the tree canopy. This enables flight in previously inaccessible environments which are of interest to NASA researchers in Earth and Atmospheric sciences.
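
    The iterative correction loop described here resembles dataset-aggregation schemes such as DAgger. The toy sketch below reads it that way, with random vectors standing in for the extracted vision features and a synthetic stand-in for the human's corrective yaw commands:

      import numpy as np

      rng = np.random.default_rng(0)
      d = 16                                            # assumed feature dimension
      X = rng.normal(size=(200, d))                     # features from human-piloted flights
      true_w = rng.normal(size=d)                       # stand-in for the pilot's behavior
      y = X @ true_w                                    # recorded yaw commands

      def fit_ridge(X, y, lam=1e-2):
          # linear map from image features to a yaw command (placeholder model)
          return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

      w = fit_ridge(X, y)
      for _ in range(5):                                # aggregation rounds
          X_new = rng.normal(size=(50, d))              # states visited while flying autonomously
          y_human = X_new @ true_w                      # corrections supplied by the human agent
          X, y = np.vstack([X, X_new]), np.concatenate([y, y_human])
          w = fit_ridge(X, y)                           # retrain on the aggregated data set
      print(np.linalg.norm(w - true_w))                 # model moves toward the demonstrator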

  2. Fuzzy OLAP association rules mining-based modular reinforcement learning approach for multiagent systems.

    Science.gov (United States)

    Kaya, Mehmet; Alhajj, Reda

    2005-04-01

    Multiagent systems and data mining have recently attracted considerable attention in the field of computing. Reinforcement learning is the most commonly used learning process for multiagent systems. However, it still has some drawbacks, including modeling other learning agents present in the domain as part of the state of the environment, and some states are experienced much less than others, or some state-action pairs are never visited during the learning phase. Further, before completing the learning process, an agent cannot exhibit a certain behavior in some states that may be experienced sufficiently. In this study, we propose a novel multiagent learning approach to handle these problems. Our approach is based on utilizing the mining process for modular cooperative learning systems. It incorporates fuzziness and online analytical processing (OLAP) based mining to effectively process the information reported by agents. First, we describe a fuzzy data cube OLAP architecture which facilitates effective storage and processing of the state information reported by agents. This way, the action of the other agent, even one not in the visual environment of the agent under consideration, can simply be predicted by extracting online association rules, a well-known data mining technique, from the constructed data cube. Second, we present a new action selection model, which is also based on association rules mining. Finally, we generalize insufficiently experienced states by mining multilevel association rules from the proposed fuzzy data cube. Experimental results obtained on two different versions of a well-known pursuit domain show the robustness and effectiveness of the proposed fuzzy OLAP mining based modular learning approach. We also tested the scalability of the approach presented in this paper and compared it with our previous work on modular-fuzzy Q-learning and ordinary Q-learning.

  3. Filtering sensory information with XCSF: improving learning robustness and robot arm control performance.

    Science.gov (United States)

    Kneissler, Jan; Stalph, Patrick O; Drugowitsch, Jan; Butz, Martin V

    2014-01-01

    It has been shown previously that the control of a robot arm can be efficiently learned using the XCSF learning classifier system, which is a nonlinear regression system based on evolutionary computation. So far, however, the predictive knowledge about how actual motor activity changes the state of the arm system has not been exploited. In this paper, we utilize the forward velocity kinematics knowledge of XCSF to alleviate the negative effect of noisy sensors for successful learning and control. We incorporate Kalman filtering for estimating successive arm positions, iteratively combining sensory readings with XCSF-based predictions of hand position changes over time. The filtered arm position is used to improve both trajectory planning and further learning of the forward velocity kinematics. We test the approach on a simulated kinematic robot arm model. The results show that the combination can improve learning and control performance significantly. However, it also shows that variance estimates of XCSF prediction may be underestimated, in which case self-delusional spiraling effects can hinder effective learning. Thus, we introduce a heuristic parameter, which can be motivated by theory, and which limits the influence of XCSF's predictions on its own further learning input. As a result, we obtain drastic improvements in noise tolerance, allowing the system to cope with more than 10 times higher noise levels.
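
    The fusion step can be pictured as a one-dimensional Kalman update in which a learned prediction of the position change plays the role of the XCSF forward-model output; the noise levels and the constant predicted velocity below are assumptions for illustration only:

      def kalman_step(x_est, p_est, predicted_delta, q, z, r):
          x_pred = x_est + predicted_delta      # predict with the learned forward model
          p_pred = p_est + q                    # inflate uncertainty by process noise
          k = p_pred / (p_pred + r)             # Kalman gain: trust prediction vs. sensor
          x_new = x_pred + k * (z - x_pred)     # correct with the noisy position reading
          return x_new, (1.0 - k) * p_pred

      x, p = 0.0, 1.0
      for t in range(1, 6):
          noisy_reading = 0.1 * t + 0.05        # stand-in sensor value with a constant offset
          x, p = kalman_step(x, p, predicted_delta=0.1, q=1e-4, z=noisy_reading, r=0.05)
          print(round(x, 3), round(p, 4))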

  4. Examining Students' Proportional Reasoning Strategy Levels as Evidence of the Impact of an Integrated LEGO Robotics and Mathematics Learning Experience

    Science.gov (United States)

    Martínez Ortiz, Araceli

    2015-01-01

    The presented study used a problem-solving experience in engineering design with LEGO robotics materials as the real-world mathematics-learning context. The goals of the study were (a) to determine if a short but intensive extracurricular learning experience would lead to significant student learning of a particular academic topic and (b) to…

  5. Dynamic pricing and automated resource allocation for complex information services reinforcement learning and combinatorial auctions

    CERN Document Server

    Schwind, Michael; Fandel, G

    2007-01-01

    Many firms provide their customers with online information products which require limited resources such as server capacity. This book develops allocation mechanisms that aim to ensure an efficient resource allocation in modern IT-services. Recent methods of artificial intelligence, such as neural networks and reinforcement learning, and nature-oriented optimization methods, such as genetic algorithms and simulated annealing, are advanced and applied to allocation processes in distributed IT-infrastructures, e.g. grid systems. The author presents two methods, both of which use the users' w

  6. The learning curve of robot-assisted laparoscopic aortofemoral bypass grafting for aortoiliac occlusive disease.

    Science.gov (United States)

    Novotný, Tomáš; Dvorák, Martin; Staffa, Robert

    2011-02-01

    Since the end of the 20th century, robot-assisted surgery has been finding its role among other minimally invasive methods. Vascular surgery seems to be another specialty in which the benefits of this technology can be expected. Our objective was to assess the learning curve of robot-assisted laparoscopic aortofemoral bypass grafting for aortoiliac occlusive disease in a group of 40 patients. Between May 2006 and January 2010, 40 patients (32 men, 8 women), who were a median age of 58 years (range, 48-75 years), underwent 40 robot-assisted laparoscopic aortofemoral reconstructions. Learning curve estimations were used for anastomosis, clamping, and operative time assessment. For conversion rate evaluation, the cumulative summation (CUSUM) technique was used. Statistical analysis comparing the first and second half of our group, and unilateral-to-bilateral reconstructions were performed. We created 21 aortofemoral and 19 aortobifemoral bypasses. The median proximal anastomosis time was 23 minutes (range, 18-50 minutes), median clamping time was 60 minutes (range, 40-95 minutes), and median operative time was 295 minutes (range, 180-475 minutes). The 30-day mortality rate was 0%, and no graft or wound infection or cardiopulmonary or hepatorenal complications were observed. During the median 18-month follow-up (range, 2-48 months), three early graft occlusions occurred (7%). After reoperations, the secondary patency of reconstructions was 100%. Data showed a typical short learning curve for robotic proximal anastomosis creation with anastomosis and clamping time reduction. The operative time learning curve was flat, confirming the procedure's complexity. There were two conversions to open surgery. CUSUM analysis confirmed that an acceptable conversion rate set at 5% was achieved. Comparing the first and second half of our group, all recorded times showed statistically significant improvements. Differences between unilateral and bilateral reconstructions were not

  7. Design of Intelligent Robot as A Tool for Teaching Media Based on Computer Interactive Learning and Computer Assisted Learning to Improve the Skill of University Student

    Science.gov (United States)

    Zuhrie, M. S.; Basuki, I.; Asto B, I. G. P.; Anifah, L.

    2018-01-01

    The focus of the research is a teaching module which incorporates manufacturing, mechanical design planning, system control through microprocessor technology, and the maneuverability of the robot. Computer-interactive and computer-assisted learning are strategies that emphasize the use of computers and learning aids in teaching and learning activities. This research applied the 4-D research and development model suggested by Thiagarajan et al. (1974), which consists of four stages: the Define stage, Design stage, Develop stage, and Disseminate stage. The research-and-development design was applied with the objective of producing a learning tool in the form of intelligent robot modules and a kit based on Computer Interactive Learning and Computer Assisted Learning. From the data of the Indonesia Robot Contest during the period 2009-2015, it can be seen that the developed modules have reached the fourth stage of the development method, dissemination. The developed modules guide students to produce an Intelligent Robot Tool for Teaching Based on Computer Interactive Learning and Computer Assisted Learning. Students’ responses also showed positive feedback on the robotics module and computer-based interactive learning.

  8. Brain Circuits of Methamphetamine Place Reinforcement Learning: The Role of the Hippocampus-VTA Loop.

    Science.gov (United States)

    Keleta, Yonas B; Martinez, Joe L

    2012-03-01

    The reinforcing effects of addictive drugs including methamphetamine (METH) involve the midbrain ventral tegmental area (VTA). The VTA is the primary source of dopamine (DA) to the nucleus accumbens (NAc) and the ventral hippocampus (VHC). These three brain regions are functionally connected through the hippocampal-VTA loop that includes two main neural pathways: the bottom-up pathway and the top-down pathway. In this paper, we take the view that addiction is a learning process. Therefore, we tested the involvement of the hippocampus in reinforcement learning by studying conditioned place preference (CPP) learning by sequentially conditioning each of the three nuclei in either the bottom-up order of conditioning (VTA, then VHC, finally NAc) or the top-down order (VHC, then VTA, finally NAc). Following habituation, the rats underwent experimental modules consisting of two conditioning trials each followed by immediate testing (test 1 and test 2) and two additional tests 24 h (test 3) and/or 1 week following conditioning (test 4). The module was repeated three times for each nucleus. The results showed that METH, but not Ringer's, produced positive CPP following conditioning each brain area in the bottom-up order. In the top-down order, METH, but not Ringer's, produced either an aversive CPP or no learning effect following conditioning each nucleus of interest. In addition, METH place aversion was antagonized by coadministration of the N-methyl-d-aspartate (NMDA) receptor antagonist MK801, suggesting that the aversion learning was an NMDA receptor activation-dependent process. We conclude that the hippocampus is a critical structure in the reward circuit and hence suggest that the development of target-specific therapeutics for the control of addiction should emphasize the hippocampus-VTA top-down connection.

  9. Alignment Condition-Based Robust Adaptive Iterative Learning Control of Uncertain Robot System

    Directory of Open Access Journals (Sweden)

    Guofeng Tong

    2014-04-01

    Full Text Available This paper proposes an adaptive iterative learning control strategy integrated with saturation-based robust control for an uncertain robot system in the presence of modelling uncertainties, unknown parameters, and external disturbances under the alignment condition. An important merit is that it achieves adaptive switching of gain matrix both in conventional PD-type feedforward control and robust adaptive control in the iteration domain simultaneously. The convergence analysis of the proposed control law is based on Lyapunov's direct method under the alignment initial condition. Simulation results demonstrate the faster learning rate and better robust performance of the proposed algorithm compared with other existing robust controllers. An experiment on a three-DOF robot manipulator shows its practical effectiveness.

  10. Adaptive and Energy Efficient Walking in a Hexapod Robot under Neuromechanical Control and Sensorimotor Learning

    DEFF Research Database (Denmark)

    Xiong, Xiaofeng; Wörgötter, Florentin; Manoonpong, Poramate

    2016-01-01

    The control of multilegged animal walking is a neuromechanical process, and to achieve this in an adaptive and energy efficient way is a difficult and challenging problem. This is due to the fact that this process needs in real time: 1) to coordinate very many degrees of freedom of jointed legs; 2) to generate the proper leg stiffness (i.e., compliance); and 3) to determine joint angles that give rise to particular positions at the endpoints of the legs. To tackle this problem for a robotic application, here we present a neuromechanical controller coupled with sensorimotor learning. The controller...... energy efficient walking, compared to other small legged robots. In addition, this paper also shows that the tight combination of neural control with tunable muscle-like functions, guided by sensory feedback and coupled with sensorimotor learning, is a way forward to better understand and solve adaptive...

  11. Motivating programming students by Problem Based Learning and LEGO robots

    DEFF Research Database (Denmark)

    Lykke, Marianne; Coto Chotto, Mayela; Mora, Sonia

    2014-01-01

    For this reason, the school is focusing on different teaching methods to help their students master these skills. This paper introduces an experimental, controlled comparison study of three learning designs, involving a problem based learning (PBL) approach in connection with the use of LEGO Mindstorms to improve...... students' programming skills and motivation for learning in an introductory programming course. The paper reports the results related to one of the components of the study - the experiential qualities of the three learning designs. The data were collected through a questionnaire survey with 229 students...... from three groups exposed to different learning designs and through six qualitative walk-alongs collecting data from these groups by informal interviews and observations. Findings from the three studies were discussed in three focus group interviews with 10 students from the three experimental groups....

  12. Tema 2: The NAO robot as a Persuasive Educational and Entertainment Robot (PEER) – a case study on children’s articulation, categorization and interaction with a social robot for learning

    Directory of Open Access Journals (Sweden)

    Lykke Brogaard Bertel

    2016-01-01

    Full Text Available The application of social robots as motivational tools and companions in education is increasingly being explored from a theoretical and practical point of view. In this paper, we examine the social robot NAO as a Persuasive Educational and Entertainment Robot (PEER) and present findings from a case study on the use of NAO to support learning environments in Danish primary schools. In the case study we focus on the children’s practice of articulation and embodied interaction with NAO and investigate the role of NAO as a ‘tool’, ‘social actor’ or ‘simulating medium’ in the learning designs. We examine whether this categorization is static or dynamic, i.e., whether it develops and changes over the course of the interaction, and explore how this relates to and affects the student’s motivation to engage in the NAO-supported learning activities.

  13. Tema 2: The NAO robot as a Persuasive Educational and Entertainment Robot (PEER) – a case study on children’s articulation, categorization and interaction with a social robot for learning

    Directory of Open Access Journals (Sweden)

    Lykke Brogaard Bertel

    2015-12-01

    Full Text Available The application of social robots as motivational tools and companions in education is increasingly being explored from a theoretical and practical point of view. In this paper, we examine the social robot NAO as a Persuasive Educational and Entertainment Robot (PEER) and present findings from a case study on the use of NAO to support learning environments in Danish primary schools. In the case study we focus on the children’s practice of articulation and embodied interaction with NAO and investigate the role of NAO as a ‘tool’, ‘social actor’ or ‘simulating medium’ in the learning designs. We examine whether this categorization is static or dynamic, i.e., whether it develops and changes over the course of the interaction, and explore how this relates to and affects the student’s motivation to engage in the NAO-supported learning activities.

  14. Automatic planning for robots: review of methods and some ideas about structure and learning

    Energy Technology Data Exchange (ETDEWEB)

    Cuena, J.; Salmeron, C.

    1983-01-01

    After a brief review of the problems involved in the design of an automatic planner system, attention is focused on the particular problems that appear when the planner is used to control the actions of a robot. In conclusion, the introduction of learning techniques to improve the efficiency of a planner is suggested, and a method for this, currently under development, is presented. 14 references.

  15. Linking Individual Learning Styles to Approach-Avoidance Motivational Traits and Computational Aspects of Reinforcement Learning.

    Directory of Open Access Journals (Sweden)

    Kristoffer Carl Aberg

    Full Text Available Learning how to gain rewards (approach learning) and avoid punishments (avoidance learning) is fundamental for everyday life. While individual differences in approach and avoidance learning styles have been related to genetics and aging, the contribution of personality factors, such as traits, remains undetermined. Moreover, little is known about the computational mechanisms mediating differences in learning styles. Here, we used a probabilistic selection task with positive and negative feedbacks, in combination with computational modelling, to show that individuals displaying better approach (vs. avoidance) learning scored higher on measures of approach (vs. avoidance) trait motivation, but, paradoxically, also displayed reduced learning speed following positive (vs. negative) outcomes. These data suggest that learning different types of information depend on associated reward values and internal motivational drives, possibly determined by personality traits.

  16. Linking Individual Learning Styles to Approach-Avoidance Motivational Traits and Computational Aspects of Reinforcement Learning

    Science.gov (United States)

    Carl Aberg, Kristoffer; Doell, Kimberly C.; Schwartz, Sophie

    2016-01-01

    Learning how to gain rewards (approach learning) and avoid punishments (avoidance learning) is fundamental for everyday life. While individual differences in approach and avoidance learning styles have been related to genetics and aging, the contribution of personality factors, such as traits, remains undetermined. Moreover, little is known about the computational mechanisms mediating differences in learning styles. Here, we used a probabilistic selection task with positive and negative feedbacks, in combination with computational modelling, to show that individuals displaying better approach (vs. avoidance) learning scored higher on measures of approach (vs. avoidance) trait motivation, but, paradoxically, also displayed reduced learning speed following positive (vs. negative) outcomes. These data suggest that learning different types of information depend on associated reward values and internal motivational drives, possibly determined by personality traits. PMID:27851807
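
    A common way to model such asymmetries is a value learner with separate learning rates for positive and negative prediction errors; the sketch below uses toy parameters and a greedy choice rule, not the fitted model from the study:

      import random

      alpha_gain, alpha_loss = 0.3, 0.1            # asymmetric learning rates (assumed)
      q = [0.5, 0.5]                               # learned option values
      reward_prob = [0.8, 0.2]                     # probabilistic selection task, two options
      for trial in range(500):
          choice = 0 if q[0] >= q[1] else 1        # greedy choice for brevity
          reward = 1.0 if random.random() < reward_prob[choice] else 0.0
          delta = reward - q[choice]               # reward prediction error
          q[choice] += (alpha_gain if delta > 0 else alpha_loss) * delta
      print([round(v, 2) for v in q])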

  17. Robotics as a resource to facilitate the learning and general skills development

    Directory of Open Access Journals (Sweden)

    Flor Ángela Bravo Sánchez

    2012-07-01

    Full Text Available The growing importance of technology in the world today and its continuous development make technology an integral part of the formation process in childhood and youth. For this reason, it is important to develop proposals that allow children and young people to come into contact with new technologies, which is possible through the use of software and hardware tools such as robotic prototypes and specialized programs for educational purposes. This paper shows the importance of the use of robotics as a learning tool and presents the typical stages that must be confronted in implementing educational robotics projects in the classroom. It also presents an educational robotics project called "Mundo Robotica", which seeks to involve robotics in the classroom through practical activities and learning resources, all articulated from a virtual platform.

  18. An analysis of intergroup rivalry using Ising model and reinforcement learning

    Science.gov (United States)

    Zhao, Feng-Fei; Qin, Zheng; Shao, Zhuo

    2014-01-01

    Modeling of intergroup rivalry can help us better understand economic competitions, political elections and other similar activities. The result of intergroup rivalry depends on the co-evolution of individual behavior within one group and the impact from the rival group. In this paper, we model the rivalry behavior using the Ising model. Different from other simulation studies using the Ising model, the evolution rules of each individual in our model are not static, but have the ability to learn from historical experience using a reinforcement learning technique, which makes the simulation closer to real human behavior. We studied the phase transition in intergroup rivalry and focused on the impact of the degree of social freedom, the personality of group members and the social experience of individuals. The results of computer simulation show that a society with a low degree of social freedom and highly educated, experienced individuals is more likely to be one-sided in intergroup rivalry.

  19. Pedunculopontine tegmental nucleus lesions impair stimulus--reward learning in autoshaping and conditioned reinforcement paradigms.

    Science.gov (United States)

    Inglis, W L; Olmstead, M C; Robbins, T W

    2000-04-01

    The role of the pedunculopontine tegmental nucleus (PPTg) in stimulus-reward learning was assessed by testing the effects of PPTg lesions on performance in visual autoshaping and conditioned reinforcement (CRf) paradigms. Rats with PPTg lesions were unable to learn an association between a conditioned stimulus (CS) and a primary reward in either paradigm. In the autoshaping experiment, PPTg-lesioned rats approached the CS+ and CS- with equal frequency, and the latencies to respond to the two stimuli did not differ. PPTg lesions also disrupted discriminated approaches to an appetitive CS in the CRf paradigm and completely abolished the acquisition of responding with CRf. These data are discussed in the context of a possible cognitive function of the PPTg, particularly in terms of lesion-induced disruptions of attentional processes that are mediated by the thalamus.

  20. Reinforcement learning solution for HJB equation arising in constrained optimal control problem.

    Science.gov (United States)

    Luo, Biao; Wu, Huai-Ning; Huang, Tingwen; Liu, Derong

    2015-11-01

    The constrained optimal control problem depends on the solution of the complicated Hamilton-Jacobi-Bellman equation (HJBE). In this paper, a data-based off-policy reinforcement learning (RL) method is proposed, which learns the solution of the HJBE and the optimal control policy from real system data. One important feature of the off-policy RL is that its policy evaluation can be realized with data generated by other behavior policies, not necessarily the target policy, which solves the insufficient exploration problem. The convergence of the off-policy RL is proved by demonstrating its equivalence to the successive approximation approach. Its implementation procedure is based on the actor-critic neural networks structure, where the function approximation is conducted with linearly independent basis functions. Subsequently, the convergence of the implementation procedure with function approximation is also proved. Finally, its effectiveness is verified through computer simulations. Copyright © 2015 Elsevier Ltd. All rights reserved.

  1. Robot initiative in a team learning task increases the rhythm of interaction but not the perceived engagement

    Science.gov (United States)

    Ivaldi, Serena; Anzalone, Salvatore M.; Rousseau, Woody; Sigaud, Olivier; Chetouani, Mohamed

    2014-01-01

    We hypothesize that the initiative of a robot during a collaborative task with a human can influence the pace of interaction, the human response to attention cues, and the perceived engagement. We propose an object learning experiment where the human interacts in a natural way with the humanoid iCub. Through a two-phase scenario, the human teaches the robot about the properties of some objects. We compare the effect of the initiator of the task in the teaching phase (human or robot) on the rhythm of the interaction in the verification phase. We measure the reaction time of the human gaze when responding to attention utterances of the robot. Our experiments show that when the robot is the initiator of the learning task, the pace of interaction is higher and the reaction to attention cues faster. Subjective evaluations suggest that the initiating role of the robot, however, does not affect the perceived engagement. Moreover, subjective and third-person evaluations of the interaction task suggest that the attentive mechanism we implemented in the humanoid robot iCub is able to arouse engagement and make the robot's behavior readable. PMID:24596554

  2. The role of multiple neuromodulators in reinforcement learning that is based on competition between eligibility traces

    Directory of Open Access Journals (Sweden)

    Marco A Huertas

    2016-12-01

    Full Text Available The ability to maximize reward and avoid punishment is essential for animal survival. Reinforcement learning (RL) refers to the algorithms used by biological or artificial systems to learn how to maximize reward or avoid negative outcomes based on past experiences. While RL is also important in machine learning, the types of mechanistic constraints encountered by biological machinery might be different than those for artificial systems. Two major problems encountered by RL are how to relate a stimulus with a reinforcing signal that is delayed in time (temporal credit assignment), and how to stop learning once the target behaviors are attained (stopping rule). To address the first problem, synaptic eligibility traces were introduced, bridging the temporal gap between a stimulus and its reward. Although these were mere theoretical constructs, recent experiments have provided evidence of their existence. These experiments also reveal that the presence of specific neuromodulators converts the traces into changes in synaptic efficacy. A mechanistic implementation of the stopping rule usually assumes the inhibition of the reward nucleus; however, recent experimental results have shown that learning terminates at the appropriate network state even in setups where the reward cannot be inhibited. In an effort to describe a learning rule that solves the temporal credit assignment problem and implements a biologically plausible stopping rule, we proposed a model based on two separate synaptic eligibility traces, one for long-term potentiation (LTP) and one for long-term depression (LTD), each obeying different dynamics and having different effective magnitudes. The model has been shown to successfully generate stable learning in recurrent networks. Although the model assumes the presence of a single neuromodulator, evidence indicates that there are different neuromodulators for expressing the different traces. What could be the role of different

  3. The Role of Multiple Neuromodulators in Reinforcement Learning That Is Based on Competition between Eligibility Traces.

    Science.gov (United States)

    Huertas, Marco A; Schwettmann, Sarah E; Shouval, Harel Z

    2016-01-01

    The ability to maximize reward and avoid punishment is essential for animal survival. Reinforcement learning (RL) refers to the algorithms used by biological or artificial systems to learn how to maximize reward or avoid negative outcomes based on past experiences. While RL is also important in machine learning, the types of mechanistic constraints encountered by biological machinery might be different than those for artificial systems. Two major problems encountered by RL are how to relate a stimulus with a reinforcing signal that is delayed in time (temporal credit assignment), and how to stop learning once the target behaviors are attained (stopping rule). To address the first problem, synaptic eligibility traces were introduced, bridging the temporal gap between a stimulus and its reward. Although these were mere theoretical constructs, recent experiments have provided evidence of their existence. These experiments also reveal that the presence of specific neuromodulators converts the traces into changes in synaptic efficacy. A mechanistic implementation of the stopping rule usually assumes the inhibition of the reward nucleus; however, recent experimental results have shown that learning terminates at the appropriate network state even in setups where the reward nucleus cannot be inhibited. In an effort to describe a learning rule that solves the temporal credit assignment problem and implements a biologically plausible stopping rule, we proposed a model based on two separate synaptic eligibility traces, one for long-term potentiation (LTP) and one for long-term depression (LTD), each obeying different dynamics and having different effective magnitudes. The model has been shown to successfully generate stable learning in recurrent networks. Although the model assumes the presence of a single neuromodulator, evidence indicates that there are different neuromodulators for expressing the different traces. What could be the role of different neuromodulators for
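
    The competition between the two traces can be sketched as a pair of decaying variables that produce a weight change only when a delayed neuromodulatory signal arrives; the time constants, amplitudes, and event times below are assumptions chosen only to make the competition visible:

      dt, t_end = 1.0, 200.0
      tau_ltp, tau_ltd = 20.0, 40.0              # different decay dynamics for the two traces
      a_ltp, a_ltd = 1.0, 0.6                    # different effective magnitudes
      w, e_ltp, e_ltd = 0.5, 0.0, 0.0
      for step in range(int(t_end / dt)):
          t = step * dt
          spike_pairing = 1.0 if t == 10.0 else 0.0      # stimulus tags the synapse once
          e_ltp += dt * (-e_ltp / tau_ltp) + a_ltp * spike_pairing
          e_ltd += dt * (-e_ltd / tau_ltd) + a_ltd * spike_pairing
          reward = 1.0 if t == 60.0 else 0.0             # delayed neuromodulator pulse
          w += 0.1 * reward * (e_ltp - e_ltd)            # traces compete to set the sign of change
      print(round(w, 4))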

  4. Optimal and Scalable Caching for 5G Using Reinforcement Learning of Space-Time Popularities

    Science.gov (United States)

    Sadeghi, Alireza; Sheikholeslami, Fatemeh; Giannakis, Georgios B.

    2018-02-01

    Small basestations (SBs) equipped with caching units have potential to handle the unprecedented demand growth in heterogeneous networks. Through low-rate, backhaul connections with the backbone, SBs can prefetch popular files during off-peak traffic hours, and service them to the edge at peak periods. To intelligently prefetch, each SB must learn what and when to cache, while taking into account SB memory limitations, the massive number of available contents, the unknown popularity profiles, as well as the space-time popularity dynamics of user file requests. In this work, local and global Markov processes model user requests, and a reinforcement learning (RL) framework is put forth for finding the optimal caching policy when the transition probabilities involved are unknown. Joint consideration of global and local popularity demands along with cache-refreshing costs allow for a simple, yet practical asynchronous caching approach. The novel RL-based caching relies on a Q-learning algorithm to implement the optimal policy in an online fashion, thus enabling the cache control unit at the SB to learn, track, and possibly adapt to the underlying dynamics. To endow the algorithm with scalability, a linear function approximation of the proposed Q-learning scheme is introduced, offering faster convergence as well as reduced complexity and memory requirements. Numerical tests corroborate the merits of the proposed approach in various realistic settings.
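
    A stripped-down sketch of Q-learning with linear function approximation over popularity features is shown below; the features, the stand-in popularity dynamics, and the cache-refresh cost are toy assumptions rather than the models used in the paper:

      import numpy as np

      rng = np.random.default_rng(0)
      n_actions, d = 2, 3                        # actions: 0 = do not cache, 1 = cache
      theta = np.zeros((n_actions, d))           # one linear Q-approximator per action
      alpha, gamma, eps = 0.05, 0.9, 0.1

      def features(local_pop, global_pop):
          return np.array([local_pop, global_pop, 1.0])

      phi = features(*rng.random(2))
      for t in range(5000):
          a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(theta @ phi))
          reward = (phi[0] + 0.5 * phi[1]) * a - 0.3 * a   # cache hits pay off, refreshing costs
          phi_next = features(*rng.random(2))              # next popularity state (stand-in dynamics)
          target = reward + gamma * np.max(theta @ phi_next)
          theta[a] += alpha * (target - theta[a] @ phi) * phi
          phi = phi_next
      print(theta.round(2))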

  5. Energy Management Strategy for a Hybrid Electric Vehicle Based on Deep Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Yue Hu

    2018-01-01

    Full Text Available An energy management strategy (EMS is important for hybrid electric vehicles (HEVs since it plays a decisive role on the performance of the vehicle. However, the variation of future driving conditions deeply influences the effectiveness of the EMS. Most existing EMS methods simply follow predefined rules that are not adaptive to different driving conditions online. Therefore, it is useful that the EMS can learn from the environment or driving cycle. In this paper, a deep reinforcement learning (DRL-based EMS is designed such that it can learn to select actions directly from the states without any prediction or predefined rules. Furthermore, a DRL-based online learning architecture is presented. It is significant for applying the DRL algorithm in HEV energy management under different driving conditions. Simulation experiments have been conducted using MATLAB and Advanced Vehicle Simulator (ADVISOR co-simulation. Experimental results validate the effectiveness of the DRL-based EMS compared with the rule-based EMS in terms of fuel economy. The online learning architecture is also proved to be effective. The proposed method ensures the optimality, as well as real-time applicability, in HEVs.

  6. A Day-to-Day Route Choice Model Based on Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Fangfang Wei

    2014-01-01

    Full Text Available Day-to-day traffic dynamics are generated by individual travelers’ route choice and route adjustment behaviors, which are appropriately studied using agent-based models and learning theory. In this paper, we propose a day-to-day route choice model based on reinforcement learning and multiagent simulation. Travelers’ memory, learning rate, and experience cognition are taken into account. Then the model is verified and analyzed. Results show that the network flow can converge to user equilibrium (UE) if travelers can remember all the travel time they have experienced, but this is not necessarily the case under limited memory; the learning rate can strengthen flow fluctuations, whereas memory has the opposite effect; moreover, a high learning rate results in cyclical oscillation during flow evolution. Finally, both the scenarios of link capacity degradation and random link capacity are used to illustrate the model’s applications. Analyses and applications of our model demonstrate the model is reasonable and useful for studying the day-to-day traffic dynamics.
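
    The day-to-day adjustment process can be sketched with travelers who keep learned travel-time estimates for two parallel routes and update only the route they experienced; the congestion function and all parameters are assumptions, not the paper's network:

      import random

      random.seed(1)
      n_travelers, days, rate = 100, 60, 0.3
      # perceived travel times per traveler, initialized with some individual variation
      beliefs = [[10.0 + random.random(), 10.0 + random.random()] for _ in range(n_travelers)]
      for day in range(days):
          choices = [0 if b[0] <= b[1] else 1 for b in beliefs]
          flows = [choices.count(r) for r in (0, 1)]
          # BPR-style congestion: travel time grows with the flow on each route
          times = [10.0 * (1 + 0.15 * (f / 50.0) ** 4) for f in flows]
          for b, r in zip(beliefs, choices):
              b[r] += rate * (times[r] - b[r])    # each traveler learns only from the chosen route
      print(flows, [round(t, 2) for t in times])  # with identical routes, flows hover near an even split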

  7. Global reinforcement training of CrossNets

    Science.gov (United States)

    Ma, Xiaolong

    2007-10-01

    Hybrid "CMOL" integrated circuits, incorporating advanced CMOS devices for neural cell bodies, nanowires as axons and dendrites, and latching switches as synapses, may be used for the hardware implementation of extremely dense (107 cells and 1012 synapses per cm2) neuromorphic networks, operating up to 10 6 times faster than their biological prototypes. We are exploring several "Cross- Net" architectures that accommodate the limitations imposed by CMOL hardware and should allow effective training of the networks without a direct external access to individual synapses. Our studies have show that CrossNets based on simple (two-terminal) crosspoint devices can work well in at least two modes: as Hop-field networks for associative memory and multilayer perceptrons for classification tasks. For more intelligent tasks (such as robot motion control or complex games), which do not have "examples" for supervised learning, more advanced training methods such as the global reinforcement learning are necessary. For application of global reinforcement training algorithms to CrossNets, we have extended Williams's REINFORCE learning principle to a more general framework and derived several learning rules that are more suitable for CrossNet hardware implementation. The results of numerical experiments have shown that these new learning rules can work well for both classification tasks and reinforcement tasks such as the cartpole balancing control problem. Some limitations imposed by the CMOL hardware need to be carefully addressed for the the successful application of in situ reinforcement training to CrossNets.

  8. Multiobjective Reinforcement Learning for Traffic Signal Control Using Vehicular Ad Hoc Network

    Directory of Open Access Journals (Sweden)

    Houli Duan

    2010-01-01

    Full Text Available We propose a new multiobjective control algorithm based on reinforcement learning for urban traffic signal control, named multi-RL. A multiagent structure is used to describe the traffic system, and a vehicular ad hoc network is used for data exchange among agents. A reinforcement learning algorithm is applied to predict the overall value of the optimization objective given the vehicles' states. The policy that minimizes the cumulative value of the optimization objective is regarded as the optimal one. To make the method adaptive to various traffic conditions, we also introduce a multiobjective control scheme in which the optimization objective is selected adaptively according to real-time traffic states. The optimization objectives include the number of vehicle stops, the average waiting time, and the maximum queue length of the next intersection. In addition, we incorporate priority control for buses and emergency vehicles in our model. The simulation results indicate that our algorithm performs more efficiently than traditional traffic light control methods.
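    A minimal sketch of the adaptive objective-selection idea (pick the optimization objective from the observed traffic state, then act greedily on that objective's value estimates) is given below; the thresholds, state encoding, and two-action phase control are assumptions made for illustration, not the authors' exact scheme.

        # Multi-objective traffic-signal control sketch: choose the objective,
        # then choose the action (keep or switch phase) greedily for it.
        from collections import defaultdict

        q_tables = {obj: defaultdict(lambda: [0.0, 0.0])        # actions: [keep, switch]
                    for obj in ("stops", "waiting_time", "queue_length")}

        def select_objective(avg_speed, next_queue, queue_capacity):
            if next_queue > 0.8 * queue_capacity:   # risk of spillback downstream
                return "queue_length"
            if avg_speed < 10.0:                    # congested: minimize waiting time
                return "waiting_time"
            return "stops"                          # light traffic: minimize stops

        def control_step(state, avg_speed, next_queue, queue_capacity):
            obj = select_objective(avg_speed, next_queue, queue_capacity)
            q_keep, q_switch = q_tables[obj][state]
            return obj, int(q_switch > q_keep)      # 1 = switch the signal phase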

  9. FMRQ-A Multiagent Reinforcement Learning Algorithm for Fully Cooperative Tasks.

    Science.gov (United States)

    Zhang, Zhen; Zhao, Dongbin; Gao, Junwei; Wang, Dongqing; Dai, Yujie

    2017-06-01

    In this paper, we propose a multiagent reinforcement learning algorithm for fully cooperative tasks, called frequency of the maximum reward Q-learning (FMRQ). FMRQ aims to reach one of the optimal Nash equilibria so as to optimize the performance index in multiagent systems. The frequency of obtaining the highest global immediate reward, rather than the immediate reward itself, is used as the reinforcement signal. With FMRQ, each agent does not need to observe the other agents' actions and only shares its state and reward at each step. We validate FMRQ through case studies of repeated games: four two-player two-action cases and one three-player two-action case. It is demonstrated that FMRQ converges to one of the optimal Nash equilibria in these cases. Moreover, comparison experiments are conducted on tasks with multiple states and finite steps: a box-pushing task and a distributed sensor network problem. Experimental results show that the proposed algorithm outperforms the others.
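    As a rough sketch of the reinforcement signal described above (the frequency with which a state-action pair has produced the highest global reward, rather than the reward itself), the following fragment may help; the tabular representation and update constants are assumptions, and the full FMRQ equilibrium-selection machinery is not reproduced.

        # FMRQ-style update sketch: reinforce by the frequency of hitting the
        # maximum global reward instead of by the raw reward.
        from collections import defaultdict

        counts = defaultdict(lambda: {"total": 0, "max_hits": 0})
        Q = defaultdict(float)
        ALPHA = 0.1

        def fmrq_update(state, action, global_reward, best_reward_seen):
            key = (state, action)
            counts[key]["total"] += 1
            if global_reward >= best_reward_seen:
                counts[key]["max_hits"] += 1
            freq = counts[key]["max_hits"] / counts[key]["total"]  # reinforcement signal
            Q[key] += ALPHA * (freq - Q[key])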

  10. Reinforcement Learning Based Data Self-Destruction Scheme for Secured Data Management

    Directory of Open Access Journals (Sweden)

    Young Ki Kim

    2018-04-01

    Full Text Available As technologies and services that leverage cloud computing have evolved, the number of businesses and individuals who use them is increasing rapidly. Because users store and use data that include personal information in the course of using cloud services, research on privacy protection models for sensitive information in the cloud environment is becoming more important. As a solution to this problem, a self-destructing scheme has been proposed that prevents the decryption of encrypted user data after a certain period of time, using a Distributed Hash Table (DHT) network. However, the existing self-destructing scheme does not specify how to set the number of key shares and the threshold value in a dynamic DHT network environment. This paper proposes a method for setting the parameters that generate the key shares needed for the self-destructing scheme, taking data availability and security into account. The proposed method defines the state, action, and reward of a reinforcement learning model based on graph similarity and applies the self-destructing scheme by updating the parameters with this model. With the proposed technique, key-sharing parameters can be set with consideration for data availability and security in dynamic DHT network environments.
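    A minimal, non-authoritative sketch of the parameter-selection loop (a tabular agent choosing the number of key shares and the threshold for the current DHT state, rewarded for balancing availability against exposure risk) follows; the action set, state encoding, and reward weights are assumptions for illustration.

        # Q-learning sketch for choosing (key shares, threshold) in a DHT setting.
        import random
        from collections import defaultdict

        ACTIONS = [(n, k) for n in (10, 20, 30) for k in (0.5, 0.7, 0.9)]  # (shares, threshold ratio)
        Q = defaultdict(float)

        def reward(availability, exposure_risk, w_avail=1.0, w_sec=1.0):
            return w_avail * availability - w_sec * exposure_risk

        def choose_parameters(dht_state, epsilon=0.1):
            if random.random() < epsilon:
                return random.choice(ACTIONS)
            return max(ACTIONS, key=lambda a: Q[(dht_state, a)])

        def learn(dht_state, action, availability, exposure_risk, alpha=0.1):
            key = (dht_state, action)
            Q[key] += alpha * (reward(availability, exposure_risk) - Q[key])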

  11. Reinforcement learning of self-regulated β-oscillations for motor restoration in chronic stroke

    Directory of Open Access Journals (Sweden)

    Georgios eNaros

    2015-07-01

    Full Text Available Neurofeedback training of motor imagery-related brain states with brain-machine interfaces (BMI) is currently being explored prior to standard physiotherapy to improve the motor outcome of stroke rehabilitation. Pilot studies suggest that such a priming intervention before physiotherapy might increase the responsiveness of the brain to the subsequent physiotherapy, thereby improving the clinical outcome. However, there is little evidence so far that these BMI-based interventions have achieved operant conditioning of specific brain states that facilitate task-specific functional gains beyond the practice of primed physiotherapy. In this context, we argue that BMI technology needs to target physiological features relevant for the intended behavioral gain, and that this therapeutic intervention has to be informed by concepts of reinforcement learning to develop its full potential. Such a refined neurofeedback approach would need to address the following issues: (1) defining a physiological feedback target specific to the intended behavioral gain, e.g., β-band oscillations for cortico-muscular communication; this targeted brain state could well be different from the brain state optimal for the neurofeedback task; (2) selecting a BMI classification and thresholding approach on the basis of learning principles, i.e., balancing the challenge and reward of the neurofeedback task instead of maximizing the classification accuracy of the feedback device; and (3) adjusting the feedback over the course of the training period to account for the cognitive load and the learning experience of the participant. The proposed neurofeedback strategy provides evidence for the feasibility of the suggested approach by demonstrating that dynamic threshold adaptation based on reinforcement learning may lead to frequency-specific operant conditioning of β-band oscillations paralleled by task-specific motor improvement; a proposal that requires investigation in a larger cohort of stroke patients.
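    The dynamic threshold adaptation mentioned above can be pictured with a small sketch: the feedback threshold is nudged so that the trainee's recent success rate stays near a target that balances challenge and reward. The target rate, step size, and example values below are assumptions, not the study's parameters.

        # Reinforcement-learning-style threshold adaptation for neurofeedback (sketch).
        def adapt_threshold(threshold, recent_successes, target_rate=0.7, step=0.02):
            """Raise the bar when feedback is too easy, lower it when too hard."""
            success_rate = sum(recent_successes) / max(len(recent_successes), 1)
            if success_rate > target_rate:
                threshold += step      # make the beta-power criterion harder to reach
            elif success_rate < target_rate:
                threshold -= step
            return threshold

        # Example: after mostly successful trials the threshold increases slightly.
        print(adapt_threshold(0.5, [1, 1, 1, 0, 1, 1, 1, 1]))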

  12. Multi-Objective Reinforcement Learning-Based Deep Neural Networks for Cognitive Space Communications

    Science.gov (United States)

    Ferreria, Paulo Victor R.; Paffenroth, Randy; Wyglinski, Alexander M.; Hackett, Timothy M.; Bilen, Sven G.; Reinhart, Richard C.; Mortensen, Dale J.

    2017-01-01

    Future communication subsystems of space exploration missions can potentially benefit from software-defined radios (SDRs) controlled by machine learning algorithms. In this paper, we propose a novel hybrid radio resource allocation management control algorithm that integrates multi-objective reinforcement learning and deep artificial neural networks. The objective is to efficiently manage communications system resources by monitoring performance functions with common dependent variables that result in conflicting goals. The uncertainty in the performance of thousands of different possible combinations of radio parameters makes the trade-off between exploration and exploitation in reinforcement learning (RL) much more challenging for future critical space-based missions. Thus, the system should spend as little time as possible exploring actions, and whenever it explores an action, it should perform at acceptable levels most of the time. The proposed approach enables online learning through interaction with the environment and restricts poor resource allocation performance through virtual environment exploration. Improvements in multi-objective performance can be achieved via transmitter parameter adaptation on a per-packet basis, with poorly predicted performance promptly resulting in rejected decisions. The simulations presented in this work considered the DVB-S2 standard adaptive transmitter parameters and additional ones expected to be present in future adaptive radio systems. Performance results are provided by analysis of the proposed hybrid algorithm operating across a satellite communication channel from Earth to GEO orbit under clear-sky conditions. The proposed approach constitutes part of the core cognitive engine proof-of-concept to be delivered to the NASA Glenn Research Center SCaN Testbed located onboard the International Space Station.
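    A minimal sketch of the decision filter described above (score candidate transmitter settings against several conflicting objectives and reject those with poorly predicted performance) is shown below; the objective names, weights, and stand-in predictor are assumptions for illustration only.

        # Multi-objective scoring with rejection of poorly predicted settings (sketch).
        def score(predicted, weights):
            """Weighted sum over conflicting objectives, e.g. throughput vs. power."""
            return sum(weights[k] * predicted[k] for k in weights)

        def choose_setting(candidates, predictor, weights, reject_below=0.0):
            scored = [(score(predictor(c), weights), c) for c in candidates]
            acceptable = [sc for sc in scored if sc[0] >= reject_below]  # drop poor predictions
            return max(acceptable or scored, key=lambda sc: sc[0])[1]

        # Toy usage with a trivial stand-in predictor:
        settings = [{"mod": "QPSK", "power": 1.0}, {"mod": "8PSK", "power": 0.5}]
        predictor = lambda c: {"throughput": c["power"], "power_saving": 1 - c["power"]}
        print(choose_setting(settings, predictor, {"throughput": 0.6, "power_saving": 0.4}))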

  13. Smart learning objects for smart education in computer science theory, methodology and robot-based implementation

    CERN Document Server

    Stuikys, Vytautas

    2015-01-01

    This monograph presents the challenges, vision, and context of designing smart learning objects (SLOs) through Computer Science (CS) education modelling and feature model transformations. It presents the latest research on meta-programming-based generative learning objects (those with advanced features are treated as SLOs) and the use of educational robots in teaching CS topics. The introduced methodology covers the overall processes for developing SLOs and a smart educational environment (SEE) and integrates both into a real education setting to provide teaching in CS using constructivist a

  14. Optimal critic learning for robot control in time-varying environments.

    Science.gov (United States)

    Wang, Chen; Li, Yanan; Ge, Shuzhi Sam; Lee, Tong Heng

    2015-10-01

    In this paper, optimal critic learning is developed for robot control in a time-varying environment. The unknown environment is described as a linear system with time-varying parameters, and impedance control is employed for the interaction control. Desired impedance parameters are obtained as the optimal realization of a composite objective of trajectory tracking and force regulation. Q-function-based critic learning is developed to determine the optimal impedance parameters without knowledge of the system dynamics. Simulation results are presented and compared with existing methods, and the efficacy of the proposed method is verified.
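    As a rough illustration of Q-function-based critic learning without a system model, the fragment below fits a quadratic critic over the tracking/force errors and the impedance parameters by temporal-difference updates; the feature choice, dimensions, and learning constants are assumptions rather than the paper's formulation.

        # Quadratic critic for impedance-parameter selection (sketch).
        import numpy as np

        def features(x, k):
            """Quadratic features of the error state x and impedance parameters k."""
            z = np.concatenate([x, k])
            return np.outer(z, z)[np.triu_indices(len(z))]

        def td_update(theta, x, k, cost, x_next, k_next, gamma=0.95, lr=0.01):
            """One temporal-difference step on the critic weights theta."""
            phi, phi_next = features(x, k), features(x_next, k_next)
            td_error = cost + gamma * theta @ phi_next - theta @ phi
            return theta + lr * td_error * phi

        # theta should be initialized as np.zeros(len(features(x, k))) for the chosen sizes.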

  15. "Alarm-corrected" ergonomic armrest use could improve learning curves of novices on robotic simulator.

    Science.gov (United States)

    Yang, Kun; Perez, Manuela; Hossu, Gabriela; Hubert, Nicolas; Perrenot, Cyril; Hubert, Jacques

    2017-01-01

    In robotic surgery, the professional ergonomic habit of using an armrest reduces operator fatigue and increases the precision of motion. We designed and validated a pressure surveillance system (PSS) based on force sensors to investigate armrest use. The objective was to evaluate whether adding an alarm to the PSS could shorten ergonomic training and improve performance. Twenty robot- and simulator-naïve participants were recruited and randomized into two groups (A and B). The PSS was installed on a robotic simulator, the dV-Trainer, to detect contact with the armrest. Group A members completed three tasks on the dV-Trainer without the alarm, making 15 attempts at each task. Group B members practiced the first two tasks with the alarm and then completed the final tasks without the alarm. The simulator provided an overall score reflecting the trainees' performance, and we used the new concept of an "armrest load" score to describe the ergonomic habit of using the armrest. Group B achieved a significantly higher performance score, with fewer ergonomic errors and accelerated acquisition of the professional ergonomic habit. The combination of the PSS and alarm is effective in significantly shortening the learning curve in robotic training.

  16. Variability in Dopamine Genes Dissociates Model-Based and Model-Free Reinforcement Learning.

    Science.gov (United States)

    Doll, Bradley B; Bath, Kevin G; Daw, Nathaniel D; Frank, Michael J

    2016-01-27

    Considerable evidence suggests that multiple learning systems can drive behavior. Choice can proceed reflexively from previous actions and their associated outcomes, as captured by "model-free" learning algorithms, or flexibly from prospective consideration of outcomes that might occur, as captured by "model-based" learning algorithms. However, differential contributions of dopamine to these systems are poorly understood. Dopamine is widely thought to support model-free learning by modulating plasticity in striatum. Model-based learning may also be affected by these striatal effects, or by other dopaminergic effects elsewhere, notably on prefrontal working memory function. Indeed, prominent demonstrations linking striatal dopamine to putatively model-free learning did not rule out model-based effects, whereas other studies have reported dopaminergic modulation of verifiably model-based learning, but without distinguishing a prefrontal versus striatal locus. To clarify the relationships between dopamine, neural systems, and learning strategies, we combine a genetic association approach in humans with two well-studied reinforcement learning tasks: one isolating model-based from model-free behavior and the other sensitive to key aspects of striatal plasticity. Prefrontal function was indexed by a polymorphism in the COMT gene, differences of which reflect dopamine levels in the prefrontal cortex. This polymorphism has been associated with differences in prefrontal activity and working memory. Striatal function was indexed by a gene coding for DARPP-32, which is densely expressed in the striatum where it is necessary for synaptic plasticity. We found evidence for our hypothesis that variations in prefrontal dopamine relate to model-based learning, whereas variations in striatal dopamine function relate to model-free learning. Decisions can stem reflexively from their previously associated outcomes or flexibly from deliberative consideration of potential choice outcomes
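    For readers less familiar with the two strategies being dissociated, the contrast can be summarized in a short sketch: model-free learning updates cached action values from reward prediction errors, whereas model-based learning recomputes values by looking ahead through a learned transition and reward model. The toy data structures below are assumptions for illustration.

        # Model-free vs. model-based value computation (toy sketch).
        def model_free_update(Q, s, a, r, alpha=0.2):
            """Cache-and-update: nudge Q(s, a) toward the obtained reward."""
            Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (r - Q.get((s, a), 0.0))

        def model_based_values(transition, reward, state, actions, next_values):
            """Plan by looking ahead through the learned model instead of using cached Q."""
            return {a: sum(p * (reward.get(s2, 0.0) + next_values.get(s2, 0.0))
                           for s2, p in transition[(state, a)].items())
                    for a in actions}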

  17. Simultaneous development of laparoscopy and robotics provides acceptable perioperative outcomes and shows robotics to have a faster learning curve and to be overall faster in rectal cancer surgery: analysis of novice MIS surgeon learning curves.

    Science.gov (United States)

    Melich, George; Hong, Young Ki; Kim, Jieun; Hur, Hyuk; Baik, Seung Hyuk; Kim, Nam Kyu; Sender Liberman, A; Min, Byung Soh

    2015-03-01

    Laparoscopy offers some evidence of benefit compared to open rectal surgery, and robotic rectal surgery is evolving into an accepted approach. The objective was to analyze and compare the laparoscopic and robotic rectal surgery learning curves of a novice minimally invasive colorectal surgeon with respect to operative times and perioperative outcomes. One hundred and six laparoscopic and 92 robotic LAR rectal surgery cases were analyzed. All surgeries were performed by a surgeon who was primarily trained in open rectal surgery. Patient characteristics and perioperative outcomes were analyzed. Operative time and CUSUM plots were used to evaluate the learning curves for laparoscopic versus robotic LAR. Laparoscopic versus robotic LAR outcomes feature initial-group operative times of 308 (291-325) min versus 397 (373-420) min and last-group times of 220 (212-229) min versus 204 (196-211) min, reversed in favor of robotics; major complications of 4.7 versus 6.5 % (NS), resection margin involvement of 2.8 versus 4.4 % (NS), conversion rate of 3.8 versus 1.1 % (NS), lymph node harvest of 16.3 versus 17.2 (NS), and estimated blood loss of 231 versus 201 cc (NS). Owing to faster learning curves for the extracorporeal phase and the total mesorectal excision phase, robotic surgery was observed to be faster than laparoscopic surgery after the initial 41 cases. CUSUM plots demonstrate acceptable perioperative surgical outcomes from the beginning of the study. Initial robotic operative times improved rapidly with practice and eventually became faster than those for laparoscopy. Developing both laparoscopic and robotic skills simultaneously can provide acceptable perioperative outcomes in rectal surgery. It might be suggested that, in the current milieu of clashing interests between evolving technology and economic constraints, there are advantages in embracing both approaches.
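    The CUSUM learning-curve construction referred to above is simply the running sum of each case's deviation from a reference operative time; a small sketch with made-up numbers follows (the reference value and example times are assumptions, not the study's data).

        # CUSUM of operative times for learning-curve analysis (sketch).
        def cusum(operative_times, reference):
            total, curve = 0.0, []
            for t in operative_times:
                total += t - reference   # positive slope = slower than the reference
                curve.append(total)
            return curve

        # A downward bend suggests the surgeon has moved past the learning phase.
        print(cusum([320, 310, 290, 260, 240, 220], reference=280))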

  18. Action observation and robotic agents: learning and anthropomorphism.

    Science.gov (United States)

    Press, Clare

    2011-05-01

    The 'action observation network' (AON), which is thought to translate observed actions into motor codes required for their execution, is biologically tuned: it responds more to observation of human, than non-human, movement. This biological specificity has been taken to support the hypothesis that the AON underlies various social functions, such as theory of mind and action understanding, and that, when it is active during observation of non-human agents like humanoid robots, it is a sign of ascription of human mental states to these agents. This review will outline evidence for biological tuning in the AON, examining the features which generate it, and concluding that there is evidence for tuning to both the form and kinematic profile of observed movements, and little evidence for tuning to belief about stimulus identity. It will propose that a likely reason for biological tuning is that human actions, relative to non-biological movements, have been observed more frequently while executing corresponding actions. If the associative hypothesis of the AON is correct, and the network indeed supports social functioning, sensorimotor experience with non-human agents may help us to predict, and therefore interpret, their movements. Copyright © 2011 Elsevier Ltd. All rights reserved.

  19. Investigation of Drive-Reinforcement Learning and Application of Learning to Flight Control

    Science.gov (United States)

    1993-08-01

    WL-TR-93-1153, Investigation of Drive-Reinforcement Learning and Application of Learning to Flight Control, Walter L. Baker (Ed.) and Stephen C. Atkins, August 1993.

  20. Within- and across-trial dynamics of human EEG reveal cooperative interplay between reinforcement learning and working memory.

    Science.gov (United States)

    Collins, Anne G E; Frank, Michael J

    2018-03-06

    Learning from rewards and punishments is essential to survival and facilitates flexible human behavior. It is widely appreciated that multiple cognitive and reinforcement learning systems contribute to decision-making, but the nature of their interactions remains elusive. Here, we leverage methods for extracting trial-by-trial indices of reinforcement learning (RL) and working memory (WM) from human electroencephalography to reveal single-trial computations beyond those afforded by behavior alone. Neural dynamics confirmed that increases in neural expectation were predictive of reduced neural surprise in the following feedback period, supporting central tenets of RL models. Within- and cross-trial dynamics revealed a cooperative interplay between systems for learning, in which WM contributes expectations to guide RL, despite competition between the systems during choice. Together, these results provide a deeper understanding of how multiple neural systems interact for learning and decision-making and facilitate analysis of their disruption in clinical populations.