WorldWideScience

Sample records for robotic reinforcement learning

  1. Framework for robot skill learning using reinforcement learning

    Science.gov (United States)

    Wei, Yingzi; Zhao, Mingyang

    2003-09-01

    Robot acquiring skill is a process similar to human skill learning. Reinforcement learning (RL) is an on-line actor critic method for a robot to develop its skill. The reinforcement function has become the critical component for its effect of evaluating the action and guiding the learning process. We present an augmented reward function that provides a new way for RL controller to incorporate prior knowledge and experience into the RL controller. Also, the difference form of augmented reward function is considered carefully. The additional reward beyond conventional reward will provide more heuristic information for RL. In this paper, we present a strategy for the task of complex skill learning. Automatic robot shaping policy is to dissolve the complex skill into a hierarchical learning process. The new form of value function is introduced to attain smooth motion switching swiftly. We present a formal, but practical, framework for robot skill learning and also illustrate with an example the utility of method for learning skilled robot control on line.

  2. TEXPLORE temporal difference reinforcement learning for robots and time-constrained domains

    CERN Document Server

    Hester, Todd

    2013-01-01

    This book presents and develops new reinforcement learning methods that enable fast and robust learning on robots in real-time. Robots have the potential to solve many problems in society, because of their ability to work in dangerous places doing necessary jobs that no one wants or is able to do. One barrier to their widespread deployment is that they are mainly limited to tasks where it is possible to hand-program behaviors for every situation that may be encountered. For robots to meet their potential, they need methods that enable them to learn and adapt to novel situations that they were not programmed for. Reinforcement learning (RL) is a paradigm for learning sequential decision making processes and could solve the problems of learning and adaptation on robots. This book identifies four key challenges that must be addressed for an RL algorithm to be practical for robotic control tasks. These RL for Robotics Challenges are: 1) it must learn in very few samples; 2) it must learn in domains with continuou...

  3. Reinforcement function design and bias for efficient learning in mobile robots

    International Nuclear Information System (INIS)

    Touzet, C.; Santos, J.M.

    1998-01-01

    The main paradigm in sub-symbolic learning robot domain is the reinforcement learning method. Various techniques have been developed to deal with the memorization/generalization problem, demonstrating the superior ability of artificial neural network implementations. In this paper, the authors address the issue of designing the reinforcement so as to optimize the exploration part of the learning. They also present and summarize works relative to the use of bias intended to achieve the effective synthesis of the desired behavior. Demonstrative experiments involving a self-organizing map implementation of the Q-learning and real mobile robots (Nomad 200 and Khepera) in a task of obstacle avoidance behavior synthesis are described. 3 figs., 5 tabs

  4. Temporal Memory Reinforcement Learning for the Autonomous Micro-mobile Robot Based-behavior

    Institute of Scientific and Technical Information of China (English)

    Yang Yujun(杨玉君); Cheng Junshi; Chen Jiapin; Li Xiaohai

    2004-01-01

    This paper presents temporal memory reinforcement learning for the autonomous micro-mobile robot based-behavior. Human being has a memory oblivion process, i.e. the earlier to memorize, the earlier to forget, only the repeated thing can be remembered firmly. Enlightening forms this, and the robot need not memorize all the past states, at the same time economizes the EMS memory space, which is not enough in the MPU of our AMRobot. The proposed algorithm is an extension of the Q-learning, which is an incremental reinforcement learning method. The results of simulation have shown that the algorithm is valid.

  5. Intrinsically motivated reinforcement learning for human-robot interaction in the real-world.

    Science.gov (United States)

    Qureshi, Ahmed Hussain; Nakamura, Yutaka; Yoshikawa, Yuichiro; Ishiguro, Hiroshi

    2018-03-26

    For a natural social human-robot interaction, it is essential for a robot to learn the human-like social skills. However, learning such skills is notoriously hard due to the limited availability of direct instructions from people to teach a robot. In this paper, we propose an intrinsically motivated reinforcement learning framework in which an agent gets the intrinsic motivation-based rewards through the action-conditional predictive model. By using the proposed method, the robot learned the social skills from the human-robot interaction experiences gathered in the real uncontrolled environments. The results indicate that the robot not only acquired human-like social skills but also took more human-like decisions, on a test dataset, than a robot which received direct rewards for the task achievement. Copyright © 2018 Elsevier Ltd. All rights reserved.

  6. Emotion in reinforcement learning agents and robots : A survey

    NARCIS (Netherlands)

    Moerland, T.M.; Broekens, D.J.; Jonker, C.M.

    2018-01-01

    This article provides the first survey of computational models of emotion in reinforcement learning (RL) agents. The survey focuses on agent/robot emotions, and mostly ignores human user emotions. Emotions are recognized as functional in decision-making by influencing motivation and action

  7. Bio-robots automatic navigation with graded electric reward stimulation based on Reinforcement Learning.

    Science.gov (United States)

    Zhang, Chen; Sun, Chao; Gao, Liqiang; Zheng, Nenggan; Chen, Weidong; Zheng, Xiaoxiang

    2013-01-01

    Bio-robots based on brain computer interface (BCI) suffer from the lack of considering the characteristic of the animals in navigation. This paper proposed a new method for bio-robots' automatic navigation combining the reward generating algorithm base on Reinforcement Learning (RL) with the learning intelligence of animals together. Given the graded electrical reward, the animal e.g. the rat, intends to seek the maximum reward while exploring an unknown environment. Since the rat has excellent spatial recognition, the rat-robot and the RL algorithm can convergent to an optimal route by co-learning. This work has significant inspiration for the practical development of bio-robots' navigation with hybrid intelligence.

  8. Optimized Assistive Human-Robot Interaction Using Reinforcement Learning.

    Science.gov (United States)

    Modares, Hamidreza; Ranatunga, Isura; Lewis, Frank L; Popa, Dan O

    2016-03-01

    An intelligent human-robot interaction (HRI) system with adjustable robot behavior is presented. The proposed HRI system assists the human operator to perform a given task with minimum workload demands and optimizes the overall human-robot system performance. Motivated by human factor studies, the presented control structure consists of two control loops. First, a robot-specific neuro-adaptive controller is designed in the inner loop to make the unknown nonlinear robot behave like a prescribed robot impedance model as perceived by a human operator. In contrast to existing neural network and adaptive impedance-based control methods, no information of the task performance or the prescribed robot impedance model parameters is required in the inner loop. Then, a task-specific outer-loop controller is designed to find the optimal parameters of the prescribed robot impedance model to adjust the robot's dynamics to the operator skills and minimize the tracking error. The outer loop includes the human operator, the robot, and the task performance details. The problem of finding the optimal parameters of the prescribed robot impedance model is transformed into a linear quadratic regulator (LQR) problem which minimizes the human effort and optimizes the closed-loop behavior of the HRI system for a given task. To obviate the requirement of the knowledge of the human model, integral reinforcement learning is used to solve the given LQR problem. Simulation results on an x - y table and a robot arm, and experimental implementation results on a PR2 robot confirm the suitability of the proposed method.

  9. Vision-based Navigation and Reinforcement Learning Path Finding for Social Robots

    OpenAIRE

    Pérez Sala, Xavier

    2010-01-01

    We propose a robust system for automatic Robot Navigation in uncontrolled en- vironments. The system is composed by three main modules: the Arti cial Vision module, the Reinforcement Learning module, and the behavior control module. The aim of the system is to allow a robot to automatically nd a path that arrives to a pre xed goal. Turn and straight movements in uncontrolled environments are automatically estimated and controlled using the proposed modules. The Arti cial Vi...

  10. Emotion in reinforcement learning agents and robots: A survey

    OpenAIRE

    Moerland, T.M.; Broekens, D.J.; Jonker, C.M.

    2018-01-01

    This article provides the first survey of computational models of emotion in reinforcement learning (RL) agents. The survey focuses on agent/robot emotions, and mostly ignores human user emotions. Emotions are recognized as functional in decision-making by influencing motivation and action selection. Therefore, computational emotion models are usually grounded in the agent's decision making architecture, of which RL is an important subclass. Studying emotions in RL-based agents is useful for ...

  11. Intrinsic interactive reinforcement learning - Using error-related potentials for real world human-robot interaction.

    Science.gov (United States)

    Kim, Su Kyoung; Kirchner, Elsa Andrea; Stefes, Arne; Kirchner, Frank

    2017-12-14

    Reinforcement learning (RL) enables robots to learn its optimal behavioral strategy in dynamic environments based on feedback. Explicit human feedback during robot RL is advantageous, since an explicit reward function can be easily adapted. However, it is very demanding and tiresome for a human to continuously and explicitly generate feedback. Therefore, the development of implicit approaches is of high relevance. In this paper, we used an error-related potential (ErrP), an event-related activity in the human electroencephalogram (EEG), as an intrinsically generated implicit feedback (rewards) for RL. Initially we validated our approach with seven subjects in a simulated robot learning scenario. ErrPs were detected online in single trial with a balanced accuracy (bACC) of 91%, which was sufficient to learn to recognize gestures and the correct mapping between human gestures and robot actions in parallel. Finally, we validated our approach in a real robot scenario, in which seven subjects freely chose gestures and the real robot correctly learned the mapping between gestures and actions (ErrP detection (90% bACC)). In this paper, we demonstrated that intrinsically generated EEG-based human feedback in RL can successfully be used to implicitly improve gesture-based robot control during human-robot interaction. We call our approach intrinsic interactive RL.

  12. Performance Comparison of Two Reinforcement Learning Algorithms for Small Mobile Robots

    Czech Academy of Sciences Publication Activity Database

    Neruda, Roman; Slušný, Stanislav

    2009-01-01

    Roč. 2, č. 1 (2009), s. 59-68 ISSN 2005-4297 R&D Projects: GA MŠk(CZ) 1M0567 Grant - others:GA UK(CZ) 7637/2007 Institutional research plan: CEZ:AV0Z10300504 Keywords : reinforcement learning * mobile robots * inteligent agents Subject RIV: IN - Informatics, Computer Science http://www.sersc.org/journals/IJCA/vol2_no1/7.pdf

  13. Multiagent Reinforcement Learning with Regret Matching for Robot Soccer

    Directory of Open Access Journals (Sweden)

    Qiang Liu

    2013-01-01

    Full Text Available This paper proposes a novel multiagent reinforcement learning (MARL algorithm Nash- learning with regret matching, in which regret matching is used to speed up the well-known MARL algorithm Nash- learning. It is critical that choosing a suitable strategy for action selection to harmonize the relation between exploration and exploitation to enhance the ability of online learning for Nash- learning. In Markov Game the joint action of agents adopting regret matching algorithm can converge to a group of points of no-regret that can be viewed as coarse correlated equilibrium which includes Nash equilibrium in essence. It is can be inferred that regret matching can guide exploration of the state-action space so that the rate of convergence of Nash- learning algorithm can be increased. Simulation results on robot soccer validate that compared to original Nash- learning algorithm, the use of regret matching during the learning phase of Nash- learning has excellent ability of online learning and results in significant performance in terms of scores, average reward and policy convergence.

  14. Episodic reinforcement learning control approach for biped walking

    Directory of Open Access Journals (Sweden)

    Katić Duško

    2012-01-01

    Full Text Available This paper presents a hybrid dynamic control approach to the realization of humanoid biped robotic walk, focusing on the policy gradient episodic reinforcement learning with fuzzy evaluative feedback. The proposed structure of controller involves two feedback loops: a conventional computed torque controller and an episodic reinforcement learning controller. The reinforcement learning part includes fuzzy information about Zero-Moment- Point errors. Simulation tests using a medium-size 36-DOF humanoid robot MEXONE were performed to demonstrate the effectiveness of our method.

  15. Study and Application of Reinforcement Learning in Cooperative Strategy of the Robot Soccer Based on BDI Model

    Directory of Open Access Journals (Sweden)

    Wu Bo-ying

    2009-11-01

    Full Text Available The dynamic cooperation model of multi-Agent is formed by combining reinforcement learning with BDI model. In this model, the concept of the individual optimization loses its meaning, because the repayment of each Agent dose not only depend on itsself but also on the choice of other Agents. All Agents can pursue a common optimum solution and try to realize the united intention as a whole to a maximum limit. The robot moves to its goal, depending on the present positions of the other robots that cooperate with it and the present position of the ball. One of these robots cooperating with it is controlled to move by man with a joystick. In this way, Agent can be ensured to search for each state-action as frequently as possible when it carries on choosing movements, so as to shorten the time of searching for the movement space so that the convergence speed of reinforcement learning can be improved. The validity of the proposed cooperative strategy for the robot soccer has been proved by combining theoretical analysis with simulation robot soccer match (11vs11 .

  16. Learning motor skills from algorithms to robot experiments

    CERN Document Server

    Kober, Jens

    2014-01-01

    This book presents the state of the art in reinforcement learning applied to robotics both in terms of novel algorithms and applications. It discusses recent approaches that allow robots to learn motor skills and presents tasks that need to take into account the dynamic behavior of the robot and its environment, where a kinematic movement plan is not sufficient. The book illustrates a method that learns to generalize parameterized motor plans which is obtained by imitation or reinforcement learning, by adapting a small set of global parameters, and appropriate kernel-based reinforcement learning algorithms. The presented applications explore highly dynamic tasks and exhibit a very efficient learning process. All proposed approaches have been extensively validated with benchmarks tasks, in simulation, and on real robots. These tasks correspond to sports and games but the presented techniques are also applicable to more mundane household tasks. The book is based on the first author’s doctoral thesis, which wo...

  17. Reinforcement learning for a biped robot based on a CPG-actor-critic method.

    Science.gov (United States)

    Nakamura, Yutaka; Mori, Takeshi; Sato, Masa-aki; Ishii, Shin

    2007-08-01

    Animals' rhythmic movements, such as locomotion, are considered to be controlled by neural circuits called central pattern generators (CPGs), which generate oscillatory signals. Motivated by this biological mechanism, studies have been conducted on the rhythmic movements controlled by CPG. As an autonomous learning framework for a CPG controller, we propose in this article a reinforcement learning method we call the "CPG-actor-critic" method. This method introduces a new architecture to the actor, and its training is roughly based on a stochastic policy gradient algorithm presented recently. We apply this method to an automatic acquisition problem of control for a biped robot. Computer simulations show that training of the CPG can be successfully performed by our method, thus allowing the biped robot to not only walk stably but also adapt to environmental changes.

  18. Reaching control of a full-torso, modelled musculoskeletal robot using muscle synergies emergent under reinforcement learning

    International Nuclear Information System (INIS)

    Diamond, A; Holland, O E

    2014-01-01

    ‘Anthropomimetic’ robots mimic both human morphology and internal structure—skeleton, muscles, compliance and high redundancy—thus presenting a formidable challenge to conventional control. Here we derive a novel controller for this class of robot which learns effective reaching actions through the sustained activation of weighted muscle synergies, an approach which draws upon compelling, recent evidence from animal and human studies, but is almost unexplored to date in the musculoskeletal robot literature. Since the effective synergy patterns for a given robot will be unknown, we derive a reinforcement-learning approach intended to allow their emergence, in particular those patterns aiding linearization of control. Using an extensive physics-based model of the anthropomimetic ECCERobot, we find that effective reaching actions can be learned comprising only two sequential motor co-activation patterns, each controlled by just a single common driving signal. Factor analysis shows the emergent muscle co-activations can be largely reconstructed using weighted combinations of only 13 common fragments. Testing these ‘candidate’ synergies as drivable units, the same controller now learns the reaching task both faster and better. (paper)

  19. Morphology Independent Learning in Modular Robots

    DEFF Research Database (Denmark)

    Christensen, David Johan; Bordignon, Mirko; Schultz, Ulrik Pagh

    2009-01-01

    Hand-coding locomotion controllers for modular robots is difficult due to their polymorphic nature. Instead, we propose to use a simple and distributed reinforcement learning strategy. ATRON modules with identical controllers can be assembled in any configuration. To optimize the robot’s locomotion...... speed its modules independently and in parallel adjust their behavior based on a single global reward signal. In simulation, we study the learning strategy’s performance on different robot configurations. On the physical platform, we perform learning experiments with ATRON robots learning to move as fast...

  20. Assist-as-needed robotic trainer based on reinforcement learning and its application to dart-throwing.

    Science.gov (United States)

    Obayashi, Chihiro; Tamei, Tomoya; Shibata, Tomohiro

    2014-05-01

    This paper proposes a novel robotic trainer for motor skill learning. It is user-adaptive inspired by the assist-as-needed principle well known in the field of physical therapy. Most previous studies in the field of the robotic assistance of motor skill learning have used predetermined desired trajectories, and it has not been examined intensively whether these trajectories were optimal for each user. Furthermore, the guidance hypothesis states that humans tend to rely too much on external assistive feedback, resulting in interference with the internal feedback necessary for motor skill learning. A few studies have proposed a system that adjusts its assistive strength according to the user's performance in order to prevent the user from relying too much on the robotic assistance. There are, however, problems in these studies, in that a physical model of the user's motor system is required, which is inherently difficult to construct. In this paper, we propose a framework for a robotic trainer that is user-adaptive and that neither requires a specific desired trajectory nor a physical model of the user's motor system, and we achieve this using model-free reinforcement learning. We chose dart-throwing as an example motor-learning task as it is one of the simplest throwing tasks, and its performance can easily be and quantitatively measured. Training experiments with novices, aiming at maximizing the score with the darts and minimizing the physical robotic assistance, demonstrate the feasibility and plausibility of the proposed framework. Copyright © 2014 Elsevier Ltd. All rights reserved.

  1. Reinforcement Learning State-of-the-Art

    CERN Document Server

    Wiering, Marco

    2012-01-01

    Reinforcement learning encompasses both a science of adaptive behavior of rational beings in uncertain environments and a computational methodology for finding optimal behaviors for challenging problems in control, optimization and adaptive behavior of intelligent agents. As a field, reinforcement learning has progressed tremendously in the past decade. The main goal of this book is to present an up-to-date series of survey articles on the main contemporary sub-fields of reinforcement learning. This includes surveys on partially observable environments, hierarchical task decompositions, relational knowledge representation and predictive state representations. Furthermore, topics such as transfer, evolutionary methods and continuous spaces in reinforcement learning are surveyed. In addition, several chapters review reinforcement learning methods in robotics, in games, and in computational neuroscience. In total seventeen different subfields are presented by mostly young experts in those areas, and together the...

  2. Safe robot execution in model-based reinforcement learning

    OpenAIRE

    Martínez Martínez, David; Alenyà Ribas, Guillem; Torras, Carme

    2015-01-01

    Task learning in robotics requires repeatedly executing the same actions in different states to learn the model of the task. However, in real-world domains, there are usually sequences of actions that, if executed, may produce unrecoverable errors (e.g. breaking an object). Robots should avoid repeating such errors when learning, and thus explore the state space in a more intelligent way. This requires identifying dangerous action effects to avoid including such actions in the generated plans...

  3. Decentralized Reinforcement Learning of robot behaviors

    NARCIS (Netherlands)

    Leottau, David L.; Ruiz-del-Solar, Javier; Babuska, R.

    2018-01-01

    A multi-agent methodology is proposed for Decentralized Reinforcement Learning (DRL) of individual behaviors in problems where multi-dimensional action spaces are involved. When using this methodology, sub-tasks are learned in parallel by individual agents working toward a common goal. In

  4. Reinforcement learning in computer vision

    Science.gov (United States)

    Bernstein, A. V.; Burnaev, E. V.

    2018-04-01

    Nowadays, machine learning has become one of the basic technologies used in solving various computer vision tasks such as feature detection, image segmentation, object recognition and tracking. In many applications, various complex systems such as robots are equipped with visual sensors from which they learn state of surrounding environment by solving corresponding computer vision tasks. Solutions of these tasks are used for making decisions about possible future actions. It is not surprising that when solving computer vision tasks we should take into account special aspects of their subsequent application in model-based predictive control. Reinforcement learning is one of modern machine learning technologies in which learning is carried out through interaction with the environment. In recent years, Reinforcement learning has been used both for solving such applied tasks as processing and analysis of visual information, and for solving specific computer vision problems such as filtering, extracting image features, localizing objects in scenes, and many others. The paper describes shortly the Reinforcement learning technology and its use for solving computer vision problems.

  5. Memristive device based learning for navigation in robots.

    Science.gov (United States)

    Sarim, Mohammad; Kumar, Manish; Jha, Rashmi; Minai, Ali A

    2017-11-08

    Biomimetic robots have gained attention recently for various applications ranging from resource hunting to search and rescue operations during disasters. Biological species are known to intuitively learn from the environment, gather and process data, and make appropriate decisions. Such sophisticated computing capabilities in robots are difficult to achieve, especially if done in real-time with ultra-low energy consumption. Here, we present a novel memristive device based learning architecture for robots. Two terminal memristive devices with resistive switching of oxide layer are modeled in a crossbar array to develop a neuromorphic platform that can impart active real-time learning capabilities in a robot. This approach is validated by navigating a robot vehicle in an unknown environment with randomly placed obstacles. Further, the proposed scheme is compared with reinforcement learning based algorithms using local and global knowledge of the environment. The simulation as well as experimental results corroborate the validity and potential of the proposed learning scheme for robots. The results also show that our learning scheme approaches an optimal solution for some environment layouts in robot navigation.

  6. Autonomous Motion Learning for Intra-Vehicular Activity Space Robot

    Science.gov (United States)

    Watanabe, Yutaka; Yairi, Takehisa; Machida, Kazuo

    Space robots will be needed in the future space missions. So far, many types of space robots have been developed, but in particular, Intra-Vehicular Activity (IVA) space robots that support human activities should be developed to reduce human-risks in space. In this paper, we study the motion learning method of an IVA space robot with the multi-link mechanism. The advantage point is that this space robot moves using reaction force of the multi-link mechanism and contact forces from the wall as space walking of an astronaut, not to use a propulsion. The control approach is determined based on a reinforcement learning with the actor-critic algorithm. We demonstrate to clear effectiveness of this approach using a 5-link space robot model by simulation. First, we simulate that a space robot learn the motion control including contact phase in two dimensional case. Next, we simulate that a space robot learn the motion control changing base attitude in three dimensional case.

  7. Functional Contour-following via Haptic Perception and Reinforcement Learning.

    Science.gov (United States)

    Hellman, Randall B; Tekin, Cem; van der Schaar, Mihaela; Santos, Veronica J

    2018-01-01

    Many tasks involve the fine manipulation of objects despite limited visual feedback. In such scenarios, tactile and proprioceptive feedback can be leveraged for task completion. We present an approach for real-time haptic perception and decision-making for a haptics-driven, functional contour-following task: the closure of a ziplock bag. This task is challenging for robots because the bag is deformable, transparent, and visually occluded by artificial fingertip sensors that are also compliant. A deep neural net classifier was trained to estimate the state of a zipper within a robot's pinch grasp. A Contextual Multi-Armed Bandit (C-MAB) reinforcement learning algorithm was implemented to maximize cumulative rewards by balancing exploration versus exploitation of the state-action space. The C-MAB learner outperformed a benchmark Q-learner by more efficiently exploring the state-action space while learning a hard-to-code task. The learned C-MAB policy was tested with novel ziplock bag scenarios and contours (wire, rope). Importantly, this work contributes to the development of reinforcement learning approaches that account for limited resources such as hardware life and researcher time. As robots are used to perform complex, physically interactive tasks in unstructured or unmodeled environments, it becomes important to develop methods that enable efficient and effective learning with physical testbeds.

  8. Challenges in adapting imitation and reinforcement learning to compliant robots

    Directory of Open Access Journals (Sweden)

    Calinon Sylvain

    2011-12-01

    Full Text Available There is an exponential increase of the range of tasks that robots are forecasted to accomplish. (Reprogramming these robots becomes a critical issue for their commercialization and for their applications to real-world scenarios in which users without expertise in robotics wish to adapt the robot to their needs. This paper addresses the problem of designing userfriendly human-robot interfaces to transfer skills in a fast and efficient manner. This paper presents recent work conducted at the Learning and Interaction group at ADVR-IIT, ranging from skill acquisition through kinesthetic teaching to self-refinement strategies initiated from demonstrations. Our group started to explore the use of imitation and exploration strategies that can take advantage of the compliant capabilities of recent robot hardware and control architectures.

  9. Reinforcement learning: Solving two case studies

    Science.gov (United States)

    Duarte, Ana Filipa; Silva, Pedro; dos Santos, Cristina Peixoto

    2012-09-01

    Reinforcement Learning algorithms offer interesting features for the control of autonomous systems, such as the ability to learn from direct interaction with the environment, and the use of a simple reward signalas opposed to the input-outputs pairsused in classic supervised learning. The reward signal indicates the success of failure of the actions executed by the agent in the environment. In this work, are described RL algorithmsapplied to two case studies: the Crawler robot and the widely known inverted pendulum. We explore RL capabilities to autonomously learn a basic locomotion pattern in the Crawler, andapproach the balancing problem of biped locomotion using the inverted pendulum.

  10. Beyond adaptive-critic creative learning for intelligent mobile robots

    Science.gov (United States)

    Liao, Xiaoqun; Cao, Ming; Hall, Ernest L.

    2001-10-01

    Intelligent industrial and mobile robots may be considered proven technology in structured environments. Teach programming and supervised learning methods permit solutions to a variety of applications. However, we believe that to extend the operation of these machines to more unstructured environments requires a new learning method. Both unsupervised learning and reinforcement learning are potential candidates for these new tasks. The adaptive critic method has been shown to provide useful approximations or even optimal control policies to non-linear systems. The purpose of this paper is to explore the use of new learning methods that goes beyond the adaptive critic method for unstructured environments. The adaptive critic is a form of reinforcement learning. A critic element provides only high level grading corrections to a cognition module that controls the action module. In the proposed system the critic's grades are modeled and forecasted, so that an anticipated set of sub-grades are available to the cognition model. The forecasting grades are interpolated and are available on the time scale needed by the action model. The success of the system is highly dependent on the accuracy of the forecasted grades and adaptability of the action module. Examples from the guidance of a mobile robot are provided to illustrate the method for simple line following and for the more complex navigation and control in an unstructured environment. The theory presented that is beyond the adaptive critic may be called creative theory. Creative theory is a form of learning that models the highest level of human learning - imagination. The application of the creative theory appears to not only be to mobile robots but also to many other forms of human endeavor such as educational learning and business forecasting. Reinforcement learning such as the adaptive critic may be applied to known problems to aid in the discovery of their solutions. The significance of creative theory is that it

  11. Decision Making in Reinforcement Learning Using a Modified Learning Space Based on the Importance of Sensors

    Directory of Open Access Journals (Sweden)

    Yasutaka Kishima

    2013-01-01

    Full Text Available Many studies have been conducted on the application of reinforcement learning (RL to robots. A robot which is made for general purpose has redundant sensors or actuators because it is difficult to assume an environment that the robot will face and a task that the robot must execute. In this case, -space on RL contains redundancy so that the robot must take much time to learn a given task. In this study, we focus on the importance of sensors with regard to a robot’s performance of a particular task. The sensors that are applicable to a task differ according to the task. By using the importance of the sensors, we try to adjust the state number of the sensors and to reduce the size of -space. In this paper, we define the measure of importance of a sensor for a task with the correlation between the value of each sensor and reward. A robot calculates the importance of the sensors and makes the size of -space smaller. We propose the method which reduces learning space and construct the learning system by putting it in RL. In this paper, we confirm the effectiveness of our proposed system with an experimental robot.

  12. A Differentiable Physics Engine for Deep Learning in Robotics

    OpenAIRE

    Degrave, Jonas; Hermans, Michiel; Dambre, Joni; wyffels, Francis

    2016-01-01

    One of the most important fields in robotics is the optimization of controllers. Currently, robots are treated as a black box in this optimization process, which is the reason why derivative-free optimization methods such as evolutionary algorithms or reinforcement learning are omnipresent. We propose an implementation of a modern physics engine, which has the ability to differentiate control parameters. This has been implemented on both CPU and GPU. We show how this speeds up the optimizatio...

  13. Impedance learning for robotic contact tasks using natural actor-critic algorithm.

    Science.gov (United States)

    Kim, Byungchan; Park, Jooyoung; Park, Shinsuk; Kang, Sungchul

    2010-04-01

    Compared with their robotic counterparts, humans excel at various tasks by using their ability to adaptively modulate arm impedance parameters. This ability allows us to successfully perform contact tasks even in uncertain environments. This paper considers a learning strategy of motor skill for robotic contact tasks based on a human motor control theory and machine learning schemes. Our robot learning method employs impedance control based on the equilibrium point control theory and reinforcement learning to determine the impedance parameters for contact tasks. A recursive least-square filter-based episodic natural actor-critic algorithm is used to find the optimal impedance parameters. The effectiveness of the proposed method was tested through dynamic simulations of various contact tasks. The simulation results demonstrated that the proposed method optimizes the performance of the contact tasks in uncertain conditions of the environment.

  14. Gaussian Processes for Data-Efficient Learning in Robotics and Control.

    Science.gov (United States)

    Deisenroth, Marc Peter; Fox, Dieter; Rasmussen, Carl Edward

    2015-02-01

    Autonomous learning has been a promising direction in control and robotics for more than a decade since data-driven learning allows to reduce the amount of engineering knowledge, which is otherwise required. However, autonomous reinforcement learning (RL) approaches typically require many interactions with the system to learn controllers, which is a practical limitation in real systems, such as robots, where many interactions can be impractical and time consuming. To address this problem, current learning approaches typically require task-specific knowledge in form of expert demonstrations, realistic simulators, pre-shaped policies, or specific knowledge about the underlying dynamics. In this paper, we follow a different approach and speed up learning by extracting more information from data. In particular, we learn a probabilistic, non-parametric Gaussian process transition model of the system. By explicitly incorporating model uncertainty into long-term planning and controller learning our approach reduces the effects of model errors, a key problem in model-based learning. Compared to state-of-the art RL our model-based policy search method achieves an unprecedented speed of learning. We demonstrate its applicability to autonomous learning in real robot and control tasks.

  15. Reinforcement Learning on autonomous humanoid robots

    NARCIS (Netherlands)

    Schuitema, E.

    2012-01-01

    Service robots have the potential to be of great value in households, health care and other labor intensive environments. However, these environments are typically unique, not very structured and frequently changing, which makes it difficult to make service robots robust and versatile through manual

  16. Fiber-Reinforced Origamic Robotic Actuator.

    Science.gov (United States)

    Yi, Juan; Chen, Xiaojiao; Song, Chaoyang; Wang, Zheng

    2018-02-01

    A novel pneumatic soft linear actuator Fiber-reinforced Origamic Robotic Actuator (FORA) is proposed with significant improvements on the popular McKibben-type actuators, offering nearly doubled motion range, substantially improved force profile, and significantly lower actuation pressure. The desirable feature set is made possible by a novel soft origamic chamber that expands radially while contracts axially when pressurized. Combining this new origamic chamber with a reinforcing fiber mesh, FORA generates very high traction force (over 150N) and very large contractile motion (over 50%) at very low input pressure (100 kPa). We developed quasi-static analytical models both to characterize the motion and forces and as guidelines for actuator design. Fabrication of FORA mostly involves consumer-grade three-dimensional (3D) printing. We provide a detailed list of materials and dimensions. Fabricated FORAs were tested on a dedicated platform against commercially available pneumatic artificial muscles from Shadow and Festo to showcase its superior performances and validate the analytical models with very good agreements. Finally, a robotic joint was developed driven by two antagonistic FORAs, to showcase the benefits of the performance improvements. With its simple structure, fully characterized mechanism, easy fabrication procedure, and highly desirable performance, FORA could be easily customized to application requirements and fabricated by anyone with access to a 3D printer. This will pave the way to the wider adaptation and application of soft robotic systems.

  17. Grounding the meanings in sensorimotor behavior using reinforcement learning

    Directory of Open Access Journals (Sweden)

    Igor eFarkaš

    2012-02-01

    Full Text Available The recent outburst of interest in cognitive developmental robotics is fueled by the ambition to propose ecologically plausible mechanisms of how, among other things, a learning agent/robot could ground linguistic meanings in its sensorimotor behaviour. Along this stream, we propose a model that allows the simulated iCub robot to learn the meanings of actions (point, touch and push oriented towards objects in robot's peripersonal space. In our experiments, the iCub learns to execute motor actions and comment on them. Architecturally, the model is composed of three neural-network-based modules that are trained in different ways. The first module, a two-layer perceptron, is trained by back-propagation to attend to the target position in the visual scene, given the low-level visual information and the feature-based target information. The second module, having the form of an actor-critic architecture, is the most distinguishing part of our model, and is trained by a continuous version of reinforcement learning to execute actions as sequences, based on a linguistic command. The third module, an echo-state network, is trained to provide the linguistic description of the executed actions. The trained model generalises well in case of novel action-target combinations with randomised initial arm positions. It can also promptly adapt its behavior if the action/target suddenly changes during motor execution.

  18. A line follower robot implementation using Lego's Mindstorms Kit and Q-Learning

    Directory of Open Access Journals (Sweden)

    Hector-Gabriel Acosta-Mesa

    2012-03-01

    Full Text Available Un problema común al trabajar con robots móviles es que la fase de programación puede ser un proceso largo, costoso y difícil para los programadores. Los Algoritmos de Aprendizaje por Refuerzo ofrecen uno de los marcos de trabajo más generales en el ámbito de aprendizaje de máquina. Este trabajo presenta un enfoque usando el algoritmo de Q-Learning en un robot Lego para que aprenda "por sí mismo" a seguir una línea negra dibujada en una superficie blanca. El entorno de programación utilizado en este trabajo es Matlab.A common problem working with mobile robots is that programming phase could be a long, expensive and heavy process for programmers. The reinforcement learning algorithms offer one of the most general frameworks in learning subjects. This work presents an approach using the Q-Learning algorithm on a Lego robot in order for it to learn by itself how to follow a blackline drawn down on a white surface, using Matlab [5] as programming environment.

  19. What Can Reinforcement Learning Teach Us About Non-Equilibrium Quantum Dynamics

    Science.gov (United States)

    Bukov, Marin; Day, Alexandre; Sels, Dries; Weinberg, Phillip; Polkovnikov, Anatoli; Mehta, Pankaj

    Equilibrium thermodynamics and statistical physics are the building blocks of modern science and technology. Yet, our understanding of thermodynamic processes away from equilibrium is largely missing. In this talk, I will reveal the potential of what artificial intelligence can teach us about the complex behaviour of non-equilibrium systems. Specifically, I will discuss the problem of finding optimal drive protocols to prepare a desired target state in quantum mechanical systems by applying ideas from Reinforcement Learning [one can think of Reinforcement Learning as the study of how an agent (e.g. a robot) can learn and perfect a given policy through interactions with an environment.]. The driving protocols learnt by our agent suggest that the non-equilibrium world features possibilities easily defying intuition based on equilibrium physics.

  20. Finding intrinsic rewards by embodied evolution and constrained reinforcement learning.

    Science.gov (United States)

    Uchibe, Eiji; Doya, Kenji

    2008-12-01

    Understanding the design principle of reward functions is a substantial challenge both in artificial intelligence and neuroscience. Successful acquisition of a task usually requires not only rewards for goals, but also for intermediate states to promote effective exploration. This paper proposes a method for designing 'intrinsic' rewards of autonomous agents by combining constrained policy gradient reinforcement learning and embodied evolution. To validate the method, we use Cyber Rodent robots, in which collision avoidance, recharging from battery packs, and 'mating' by software reproduction are three major 'extrinsic' rewards. We show in hardware experiments that the robots can find appropriate 'intrinsic' rewards for the vision of battery packs and other robots to promote approach behaviors.

  1. Toward cognitive robotics

    Science.gov (United States)

    Laird, John E.

    2009-05-01

    Our long-term goal is to develop autonomous robotic systems that have the cognitive abilities of humans, including communication, coordination, adapting to novel situations, and learning through experience. Our approach rests on the recent integration of the Soar cognitive architecture with both virtual and physical robotic systems. Soar has been used to develop a wide variety of knowledge-rich agents for complex virtual environments, including distributed training environments and interactive computer games. For development and testing in robotic virtual environments, Soar interfaces to a variety of robotic simulators and a simple mobile robot. We have recently made significant extensions to Soar that add new memories and new non-symbolic reasoning to Soar's original symbolic processing, which should significantly improve Soar abilities for control of robots. These extensions include episodic memory, semantic memory, reinforcement learning, and mental imagery. Episodic memory and semantic memory support the learning and recalling of prior events and situations as well as facts about the world. Reinforcement learning provides the ability of the system to tune its procedural knowledge - knowledge about how to do things. Mental imagery supports the use of diagrammatic and visual representations that are critical to support spatial reasoning. We speculate on the future of unmanned systems and the need for cognitive robotics to support dynamic instruction and taskability.

  2. Learning to reach by reinforcement learning using a receptive field based function approximation approach with continuous actions.

    Science.gov (United States)

    Tamosiunaite, Minija; Asfour, Tamim; Wörgötter, Florentin

    2009-03-01

    Reinforcement learning methods can be used in robotics applications especially for specific target-oriented problems, for example the reward-based recalibration of goal directed actions. To this end still relatively large and continuous state-action spaces need to be efficiently handled. The goal of this paper is, thus, to develop a novel, rather simple method which uses reinforcement learning with function approximation in conjunction with different reward-strategies for solving such problems. For the testing of our method, we use a four degree-of-freedom reaching problem in 3D-space simulated by a two-joint robot arm system with two DOF each. Function approximation is based on 4D, overlapping kernels (receptive fields) and the state-action space contains about 10,000 of these. Different types of reward structures are being compared, for example, reward-on- touching-only against reward-on-approach. Furthermore, forbidden joint configurations are punished. A continuous action space is used. In spite of a rather large number of states and the continuous action space these reward/punishment strategies allow the system to find a good solution usually within about 20 trials. The efficiency of our method demonstrated in this test scenario suggests that it might be possible to use it on a real robot for problems where mixed rewards can be defined in situations where other types of learning might be difficult.

  3. Algorithms for Reinforcement Learning

    CERN Document Server

    Szepesvari, Csaba

    2010-01-01

    Reinforcement learning is a learning paradigm concerned with learning to control a system so as to maximize a numerical performance measure that expresses a long-term objective. What distinguishes reinforcement learning from supervised learning is that only partial feedback is given to the learner about the learner's predictions. Further, the predictions may have long term effects through influencing the future state of the controlled system. Thus, time plays a special role. The goal in reinforcement learning is to develop efficient learning algorithms, as well as to understand the algorithms'

  4. Towards autonomous neuroprosthetic control using Hebbian reinforcement learning.

    Science.gov (United States)

    Mahmoudi, Babak; Pohlmeyer, Eric A; Prins, Noeline W; Geng, Shijia; Sanchez, Justin C

    2013-12-01

    Our goal was to design an adaptive neuroprosthetic controller that could learn the mapping from neural states to prosthetic actions and automatically adjust adaptation using only a binary evaluative feedback as a measure of desirability/undesirability of performance. Hebbian reinforcement learning (HRL) in a connectionist network was used for the design of the adaptive controller. The method combines the efficiency of supervised learning with the generality of reinforcement learning. The convergence properties of this approach were studied using both closed-loop control simulations and open-loop simulations that used primate neural data from robot-assisted reaching tasks. The HRL controller was able to perform classification and regression tasks using its episodic and sequential learning modes, respectively. In our experiments, the HRL controller quickly achieved convergence to an effective control policy, followed by robust performance. The controller also automatically stopped adapting the parameters after converging to a satisfactory control policy. Additionally, when the input neural vector was reorganized, the controller resumed adaptation to maintain performance. By estimating an evaluative feedback directly from the user, the HRL control algorithm may provide an efficient method for autonomous adaptation of neuroprosthetic systems. This method may enable the user to teach the controller the desired behavior using only a simple feedback signal.

  5. Personalizing a Service Robot by Learning Human Habits from Behavioral Footprints

    Directory of Open Access Journals (Sweden)

    Kun Li

    2015-03-01

    Full Text Available For a domestic personal robot, personalized services are as important as predesigned tasks, because the robot needs to adjust the home state based on the operator's habits. An operator's habits are composed of cues, behaviors, and rewards. This article introduces behavioral footprints to describe the operator's behaviors in a house, and applies the inverse reinforcement learning technique to extract the operator's habits, represented by a reward function. We implemented the proposed approach with a mobile robot on indoor temperature adjustment, and compared this approach with a baseline method that recorded all the cues and behaviors of the operator. The result shows that the proposed approach allows the robot to reveal the operator's habits accurately and adjust the environment state accordingly.

  6. Learning for intelligent mobile robots

    Science.gov (United States)

    Hall, Ernest L.; Liao, Xiaoqun; Alhaj Ali, Souma M.

    2003-10-01

    Unlike intelligent industrial robots which often work in a structured factory setting, intelligent mobile robots must often operate in an unstructured environment cluttered with obstacles and with many possible action paths. However, such machines have many potential applications in medicine, defense, industry and even the home that make their study important. Sensors such as vision are needed. However, in many applications some form of learning is also required. The purpose of this paper is to present a discussion of recent technical advances in learning for intelligent mobile robots. During the past 20 years, the use of intelligent industrial robots that are equipped not only with motion control systems but also with sensors such as cameras, laser scanners, or tactile sensors that permit adaptation to a changing environment has increased dramatically. However, relatively little has been done concerning learning. Adaptive and robust control permits one to achieve point to point and controlled path operation in a changing environment. This problem can be solved with a learning control. In the unstructured environment, the terrain and consequently the load on the robot"s motors are constantly changing. Learning the parameters of a proportional, integral and derivative controller (PID) and artificial neural network provides an adaptive and robust control. Learning may also be used for path following. Simulations that include learning may be conducted to see if a robot can learn its way through a cluttered array of obstacles. If a situation is performed repetitively, then learning can also be used in the actual application. To reach an even higher degree of autonomous operation, a new level of learning is required. Recently learning theories such as the adaptive critic have been proposed. In this type of learning a critic provides a grade to the controller of an action module such as a robot. The creative control process is used that is "beyond the adaptive critic." A

  7. The Reinforcement Learning Competition 2014

    OpenAIRE

    Dimitrakakis, Christos; Li, Guangliang; Tziortziotis, Nikoalos

    2014-01-01

    Reinforcement learning is one of the most general problems in artificial intelligence. It has been used to model problems in automated experiment design, control, economics, game playing, scheduling and telecommunications. The aim of the reinforcement learning competition is to encourage the development of very general learning agents for arbitrary reinforcement learning problems and to provide a test-bed for the unbiased evaluation of algorithms.

  8. Online human training of a myoelectric prosthesis controller via actor-critic reinforcement learning.

    Science.gov (United States)

    Pilarski, Patrick M; Dawson, Michael R; Degris, Thomas; Fahimi, Farbod; Carey, Jason P; Sutton, Richard S

    2011-01-01

    As a contribution toward the goal of adaptable, intelligent artificial limbs, this work introduces a continuous actor-critic reinforcement learning method for optimizing the control of multi-function myoelectric devices. Using a simulated upper-arm robotic prosthesis, we demonstrate how it is possible to derive successful limb controllers from myoelectric data using only a sparse human-delivered training signal, without requiring detailed knowledge about the task domain. This reinforcement-based machine learning framework is well suited for use by both patients and clinical staff, and may be easily adapted to different application domains and the needs of individual amputees. To our knowledge, this is the first my-oelectric control approach that facilitates the online learning of new amputee-specific motions based only on a one-dimensional (scalar) feedback signal provided by the user of the prosthesis. © 2011 IEEE

  9. Why Robots Should Be Social: Enhancing Machine Learning through Social Human-Robot Interaction.

    Science.gov (United States)

    de Greeff, Joachim; Belpaeme, Tony

    2015-01-01

    Social learning is a powerful method for cultural propagation of knowledge and skills relying on a complex interplay of learning strategies, social ecology and the human propensity for both learning and tutoring. Social learning has the potential to be an equally potent learning strategy for artificial systems and robots in specific. However, given the complexity and unstructured nature of social learning, implementing social machine learning proves to be a challenging problem. We study one particular aspect of social machine learning: that of offering social cues during the learning interaction. Specifically, we study whether people are sensitive to social cues offered by a learning robot, in a similar way to children's social bids for tutoring. We use a child-like social robot and a task in which the robot has to learn the meaning of words. For this a simple turn-based interaction is used, based on language games. Two conditions are tested: one in which the robot uses social means to invite a human teacher to provide information based on what the robot requires to fill gaps in its knowledge (i.e. expression of a learning preference); the other in which the robot does not provide social cues to communicate a learning preference. We observe that conveying a learning preference through the use of social cues results in better and faster learning by the robot. People also seem to form a "mental model" of the robot, tailoring the tutoring to the robot's performance as opposed to using simply random teaching. In addition, the social learning shows a clear gender effect with female participants being responsive to the robot's bids, while male teachers appear to be less receptive. This work shows how additional social cues in social machine learning can result in people offering better quality learning input to artificial systems, resulting in improved learning performance.

  10. Why Robots Should Be Social: Enhancing Machine Learning through Social Human-Robot Interaction.

    Directory of Open Access Journals (Sweden)

    Joachim de Greeff

    Full Text Available Social learning is a powerful method for cultural propagation of knowledge and skills relying on a complex interplay of learning strategies, social ecology and the human propensity for both learning and tutoring. Social learning has the potential to be an equally potent learning strategy for artificial systems and robots in specific. However, given the complexity and unstructured nature of social learning, implementing social machine learning proves to be a challenging problem. We study one particular aspect of social machine learning: that of offering social cues during the learning interaction. Specifically, we study whether people are sensitive to social cues offered by a learning robot, in a similar way to children's social bids for tutoring. We use a child-like social robot and a task in which the robot has to learn the meaning of words. For this a simple turn-based interaction is used, based on language games. Two conditions are tested: one in which the robot uses social means to invite a human teacher to provide information based on what the robot requires to fill gaps in its knowledge (i.e. expression of a learning preference; the other in which the robot does not provide social cues to communicate a learning preference. We observe that conveying a learning preference through the use of social cues results in better and faster learning by the robot. People also seem to form a "mental model" of the robot, tailoring the tutoring to the robot's performance as opposed to using simply random teaching. In addition, the social learning shows a clear gender effect with female participants being responsive to the robot's bids, while male teachers appear to be less receptive. This work shows how additional social cues in social machine learning can result in people offering better quality learning input to artificial systems, resulting in improved learning performance.

  11. Robots Learn Writing

    Directory of Open Access Journals (Sweden)

    Huan Tan

    2012-01-01

    Full Text Available This paper proposes a general method for robots to learn motions and corresponding semantic knowledge simultaneously. A modified ISOMAP algorithm is used to convert the sampled 6D vectors of joint angles into 2D trajectories, and the required movements for writing numbers are learned from this modified ISOMAP-based model. Using this algorithm, the knowledge models are established. Learned motion and knowledge models are stored in a 2D latent space. Gaussian Process (GP method is used to model and represent these models. Practical experiments are carried out on a humanoid robot, named ISAC, to learn the semantic representations of numbers and the movements of writing numbers through imitation and to verify the effectiveness of this framework. This framework is applied into training a humanoid robot, named ISAC. At the learning stage, ISAC not only learns the dynamics of the movement required to write the numbers, but also learns the semantic meaning of the numbers which are related to the writing movements from the same data set. Given speech commands, ISAC recognizes the words and generated corresponding motion trajectories to write the numbers. This imitation learning method is implemented on a cognitive architecture to provide robust cognitive information processing.

  12. Why Robots Should Be Social: Enhancing Machine Learning through Social Human-Robot Interaction

    Science.gov (United States)

    de Greeff, Joachim; Belpaeme, Tony

    2015-01-01

    Social learning is a powerful method for cultural propagation of knowledge and skills relying on a complex interplay of learning strategies, social ecology and the human propensity for both learning and tutoring. Social learning has the potential to be an equally potent learning strategy for artificial systems and robots in specific. However, given the complexity and unstructured nature of social learning, implementing social machine learning proves to be a challenging problem. We study one particular aspect of social machine learning: that of offering social cues during the learning interaction. Specifically, we study whether people are sensitive to social cues offered by a learning robot, in a similar way to children’s social bids for tutoring. We use a child-like social robot and a task in which the robot has to learn the meaning of words. For this a simple turn-based interaction is used, based on language games. Two conditions are tested: one in which the robot uses social means to invite a human teacher to provide information based on what the robot requires to fill gaps in its knowledge (i.e. expression of a learning preference); the other in which the robot does not provide social cues to communicate a learning preference. We observe that conveying a learning preference through the use of social cues results in better and faster learning by the robot. People also seem to form a “mental model” of the robot, tailoring the tutoring to the robot’s performance as opposed to using simply random teaching. In addition, the social learning shows a clear gender effect with female participants being responsive to the robot’s bids, while male teachers appear to be less receptive. This work shows how additional social cues in social machine learning can result in people offering better quality learning input to artificial systems, resulting in improved learning performance. PMID:26422143

  13. Phasic dopamine as a prediction error of intrinsic and extrinsic reinforcements driving both action acquisition and reward maximization: a simulated robotic study.

    Science.gov (United States)

    Mirolli, Marco; Santucci, Vieri G; Baldassarre, Gianluca

    2013-03-01

    An important issue of recent neuroscientific research is to understand the functional role of the phasic release of dopamine in the striatum, and in particular its relation to reinforcement learning. The literature is split between two alternative hypotheses: one considers phasic dopamine as a reward prediction error similar to the computational TD-error, whose function is to guide an animal to maximize future rewards; the other holds that phasic dopamine is a sensory prediction error signal that lets the animal discover and acquire novel actions. In this paper we propose an original hypothesis that integrates these two contrasting positions: according to our view phasic dopamine represents a TD-like reinforcement prediction error learning signal determined by both unexpected changes in the environment (temporary, intrinsic reinforcements) and biological rewards (permanent, extrinsic reinforcements). Accordingly, dopamine plays the functional role of driving both the discovery and acquisition of novel actions and the maximization of future rewards. To validate our hypothesis we perform a series of experiments with a simulated robotic system that has to learn different skills in order to get rewards. We compare different versions of the system in which we vary the composition of the learning signal. The results show that only the system reinforced by both extrinsic and intrinsic reinforcements is able to reach high performance in sufficiently complex conditions. Copyright © 2013 Elsevier Ltd. All rights reserved.

  14. Learning and Chaining of Motor Primitives for Goal-directed Locomotion of a Snake-Like Robot with Screw-Drive Units

    DEFF Research Database (Denmark)

    Chatterjee, Sromona; Nachstedt, Timo; Tamosiunaite, Minija

    2015-01-01

    -directed locomotion for the robot. The behavioural primitives of the robot are generated using a reinforcement learning approach called "Policy Improvement with Path Integrals" (PI2). PI2 is numerically simple and has the ability to deal with high-dimensional systems. Here, PI2 is used to learn the robot’s motor...... controls by finding proper locomotion control parameters, like joint angles and screw-drive unit velocities, in a coordinated manner for different goals. Thus, it is able to generate a large repertoire of motor primitives, which are selectively stored to form a primitive library. The learning process...

  15. Value learning through reinforcement : The basics of dopamine and reinforcement learning

    NARCIS (Netherlands)

    Daw, N.D.; Tobler, P.N.; Glimcher, P.W.; Fehr, E.

    2013-01-01

    This chapter provides an overview of reinforcement learning and temporal difference learning and relates these topics to the firing properties of midbrain dopamine neurons. First, we review the RescorlaWagner learning rule and basic learning phenomena, such as blocking, which the rule explains. Then

  16. A Game Theoretic Approach to Swarm Robotics

    Directory of Open Access Journals (Sweden)

    S. N. Givigi

    2006-01-01

    Full Text Available In this article, we discuss some techniques for achieving swarm intelligent robots through the use of traits of personality. Traits of personality are characteristics of each robot that, altogether, define the robot's behaviours. We discuss the use of evolutionary psychology to select a set of traits of personality that will evolve due to a learning process based on reinforcement learning. The use of Game Theory is introduced, and some simulations showing its potential are reported.

  17. Deep Reinforcement Learning: An Overview

    OpenAIRE

    Li, Yuxi

    2017-01-01

    We give an overview of recent exciting achievements of deep reinforcement learning (RL). We discuss six core elements, six important mechanisms, and twelve applications. We start with background of machine learning, deep learning and reinforcement learning. Next we discuss core RL elements, including value function, in particular, Deep Q-Network (DQN), policy, reward, model, planning, and exploration. After that, we discuss important mechanisms for RL, including attention and memory, unsuperv...

  18. Machine Learning for Robotic Vision

    OpenAIRE

    Drummond, Tom

    2018-01-01

    Machine learning is a crucial enabling technology for robotics, in particular for unlocking the capabilities afforded by visual sensing. This talk will present research within Prof Drummond’s lab that explores how machine learning can be developed and used within the context of Robotic Vision.

  19. A learning-based semi-autonomous controller for robotic exploration of unknown disaster scenes while searching for victims.

    Science.gov (United States)

    Doroodgar, Barzin; Liu, Yugang; Nejat, Goldie

    2014-12-01

    Semi-autonomous control schemes can address the limitations of both teleoperation and fully autonomous robotic control of rescue robots in disaster environments by allowing a human operator to cooperate and share such tasks with a rescue robot as navigation, exploration, and victim identification. In this paper, we present a unique hierarchical reinforcement learning-based semi-autonomous control architecture for rescue robots operating in cluttered and unknown urban search and rescue (USAR) environments. The aim of the controller is to enable a rescue robot to continuously learn from its own experiences in an environment in order to improve its overall performance in exploration of unknown disaster scenes. A direction-based exploration technique is integrated in the controller to expand the search area of the robot via the classification of regions and the rubble piles within these regions. Both simulations and physical experiments in USAR-like environments verify the robustness of the proposed HRL-based semi-autonomous controller to unknown cluttered scenes with different sizes and varying types of configurations.

  20. Learning ROS for robotics programming

    CERN Document Server

    Martinez, Aaron

    2013-01-01

    The book will take an easy-to-follow and engaging tutorial approach, providing a practical and comprehensive way to learn ROS.If you are a robotic enthusiast who wants to learn how to build and program your own robots in an easy-to-develop, maintainable and shareable way, ""Learning ROS for Robotics Programming"" is for you. In order to make the most of the book, you should have some C++ programming background, knowledge of GNU/Linux systems, and computer science in general. No previous background on ROS is required, since this book provides all the skills required. It is also advisable to hav

  1. Brain-Machine Interface control of a robot arm using actor-critic rainforcement learning.

    Science.gov (United States)

    Pohlmeyer, Eric A; Mahmoudi, Babak; Geng, Shijia; Prins, Noeline; Sanchez, Justin C

    2012-01-01

    Here we demonstrate how a marmoset monkey can use a reinforcement learning (RL) Brain-Machine Interface (BMI) to effectively control the movements of a robot arm for a reaching task. In this work, an actor-critic RL algorithm used neural ensemble activity in the monkey's motor cortext to control the robot movements during a two-target decision task. This novel approach to decoding offers unique advantages for BMI control applications. Compared to supervised learning decoding methods, the actor-critic RL algorithm does not require an explicit set of training data to create a static control model, but rather it incrementally adapts the model parameters according to its current performance, in this case requiring only a very basic feedback signal. We show how this algorithm achieved high performance when mapping the monkey's neural states (94%) to robot actions, and only needed to experience a few trials before obtaining accurate real-time control of the robot arm. Since RL methods responsively adapt and adjust their parameters, they can provide a method to create BMIs that are robust against perturbations caused by changes in either the neural input space or the output actions they generate under different task requirements or goals.

  2. Reinforcement learning in supply chains.

    Science.gov (United States)

    Valluri, Annapurna; North, Michael J; Macal, Charles M

    2009-10-01

    Effective management of supply chains creates value and can strategically position companies. In practice, human beings have been found to be both surprisingly successful and disappointingly inept at managing supply chains. The related fields of cognitive psychology and artificial intelligence have postulated a variety of potential mechanisms to explain this behavior. One of the leading candidates is reinforcement learning. This paper applies agent-based modeling to investigate the comparative behavioral consequences of three simple reinforcement learning algorithms in a multi-stage supply chain. For the first time, our findings show that the specific algorithm that is employed can have dramatic effects on the results obtained. Reinforcement learning is found to be valuable in multi-stage supply chains with several learning agents, as independent agents can learn to coordinate their behavior. However, learning in multi-stage supply chains using these postulated approaches from cognitive psychology and artificial intelligence take extremely long time periods to achieve stability which raises questions about their ability to explain behavior in real supply chains. The fact that it takes thousands of periods for agents to learn in this simple multi-agent setting provides new evidence that real world decision makers are unlikely to be using strict reinforcement learning in practice.

  3. Rational and Mechanistic Perspectives on Reinforcement Learning

    Science.gov (United States)

    Chater, Nick

    2009-01-01

    This special issue describes important recent developments in applying reinforcement learning models to capture neural and cognitive function. But reinforcement learning, as a theoretical framework, can apply at two very different levels of description: "mechanistic" and "rational." Reinforcement learning is often viewed in mechanistic terms--as…

  4. Morphology Independent Learning in Modular Robots

    DEFF Research Database (Denmark)

    Christensen, David Johan; Bordignon, Mirko; Schultz, Ulrik Pagh

    2009-01-01

    speed its modules independently and in parallel adjust their behavior based on a single global reward signal. In simulation, we study the learning strategy’s performance on different robot configurations. On the physical platform, we perform learning experiments with ATRON robots learning to move as fast...

  5. Morphology Independent Learning in Modular Robots

    DEFF Research Database (Denmark)

    Christensen, David Johan; Bordignon, Mirko; Schultz, Ulrik Pagh

    2009-01-01

    speed its modules independently and in parallel adjust their behavior based on a single global reward signal. In simulation, we study the learning strategy?s performance on different robot con?gurations. On the physical platform, we perform learning experiments with ATRON robots learning to move as fast...

  6. Robot Learning a New Subfield? The Robolearn-96 Workshop

    OpenAIRE

    Hexmoor, Henry; Meeden, Lisa; Murphy, Robin R.

    1997-01-01

    This article posits the idea of robot learning as a new subfield. The results of the Robolearn-96 Workshop provide evidence that learning in modern robotics is distinct from traditional machine learning. The article examines the role of robotics in the social and natural sciences and the potential impact of learning on robotics, generating both a continuum of research issues and a description of the divergent terminology, target domains, and standards of proof associated with robot learning. ...

  7. Robot learning and error correction

    Science.gov (United States)

    Friedman, L.

    1977-01-01

    A model of robot learning is described that associates previously unknown perceptions with the sensed known consequences of robot actions. For these actions, both the categories of outcomes and the corresponding sensory patterns are incorporated in a knowledge base by the system designer. Thus the robot is able to predict the outcome of an action and compare the expectation with the experience. New knowledge about what to expect in the world may then be incorporated by the robot in a pre-existing structure whether it detects accordance or discrepancy between a predicted consequence and experience. Errors committed during plan execution are detected by the same type of comparison process and learning may be applied to avoiding the errors.

  8. Serendipitous Offline Learning in a Neuromorphic Robot

    Directory of Open Access Journals (Sweden)

    Terrence C Stewart

    2016-02-01

    Full Text Available We demonstrate a hybrid neuromorphic learning paradigm that learns complex sensorimotor mappings based on a small set of hard-coded reflex behaviours. A mobile robot is first controlled by a basic set of reflexive hand-designed behaviours. All sensor data is provided via a spike-based silicon retina camera (eDVS, and all control is implemented via spiking neurons simulated on neuromorphic hardware (SpiNNaker. Given this control system, the robot is capable of simple obstacle avoidance and random exploration. To train the robot to perform more complex tasks, we observe the robot and find instances where he robot accidentally performs the desired action. Data recorded from the robot during these times is then used to update the neural control system, increasing the likelihood of the robot performing that task in the future, given a similar sensor state. As an example application of this general-purpose method of training, we demonstrate the robot learning to respond to novel sensory stimuli (a mirror by turning right if it is present at an intersection, and otherwise turning left. In general, this system can learn arbitrary relations between sensory input and motor behaviour.

  9. Switching Reinforcement Learning for Continuous Action Space

    Science.gov (United States)

    Nagayoshi, Masato; Murao, Hajime; Tamaki, Hisashi

    Reinforcement Learning (RL) attracts much attention as a technique of realizing computational intelligence such as adaptive and autonomous decentralized systems. In general, however, it is not easy to put RL into practical use. This difficulty includes a problem of designing a suitable action space of an agent, i.e., satisfying two requirements in trade-off: (i) to keep the characteristics (or structure) of an original search space as much as possible in order to seek strategies that lie close to the optimal, and (ii) to reduce the search space as much as possible in order to expedite the learning process. In order to design a suitable action space adaptively, we propose switching RL model to mimic a process of an infant's motor development in which gross motor skills develop before fine motor skills. Then, a method for switching controllers is constructed by introducing and referring to the “entropy”. Further, through computational experiments by using robot navigation problems with one and two-dimensional continuous action space, the validity of the proposed method has been confirmed.

  10. Learning to trade via direct reinforcement.

    Science.gov (United States)

    Moody, J; Saffell, M

    2001-01-01

    We present methods for optimizing portfolios, asset allocations, and trading systems based on direct reinforcement (DR). In this approach, investment decision-making is viewed as a stochastic control problem, and strategies are discovered directly. We present an adaptive algorithm called recurrent reinforcement learning (RRL) for discovering investment policies. The need to build forecasting models is eliminated, and better trading performance is obtained. The direct reinforcement approach differs from dynamic programming and reinforcement algorithms such as TD-learning and Q-learning, which attempt to estimate a value function for the control problem. We find that the RRL direct reinforcement framework enables a simpler problem representation, avoids Bellman's curse of dimensionality and offers compelling advantages in efficiency. We demonstrate how direct reinforcement can be used to optimize risk-adjusted investment returns (including the differential Sharpe ratio), while accounting for the effects of transaction costs. In extensive simulation work using real financial data, we find that our approach based on RRL produces better trading strategies than systems utilizing Q-learning (a value function method). Real-world applications include an intra-daily currency trader and a monthly asset allocation system for the S&P 500 Stock Index and T-Bills.

  11. Construction of multi-agent mobile robots control system in the problem of persecution with using a modified reinforcement learning method based on neural networks

    Science.gov (United States)

    Patkin, M. L.; Rogachev, G. N.

    2018-02-01

    A method for constructing a multi-agent control system for mobile robots based on training with reinforcement using deep neural networks is considered. Synthesis of the management system is proposed to be carried out with reinforcement training and the modified Actor-Critic method, in which the Actor module is divided into Action Actor and Communication Actor in order to simultaneously manage mobile robots and communicate with partners. Communication is carried out by sending partners at each step a vector of real numbers that are added to the observation vector and affect the behaviour. Functions of Actors and Critic are approximated by deep neural networks. The Critics value function is trained by using the TD-error method and the Actor’s function by using DDPG. The Communication Actor’s neural network is trained through gradients received from partner agents. An environment in which a cooperative multi-agent interaction is present was developed, computer simulation of the application of this method in the control problem of two robots pursuing two goals was carried out.

  12. Interactive language learning by robots: the transition from babbling to word forms.

    Science.gov (United States)

    Lyon, Caroline; Nehaniv, Chrystopher L; Saunders, Joe

    2012-01-01

    The advent of humanoid robots has enabled a new approach to investigating the acquisition of language, and we report on the development of robots able to acquire rudimentary linguistic skills. Our work focuses on early stages analogous to some characteristics of a human child of about 6 to 14 months, the transition from babbling to first word forms. We investigate one mechanism among many that may contribute to this process, a key factor being the sensitivity of learners to the statistical distribution of linguistic elements. As well as being necessary for learning word meanings, the acquisition of anchor word forms facilitates the segmentation of an acoustic stream through other mechanisms. In our experiments some salient one-syllable word forms are learnt by a humanoid robot in real-time interactions with naive participants. Words emerge from random syllabic babble through a learning process based on a dialogue between the robot and the human participant, whose speech is perceived by the robot as a stream of phonemes. Numerous ways of representing the speech as syllabic segments are possible. Furthermore, the pronunciation of many words in spontaneous speech is variable. However, in line with research elsewhere, we observe that salient content words are more likely than function words to have consistent canonical representations; thus their relative frequency increases, as does their influence on the learner. Variable pronunciation may contribute to early word form acquisition. The importance of contingent interaction in real-time between teacher and learner is reflected by a reinforcement process, with variable success. The examination of individual cases may be more informative than group results. Nevertheless, word forms are usually produced by the robot after a few minutes of dialogue, employing a simple, real-time, frequency dependent mechanism. This work shows the potential of human-robot interaction systems in studies of the dynamics of early language

  13. Medial prefrontal cortex and the adaptive regulation of reinforcement learning parameters.

    Science.gov (United States)

    Khamassi, Mehdi; Enel, Pierre; Dominey, Peter Ford; Procyk, Emmanuel

    2013-01-01

    Converging evidence suggest that the medial prefrontal cortex (MPFC) is involved in feedback categorization, performance monitoring, and task monitoring, and may contribute to the online regulation of reinforcement learning (RL) parameters that would affect decision-making processes in the lateral prefrontal cortex (LPFC). Previous neurophysiological experiments have shown MPFC activities encoding error likelihood, uncertainty, reward volatility, as well as neural responses categorizing different types of feedback, for instance, distinguishing between choice errors and execution errors. Rushworth and colleagues have proposed that the involvement of MPFC in tracking the volatility of the task could contribute to the regulation of one of RL parameters called the learning rate. We extend this hypothesis by proposing that MPFC could contribute to the regulation of other RL parameters such as the exploration rate and default action values in case of task shifts. Here, we analyze the sensitivity to RL parameters of behavioral performance in two monkey decision-making tasks, one with a deterministic reward schedule and the other with a stochastic one. We show that there exist optimal parameter values specific to each of these tasks, that need to be found for optimal performance and that are usually hand-tuned in computational models. In contrast, automatic online regulation of these parameters using some heuristics can help producing a good, although non-optimal, behavioral performance in each task. We finally describe our computational model of MPFC-LPFC interaction used for online regulation of the exploration rate and its application to a human-robot interaction scenario. There, unexpected uncertainties are produced by the human introducing cued task changes or by cheating. The model enables the robot to autonomously learn to reset exploration in response to such uncertain cues and events. The combined results provide concrete evidence specifying how prefrontal

  14. A Bayesian Developmental Approach to Robotic Goal-Based Imitation Learning.

    Directory of Open Access Journals (Sweden)

    Michael Jae-Yoon Chung

    Full Text Available A fundamental challenge in robotics today is building robots that can learn new skills by observing humans and imitating human actions. We propose a new Bayesian approach to robotic learning by imitation inspired by the developmental hypothesis that children use self-experience to bootstrap the process of intention recognition and goal-based imitation. Our approach allows an autonomous agent to: (i learn probabilistic models of actions through self-discovery and experience, (ii utilize these learned models for inferring the goals of human actions, and (iii perform goal-based imitation for robotic learning and human-robot collaboration. Such an approach allows a robot to leverage its increasing repertoire of learned behaviors to interpret increasingly complex human actions and use the inferred goals for imitation, even when the robot has very different actuators from humans. We demonstrate our approach using two different scenarios: (i a simulated robot that learns human-like gaze following behavior, and (ii a robot that learns to imitate human actions in a tabletop organization task. In both cases, the agent learns a probabilistic model of its own actions, and uses this model for goal inference and goal-based imitation. We also show that the robotic agent can use its probabilistic model to seek human assistance when it recognizes that its inferred actions are too uncertain, risky, or impossible to perform, thereby opening the door to human-robot collaboration.

  15. A Semi-Open Learning Environment for Mobile Robotics

    Directory of Open Access Journals (Sweden)

    Enrique Sucar

    2007-05-01

    Full Text Available We have developed a semi-open learning environment for mobile robotics, to learn through free exploration, but with specific performance criteria that guides the learning process. The environment includes virtual and remote robotics laboratories, and an intelligent virtual assistant the guides the students using the labs. A series of experiments in the virtual and remote labs are designed to gradually learn the basics of mobile robotics. Each experiment considers exploration and performance aspects, which are evaluated by the virtual assistant, giving feedback to the user. The virtual laboratory has been incorporated to a course in mobile robotics and used by a group of students. A preliminary evaluation shows that the intelligent tutor combined with the virtual laboratory can improve the learning process.

  16. ROBOT LEARNING OF OBJECT MANIPULATION TASK ACTIONS FROM HUMAN DEMONSTRATIONS

    Directory of Open Access Journals (Sweden)

    Maria Kyrarini

    2017-08-01

    Full Text Available Robot learning from demonstration is a method which enables robots to learn in a similar way as humans. In this paper, a framework that enables robots to learn from multiple human demonstrations via kinesthetic teaching is presented. The subject of learning is a high-level sequence of actions, as well as the low-level trajectories necessary to be followed by the robot to perform the object manipulation task. The multiple human demonstrations are recorded and only the most similar demonstrations are selected for robot learning. The high-level learning module identifies the sequence of actions of the demonstrated task. Using Dynamic Time Warping (DTW and Gaussian Mixture Model (GMM, the model of demonstrated trajectories is learned. The learned trajectory is generated by Gaussian mixture regression (GMR from the learned Gaussian mixture model.  In online working phase, the sequence of actions is identified and experimental results show that the robot performs the learned task successfully.

  17. Human-robot skills transfer interfaces for a flexible surgical robot.

    Science.gov (United States)

    Calinon, Sylvain; Bruno, Danilo; Malekzadeh, Milad S; Nanayakkara, Thrishantha; Caldwell, Darwin G

    2014-09-01

    In minimally invasive surgery, tools go through narrow openings and manipulate soft organs to perform surgical tasks. There are limitations in current robot-assisted surgical systems due to the rigidity of robot tools. The aim of the STIFF-FLOP European project is to develop a soft robotic arm to perform surgical tasks. The flexibility of the robot allows the surgeon to move within organs to reach remote areas inside the body and perform challenging procedures in laparoscopy. This article addresses the problem of designing learning interfaces enabling the transfer of skills from human demonstration. Robot programming by demonstration encompasses a wide range of learning strategies, from simple mimicking of the demonstrator's actions to the higher level imitation of the underlying intent extracted from the demonstrations. By focusing on this last form, we study the problem of extracting an objective function explaining the demonstrations from an over-specified set of candidate reward functions, and using this information for self-refinement of the skill. In contrast to inverse reinforcement learning strategies that attempt to explain the observations with reward functions defined for the entire task (or a set of pre-defined reward profiles active for different parts of the task), the proposed approach is based on context-dependent reward-weighted learning, where the robot can learn the relevance of candidate objective functions with respect to the current phase of the task or encountered situation. The robot then exploits this information for skills refinement in the policy parameters space. The proposed approach is tested in simulation with a cutting task performed by the STIFF-FLOP flexible robot, using kinesthetic demonstrations from a Barrett WAM manipulator. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  18. Robot learning from human teachers

    CERN Document Server

    Chernova, Sonia

    2014-01-01

    Learning from Demonstration (LfD) explores techniques for learning a task policy from examples provided by a human teacher. The field of LfD has grown into an extensive body of literature over the past 30 years, with a wide variety of approaches for encoding human demonstrations and modeling skills and tasks. Additionally, we have recently seen a focus on gathering data from non-expert human teachers (i.e., domain experts but not robotics experts). In this book, we provide an introduction to the field with a focus on the unique technical challenges associated with designing robots that learn f

  19. Robots show us how to teach them: feedback from robots shapes tutoring behavior during action learning.

    Science.gov (United States)

    Vollmer, Anna-Lisa; Mühlig, Manuel; Steil, Jochen J; Pitsch, Karola; Fritsch, Jannik; Rohlfing, Katharina J; Wrede, Britta

    2014-01-01

    Robot learning by imitation requires the detection of a tutor's action demonstration and its relevant parts. Current approaches implicitly assume a unidirectional transfer of knowledge from tutor to learner. The presented work challenges this predominant assumption based on an extensive user study with an autonomously interacting robot. We show that by providing feedback, a robot learner influences the human tutor's movement demonstrations in the process of action learning. We argue that the robot's feedback strongly shapes how tutors signal what is relevant to an action and thus advocate a paradigm shift in robot action learning research toward truly interactive systems learning in and benefiting from interaction.

  20. Robot soccer action selection based on Q learning

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    This paper researches robot soccer action selection based on Q learning . The robot learn to activate particular behavior given their current situation and reward signal. We adopt neural network to implementations of Q learning for their generalization properties and limited computer memory requirements

  1. Can young children learn words from a robot?

    OpenAIRE

    Moriguchi, Yusuke; Kanda, Takayuki; Ishiguro, Hiroshi; Shimada, Yoko; Itakura, Shoji

    2011-01-01

    Young children generally learn words from other people. Recent research has shown that children can learn new actions and skills from nonhuman agents. This study examines whether young children could learn words from a robot. Preschool children were shown a video in which either a woman (human condition) or a mechanical robot (robot condition) labeled novel objects. Then the children were asked to select the objects according to the names used in the video. The results revealed that children ...

  2. Human-level control through deep reinforcement learning

    Science.gov (United States)

    Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A.; Veness, Joel; Bellemare, Marc G.; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K.; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis

    2015-02-01

    The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.

  3. Human-level control through deep reinforcement learning.

    Science.gov (United States)

    Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A; Veness, Joel; Bellemare, Marc G; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis

    2015-02-26

    The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.

  4. Stochastic abstract policies: generalizing knowledge to improve reinforcement learning.

    Science.gov (United States)

    Koga, Marcelo L; Freire, Valdinei; Costa, Anna H R

    2015-01-01

    Reinforcement learning (RL) enables an agent to learn behavior by acquiring experience through trial-and-error interactions with a dynamic environment. However, knowledge is usually built from scratch and learning to behave may take a long time. Here, we improve the learning performance by leveraging prior knowledge; that is, the learner shows proper behavior from the beginning of a target task, using the knowledge from a set of known, previously solved, source tasks. In this paper, we argue that building stochastic abstract policies that generalize over past experiences is an effective way to provide such improvement and this generalization outperforms the current practice of using a library of policies. We achieve that contributing with a new algorithm, AbsProb-PI-multiple and a framework for transferring knowledge represented as a stochastic abstract policy in new RL tasks. Stochastic abstract policies offer an effective way to encode knowledge because the abstraction they provide not only generalizes solutions but also facilitates extracting the similarities among tasks. We perform experiments in a robotic navigation environment and analyze the agent's behavior throughout the learning process and also assess the transfer ratio for different amounts of source tasks. We compare our method with the transfer of a library of policies, and experiments show that the use of a generalized policy produces better results by more effectively guiding the agent when learning a target task.

  5. Curiosity driven reinforcement learning for motion planning on humanoids

    Science.gov (United States)

    Frank, Mikhail; Leitner, Jürgen; Stollenga, Marijn; Förster, Alexander; Schmidhuber, Jürgen

    2014-01-01

    Most previous work on artificial curiosity (AC) and intrinsic motivation focuses on basic concepts and theory. Experimental results are generally limited to toy scenarios, such as navigation in a simulated maze, or control of a simple mechanical system with one or two degrees of freedom. To study AC in a more realistic setting, we embody a curious agent in the complex iCub humanoid robot. Our novel reinforcement learning (RL) framework consists of a state-of-the-art, low-level, reactive control layer, which controls the iCub while respecting constraints, and a high-level curious agent, which explores the iCub's state-action space through information gain maximization, learning a world model from experience, controlling the actual iCub hardware in real-time. To the best of our knowledge, this is the first ever embodied, curious agent for real-time motion planning on a humanoid. We demonstrate that it can learn compact Markov models to represent large regions of the iCub's configuration space, and that the iCub explores intelligently, showing interest in its physical constraints as well as in objects it finds in its environment. PMID:24432001

  6. Learning Semantics of Gestural Instructions for Human-Robot Collaboration

    Science.gov (United States)

    Shukla, Dadhichi; Erkent, Özgür; Piater, Justus

    2018-01-01

    Designed to work safely alongside humans, collaborative robots need to be capable partners in human-robot teams. Besides having key capabilities like detecting gestures, recognizing objects, grasping them, and handing them over, these robots need to seamlessly adapt their behavior for efficient human-robot collaboration. In this context we present the fast, supervised Proactive Incremental Learning (PIL) framework for learning associations between human hand gestures and the intended robotic manipulation actions. With the proactive aspect, the robot is competent to predict the human's intent and perform an action without waiting for an instruction. The incremental aspect enables the robot to learn associations on the fly while performing a task. It is a probabilistic, statistically-driven approach. As a proof of concept, we focus on a table assembly task where the robot assists its human partner. We investigate how the accuracy of gesture detection affects the number of interactions required to complete the task. We also conducted a human-robot interaction study with non-roboticist users comparing a proactive with a reactive robot that waits for instructions. PMID:29615888

  7. Learning Semantics of Gestural Instructions for Human-Robot Collaboration.

    Science.gov (United States)

    Shukla, Dadhichi; Erkent, Özgür; Piater, Justus

    2018-01-01

    Designed to work safely alongside humans, collaborative robots need to be capable partners in human-robot teams. Besides having key capabilities like detecting gestures, recognizing objects, grasping them, and handing them over, these robots need to seamlessly adapt their behavior for efficient human-robot collaboration. In this context we present the fast, supervised Proactive Incremental Learning (PIL) framework for learning associations between human hand gestures and the intended robotic manipulation actions. With the proactive aspect, the robot is competent to predict the human's intent and perform an action without waiting for an instruction. The incremental aspect enables the robot to learn associations on the fly while performing a task. It is a probabilistic, statistically-driven approach. As a proof of concept, we focus on a table assembly task where the robot assists its human partner. We investigate how the accuracy of gesture detection affects the number of interactions required to complete the task. We also conducted a human-robot interaction study with non-roboticist users comparing a proactive with a reactive robot that waits for instructions.

  8. Robotic inspection of fiber reinforced composites using phased array UT

    Science.gov (United States)

    Stetson, Jeffrey T.; De Odorico, Walter

    2014-02-01

    Ultrasound is the current NDE method of choice to inspect large fiber reinforced airframe structures. Over the last 15 years Cartesian based scanning machines using conventional ultrasound techniques have been employed by all airframe OEMs and their top tier suppliers to perform these inspections. Technical advances in both computing power and commercially available, multi-axis robots now facilitate a new generation of scanning machines. These machines use multiple end effector tools taking full advantage of phased array ultrasound technologies yielding substantial improvements in inspection quality and productivity. This paper outlines the general architecture for these new robotic scanning systems as well as details the variety of ultrasonic techniques available for use with them including advances such as wide area phased array scanning and sound field adaptation for non-flat, non-parallel surfaces.

  9. A reinforcement learning model of joy, distress, hope and fear

    Science.gov (United States)

    Broekens, Joost; Jacobs, Elmer; Jonker, Catholijn M.

    2015-07-01

    In this paper we computationally study the relation between adaptive behaviour and emotion. Using the reinforcement learning framework, we propose that learned state utility, ?, models fear (negative) and hope (positive) based on the fact that both signals are about anticipation of loss or gain. Further, we propose that joy/distress is a signal similar to the error signal. We present agent-based simulation experiments that show that this model replicates psychological and behavioural dynamics of emotion. This work distinguishes itself by assessing the dynamics of emotion in an adaptive agent framework - coupling it to the literature on habituation, development, extinction and hope theory. Our results support the idea that the function of emotion is to provide a complex feedback signal for an organism to adapt its behaviour. Our work is relevant for understanding the relation between emotion and adaptation in animals, as well as for human-robot interaction, in particular how emotional signals can be used to communicate between adaptive agents and humans.

  10. Adaptive learning fuzzy control of a mobile robot

    International Nuclear Information System (INIS)

    Tsukada, Akira; Suzuki, Katsuo; Fujii, Yoshio; Shinohara, Yoshikuni

    1989-11-01

    In this report a problem is studied to construct a fuzzy controller for a mobile robot to move autonomously along a given reference direction curve, for which control rules are generated and acquired through an adaptive learning process. An adaptive learning fuzzy controller has been developed for a mobile robot. Good properties of the controller are shown through the travelling experiments of the mobile robot. (author)

  11. Neural Behavior Chain Learning of Mobile Robot Actions

    Directory of Open Access Journals (Sweden)

    Lejla Banjanovic-Mehmedovic

    2012-01-01

    Full Text Available This paper presents a visual/motor behavior learning approach, based on neural networks. We propose Behavior Chain Model (BCM in order to create a way of behavior learning. Our behavior-based system evolution task is a mobile robot detecting a target and driving/acting towards it. First, the mapping relations between the image feature domain of the object and the robot action domain are derived. Second, a multilayer neural network for offline learning of the mapping relations is used. This learning structure through neural network training process represents a connection between the visual perceptions and motor sequence of actions in order to grip a target. Last, using behavior learning through a noticed action chain, we can predict mobile robot behavior for a variety of similar tasks in similar environment. Prediction results suggest that the methodology is adequate and could be recognized as an idea for designing different mobile robot behaviour assistance.

  12. SCAFFOLDINGAND REINFORCEMENT: USING DIGITAL LOGBOOKS IN LEARNING VOCABULARY

    OpenAIRE

    Khalifa, Salma Hasan Almabrouk; Shabdin, Ahmad Affendi

    2016-01-01

    Reinforcement and scaffolding are tested approaches to enhance learning achievements. Keeping a record of the learning process as well as the new learned words functions as scaffolding to help learners build a comprehensive vocabulary. Similarly, repetitive learning of new words reinforces permanent learning for long-term memory. Paper-based logbooks may prove to be good records of the learning process, but if learners use digital logbooks, the results may be even better. Digital logbooks wit...

  13. Approaches to probabilistic model learning for mobile manipulation robots

    CERN Document Server

    Sturm, Jürgen

    2013-01-01

    Mobile manipulation robots are envisioned to provide many useful services both in domestic environments as well as in the industrial context. Examples include domestic service robots that implement large parts of the housework, and versatile industrial assistants that provide automation, transportation, inspection, and monitoring services. The challenge in these applications is that the robots have to function under changing, real-world conditions, be able to deal with considerable amounts of noise and uncertainty, and operate without the supervision of an expert. This book presents novel learning techniques that enable mobile manipulation robots, i.e., mobile platforms with one or more robotic manipulators, to autonomously adapt to new or changing situations. The approaches presented in this book cover the following topics: (1) learning the robot's kinematic structure and properties using actuation and visual feedback, (2) learning about articulated objects in the environment in which the robot is operating,...

  14. Reinforcement learning in complementarity game and population dynamics.

    Science.gov (United States)

    Jost, Jürgen; Li, Wei

    2014-02-01

    We systematically test and compare different reinforcement learning schemes in a complementarity game [J. Jost and W. Li, Physica A 345, 245 (2005)] played between members of two populations. More precisely, we study the Roth-Erev, Bush-Mosteller, and SoftMax reinforcement learning schemes. A modified version of Roth-Erev with a power exponent of 1.5, as opposed to 1 in the standard version, performs best. We also compare these reinforcement learning strategies with evolutionary schemes. This gives insight into aspects like the issue of quick adaptation as opposed to systematic exploration or the role of learning rates.

  15. Locomotion training of legged robots using hybrid machine learning techniques

    Science.gov (United States)

    Simon, William E.; Doerschuk, Peggy I.; Zhang, Wen-Ran; Li, Andrew L.

    1995-01-01

    In this study artificial neural networks and fuzzy logic are used to control the jumping behavior of a three-link uniped robot. The biped locomotion control problem is an increment of the uniped locomotion control. Study of legged locomotion dynamics indicates that a hierarchical controller is required to control the behavior of a legged robot. A structured control strategy is suggested which includes navigator, motion planner, biped coordinator and uniped controllers. A three-link uniped robot simulation is developed to be used as the plant. Neurocontrollers were trained both online and offline. In the case of on-line training, a reinforcement learning technique was used to train the neurocontroller to make the robot jump to a specified height. After several hundred iterations of training, the plant output achieved an accuracy of 7.4%. However, when jump distance and body angular momentum were also included in the control objectives, training time became impractically long. In the case of off-line training, a three-layered backpropagation (BP) network was first used with three inputs, three outputs and 15 to 40 hidden nodes. Pre-generated data were presented to the network with a learning rate as low as 0.003 in order to reach convergence. The low learning rate required for convergence resulted in a very slow training process which took weeks to learn 460 examples. After training, performance of the neurocontroller was rather poor. Consequently, the BP network was replaced by a Cerebeller Model Articulation Controller (CMAC) network. Subsequent experiments described in this document show that the CMAC network is more suitable to the solution of uniped locomotion control problems in terms of both learning efficiency and performance. A new approach is introduced in this report, viz., a self-organizing multiagent cerebeller model for fuzzy-neural control of uniped locomotion is suggested to improve training efficiency. This is currently being evaluated for a possible

  16. Belief reward shaping in reinforcement learning

    CSIR Research Space (South Africa)

    Marom, O

    2018-02-01

    Full Text Available A key challenge in many reinforcement learning problems is delayed rewards, which can significantly slow down learning. Although reward shaping has previously been introduced to accelerate learning by bootstrapping an agent with additional...

  17. Continuing Robot Skill Learning after Demonstration with Human Feedback

    Directory of Open Access Journals (Sweden)

    Argall Brenna D.

    2011-12-01

    Full Text Available Though demonstration-based approaches have been successfully applied to learning a variety of robot behaviors, there do exist some limitations. The ability to continue learning after demonstration, based on execution experience with the learned policy, therefore has proven to be an asset to many demonstration-based learning systems. This paper discusses important considerations for interfaces that provide feedback to adapt and improve demonstrated behaviors. Feedback interfaces developed for two robots with very different motion capabilities - a wheeled mobile robot and high degree-of-freedom humanoid - are highlighted.

  18. Human likeness: cognitive and affective factors affecting adoption of robot-assisted learning systems

    Science.gov (United States)

    Yoo, Hosun; Kwon, Ohbyung; Lee, Namyeon

    2016-07-01

    With advances in robot technology, interest in robotic e-learning systems has increased. In some laboratories, experiments are being conducted with humanoid robots as artificial tutors because of their likeness to humans, the rich possibilities of using this type of media, and the multimodal interaction capabilities of these robots. The robot-assisted learning system, a special type of e-learning system, aims to increase the learner's concentration, pleasure, and learning performance dramatically. However, very few empirical studies have examined the effect on learning performance of incorporating humanoid robot technology into e-learning systems or people's willingness to accept or adopt robot-assisted learning systems. In particular, human likeness, the essential characteristic of humanoid robots as compared with conventional e-learning systems, has not been discussed in a theoretical context. Hence, the purpose of this study is to propose a theoretical model to explain the process of adoption of robot-assisted learning systems. In the proposed model, human likeness is conceptualized as a combination of media richness, multimodal interaction capabilities, and para-social relationships; these factors are considered as possible determinants of the degree to which human cognition and affection are related to the adoption of robot-assisted learning systems.

  19. Adaptive representations for reinforcement learning

    NARCIS (Netherlands)

    Whiteson, S.

    2010-01-01

    This book presents new algorithms for reinforcement learning, a form of machine learning in which an autonomous agent seeks a control policy for a sequential decision task. Since current methods typically rely on manually designed solution representations, agents that automatically adapt their own

  20. A Multi-Agent Control Architecture for a Robotic Wheelchair

    Directory of Open Access Journals (Sweden)

    C. Galindo

    2006-01-01

    Full Text Available Assistant robots like robotic wheelchairs can perform an effective and valuable work in our daily lives. However, they eventually may need external help from humans in the robot environment (particularly, the driver in the case of a wheelchair to accomplish safely and efficiently some tricky tasks for the current technology, i.e. opening a locked door, traversing a crowded area, etc. This article proposes a control architecture for assistant robots designed under a multi-agent perspective that facilitates the participation of humans into the robotic system and improves the overall performance of the robot as well as its dependability. Within our design, agents have their own intentions and beliefs, have different abilities (that include algorithmic behaviours and human skills and also learn autonomously the most convenient method to carry out their actions through reinforcement learning. The proposed architecture is illustrated with a real assistant robot: a robotic wheelchair that provides mobility to impaired or elderly people.

  1. Punishment Insensitivity and Impaired Reinforcement Learning in Preschoolers

    Science.gov (United States)

    Briggs-Gowan, Margaret J.; Nichols, Sara R.; Voss, Joel; Zobel, Elvira; Carter, Alice S.; McCarthy, Kimberly J.; Pine, Daniel S.; Blair, James; Wakschlag, Lauren S.

    2014-01-01

    Background: Youth and adults with psychopathic traits display disrupted reinforcement learning. Advances in measurement now enable examination of this association in preschoolers. The current study examines relations between reinforcement learning in preschoolers and parent ratings of reduced responsiveness to socialization, conceptualized as a…

  2. Reinforcement learning in continuous state and action spaces

    NARCIS (Netherlands)

    H. P. van Hasselt (Hado); M.A. Wiering; M. van Otterlo

    2012-01-01

    textabstractMany traditional reinforcement-learning algorithms have been designed for problems with small finite state and action spaces. Learning in such discrete problems can been difficult, due to noise and delayed reinforcements. However, many real-world problems have continuous state or action

  3. HCBPM: An Idea toward a Social Learning Environment for Humanoid Robot

    Directory of Open Access Journals (Sweden)

    Fady Alnajjar

    2010-01-01

    Full Text Available To advance robotics toward real-world applications, a growing body of research has focused on the development of control systems for humanoid robots in recent years. Several approaches have been proposed to support the learning stage of such controllers, where the robot can learn new behaviors by observing and/or receiving direct guidance from a human or even another robot. These approaches require dynamic learning and memorization techniques, which the robot can use to reform and update its internal systems continuously while learning new behaviors. Against this background, this study investigates a new approach to the development of an incremental learning and memorization model. This approach was inspired by the principles of neuroscience, and the developed model was named “Hierarchical Constructive Backpropagation with Memory” (HCBPM. The validity of the model was tested by teaching a humanoid robot to recognize a group of objects through natural interaction. The experimental results indicate that the proposed model efficiently enhances real-time machine learning in general and can be used to establish an environment suitable for social learning between the robot and the user in particular.

  4. Neural Basis of Reinforcement Learning and Decision Making

    Science.gov (United States)

    Lee, Daeyeol; Seo, Hyojung; Jung, Min Whan

    2012-01-01

    Reinforcement learning is an adaptive process in which an animal utilizes its previous experience to improve the outcomes of future choices. Computational theories of reinforcement learning play a central role in the newly emerging areas of neuroeconomics and decision neuroscience. In this framework, actions are chosen according to their value functions, which describe how much future reward is expected from each action. Value functions can be adjusted not only through reward and penalty, but also by the animal’s knowledge of its current environment. Studies have revealed that a large proportion of the brain is involved in representing and updating value functions and using them to choose an action. However, how the nature of a behavioral task affects the neural mechanisms of reinforcement learning remains incompletely understood. Future studies should uncover the principles by which different computational elements of reinforcement learning are dynamically coordinated across the entire brain. PMID:22462543

  5. Autonomous reinforcement learning with experience replay.

    Science.gov (United States)

    Wawrzyński, Paweł; Tanwani, Ajay Kumar

    2013-05-01

    This paper considers the issues of efficiency and autonomy that are required to make reinforcement learning suitable for real-life control tasks. A real-time reinforcement learning algorithm is presented that repeatedly adjusts the control policy with the use of previously collected samples, and autonomously estimates the appropriate step-sizes for the learning updates. The algorithm is based on the actor-critic with experience replay whose step-sizes are determined on-line by an enhanced fixed point algorithm for on-line neural network training. An experimental study with simulated octopus arm and half-cheetah demonstrates the feasibility of the proposed algorithm to solve difficult learning control problems in an autonomous way within reasonably short time. Copyright © 2012 Elsevier Ltd. All rights reserved.

  6. Reinforcement Learning in Repeated Portfolio Decisions

    OpenAIRE

    Diao, Linan; Rieskamp, Jörg

    2011-01-01

    How do people make investment decisions when they receive outcome feedback? We examined how well the standard mean-variance model and two reinforcement models predict people's portfolio decisions. The basic reinforcement model predicts a learning process that relies solely on the portfolio's overall return, whereas the proposed extended reinforcement model also takes the risk and covariance of the investments into account. The experimental results illustrate that people reacted sensitively to...

  7. Exploration and Learning for Cognitive Robots

    NARCIS (Netherlands)

    Rudinac, M.

    2013-01-01

    Before a future with household robots is really feasible, those robots need to be easily adaptable to novel environments and users, be able to apply previously acquired knowledge, and able to learn from perceiving and interacting with the world and users around them. This thesis proposes a cognitive

  8. Reinforcement learning improves behaviour from evaluative feedback

    Science.gov (United States)

    Littman, Michael L.

    2015-05-01

    Reinforcement learning is a branch of machine learning concerned with using experience gained through interacting with the world and evaluative feedback to improve a system's ability to make behavioural decisions. It has been called the artificial intelligence problem in a microcosm because learning algorithms must act autonomously to perform well and achieve their goals. Partly driven by the increasing availability of rich data, recent years have seen exciting advances in the theory and practice of reinforcement learning, including developments in fundamental technical areas such as generalization, planning, exploration and empirical methodology, leading to increasing applicability to real-life problems.

  9. Experiential Learning of Robotics Fundamentals Based on a Case Study of Robot-Assisted Stereotactic Neurosurgery

    Science.gov (United States)

    Faria, Carlos; Vale, Carolina; Machado, Toni; Erlhagen, Wolfram; Rito, Manuel; Monteiro, Sérgio; Bicho, Estela

    2016-01-01

    Robotics has been playing an important role in modern surgery, especially in procedures that require extreme precision, such as neurosurgery. This paper addresses the challenge of teaching robotics to undergraduate engineering students, through an experiential learning project of robotics fundamentals based on a case study of robot-assisted…

  10. Solution to reinforcement learning problems with artificial potential field

    Institute of Scientific and Technical Information of China (English)

    XIE Li-juan; XIE Guang-rong; CHEN Huan-wen; LI Xiao-li

    2008-01-01

    A novel method was designed to solve reinforcement learning problems with artificial potential field. Firstly a reinforcement learning problem was transferred to a path planning problem by using artificial potential field(APF), which was a very appropriate method to model a reinforcement learning problem. Secondly, a new APF algorithm was proposed to overcome the local minimum problem in the potential field methods with a virtual water-flow concept. The performance of this new method was tested by a gridworld problem named as key and door maze. The experimental results show that within 45 trials, good and deterministic policies are found in almost all simulations. In comparison with WIERING's HQ-learning system which needs 20 000 trials for stable solution, the proposed new method can obtain optimal and stable policy far more quickly than HQ-learning. Therefore, the new method is simple and effective to give an optimal solution to the reinforcement learning problem.

  11. Fable II: Design of a Modular Robot for Creative Learning

    DEFF Research Database (Denmark)

    Pacheco, Moises; Fogh, Rune; Lund, Henrik Hautop

    2015-01-01

    Robotic systems have a high potential for creative learning if they are flexible, accessible and engaging for the user in the experimental process of building and programming robots. In this paper we describe the Fable modular robotic system for creative learning which we develop to enable and mo...

  12. Global reinforcement training of CrossNets

    Science.gov (United States)

    Ma, Xiaolong

    2007-10-01

    Hybrid "CMOL" integrated circuits, incorporating advanced CMOS devices for neural cell bodies, nanowires as axons and dendrites, and latching switches as synapses, may be used for the hardware implementation of extremely dense (107 cells and 1012 synapses per cm2) neuromorphic networks, operating up to 10 6 times faster than their biological prototypes. We are exploring several "Cross- Net" architectures that accommodate the limitations imposed by CMOL hardware and should allow effective training of the networks without a direct external access to individual synapses. Our studies have show that CrossNets based on simple (two-terminal) crosspoint devices can work well in at least two modes: as Hop-field networks for associative memory and multilayer perceptrons for classification tasks. For more intelligent tasks (such as robot motion control or complex games), which do not have "examples" for supervised learning, more advanced training methods such as the global reinforcement learning are necessary. For application of global reinforcement training algorithms to CrossNets, we have extended Williams's REINFORCE learning principle to a more general framework and derived several learning rules that are more suitable for CrossNet hardware implementation. The results of numerical experiments have shown that these new learning rules can work well for both classification tasks and reinforcement tasks such as the cartpole balancing control problem. Some limitations imposed by the CMOL hardware need to be carefully addressed for the the successful application of in situ reinforcement training to CrossNets.

  13. Learning tactile skills through curious exploration

    Directory of Open Access Journals (Sweden)

    Leo ePape

    2012-07-01

    Full Text Available We present curiosity-driven, autonomous acquisition of tactile exploratory skills on a biomimetic robot finger equipped with an array of microelectromechanical touch sensors. Instead of building tailored algorithms for solving a specific tactile task, we employ a more general curiosity-driven reinforcement learning approach that autonomously learns a set of motor skills in absence of an explicit teacher signal. In this approach, the acquisition of skills is driven by the information content of the sensory input signals relative to a learner that aims at representing sensory inputs using fewer and fewer computational resources. We show that, from initially random exploration of its environment, the robotic system autonomously develops a small set of basic motor skills that lead to different kinds of tactile input. Next, the system learns how to exploit the learned motor skills to solve supervised texture classification tasks. Our approach demonstrates the feasibility of autonomous acquisition of tactile skills on physical robotic platforms through curiosity-driven reinforcement learning, overcomes typical difficulties of engineered solutions for active tactile exploration and underactuated control, and provides a basis for studying developmental learning through intrinsic motivation in robots.

  14. Application of a model of instrumental conditioning to mobile robot control

    Science.gov (United States)

    Saksida, Lisa M.; Touretzky, D. S.

    1997-09-01

    Instrumental conditioning is a psychological process whereby an animal learns to associate its actions with their consequences. This type of learning is exploited in animal training techniques such as 'shaping by successive approximations,' which enables trainers to gradually adjust the animal's behavior by giving strategically timed reinforcements. While this is similar in principle to reinforcement learning, the real phenomenon includes many subtle effects not considered in the machine learning literature. In addition, a good deal of domain information is utilized by an animal learning a new task; it does not start from scratch every time it learns a new behavior. For these reasons, it is not surprising that mobile robot learning algorithms have yet to approach the sophistication and robustness of animal learning. A serious attempt to model instrumental learning could prove fruitful for improving machine learning techniques. In the present paper, we develop a computational theory of shaping at a level appropriate for controlling mobile robots. The theory is based on a series of mechanisms for 'behavior editing,' in which pre-existing behaviors, either innate or previously learned, can be dramatically changed in magnitude, shifted in direction, or otherwise manipulated so as to produce new behavioral routines. We have implemented our theory on Amelia, an RWI B21 mobile robot equipped with a gripper and color video camera. We provide results from training Amelia on several tasks, all of which were constructed as variations of one innate behavior, object-pursuit.

  15. Reinforcement Learning in Autism Spectrum Disorder

    Directory of Open Access Journals (Sweden)

    Manuela Schuetze

    2017-11-01

    Full Text Available Early behavioral interventions are recognized as integral to standard care in autism spectrum disorder (ASD, and often focus on reinforcing desired behaviors (e.g., eye contact and reducing the presence of atypical behaviors (e.g., echoing others' phrases. However, efficacy of these programs is mixed. Reinforcement learning relies on neurocircuitry that has been reported to be atypical in ASD: prefrontal-sub-cortical circuits, amygdala, brainstem, and cerebellum. Thus, early behavioral interventions rely on neurocircuitry that may function atypically in at least a subset of individuals with ASD. Recent work has investigated physiological, behavioral, and neural responses to reinforcers to uncover differences in motivation and learning in ASD. We will synthesize this work to identify promising avenues for future research that ultimately can be used to enhance the efficacy of early intervention.

  16. Reinforcement and inference in cross-situational word learning.

    Science.gov (United States)

    Tilles, Paulo F C; Fontanari, José F

    2013-01-01

    Cross-situational word learning is based on the notion that a learner can determine the referent of a word by finding something in common across many observed uses of that word. Here we propose an adaptive learning algorithm that contains a parameter that controls the strength of the reinforcement applied to associations between concurrent words and referents, and a parameter that regulates inference, which includes built-in biases, such as mutual exclusivity, and information of past learning events. By adjusting these parameters so that the model predictions agree with data from representative experiments on cross-situational word learning, we were able to explain the learning strategies adopted by the participants of those experiments in terms of a trade-off between reinforcement and inference. These strategies can vary wildly depending on the conditions of the experiments. For instance, for fast mapping experiments (i.e., the correct referent could, in principle, be inferred in a single observation) inference is prevalent, whereas for segregated contextual diversity experiments (i.e., the referents are separated in groups and are exhibited with members of their groups only) reinforcement is predominant. Other experiments are explained with more balanced doses of reinforcement and inference.

  17. Robotic Fish to Aid Animal Behavior Studies and Informal Science Learning

    Science.gov (United States)

    Phamduy, Paul

    The application of robotic fish in the fields of animal behavior and informal science learning are new and relatively untapped. In the context of animal behavior studies, robotic fish offers a consistent and customizable stimulus that could contribute to dissect the determinants of social behavior. In the realm of informal science learning, robotic fish are gaining momentum for the possibility of educating the general public simultaneously on fish physiology and underwater robotics. In this dissertation, the design and development of a number of robotic fish platforms and prototypes and their application in animal behavioral studies and informal science learning settings are presented. Robotic platforms for animal behavioral studies focused on the utilization replica or same scale prototypes. A novel robotic fish platform, featuring a three-dimensional swimming multi-linked robotic fish, was developed with three control modes varying in the level of robot autonomy offered. This platform was deployed at numerous science festivals and science centers, to obtain data on visitor engagement and experience.

  18. The Robotic Decathlon: Project-Based Learning Labs and Curriculum Design for an Introductory Robotics Course

    Science.gov (United States)

    Cappelleri, D. J.; Vitoroulis, N.

    2013-01-01

    This paper presents a series of novel project-based learning labs for an introductory robotics course that are developed into a semester-long Robotic Decathlon. The last three events of the Robotic Decathlon are used as three final one-week-long project tasks; these replace a previous course project that was a semester-long robotics competition.…

  19. Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening

    OpenAIRE

    He, Frank S.; Liu, Yang; Schwing, Alexander G.; Peng, Jian

    2016-01-01

    We propose a novel training algorithm for reinforcement learning which combines the strength of deep Q-learning with a constrained optimization approach to tighten optimality and encourage faster reward propagation. Our novel technique makes deep reinforcement learning more practical by drastically reducing the training time. We evaluate the performance of our approach on the 49 games of the challenging Arcade Learning Environment, and report significant improvements in both training time and...

  20. Self-Paced Prioritized Curriculum Learning With Coverage Penalty in Deep Reinforcement Learning.

    Science.gov (United States)

    Ren, Zhipeng; Dong, Daoyi; Li, Huaxiong; Chen, Chunlin; Zhipeng Ren; Daoyi Dong; Huaxiong Li; Chunlin Chen; Dong, Daoyi; Li, Huaxiong; Chen, Chunlin; Ren, Zhipeng

    2018-06-01

    In this paper, a new training paradigm is proposed for deep reinforcement learning using self-paced prioritized curriculum learning with coverage penalty. The proposed deep curriculum reinforcement learning (DCRL) takes the most advantage of experience replay by adaptively selecting appropriate transitions from replay memory based on the complexity of each transition. The criteria of complexity in DCRL consist of self-paced priority as well as coverage penalty. The self-paced priority reflects the relationship between the temporal-difference error and the difficulty of the current curriculum for sample efficiency. The coverage penalty is taken into account for sample diversity. With comparison to deep Q network (DQN) and prioritized experience replay (PER) methods, the DCRL algorithm is evaluated on Atari 2600 games, and the experimental results show that DCRL outperforms DQN and PER on most of these games. More results further show that the proposed curriculum training paradigm of DCRL is also applicable and effective for other memory-based deep reinforcement learning approaches, such as double DQN and dueling network. All the experimental results demonstrate that DCRL can achieve improved training efficiency and robustness for deep reinforcement learning.

  1. Manifold Regularized Reinforcement Learning.

    Science.gov (United States)

    Li, Hongliang; Liu, Derong; Wang, Ding

    2018-04-01

    This paper introduces a novel manifold regularized reinforcement learning scheme for continuous Markov decision processes. Smooth feature representations for value function approximation can be automatically learned using the unsupervised manifold regularization method. The learned features are data-driven, and can be adapted to the geometry of the state space. Furthermore, the scheme provides a direct basis representation extension for novel samples during policy learning and control. The performance of the proposed scheme is evaluated on two benchmark control tasks, i.e., the inverted pendulum and the energy storage problem. Simulation results illustrate the concepts of the proposed scheme and show that it can obtain excellent performance.

  2. Reinforcement learning for microgrid energy management

    International Nuclear Information System (INIS)

    Kuznetsova, Elizaveta; Li, Yan-Fu; Ruiz, Carlos; Zio, Enrico; Ault, Graham; Bell, Keith

    2013-01-01

    We consider a microgrid for energy distribution, with a local consumer, a renewable generator (wind turbine) and a storage facility (battery), connected to the external grid via a transformer. We propose a 2 steps-ahead reinforcement learning algorithm to plan the battery scheduling, which plays a key role in the achievement of the consumer goals. The underlying framework is one of multi-criteria decision-making by an individual consumer who has the goals of increasing the utilization rate of the battery during high electricity demand (so as to decrease the electricity purchase from the external grid) and increasing the utilization rate of the wind turbine for local use (so as to increase the consumer independence from the external grid). Predictions of available wind power feed the reinforcement learning algorithm for selecting the optimal battery scheduling actions. The embedded learning mechanism allows to enhance the consumer knowledge about the optimal actions for battery scheduling under different time-dependent environmental conditions. The developed framework gives the capability to intelligent consumers to learn the stochastic environment and make use of the experience to select optimal energy management actions. - Highlights: • A consumer exploits a 2 steps-ahead reinforcement learning for battery scheduling. • The Q-learning based mechanism is fed by the predictions of available wind power. • Wind speed state evolutions are modeled with a Markov chain model. • Optimal scheduling actions are learned through the occurrence of similar scenarios. • The consumer manifests a continuous enhance of his knowledge about optimal actions

  3. Learning from demonstration: Teaching a myoelectric prosthesis with an intact limb via reinforcement learning.

    Science.gov (United States)

    Vasan, Gautham; Pilarski, Patrick M

    2017-07-01

    Prosthetic arms should restore and extend the capabilities of someone with an amputation. They should move naturally and be able to perform elegant, coordinated movements that approximate those of a biological arm. Despite these objectives, the control of modern-day prostheses is often nonintuitive and taxing. Existing devices and control approaches do not yet give users the ability to effect highly synergistic movements during their daily-life control of a prosthetic device. As a step towards improving the control of prosthetic arms and hands, we introduce an intuitive approach to training a prosthetic control system that helps a user achieve hard-to-engineer control behaviours. Specifically, we present an actor-critic reinforcement learning method that for the first time promises to allow someone with an amputation to use their non-amputated arm to teach their prosthetic arm how to move through a wide range of coordinated motions and grasp patterns. We evaluate our method during the myoelectric control of a multi-joint robot arm by non-amputee users, and demonstrate that by using our approach a user can train their arm to perform simultaneous gestures and movements in all three degrees of freedom in the robot's hand and wrist based only on information sampled from the robot and the user's above-elbow myoelectric signals. Our results indicate that this learning-from-demonstration paradigm may be well suited to use by both patients and clinicians with minimal technical knowledge, as it allows a user to personalize the control of his or her prosthesis without having to know the underlying mechanics of the prosthetic limb. These preliminary results also suggest that our approach may extend in a straightforward way to next-generation prostheses with precise finger and wrist control, such that these devices may someday allow users to perform fluid and intuitive movements like playing the piano, catching a ball, and comfortably shaking hands.

  4. Educational resources and tools for robotic learning

    Directory of Open Access Journals (Sweden)

    Pablo Gil Vazquez

    2012-07-01

    Full Text Available Normal.dotm 0 0 1 139 795 Universidad de Salamanca 6 1 976 12.0 0 false 18 pt 18 pt 0 0 false false false /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-parent:""; mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-para-margin-top:0cm; mso-para-margin-right:0cm; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0cm; line-height:115%; mso-pagination:widow-orphan; font-size:12.0pt; font-family:"Times New Roman"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin;} This paper discusses different teaching experiences which aims are the learning robotics in the university. These experiences are reflected in the development of several robotics courses and subjects at the University of Alicante.  The authors have created various educational platforms or they have used tools of free distribution and open source for the implementation of these courses. The main objetive of these courses is to teach the design and implementation of robotic solutions to solve various problems not only such as the control, programming and handling of robot but also the assembly, building and programming of educational mini-robots. On the one hand, new teaching tools are used such as simulators and virtual labs which make flexible the learning of robot arms. On the other hand, competitions are used to motivate students because this way, the students put into action the skills learned through building and programming low-cost mini-robots.

  5. Reinforcement learning or active inference?

    Science.gov (United States)

    Friston, Karl J; Daunizeau, Jean; Kiebel, Stefan J

    2009-07-29

    This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and sampling of the environment to minimize their free-energy. Such agents learn causal structure in the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming; namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain.

  6. Reinforcement learning or active inference?

    Directory of Open Access Journals (Sweden)

    Karl J Friston

    2009-07-01

    Full Text Available This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and sampling of the environment to minimize their free-energy. Such agents learn causal structure in the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming; namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain.

  7. Continuous residual reinforcement learning for traffic signal control optimization

    NARCIS (Netherlands)

    Aslani, Mohammad; Seipel, Stefan; Wiering, Marco

    2018-01-01

    Traffic signal control can be naturally regarded as a reinforcement learning problem. Unfortunately, it is one of the most difficult classes of reinforcement learning problems owing to its large state space. A straightforward approach to address this challenge is to control traffic signals based on

  8. Experiential Learning of Electronics Subject Matter in Middle School Robotics Courses

    Science.gov (United States)

    Rihtaršic, David; Avsec, Stanislav; Kocijancic, Slavko

    2016-01-01

    The purpose of this paper is to investigate whether the experiential learning of electronics subject matter is effective in the middle school open learning of robotics. Electronics is often ignored in robotics courses. Since robotics courses are typically comprised of computer-related subjects, and mechanical and electrical engineering, these…

  9. Effect of reinforcement learning on coordination of multiangent systems

    Science.gov (United States)

    Bukkapatnam, Satish T. S.; Gao, Greg

    2000-12-01

    For effective coordination of distributed environments involving multiagent systems, learning ability of each agent in the environment plays a crucial role. In this paper, we develop a simple group learning method based on reinforcement, and study its effect on coordination through application to a supply chain procurement scenario involving a computer manufacturer. Here, all parties are represented by self-interested, autonomous agents, each capable of performing specific simple tasks. They negotiate with each other to perform complex tasks and thus coordinate supply chain procurement. Reinforcement learning is intended to enable each agent to reach a best negotiable price within a shortest possible time. Our simulations of the application scenario under different learning strategies reveals the positive effects of reinforcement learning on an agent's as well as the system's performance.

  10. Intelligent control and cooperation for mobile robots

    Science.gov (United States)

    Stingu, Petru Emanuel

    The topic discussed in this work addresses the current research being conducted at the Automation & Robotics Research Institute in the areas of UAV quadrotor control and heterogenous multi-vehicle cooperation. Autonomy can be successfully achieved by a robot under the following conditions: the robot has to be able to acquire knowledge about the environment and itself, and it also has to be able to reason under uncertainty. The control system must react quickly to immediate challenges, but also has to slowly adapt and improve based on accumulated knowledge. The major contribution of this work is the transfer of the ADP algorithms from the purely theoretical environment to the complex real-world robotic platforms that work in real-time and in uncontrolled environments. Many solutions are adopted from those present in nature because they have been proven to be close to optimal in very different settings. For the control of a single platform, reinforcement learning algorithms are used to design suboptimal controllers for a class of complex systems that can be conceptually split in local loops with simpler dynamics and relatively weak coupling to the rest of the system. Optimality is enforced by having a global critic but the curse of dimensionality is avoided by using local actors and intelligent pre-processing of the information used for learning the optimal controllers. The system model is used for constructing the structure of the control system, but on top of that the adaptive neural networks that form the actors use the knowledge acquired during normal operation to get closer to optimal control. In real-world experiments, efficient learning is a strong requirement for success. This is accomplished by using an approximation of the system model to focus the learning for equivalent configurations of the state space. Due to the availability of only local data for training, neural networks with local activation functions are implemented. For the control of a formation

  11. Project InterActions: A Multigenerational Robotic Learning Environment

    Science.gov (United States)

    Bers, Marina U.

    2007-12-01

    This paper presents Project InterActions, a series of 5-week workshops in which very young learners (4- to 7-year-old children) and their parents come together to build and program a personally meaningful robotic project in the context of a multigenerational robotics-based community of practice. The goal of these family workshops is to teach both parents and children about the mechanical and programming aspects involved in robotics, as well as to initiate them in a learning trajectory with and about technology. Results from this project address different ways in which parents and children learn together and provide insights into how to develop educational interventions that would educate parents, as well as children, in new domains of knowledge and skills such as robotics and new technologies.

  12. Can model-free reinforcement learning explain deontological moral judgments?

    Science.gov (United States)

    Ayars, Alisabeth

    2016-05-01

    Dual-systems frameworks propose that moral judgments are derived from both an immediate emotional response, and controlled/rational cognition. Recently Cushman (2013) proposed a new dual-system theory based on model-free and model-based reinforcement learning. Model-free learning attaches values to actions based on their history of reward and punishment, and explains some deontological, non-utilitarian judgments. Model-based learning involves the construction of a causal model of the world and allows for far-sighted planning; this form of learning fits well with utilitarian considerations that seek to maximize certain kinds of outcomes. I present three concerns regarding the use of model-free reinforcement learning to explain deontological moral judgment. First, many actions that humans find aversive from model-free learning are not judged to be morally wrong. Moral judgment must require something in addition to model-free learning. Second, there is a dearth of evidence for central predictions of the reinforcement account-e.g., that people with different reinforcement histories will, all else equal, make different moral judgments. Finally, to account for the effect of intention within the framework requires certain assumptions which lack support. These challenges are reasonable foci for future empirical/theoretical work on the model-free/model-based framework. Copyright © 2016 Elsevier B.V. All rights reserved.

  13. Social Cognition as Reinforcement Learning: Feedback Modulates Emotion Inference.

    Science.gov (United States)

    Zaki, Jamil; Kallman, Seth; Wimmer, G Elliott; Ochsner, Kevin; Shohamy, Daphna

    2016-09-01

    Neuroscientific studies of social cognition typically employ paradigms in which perceivers draw single-shot inferences about the internal states of strangers. Real-world social inference features much different parameters: People often encounter and learn about particular social targets (e.g., friends) over time and receive feedback about whether their inferences are correct or incorrect. Here, we examined this process and, more broadly, the intersection between social cognition and reinforcement learning. Perceivers were scanned using fMRI while repeatedly encountering three social targets who produced conflicting visual and verbal emotional cues. Perceivers guessed how targets felt and received feedback about whether they had guessed correctly. Visual cues reliably predicted one target's emotion, verbal cues predicted a second target's emotion, and neither reliably predicted the third target's emotion. Perceivers successfully used this information to update their judgments over time. Furthermore, trial-by-trial learning signals-estimated using two reinforcement learning models-tracked activity in ventral striatum and ventromedial pFC, structures associated with reinforcement learning, and regions associated with updating social impressions, including TPJ. These data suggest that learning about others' emotions, like other forms of feedback learning, relies on domain-general reinforcement mechanisms as well as domain-specific social information processing.

  14. Structured Kernel Subspace Learning for Autonomous Robot Navigation.

    Science.gov (United States)

    Kim, Eunwoo; Choi, Sungjoon; Oh, Songhwai

    2018-02-14

    This paper considers two important problems for autonomous robot navigation in a dynamic environment, where the goal is to predict pedestrian motion and control a robot with the prediction for safe navigation. While there are several methods for predicting the motion of a pedestrian and controlling a robot to avoid incoming pedestrians, it is still difficult to safely navigate in a dynamic environment due to challenges, such as the varying quality and complexity of training data with unwanted noises. This paper addresses these challenges simultaneously by proposing a robust kernel subspace learning algorithm based on the recent advances in nuclear-norm and l 1 -norm minimization. We model the motion of a pedestrian and the robot controller using Gaussian processes. The proposed method efficiently approximates a kernel matrix used in Gaussian process regression by learning low-rank structured matrix (with symmetric positive semi-definiteness) to find an orthogonal basis, which eliminates the effects of erroneous and inconsistent data. Based on structured kernel subspace learning, we propose a robust motion model and motion controller for safe navigation in dynamic environments. We evaluate the proposed robust kernel learning in various tasks, including regression, motion prediction, and motion control problems, and demonstrate that the proposed learning-based systems are robust against outliers and outperform existing regression and navigation methods.

  15. Human demonstrations for fast and safe exploration in reinforcement learning

    NARCIS (Netherlands)

    Schonebaum, G.K.; Junell, J.L.; van Kampen, E.

    2017-01-01

    Reinforcement learning is a promising framework for controlling complex vehicles with a high level of autonomy, since it does not need a dynamic model of the vehicle, and it is able to adapt to changing conditions. When learning from scratch, the performance of a reinforcement learning controller

  16. A distributed and morphology-independent strategy for adaptive locomotion in self-reconfigurable modular robots

    DEFF Research Database (Denmark)

    Christensen, David Johan; Schultz, Ulrik Pagh; Stoy, Kasper

    2013-01-01

    In this paper, we present a distributed reinforcement learning strategy for morphology-independent lifelong gait learning for modular robots. All modules run identical controllers that locally and independently optimize their action selection based on the robot’s velocity as a global, shared reward...

  17. GA-based fuzzy reinforcement learning for control of a magnetic bearing system.

    Science.gov (United States)

    Lin, C T; Jou, C P

    2000-01-01

    This paper proposes a TD (temporal difference) and GA (genetic algorithm)-based reinforcement (TDGAR) learning method and applies it to the control of a real magnetic bearing system. The TDGAR learning scheme is a new hybrid GA, which integrates the TD prediction method and the GA to perform the reinforcement learning task. The TDGAR learning system is composed of two integrated feedforward networks. One neural network acts as a critic network to guide the learning of the other network (the action network) which determines the outputs (actions) of the TDGAR learning system. The action network can be a normal neural network or a neural fuzzy network. Using the TD prediction method, the critic network can predict the external reinforcement signal and provide a more informative internal reinforcement signal to the action network. The action network uses the GA to adapt itself according to the internal reinforcement signal. The key concept of the TDGAR learning scheme is to formulate the internal reinforcement signal as the fitness function for the GA such that the GA can evaluate the candidate solutions (chromosomes) regularly, even during periods without external feedback from the environment. This enables the GA to proceed to new generations regularly without waiting for the arrival of the external reinforcement signal. This can usually accelerate the GA learning since a reinforcement signal may only be available at a time long after a sequence of actions has occurred in the reinforcement learning problem. The proposed TDGAR learning system has been used to control an active magnetic bearing (AMB) system in practice. A systematic design procedure is developed to achieve successful integration of all the subsystems including magnetic suspension, mechanical structure, and controller training. The results show that the TDGAR learning scheme can successfully find a neural controller or a neural fuzzy controller for a self-designed magnetic bearing system.

  18. A Project-based Learning approach for teaching Robotics to ...

    African Journals Online (AJOL)

    In this research we used a project-based learning approach to teach robotics basics to undergraduate business computing students. The course coverage includes basic electronics, robot construction and programming using arduino. Students developed and tested a robot prototype. The project was evaluated using a ...

  19. Reinforcement Learning in Continuous Action Spaces

    NARCIS (Netherlands)

    Hasselt, H. van; Wiering, M.A.

    2007-01-01

    Quite some research has been done on Reinforcement Learning in continuous environments, but the research on problems where the actions can also be chosen from a continuous space is much more limited. We present a new class of algorithms named Continuous Actor Critic Learning Automaton (CACLA)

  20. Service Oriented Robotic Architecture for Space Robotics: Design, Testing, and Lessons Learned

    Science.gov (United States)

    Fluckiger, Lorenzo Jean Marc E; Utz, Hans Heinrich

    2013-01-01

    This paper presents the lessons learned from six years of experiments with planetary rover prototypes running the Service Oriented Robotic Architecture (SORA) developed by the Intelligent Robotics Group (IRG) at the NASA Ames Research Center. SORA relies on proven software engineering methods and technologies applied to space robotics. Based on a Service Oriented Architecture and robust middleware, SORA encompasses on-board robot control and a full suite of software tools necessary for remotely operated exploration missions. SORA has been eld tested in numerous scenarios of robotic lunar and planetary exploration. The experiments conducted by IRG with SORA exercise a large set of the constraints encountered in space applications: remote robotic assets, ight relevant science instruments, distributed operations, high network latencies and unreliable or intermittent communication links. In this paper, we present the results of these eld tests in regard to the developed architecture, and discuss its bene ts and limitations.

  1. Reversal Learning Task in Children with Autism Spectrum Disorder: A Robot-Based Approach

    Science.gov (United States)

    Costescu, Cristina A.; Vanderborght, Bram; David, Daniel O.

    2015-01-01

    Children with autism spectrum disorder (ASD) engage in highly perseverative and inflexible behaviours. Technological tools, such as robots, received increased attention as social reinforces and/or assisting tools for improving the performance of children with ASD. The aim of our study is to investigate the role of the robotic toy Keepon in a…

  2. Learning compliant manipulation through kinesthetic and tactile human-robot interaction.

    Science.gov (United States)

    Kronander, Klas; Billard, Aude

    2014-01-01

    Robot Learning from Demonstration (RLfD) has been identified as a key element for making robots useful in daily lives. A wide range of techniques has been proposed for deriving a task model from a set of demonstrations of the task. Most previous works use learning to model the kinematics of the task, and for autonomous execution the robot then relies on a stiff position controller. While many tasks can and have been learned this way, there are tasks in which controlling the position alone is insufficient to achieve the goals of the task. These are typically tasks that involve contact or require a specific response to physical perturbations. The question of how to adjust the compliance to suit the need of the task has not yet been fully treated in Robot Learning from Demonstration. In this paper, we address this issue and present interfaces that allow a human teacher to indicate compliance variations by physically interacting with the robot during task execution. We validate our approach in two different experiments on the 7 DoF Barrett WAM and KUKA LWR robot manipulators. Furthermore, we conduct a user study to evaluate the usability of our approach from a non-roboticists perspective.

  3. Robot technologies, autism and designs for learning

    DEFF Research Database (Denmark)

    Hansbøl, Mikala

    2015-01-01

    technologies involves several very different educational approaches to supporting young people’s learning and development. The paper discusses how robot technologies as learning resources have been related to the field of autism and education, and argues for a need to further expand the areas of application...... in the future, with a focus on children and young people diagnosed with autism spectrum disorders, their ICT interests and engagement in innovative and creative learning. The paper draws on international research and examples from the author’s own research into education for children and young people diagnosed...... with autism spectrum disorders, drawing on teachers’ and the students’ interests in working with ICT (e.g. robot technology)....

  4. Software for project-based learning of robot motion planning

    Science.gov (United States)

    Moll, Mark; Bordeaux, Janice; Kavraki, Lydia E.

    2013-12-01

    Motion planning is a core problem in robotics concerned with finding feasible paths for a given robot. Motion planning algorithms perform a search in the high-dimensional continuous space of robot configurations and exemplify many of the core algorithmic concepts of search algorithms and associated data structures. Motion planning algorithms can be explained in a simplified two-dimensional setting, but this masks many of the subtleties and complexities of the underlying problem. We have developed software for project-based learning of motion planning that enables deep learning. The projects that we have developed allow advanced undergraduate students and graduate students to reflect on the performance of existing textbook algorithms and their own variations on such algorithms. Formative assessment has been conducted at three institutions. The core of the software used for this teaching module is also used within the Robot Operating System, a widely adopted platform by the robotics research community. This allows for transfer of knowledge and skills to robotics research projects involving a large variety robot hardware platforms.

  5. Manifold learning in machine vision and robotics

    Science.gov (United States)

    Bernstein, Alexander

    2017-02-01

    Smart algorithms are used in Machine vision and Robotics to organize or extract high-level information from the available data. Nowadays, Machine learning is an essential and ubiquitous tool to automate extraction patterns or regularities from data (images in Machine vision; camera, laser, and sonar sensors data in Robotics) in order to solve various subject-oriented tasks such as understanding and classification of images content, navigation of mobile autonomous robot in uncertain environments, robot manipulation in medical robotics and computer-assisted surgery, and other. Usually such data have high dimensionality, however, due to various dependencies between their components and constraints caused by physical reasons, all "feasible and usable data" occupy only a very small part in high dimensional "observation space" with smaller intrinsic dimensionality. Generally accepted model of such data is manifold model in accordance with which the data lie on or near an unknown manifold (surface) of lower dimensionality embedded in an ambient high dimensional observation space; real-world high-dimensional data obtained from "natural" sources meet, as a rule, this model. The use of Manifold learning technique in Machine vision and Robotics, which discovers a low-dimensional structure of high dimensional data and results in effective algorithms for solving of a large number of various subject-oriented tasks, is the content of the conference plenary speech some topics of which are in the paper.

  6. Robot education peers in a situated primary school study: Personalisation promotes child learning.

    Science.gov (United States)

    Baxter, Paul; Ashurst, Emily; Read, Robin; Kennedy, James; Belpaeme, Tony

    2017-01-01

    The benefit of social robots to support child learning in an educational context over an extended period of time is evaluated. Specifically, the effect of personalisation and adaptation of robot social behaviour is assessed. Two autonomous robots were embedded within two matched classrooms of a primary school for a continuous two week period without experimenter supervision to act as learning companions for the children for familiar and novel subjects. Results suggest that while children in both personalised and non-personalised conditions learned, there was increased child learning of a novel subject exhibited when interacting with a robot that personalised its behaviours, with indications that this benefit extended to other class-based performance. Additional evidence was obtained suggesting that there is increased acceptance of the personalised robot peer over a non-personalised version. These results provide the first evidence in support of peer-robot behavioural personalisation having a positive influence on learning when embedded in a learning environment for an extended period of time.

  7. Robot education peers in a situated primary school study: Personalisation promotes child learning.

    Directory of Open Access Journals (Sweden)

    Paul Baxter

    Full Text Available The benefit of social robots to support child learning in an educational context over an extended period of time is evaluated. Specifically, the effect of personalisation and adaptation of robot social behaviour is assessed. Two autonomous robots were embedded within two matched classrooms of a primary school for a continuous two week period without experimenter supervision to act as learning companions for the children for familiar and novel subjects. Results suggest that while children in both personalised and non-personalised conditions learned, there was increased child learning of a novel subject exhibited when interacting with a robot that personalised its behaviours, with indications that this benefit extended to other class-based performance. Additional evidence was obtained suggesting that there is increased acceptance of the personalised robot peer over a non-personalised version. These results provide the first evidence in support of peer-robot behavioural personalisation having a positive influence on learning when embedded in a learning environment for an extended period of time.

  8. Robot education peers in a situated primary school study: Personalisation promotes child learning

    Science.gov (United States)

    Ashurst, Emily; Read, Robin; Kennedy, James; Belpaeme, Tony

    2017-01-01

    The benefit of social robots to support child learning in an educational context over an extended period of time is evaluated. Specifically, the effect of personalisation and adaptation of robot social behaviour is assessed. Two autonomous robots were embedded within two matched classrooms of a primary school for a continuous two week period without experimenter supervision to act as learning companions for the children for familiar and novel subjects. Results suggest that while children in both personalised and non-personalised conditions learned, there was increased child learning of a novel subject exhibited when interacting with a robot that personalised its behaviours, with indications that this benefit extended to other class-based performance. Additional evidence was obtained suggesting that there is increased acceptance of the personalised robot peer over a non-personalised version. These results provide the first evidence in support of peer-robot behavioural personalisation having a positive influence on learning when embedded in a learning environment for an extended period of time. PMID:28542648

  9. Human reinforcement learning subdivides structured action spaces by learning effector-specific values.

    Science.gov (United States)

    Gershman, Samuel J; Pesaran, Bijan; Daw, Nathaniel D

    2009-10-28

    Humans and animals are endowed with a large number of effectors. Although this enables great behavioral flexibility, it presents an equally formidable reinforcement learning problem of discovering which actions are most valuable because of the high dimensionality of the action space. An unresolved question is how neural systems for reinforcement learning-such as prediction error signals for action valuation associated with dopamine and the striatum-can cope with this "curse of dimensionality." We propose a reinforcement learning framework that allows for learned action valuations to be decomposed into effector-specific components when appropriate to a task, and test it by studying to what extent human behavior and blood oxygen level-dependent (BOLD) activity can exploit such a decomposition in a multieffector choice task. Subjects made simultaneous decisions with their left and right hands and received separate reward feedback for each hand movement. We found that choice behavior was better described by a learning model that decomposed the values of bimanual movements into separate values for each effector, rather than a traditional model that treated the bimanual actions as unitary with a single value. A decomposition of value into effector-specific components was also observed in value-related BOLD signaling, in the form of lateralized biases in striatal correlates of prediction error and anticipatory value correlates in the intraparietal sulcus. These results suggest that the human brain can use decomposed value representations to "divide and conquer" reinforcement learning over high-dimensional action spaces.

  10. Evolutionary computation for reinforcement learning

    NARCIS (Netherlands)

    Whiteson, S.; Wiering, M.; van Otterlo, M.

    2012-01-01

    Algorithms for evolutionary computation, which simulate the process of natural selection to solve optimization problems, are an effective tool for discovering high-performing reinforcement-learning policies. Because they can automatically find good representations, handle continuous action spaces,

  11. The Impact of Robot Tutor Nonverbal Social Behavior on Child Learning

    Directory of Open Access Journals (Sweden)

    James Kennedy

    2017-04-01

    Full Text Available Several studies have indicated that interacting with social robots in educational contexts may lead to a greater learning than interactions with computers or virtual agents. As such, an increasing amount of social human–robot interaction research is being conducted in the learning domain, particularly with children. However, it is unclear precisely what social behavior a robot should employ in such interactions. Inspiration can be taken from human–human studies; this often leads to an assumption that the more social behavior an agent utilizes, the better the learning outcome will be. We apply a nonverbal behavior metric to a series of studies in which children are taught how to identify prime numbers by a robot with various behavioral manipulations. We find a trend, which generally agrees with the pedagogy literature, but also that overt nonverbal behavior does not account for all learning differences. We discuss the impact of novelty, child expectations, and responses to social cues to further the understanding of the relationship between robot social behavior and learning. We suggest that the combination of nonverbal behavior and social cue congruency is necessary to facilitate learning.

  12. Real-time Stereoscopic 3D for E-Robotics Learning

    Directory of Open Access Journals (Sweden)

    Richard Y. Chiou

    2011-02-01

    Full Text Available Following the design and testing of a successful 3-Dimensional surveillance system, this 3D scheme has been implemented into online robotics learning at Drexel University. A real-time application, utilizing robot controllers, programmable logic controllers and sensors, has been developed in the “MET 205 Robotics and Mechatronics” class to provide the students with a better robotic education. The integration of the 3D system allows the students to precisely program the robot and execute functions remotely. Upon the students’ recommendation, polarization has been chosen to be the main platform behind the 3D robotic system. Stereoscopic calculations are carried out for calibration purposes to display the images with the highest possible comfort-level and 3D effect. The calculations are further validated by comparing the results with students’ evaluations. Due to the Internet-based feature, multiple clients have the opportunity to perform the online automation development. In the future, students, in different universities, will be able to cross-control robotic components of different types around the world. With the development of this 3D ERobotics interface, automation resources and robotic learning can be shared and enriched regardless of location.

  13. Future Challenges of Robotics and Artificial Intelligence in Nursing: What Can We Learn from Monsters in Popular Culture?

    Science.gov (United States)

    Erikson, Henrik; Salzmann-Erikson, Martin

    It is highly likely that artificial intelligence (AI) will be implemented in nursing robotics in various forms, both in medical and surgical robotic instruments, but also as different types of droids and humanoids, physical reinforcements, and also animal/pet robots. Exploring and discussing AI and robotics in nursing and health care before these tools become commonplace is of great importance. We propose that monsters in popular culture might be studied with the hope of learning about situations and relationships that generate empathic capacities in their monstrous existences. The aim of the article is to introduce the theoretical framework and assumptions behind this idea. Both robots and monsters are posthuman creations. The knowledge we present here gives ideas about how nursing science can address the postmodern, technologic, and global world to come. Monsters therefore serve as an entrance to explore technologic innovations such as AI. Analyzing when and why monsters step out of character can provide important insights into the conceptualization of caring and nursing as a science, which is important for discussing these empathic protocols, as well as more general insight into human knowledge. The relationship between caring, monsters, robotics, and AI is not as farfetched as it might seem at first glance.

  14. Human-robot cooperative movement training: Learning a novel sensory motor transformation during walking with robotic assistance-as-needed

    Directory of Open Access Journals (Sweden)

    Benitez Raul

    2007-03-01

    Full Text Available Abstract Background A prevailing paradigm of physical rehabilitation following neurologic injury is to "assist-as-needed" in completing desired movements. Several research groups are attempting to automate this principle with robotic movement training devices and patient cooperative algorithms that encourage voluntary participation. These attempts are currently not based on computational models of motor learning. Methods Here we assume that motor recovery from a neurologic injury can be modelled as a process of learning a novel sensory motor transformation, which allows us to study a simplified experimental protocol amenable to mathematical description. Specifically, we use a robotic force field paradigm to impose a virtual impairment on the left leg of unimpaired subjects walking on a treadmill. We then derive an "assist-as-needed" robotic training algorithm to help subjects overcome the virtual impairment and walk normally. The problem is posed as an optimization of performance error and robotic assistance. The optimal robotic movement trainer becomes an error-based controller with a forgetting factor that bounds kinematic errors while systematically reducing its assistance when those errors are small. As humans have a natural range of movement variability, we introduce an error weighting function that causes the robotic trainer to disregard this variability. Results We experimentally validated the controller with ten unimpaired subjects by demonstrating how it helped the subjects learn the novel sensory motor transformation necessary to counteract the virtual impairment, while also preventing them from experiencing large kinematic errors. The addition of the error weighting function allowed the robot assistance to fade to zero even though the subjects' movements were variable. We also show that in order to assist-as-needed, the robot must relax its assistance at a rate faster than that of the learning human. Conclusion The assist

  15. Human-robot cooperative movement training: learning a novel sensory motor transformation during walking with robotic assistance-as-needed.

    Science.gov (United States)

    Emken, Jeremy L; Benitez, Raul; Reinkensmeyer, David J

    2007-03-28

    A prevailing paradigm of physical rehabilitation following neurologic injury is to "assist-as-needed" in completing desired movements. Several research groups are attempting to automate this principle with robotic movement training devices and patient cooperative algorithms that encourage voluntary participation. These attempts are currently not based on computational models of motor learning. Here we assume that motor recovery from a neurologic injury can be modelled as a process of learning a novel sensory motor transformation, which allows us to study a simplified experimental protocol amenable to mathematical description. Specifically, we use a robotic force field paradigm to impose a virtual impairment on the left leg of unimpaired subjects walking on a treadmill. We then derive an "assist-as-needed" robotic training algorithm to help subjects overcome the virtual impairment and walk normally. The problem is posed as an optimization of performance error and robotic assistance. The optimal robotic movement trainer becomes an error-based controller with a forgetting factor that bounds kinematic errors while systematically reducing its assistance when those errors are small. As humans have a natural range of movement variability, we introduce an error weighting function that causes the robotic trainer to disregard this variability. We experimentally validated the controller with ten unimpaired subjects by demonstrating how it helped the subjects learn the novel sensory motor transformation necessary to counteract the virtual impairment, while also preventing them from experiencing large kinematic errors. The addition of the error weighting function allowed the robot assistance to fade to zero even though the subjects' movements were variable. We also show that in order to assist-as-needed, the robot must relax its assistance at a rate faster than that of the learning human. The assist-as-needed algorithm proposed here can limit error during the learning of a

  16. Cross-Situational Learning with Bayesian Generative Models for Multimodal Category and Word Learning in Robots

    Directory of Open Access Journals (Sweden)

    Akira Taniguchi

    2017-12-01

    Full Text Available In this paper, we propose a Bayesian generative model that can form multiple categories based on each sensory-channel and can associate words with any of the four sensory-channels (action, position, object, and color. This paper focuses on cross-situational learning using the co-occurrence between words and information of sensory-channels in complex situations rather than conventional situations of cross-situational learning. We conducted a learning scenario using a simulator and a real humanoid iCub robot. In the scenario, a human tutor provided a sentence that describes an object of visual attention and an accompanying action to the robot. The scenario was set as follows: the number of words per sensory-channel was three or four, and the number of trials for learning was 20 and 40 for the simulator and 25 and 40 for the real robot. The experimental results showed that the proposed method was able to estimate the multiple categorizations and to learn the relationships between multiple sensory-channels and words accurately. In addition, we conducted an action generation task and an action description task based on word meanings learned in the cross-situational learning scenario. The experimental results showed that the robot could successfully use the word meanings learned by using the proposed method.

  17. A Case-Study for Life-Long Learning and Adaptation in Cooperative Robot Teams

    International Nuclear Information System (INIS)

    Parker, L.E.

    1999-01-01

    While considerable progress has been made in recent years toward the development of multi-robot teams, much work remains to be done before these teams are used widely in real-world applications. Two particular needs toward this end are the development of mechanisms that enable robot teams to generate cooperative behaviors on their own, and the development of techniques that allow these teams to autonomously adapt their behavior over time as the environment or the robot team changes. This paper proposes the use of the Cooperative Multi-Robot Observation of Multiple Moving Targets (CMOMMT) application as a rich domain for studying the issues of multi-robot learning and adaptation. After discussing the need for learning and adaptation in multi-robot teams, this paper describes the CMOMMT application and its relevance to multi-robot learning. We discuss the results of the previously- developed, hand-generated algorithm for CMOMMT and the potential for learning that was discovered from the hand-generated approach. We then describe the early work that has been done (by us and others) to generate multi- robot learning techniques for the CMOMMT application, as well as our ongoing research to develop approaches that give performance as good, or better, than the hand-generated approach. The ultimate goal of this research is to develop techniques for multi-robot learning and adaptation in the CMOMMT application domain that will generalize to cooperative robot applications in other domains, thus making the practical use of multi-robot teams in a wide variety of real-world applications much closer to reality

  18. Social Intelligence for a Robot Engaging People in Cognitive Training Activities

    Directory of Open Access Journals (Sweden)

    Jeanie Chan

    2012-10-01

    Full Text Available Current research supports the use of cognitive training interventions to improve the brain functioning of both adults and children. Our work focuses on exploring the potential use of robot assistants to allow for these interventions to become more accessible. Namely, we aim to develop an intelligent, socially assistive robot that can engage individuals in person-centred cognitively stimulating activities. In this paper, we present the design of a novel control architecture for the robot Brian 2.0, which enables the robot to be a social motivator by providing assistance, encouragement and celebration during an activity. A hierarchical reinforcement learning approach is used in the architecture to allow the robot to: 1 learn appropriate assistive behaviours based on the structure of the activity, and 2 personalize an interaction based on user states. Experiments show that the control architecture is effective in determining the robot's optimal assistive behaviours during a memory game interaction.

  19. What can we learn from Robots

    DEFF Research Database (Denmark)

    Clausen, Thea Juhl Roloff

    2015-01-01

    Robot Technology is going to be a big part of the future, but robotics is also the present. We ask the question, “What can we use this technology for and does it demand a new way to think digital literacy?” One thing we do know is that we must start having focus on the needs the future indicates....... With Robot Technology in a play-and learning perspective it is being a co-producer and being experimental there will be in focus. We must have an understanding of programming language, an understanding of what happens behind the interface and how to use this in an innovative perspective to help shape...

  20. Fast Conflict Resolution Based on Reinforcement Learning in Multi-agent System

    Institute of Scientific and Technical Information of China (English)

    PIAOSonghao; HONGBingrong; CHUHaitao

    2004-01-01

    In multi-agent system where each agen thas a different goal (even the team of agents has the same goal), agents must be able to resolve conflicts arising in the process of achieving their goal. Many researchers presented methods for conflict resolution, e.g., Reinforcement learning (RL), but the conventional RL requires a large computation cost because every agent must learn, at the same time the overlap of actions selected by each agent results in local conflict. Therefore in this paper, we propose a novel method to solve these problems. In order to deal with the conflict within the multi-agent system, the concept of potential field function based Action selection priority level (ASPL) is brought forward. In this method, all kinds of environment factor that may have influence on the priority are effectively computed with the potential field function. So the priority to access the local resource can be decided rapidly. By avoiding the complex coordination mechanism used in general multi-agent system, the conflict in multi-agent system is settled more efficiently. Our system consists of RL with ASPL module and generalized rules module. Using ASPL, RL module chooses a proper cooperative behavior, and generalized rule module can accelerate the learning process. By applying the proposed method to Robot Soccer, the learning process can be accelerated. The results of simulation and real experiments indicate the effectiveness of the method.

  1. Human reinforcement learning subdivides structured action spaces by learning effector-specific values

    OpenAIRE

    Gershman, Samuel J.; Pesaran, Bijan; Daw, Nathaniel D.

    2009-01-01

    Humans and animals are endowed with a large number of effectors. Although this enables great behavioral flexibility, it presents an equally formidable reinforcement learning problem of discovering which actions are most valuable, due to the high dimensionality of the action space. An unresolved question is how neural systems for reinforcement learning – such as prediction error signals for action valuation associated with dopamine and the striatum – can cope with this “curse of dimensionality...

  2. Reinforcement Learning Based Novel Adaptive Learning Framework for Smart Grid Prediction

    Directory of Open Access Journals (Sweden)

    Tian Li

    2017-01-01

    Full Text Available Smart grid is a potential infrastructure to supply electricity demand for end users in a safe and reliable manner. With the rapid increase of the share of renewable energy and controllable loads in smart grid, the operation uncertainty of smart grid has increased briskly during recent years. The forecast is responsible for the safety and economic operation of the smart grid. However, most existing forecast methods cannot account for the smart grid due to the disabilities to adapt to the varying operational conditions. In this paper, reinforcement learning is firstly exploited to develop an online learning framework for the smart grid. With the capability of multitime scale resolution, wavelet neural network has been adopted in the online learning framework to yield reinforcement learning and wavelet neural network (RLWNN based adaptive learning scheme. The simulations on two typical prediction problems in smart grid, including wind power prediction and load forecast, validate the effectiveness and the scalability of the proposed RLWNN based learning framework and algorithm.

  3. Acquisition of earthworm-like movement patterns of many-segmented peristaltic crawling robots

    Directory of Open Access Journals (Sweden)

    Norihiko Saga

    2016-09-01

    Full Text Available In recent years, attention has been increasingly devoted to the development of rescue robots that can protect humans from the inherent risks of rescue work. Particularly, anticipated is the development of a robot that can move deeply through small spaces. We have devoted our attention to peristalsis, the movement mechanism used by earthworms. A reinforcement learning technique used for the derivation of the robot movement pattern, Q-learning, was used to develop a three-segmented peristaltic crawling robot with a motor drive. Characteristically, peristalsis can provide movement capability if at least three segments work, even if a segmented part does not function. Therefore, we had intended to derive the movement pattern of many-segmented peristaltic crawling robots using Q-learning. However, because of the necessary increase in calculations, in the case of many segments, Q-learning cannot be used because of insufficient memory. Therefore, we devoted our attention to a learning method called Actor–Critic, which can be implemented with low memory. Because Actor-Critic methods are TD methods that have a separate memory structure to explicitly represent the policy independent of the value function. Using it, we examined the movement patterns of six-segmented peristaltic crawling robots.

  4. Learning Spatial Object Localization from Vision on a Humanoid Robot

    Directory of Open Access Journals (Sweden)

    Jürgen Leitner

    2012-12-01

    Full Text Available We present a combined machine learning and computer vision approach for robots to localize objects. It allows our iCub humanoid to quickly learn to provide accurate 3D position estimates (in the centimetre range of objects seen. Biologically inspired approaches, such as Artificial Neural Networks (ANN and Genetic Programming (GP, are trained to provide these position estimates using the two cameras and the joint encoder readings. No camera calibration or explicit knowledge of the robot's kinematic model is needed. We find that ANN and GP are not just faster and have lower complexity than traditional techniques, but also learn without the need for extensive calibration procedures. In addition, the approach is localizing objects robustly, when placed in the robot's workspace at arbitrary positions, even while the robot is moving its torso, head and eyes.

  5. Pragmatically Framed Cross-Situational Noun Learning Using Computational Reinforcement Models.

    Science.gov (United States)

    Najnin, Shamima; Banerjee, Bonny

    2018-01-01

    Cross-situational learning and social pragmatic theories are prominent mechanisms for learning word meanings (i.e., word-object pairs). In this paper, the role of reinforcement is investigated for early word-learning by an artificial agent. When exposed to a group of speakers, the agent comes to understand an initial set of vocabulary items belonging to the language used by the group. Both cross-situational learning and social pragmatic theory are taken into account. As social cues, joint attention and prosodic cues in caregiver's speech are considered. During agent-caregiver interaction, the agent selects a word from the caregiver's utterance and learns the relations between that word and the objects in its visual environment. The "novel words to novel objects" language-specific constraint is assumed for computing rewards. The models are learned by maximizing the expected reward using reinforcement learning algorithms [i.e., table-based algorithms: Q-learning, SARSA, SARSA-λ, and neural network-based algorithms: Q-learning for neural network (Q-NN), neural-fitted Q-network (NFQ), and deep Q-network (DQN)]. Neural network-based reinforcement learning models are chosen over table-based models for better generalization and quicker convergence. Simulations are carried out using mother-infant interaction CHILDES dataset for learning word-object pairings. Reinforcement is modeled in two cross-situational learning cases: (1) with joint attention (Attentional models), and (2) with joint attention and prosodic cues (Attentional-prosodic models). Attentional-prosodic models manifest superior performance to Attentional ones for the task of word-learning. The Attentional-prosodic DQN outperforms existing word-learning models for the same task.

  6. Evolutionary online behaviour learning and adaptation in real robots.

    Science.gov (United States)

    Silva, Fernando; Correia, Luís; Christensen, Anders Lyhne

    2017-07-01

    Online evolution of behavioural control on real robots is an open-ended approach to autonomous learning and adaptation: robots have the potential to automatically learn new tasks and to adapt to changes in environmental conditions, or to failures in sensors and/or actuators. However, studies have so far almost exclusively been carried out in simulation because evolution in real hardware has required several days or weeks to produce capable robots. In this article, we successfully evolve neural network-based controllers in real robotic hardware to solve two single-robot tasks and one collective robotics task. Controllers are evolved either from random solutions or from solutions pre-evolved in simulation. In all cases, capable solutions are found in a timely manner (1 h or less). Results show that more accurate simulations may lead to higher-performing controllers, and that completing the optimization process in real robots is meaningful, even if solutions found in simulation differ from solutions in reality. We furthermore demonstrate for the first time the adaptive capabilities of online evolution in real robotic hardware, including robots able to overcome faults injected in the motors of multiple units simultaneously, and to modify their behaviour in response to changes in the task requirements. We conclude by assessing the contribution of each algorithmic component on the performance of the underlying evolutionary algorithm.

  7. Instructional control of reinforcement learning: a behavioral and neurocomputational investigation.

    Science.gov (United States)

    Doll, Bradley B; Jacobs, W Jake; Sanfey, Alan G; Frank, Michael J

    2009-11-24

    Humans learn how to behave directly through environmental experience and indirectly through rules and instructions. Behavior analytic research has shown that instructions can control behavior, even when such behavior leads to sub-optimal outcomes (Hayes, S. (Ed.). 1989. Rule-governed behavior: cognition, contingencies, and instructional control. Plenum Press.). Here we examine the control of behavior through instructions in a reinforcement learning task known to depend on striatal dopaminergic function. Participants selected between probabilistically reinforced stimuli, and were (incorrectly) told that a specific stimulus had the highest (or lowest) reinforcement probability. Despite experience to the contrary, instructions drove choice behavior. We present neural network simulations that capture the interactions between instruction-driven and reinforcement-driven behavior via two potential neural circuits: one in which the striatum is inaccurately trained by instruction representations coming from prefrontal cortex/hippocampus (PFC/HC), and another in which the striatum learns the environmentally based reinforcement contingencies, but is "overridden" at decision output. Both models capture the core behavioral phenomena but, because they differ fundamentally on what is learned, make distinct predictions for subsequent behavioral and neuroimaging experiments. Finally, we attempt to distinguish between the proposed computational mechanisms governing instructed behavior by fitting a series of abstract "Q-learning" and Bayesian models to subject data. The best-fitting model supports one of the neural models, suggesting the existence of a "confirmation bias" in which the PFC/HC system trains the reinforcement system by amplifying outcomes that are consistent with instructions while diminishing inconsistent outcomes.

  8. Enhancement of Online Robotics Learning Using Real-Time 3D Visualization Technology

    Directory of Open Access Journals (Sweden)

    Richard Chiou

    2010-06-01

    Full Text Available This paper discusses a real-time e-Lab Learning system based on the integration of 3D visualization technology with a remote robotic laboratory. With the emergence and development of the Internet field, online learning is proving to play a significant role in the upcoming era. In an effort to enhance Internet-based learning of robotics and keep up with the rapid progression of technology, a 3- Dimensional scheme of viewing the robotic laboratory has been introduced in addition to the remote controlling of the robots. The uniqueness of the project lies in making this process Internet-based, and remote robot operated and visualized in 3D. This 3D system approach provides the students with a more realistic feel of the 3D robotic laboratory even though they are working remotely. As a result, the 3D visualization technology has been tested as part of a laboratory in the MET 205 Robotics and Mechatronics class and has received positive feedback by most of the students. This type of research has introduced a new level of realism and visual communications to online laboratory learning in a remote classroom.

  9. An architecture for an autonomous learning robot

    Science.gov (United States)

    Tillotson, Brian

    1988-01-01

    An autonomous learning device must solve the example bounding problem, i.e., it must divide the continuous universe into discrete examples from which to learn. We describe an architecture which incorporates an example bounder for learning. The architecture is implemented in the GPAL program. An example run with a real mobile robot shows that the program learns and uses new causal, qualitative, and quantitative relationships.

  10. Reinforcement Learning Based Artificial Immune Classifier

    Directory of Open Access Journals (Sweden)

    Mehmet Karakose

    2013-01-01

    Full Text Available One of the widely used methods for classification that is a decision-making process is artificial immune systems. Artificial immune systems based on natural immunity system can be successfully applied for classification, optimization, recognition, and learning in real-world problems. In this study, a reinforcement learning based artificial immune classifier is proposed as a new approach. This approach uses reinforcement learning to find better antibody with immune operators. The proposed new approach has many contributions according to other methods in the literature such as effectiveness, less memory cell, high accuracy, speed, and data adaptability. The performance of the proposed approach is demonstrated by simulation and experimental results using real data in Matlab and FPGA. Some benchmark data and remote image data are used for experimental results. The comparative results with supervised/unsupervised based artificial immune system, negative selection classifier, and resource limited artificial immune classifier are given to demonstrate the effectiveness of the proposed new method.

  11. Iterative learning control with sampled-data feedback for robot manipulators

    Directory of Open Access Journals (Sweden)

    Delchev Kamen

    2014-09-01

    Full Text Available This paper deals with the improvement of the stability of sampled-data (SD feedback control for nonlinear multiple-input multiple-output time varying systems, such as robotic manipulators, by incorporating an off-line model based nonlinear iterative learning controller. The proposed scheme of nonlinear iterative learning control (NILC with SD feedback is applicable to a large class of robots because the sampled-data feedback is required for model based feedback controllers, especially for robotic manipulators with complicated dynamics (6 or 7 DOF, or more, while the feedforward control from the off-line iterative learning controller should be assumed as a continuous one. The robustness and convergence of the proposed NILC law with SD feedback is proven, and the derived sufficient condition for convergence is the same as the condition for a NILC with a continuous feedback control input. With respect to the presented NILC algorithm applied to a virtual PUMA 560 robot, simulation results are presented in order to verify convergence and applicability of the proposed learning controller with SD feedback controller attached

  12. Online reinforcement learning control for aerospace systems

    NARCIS (Netherlands)

    Zhou, Y.

    2018-01-01

    Reinforcement Learning (RL) methods are relatively new in the field of aerospace guidance, navigation, and control. This dissertation aims to exploit RL methods to improve the autonomy and online learning of aerospace systems with respect to the a priori unknown system and environment, dynamical

  13. Teaching Functional Patterns through Robotic Applications

    Directory of Open Access Journals (Sweden)

    J. Boender

    2016-11-01

    Full Text Available We present our approach to teaching functional programming to First Year Computer Science students at Middlesex University through projects in robotics. A holistic approach is taken to the curriculum, emphasising the connections between different subject areas. A key part of the students' learning is through practical projects that draw upon and integrate the taught material. To support these, we developed the Middlesex Robotic plaTfOrm (MIRTO, an open-source platform built using Raspberry Pi, Arduino, HUB-ee wheels and running Racket (a LISP dialect. In this paper we present the motivations for our choices and explain how a number of concepts of functional programming may be employed when programming robotic applications. We present some students' work with robotics projects: we consider the use of robotics projects to have been a success, both for their value in reinforcing students' understanding of programming concepts and for their value in motivating the students.

  14. Multi-agent machine learning a reinforcement approach

    CERN Document Server

    Schwartz, H M

    2014-01-01

    The book begins with a chapter on traditional methods of supervised learning, covering recursive least squares learning, mean square error methods, and stochastic approximation. Chapter 2 covers single agent reinforcement learning. Topics include learning value functions, Markov games, and TD learning with eligibility traces. Chapter 3 discusses two player games including two player matrix games with both pure and mixed strategies. Numerous algorithms and examples are presented. Chapter 4 covers learning in multi-player games, stochastic games, and Markov games, focusing on learning multi-pla

  15. Authoring Robot-Assisted Instructional Materials for Improving Learning Performance and Motivation in EFL Classrooms

    Science.gov (United States)

    Hong, Zeng-Wei; Huang, Yueh-Min; Hsu, Marie; Shen, Wei-Wei

    2016-01-01

    Anthropomorphized robots are regarded as beneficial tools in education due to their capabilities of improving teaching effectiveness and learning motivation. Therefore, one major trend of research, known as Robot- Assisted Language Learning (RALL), is trying to develop robots to support teaching and learning English as a foreign language (EFL). As…

  16. Robots Learn to Recognize Individuals from Imitative Encounters with People and Avatars

    Science.gov (United States)

    Boucenna, Sofiane; Cohen, David; Meltzoff, Andrew N.; Gaussier, Philippe; Chetouani, Mohamed

    2016-02-01

    Prior to language, human infants are prolific imitators. Developmental science grounds infant imitation in the neural coding of actions, and highlights the use of imitation for learning from and about people. Here, we used computational modeling and a robot implementation to explore the functional value of action imitation. We report 3 experiments using a mutual imitation task between robots, adults, typically developing children, and children with Autism Spectrum Disorder. We show that a particular learning architecture - specifically one combining artificial neural nets for (i) extraction of visual features, (ii) the robot’s motor internal state, (iii) posture recognition, and (iv) novelty detection - is able to learn from an interactive experience involving mutual imitation. This mutual imitation experience allowed the robot to recognize the interactive agent in a subsequent encounter. These experiments using robots as tools for modeling human cognitive development, based on developmental theory, confirm the promise of developmental robotics. Additionally, findings illustrate how person recognition may emerge through imitative experience, intercorporeal mapping, and statistical learning.

  17. Enhancement of Online Robotics Learning Using Real-Time 3D Visualization Technology

    OpenAIRE

    Richard Chiou; Yongjin (james) Kwon; Tzu-Liang (bill) Tseng; Robin Kizirian; Yueh-Ting Yang

    2010-01-01

    This paper discusses a real-time e-Lab Learning system based on the integration of 3D visualization technology with a remote robotic laboratory. With the emergence and development of the Internet field, online learning is proving to play a significant role in the upcoming era. In an effort to enhance Internet-based learning of robotics and keep up with the rapid progression of technology, a 3- Dimensional scheme of viewing the robotic laboratory has been introduced in addition to the remote c...

  18. Structure identification in fuzzy inference using reinforcement learning

    Science.gov (United States)

    Berenji, Hamid R.; Khedkar, Pratap

    1993-01-01

    In our previous work on the GARIC architecture, we have shown that the system can start with surface structure of the knowledge base (i.e., the linguistic expression of the rules) and learn the deep structure (i.e., the fuzzy membership functions of the labels used in the rules) by using reinforcement learning. Assuming the surface structure, GARIC refines the fuzzy membership functions used in the consequents of the rules using a gradient descent procedure. This hybrid fuzzy logic and reinforcement learning approach can learn to balance a cart-pole system and to backup a truck to its docking location after a few trials. In this paper, we discuss how to do structure identification using reinforcement learning in fuzzy inference systems. This involves identifying both surface as well as deep structure of the knowledge base. The term set of fuzzy linguistic labels used in describing the values of each control variable must be derived. In this process, splitting a label refers to creating new labels which are more granular than the original label and merging two labels creates a more general label. Splitting and merging of labels directly transform the structure of the action selection network used in GARIC by increasing or decreasing the number of hidden layer nodes.

  19. Reinforcement Learning Based on the Bayesian Theorem for Electricity Markets Decision Support

    DEFF Research Database (Denmark)

    Sousa, Tiago; Pinto, Tiago; Praca, Isabel

    2014-01-01

    This paper presents the applicability of a reinforcement learning algorithm based on the application of the Bayesian theorem of probability. The proposed reinforcement learning algorithm is an advantageous and indispensable tool for ALBidS (Adaptive Learning strategic Bidding System), a multi...

  20. Training in robotics: The learning curve and contemporary concepts in training.

    Science.gov (United States)

    Bach, Christian; Miernik, Arkadiusz; Schönthaler, Martin

    2014-03-01

    To define the learning curve of robot-assisted laparoscopic surgery for prostatectomy (RALP) and upper tract procedures, and show the differences between the classical approach to training and the new concept of parallel learning. This mini-review is based on the results of a Medline search using the keywords 'da Vinci', 'robot-assisted laparoscopic surgery', 'training', 'teaching' and 'learning curve'. For RALP and robot-assisted upper tract surgery, a learning curve of 8-150 procedures is quoted, with most articles proposing that 30-40 cases are needed to carry out the procedure safely. There is no consensus about which endpoints should be measured. In the traditional proctored training model, the surgeon learns the procedure linearly, following the sequential order of the surgical steps. A more recent approach is to specify the relative difficulty of each step and to train the surgeon simultaneously in several steps of equal difficulty. The entire procedure is only performed after all the steps are mastered in a timely manner. Recently, a 'warm-up' before robotic surgery has been shown to be beneficial for successful surgery in the operating room. There is no clear definition of the duration of the effective learning curve for RALP and robotic upper tract surgery. The concept of stepwise, parallel learning has the potential to accelerate the learning process and to make sure that initial cases are not too long. It can also be assumed that a preoperative 'warm up' could help significantly to improve the progress of the trainee.

  1. Using a board game to reinforce learning.

    Science.gov (United States)

    Yoon, Bona; Rodriguez, Leslie; Faselis, Charles J; Liappis, Angelike P

    2014-03-01

    Experiential gaming strategies offer a variation on traditional learning. A board game was used to present synthesized content of fundamental catheter care concepts and reinforce evidence-based practices relevant to nursing. Board games are innovative educational tools that can enhance active learning. Copyright 2014, SLACK Incorporated.

  2. Embodied Computation: An Active-Learning Approach to Mobile Robotics Education

    Science.gov (United States)

    Riek, L. D.

    2013-01-01

    This paper describes a newly designed upper-level undergraduate and graduate course, Autonomous Mobile Robots. The course employs active, cooperative, problem-based learning and is grounded in the fundamental computational problems in mobile robotics defined by Dudek and Jenkin. Students receive a broad survey of robotics through lectures, weekly…

  3. "Notice of Violation of IEEE Publication Principles" Multiobjective Reinforcement Learning: A Comprehensive Overview.

    Science.gov (United States)

    Liu, Chunming; Xu, Xin; Hu, Dewen

    2013-04-29

    Reinforcement learning is a powerful mechanism for enabling agents to learn in an unknown environment, and most reinforcement learning algorithms aim to maximize some numerical value, which represents only one long-term objective. However, multiple long-term objectives are exhibited in many real-world decision and control problems; therefore, recently, there has been growing interest in solving multiobjective reinforcement learning (MORL) problems with multiple conflicting objectives. The aim of this paper is to present a comprehensive overview of MORL. In this paper, the basic architecture, research topics, and naive solutions of MORL are introduced at first. Then, several representative MORL approaches and some important directions of recent research are reviewed. The relationships between MORL and other related research are also discussed, which include multiobjective optimization, hierarchical reinforcement learning, and multi-agent reinforcement learning. Finally, research challenges and open problems of MORL techniques are highlighted.

  4. Exploiting Best-Match Equations for Efficient Reinforcement Learning

    NARCIS (Netherlands)

    van Seijen, Harm; Whiteson, Shimon; van Hasselt, Hado; Wiering, Marco

    This article presents and evaluates best-match learning, a new approach to reinforcement learning that trades off the sample efficiency of model-based methods with the space efficiency of model-free methods. Best-match learning works by approximating the solution to a set of best-match equations,

  5. Tackling Error Propagation through Reinforcement Learning: A Case of Greedy Dependency Parsing

    OpenAIRE

    Le, Minh; Fokkens, Antske

    2017-01-01

    Error propagation is a common problem in NLP. Reinforcement learning explores erroneous states during training and can therefore be more robust when mistakes are made early in a process. In this paper, we apply reinforcement learning to greedy dependency parsing which is known to suffer from error propagation. Reinforcement learning improves accuracy of both labeled and unlabeled dependencies of the Stanford Neural Dependency Parser, a high performance greedy parser, while maintaining its eff...

  6. Learning-based identification and iterative learning control of direct-drive robots

    NARCIS (Netherlands)

    Bukkems, B.H.M.; Kostic, D.; Jager, de A.G.; Steinbuch, M.

    2005-01-01

    A combination of model-based and Iterative Learning Control is proposed as a method to achieve high-quality motion control of direct-drive robots in repetitive motion tasks. We include both model-based and learning components in the total control law, as their individual properties influence the

  7. Longitudinal investigation on learned helplessness tested under negative and positive reinforcement involving stimulus control.

    Science.gov (United States)

    Oliveira, Emileane C; Hunziker, Maria Helena

    2014-07-01

    In this study, we investigated whether (a) animals demonstrating the learned helplessness effect during an escape contingency also show learning deficits under positive reinforcement contingencies involving stimulus control and (b) the exposure to positive reinforcement contingencies eliminates the learned helplessness effect under an escape contingency. Rats were initially exposed to controllable (C), uncontrollable (U) or no (N) shocks. After 24h, they were exposed to 60 escapable shocks delivered in a shuttlebox. In the following phase, we selected from each group the four subjects that presented the most typical group pattern: no escape learning (learned helplessness effect) in Group U and escape learning in Groups C and N. All subjects were then exposed to two phases, the (1) positive reinforcement for lever pressing under a multiple FR/Extinction schedule and (2) a re-test under negative reinforcement (escape). A fourth group (n=4) was exposed only to the positive reinforcement sessions. All subjects showed discrimination learning under multiple schedule. In the escape re-test, the learned helplessness effect was maintained for three of the animals in Group U. These results suggest that the learned helplessness effect did not extend to discriminative behavior that is positively reinforced and that the learned helplessness effect did not revert for most subjects after exposure to positive reinforcement. We discuss some theoretical implications as related to learned helplessness as an effect restricted to aversive contingencies and to the absence of reversion after positive reinforcement. This article is part of a Special Issue entitled: insert SI title. Copyright © 2014. Published by Elsevier B.V.

  8. Adaptive Trajectory Tracking Control using Reinforcement Learning for Quadrotor

    Directory of Open Access Journals (Sweden)

    Wenjie Lou

    2016-02-01

    Full Text Available Inaccurate system parameters and unpredicted external disturbances affect the performance of non-linear controllers. In this paper, a new adaptive control algorithm under the reinforcement framework is proposed to stabilize a quadrotor helicopter. Based on a command-filtered non-linear control algorithm, adaptive elements are added and learned by policy-search methods. To predict the inaccurate system parameters, a new kernel-based regression learning method is provided. In addition, Policy learning by Weighting Exploration with the Returns (PoWER and Return Weighted Regression (RWR are utilized to learn the appropriate parameters for adaptive elements in order to cancel the effect of external disturbance. Furthermore, numerical simulations under several conditions are performed, and the ability of adaptive trajectory-tracking control with reinforcement learning are demonstrated.

  9. STDP-based behavior learning on the TriBot robot

    Science.gov (United States)

    Arena, P.; De Fiore, S.; Patané, L.; Pollino, M.; Ventura, C.

    2009-05-01

    This paper describes a correlation-based navigation algorithm, based on an unsupervised learning paradigm for spiking neural networks, called Spike Timing Dependent Plasticity (STDP). This algorithm was implemented on a new bio-inspired hybrid mini-robot called TriBot to learn and increase its behavioral capabilities. In fact correlation based algorithms have been found to explain many basic behaviors in simple animals. The main interesting consequence of STDP is that the system is able to learn high-level sensor features, based on a set of basic reflexes, depending on some low-level sensor inputs. TriBot is composed of 3 modules, the first two being identical and inspired by the Whegs hybrid robot. The peculiar characteristics of the robot consists in the innovative shape of the three-spoke appendages that allow to increase stability of the structure. The last module is composed of two standard legs with 3 degrees of freedom each. Thanks to the cooperation among these modules, TriBot is able to face with irregular terrains overcoming potential deadlock situations, to climb high obstacles compared to its size and to manipulate objects. Robot experiments will be reported to demonstrate the potentiality and the effectiveness of the approach.

  10. Robot Learning from Demonstration: A Task-level Planning Approach

    Directory of Open Access Journals (Sweden)

    Staffan Ekvall

    2008-09-01

    Full Text Available In this paper, we deal with the problem of learning by demonstration, task level learning and planning for robotic applications that involve object manipulation. Preprogramming robots for execution of complex domestic tasks such as setting a dinner table is of little use, since the same order of subtasks may not be conceivable in the run time due to the changed state of the world. In our approach, we aim to learn the goal of the task and use a task planner to reach the goal given different initial states of the world. For some tasks, there are underlying constraints that must be fulfille, and knowing just the final goal is not sufficient. We propose two techniques for constraint identification. In the first case, the teacher can directly instruct the system about the underlying constraints. In the second case, the constraints are identified by the robot itself based on multiple observations. The constraints are then considered in the planning phase, allowing the task to be executed without violating any of them. We evaluate our work on a real robot performing pick-and-place tasks.

  11. Enriching behavioral ecology with reinforcement learning methods.

    Science.gov (United States)

    Frankenhuis, Willem E; Panchanathan, Karthik; Barto, Andrew G

    2018-02-13

    This article focuses on the division of labor between evolution and development in solving sequential, state-dependent decision problems. Currently, behavioral ecologists tend to use dynamic programming methods to study such problems. These methods are successful at predicting animal behavior in a variety of contexts. However, they depend on a distinct set of assumptions. Here, we argue that behavioral ecology will benefit from drawing more than it currently does on a complementary collection of tools, called reinforcement learning methods. These methods allow for the study of behavior in highly complex environments, which conventional dynamic programming methods do not feasibly address. In addition, reinforcement learning methods are well-suited to studying how biological mechanisms solve developmental and learning problems. For instance, we can use them to study simple rules that perform well in complex environments. Or to investigate under what conditions natural selection favors fixed, non-plastic traits (which do not vary across individuals), cue-driven-switch plasticity (innate instructions for adaptive behavioral development based on experience), or developmental selection (the incremental acquisition of adaptive behavior based on experience). If natural selection favors developmental selection, which includes learning from environmental feedback, we can also make predictions about the design of reward systems. Our paper is written in an accessible manner and for a broad audience, though we believe some novel insights can be drawn from our discussion. We hope our paper will help advance the emerging bridge connecting the fields of behavioral ecology and reinforcement learning. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.

  12. Learning in robotic manipulation: The role of dimensionality reduction in policy search methods. Comment on "Hand synergies: Integration of robotics and neuroscience for understanding the control of biological and artificial hands" by Marco Santello et al.

    Science.gov (United States)

    Ficuciello, Fanny; Siciliano, Bruno

    2016-07-01

    learning into control naturally leads to relaxing the above requirements through the adoption of coordinated motion patterns and sensory-motor synergies as useful tools leading to a problem of reduced dimension. To this purpose, model-based control strategies relying on synergistic models of manipulation activities learned from human experience can be integrated with real-time learning from actions strategies [5]. In [6] a classification of learning strategies for robotics is provided, while the difference between imitation learning and reinforcement learning (RL) is highlighted in [7]. From recent research in the field [8,9], it seems that RL represents the future toward autonomous and intelligent robots since it provides learning capabilities as those of humans, i.e. based on exploration and trial-and-error policies. In this context, suitable policy search methods to be implemented in a synergy-based framework are to be sought in order to reduce the search space dimension while guaranteeing the convergence and efficiency of the learning algorithm.

  13. Robotic Mitral Valve Repair: The Learning Curve.

    Science.gov (United States)

    Goodman, Avi; Koprivanac, Marijan; Kelava, Marta; Mick, Stephanie L; Gillinov, A Marc; Rajeswaran, Jeevanantham; Brzezinski, Anna; Blackstone, Eugene H; Mihaljevic, Tomislav

    Adoption of robotic mitral valve surgery has been slow, likely in part because of its perceived technical complexity and a poorly understood learning curve. We sought to correlate changes in technical performance and outcome with surgeon experience in the "learning curve" part of our series. From 2006 to 2011, two surgeons undertook robotically assisted mitral valve repair in 458 patients (intent-to-treat); 404 procedures were completed entirely robotically (as-treated). Learning curves were constructed by modeling surgical sequence number semiparametrically with flexible penalized spline smoothing best-fit curves. Operative efficiency, reflecting technical performance, improved for (1) operating room time for case 1 to cases 200 (early experience) and 400 (later experience), from 414 to 364 to 321 minutes (12% and 22% decrease, respectively), (2) cardiopulmonary bypass time, from 148 to 102 to 91 minutes (31% and 39% decrease), and (3) myocardial ischemic time, from 119 to 75 to 68 minutes (37% and 43% decrease). Composite postoperative complications, reflecting safety, decreased from 17% to 6% to 2% (63% and 85% decrease). Intensive care unit stay decreased from 32 to 28 to 24 hours (13% and 25% decrease). Postoperative stay fell from 5.2 to 4.5 to 3.8 days (13% and 27% decrease). There were no in-hospital deaths. Predischarge mitral regurgitation of less than 2+, reflecting effectiveness, was achieved in 395 (97.8%), without correlation to experience; return-to-work times did not change substantially with experience. Technical efficiency of robotic mitral valve repair improves with experience and permits its safe and effective conduct.

  14. A neural network-based exploratory learning and motor planning system for co-robots

    Directory of Open Access Journals (Sweden)

    Byron V Galbraith

    2015-07-01

    Full Text Available Collaborative robots, or co-robots, are semi-autonomous robotic agents designed to work alongside humans in shared workspaces. To be effective, co-robots require the ability to respond and adapt to dynamic scenarios encountered in natural environments. One way to achieve this is through exploratory learning, or learning by doing, an unsupervised method in which co-robots are able to build an internal model for motor planning and coordination based on real-time sensory inputs. In this paper, we present an adaptive neural network-based system for co-robot control that employs exploratory learning to achieve the coordinated motor planning needed to navigate toward, reach for, and grasp distant objects. To validate this system we used the 11-degrees-of-freedom RoPro Calliope mobile robot. Through motor babbling of its wheels and arm, the Calliope learned how to relate visual and proprioceptive information to achieve hand-eye-body coordination. By continually evaluating sensory inputs and externally provided goal directives, the Calliope was then able to autonomously select the appropriate wheel and joint velocities needed to perform its assigned task, such as following a moving target or retrieving an indicated object.

  15. A neural network-based exploratory learning and motor planning system for co-robots.

    Science.gov (United States)

    Galbraith, Byron V; Guenther, Frank H; Versace, Massimiliano

    2015-01-01

    Collaborative robots, or co-robots, are semi-autonomous robotic agents designed to work alongside humans in shared workspaces. To be effective, co-robots require the ability to respond and adapt to dynamic scenarios encountered in natural environments. One way to achieve this is through exploratory learning, or "learning by doing," an unsupervised method in which co-robots are able to build an internal model for motor planning and coordination based on real-time sensory inputs. In this paper, we present an adaptive neural network-based system for co-robot control that employs exploratory learning to achieve the coordinated motor planning needed to navigate toward, reach for, and grasp distant objects. To validate this system we used the 11-degrees-of-freedom RoPro Calliope mobile robot. Through motor babbling of its wheels and arm, the Calliope learned how to relate visual and proprioceptive information to achieve hand-eye-body coordination. By continually evaluating sensory inputs and externally provided goal directives, the Calliope was then able to autonomously select the appropriate wheel and joint velocities needed to perform its assigned task, such as following a moving target or retrieving an indicated object.

  16. How we learn to make decisions: rapid propagation of reinforcement learning prediction errors in humans.

    Science.gov (United States)

    Krigolson, Olav E; Hassall, Cameron D; Handy, Todd C

    2014-03-01

    Our ability to make decisions is predicated upon our knowledge of the outcomes of the actions available to us. Reinforcement learning theory posits that actions followed by a reward or punishment acquire value through the computation of prediction errors-discrepancies between the predicted and the actual reward. A multitude of neuroimaging studies have demonstrated that rewards and punishments evoke neural responses that appear to reflect reinforcement learning prediction errors [e.g., Krigolson, O. E., Pierce, L. J., Holroyd, C. B., & Tanaka, J. W. Learning to become an expert: Reinforcement learning and the acquisition of perceptual expertise. Journal of Cognitive Neuroscience, 21, 1833-1840, 2009; Bayer, H. M., & Glimcher, P. W. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47, 129-141, 2005; O'Doherty, J. P. Reward representations and reward-related learning in the human brain: Insights from neuroimaging. Current Opinion in Neurobiology, 14, 769-776, 2004; Holroyd, C. B., & Coles, M. G. H. The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109, 679-709, 2002]. Here, we used the brain ERP technique to demonstrate that not only do rewards elicit a neural response akin to a prediction error but also that this signal rapidly diminished and propagated to the time of choice presentation with learning. Specifically, in a simple, learnable gambling task, we show that novel rewards elicited a feedback error-related negativity that rapidly decreased in amplitude with learning. Furthermore, we demonstrate the existence of a reward positivity at choice presentation, a previously unreported ERP component that has a similar timing and topography as the feedback error-related negativity that increased in amplitude with learning. The pattern of results we observed mirrored the output of a computational model that we implemented to compute reward

  17. What pupils can learn from working with robotic direct manipulation environments

    NARCIS (Netherlands)

    Lou Slangen; Hanno van Keulen; Koeno Gravemeijer

    2011-01-01

    This study investigates what pupils aged 10-12 can learn from working with robots, assuming that understanding robotics is a sign of technological literacy. We conducted cognitive and conceptual analysis to develop a frame of reference for determining pupils' understanding of robotics. Four

  18. What pupils can learn from working with robotic direct manipulation environments

    NARCIS (Netherlands)

    Lou Slangen; Hanno van Keulen; Koeno Gravemeijer

    2010-01-01

    This study investigates what pupils aged 10-12 can learn from working with robots, assuming that understanding robotics is a sign of technological literacy. We conducted cognitive and conceptual analysis to develop a frame of reference for determining pupils' understanding of robotics. Four

  19. Flow Navigation by Smart Microswimmers via Reinforcement Learning

    Science.gov (United States)

    Colabrese, Simona; Biferale, Luca; Celani, Antonio; Gustavsson, Kristian

    2017-11-01

    We have numerically modeled active particles which are able to acquire some limited knowledge of the fluid environment from simple mechanical cues and exert a control on their preferred steering direction. We show that those swimmers can learn effective strategies just by experience, using a reinforcement learning algorithm. As an example, we focus on smart gravitactic swimmers. These are active particles whose task is to reach the highest altitude within some time horizon, exploiting the underlying flow whenever possible. The reinforcement learning algorithm allows particles to learn effective strategies even in difficult situations when, in the absence of control, they would end up being trapped by flow structures. These strategies are highly nontrivial and cannot be easily guessed in advance. This work paves the way towards the engineering of smart microswimmers that solve difficult navigation problems. ERC AdG NewTURB 339032.

  20. The drift diffusion model as the choice rule in reinforcement learning.

    Science.gov (United States)

    Pedersen, Mads Lund; Frank, Michael J; Biele, Guido

    2017-08-01

    Current reinforcement-learning models often assume simplified decision processes that do not fully reflect the dynamic complexities of choice processes. Conversely, sequential-sampling models of decision making account for both choice accuracy and response time, but assume that decisions are based on static decision values. To combine these two computational models of decision making and learning, we implemented reinforcement-learning models in which the drift diffusion model describes the choice process, thereby capturing both within- and across-trial dynamics. To exemplify the utility of this approach, we quantitatively fit data from a common reinforcement-learning paradigm using hierarchical Bayesian parameter estimation, and compared model variants to determine whether they could capture the effects of stimulant medication in adult patients with attention-deficit hyperactivity disorder (ADHD). The model with the best relative fit provided a good description of the learning process, choices, and response times. A parameter recovery experiment showed that the hierarchical Bayesian modeling approach enabled accurate estimation of the model parameters. The model approach described here, using simultaneous estimation of reinforcement-learning and drift diffusion model parameters, shows promise for revealing new insights into the cognitive and neural mechanisms of learning and decision making, as well as the alteration of such processes in clinical groups.

  1. Cognitive-Developmental Learning for a Humanoid Robot: A Caregiver's Gift

    National Research Council Canada - National Science Library

    Arsenio, Artur M

    2004-01-01

    The goal of this work is to build a cognitive system for the humanoid robot, Cog, that exploits human caregivers as catalysts to perceive and learn about actions, objects, scenes, people, and the robot itself...

  2. A Combination of Machine Learning and Cerebellar Models for the Motor Control and Learning of a Modular Robot

    DEFF Research Database (Denmark)

    Baira Ojeda, Ismael; Tolu, Silvia; Pacheco, Moises

    2017-01-01

    We scaled up a bio-inspired control architecture for the motor control and motor learning of a real modular robot. In our approach, the Locally Weighted Projection Regression algorithm (LWPR) and a cerebellar microcircuit coexist, forming a Unit Learning Machine. The LWPR optimizes the input space...... and learns the internal model of a single robot module to command the robot to follow a desired trajectory with its end-effector. The cerebellar microcircuit refines the LWPR output delivering corrective commands. We contrasted distinct cerebellar circuits including analytical models and spiking models...

  3. Embedded Incremental Feature Selection for Reinforcement Learning

    Science.gov (United States)

    2012-05-01

    Prior to this work, feature selection for reinforce- ment learning has focused on linear value function ap- proximation ( Kolter and Ng, 2009; Parr et al...InProceed- ings of the the 23rd International Conference on Ma- chine Learning, pages 449–456. Kolter , J. Z. and Ng, A. Y. (2009). Regularization and feature

  4. Flexible Heuristic Dynamic Programming for Reinforcement Learning in Quadrotors

    NARCIS (Netherlands)

    Helmer, Alexander; de Visser, C.C.; van Kampen, E.

    2018-01-01

    Reinforcement learning is a paradigm for learning decision-making tasks from interaction with the environment. Function approximators solve a part of the curse of dimensionality when learning in high-dimensional state and/or action spaces. It can be a time-consuming process to learn a good policy in

  5. Working Memory and Reinforcement Schedule Jointly Determine Reinforcement Learning in Children: Potential Implications for Behavioral Parent Training

    Directory of Open Access Journals (Sweden)

    Elien Segers

    2018-03-01

    Full Text Available Introduction: Behavioral Parent Training (BPT is often provided for childhood psychiatric disorders. These disorders have been shown to be associated with working memory impairments. BPT is based on operant learning principles, yet how operant principles shape behavior (through the partial reinforcement (PRF extinction effect, i.e., greater resistance to extinction that is created when behavior is reinforced partially rather than continuously and the potential role of working memory therein is scarcely studied in children. This study explored the PRF extinction effect and the role of working memory therein using experimental tasks in typically developing children.Methods: Ninety-seven children (age 6–10 completed a working memory task and an operant learning task, in which children acquired a response-sequence rule under either continuous or PRF (120 trials, followed by an extinction phase (80 trials. Data of 88 children were used for analysis.Results: The PRF extinction effect was confirmed: We observed slower acquisition and extinction in the PRF condition as compared to the continuous reinforcement (CRF condition. Working memory was negatively related to acquisition but not extinction performance.Conclusion: Both reinforcement contingencies and working memory relate to acquisition performance. Potential implications for BPT are that decreasing working memory load may enhance the chance of optimally learning through reinforcement.

  6. Humanoid Cognitive Robots That Learn by Imitating: Implications for Consciousness Studies

    Directory of Open Access Journals (Sweden)

    James A. Reggia

    2018-01-01

    Full Text Available While the concept of a conscious machine is intriguing, producing such a machine remains controversial and challenging. Here, we describe how our work on creating a humanoid cognitive robot that learns to perform tasks via imitation learning relates to this issue. Our discussion is divided into three parts. First, we summarize our previous framework for advancing the understanding of the nature of phenomenal consciousness. This framework is based on identifying computational correlates of consciousness. Second, we describe a cognitive robotic system that we recently developed that learns to perform tasks by imitating human-provided demonstrations. This humanoid robot uses cause–effect reasoning to infer a demonstrator’s intentions in performing a task, rather than just imitating the observed actions verbatim. In particular, its cognitive components center on top-down control of a working memory that retains the explanatory interpretations that the robot constructs during learning. Finally, we describe our ongoing work that is focused on converting our robot’s imitation learning cognitive system into purely neurocomputational form, including both its low-level cognitive neuromotor components, its use of working memory, and its causal reasoning mechanisms. Based on our initial results, we argue that the top-down cognitive control of working memory, and in particular its gating mechanisms, is an important potential computational correlate of consciousness in humanoid robots. We conclude that developing high-level neurocognitive control systems for cognitive robots and using them to search for computational correlates of consciousness provides an important approach to advancing our understanding of consciousness, and that it provides a credible and achievable route to ultimately developing a phenomenally conscious machine.

  7. Learning probabilistic features for robotic navigation using laser sensors.

    Directory of Open Access Journals (Sweden)

    Fidel Aznar

    Full Text Available SLAM is a popular task used by robots and autonomous vehicles to build a map of an unknown environment and, at the same time, to determine their location within the map. This paper describes a SLAM-based, probabilistic robotic system able to learn the essential features of different parts of its environment. Some previous SLAM implementations had computational complexities ranging from O(Nlog(N to O(N(2, where N is the number of map features. Unlike these methods, our approach reduces the computational complexity to O(N by using a model to fuse the information from the sensors after applying the Bayesian paradigm. Once the training process is completed, the robot identifies and locates those areas that potentially match the sections that have been previously learned. After the training, the robot navigates and extracts a three-dimensional map of the environment using a single laser sensor. Thus, it perceives different sections of its world. In addition, in order to make our system able to be used in a low-cost robot, low-complexity algorithms that can be easily implemented on embedded processors or microcontrollers are used.

  8. Learning probabilistic features for robotic navigation using laser sensors.

    Science.gov (United States)

    Aznar, Fidel; Pujol, Francisco A; Pujol, Mar; Rizo, Ramón; Pujol, María-José

    2014-01-01

    SLAM is a popular task used by robots and autonomous vehicles to build a map of an unknown environment and, at the same time, to determine their location within the map. This paper describes a SLAM-based, probabilistic robotic system able to learn the essential features of different parts of its environment. Some previous SLAM implementations had computational complexities ranging from O(Nlog(N)) to O(N(2)), where N is the number of map features. Unlike these methods, our approach reduces the computational complexity to O(N) by using a model to fuse the information from the sensors after applying the Bayesian paradigm. Once the training process is completed, the robot identifies and locates those areas that potentially match the sections that have been previously learned. After the training, the robot navigates and extracts a three-dimensional map of the environment using a single laser sensor. Thus, it perceives different sections of its world. In addition, in order to make our system able to be used in a low-cost robot, low-complexity algorithms that can be easily implemented on embedded processors or microcontrollers are used.

  9. Speeding up the learning of robot kinematics through function decomposition.

    Science.gov (United States)

    Ruiz de Angulo, Vicente; Torras, Carme

    2005-11-01

    The main drawback of using neural networks or other example-based learning procedures to approximate the inverse kinematics (IK) of robot arms is the high number of training samples (i.e., robot movements) required to attain an acceptable precision. We propose here a trick, valid for most industrial robots, that greatly reduces the number of movements needed to learn or relearn the IK to a given accuracy. This trick consists in expressing the IK as a composition of learnable functions, each having half the dimensionality of the original mapping. Off-line and on-line training schemes to learn these component functions are also proposed. Experimental results obtained by using nearest neighbors and parameterized self-organizing map, with and without the decomposition, show that the time savings granted by the proposed scheme grow polynomially with the precision required.

  10. What Pupils Can Learn from Working with Robotic Direct Manipulation Environments

    Science.gov (United States)

    Slangen, Lou; van Keulen, Hanno; Gravemeijer, Koeno

    2011-01-01

    This study investigates what pupils aged 10-12 can learn from working with robots, assuming that understanding robotics is a sign of technological literacy. We conducted cognitive and conceptual analysis to develop a frame of reference for determining pupils' understanding of robotics. Four perspectives were distinguished with increasing…

  11. A Contest-Oriented Project for Learning Intelligent Mobile Robots

    Science.gov (United States)

    Huang, Hsin-Hsiung; Su, Juing-Huei; Lee, Chyi-Shyong

    2013-01-01

    A contest-oriented project for undergraduate students to learn implementation skills and theories related to intelligent mobile robots is presented in this paper. The project, related to Micromouse, Robotrace (Robotrace is the title of Taiwanese and Japanese robot races), and line-maze contests was developed by the embedded control system research…

  12. Systems control with generalized probabilistic fuzzy-reinforcement learning

    NARCIS (Netherlands)

    Hinojosa, J.; Nefti, S.; Kaymak, U.

    2011-01-01

    Reinforcement learning (RL) is a valuable learning method when the systems require a selection of control actions whose consequences emerge over long periods for which input-output data are not available. In most combinations of fuzzy systems and RL, the environment is considered to be

  13. Efficient abstraction selection in reinforcement learning

    NARCIS (Netherlands)

    Seijen, H. van; Whiteson, S.; Kester, L.

    2013-01-01

    This paper introduces a novel approach for abstraction selection in reinforcement learning problems modelled as factored Markov decision processes (MDPs), for which a state is described via a set of state components. In abstraction selection, an agent must choose an abstraction from a set of

  14. Task path planning, scheduling and learning for free-ranging robot systems

    Science.gov (United States)

    Wakefield, G. Steve

    1987-01-01

    The development of robotics applications for space operations is often restricted by the limited movement available to guided robots. Free ranging robots can offer greater flexibility than physically guided robots in these applications. Presented here is an object oriented approach to path planning and task scheduling for free-ranging robots that allows the dynamic determination of paths based on the current environment. The system also provides task learning for repetitive jobs. This approach provides a basis for the design of free-ranging robot systems which are adaptable to various environments and tasks.

  15. Reinforcement Learning in the Game of Othello: Learning Against a Fixed Opponent and Learning from Self-Play

    NARCIS (Netherlands)

    van der Ree, Michiel; Wiering, Marco

    2013-01-01

    This paper compares three strategies in using reinforcement learning algorithms to let an artificial agent learnto play the game of Othello. The three strategies that are compared are: Learning by self-play, learning from playing against a fixed opponent, and learning from playing against a fixed

  16. Adolescent-specific patterns of behavior and neural activity during social reinforcement learning.

    Science.gov (United States)

    Jones, Rebecca M; Somerville, Leah H; Li, Jian; Ruberry, Erika J; Powers, Alisa; Mehta, Natasha; Dyke, Jonathan; Casey, B J

    2014-06-01

    Humans are sophisticated social beings. Social cues from others are exceptionally salient, particularly during adolescence. Understanding how adolescents interpret and learn from variable social signals can provide insight into the observed shift in social sensitivity during this period. The present study tested 120 participants between the ages of 8 and 25 years on a social reinforcement learning task where the probability of receiving positive social feedback was parametrically manipulated. Seventy-eight of these participants completed the task during fMRI scanning. Modeling trial-by-trial learning, children and adults showed higher positive learning rates than did adolescents, suggesting that adolescents demonstrated less differentiation in their reaction times for peers who provided more positive feedback. Forming expectations about receiving positive social reinforcement correlated with neural activity within the medial prefrontal cortex and ventral striatum across age. Adolescents, unlike children and adults, showed greater insular activity during positive prediction error learning and increased activity in the supplementary motor cortex and the putamen when receiving positive social feedback regardless of the expected outcome, suggesting that peer approval may motivate adolescents toward action. While different amounts of positive social reinforcement enhanced learning in children and adults, all positive social reinforcement equally motivated adolescents. Together, these findings indicate that sensitivity to peer approval during adolescence goes beyond simple reinforcement theory accounts and suggest possible explanations for how peers may motivate adolescent behavior.

  17. Learning feedforward controller for a mobile robot vehicle

    NARCIS (Netherlands)

    Starrenburg, J.G.; Starrenburg, J.G.; van Luenen, W.T.C.; van Luenen, W.T.C.; Oelen, W.; Oelen, W.; van Amerongen, J.

    1996-01-01

    This paper describes the design and realisation of an on-line learning posetracking controller for a three-wheeled mobile robot vehicle. The controller consists of two components. The first is a constant-gain feedback component, designed on the basis of a second-order model. The second is a learning

  18. Time representation in reinforcement learning models of the basal ganglia

    Directory of Open Access Journals (Sweden)

    Samuel Joseph Gershman

    2014-01-01

    Full Text Available Reinforcement learning models have been influential in understanding many aspects of basal ganglia function, from reward prediction to action selection. Time plays an important role in these models, but there is still no theoretical consensus about what kind of time representation is used by the basal ganglia. We review several theoretical accounts and their supporting evidence. We then discuss the relationship between reinforcement learning models and the timing mechanisms that have been attributed to the basal ganglia. We hypothesize that a single computational system may underlie both reinforcement learning and interval timing—the perception of duration in the range of seconds to hours. This hypothesis, which extends earlier models by incorporating a time-sensitive action selection mechanism, may have important implications for understanding disorders like Parkinson's disease in which both decision making and timing are impaired.

  19. The Development of a Robot-Based Learning Companion: A User-Centered Design Approach

    Science.gov (United States)

    Hsieh, Yi-Zeng; Su, Mu-Chun; Chen, Sherry Y.; Chen, Gow-Dong

    2015-01-01

    A computer-vision-based method is widely employed to support the development of a variety of applications. In this vein, this study uses a computer-vision-based method to develop a playful learning system, which is a robot-based learning companion named RobotTell. Unlike existing playful learning systems, a user-centered design (UCD) approach is…

  20. Safe Exploration of State and Action Spaces in Reinforcement Learning

    OpenAIRE

    Garcia, Javier; Fernandez, Fernando

    2014-01-01

    In this paper, we consider the important problem of safe exploration in reinforcement learning. While reinforcement learning is well-suited to domains with complex transition dynamics and high-dimensional state-action spaces, an additional challenge is posed by the need for safe and efficient exploration. Traditional exploration techniques are not particularly useful for solving dangerous tasks, where the trial and error process may lead to the selection of actions whose execution in some sta...

  1. Mobile Robot Navigation Based on Q-Learning Technique

    Directory of Open Access Journals (Sweden)

    Lazhar Khriji

    2011-03-01

    Full Text Available This paper shows how Q-learning approach can be used in a successful way to deal with the problem of mobile robot navigation. In real situations where a large number of obstacles are involved, normal Q-learning approach would encounter two major problems due to excessively large state space. First, learning the Q-values in tabular form may be infeasible because of the excessive amount of memory needed to store the table. Second, rewards in the state space may be so sparse that with random exploration they will only be discovered extremely slowly. In this paper, we propose a navigation approach for mobile robot, in which the prior knowledge is used within Q-learning. We address the issue of individual behavior design using fuzzy logic. The strategy of behaviors based navigation reduces the complexity of the navigation problem by dividing them in small actions easier for design and implementation. The Q-Learning algorithm is applied to coordinate between these behaviors, which make a great reduction in learning convergence times. Simulation and experimental results confirm the convergence to the desired results in terms of saved time and computational resources.

  2. Adversarial Reinforcement Learning in a Cyber Security Simulation}

    OpenAIRE

    Elderman, Richard; Pater, Leon; Thie, Albert; Drugan, Madalina; Wiering, Marco

    2017-01-01

    This paper focuses on cyber-security simulations in networks modeled as a Markov game with incomplete information and stochastic elements. The resulting game is an adversarial sequential decision making problem played with two agents, the attacker and defender. The two agents pit one reinforcement learning technique, like neural networks, Monte Carlo learning and Q-learning, against each other and examine their effectiveness against learning opponents. The results showed that Monte Carlo lear...

  3. Tackling Error Propagation through Reinforcement Learning: A Case of Greedy Dependency Parsing

    NARCIS (Netherlands)

    Le, M.N.; Fokkens, A.S.

    Error propagation is a common problem in NLP. Reinforcement learning explores erroneous states during training and can therefore be more robust when mistakes are made early in a process. In this paper, we apply reinforcement learning to greedy dependency parsing which is known to suffer from error

  4. The Computational Development of Reinforcement Learning during Adolescence.

    Directory of Open Access Journals (Sweden)

    Stefano Palminteri

    2016-06-01

    Full Text Available Adolescence is a period of life characterised by changes in learning and decision-making. Learning and decision-making do not rely on a unitary system, but instead require the coordination of different cognitive processes that can be mathematically formalised as dissociable computational modules. Here, we aimed to trace the developmental time-course of the computational modules responsible for learning from reward or punishment, and learning from counterfactual feedback. Adolescents and adults carried out a novel reinforcement learning paradigm in which participants learned the association between cues and probabilistic outcomes, where the outcomes differed in valence (reward versus punishment and feedback was either partial or complete (either the outcome of the chosen option only, or the outcomes of both the chosen and unchosen option, were displayed. Computational strategies changed during development: whereas adolescents' behaviour was better explained by a basic reinforcement learning algorithm, adults' behaviour integrated increasingly complex computational features, namely a counterfactual learning module (enabling enhanced performance in the presence of complete feedback and a value contextualisation module (enabling symmetrical reward and punishment learning. Unlike adults, adolescent performance did not benefit from counterfactual (complete feedback. In addition, while adults learned symmetrically from both reward and punishment, adolescents learned from reward but were less likely to learn from punishment. This tendency to rely on rewards and not to consider alternative consequences of actions might contribute to our understanding of decision-making in adolescence.

  5. Learning Robotics in a Science Museum Theatre Play: Investigation of Learning Outcomes, Contexts and Experiences

    Science.gov (United States)

    Peleg, Ran; Baram-Tsabari, Ayelet

    2017-12-01

    Theatre is often introduced into science museums to enhance visitor experience. While learning in museums exhibitions received considerable research attention, learning from museum theatre has not. The goal of this exploratory study was to investigate the potential educational role of a science museum theatre play. The study aimed to investigate (1) cognitive learning outcomes of the play, (2) how these outcomes interact with different viewing contexts and (3) experiential learning outcomes through the theatrical experience. The play `Robot and I', addressing principles in robotics, was commissioned by a science museum. Data consisted of 391 questionnaires and interviews with 47 children and 20 parents. Findings indicate that explicit but not implicit learning goals were decoded successfully. There was little synergy between learning outcomes of the play and an exhibition on robotics, demonstrating the effect of two different physical contexts. Interview data revealed that prior knowledge, experience and interest played a major role in children's understanding of the play. Analysis of the theatrical experience showed that despite strong identification with the child protagonist, children often doubted the protagonist's knowledge jeopardizing integration of scientific content. The study extends the empirical knowledge and theoretical thinking on museum theatre to better support claims of its virtues and respond to their criticism.

  6. Reinforcement learning agents providing advice in complex video games

    Science.gov (United States)

    Taylor, Matthew E.; Carboni, Nicholas; Fachantidis, Anestis; Vlahavas, Ioannis; Torrey, Lisa

    2014-01-01

    This article introduces a teacher-student framework for reinforcement learning, synthesising and extending material that appeared in conference proceedings [Torrey, L., & Taylor, M. E. (2013)]. Teaching on a budget: Agents advising agents in reinforcement learning. {Proceedings of the international conference on autonomous agents and multiagent systems}] and in a non-archival workshop paper [Carboni, N., &Taylor, M. E. (2013, May)]. Preliminary results for 1 vs. 1 tactics in StarCraft. {Proceedings of the adaptive and learning agents workshop (at AAMAS-13)}]. In this framework, a teacher agent instructs a student agent by suggesting actions the student should take as it learns. However, the teacher may only give such advice a limited number of times. We present several novel algorithms that teachers can use to budget their advice effectively, and we evaluate them in two complex video games: StarCraft and Pac-Man. Our results show that the same amount of advice, given at different moments, can have different effects on student learning, and that teachers can significantly affect student learning even when students use different learning methods and state representations.

  7. A multiplicative reinforcement learning model capturing learning dynamics and interindividual variability in mice

    OpenAIRE

    Bathellier, Brice; Tee, Sui Poh; Hrovat, Christina; Rumpel, Simon

    2013-01-01

    Learning speed can strongly differ across individuals. This is seen in humans and animals. Here, we measured learning speed in mice performing a discrimination task and developed a theoretical model based on the reinforcement learning framework to account for differences between individual mice. We found that, when using a multiplicative learning rule, the starting connectivity values of the model strongly determine the shape of learning curves. This is in contrast to current learning models ...

  8. Unimodal Learning Enhances Crossmodal Learning in Robotic Audio-Visual Tracking

    DEFF Research Database (Denmark)

    Shaikh, Danish; Bodenhagen, Leon; Manoonpong, Poramate

    2017-01-01

    Crossmodal sensory integration is a fundamental feature of the brain that aids in forming an coherent and unified representation of observed events in the world. Spatiotemporally correlated sensory stimuli brought about by rich sensorimotor experiences drive the development of crossmodal integrat...... a non-holonomic robotic agent towards a moving audio-visual target. Simulation results demonstrate that unimodal learning enhances crossmodal learning and improves both the overall accuracy and precision of multisensory orientation response....

  9. Unimodal Learning Enhances Crossmodal Learning in Robotic Audio-Visual Tracking

    DEFF Research Database (Denmark)

    Shaikh, Danish; Bodenhagen, Leon; Manoonpong, Poramate

    2018-01-01

    Crossmodal sensory integration is a fundamental feature of the brain that aids in forming an coherent and unified representation of observed events in the world. Spatiotemporally correlated sensory stimuli brought about by rich sensorimotor experiences drive the development of crossmodal integrat...... a non-holonomic robotic agent towards a moving audio-visual target. Simulation results demonstrate that unimodal learning enhances crossmodal learning and improves both the overall accuracy and precision of multisensory orientation response....

  10. Self-supervised learning as an enabling technology for future space exploration robots: ISS experiments on monocular distance learning

    Science.gov (United States)

    van Hecke, Kevin; de Croon, Guido C. H. E.; Hennes, Daniel; Setterfield, Timothy P.; Saenz-Otero, Alvar; Izzo, Dario

    2017-11-01

    Although machine learning holds an enormous promise for autonomous space robots, it is currently not employed because of the inherent uncertain outcome of learning processes. In this article we investigate a learning mechanism, Self-Supervised Learning (SSL), which is very reliable and hence an important candidate for real-world deployment even on safety-critical systems such as space robots. To demonstrate this reliability, we introduce a novel SSL setup that allows a stereo vision equipped robot to cope with the failure of one of its cameras. The setup learns to estimate average depth using a monocular image, by using the stereo vision depths from the past as trusted ground truth. We present preliminary results from an experiment on the International Space Station (ISS) performed with the MIT/NASA SPHERES VERTIGO satellite. The presented experiments were performed on October 8th, 2015 on board the ISS. The main goals were (1) data gathering, and (2) navigation based on stereo vision. First the astronaut Kimiya Yui moved the satellite around the Japanese Experiment Module to gather stereo vision data for learning. Subsequently, the satellite freely explored the space in the module based on its (trusted) stereo vision system and a pre-programmed exploration behavior, while simultaneously performing the self-supervised learning of monocular depth estimation on board. The two main goals were successfully achieved, representing the first online learning robotic experiments in space. These results lay the groundwork for a follow-up experiment in which the satellite will use the learned single-camera depth estimation for autonomous exploration in the ISS, and are an advancement towards future space robots that continuously improve their navigation capabilities over time, even in harsh and completely unknown space environments.

  11. An iterative learning controller for nonholonomic mobile robots

    International Nuclear Information System (INIS)

    Oriolo, G.; Panzieri, S.; Ulivi, G.

    1998-01-01

    The authors present an iterative learning controller that applies to nonholonomic mobile robots, as well as other systems that can be put in chained form. The learning algorithm exploits the fact that chained-form. The learning algorithm exploits the fact that chained-form systems are linear under piecewise-constant inputs. The proposed control scheme requires the execution of a small number of experiments to drive the system to the desired state in finite time, with nice convergence and robustness properties with respect to modeling inaccuracies as well as disturbances. To avoid the necessity of exactly reinitializing the system at each iteration, the basic method is modified so as to obtain a cyclic controller, by which the system is cyclically steered through an arbitrary sequence of states. As a case study, a carlike mobile robot is considered. Both simulation and experimental results are reported to show the performance of the method

  12. Design based action research in the world of robot technology and learning

    DEFF Research Database (Denmark)

    Majgaard, Gunver

    2010-01-01

    Why is design based action research method important in the world of robot technology and learning? The article explores how action research and interaction-driven design can be used in development of educational robot technological tools. The actual case is the development of “Fraction Battle......” which is about learning fractions in primary school. The technology is based on robot technology. An outdoor digital playground is taken into to the classroom and then redesigned. The article argues for interaction design takes precedence to technology or goal driven design for development...... of educational tools....

  13. Manifold traversing as a model for learning control of autonomous robots

    Science.gov (United States)

    Szakaly, Zoltan F.; Schenker, Paul S.

    1992-01-01

    This paper describes a recipe for the construction of control systems that support complex machines such as multi-limbed/multi-fingered robots. The robot has to execute a task under varying environmental conditions and it has to react reasonably when previously unknown conditions are encountered. Its behavior should be learned and/or trained as opposed to being programmed. The paper describes one possible method for organizing the data that the robot has learned by various means. This framework can accept useful operator input even if it does not fully specify what to do, and can combine knowledge from autonomous, operator assisted and programmed experiences.

  14. Reinforcement Learning Explains Conditional Cooperation and Its Moody Cousin.

    Directory of Open Access Journals (Sweden)

    Takahiro Ezaki

    2016-07-01

    Full Text Available Direct reciprocity, or repeated interaction, is a main mechanism to sustain cooperation under social dilemmas involving two individuals. For larger groups and networks, which are probably more relevant to understanding and engineering our society, experiments employing repeated multiplayer social dilemma games have suggested that humans often show conditional cooperation behavior and its moody variant. Mechanisms underlying these behaviors largely remain unclear. Here we provide a proximate account for this behavior by showing that individuals adopting a type of reinforcement learning, called aspiration learning, phenomenologically behave as conditional cooperator. By definition, individuals are satisfied if and only if the obtained payoff is larger than a fixed aspiration level. They reinforce actions that have resulted in satisfactory outcomes and anti-reinforce those yielding unsatisfactory outcomes. The results obtained in the present study are general in that they explain extant experimental results obtained for both so-called moody and non-moody conditional cooperation, prisoner's dilemma and public goods games, and well-mixed groups and networks. Different from the previous theory, individuals are assumed to have no access to information about what other individuals are doing such that they cannot explicitly use conditional cooperation rules. In this sense, myopic aspiration learning in which the unconditional propensity of cooperation is modulated in every discrete time step explains conditional behavior of humans. Aspiration learners showing (moody conditional cooperation obeyed a noisy GRIM-like strategy. This is different from the Pavlov, a reinforcement learning strategy promoting mutual cooperation in two-player situations.

  15. Reinforcement Learning for a New Piano Mover

    Directory of Open Access Journals (Sweden)

    Yuko Ishiwaka

    2005-08-01

    Full Text Available We attempt to achieve corporative behavior of autonomous decentralized agents constructed via Q-Learning, which is a type of reinforcement learning. As such, in the present paper, we examine the piano mover's problem. We propose a multi-agent architecture that has a training agent, learning agents and intermediate agent. Learning agents are heterogeneous and can communicate with each other. The movement of an object with three kinds of agent depends on the composition of the actions of the learning agents. By learning its own shape through the learning agents, avoidance of obstacles by the object is expected. We simulate the proposed method in a two-dimensional continuous world. Results obtained in the present investigation reveal the effectiveness of the proposed method.

  16. Place preference and vocal learning rely on distinct reinforcers in songbirds.

    Science.gov (United States)

    Murdoch, Don; Chen, Ruidong; Goldberg, Jesse H

    2018-04-30

    In reinforcement learning (RL) agents are typically tasked with maximizing a single objective function such as reward. But it remains poorly understood how agents might pursue distinct objectives at once. In machines, multiobjective RL can be achieved by dividing a single agent into multiple sub-agents, each of which is shaped by agent-specific reinforcement, but it remains unknown if animals adopt this strategy. Here we use songbirds to test if navigation and singing, two behaviors with distinct objectives, can be differentially reinforced. We demonstrate that strobe flashes aversively condition place preference but not song syllables. Brief noise bursts aversively condition song syllables but positively reinforce place preference. Thus distinct behavior-generating systems, or agencies, within a single animal can be shaped by correspondingly distinct reinforcement signals. Our findings suggest that spatially segregated vocal circuits can solve a credit assignment problem associated with multiobjective learning.

  17. A reward optimization method based on action subrewards in hierarchical reinforcement learning.

    Science.gov (United States)

    Fu, Yuchen; Liu, Quan; Ling, Xionghong; Cui, Zhiming

    2014-01-01

    Reinforcement learning (RL) is one kind of interactive learning methods. Its main characteristics are "trial and error" and "related reward." A hierarchical reinforcement learning method based on action subrewards is proposed to solve the problem of "curse of dimensionality," which means that the states space will grow exponentially in the number of features and low convergence speed. The method can reduce state spaces greatly and choose actions with favorable purpose and efficiency so as to optimize reward function and enhance convergence speed. Apply it to the online learning in Tetris game, and the experiment result shows that the convergence speed of this algorithm can be enhanced evidently based on the new method which combines hierarchical reinforcement learning algorithm and action subrewards. The "curse of dimensionality" problem is also solved to a certain extent with hierarchical method. All the performance with different parameters is compared and analyzed as well.

  18. Joint Extraction of Entities and Relations Using Reinforcement Learning and Deep Learning

    Directory of Open Access Journals (Sweden)

    Yuntian Feng

    2017-01-01

    Full Text Available We use both reinforcement learning and deep learning to simultaneously extract entities and relations from unstructured texts. For reinforcement learning, we model the task as a two-step decision process. Deep learning is used to automatically capture the most important information from unstructured texts, which represent the state in the decision process. By designing the reward function per step, our proposed method can pass the information of entity extraction to relation extraction and obtain feedback in order to extract entities and relations simultaneously. Firstly, we use bidirectional LSTM to model the context information, which realizes preliminary entity extraction. On the basis of the extraction results, attention based method can represent the sentences that include target entity pair to generate the initial state in the decision process. Then we use Tree-LSTM to represent relation mentions to generate the transition state in the decision process. Finally, we employ Q-Learning algorithm to get control policy π in the two-step decision process. Experiments on ACE2005 demonstrate that our method attains better performance than the state-of-the-art method and gets a 2.4% increase in recall-score.

  19. Joint Extraction of Entities and Relations Using Reinforcement Learning and Deep Learning.

    Science.gov (United States)

    Feng, Yuntian; Zhang, Hongjun; Hao, Wenning; Chen, Gang

    2017-01-01

    We use both reinforcement learning and deep learning to simultaneously extract entities and relations from unstructured texts. For reinforcement learning, we model the task as a two-step decision process. Deep learning is used to automatically capture the most important information from unstructured texts, which represent the state in the decision process. By designing the reward function per step, our proposed method can pass the information of entity extraction to relation extraction and obtain feedback in order to extract entities and relations simultaneously. Firstly, we use bidirectional LSTM to model the context information, which realizes preliminary entity extraction. On the basis of the extraction results, attention based method can represent the sentences that include target entity pair to generate the initial state in the decision process. Then we use Tree-LSTM to represent relation mentions to generate the transition state in the decision process. Finally, we employ Q -Learning algorithm to get control policy π in the two-step decision process. Experiments on ACE2005 demonstrate that our method attains better performance than the state-of-the-art method and gets a 2.4% increase in recall-score.

  20. Pleasurable music affects reinforcement learning according to the listener

    Science.gov (United States)

    Gold, Benjamin P.; Frank, Michael J.; Bogert, Brigitte; Brattico, Elvira

    2013-01-01

    Mounting evidence links the enjoyment of music to brain areas implicated in emotion and the dopaminergic reward system. In particular, dopamine release in the ventral striatum seems to play a major role in the rewarding aspect of music listening. Striatal dopamine also influences reinforcement learning, such that subjects with greater dopamine efficacy learn better to approach rewards while those with lesser dopamine efficacy learn better to avoid punishments. In this study, we explored the practical implications of musical pleasure through its ability to facilitate reinforcement learning via non-pharmacological dopamine elicitation. Subjects from a wide variety of musical backgrounds chose a pleasurable and a neutral piece of music from an experimenter-compiled database, and then listened to one or both of these pieces (according to pseudo-random group assignment) as they performed a reinforcement learning task dependent on dopamine transmission. We assessed musical backgrounds as well as typical listening patterns with the new Helsinki Inventory of Music and Affective Behaviors (HIMAB), and separately investigated behavior for the training and test phases of the learning task. Subjects with more musical experience trained better with neutral music and tested better with pleasurable music, while those with less musical experience exhibited the opposite effect. HIMAB results regarding listening behaviors and subjective music ratings indicate that these effects arose from different listening styles: namely, more affective listening in non-musicians and more analytical listening in musicians. In conclusion, musical pleasure was able to influence task performance, and the shape of this effect depended on group and individual factors. These findings have implications in affective neuroscience, neuroaesthetics, learning, and music therapy. PMID:23970875

  1. Evidence of Self-Directed Learning on a High School Robotics Team

    Directory of Open Access Journals (Sweden)

    Nathan R. Dolenc

    2014-12-01

    Full Text Available Self-directed learning is described as an individual taking the initiative to engage in a learning experience while assuming responsibility to follow through to its conclusion. Robotics competitions are examples of informal environments that can facilitate self-directed learning. This study examined how mentor involvement, student behavior, and physical workspace contributed to self-directed learning on one robotics competition team. How did mentors transfer responsibility to students? How did students respond to managing a team? Are the physical attributes of a workspace important? The mentor, student, and workplace factors captured in the research showed mentors wanting students to do the work, students assuming leadership roles, and the limited workspace having a positive effect on student productivity.

  2. Associative learning for a robot intelligence

    CERN Document Server

    Andreae, John H

    1998-01-01

    The explanation of brain functioning in terms of the association of ideas has been popular since the 17th century. Recently, however, the process of association has been dismissed as computationally inadequate by prominent cognitive scientists. In this book, a sharper definition of the term "association" is used to revive the process by showing that associative learning can indeed be computationally powerful. Within an appropriate organization, associative learning can be embodied in a robot to realize a human-like intelligence, which sets its own goals, exhibits unique unformalizable behaviou

  3. An Analysis of Stochastic Game Theory for Multiagent Reinforcement Learning

    National Research Council Canada - National Science Library

    Bowling, Michael

    2000-01-01

    .... In this paper we contribute a comprehensive presentation of the relevant techniques for solving stochastic games from both the game theory community and reinforcement learning communities. We examine the assumptions and limitations of these algorithms, and identify similarities between these algorithms, single agent reinforcement learners, and basic game theory techniques.

  4. Towards Behavior Control for Evolutionary Robot Based on RL with ENN

    Directory of Open Access Journals (Sweden)

    Jingan Yang

    2012-03-01

    Full Text Available This paper proposes a behavior-switching control strategy of anevolutionary robotics based on Artificial NeuralNetwork (ANN and Genetic Algorithms (GA. This method is able not only to construct thereinforcement learning models for autonomous robots and evolutionary robot modules thatcontrol behaviors and reinforcement learning environments, and but also to perform thebehavior-switching control and obstacle avoidance of an evolutionary robotics (ER intime-varying environments with static and moving obstacles by combining ANN and GA.The experimental results on thebasic behaviors and behavior-switching control have demonstrated that ourmethod can perform the decision-making strategy and parameters set opimization ofFNN and GA by learning and can escape successfully from the trap of a localminima and avoid \\emph{"motion deadlock" status} of humanoid soccer robotics agents,and reduce the oscillation of the planned trajectory betweenthe multiple obstacles by crossover and mutation. Some results of the proposed algorithmhave been successfully applied to our simulation humanoid robotics soccer team CIT3Dwhich won \\emph{the 1st prize} of RoboCup Championship and ChinaOpen2010 (July 2010 and \\emph{the $2^{nd}$ place}of the official RoboCup World Championship on 5-11 July, 2011 in Istanbul, Turkey.As compared with the conventional behavior network and the adaptive behavior method,the genetic encoding complexity of our algorithm is simplified, and the networkperformance and the {\\em convergence rate $\\rho$} have been greatlyimproved.

  5. Robot Competence Development by Constructive Learning

    Science.gov (United States)

    Meng, Q.; Lee, M. H.; Hinde, C. J.

    This paper presents a constructive learning approach for developing sensor-motor mapping in autonomous systems. The system’s adaptation to environment changes is discussed and three methods are proposed to deal with long term and short term changes. The proposed constructive learning allows autonomous systems to develop network topology and adjust network parameters. The approach is supported by findings from psychology and neuroscience especially during infants cognitive development at early stages. A growing radial basis function network is introduced as a computational substrate for sensory-motor mapping learning. Experiments are conducted on a robot eye/hand coordination testbed and results show the incremental development of sensory-motor mapping and its adaptation to changes such as in tool-use.

  6. Neurofeedback in Learning Disabled Children: Visual versus Auditory Reinforcement.

    Science.gov (United States)

    Fernández, Thalía; Bosch-Bayard, Jorge; Harmony, Thalía; Caballero, María I; Díaz-Comas, Lourdes; Galán, Lídice; Ricardo-Garcell, Josefina; Aubert, Eduardo; Otero-Ojeda, Gloria

    2016-03-01

    Children with learning disabilities (LD) frequently have an EEG characterized by an excess of theta and a deficit of alpha activities. NFB using an auditory stimulus as reinforcer has proven to be a useful tool to treat LD children by positively reinforcing decreases of the theta/alpha ratio. The aim of the present study was to optimize the NFB procedure by comparing the efficacy of visual (with eyes open) versus auditory (with eyes closed) reinforcers. Twenty LD children with an abnormally high theta/alpha ratio were randomly assigned to the Auditory or the Visual group, where a 500 Hz tone or a visual stimulus (a white square), respectively, was used as a positive reinforcer when the value of the theta/alpha ratio was reduced. Both groups had signs consistent with EEG maturation, but only the Auditory Group showed behavioral/cognitive improvements. In conclusion, the auditory reinforcer was more efficacious in reducing the theta/alpha ratio, and it improved the cognitive abilities more than the visual reinforcer.

  7. Reinforcement learning for optimal control of low exergy buildings

    International Nuclear Information System (INIS)

    Yang, Lei; Nagy, Zoltan; Goffin, Philippe; Schlueter, Arno

    2015-01-01

    Highlights: • Implementation of reinforcement learning control for LowEx Building systems. • Learning allows adaptation to local environment without prior knowledge. • Presentation of reinforcement learning control for real-life applications. • Discussion of the applicability for real-life situations. - Abstract: Over a third of the anthropogenic greenhouse gas (GHG) emissions stem from cooling and heating buildings, due to their fossil fuel based operation. Low exergy building systems are a promising approach to reduce energy consumption as well as GHG emissions. They consists of renewable energy technologies, such as PV, PV/T and heat pumps. Since careful tuning of parameters is required, a manual setup may result in sub-optimal operation. A model predictive control approach is unnecessarily complex due to the required model identification. Therefore, in this work we present a reinforcement learning control (RLC) approach. The studied building consists of a PV/T array for solar heat and electricity generation, as well as geothermal heat pumps. We present RLC for the PV/T array, and the full building model. Two methods, Tabular Q-learning and Batch Q-learning with Memory Replay, are implemented with real building settings and actual weather conditions in a Matlab/Simulink framework. The performance is evaluated against standard rule-based control (RBC). We investigated different neural network structures and find that some outperformed RBC already during the learning phase. Overall, every RLC strategy for PV/T outperformed RBC by over 10% after the third year. Likewise, for the full building, RLC outperforms RBC in terms of meeting the heating demand, maintaining the optimal operation temperature and compensating more effectively for ground heat. This allows to reduce engineering costs associated with the setup of these systems, as well as decrease the return-of-invest period, both of which are necessary to create a sustainable, zero-emission building

  8. Effect of Robotics-Enhanced Inquiry-Based Learning in Elementary Science Education in South Korea

    Science.gov (United States)

    Park, Jungho

    2015-01-01

    Much research has been conducted in educational robotics, a new instructional technology, for K-12 education. However, there are arguments on the effect of robotics and limited empirical evidence to investigate the impact of robotics in science learning. Also most robotics studies were carried in an informal educational setting. This study…

  9. Learning Multirobot Hose Transportation and Deployment by Distributed Round-Robin Q-Learning.

    Directory of Open Access Journals (Sweden)

    Borja Fernandez-Gauna

    Full Text Available Multi-Agent Reinforcement Learning (MARL algorithms face two main difficulties: the curse of dimensionality, and environment non-stationarity due to the independent learning processes carried out by the agents concurrently. In this paper we formalize and prove the convergence of a Distributed Round Robin Q-learning (D-RR-QL algorithm for cooperative systems. The computational complexity of this algorithm increases linearly with the number of agents. Moreover, it eliminates environment non sta tionarity by carrying a round-robin scheduling of the action selection and execution. That this learning scheme allows the implementation of Modular State-Action Vetoes (MSAV in cooperative multi-agent systems, which speeds up learning convergence in over-constrained systems by vetoing state-action pairs which lead to undesired termination states (UTS in the relevant state-action subspace. Each agent's local state-action value function learning is an independent process, including the MSAV policies. Coordination of locally optimal policies to obtain the global optimal joint policy is achieved by a greedy selection procedure using message passing. We show that D-RR-QL improves over state-of-the-art approaches, such as Distributed Q-Learning, Team Q-Learning and Coordinated Reinforcement Learning in a paradigmatic Linked Multi-Component Robotic System (L-MCRS control problem: the hose transportation task. L-MCRS are over-constrained systems with many UTS induced by the interaction of the passive linking element and the active mobile robots.

  10. Tracked robot controllers for climbing obstacles autonomously

    Science.gov (United States)

    Vincent, Isabelle

    2009-05-01

    Research in mobile robot navigation has demonstrated some success in navigating flat indoor environments while avoiding obstacles. However, the challenge of analyzing complex environments to climb obstacles autonomously has had very little success due to the complexity of the task. Unmanned ground vehicles currently exhibit simple autonomous behaviours compared to the human ability to move in the world. This paper presents the control algorithms designed for a tracked mobile robot to autonomously climb obstacles by varying its tracks configuration. Two control algorithms are proposed to solve the autonomous locomotion problem for climbing obstacles. First, a reactive controller evaluates the appropriate geometric configuration based on terrain and vehicle geometric considerations. Then, a reinforcement learning algorithm finds alternative solutions when the reactive controller gets stuck while climbing an obstacle. The methodology combines reactivity to learning. The controllers have been demonstrated in box and stair climbing simulations. The experiments illustrate the effectiveness of the proposed approach for crossing obstacles.

  11. Knowledge transfer for learning robot models via local procrustes analysis

    CSIR Research Space (South Africa)

    Makondo, N

    2015-11-01

    Full Text Available Learning of robot kinematic and dynamic models from data has attracted much interest recently as an alternative to manually defined models. However, the amount of data required to learn these models becomes large when the number of degrees...

  12. Intranasal oxytocin enhances socially-reinforced learning in rhesus monkeys

    Directory of Open Access Journals (Sweden)

    Lisa A Parr

    2014-09-01

    Full Text Available There are currently no drugs approved for the treatment of social deficits associated with autism spectrum disorders (ASD. One hypothesis for these deficits is that individuals with ASD lack the motivation to attend to social cues because those cues are not implicitly rewarding. Therefore, any drug that could enhance the rewarding quality of social stimuli could have a profound impact on the treatment of ASD, and other social disorders. Oxytocin (OT is a neuropeptide that has been effective in enhancing social cognition and social reward in humans. The present study examined the ability of OT to selectively enhance learning after social compared to nonsocial reward in rhesus monkeys, an important species for modeling the neurobiology of social behavior in humans. Monkeys were required to learn an implicit visual matching task after receiving either intranasal (IN OT or Placebo (saline. Correct trials were rewarded with the presentation of positive and negative social (play faces/threat faces or nonsocial (banana/cage locks stimuli, plus food. Incorrect trials were not rewarded. Results demonstrated a strong effect of socially-reinforced learning, monkeys’ performed significantly better when reinforced with social versus nonsocial stimuli. Additionally, socially-reinforced learning was significantly better and occurred faster after IN-OT compared to placebo treatment. Performance in the IN-OT, but not Placebo, condition was also significantly better when the reinforcement stimuli were emotionally positive compared to negative facial expressions. These data support the hypothesis that OT may function to enhance prosocial behavior in primates by increasing the rewarding quality of emotionally positive, social compared to emotionally negative or nonsocial images. These data also support the use of the rhesus monkey as a model for exploring the neurobiological basis of social behavior and its impairment.

  13. applying reinforcement learning to the weapon assignment problem

    African Journals Online (AJOL)

    ismith

    Carlo (MC) control algorithm with exploring starts (MCES), and an off-policy ..... closest to the threat should fire (that weapon also had the highest probability to ... Monte Carlo ..... “Reinforcement learning: Theory, methods and application to.

  14. Reinforcement Learning for Ramp Control: An Analysis of Learning Parameters

    Directory of Open Access Journals (Sweden)

    Chao Lu

    2016-08-01

    Full Text Available Reinforcement Learning (RL has been proposed to deal with ramp control problems under dynamic traffic conditions; however, there is a lack of sufficient research on the behaviour and impacts of different learning parameters. This paper describes a ramp control agent based on the RL mechanism and thoroughly analyzed the influence of three learning parameters; namely, learning rate, discount rate and action selection parameter on the algorithm performance. Two indices for the learning speed and convergence stability were used to measure the algorithm performance, based on which a series of simulation-based experiments were designed and conducted by using a macroscopic traffic flow model. Simulation results showed that, compared with the discount rate, the learning rate and action selection parameter made more remarkable impacts on the algorithm performance. Based on the analysis, some suggestionsabout how to select suitable parameter values that can achieve a superior performance were provided.

  15. Applying reinforcement learning to the weapon assignment problem in air defence

    CSIR Research Space (South Africa)

    Mouton, H

    2011-12-01

    Full Text Available . The techniques investigated in this article were two methods from the machine-learning subfield of reinforcement learning (RL), namely a Monte Carlo (MC) control algorithm with exploring starts (MCES), and an off-policy temporal-difference (TD) learning...

  16. The combination of appetitive and aversive reinforcers and the nature of their interaction during auditory learning.

    Science.gov (United States)

    Ilango, A; Wetzel, W; Scheich, H; Ohl, F W

    2010-03-31

    Learned changes in behavior can be elicited by either appetitive or aversive reinforcers. It is, however, not clear whether the two types of motivation, (approaching appetitive stimuli and avoiding aversive stimuli) drive learning in the same or different ways, nor is their interaction understood in situations where the two types are combined in a single experiment. To investigate this question we have developed a novel learning paradigm for Mongolian gerbils, which not only allows rewards and punishments to be presented in isolation or in combination with each other, but also can use these opposite reinforcers to drive the same learned behavior. Specifically, we studied learning of tone-conditioned hurdle crossing in a shuttle box driven by either an appetitive reinforcer (brain stimulation reward) or an aversive reinforcer (electrical footshock), or by a combination of both. Combination of the two reinforcers potentiated speed of acquisition, led to maximum possible performance, and delayed extinction as compared to either reinforcer alone. Additional experiments, using partial reinforcement protocols and experiments in which one of the reinforcers was omitted after the animals had been previously trained with the combination of both reinforcers, indicated that appetitive and aversive reinforcers operated together but acted in different ways: in this particular experimental context, punishment appeared to be more effective for initial acquisition and reward more effective to maintain a high level of conditioned responses (CRs). The results imply that learning mechanisms in problem solving were maximally effective when the initial punishment of mistakes was combined with the subsequent rewarding of correct performance. Copyright 2010 IBRO. Published by Elsevier Ltd. All rights reserved.

  17. Training and learning robotic surgery, time for a more structured approach: a systematic review

    NARCIS (Netherlands)

    Schreuder, H. W. R.; Wolswijk, R.; Zweemer, R. P.; Schijven, M. P.; Verheijen, R. H. M.

    2012-01-01

    Background Robotic assisted laparoscopic surgery is growing rapidly and there is an increasing need for a structured approach to train future robotic surgeons. Objectives To review the literature on training and learning strategies for robotic assisted laparoscopic surgery. Search strategy A

  18. A Scalable Neuro-inspired Robot Controller Integrating a Machine Learning Algorithm and a Spiking Cerebellar-like Network

    DEFF Research Database (Denmark)

    Baira Ojeda, Ismael; Tolu, Silvia; Lund, Henrik Hautop

    2017-01-01

    Combining Fable robot, a modular robot, with a neuroinspired controller, we present the proof of principle of a system that can scale to several neurally controlled compliant modules. The motor control and learning of a robot module are carried out by a Unit Learning Machine (ULM) that embeds...... the Locally Weighted Projection Regression algorithm (LWPR) and a spiking cerebellar-like microcircuit. The LWPR guarantees both an optimized representation of the input space and the learning of the dynamic internal model (IM) of the robot. However, the cerebellar-like sub-circuit integrates LWPR input...

  19. Behavioral similarity measurement based on image processing for robots that use imitative learning

    Science.gov (United States)

    Sterpin B., Dante G.; Martinez S., Fernando; Jacinto G., Edwar

    2017-02-01

    In the field of the artificial societies, particularly those are based on memetics, imitative behavior is essential for the development of cultural evolution. Applying this concept for robotics, through imitative learning, a robot can acquire behavioral patterns from another robot. Assuming that the learning process must have an instructor and, at least, an apprentice, the fact to obtain a quantitative measurement for their behavioral similarity, would be potentially useful, especially in artificial social systems focused on cultural evolution. In this paper the motor behavior of both kinds of robots, for two simple tasks, is represented by 2D binary images, which are processed in order to measure their behavioral similarity. The results shown here were obtained comparing some similarity measurement methods for binary images.

  20. Robotic Cooperative Learning Promotes Student STEM Interest

    Science.gov (United States)

    Mosley, Pauline; Ardito, Gerald; Scollins, Lauren

    2016-01-01

    The principal purpose of this investigation is to study the effect of robotic cooperative learning methodologies on middle school students' critical thinking, and STEM interest. The semi-experimental inquiry consisted of ninety four six-grade students (forty nine students in the experimental group, forty five students in the control group), chosen…

  1. Reinforcement and Systemic Machine Learning for Decision Making

    CERN Document Server

    Kulkarni, Parag

    2012-01-01

    Reinforcement and Systemic Machine Learning for Decision Making There are always difficulties in making machines that learn from experience. Complete information is not always available-or it becomes available in bits and pieces over a period of time. With respect to systemic learning, there is a need to understand the impact of decisions and actions on a system over that period of time. This book takes a holistic approach to addressing that need and presents a new paradigm-creating new learning applications and, ultimately, more intelligent machines. The first book of its kind in this new an

  2. A Combination of Machine Learning and Cerebellar-like Neural Networks for the Motor Control and Motor Learning of the Fable Modular Robot

    DEFF Research Database (Denmark)

    Baira Ojeda, Ismael; Tolu, Silvia; Pacheco, Moises

    2017-01-01

    We scaled up a bio-inspired control architecture for the motor control and motor learning of a real modular robot. In our approach, the Locally Weighted Projection Regression algorithm (LWPR) and a cerebellar microcircuit coexist, in the form of a Unit Learning Machine. The LWPR algorithm optimizes...... the input space and learns the internal model of a single robot module to command the robot to follow a desired trajectory with its end-effector. The cerebellar-like microcircuit refines the LWPR output delivering corrective commands. We contrasted distinct cerebellar-like circuits including analytical...

  3. Reinforcement active learning in the vibrissae system: optimal object localization.

    Science.gov (United States)

    Gordon, Goren; Dorfman, Nimrod; Ahissar, Ehud

    2013-01-01

    Rats move their whiskers to acquire information about their environment. It has been observed that they palpate novel objects and objects they are required to localize in space. We analyze whisker-based object localization using two complementary paradigms, namely, active learning and intrinsic-reward reinforcement learning. Active learning algorithms select the next training samples according to the hypothesized solution in order to better discriminate between correct and incorrect labels. Intrinsic-reward reinforcement learning uses prediction errors as the reward to an actor-critic design, such that behavior converges to the one that optimizes the learning process. We show that in the context of object localization, the two paradigms result in palpation whisking as their respective optimal solution. These results suggest that rats may employ principles of active learning and/or intrinsic reward in tactile exploration and can guide future research to seek the underlying neuronal mechanisms that implement them. Furthermore, these paradigms are easily transferable to biomimetic whisker-based artificial sensors and can improve the active exploration of their environment. Copyright © 2012 Elsevier Ltd. All rights reserved.

  4. Neural correlates of reinforcement learning and social preferences in competitive bidding.

    Science.gov (United States)

    van den Bos, Wouter; Talwar, Arjun; McClure, Samuel M

    2013-01-30

    In competitive social environments, people often deviate from what rational choice theory prescribes, resulting in losses or suboptimal monetary gains. We investigate how competition affects learning and decision-making in a common value auction task. During the experiment, groups of five human participants were simultaneously scanned using MRI while playing the auction task. We first demonstrate that bidding is well characterized by reinforcement learning with biased reward representations dependent on social preferences. Indicative of reinforcement learning, we found that estimated trial-by-trial prediction errors correlated with activity in the striatum and ventromedial prefrontal cortex. Additionally, we found that individual differences in social preferences were related to activity in the temporal-parietal junction and anterior insula. Connectivity analyses suggest that monetary and social value signals are integrated in the ventromedial prefrontal cortex and striatum. Based on these results, we argue for a novel mechanistic account for the integration of reinforcement history and social preferences in competitive decision-making.

  5. Interactive Rhythm Learning System by Combining Tablet Computers and Robots

    Directory of Open Access Journals (Sweden)

    Chien-Hsing Chou

    2017-03-01

    Full Text Available This study proposes a percussion learning device that combines tablet computers and robots. This device comprises two systems: a rhythm teaching system, in which users can compose and practice rhythms by using a tablet computer, and a robot performance system. First, teachers compose the rhythm training contents on the tablet computer. Then, the learners practice these percussion exercises by using the tablet computer and a small drum set. The teaching system provides a new and user-friendly score editing interface for composing a rhythm exercise. It also provides a rhythm rating function to facilitate percussion training for children and improve the stability of rhythmic beating. To encourage children to practice percussion exercises, a robotic performance system is used to interact with the children; this system can perform percussion exercises for students to listen to and then help them practice the exercise. This interaction enhances children’s interest and motivation to learn and practice rhythm exercises. The results of experimental course and field trials reveal that the proposed system not only increases students’ interest and efficiency in learning but also helps them in understanding musical rhythms through interaction and composing simple rhythms.

  6. Autonomous learning in humanoid robotics through mental imagery.

    Science.gov (United States)

    Di Nuovo, Alessandro G; Marocco, Davide; Di Nuovo, Santo; Cangelosi, Angelo

    2013-05-01

    In this paper we focus on modeling autonomous learning to improve performance of a humanoid robot through a modular artificial neural networks architecture. A model of a neural controller is presented, which allows a humanoid robot iCub to autonomously improve its sensorimotor skills. This is achieved by endowing the neural controller with a secondary neural system that, by exploiting the sensorimotor skills already acquired by the robot, is able to generate additional imaginary examples that can be used by the controller itself to improve the performance through a simulated mental training. Results and analysis presented in the paper provide evidence of the viability of the approach proposed and help to clarify the rational behind the chosen model and its implementation. Copyright © 2012 Elsevier Ltd. All rights reserved.

  7. An appraisal of the learning curve in robotic general surgery.

    Science.gov (United States)

    Pernar, Luise I M; Robertson, Faith C; Tavakkoli, Ali; Sheu, Eric G; Brooks, David C; Smink, Douglas S

    2017-11-01

    Robotic-assisted surgery is used with increasing frequency in general surgery for a variety of applications. In spite of this increase in usage, the learning curve is not yet defined. This study reviews the literature on the learning curve in robotic general surgery to inform adopters of the technology. PubMed and EMBASE searches yielded 3690 abstracts published between July 1986 and March 2016. The abstracts were evaluated based on the following inclusion criteria: written in English, reporting original work, focus on general surgery operations, and with explicit statistical methods. Twenty-six full-length articles were included in final analysis. The articles described the learning curves in colorectal (9 articles, 35%), foregut/bariatric (8, 31%), biliary (5, 19%), and solid organ (4, 15%) surgery. Eighteen of 26 (69%) articles report single-surgeon experiences. Time was used as a measure of the learning curve in all studies (100%); outcomes were examined in 10 (38%). In 12 studies (46%), the authors identified three phases of the learning curve. Numbers of cases needed to achieve plateau performance were wide-ranging but overlapping for different kinds of operations: 19-128 cases for colorectal, 8-95 for foregut/bariatric, 20-48 for biliary, and 10-80 for solid organ surgery. Although robotic surgery is increasingly utilized in general surgery, the literature provides few guidelines on the learning curve for adoption. In this heterogeneous sample of reviewed articles, the number of cases needed to achieve plateau performance varies by case type and the learning curve may have multiple phases as surgeons add more complex cases to their case mix with growing experience. Time is the most common determinant for the learning curve. The literature lacks a uniform assessment of outcomes and complications, which would arguably reflect expertise in a more meaningful way than time to perform the operation alone.

  8. Bi-directional effect of increasing doses of baclofen on reinforcement learning

    Directory of Open Access Journals (Sweden)

    Jean eTerrier

    2011-07-01

    Full Text Available In rodents as well as in humans, efficient reinforcement learning depends on dopamine (DA released from ventral tegmental area (VTA neurons. It has been shown that in brain slices of mice, GABAB-receptor agonists at low concentrations increase the firing frequency of VTA-DA neurons, while high concentrations reduce the firing frequency. It remains however elusive whether baclofen can modulate reinforcement learning. Here, in a double blind study in 34 healthy human volunteers, we tested the effects of a low and a high concentration of oral baclofen in a gambling task associated with monetary reward. A low (20 mg dose of baclofen increased the efficiency of reward-associated learning but had no effect on the avoidance of monetary loss. A high (50 mg dose of baclofen on the other hand did not affect the learning curve. At the end of the task, subjects who received 20 mg baclofen p.o. were more accurate in choosing the symbol linked to the highest probability of earning money compared to the control group (89.55±1.39% vs 81.07±1.55%, p=0.002. Our results support a model where baclofen, at low concentrations, causes a disinhibition of DA neurons, increases DA levels and thus facilitates reinforcement learning.

  9. Traffic light control by multiagent reinforcement learning systems

    NARCIS (Netherlands)

    Bakker, B.; Whiteson, S.; Kester, L.; Groen, F.C.A.; Babuška, R.; Groen, F.C.A.

    2010-01-01

    Traffic light control is one of the main means of controlling road traffic. Improving traffic control is important because it can lead to higher traffic throughput and reduced traffic congestion. This chapter describes multiagent reinforcement learning techniques for automatic optimization of

  10. Traffic Light Control by Multiagent Reinforcement Learning Systems

    NARCIS (Netherlands)

    Bakker, B.; Whiteson, S.; Kester, L.J.H.M.; Groen, F.C.A.

    2010-01-01

    Traffic light control is one of the main means of controlling road traffic. Improving traffic control is important because it can lead to higher traffic throughput and reduced traffic congestion. This chapter describes multiagent reinforcement learning techniques for automatic optimization of

  11. Retention of fundamental surgical skills learned in robot-assisted surgery.

    Science.gov (United States)

    Suh, Irene H; Mukherjee, Mukul; Shah, Bhavin C; Oleynikov, Dmitry; Siu, Ka-Chun

    2012-12-01

    Evaluation of the learning curve for robotic surgery has shown reduced errors and decreased task completion and training times compared with regular laparoscopic surgery. However, most training evaluations of robotic surgery have only addressed short-term retention after the completion of training. Our goal was to investigate the amount of surgical skills retained after 3 months of training with the da Vinci™ Surgical System. Seven medical students without any surgical experience were recruited. Participants were trained with a 4-day training program of robotic surgical skills and underwent a series of retention tests at 1 day, 1 week, 1 month, and 3 months post-training. Data analysis included time to task completion, speed, distance traveled, and movement curvature by the instrument tip. Performance of the participants was graded using the modified Objective Structured Assessment of Technical Skills (OSATS) for robotic surgery. Participants filled out a survey after each training session by answering a set of questions. Time to task completion and the movement curvature was decreased from pre- to post-training and the performance was retained at all the corresponding retention periods: 1 day, 1 week, 1 month, and 3 months. The modified OSATS showed improvement from pre-test to post-test and this improvement was maintained during all the retention periods. Participants increased in self-confidence and mastery in performing robotic surgical tasks after training. Our novel comprehensive training program improved robot-assisted surgical performance and learning. All trainees retained their fundamental surgical skills for 3 months after receiving the training program.

  12. A Robust Cooperated Control Method with Reinforcement Learning and Adaptive H∞ Control

    Science.gov (United States)

    Obayashi, Masanao; Uchiyama, Shogo; Kuremoto, Takashi; Kobayashi, Kunikazu

    This study proposes a robust cooperated control method combining reinforcement learning with robust control to control the system. A remarkable characteristic of the reinforcement learning is that it doesn't require model formula, however, it doesn't guarantee the stability of the system. On the other hand, robust control system guarantees stability and robustness, however, it requires model formula. We employ both the actor-critic method which is a kind of reinforcement learning with minimal amount of computation to control continuous valued actions and the traditional robust control, that is, H∞ control. The proposed system was compared method with the conventional control method, that is, the actor-critic only used, through the computer simulation of controlling the angle and the position of a crane system, and the simulation result showed the effectiveness of the proposed method.

  13. Arousal regulation and affective adaptation to human responsiveness by a robot that explores and learns a novel environment.

    Science.gov (United States)

    Hiolle, Antoine; Lewis, Matthew; Cañamero, Lola

    2014-01-01

    In the context of our work in developmental robotics regarding robot-human caregiver interactions, in this paper we investigate how a "baby" robot that explores and learns novel environments can adapt its affective regulatory behavior of soliciting help from a "caregiver" to the preferences shown by the caregiver in terms of varying responsiveness. We build on two strands of previous work that assessed independently (a) the differences between two "idealized" robot profiles-a "needy" and an "independent" robot-in terms of their use of a caregiver as a means to regulate the "stress" (arousal) produced by the exploration and learning of a novel environment, and (b) the effects on the robot behaviors of two caregiving profiles varying in their responsiveness-"responsive" and "non-responsive"-to the regulatory requests of the robot. Going beyond previous work, in this paper we (a) assess the effects that the varying regulatory behavior of the two robot profiles has on the exploratory and learning patterns of the robots; (b) bring together the two strands previously investigated in isolation and take a step further by endowing the robot with the capability to adapt its regulatory behavior along the "needy" and "independent" axis as a function of the varying responsiveness of the caregiver; and (c) analyze the effects that the varying regulatory behavior has on the exploratory and learning patterns of the adaptive robot.

  14. Perceptual learning rules based on reinforcers and attention

    NARCIS (Netherlands)

    Roelfsema, Pieter R.; van Ooyen, Arjen; Watanabe, Takeo

    2010-01-01

    How does the brain learn those visual features that are relevant for behavior? In this article, we focus on two factors that guide plasticity of visual representations. First, reinforcers cause the global release of diffusive neuromodulatory signals that gate plasticity. Second, attentional feedback

  15. Optimizing microstimulation using a reinforcement learning framework.

    Science.gov (United States)

    Brockmeier, Austin J; Choi, John S; Distasio, Marcello M; Francis, Joseph T; Príncipe, José C

    2011-01-01

    The ability to provide sensory feedback is desired to enhance the functionality of neuroprosthetics. Somatosensory feedback provides closed-loop control to the motor system, which is lacking in feedforward neuroprosthetics. In the case of existing somatosensory function, a template of the natural response can be used as a template of desired response elicited by electrical microstimulation. In the case of no initial training data, microstimulation parameters that produce responses close to the template must be selected in an online manner. We propose using reinforcement learning as a framework to balance the exploration of the parameter space and the continued selection of promising parameters for further stimulation. This approach avoids an explicit model of the neural response from stimulation. We explore a preliminary architecture--treating the task as a k-armed bandit--using offline data recorded for natural touch and thalamic microstimulation, and we examine the methods efficiency in exploring the parameter space while concentrating on promising parameter forms. The best matching stimulation parameters, from k = 68 different forms, are selected by the reinforcement learning algorithm consistently after 334 realizations.

  16. Experiments with Online Reinforcement Learning in Real-Time Strategy Games

    DEFF Research Database (Denmark)

    Toftgaard Andersen, Kresten; Zeng, Yifeng; Dahl Christensen, Dennis

    2009-01-01

    Real-time strategy (RTS) games provide a challenging platform to implement online reinforcement learning (RL) techniques in a real application. Computer, as one game player, monitors opponents' (human or other computers) strategies and then updates its own policy using RL methods. In this article......, we first examine the suitability of applying the online RL in various computer games. Reinforcement learning application depends on both RL complexity and the game features. We then propose a multi-layer framework for implementing online RL in an RTS game. The framework significantly reduces RL...... the effectiveness of our proposed framework and shed light on relevant issues in using online RL in RTS games....

  17. Adaptive Strategy for Online Gait Learning Evaluated on the Polymorphic Robotic LocoKit

    DEFF Research Database (Denmark)

    Christensen, David Johan; Larsen, Jørgen Christian; Stoy, Kasper

    2012-01-01

    This paper presents experiments with a morphologyindependent, life-long strategy for online learning of locomotion gaits, performed on a quadruped robot constructed from the LocoKit modular robot. The learning strategy applies a stochastic optimization algorithm to optimize eight open parameters...... of a central pattern generator based gait implementation. We observe that the strategy converges in roughly ten minutes to gaits of similar or higher velocity than a manually designed gait and that the strategy readapts in the event of failed actuators. In future work we plan to study co-learning...

  18. Learning to Run challenge solutions: Adapting reinforcement learning methods for neuromusculoskeletal environments

    OpenAIRE

    Kidziński, Łukasz; Mohanty, Sharada Prasanna; Ong, Carmichael; Huang, Zhewei; Zhou, Shuchang; Pechenko, Anton; Stelmaszczyk, Adam; Jarosik, Piotr; Pavlov, Mikhail; Kolesnikov, Sergey; Plis, Sergey; Chen, Zhibo; Zhang, Zhizheng; Chen, Jiale; Shi, Jun

    2018-01-01

    In the NIPS 2017 Learning to Run challenge, participants were tasked with building a controller for a musculoskeletal model to make it run as fast as possible through an obstacle course. Top participants were invited to describe their algorithms. In this work, we present eight solutions that used deep reinforcement learning approaches, based on algorithms such as Deep Deterministic Policy Gradient, Proximal Policy Optimization, and Trust Region Policy Optimization. Many solutions use similar ...

  19. Three-dimensional neural net for learning visuomotor coordination of a robot arm.

    Science.gov (United States)

    Martinetz, T M; Ritter, H J; Schulten, K J

    1990-01-01

    An extension of T. Kohonen's (1982) self-organizing mapping algorithm together with an error-correction scheme based on the Widrow-Hoff learning rule is applied to develop a learning algorithm for the visuomotor coordination of a simulated robot arm. Learning occurs by a sequence of trial movements without the need for an external teacher. Using input signals from a pair of cameras, the closed robot arm system is able to reduce its positioning error to about 0.3% of the linear dimensions of its work space. This is achieved by choosing the connectivity of a three-dimensional lattice consisting of the units of the neural net.

  20. Hierarchical HMM based learning of navigation primitives for cooperative robotic endovascular catheterization.

    Science.gov (United States)

    Rafii-Tari, Hedyeh; Liu, Jindong; Payne, Christopher J; Bicknell, Colin; Yang, Guang-Zhong

    2014-01-01

    Despite increased use of remote-controlled steerable catheter navigation systems for endovascular intervention, most current designs are based on master configurations which tend to alter natural operator tool interactions. This introduces problems to both ergonomics and shared human-robot control. This paper proposes a novel cooperative robotic catheterization system based on learning-from-demonstration. By encoding the higher-level structure of a catheterization task as a sequence of primitive motions, we demonstrate how to achieve prospective learning for complex tasks whilst incorporating subject-specific variations. A hierarchical Hidden Markov Model is used to model each movement primitive as well as their sequential relationship. This model is applied to generation of motion sequences, recognition of operator input, and prediction of future movements for the robot. The framework is validated by comparing catheter tip motions against the manual approach, showing significant improvements in the quality of catheterization. The results motivate the design of collaborative robotic systems that are intuitive to use, while reducing the cognitive workload of the operator.

  1. Online Pedagogical Tutorial Tactics Optimization Using Genetic-Based Reinforcement Learning.

    Science.gov (United States)

    Lin, Hsuan-Ta; Lee, Po-Ming; Hsiao, Tzu-Chien

    2015-01-01

    Tutorial tactics are policies for an Intelligent Tutoring System (ITS) to decide the next action when there are multiple actions available. Recent research has demonstrated that when the learning contents were controlled so as to be the same, different tutorial tactics would make difference in students' learning gains. However, the Reinforcement Learning (RL) techniques that were used in previous studies to induce tutorial tactics are insufficient when encountering large problems and hence were used in offline manners. Therefore, we introduced a Genetic-Based Reinforcement Learning (GBML) approach to induce tutorial tactics in an online-learning manner without basing on any preexisting dataset. The introduced method can learn a set of rules from the environment in a manner similar to RL. It includes a genetic-based optimizer for rule discovery task by generating new rules from the old ones. This increases the scalability of a RL learner for larger problems. The results support our hypothesis about the capability of the GBML method to induce tutorial tactics. This suggests that the GBML method should be favorable in developing real-world ITS applications in the domain of tutorial tactics induction.

  2. Design of Intelligent Robot as A Tool for Teaching Media Based on Computer Interactive Learning and Computer Assisted Learning to Improve the Skill of University Student

    Science.gov (United States)

    Zuhrie, M. S.; Basuki, I.; Asto B, I. G. P.; Anifah, L.

    2018-01-01

    The focus of the research is the teaching module which incorporates manufacturing, planning mechanical designing, controlling system through microprocessor technology and maneuverability of the robot. Computer interactive and computer-assisted learning is strategies that emphasize the use of computers and learning aids (computer assisted learning) in teaching and learning activity. This research applied the 4-D model research and development. The model is suggested by Thiagarajan, et.al (1974). 4-D Model consists of four stages: Define Stage, Design Stage, Develop Stage, and Disseminate Stage. This research was conducted by applying the research design development with an objective to produce a tool of learning in the form of intelligent robot modules and kit based on Computer Interactive Learning and Computer Assisted Learning. From the data of the Indonesia Robot Contest during the period of 2009-2015, it can be seen that the modules that have been developed confirm the fourth stage of the research methods of development; disseminate method. The modules which have been developed for students guide students to produce Intelligent Robot Tool for Teaching Based on Computer Interactive Learning and Computer Assisted Learning. Results of students’ responses also showed a positive feedback to relate to the module of robotics and computer-based interactive learning.

  3. Memetic Engineering as a Basis for Learning in Robotic Communities

    Science.gov (United States)

    Truszkowski, Walter F.; Rouff, Christopher; Akhavannik, Mohammad H.

    2014-01-01

    This paper represents a new contribution to the growing literature on memes. While most memetic thought has been focused on its implications on humans, this paper speculates on the role that memetics can have on robotic communities. Though speculative, the concepts are based on proven advanced multi agent technology work done at NASA - Goddard Space Flight Center and Lockheed Martin. The paper is composed of the following sections : 1) An introductory section which gently leads the reader into the realm of memes. 2) A section on memetic engineering which addresses some of the central issues with robotic learning via memes. 3) A section on related work which very concisely identifies three other areas of memetic applications, i.e., news, psychology, and the study of human behaviors. 4) A section which discusses the proposed approach for realizing memetic behaviors in robots and robotic communities. 5) A section which presents an exploration scenario for a community of robots working on Mars. 6) A final section which discusses future research which will be required to realize a comprehensive science of robotic memetics.

  4. Tema 2: The NAO robot as a Persuasive Educational and Entertainment Robot (PEER – a case study on children’s articulation, categorization and interaction with a social robot for learning

    Directory of Open Access Journals (Sweden)

    Lykke Brogaard Bertel

    2016-01-01

    Full Text Available The application of social robots as motivational tools and companions in education is increasingly being explored from a theoretical and practical point of view. In this paper, we examine the social robot NAO as a Persuasive Educational and Entertainment Robot (PEER and present findings from a case study on the use of NAO to support learning environments in Danish primary schools. In the case study we focus on the children’s practice of articulation and embodied interaction with NAO and investigate the role of NAO as a ‘tool’, ‘social actor’ or ‘simulating medium’ in the learning designs. We examine whether this categorization is static or dynamic, i. e. develops and changes over the course of the interaction and explore how this relates to and affects the student’s motivation to engage in the NAO-supported learning activities.

  5. Tema 2: The NAO robot as a Persuasive Educational and Entertainment Robot (PEER – a case study on children’s articulation, categorization and interaction with a social robot for learning

    Directory of Open Access Journals (Sweden)

    Lykke Brogaard Bertel

    2015-12-01

    Full Text Available The application of social robots as motivational tools and companions in education is increasingly being explored from a theoretical and practical point of view. In this paper, we examine the social robot NAO as a Persuasive Educational and Entertainment Robot (PEER and present findings from a case study on the use of NAO to support learning environments in Danish primary schools. In the case study we focus on the children’s practice of articulation and embodied interaction with NAO and investigate the role of NAO as a ‘tool’, ‘social actor’ or ‘simulating medium’ in the learning designs. We examine whether this categorization is static or dynamic, i. e. develops and changes over the course of the interaction and explore how this relates to and affects the student’s motivation to engage in the NAO-supported learning activities.

  6. A biologically inspired meta-control navigation system for the Psikharpax rat robot

    International Nuclear Information System (INIS)

    Caluwaerts, K; Staffa, M; N’Guyen, S; Grand, C; Dollé, L; Favre-Félix, A; Girard, B; Khamassi, M

    2012-01-01

    A biologically inspired navigation system for the mobile rat-like robot named Psikharpax is presented, allowing for self-localization and autonomous navigation in an initially unknown environment. The ability of parts of the model (e.g. the strategy selection mechanism) to reproduce rat behavioral data in various maze tasks has been validated before in simulations. But the capacity of the model to work on a real robot platform had not been tested. This paper presents our work on the implementation on the Psikharpax robot of two independent navigation strategies (a place-based planning strategy and a cue-guided taxon strategy) and a strategy selection meta-controller. We show how our robot can memorize which was the optimal strategy in each situation, by means of a reinforcement learning algorithm. Moreover, a context detector enables the controller to quickly adapt to changes in the environment—recognized as new contexts—and to restore previously acquired strategy preferences when a previously experienced context is recognized. This produces adaptivity closer to rat behavioral performance and constitutes a computational proposition of the role of the rat prefrontal cortex in strategy shifting. Moreover, such a brain-inspired meta-controller may provide an advancement for learning architectures in robotics. (paper)

  7. Challenges in the Verification of Reinforcement Learning Algorithms

    Science.gov (United States)

    Van Wesel, Perry; Goodloe, Alwyn E.

    2017-01-01

    Machine learning (ML) is increasingly being applied to a wide array of domains from search engines to autonomous vehicles. These algorithms, however, are notoriously complex and hard to verify. This work looks at the assumptions underlying machine learning algorithms as well as some of the challenges in trying to verify ML algorithms. Furthermore, we focus on the specific challenges of verifying reinforcement learning algorithms. These are highlighted using a specific example. Ultimately, we do not offer a solution to the complex problem of ML verification, but point out possible approaches for verification and interesting research opportunities.

  8. Inquiry learning with a social robot: can you explain that to me?

    NARCIS (Netherlands)

    Wijnen, Frances Martine; Charisi, Vasiliki; Davison, Daniel Patrick; van der Meij, Jan; Reidsma, Dennis; Evers, Vanessa; Heerink, M.; de Jong, M.

    2015-01-01

    This paper presents preliminary results of a study which assesses the impact a social robot might have on the verbalization of a child’s internal reasoning and knowledge while working on a learning task. In a comparative experiment we offered children the context of either a social robot or an

  9. Social Learning, Reinforcement and Crime: Evidence from Three European Cities

    Science.gov (United States)

    Tittle, Charles R.; Antonaccio, Olena; Botchkovar, Ekaterina

    2012-01-01

    This study reports a cross-cultural test of Social Learning Theory using direct measures of social learning constructs and focusing on the causal structure implied by the theory. Overall, the results strongly confirm the main thrust of the theory. Prior criminal reinforcement and current crime-favorable definitions are highly related in all three…

  10. Modeling Avoidance in Mood and Anxiety Disorders Using Reinforcement Learning.

    Science.gov (United States)

    Mkrtchian, Anahit; Aylward, Jessica; Dayan, Peter; Roiser, Jonathan P; Robinson, Oliver J

    2017-10-01

    Serious and debilitating symptoms of anxiety are the most common mental health problem worldwide, accounting for around 5% of all adult years lived with disability in the developed world. Avoidance behavior-avoiding social situations for fear of embarrassment, for instance-is a core feature of such anxiety. However, as for many other psychiatric symptoms the biological mechanisms underlying avoidance remain unclear. Reinforcement learning models provide formal and testable characterizations of the mechanisms of decision making; here, we examine avoidance in these terms. A total of 101 healthy participants and individuals with mood and anxiety disorders completed an approach-avoidance go/no-go task under stress induced by threat of unpredictable shock. We show an increased reliance in the mood and anxiety group on a parameter of our reinforcement learning model that characterizes a prepotent (pavlovian) bias to withhold responding in the face of negative outcomes. This was particularly the case when the mood and anxiety group was under stress. This formal description of avoidance within the reinforcement learning framework provides a new means of linking clinical symptoms with biophysically plausible models of neural circuitry and, as such, takes us closer to a mechanistic understanding of mood and anxiety disorders. Copyright © 2017 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.

  11. Reinforcement Learning for Online Control of Evolutionary Algorithms

    NARCIS (Netherlands)

    Eiben, A.; Horvath, Mark; Kowalczyk, Wojtek; Schut, Martijn

    2007-01-01

    The research reported in this paper is concerned with assessing the usefulness of reinforcment learning (RL) for on-line calibration of parameters in evolutionary algorithms (EA). We are running an RL procedure and the EA simultaneously and the RL is changing the EA parameters on-the-fly. We

  12. Imitation learning of Non-Linear Point-to-Point Robot Motions using Dirichlet Processes

    DEFF Research Database (Denmark)

    Krüger, Volker; Tikhanoff, Vadim; Natale, Lorenzo

    2012-01-01

    In this paper we discuss the use of the infinite Gaussian mixture model and Dirichlet processes for learning robot movements from demonstrations. Starting point of this work is an earlier paper where the authors learn a non-linear dynamic robot movement model from a small number of observations....... The model in that work is learned using a classical finite Gaussian mixture model (FGMM) where the Gaussian mixtures are appropriately constrained. The problem with this approach is that one needs to make a good guess for how many mixtures the FGMM should use. In this work, we generalize this approach...... our algorithm on the same data that was used in [5], where the authors use motion capture devices to record the demonstrations. As further validation we test our approach on novel data acquired on our iCub in a different demonstration scenario in which the robot is physically driven by the human...

  13. E-Learning System for Learning Virtual Circuit Making with a Microcontroller and Programming to Control a Robot

    Science.gov (United States)

    Takemura, Atsushi

    2015-01-01

    This paper proposes a novel e-Learning system for learning electronic circuit making and programming a microcontroller to control a robot. The proposed e-Learning system comprises a virtual-circuit-making function for the construction of circuits with a versatile, Arduino microcontroller and an educational system that can simulate behaviors of…

  14. Video Demo: Deep Reinforcement Learning for Coordination in Traffic Light Control

    NARCIS (Netherlands)

    van der Pol, E.; Oliehoek, F.A.; Bosse, T.; Bredeweg, B.

    2016-01-01

    This video demonstration contrasts two approaches to coordination in traffic light control using reinforcement learning: earlier work, based on a deconstruction of the state space into a linear combination of vehicle states, and our own approach based on the Deep Q-learning algorithm.

  15. Simulation-based optimization parametric optimization techniques and reinforcement learning

    CERN Document Server

    Gosavi, Abhijit

    2003-01-01

    Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning introduces the evolving area of simulation-based optimization. The book's objective is two-fold: (1) It examines the mathematical governing principles of simulation-based optimization, thereby providing the reader with the ability to model relevant real-life problems using these techniques. (2) It outlines the computational technology underlying these methods. Taken together these two aspects demonstrate that the mathematical and computational methods discussed in this book do work. Broadly speaking, the book has two parts: (1) parametric (static) optimization and (2) control (dynamic) optimization. Some of the book's special features are: *An accessible introduction to reinforcement learning and parametric-optimization techniques. *A step-by-step description of several algorithms of simulation-based optimization. *A clear and simple introduction to the methodology of neural networks. *A gentle introduction to converg...

  16. Cultural Robotics: The Culture of Robotics and Robotics in Culture

    OpenAIRE

    Hooman Samani; Elham Saadatian; Natalie Pang; Doros Polydorou; Owen Noel Newton Fernando; Ryohei Nakatsu; Jeffrey Tzu Kwan Valino Koh

    2013-01-01

    In this paper, we have investigated the concept of “Cultural Robotics” with regard to the evolution of social into cultural robots in the 21st Century. By defining the concept of culture, the potential development of a culture between humans and robots is explored. Based on the cultural values of the robotics developers, and the learning ability of current robots, cultural attributes in this regard are in the process of being formed, which would define the new concept of cultural robotics. Ac...

  17. Perception-based Co-evolutionary Reinforcement Learning for UAV Sensor Allocation

    National Research Council Canada - National Science Library

    Berenji, Hamid

    2003-01-01

    .... A Perception-based reasoning approach based on co-evolutionary reinforcement learning was developed for jointly addressing sensor allocation on each individual UAV and allocation of a team of UAVs...

  18. A Reinforcement-Based Learning Paradigm Increases Anatomical Learning and Retention-A Neuroeducation Study.

    Science.gov (United States)

    Anderson, Sarah J; Hecker, Kent G; Krigolson, Olave E; Jamniczky, Heather A

    2018-01-01

    In anatomy education, a key hurdle to engaging in higher-level discussion in the classroom is recognizing and understanding the extensive terminology used to identify and describe anatomical structures. Given the time-limited classroom environment, seeking methods to impart this foundational knowledge to students in an efficient manner is essential. Just-in-Time Teaching (JiTT) methods incorporate pre-class exercises (typically online) meant to establish foundational knowledge in novice learners so subsequent instructor-led sessions can focus on deeper, more complex concepts. Determining how best do we design and assess pre-class exercises requires a detailed examination of learning and retention in an applied educational context. Here we used electroencephalography (EEG) as a quantitative dependent variable to track learning and examine the efficacy of JiTT activities to teach anatomy. Specifically, we examined changes in the amplitude of the N250 and reward positivity event-related brain potential (ERP) components alongside behavioral performance as novice students participated in a series of computerized reinforcement-based learning modules to teach neuroanatomical structures. We found that as students learned to identify anatomical structures, the amplitude of the N250 increased and reward positivity amplitude decreased in response to positive feedback. Both on a retention and transfer exercise when learners successfully remembered and translated their knowledge to novel images, the amplitude of the reward positivity remained decreased compared to early learning. Our findings suggest ERPs can be used as a tool to track learning, retention, and transfer of knowledge and that employing the reinforcement learning paradigm is an effective educational approach for developing anatomical expertise.

  19. Designing a Robot Teaching Assistant for Enhancing and Sustaining Learning Motivation

    Science.gov (United States)

    Hung, I-Chun; Chao, Kuo-Jen; Lee, Ling; Chen, Nian-Shing

    2013-01-01

    Although many researchers have pointed out that educational robots can stimulate learners' learning motivation, the learning motivation will be hardly sustained and rapidly decreased over time if the amusement and interaction merely come from the new technology itself without incorporating instructional strategies. Many researchers have…

  20. Optimal Search Strategy of Robotic Assembly Based on Neural Vibration Learning

    Directory of Open Access Journals (Sweden)

    Lejla Banjanovic-Mehmedovic

    2011-01-01

    Full Text Available This paper presents implementation of optimal search strategy (OSS in verification of assembly process based on neural vibration learning. The application problem is the complex robot assembly of miniature parts in the example of mating the gears of one multistage planetary speed reducer. Assembly of tube over the planetary gears was noticed as the most difficult problem of overall assembly. The favourable influence of vibration and rotation movement on compensation of tolerance was also observed. With the proposed neural-network-based learning algorithm, it is possible to find extended scope of vibration state parameter. Using optimal search strategy based on minimal distance path between vibration parameter stage sets (amplitude and frequencies of robots gripe vibration and recovery parameter algorithm, we can improve the robot assembly behaviour, that is, allow the fastest possible way of mating. We have verified by using simulation programs that search strategy is suitable for the situation of unexpected events due to uncertainties.

  1. Combining Correlation-Based and Reward-Based Learning in Neural Control for Policy Improvement

    DEFF Research Database (Denmark)

    Manoonpong, Poramate; Kolodziejski, Christoph; Wörgötter, Florentin

    2013-01-01

    Classical conditioning (conventionally modeled as correlation-based learning) and operant conditioning (conventionally modeled as reinforcement learning or reward-based learning) have been found in biological systems. Evidence shows that these two mechanisms strongly involve learning about...... associations. Based on these biological findings, we propose a new learning model to achieve successful control policies for artificial systems. This model combines correlation-based learning using input correlation learning (ICO learning) and reward-based learning using continuous actor–critic reinforcement...... learning (RL), thereby working as a dual learner system. The model performance is evaluated by simulations of a cart-pole system as a dynamic motion control problem and a mobile robot system as a goal-directed behavior control problem. Results show that the model can strongly improve pole balancing control...

  2. Reinforcement learning on slow features of high-dimensional input streams.

    Directory of Open Access Journals (Sweden)

    Robert Legenstein

    Full Text Available Humans and animals are able to learn complex behaviors based on a massive stream of sensory information from different modalities. Early animal studies have identified learning mechanisms that are based on reward and punishment such that animals tend to avoid actions that lead to punishment whereas rewarded actions are reinforced. However, most algorithms for reward-based learning are only applicable if the dimensionality of the state-space is sufficiently small or its structure is sufficiently simple. Therefore, the question arises how the problem of learning on high-dimensional data is solved in the brain. In this article, we propose a biologically plausible generic two-stage learning system that can directly be applied to raw high-dimensional input streams. The system is composed of a hierarchical slow feature analysis (SFA network for preprocessing and a simple neural network on top that is trained based on rewards. We demonstrate by computer simulations that this generic architecture is able to learn quite demanding reinforcement learning tasks on high-dimensional visual input streams in a time that is comparable to the time needed when an explicit highly informative low-dimensional state-space representation is given instead of the high-dimensional visual input. The learning speed of the proposed architecture in a task similar to the Morris water maze task is comparable to that found in experimental studies with rats. This study thus supports the hypothesis that slowness learning is one important unsupervised learning principle utilized in the brain to form efficient state representations for behavioral learning.

  3. Online constrained model-based reinforcement learning

    CSIR Research Space (South Africa)

    Van Niekerk, B

    2017-08-01

    Full Text Available Constrained Model-based Reinforcement Learning Benjamin van Niekerk School of Computer Science University of the Witwatersrand South Africa Andreas Damianou∗ Amazon.com Cambridge, UK Benjamin Rosman Council for Scientific and Industrial Research, and School... MULTIPLE SHOOTING Using direct multiple shooting (Bock and Plitt, 1984), problem (1) can be transformed into a structured non- linear program (NLP). First, the time horizon [t0, t0 + T ] is partitioned into N equal subintervals [tk, tk+1] for k = 0...

  4. Reinforcement learning account of network reciprocity.

    Science.gov (United States)

    Ezaki, Takahiro; Masuda, Naoki

    2017-01-01

    Evolutionary game theory predicts that cooperation in social dilemma games is promoted when agents are connected as a network. However, when networks are fixed over time, humans do not necessarily show enhanced mutual cooperation. Here we show that reinforcement learning (specifically, the so-called Bush-Mosteller model) approximately explains the experimentally observed network reciprocity and the lack thereof in a parameter region spanned by the benefit-to-cost ratio and the node's degree. Thus, we significantly extend previously obtained numerical results.

  5. Learning-based position control of a closed-kinematic chain robot end-effector

    Science.gov (United States)

    Nguyen, Charles C.; Zhou, Zhen-Lei

    1990-01-01

    A trajectory control scheme whose design is based on learning theory, for a six-degree-of-freedom (DOF) robot end-effector built to study robotic assembly of NASA hardwares in space is presented. The control scheme consists of two control systems: the feedback control system and the learning control system. The feedback control system is designed using the concept of linearization about a selected operating point, and the method of pole placement so that the closed-loop linearized system is stabilized. The learning control scheme consisting of PD-type learning controllers, provides additional inputs to improve the end-effector performance after each trial. Experimental studies performed on a 2 DOF end-effector built at CUA, for three tracking cases show that actual trajectories approach desired trajectories as the number of trials increases. The tracking errors are substantially reduced after only five trials.

  6. Distributed Economic Dispatch in Microgrids Based on Cooperative Reinforcement Learning.

    Science.gov (United States)

    Liu, Weirong; Zhuang, Peng; Liang, Hao; Peng, Jun; Huang, Zhiwu; Weirong Liu; Peng Zhuang; Hao Liang; Jun Peng; Zhiwu Huang; Liu, Weirong; Liang, Hao; Peng, Jun; Zhuang, Peng; Huang, Zhiwu

    2018-06-01

    Microgrids incorporated with distributed generation (DG) units and energy storage (ES) devices are expected to play more and more important roles in the future power systems. Yet, achieving efficient distributed economic dispatch in microgrids is a challenging issue due to the randomness and nonlinear characteristics of DG units and loads. This paper proposes a cooperative reinforcement learning algorithm for distributed economic dispatch in microgrids. Utilizing the learning algorithm can avoid the difficulty of stochastic modeling and high computational complexity. In the cooperative reinforcement learning algorithm, the function approximation is leveraged to deal with the large and continuous state spaces. And a diffusion strategy is incorporated to coordinate the actions of DG units and ES devices. Based on the proposed algorithm, each node in microgrids only needs to communicate with its local neighbors, without relying on any centralized controllers. Algorithm convergence is analyzed, and simulations based on real-world meteorological and load data are conducted to validate the performance of the proposed algorithm.

  7. A Neuro-Control Design Based on Fuzzy Reinforcement Learning

    DEFF Research Database (Denmark)

    Katebi, S.D.; Blanke, M.

    This paper describes a neuro-control fuzzy critic design procedure based on reinforcement learning. An important component of the proposed intelligent control configuration is the fuzzy credit assignment unit which acts as a critic, and through fuzzy implications provides adjustment mechanisms....... The fuzzy credit assignment unit comprises a fuzzy system with the appropriate fuzzification, knowledge base and defuzzification components. When an external reinforcement signal (a failure signal) is received, sequences of control actions are evaluated and modified by the action applier unit. The desirable...... ones instruct the neuro-control unit to adjust its weights and are simultaneously stored in the memory unit during the training phase. In response to the internal reinforcement signal (set point threshold deviation), the stored information is retrieved by the action applier unit and utilized for re...

  8. A robot-automated work site for repair of the Chinon A3 reactor

    International Nuclear Information System (INIS)

    Raynal, A.

    1987-01-01

    In 1982, following degradation due to corrosion of low-carbon steel by carbon dioxide gas, the utility undertook to repair some of the support structures at Chinon A3. This involved consolidation and reinforcing thermocouples and gas monitor pipeworks supports. A welding process was selected and the use of robots became indispensable because of the large number of components to be replaced (200 per outage). Two robots, supplied with tool heads and replacement components from outside the reactor were used. The robots and their servers were coordinated by a central computer and monitored by a closed circuit television system. Each repair operation was performed after ''training'' on a full-scale mockup of the top of the reactor reconstructed from telemetry of the real reactor dimensions. Since becoming operational in June 1986, the robots have accumulated over 20 000 hours of operation and seventy parts have been welded to the reactor. A 3D CAD system has been adapted to simulate the robots and analyse long trajectories in order to reduce robot learning time [fr

  9. Learning robotics using Python

    CERN Document Server

    Joseph, Lentin

    2015-01-01

    If you are an engineer, a researcher, or a hobbyist, and you are interested in robotics and want to build your own robot, this book is for you. Readers are assumed to be new to robotics but should have experience with Python.

  10. Robot Grasp Learning by Demonstration without Predefined Rules

    Directory of Open Access Journals (Sweden)

    César Fernández

    2011-12-01

    Full Text Available A learning-based approach to autonomous robot grasping is presented. Pattern recognition techniques are used to measure the similarity between a set of previously stored example grasps and all the possible candidate grasps for a new object. Two sets of features are defined in order to characterize grasps: point attributes describe the surroundings of a contact point; point-set attributes describe the relationship between the set of n contact points (assuming an n-fingered robot gripper is used. In the experiments performed, the nearest neighbour classifier outperforms other approaches like multilayer perceptrons, radial basis functions or decision trees, in terms of classification accuracy, while computational load is not excessive for a real time application (a grasp is fully synthesized in 0.2 seconds. The results obtained on a synthetic database show that the proposed system is able to imitate the grasping behaviour of the user (e.g. the system learns to grasp a mug by its handle. All the code has been made available for testing purposes.

  11. Representation Learning of Logic Words by an RNN: From Word Sequences to Robot Actions

    Directory of Open Access Journals (Sweden)

    Tatsuro Yamada

    2017-12-01

    Full Text Available An important characteristic of human language is compositionality. We can efficiently express a wide variety of real-world situations, events, and behaviors by compositionally constructing the meaning of a complex expression from a finite number of elements. Previous studies have analyzed how machine-learning models, particularly neural networks, can learn from experience to represent compositional relationships between language and robot actions with the aim of understanding the symbol grounding structure and achieving intelligent communicative agents. Such studies have mainly dealt with the words (nouns, adjectives, and verbs that directly refer to real-world matters. In addition to these words, the current study deals with logic words, such as “not,” “and,” and “or” simultaneously. These words are not directly referring to the real world, but are logical operators that contribute to the construction of meaning in sentences. In human–robot communication, these words may be used often. The current study builds a recurrent neural network model with long short-term memory units and trains it to learn to translate sentences including logic words into robot actions. We investigate what kind of compositional representations, which mediate sentences and robot actions, emerge as the network's internal states via the learning process. Analysis after learning shows that referential words are merged with visual information and the robot's own current state, and the logical words are represented by the model in accordance with their functions as logical operators. Words such as “true,” “false,” and “not” work as non-linear transformations to encode orthogonal phrases into the same area in a memory cell state space. The word “and,” which required a robot to lift up both its hands, worked as if it was a universal quantifier. The word “or,” which required action generation that looked apparently random, was represented as an

  12. Skill Learning for Intelligent Robot by Perception-Action Integration: A View from Hierarchical Temporal Memory

    Directory of Open Access Journals (Sweden)

    Xinzheng Zhang

    2017-01-01

    Full Text Available Skill learning autonomously through interactions with the environment is a crucial ability for intelligent robot. A perception-action integration or sensorimotor cycle, as an important issue in imitation learning, is a natural mechanism without the complex program process. Recently, neurocomputing model and developmental intelligence method are considered as a new trend for implementing the robot skill learning. In this paper, based on research of the human brain neocortex model, we present a skill learning method by perception-action integration strategy from the perspective of hierarchical temporal memory (HTM theory. The sequential sensor data representing a certain skill from a RGB-D camera are received and then encoded as a sequence of Sparse Distributed Representation (SDR vectors. The sequential SDR vectors are treated as the inputs of the perception-action HTM. The HTM learns sequences of SDRs and makes predictions of what the next input SDR will be. It stores the transitions of the current perceived sensor data and next predicted actions. We evaluated the performance of this proposed framework for learning the shaking hands skill on a humanoid NAO robot. The experimental results manifest that the skill learning method designed in this paper is promising.

  13. Reinforcement learning account of network reciprocity.

    Directory of Open Access Journals (Sweden)

    Takahiro Ezaki

    Full Text Available Evolutionary game theory predicts that cooperation in social dilemma games is promoted when agents are connected as a network. However, when networks are fixed over time, humans do not necessarily show enhanced mutual cooperation. Here we show that reinforcement learning (specifically, the so-called Bush-Mosteller model approximately explains the experimentally observed network reciprocity and the lack thereof in a parameter region spanned by the benefit-to-cost ratio and the node's degree. Thus, we significantly extend previously obtained numerical results.

  14. A Reinforcement-Based Learning Paradigm Increases Anatomical Learning and Retention—A Neuroeducation Study

    Science.gov (United States)

    Anderson, Sarah J.; Hecker, Kent G.; Krigolson, Olave E.; Jamniczky, Heather A.

    2018-01-01

    In anatomy education, a key hurdle to engaging in higher-level discussion in the classroom is recognizing and understanding the extensive terminology used to identify and describe anatomical structures. Given the time-limited classroom environment, seeking methods to impart this foundational knowledge to students in an efficient manner is essential. Just-in-Time Teaching (JiTT) methods incorporate pre-class exercises (typically online) meant to establish foundational knowledge in novice learners so subsequent instructor-led sessions can focus on deeper, more complex concepts. Determining how best do we design and assess pre-class exercises requires a detailed examination of learning and retention in an applied educational context. Here we used electroencephalography (EEG) as a quantitative dependent variable to track learning and examine the efficacy of JiTT activities to teach anatomy. Specifically, we examined changes in the amplitude of the N250 and reward positivity event-related brain potential (ERP) components alongside behavioral performance as novice students participated in a series of computerized reinforcement-based learning modules to teach neuroanatomical structures. We found that as students learned to identify anatomical structures, the amplitude of the N250 increased and reward positivity amplitude decreased in response to positive feedback. Both on a retention and transfer exercise when learners successfully remembered and translated their knowledge to novel images, the amplitude of the reward positivity remained decreased compared to early learning. Our findings suggest ERPs can be used as a tool to track learning, retention, and transfer of knowledge and that employing the reinforcement learning paradigm is an effective educational approach for developing anatomical expertise. PMID:29467638

  15. A Reinforcement-Based Learning Paradigm Increases Anatomical Learning and Retention—A Neuroeducation Study

    Directory of Open Access Journals (Sweden)

    Sarah J. Anderson

    2018-02-01

    Full Text Available In anatomy education, a key hurdle to engaging in higher-level discussion in the classroom is recognizing and understanding the extensive terminology used to identify and describe anatomical structures. Given the time-limited classroom environment, seeking methods to impart this foundational knowledge to students in an efficient manner is essential. Just-in-Time Teaching (JiTT methods incorporate pre-class exercises (typically online meant to establish foundational knowledge in novice learners so subsequent instructor-led sessions can focus on deeper, more complex concepts. Determining how best do we design and assess pre-class exercises requires a detailed examination of learning and retention in an applied educational context. Here we used electroencephalography (EEG as a quantitative dependent variable to track learning and examine the efficacy of JiTT activities to teach anatomy. Specifically, we examined changes in the amplitude of the N250 and reward positivity event-related brain potential (ERP components alongside behavioral performance as novice students participated in a series of computerized reinforcement-based learning modules to teach neuroanatomical structures. We found that as students learned to identify anatomical structures, the amplitude of the N250 increased and reward positivity amplitude decreased in response to positive feedback. Both on a retention and transfer exercise when learners successfully remembered and translated their knowledge to novel images, the amplitude of the reward positivity remained decreased compared to early learning. Our findings suggest ERPs can be used as a tool to track learning, retention, and transfer of knowledge and that employing the reinforcement learning paradigm is an effective educational approach for developing anatomical expertise.

  16. Comparison of Behavior-based and Planning Techniques on the Small Robot Maze Exploration Problem

    Czech Academy of Sciences Publication Activity Database

    Slušný, Stanislav; Neruda, Roman; Vidnerová, Petra

    2010-01-01

    Roč. 23, č. 4 (2010), s. 560-567 ISSN 0893-6080. [ICANN 2008. International Conference on Artificial Neural Networks /18./. Prague, 03.09.2008-06.09.2008] R&D Projects: GA ČR GA201/08/1744 Institutional research plan: CEZ:AV0Z10300504 Keywords : evolutionary robotic s * neural networks * reinforcement learning * localization Subject RIV: IN - Informatics, Computer Science Impact factor: 1.955, year: 2010

  17. A Predictive Study of Learner Attitudes toward Open Learning in a Robotics Class

    Science.gov (United States)

    Avsec, Stanislav; Rihtarsic, David; Kocijancic, Slavko

    2014-01-01

    Open learning (OL) strives to transform teaching and learning by applying learning science and emerging technologies to increase student success, improve learning productivity, and lower barriers to access. OL of robotics has a significant growth rate in secondary and/or high schools, but failures exist. Little is known about why many users stop…

  18. Optimal Control via Reinforcement Learning with Symbolic Policy Approximation

    NARCIS (Netherlands)

    Kubalìk, Jiřì; Alibekov, Eduard; Babuska, R.; Dochain, Denis; Henrion, Didier; Peaucelle, Dimitri

    2017-01-01

    Model-based reinforcement learning (RL) algorithms can be used to derive optimal control laws for nonlinear dynamic systems. With continuous-valued state and input variables, RL algorithms have to rely on function approximators to represent the value function and policy mappings. This paper

  19. Learning Similar Actions by Reinforcement or Sensory-Prediction Errors Rely on Distinct Physiological Mechanisms.

    Science.gov (United States)

    Uehara, Shintaro; Mawase, Firas; Celnik, Pablo

    2017-09-14

    Humans can acquire knowledge of new motor behavior via different forms of learning. The two forms most commonly studied have been the development of internal models based on sensory-prediction errors (error-based learning) and success-based feedback (reinforcement learning). Human behavioral studies suggest these are distinct learning processes, though the neurophysiological mechanisms that are involved have not been characterized. Here, we evaluated physiological markers from the cerebellum and the primary motor cortex (M1) using noninvasive brain stimulations while healthy participants trained finger-reaching tasks. We manipulated the extent to which subjects rely on error-based or reinforcement by providing either vector or binary feedback about task performance. Our results demonstrated a double dissociation where learning the task mainly via error-based mechanisms leads to cerebellar plasticity modifications but not long-term potentiation (LTP)-like plasticity changes in M1; while learning a similar action via reinforcement mechanisms elicited M1 LTP-like plasticity but not cerebellar plasticity changes. Our findings indicate that learning complex motor behavior is mediated by the interplay of different forms of learning, weighing distinct neural mechanisms in M1 and the cerebellum. Our study provides insights for designing effective interventions to enhance human motor learning. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  20. Simultaneous development of laparoscopy and robotics provides acceptable perioperative outcomes and shows robotics to have a faster learning curve and to be overall faster in rectal cancer surgery: analysis of novice MIS surgeon learning curves.

    Science.gov (United States)

    Melich, George; Hong, Young Ki; Kim, Jieun; Hur, Hyuk; Baik, Seung Hyuk; Kim, Nam Kyu; Sender Liberman, A; Min, Byung Soh

    2015-03-01

    Laparoscopy offers some evidence of benefit compared to open rectal surgery. Robotic rectal surgery is evolving into an accepted approach. The objective was to analyze and compare laparoscopic and robotic rectal surgery learning curves with respect to operative times and perioperative outcomes for a novice minimally invasive colorectal surgeon. One hundred and six laparoscopic and 92 robotic LAR rectal surgery cases were analyzed. All surgeries were performed by a surgeon who was primarily trained in open rectal surgery. Patient characteristics and perioperative outcomes were analyzed. Operative time and CUSUM plots were used for evaluating the learning curve for laparoscopic versus robotic LAR. Laparoscopic versus robotic LAR outcomes feature initial group operative times of 308 (291-325) min versus 397 (373-420) min and last group times of 220 (212-229) min versus 204 (196-211) min-reversed in favor of robotics; major complications of 4.7 versus 6.5 % (NS), resection margin involvement of 2.8 versus 4.4 % (NS), conversion rate of 3.8 versus 1.1 (NS), lymph node harvest of 16.3 versus 17.2 (NS), and estimated blood loss of 231 versus 201 cc (NS). Due to faster learning curves for extracorporeal phase and total mesorectal excision phase, the robotic surgery was observed to be faster than laparoscopic surgery after the initial 41 cases. CUSUM plots demonstrate acceptable perioperative surgical outcomes from the beginning of the study. Initial robotic operative times improved with practice rapidly and eventually became faster than those for laparoscopy. Developing both laparoscopic and robotic skills simultaneously can provide acceptable perioperative outcomes in rectal surgery. It might be suggested that in the current milieu of clashing interests between evolving technology and economic constrains, there might be advantages in embracing both approaches.

  1. Reinforcement learning controller design for affine nonlinear discrete-time systems using online approximators.

    Science.gov (United States)

    Yang, Qinmin; Jagannathan, Sarangapani

    2012-04-01

    In this paper, reinforcement learning state- and output-feedback-based adaptive critic controller designs are proposed by using the online approximators (OLAs) for a general multi-input and multioutput affine unknown nonlinear discretetime systems in the presence of bounded disturbances. The proposed controller design has two entities, an action network that is designed to produce optimal signal and a critic network that evaluates the performance of the action network. The critic estimates the cost-to-go function which is tuned online using recursive equations derived from heuristic dynamic programming. Here, neural networks (NNs) are used both for the action and critic whereas any OLAs, such as radial basis functions, splines, fuzzy logic, etc., can be utilized. For the output-feedback counterpart, an additional NN is designated as the observer to estimate the unavailable system states, and thus, separation principle is not required. The NN weight tuning laws for the controller schemes are also derived while ensuring uniform ultimate boundedness of the closed-loop system using Lyapunov theory. Finally, the effectiveness of the two controllers is tested in simulation on a pendulum balancing system and a two-link robotic arm system.

  2. Optimal control in microgrid using multi-agent reinforcement learning.

    Science.gov (United States)

    Li, Fu-Dong; Wu, Min; He, Yong; Chen, Xin

    2012-11-01

    This paper presents an improved reinforcement learning method to minimize electricity costs on the premise of satisfying the power balance and generation limit of units in a microgrid with grid-connected mode. Firstly, the microgrid control requirements are analyzed and the objective function of optimal control for microgrid is proposed. Then, a state variable "Average Electricity Price Trend" which is used to express the most possible transitions of the system is developed so as to reduce the complexity and randomicity of the microgrid, and a multi-agent architecture including agents, state variables, action variables and reward function is formulated. Furthermore, dynamic hierarchical reinforcement learning, based on change rate of key state variable, is established to carry out optimal policy exploration. The analysis shows that the proposed method is beneficial to handle the problem of "curse of dimensionality" and speed up learning in the unknown large-scale world. Finally, the simulation results under JADE (Java Agent Development Framework) demonstrate the validity of the presented method in optimal control for a microgrid with grid-connected mode. Copyright © 2012 ISA. Published by Elsevier Ltd. All rights reserved.

  3. From free energy to expected energy: Improving energy-based value function approximation in reinforcement learning.

    Science.gov (United States)

    Elfwing, Stefan; Uchibe, Eiji; Doya, Kenji

    2016-12-01

    Free-energy based reinforcement learning (FERL) was proposed for learning in high-dimensional state and action spaces. However, the FERL method does only really work well with binary, or close to binary, state input, where the number of active states is fewer than the number of non-active states. In the FERL method, the value function is approximated by the negative free energy of a restricted Boltzmann machine (RBM). In our earlier study, we demonstrated that the performance and the robustness of the FERL method can be improved by scaling the free energy by a constant that is related to the size of network. In this study, we propose that RBM function approximation can be further improved by approximating the value function by the negative expected energy (EERL), instead of the negative free energy, as well as being able to handle continuous state input. We validate our proposed method by demonstrating that EERL: (1) outperforms FERL, as well as standard neural network and linear function approximation, for three versions of a gridworld task with high-dimensional image state input; (2) achieves new state-of-the-art results in stochastic SZ-Tetris in both model-free and model-based learning settings; and (3) significantly outperforms FERL and standard neural network function approximation for a robot navigation task with raw and noisy RGB images as state input and a large number of actions. Copyright © 2016 The Author(s). Published by Elsevier Ltd.. All rights reserved.

  4. Framework for Educational Robotics: A Multiphase Approach to Enhance User Learning in a Competitive Arena

    Science.gov (United States)

    Lye, Ngit Chan; Wong, Kok Wai; Chiou, Andrew

    2013-01-01

    Educational robotics involves using robots as an educational tool to provide a long term, and progressive learning activity, to cater to different age group. The current concern is that, using robots in education should not be an instance of a one-off project for the sole purpose of participating in a competitive event. Instead, it should be a…

  5. Cardiac Concomitants of Feedback and Prediction Error Processing in Reinforcement Learning

    Science.gov (United States)

    Kastner, Lucas; Kube, Jana; Villringer, Arno; Neumann, Jane

    2017-01-01

    Successful learning hinges on the evaluation of positive and negative feedback. We assessed differential learning from reward and punishment in a monetary reinforcement learning paradigm, together with cardiac concomitants of positive and negative feedback processing. On the behavioral level, learning from reward resulted in more advantageous behavior than learning from punishment, suggesting a differential impact of reward and punishment on successful feedback-based learning. On the autonomic level, learning and feedback processing were closely mirrored by phasic cardiac responses on a trial-by-trial basis: (1) Negative feedback was accompanied by faster and prolonged heart rate deceleration compared to positive feedback. (2) Cardiac responses shifted from feedback presentation at the beginning of learning to stimulus presentation later on. (3) Most importantly, the strength of phasic cardiac responses to the presentation of feedback correlated with the strength of prediction error signals that alert the learner to the necessity for behavioral adaptation. Considering participants' weight status and gender revealed obesity-related deficits in learning to avoid negative consequences and less consistent behavioral adaptation in women compared to men. In sum, our results provide strong new evidence for the notion that during learning phasic cardiac responses reflect an internal value and feedback monitoring system that is sensitive to the violation of performance-based expectations. Moreover, inter-individual differences in weight status and gender may affect both behavioral and autonomic responses in reinforcement-based learning. PMID:29163004

  6. Cardiac Concomitants of Feedback and Prediction Error Processing in Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Lucas Kastner

    2017-10-01

    Full Text Available Successful learning hinges on the evaluation of positive and negative feedback. We assessed differential learning from reward and punishment in a monetary reinforcement learning paradigm, together with cardiac concomitants of positive and negative feedback processing. On the behavioral level, learning from reward resulted in more advantageous behavior than learning from punishment, suggesting a differential impact of reward and punishment on successful feedback-based learning. On the autonomic level, learning and feedback processing were closely mirrored by phasic cardiac responses on a trial-by-trial basis: (1 Negative feedback was accompanied by faster and prolonged heart rate deceleration compared to positive feedback. (2 Cardiac responses shifted from feedback presentation at the beginning of learning to stimulus presentation later on. (3 Most importantly, the strength of phasic cardiac responses to the presentation of feedback correlated with the strength of prediction error signals that alert the learner to the necessity for behavioral adaptation. Considering participants' weight status and gender revealed obesity-related deficits in learning to avoid negative consequences and less consistent behavioral adaptation in women compared to men. In sum, our results provide strong new evidence for the notion that during learning phasic cardiac responses reflect an internal value and feedback monitoring system that is sensitive to the violation of performance-based expectations. Moreover, inter-individual differences in weight status and gender may affect both behavioral and autonomic responses in reinforcement-based learning.

  7. Representation and Integration: Combining Robot Control, High-Level Planning, and Action Learning

    DEFF Research Database (Denmark)

    Petrick, Ronald; Kraft, Dirk; Mourao, Kira

    We describe an approach to integrated robot control, high-level planning, and action effect learning that attempts to overcome the representational difficulties that exist between these diverse areas. Our approach combines ideas from robot vision, knowledgelevel planning, and connectionist machine......-level action specifications, suitable for planning, from a robot’s interactions with the world. We present a detailed overview of our approach and show how it supports the learning of certain aspects of a high-level lepresentation from low-level world state information....... learning, and focuses on the representational needs of these components.We also make use of a simple representational unit called an instantiated state transition fragment (ISTF) and a related structure called an object-action complex (OAC). The goal of this work is a general approach for inducing high...

  8. Does Prior Laparoscopic and Open Surgery Experience Have Any Impact on Learning Curve in Transition to Robotic Surgery?

    Directory of Open Access Journals (Sweden)

    Cüneyt Adayener

    2016-12-01

    Full Text Available It has been 15 years since the Food And Drug Administration approved the Da Vinci® robotic surgery system. Robotic applications are being used extensively in urology, particularly in radical prostatectomy. Like all high-tech products, this system also has a high cost and a steep learning curve, therefore, preventing it from becoming widespread. There are various studies on the effect of open surgery or laparoscopy experience on the learning curve of robotic surgery. Analyzing these interactions well will provide valuable information on making the training period of robotic system more efficient.

  9. Learning curve for robotic-assisted surgery for rectal cancer: use of the cumulative sum method.

    Science.gov (United States)

    Yamaguchi, Tomohiro; Kinugasa, Yusuke; Shiomi, Akio; Sato, Sumito; Yamakawa, Yushi; Kagawa, Hiroyasu; Tomioka, Hiroyuki; Mori, Keita

    2015-07-01

    Few data are available to assess the learning curve for robotic-assisted surgery for rectal cancer. The aim of the present study was to evaluate the learning curve for robotic-assisted surgery for rectal cancer by a surgeon at a single institute. From December 2011 to August 2013, a total of 80 consecutive patients who underwent robotic-assisted surgery for rectal cancer performed by the same surgeon were included in this study. The learning curve was analyzed using the cumulative sum method. This method was used for all 80 cases, taking into account operative time. Operative procedures included anterior resections in 6 patients, low anterior resections in 46 patients, intersphincteric resections in 22 patients, and abdominoperineal resections in 6 patients. Lateral lymph node dissection was performed in 28 patients. Median operative time was 280 min (range 135-683 min), and median blood loss was 17 mL (range 0-690 mL). No postoperative complications of Clavien-Dindo classification Grade III or IV were encountered. We arranged operative times and calculated cumulative sum values, allowing differentiation of three phases: phase I, Cases 1-25; phase II, Cases 26-50; and phase III, Cases 51-80. Our data suggested three phases of the learning curve in robotic-assisted surgery for rectal cancer. The first 25 cases formed the learning phase.

  10. Cultural Robotics: The Culture of Robotics and Robotics in Culture

    Directory of Open Access Journals (Sweden)

    Hooman Samani

    2013-12-01

    Full Text Available In this paper, we have investigated the concept of “Cultural Robotics” with regard to the evolution of social into cultural robots in the 21st Century. By defining the concept of culture, the potential development of a culture between humans and robots is explored. Based on the cultural values of the robotics developers, and the learning ability of current robots, cultural attributes in this regard are in the process of being formed, which would define the new concept of cultural robotics. According to the importance of the embodiment of robots in the sense of presence, the influence of robots in communication culture is anticipated. The sustainability of robotics culture based on diversity for cultural communities for various acceptance modalities is explored in order to anticipate the creation of different attributes of culture between robots and humans in the future.

  11. Fuzzy control in robot-soccer, evolutionary learning in the first layer of control

    Directory of Open Access Journals (Sweden)

    Peter J Thomas

    2003-02-01

    Full Text Available In this paper an evolutionary algorithm is developed to learn a fuzzy knowledge base for the control of a soccer playing micro-robot from any configuration belonging to a grid of initial configurations to hit the ball along the ball to goal line of sight. The knowledge base uses relative co-ordinate system including left and right wheel velocities of the robot. Final path positions allow forward and reverse facing robot to ball and include its physical dimensions.

  12. Ensemble Network Architecture for Deep Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Xi-liang Chen

    2018-01-01

    Full Text Available The popular deep Q learning algorithm is known to be instability because of the Q-value’s shake and overestimation action values under certain conditions. These issues tend to adversely affect their performance. In this paper, we develop the ensemble network architecture for deep reinforcement learning which is based on value function approximation. The temporal ensemble stabilizes the training process by reducing the variance of target approximation error and the ensemble of target values reduces the overestimate and makes better performance by estimating more accurate Q-value. Our results show that this architecture leads to statistically significant better value evaluation and more stable and better performance on several classical control tasks at OpenAI Gym environment.

  13. Design-Oriented Enhanced Robotics Curriculum

    Science.gov (United States)

    Yilmaz, M.; Ozcelik, S.; Yilmazer, N.; Nekovei, R.

    2013-01-01

    This paper presents an innovative two-course, laboratory-based, and design-oriented robotics educational model. The robotics curriculum exposed senior-level undergraduate students to major robotics concepts, and enhanced the student learning experience in hybrid learning environments by incorporating the IEEE Region-5 annual robotics competition…

  14. Integrating distributed Bayesian inference and reinforcement learning for sensor management

    NARCIS (Netherlands)

    Grappiolo, C.; Whiteson, S.; Pavlin, G.; Bakker, B.

    2009-01-01

    This paper introduces a sensor management approach that integrates distributed Bayesian inference (DBI) and reinforcement learning (RL). DBI is implemented using distributed perception networks (DPNs), a multiagent approach to performing efficient inference, while RL is used to automatically

  15. Learning User Preferences in Ubiquitous Systems: A User Study and a Reinforcement Learning Approach

    OpenAIRE

    Zaidenberg , Sofia; Reignier , Patrick; Mandran , Nadine

    2010-01-01

    International audience; Our study concerns a virtual assistant, proposing services to the user based on its current perceived activity and situation (ambient intelligence). Instead of asking the user to define his preferences, we acquire them automatically using a reinforcement learning approach. Experiments showed that our system succeeded the learning of user preferences. In order to validate the relevance and usability of such a system, we have first conducted a user study. 26 non-expert s...

  16. Application of GPS systems on a mobile robot

    Science.gov (United States)

    Cao, Peter; Saxena, Mayank; Tedder, Maurice; Mischalske, Steve; Hall, Ernest L.

    2001-10-01

    The purpose of this paper is to describe the use of Global Positioning Systems (GPS) as geographic information and navigational system for a ground based mobile robot. Several low cost wireless systems are now available for a variety of innovative automobile applications including location, messaging and tracking and security. Experiments were conducted with a test bed mobile robot, Bearcat II, for point-to-point motion using a Motorola GPS in June 2001. The Motorola M12 Oncore GPS system is connected to the Bearcat II main control computer through a RS232 interface. A mapping program is used to define a desired route. Then GPS information may be displayed for verification. However, the GPS information is also used to update the control points of the mobile robot using a reinforcement learning method. Local position updates are also used when found in the environment. The significance of the method is in extending the use of GPS to local vehicle control that requires more resolution that is available from the raw data using the adaptive control method.

  17. Diffusion of robotics into clinical practice in the United States: process, patient safety, learning curves, and the public health.

    Science.gov (United States)

    Mirheydar, Hossein S; Parsons, J Kellogg

    2013-06-01

    Robotic technology disseminated into urological practice without robust comparative effectiveness data. To review the diffusion of robotic surgery into urological practice. We performed a comprehensive literature review focusing on diffusion patterns, patient safety, learning curves, and comparative costs for robotic radical prostatectomy, partial nephrectomy, and radical cystectomy. Robotic urologic surgery diffused in patterns typical of novel technology spreading among practicing surgeons. Robust evidence-based data comparing outcomes of robotic to open surgery were sparse. Although initial Level 3 evidence for robotic prostatectomy observed complication outcomes similar to open prostatectomy, subsequent population-based Level 2 evidence noted an increased prevalence of adverse patient safety events and genitourinary complications among robotic patients during the early years of diffusion. Level 2 evidence indicated comparable to improved patient safety outcomes for robotic compared to open partial nephrectomy and cystectomy. Learning curve recommendations for robotic urologic surgery have drawn exclusively on Level 4 evidence and subjective, non-validated metrics. The minimum number of cases required to achieve competency for robotic prostatectomy has increased to unrealistically high levels. Most comparative cost-analyses have demonstrated that robotic surgery is significantly more expensive than open or laparoscopic surgery. Evidence-based data are limited but suggest an increased prevalence of adverse patient safety events for robotic prostatectomy early in the national diffusion period. Learning curves for robotic urologic surgery are subjective and based on non-validated metrics. The urological community should develop rigorous, evidence-based processes by which future technological innovations may diffuse in an organized and safe manner.

  18. Case Study Analyses of the Impact of Flipped Learning in Teaching Programming Robots

    Directory of Open Access Journals (Sweden)

    Majlinda Fetaji

    2016-11-01

    Full Text Available The focus of the research study was to investigate and find out the benefits of the flipped learning pedagogy on the student learning in teaching programming Robotics classes. Also, the assessment of whether it has any advantages over the traditional teaching methods in computer sciences. Assessment of learners on their attitudes, motivation, and effectiveness when using flipped classroom compared with traditional classroom has been realized. The research questions investigated are: “What kind of problems can we face when we have robotics classes in the traditional methods?”, “If we applied flipped learning method, can we solve these problems?”. In order to analyze all this, a case study experiment was realized and insights as well as recommendations are presented.

  19. Multiagent cooperation and competition with deep reinforcement learning.

    Directory of Open Access Journals (Sweden)

    Ardi Tampuu

    Full Text Available Evolution of cooperation and competition can appear when multiple adaptive agents share a biological, social, or technological niche. In the present work we study how cooperation and competition emerge between autonomous agents that learn by reinforcement while using only their raw visual input as the state representation. In particular, we extend the Deep Q-Learning framework to multiagent environments to investigate the interaction between two learning agents in the well-known video game Pong. By manipulating the classical rewarding scheme of Pong we show how competitive and collaborative behaviors emerge. We also describe the progression from competitive to collaborative behavior when the incentive to cooperate is increased. Finally we show how learning by playing against another adaptive agent, instead of against a hard-wired algorithm, results in more robust strategies. The present work shows that Deep Q-Networks can become a useful tool for studying decentralized learning of multiagent systems coping with high-dimensional environments.

  20. Multiagent cooperation and competition with deep reinforcement learning

    Science.gov (United States)

    Kodelja, Dorian; Kuzovkin, Ilya; Korjus, Kristjan; Aru, Juhan; Aru, Jaan; Vicente, Raul

    2017-01-01

    Evolution of cooperation and competition can appear when multiple adaptive agents share a biological, social, or technological niche. In the present work we study how cooperation and competition emerge between autonomous agents that learn by reinforcement while using only their raw visual input as the state representation. In particular, we extend the Deep Q-Learning framework to multiagent environments to investigate the interaction between two learning agents in the well-known video game Pong. By manipulating the classical rewarding scheme of Pong we show how competitive and collaborative behaviors emerge. We also describe the progression from competitive to collaborative behavior when the incentive to cooperate is increased. Finally we show how learning by playing against another adaptive agent, instead of against a hard-wired algorithm, results in more robust strategies. The present work shows that Deep Q-Networks can become a useful tool for studying decentralized learning of multiagent systems coping with high-dimensional environments. PMID:28380078

  1. Multiagent cooperation and competition with deep reinforcement learning.

    Science.gov (United States)

    Tampuu, Ardi; Matiisen, Tambet; Kodelja, Dorian; Kuzovkin, Ilya; Korjus, Kristjan; Aru, Juhan; Aru, Jaan; Vicente, Raul

    2017-01-01

    Evolution of cooperation and competition can appear when multiple adaptive agents share a biological, social, or technological niche. In the present work we study how cooperation and competition emerge between autonomous agents that learn by reinforcement while using only their raw visual input as the state representation. In particular, we extend the Deep Q-Learning framework to multiagent environments to investigate the interaction between two learning agents in the well-known video game Pong. By manipulating the classical rewarding scheme of Pong we show how competitive and collaborative behaviors emerge. We also describe the progression from competitive to collaborative behavior when the incentive to cooperate is increased. Finally we show how learning by playing against another adaptive agent, instead of against a hard-wired algorithm, results in more robust strategies. The present work shows that Deep Q-Networks can become a useful tool for studying decentralized learning of multiagent systems coping with high-dimensional environments.

  2. Efficient collective swimming by harnessing vortices through deep reinforcement learning.

    Science.gov (United States)

    Verma, Siddhartha; Novati, Guido; Koumoutsakos, Petros

    2018-06-05

    Fish in schooling formations navigate complex flow fields replete with mechanical energy in the vortex wakes of their companions. Their schooling behavior has been associated with evolutionary advantages including energy savings, yet the underlying physical mechanisms remain unknown. We show that fish can improve their sustained propulsive efficiency by placing themselves in appropriate locations in the wake of other swimmers and intercepting judiciously their shed vortices. This swimming strategy leads to collective energy savings and is revealed through a combination of high-fidelity flow simulations with a deep reinforcement learning (RL) algorithm. The RL algorithm relies on a policy defined by deep, recurrent neural nets, with long-short-term memory cells, that are essential for capturing the unsteadiness of the two-way interactions between the fish and the vortical flow field. Surprisingly, we find that swimming in-line with a leader is not associated with energetic benefits for the follower. Instead, "smart swimmer(s)" place themselves at off-center positions, with respect to the axis of the leader(s) and deform their body to synchronize with the momentum of the oncoming vortices, thus enhancing their swimming efficiency at no cost to the leader(s). The results confirm that fish may harvest energy deposited in vortices and support the conjecture that swimming in formation is energetically advantageous. Moreover, this study demonstrates that deep RL can produce navigation algorithms for complex unsteady and vortical flow fields, with promising implications for energy savings in autonomous robotic swarms.

  3. Optimal and Autonomous Control Using Reinforcement Learning: A Survey.

    Science.gov (United States)

    Kiumarsi, Bahare; Vamvoudakis, Kyriakos G; Modares, Hamidreza; Lewis, Frank L

    2018-06-01

    This paper reviews the current state of the art on reinforcement learning (RL)-based feedback control solutions to optimal regulation and tracking of single and multiagent systems. Existing RL solutions to both optimal and control problems, as well as graphical games, will be reviewed. RL methods learn the solution to optimal control and game problems online and using measured data along the system trajectories. We discuss Q-learning and the integral RL algorithm as core algorithms for discrete-time (DT) and continuous-time (CT) systems, respectively. Moreover, we discuss a new direction of off-policy RL for both CT and DT systems. Finally, we review several applications.

  4. Universal effect of dynamical reinforcement learning mechanism in spatial evolutionary games

    International Nuclear Information System (INIS)

    Zhang, Hai-Feng; Wu, Zhi-Xi; Wang, Bing-Hong

    2012-01-01

    One of the prototypical mechanisms in understanding the ubiquitous cooperation in social dilemma situations is the win–stay, lose–shift rule. In this work, a generalized win–stay, lose–shift learning model—a reinforcement learning model with dynamic aspiration level—is proposed to describe how humans adapt their social behaviors based on their social experiences. In the model, the players incorporate the information of the outcomes in previous rounds with time-dependent aspiration payoffs to regulate the probability of choosing cooperation. By investigating such a reinforcement learning rule in the spatial prisoner's dilemma game and public goods game, a most noteworthy viewpoint is that moderate greediness (i.e. moderate aspiration level) favors best the development and organization of collective cooperation. The generality of this observation is tested against different regulation strengths and different types of network of interaction as well. We also make comparisons with two recently proposed models to highlight the importance of the mechanism of adaptive aspiration level in supporting cooperation in structured populations

  5. Alignment Condition-Based Robust Adaptive Iterative Learning Control of Uncertain Robot System

    Directory of Open Access Journals (Sweden)

    Guofeng Tong

    2014-04-01

    Full Text Available This paper proposes an adaptive iterative learning control strategy integrated with saturation-based robust control for uncertain robot system in presence of modelling uncertainties, unknown parameter, and external disturbance under alignment condition. An important merit is that it achieves adaptive switching of gain matrix both in conventional PD-type feedforward control and robust adaptive control in the iteration domain simultaneously. The analysis of convergence of proposed control law is based on Lyapunov's direct method under alignment initial condition. Simulation results demonstrate the faster learning rate and better robust performance with proposed algorithm by comparing with other existing robust controllers. The actual experiment on three-DOF robot manipulator shows its better practical effectiveness.

  6. A Predictive Study of Learner Attitudes Toward Open Learning in a Robotics Class

    Science.gov (United States)

    Avsec, Stanislav; Rihtarsic, David; Kocijancic, Slavko

    2014-10-01

    Open learning (OL) strives to transform teaching and learning by applying learning science and emerging technologies to increase student success, improve learning productivity, and lower barriers to access. OL of robotics has a significant growth rate in secondary and/or high schools, but failures exist. Little is known about why many users stop their OL after their initial experience. Previous research done under different task environments has suggested a variety of factors affecting user satisfaction with different types of OL. In this study, we tested a regression model for student satisfaction involving students' attitudes toward OL usage. A survey was conducted to investigate the critical factors affecting students' achievements and satisfaction in OL of robotics with use of own developed direct manipulation learning environment as learning context. A multiple regression analyses were carried out to investigate how different facets of students' expectations and experiences are related to perceived learning achievements and course satisfaction. Descriptive statistics and analysis of variance was performed to determine the effect of predictor variables to student satisfaction. The results demonstrate that students have significantly positive perceptions toward using OL of robotics as a learning-assisted tool. Furthermore, behavioral intention to use OL is influenced by perceived usefulness and self-efficacy. The following five major categories of satisfaction factors with OL course were revealed during analysis of the studies (effect sizes in parentheses): organization (0.69); implementation (0.61); professional content (0.53); interaction (0.43); self-efficacy (0.14). All these effect sizes were judged to be significant and large. The results also showed that learner-mentor/instructor interaction, learner-professional content interaction, and online and offline self-efficacy were good predictors of student satisfaction and course quality. Peer interactions and

  7. Neural prediction errors reveal a risk-sensitive reinforcement-learning process in the human brain.

    Science.gov (United States)

    Niv, Yael; Edlund, Jeffrey A; Dayan, Peter; O'Doherty, John P

    2012-01-11

    Humans and animals are exquisitely, though idiosyncratically, sensitive to risk or variance in the outcomes of their actions. Economic, psychological, and neural aspects of this are well studied when information about risk is provided explicitly. However, we must normally learn about outcomes from experience, through trial and error. Traditional models of such reinforcement learning focus on learning about the mean reward value of cues and ignore higher order moments such as variance. We used fMRI to test whether the neural correlates of human reinforcement learning are sensitive to experienced risk. Our analysis focused on anatomically delineated regions of a priori interest in the nucleus accumbens, where blood oxygenation level-dependent (BOLD) signals have been suggested as correlating with quantities derived from reinforcement learning. We first provide unbiased evidence that the raw BOLD signal in these regions corresponds closely to a reward prediction error. We then derive from this signal the learned values of cues that predict rewards of equal mean but different variance and show that these values are indeed modulated by experienced risk. Moreover, a close neurometric-psychometric coupling exists between the fluctuations of the experience-based evaluations of risky options that we measured neurally and the fluctuations in behavioral risk aversion. This suggests that risk sensitivity is integral to human learning, illuminating economic models of choice, neuroscientific models of affective learning, and the workings of the underlying neural mechanisms.

  8. Adaptive Landmark-Based Navigation System Using Learning Techniques

    DEFF Research Database (Denmark)

    Zeidan, Bassel; Dasgupta, Sakyasingha; Wörgötter, Florentin

    2014-01-01

    The goal-directed navigational ability of animals is an essential prerequisite for them to survive. They can learn to navigate to a distal goal in a complex environment. During this long-distance navigation, they exploit environmental features, like landmarks, to guide them towards their goal. In...... hexapod robots. As a result, it allows the robots to successfully learn to navigate to distal goals in complex environments.......The goal-directed navigational ability of animals is an essential prerequisite for them to survive. They can learn to navigate to a distal goal in a complex environment. During this long-distance navigation, they exploit environmental features, like landmarks, to guide them towards their goal....... Inspired by this, we develop an adaptive landmark-based navigation system based on sequential reinforcement learning. In addition, correlation-based learning is also integrated into the system to improve learning performance. The proposed system has been applied to simulated simple wheeled and more complex...

  9. Research on Open-Closed-Loop Iterative Learning Control with Variable Forgetting Factor of Mobile Robots

    Directory of Open Access Journals (Sweden)

    Hongbin Wang

    2016-01-01

    Full Text Available We propose an iterative learning control algorithm (ILC that is developed using a variable forgetting factor to control a mobile robot. The proposed algorithm can be categorized as an open-closed-loop iterative learning control, which produces control instructions by using both previous and current data. However, introducing a variable forgetting factor can weaken the former control output and its variance in the control law while strengthening the robustness of the iterative learning control. If it is applied to the mobile robot, this will reduce position errors in robot trajectory tracking control effectively. In this work, we show that the proposed algorithm guarantees tracking error bound convergence to a small neighborhood of the origin under the condition of state disturbances, output measurement noises, and fluctuation of system dynamics. By using simulation, we demonstrate that the controller is effective in realizing the prefect tracking.

  10. Robot initiative in a team learning task increases the rhythm of interaction but not the perceived engagement

    Science.gov (United States)

    Ivaldi, Serena; Anzalone, Salvatore M.; Rousseau, Woody; Sigaud, Olivier; Chetouani, Mohamed

    2014-01-01

    We hypothesize that the initiative of a robot during a collaborative task with a human can influence the pace of interaction, the human response to attention cues, and the perceived engagement. We propose an object learning experiment where the human interacts in a natural way with the humanoid iCub. Through a two-phases scenario, the human teaches the robot about the properties of some objects. We compare the effect of the initiator of the task in the teaching phase (human or robot) on the rhythm of the interaction in the verification phase. We measure the reaction time of the human gaze when responding to attention utterances of the robot. Our experiments show that when the robot is the initiator of the learning task, the pace of interaction is higher and the reaction to attention cues faster. Subjective evaluations suggest that the initiating role of the robot, however, does not affect the perceived engagement. Moreover, subjective and third-person evaluations of the interaction task suggest that the attentive mechanism we implemented in the humanoid robot iCub is able to arouse engagement and make the robot's behavior readable. PMID:24596554

  11. Action understanding and imitation learning in a robot-human task

    NARCIS (Netherlands)

    Erlhagen, W.; Mukovskiy, A.; Bicho, E.; Panin, G.; Kiss, C.; Knoll, A.; Schie, H.T. van; Bekkering, H.

    2005-01-01

    We report results of an interdisciplinary project which aims at endowing a real robot system with the capacity for learning by goal-directed imitation. The control architecture is biologically inspired as it reflects recent experimental findings in action observation/execution studies. We test its

  12. Understanding Human Hand Gestures for Learning Robot Pick-and-Place Tasks

    Directory of Open Access Journals (Sweden)

    Hsien-I Lin

    2015-05-01

    Full Text Available Programming robots by human demonstration is an intuitive approach, especially by gestures. Because robot pick-and-place tasks are widely used in industrial factories, this paper proposes a framework to learn robot pick-and-place tasks by understanding human hand gestures. The proposed framework is composed of the module of gesture recognition and the module of robot behaviour control. For the module of gesture recognition, transport empty (TE, transport loaded (TL, grasp (G, and release (RL from Gilbreth's therbligs are the hand gestures to be recognized. A convolution neural network (CNN is adopted to recognize these gestures from a camera image. To achieve the robust performance, the skin model by a Gaussian mixture model (GMM is used to filter out non-skin colours of an image, and the calibration of position and orientation is applied to obtain the neutral hand pose before the training and testing of the CNN. For the module of robot behaviour control, the corresponding robot motion primitives to TE, TL, G, and RL, respectively, are implemented in the robot. To manage the primitives in the robot system, a behaviour-based programming platform based on the Extensible Agent Behavior Specification Language (XABSL is adopted. Because the XABSL provides the flexibility and re-usability of the robot primitives, the hand motion sequence from the module of gesture recognition can be easily used in the XABSL programming platform to implement the robot pick-and-place tasks. The experimental evaluation of seven subjects performing seven hand gestures showed that the average recognition rate was 95.96%. Moreover, by the XABSL programming platform, the experiment showed the cube-stacking task was easily programmed by human demonstration.

  13. Interaction learning for dynamic movement primitives used in cooperative robotic tasks

    DEFF Research Database (Denmark)

    Kulvicius, Tomas; Biehl, Martin; Aein, Mohamad Javad

    2013-01-01

    Abstract Since several years dynamic movement primitives (DMPs) are more and more getting into the center of interest for flexible movement control in robotics. In this study we introduce sensory feedback together with a predictive learning mechanism which allows tightly coupled dual-agent systems...... to learn an adaptive, sensor-driven interaction based on DMPs. The coupled conventional (no-sensors, no learning) DMP-system automatically equilibrates and can still be solved analytically allowing us to derive conditions for stability. When adding adaptive sensor control we can show that both agents learn...

  14. Pragmatic Frames for Teaching and Learning in Human-Robot Interaction: Review and Challenges.

    Science.gov (United States)

    Vollmer, Anna-Lisa; Wrede, Britta; Rohlfing, Katharina J; Oudeyer, Pierre-Yves

    2016-01-01

    One of the big challenges in robotics today is to learn from human users that are inexperienced in interacting with robots but yet are often used to teach skills flexibly to other humans and to children in particular. A potential route toward natural and efficient learning and teaching in Human-Robot Interaction (HRI) is to leverage the social competences of humans and the underlying interactional mechanisms. In this perspective, this article discusses the importance of pragmatic frames as flexible interaction protocols that provide important contextual cues to enable learners to infer new action or language skills and teachers to convey these cues. After defining and discussing the concept of pragmatic frames, grounded in decades of research in developmental psychology, we study a selection of HRI work in the literature which has focused on learning-teaching interaction and analyze the interactional and learning mechanisms that were used in the light of pragmatic frames. This allows us to show that many of the works have already used in practice, but not always explicitly, basic elements of the pragmatic frames machinery. However, we also show that pragmatic frames have so far been used in a very restricted way as compared to how they are used in human-human interaction and argue that this has been an obstacle preventing robust natural multi-task learning and teaching in HRI. In particular, we explain that two central features of human pragmatic frames, mostly absent of existing HRI studies, are that (1) social peers use rich repertoires of frames, potentially combined together, to convey and infer multiple kinds of cues; (2) new frames can be learnt continually, building on existing ones, and guiding the interaction toward higher levels of complexity and expressivity. To conclude, we give an outlook on the future research direction describing the relevant key challenges that need to be solved for leveraging pragmatic frames for robot learning and teaching.

  15. Autonomous military robotics

    CERN Document Server

    Nath, Vishnu

    2014-01-01

    This SpringerBrief reveals the latest techniques in computer vision and machine learning on robots that are designed as accurate and efficient military snipers. Militaries around the world are investigating this technology to simplify the time, cost and safety measures necessary for training human snipers. These robots are developed by combining crucial aspects of computer science research areas including image processing, robotic kinematics and learning algorithms. The authors explain how a new humanoid robot, the iCub, uses high-speed cameras and computer vision algorithms to track the objec

  16. Multiagent Reinforcement Learning Dynamic Spectrum Access in Cognitive Radios

    Directory of Open Access Journals (Sweden)

    Wu Chun

    2014-02-01

    Full Text Available A multiuser independent Q-learning method which does not need information interaction is proposed for multiuser dynamic spectrum accessing in cognitive radios. The method adopts self-learning paradigm, in which each CR user performs reinforcement learning only through observing individual performance reward without spending communication resource on information interaction with others. The reward is defined suitably to present channel quality and channel conflict status. The learning strategy of sufficient exploration, preference for good channel, and punishment for channel conflict is designed to implement multiuser dynamic spectrum accessing. In two users two channels scenario, a fast learning algorithm is proposed and the convergence to maximal whole reward is proved. The simulation results show that, with the proposed method, the CR system can obtain convergence of Nash equilibrium with large probability and achieve great performance of whole reward.

  17. DYNAMIC AND INCREMENTAL EXPLORATION STRATEGY IN FUSION ADAPTIVE RESONANCE THEORY FOR ONLINE REINFORCEMENT LEARNING

    Directory of Open Access Journals (Sweden)

    Budhitama Subagdja

    2016-06-01

    Full Text Available One of the fundamental challenges in reinforcement learning is to setup a proper balance between exploration and exploitation to obtain the maximum cummulative reward in the long run. Most protocols for exploration bound the overall values to a convergent level of performance. If new knowledge is inserted or the environment is suddenly changed, the issue becomes more intricate as the exploration must compromise the pre-existing knowledge. This paper presents a type of multi-channel adaptive resonance theory (ART neural network model called fusion ART which serves as a fuzzy approximator for reinforcement learning with inherent features that can regulate the exploration strategy. This intrinsic regulation is driven by the condition of the knowledge learnt so far by the agent. The model offers a stable but incremental reinforcement learning that can involve prior rules as bootstrap knowledge for guiding the agent to select the right action. Experiments in obstacle avoidance and navigation tasks demonstrate that in the configuration of learning wherein the agent learns from scratch, the inherent exploration model in fusion ART model is comparable to the basic E-greedy policy. On the other hand, the model is demonstrated to deal with prior knowledge and strike a balance between exploration and exploitation.

  18. A Review of the Relationship between Novelty, Intrinsic Motivation and Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Siddique Nazmul

    2017-11-01

    Full Text Available This paper presents a review on the tri-partite relationship between novelty, intrinsic motivation and reinforcement learning. The paper first presents a literature survey on novelty and the different computational models of novelty detection, with a specific focus on the features of stimuli that trigger a Hedonic value for generating a novelty signal. It then presents an overview of intrinsic motivation and investigations into different models with the aim of exploring deeper co-relationships between specific features of a novelty signal and its effect on intrinsic motivation in producing a reward function. Finally, it presents survey results on reinforcement learning, different models and their functional relationship with intrinsic motivation.

  19. Anticipatory Driving for a Robot-Car Based on Supervised Learning

    DEFF Research Database (Denmark)

    Markelic, I.; Kulvicius, Tomas; Tamosiunaite, M.

    2009-01-01

    Using look ahead information and plan making improves hu- man driving. We therefore propose that also autonomously driving systems should dispose over such abilities. We adapt a machine learning approach, where the system, a car-like robot, is trained by an experienced driver by correlating visual...

  20. Dopamine-Dependent Reinforcement of Motor Skill Learning: Evidence from Gilles de la Tourette Syndrome

    Science.gov (United States)

    Palminteri, Stefano; Lebreton, Mael; Worbe, Yulia; Hartmann, Andreas; Lehericy, Stephane; Vidailhet, Marie; Grabli, David; Pessiglione, Mathias

    2011-01-01

    Reinforcement learning theory has been extensively used to understand the neural underpinnings of instrumental behaviour. A central assumption surrounds dopamine signalling reward prediction errors, so as to update action values and ensure better choices in the future. However, educators may share the intuitive idea that reinforcements not only…

  1. Joy, Distress, Hope, and Fear in Reinforcement Learning (Extended Abstract)

    NARCIS (Netherlands)

    Jacobs, E.J.; Broekens, J.; Jonker, C.M.

    2014-01-01

    In this paper we present a mapping between joy, distress, hope and fear, and Reinforcement Learning primitives. Joy / distress is a signal that is derived from the RL update signal, while hope/fear is derived from the utility of the current state. Agent-based simulation experiments replicate

  2. Put Your Robot In, Put Your Robot Out: Sequencing through Programming Robots in Early Childhood

    Science.gov (United States)

    Kazakoff, Elizabeth R.; Bers, Marina Umaschi

    2014-01-01

    This article examines the impact of programming robots on sequencing ability in early childhood. Thirty-four children (ages 4.5-6.5 years) participated in computer programming activities with a developmentally appropriate tool, CHERP, specifically designed to program a robot's behaviors. The children learned to build and program robots over three…

  3. Applications of Deep Learning and Reinforcement Learning to Biological Data.

    Science.gov (United States)

    Mahmud, Mufti; Kaiser, Mohammed Shamim; Hussain, Amir; Vassanelli, Stefano

    2018-06-01

    Rapid advances in hardware-based technologies during the past decades have opened up new possibilities for life scientists to gather multimodal data in various application domains, such as omics, bioimaging, medical imaging, and (brain/body)-machine interfaces. These have generated novel opportunities for development of dedicated data-intensive machine learning techniques. In particular, recent research in deep learning (DL), reinforcement learning (RL), and their combination (deep RL) promise to revolutionize the future of artificial intelligence. The growth in computational power accompanied by faster and increased data storage, and declining computing costs have already allowed scientists in various fields to apply these techniques on data sets that were previously intractable owing to their size and complexity. This paper provides a comprehensive survey on the application of DL, RL, and deep RL techniques in mining biological data. In addition, we compare the performances of DL techniques when applied to different data sets across various application domains. Finally, we outline open issues in this challenging research area and discuss future development perspectives.

  4. Gaze-contingent reinforcement learning reveals incentive value of social signals in young children and adults.

    Science.gov (United States)

    Vernetti, Angélina; Smith, Tim J; Senju, Atsushi

    2017-03-15

    While numerous studies have demonstrated that infants and adults preferentially orient to social stimuli, it remains unclear as to what drives such preferential orienting. It has been suggested that the learned association between social cues and subsequent reward delivery might shape such social orienting. Using a novel, spontaneous indication of reinforcement learning (with the use of a gaze contingent reward-learning task), we investigated whether children and adults' orienting towards social and non-social visual cues can be elicited by the association between participants' visual attention and a rewarding outcome. Critically, we assessed whether the engaging nature of the social cues influences the process of reinforcement learning. Both children and adults learned to orient more often to the visual cues associated with reward delivery, demonstrating that cue-reward association reinforced visual orienting. More importantly, when the reward-predictive cue was social and engaging, both children and adults learned the cue-reward association faster and more efficiently than when the reward-predictive cue was social but non-engaging. These new findings indicate that social engaging cues have a positive incentive value. This could possibly be because they usually coincide with positive outcomes in real life, which could partly drive the development of social orienting. © 2017 The Authors.

  5. Adaptive Load Balancing of Parallel Applications with Multi-Agent Reinforcement Learning on Heterogeneous Systems

    Directory of Open Access Journals (Sweden)

    Johan Parent

    2004-01-01

    Full Text Available We report on the improvements that can be achieved by applying machine learning techniques, in particular reinforcement learning, for the dynamic load balancing of parallel applications. The applications being considered in this paper are coarse grain data intensive applications. Such applications put high pressure on the interconnect of the hardware. Synchronization and load balancing in complex, heterogeneous networks need fast, flexible, adaptive load balancing algorithms. Viewing a parallel application as a one-state coordination game in the framework of multi-agent reinforcement learning, and by using a recently introduced multi-agent exploration technique, we are able to improve upon the classic job farming approach. The improvements are achieved with limited computation and communication overhead.

  6. Learning alternative movement coordination patterns using reinforcement feedback.

    Science.gov (United States)

    Lin, Tzu-Hsiang; Denomme, Amber; Ranganathan, Rajiv

    2018-05-01

    One of the characteristic features of the human motor system is redundancy-i.e., the ability to achieve a given task outcome using multiple coordination patterns. However, once participants settle on using a specific coordination pattern, the process of learning to use a new alternative coordination pattern to perform the same task is still poorly understood. Here, using two experiments, we examined this process of how participants shift from one coordination pattern to another using different reinforcement schedules. Participants performed a virtual reaching task, where they moved a cursor to different targets positioned on the screen. Our goal was to make participants use a coordination pattern with greater trunk motion, and to this end, we provided reinforcement by making the cursor disappear if the trunk motion during the reach did not cross a specified threshold value. In Experiment 1, we compared two reinforcement schedules in two groups of participants-an abrupt group, where the threshold was introduced immediately at the beginning of practice; and a gradual group, where the threshold was introduced gradually with practice. Results showed that both abrupt and gradual groups were effective in shifting their coordination patterns to involve greater trunk motion, but the abrupt group showed greater retention when the reinforcement was removed. In Experiment 2, we examined the basis of this advantage in the abrupt group using two additional control groups. Results showed that the advantage of the abrupt group was because of a greater number of practice trials with the desired coordination pattern. Overall, these results show that reinforcement can be successfully used to shift coordination patterns, which has potential in the rehabilitation of movement disorders.

  7. Identifying Factors Reinforcing Robotization: Interactive Forces of Employment, Working Hour and Wage

    OpenAIRE

    Joonmo Cho; Jinha Kim

    2018-01-01

    Unlike previous studies on robotization approaching the future based on the cutting-edge technologies and adopting a framework where robotization is considered as an exogenous variable, this study considers that robotization occurs endogenously and uses it as a dependent variable for an objective examination of the effect of robotization on the labor market. To this end, a robotization indicator is created based on the actual number of industrial robots currently deployed in workplaces, and a...

  8. Identifying Factors Reinforcing Robotization: Interactive Forces of Employment, Working Hour and Wage

    Directory of Open Access Journals (Sweden)

    Joonmo Cho

    2018-02-01

    Full Text Available Unlike previous studies on robotization approaching the future based on the cutting-edge technologies and adopting a framework where robotization is considered as an exogenous variable, this study considers that robotization occurs endogenously and uses it as a dependent variable for an objective examination of the effect of robotization on the labor market. To this end, a robotization indicator is created based on the actual number of industrial robots currently deployed in workplaces, and a multiple regression analysis is performed using the robotization indicator and labor variables such as employment, working hours, and wage. The results using the multiple regression considering the triangular relationship of employment–working-hours–wages show that job destruction due to robotization is not too remarkable yet that use. Our results show the complementary relation between employment and robotization, but the substituting relation between working hour and robotization. The results also demonstrate the effects of union, the size of the company and the proportion of production workers and simple labor workers etc. These findings indicate that the degree of robotization may vary with many factors of the labor market. Limitations of this study and implications for future research are also discussed.

  9. A non-linear manifold alignment approach to robot learning from demonstrations

    CSIR Research Space (South Africa)

    Makondo, Ndivhuwo

    2018-04-01

    Full Text Available with potentially different, but unknown, kinematics from humans. This paper proposes a method that enables robots with unknown kinematics to learn skills from demonstrations. Our proposed method requires a motion trajectory obtained from human demonstrations via a...

  10. Nonparametric bayesian reward segmentation for skill discovery using inverse reinforcement learning

    CSIR Research Space (South Africa)

    Ranchod, P

    2015-10-01

    Full Text Available We present a method for segmenting a set of unstructured demonstration trajectories to discover reusable skills using inverse reinforcement learning (IRL). Each skill is characterised by a latent reward function which the demonstrator is assumed...

  11. Reusable Reinforcement Learning via Shallow Trails.

    Science.gov (United States)

    Yu, Yang; Chen, Shi-Yong; Da, Qing; Zhou, Zhi-Hua

    2018-06-01

    Reinforcement learning has shown great success in helping learning agents accomplish tasks autonomously from environment interactions. Meanwhile in many real-world applications, an agent needs to accomplish not only a fixed task but also a range of tasks. For this goal, an agent can learn a metapolicy over a set of training tasks that are drawn from an underlying distribution. By maximizing the total reward summed over all the training tasks, the metapolicy can then be reused in accomplishing test tasks from the same distribution. However, in practice, we face two major obstacles to train and reuse metapolicies well. First, how to identify tasks that are unrelated or even opposite with each other, in order to avoid their mutual interference in the training. Second, how to characterize task features, according to which a metapolicy can be reused. In this paper, we propose the MetA-Policy LEarning (MAPLE) approach that overcomes the two difficulties by introducing the shallow trail. It probes a task by running a roughly trained policy. Using the rewards of the shallow trail, MAPLE automatically groups similar tasks. Moreover, when the task parameters are unknown, the rewards of the shallow trail also serve as task features. Empirical studies on several controlling tasks verify that MAPLE can train metapolicies well and receives high reward on test tasks.

  12. Explicit and implicit reinforcement learning across the psychosis spectrum.

    Science.gov (United States)

    Barch, Deanna M; Carter, Cameron S; Gold, James M; Johnson, Sheri L; Kring, Ann M; MacDonald, Angus W; Pizzagalli, Diego A; Ragland, J Daniel; Silverstein, Steven M; Strauss, Milton E

    2017-07-01

    Motivational and hedonic impairments are core features of a variety of types of psychopathology. An important aspect of motivational function is reinforcement learning (RL), including implicit (i.e., outside of conscious awareness) and explicit (i.e., including explicit representations about potential reward associations) learning, as well as both positive reinforcement (learning about actions that lead to reward) and punishment (learning to avoid actions that lead to loss). Here we present data from paradigms designed to assess both positive and negative components of both implicit and explicit RL, examine performance on each of these tasks among individuals with schizophrenia, schizoaffective disorder, and bipolar disorder with psychosis, and examine their relative relationships to specific symptom domains transdiagnostically. None of the diagnostic groups differed significantly from controls on the implicit RL tasks in either bias toward a rewarded response or bias away from a punished response. However, on the explicit RL task, both the individuals with schizophrenia and schizoaffective disorder performed significantly worse than controls, but the individuals with bipolar did not. Worse performance on the explicit RL task, but not the implicit RL task, was related to worse motivation and pleasure symptoms across all diagnostic categories. Performance on explicit RL, but not implicit RL, was related to working memory, which accounted for some of the diagnostic group differences. However, working memory did not account for the relationship of explicit RL to motivation and pleasure symptoms. These findings suggest transdiagnostic relationships across the spectrum of psychotic disorders between motivation and pleasure impairments and explicit RL. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  13. Pragmatic Frames for Teaching and Learning in Human–Robot Interaction: Review and Challenges

    Science.gov (United States)

    Vollmer, Anna-Lisa; Wrede, Britta; Rohlfing, Katharina J.; Oudeyer, Pierre-Yves

    2016-01-01

    One of the big challenges in robotics today is to learn from human users that are inexperienced in interacting with robots but yet are often used to teach skills flexibly to other humans and to children in particular. A potential route toward natural and efficient learning and teaching in Human-Robot Interaction (HRI) is to leverage the social competences of humans and the underlying interactional mechanisms. In this perspective, this article discusses the importance of pragmatic frames as flexible interaction protocols that provide important contextual cues to enable learners to infer new action or language skills and teachers to convey these cues. After defining and discussing the concept of pragmatic frames, grounded in decades of research in developmental psychology, we study a selection of HRI work in the literature which has focused on learning–teaching interaction and analyze the interactional and learning mechanisms that were used in the light of pragmatic frames. This allows us to show that many of the works have already used in practice, but not always explicitly, basic elements of the pragmatic frames machinery. However, we also show that pragmatic frames have so far been used in a very restricted way as compared to how they are used in human–human interaction and argue that this has been an obstacle preventing robust natural multi-task learning and teaching in HRI. In particular, we explain that two central features of human pragmatic frames, mostly absent of existing HRI studies, are that (1) social peers use rich repertoires of frames, potentially combined together, to convey and infer multiple kinds of cues; (2) new frames can be learnt continually, building on existing ones, and guiding the interaction toward higher levels of complexity and expressivity. To conclude, we give an outlook on the future research direction describing the relevant key challenges that need to be solved for leveraging pragmatic frames for robot learning and teaching

  14. Active Learning Environments with Robotic Tangibles: Children's Physical and Virtual Spatial Programming Experiences

    Science.gov (United States)

    Burleson, Winslow S.; Harlow, Danielle B.; Nilsen, Katherine J.; Perlin, Ken; Freed, Natalie; Jensen, Camilla Nørgaard; Lahey, Byron; Lu, Patrick; Muldner, Kasia

    2018-01-01

    As computational thinking becomes increasingly important for children to learn, we must develop interfaces that leverage the ways that young children learn to provide opportunities for them to develop these skills. Active Learning Environments with Robotic Tangibles (ALERT) and Robopad, an analogous on-screen virtual spatial programming…

  15. Manufacturing Scheduling Using Colored Petri Nets and Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Maria Drakaki

    2017-02-01

    Full Text Available Agent-based intelligent manufacturing control systems are capable to efficiently respond and adapt to environmental changes. Manufacturing system adaptation and evolution can be addressed with learning mechanisms that increase the intelligence of agents. In this paper a manufacturing scheduling method is presented based on Timed Colored Petri Nets (CTPNs and reinforcement learning (RL. CTPNs model the manufacturing system and implement the scheduling. In the search for an optimal solution a scheduling agent uses RL and in particular the Q-learning algorithm. A warehouse order-picking scheduling is presented as a case study to illustrate the method. The proposed scheduling method is compared to existing methods. Simulation and state space results are used to evaluate performance and identify system properties.

  16. An Improved Reinforcement Learning System Using Affective Factors

    Directory of Open Access Journals (Sweden)

    Takashi Kuremoto

    2013-07-01

    Full Text Available As a powerful and intelligent machine learning method, reinforcement learning (RL has been widely used in many fields such as game theory, adaptive control, multi-agent system, nonlinear forecasting, and so on. The main contribution of this technique is its exploration and exploitation approaches to find the optimal solution or semi-optimal solution of goal-directed problems. However, when RL is applied to multi-agent systems (MASs, problems such as “curse of dimension”, “perceptual aliasing problem”, and uncertainty of the environment constitute high hurdles to RL. Meanwhile, although RL is inspired by behavioral psychology and reward/punishment from the environment is used, higher mental factors such as affects, emotions, and motivations are rarely adopted in the learning procedure of RL. In this paper, to challenge agents learning in MASs, we propose a computational motivation function, which adopts two principle affective factors “Arousal” and “Pleasure” of Russell’s circumplex model of affects, to improve the learning performance of a conventional RL algorithm named Q-learning (QL. Compared with the conventional QL, computer simulations of pursuit problems with static and dynamic preys were carried out, and the results showed that the proposed method results in agents having a faster and more stable learning performance.

  17. Optimizing Chemical Reactions with Deep Reinforcement Learning.

    Science.gov (United States)

    Zhou, Zhenpeng; Li, Xiaocheng; Zare, Richard N

    2017-12-27

    Deep reinforcement learning was employed to optimize chemical reactions. Our model iteratively records the results of a chemical reaction and chooses new experimental conditions to improve the reaction outcome. This model outperformed a state-of-the-art blackbox optimization algorithm by using 71% fewer steps on both simulations and real reactions. Furthermore, we introduced an efficient exploration strategy by drawing the reaction conditions from certain probability distributions, which resulted in an improvement on regret from 0.062 to 0.039 compared with a deterministic policy. Combining the efficient exploration policy with accelerated microdroplet reactions, optimal reaction conditions were determined in 30 min for the four reactions considered, and a better understanding of the factors that control microdroplet reactions was reached. Moreover, our model showed a better performance after training on reactions with similar or even dissimilar underlying mechanisms, which demonstrates its learning ability.

  18. 'Proactive' use of cue-context congruence for building reinforcement learning's reward function.

    Science.gov (United States)

    Zsuga, Judit; Biro, Klara; Tajti, Gabor; Szilasi, Magdolna Emma; Papp, Csaba; Juhasz, Bela; Gesztelyi, Rudolf

    2016-10-28

    Reinforcement learning is a fundamental form of learning that may be formalized using the Bellman equation. Accordingly an agent determines the state value as the sum of immediate reward and of the discounted value of future states. Thus the value of state is determined by agent related attributes (action set, policy, discount factor) and the agent's knowledge of the environment embodied by the reward function and hidden environmental factors given by the transition probability. The central objective of reinforcement learning is to solve these two functions outside the agent's control either using, or not using a model. In the present paper, using the proactive model of reinforcement learning we offer insight on how the brain creates simplified representations of the environment, and how these representations are organized to support the identification of relevant stimuli and action. Furthermore, we identify neurobiological correlates of our model by suggesting that the reward and policy functions, attributes of the Bellman equitation, are built by the orbitofrontal cortex (OFC) and the anterior cingulate cortex (ACC), respectively. Based on this we propose that the OFC assesses cue-context congruence to activate the most context frame. Furthermore given the bidirectional neuroanatomical link between the OFC and model-free structures, we suggest that model-based input is incorporated into the reward prediction error (RPE) signal, and conversely RPE signal may be used to update the reward-related information of context frames and the policy underlying action selection in the OFC and ACC, respectively. Furthermore clinical implications for cognitive behavioral interventions are discussed.

  19. Reasoning on the Self-Organizing Incremental Associative Memory for Online Robot Path Planning

    Science.gov (United States)

    Kawewong, Aram; Honda, Yutaro; Tsuboyama, Manabu; Hasegawa, Osamu

    Robot path-planning is one of the important issues in robotic navigation. This paper presents a novel robot path-planning approach based on the associative memory using Self-Organizing Incremental Neural Networks (SOINN). By the proposed method, an environment is first autonomously divided into a set of path-fragments by junctions. Each fragment is represented by a sequence of preliminarily generated common patterns (CPs). In an online manner, a robot regards the current path as the associative path-fragments, each connected by junctions. The reasoning technique is additionally proposed for decision making at each junction to speed up the exploration time. Distinct from other methods, our method does not ignore the important information about the regions between junctions (path-fragments). The resultant number of path-fragments is also less than other method. Evaluation is done via Webots physical 3D-simulated and real robot experiments, where only distance sensors are available. Results show that our method can represent the environment effectively; it enables the robot to solve the goal-oriented navigation problem in only one episode, which is actually less than that necessary for most of the Reinforcement Learning (RL) based methods. The running time is proved finite and scales well with the environment. The resultant number of path-fragments matches well to the environment.

  20. The "proactive" model of learning: Integrative framework for model-free and model-based reinforcement learning utilizing the associative learning-based proactive brain concept.

    Science.gov (United States)

    Zsuga, Judit; Biro, Klara; Papp, Csaba; Tajti, Gabor; Gesztelyi, Rudolf

    2016-02-01

    Reinforcement learning (RL) is a powerful concept underlying forms of associative learning governed by the use of a scalar reward signal, with learning taking place if expectations are violated. RL may be assessed using model-based and model-free approaches. Model-based reinforcement learning involves the amygdala, the hippocampus, and the orbitofrontal cortex (OFC). The model-free system involves the pedunculopontine-tegmental nucleus (PPTgN), the ventral tegmental area (VTA) and the ventral striatum (VS). Based on the functional connectivity of VS, model-free and model based RL systems center on the VS that by integrating model-free signals (received as reward prediction error) and model-based reward related input computes value. Using the concept of reinforcement learning agent we propose that the VS serves as the value function component of the RL agent. Regarding the model utilized for model-based computations we turned to the proactive brain concept, which offers an ubiquitous function for the default network based on its great functional overlap with contextual associative areas. Hence, by means of the default network the brain continuously organizes its environment into context frames enabling the formulation of analogy-based association that are turned into predictions of what to expect. The OFC integrates reward-related information into context frames upon computing reward expectation by compiling stimulus-reward and context-reward information offered by the amygdala and hippocampus, respectively. Furthermore we suggest that the integration of model-based expectations regarding reward into the value signal is further supported by the efferent of the OFC that reach structures canonical for model-free learning (e.g., the PPTgN, VTA, and VS). (c) 2016 APA, all rights reserved).

  1. Tank War Using Online Reinforcement Learning

    DEFF Research Database (Denmark)

    Toftgaard Andersen, Kresten; Zeng, Yifeng; Dahl Christensen, Dennis

    2009-01-01

    Real-Time Strategy(RTS) games provide a challenging platform to implement online reinforcement learning(RL) techniques in a real application. Computer as one player monitors opponents'(human or other computers) strategies and then updates its own policy using RL methods. In this paper, we propose...... a multi-layer framework for implementing the online RL in a RTS game. The framework significantly reduces the RL computational complexity by decomposing the state space in a hierarchical manner. We implement the RTS game - Tank General, and perform a thorough test on the proposed framework. The results...... show the effectiveness of our proposed framework and shed light on relevant issues on using the RL in RTS games....

  2. A New Approach of Multi-robot Cooperative Pursuit Based on Association Rule Data Mining

    Directory of Open Access Journals (Sweden)

    Jun Li

    2010-02-01

    Full Text Available An approach of cooperative hunting for multiple mobile targets by multi-robot is presented, which divides the pursuit process into forming the pursuit teams and capturing the targets. The data sets of attribute relationship is built by consulting all of factors about capturing evaders, then the interesting rules can be found by data mining from the data sets to build the pursuit teams. Through doping out the positions of targets, the pursuit game can be transformed into multi-robot path planning. Reinforcement learning is used to find the best path. The simulation results show that the mobile evaders can be captured effectively and efficiently, and prove the feasibility and validity of the given algorithm under a dynamic environment.

  3. A New Approach of Multi-Robot Cooperative Pursuit Based on Association Rule Data Mining

    Directory of Open Access Journals (Sweden)

    Jun Li

    2009-12-01

    Full Text Available An approach of cooperative hunting for multiple mobile targets by multi-robot is presented, which divides the pursuit process into forming the pursuit teams and capturing the targets. The data sets of attribute relationship is built by consulting all of factors about capturing evaders, then the interesting rules can be found by data mining from the data sets to build the pursuit teams. Through doping out the positions of targets, the pursuit game can be transformed into multi-robot path planning. Reinforcement learning is used to find the best path. The simulation results show that the mobile evaders can be captured effectively and efficiently, and prove the feasibility and validity of the given algorithm under a dynamic environment.

  4. A Model to Explain the Emergence of Reward Expectancy neurons using Reinforcement Learning and Neural Network

    OpenAIRE

    Shinya, Ishii; Munetaka, Shidara; Katsunari, Shibata

    2006-01-01

    In an experiment of multi-trial task to obtain a reward, reward expectancy neurons,###which responded only in the non-reward trials that are necessary to advance###toward the reward, have been observed in the anterior cingulate cortex of monkeys.###In this paper, to explain the emergence of the reward expectancy neuron in###terms of reinforcement learning theory, a model that consists of a recurrent neural###network trained based on reinforcement learning is proposed. The analysis of the###hi...

  5. FMRQ-A Multiagent Reinforcement Learning Algorithm for Fully Cooperative Tasks.

    Science.gov (United States)

    Zhang, Zhen; Zhao, Dongbin; Gao, Junwei; Wang, Dongqing; Dai, Yujie

    2017-06-01

    In this paper, we propose a multiagent reinforcement learning algorithm dealing with fully cooperative tasks. The algorithm is called frequency of the maximum reward Q-learning (FMRQ). FMRQ aims to achieve one of the optimal Nash equilibria so as to optimize the performance index in multiagent systems. The frequency of obtaining the highest global immediate reward instead of immediate reward is used as the reinforcement signal. With FMRQ each agent does not need the observation of the other agents' actions and only shares its state and reward at each step. We validate FMRQ through case studies of repeated games: four cases of two-player two-action and one case of three-player two-action. It is demonstrated that FMRQ can converge to one of the optimal Nash equilibria in these cases. Moreover, comparison experiments on tasks with multiple states and finite steps are conducted. One is box-pushing and the other one is distributed sensor network problem. Experimental results show that the proposed algorithm outperforms others with higher performance.

  6. Dissociable neural representations of reinforcement and belief prediction errors underlie strategic learning.

    Science.gov (United States)

    Zhu, Lusha; Mathewson, Kyle E; Hsu, Ming

    2012-01-31

    Decision-making in the presence of other competitive intelligent agents is fundamental for social and economic behavior. Such decisions require agents to behave strategically, where in addition to learning about the rewards and punishments available in the environment, they also need to anticipate and respond to actions of others competing for the same rewards. However, whereas we know much about strategic learning at both theoretical and behavioral levels, we know relatively little about the underlying neural mechanisms. Here, we show using a multi-strategy competitive learning paradigm that strategic choices can be characterized by extending the reinforcement learning (RL) framework to incorporate agents' beliefs about the actions of their opponents. Furthermore, using this characterization to generate putative internal values, we used model-based functional magnetic resonance imaging to investigate neural computations underlying strategic learning. We found that the distinct notions of prediction errors derived from our computational model are processed in a partially overlapping but distinct set of brain regions. Specifically, we found that the RL prediction error was correlated with activity in the ventral striatum. In contrast, activity in the ventral striatum, as well as the rostral anterior cingulate (rACC), was correlated with a previously uncharacterized belief-based prediction error. Furthermore, activity in rACC reflected individual differences in degree of engagement in belief learning. These results suggest a model of strategic behavior where learning arises from interaction of dissociable reinforcement and belief-based inputs.

  7. SVM-Based Control System for a Robot Manipulator

    Directory of Open Access Journals (Sweden)

    Foudil Abdessemed

    2012-12-01

    Full Text Available Real systems are usually non-linear, ill-defined, have variable parameters and are subject to external disturbances. Modelling these systems is often an approximation of the physical phenomena involved. However, it is from this approximate system of representation that we propose - in this paper - to build a robust control, in the sense that it must ensure low sensitivity towards parameters, uncertainties, variations and external disturbances. The computed torque method is a well-established robot control technique which takes account of the dynamic coupling between the robot links. However, its main disadvantage lies on the assumption of an exactly known dynamic model which is not realizable in practice. To overcome this issue, we propose the estimation of the dynamics model of the nonlinear system with a machine learning regression method. The output of this regressor is used in conjunction with a PD controller to achieve the tracking trajectory task of a robot manipulator. In cases where some of the parameters of the plant undergo a change in their values, poor performance may result. To cope with this drawback, a fuzzy precompensator is inserted to reinforce the SVM computed torque-based controller and avoid any deterioration. The theory is developed and the simulation results are carried out on a two-degree of freedom robot manipulator to demonstrate the validity of the proposed approach.

  8. The learning curve of robot-assisted laparoscopic aortofemoral bypass grafting for aortoiliac occlusive disease.

    Science.gov (United States)

    Novotný, Tomáš; Dvorák, Martin; Staffa, Robert

    2011-02-01

    Since the end of the 20th century, robot-assisted surgery has been finding its role among other minimally invasive methods. Vascular surgery seems to be another specialty in which the benefits of this technology can be expected. Our objective was to assess the learning curve of robot-assisted laparoscopic aortofemoral bypass grafting for aortoiliac occlusive disease in a group of 40 patients. Between May 2006 and January 2010, 40 patients (32 men, 8 women), who were a median age of 58 years (range, 48-75 years), underwent 40 robot-assisted laparoscopic aortofemoral reconstructions. Learning curve estimations were used for anastomosis, clamping, and operative time assessment. For conversion rate evaluation, the cumulative summation (CUSUM) technique was used. Statistical analysis comparing the first and second half of our group, and unilateral-to-bilateral reconstructions were performed. We created 21 aortofemoral and 19 aortobifemoral bypasses. The median proximal anastomosis time was 23 minutes (range, 18-50 minutes), median clamping time was 60 minutes (range, 40-95 minutes), and median operative time was 295 minutes (range, 180-475 minutes). The 30-day mortality rate was 0%, and no graft or wound infection or cardiopulmonary or hepatorenal complications were observed. During the median 18-month follow-up (range, 2-48 months), three early graft occlusions occurred (7%). After reoperations, the secondary patency of reconstructions was 100%. Data showed a typical short learning curve for robotic proximal anastomosis creation with anastomosis and clamping time reduction. The operative time learning curve was flat, confirming the procedure's complexity. There were two conversions to open surgery. CUSUM analysis confirmed that an acceptable conversion rate set at 5% was achieved. Comparing the first and second half of our group, all recorded times showed statistically significant improvements. Differences between unilateral and bilateral reconstructions were not

  9. Reinforcement Learning with Autonomous Small Unmanned Aerial Vehicles in Cluttered Environments

    Science.gov (United States)

    Tran, Loc; Cross, Charles; Montague, Gilbert; Motter, Mark; Neilan, James; Qualls, Garry; Rothhaar, Paul; Trujillo, Anna; Allen, B. Danette

    2015-01-01

    We present ongoing work in the Autonomy Incubator at NASA Langley Research Center (LaRC) exploring the efficacy of a data set aggregation approach to reinforcement learning for small unmanned aerial vehicle (sUAV) flight in dense and cluttered environments with reactive obstacle avoidance. The goal is to learn an autonomous flight model using training experiences from a human piloting a sUAV around static obstacles. The training approach uses video data from a forward-facing camera that records the human pilot's flight. Various computer vision based features are extracted from the video relating to edge and gradient information. The recorded human-controlled inputs are used to train an autonomous control model that correlates the extracted feature vector to a yaw command. As part of the reinforcement learning approach, the autonomous control model is iteratively updated with feedback from a human agent who corrects undesired model output. This data driven approach to autonomous obstacle avoidance is explored for simulated forest environments furthering autonomous flight under the tree canopy research. This enables flight in previously inaccessible environments which are of interest to NASA researchers in Earth and Atmospheric sciences.

  10. Concurrent Unimodal Learning Enhances Multisensory Responses of Bi-Directional Crossmodal Learning in Robotic Audio-Visual Tracking

    DEFF Research Database (Denmark)

    Shaikh, Danish; Bodenhagen, Leon; Manoonpong, Poramate

    2018-01-01

    modalities to independently update modality-specific neural weights on a moment-by-moment basis, in response to dynamic changes in noisy sensory stimuli. The circuit is embodied as a non-holonomic robotic agent that must orient a towards a moving audio-visual target. The circuit continuously learns the best...

  11. Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation.

    Science.gov (United States)

    Kato, Ayaka; Morita, Kenji

    2016-10-01

    It has been suggested that dopamine (DA) represents reward-prediction-error (RPE) defined in reinforcement learning and therefore DA responds to unpredicted but not predicted reward. However, recent studies have found DA response sustained towards predictable reward in tasks involving self-paced behavior, and suggested that this response represents a motivational signal. We have previously shown that RPE can sustain if there is decay/forgetting of learned-values, which can be implemented as decay of synaptic strengths storing learned-values. This account, however, did not explain the suggested link between tonic/sustained DA and motivation. In the present work, we explored the motivational effects of the value-decay in self-paced approach behavior, modeled as a series of 'Go' or 'No-Go' selections towards a goal. Through simulations, we found that the value-decay can enhance motivation, specifically, facilitate fast goal-reaching, albeit counterintuitively. Mathematical analyses revealed that underlying potential mechanisms are twofold: (1) decay-induced sustained RPE creates a gradient of 'Go' values towards a goal, and (2) value-contrasts between 'Go' and 'No-Go' are generated because while chosen values are continually updated, unchosen values simply decay. Our model provides potential explanations for the key experimental findings that suggest DA's roles in motivation: (i) slowdown of behavior by post-training blockade of DA signaling, (ii) observations that DA blockade severely impairs effortful actions to obtain rewards while largely sparing seeking of easily obtainable rewards, and (iii) relationships between the reward amount, the level of motivation reflected in the speed of behavior, and the average level of DA. These results indicate that reinforcement learning with value-decay, or forgetting, provides a parsimonious mechanistic account for the DA's roles in value-learning and motivation. Our results also suggest that when biological systems for value-learning

  12. Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation.

    Directory of Open Access Journals (Sweden)

    Ayaka Kato

    2016-10-01

    Full Text Available It has been suggested that dopamine (DA represents reward-prediction-error (RPE defined in reinforcement learning and therefore DA responds to unpredicted but not predicted reward. However, recent studies have found DA response sustained towards predictable reward in tasks involving self-paced behavior, and suggested that this response represents a motivational signal. We have previously shown that RPE can sustain if there is decay/forgetting of learned-values, which can be implemented as decay of synaptic strengths storing learned-values. This account, however, did not explain the suggested link between tonic/sustained DA and motivation. In the present work, we explored the motivational effects of the value-decay in self-paced approach behavior, modeled as a series of 'Go' or 'No-Go' selections towards a goal. Through simulations, we found that the value-decay can enhance motivation, specifically, facilitate fast goal-reaching, albeit counterintuitively. Mathematical analyses revealed that underlying potential mechanisms are twofold: (1 decay-induced sustained RPE creates a gradient of 'Go' values towards a goal, and (2 value-contrasts between 'Go' and 'No-Go' are generated because while chosen values are continually updated, unchosen values simply decay. Our model provides potential explanations for the key experimental findings that suggest DA's roles in motivation: (i slowdown of behavior by post-training blockade of DA signaling, (ii observations that DA blockade severely impairs effortful actions to obtain rewards while largely sparing seeking of easily obtainable rewards, and (iii relationships between the reward amount, the level of motivation reflected in the speed of behavior, and the average level of DA. These results indicate that reinforcement learning with value-decay, or forgetting, provides a parsimonious mechanistic account for the DA's roles in value-learning and motivation. Our results also suggest that when biological systems

  13. Vicarious reinforcement learning signals when instructing others.

    Science.gov (United States)

    Apps, Matthew A J; Lesage, Elise; Ramnani, Narender

    2015-02-18

    Reinforcement learning (RL) theory posits that learning is driven by discrepancies between the predicted and actual outcomes of actions (prediction errors [PEs]). In social environments, learning is often guided by similar RL mechanisms. For example, teachers monitor the actions of students and provide feedback to them. This feedback evokes PEs in students that guide their learning. We report the first study that investigates the neural mechanisms that underpin RL signals in the brain of a teacher. Neurons in the anterior cingulate cortex (ACC) signal PEs when learning from the outcomes of one's own actions but also signal information when outcomes are received by others. Does a teacher's ACC signal PEs when monitoring a student's learning? Using fMRI, we studied brain activity in human subjects (teachers) as they taught a confederate (student) action-outcome associations by providing positive or negative feedback. We examined activity time-locked to the students' responses, when teachers infer student predictions and know actual outcomes. We fitted a RL-based computational model to the behavior of the student to characterize their learning, and examined whether a teacher's ACC signals when a student's predictions are wrong. In line with our hypothesis, activity in the teacher's ACC covaried with the PE values in the model. Additionally, activity in the teacher's insula and ventromedial prefrontal cortex covaried with the predicted value according to the student. Our findings highlight that the ACC signals PEs vicariously for others' erroneous predictions, when monitoring and instructing their learning. These results suggest that RL mechanisms, processed vicariously, may underpin and facilitate teaching behaviors. Copyright © 2015 Apps et al.

  14. Visual Semantic Navigation Based on Deep Learning for Indoor Mobile Robots

    Directory of Open Access Journals (Sweden)

    Li Wang

    2018-01-01

    Full Text Available In order to improve the environmental perception ability of mobile robots during semantic navigation, a three-layer perception framework based on transfer learning is proposed, including a place recognition model, a rotation region recognition model, and a “side” recognition model. The first model is used to recognize different regions in rooms and corridors, the second one is used to determine where the robot should be rotated, and the third one is used to decide the walking side of corridors or aisles in the room. Furthermore, the “side” recognition model can also correct the motion of robots in real time, according to which accurate arrival to the specific target is guaranteed. Moreover, semantic navigation is accomplished using only one sensor (a camera. Several experiments are conducted in a real indoor environment, demonstrating the effectiveness and robustness of the proposed perception framework.

  15. Error amplification to promote motor learning and motivation in therapy robotics.

    Science.gov (United States)

    Shirzad, Navid; Van der Loos, H F Machiel

    2012-01-01

    To study the effects of different feedback error amplification methods on a subject's upper-limb motor learning and affect during a point-to-point reaching exercise, we developed a real-time controller for a robotic manipulandum. The reaching environment was visually distorted by implementing a thirty degrees rotation between the coordinate systems of the robot's end-effector and the visual display. Feedback error amplification was provided to subjects as they trained to learn reaching within the visually rotated environment. Error amplification was provided either visually or through both haptic and visual means, each method with two different amplification gains. Subjects' performance (i.e., trajectory error) and self-reports to a questionnaire were used to study the speed and amount of adaptation promoted by each error amplification method and subjects' emotional changes. We found that providing haptic and visual feedback promotes faster adaptation to the distortion and increases subjects' satisfaction with the task, leading to a higher level of attentiveness during the exercise. This finding can be used to design a novel exercise regimen, where alternating between error amplification methods is used to both increase a subject's motor learning and maintain a minimum level of motivational engagement in the exercise. In future experiments, we will test whether such exercise methods will lead to a faster learning time and greater motivation to pursue a therapy exercise regimen.

  16. Filtering sensory information with XCSF: improving learning robustness and robot arm control performance.

    Science.gov (United States)

    Kneissler, Jan; Stalph, Patrick O; Drugowitsch, Jan; Butz, Martin V

    2014-01-01

    It has been shown previously that the control of a robot arm can be efficiently learned using the XCSF learning classifier system, which is a nonlinear regression system based on evolutionary computation. So far, however, the predictive knowledge about how actual motor activity changes the state of the arm system has not been exploited. In this paper, we utilize the forward velocity kinematics knowledge of XCSF to alleviate the negative effect of noisy sensors for successful learning and control. We incorporate Kalman filtering for estimating successive arm positions, iteratively combining sensory readings with XCSF-based predictions of hand position changes over time. The filtered arm position is used to improve both trajectory planning and further learning of the forward velocity kinematics. We test the approach on a simulated kinematic robot arm model. The results show that the combination can improve learning and control performance significantly. However, it also shows that variance estimates of XCSF prediction may be underestimated, in which case self-delusional spiraling effects can hinder effective learning. Thus, we introduce a heuristic parameter, which can be motivated by theory, and which limits the influence of XCSF's predictions on its own further learning input. As a result, we obtain drastic improvements in noise tolerance, allowing the system to cope with more than 10 times higher noise levels.

  17. Robotics as a resource to facilitate the learning and general skills development

    Directory of Open Access Journals (Sweden)

    Flor Ángela Bravo Sánchez

    2012-07-01

    Full Text Available Normal.dotm 0 0 1 127 729 Universidad de Salamanca 6 1 895 12.0 0 false 18 pt 18 pt 0 0 false false false /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-parent:""; mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-para-margin-top:0cm; mso-para-margin-right:0cm; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0cm; line-height:115%; mso-pagination:widow-orphan; font-size:12.0pt; font-family:"Times New Roman"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin;} The growing importance of technology in the world today and its continuous development, makes the technology becomes an integral part of the formation process in childhood and youth. For this reason is important to develop proposals that are offered to children and young people to come into contact with new technologies, that is possible through the use of software and hardware tools, such as robotic prototypes and specialized programs for educational purposes This paper shows the importance of the use of robotics as a learning tool and presents the typical stages that must be confronted in implementing educational robotics projects in the classroom. It also presents an educational robotics project called "Mundo Robotica" which seeks to involve robotics in the classroom through practical activities and learning resources, all this is articulated from a virtual platform.

  18. Amygdala and ventral striatum make distinct contributions to reinforcement learning

    Science.gov (United States)

    Costa, Vincent D.; Monte, Olga Dal; Lucas, Daniel R.; Murray, Elisabeth A.; Averbeck, Bruno B.

    2016-01-01

    Summary Reinforcement learning (RL) theories posit that dopaminergic signals are integrated within the striatum to associate choices with outcomes. Often overlooked is that the amygdala also receives dopaminergic input and is involved in Pavlovian processes that influence choice behavior. To determine the relative contributions of the ventral striatum (VS) and amygdala to appetitive RL we tested rhesus macaques with VS or amygdala lesions on deterministic and stochastic versions of a two-arm bandit reversal learning task. When learning was characterized with a RL model relative to controls, amygdala lesions caused general decreases in learning from positive feedback and choice consistency. By comparison, VS lesions only affected learning in the stochastic task. Moreover, the VS lesions hastened the monkeys’ choice reaction times, which emphasized a speed-accuracy tradeoff that accounted for errors in deterministic learning. These results update standard accounts of RL by emphasizing distinct contributions of the amygdala and VS to RL. PMID:27720488

  19. Indirect iterative learning control for a discrete visual servo without a camera-robot model.

    Science.gov (United States)

    Jiang, Ping; Bamforth, Leon C A; Feng, Zuren; Baruch, John E F; Chen, YangQuan

    2007-08-01

    This paper presents a discrete learning controller for vision-guided robot trajectory imitation with no prior knowledge of the camera-robot model. A teacher demonstrates a desired movement in front of a camera, and then, the robot is tasked to replay it by repetitive tracking. The imitation procedure is considered as a discrete tracking control problem in the image plane, with an unknown and time-varying image Jacobian matrix. Instead of updating the control signal directly, as is usually done in iterative learning control (ILC), a series of neural networks are used to approximate the unknown Jacobian matrix around every sample point in the demonstrated trajectory, and the time-varying weights of local neural networks are identified through repetitive tracking, i.e., indirect ILC. This makes repetitive segmented training possible, and a segmented training strategy is presented to retain the training trajectories solely within the effective region for neural network approximation. However, a singularity problem may occur if an unmodified neural-network-based Jacobian estimation is used to calculate the robot end-effector velocity. A new weight modification algorithm is proposed which ensures invertibility of the estimation, thus circumventing the problem. Stability is further discussed, and the relationship between the approximation capability of the neural network and the tracking accuracy is obtained. Simulations and experiments are carried out to illustrate the validity of the proposed controller for trajectory imitation of robot manipulators with unknown time-varying Jacobian matrices.

  20. Active-learning strategies: the use of a game to reinforce learning in nursing education. A case study.

    Science.gov (United States)

    Boctor, Lisa

    2013-03-01

    The majority of nursing students are kinesthetic learners, preferring a hands-on, active approach to education. Research shows that active-learning strategies can increase student learning and satisfaction. This study looks at the use of one active-learning strategy, a Jeopardy-style game, 'Nursopardy', to reinforce Fundamentals of Nursing material, aiding in students' preparation for a standardized final exam. The game was created keeping students varied learning styles and the NCLEX blueprint in mind. The blueprint was used to create 5 categories, with 26 total questions. Student survey results, using a five-point Likert scale showed that they did find this learning method enjoyable and beneficial to learning. More research is recommended regarding learning outcomes, when using active-learning strategies, such as games. Copyright © 2012 Elsevier Ltd. All rights reserved.

  1. An approach to robot SLAM based on incremental appearance learning with omnidirectional vision

    Science.gov (United States)

    Wu, Hua; Qin, Shi-Yin

    2011-03-01

    Localisation and mapping with an omnidirectional camera becomes more difficult as the landmark appearances change dramatically in the omnidirectional image. With conventional techniques, it is difficult to match the features of the landmark with the template. We present a novel robot simultaneous localisation and mapping (SLAM) algorithm with an omnidirectional camera, which uses incremental landmark appearance learning to provide posterior probability distribution for estimating the robot pose under a particle filtering framework. The major contribution of our work is to represent the posterior estimation of the robot pose by incremental probabilistic principal component analysis, which can be naturally incorporated into the particle filtering algorithm for robot SLAM. Moreover, the innovative method of this article allows the adoption of the severe distorted landmark appearances viewed with omnidirectional camera for robot SLAM. The experimental results demonstrate that the localisation error is less than 1 cm in an indoor environment using five landmarks, and the location of the landmark appearances can be estimated within 5 pixels deviation from the ground truth in the omnidirectional image at a fairly fast speed.

  2. Reinforcement learning for dpm of embedded visual sensor nodes

    International Nuclear Information System (INIS)

    Khani, U.; Sadhayo, I. H.

    2014-01-01

    This paper proposes a RL (Reinforcement Learning) based DPM (Dynamic Power Management) technique to learn time out policies during a visual sensor node's operation which has multiple power/performance states. As opposed to the widely used static time out policies, our proposed DPM policy which is also referred to as OLTP (Online Learning of Time out Policies), learns to dynamically change the time out decisions in the different node states including the non-operational states. The selection of time out values in different power/performance states of a visual sensing platform is based on the workload estimates derived from a ML-ANN (Multi-Layer Artificial Neural Network) and an objective function given by weighted performance and power parameters. The DPM approach is also able to dynamically adjust the power-performance weights online to satisfy a given constraint of either power consumption or performance. Results show that the proposed learning algorithm explores the power-performance tradeoff with non-stationary workload and outperforms other DPM policies. It also performs the online adjustment of the tradeoff parameters in order to meet a user-specified constraint. (author)

  3. Adolescent-specific patterns of behavior and neural activity during social reinforcement learning

    OpenAIRE

    Jones, Rebecca M.; Somerville, Leah H.; Li, Jian; Ruberry, Erika J.; Powers, Alisa; Mehta, Natasha; Dyke, Jonathan; Casey, BJ

    2014-01-01

    Humans are sophisticated social beings. Social cues from others are exceptionally salient, particularly during adolescence. Understanding how adolescents interpret and learn from variable social signals can provide insight into the observed shift in social sensitivity during this period. The current study tested 120 participants between the ages of 8 and 25 years on a social reinforcement learning task where the probability of receiving positive social feedback was parametrically manipulated....

  4. Dynamic Learning from Adaptive Neural Control of Uncertain Robots with Guaranteed Full-State Tracking Precision

    Directory of Open Access Journals (Sweden)

    Min Wang

    2017-01-01

    Full Text Available A dynamic learning method is developed for an uncertain n-link robot with unknown system dynamics, achieving predefined performance attributes on the link angular position and velocity tracking errors. For a known nonsingular initial robotic condition, performance functions and unconstrained transformation errors are employed to prevent the violation of the full-state tracking error constraints. By combining two independent Lyapunov functions and radial basis function (RBF neural network (NN approximator, a novel and simple adaptive neural control scheme is proposed for the dynamics of the unconstrained transformation errors, which guarantees uniformly ultimate boundedness of all the signals in the closed-loop system. In the steady-state control process, RBF NNs are verified to satisfy the partial persistent excitation (PE condition. Subsequently, an appropriate state transformation is adopted to achieve the accurate convergence of neural weight estimates. The corresponding experienced knowledge on unknown robotic dynamics is stored in NNs with constant neural weight values. Using the stored knowledge, a static neural learning controller is developed to improve the full-state tracking performance. A comparative simulation study on a 2-link robot illustrates the effectiveness of the proposed scheme.

  5. High and low temperatures have unequal reinforcing properties in Drosophila spatial learning.

    Science.gov (United States)

    Zars, Melissa; Zars, Troy

    2006-07-01

    Small insects regulate their body temperature solely through behavior. Thus, sensing environmental temperature and implementing an appropriate behavioral strategy can be critical for survival. The fly Drosophila melanogaster prefers 24 degrees C, avoiding higher and lower temperatures when tested on a temperature gradient. Furthermore, temperatures above 24 degrees C have negative reinforcing properties. In contrast, we found that flies have a preference in operant learning experiments for a low-temperature-associated position rather than the 24 degrees C alternative in the heat-box. Two additional differences between high- and low-temperature reinforcement, i.e., temperatures above and below 24 degrees C, were found. Temperatures equally above and below 24 degrees C did not reinforce equally and only high temperatures supported increased memory performance with reversal conditioning. Finally, low- and high-temperature reinforced memories are similarly sensitive to two genetic mutations. Together these results indicate the qualitative meaning of temperatures below 24 degrees C depends on the dynamics of the temperatures encountered and that the reinforcing effects of these temperatures depend on at least some common genetic components. Conceptualizing these results using the Wolf-Heisenberg model of operant conditioning, we propose the maximum difference in experienced temperatures determines the magnitude of the reinforcement input to a conditioning circuit.

  6. Spared internal but impaired external reward prediction error signals in major depressive disorder during reinforcement learning.

    Science.gov (United States)

    Bakic, Jasmina; Pourtois, Gilles; Jepma, Marieke; Duprat, Romain; De Raedt, Rudi; Baeken, Chris

    2017-01-01

    Major depressive disorder (MDD) creates debilitating effects on a wide range of cognitive functions, including reinforcement learning (RL). In this study, we sought to assess whether reward processing as such, or alternatively the complex interplay between motivation and reward might potentially account for the abnormal reward-based learning in MDD. A total of 35 treatment resistant MDD patients and 44 age matched healthy controls (HCs) performed a standard probabilistic learning task. RL was titrated using behavioral, computational modeling and event-related brain potentials (ERPs) data. MDD patients showed comparable learning rate compared to HCs. However, they showed decreased lose-shift responses as well as blunted subjective evaluations of the reinforcers used during the task, relative to HCs. Moreover, MDD patients showed normal internal (at the level of error-related negativity, ERN) but abnormal external (at the level of feedback-related negativity, FRN) reward prediction error (RPE) signals during RL, selectively when additional efforts had to be made to establish learning. Collectively, these results lend support to the assumption that MDD does not impair reward processing per se during RL. Instead, it seems to alter the processing of the emotional value of (external) reinforcers during RL, when additional intrinsic motivational processes have to be engaged. © 2016 Wiley Periodicals, Inc.

  7. Programming Robots with Associative Memories

    Energy Technology Data Exchange (ETDEWEB)

    Touzet, C

    1999-07-10

    Today, there are several drawbacks that impede the necessary and much needed use of robot learning techniques in real applications. First, the time needed to achieve the synthesis of any behavior is prohibitive. Second, the robot behavior during the learning phase is "by definition" bad, it may even be dangerous. Third, except within the lazy learning approach, a new behavior implies a new learning phase. We propose in this paper to use self-organizing maps to encode the non explicit model of the robot-world interaction sampled by the lazy memory, and then generate a robot behavior by means of situations to be achieved, i.e., points on the self-organizing maps. Any behavior can instantaneously be synthesized by the definition of a goal situation. Its performance will be minimal (not evidently bad) and will improve by the mere repetition of the behavior.

  8. Programming Robots with Associative Memories

    International Nuclear Information System (INIS)

    Touzet, C.

    1999-01-01

    Today, there are several drawbacks that impede the necessary and much needed use of robot learning techniques in real applications. First, the time needed to achieve the synthesis of any behavior is prohibitive. Second, the robot behavior during the learning phase is by definition bad, it may even be dangerous. Third, except within the lazy learning approach, a new behavior implies a new learning phase. We propose in this paper to use self-organizing maps to encode the non explicit model of the robot-world interaction sampled by the lazy memory, and then generate a robot behavior by means of situations to be achieved, i.e., points on the self-organizing maps. Any behavior can instantaneously be synthesized by the definition of a goal situation. Its performance will be minimal (not evidently bad) and will improve by the mere repetition of the behavior

  9. Lessons learned over a decade of pediatric robotic ureteral reimplantation

    Directory of Open Access Journals (Sweden)

    Minki Baek

    2017-01-01

    Full Text Available The da Vinci robotic system has improved surgeon dexterity, ergonomics, and visualization to allow for a minimally invasive option for complex reconstructive procedures in children. Over the past decade, robot-assisted laparoscopic ureteral reimplantation (RALUR has become a viable minimally invasive surgical option for pediatric vesicoureteral reflux (VUR. However, higher-thanexpected complication rates and suboptimal reflux resolution rates at some centers have also been reported. The heterogeneity of surgical outcomes may arise from the inherent and underestimated complexity of the RALUR procedure that may justify its reclassification as a complex reconstructive procedure and especially for robotic surgeons early in their learning curve. Currently, no consensus exists on the role of RALUR for the surgical management of VUR. High success rates and low major complication rates are the expected norm for the current gold standard surgical option of open ureteral reimplantation. Similar to how robot-assisted laparoscopic surgery has gradually replaced open surgery as the most utilized option for prostatectomy in prostate cancer patients, RALUR may become a higher utilized surgical option in children with VUR if the adoption of standardized surgical techniques that have been associated with optimal outcomes can be adopted during the second decade of RALUR. A future standard of RALUR for children with VUR whose parents seek a minimally invasive surgical option can arise if widespread achievement of high success rates and low major complication rates can be obtained, similar to the replacement of open surgery with robot-assisted laparoscopic radical prostectomy as the new strandard for men with prostate cancer.

  10. Child, Robot and Educational Material : A Triadic Interaction

    NARCIS (Netherlands)

    Davison, Daniel Patrick

    The process in which a child and a robot work together to solve a learning task can be characterised as a triadic interaction. Interactions between the child and robot; the child and learning materials; and the robot and learning materials will each shape the perception and appreciation the child

  11. Child, Robot and Educational Material: A Triadic Interaction

    NARCIS (Netherlands)

    Davison, Daniel Patrick

    The process in which a child and a robot work together to solve a learning task can be characterised as a triadic interaction. Interactions between the child and robot; the child and learning materials; and the robot and learning materials will each shape the perception and appreciation the child

  12. Optimal critic learning for robot control in time-varying environments.

    Science.gov (United States)

    Wang, Chen; Li, Yanan; Ge, Shuzhi Sam; Lee, Tong Heng

    2015-10-01

    In this paper, optimal critic learning is developed for robot control in a time-varying environment. The unknown environment is described as a linear system with time-varying parameters, and impedance control is employed for the interaction control. Desired impedance parameters are obtained in the sense of an optimal realization of the composite of trajectory tracking and force regulation. Q -function-based critic learning is developed to determine the optimal impedance parameters without the knowledge of the system dynamics. The simulation results are presented and compared with existing methods, and the efficacy of the proposed method is verified.

  13. An Architecture for Emotional and Context-Aware Associative Learning for Robot Companions

    OpenAIRE

    Rizzi Raymundo, C.; Johnson, C. G.; Vargas, P. A.

    2015-01-01

    This work proposes a theoretical architectural model based on the brain's fear learning system with the purpose of generating artificial fear conditioning at both stimuli and context abstraction levels in robot companions. The proposed architecture is inspired by the different brain regions involved in fear learning, here divided into four modules that work in an integrated and parallel manner: the sensory system, the amygdala system, the hippocampal system and the working memory. Each of the...

  14. Locally optimal control under unknown dynamics with learnt cost function: application to industrial robot positioning

    Science.gov (United States)

    Guérin, Joris; Gibaru, Olivier; Thiery, Stéphane; Nyiri, Eric

    2017-01-01

    Recent methods of Reinforcement Learning have enabled to solve difficult, high dimensional, robotic tasks under unknown dynamics using iterative Linear Quadratic Gaussian control theory. These algorithms are based on building a local time-varying linear model of the dynamics from data gathered through interaction with the environment. In such tasks, the cost function is often expressed directly in terms of the state and control variables so that it can be locally quadratized to run the algorithm. If the cost is expressed in terms of other variables, a model is required to compute the cost function from the variables manipulated. We propose a method to learn the cost function directly from the data, in the same way as for the dynamics. This way, the cost function can be defined in terms of any measurable quantity and thus can be chosen more appropriately for the task to be carried out. With our method, any sensor information can be used to design the cost function. We demonstrate the efficiency of this method through simulating, with the V-REP software, the learning of a Cartesian positioning task on several industrial robots with different characteristics. The robots are controlled in joint space and no model is provided a priori. Our results are compared with another model free technique, consisting in writing the cost function as a state variable.

  15. Deep reinforcement learning for automated radiation adaptation in lung cancer.

    Science.gov (United States)

    Tseng, Huan-Hsin; Luo, Yi; Cui, Sunan; Chien, Jen-Tzung; Ten Haken, Randall K; Naqa, Issam El

    2017-12-01

    To investigate deep reinforcement learning (DRL) based on historical treatment plans for developing automated radiation adaptation protocols for nonsmall cell lung cancer (NSCLC) patients that aim to maximize tumor local control at reduced rates of radiation pneumonitis grade 2 (RP2). In a retrospective population of 114 NSCLC patients who received radiotherapy, a three-component neural networks framework was developed for deep reinforcement learning (DRL) of dose fractionation adaptation. Large-scale patient characteristics included clinical, genetic, and imaging radiomics features in addition to tumor and lung dosimetric variables. First, a generative adversarial network (GAN) was employed to learn patient population characteristics necessary for DRL training from a relatively limited sample size. Second, a radiotherapy artificial environment (RAE) was reconstructed by a deep neural network (DNN) utilizing both original and synthetic data (by GAN) to estimate the transition probabilities for adaptation of personalized radiotherapy patients' treatment courses. Third, a deep Q-network (DQN) was applied to the RAE for choosing the optimal dose in a response-adapted treatment setting. This multicomponent reinforcement learning approach was benchmarked against real clinical decisions that were applied in an adaptive dose escalation clinical protocol. In which, 34 patients were treated based on avid PET signal in the tumor and constrained by a 17.2% normal tissue complication probability (NTCP) limit for RP2. The uncomplicated cure probability (P+) was used as a baseline reward function in the DRL. Taking our adaptive dose escalation protocol as a blueprint for the proposed DRL (GAN + RAE + DQN) architecture, we obtained an automated dose adaptation estimate for use at ∼2/3 of the way into the radiotherapy treatment course. By letting the DQN component freely control the estimated adaptive dose per fraction (ranging from 1-5 Gy), the DRL automatically favored dose

  16. Reinforcement learning signals in the human striatum distinguish learners from nonlearners during reward-based decision making.

    Science.gov (United States)

    Schönberg, Tom; Daw, Nathaniel D; Joel, Daphna; O'Doherty, John P

    2007-11-21

    The computational framework of reinforcement learning has been used to forward our understanding of the neural mechanisms underlying reward learning and decision-making behavior. It is known that humans vary widely in their performance in decision-making tasks. Here, we used a simple four-armed bandit task in which subjects are almost evenly split into two groups on the basis of their performance: those who do learn to favor choice of the optimal action and those who do not. Using models of reinforcement learning we sought to determine the neural basis of these intrinsic differences in performance by scanning both groups with functional magnetic resonance imaging. We scanned 29 subjects while they performed the reward-based decision-making task. Our results suggest that these two groups differ markedly in the degree to which reinforcement learning signals in the striatum are engaged during task performance. While the learners showed robust prediction error signals in both the ventral and dorsal striatum during learning, the nonlearner group showed a marked absence of such signals. Moreover, the magnitude of prediction error signals in a region of dorsal striatum correlated significantly with a measure of behavioral performance across all subjects. These findings support a crucial role of prediction error signals, likely originating from dopaminergic midbrain neurons, in enabling learning of action selection preferences on the basis of obtained rewards. Thus, spontaneously observed individual differences in decision making performance demonstrate the suggested dependence of this type of learning on the functional integrity of the dopaminergic striatal system in humans.

  17. Learning to Automatically Detect Features for Mobile Robots Using Second-Order Hidden Markov Models

    Directory of Open Access Journals (Sweden)

    Olivier Aycard

    2004-12-01

    Full Text Available In this paper, we propose a new method based on Hidden Markov Models to interpret temporal sequences of sensor data from mobile robots to automatically detect features. Hidden Markov Models have been used for a long time in pattern recognition, especially in speech recognition. Their main advantages over other methods (such as neural networks are their ability to model noisy temporal signals of variable length. We show in this paper that this approach is well suited for interpretation of temporal sequences of mobile-robot sensor data. We present two distinct experiments and results: the first one in an indoor environment where a mobile robot learns to detect features like open doors or T-intersections, the second one in an outdoor environment where a different mobile robot has to identify situations like climbing a hill or crossing a rock.

  18. Why Are There Developmental Stages in Language Learning? A Developmental Robotics Model of Language Development.

    Science.gov (United States)

    Morse, Anthony F; Cangelosi, Angelo

    2017-02-01

    Most theories of learning would predict a gradual acquisition and refinement of skills as learning progresses, and while some highlight exponential growth, this fails to explain why natural cognitive development typically progresses in stages. Models that do span multiple developmental stages typically have parameters to "switch" between stages. We argue that by taking an embodied view, the interaction between learning mechanisms, the resulting behavior of the agent, and the opportunities for learning that the environment provides can account for the stage-wise development of cognitive abilities. We summarize work relevant to this hypothesis and suggest two simple mechanisms that account for some developmental transitions: neural readiness focuses on changes in the neural substrate resulting from ongoing learning, and perceptual readiness focuses on the perceptual requirements for learning new tasks. Previous work has demonstrated these mechanisms in replications of a wide variety of infant language experiments, spanning multiple developmental stages. Here we piece this work together as a single model of ongoing learning with no parameter changes at all. The model, an instance of the Epigenetic Robotics Architecture (Morse et al 2010) embodied on the iCub humanoid robot, exhibits ongoing multi-stage development while learning pre-linguistic and then basic language skills. Copyright © 2016 Cognitive Science Society, Inc.

  19. Closed loop interactions between spiking neural network and robotic simulators based on MUSIC and ROS

    Directory of Open Access Journals (Sweden)

    Philipp Weidel

    2016-08-01

    Full Text Available In order to properly assess the function and computational properties of simulated neural systems, it is necessary to account for the nature of the stimuli that drive the system. However, providing stimuli that are rich and yet both reproducible and amenable to experimental manipulations is technically challenging, and even more so if a closed-loop scenario is required. In this work, we present a novel approach to solve this problem, connecting robotics and neural network simulators. We implement a middleware solution that bridges the Robotic Operating System (ROS to the Multi-Simulator Coordinator (MUSIC. This enables any robotic and neural simulators that implement the corresponding interfaces to be efficiently coupled, allowing real-time performance for a wide range of configurations. This work extends the toolset available for researchers in both neurorobotics and computational neuroscience, and creates the opportunity to perform closed-loop experiments of arbitrary complexity to address questions in multiple areas, including embodiment, agency, and reinforcement learning.

  20. Reinforcement Learning in Distributed Domains: Beyond Team Games

    Science.gov (United States)

    Wolpert, David H.; Sill, Joseph; Turner, Kagan

    2000-01-01

    Distributed search algorithms are crucial in dealing with large optimization problems, particularly when a centralized approach is not only impractical but infeasible. Many machine learning concepts have been applied to search algorithms in order to improve their effectiveness. In this article we present an algorithm that blends Reinforcement Learning (RL) and hill climbing directly, by using the RL signal to guide the exploration step of a hill climbing algorithm. We apply this algorithm to the domain of a constellations of communication satellites where the goal is to minimize the loss of importance weighted data. We introduce the concept of 'ghost' traffic, where correctly setting this traffic induces the satellites to act to optimize the world utility. Our results indicated that the bi-utility search introduced in this paper outperforms both traditional hill climbing algorithms and distributed RL approaches such as team games.

  1. Cerebellar and prefrontal cortex contributions to adaptation, strategies, and reinforcement learning.

    Science.gov (United States)

    Taylor, Jordan A; Ivry, Richard B

    2014-01-01

    Traditionally, motor learning has been studied as an implicit learning process, one in which movement errors are used to improve performance in a continuous, gradual manner. The cerebellum figures prominently in this literature given well-established ideas about the role of this system in error-based learning and the production of automatized skills. Recent developments have brought into focus the relevance of multiple learning mechanisms for sensorimotor learning. These include processes involving repetition, reinforcement learning, and strategy utilization. We examine these developments, considering their implications for understanding cerebellar function and how this structure interacts with other neural systems to support motor learning. Converging lines of evidence from behavioral, computational, and neuropsychological studies suggest a fundamental distinction between processes that use error information to improve action execution or action selection. While the cerebellum is clearly linked to the former, its role in the latter remains an open question. © 2014 Elsevier B.V. All rights reserved.

  2. An Integrated Framework for Human-Robot Collaborative Manipulation.

    Science.gov (United States)

    Sheng, Weihua; Thobbi, Anand; Gu, Ye

    2015-10-01

    This paper presents an integrated learning framework that enables humanoid robots to perform human-robot collaborative manipulation tasks. Specifically, a table-lifting task performed jointly by a human and a humanoid robot is chosen for validation purpose. The proposed framework is split into two phases: 1) phase I-learning to grasp the table and 2) phase II-learning to perform the manipulation task. An imitation learning approach is proposed for phase I. In phase II, the behavior of the robot is controlled by a combination of two types of controllers: 1) reactive and 2) proactive. The reactive controller lets the robot take a reactive control action to make the table horizontal. The proactive controller lets the robot take proactive actions based on human motion prediction. A measure of confidence of the prediction is also generated by the motion predictor. This confidence measure determines the leader/follower behavior of the robot. Hence, the robot can autonomously switch between the behaviors during the task. Finally, the performance of the human-robot team carrying out the collaborative manipulation task is experimentally evaluated on a platform consisting of a Nao humanoid robot and a Vicon motion capture system. Results show that the proposed framework can enable the robot to carry out the collaborative manipulation task successfully.

  3. Starting a robotic program in general thoracic surgery: why, how, and lessons learned.

    Science.gov (United States)

    Cerfolio, Robert J; Bryant, Ayesha S; Minnich, Douglas J

    2011-06-01

    We report our experience in starting a robotic program in thoracic surgery. We retrospectively reviewed our experience in starting a robotic program in general thoracic surgery on a consecutive series of patients. Between February 2009 and September 2010, 150 patients underwent robotic operations. Types of procedures were lobectomy in 62, thymectomy in 30, and benign esophageal procedures in 6. No thymectomy or esophageal procedures required conversion. One conversion was needed for suspected bleeding for a mediastinal mass. Twelve patients were converted for lobectomy (none for bleeding, 1 in the last 24). Median operative time for robotic thymectomy was 119 minutes, and median length of stay was 1 day. The median time for robotic lobectomy was 185 minutes, and median length of stay was 2 days. There were no operative deaths. Morbidity occurred in 23 patients (15%). All patients with cancer had R0 resections and resection of all visible mediastinal and hilar lymph nodes. Robotic surgery is safe and oncologically sound. It requires training of the entire operating room team. The learning curve is steep, involving port placement, availability of the proper instrumentation, use of the correct robotic arms, and proper patient positioning. The robot provides an ideal surgical approach for thymectomy and other mediastinal tumors. Its advantage over thoracoscopy for pulmonary resection is unproven; however, we believe complete thoracic lymph node dissection and teaching is easier. Importantly, defined credentialing for surgeons and cost analysis studies are needed. Copyright © 2011 The Society of Thoracic Surgeons. Published by Elsevier Inc. All rights reserved.

  4. Effects of a robotic storyteller's moody gestures on storytelling perception

    NARCIS (Netherlands)

    Xu, J.; Broekens, J.; Hindriks, K.; Neerincx, M.A.

    2015-01-01

    A parameterized behavior model was developed for robots to show mood during task execution. In this study, we applied the model to the coverbal gestures of a robotic storyteller. This study investigated whether parameterized mood expression can 1) show mood that is changing over time; 2) reinforce

  5. VBOT: Motivating computational and complex systems fluencies with constructionist virtual/physical robotics

    Science.gov (United States)

    Berland, Matthew W.

    As scientists use the tools of computational and complex systems theory to broaden science perspectives (e.g., Bar-Yam, 1997; Holland, 1995; Wolfram, 2002), so can middle-school students broaden their perspectives using appropriate tools. The goals of this dissertation project are to build, study, evaluate, and compare activities designed to foster both computational and complex systems fluencies through collaborative constructionist virtual and physical robotics. In these activities, each student builds an agent (e.g., a robot-bird) that must interact with fellow students' agents to generate a complex aggregate (e.g., a flock of robot-birds) in a participatory simulation environment (Wilensky & Stroup, 1999a). In a participatory simulation, students collaborate by acting in a common space, teaching each other, and discussing content with one another. As a result, the students improve both their computational fluency and their complex systems fluency, where fluency is defined as the ability to both consume and produce relevant content (DiSessa, 2000). To date, several systems have been designed to foster computational and complex systems fluencies through computer programming and collaborative play (e.g., Hancock, 2003; Wilensky & Stroup, 1999b); this study suggests that, by supporting the relevant fluencies through collaborative play, they become mutually reinforcing. In this work, I will present both the design of the VBOT virtual/physical constructionist robotics learning environment and a comparative study of student interaction with the virtual and physical environments across four middle-school classrooms, focusing on the contrast in systems perspectives differently afforded by the two environments. In particular, I found that while performance gains were similar overall, the physical environment supported agent perspectives on aggregate behavior, and the virtual environment supported aggregate perspectives on agent behavior. The primary research questions

  6. Neural mechanisms of reinforcement learning in unmedicated patients with major depressive disorder.

    Science.gov (United States)

    Rothkirch, Marcus; Tonn, Jonas; Köhler, Stephan; Sterzer, Philipp

    2017-04-01

    According to current concepts, major depressive disorder is strongly related to dysfunctional neural processing of motivational information, entailing impairments in reinforcement learning. While computational modelling can reveal the precise nature of neural learning signals, it has not been used to study learning-related neural dysfunctions in unmedicated patients with major depressive disorder so far. We thus aimed at comparing the neural coding of reward and punishment prediction errors, representing indicators of neural learning-related processes, between unmedicated patients with major depressive disorder and healthy participants. To this end, a group of unmedicated patients with major depressive disorder (n = 28) and a group of age- and sex-matched healthy control participants (n = 30) completed an instrumental learning task involving monetary gains and losses during functional magnetic resonance imaging. The two groups did not differ in their learning performance. Patients and control participants showed the same level of prediction error-related activity in the ventral striatum and the anterior insula. In contrast, neural coding of reward prediction errors in the medial orbitofrontal cortex was reduced in patients. Moreover, neural reward prediction error signals in the medial orbitofrontal cortex and ventral striatum showed negative correlations with anhedonia severity. Using a standard instrumental learning paradigm we found no evidence for an overall impairment of reinforcement learning in medication-free patients with major depressive disorder. Importantly, however, the attenuated neural coding of reward in the medial orbitofrontal cortex and the relation between anhedonia and reduced reward prediction error-signalling in the medial orbitofrontal cortex and ventral striatum likely reflect an impairment in experiencing pleasure from rewarding events as a key mechanism of anhedonia in major depressive disorder. © The Author (2017). Published by Oxford

  7. Multiobjective Reinforcement Learning for Traffic Signal Control Using Vehicular Ad Hoc Network

    Directory of Open Access Journals (Sweden)

    Houli Duan

    2010-01-01

    Full Text Available We propose a new multiobjective control algorithm based on reinforcement learning for urban traffic signal control, named multi-RL. A multiagent structure is used to describe the traffic system. A vehicular ad hoc network is used for the data exchange among agents. A reinforcement learning algorithm is applied to predict the overall value of the optimization objective given vehicles' states. The policy which minimizes the cumulative value of the optimization objective is regarded as the optimal one. In order to make the method adaptive to various traffic conditions, we also introduce a multiobjective control scheme in which the optimization objective is selected adaptively to real-time traffic states. The optimization objectives include the vehicle stops, the average waiting time, and the maximum queue length of the next intersection. In addition, we also accommodate a priority control to the buses and the emergency vehicles through our model. The simulation results indicated that our algorithm could perform more efficiently than traditional traffic light control methods.

  8. Final report for LDRD project 11-0783 : directed robots for increased military manpower effectiveness.

    Energy Technology Data Exchange (ETDEWEB)

    Rohrer, Brandon Robinson; Rothganger, Fredrick H.; Wagner, John S.; Xavier, Patrick Gordon; Morrow, James Dan

    2011-09-01

    The purpose of this LDRD is to develop technology allowing warfighters to provide high-level commands to their unmanned assets, freeing them to command a group of them or commit the bulk of their attention elsewhere. To this end, a brain-emulating cognition and control architecture (BECCA) was developed, incorporating novel and uniquely capable feature creation and reinforcement learning algorithms. BECCA was demonstrated on both a mobile manipulator platform and on a seven degree of freedom serial link robot arm. Existing military ground robots are almost universally teleoperated and occupy the complete attention of an operator. They may remove a soldier from harm's way, but they do not necessarily reduce manpower requirements. Current research efforts to solve the problem of autonomous operation in an unstructured, dynamic environment fall short of the desired performance. In order to increase the effectiveness of unmanned vehicle (UV) operators, we proposed to develop robots that can be 'directed' rather than remote-controlled. They are instructed and trained by human operators, rather than driven. The technical approach is modeled closely on psychological and neuroscientific models of human learning. Two Sandia-developed models are utilized in this effort: the Sandia Cognitive Framework (SCF), a cognitive psychology-based model of human processes, and BECCA, a psychophysical-based model of learning, motor control, and conceptualization. Together, these models span the functional space from perceptuo-motor abilities, to high-level motivational and attentional processes.

  9. Detecting and Classifying Human Touches in a Social Robot Through Acoustic Sensing and Machine Learning.

    Science.gov (United States)

    Alonso-Martín, Fernando; Gamboa-Montero, Juan José; Castillo, José Carlos; Castro-González, Álvaro; Salichs, Miguel Ángel

    2017-05-16

    An important aspect in Human-Robot Interaction is responding to different kinds of touch stimuli. To date, several technologies have been explored to determine how a touch is perceived by a social robot, usually placing a large number of sensors throughout the robot's shell. In this work, we introduce a novel approach, where the audio acquired from contact microphones located in the robot's shell is processed using machine learning techniques to distinguish between different types of touches. The system is able to determine when the robot is touched (touch detection), and to ascertain the kind of touch performed among a set of possibilities: stroke , tap , slap , and tickle (touch classification). This proposal is cost-effective since just a few microphones are able to cover the whole robot's shell since a single microphone is enough to cover each solid part of the robot. Besides, it is easy to install and configure as it just requires a contact surface to attach the microphone to the robot's shell and plug it into the robot's computer. Results show the high accuracy scores in touch gesture recognition. The testing phase revealed that Logistic Model Trees achieved the best performance, with an F -score of 0.81. The dataset was built with information from 25 participants performing a total of 1981 touch gestures.

  10. Surgical robotics beyond enhanced dexterity instrumentation: a survey of machine learning techniques and their role in intelligent and autonomous surgical actions.

    Science.gov (United States)

    Kassahun, Yohannes; Yu, Bingbin; Tibebu, Abraham Temesgen; Stoyanov, Danail; Giannarou, Stamatia; Metzen, Jan Hendrik; Vander Poorten, Emmanuel

    2016-04-01

    Advances in technology and computing play an increasingly important role in the evolution of modern surgical techniques and paradigms. This article reviews the current role of machine learning (ML) techniques in the context of surgery with a focus on surgical robotics (SR). Also, we provide a perspective on the future possibilities for enhancing the effectiveness of procedures by integrating ML in the operating room. The review is focused on ML techniques directly applied to surgery, surgical robotics, surgical training and assessment. The widespread use of ML methods in diagnosis and medical image computing is beyond the scope of the review. Searches were performed on PubMed and IEEE Explore using combinations of keywords: ML, surgery, robotics, surgical and medical robotics, skill learning, skill analysis and learning to perceive. Studies making use of ML methods in the context of surgery are increasingly being reported. In particular, there is an increasing interest in using ML for developing tools to understand and model surgical skill and competence or to extract surgical workflow. Many researchers begin to integrate this understanding into the control of recent surgical robots and devices. ML is an expanding field. It is popular as it allows efficient processing of vast amounts of data for interpreting and real-time decision making. Already widely used in imaging and diagnosis, it is believed that ML will also play an important role in surgery and interventional treatments. In particular, ML could become a game changer into the conception of cognitive surgical robots. Such robots endowed with cognitive skills would assist the surgical team also on a cognitive level, such as possibly lowering the mental load of the team. For example, ML could help extracting surgical skill, learned through demonstration by human experts, and could transfer this to robotic skills. Such intelligent surgical assistance would significantly surpass the state of the art in surgical

  11. The Study of Reinforcement Learning for Traffic Self-Adaptive Control under Multiagent Markov Game Environment

    Directory of Open Access Journals (Sweden)

    Lun-Hui Xu

    2013-01-01

    Full Text Available Urban traffic self-adaptive control problem is dynamic and uncertain, so the states of traffic environment are hard to be observed. Efficient agent which controls a single intersection can be discovered automatically via multiagent reinforcement learning. However, in the majority of the previous works on this approach, each agent needed perfect observed information when interacting with the environment and learned individually with less efficient coordination. This study casts traffic self-adaptive control as a multiagent Markov game problem. The design employs traffic signal control agent (TSCA for each signalized intersection that coordinates with neighboring TSCAs. A mathematical model for TSCAs’ interaction is built based on nonzero-sum markov game which has been applied to let TSCAs learn how to cooperate. A multiagent Markov game reinforcement learning approach is constructed on the basis of single-agent Q-learning. This method lets each TSCA learn to update its Q-values under the joint actions and imperfect information. The convergence of the proposed algorithm is analyzed theoretically. The simulation results show that the proposed method is convergent and effective in realistic traffic self-adaptive control setting.

  12. TensorFlow Agents: Efficient Batched Reinforcement Learning in TensorFlow

    OpenAIRE

    Hafner, Danijar; Davidson, James; Vanhoucke, Vincent

    2017-01-01

    We introduce TensorFlow Agents, an efficient infrastructure paradigm for building parallel reinforcement learning algorithms in TensorFlow. We simulate multiple environments in parallel, and group them to perform the neural network computation on a batch rather than individual observations. This allows the TensorFlow execution engine to parallelize computation, without the need for manual synchronization. Environments are stepped in separate Python processes to progress them in parallel witho...

  13. Self-organization via active exploration in robotic applications

    Science.gov (United States)

    Ogmen, H.; Prakash, R. V.

    1992-01-01

    We describe a neural network based robotic system. Unlike traditional robotic systems, our approach focussed on non-stationary problems. We indicate that self-organization capability is necessary for any system to operate successfully in a non-stationary environment. We suggest that self-organization should be based on an active exploration process. We investigated neural architectures having novelty sensitivity, selective attention, reinforcement learning, habit formation, flexible criteria categorization properties and analyzed the resulting behavior (consisting of an intelligent initiation of exploration) by computer simulations. While various computer vision researchers acknowledged recently the importance of active processes (Swain and Stricker, 1991), the proposed approaches within the new framework still suffer from a lack of self-organization (Aloimonos and Bandyopadhyay, 1987; Bajcsy, 1988). A self-organizing, neural network based robot (MAVIN) has been recently proposed (Baloch and Waxman, 1991). This robot has the capability of position, size rotation invariant pattern categorization, recognition and pavlovian conditioning. Our robot does not have initially invariant processing properties. The reason for this is the emphasis we put on active exploration. We maintain the point of view that such invariant properties emerge from an internalization of exploratory sensory-motor activity. Rather than coding the equilibria of such mental capabilities, we are seeking to capture its dynamics to understand on the one hand how the emergence of such invariances is possible and on the other hand the dynamics that lead to these invariances. The second point is crucial for an adaptive robot to acquire new invariances in non-stationary environments, as demonstrated by the inverting glass experiments of Helmholtz. We will introduce Pavlovian conditioning circuits in our future work for the precise objective of achieving the generation, coordination, and internalization

  14. Measuring reinforcement learning and motivation constructs in experimental animals: relevance to the negative symptoms of schizophrenia

    Science.gov (United States)

    Markou, Athina; Salamone, John D.; Bussey, Timothy; Mar, Adam; Brunner, Daniela; Gilmour, Gary; Balsam, Peter

    2013-01-01

    The present review article summarizes and expands upon the discussions that were initiated during a meeting of the Cognitive Neuroscience Treatment Research to Improve Cognition in Schizophrenia (CNTRICS; http://cntrics.ucdavis.edu). A major goal of the CNTRICS meeting was to identify experimental procedures and measures that can be used in laboratory animals to assess psychological constructs that are related to the psychopathology of schizophrenia. The issues discussed in this review reflect the deliberations of the Motivation Working Group of the CNTRICS meeting, which included most of the authors of this article as well as additional participants. After receiving task nominations from the general research community, this working group was asked to identify experimental procedures in laboratory animals that can assess aspects of reinforcement learning and motivation that may be relevant for research on the negative symptoms of schizophrenia, as well as other disorders characterized by deficits in reinforcement learning and motivation. The tasks described here that assess reinforcement learning are the Autoshaping Task, Probabilistic Reward Learning Tasks, and the Response Bias Probabilistic Reward Task. The tasks described here that assess motivation are Outcome Devaluation and Contingency Degradation Tasks and Effort-Based Tasks. In addition to describing such methods and procedures, the present article provides a working vocabulary for research and theory in this field, as well as an industry perspective about how such tasks may be used in drug discovery. It is hoped that this review can aid investigators who are conducting research in this complex area, promote translational studies by highlighting shared research goals and fostering a common vocabulary across basic and clinical fields, and facilitate the development of medications for the treatment of symptoms mediated by reinforcement learning and motivational deficits. PMID:23994273

  15. Measuring reinforcement learning and motivation constructs in experimental animals: relevance to the negative symptoms of schizophrenia.

    Science.gov (United States)

    Markou, Athina; Salamone, John D; Bussey, Timothy J; Mar, Adam C; Brunner, Daniela; Gilmour, Gary; Balsam, Peter

    2013-11-01

    The present review article summarizes and expands upon the discussions that were initiated during a meeting of the Cognitive Neuroscience Treatment Research to Improve Cognition in Schizophrenia (CNTRICS; http://cntrics.ucdavis.edu) meeting. A major goal of the CNTRICS meeting was to identify experimental procedures and measures that can be used in laboratory animals to assess psychological constructs that are related to the psychopathology of schizophrenia. The issues discussed in this review reflect the deliberations of the Motivation Working Group of the CNTRICS meeting, which included most of the authors of this article as well as additional participants. After receiving task nominations from the general research community, this working group was asked to identify experimental procedures in laboratory animals that can assess aspects of reinforcement learning and motivation that may be relevant for research on the negative symptoms of schizophrenia, as well as other disorders characterized by deficits in reinforcement learning and motivation. The tasks described here that assess reinforcement learning are the Autoshaping Task, Probabilistic Reward Learning Tasks, and the Response Bias Probabilistic Reward Task. The tasks described here that assess motivation are Outcome Devaluation and Contingency Degradation Tasks and Effort-Based Tasks. In addition to describing such methods and procedures, the present article provides a working vocabulary for research and theory in this field, as well as an industry perspective about how such tasks may be used in drug discovery. It is hoped that this review can aid investigators who are conducting research in this complex area, promote translational studies by highlighting shared research goals and fostering a common vocabulary across basic and clinical fields, and facilitate the development of medications for the treatment of symptoms mediated by reinforcement learning and motivational deficits. Copyright © 2013 Elsevier

  16. Interactive robots in experimental biology.

    Science.gov (United States)

    Krause, Jens; Winfield, Alan F T; Deneubourg, Jean-Louis

    2011-07-01

    Interactive robots have the potential to revolutionise the study of social behaviour because they provide several methodological advances. In interactions with live animals, the behaviour of robots can be standardised, morphology and behaviour can be decoupled (so that different morphologies and behavioural strategies can be combined), behaviour can be manipulated in complex interaction sequences and models of behaviour can be embodied by the robot and thereby be tested. Furthermore, robots can be used as demonstrators in experiments on social learning. As we discuss here, the opportunities that robots create for new experimental approaches have far-reaching consequences for research in fields such as mate choice, cooperation, social learning, personality studies and collective behaviour. Copyright © 2011 Elsevier Ltd. All rights reserved.

  17. Peripersonal Space and Margin of Safety around the Body: Learning Visuo-Tactile Associations in a Humanoid Robot with Artificial Skin.

    Science.gov (United States)

    Roncone, Alessandro; Hoffmann, Matej; Pattacini, Ugo; Fadiga, Luciano; Metta, Giorgio

    2016-01-01

    This paper investigates a biologically motivated model of peripersonal space through its implementation on a humanoid robot. Guided by the present understanding of the neurophysiology of the fronto-parietal system, we developed a computational model inspired by the receptive fields of polymodal neurons identified, for example, in brain areas F4 and VIP. The experiments on the iCub humanoid robot show that the peripersonal space representation i) can be learned efficiently and in real-time via a simple interaction with the robot, ii) can lead to the generation of behaviors like avoidance and reaching, and iii) can contribute to the understanding the biological principle of motor equivalence. More specifically, with respect to i) the present model contributes to hypothesizing a learning mechanisms for peripersonal space. In relation to point ii) we show how a relatively simple controller can exploit the learned receptive fields to generate either avoidance or reaching of an incoming stimulus and for iii) we show how the robot can select arbitrary body parts as the controlled end-point of an avoidance or reaching movement.

  18. Neural Control of a Tracking Task via Attention-Gated Reinforcement Learning for Brain-Machine Interfaces.

    Science.gov (United States)

    Wang, Yiwen; Wang, Fang; Xu, Kai; Zhang, Qiaosheng; Zhang, Shaomin; Zheng, Xiaoxiang

    2015-05-01

    Reinforcement learning (RL)-based brain machine interfaces (BMIs) enable the user to learn from the environment through interactions to complete the task without desired signals, which is promising for clinical applications. Previous studies exploited Q-learning techniques to discriminate neural states into simple directional actions providing the trial initial timing. However, the movements in BMI applications can be quite complicated, and the action timing explicitly shows the intention when to move. The rich actions and the corresponding neural states form a large state-action space, imposing generalization difficulty on Q-learning. In this paper, we propose to adopt attention-gated reinforcement learning (AGREL) as a new learning scheme for BMIs to adaptively decode high-dimensional neural activities into seven distinct movements (directional moves, holdings and resting) due to the efficient weight-updating. We apply AGREL on neural data recorded from M1 of a monkey to directly predict a seven-action set in a time sequence to reconstruct the trajectory of a center-out task. Compared to Q-learning techniques, AGREL could improve the target acquisition rate to 90.16% in average with faster convergence and more stability to follow neural activity over multiple days, indicating the potential to achieve better online decoding performance for more complicated BMI tasks.

  19. Reinforcement Learning Approach to Generate Goal-directed Locomotion of a Snake-Like Robot with Screw-Drive Units

    DEFF Research Database (Denmark)

    Chatterjee, Sromona; Nachstedt, Timo; Tamosiunaite, Minija

    2014-01-01

    Abstract—In this paper we apply a policy improvement algorithm called Policy Improvement using Path Integrals (PI2) to generate goal-directed locomotion of a complex snake-like robot with screw-drive units. PI2 is numerically simple and has an ability to deal with high dimensional systems. Here...

  20. Studying social robots in practiced places

    DEFF Research Database (Denmark)

    Hasse, Cathrine; Bruun, Maja Hojer; Hanghøj, Signe

    2015-01-01

    values, social relations and materialities. Though substantial funding has been invested in developing health service robots, few studies have been undertaken that explore human-robot interactions as they play out in everyday practice. We argue that the complex learning processes involve not only so...... of technologies in use, e.g., technologies as multistable ontologies. The argument builds on an empirical study of robots at a Danish rehabilitation centre. Ethnographic methods combined with anthropological learning processes open up new way for exploring how robots enter into professional practices and change...

  1. Confirmation bias in human reinforcement learning: Evidence from counterfactual feedback processing

    Science.gov (United States)

    Lefebvre, Germain; Blakemore, Sarah-Jayne

    2017-01-01

    Previous studies suggest that factual learning, that is, learning from obtained outcomes, is biased, such that participants preferentially take into account positive, as compared to negative, prediction errors. However, whether or not the prediction error valence also affects counterfactual learning, that is, learning from forgone outcomes, is unknown. To address this question, we analysed the performance of two groups of participants on reinforcement learning tasks using a computational model that was adapted to test if prediction error valence influences learning. We carried out two experiments: in the factual learning experiment, participants learned from partial feedback (i.e., the outcome of the chosen option only); in the counterfactual learning experiment, participants learned from complete feedback information (i.e., the outcomes of both the chosen and unchosen option were displayed). In the factual learning experiment, we replicated previous findings of a valence-induced bias, whereby participants learned preferentially from positive, relative to negative, prediction errors. In contrast, for counterfactual learning, we found the opposite valence-induced bias: negative prediction errors were preferentially taken into account, relative to positive ones. When considering valence-induced bias in the context of both factual and counterfactual learning, it appears that people tend to preferentially take into account information that confirms their current choice. PMID:28800597

  2. Confirmation bias in human reinforcement learning: Evidence from counterfactual feedback processing.

    Science.gov (United States)

    Palminteri, Stefano; Lefebvre, Germain; Kilford, Emma J; Blakemore, Sarah-Jayne

    2017-08-01

    Previous studies suggest that factual learning, that is, learning from obtained outcomes, is biased, such that participants preferentially take into account positive, as compared to negative, prediction errors. However, whether or not the prediction error valence also affects counterfactual learning, that is, learning from forgone outcomes, is unknown. To address this question, we analysed the performance of two groups of participants on reinforcement learning tasks using a computational model that was adapted to test if prediction error valence influences learning. We carried out two experiments: in the factual learning experiment, participants learned from partial feedback (i.e., the outcome of the chosen option only); in the counterfactual learning experiment, participants learned from complete feedback information (i.e., the outcomes of both the chosen and unchosen option were displayed). In the factual learning experiment, we replicated previous findings of a valence-induced bias, whereby participants learned preferentially from positive, relative to negative, prediction errors. In contrast, for counterfactual learning, we found the opposite valence-induced bias: negative prediction errors were preferentially taken into account, relative to positive ones. When considering valence-induced bias in the context of both factual and counterfactual learning, it appears that people tend to preferentially take into account information that confirms their current choice.

  3. Lessons Learned in Designing User-configurable Modular Robotics

    DEFF Research Database (Denmark)

    Lund, Henrik Hautop

    2013-01-01

    User-configurable robotics allows users to easily configure robotic systems to perform task-fulfilling behaviors as desired by the users. With a user configurable robotic system, the user can easily modify the physical and func-tional aspect in terms of hardware and software components of a robotic...... with the semi-autonomous com-ponents of the user-configurable robotic system in interaction with the given environment. Components constituting such a user-configurable robotic system can be characterized as modules in a modular robotic system. Several factors in the definition and implementation...

  4. Within- and across-trial dynamics of human EEG reveal cooperative interplay between reinforcement learning and working memory.

    Science.gov (United States)

    Collins, Anne G E; Frank, Michael J

    2018-03-06

    Learning from rewards and punishments is essential to survival and facilitates flexible human behavior. It is widely appreciated that multiple cognitive and reinforcement learning systems contribute to decision-making, but the nature of their interactions is elusive. Here, we leverage methods for extracting trial-by-trial indices of reinforcement learning (RL) and working memory (WM) in human electro-encephalography to reveal single-trial computations beyond that afforded by behavior alone. Neural dynamics confirmed that increases in neural expectation were predictive of reduced neural surprise in the following feedback period, supporting central tenets of RL models. Within- and cross-trial dynamics revealed a cooperative interplay between systems for learning, in which WM contributes expectations to guide RL, despite competition between systems during choice. Together, these results provide a deeper understanding of how multiple neural systems interact for learning and decision-making and facilitate analysis of their disruption in clinical populations.

  5. A Novel Robot System Integrating Biological and Mechanical Intelligence Based on Dissociated Neural Network-Controlled Closed-Loop Environment.

    Science.gov (United States)

    Li, Yongcheng; Sun, Rong; Wang, Yuechao; Li, Hongyi; Zheng, Xiongfei

    2016-01-01

    We propose the architecture of a novel robot system merging biological and artificial intelligence based on a neural controller connected to an external agent. We initially built a framework that connected the dissociated neural network to a mobile robot system to implement a realistic vehicle. The mobile robot system characterized by a camera and two-wheeled robot was designed to execute the target-searching task. We modified a software architecture and developed a home-made stimulation generator to build a bi-directional connection between the biological and the artificial components via simple binomial coding/decoding schemes. In this paper, we utilized a specific hierarchical dissociated neural network for the first time as the neural controller. Based on our work, neural cultures were successfully employed to control an artificial agent resulting in high performance. Surprisingly, under the tetanus stimulus training, the robot performed better and better with the increasement of training cycle because of the short-term plasticity of neural network (a kind of reinforced learning). Comparing to the work previously reported, we adopted an effective experimental proposal (i.e. increasing the training cycle) to make sure of the occurrence of the short-term plasticity, and preliminarily demonstrated that the improvement of the robot's performance could be caused independently by the plasticity development of dissociated neural network. This new framework may provide some possible solutions for the learning abilities of intelligent robots by the engineering application of the plasticity processing of neural networks, also for the development of theoretical inspiration for the next generation neuro-prostheses on the basis of the bi-directional exchange of information within the hierarchical neural networks.

  6. A Novel Robot System Integrating Biological and Mechanical Intelligence Based on Dissociated Neural Network-Controlled Closed-Loop Environment.

    Directory of Open Access Journals (Sweden)

    Yongcheng Li

    Full Text Available We propose the architecture of a novel robot system merging biological and artificial intelligence based on a neural controller connected to an external agent. We initially built a framework that connected the dissociated neural network to a mobile robot system to implement a realistic vehicle. The mobile robot system characterized by a camera and two-wheeled robot was designed to execute the target-searching task. We modified a software architecture and developed a home-made stimulation generator to build a bi-directional connection between the biological and the artificial components via simple binomial coding/decoding schemes. In this paper, we utilized a specific hierarchical dissociated neural network for the first time as the neural controller. Based on our work, neural cultures were successfully employed to control an artificial agent resulting in high performance. Surprisingly, under the tetanus stimulus training, the robot performed better and better with the increasement of training cycle because of the short-term plasticity of neural network (a kind of reinforced learning. Comparing to the work previously reported, we adopted an effective experimental proposal (i.e. increasing the training cycle to make sure of the occurrence of the short-term plasticity, and preliminarily demonstrated that the improvement of the robot's performance could be caused independently by the plasticity development of dissociated neural network. This new framework may provide some possible solutions for the learning abilities of intelligent robots by the engineering application of the plasticity processing of neural networks, also for the development of theoretical inspiration for the next generation neuro-prostheses on the basis of the bi-directional exchange of information within the hierarchical neural networks.

  7. Reinforcement learning using a continuous time actor-critic framework with spiking neurons.

    Directory of Open Access Journals (Sweden)

    Nicolas Frémaux

    2013-04-01

    Full Text Available Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only been partially elucidated. On one hand, experimental evidence shows that the neuromodulator dopamine carries information about rewards and affects synaptic plasticity. On the other hand, the theory of reinforcement learning provides a framework for reward-based learning. Recent models of reward-modulated spike-timing-dependent plasticity have made first steps towards bridging the gap between the two approaches, but faced two problems. First, reinforcement learning is typically formulated in a discrete framework, ill-adapted to the description of natural situations. Second, biologically plausible models of reward-modulated spike-timing-dependent plasticity require precise calculation of the reward prediction error, yet it remains to be shown how this can be computed by neurons. Here we propose a solution to these problems by extending the continuous temporal difference (TD learning of Doya (2000 to the case of spiking neurons in an actor-critic network operating in continuous time, and with continuous state and action representations. In our model, the critic learns to predict expected future rewards in real time. Its activity, together with actual rewards, conditions the delivery of a neuromodulatory TD signal to itself and to the actor, which is responsible for action choice. In simulations, we show that such an architecture can solve a Morris water-maze-like navigation task, in a number of trials consistent with reported animal performance. We also use our model to solve the acrobot and the cartpole problems, two complex motor control tasks. Our model provides a plausible way of computing reward prediction error in the brain. Moreover, the analytically derived learning rule is consistent with experimental evidence for dopamine-modulated spike-timing-dependent plasticity.

  8. Comparison of haptic guidance and error amplification robotic trainings for the learning of a timing-based motor task by healthy seniors.

    Science.gov (United States)

    Bouchard, Amy E; Corriveau, Hélène; Milot, Marie-Hélène

    2015-01-01

    With age, a decline in the temporal aspect of movement is observed such as a longer movement execution time and a decreased timing accuracy. Robotic training can represent an interesting approach to help improve movement timing among the elderly. Two types of robotic training-haptic guidance (HG; demonstrating the correct movement for a better movement planning and improved execution of movement) and error amplification (EA; exaggerating movement errors to have a more rapid and complete learning) have been positively used in young healthy subjects to boost timing accuracy. For healthy seniors, only HG training has been used so far where significant and positive timing gains have been obtained. The goal of the study was to evaluate and compare the impact of both HG and EA robotic trainings on the improvement of seniors' movement timing. Thirty-two healthy seniors (mean age 68 ± 4 years) learned to play a pinball-like game by triggering a one-degree-of-freedom hand robot at the proper time to make a flipper move and direct a falling ball toward a randomly positioned target. During HG and EA robotic trainings, the subjects' timing errors were decreased and increased, respectively, based on the subjects' timing errors in initiating a movement. Results showed that only HG training benefited learning, but the improvement did not generalize to untrained targets. Also, age had no influence on the efficacy of HG robotic training, meaning that the oldest subjects did not benefit more from HG training than the younger senior subjects. Using HG to teach the correct timing of movement seems to be a good strategy to improve motor learning for the elderly as for younger people. However, more studies are needed to assess the long-term impact of HG robotic training on improvement in movement timing.

  9. Closed-loop adaptation of neurofeedback based on mental effort facilitates reinforcement learning of brain self-regulation.

    Science.gov (United States)

    Bauer, Robert; Fels, Meike; Royter, Vladislav; Raco, Valerio; Gharabaghi, Alireza

    2016-09-01

    Considering self-rated mental effort during neurofeedback may improve training of brain self-regulation. Twenty-one healthy, right-handed subjects performed kinesthetic motor imagery of opening their left hand, while threshold-based classification of beta-band desynchronization resulted in proprioceptive robotic feedback. The experiment consisted of two blocks in a cross-over design. The participants rated their perceived mental effort nine times per block. In the adaptive block, the threshold was adjusted on the basis of these ratings whereas adjustments were carried out at random in the other block. Electroencephalography was used to examine the cortical activation patterns during the training sessions. The perceived mental effort was correlated with the difficulty threshold of neurofeedback training. Adaptive threshold-setting reduced mental effort and increased the classification accuracy and positive predictive value. This was paralleled by an inter-hemispheric cortical activation pattern in low frequency bands connecting the right frontal and left parietal areas. Optimal balance of mental effort was achieved at thresholds significantly higher than maximum classification accuracy. Rating of mental effort is a feasible approach for effective threshold-adaptation during neurofeedback training. Closed-loop adaptation of the neurofeedback difficulty level facilitates reinforcement learning of brain self-regulation. Copyright © 2016 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.

  10. A Passive Learning Sensor Architecture for Multimodal Image Labeling: An Application for Social Robots

    Directory of Open Access Journals (Sweden)

    Marco A. Gutiérrez

    2017-02-01

    Full Text Available Object detection and classification have countless applications in human–robot interacting systems. It is a necessary skill for autonomous robots that perform tasks in household scenarios. Despite the great advances in deep learning and computer vision, social robots performing non-trivial tasks usually spend most of their time finding and modeling objects. Working in real scenarios means dealing with constant environment changes and relatively low-quality sensor data due to the distance at which objects are often found. Ambient intelligence systems equipped with different sensors can also benefit from the ability to find objects, enabling them to inform humans about their location. For these applications to succeed, systems need to detect the objects that may potentially contain other objects, working with relatively low-resolution sensor data. A passive learning architecture for sensors has been designed in order to take advantage of multimodal information, obtained using an RGB-D camera and trained semantic language models. The main contribution of the architecture lies in the improvement of the performance of the sensor under conditions of low resolution and high light variations using a combination of image labeling and word semantics. The tests performed on each of the stages of the architecture compare this solution with current research labeling techniques for the application of an autonomous social robot working in an apartment. The results obtained demonstrate that the proposed sensor architecture outperforms state-of-the-art approaches.

  11. IMPLEMENTATION OF MULTIAGENT REINFORCEMENT LEARNING MECHANISM FOR OPTIMAL ISLANDING OPERATION OF DISTRIBUTION NETWORK

    DEFF Research Database (Denmark)

    Saleem, Arshad; Lind, Morten

    2008-01-01

    among electric power utilities to utilize modern information and communication technologies (ICT) in order to improve the automation of the distribution system. In this paper we present our work for the implementation of a dynamic multi-agent based distributed reinforcement learning mechanism...

  12. Flat vs. Expressive Storytelling: Young Children's Learning and Retention of a Social Robot's Narrative.

    Science.gov (United States)

    Kory Westlund, Jacqueline M; Jeong, Sooyeon; Park, Hae W; Ronfard, Samuel; Adhikari, Aradhana; Harris, Paul L; DeSteno, David; Breazeal, Cynthia L

    2017-01-01

    Prior research with preschool children has established that dialogic or active book reading is an effective method for expanding young children's vocabulary. In this exploratory study, we asked whether similar benefits are observed when a robot engages in dialogic reading with preschoolers. Given the established effectiveness of active reading, we also asked whether this effectiveness was critically dependent on the expressive characteristics of the robot. For approximately half the children, the robot's active reading was expressive; the robot's voice included a wide range of intonation and emotion ( Expressive ). For the remaining children, the robot read and conversed with a flat voice, which sounded similar to a classic text-to-speech engine and had little dynamic range ( Flat ). The robot's movements were kept constant across conditions. We performed a verification study using Amazon Mechanical Turk (AMT) to confirm that the Expressive robot was viewed as significantly more expressive, more emotional, and less passive than the Flat robot. We invited 45 preschoolers with an average age of 5 years who were either English Language Learners (ELL), bilingual, or native English speakers to engage in the reading task with the robot. The robot narrated a story from a picture book, using active reading techniques and including a set of target vocabulary words in the narration. Children were post-tested on the vocabulary words and were also asked to retell the story to a puppet. A subset of 34 children performed a second story retelling 4-6 weeks later. Children reported liking and learning from the robot a similar amount in the Expressive and Flat conditions. However, as compared to children in the Flat condition, children in the Expressive condition were more concentrated and engaged as indexed by their facial expressions; they emulated the robot's story more in their story retells; and they told longer stories during their delayed retelling. Furthermore, children who

  13. Robotic Construction Kits as Computational Manipulatives for Learning in the STEM Disciplines

    Science.gov (United States)

    Sullivan, Florence R.; Heffernan, John

    2016-01-01

    This article presents a systematic review of research related to the use of robotics construction kits (RCKs) in P-12 learning in the STEM disciplines for typically developing children. The purpose of this review is to configure primarily qualitative and mixed methods findings from studies meeting our selection and quality criterion to answer the…

  14. Remotely controlling of mobile robots using gesture captured by the Kinect and recognized by machine learning method

    Science.gov (United States)

    Hsu, Roy CHaoming; Jian, Jhih-Wei; Lin, Chih-Chuan; Lai, Chien-Hung; Liu, Cheng-Ting

    2013-01-01

    The main purpose of this paper is to use machine learning method and Kinect and its body sensation technology to design a simple, convenient, yet effective robot remote control system. In this study, a Kinect sensor is used to capture the human body skeleton with depth information, and a gesture training and identification method is designed using the back propagation neural network to remotely command a mobile robot for certain actions via the Bluetooth. The experimental results show that the designed mobile robots remote control system can achieve, on an average, more than 96% of accurate identification of 7 types of gestures and can effectively control a real e-puck robot for the designed commands.

  15. Feedback error learning controller for functional electrical stimulation assistance in a hybrid robotic system for reaching rehabilitation

    Directory of Open Access Journals (Sweden)

    Francisco Resquín

    2016-07-01

    Full Text Available Hybrid robotic systems represent a novel research field, where functional electrical stimulation (FES is combined with a robotic device for rehabilitation of motor impairment. Under this approach, the design of robust FES controllers still remains an open challenge. In this work, we aimed at developing a learning FES controller to assist in the performance of reaching movements in a simple hybrid robotic system setting. We implemented a Feedback Error Learning (FEL control strategy consisting of a feedback PID controller and a feedforward controller based on a neural network. A passive exoskeleton complemented the FES controller by compensating the effects of gravity. We carried out experiments with healthy subjects to validate the performance of the system. Results show that the FEL control strategy is able to adjust the FES intensity to track the desired trajectory accurately without the need of a previous mathematical model.

  16. Nonparametric Online Learning Control for Soft Continuum Robot: An Enabling Technique for Effective Endoscopic Navigation

    Science.gov (United States)

    Lee, Kit-Hang; Fu, Denny K.C.; Leong, Martin C.W.; Chow, Marco; Fu, Hing-Choi; Althoefer, Kaspar; Sze, Kam Yim; Yeung, Chung-Kwong

    2017-01-01

    Abstract Bioinspired robotic structures comprising soft actuation units have attracted increasing research interest. Taking advantage of its inherent compliance, soft robots can assure safe interaction with external environments, provided that precise and effective manipulation could be achieved. Endoscopy is a typical application. However, previous model-based control approaches often require simplified geometric assumptions on the soft manipulator, but which could be very inaccurate in the presence of unmodeled external interaction forces. In this study, we propose a generic control framework based on nonparametric and online, as well as local, training to learn the inverse model directly, without prior knowledge of the robot's structural parameters. Detailed experimental evaluation was conducted on a soft robot prototype with control redundancy, performing trajectory tracking in dynamically constrained environments. Advanced element formulation of finite element analysis is employed to initialize the control policy, hence eliminating the need for random exploration in the robot's workspace. The proposed control framework enabled a soft fluid-driven continuum robot to follow a 3D trajectory precisely, even under dynamic external disturbance. Such enhanced control accuracy and adaptability would facilitate effective endoscopic navigation in complex and changing environments. PMID:29251567

  17. Automatic learning rate adjustment for self-supervising autonomous robot control

    Science.gov (United States)

    Arras, Michael K.; Protzel, Peter W.; Palumbo, Daniel L.

    1992-01-01

    Described is an application in which an Artificial Neural Network (ANN) controls the positioning of a robot arm with five degrees of freedom by using visual feedback provided by two cameras. This application and the specific ANN model, local liner maps, are based on the work of Ritter, Martinetz, and Schulten. We extended their approach by generating a filtered, average positioning error from the continuous camera feedback and by coupling the learning rate to this error. When the network learns to position the arm, the positioning error decreases and so does the learning rate until the system stabilizes at a minimum error and learning rate. This abolishes the need for a predetermined cooling schedule. The automatic cooling procedure results in a closed loop control with no distinction between a learning phase and a production phase. If the positioning error suddenly starts to increase due to an internal failure such as a broken joint, or an environmental change such as a camera moving, the learning rate increases accordingly. Thus, learning is automatically activated and the network adapts to the new condition after which the error decreases again and learning is 'shut off'. The automatic cooling is therefore a prerequisite for the autonomy and the fault tolerance of the system.

  18. Accelerating Multiagent Reinforcement Learning by Equilibrium Transfer.

    Science.gov (United States)

    Hu, Yujing; Gao, Yang; An, Bo

    2015-07-01

    An important approach in multiagent reinforcement learning (MARL) is equilibrium-based MARL, which adopts equilibrium solution concepts in game theory and requires agents to play equilibrium strategies at each state. However, most existing equilibrium-based MARL algorithms cannot scale due to a large number of computationally expensive equilibrium computations (e.g., computing Nash equilibria is PPAD-hard) during learning. For the first time, this paper finds that during the learning process of equilibrium-based MARL, the one-shot games corresponding to each state's successive visits often have the same or similar equilibria (for some states more than 90% of games corresponding to successive visits have similar equilibria). Inspired by this observation, this paper proposes to use equilibrium transfer to accelerate equilibrium-based MARL. The key idea of equilibrium transfer is to reuse previously computed equilibria when each agent has a small incentive to deviate. By introducing transfer loss and transfer condition, a novel framework called equilibrium transfer-based MARL is proposed. We prove that although equilibrium transfer brings transfer loss, equilibrium-based MARL algorithms can still converge to an equilibrium policy under certain assumptions. Experimental results in widely used benchmarks (e.g., grid world game, soccer game, and wall game) show that the proposed framework: 1) not only significantly accelerates equilibrium-based MARL (up to 96.7% reduction in learning time), but also achieves higher average rewards than algorithms without equilibrium transfer and 2) scales significantly better than algorithms without equilibrium transfer when the state/action space grows and the number of agents increases.

  19. A Robot-Partner for Preschool Children Learning English Using Socio-Cognitive Conflict

    Science.gov (United States)

    Mazzoni, Elvis; Benvenuti, Martina

    2015-01-01

    This paper presents an exploratory study in which a humanoid robot (MecWilly) acted as a partner to preschool children, helping them to learn English words. In order to use the Socio-Cognitive Conflict paradigm to induce the knowledge acquisition process, we designed a playful activity in which children worked in pairs with another child or with…

  20. Correction of Visual Perception Based on Neuro-Fuzzy Learning for the Humanoid Robot TEO

    Directory of Open Access Journals (Sweden)

    Juan Hernandez-Vicen

    2018-03-01

    Full Text Available New applications related to robotic manipulation or transportation tasks, with or without physical grasping, are continuously being developed. To perform these activities, the robot takes advantage of different kinds of perceptions. One of the key perceptions in robotics is vision. However, some problems related to image processing makes the application of visual information within robot control algorithms difficult. Camera-based systems have inherent errors that affect the quality and reliability of the information obtained. The need of correcting image distortion slows down image parameter computing, which decreases performance of control algorithms. In this paper, a new approach to correcting several sources of visual distortions on images in only one computing step is proposed. The goal of this system/algorithm is the computation of the tilt angle of an object transported by a robot, minimizing image inherent errors and increasing computing speed. After capturing the image, the computer system extracts the angle using a Fuzzy filter that corrects at the same time all possible distortions, obtaining the real angle in only one processing step. This filter has been developed by the means of Neuro-Fuzzy learning techniques, using datasets with information obtained from real experiments. In this way, the computing time has been decreased and the performance of the application has been improved. The resulting algorithm has been tried out experimentally in robot transportation tasks in the humanoid robot TEO (Task Environment Operator from the University Carlos III of Madrid.

  1. Energy Management Strategy for a Hybrid Electric Vehicle Based on Deep Reinforcement Learning

    OpenAIRE

    Yue Hu; Weimin Li; Kun Xu; Taimoor Zahid; Feiyan Qin; Chenming Li

    2018-01-01

    An energy management strategy (EMS) is important for hybrid electric vehicles (HEVs) since it plays a decisive role on the performance of the vehicle. However, the variation of future driving conditions deeply influences the effectiveness of the EMS. Most existing EMS methods simply follow predefined rules that are not adaptive to different driving conditions online. Therefore, it is useful that the EMS can learn from the environment or driving cycle. In this paper, a deep reinforcement learn...

  2. Reinforcement learning produces dominant strategies for the Iterated Prisoner's Dilemma.

    Science.gov (United States)

    Harper, Marc; Knight, Vincent; Jones, Martin; Koutsovoulos, Georgios; Glynatsi, Nikoleta E; Campbell, Owen

    2017-01-01

    We present tournament results and several powerful strategies for the Iterated Prisoner's Dilemma created using reinforcement learning techniques (evolutionary and particle swarm algorithms). These strategies are trained to perform well against a corpus of over 170 distinct opponents, including many well-known and classic strategies. All the trained strategies win standard tournaments against the total collection of other opponents. The trained strategies and one particular human made designed strategy are the top performers in noisy tournaments also.

  3. Interactive Exploration Robots: Human-Robotic Collaboration and Interactions

    Science.gov (United States)

    Fong, Terry

    2017-01-01

    For decades, NASA has employed different operational approaches for human and robotic missions. Human spaceflight missions to the Moon and in low Earth orbit have relied upon near-continuous communication with minimal time delays. During these missions, astronauts and mission control communicate interactively to perform tasks and resolve problems in real-time. In contrast, deep-space robotic missions are designed for operations in the presence of significant communication delay - from tens of minutes to hours. Consequently, robotic missions typically employ meticulously scripted and validated command sequences that are intermittently uplinked to the robot for independent execution over long periods. Over the next few years, however, we will see increasing use of robots that blend these two operational approaches. These interactive exploration robots will be remotely operated by humans on Earth or from a spacecraft. These robots will be used to support astronauts on the International Space Station (ISS), to conduct new missions to the Moon, and potentially to enable remote exploration of planetary surfaces in real-time. In this talk, I will discuss the technical challenges associated with building and operating robots in this manner, along with lessons learned from research conducted with the ISS and in the field.

  4. Lessons learned from the STS-120/ISS 10A robotics operations

    Science.gov (United States)

    Aziz, Sarmad

    2010-01-01

    analysis of the robotics operations executed during the STS-120 and 10A stage mission. The paper will highlight the unique challenges associated with the planning and execution of the P6 truss and Node 2 module relocation tasks, as well as the robotics issues faced during the planning and execution of the solar array repair space walk. The analysis will address the operational techniques and mission planning guideline used to deal with tight timelines, structural loads issues, ISS attitude control issues, and complex interdependencies between various ISS systems during the assembly and solar array repair operations. A discussion of the lessons learned from the planning and execution of these complex robotics tasks will also be presented.

  5. Reinforcement learning of targeted movement in a spiking neuronal model of motor cortex.

    Directory of Open Access Journals (Sweden)

    George L Chadderdon

    Full Text Available Sensorimotor control has traditionally been considered from a control theory perspective, without relation to neurobiology. In contrast, here we utilized a spiking-neuron model of motor cortex and trained it to perform a simple movement task, which consisted of rotating a single-joint "forearm" to a target. Learning was based on a reinforcement mechanism analogous to that of the dopamine system. This provided a global reward or punishment signal in response to decreasing or increasing distance from hand to target, respectively. Output was partially driven by Poisson motor babbling, creating stochastic movements that could then be shaped by learning. The virtual forearm consisted of a single segment rotated around an elbow joint, controlled by flexor and extensor muscles. The model consisted of 144 excitatory and 64 inhibitory event-based neurons, each with AMPA, NMDA, and GABA synapses. Proprioceptive cell input to this model encoded the 2 muscle lengths. Plasticity was only enabled in feedforward connections between input and output excitatory units, using spike-timing-dependent eligibility traces for synaptic credit or blame assignment. Learning resulted from a global 3-valued signal: reward (+1, no learning (0, or punishment (-1, corresponding to phasic increases, lack of change, or phasic decreases of dopaminergic cell firing, respectively. Successful learning only occurred when both reward and punishment were enabled. In this case, 5 target angles were learned successfully within 180 s of simulation time, with a median error of 8 degrees. Motor babbling allowed exploratory learning, but decreased the stability of the learned behavior, since the hand continued moving after reaching the target. Our model demonstrated that a global reinforcement signal, coupled with eligibility traces for synaptic plasticity, can train a spiking sensorimotor network to perform goal-directed motor behavior.

  6. Reinforcement learning of targeted movement in a spiking neuronal model of motor cortex.

    Science.gov (United States)

    Chadderdon, George L; Neymotin, Samuel A; Kerr, Cliff C; Lytton, William W

    2012-01-01

    Sensorimotor control has traditionally been considered from a control theory perspective, without relation to neurobiology. In contrast, here we utilized a spiking-neuron model of motor cortex and trained it to perform a simple movement task, which consisted of rotating a single-joint "forearm" to a target. Learning was based on a reinforcement mechanism analogous to that of the dopamine system. This provided a global reward or punishment signal in response to decreasing or increasing distance from hand to target, respectively. Output was partially driven by Poisson motor babbling, creating stochastic movements that could then be shaped by learning. The virtual forearm consisted of a single segment rotated around an elbow joint, controlled by flexor and extensor muscles. The model consisted of 144 excitatory and 64 inhibitory event-based neurons, each with AMPA, NMDA, and GABA synapses. Proprioceptive cell input to this model encoded the 2 muscle lengths. Plasticity was only enabled in feedforward connections between input and output excitatory units, using spike-timing-dependent eligibility traces for synaptic credit or blame assignment. Learning resulted from a global 3-valued signal: reward (+1), no learning (0), or punishment (-1), corresponding to phasic increases, lack of change, or phasic decreases of dopaminergic cell firing, respectively. Successful learning only occurred when both reward and punishment were enabled. In this case, 5 target angles were learned successfully within 180 s of simulation time, with a median error of 8 degrees. Motor babbling allowed exploratory learning, but decreased the stability of the learned behavior, since the hand continued moving after reaching the target. Our model demonstrated that a global reinforcement signal, coupled with eligibility traces for synaptic plasticity, can train a spiking sensorimotor network to perform goal-directed motor behavior.

  7. Marine Robot Autonomy

    CERN Document Server

    2013-01-01

    Autonomy for Marine Robots provides a timely and insightful overview of intelligent autonomy in marine robots. A brief history of this emerging field is provided, along with a discussion of the challenges unique to the underwater environment and their impact on the level of intelligent autonomy required.  Topics covered at length examine advanced frameworks, path-planning, fault tolerance, machine learning, and cooperation as relevant to marine robots that need intelligent autonomy.  This book also: Discusses and offers solutions for the unique challenges presented by more complex missions and the dynamic underwater environment when operating autonomous marine robots Includes case studies that demonstrate intelligent autonomy in marine robots to perform underwater simultaneous localization and mapping  Autonomy for Marine Robots is an ideal book for researchers and engineers interested in the field of marine robots.      

  8. Students Learn Programming Faster through Robotic Simulation

    Science.gov (United States)

    Liu, Allison; Newsom, Jeff; Schunn, Chris; Shoop, Robin

    2013-01-01

    Schools everywhere are using robotics education to engage kids in applied science, technology, engineering, and mathematics (STEM) activities, but teaching programming can be challenging due to lack of resources. This article reports on using Robot Virtual Worlds (RVW) and curriculum available on the Internet to teach robot programming. It also…

  9. Knowledge-Based Reinforcement Learning for Data Mining

    Science.gov (United States)

    Kudenko, Daniel; Grzes, Marek

    Data Mining is the process of extracting patterns from data. Two general avenues of research in the intersecting areas of agents and data mining can be distinguished. The first approach is concerned with mining an agent’s observation data in order to extract patterns, categorize environment states, and/or make predictions of future states. In this setting, data is normally available as a batch, and the agent’s actions and goals are often independent of the data mining task. The data collection is mainly considered as a side effect of the agent’s activities. Machine learning techniques applied in such situations fall into the class of supervised learning. In contrast, the second scenario occurs where an agent is actively performing the data mining, and is responsible for the data collection itself. For example, a mobile network agent is acquiring and processing data (where the acquisition may incur a certain cost), or a mobile sensor agent is moving in a (perhaps hostile) environment, collecting and processing sensor readings. In these settings, the tasks of the agent and the data mining are highly intertwined and interdependent (or even identical). Supervised learning is not a suitable technique for these cases. Reinforcement Learning (RL) enables an agent to learn from experience (in form of reward and punishment for explorative actions) and adapt to new situations, without a teacher. RL is an ideal learning technique for these data mining scenarios, because it fits the agent paradigm of continuous sensing and acting, and the RL agent is able to learn to make decisions on the sampling of the environment which provides the data. Nevertheless, RL still suffers from scalability problems, which have prevented its successful use in many complex real-world domains. The more complex the tasks, the longer it takes a reinforcement learning algorithm to converge to a good solution. For many real-world tasks, human expert knowledge is available. For example, human

  10. Towards a synergy framework across neuroscience and robotics: Lessons learned and open questions. Reply to comments on: "Hand synergies: Integration of robotics and neuroscience for understanding the control of biological and artificial hands"

    Science.gov (United States)

    Santello, Marco; Bianchi, Matteo; Gabiccini, Marco; Ricciardi, Emiliano; Salvietti, Gionata; Prattichizzo, Domenico; Ernst, Marc; Moscatelli, Alessandro; Jorntell, Henrik; Kappers, Astrid M. L.; Kyriakopoulos, Kostas; Schaeffer, Alin Abu; Castellini, Claudio; Bicchi, Antonio

    2016-07-01

    We would like to thank all commentators for their insightful commentaries. Thanks to their diverse and complementary expertise in neuroscience and robotics, the commentators have provided us with the opportunity to further discuss state-of-the-art and gaps in the integration of neuroscience and robotics reviewed in our article. We organized our reply in two sections that capture the main points of all commentaries [1-9]: (1) Advantages and limitations of the synergy approach in neuroscience and robotics, and (2) Learning and role of sensory feedback in biological and robotics synergies.

  11. An Infant Development-inspired Approach to Robot Hand-eye Coordination

    Directory of Open Access Journals (Sweden)

    Fei Chao

    2014-02-01

    Full Text Available This paper presents a novel developmental learning approach for hand-eye coordination in an autonomous robotic system. Robotic hand-eye coordination plays an important role in dealing with real-time environments. Under the approach, infant developmental patterns are introduced to build our robot's learning system. The method works by first constructing a brain-like computational structure to control the robot, and then by using infant behavioural patterns to build a hand-eye coordination learning algorithm. This work is supported by an experimental evaluation, which shows that the control system is implemented simply, and that the learning approach provides fast and incremental learning of behavioural competence.

  12. Learning to locate an odour source with a mobile robot

    OpenAIRE

    Duckett, T.; Axelsson, M.; Saffiotti, A.

    2001-01-01

    We address the problem of enabling a mobile robot to locate a stationary odour source using an electronic nose constructed from gas sensors. On the hardware side, we use a stereo nose architecture consisting of two parallel chambers, each containing an identical set of sensors. On the software side, we use a recurrent artificial neural network to learn the direction to a stationary source from a time series of sensor readings. This contrasts with previous approaches, that rely on the existenc...

  13. Reinforcement learning produces dominant strategies for the Iterated Prisoner's Dilemma.

    Directory of Open Access Journals (Sweden)

    Marc Harper

    Full Text Available We present tournament results and several powerful strategies for the Iterated Prisoner's Dilemma created using reinforcement learning techniques (evolutionary and particle swarm algorithms. These strategies are trained to perform well against a corpus of over 170 distinct opponents, including many well-known and classic strategies. All the trained strategies win standard tournaments against the total collection of other opponents. The trained strategies and one particular human made designed strategy are the top performers in noisy tournaments also.

  14. Evaluation of linearly solvable Markov decision process with dynamic model learning in a mobile robot navigation task.

    Science.gov (United States)

    Kinjo, Ken; Uchibe, Eiji; Doya, Kenji

    2013-01-01

    Linearly solvable Markov Decision Process (LMDP) is a class of optimal control problem in which the Bellman's equation can be converted into a linear equation by an exponential transformation of the state value function (Todorov, 2009b). In an LMDP, the optimal value function and the corresponding control policy are obtained by solving an eigenvalue problem in a discrete state space or an eigenfunction problem in a continuous state using the knowledge of the system dynamics and the action, state, and terminal cost functions. In this study, we evaluate the effectiveness of the LMDP framework in real robot control, in which the dynamics of the body and the environment have to be learned from experience. We first perform a simulation study of a pole swing-up task to evaluate the effect of the accuracy of the learned dynamics model on the derived the action policy. The result shows that a crude linear approximation of the non-linear dynamics can still allow solution of the task, despite with a higher total cost. We then perform real robot experiments of a battery-catching task using our Spring Dog mobile robot platform. The state is given by the position and the size of a battery in its camera view and two neck joint angles. The action is the velocities of two wheels, while the neck joints were controlled by a visual servo controller. We test linear and bilinear dynamic models in tasks with quadratic and Guassian state cost functions. In the quadratic cost task, the LMDP controller derived from a learned linear dynamics model performed equivalently with the optimal linear quadratic regulator (LQR). In the non-quadratic task, the LMDP controller with a linear dynamics model showed the best performance. The results demonstrate the usefulness of the LMDP framework in real robot control even when simple linear models are used for dynamics learning.

  15. Unibot, a Universal Agent Architecture for Robots

    Directory of Open Access Journals (Sweden)

    Saša Mladenović

    2017-01-01

    Full Text Available Today there are numerous robots in different applications domains despite the fact that they still have limitations in perception, actuation and decision process. Consequently, robots usually have limited autonomy, they are domain specific or have difficulty to adapt on new environments. Learning is the property that makes an agent intelligent and the crucial property that a robot should have to proliferate into the human society. Embedding the learning ability into the robot may simplify the development of the robot control mechanism. The motivation for this research is to develop the agent architecture of the universal robot – Unibot. In our approach the agent is the robot i.e. Unibot that acts in the physical world and is capable of learning. The Unibot conducts several simultaneous simulations of a problem of interest like path-finding. The novelty in our approach is the Multi-Agent Decision Support System which is developed and integrated into the Unibot agent architecture in order to execute simultaneous simulations. Furthermore, the Unibot calculates and evaluates between multiple solutions, decides which action should be performed and performs the action. The prototype of the Unibot agent architecture is described and evaluated in the experiment supported by the Lego Mindstorms robot and the NetLogo.

  16. Memory Transformation Enhances Reinforcement Learning in Dynamic Environments.

    Science.gov (United States)

    Santoro, Adam; Frankland, Paul W; Richards, Blake A

    2016-11-30

    Over the course of systems consolidation, there is a switch from a reliance on detailed episodic memories to generalized schematic memories. This switch is sometimes referred to as "memory transformation." Here we demonstrate a previously unappreciated benefit of memory transformation, namely, its ability to enhance reinforcement learning in a dynamic environment. We developed a neural network that is trained to find rewards in a foraging task where reward locations are continuously changing. The network can use memories for specific locations (episodic memories) and statistical patterns of locations (schematic memories) to guide its search. We find that switching from an episodic to a schematic strategy over time leads to enhanced performance due to the tendency for the reward location to be highly correlated with itself in the short-term, but regress to a stable distribution in the long-term. We also show that the statistics of the environment determine the optimal utilization of both types of memory. Our work recasts the theoretical question of why memory transformation occurs, shifting the focus from the avoidance of memory interference toward the enhancement of reinforcement learning across multiple timescales. As time passes, memories transform from a highly detailed state to a more gist-like state, in a process called "memory transformation." Theories of memory transformation speak to its advantages in terms of reducing memory interference, increasing memory robustness, and building models of the environment. However, the role of memory transformation from the perspective of an agent that continuously acts and receives reward in its environment is not well explored. In this work, we demonstrate a view of memory transformation that defines it as a way of optimizing behavior across multiple timescales. Copyright © 2016 the authors 0270-6474/16/3612228-15$15.00/0.

  17. Robotic Motion Learning Framework to Promote Social Engagement

    Directory of Open Access Journals (Sweden)

    Rachael Burns

    2018-02-01

    Full Text Available Imitation is a powerful component of communication between people, and it poses an important implication in improving the quality of interaction in the field of human–robot interaction (HRI. This paper discusses a novel framework designed to improve human–robot interaction through robotic imitation of a participant’s gestures. In our experiment, a humanoid robotic agent socializes with and plays games with a participant. For the experimental group, the robot additionally imitates one of the participant’s novel gestures during a play session. We hypothesize that the robot’s use of imitation will increase the participant’s openness towards engaging with the robot. Experimental results from a user study of 12 subjects show that post-imitation, experimental subjects displayed a more positive emotional state, had higher instances of mood contagion towards the robot, and interpreted the robot to have a higher level of autonomy than their control group counterparts did. These results point to an increased participant interest in engagement fueled by personalized imitation during interaction.

  18. Multi-Objective Reinforcement Learning-Based Deep Neural Networks for Cognitive Space Communications

    Science.gov (United States)

    Ferreria, Paulo Victor R.; Paffenroth, Randy; Wyglinski, Alexander M.; Hackett, Timothy M.; Bilen, Sven G.; Reinhart, Richard C.; Mortensen, Dale J.

    2017-01-01

    Future communication subsystems of space exploration missions can potentially benefit from software-defined radios (SDRs) controlled by machine learning algorithms. In this paper, we propose a novel hybrid radio resource allocation management control algorithm that integrates multi-objective reinforcement learning and deep artificial neural networks. The objective is to efficiently manage communications system resources by monitoring performance functions with common dependent variables that result in conflicting goals. The uncertainty in the performance of thousands of different possible combinations of radio parameters makes the trade-off between exploration and exploitation in reinforcement learning (RL) much more challenging for future critical space-based missions. Thus, the system should spend as little time as possible on exploring actions, and whenever it explores an action, it should perform at acceptable levels most of the time. The proposed approach enables on-line learning by interactions with the environment and restricts poor resource allocation performance through virtual environment exploration. Improvements in the multiobjective performance can be achieved via transmitter parameter adaptation on a packet-basis, with poorly predicted performance promptly resulting in rejected decisions. Simulations presented in this work considered the DVB-S2 standard adaptive transmitter parameters and additional ones expected to be present in future adaptive radio systems. Performance results are provided by analysis of the proposed hybrid algorithm when operating across a satellite communication channel from Earth to GEO orbit during clear sky conditions. The proposed approach constitutes part of the core cognitive engine proof-of-concept to be delivered to the NASA Glenn Research Center SCaN Testbed located onboard the International Space Station.

  19. A Day-to-Day Route Choice Model Based on Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Fangfang Wei

    2014-01-01

    Full Text Available Day-to-day traffic dynamics are generated by individual traveler’s route choice and route adjustment behaviors, which are appropriate to be researched by using agent-based model and learning theory. In this paper, we propose a day-to-day route choice model based on reinforcement learning and multiagent simulation. Travelers’ memory, learning rate, and experience cognition are taken into account. Then the model is verified and analyzed. Results show that the network flow can converge to user equilibrium (UE if travelers can remember all the travel time they have experienced, but which is not necessarily the case under limited memory; learning rate can strengthen the flow fluctuation, but memory leads to the contrary side; moreover, high learning rate results in the cyclical oscillation during the process of flow evolution. Finally, both the scenarios of link capacity degradation and random link capacity are used to illustrate the model’s applications. Analyses and applications of our model demonstrate the model is reasonable and useful for studying the day-to-day traffic dynamics.

  20. Robotic Mission to Mars: Hands-on, minds-on, web-based learning

    Science.gov (United States)

    Mathers, Naomi; Goktogen, Ali; Rankin, John; Anderson, Marion

    2012-11-01

    Problem-based learning has been demonstrated as an effective methodology for developing analytical skills and critical thinking. The use of scenario-based learning incorporates problem-based learning whilst encouraging students to collaborate with their colleagues and dynamically adapt to their environment. This increased interaction stimulates a deeper understanding and the generation of new knowledge. The Victorian Space Science Education Centre (VSSEC) uses scenario-based learning in its Mission to Mars, Mission to the Orbiting Space Laboratory and Primary Expedition to the M.A.R.S. Base programs. These programs utilize methodologies such as hands-on applications, immersive-learning, integrated technologies, critical thinking and mentoring to engage students in Science, Technology, Engineering and Mathematics (STEM) and highlight potential career paths in science and engineering. The immersive nature of the programs demands specialist environments such as a simulated Mars environment, Mission Control and Space Laboratory, thus restricting these programs to a physical location and limiting student access to the programs. To move beyond these limitations, VSSEC worked with its university partners to develop a web-based mission that delivered the benefits of scenario-based learning within a school environment. The Robotic Mission to Mars allows students to remotely control a real rover, developed by the Australian Centre for Field Robotics (ACFR), on the VSSEC Mars surface. After completing a pre-mission training program and site selection activity, students take on the roles of scientists and engineers in Mission Control to complete a mission and collect data for further analysis. Mission Control is established using software developed by the ACRI Games Technology Lab at La Trobe University using the principles of serious gaming. The software allows students to control the rover, monitor its systems and collect scientific data for analysis. This program encourages

  1. Reinforcement Learning Based Data Self-Destruction Scheme for Secured Data Management

    Directory of Open Access Journals (Sweden)

    Young Ki Kim

    2018-04-01

    Full Text Available As technologies and services that leverage cloud computing have evolved, the number of businesses and individuals who use them are increasing rapidly. In the course of using cloud services, as users store and use data that include personal information, research on privacy protection models to protect sensitive information in the cloud environment is becoming more important. As a solution to this problem, a self-destructing scheme has been proposed that prevents the decryption of encrypted user data after a certain period of time using a Distributed Hash Table (DHT network. However, the existing self-destructing scheme does not mention how to set the number of key shares and the threshold value considering the environment of the dynamic DHT network. This paper proposes a method to set the parameters to generate the key shares needed for the self-destructing scheme considering the availability and security of data. The proposed method defines state, action, and reward of the reinforcement learning model based on the similarity of the graph, and applies the self-destructing scheme process by updating the parameter based on the reinforcement learning model. Through the proposed technique, key sharing parameters can be set in consideration of data availability and security in dynamic DHT network environments.

  2. Are Commercial "Personal Robots" Ready for Language Learning? Focus on Second Language Speech

    Science.gov (United States)

    Moussalli, Souheila; Cardoso, Walcir

    2016-01-01

    Today's language classrooms are challenged with limited classroom time and lack of input, and output practice in a stress-free environment (Hsu, 2015). The use of commercial, readily available tools such as Personal Robots (PRs; e.g. Amazon's Echo, Jibo) might promote language learning by freeing up class time, allowing for a more focused…

  3. Reasons for singularity in robot teleoperation

    DEFF Research Database (Denmark)

    Marhenke, Ilka; Fischer, Kerstin; Savarimuthu, Thiusius Rajeeth

    2014-01-01

    In this paper, the causes for singularity of a robot arm in teleoperation for robot learning from demonstration are analyzed. Singularity is the alignment of robot joints, which prevents the configuration of the inverse kinematics. Inspired by users' own hypotheses, we investigated speed and dela...

  4. Going Green Robots

    Science.gov (United States)

    Nelson, Jacqueline M.

    2011-01-01

    In looking at the interesting shapes and sizes of old computer parts, creating robots quickly came to the author's mind. In this article, she describes how computer parts can be used creatively. Students will surely enjoy creating their very own robots while learning about the importance of recycling in the society. (Contains 1 online resource.)

  5. Measuring and Modelling Delays in Robot Manipulators for Temporally Precise Control using Machine Learning

    DEFF Research Database (Denmark)

    Andersen, Thomas Timm; Amor, Heni Ben; Andersen, Nils Axel

    2015-01-01

    and separate. In this paper, we present a data-driven methodology for separating and modelling inherent delays during robot control. We show how both actuation and response delays can be modelled using modern machine learning methods. The resulting models can be used to predict the delays as well...

  6. Literature review of tufted reinforcement for composite structures

    Science.gov (United States)

    Gnaba, I.; Legrand, X.; Wang, P.; Soulat, D.

    2017-10-01

    In order to minimize the damage caused by the 2D structures, several research have been done on more complex structures (3D-preforms) which have more interesting mechanical characteristics. Divers textile technologies are used to manufacture 3D preforms such as weaving, knitting, stitching, z-pinning, tufting… This kind of reinforcement aims to achieve a balance between the in-plane and out-of-plane properties. Recently, the tufting technology shows more opportunities to develop 3D reinforcements especially with the advances in robotics. The present paper focuses not only on the various technologies of reinforcement through the thickness but also on the mechanical behaviour of a tufted preform in a stamping process.

  7. Novelty Detection for Interactive Pose Recognition by a Social Robot

    Directory of Open Access Journals (Sweden)

    Victor Gonzalez-Pacheco

    2015-04-01

    Full Text Available Active robot learners take an active role in their own learning by making queries to their human teachers when they receive new data. However, not every received input is useful for the robot, and asking for non-informative inputs or asking too many questions might worsen the user's perception of the robot. We present a novelty detection system that enables a robot to ask labels for new stimuli only when they seem both novel and interesting. Our system separates the decision process into two steps: first, it discriminates novel from known stimuli, and second, it estimates if these stimuli are likely to happen again. Our approach uses the notion of curiosity, which controls the eagerness with which the robot asks questions to the user. We evaluate our approach in the domain of pose learning by training our robot with a set of pointing poses able to detect up to 84%, 79%, and 78% of the observed novelties in three different experiments. Our approach enables robots to keep learning continuously, even after training is finished. The introduction of the curiosity parameter allows tuning, for the conditions in which the robot should want to learn more.

  8. Self-Imitation and Environmental Scaffolding for Robot Teaching

    Directory of Open Access Journals (Sweden)

    Joe Saunders

    2007-03-01

    Full Text Available Imitative learning and learning by observation are social mechanisms that allow a robot to acquire knowledge from a human or another robot. However to be able to obtain skills in this way the robot faces many complex issues, one of which is that of finding solutions to the correspondence problem. Evolutionary predecessors to observational imitation may have been self-imitation where an agent avoids the complexities of the correspondence problem by learning and replicating actions it has experienced through the manipulation of its body. We investigate how a robotic control and teaching system using self-imitation can be constructed with reference to psychological models of motor control and ideas from social scaffolding seen in animals. Within these scaffolded environments sets of competencies can be built by constructing hierarchical state/action memory maps of the robot's interaction within that environment. The scaffolding process provides a mechanism to enable learning to be scaled up. The resulting system allows a human trainer to teach a robot new skills and modify skills that the robot may possess. Additionally the system allows the robot to notify the trainer when it is being taught skills it already has in its repertoire and to direct and focus its attention and sensor resources to relevant parts of the skill being executed. We argue that these mechanisms may be a first step towards the transformation from self-imitation to observational imitation. The system is validated on a physical pioneer robot that is taught using self-imitation to track, follow and point to a patterned object.

  9. Self-imitation and Environmental Scaffolding for Robot Teaching

    Directory of Open Access Journals (Sweden)

    Chrystopher L. Nehaniv

    2008-11-01

    Full Text Available Imitative learning and learning by observation are social mechanisms that allow a robot to acquire knowledge from a human or another robot. However to be able to obtain skills in this way the robot faces many complex issues, one of which is that of finding solutions to the correspondence problem. Evolutionary predecessors to observational imitation may have been self-imitation where an agent avoids the complexities of the correspondence problem by learning and replicating actions it has experienced through the manipulation of its body. We investigate how a robotic control and teaching system using self-imitation can be constructed with reference to psychological models of motor control and ideas from social scaffolding seen in animals. Within these scaffolded environments sets of competencies can be built by constructing hierarchical state/action memory maps of the robot's interaction within that environment. The scaffolding process provides a mechanism to enable learning to be scaled up. The resulting system allows a human trainer to teach a robot new skills and modify skills that the robot may possess. Additionally the system allows the robot to notify the trainer when it is being taught skills it already has in its repertoire and to direct and focus its attention and sensor resources to relevant parts of the skill being executed. We argue that these mechanisms may be a first step towards the transformation from self-imitation to observational imitation. The system is validated on a physical pioneer robot that is taught using self-imitation to track, follow and point to a patterned object.

  10. Design of Smart Educational Robot as a Tool For Teaching Media Based on Contextual Teaching and Learning to Improve the Skill of Electrical Engineering Student

    Science.gov (United States)

    Zuhrie, M. S.; Basuki, I.; Asto, B. I. G. P.; Anifah, L.

    2018-04-01

    The development of robotics in Indonesia has been very encouraging. The barometer is the success of the Indonesian Robot Contest. The focus of research is a teaching module manufacturing, planning mechanical design, control system through microprocessor technology and maneuverability of the robot. Contextual Teaching and Learning (CTL) strategy is the concept of learning where the teacher brings the real world into the classroom and encourage students to make connections between knowledge possessed by its application in everyday life. This research the development model used is the 4-D model. This Model consists of four stages: Define Stage, Design Stage, Develop Stage, and Disseminate Stage. This research was conducted by applying the research design development with the aim to produce a tool of learning in the form of smart educational robot modules and kit based on Contextual Teaching and Learning at the Department of Electrical Engineering to improve the skills of the Electrical Engineering student. Socialization questionnaires showed that levels of the student majoring in electrical engineering competencies image currently only limited to conventional machines. The average assessment is 3.34 validator included in either category. Modules developed can give hope to the future are able to produce Intelligent Robot Tool for Teaching.

  11. An Adaptive Robot Game

    DEFF Research Database (Denmark)

    Hansen, Søren Tranberg; Svenstrup, Mikael; Dalgaard, Lars

    2010-01-01

    The goal of this paper is to describe an adaptive robot game, which motivates elderly people to do a regular amount of physical exercise while playing. One of the advantages of robot based games is that the initiative to play can be taken autonomously by the robot. In this case, the goal is to im......The goal of this paper is to describe an adaptive robot game, which motivates elderly people to do a regular amount of physical exercise while playing. One of the advantages of robot based games is that the initiative to play can be taken autonomously by the robot. In this case, the goal...... is to improve the mental and physical state of the user by playing a physical game with the robot. Ideally, a robot game should be simple to learn but difficult to master, providing an appropriate degree of challenge for players with different skills. In order to achieve that, the robot should be able to adapt...

  12. Adaptive and Energy Efficient Walking in a Hexapod Robot under Neuromechanical Control and Sensorimotor Learning

    DEFF Research Database (Denmark)

    Xiong, Xiaofeng; Wörgötter, Florentin; Manoonpong, Poramate

    2016-01-01

    The control of multilegged animal walking is a neuromechanical process, and to achieve this in an adaptive and energy efficient way is a difficult and challenging problem. This is due to the fact that this process needs in real time: 1) to coordinate very many degrees of freedom of jointed legs; 2......) to generate the proper leg stiffness (i.e., compliance); and 3) to determine joint angles that give rise to particular positions at the endpoints of the legs. To tackle this problem for a robotic application, here we present a neuromechanical controller coupled with sensorimotor learning. The controller...... energy efficient walking, compared to other small legged robots. In addition, this paper also shows that the tight combination of neural control with tunable muscle-like functions, guided by sensory feedback and coupled with sensorimotor learning, is a way forward to better understand and solve adaptive...

  13. Tunnel Ventilation Control Using Reinforcement Learning Methodology

    Science.gov (United States)

    Chu, Baeksuk; Kim, Dongnam; Hong, Daehie; Park, Jooyoung; Chung, Jin Taek; Kim, Tae-Hyung

    The main purpose of tunnel ventilation system is to maintain CO pollutant concentration and VI (visibility index) under an adequate level to provide drivers with comfortable and safe driving environment. Moreover, it is necessary to minimize power consumption used to operate ventilation system. To achieve the objectives, the control algorithm used in this research is reinforcement learning (RL) method. RL is a goal-directed learning of a mapping from situations to actions without relying on exemplary supervision or complete models of the environment. The goal of RL is to maximize a reward which is an evaluative feedback from the environment. In the process of constructing the reward of the tunnel ventilation system, two objectives listed above are included, that is, maintaining an adequate level of pollutants and minimizing power consumption. RL algorithm based on actor-critic architecture and gradient-following algorithm is adopted to the tunnel ventilation system. The simulations results performed with real data collected from existing tunnel ventilation system and real experimental verification are provided in this paper. It is confirmed that with the suggested controller, the pollutant level inside the tunnel was well maintained under allowable limit and the performance of energy consumption was improved compared to conventional control scheme.

  14. Rats bred for helplessness exhibit positive reinforcement learning deficits which are not alleviated by an antidepressant dose of the MAO-B inhibitor deprenyl.

    Science.gov (United States)

    Schulz, Daniela; Henn, Fritz A; Petri, David; Huston, Joseph P

    2016-08-04

    Principles of negative reinforcement learning may play a critical role in the etiology and treatment of depression. We examined the integrity of positive reinforcement learning in congenitally helpless (cH) rats, an animal model of depression, using a random ratio schedule and a devaluation-extinction procedure. Furthermore, we tested whether an antidepressant dose of the monoamine oxidase (MAO)-B inhibitor deprenyl would reverse any deficits in positive reinforcement learning. We found that cH rats (n=9) were impaired in the acquisition of even simple operant contingencies, such as a fixed interval (FI) 20 schedule. cH rats exhibited no apparent deficits in appetite or reward sensitivity. They reacted to the devaluation of food in a manner consistent with a dose-response relationship. Reinforcer motivation as assessed by lever pressing across sessions with progressively decreasing reward probabilities was highest in congenitally non-helpless (cNH, n=10) rats as long as the reward probabilities remained relatively high. cNH compared to wild-type (n=10) rats were also more resistant to extinction across sessions. Compared to saline (n=5), deprenyl (n=5) reduced the duration of immobility of cH rats in the forced swimming test, indicative of antidepressant effects, but did not restore any deficits in the acquisition of a FI 20 schedule. We conclude that positive reinforcement learning was impaired in rats bred for helplessness, possibly due to motivational impairments but not deficits in reward sensitivity, and that deprenyl exerted antidepressant effects but did not reverse the deficits in positive reinforcement learning. Copyright © 2016 IBRO. Published by Elsevier Ltd. All rights reserved.

  15. Teaching Human Poses Interactively to a Social Robot

    Science.gov (United States)

    Gonzalez-Pacheco, Victor; Malfaz, Maria; Fernandez, Fernando; Salichs, Miguel A.

    2013-01-01

    The main activity of social robots is to interact with people. In order to do that, the robot must be able to understand what the user is saying or doing. Typically, this capability consists of pre-programmed behaviors or is acquired through controlled learning processes, which are executed before the social interaction begins. This paper presents a software architecture that enables a robot to learn poses in a similar way as people do. That is, hearing its teacher's explanations and acquiring new knowledge in real time. The architecture leans on two main components: an RGB-D (Red-, Green-, Blue- Depth) -based visual system, which gathers the user examples, and an Automatic Speech Recognition (ASR) system, which processes the speech describing those examples. The robot is able to naturally learn the poses the teacher is showing to it by maintaining a natural interaction with the teacher. We evaluate our system with 24 users who teach the robot a predetermined set of poses. The experimental results show that, with a few training examples, the system reaches high accuracy and robustness. This method shows how to combine data from the visual and auditory systems for the acquisition of new knowledge in a natural manner. Such a natural way of training enables robots to learn from users, even if they are not experts in robotics. PMID:24048336

  16. Teaching Human Poses Interactively to a Social Robot

    Directory of Open Access Journals (Sweden)

    Miguel A. Salichs

    2013-09-01

    Full Text Available The main activity of social robots is to interact with people. In order to do that, the robot must be able to understand what the user is saying or doing. Typically, this capability consists of pre-programmed behaviors or is acquired through controlled learning processes, which are executed before the social interaction begins. This paper presents a software architecture that enables a robot to learn poses in a similar way as people do. That is, hearing its teacher’s explanations and acquiring new knowledge in real time. The architecture leans on two main components: an RGB-D (Red-, Green-, Blue- Depth -based visual system, which gathers the user examples, and an Automatic Speech Recognition (ASR system, which processes the speech describing those examples. The robot is able to naturally learn the poses the teacher is showing to it by maintaining a natural interaction with the teacher. We evaluate our system with 24 users who teach the robot a predetermined set of poses. The experimental results show that, with a few training examples, the system reaches high accuracy and robustness. This method shows how to combine data from the visual and auditory systems for the acquisition of new knowledge in a natural manner. Such a natural way of training enables robots to learn from users, even if they are not experts in robotics.

  17. Control algorithms for autonomous robot navigation

    International Nuclear Information System (INIS)

    Jorgensen, C.C.

    1985-01-01

    This paper examines control algorithm requirements for autonomous robot navigation outside laboratory environments. Three aspects of navigation are considered: navigation control in explored terrain, environment interactions with robot sensors, and navigation control in unanticipated situations. Major navigation methods are presented and relevance of traditional human learning theory is discussed. A new navigation technique linking graph theory and incidental learning is introduced

  18. A neural learning classifier system with self-adaptive constructivism for mobile robot control.

    Science.gov (United States)

    Hurst, Jacob; Bull, Larry

    2006-01-01

    For artificial entities to achieve true autonomy and display complex lifelike behavior, they will need to exploit appropriate adaptable learning algorithms. In this context adaptability implies flexibility guided by the environment at any given time and an open-ended ability to learn appropriate behaviors. This article examines the use of constructivism-inspired mechanisms within a neural learning classifier system architecture that exploits parameter self-adaptation as an approach to realize such behavior. The system uses a rule structure in which each rule is represented by an artificial neural network. It is shown that appropriate internal rule complexity emerges during learning at a rate controlled by the learner and that the structure indicates underlying features of the task. Results are presented in simulated mazes before moving to a mobile robot platform.

  19. Biologically-Inspired Adaptive Obstacle Negotiation Behavior of Hexapod Robots

    DEFF Research Database (Denmark)

    Goldschmidt, Dennis; Wörgötter, Florentin; Manoonpong, Poramate

    2014-01-01

    by these findings, we present an adaptive neural control mechanism for obstacle negotiation behavior in hexapod robots. It combines locomotion control, backbone joint control, local leg reflexes, and neural learning. While the first three components generate locomotion including walking and climbing, the neural...... learning mechanism allows the robot to adapt its behavior for obstacle negotiation with respect to changing conditions, e.g., variable obstacle heights and different walking gaits. By successfully learning the association of an early, predictive signal (conditioned stimulus, CS) and a late, reflex signal...... (unconditioned stimulus, UCS), both provided by ultrasonic sensors at the front of the robot, the robot can autonomously find an appropriate distance from an obstacle to initiate climbing. The adaptive neural control was developed and tested first on a physical robot simulation, and was then successfully...

  20. Reinforcement Learning Based Web Service Compositions for Mobile Business

    Science.gov (United States)

    Zhou, Juan; Chen, Shouming

    In this paper, we propose a new solution to Reactive Web Service Composition, via molding with Reinforcement Learning, and introducing modified (alterable) QoS variables into the model as elements in the Markov Decision Process tuple. Moreover, we give an example of Reactive-WSC-based mobile banking, to demonstrate the intrinsic capability of the solution in question of obtaining the optimized service composition, characterized by (alterable) target QoS variable sets with optimized values. Consequently, we come to the conclusion that the solution has decent potentials in boosting customer experiences and qualities of services in Web Services, and those in applications in the whole electronic commerce and business sector.