WorldWideScience

Sample records for multi-agent reinforcement learning

  1. Optimal control in microgrid using multi-agent reinforcement learning.

    Science.gov (United States)

    Li, Fu-Dong; Wu, Min; He, Yong; Chen, Xin

    2012-11-01

This paper presents an improved reinforcement learning method to minimize electricity costs while satisfying the power balance and generation limits of units in a grid-connected microgrid. First, the microgrid control requirements are analyzed and the objective function of optimal microgrid control is proposed. Then, a state variable, "Average Electricity Price Trend", which expresses the most probable transitions of the system, is developed to reduce the complexity and randomness of the microgrid, and a multi-agent architecture comprising agents, state variables, action variables and a reward function is formulated. Furthermore, dynamic hierarchical reinforcement learning, based on the change rate of a key state variable, is established to carry out optimal policy exploration. The analysis shows that the proposed method helps handle the "curse of dimensionality" and speeds up learning in an unknown large-scale world. Finally, simulation results under JADE (Java Agent Development Framework) demonstrate the validity of the presented method for optimal control of a grid-connected microgrid. Copyright © 2012 ISA. Published by Elsevier Ltd. All rights reserved.
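The value-function machinery behind such controllers can be illustrated with a minimal sketch. The code below computes the Bellman fixed point that Q-learning-style methods estimate by sampling; the three "price trend" states, two dispatch actions, cost numbers, and uniform transitions are invented for illustration, not taken from the paper.

```python
import numpy as np

# Toy model: 3 discretised "average electricity price trend" states and
# 2 dispatch actions; action 1 is the cheaper unit in every state (made-up costs).
n_states, gamma = 3, 0.95
cost = np.array([[1.0, 0.5]] * n_states)       # cost[s, a], to be minimised
P = np.full((n_states, 2, n_states), 1 / 3)    # uniform state transitions

Q = np.zeros((n_states, 2))
for _ in range(300):                           # synchronous sweeps to the fixed point
    V = Q.min(axis=1)                          # greedy (cost-minimising) state values
    Q = cost + gamma * (P @ V)                 # Bellman backup for each (s, a)

# Fixed point: V* = 0.5 / (1 - 0.95) = 10, so Q*[s, 1] = 10 and Q*[s, 0] = 10.5.
assert np.allclose(Q[:, 1], 10.0, atol=1e-3)
assert np.allclose(Q[:, 0], 10.5, atol=1e-3)
```

Sampled Q-learning replaces the exact expectation `P @ V` with observed transitions, which is what makes the "curse of dimensionality" and exploration speed the central concerns of the paper.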

  2. Multi-agent machine learning a reinforcement approach

    CERN Document Server

    Schwartz, H M

    2014-01-01

    The book begins with a chapter on traditional methods of supervised learning, covering recursive least squares learning, mean square error methods, and stochastic approximation. Chapter 2 covers single agent reinforcement learning. Topics include learning value functions, Markov games, and TD learning with eligibility traces. Chapter 3 discusses two player games including two player matrix games with both pure and mixed strategies. Numerous algorithms and examples are presented. Chapter 4 covers learning in multi-player games, stochastic games, and Markov games, focusing on learning multi-pla

  3. Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving

    OpenAIRE

    Shalev-Shwartz, Shai; Shammah, Shaked; Shashua, Amnon

    2016-01-01

Autonomous driving is a multi-agent setting where the host vehicle must apply sophisticated negotiation skills with other road users when overtaking, giving way, merging, taking left and right turns, and while pushing ahead in unstructured urban roadways. Since there are many possible scenarios, manually tackling all possible cases will likely yield an overly simplistic policy. Moreover, one must balance between unexpected behavior of other drivers/pedestrians and at the same time not to be too de...

  4. Adaptive Load Balancing of Parallel Applications with Multi-Agent Reinforcement Learning on Heterogeneous Systems

    Directory of Open Access Journals (Sweden)

    Johan Parent

    2004-01-01

We report on the improvements that can be achieved by applying machine learning techniques, in particular reinforcement learning, for the dynamic load balancing of parallel applications. The applications being considered in this paper are coarse grain data intensive applications. Such applications put high pressure on the interconnect of the hardware. Synchronization and load balancing in complex, heterogeneous networks need fast, flexible, adaptive load balancing algorithms. Viewing a parallel application as a one-state coordination game in the framework of multi-agent reinforcement learning, and by using a recently introduced multi-agent exploration technique, we are able to improve upon the classic job farming approach. The improvements are achieved with limited computation and communication overhead.
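As a rough illustration of the "one-state coordination game" view, two independent stateless learners can settle on the better joint action. The payoffs, learning rate, and optimistic initialization below are invented; optimistic values stand in for the paper's (unnamed here) multi-agent exploration technique.

```python
# Team game: both agents receive the same reward, and matching on action 0 pays best.
PAYOFF = [[10, 0], [0, 8]]

class Learner:
    """Stateless Q-learner with optimistic initial values (hypothetical parameters)."""
    def __init__(self, n=2, init=11.0, alpha=0.2):
        self.q = [init] * n            # optimism drives systematic exploration
        self.alpha = alpha
    def act(self):
        return max(range(len(self.q)), key=self.q.__getitem__)  # greedy action
    def learn(self, a, r):
        self.q[a] += self.alpha * (r - self.q[a])

a1, a2 = Learner(), Learner()
for _ in range(100):
    x, y = a1.act(), a2.act()
    r = PAYOFF[x][y]                   # shared team reward
    a1.learn(x, r)
    a2.learn(y, r)

assert (a1.act(), a2.act()) == (0, 0)  # both settle on the higher-reward equilibrium
```

Because both learners see the same rewards and start identically, the run is deterministic: each action's optimistic estimate decays toward its true team payoff, and the joint action locks onto the 10-reward equilibrium.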

  5. Fast Conflict Resolution Based on Reinforcement Learning in Multi-agent System

    Institute of Scientific and Technical Information of China (English)

PIAO Songhao; HONG Bingrong; CHU Haitao

    2004-01-01

In a multi-agent system where each agent has a different goal (even when the team of agents shares an overall goal), agents must be able to resolve conflicts arising in the process of achieving their goals. Many researchers have presented methods for conflict resolution, e.g., reinforcement learning (RL), but conventional RL requires a large computation cost because every agent must learn; at the same time, the overlap of actions selected by each agent results in local conflicts. Therefore, in this paper we propose a novel method to solve these problems. In order to deal with conflict within the multi-agent system, the concept of a potential-field-function-based Action Selection Priority Level (ASPL) is brought forward. In this method, all environmental factors that may influence the priority are efficiently computed with the potential field function, so the priority to access a local resource can be decided rapidly. By avoiding the complex coordination mechanisms used in general multi-agent systems, conflicts in the multi-agent system are settled more efficiently. Our system consists of an RL-with-ASPL module and a generalized rules module. Using ASPL, the RL module chooses a proper cooperative behavior, and the generalized rules module can accelerate the learning process. By applying the proposed method to robot soccer, the learning process is accelerated. The results of simulation and real experiments indicate the effectiveness of the method.
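The priority idea can be sketched in a few lines. The attractive-potential form, gain, and positions below are hypothetical stand-ins for the paper's potential field function; the point is that priorities are computed locally and the access order falls out of a single sort, with no negotiation round.

```python
import math

def aspl_priority(agent_pos, resource_pos, k=1.0):
    """Toy attractive-potential priority: closer agents rank higher (assumed form)."""
    d = math.dist(agent_pos, resource_pos)
    return k / (1.0 + d)

# Hypothetical robot-soccer snapshot: three robots and the ball.
agents = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (1.0, 0.0)}
ball = (0.0, 0.0)

order = sorted(agents, key=lambda a: aspl_priority(agents[a], ball), reverse=True)
assert order == ["A", "C", "B"]   # A sits on the ball, so it wins access outright
```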

  6. Switching dynamics of multi-agent learning

    NARCIS (Netherlands)

    Vrancx, P.; Tuyls, K.P.; Westra, R.

    2008-01-01

    This paper presents the dynamics of multi-agent reinforcement learning in multiple state problems. We extend previous work that formally modelled the relation between reinforcement learning agents and replicator dynamics in stateless multi-agent games. More precisely, in this work we use a
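The replicator-dynamics connection can be made concrete in a few lines: each action's population share grows in proportion to its fitness advantage over the average. The dominant-action payoff matrix and step size below are invented for illustration.

```python
import numpy as np

A = np.array([[2.0, 2.0],
              [1.0, 1.0]])          # action 0 strictly dominates (toy payoffs)
x = np.array([0.5, 0.5])            # initial population shares over the two actions

for _ in range(2000):               # Euler steps of dx_i/dt = x_i * (f_i - x.f)
    f = A @ x                       # fitness of each action against the population
    x = x + 0.01 * x * (f - x @ f)  # grow shares with above-average fitness

assert x[0] > 0.99                  # the dominated action dies out
```

The update preserves the simplex (the increments sum to zero), and the share of the dominated action decays logistically, mirroring how the value estimates of a simple reinforcement learner drift toward the dominant action.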

  7. Concurrent Learning of Control in Multi agent Sequential Decision Tasks

    Science.gov (United States)

    2018-04-17

Concurrent Learning of Control in Multi-Agent Sequential Decision Tasks. The overall objective of this project was to develop multi-agent reinforcement learning (MARL) approaches for intelligent agents to autonomously learn distributed control policies in decentralized partially observable ... learning of policies in Dec-POMDPs, established performance bounds, and evaluated these algorithms both theoretically and empirically ...

  8. Strategic farsighted learning in competitive multi-agent games

    NARCIS (Netherlands)

't Hoen, P.J.; Bohté, S.M.; La Poutré, J.A.; Brewka, G.; Coradeschi, S.; Perini, A.

    2006-01-01

We describe a generalized Q-learning type algorithm for reinforcement learning in competitive multi-agent games. We make the observation that, in a competitive setting with adaptive agents, an agent's actions will (likely) result in changes in the opponents' policies. In addition to accounting for the

  9. Construction of multi-agent mobile robots control system in the problem of persecution with using a modified reinforcement learning method based on neural networks

    Science.gov (United States)

    Patkin, M. L.; Rogachev, G. N.

    2018-02-01

A method for constructing a multi-agent control system for mobile robots based on reinforcement learning with deep neural networks is considered. The control system is synthesized via reinforcement learning with a modified Actor-Critic method, in which the Actor module is divided into an Action Actor and a Communication Actor in order to simultaneously control the mobile robots and communicate with partners. Communication is carried out by sending partners, at each step, a vector of real numbers that is appended to the observation vector and affects behaviour. The functions of the Actors and the Critic are approximated by deep neural networks. The Critic's value function is trained using the TD-error method and the Actor's function using DDPG. The Communication Actor's neural network is trained through gradients received from partner agents. An environment featuring cooperative multi-agent interaction was developed, and a computer simulation of the method was carried out on the control problem of two robots pursuing two goals.
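The Critic's TD-error rule can be sketched in isolation. The sketch below uses a linear value function on a hypothetical two-state chain (the deep networks, DDPG actor update, and communication vectors of the paper are omitted):

```python
import numpy as np

def td_update(w, phi_s, r, phi_next, alpha=0.1, gamma=0.9):
    """One TD(0) step on a linear value function V(s) = w . phi(s)."""
    delta = r + gamma * (w @ phi_next) - (w @ phi_s)   # the TD error
    return w + alpha * delta * phi_s                   # move V(s) toward the target

# Hypothetical two-state chain: a -> b with reward 1, then b -> b with reward 0,
# so the true values are V(b) = 0 and V(a) = 1 + 0.9 * V(b) = 1.
phi_a, phi_b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
w = np.zeros(2)
for _ in range(500):
    w = td_update(w, phi_a, 1.0, phi_b)
    w = td_update(w, phi_b, 0.0, phi_b)

assert abs(w[0] - 1.0) < 1e-3 and abs(w[1]) < 1e-6
```

In an Actor-Critic scheme the same `delta` that corrects the Critic also scales the Actor's policy-gradient step; here only the Critic half is shown.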

  10. Reinforcement Learning Multi-Agent Modeling of Decision-Making Agents for the Study of Transboundary Surface Water Conflicts with Application to the Syr Darya River Basin

    Science.gov (United States)

    Riegels, N.; Siegfried, T.; Pereira Cardenal, S. J.; Jensen, R. A.; Bauer-Gottwein, P.

    2008-12-01

In most economics-driven approaches to optimizing water use at the river basin scale, the system is modelled deterministically with the goal of maximizing overall benefits. However, actual operation and allocation decisions must be made under hydrologic and economic uncertainty. In addition, river basins often cross political boundaries, and different states may not be motivated to cooperate so as to maximize basin-scale benefits. Even within states, competing agents such as irrigation districts, municipal water agencies, and large industrial users may not have incentives to cooperate to realize efficiency gains identified in basin-level studies. More traditional simulation-optimization approaches assume pre-commitment by individual agents and stakeholders and unconditional compliance on each side. While this can help determine attainable gains and tradeoffs from efficient management, such hardwired policies do not account for dynamic feedback between agents themselves or between agents and their environments (e.g. due to climate change etc.). In reality, however, we are dealing with an out-of-equilibrium multi-agent system, where there is neither global knowledge nor global control, but rather continuous strategic interaction between decision-making agents. Based on the theory of stochastic games, we present a computational framework that allows for studying the dynamic feedback between decision-making agents themselves and an inherently uncertain environment in a spatially and temporally distributed manner. Agents with decision-making control over water allocation, such as countries, irrigation districts, and municipalities, are represented by reinforcement learning agents and coupled to a detailed hydrologic-economic model. This approach emphasizes learning by agents from their continuous interaction with other agents and the environment. It provides a convenient framework for the solution of the problem of dynamic decision-making in a mixed cooperative / non

  11. Learning in engineered multi-agent systems

    Science.gov (United States)

    Menon, Anup

Consider the problem of maximizing the total power produced by a wind farm. Due to aerodynamic interactions between wind turbines, each turbine maximizing its individual power, as is the case in present-day wind farms, does not lead to optimal farm-level power capture. Further, there are no good models capturing these aerodynamic interactions, rendering model-based optimization techniques ineffective. Thus, model-free distributed algorithms are needed that help turbines adapt their power production on-line so as to maximize farm-level power capture. Motivated by such problems, the main focus of this dissertation is a distributed model-free optimization problem in the context of multi-agent systems. The set-up comprises a fixed number of agents, each of which can pick an action and observe the value of its individual utility function. An individual's utility function may depend on the collective action taken by all agents. The exact functional form (or model) of the agent utility functions, however, is unknown; an agent can only measure the numeric value of its utility. The objective of the multi-agent system is to optimize the welfare function (i.e. the sum of the individual utility functions). Such a collaborative task requires communication between agents, and we allow for the possibility of such inter-agent communication. We also pay attention to the role played by the pattern of such information exchange on certain aspects of performance. We develop two algorithms to solve this problem. The first one, the engineered Interactive Trial and Error Learning (eITEL) algorithm, is based on a line of work in the Learning in Games literature and applies when agent actions are drawn from finite sets. While remaining in a model-free setting, we introduce a novel qualitative graph-theoretic framework to encode known directed interactions of the form "which agents' actions affect which other agents' payoffs" (the interaction graph).
We encode explicit inter-agent communications in a directed

  12. Multi-Agent Framework for Virtual Learning Spaces.

    Science.gov (United States)

    Sheremetov, Leonid; Nunez, Gustavo

    1999-01-01

    Discussion of computer-supported collaborative learning, distributed artificial intelligence, and intelligent tutoring systems focuses on the concept of agents, and describes a virtual learning environment that has a multi-agent system. Describes a model of interactions in collaborative learning and discusses agents for Web-based virtual…

  13. Agendas for Multi-Agent Learning

    National Research Council Canada - National Science Library

    Gordon, Geoffrey J

    2006-01-01

    .... We then consider research goals for modelling, design, and learning, and identify the problem of finding learning algorithms that guarantee convergence to Pareto-dominant equilibria against a wide range of opponents...

  14. Iterative learning control for multi-agent systems coordination

    CERN Document Server

    Yang, Shiping; Li, Xuefang; Shen, Dong

    2016-01-01

    A timely guide using iterative learning control (ILC) as a solution for multi-agent systems (MAS) challenges, this book showcases recent advances and industrially relevant applications. Readers are first given a comprehensive overview of the intersection between ILC and MAS, then introduced to a range of topics that include both basic and advanced theoretical discussions, rigorous mathematics, engineering practice, and both linear and nonlinear systems. Through systematic discussion of network theory and intelligent control, the authors explore future research possibilities, develop new tools, and provide numerous applications such as power grids, communication and sensor networks, intelligent transportation systems, and formation control. Readers will gain a roadmap of the latest advances in the fields and can use their newfound knowledge to design their own algorithms.

  15. Behavior Self-Organization in Multi-Agent Learning

    National Research Council Canada - National Science Library

    Bay, John

    1999-01-01

    There are four primary results of the first year of the project: It was discovered that clustering algorithms for pre-sorting high-dimensional datasets was not effective in improving subsequent processing by reinforcement learning methods...

  16. Collective Machine Learning: Team Learning and Classification in Multi-Agent Systems

    Science.gov (United States)

    Gifford, Christopher M.

    2009-01-01

This dissertation focuses on the collaboration of multiple heterogeneous, intelligent agents (hardware or software) which collaborate to learn a task and are capable of sharing knowledge. The concept of collaborative learning in multi-agent and multi-robot systems is largely understudied, and represents an area where further research is needed to…

  17. Multi-agents and learning: Implications for Webusage mining

    Science.gov (United States)

    Lotfy, Hewayda M.S.; Khamis, Soheir M.S.; Aboghazalah, Maie M.

    2015-01-01

Characterization of user activities is an important issue in the design and maintenance of websites. Server weblog files contain abundant information about users' current interests. This information can be mined and analyzed so that administrators can guide users in their browsing activity, enabling them to obtain relevant information in a shorter span of time and improving user satisfaction. Web-based technology facilitates the creation of personally meaningful and socially useful knowledge through supportive interactions, communication and collaboration among educators, learners and information. This paper suggests a new methodology, based on learning techniques, for a Web-based multi-agent application that discovers hidden patterns in the links users visit. It presents a new approach that involves unsupervised learning, reinforcement learning, and cooperation between agents. It is utilized to discover patterns that represent users' profiles in a sample website, sorted into specific categories of materials using significance percentages. These profiles are used to recommend interesting links and categories to the user. The experimental results showed successful user pattern recognition and cooperative learning among agents to obtain user profiles. This indicates that combining different learning algorithms can improve user satisfaction, as measured by precision, recall, the progressive category weight and the F1-measure. PMID:26966569

  18. Learning from induced changes in opponent (re)actions in multi-agent games

    NARCIS (Netherlands)

    P.J. 't Hoen (Pieter Jan); S.M. Bohte (Sander); J.A. La Poutré (Han)

    2005-01-01

Multi-agent learning is a growing area of research. An important topic is to formulate how an agent can learn a good policy in the face of adaptive, competitive opponents. Most research has focused on extensions of single agent learning techniques originally designed for agents in more

  19. Personalised learning object based on multi-agent model and learners’ learning styles

    Directory of Open Access Journals (Sweden)

    Noppamas Pukkhem

    2011-09-01

A multi-agent model is proposed in which learning styles and a word analysis technique are used to create a learning object recommendation system. On the basis of a learning style-based design, a concept map combination model is proposed to filter out unsuitable learning concepts from a given course. Our learner model classifies learners into eight styles and implements compatible computational methods consisting of three recommendations: (i) non-personalised, (ii) preferred feature-based, and (iii) neighbour-based collaborative filtering. The analysis of preference error (PE) was performed by comparing the actual preferred learning object with the predicted one. In our experiments, the feature-based recommendation algorithm produced the fewest PEs.

  20. Decentralized Reinforcement Learning of robot behaviors

    NARCIS (Netherlands)

    Leottau, David L.; Ruiz-del-Solar, Javier; Babuska, R.

    2018-01-01

    A multi-agent methodology is proposed for Decentralized Reinforcement Learning (DRL) of individual behaviors in problems where multi-dimensional action spaces are involved. When using this methodology, sub-tasks are learned in parallel by individual agents working toward a common goal. In

  1. Multi-agent models of spatial cognition, learning and complex choice behavior in urban environments

    NARCIS (Netherlands)

    Arentze, Theo; Timmermans, Harry; Portugali, J.

    2006-01-01

    This chapter provides an overview of ongoing research projects in the DDSS research program at TUE related to multi-agents. Projects include (a) the use of multi-agent models and concepts of artificial intelligence to develop models of activity-travel behavior; (b) the use of a multi-agent model to

  2. Reinforcement learning in supply chains.

    Science.gov (United States)

    Valluri, Annapurna; North, Michael J; Macal, Charles M

    2009-10-01

Effective management of supply chains creates value and can strategically position companies. In practice, human beings have been found to be both surprisingly successful and disappointingly inept at managing supply chains. The related fields of cognitive psychology and artificial intelligence have postulated a variety of potential mechanisms to explain this behavior. One of the leading candidates is reinforcement learning. This paper applies agent-based modeling to investigate the comparative behavioral consequences of three simple reinforcement learning algorithms in a multi-stage supply chain. For the first time, our findings show that the specific algorithm that is employed can have dramatic effects on the results obtained. Reinforcement learning is found to be valuable in multi-stage supply chains with several learning agents, as independent agents can learn to coordinate their behavior. However, learning in multi-stage supply chains using these postulated approaches from cognitive psychology and artificial intelligence takes extremely long time periods to achieve stability, which raises questions about their ability to explain behavior in real supply chains. The fact that it takes thousands of periods for agents to learn in this simple multi-agent setting provides new evidence that real world decision makers are unlikely to be using strict reinforcement learning in practice.
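One classic "simple reinforcement learning algorithm" from this cognitive-psychology literature is Roth-Erev-style propensity learning. The sketch below (with invented payoffs and parameters, not the paper's supply-chain model) shows the mechanism: choice is proportional to accumulated propensities, so behaviour locks onto the better action only gradually.

```python
import random

class RothErev:
    """Roth-Erev-style learner: action choice is propensity-proportional (toy version)."""
    def __init__(self, n_actions, phi=0.05):
        self.q = [1.0] * n_actions            # initial propensities
        self.phi = phi                        # forgetting rate
    def choose(self, rng):
        return rng.choices(range(len(self.q)), weights=self.q)[0]
    def learn(self, a, payoff):
        self.q = [(1 - self.phi) * p for p in self.q]   # decay all propensities
        self.q[a] += payoff                              # reinforce the chosen action

rng = random.Random(0)
agent = RothErev(2)
payoffs = [0.1, 1.0]                          # hypothetical: action 1 is more profitable
for _ in range(1000):
    a = agent.choose(rng)
    agent.learn(a, payoffs[a])

assert agent.q[1] > agent.q[0]                # propensity mass drifts to the better action
```

The slow, stochastic lock-in visible here is the same property the paper flags: thousands of periods may pass before such learners stabilize.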

  3. Multi-Agent Inference in Social Networks: A Finite Population Learning Approach.

    Science.gov (United States)

    Fan, Jianqing; Tong, Xin; Zeng, Yao

When people in a society want to make inference about some parameter, each person may want to use data collected by other people. Information (data) exchange in social networks is usually costly, so to make reliable statistical decisions, people need to trade off the benefits and costs of information acquisition. Conflicts of interests and coordination problems will arise in the process. Classical statistics does not consider people's incentives and interactions in the data collection process. To address this imperfection, this work explores multi-agent Bayesian inference problems with a game theoretic social network model. Motivated by our interest in aggregate inference at the societal level, we propose a new concept, finite population learning, to address whether with high probability, a large fraction of people in a given finite population network can make "good" inference. Serving as a foundation, this concept enables us to study the long run trend of aggregate inference quality as population grows.

  4. OWL model of multi-agent Smart-system of distance learning for people with vision disabilities

    Directory of Open Access Journals (Sweden)

    Galina A. Samigulina

    2017-01-01

The aim of the study is to develop an ontological model of a multi-agent smart-system of distance learning for visually impaired people, based on the Java Agent Development Framework, for obtaining high-quality engineering education in shared laboratories on modern equipment. Materials and methods: In developing the multi-agent smart-system of distance learning, using various agents based on cognitive, ontological, statistical and intellectual methods is important. It is most convenient to implement this task as software using a multi-agent approach and the Java Agent Development Framework. The main advantages of the platform are stability of operation, a clear interface, simplicity of creating agents and an extensive user base. In multi-agent systems, the solution is obtained automatically as a result of the interaction of many independent, purposeful agents. Each agent can perform certain tasks and pursue specified goals. Intellectual multi-agent systems, and practical applications in distance learning based on them, are considered. Results: The structural diagram of the functioning of the smart-system of distance learning for visually impaired people using various agents, based on the system approach and the multi-agent platform Java Agent Development Framework, is developed. A comprehensive approach to distance learning for visually impaired people, for obtaining high-quality engineering education in shared laboratories on modern equipment, is offered. The ontological model of the multi-agent smart-system is created, with a detailed description of the functions of the following agents: personal, manager, ontological, cognitive, statistical, intellectual, shared-laboratory agent, health agent, assistant agent and state agent. These agents execute their individual functions and provide a quality environment for learning. Conclusion: Thus, the proposed smart-system of distance learning for visually impaired people can significantly improve effectiveness and

  5. A Two-Stage Multi-Agent Based Assessment Approach to Enhance Students' Learning Motivation through Negotiated Skills Assessment

    Science.gov (United States)

    Chadli, Abdelhafid; Bendella, Fatima; Tranvouez, Erwan

    2015-01-01

    In this paper we present an Agent-based evaluation approach in a context of Multi-agent simulation learning systems. Our evaluation model is based on a two stage assessment approach: (1) a Distributed skill evaluation combining agents and fuzzy sets theory; and (2) a Negotiation based evaluation of students' performance during a training…

  6. Learning Natural Selection in 4th Grade with Multi-Agent-Based Computational Models

    Science.gov (United States)

    Dickes, Amanda Catherine; Sengupta, Pratim

    2013-01-01

    In this paper, we investigate how elementary school students develop multi-level explanations of population dynamics in a simple predator-prey ecosystem, through scaffolded interactions with a multi-agent-based computational model (MABM). The term "agent" in an MABM indicates individual computational objects or actors (e.g., cars), and these…

  7. Algorithms for Reinforcement Learning

    CERN Document Server

    Szepesvari, Csaba

    2010-01-01

    Reinforcement learning is a learning paradigm concerned with learning to control a system so as to maximize a numerical performance measure that expresses a long-term objective. What distinguishes reinforcement learning from supervised learning is that only partial feedback is given to the learner about the learner's predictions. Further, the predictions may have long term effects through influencing the future state of the controlled system. Thus, time plays a special role. The goal in reinforcement learning is to develop efficient learning algorithms, as well as to understand the algorithms'

  8. Game-theoretic learning and distributed optimization in memoryless multi-agent systems

    CERN Document Server

    Tatarenko, Tatiana

    2017-01-01

This book presents new efficient methods for optimization in realistic large-scale, multi-agent systems. These methods do not require the agents to have full information about the system, but instead allow them to make their local decisions based only on local information, possibly obtained during communication with their local neighbors. The book, primarily aimed at researchers in optimization and control, considers three different information settings in multi-agent systems: oracle-based, communication-based, and payoff-based. For each of these information types, an efficient optimization algorithm is developed, which leads the system to an optimal state. The optimization problems are set without such restrictive assumptions as convexity of the objective functions, complicated communication topologies, closed-form expressions for costs and utilities, and finiteness of the system's state space.

  9. Reinforcement Learning for a New Piano Mover

    Directory of Open Access Journals (Sweden)

    Yuko Ishiwaka

    2005-08-01

We attempt to achieve cooperative behavior among autonomous decentralized agents constructed via Q-learning, a type of reinforcement learning. Specifically, we examine the piano mover's problem. We propose a multi-agent architecture that has a training agent, learning agents and an intermediate agent. The learning agents are heterogeneous and can communicate with each other. The movement of the object handled by the three kinds of agents depends on the composition of the actions of the learning agents. Because the object's own shape is learned through the learning agents, the object is expected to avoid obstacles. We simulate the proposed method in a two-dimensional continuous world. The results of the present investigation reveal the effectiveness of the proposed method.

  10. Cooperative learning neural network output feedback control of uncertain nonlinear multi-agent systems under directed topologies

    Science.gov (United States)

    Wang, W.; Wang, D.; Peng, Z. H.

    2017-09-01

Without assuming that the communication topologies among the neural network (NN) weights are undirected or that the states of each agent are measurable, cooperative learning NN output feedback control is addressed for uncertain nonlinear multi-agent systems with identical structures in strict-feedback form. By establishing directed communication topologies among NN weights to share their learned knowledge, NNs with cooperative learning laws are employed to identify the uncertainties. By designing NN-based κ-filter observers to estimate the unmeasurable states, a new cooperative learning output feedback control scheme is proposed to guarantee that the system outputs can track nonidentical reference signals with bounded tracking errors. A simulation example is given to demonstrate the effectiveness of the theoretical results.

  11. An adaptive multi-agent memetic system for personalizing e-learning experiences

    NARCIS (Netherlands)

    Acampora, G.; Gaeta, M.; Munoz, E.; Vitiello, A.

    2011-01-01

    The rapid changes in modern knowledge, due to exponential growth of information sources, are complicating learners' activity. For this reason, novel approaches are necessary to obtain suitable learning solutions able to generate efficient, personalized and flexible learning experiences. From this

  12. Combining multi agent paradigm and memetic computing for personalized and adaptive learning experiences

    NARCIS (Netherlands)

    Acampora, G.; Gaeta, M.; Loia, V.

    2011-01-01

    Learning is a critical support mechanism for industrial and academic organizations to enhance the skills of employees and students and, consequently, the overall competitiveness in the new economy. The remarkable velocity and volatility of modern knowledge require novel learning methods offering

  13. Manifold Regularized Reinforcement Learning.

    Science.gov (United States)

    Li, Hongliang; Liu, Derong; Wang, Ding

    2018-04-01

    This paper introduces a novel manifold regularized reinforcement learning scheme for continuous Markov decision processes. Smooth feature representations for value function approximation can be automatically learned using the unsupervised manifold regularization method. The learned features are data-driven, and can be adapted to the geometry of the state space. Furthermore, the scheme provides a direct basis representation extension for novel samples during policy learning and control. The performance of the proposed scheme is evaluated on two benchmark control tasks, i.e., the inverted pendulum and the energy storage problem. Simulation results illustrate the concepts of the proposed scheme and show that it can obtain excellent performance.

  14. A novel multi-agent decentralized win or learn fast policy hill-climbing with eligibility trace algorithm for smart generation control of interconnected complex power grids

    International Nuclear Information System (INIS)

    Xi, Lei; Yu, Tao; Yang, Bo; Zhang, Xiaoshun

    2015-01-01

Highlights: • A decentralized smart generation control scheme is proposed for automatic generation control coordination. • A novel multi-agent learning algorithm is developed to resolve stochastic control problems in power systems. • A variable learning rate is introduced based on the framework of stochastic games. • A simulation platform is developed to test the performance of different algorithms. - Abstract: This paper proposes a multi-agent smart generation control scheme for automatic generation control coordination in interconnected complex power systems. A novel multi-agent decentralized win-or-learn-fast policy hill-climbing with eligibility trace algorithm is developed, which can effectively identify the optimal average policies via a variable learning rate under various operating conditions. Based on control performance standards, the proposed approach is implemented in a flexible multi-agent stochastic dynamic game-based smart generation control simulation platform. Based on the mixed strategy and average policy, it is highly adaptive in stochastic non-Markov environments and large time-delay systems, and can fulfill automatic generation control coordination in interconnected complex power systems in the presence of increasing penetration of decentralized renewable energy. Two case studies have been carried out, on both a two-area load-frequency control power system and the China Southern Power Grid model. Simulation results verify that the multi-agent smart generation control scheme based on the proposed approach can obtain optimal average policies, thus improving closed-loop system performance, and can achieve a fast convergence rate with significant robustness compared with other methods.
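The core of the win-or-learn-fast policy hill-climbing (WoLF-PHC) rule can be sketched without eligibility traces, on a single stateless task. The rewards, learning rates, and Q seeding below are invented for illustration and are not the paper's power-system formulation.

```python
import numpy as np

def wolf_phc_step(Q, pi, pi_avg, n, a, r,
                  alpha=0.1, d_win=0.01, d_lose=0.04):
    """One WoLF-PHC step: small policy step while winning, larger while losing."""
    Q[a] += alpha * (r - Q[a])                          # stateless Q update
    pi_avg += (pi - pi_avg) / n                         # running-average policy
    delta = d_win if pi @ Q >= pi_avg @ Q else d_lose   # "learn fast" when losing
    step = np.full_like(pi, -delta / (len(pi) - 1))
    step[int(np.argmax(Q))] = delta                     # hill-climb toward greedy action
    pi = np.clip(pi + step, 0.0, 1.0)
    return Q, pi / pi.sum(), pi_avg

rng = np.random.default_rng(0)
Q, pi, pi_avg = np.zeros(2), np.full(2, 0.5), np.full(2, 0.5)
reward = [0.0, 1.0]                                     # action 1 is better (toy values)
Q[1] = 0.1 * reward[1]                                  # seed Q as if each action was tried once
for n in range(1, 501):
    a = int(rng.choice(2, p=pi))
    Q, pi, pi_avg = wolf_phc_step(Q, pi, pi_avg, n, a, reward[a])

assert pi[1] > 0.95                                     # policy climbs to the better action
```

The variable learning rate is the "WoLF" ingredient: comparing the current policy's expected value against the average policy's decides whether the agent is winning, which throttles how aggressively the mixed strategy moves.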

  15. Cloud Computing and Multi Agent System to improve Learning Object Paradigm

    Directory of Open Access Journals (Sweden)

    Ana B. Gil

    2015-05-01

Full Text Available The Learning Object paradigm provides educators and learners with the ability to access an extensive number of learning resources. To do so, it relies on different technologies and tools, such as federated search platforms and storage repositories, to obtain information ubiquitously and on demand. However, the vast amount and variety of educational content, distributed among several repositories, together with the existence of various incompatible standards, technologies and interoperability layers among repositories, constitute a real obstacle to the expansion of this paradigm. This study presents an agent-based architecture that uses the advantages provided by Cloud Computing platforms to deal with the open issues in the Learning Object paradigm.

  16. Multi-agent system for Knowledge-based recommendation of Learning Objects

    Directory of Open Access Journals (Sweden)

    Paula Andrea RODRÍGUEZ MARÍN

    2015-12-01

Full Text Available A Learning Object (LO) is a content unit used within virtual learning environments which, once found and retrieved, may assist students in the teaching-learning process. LO search and retrieval have recently been supported and enhanced by data mining techniques. In this sense, clustering can be used to find groups of similar LOs so that, from the obtained groups, knowledge-based recommender systems (KRS) can recommend more adapted and relevant LOs. In particular, prior knowledge comes from LOs previously selected, liked and ranked by the student to whom the recommendation will be made. In this paper, we present a KRS for LOs which uses a conventional clustering technique, namely K-means, to find similar LOs and deliver resources adapted to a specific student. Promising results show that the proposed KRS is able both to retrieve relevant LOs and to improve recommendation precision.
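The cluster-then-recommend pipeline can be sketched with plain k-means. The feature vectors, cluster count and helper names below are hypothetical stand-ins for the paper's LO metadata encoding:

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: the clustering step the KRS is built on."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids, labels

def recommend(lo_features, liked_ids, k=2, top=3):
    """Recommend unseen LOs from the cluster closest to the student's
    profile (the mean of the LOs the student previously liked)."""
    ids = list(lo_features)
    points = [lo_features[i] for i in ids]
    centroids, labels = kmeans(points, k)
    profile = [sum(c) / len(liked_ids)
               for c in zip(*(lo_features[i] for i in liked_ids))]
    target = min(range(k), key=lambda j: dist(profile, centroids[j]))
    pool = [i for i, l in zip(ids, labels) if l == target and i not in liked_ids]
    return sorted(pool, key=lambda i: dist(lo_features[i], profile))[:top]
```

With two well-separated groups of LOs, a student who liked an item in one group is recommended the remaining items of that group, ranked by proximity to the student's profile.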

  17. An Online Q-learning Based Multi-Agent LFC for a Multi-Area Multi-Source Power System Including Distributed Energy Resources

    Directory of Open Access Journals (Sweden)

    H. Shayeghi

    2017-12-01

Full Text Available This paper presents an online two-stage Q-learning based multi-agent (MA) controller for load frequency control (LFC) in an interconnected multi-area multi-source power system integrated with distributed energy resources (DERs). The proposed control strategy consists of two stages. The first stage employs a PID controller whose parameters are designed using the sine cosine optimization (SCO) algorithm and then held fixed. The second is a reinforcement learning (RL) based supplementary controller that has a flexible structure and adaptively improves the output of the first stage based on the system's dynamical behavior. Because this strategy integrates the RL paradigm with a PID controller, it is called the RL-PID controller. The primary motivation for integrating the RL technique with the PID controller is to remain compatible with the local controllers already in industrial use, reducing control effort and system costs. This novel control strategy combines the advantages of the PID controller with the adaptive behavior of MA to achieve the desired level of robust performance under different kinds of uncertainty caused by the stochastic power generation of DERs, changes in plant operating conditions, and physical nonlinearities of the system. The suggested decentralized controller is composed of autonomous intelligent agents that learn the optimal control policy from interaction with the system. These agents continuously update their knowledge of the system dynamics to achieve good frequency oscillation damping under various severe disturbances without any prior knowledge of them. This leads to an adaptive control structure that solves the LFC problem in a multi-source power system with stochastic DERs. The performance of the RL-PID controller is verified against traditional PID and fuzzy-PID controllers in a multi-area power system integrated with DERs using several performance indices.
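The two-stage idea (a fixed PID controller plus a Q-learning supplementary correction) can be sketched on a toy single-area plant. The gains, plant model and discretization below are illustrative, not the SCO-tuned design from the paper:

```python
import random

def discretize(x, step=0.05, lim=0.5):
    """Map a continuous frequency deviation to a small state index."""
    x = max(-lim, min(lim, x))
    return int(round((x + lim) / step))

def rl_pid_run(steps=3000, seed=1):
    """Toy single-area LFC loop: a fixed first-stage PID controller plus a
    Q-learning supplementary action on top of it. Returns the final absolute
    frequency deviation after a step load disturbance."""
    rng = random.Random(seed)
    kp, ki, kd = 2.0, 0.5, 0.1          # fixed first-stage PID gains (made up)
    actions = [-0.05, 0.0, 0.05]        # supplementary control increments
    Q = {}
    f, integ, prev_e = 0.0, 0.0, 0.0    # frequency deviation and PID memory
    alpha, gamma, eps = 0.2, 0.9, 0.1
    for t in range(steps):
        e = -f                           # regulate the deviation to zero
        integ += e * 0.01
        pid_u = kp * e + ki * integ + kd * (e - prev_e) / 0.01
        prev_e = e
        s = discretize(f)
        qs = Q.setdefault(s, [0.0] * len(actions))
        a = rng.randrange(len(actions)) if rng.random() < eps else qs.index(max(qs))
        u = pid_u + actions[a]           # second stage corrects the PID output
        load = 0.1 if t > steps // 2 else 0.0   # step load disturbance
        f += 0.01 * (-2.0 * f + u - load)       # crude first-order plant
        r = -abs(f)                      # reward: small deviation is good
        q2 = Q.setdefault(discretize(f), [0.0] * len(actions))
        qs[a] += alpha * (r + gamma * max(q2) - qs[a])
    return abs(f)
```

Even in this crude setting the structure is visible: the PID stage does the bulk of the regulation, while the learning stage only adds small corrections, which is what keeps the scheme compatible with existing industrial controllers.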

  18. Ontological Modeling of Meta Learning Multi-Agent Systems in OWL-DL

    Czech Academy of Sciences Publication Activity Database

    Kazík, O.; Neruda, Roman

    2012-01-01

    Roč. 39, č. 4 (2012), s. 357-362 ISSN 1819-9224 R&D Projects: GA MŠk(CZ) ME10023 Grant - others:GA UK(CZ) 629612; UK(CZ) SVV-265314 Institutional support: RVO:67985807 Keywords : data mining * meta learning * roles * description logic * ontology Subject RIV: IN - Informatics, Computer Science http://www.iaeng.org/IJCS/issues_v39/issue_4/IJCS_39_4_04.pdf

  19. Empirical Centroid Fictitious Play: An Approach For Distributed Learning In Multi-Agent Games

    OpenAIRE

    Swenson, Brian; Kar, Soummya; Xavier, Joao

    2013-01-01

    The paper is concerned with distributed learning in large-scale games. The well-known fictitious play (FP) algorithm is addressed, which, despite theoretical convergence results, might be impractical to implement in large-scale settings due to intense computation and communication requirements. An adaptation of the FP algorithm, designated as the empirical centroid fictitious play (ECFP), is presented. In ECFP players respond to the centroid of all players' actions rather than track and respo...

  20. IMPLEMENTATION OF MULTIAGENT REINFORCEMENT LEARNING MECHANISM FOR OPTIMAL ISLANDING OPERATION OF DISTRIBUTION NETWORK

    DEFF Research Database (Denmark)

    Saleem, Arshad; Lind, Morten

    2008-01-01

    among electric power utilities to utilize modern information and communication technologies (ICT) in order to improve the automation of the distribution system. In this paper we present our work for the implementation of a dynamic multi-agent based distributed reinforcement learning mechanism...

  1. A Distributed Multi-Agent System for Collaborative Information Management and Learning

    Science.gov (United States)

    Chen, James R.; Wolfe, Shawn R.; Wragg, Stephen D.; Koga, Dennis (Technical Monitor)

    2000-01-01

In this paper, we present DIAMS, a system of distributed, collaborative agents to help users access, manage, share and exchange information. A DIAMS personal agent helps its owner find information most relevant to current needs. It provides tools and utilities for users to manage their information repositories with dynamic organization and virtual views. Flexible hierarchical display is integrated with indexed query search to support effective information access. Automatic indexing methods are employed to support user queries and communication between agents. Contents of a repository are kept in object-oriented storage to facilitate information sharing. Collaboration between users is aided by easy sharing utilities as well as automated information exchange. Matchmaker agents are designed to establish connections between users with similar interests and expertise. DIAMS agents provide needed services for users to share and learn information from one another on the World Wide Web.

  2. The Reinforcement Learning Competition 2014

    OpenAIRE

    Dimitrakakis, Christos; Li, Guangliang; Tziortziotis, Nikoalos

    2014-01-01

    Reinforcement learning is one of the most general problems in artificial intelligence. It has been used to model problems in automated experiment design, control, economics, game playing, scheduling and telecommunications. The aim of the reinforcement learning competition is to encourage the development of very general learning agents for arbitrary reinforcement learning problems and to provide a test-bed for the unbiased evaluation of algorithms.

  3. Proposed Methodology for Application of Human-like gradual Multi-Agent Q-Learning (HuMAQ) for Multi-robot Exploration

    International Nuclear Information System (INIS)

    Ray, Dip Narayan; Majumder, Somajyoti

    2014-01-01

Several attempts have been made by researchers around the world to develop autonomous exploration techniques for robots, but devising algorithms for unstructured and unknown environments has always been a challenging issue. Human-like gradual Multi-agent Q-learning (HuMAQ) is a technique developed for autonomous robotic exploration in unknown (and even unimaginable) environments. It has been successfully implemented in a multi-agent, single-robot system. HuMAQ uses the concept of Subsumption architecture, a well-known behaviour-based architecture, to prioritize the agents of the multi-agent system, and executes only the most common action out of all the actions recommended by the different agents. Instead of using a new state-action table (Q-table) each time, HuMAQ reuses the immediate past table for efficient and faster exploration. The proof of learning has been established both theoretically and practically. HuMAQ has the potential to be used in different and difficult situations and applications. The same architecture has been modified for multi-robot exploration of an environment. Apart from all the agents used in the single-robot system, agents for inter-robot communication and coordination/cooperation with other similar robots have been introduced in the present research. The current work uses a series of indigenously developed identical autonomous robotic systems, communicating with each other through the ZigBee protocol
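The subsumption-style arbitration described above (execute the most common recommended action, with agent priority breaking ties) can be sketched as follows; the agent names and actions are illustrative:

```python
from collections import Counter

def arbitrate(recommendations):
    """HuMAQ-style arbitration sketch: each agent (listed from highest to
    lowest subsumption priority) recommends an action; the most common
    recommendation wins, and ties are resolved in favour of the
    higher-priority agent."""
    counts = Counter(action for _, action in recommendations)
    top = max(counts.values())
    for _, action in recommendations:      # first entry = highest priority
        if counts[action] == top:
            return action

# Hypothetical agents, ordered by subsumption priority
prefs = [("obstacle_avoider", "turn_left"),
         ("goal_seeker", "forward"),
         ("wall_follower", "forward"),
         ("explorer", "turn_left")]
```

Here "turn_left" and "forward" tie at two votes each, so the highest-priority agent (the obstacle avoider) decides the outcome.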

  4. "Notice of Violation of IEEE Publication Principles" Multiobjective Reinforcement Learning: A Comprehensive Overview.

    Science.gov (United States)

    Liu, Chunming; Xu, Xin; Hu, Dewen

    2013-04-29

    Reinforcement learning is a powerful mechanism for enabling agents to learn in an unknown environment, and most reinforcement learning algorithms aim to maximize some numerical value, which represents only one long-term objective. However, multiple long-term objectives are exhibited in many real-world decision and control problems; therefore, recently, there has been growing interest in solving multiobjective reinforcement learning (MORL) problems with multiple conflicting objectives. The aim of this paper is to present a comprehensive overview of MORL. In this paper, the basic architecture, research topics, and naive solutions of MORL are introduced at first. Then, several representative MORL approaches and some important directions of recent research are reviewed. The relationships between MORL and other related research are also discussed, which include multiobjective optimization, hierarchical reinforcement learning, and multi-agent reinforcement learning. Finally, research challenges and open problems of MORL techniques are highlighted.

  5. Deep Reinforcement Learning: An Overview

    OpenAIRE

    Li, Yuxi

    2017-01-01

    We give an overview of recent exciting achievements of deep reinforcement learning (RL). We discuss six core elements, six important mechanisms, and twelve applications. We start with background of machine learning, deep learning and reinforcement learning. Next we discuss core RL elements, including value function, in particular, Deep Q-Network (DQN), policy, reward, model, planning, and exploration. After that, we discuss important mechanisms for RL, including attention and memory, unsuperv...

  6. An Improved Reinforcement Learning System Using Affective Factors

    Directory of Open Access Journals (Sweden)

    Takashi Kuremoto

    2013-07-01

Full Text Available As a powerful and intelligent machine learning method, reinforcement learning (RL) has been widely used in many fields such as game theory, adaptive control, multi-agent systems, nonlinear forecasting, and so on. The main contribution of this technique is its exploration and exploitation approach to finding the optimal or semi-optimal solution of goal-directed problems. However, when RL is applied to multi-agent systems (MASs), problems such as the “curse of dimensionality”, the “perceptual aliasing problem”, and the uncertainty of the environment constitute high hurdles. Meanwhile, although RL is inspired by behavioral psychology and uses reward/punishment from the environment, higher mental factors such as affects, emotions, and motivations are rarely adopted in its learning procedure. In this paper, to address the challenges of agent learning in MASs, we propose a computational motivation function, which adopts the two principal affective factors “Arousal” and “Pleasure” of Russell’s circumplex model of affect, to improve the learning performance of a conventional RL algorithm named Q-learning (QL). Compared with conventional QL, computer simulations of pursuit problems with static and dynamic preys were carried out, and the results showed that the proposed method gives agents a faster and more stable learning performance.

  7. Collaborative multi-agent reinforcement learning based on a novel coordination tree frame with dynamic partition

    NARCIS (Netherlands)

    Fang, M.; Groen, F.C.A.; Li, H.; Zhang, J.

    2014-01-01

    In the research of team Markov games, computing the coordinate team dynamically and determining the joint action policy are the main problems. To deal with the first problem, a dynamic team partitioning method is proposed based on a novel coordinate tree frame. We build a coordinate tree with

  8. Evolutionary computation for reinforcement learning

    NARCIS (Netherlands)

    Whiteson, S.; Wiering, M.; van Otterlo, M.

    2012-01-01

    Algorithms for evolutionary computation, which simulate the process of natural selection to solve optimization problems, are an effective tool for discovering high-performing reinforcement-learning policies. Because they can automatically find good representations, handle continuous action spaces,

  9. Adaptive representations for reinforcement learning

    NARCIS (Netherlands)

    Whiteson, S.

    2010-01-01

    This book presents new algorithms for reinforcement learning, a form of machine learning in which an autonomous agent seeks a control policy for a sequential decision task. Since current methods typically rely on manually designed solution representations, agents that automatically adapt their own

  10. Reinforcement learning in computer vision

    Science.gov (United States)

    Bernstein, A. V.; Burnaev, E. V.

    2018-04-01

Nowadays, machine learning has become one of the basic technologies used in solving various computer vision tasks such as feature detection, image segmentation, object recognition and tracking. In many applications, complex systems such as robots are equipped with visual sensors from which they learn the state of the surrounding environment by solving the corresponding computer vision tasks. The solutions of these tasks are used for making decisions about possible future actions. It is not surprising that, when solving computer vision tasks, we should take into account special aspects of their subsequent application in model-based predictive control. Reinforcement learning is a modern machine learning technology in which learning is carried out through interaction with the environment. In recent years, reinforcement learning has been used both for solving applied tasks such as the processing and analysis of visual information, and for solving specific computer vision problems such as filtering, extracting image features, localizing objects in scenes, and many others. The paper briefly describes the reinforcement learning technology and its use for solving computer vision problems.

  11. Rational and Mechanistic Perspectives on Reinforcement Learning

    Science.gov (United States)

    Chater, Nick

    2009-01-01

    This special issue describes important recent developments in applying reinforcement learning models to capture neural and cognitive function. But reinforcement learning, as a theoretical framework, can apply at two very different levels of description: "mechanistic" and "rational." Reinforcement learning is often viewed in mechanistic terms--as…

  12. A Multi-Agent Control Architecture for a Robotic Wheelchair

    Directory of Open Access Journals (Sweden)

    C. Galindo

    2006-01-01

Full Text Available Assistant robots like robotic wheelchairs can perform effective and valuable work in our daily lives. However, they may eventually need external help from humans in the robot environment (particularly the driver, in the case of a wheelchair) to accomplish safely and efficiently tasks that are tricky for current technology, e.g. opening a locked door or traversing a crowded area. This article proposes a control architecture for assistant robots, designed from a multi-agent perspective, that facilitates the participation of humans in the robotic system and improves the overall performance of the robot as well as its dependability. Within our design, agents have their own intentions and beliefs, have different abilities (which include algorithmic behaviours and human skills) and also learn autonomously the most convenient way to carry out their actions through reinforcement learning. The proposed architecture is illustrated with a real assistant robot: a robotic wheelchair that provides mobility to impaired or elderly people.

  13. Belief reward shaping in reinforcement learning

    CSIR Research Space (South Africa)

    Marom, O

    2018-02-01

    Full Text Available A key challenge in many reinforcement learning problems is delayed rewards, which can significantly slow down learning. Although reward shaping has previously been introduced to accelerate learning by bootstrapping an agent with additional...
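Classic potential-based shaping is one standard way to bootstrap an agent with an additional reward signal (the paper's belief-based variant differs). On a delayed-reward chain it looks like this; the potential function and all hyperparameters are illustrative:

```python
import random

def q_learning(n=10, episodes=200, shaped=False, seed=0):
    """Tabular Q-learning on an n-state chain with a single delayed goal
    reward. With shaped=True, the standard potential-based shaping term
    F(s, s') = gamma * Phi(s') - Phi(s) is added to the reward, which
    preserves the optimal policy while densifying the learning signal."""
    rng = random.Random(seed)
    gamma, alpha, eps = 0.95, 0.5, 0.2
    phi = lambda s: float(s)             # potential: progress toward the goal
    Q = [[0.0, 0.0] for _ in range(n + 1)]
    for _ in range(episodes):
        s = 0
        while s < n:
            a = rng.randrange(2) if rng.random() < eps else Q[s].index(max(Q[s]))
            s2 = min(n, s + 1) if a == 1 else max(0, s - 1)
            r = 1.0 if s2 == n else 0.0  # reward only at the goal state
            if shaped:
                r += gamma * phi(s2) - phi(s)
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = q_learning(shaped=True)
```

With shaping, every rightward step earns an immediate positive signal, so the preference for moving toward the goal propagates through the table far faster than waiting for the delayed terminal reward.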

  14. Reinforcement learning or active inference?

    Science.gov (United States)

    Friston, Karl J; Daunizeau, Jean; Kiebel, Stefan J

    2009-07-29

    This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and sampling of the environment to minimize their free-energy. Such agents learn causal structure in the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming; namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain.

  15. Reinforcement learning or active inference?

    Directory of Open Access Journals (Sweden)

    Karl J Friston

    2009-07-01

    Full Text Available This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and sampling of the environment to minimize their free-energy. Such agents learn causal structure in the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming; namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain.

  16. Reinforcement Learning State-of-the-Art

    CERN Document Server

    Wiering, Marco

    2012-01-01

    Reinforcement learning encompasses both a science of adaptive behavior of rational beings in uncertain environments and a computational methodology for finding optimal behaviors for challenging problems in control, optimization and adaptive behavior of intelligent agents. As a field, reinforcement learning has progressed tremendously in the past decade. The main goal of this book is to present an up-to-date series of survey articles on the main contemporary sub-fields of reinforcement learning. This includes surveys on partially observable environments, hierarchical task decompositions, relational knowledge representation and predictive state representations. Furthermore, topics such as transfer, evolutionary methods and continuous spaces in reinforcement learning are surveyed. In addition, several chapters review reinforcement learning methods in robotics, in games, and in computational neuroscience. In total seventeen different subfields are presented by mostly young experts in those areas, and together the...

  17. Reinforcement Learning in Repeated Portfolio Decisions

    OpenAIRE

    Diao, Linan; Rieskamp, Jörg

    2011-01-01

    How do people make investment decisions when they receive outcome feedback? We examined how well the standard mean-variance model and two reinforcement models predict people's portfolio decisions. The basic reinforcement model predicts a learning process that relies solely on the portfolio's overall return, whereas the proposed extended reinforcement model also takes the risk and covariance of the investments into account. The experimental results illustrate that people reacted sensitively to...

  18. Reinforcement learning improves behaviour from evaluative feedback

    Science.gov (United States)

    Littman, Michael L.

    2015-05-01

    Reinforcement learning is a branch of machine learning concerned with using experience gained through interacting with the world and evaluative feedback to improve a system's ability to make behavioural decisions. It has been called the artificial intelligence problem in a microcosm because learning algorithms must act autonomously to perform well and achieve their goals. Partly driven by the increasing availability of rich data, recent years have seen exciting advances in the theory and practice of reinforcement learning, including developments in fundamental technical areas such as generalization, planning, exploration and empirical methodology, leading to increasing applicability to real-life problems.

  19. Multi agent gathering waste system

    Directory of Open Access Journals (Sweden)

    Álvaro LOZANO MURCIEGO

    2016-07-01

Full Text Available In this paper, we present a new multi-agent-based system to gather waste in cities and villages. We have developed a low-cost wireless sensor prototype to measure the fill level of the containers. Furthermore, a routing system is developed to optimize the routes of the trucks, and a mobile application has been developed to help drivers in their working day. In order to evaluate and validate the proposed system, a practical case study in a real city environment is modeled using available open data, with the purpose of identifying the limitations of the system.

  20. Multi-agent sequential hypothesis testing

    KAUST Repository

    Kim, Kwang-Ki K.

    2014-12-15

    This paper considers multi-agent sequential hypothesis testing and presents a framework for strategic learning in sequential games with explicit consideration of both temporal and spatial coordination. The associated Bayes risk functions explicitly incorporate costs of taking private/public measurements, costs of time-difference and disagreement in actions of agents, and costs of false declaration/choices in the sequential hypothesis testing. The corresponding sequential decision processes have well-defined value functions with respect to (a) the belief states for the case of conditional independent private noisy measurements that are also assumed to be independent identically distributed over time, and (b) the information states for the case of correlated private noisy measurements. A sequential investment game of strategic coordination and delay is also discussed as an application of the proposed strategic learning rules.

  1. Learning to trade via direct reinforcement.

    Science.gov (United States)

    Moody, J; Saffell, M

    2001-01-01

    We present methods for optimizing portfolios, asset allocations, and trading systems based on direct reinforcement (DR). In this approach, investment decision-making is viewed as a stochastic control problem, and strategies are discovered directly. We present an adaptive algorithm called recurrent reinforcement learning (RRL) for discovering investment policies. The need to build forecasting models is eliminated, and better trading performance is obtained. The direct reinforcement approach differs from dynamic programming and reinforcement algorithms such as TD-learning and Q-learning, which attempt to estimate a value function for the control problem. We find that the RRL direct reinforcement framework enables a simpler problem representation, avoids Bellman's curse of dimensionality and offers compelling advantages in efficiency. We demonstrate how direct reinforcement can be used to optimize risk-adjusted investment returns (including the differential Sharpe ratio), while accounting for the effects of transaction costs. In extensive simulation work using real financial data, we find that our approach based on RRL produces better trading strategies than systems utilizing Q-learning (a value function method). Real-world applications include an intra-daily currency trader and a monthly asset allocation system for the S&P 500 Stock Index and T-Bills.
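The differential Sharpe ratio used as the online reward in the RRL framework can be sketched from its exponential-moving-average form; `eta` is an illustrative adaptation rate:

```python
def differential_sharpe(returns, eta=0.01):
    """Incremental (differential) Sharpe ratio: D_t measures how the most
    recent return R_t changes an exponentially weighted Sharpe ratio, using
    EMAs A and B of the returns and squared returns:
        D_t = (B * dA - 0.5 * A * dB) / (B - A^2)^(3/2)
    It serves as an online, risk-adjusted reward for the trader."""
    A, B = 0.0, 0.0          # EMAs of returns and squared returns
    out = []
    for r in returns:
        dA, dB = eta * (r - A), eta * (r * r - B)
        denom = (B - A * A) ** 1.5
        out.append((B * dA - 0.5 * A * dB) / denom if denom > 1e-12 else 0.0)
        A, B = A + dA, B + dB
    return out
```

Because D_t rewards returns that raise the mean without inflating the variance, a sharp loss after a run of steady gains produces a strongly negative reward, which is exactly the signal the recurrent learner adapts to.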

  2. Using a board game to reinforce learning.

    Science.gov (United States)

    Yoon, Bona; Rodriguez, Leslie; Faselis, Charles J; Liappis, Angelike P

    2014-03-01

    Experiential gaming strategies offer a variation on traditional learning. A board game was used to present synthesized content of fundamental catheter care concepts and reinforce evidence-based practices relevant to nursing. Board games are innovative educational tools that can enhance active learning. Copyright 2014, SLACK Incorporated.

  3. Embedded Incremental Feature Selection for Reinforcement Learning

    Science.gov (United States)

    2012-05-01

Prior to this work, feature selection for reinforcement learning has focused on linear value function approximation (Kolter and Ng, 2009; Parr et al...In Proceedings of the 23rd International Conference on Machine Learning, pages 449–456. Kolter, J. Z. and Ng, A. Y. (2009). Regularization and feature

  4. Online reinforcement learning control for aerospace systems

    NARCIS (Netherlands)

    Zhou, Y.

    2018-01-01

    Reinforcement Learning (RL) methods are relatively new in the field of aerospace guidance, navigation, and control. This dissertation aims to exploit RL methods to improve the autonomy and online learning of aerospace systems with respect to the a priori unknown system and environment, dynamical

  5. Reinforcement Learning in Continuous Action Spaces

    NARCIS (Netherlands)

    Hasselt, H. van; Wiering, M.A.

    2007-01-01

    Quite some research has been done on Reinforcement Learning in continuous environments, but the research on problems where the actions can also be chosen from a continuous space is much more limited. We present a new class of algorithms named Continuous Actor Critic Learning Automaton (CACLA)
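A minimal sketch of the CACLA idea, assuming a single state and a made-up quadratic reward: explore with Gaussian noise in the continuous action space, and move the actor toward an explored action only when the temporal-difference error is positive:

```python
import random

def cacla_1d(steps=5000, seed=0):
    """Single-state CACLA sketch. With one state and no successor, the TD
    error reduces to r - V. The actor output is a continuous action; the
    target optimum 0.6 and all constants are illustrative."""
    rng = random.Random(seed)
    actor, V = 0.0, 0.0
    alpha_a, alpha_v, sigma = 0.05, 0.1, 0.3
    for _ in range(steps):
        a = actor + rng.gauss(0.0, sigma)       # explore in action space
        r = -(a - 0.6) ** 2                     # unknown reward landscape
        delta = r - V                           # TD error (single state)
        V += alpha_v * delta
        if delta > 0:                           # CACLA: update only on improvement
            actor += alpha_a * (a - actor)
    return actor
```

The sign test on the TD error is the defining feature: the actor is pulled toward actions that turned out better than the critic expected, and ignores the rest, which avoids averaging over bad actions in continuous spaces.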

  6. Autonomous parsing of behavior in a multi-agent setting

    NARCIS (Netherlands)

    Vanderelst, D.; Barakova, E.I.; Rutkowski, L.; Tadeusiewicz, R.

    2008-01-01

    Imitation learning is a promising route to instruct robotic multi-agent systems. However, imitating agents should be able to decide autonomously what behavior, observed in others, is interesting to copy. Here we investigate whether a simple recurrent network (Elman Net) can be used to extract

  7. Multi-Agent Software Engineering

    International Nuclear Information System (INIS)

    Mohamed, A.H.

    2014-01-01

This paper proposes an alarm-monitoring system for people, based on multi-agent systems and maps. The system monitors the users' physical context using their mobile phones. The agents on the mobile phones are responsible for collecting, processing and sending data to the server; they can determine the parameters of their environment through sensors. On the server side, a set of agents stores this data and checks the preconditions of the restrictions associated with each user, in order to trigger the appropriate alarms. These alarms are sent not only to the user, who is alerted so as to avoid the violated restriction, but also to his supervisor. The proposed system is a general-purpose alarm system that can be used in different critical application areas. It has been applied to monitoring the workers of radiation sites, so that these workers can perform their tasks in radiation environments safely

  8. Product Distribution Theory for Control of Multi-Agent Systems

    Science.gov (United States)

    Lee, Chia Fan; Wolpert, David H.

    2004-01-01

Product Distribution (PD) theory is a new framework for controlling Multi-Agent Systems (MASs). First we review one motivation of PD theory, as the information-theoretic extension of conventional full-rationality game theory to the case of bounded rational agents. In this extension the equilibrium of the game is the optimizer of a Lagrangian of the (probability distribution of the) joint state of the agents. Accordingly we can consider a team game in which the shared utility is a performance measure of the behavior of the MAS. For such a scenario the game is at equilibrium - the Lagrangian is optimized - when the joint distribution of the agents optimizes the system's expected performance. One common way to find that equilibrium is to have each agent run a reinforcement learning algorithm. Here we investigate the alternative of exploiting PD theory to run gradient descent on the Lagrangian. We present computer experiments validating some of the predictions of PD theory for how best to do that gradient descent. We also demonstrate how PD theory can improve performance even when we are not allowed to rerun the MAS from different initial conditions, a requirement implicit in some previous work.

  9. Adaptive, Distributed Control of Constrained Multi-Agent Systems

    Science.gov (United States)

    Bieniawski, Stefan; Wolpert, David H.

    2004-01-01

Product Distribution (PD) theory was recently developed as a broad framework for analyzing and optimizing distributed systems. Here we demonstrate its use for adaptive distributed control of Multi-Agent Systems (MASs), i.e., for distributed stochastic optimization using MASs. First we review one motivation of PD theory, as the information-theoretic extension of conventional full-rationality game theory to the case of bounded rational agents. In this extension the equilibrium of the game is the optimizer of a Lagrangian of the (probability distribution on) the joint state of the agents. When the game in question is a team game with constraints, that equilibrium optimizes the expected value of the team game utility, subject to those constraints. One common way to find that equilibrium is to have each agent run a Reinforcement Learning (RL) algorithm. PD theory reveals this to be a particular type of search algorithm for minimizing the Lagrangian. Typically that algorithm is quite inefficient. A more principled alternative is to use a variant of Newton's method to minimize the Lagrangian. Here we compare this alternative to RL-based search in three sets of computer experiments. These are the N-Queens problem and the bin-packing problem from the optimization literature, and the Bar problem from the distributed RL literature. Our results confirm that the PD-theory-based approach outperforms the RL-based scheme in all three domains.

  10. Optimal Wonderful Life Utility Functions in Multi-Agent Systems

    Science.gov (United States)

    Wolpert, David H.; Tumer, Kagan; Swanson, Keith (Technical Monitor)

    2000-01-01

    The mathematics of Collective Intelligence (COINs) is concerned with the design of multi-agent systems so as to optimize an overall global utility function when those systems lack centralized communication and control. Typically in COINs each agent runs a distinct Reinforcement Learning (RL) algorithm, so that much of the design problem reduces to how best to initialize/update each agent's private utility function, as far as the ensuing value of the global utility is concerned. Traditional team game solutions to this problem assign to each agent the global utility as its private utility function. In previous work we used the COIN framework to derive the alternative Wonderful Life Utility (WLU), and experimentally established that having the agents use it induces global utility performance up to orders of magnitude superior to that induced by use of the team game utility. The WLU has a free parameter (the clamping parameter) which we simply set to zero in that previous work. Here we derive the optimal value of the clamping parameter, and demonstrate experimentally that using that optimal value can result in significantly improved performance over that of clamping to zero, over and above the improvement beyond traditional approaches.

  11. Value learning through reinforcement : The basics of dopamine and reinforcement learning

    NARCIS (Netherlands)

    Daw, N.D.; Tobler, P.N.; Glimcher, P.W.; Fehr, E.

    2013-01-01

    This chapter provides an overview of reinforcement learning and temporal difference learning and relates these topics to the firing properties of midbrain dopamine neurons. First, we review the Rescorla-Wagner learning rule and basic learning phenomena, such as blocking, which the rule explains. Then

  12. Fairness in multi-agent systems

    NARCIS (Netherlands)

    Jong, de S.; Tuyls, K.P.; Verbeeck, K.

    2008-01-01

    Multi-agent systems are complex systems in which multiple autonomous entities, called agents, cooperate in order to achieve a common or personal goal. These entities may be computer software, robots, and also humans. In fact, many multi-agent systems are intended to operate in cooperation with or as

  13. Reinforcement Learning in Autism Spectrum Disorder

    Directory of Open Access Journals (Sweden)

    Manuela Schuetze

    2017-11-01

    Full Text Available Early behavioral interventions are recognized as integral to standard care in autism spectrum disorder (ASD), and often focus on reinforcing desired behaviors (e.g., eye contact) and reducing the presence of atypical behaviors (e.g., echoing others' phrases). However, efficacy of these programs is mixed. Reinforcement learning relies on neurocircuitry that has been reported to be atypical in ASD: prefrontal-subcortical circuits, amygdala, brainstem, and cerebellum. Thus, early behavioral interventions rely on neurocircuitry that may function atypically in at least a subset of individuals with ASD. Recent work has investigated physiological, behavioral, and neural responses to reinforcers to uncover differences in motivation and learning in ASD. We will synthesize this work to identify promising avenues for future research that ultimately can be used to enhance the efficacy of early intervention.

  14. Autonomous reinforcement learning with experience replay.

    Science.gov (United States)

    Wawrzyński, Paweł; Tanwani, Ajay Kumar

    2013-05-01

    This paper considers the issues of efficiency and autonomy that are required to make reinforcement learning suitable for real-life control tasks. A real-time reinforcement learning algorithm is presented that repeatedly adjusts the control policy with the use of previously collected samples, and autonomously estimates the appropriate step-sizes for the learning updates. The algorithm is based on the actor-critic with experience replay whose step-sizes are determined on-line by an enhanced fixed point algorithm for on-line neural network training. An experimental study with simulated octopus arm and half-cheetah demonstrates the feasibility of the proposed algorithm to solve difficult learning control problems in an autonomous way within reasonably short time. Copyright © 2012 Elsevier Ltd. All rights reserved.

  15. Reinforcement learning: Solving two case studies

    Science.gov (United States)

    Duarte, Ana Filipa; Silva, Pedro; dos Santos, Cristina Peixoto

    2012-09-01

    Reinforcement Learning algorithms offer interesting features for the control of autonomous systems, such as the ability to learn from direct interaction with the environment, and the use of a simple reward signal, as opposed to the input-output pairs used in classic supervised learning. The reward signal indicates the success or failure of the actions executed by the agent in the environment. In this work, we describe RL algorithms applied to two case studies: the Crawler robot and the widely known inverted pendulum. We explore RL capabilities to autonomously learn a basic locomotion pattern in the Crawler, and approach the balancing problem of biped locomotion using the inverted pendulum.

  16. Efficient abstraction selection in reinforcement learning

    NARCIS (Netherlands)

    Seijen, H. van; Whiteson, S.; Kester, L.

    2013-01-01

    This paper introduces a novel approach for abstraction selection in reinforcement learning problems modelled as factored Markov decision processes (MDPs), for which a state is described via a set of state components. In abstraction selection, an agent must choose an abstraction from a set of

  17. Reinforcement Learning Based Artificial Immune Classifier

    Directory of Open Access Journals (Sweden)

    Mehmet Karakose

    2013-01-01

    Full Text Available One of the widely used methods for classification that is a decision-making process is artificial immune systems. Artificial immune systems based on the natural immune system can be successfully applied to classification, optimization, recognition, and learning in real-world problems. In this study, a reinforcement learning based artificial immune classifier is proposed as a new approach. This approach uses reinforcement learning to find better antibodies with immune operators. The proposed approach has many advantages compared with other methods in the literature, such as effectiveness, fewer memory cells, high accuracy, speed, and data adaptability. The performance of the proposed approach is demonstrated by simulation and experimental results using real data in Matlab and on an FPGA. Some benchmark data and remote image data are used for the experimental results. Comparative results with the supervised/unsupervised artificial immune system, the negative selection classifier, and the resource limited artificial immune classifier are given to demonstrate the effectiveness of the proposed new method.

  18. Social Influence as Reinforcement Learning

    Science.gov (United States)

    2016-01-13

    ...a brain region associated with motivation and reward learning. Further, individuals' level of striatal activity in response to consensus tracks...

  19. Building Multi-Agent Systems Using Jason

    DEFF Research Database (Denmark)

    Boss, Niklas Skamriis; Jensen, Andreas Schmidt; Villadsen, Jørgen

    2010-01-01

    We provide a detailed description of the Jason-DTU system, including the used methodology, tools as well as team strategy. We also discuss the experience gathered in the contest. In spring 2009 the course “Artificial Intelligence and Multi- Agent Systems” was held for the first time...... on the Technical University of Denmark (DTU). A part of this course was a short introduction to the multi-agent framework Jason, which is an interpreter for AgentSpeak, an agent-oriented programming language. As the final project in this course a solution to the Multi-Agent Programming Contest from 2007, the Gold...

  20. Multi-agent and complex systems

    CERN Document Server

    Ren, Fenghui; Fujita, Katsuhide; Zhang, Minjie; Ito, Takayuki

    2017-01-01

    This book provides a description of advanced multi-agent and artificial intelligence technologies for the modeling and simulation of complex systems, as well as an overview of the latest scientific efforts in this field. A complex system features a large number of interacting components, whose aggregate activities are nonlinear and self-organized. A multi-agent system is a group or society of agents which interact with others cooperatively and/or competitively in order to reach their individual or common goals. Multi-agent systems are suitable for modeling and simulation of complex systems, which is difficult to accomplish using traditional computational approaches.

  1. Reinforcement Learning and Savings Behavior.

    Science.gov (United States)

    Choi, James J; Laibson, David; Madrian, Brigitte C; Metrick, Andrew

    2009-12-01

    We show that individual investors over-extrapolate from their personal experience when making savings decisions. Investors who experience particularly rewarding outcomes from saving in their 401(k)-a high average and/or low variance return-increase their 401(k) savings rate more than investors who have less rewarding experiences with saving. This finding is not driven by aggregate time-series shocks, income effects, rational learning about investing skill, investor fixed effects, or time-varying investor-level heterogeneity that is correlated with portfolio allocations to stock, bond, and cash asset classes. We discuss implications for the equity premium puzzle and interventions aimed at improving household financial outcomes.

  2. Reinforcement learning for microgrid energy management

    International Nuclear Information System (INIS)

    Kuznetsova, Elizaveta; Li, Yan-Fu; Ruiz, Carlos; Zio, Enrico; Ault, Graham; Bell, Keith

    2013-01-01

    We consider a microgrid for energy distribution, with a local consumer, a renewable generator (wind turbine) and a storage facility (battery), connected to the external grid via a transformer. We propose a two-steps-ahead reinforcement learning algorithm to plan the battery scheduling, which plays a key role in the achievement of the consumer goals. The underlying framework is one of multi-criteria decision-making by an individual consumer who has the goals of increasing the utilization rate of the battery during high electricity demand (so as to decrease the electricity purchase from the external grid) and increasing the utilization rate of the wind turbine for local use (so as to increase the consumer independence from the external grid). Predictions of available wind power feed the reinforcement learning algorithm for selecting the optimal battery scheduling actions. The embedded learning mechanism allows the consumer to enhance his knowledge about the optimal actions for battery scheduling under different time-dependent environmental conditions. The developed framework gives intelligent consumers the capability to learn the stochastic environment and make use of the experience to select optimal energy management actions. - Highlights: • A consumer exploits a two-steps-ahead reinforcement learning for battery scheduling. • The Q-learning based mechanism is fed by the predictions of available wind power. • Wind speed state evolutions are modeled with a Markov chain model. • Optimal scheduling actions are learned through the occurrence of similar scenarios. • The consumer manifests a continuous enhancement of his knowledge about optimal actions
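
The core of such a scheme is a tabular Q-learning update over (wind state, battery level) pairs. The sketch below is our own minimal stand-in, much coarser than the paper's model: the two-state wind chain, the fixed demand, the energy units and the reward are all invented for illustration.

```python
import random

random.seed(0)
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1   # illustrative learning parameters
ACTIONS = ["charge", "discharge", "idle"]
DEMAND = 2  # fixed local demand, arbitrary energy units

def next_wind(w):
    """Two-state Markov chain for wind: 0 = low, 1 = high."""
    return w if random.random() < 0.8 else 1 - w

def step(wind, battery, action):
    """Toy dynamics: wind yields 1 or 3 units; the battery holds 0-4."""
    supply = 1 if wind == 0 else 3
    if action == "charge" and battery < 4:
        supply -= 1
        battery += 1
    elif action == "discharge" and battery > 0:
        supply += 1
        battery -= 1
    purchase = max(0, DEMAND - supply)  # energy bought from the external grid
    return battery, -purchase           # reward: minimize grid purchases

Q = {(w, b): {a: 0.0 for a in ACTIONS} for w in (0, 1) for b in range(5)}
wind, battery = 0, 2
for _ in range(50000):
    s = (wind, battery)
    if random.random() < EPS:
        a = random.choice(ACTIONS)          # explore
    else:
        a = max(Q[s], key=Q[s].get)         # exploit
    battery, r = step(wind, battery, a)
    wind = next_wind(wind)
    s2 = (wind, battery)
    Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2].values()) - Q[s][a])

# With these toy dynamics the agent typically learns to discharge the
# battery when wind is low, avoiding purchases from the external grid.
print(max(Q[(0, 3)], key=Q[(0, 3)].get))
```

The same shape of update underlies the paper's scheme; there, the state additionally carries the two-steps-ahead wind power predictions that feed the action selection.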

  3. Argumentation and Multi-Agent Decision Making

    OpenAIRE

    Parsons, S.; Jennings, N. R.

    1998-01-01

    This paper summarises our on-going work on mixed- initiative decision making which extends both classical decision theory and a symbolic theory of decision making based on argumentation to a multi-agent domain.

  4. Reinforcement Learning and Savings Behavior*

    Science.gov (United States)

    Choi, James J.; Laibson, David; Madrian, Brigitte C.; Metrick, Andrew

    2009-01-01

    We show that individual investors over-extrapolate from their personal experience when making savings decisions. Investors who experience particularly rewarding outcomes from saving in their 401(k)—a high average and/or low variance return—increase their 401(k) savings rate more than investors who have less rewarding experiences with saving. This finding is not driven by aggregate time-series shocks, income effects, rational learning about investing skill, investor fixed effects, or time-varying investor-level heterogeneity that is correlated with portfolio allocations to stock, bond, and cash asset classes. We discuss implications for the equity premium puzzle and interventions aimed at improving household financial outcomes. PMID:20352013

  5. Online constrained model-based reinforcement learning

    CSIR Research Space (South Africa)

    Van Niekerk, B

    2017-08-01

    Full Text Available Online Constrained Model-based Reinforcement Learning. Benjamin van Niekerk (School of Computer Science, University of the Witwatersrand, South Africa), Andreas Damianou (Amazon.com, Cambridge, UK), Benjamin Rosman (Council for Scientific and Industrial Research, and School...). MULTIPLE SHOOTING: Using direct multiple shooting (Bock and Plitt, 1984), problem (1) can be transformed into a structured nonlinear program (NLP). First, the time horizon [t0, t0 + T] is partitioned into N equal subintervals [tk, tk+1] for k = 0...

  6. Enhanced risk management by an emerging multi-agent architecture

    Science.gov (United States)

    Lin, Sin-Jin; Hsu, Ming-Fu

    2014-07-01

    Classification in imbalanced datasets has attracted much attention from researchers in the field of machine learning. Most existing techniques tend not to perform well on minority class instances when the dataset is highly skewed because they focus on minimising the forecasting error without considering the relative distribution of each class. This investigation proposes an emerging multi-agent architecture, grounded on cooperative learning, to solve the class-imbalanced classification problem. Additionally, this study deals further with the obscure nature of the multi-agent architecture and expresses comprehensive rules for auditors. The results from this study indicate that the presented model performs satisfactorily in risk management and is able to tackle a highly class-imbalanced dataset comparatively well. Furthermore, the knowledge-visualisation process, supported by real examples, can assist both internal and external auditors who must allocate limited detecting resources; they can take the rules as roadmaps to modify the auditing programme.

  7. Enriching behavioral ecology with reinforcement learning methods.

    Science.gov (United States)

    Frankenhuis, Willem E; Panchanathan, Karthik; Barto, Andrew G

    2018-02-13

    This article focuses on the division of labor between evolution and development in solving sequential, state-dependent decision problems. Currently, behavioral ecologists tend to use dynamic programming methods to study such problems. These methods are successful at predicting animal behavior in a variety of contexts. However, they depend on a distinct set of assumptions. Here, we argue that behavioral ecology will benefit from drawing more than it currently does on a complementary collection of tools, called reinforcement learning methods. These methods allow for the study of behavior in highly complex environments, which conventional dynamic programming methods do not feasibly address. In addition, reinforcement learning methods are well-suited to studying how biological mechanisms solve developmental and learning problems. For instance, we can use them to study simple rules that perform well in complex environments. Or to investigate under what conditions natural selection favors fixed, non-plastic traits (which do not vary across individuals), cue-driven-switch plasticity (innate instructions for adaptive behavioral development based on experience), or developmental selection (the incremental acquisition of adaptive behavior based on experience). If natural selection favors developmental selection, which includes learning from environmental feedback, we can also make predictions about the design of reward systems. Our paper is written in an accessible manner and for a broad audience, though we believe some novel insights can be drawn from our discussion. We hope our paper will help advance the emerging bridge connecting the fields of behavioral ecology and reinforcement learning. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.

  8. SCAFFOLDING AND REINFORCEMENT: USING DIGITAL LOGBOOKS IN LEARNING VOCABULARY

    OpenAIRE

    Khalifa, Salma Hasan Almabrouk; Shabdin, Ahmad Affendi

    2016-01-01

    Reinforcement and scaffolding are tested approaches to enhance learning achievements. Keeping a record of the learning process as well as the new learned words functions as scaffolding to help learners build a comprehensive vocabulary. Similarly, repetitive learning of new words reinforces permanent learning for long-term memory. Paper-based logbooks may prove to be good records of the learning process, but if learners use digital logbooks, the results may be even better. Digital logbooks wit...

  9. Human demonstrations for fast and safe exploration in reinforcement learning

    NARCIS (Netherlands)

    Schonebaum, G.K.; Junell, J.L.; van Kampen, E.

    2017-01-01

    Reinforcement learning is a promising framework for controlling complex vehicles with a high level of autonomy, since it does not need a dynamic model of the vehicle, and it is able to adapt to changing conditions. When learning from scratch, the performance of a reinforcement learning controller

  10. Reinforcement learning account of network reciprocity.

    Science.gov (United States)

    Ezaki, Takahiro; Masuda, Naoki

    2017-01-01

    Evolutionary game theory predicts that cooperation in social dilemma games is promoted when agents are connected as a network. However, when networks are fixed over time, humans do not necessarily show enhanced mutual cooperation. Here we show that reinforcement learning (specifically, the so-called Bush-Mosteller model) approximately explains the experimentally observed network reciprocity and the lack thereof in a parameter region spanned by the benefit-to-cost ratio and the node's degree. Thus, we significantly extend previously obtained numerical results.
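
The Bush-Mosteller rule itself takes only a few lines: an agent raises or lowers its probability of repeating its last action depending on whether the resulting payoff exceeded an aspiration level. The sketch below pairs two such learners in a prisoner's dilemma; the payoffs, aspiration level and learning rate are illustrative values of our choosing, not the parameters fitted in the paper.

```python
import random

random.seed(1)

# Prisoner's dilemma payoffs for the row player: T > R > P > S.
R, S_PAY, T, P = 3.0, 0.0, 5.0, 1.0
ASPIRATION, BETA = 2.0, 0.4   # illustrative aspiration level and learning rate

def payoff(me, other):
    if me and other: return R        # mutual cooperation
    if me and not other: return S_PAY  # sucker's payoff
    if not me and other: return T    # temptation to defect
    return P                         # mutual defection

def bm_update(p, cooperated, pi):
    """One Bush-Mosteller step on the cooperation probability p.

    The stimulus s is the (normalized, clamped) surplus of the payoff pi
    over the aspiration level; the last action is reinforced when s >= 0
    and inhibited when s < 0."""
    scale = max(T - ASPIRATION, ASPIRATION - S_PAY)
    s = max(-1.0, min(1.0, (pi - ASPIRATION) / scale))
    if cooperated:
        return p + (1 - p) * BETA * s if s >= 0 else p + p * BETA * s
    else:
        return p - p * BETA * s if s >= 0 else p - (1 - p) * BETA * s

p1 = p2 = 0.5
for _ in range(200):
    a1, a2 = random.random() < p1, random.random() < p2
    p1 = bm_update(p1, a1, payoff(a1, a2))
    p2 = bm_update(p2, a2, payoff(a2, a1))
print(round(p1, 2), round(p2, 2))
```

Depending on where the aspiration level sits relative to the payoffs, the pair can drift toward mutual cooperation or defection; in the paper, the networked version of this dynamic is what reproduces (or fails to reproduce) network reciprocity across the parameter region studied.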

  11. Reinforcement learning account of network reciprocity.

    Directory of Open Access Journals (Sweden)

    Takahiro Ezaki

    Full Text Available Evolutionary game theory predicts that cooperation in social dilemma games is promoted when agents are connected as a network. However, when networks are fixed over time, humans do not necessarily show enhanced mutual cooperation. Here we show that reinforcement learning (specifically, the so-called Bush-Mosteller model) approximately explains the experimentally observed network reciprocity and the lack thereof in a parameter region spanned by the benefit-to-cost ratio and the node's degree. Thus, we significantly extend previously obtained numerical results.

  12. Ontology-based multi-agent systems

    Energy Technology Data Exchange (ETDEWEB)

    Hadzic, Maja; Wongthongtham, Pornpit; Dillon, Tharam; Chang, Elizabeth [Digital Ecosystems and Business Intelligence Institute, Perth, WA (Australia)

    2009-07-01

    The Semantic web has given a great deal of impetus to the development of ontologies and multi-agent systems. Several books have appeared which discuss the development of ontologies or of multi-agent systems separately on their own. The growing interaction between agents and ontologies has highlighted the need for integrated development of these. This book is unique in being the first to provide an integrated treatment of the modeling, design and implementation of such combined ontology/multi-agent systems. It provides clear exposition of this integrated modeling and design methodology. It further illustrates this with two detailed case studies in (a) the biomedical area and (b) the software engineering area. The book is, therefore, of interest to researchers, graduate students and practitioners in the semantic web and web science area. (orig.)

  13. Improving Multi-Agent Systems Using Jason

    DEFF Research Database (Denmark)

    Vester, Steen; Boss, Niklas Skamriis; Jensen, Andreas Schmidt

    2011-01-01

    We describe the approach used to develop the multi-agent system of herders that competed as the Jason-DTU team at the Multi-Agent Programming Contest 2010. We also participated in 2009 with a system developed in the agent-oriented programming language Jason which is an extension of AgentSpeak. We ...... used the implementation from 2009 as a foundation and therefore much of the work done this year was on improving that implementation. We present a description which includes design and analysis of the system as well as the main features of our agent team strategy. In addition we discuss

  14. Multi-agent systems simulation and applications

    CERN Document Server

    Uhrmacher, Adelinde M

    2009-01-01

    Methodological Guidelines for Modeling and Developing MAS-Based SimulationsThe intersection of agents, modeling, simulation, and application domains has been the subject of active research for over two decades. Although agents and simulation have been used effectively in a variety of application domains, much of the supporting research remains scattered in the literature, too often leaving scientists to develop multi-agent system (MAS) models and simulations from scratch. Multi-Agent Systems: Simulation and Applications provides an overdue review of the wide ranging facets of MAS simulation, i

  15. Research and application of multi-agent genetic algorithm in tower defense game

    Science.gov (United States)

    Jin, Shaohua

    2018-04-01

    In this paper, a new multi-agent genetic algorithm based on orthogonal experiments is proposed, built on multi-agent systems, genetic algorithms and orthogonal experimental design. The algorithm designs a neighborhood competition operator, an orthogonal crossover operator, a mutation operator and a self-learning operator. The new algorithm is applied to a mobile tower defense game: mathematical models are established according to the characteristics of the game, and the play value of the game's monsters is ultimately increased.

  16. Ensemble Network Architecture for Deep Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Xi-liang Chen

    2018-01-01

    Full Text Available The popular deep Q-learning algorithm is known to be unstable because of oscillation and overestimation of action values under certain conditions. These issues tend to adversely affect performance. In this paper, we develop an ensemble network architecture for deep reinforcement learning which is based on value function approximation. The temporal ensemble stabilizes the training process by reducing the variance of the target approximation error, and the ensemble of target values reduces overestimation and achieves better performance by estimating more accurate Q-values. Our results show that this architecture leads to statistically significantly better value evaluation and more stable and better performance on several classical control tasks in the OpenAI Gym environment.
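
The overestimation targeted here comes from taking a max over noisy value estimates; averaging an ensemble of estimates before the max shrinks that bias. The toy numerical illustration below is our own (no networks, just Gaussian noise standing in for approximation error, and arbitrary noise and ensemble sizes):

```python
import random

random.seed(0)

TRUE_Q = [0.0, 0.0, 0.0]     # all actions equally good; true max is 0
NOISE, K, TRIALS = 1.0, 5, 10000

def noisy_estimate():
    """One 'network's' Q-estimates: truth corrupted by Gaussian noise."""
    return [q + random.gauss(0, NOISE) for q in TRUE_Q]

single_bias = ensemble_bias = 0.0
for _ in range(TRIALS):
    # Single target: max over one noisy estimate overestimates the true 0.
    single_bias += max(noisy_estimate())
    # Ensemble of K targets: average the estimates first, then take the max.
    avg = [sum(col) / K for col in zip(*(noisy_estimate() for _ in range(K)))]
    ensemble_bias += max(avg)
single_bias /= TRIALS
ensemble_bias /= TRIALS
print(round(single_bias, 2), round(ensemble_bias, 2))
```

Averaging K independent estimates cuts the noise by a factor of about √K, and the upward bias of the subsequent max shrinks with it, which is the intuition behind the paper's ensemble of target values.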

  17. Optimizing Chemical Reactions with Deep Reinforcement Learning.

    Science.gov (United States)

    Zhou, Zhenpeng; Li, Xiaocheng; Zare, Richard N

    2017-12-27

    Deep reinforcement learning was employed to optimize chemical reactions. Our model iteratively records the results of a chemical reaction and chooses new experimental conditions to improve the reaction outcome. This model outperformed a state-of-the-art blackbox optimization algorithm by using 71% fewer steps on both simulations and real reactions. Furthermore, we introduced an efficient exploration strategy by drawing the reaction conditions from certain probability distributions, which resulted in an improvement on regret from 0.062 to 0.039 compared with a deterministic policy. Combining the efficient exploration policy with accelerated microdroplet reactions, optimal reaction conditions were determined in 30 min for the four reactions considered, and a better understanding of the factors that control microdroplet reactions was reached. Moreover, our model showed a better performance after training on reactions with similar or even dissimilar underlying mechanisms, which demonstrates its learning ability.

  18. Vicarious reinforcement learning signals when instructing others.

    Science.gov (United States)

    Apps, Matthew A J; Lesage, Elise; Ramnani, Narender

    2015-02-18

    Reinforcement learning (RL) theory posits that learning is driven by discrepancies between the predicted and actual outcomes of actions (prediction errors [PEs]). In social environments, learning is often guided by similar RL mechanisms. For example, teachers monitor the actions of students and provide feedback to them. This feedback evokes PEs in students that guide their learning. We report the first study that investigates the neural mechanisms that underpin RL signals in the brain of a teacher. Neurons in the anterior cingulate cortex (ACC) signal PEs when learning from the outcomes of one's own actions but also signal information when outcomes are received by others. Does a teacher's ACC signal PEs when monitoring a student's learning? Using fMRI, we studied brain activity in human subjects (teachers) as they taught a confederate (student) action-outcome associations by providing positive or negative feedback. We examined activity time-locked to the students' responses, when teachers infer student predictions and know actual outcomes. We fitted a RL-based computational model to the behavior of the student to characterize their learning, and examined whether a teacher's ACC signals when a student's predictions are wrong. In line with our hypothesis, activity in the teacher's ACC covaried with the PE values in the model. Additionally, activity in the teacher's insula and ventromedial prefrontal cortex covaried with the predicted value according to the student. Our findings highlight that the ACC signals PEs vicariously for others' erroneous predictions, when monitoring and instructing their learning. These results suggest that RL mechanisms, processed vicariously, may underpin and facilitate teaching behaviors. Copyright © 2015 Apps et al.

  19. Optimizing microstimulation using a reinforcement learning framework.

    Science.gov (United States)

    Brockmeier, Austin J; Choi, John S; Distasio, Marcello M; Francis, Joseph T; Príncipe, José C

    2011-01-01

    The ability to provide sensory feedback is desired to enhance the functionality of neuroprosthetics. Somatosensory feedback provides closed-loop control to the motor system, which is lacking in feedforward neuroprosthetics. In the case of existing somatosensory function, a template of the natural response can be used as a template of the desired response elicited by electrical microstimulation. In the case of no initial training data, microstimulation parameters that produce responses close to the template must be selected in an online manner. We propose using reinforcement learning as a framework to balance the exploration of the parameter space and the continued selection of promising parameters for further stimulation. This approach avoids an explicit model of the neural response from stimulation. We explore a preliminary architecture--treating the task as a k-armed bandit--using offline data recorded for natural touch and thalamic microstimulation, and we examine the method's efficiency in exploring the parameter space while concentrating on promising parameter forms. The best matching stimulation parameters, from k = 68 different forms, are selected by the reinforcement learning algorithm consistently after 334 realizations.
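
Treating parameter selection as a k-armed bandit can be sketched with an ε-greedy learner over a handful of candidate stimulation patterns. Everything below is invented for illustration (the number of arms, the "match-to-template" scores, and the noise level); the paper uses k = 68 recorded stimulation forms.

```python
import random

random.seed(2)

# Hypothetical similarity-to-template score for each candidate pattern;
# the learner never sees these directly, only noisy observations.
TRUE = [0.2, 0.5, 0.35, 0.8, 0.4]
K, EPS = len(TRUE), 0.1

counts = [0] * K
values = [0.0] * K              # incremental sample-average estimates
for _ in range(5000):
    if random.random() < EPS:
        arm = random.randrange(K)        # explore a random parameter set
    else:
        arm = values.index(max(values))  # exploit the best one so far
    reward = TRUE[arm] + random.gauss(0, 0.1)  # noisy match-to-template score
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]

print(values.index(max(values)), counts)
```

The learner concentrates its trials on the best-matching pattern while still occasionally probing the rest, which is the exploration-exploitation balance the abstract describes.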

  20. Data Mining Process Optimization in Computational Multi-agent Systems

    OpenAIRE

    Kazík, O.; Neruda, R. (Roman)

    2015-01-01

    In this paper, we present an agent-based solution of the metalearning problem, which focuses on the optimization of data mining processes. We exploit the framework of computational multi-agent systems in which various meta-learning problems have already been studied, e.g. parameter-space search or simple method recommendation. In this paper, we examine the effect of data preprocessing for machine learning problems. We perform a set of experiments in the search-space of data mining processes which is...

  1. Reinforcement Learning for Ramp Control: An Analysis of Learning Parameters

    Directory of Open Access Journals (Sweden)

    Chao Lu

    2016-08-01

    Full Text Available Reinforcement Learning (RL) has been proposed to deal with ramp control problems under dynamic traffic conditions; however, there is a lack of sufficient research on the behaviour and impacts of different learning parameters. This paper describes a ramp control agent based on the RL mechanism and thoroughly analyzes the influence of three learning parameters, namely the learning rate, the discount rate and the action selection parameter, on the algorithm performance. Two indices for the learning speed and convergence stability were used to measure the algorithm performance, based on which a series of simulation-based experiments were designed and conducted by using a macroscopic traffic flow model. Simulation results showed that, compared with the discount rate, the learning rate and action selection parameter had more remarkable impacts on the algorithm performance. Based on the analysis, some suggestions about how to select suitable parameter values that can achieve a superior performance are provided.
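
How strongly the learning rate shapes convergence speed can be seen even in a one-state example. The numbers below are ours, not the paper's ramp-control setting: the TD update Q ← Q + α(r + γ·maxQ' − Q) is applied to a single terminal state with constant reward until the estimate sits within a tolerance of its fixed point.

```python
# alpha (learning rate) sets the update step size, gamma discounts future
# value, and epsilon would control exploration in action selection.
GAMMA = 0.9  # discount rate; with a single terminal state it has no effect

def steps_to_converge(alpha, target=1.0, tol=0.05):
    """Count TD updates until |Q - target| <= tol, starting from Q = 0."""
    q, steps = 0.0, 0
    while abs(q - target) > tol:
        q += alpha * (target + GAMMA * 0.0 - q)  # TD update with r = target
        steps += 1
    return steps

print(steps_to_converge(0.1), steps_to_converge(0.5))  # prints: 29 5
```

The residual error after n updates is (1 − α)^n, so a fivefold larger α converges almost six times faster here, while γ plays no role at all; this mirrors the paper's finding that the learning rate and action selection parameter matter far more than the discount rate.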

  2. Reusable Reinforcement Learning via Shallow Trails.

    Science.gov (United States)

    Yu, Yang; Chen, Shi-Yong; Da, Qing; Zhou, Zhi-Hua

    2018-06-01

    Reinforcement learning has shown great success in helping learning agents accomplish tasks autonomously from environment interactions. Meanwhile in many real-world applications, an agent needs to accomplish not only a fixed task but also a range of tasks. For this goal, an agent can learn a metapolicy over a set of training tasks that are drawn from an underlying distribution. By maximizing the total reward summed over all the training tasks, the metapolicy can then be reused in accomplishing test tasks from the same distribution. However, in practice, we face two major obstacles to train and reuse metapolicies well. First, how to identify tasks that are unrelated or even opposed to each other, in order to avoid their mutual interference in the training. Second, how to characterize task features, according to which a metapolicy can be reused. In this paper, we propose the MetA-Policy LEarning (MAPLE) approach that overcomes the two difficulties by introducing the shallow trail. It probes a task by running a roughly trained policy. Using the rewards of the shallow trail, MAPLE automatically groups similar tasks. Moreover, when the task parameters are unknown, the rewards of the shallow trail also serve as task features. Empirical studies on several controlling tasks verify that MAPLE can train metapolicies well and receives high reward on test tasks.

  3. Semiotics, Multi-Agent Systems and Organizations

    NARCIS (Netherlands)

    Gazendam, H.W.M.; Jorna, René J.

    1998-01-01

    Multi-agent systems are promising as models of organization because they are based on the idea that most work in human organizations is done based on intelligence, communication, cooperation, and massive parallel processing. They offer an alternative for system theories of organization, which are

  4. Adaptive hierarchical multi-agent organizations

    NARCIS (Netherlands)

    Ghijsen, M.; Jansweijer, W.N.H.; Wielinga, B.J.; Babuška, R.; Groen, F.C.A.

    2010-01-01

    In this chapter, we discuss the design of adaptive hierarchical organizations for multi-agent systems (MAS). Hierarchical organizations have a number of advantages such as their ability to handle complex problems and their scalability to large organizations. By introducing adaptivity in the

  5. Cooperative heuristic multi-agent planning

    NARCIS (Netherlands)

    De Weerdt, M.M.; Tonino, J.F.M.; Witteveen, C.

    2001-01-01

    In this paper we will use the framework to study cooperative heuristic multi-agent planning. During the construction of their plans, the agents use a heuristic function inspired by the FF planner. At any time in the process of planning the agents may exchange available resources, or they may

  6. An analysis of multi-agent diagnosis

    NARCIS (Netherlands)

    Roos, Nico; Ten Teije, Annette; Bos, André; Witteveen, Cees; Castelfranchi, C.; Johnson, W.L.

    2002-01-01

    This paper analyzes the use of a Multi-Agent System for Model-Based Diagnosis. In a large dynamical system, it is often infeasible or even impossible to maintain a model of the whole system. Instead, several incomplete models of the system have to be used to establish a diagnosis and to detect

  7. Mansion, A Distributed Multi-Agent System

    NARCIS (Netherlands)

    van t Noordende, G.; Brazier, F.M.; Tanenbaum, A.S.

    2001-01-01

    In this position summary we present work in progress on a worldwide, scalable multi-agent system, based on a paradigm of hyperlinked rooms. The framework offers facilities for managing distribution, security and mobility aspects for both active elements (agents) and passive elements (objects) in the

  8. Framework for robot skill learning using reinforcement learning

    Science.gov (United States)

    Wei, Yingzi; Zhao, Mingyang

    2003-09-01

    Robot skill acquisition is a process similar to human skill learning. Reinforcement learning (RL) is an on-line actor-critic method for a robot to develop its skills. The reinforcement function is the critical component, as it evaluates actions and guides the learning process. We present an augmented reward function that provides a new way for the RL controller to incorporate prior knowledge and experience. The difference form of the augmented reward function is also considered carefully. The additional reward beyond the conventional reward provides more heuristic information for RL. In this paper, we present a strategy for the task of complex skill learning: an automatic robot-shaping policy that decomposes the complex skill into a hierarchical learning process. A new form of value function is introduced to attain smooth motion switching swiftly. We present a formal, but practical, framework for robot skill learning and illustrate with an example the utility of the method for learning skilled robot control on line.
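
    One common way to realize an augmented reward with a difference form is potential-based shaping; the sketch below is our illustrative reading, not the authors' exact formulation, and the potential function is hypothetical.

```python
# Illustrative sketch (not the authors' exact formulation): a conventional
# task reward augmented with a difference-form heuristic term, here the
# potential-based shaping bonus gamma*phi(s') - phi(s), which injects prior
# knowledge without changing the optimal policy.
def augmented_reward(r_task, s, s_next, phi, gamma=0.99):
    return r_task + gamma * phi(s_next) - phi(s)

# Hypothetical potential: negative distance of the robot's gripper to the goal,
# so moves that reduce the distance earn a positive heuristic bonus.
phi = lambda dist: -float(dist)

r = augmented_reward(r_task=0.0, s=2.0, s_next=1.0, phi=phi, gamma=1.0)
print(r)  # 1.0
```

    The agent receives extra reward for progress toward the goal even before the conventional task reward arrives.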

  9. 14th International Conference on Practical Applications of Agents and Multi-Agent Systems : Special Sessions

    CERN Document Server

    Escalona, María; Corchuelo, Rafael; Mathieu, Philippe; Vale, Zita; Campbell, Andrew; Rossi, Silvia; Adam, Emmanuel; Jiménez-López, María; Navarro, Elena; Moreno, María

    2016-01-01

    PAAMS, the International Conference on Practical Applications of Agents and Multi-Agent Systems, is an evolution of the International Workshop on Practical Applications of Agents and Multi-Agent Systems. PAAMS is an international yearly tribune to present, discuss, and disseminate the latest developments and the most important outcomes related to real-world applications. It provides a unique opportunity to bring multi-disciplinary experts, academics and practitioners together to exchange their experience in the development of Agents and Multi-Agent Systems. This volume presents the papers accepted for the 2016 edition in the special sessions: Agents Behaviours and Artificial Markets (ABAM); Advances on Demand Response and Renewable Energy Sources in Agent Based Smart Grids (ADRESS); Agents and Mobile Devices (AM); Agent Methodologies for Intelligent Robotics Applications (AMIRA); Learning, Agents and Formal Languages (LAFLang); Multi-Agent Systems and Ambient Intelligence (MASMAI); Web Mining and ...

  10. Punishment Insensitivity and Impaired Reinforcement Learning in Preschoolers

    Science.gov (United States)

    Briggs-Gowan, Margaret J.; Nichols, Sara R.; Voss, Joel; Zobel, Elvira; Carter, Alice S.; McCarthy, Kimberly J.; Pine, Daniel S.; Blair, James; Wakschlag, Lauren S.

    2014-01-01

    Background: Youth and adults with psychopathic traits display disrupted reinforcement learning. Advances in measurement now enable examination of this association in preschoolers. The current study examines relations between reinforcement learning in preschoolers and parent ratings of reduced responsiveness to socialization, conceptualized as a…

  11. Continuous residual reinforcement learning for traffic signal control optimization

    NARCIS (Netherlands)

    Aslani, Mohammad; Seipel, Stefan; Wiering, Marco

    2018-01-01

    Traffic signal control can be naturally regarded as a reinforcement learning problem. Unfortunately, it is one of the most difficult classes of reinforcement learning problems owing to its large state space. A straightforward approach to address this challenge is to control traffic signals based on

  12. Reinforcement learning in continuous state and action spaces

    NARCIS (Netherlands)

    H. P. van Hasselt (Hado); M.A. Wiering; M. van Otterlo

    2012-01-01

    textabstractMany traditional reinforcement-learning algorithms have been designed for problems with small finite state and action spaces. Learning in such discrete problems can been difficult, due to noise and delayed reinforcements. However, many real-world problems have continuous state or action

  13. Tank War Using Online Reinforcement Learning

    DEFF Research Database (Denmark)

    Toftgaard Andersen, Kresten; Zeng, Yifeng; Dahl Christensen, Dennis

    2009-01-01

    Real-Time Strategy (RTS) games provide a challenging platform for implementing online reinforcement learning (RL) techniques in a real application. The computer player monitors its opponents' (human or other computer) strategies and then updates its own policy using RL methods. In this paper, we propose a multi-layer framework for implementing online RL in an RTS game. The framework significantly reduces the RL computational complexity by decomposing the state space in a hierarchical manner. We implement the RTS game Tank General and perform a thorough test of the proposed framework. The results show the effectiveness of our proposed framework and shed light on relevant issues in using RL in RTS games.

  14. Reinforcement learning in complementarity game and population dynamics.

    Science.gov (United States)

    Jost, Jürgen; Li, Wei

    2014-02-01

    We systematically test and compare different reinforcement learning schemes in a complementarity game [J. Jost and W. Li, Physica A 345, 245 (2005)] played between members of two populations. More precisely, we study the Roth-Erev, Bush-Mosteller, and SoftMax reinforcement learning schemes. A modified version of Roth-Erev with a power exponent of 1.5, as opposed to 1 in the standard version, performs best. We also compare these reinforcement learning strategies with evolutionary schemes. This gives insight into aspects like the issue of quick adaptation as opposed to systematic exploration or the role of learning rates.
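
    The Roth-Erev scheme and its power-exponent modification can be sketched as follows; the forgetting rate, exponent, and rewards are illustrative, not the study's fitted values.

```python
# Minimal sketch of Roth-Erev reinforcement learning with the power-exponent
# modification: choice probabilities are proportional to propensities raised
# to an exponent (1.0 recovers the standard scheme; 1.5 is the modified
# version reported to perform best).
def choice_probs(propensities, exponent=1.5):
    weights = [q ** exponent for q in propensities]
    total = sum(weights)
    return [w / total for w in weights]

def roth_erev_update(propensities, chosen, reward, forgetting=0.1):
    """Decay all propensities, then reinforce the chosen action with its reward."""
    return [
        (1 - forgetting) * q + (reward if a == chosen else 0.0)
        for a, q in enumerate(propensities)
    ]

p = roth_erev_update([1.0, 1.0], chosen=0, reward=1.0, forgetting=0.0)
print(choice_probs(p, exponent=1.5))  # first action now more likely
```

    Raising the exponent above 1 sharpens the distribution, i.e., it exploits accumulated propensities more aggressively than the standard scheme.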

  15. Layered Learning in Multi-Agent Systems

    Science.gov (United States)

    1998-12-15

    [Abstract not recoverable: the record contains only extraction residue from a figure of the team-member agent architecture, showing roles such as left midfielder and center goalie with home coordinates, home range, and max range.]

  16. Accelerating Multiagent Reinforcement Learning by Equilibrium Transfer.

    Science.gov (United States)

    Hu, Yujing; Gao, Yang; An, Bo

    2015-07-01

    An important approach in multiagent reinforcement learning (MARL) is equilibrium-based MARL, which adopts equilibrium solution concepts in game theory and requires agents to play equilibrium strategies at each state. However, most existing equilibrium-based MARL algorithms cannot scale due to a large number of computationally expensive equilibrium computations (e.g., computing Nash equilibria is PPAD-hard) during learning. For the first time, this paper finds that during the learning process of equilibrium-based MARL, the one-shot games corresponding to each state's successive visits often have the same or similar equilibria (for some states more than 90% of games corresponding to successive visits have similar equilibria). Inspired by this observation, this paper proposes to use equilibrium transfer to accelerate equilibrium-based MARL. The key idea of equilibrium transfer is to reuse previously computed equilibria when each agent has a small incentive to deviate. By introducing transfer loss and transfer condition, a novel framework called equilibrium transfer-based MARL is proposed. We prove that although equilibrium transfer brings transfer loss, equilibrium-based MARL algorithms can still converge to an equilibrium policy under certain assumptions. Experimental results in widely used benchmarks (e.g., grid world game, soccer game, and wall game) show that the proposed framework: 1) not only significantly accelerates equilibrium-based MARL (up to 96.7% reduction in learning time), but also achieves higher average rewards than algorithms without equilibrium transfer and 2) scales significantly better than algorithms without equilibrium transfer when the state/action space grows and the number of agents increases.
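
    The transfer condition can be illustrated for a two-player matrix game: a previously computed joint action is reused only if neither player can gain more than a small epsilon by deviating unilaterally. The payoff matrices and epsilon below are illustrative, and the code is a simplification of the paper's framework.

```python
# Sketch of the equilibrium-transfer condition (our simplification for a
# two-player matrix game with pure joint actions): reuse a cached equilibrium
# if every agent's incentive to deviate is at most epsilon.
def best_deviation_gain(payoff, own_action, other_action, is_row):
    if is_row:
        current = payoff[own_action][other_action]
        best = max(payoff[a][other_action] for a in range(len(payoff)))
    else:
        current = payoff[other_action][own_action]
        best = max(payoff[other_action][a] for a in range(len(payoff[0])))
    return best - current

def can_transfer(payoff_row, payoff_col, joint, epsilon=0.05):
    a_row, a_col = joint
    gain_row = best_deviation_gain(payoff_row, a_row, a_col, is_row=True)
    gain_col = best_deviation_gain(payoff_col, a_col, a_row, is_row=False)
    return max(gain_row, gain_col) <= epsilon

# Prisoner's-dilemma-like payoffs: (defect, defect) is the Nash equilibrium.
R = [[3, 0], [5, 1]]  # row player's payoffs
C = [[3, 5], [0, 1]]  # column player's payoffs
print(can_transfer(R, C, joint=(1, 1)))  # True
```

    When the condition fails, the algorithm falls back to a fresh (expensive) equilibrium computation, which is exactly the cost the transfer mechanism avoids on similar successive visits.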

  17. Solution to reinforcement learning problems with artificial potential field

    Institute of Scientific and Technical Information of China (English)

    XIE Li-juan; XIE Guang-rong; CHEN Huan-wen; LI Xiao-li

    2008-01-01

    A novel method was designed to solve reinforcement learning problems with an artificial potential field. Firstly, a reinforcement learning problem was transferred to a path planning problem by using an artificial potential field (APF), which is a very appropriate way to model a reinforcement learning problem. Secondly, a new APF algorithm with a virtual water-flow concept was proposed to overcome the local minimum problem of potential field methods. The performance of the new method was tested on a gridworld problem known as the key-and-door maze. The experimental results show that within 45 trials, good and deterministic policies are found in almost all simulations. In comparison with Wiering's HQ-learning system, which needs 20 000 trials for a stable solution, the proposed method obtains an optimal and stable policy far more quickly. Therefore, the new method is a simple and effective way to give an optimal solution to the reinforcement learning problem.
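
    A minimal APF sketch (omitting the paper's virtual water-flow extension for escaping local minima): quadratic attraction to the goal plus inverse-distance repulsion from obstacles, with a greedy descent step on a grid. Coefficients and positions are illustrative.

```python
import math

# Basic artificial-potential-field sketch; k_att, k_rep and the influence
# radius d0 are illustrative coefficients, not the paper's values.
def potential(pos, goal, obstacles, k_att=1.0, k_rep=1.0, d0=2.0):
    att = 0.5 * k_att * ((pos[0] - goal[0]) ** 2 + (pos[1] - goal[1]) ** 2)
    rep = 0.0
    for ox, oy in obstacles:
        d = math.hypot(pos[0] - ox, pos[1] - oy)
        if 0 < d < d0:  # obstacles only repel within radius d0
            rep += 0.5 * k_rep * (1.0 / d - 1.0 / d0) ** 2
    return att + rep

def step(pos, goal, obstacles):
    """Greedy descent: move to the neighbouring cell of lowest potential."""
    moves = [(0, 1), (0, -1), (1, 0), (-1, 0)]
    return min(((pos[0] + dx, pos[1] + dy) for dx, dy in moves),
               key=lambda p: potential(p, goal, obstacles))

print(step((0, 0), goal=(5, 0), obstacles=[(1, 1)]))  # (1, 0)
```

    Plain greedy descent like this can stall in a local minimum between obstacles, which is precisely the failure mode the paper's virtual water-flow concept is designed to overcome.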

  18. Autonomous Formations of Multi-Agent Systems

    Science.gov (United States)

    Dhali, Sanjana; Joshi, Suresh M.

    2013-01-01

    Autonomous formation control of multi-agent dynamic systems has a number of applications that include ground-based and aerial robots and satellite formations. For air vehicles, formation flight ("flocking") has the potential to significantly increase airspace utilization as well as fuel efficiency. This presentation addresses two main problems in multi-agent formations: optimal role assignment to minimize the total cost (e.g., combined distance traveled by all agents); and maintaining formation geometry during flock motion. The Kuhn-Munkres ("Hungarian") algorithm is used for optimal assignment, and consensus-based leader-follower type control architecture is used to maintain formation shape despite the leader s independent movements. The methods are demonstrated by animated simulations.

  19. Switching Reinforcement Learning for Continuous Action Space

    Science.gov (United States)

    Nagayoshi, Masato; Murao, Hajime; Tamaki, Hisashi

    Reinforcement Learning (RL) attracts much attention as a technique for realizing computational intelligence such as adaptive and autonomous decentralized systems. In general, however, it is not easy to put RL into practical use. One difficulty is the problem of designing a suitable action space for an agent, i.e., satisfying two requirements that are in trade-off: (i) keeping the characteristics (or structure) of the original search space as much as possible in order to seek strategies that lie close to the optimal, and (ii) reducing the search space as much as possible in order to expedite the learning process. In order to design a suitable action space adaptively, we propose a switching RL model that mimics the process of an infant's motor development, in which gross motor skills develop before fine motor skills. A method for switching controllers is then constructed by introducing and referring to the "entropy". Further, through computational experiments using robot navigation problems with one- and two-dimensional continuous action spaces, the validity of the proposed method has been confirmed.
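
    An entropy-referenced switching rule of the kind described might look like the following sketch; the normalization and threshold are our assumptions, not the paper's exact method.

```python
import math

# Sketch of entropy-based controller switching: while the action distribution
# at a state is near-uniform (high entropy, still exploring), keep the coarse
# action space; once entropy drops, switch to a finer controller. The
# threshold value is illustrative.
def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def pick_controller(action_probs, threshold=0.5):
    max_h = math.log(len(action_probs))  # entropy of the uniform policy
    return "coarse" if entropy(action_probs) / max_h > threshold else "fine"

print(pick_controller([0.25, 0.25, 0.25, 0.25]))  # coarse
print(pick_controller([0.94, 0.02, 0.02, 0.02]))  # fine
```

    Normalizing by the uniform-policy entropy keeps the threshold meaningful as the coarse and fine action sets differ in size.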

  20. Tunnel Ventilation Control Using Reinforcement Learning Methodology

    Science.gov (United States)

    Chu, Baeksuk; Kim, Dongnam; Hong, Daehie; Park, Jooyoung; Chung, Jin Taek; Kim, Tae-Hyung

    The main purpose of a tunnel ventilation system is to maintain the CO pollutant concentration and VI (visibility index) at adequate levels to provide drivers with a comfortable and safe driving environment. Moreover, it is necessary to minimize the power consumed to operate the ventilation system. To achieve these objectives, the control algorithm used in this research is the reinforcement learning (RL) method. RL is goal-directed learning of a mapping from situations to actions without relying on exemplary supervision or complete models of the environment. The goal of RL is to maximize a reward, which is an evaluative feedback from the environment. The reward of the tunnel ventilation system incorporates the two objectives listed above, that is, maintaining an adequate level of pollutants and minimizing power consumption. An RL algorithm based on an actor-critic architecture and a gradient-following algorithm is applied to the tunnel ventilation system. Simulation results based on real data collected from an existing tunnel ventilation system, together with experimental verification, are provided in this paper. It is confirmed that with the suggested controller, the pollutant level inside the tunnel was well maintained under the allowable limit, and energy consumption was improved compared to the conventional control scheme.
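
    The two objectives can be folded into a single scalar reward; the weights and limit values below are illustrative choices of ours, not the paper's.

```python
# Illustrative composition of the ventilation reward: penalize pollutant
# levels that exceed their limits, and penalize fan power linearly. The
# limits co_limit/vi_limit and the weight w_power are hypothetical.
def ventilation_reward(co_ppm, visibility, power_kw,
                       co_limit=25.0, vi_limit=0.5, w_power=0.01):
    penalty = 0.0
    if co_ppm > co_limit:
        penalty += co_ppm - co_limit      # CO above the allowable level
    if visibility < vi_limit:
        penalty += vi_limit - visibility  # visibility below the required index
    return -penalty - w_power * power_kw  # higher reward = cleaner and cheaper

print(ventilation_reward(co_ppm=20.0, visibility=0.8, power_kw=100.0))  # -1.0
```

    An actor-critic learner maximizing this reward is pushed to run the fans just hard enough to keep both indices within limits.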

  1. Flexible Heuristic Dynamic Programming for Reinforcement Learning in Quadrotors

    NARCIS (Netherlands)

    Helmer, Alexander; de Visser, C.C.; van Kampen, E.

    2018-01-01

    Reinforcement learning is a paradigm for learning decision-making tasks from interaction with the environment. Function approximators solve a part of the curse of dimensionality when learning in high-dimensional state and/or action spaces. It can be a time-consuming process to learn a good policy in

  2. The Multi-Agent Transport Simulation MATSim

    OpenAIRE

    Horni Andreas; Nagel Kai; Axhausen Kay W.

    2016-01-01

    "The MATSim (Multi-Agent Transport Simulation) software project was started around 2006 with the goal of generating traffic and congestion patterns by following individual synthetic travelers through their daily or weekly activity programme. It has since then evolved from a collection of stand-alone C++ programs to an integrated Java-based framework which is publicly hosted, open-source, and automatically regression-tested. It is currently used by about 40 groups throughout the world. T...

  3. Episodic reinforcement learning control approach for biped walking

    Directory of Open Access Journals (Sweden)

    Katić Duško

    2012-01-01

    Full Text Available This paper presents a hybrid dynamic control approach to the realization of humanoid biped robotic walk, focusing on the policy gradient episodic reinforcement learning with fuzzy evaluative feedback. The proposed structure of controller involves two feedback loops: a conventional computed torque controller and an episodic reinforcement learning controller. The reinforcement learning part includes fuzzy information about Zero-Moment- Point errors. Simulation tests using a medium-size 36-DOF humanoid robot MEXONE were performed to demonstrate the effectiveness of our method.

  4. An Analysis of Stochastic Game Theory for Multiagent Reinforcement Learning

    National Research Council Canada - National Science Library

    Bowling, Michael

    2000-01-01

    In this paper we contribute a comprehensive presentation of the relevant techniques for solving stochastic games from both the game theory and reinforcement learning communities. We examine the assumptions and limitations of these algorithms, and identify similarities between these algorithms, single-agent reinforcement learners, and basic game theory techniques.

  5. Lung Nodule Detection via Deep Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Issa Ali

    2018-04-01

    Full Text Available Lung cancer is the most common cause of cancer-related death globally. As a preventive measure, the United States Preventive Services Task Force (USPSTF) recommends annual screening of high-risk individuals with low-dose computed tomography (CT). The resulting volume of CT scans from millions of people will pose a significant challenge for radiologists to interpret. To fill this gap, computer-aided detection (CAD) algorithms may prove to be the most promising solution. A crucial first step in the analysis of lung cancer screening results using CAD is the detection of pulmonary nodules, which may represent early-stage lung cancer. The objective of this work is to develop and validate a reinforcement learning model based on deep artificial neural networks for early detection of lung nodules in thoracic CT images. Inspired by the AlphaGo system, our deep learning algorithm takes a raw CT image as input, views it as a collection of states, and outputs a classification of whether a nodule is present or not. The dataset used to train our model is the LIDC/IDRI database hosted by the lung nodule analysis (LUNA) challenge. In total, there are 888 CT scans with annotations based on agreement from at least three out of four radiologists. As a result, there are 590 individuals having one or more nodules, and 298 having none. Our training results yielded an overall accuracy of 99.1% [sensitivity 99.2%, specificity 99.1%, positive predictive value (PPV) 99.1%, negative predictive value (NPV) 99.2%]. In our test, the results yielded an overall accuracy of 64.4% (sensitivity 58.9%, specificity 55.3%, PPV 54.2%, and NPV 60.0%). These early results show promise in solving the major issue of false positives in CT screening of lung nodules, and may help to avoid unnecessary follow-up tests and expenditures.

  6. Neural Basis of Reinforcement Learning and Decision Making

    Science.gov (United States)

    Lee, Daeyeol; Seo, Hyojung; Jung, Min Whan

    2012-01-01

    Reinforcement learning is an adaptive process in which an animal utilizes its previous experience to improve the outcomes of future choices. Computational theories of reinforcement learning play a central role in the newly emerging areas of neuroeconomics and decision neuroscience. In this framework, actions are chosen according to their value functions, which describe how much future reward is expected from each action. Value functions can be adjusted not only through reward and penalty, but also by the animal’s knowledge of its current environment. Studies have revealed that a large proportion of the brain is involved in representing and updating value functions and using them to choose an action. However, how the nature of a behavioral task affects the neural mechanisms of reinforcement learning remains incompletely understood. Future studies should uncover the principles by which different computational elements of reinforcement learning are dynamically coordinated across the entire brain. PMID:22462543

  7. Integrating distributed Bayesian inference and reinforcement learning for sensor management

    NARCIS (Netherlands)

    Grappiolo, C.; Whiteson, S.; Pavlin, G.; Bakker, B.

    2009-01-01

    This paper introduces a sensor management approach that integrates distributed Bayesian inference (DBI) and reinforcement learning (RL). DBI is implemented using distributed perception networks (DPNs), a multiagent approach to performing efficient inference, while RL is used to automatically

  8. applying reinforcement learning to the weapon assignment problem

    African Journals Online (AJOL)

    ismith

    [Abstract not indexed; the search snippet mentions a Monte Carlo (MC) control algorithm with exploring starts (MCES), an off-policy method, the rule that the weapon closest to the threat should fire (that weapon also had the highest probability to ..., and the reference "Reinforcement learning: Theory, methods and application to."]

  9. Exploiting Best-Match Equations for Efficient Reinforcement Learning

    NARCIS (Netherlands)

    van Seijen, Harm; Whiteson, Shimon; van Hasselt, Hado; Wiering, Marco

    This article presents and evaluates best-match learning, a new approach to reinforcement learning that trades off the sample efficiency of model-based methods with the space efficiency of model-free methods. Best-match learning works by approximating the solution to a set of best-match equations,

  10. Safe Exploration of State and Action Spaces in Reinforcement Learning

    OpenAIRE

    Garcia, Javier; Fernandez, Fernando

    2014-01-01

    In this paper, we consider the important problem of safe exploration in reinforcement learning. While reinforcement learning is well-suited to domains with complex transition dynamics and high-dimensional state-action spaces, an additional challenge is posed by the need for safe and efficient exploration. Traditional exploration techniques are not particularly useful for solving dangerous tasks, where the trial and error process may lead to the selection of actions whose execution in some sta...

  11. An adaptive multi-agent-based approach to smart grids control and optimization

    Energy Technology Data Exchange (ETDEWEB)

    Carvalho, Marco [Florida Institute of Technology, Melbourne, FL (United States); Perez, Carlos; Granados, Adrian [Institute for Human and Machine Cognition, Ocala, FL (United States)

    2012-03-15

    In this paper, we describe a reinforcement learning-based approach to power management in smart grids. The scenarios we consider are smart grid settings where renewable power sources (e.g. Photovoltaic panels) have unpredictable variations in power output due, for example, to weather or cloud transient effects. Our approach builds on a multi-agent system (MAS)-based infrastructure for the monitoring and coordination of smart grid environments with renewable power sources and configurable energy storage devices (battery banks). Software agents are responsible for tracking and reporting power flow variations at different points in the grid, and to optimally coordinate the engagement of battery banks (i.e. charge/idle/discharge modes) to maintain energy requirements to end-users. Agents are able to share information and coordinate control actions through a parallel communications infrastructure, and are also capable of learning, from experience, how to improve their response strategies for different operational conditions. In this paper we describe our approach and address some of the challenges associated with the communications infrastructure for distributed coordination. We also present some preliminary results of our first simulations using the GridLAB-D simulation environment, created by the US Department of Energy (DoE) at Pacific Northwest National Laboratory (PNNL). (orig.)

  12. FY1995 distributed control of man-machine cooperative multi agent systems; 1995 nendo ningen kyochogata multi agent kikai system no jiritsu seigyo

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1997-03-01

    In the near future, distributed autonomous systems will be practical in many situations, e.g., interactive production systems, hazardous environments, nursing homes, and individual houses. The agents that make up such a distributed system must not harm human beings and should operate economically. In this project, man-machine cooperative multi-agent systems are studied from many perspectives, and basic design and control techniques are developed by establishing fundamental theories and by constructing experimental systems. Theoretical and experimental studies are conducted in the following sub-projects: (1) distributed cooperative control in multi-agent-type actuation systems; (2) control of non-holonomic systems; (3) man-machine cooperative systems; (4) robot systems learning human skills; (5) robust force control of constrained systems. Across the sub-projects, the topics studied include cooperation between machine agent systems and human beings, interference between artificial multi-agents and the environment and the emergence of new functions from their coordination, robust force control with respect to the environment, control methods for non-holonomic systems, and robot systems that can mimic and learn human skills. In each sub-project, key problems were highlighted and solutions were given based on the construction of experimental systems. (NEDO)

  13. Effect of reinforcement learning on coordination of multiagent systems

    Science.gov (United States)

    Bukkapatnam, Satish T. S.; Gao, Greg

    2000-12-01

    For effective coordination of distributed environments involving multiagent systems, the learning ability of each agent in the environment plays a crucial role. In this paper, we develop a simple group learning method based on reinforcement, and study its effect on coordination through application to a supply chain procurement scenario involving a computer manufacturer. Here, all parties are represented by self-interested, autonomous agents, each capable of performing specific simple tasks. They negotiate with each other to perform complex tasks and thus coordinate supply chain procurement. Reinforcement learning is intended to enable each agent to reach the best negotiable price within the shortest possible time. Our simulations of the application scenario under different learning strategies reveal the positive effects of reinforcement learning on both an agent's and the system's performance.

  14. Instructional control of reinforcement learning: a behavioral and neurocomputational investigation.

    Science.gov (United States)

    Doll, Bradley B; Jacobs, W Jake; Sanfey, Alan G; Frank, Michael J

    2009-11-24

    Humans learn how to behave directly through environmental experience and indirectly through rules and instructions. Behavior analytic research has shown that instructions can control behavior, even when such behavior leads to sub-optimal outcomes (Hayes, S. (Ed.). 1989. Rule-governed behavior: cognition, contingencies, and instructional control. Plenum Press.). Here we examine the control of behavior through instructions in a reinforcement learning task known to depend on striatal dopaminergic function. Participants selected between probabilistically reinforced stimuli, and were (incorrectly) told that a specific stimulus had the highest (or lowest) reinforcement probability. Despite experience to the contrary, instructions drove choice behavior. We present neural network simulations that capture the interactions between instruction-driven and reinforcement-driven behavior via two potential neural circuits: one in which the striatum is inaccurately trained by instruction representations coming from prefrontal cortex/hippocampus (PFC/HC), and another in which the striatum learns the environmentally based reinforcement contingencies, but is "overridden" at decision output. Both models capture the core behavioral phenomena but, because they differ fundamentally on what is learned, make distinct predictions for subsequent behavioral and neuroimaging experiments. Finally, we attempt to distinguish between the proposed computational mechanisms governing instructed behavior by fitting a series of abstract "Q-learning" and Bayesian models to subject data. The best-fitting model supports one of the neural models, suggesting the existence of a "confirmation bias" in which the PFC/HC system trains the reinforcement system by amplifying outcomes that are consistent with instructions while diminishing inconsistent outcomes.
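
    The confirmation-bias mechanism described above can be caricatured as a Q-learning update whose prediction error is amplified when the outcome agrees with the instructed stimulus and damped when it contradicts it; the bias factors below are our illustrative choices, not fitted values.

```python
# Sketch of instruction-biased value learning (our simplified reading of the
# PFC/HC-trains-striatum account): outcomes consistent with instruction are
# amplified, inconsistent ones diminished. alpha/boost/damp are illustrative.
def biased_q_update(Q, a, r, instructed_a, alpha=0.2, boost=2.0, damp=0.25):
    delta = r - Q[a]                      # prediction error
    if a == instructed_a:
        delta *= boost if delta > 0 else damp
    Q[a] += alpha * delta
    return Q

Q = {"A": 0.0, "B": 0.0}
biased_q_update(Q, "A", r=1.0, instructed_a="A")  # consistent: amplified
biased_q_update(Q, "B", r=1.0, instructed_a="A")  # uninstructed: standard
print(Q["A"] > Q["B"])  # True
```

    Even with identical rewards, the instructed stimulus accumulates value faster, which is how instructions can dominate choice despite contrary experience.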

  15. Adversarial Reinforcement Learning in a Cyber Security Simulation

    OpenAIRE

    Elderman, Richard; Pater, Leon; Thie, Albert; Drugan, Madalina; Wiering, Marco

    2017-01-01

    This paper focuses on cyber-security simulations in networks modeled as a Markov game with incomplete information and stochastic elements. The resulting game is an adversarial sequential decision-making problem played by two agents, the attacker and the defender. The two agents pit reinforcement learning techniques, such as neural networks, Monte Carlo learning and Q-learning, against each other and examine their effectiveness against learning opponents. The results showed that Monte Carlo lear...

  16. Study and Application of Reinforcement Learning in Cooperative Strategy of the Robot Soccer Based on BDI Model

    Directory of Open Access Journals (Sweden)

    Wu Bo-ying

    2009-11-01

    Full Text Available The dynamic cooperation model of multi-Agent is formed by combining reinforcement learning with the BDI model. In this model, the concept of individual optimization loses its meaning, because the payoff of each Agent depends not only on itself but also on the choices of the other Agents. All Agents pursue a common optimal solution and try to realize the united intention as a whole to the maximum limit. The robot moves toward its goal depending on the present positions of the other robots that cooperate with it and the present position of the ball. One of these cooperating robots is controlled by a human with a joystick. In this way, the Agent is ensured to search each state-action pair as frequently as possible when choosing movements, shortening the time needed to search the movement space and thus improving the convergence speed of reinforcement learning. The validity of the proposed cooperative strategy for robot soccer has been proved by combining theoretical analysis with a simulated robot soccer match (11 vs 11).

  17. Applications of Deep Learning and Reinforcement Learning to Biological Data.

    Science.gov (United States)

    Mahmud, Mufti; Kaiser, Mohammed Shamim; Hussain, Amir; Vassanelli, Stefano

    2018-06-01

    Rapid advances in hardware-based technologies during the past decades have opened up new possibilities for life scientists to gather multimodal data in various application domains, such as omics, bioimaging, medical imaging, and (brain/body)-machine interfaces. These have generated novel opportunities for development of dedicated data-intensive machine learning techniques. In particular, recent research in deep learning (DL), reinforcement learning (RL), and their combination (deep RL) promise to revolutionize the future of artificial intelligence. The growth in computational power accompanied by faster and increased data storage, and declining computing costs have already allowed scientists in various fields to apply these techniques on data sets that were previously intractable owing to their size and complexity. This paper provides a comprehensive survey on the application of DL, RL, and deep RL techniques in mining biological data. In addition, we compare the performances of DL techniques when applied to different data sets across various application domains. Finally, we outline open issues in this challenging research area and discuss future development perspectives.

  18. Social Learning, Reinforcement and Crime: Evidence from Three European Cities

    Science.gov (United States)

    Tittle, Charles R.; Antonaccio, Olena; Botchkovar, Ekaterina

    2012-01-01

    This study reports a cross-cultural test of Social Learning Theory using direct measures of social learning constructs and focusing on the causal structure implied by the theory. Overall, the results strongly confirm the main thrust of the theory. Prior criminal reinforcement and current crime-favorable definitions are highly related in all three…

  19. Systems control with generalized probabilistic fuzzy-reinforcement learning

    NARCIS (Netherlands)

    Hinojosa, J.; Nefti, S.; Kaymak, U.

    2011-01-01

    Reinforcement learning (RL) is a valuable learning method when the systems require a selection of control actions whose consequences emerge over long periods for which input-output data are not available. In most combinations of fuzzy systems and RL, the environment is considered to be

  20. An Interactive Tool for Creating Multi-Agent Systems and Interactive Agent-based Games

    DEFF Research Database (Denmark)

    Lund, Henrik Hautop; Pagliarini, Luigi

    2011-01-01

    Utilizing principles from parallel and distributed processing combined with inspiration from modular robotics, we developed the modular interactive tiles. As an educational tool, the modular interactive tiles facilitate the learning of multi-agent systems and interactive agent-based games...

  1. Multi-Agent Systems for E-Commerce

    OpenAIRE

    Solodukha, T. V.; Sosnovskiy, O. A.; Zhelezko, B. A.

    2009-01-01

    The article focuses on multi-agent systems (MAS) and domains that can benefit from multi-agent technology. In the last few years, the agent-based modeling (ABM) community has developed several practical ABM toolkits that enable individuals to develop agent-based applications. A comparison of agent-based modeling toolkits is given. Multi-agent systems are designed to handle changing and dynamic business processes. Any organization with complex and distributed business pro...

  2. Planning of Autonomous Multi-agent Intersection

    Directory of Open Access Journals (Sweden)

    Viksnin Ilya I.

    2016-01-01

    Full Text Available In this paper, we propose a traffic management system with agents acting on behalf of autonomous vehicles at a crossroads. As an alternative to existing solutions based on semi-autonomous control systems with a central control unit, the algorithm proposed in this paper applies the principles of decentralized multi-agent control. During their collaboration, agents generate an intersection plan and determine the optimal order of road intersection for a given criterion, based on the exchange of information about themselves and their environment. The paper contains optimization criteria for selecting possible routes and experiments performed to evaluate the proposed model. Experiment results show that this model can significantly reduce traffic density compared to traditional traffic management systems. Moreover, the proposed algorithm's efficiency increases with road traffic density. Furthermore, the absence of a central control unit in the system significantly reduces the negative impact of possible failures and hacker attacks.

  3. Human-level control through deep reinforcement learning

    Science.gov (United States)

    Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A.; Veness, Joel; Bellemare, Marc G.; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K.; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis

    2015-02-01

    The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
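
    The deep Q-network described above regresses Q(s, a) toward the one-step target r + γ max_a' Q(s', a'). A tabular version of that same update isolates the learning rule from the network; the toy chain environment below is a hypothetical illustration, not the Atari setting:

```python
import random

def q_learning_chain(n_states=5, gamma=0.9, alpha=0.5, episodes=500, seed=1):
    """Tabular Q-learning with the same one-step TD target a deep Q-network
    regresses toward: r + gamma * max_a' Q(s', a').  The environment is a
    toy chain: actions 0 (left) and 1 (right), reward 1 at the right end."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            if rng.random() < 0.2:                        # epsilon-greedy
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            bootstrap = 0.0 if s2 == n_states - 1 else max(Q[s2])
            Q[s][a] += alpha * (r + gamma * bootstrap - Q[s][a])
            s = s2
    return Q
```

    In the DQN the table is replaced by a neural network and the same target is minimized as a regression loss over replayed transitions.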

  5. Applications of Multi-Agent Technology to Power Systems

    Science.gov (United States)

    Nagata, Takeshi

    Currently, agents are the focus of intense interest in many sub-fields of computer science and artificial intelligence. Agents are being used in an increasingly wide variety of applications. Many important computing applications such as planning, process control, communication networks and concurrent systems will benefit from using a multi-agent system approach. A multi-agent system is a structure given by an environment together with a set of artificial agents capable of acting on this environment. Multi-agent models are oriented towards interactions, collaborative phenomena, and autonomy. This article presents applications of multi-agent technology to power systems.

  6. Social Cognition as Reinforcement Learning: Feedback Modulates Emotion Inference.

    Science.gov (United States)

    Zaki, Jamil; Kallman, Seth; Wimmer, G Elliott; Ochsner, Kevin; Shohamy, Daphna

    2016-09-01

    Neuroscientific studies of social cognition typically employ paradigms in which perceivers draw single-shot inferences about the internal states of strangers. Real-world social inference features markedly different parameters: people often encounter and learn about particular social targets (e.g., friends) over time and receive feedback about whether their inferences are correct or incorrect. Here, we examined this process and, more broadly, the intersection between social cognition and reinforcement learning. Perceivers were scanned using fMRI while repeatedly encountering three social targets who produced conflicting visual and verbal emotional cues. Perceivers guessed how targets felt and received feedback about whether they had guessed correctly. Visual cues reliably predicted one target's emotion, verbal cues predicted a second target's emotion, and neither reliably predicted the third target's emotion. Perceivers successfully used this information to update their judgments over time. Furthermore, trial-by-trial learning signals, estimated using two reinforcement learning models, tracked activity in the ventral striatum and ventromedial pFC, structures associated with reinforcement learning, as well as in regions associated with updating social impressions, including the TPJ. These data suggest that learning about others' emotions, like other forms of feedback learning, relies on domain-general reinforcement mechanisms as well as domain-specific social information processing.
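
    Reinforcement-learning models of the kind used to estimate these trial-by-trial signals are built around a prediction-error (delta-rule) update. A minimal sketch, with hypothetical variable names and a made-up trial sequence, is:

```python
def update_cue_weights(trials, alpha=0.3):
    """Delta-rule (prediction-error) learning of cue reliability.  Each trial
    is (cue, correct): which cue the perceiver relied on and whether feedback
    said the guess was right.  Names and parameters are illustrative."""
    w = {"visual": 0.5, "verbal": 0.5}    # prior reliability of each cue
    errors = []
    for cue, correct in trials:
        outcome = 1.0 if correct else 0.0
        delta = outcome - w[cue]          # prediction error drives learning
        errors.append(delta)
        w[cue] += alpha * delta
    return w, errors

# A target whose visual cue is reliable and whose verbal cue is not:
trials = [("visual", True)] * 10 + [("verbal", False)] * 10
weights, pe = update_cue_weights(trials)
```

    The `errors` sequence corresponds to the model-derived learning signals that were regressed against striatal and prefrontal activity.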

  7. Can model-free reinforcement learning explain deontological moral judgments?

    Science.gov (United States)

    Ayars, Alisabeth

    2016-05-01

    Dual-systems frameworks propose that moral judgments are derived from both an immediate emotional response, and controlled/rational cognition. Recently, Cushman (2013) proposed a new dual-system theory based on model-free and model-based reinforcement learning. Model-free learning attaches values to actions based on their history of reward and punishment, and explains some deontological, non-utilitarian judgments. Model-based learning involves the construction of a causal model of the world and allows for far-sighted planning; this form of learning fits well with utilitarian considerations that seek to maximize certain kinds of outcomes. I present three concerns regarding the use of model-free reinforcement learning to explain deontological moral judgment. First, many actions that humans find aversive from model-free learning are not judged to be morally wrong. Moral judgment must require something in addition to model-free learning. Second, there is a dearth of evidence for central predictions of the reinforcement account (e.g., that people with different reinforcement histories will, all else equal, make different moral judgments). Finally, to account for the effect of intention within the framework requires certain assumptions which lack support. These challenges are reasonable foci for future empirical/theoretical work on the model-free/model-based framework. Copyright © 2016 Elsevier B.V. All rights reserved.

  8. Time representation in reinforcement learning models of the basal ganglia

    Directory of Open Access Journals (Sweden)

    Samuel Joseph Gershman

    2014-01-01

    Full Text Available Reinforcement learning models have been influential in understanding many aspects of basal ganglia function, from reward prediction to action selection. Time plays an important role in these models, but there is still no theoretical consensus about what kind of time representation is used by the basal ganglia. We review several theoretical accounts and their supporting evidence. We then discuss the relationship between reinforcement learning models and the timing mechanisms that have been attributed to the basal ganglia. We hypothesize that a single computational system may underlie both reinforcement learning and interval timing—the perception of duration in the range of seconds to hours. This hypothesis, which extends earlier models by incorporating a time-sensitive action selection mechanism, may have important implications for understanding disorders like Parkinson's disease in which both decision making and timing are impaired.

  9. Adaptive Trajectory Tracking Control using Reinforcement Learning for Quadrotor

    Directory of Open Access Journals (Sweden)

    Wenjie Lou

    2016-02-01

    Full Text Available Inaccurate system parameters and unpredicted external disturbances affect the performance of non-linear controllers. In this paper, a new adaptive control algorithm under the reinforcement learning framework is proposed to stabilize a quadrotor helicopter. Based on a command-filtered non-linear control algorithm, adaptive elements are added and learned by policy-search methods. To predict the inaccurate system parameters, a new kernel-based regression learning method is provided. In addition, Policy learning by Weighting Exploration with the Returns (PoWER and Return Weighted Regression (RWR are utilized to learn the appropriate parameters for adaptive elements in order to cancel the effect of external disturbance. Furthermore, numerical simulations under several conditions are performed, and the ability of adaptive trajectory-tracking control with reinforcement learning is demonstrated.
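
    Both PoWER and return-weighted regression update policy parameters by averaging exploration noise weighted by the return of each rollout. A simplified, illustrative version of that update is sketched below; the toy objective and all names are assumptions, not the paper's quadrotor controller:

```python
import random

def power_update(theta, rollout, n_rollouts=20, sigma=0.3, seed=0):
    """One iteration of PoWER-style policy search: perturb the parameters
    with Gaussian exploration noise and average the perturbations weighted
    by each rollout's return (a simplified return-weighted update)."""
    rng = random.Random(seed)
    num = [0.0] * len(theta)
    den = 0.0
    for _ in range(n_rollouts):
        eps = [rng.gauss(0.0, sigma) for _ in theta]
        ret = rollout([t + e for t, e in zip(theta, eps)])
        for i, e in enumerate(eps):
            num[i] += e * ret
        den += ret
    if den <= 0.0:                        # no informative rollouts this round
        return list(theta)
    return [t + n / den for t, n in zip(theta, num)]

# Hypothetical stand-in for a controller's return: best at [1.0, -0.5].
def toy_return(params):
    return max(0.0, 3.0 - sum((p - g) ** 2 for p, g in zip(params, [1.0, -0.5])))

theta = [0.0, 0.0]
for it in range(100):                     # fresh exploration noise each iteration
    theta = power_update(theta, toy_return, seed=it)
```

    Because high-return perturbations receive more weight, the parameters drift toward the region of parameter space with the best returns without ever computing a gradient.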

  10. The Computational Development of Reinforcement Learning during Adolescence.

    Directory of Open Access Journals (Sweden)

    Stefano Palminteri

    2016-06-01

    Full Text Available Adolescence is a period of life characterised by changes in learning and decision-making. Learning and decision-making do not rely on a unitary system, but instead require the coordination of different cognitive processes that can be mathematically formalised as dissociable computational modules. Here, we aimed to trace the developmental time-course of the computational modules responsible for learning from reward or punishment, and learning from counterfactual feedback. Adolescents and adults carried out a novel reinforcement learning paradigm in which participants learned the association between cues and probabilistic outcomes, where the outcomes differed in valence (reward versus punishment) and feedback was either partial or complete (either the outcome of the chosen option only, or the outcomes of both the chosen and unchosen option, were displayed). Computational strategies changed during development: whereas adolescents' behaviour was better explained by a basic reinforcement learning algorithm, adults' behaviour integrated increasingly complex computational features, namely a counterfactual learning module (enabling enhanced performance in the presence of complete feedback) and a value contextualisation module (enabling symmetrical reward and punishment learning). Unlike adults, adolescent performance did not benefit from counterfactual (complete) feedback. In addition, while adults learned symmetrically from both reward and punishment, adolescents learned from reward but were less likely to learn from punishment. This tendency to rely on rewards and not to consider alternative consequences of actions might contribute to our understanding of decision-making in adolescence.
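
    A counterfactual learning module of the kind described updates the value of the unchosen option from its forgone outcome, mirroring the factual update. A minimal two-option sketch (illustrative, not the authors' fitted model) is:

```python
def rl_with_counterfactual(outcomes, alpha=0.2, alpha_cf=0.2):
    """Two-option value learning with a counterfactual module: under complete
    feedback the unchosen option is updated from its forgone outcome with its
    own learning rate.  `outcomes` is a list of (chosen, r_chosen, r_unchosen)
    tuples over options 0 and 1; a simplified sketch, not the fitted model."""
    Q = [0.0, 0.0]
    for chosen, r_c, r_u in outcomes:
        other = 1 - chosen
        Q[chosen] += alpha * (r_c - Q[chosen])     # factual prediction error
        Q[other] += alpha_cf * (r_u - Q[other])    # counterfactual prediction error
    return Q
```

    Setting `alpha_cf` to zero recovers the basic algorithm that better described adolescent behaviour, since forgone outcomes are then ignored.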

  11. Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening

    OpenAIRE

    He, Frank S.; Liu, Yang; Schwing, Alexander G.; Peng, Jian

    2016-01-01

    We propose a novel training algorithm for reinforcement learning which combines the strength of deep Q-learning with a constrained optimization approach to tighten optimality and encourage faster reward propagation. Our novel technique makes deep reinforcement learning more practical by drastically reducing the training time. We evaluate the performance of our approach on the 49 games of the challenging Arcade Learning Environment, and report significant improvements in both training time and...

  12. 2015 Special Sessions of the 13th International Conference on Practical Applications of Agents and Multi-Agent Systems

    CERN Document Server

    Hernández, Josefa; Mathieu, Philippe; Campbell, Andrew; Fernández-Caballero, Antonio; Moreno, María; Julián, Vicente; Alonso-Betanzos, Amparo; Jiménez-López, María; Botti, Vicente; Trends in Practical Applications of Agents, Multi-Agent Systems and Sustainability : the PAAMS Collection

    2015-01-01

    This volume presents the papers that have been accepted for the 2015 special sessions of the 13th International Conference on Practical Applications of Agents and Multi-Agent Systems, held at the University of Salamanca, Spain, on 3-5 June 2015: Agents Behaviours and Artificial Markets (ABAM); Agents and Mobile Devices (AM); Multi-Agent Systems and Ambient Intelligence (MASMAI); Web Mining and Recommender systems (WebMiRes); Learning, Agents and Formal Languages (LAFLang); Agent-based Modeling of Sustainable Behavior and Green Economies (AMSBGE); Emotional Software Agents (SSESA) and Intelligent Educational Systems (SSIES). The volume also includes the paper accepted for the Doctoral Consortium in PAAMS 2015. PAAMS, the International Conference on Practical Applications of Agents and Multi-Agent Systems, is an evolution of the International Workshop on Practical Applications of Agents and Multi-Agent Systems. PAAMS is an international yearly tribune to present, to discuss, and to disseminate the latest develo...

  13. A reusable multi-agent architecture for active intelligent websites

    NARCIS (Netherlands)

    Jonker, C.M.; Lam, R.A.; Treur, J.

    In this paper a reusable multi-agent architecture for intelligent Websites is presented and illustrated for an electronic department store. The architecture has been designed and implemented using the compositional design method for multi-agent systems DESIRE. The agents within this architecture are

  14. Reinforcement and inference in cross-situational word learning.

    Science.gov (United States)

    Tilles, Paulo F C; Fontanari, José F

    2013-01-01

    Cross-situational word learning is based on the notion that a learner can determine the referent of a word by finding something in common across many observed uses of that word. Here we propose an adaptive learning algorithm that contains a parameter that controls the strength of the reinforcement applied to associations between concurrent words and referents, and a parameter that regulates inference, which includes built-in biases, such as mutual exclusivity, and information of past learning events. By adjusting these parameters so that the model predictions agree with data from representative experiments on cross-situational word learning, we were able to explain the learning strategies adopted by the participants of those experiments in terms of a trade-off between reinforcement and inference. These strategies can vary wildly depending on the conditions of the experiments. For instance, for fast mapping experiments (i.e., the correct referent could, in principle, be inferred in a single observation) inference is prevalent, whereas for segregated contextual diversity experiments (i.e., the referents are separated in groups and are exhibited with members of their groups only) reinforcement is predominant. Other experiments are explained with more balanced doses of reinforcement and inference.
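
    The model class described combines a reinforcement parameter with inference biases such as mutual exclusivity. A simplified sketch of that trade-off (illustrative update equations, not the authors' exact model) is:

```python
def cross_situational_learning(events, n_words, n_referents, chi=0.5):
    """Cross-situational word learning with a reinforcement strength `chi`
    and a mutual-exclusivity inference bias (a simplified reading of the
    model class, not the authors' equations).  Each event pairs a heard
    word with the list of referents present in the scene."""
    A = [[1.0 / n_referents] * n_referents for _ in range(n_words)]
    for word, present in events:
        # Inference: referents already strongly claimed by another word are
        # excluded from consideration (mutual exclusivity).
        claimed = {max(range(n_referents), key=A[w].__getitem__)
                   for w in range(n_words) if w != word and max(A[w]) > 0.5}
        candidates = [r for r in present if r not in claimed] or list(present)
        # Reinforcement: move a fraction chi of probability mass onto the
        # surviving candidate referents.
        boost = chi / len(candidates)
        A[word] = [(1 - chi) * p + (boost if r in candidates else 0.0)
                   for r, p in enumerate(A[word])]
    return A

def guess(A, word):
    """Referent the learner currently associates most strongly with `word`."""
    return max(range(len(A[word])), key=A[word].__getitem__)
```

    With `chi` large and ambiguous scenes, inference via exclusion does most of the work, as in the fast mapping experiments; with small `chi`, disambiguation accumulates slowly across scenes, as in the contextual diversity experiments.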

  15. Multi-Agent Cooperative Target Search

    Directory of Open Access Journals (Sweden)

    Jinwen Hu

    2014-05-01

    Full Text Available This paper addresses a vision-based cooperative search for multiple mobile ground targets by a group of unmanned aerial vehicles (UAVs with limited sensing and communication capabilities. The airborne camera on each UAV has a limited field of view and its target discriminability varies as a function of altitude. First, by dividing the whole surveillance region into cells, a probability map can be formed for each UAV indicating the probability of target existence within each cell. Then, we propose a distributed probability map updating model which includes the fusion of measurement information, information sharing among neighboring agents, information decay and transmission due to environmental changes such as the target movement. Furthermore, we formulate the target search problem as a multi-agent cooperative coverage control problem by optimizing the collective coverage area and the detection performance. The proposed map updating model and the cooperative control scheme are distributed, i.e., assuming that each agent only communicates with its neighbors within its communication range. Finally, the effectiveness of the proposed algorithms is illustrated by simulation.
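
    The probability-map update described above combines measurement fusion, neighbor sharing, and information decay. One cycle can be sketched as follows; the parameter names (pd for detection probability, pf for false-alarm probability) are illustrative assumptions, not the paper's exact model:

```python
def update_probability_map(p, detection, pd=0.9, pf=0.1, neighbors=(), w=0.2, decay=0.05):
    """One cycle of a distributed target-probability-map update: Bayesian
    fusion of the local sensor reading per cell, mixing with neighboring
    agents' maps, and decay toward the uniform prior to account for
    target motion."""
    fused = []
    for pc, z in zip(p, detection):
        like1 = pd if z else 1.0 - pd      # P(reading | target in cell)
        like0 = pf if z else 1.0 - pf      # P(reading | cell empty)
        fused.append(like1 * pc / (like1 * pc + like0 * (1.0 - pc)))
    for q in neighbors:                    # consensus with neighbors' maps
        fused = [(1.0 - w) * a + w * b for a, b in zip(fused, q)]
    n = len(fused)                         # decay toward the uniform prior
    return [(1.0 - decay) * a + decay / n for a in fused]
```

    Each UAV would run this update over its own cells, passing only its map to neighbors within communication range, which keeps the scheme distributed.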

  16. Control Prosody using Multi-Agent System

    Directory of Open Access Journals (Sweden)

    Kenji MATSUI

    2014-03-01

    Full Text Available Persons who have undergone a laryngectomy have a few options to partially restore speech, but no completely satisfactory device exists. Even though the use of an electrolarynx (EL) is the easiest way for a patient to produce speech, it does not produce a natural tone and its appearance is far from normal. Because of that, and the fact that none of the devices are hands-free, the feasibility of using a motion sensor to replace a conventional EL user interface has been explored. A mobile-device motion sensor with a multi-agent platform has been used to investigate on/off and pitch-frequency control capability. A very small battery-operated ARM-based control unit has also been developed to evaluate the motion-sensor-based user interface. This control unit is placed on the wrist, and the vibration device is held against the throat using a support bandage. Two different methods were used to convert forearm tilt angle to pitch frequency: a linear mapping method and an F0 template-based method. A perceptual evaluation was performed with two well-trained normal speakers and ten subjects. The results of the evaluation showed that both methods are able to produce better speech quality in terms of naturalness.

  17. Continuum deformation of multi-agent systems

    CERN Document Server

    Rastgoftar, Hossein

    2016-01-01

    This monograph presents new algorithms for formation control of multi-agent systems (MAS) based on principles of continuum mechanics. Beginning with an overview of traditional methods, the author then introduces an innovative new approach whereby agents of an MAS are considered as particles in a continuum evolving in ℝ^n whose desired configuration is required to satisfy an admissible deformation function. The necessary theory and its validation on a mobile-agent-based swarm test bed are considered for two primary tasks: homogeneous transformation of the MAS and deployment of a random distribution of agents on a desired configuration. The framework for this model is based on homogeneous transformations for the evolution of an MAS under no inter-agent communication, local inter-agent communication, and intelligent perception by agents. Different communication protocols for MAS evolution, the robustness of tracking of a desired motion by an MAS evolving in ℝ^n, and the effect of communication delays in an MAS...

  18. Flow Navigation by Smart Microswimmers via Reinforcement Learning

    Science.gov (United States)

    Colabrese, Simona; Biferale, Luca; Celani, Antonio; Gustavsson, Kristian

    2017-11-01

    We have numerically modeled active particles which are able to acquire some limited knowledge of the fluid environment from simple mechanical cues and exert a control on their preferred steering direction. We show that those swimmers can learn effective strategies just by experience, using a reinforcement learning algorithm. As an example, we focus on smart gravitactic swimmers. These are active particles whose task is to reach the highest altitude within some time horizon, exploiting the underlying flow whenever possible. The reinforcement learning algorithm allows particles to learn effective strategies even in difficult situations when, in the absence of control, they would end up being trapped by flow structures. These strategies are highly nontrivial and cannot be easily guessed in advance. This work paves the way towards the engineering of smart microswimmers that solve difficult navigation problems. ERC AdG NewTURB 339032.

  19. Reinforcement and Systemic Machine Learning for Decision Making

    CERN Document Server

    Kulkarni, Parag

    2012-01-01

    There are always difficulties in making machines that learn from experience. Complete information is not always available, or it becomes available in bits and pieces over a period of time. With respect to systemic learning, there is a need to understand the impact of decisions and actions on a system over that period of time. This book takes a holistic approach to addressing that need and presents a new paradigm, creating new learning applications and, ultimately, more intelligent machines. The first book of its kind in this new an

  20. Reinforcement Learning for Online Control of Evolutionary Algorithms

    NARCIS (Netherlands)

    Eiben, A.; Horvath, Mark; Kowalczyk, Wojtek; Schut, Martijn

    2007-01-01

    The research reported in this paper is concerned with assessing the usefulness of reinforcement learning (RL) for on-line calibration of parameters in evolutionary algorithms (EA). We are running an RL procedure and the EA simultaneously, and the RL is changing the EA parameters on-the-fly. We

  1. Emotion in reinforcement learning agents and robots : A survey

    NARCIS (Netherlands)

    Moerland, T.M.; Broekens, D.J.; Jonker, C.M.

    2018-01-01

    This article provides the first survey of computational models of emotion in reinforcement learning (RL) agents. The survey focuses on agent/robot emotions, and mostly ignores human user emotions. Emotions are recognized as functional in decision-making by influencing motivation and action

  2. Perceptual learning rules based on reinforcers and attention

    NARCIS (Netherlands)

    Roelfsema, Pieter R.; van Ooyen, Arjen; Watanabe, Takeo

    2010-01-01

    How does the brain learn those visual features that are relevant for behavior? In this article, we focus on two factors that guide plasticity of visual representations. First, reinforcers cause the global release of diffusive neuromodulatory signals that gate plasticity. Second, attentional feedback

  3. Traffic light control by multiagent reinforcement learning systems

    NARCIS (Netherlands)

    Bakker, B.; Whiteson, S.; Kester, L.; Groen, F.C.A.; Babuška, R.; Groen, F.C.A.

    2010-01-01

    Traffic light control is one of the main means of controlling road traffic. Improving traffic control is important because it can lead to higher traffic throughput and reduced traffic congestion. This chapter describes multiagent reinforcement learning techniques for automatic optimization of

  5. Optimal Control via Reinforcement Learning with Symbolic Policy Approximation

    NARCIS (Netherlands)

    Kubalík, Jiří; Alibekov, Eduard; Babuska, R.; Dochain, Denis; Henrion, Didier; Peaucelle, Dimitri

    2017-01-01

    Model-based reinforcement learning (RL) algorithms can be used to derive optimal control laws for nonlinear dynamic systems. With continuous-valued state and input variables, RL algorithms have to rely on function approximators to represent the value function and policy mappings. This paper

  6. Joy, Distress, Hope, and Fear in Reinforcement Learning (Extended Abstract)

    NARCIS (Netherlands)

    Jacobs, E.J.; Broekens, J.; Jonker, C.M.

    2014-01-01

    In this paper we present a mapping between joy, distress, hope and fear, and Reinforcement Learning primitives. Joy/distress is a signal that is derived from the RL update signal, while hope/fear is derived from the utility of the current state. Agent-based simulation experiments replicate

  7. Reinforcement Learning Explains Conditional Cooperation and Its Moody Cousin.

    Directory of Open Access Journals (Sweden)

    Takahiro Ezaki

    2016-07-01

    Full Text Available Direct reciprocity, or repeated interaction, is a main mechanism to sustain cooperation under social dilemmas involving two individuals. For larger groups and networks, which are probably more relevant to understanding and engineering our society, experiments employing repeated multiplayer social dilemma games have suggested that humans often show conditional cooperation behavior and its moody variant. The mechanisms underlying these behaviors largely remain unclear. Here we provide a proximate account of this behavior by showing that individuals adopting a type of reinforcement learning, called aspiration learning, phenomenologically behave as conditional cooperators. By definition, individuals are satisfied if and only if the obtained payoff is larger than a fixed aspiration level. They reinforce actions that have resulted in satisfactory outcomes and anti-reinforce those yielding unsatisfactory outcomes. The results obtained in the present study are general in that they explain extant experimental results obtained for both so-called moody and non-moody conditional cooperation, for prisoner's dilemma and public goods games, and for well-mixed groups and networks. Unlike in previous theory, individuals are assumed to have no access to information about what other individuals are doing, so they cannot explicitly use conditional cooperation rules. In this sense, myopic aspiration learning, in which the unconditional propensity for cooperation is modulated in every discrete time step, explains the conditional behavior of humans. Aspiration learners showing (moody) conditional cooperation obeyed a noisy GRIM-like strategy. This is different from Pavlov, a reinforcement learning strategy promoting mutual cooperation in two-player situations.
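
    Aspiration learning as described reinforces an action when the realized payoff exceeds a fixed aspiration level and anti-reinforces it otherwise. A minimal sketch for a repeated two-player game (illustrative parameters, not the authors' fitted model) is:

```python
import random

def aspiration_learning(payoff, aspiration=1.0, beta=0.3, rounds=200, seed=3):
    """Aspiration learning in a repeated two-player game: each player holds a
    propensity to cooperate, reinforces the chosen action when the realized
    payoff exceeds a fixed aspiration level, and anti-reinforces it otherwise.
    A minimal sketch of the rule described, not the authors' fitted model."""
    rng = random.Random(seed)
    prob_c = [0.5, 0.5]                           # propensity to cooperate
    for _ in range(rounds):
        acts = [rng.random() < pc for pc in prob_c]
        for i in range(2):
            satisfied = payoff(acts[i], acts[1 - i]) > aspiration
            target = 1.0 if acts[i] else 0.0      # chosen action as a target
            if not satisfied:
                target = 1.0 - target             # push toward the other action
            prob_c[i] += beta * (target - prob_c[i])
    return prob_c

# Prisoner's dilemma payoffs for the focal player (R=3, S=0, T=5, P=1):
def pd_payoff(me_coop, other_coop):
    return {(True, True): 3.0, (True, False): 0.0,
            (False, True): 5.0, (False, False): 1.0}[(me_coop, other_coop)]
```

    With the prisoner's dilemma payoffs and an aspiration level between P and R, mutual cooperation satisfies both players and is reinforced, while mutual defection dissatisfies both and is destabilized, which produces the GRIM-like conditional pattern noted above.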

  8. Neurofeedback in Learning Disabled Children: Visual versus Auditory Reinforcement.

    Science.gov (United States)

    Fernández, Thalía; Bosch-Bayard, Jorge; Harmony, Thalía; Caballero, María I; Díaz-Comas, Lourdes; Galán, Lídice; Ricardo-Garcell, Josefina; Aubert, Eduardo; Otero-Ojeda, Gloria

    2016-03-01

    Children with learning disabilities (LD) frequently have an EEG characterized by an excess of theta and a deficit of alpha activity. Neurofeedback (NFB) using an auditory stimulus as a reinforcer has proven to be a useful tool to treat LD children by positively reinforcing decreases in the theta/alpha ratio. The aim of the present study was to optimize the NFB procedure by comparing the efficacy of visual (with eyes open) versus auditory (with eyes closed) reinforcers. Twenty LD children with an abnormally high theta/alpha ratio were randomly assigned to the Auditory or the Visual group, where a 500 Hz tone or a visual stimulus (a white square), respectively, was used as a positive reinforcer when the value of the theta/alpha ratio was reduced. Both groups had signs consistent with EEG maturation, but only the Auditory Group showed behavioral/cognitive improvements. In conclusion, the auditory reinforcer was more efficacious in reducing the theta/alpha ratio, and it improved cognitive abilities more than the visual reinforcer.

  9. Structure identification in fuzzy inference using reinforcement learning

    Science.gov (United States)

    Berenji, Hamid R.; Khedkar, Pratap

    1993-01-01

    In our previous work on the GARIC architecture, we have shown that the system can start with the surface structure of the knowledge base (i.e., the linguistic expression of the rules) and learn the deep structure (i.e., the fuzzy membership functions of the labels used in the rules) by using reinforcement learning. Assuming the surface structure, GARIC refines the fuzzy membership functions used in the consequents of the rules using a gradient descent procedure. This hybrid fuzzy logic and reinforcement learning approach can learn to balance a cart-pole system and to back up a truck to its docking location after a few trials. In this paper, we discuss how to do structure identification using reinforcement learning in fuzzy inference systems. This involves identifying both the surface and the deep structure of the knowledge base. The term set of fuzzy linguistic labels used in describing the values of each control variable must be derived. In this process, splitting a label refers to creating new labels which are more granular than the original label, and merging two labels creates a more general label. Splitting and merging of labels directly transform the structure of the action selection network used in GARIC by increasing or decreasing the number of hidden layer nodes.
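
    The splitting and merging operations on labels can be sketched with triangular membership functions; the specific split and merge rules below are illustrative assumptions, not GARIC's actual construction:

```python
def triangle(a, b, c):
    """Triangular membership function with support (a, c) and peak at b."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

def split_label(a, b, c):
    """Replace one label with two more granular labels covering its support."""
    return [(a, (a + b) / 2, b), (b, (b + c) / 2, c)]

def merge_labels(l1, l2):
    """Create a more general label spanning both supports, with the peak
    halfway between the two original peaks."""
    a = min(l1[0], l2[0])
    c = max(l1[2], l2[2])
    return (a, (l1[1] + l2[1]) / 2, c)

coarse = (0.0, 0.5, 1.0)
fine = split_label(*coarse)
print(fine)                  # two narrower labels
print(merge_labels(*fine))   # merging recovers a single general label
```

    Each split adds a hidden node for the new label in the action selection network, and each merge removes one, which is the structural change the abstract describes.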

  10. A multiplicative reinforcement learning model capturing learning dynamics and interindividual variability in mice

    OpenAIRE

    Bathellier, Brice; Tee, Sui Poh; Hrovat, Christina; Rumpel, Simon

    2013-01-01

    Learning speed can strongly differ across individuals. This is seen in humans and animals. Here, we measured learning speed in mice performing a discrimination task and developed a theoretical model based on the reinforcement learning framework to account for differences between individual mice. We found that, when using a multiplicative learning rule, the starting connectivity values of the model strongly determine the shape of learning curves. This is in contrast to current learning models ...

  11. Distributed Economic Dispatch in Microgrids Based on Cooperative Reinforcement Learning.

    Science.gov (United States)

    Liu, Weirong; Zhuang, Peng; Liang, Hao; Peng, Jun; Huang, Zhiwu

    2018-06-01

    Microgrids incorporated with distributed generation (DG) units and energy storage (ES) devices are expected to play more and more important roles in future power systems. Yet, achieving efficient distributed economic dispatch in microgrids is a challenging issue due to the randomness and nonlinear characteristics of DG units and loads. This paper proposes a cooperative reinforcement learning algorithm for distributed economic dispatch in microgrids. Using the learning algorithm avoids the difficulty of stochastic modeling and high computational complexity. In the cooperative reinforcement learning algorithm, function approximation is leveraged to deal with the large and continuous state spaces, and a diffusion strategy is incorporated to coordinate the actions of DG units and ES devices. Based on the proposed algorithm, each node in a microgrid only needs to communicate with its local neighbors, without relying on any centralized controller. Algorithm convergence is analyzed, and simulations based on real-world meteorological and load data are conducted to validate the performance of the proposed algorithm.
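
    The diffusion idea, each node combining only its neighbors' estimates so that information spreads without a centralized controller, can be sketched as follows (uniform combination weights and a ring topology are illustrative assumptions):

```python
# Diffusion combine step: each node replaces its parameter vector with the
# average of its own vector and its neighbors' vectors, so estimates spread
# through the network using purely local communication.
def diffusion_step(params, neighbors):
    new = {}
    for node, vec in params.items():
        group = [params[m] for m in neighbors[node]] + [vec]
        new[node] = [sum(col) / len(group) for col in zip(*group)]
    return new

# Ring of four nodes (standing in for DG units and ES devices).
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
params = {0: [4.0], 1: [0.0], 2: [0.0], 3: [0.0]}
for _ in range(50):
    params = diffusion_step(params, neighbors)
print([round(params[n][0], 3) for n in range(4)])
```

    Repeated combination drives every node toward the network-wide average (here 1.0) even though node 0 started as the only one with information; in the paper this combine step is interleaved with each node's local learning update.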

  12. Analysis of Bullying in Cooperative Multi-agent Systems’ Communications

    Directory of Open Access Journals (Sweden)

    Celia Gutiérrez

    2013-12-01

    Full Text Available Cooperative Multi-agent Systems frameworks do not yet include modules to test communications. The proposed framework incorporates robust analysis tools, using IDKAnalysis2.0, to evaluate the bullying effect in communications. The present work is based on ICARO-T. This platform follows the Adaptive Multi-agent Systems paradigm. Experimentation with ICARO-T includes two deployments: the equitative and the authoritative. Results confirm the usefulness of the analysis tools when exported to Cooperative Multi-agent Systems that use different configurations. Besides, ICARO-T is provided with new functionality by a set of tools for communication analysis.

  13. Optimal and Autonomous Control Using Reinforcement Learning: A Survey.

    Science.gov (United States)

    Kiumarsi, Bahare; Vamvoudakis, Kyriakos G; Modares, Hamidreza; Lewis, Frank L

    2018-06-01

    This paper reviews the current state of the art on reinforcement learning (RL)-based feedback control solutions to optimal regulation and tracking of single and multiagent systems. Existing RL solutions to optimal control problems, as well as graphical games, will be reviewed. RL methods learn the solution to optimal control and game problems online, using measured data along the system trajectories. We discuss Q-learning and the integral RL algorithm as core algorithms for discrete-time (DT) and continuous-time (CT) systems, respectively. Moreover, we discuss a new direction of off-policy RL for both CT and DT systems. Finally, we review several applications.

  14. Challenges in the Verification of Reinforcement Learning Algorithms

    Science.gov (United States)

    Van Wesel, Perry; Goodloe, Alwyn E.

    2017-01-01

    Machine learning (ML) is increasingly being applied to a wide array of domains from search engines to autonomous vehicles. These algorithms, however, are notoriously complex and hard to verify. This work looks at the assumptions underlying machine learning algorithms as well as some of the challenges in trying to verify ML algorithms. Furthermore, we focus on the specific challenges of verifying reinforcement learning algorithms. These are highlighted using a specific example. Ultimately, we do not offer a solution to the complex problem of ML verification, but point out possible approaches for verification and interesting research opportunities.

  15. Amygdala and ventral striatum make distinct contributions to reinforcement learning

    Science.gov (United States)

    Costa, Vincent D.; Monte, Olga Dal; Lucas, Daniel R.; Murray, Elisabeth A.; Averbeck, Bruno B.

    2016-01-01

    Reinforcement learning (RL) theories posit that dopaminergic signals are integrated within the striatum to associate choices with outcomes. Often overlooked is that the amygdala also receives dopaminergic input and is involved in Pavlovian processes that influence choice behavior. To determine the relative contributions of the ventral striatum (VS) and amygdala to appetitive RL, we tested rhesus macaques with VS or amygdala lesions on deterministic and stochastic versions of a two-arm bandit reversal learning task. When learning was characterized with an RL model, amygdala lesions, relative to controls, caused general decreases in learning from positive feedback and choice consistency. By comparison, VS lesions only affected learning in the stochastic task. Moreover, the VS lesions hastened the monkeys' choice reaction times, which emphasized a speed-accuracy tradeoff that accounted for errors in deterministic learning. These results update standard accounts of RL by emphasizing distinct contributions of the amygdala and VS to RL. PMID:27720488
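
    The kind of RL model used for such characterizations can be sketched as a delta-rule learner with a softmax choice rule, where the inverse temperature stands in for choice consistency (all parameter values below are illustrative, not fits from the paper):

```python
import math, random

def simulate_bandit(alpha, beta, p_reward=(0.8, 0.2), trials=500, seed=0):
    """Two-arm bandit: delta-rule value updates plus a softmax whose inverse
    temperature beta controls choice consistency."""
    rng = random.Random(seed)
    q = [0.0, 0.0]
    correct = 0
    for _ in range(trials):
        p0 = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
        choice = 0 if rng.random() < p0 else 1
        reward = 1.0 if rng.random() < p_reward[choice] else 0.0
        q[choice] += alpha * (reward - q[choice])  # prediction-error update
        correct += choice == 0                     # arm 0 is the better arm
    return correct / trials

# Lesion-like parameter changes: lowering the learning rate and the choice
# consistency reduces the fraction of optimal choices.
intact = simulate_bandit(alpha=0.3, beta=5.0)
impaired = simulate_bandit(alpha=0.05, beta=1.0)
print(intact > impaired)
```

    Fitting alpha and beta per animal is how lesion effects like "decreased learning from positive feedback" or "decreased choice consistency" are quantified against controls.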

  16. A Neuro-Control Design Based on Fuzzy Reinforcement Learning

    DEFF Research Database (Denmark)

    Katebi, S.D.; Blanke, M.

    This paper describes a neuro-control fuzzy critic design procedure based on reinforcement learning. An important component of the proposed intelligent control configuration is the fuzzy credit assignment unit which acts as a critic, and through fuzzy implications provides adjustment mechanisms. The fuzzy credit assignment unit comprises a fuzzy system with the appropriate fuzzification, knowledge base and defuzzification components. When an external reinforcement signal (a failure signal) is received, sequences of control actions are evaluated and modified by the action applier unit. The desirable ones instruct the neuro-control unit to adjust its weights and are simultaneously stored in the memory unit during the training phase. In response to the internal reinforcement signal (set point threshold deviation), the stored information is retrieved by the action applier unit and utilized for re...

  17. Projective Simulation compared to reinforcement learning

    OpenAIRE

    Bjerland, Øystein Førsund

    2015-01-01

    This thesis explores the model of projective simulation (PS), a novel approach for an artificial intelligence (AI) agent. The model of PS learns by interacting with the environment it is situated in, and allows for simulating actions before real action is taken. The action selection is based on a random walk through the episodic & compositional memory (ECM), which is a network of clips that represent previous experienced percepts. The network takes percepts as inpu...

  18. Reinforcement Learning Applications to Combat Identification

    Science.gov (United States)

    2017-03-01

    ruleset in an effort to mimic simplistic cognitive decision making of a TAO/MC and establishes parameters for the experimentation. Also, there is an ... the process knowledge and decision-making abilities of the human decision maker. "[A] cognitive architecture provides the fixed processes and ... have bearing on decisions to affect the learning rate in an operational implementation. 3. Cognitive Functions in CID The translation of the

  19. Reinforcement learning agents providing advice in complex video games

    Science.gov (United States)

    Taylor, Matthew E.; Carboni, Nicholas; Fachantidis, Anestis; Vlahavas, Ioannis; Torrey, Lisa

    2014-01-01

    This article introduces a teacher-student framework for reinforcement learning, synthesising and extending material that appeared in conference proceedings [Torrey, L., & Taylor, M. E. (2013). Teaching on a budget: Agents advising agents in reinforcement learning. Proceedings of the International Conference on Autonomous Agents and Multiagent Systems] and in a non-archival workshop paper [Carboni, N., & Taylor, M. E. (2013, May). Preliminary results for 1 vs. 1 tactics in StarCraft. Proceedings of the Adaptive and Learning Agents Workshop (at AAMAS-13)]. In this framework, a teacher agent instructs a student agent by suggesting actions the student should take as it learns. However, the teacher may only give such advice a limited number of times. We present several novel algorithms that teachers can use to budget their advice effectively, and we evaluate them in two complex video games: StarCraft and Pac-Man. Our results show that the same amount of advice, given at different moments, can have different effects on student learning, and that teachers can significantly affect student learning even when students use different learning methods and state representations.
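
    One way a teacher can budget advice, spending it only where its own Q-values say the choice matters, can be sketched as follows (the importance heuristic and the threshold are illustrative assumptions, not the paper's exact algorithms):

```python
# A teacher with a limited advice budget: it advises only in states where the
# spread of its own Q-values is large, i.e. where choosing badly is costly.
class BudgetedTeacher:
    def __init__(self, q_table, budget, threshold):
        self.q = q_table          # teacher's learned Q-values: state -> [q_a]
        self.budget = budget
        self.threshold = threshold

    def advise(self, state):
        """Return a suggested action index, or None to stay silent."""
        if self.budget <= 0 or state not in self.q:
            return None
        qs = self.q[state]
        importance = max(qs) - min(qs)
        if importance < self.threshold:
            return None           # save the budget for states that matter
        self.budget -= 1
        return qs.index(max(qs))  # suggest the teacher's greedy action

teacher = BudgetedTeacher({"s0": [0.1, 0.9], "s1": [0.5, 0.5]}, budget=1,
                          threshold=0.3)
print(teacher.advise("s1"))  # low importance: silent, budget preserved
print(teacher.advise("s0"))  # high importance: advises action 1
print(teacher.advise("s0"))  # budget exhausted: silent
```

    The student follows the advised action when one is given and otherwise acts on its own policy, so the timing of the limited advice, not just its amount, shapes learning.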

  20. Multi-Agent Modeling in Managing Six Sigma Projects

    Directory of Open Access Journals (Sweden)

    K. Y. Chau

    2009-10-01

    Full Text Available In this paper, a multi-agent model is proposed for considering the human resources factor in decision making in relation to the six sigma project. The proposed multi-agent system is expected to increase the accuracy of project prioritization and to stabilize the human resources service level. A simulation of the proposed multi-agent model is conducted. The results show that a multi-agent model which takes into consideration human resources when making decisions about project selection and project team formation is important in enabling efficient and effective project management. The multi-agent modeling approach provides an alternative approach for improving communication and the autonomy of six sigma projects in business organizations.

  1. Multi-Agent Information Classification Using Dynamic Acquaintance Lists.

    Science.gov (United States)

    Mukhopadhyay, Snehasis; Peng, Shengquan; Raje, Rajeev; Palakal, Mathew; Mostafa, Javed

    2003-01-01

    Discussion of automated information services focuses on information classification and collaborative agents, i.e. intelligent computer programs. Highlights include multi-agent systems; distributed artificial intelligence; thesauri; document representation and classification; agent modeling; acquaintances, or remote agents discovered through…

  2. Multi Agent System Based Wide Area Protection against Cascading Events

    DEFF Research Database (Denmark)

    Liu, Zhou; Chen, Zhe; Liu, Leo

    2012-01-01

    In this paper, a multi-agent system based wide area protection scheme is proposed in order to prevent long term voltage instability induced cascading events. The distributed relays and controllers work as a device agent which not only executes the normal function automatically but also can ... the effectiveness of the proposed protection strategy. The simulation results indicate that the proposed multi agent control system can effectively coordinate the distributed relays and controllers to prevent the long term voltage instability induced cascading events.

  3. Smart: Robotic Multi-Agent Systems

    Directory of Open Access Journals (Sweden)

    JOVANI ALBERTO JIMÉNEZ BUILES

    2008-01-01

    Full Text Available This article aims to give a global view of Robotic Multi-Agent Systems (MARS) by explaining the areas related to the topic, and then presents the Robotic Multi-Agent System SMART. SMART is an intelligent swarm composed of a mother robot and three beacon-type (guide) robots that collaboratively navigate a structured scenario.

  4. Reinforcement learning for optimal control of low exergy buildings

    International Nuclear Information System (INIS)

    Yang, Lei; Nagy, Zoltan; Goffin, Philippe; Schlueter, Arno

    2015-01-01

    Highlights: • Implementation of reinforcement learning control for LowEx Building systems. • Learning allows adaptation to local environment without prior knowledge. • Presentation of reinforcement learning control for real-life applications. • Discussion of the applicability for real-life situations. - Abstract: Over a third of the anthropogenic greenhouse gas (GHG) emissions stem from cooling and heating buildings, due to their fossil fuel based operation. Low exergy building systems are a promising approach to reduce energy consumption as well as GHG emissions. They consist of renewable energy technologies, such as PV, PV/T and heat pumps. Since careful tuning of parameters is required, a manual setup may result in sub-optimal operation. A model predictive control approach is unnecessarily complex due to the required model identification. Therefore, in this work we present a reinforcement learning control (RLC) approach. The studied building consists of a PV/T array for solar heat and electricity generation, as well as geothermal heat pumps. We present RLC for the PV/T array, and for the full building model. Two methods, Tabular Q-learning and Batch Q-learning with Memory Replay, are implemented with real building settings and actual weather conditions in a Matlab/Simulink framework. The performance is evaluated against standard rule-based control (RBC). We investigated different neural network structures and found that some outperformed RBC already during the learning phase. Overall, every RLC strategy for PV/T outperformed RBC by over 10% after the third year. Likewise, for the full building, RLC outperforms RBC in terms of meeting the heating demand, maintaining the optimal operation temperature, and compensating more effectively for ground heat. This makes it possible to reduce the engineering costs associated with the setup of these systems, as well as to shorten the return-on-investment period, both of which are necessary to create sustainable, zero-emission buildings.
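
    The Tabular Q-learning variant mentioned above can be sketched on a toy discretized control problem; the three-state "zone temperature" MDP and all parameter values are illustrative assumptions, not the building model from the paper:

```python
import random

def q_learning(transitions, rewards, episodes=500, alpha=0.5, gamma=0.9,
               eps=0.2, seed=0):
    """Tabular Q-learning on a small deterministic MDP given as
    (state, action) -> next state and (state, action) -> reward tables."""
    rng = random.Random(seed)
    states = sorted({s for s, _ in transitions})
    actions = sorted({a for _, a in transitions})
    q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(episodes):
        s = rng.choice(states)
        for _ in range(20):
            # Epsilon-greedy action selection.
            a = rng.choice(actions) if rng.random() < eps else max(
                actions, key=lambda a_: q[(s, a_)])
            s2, r = transitions[(s, a)], rewards[(s, a)]
            # One-step temporal-difference update toward the Bellman target.
            q[(s, a)] += alpha * (r + gamma * max(q[(s2, a2)] for a2 in actions)
                                  - q[(s, a)])
            s = s2
    return q

# Toy zone-temperature chain: states cold/ok/hot, actions heat/idle,
# reward only for sitting at "ok" without heating.
transitions = {("cold", "heat"): "ok", ("cold", "idle"): "cold",
               ("ok", "heat"): "hot", ("ok", "idle"): "ok",
               ("hot", "heat"): "hot", ("hot", "idle"): "ok"}
rewards = {k: (1.0 if k == ("ok", "idle") else 0.0) for k in transitions}
q = q_learning(transitions, rewards)
best = {s: max(["heat", "idle"], key=lambda a: q[(s, a)])
        for s in ["cold", "ok", "hot"]}
print(best)
```

    The learned greedy policy heats only when cold and idles otherwise; the batch variant with memory replay differs mainly in storing transitions and re-running such updates over the stored batch.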

  5. Pleasurable music affects reinforcement learning according to the listener

    Science.gov (United States)

    Gold, Benjamin P.; Frank, Michael J.; Bogert, Brigitte; Brattico, Elvira

    2013-01-01

    Mounting evidence links the enjoyment of music to brain areas implicated in emotion and the dopaminergic reward system. In particular, dopamine release in the ventral striatum seems to play a major role in the rewarding aspect of music listening. Striatal dopamine also influences reinforcement learning, such that subjects with greater dopamine efficacy learn better to approach rewards while those with lesser dopamine efficacy learn better to avoid punishments. In this study, we explored the practical implications of musical pleasure through its ability to facilitate reinforcement learning via non-pharmacological dopamine elicitation. Subjects from a wide variety of musical backgrounds chose a pleasurable and a neutral piece of music from an experimenter-compiled database, and then listened to one or both of these pieces (according to pseudo-random group assignment) as they performed a reinforcement learning task dependent on dopamine transmission. We assessed musical backgrounds as well as typical listening patterns with the new Helsinki Inventory of Music and Affective Behaviors (HIMAB), and separately investigated behavior for the training and test phases of the learning task. Subjects with more musical experience trained better with neutral music and tested better with pleasurable music, while those with less musical experience exhibited the opposite effect. HIMAB results regarding listening behaviors and subjective music ratings indicate that these effects arose from different listening styles: namely, more affective listening in non-musicians and more analytical listening in musicians. In conclusion, musical pleasure was able to influence task performance, and the shape of this effect depended on group and individual factors. These findings have implications in affective neuroscience, neuroaesthetics, learning, and music therapy. PMID:23970875

  6. Multiagent cooperation and competition with deep reinforcement learning.

    Directory of Open Access Journals (Sweden)

    Ardi Tampuu

    Full Text Available Evolution of cooperation and competition can appear when multiple adaptive agents share a biological, social, or technological niche. In the present work we study how cooperation and competition emerge between autonomous agents that learn by reinforcement while using only their raw visual input as the state representation. In particular, we extend the Deep Q-Learning framework to multiagent environments to investigate the interaction between two learning agents in the well-known video game Pong. By manipulating the classical rewarding scheme of Pong we show how competitive and collaborative behaviors emerge. We also describe the progression from competitive to collaborative behavior when the incentive to cooperate is increased. Finally we show how learning by playing against another adaptive agent, instead of against a hard-wired algorithm, results in more robust strategies. The present work shows that Deep Q-Networks can become a useful tool for studying decentralized learning of multiagent systems coping with high-dimensional environments.
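
    The reward-scheme manipulation is the central experimental knob here; as we read the setup, the scoring player receives a parameter rho while the conceding player receives -1, so rho = 1 is fully competitive and rho = -1 penalizes both players for every lost ball (a sketch, with this parameterization treated as an assumption):

```python
# Reward scheme for two-player Pong, parameterized by rho: the player who
# wins the point gets rho, the player who loses the ball gets -1.
def rewards(scorer, rho):
    """Return (reward_left, reward_right) when `scorer` wins the point."""
    if scorer == "left":
        return (rho, -1.0)
    return (-1.0, rho)

print(rewards("left", 1.0))    # (1.0, -1.0): zero-sum, fully competitive
print(rewards("left", -1.0))   # (-1.0, -1.0): losing the ball hurts both
```

    Sweeping rho from 1 toward -1 is what moves the two independent deep Q-learners from competitive scoring behavior to cooperatively keeping the ball in play.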

  7. Multiagent cooperation and competition with deep reinforcement learning.

    Science.gov (United States)

    Tampuu, Ardi; Matiisen, Tambet; Kodelja, Dorian; Kuzovkin, Ilya; Korjus, Kristjan; Aru, Juhan; Aru, Jaan; Vicente, Raul

    2017-01-01

    Evolution of cooperation and competition can appear when multiple adaptive agents share a biological, social, or technological niche. In the present work we study how cooperation and competition emerge between autonomous agents that learn by reinforcement while using only their raw visual input as the state representation. In particular, we extend the Deep Q-Learning framework to multiagent environments to investigate the interaction between two learning agents in the well-known video game Pong. By manipulating the classical rewarding scheme of Pong we show how competitive and collaborative behaviors emerge. We also describe the progression from competitive to collaborative behavior when the incentive to cooperate is increased. Finally we show how learning by playing against another adaptive agent, instead of against a hard-wired algorithm, results in more robust strategies. The present work shows that Deep Q-Networks can become a useful tool for studying decentralized learning of multiagent systems coping with high-dimensional environments.

  8. Multiagent cooperation and competition with deep reinforcement learning

    Science.gov (United States)

    Kodelja, Dorian; Kuzovkin, Ilya; Korjus, Kristjan; Aru, Juhan; Aru, Jaan; Vicente, Raul

    2017-01-01

    Evolution of cooperation and competition can appear when multiple adaptive agents share a biological, social, or technological niche. In the present work we study how cooperation and competition emerge between autonomous agents that learn by reinforcement while using only their raw visual input as the state representation. In particular, we extend the Deep Q-Learning framework to multiagent environments to investigate the interaction between two learning agents in the well-known video game Pong. By manipulating the classical rewarding scheme of Pong we show how competitive and collaborative behaviors emerge. We also describe the progression from competitive to collaborative behavior when the incentive to cooperate is increased. Finally we show how learning by playing against another adaptive agent, instead of against a hard-wired algorithm, results in more robust strategies. The present work shows that Deep Q-Networks can become a useful tool for studying decentralized learning of multiagent systems coping with high-dimensional environments. PMID:28380078

  9. Multiagent Reinforcement Learning Dynamic Spectrum Access in Cognitive Radios

    Directory of Open Access Journals (Sweden)

    Wu Chun

    2014-02-01

    Full Text Available A multiuser independent Q-learning method which does not need information interaction is proposed for multiuser dynamic spectrum access in cognitive radios. The method adopts a self-learning paradigm, in which each CR user performs reinforcement learning only by observing its individual performance reward, without spending communication resources on information interaction with others. The reward is defined suitably to represent channel quality and channel conflict status. A learning strategy of sufficient exploration, preference for good channels, and punishment for channel conflicts is designed to implement multiuser dynamic spectrum access. In a two-user, two-channel scenario, a fast learning algorithm is proposed and its convergence to the maximal total reward is proved. The simulation results show that, with the proposed method, the CR system converges to a Nash equilibrium with high probability and achieves a high total reward.
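
    The independent, interaction-free learning scheme can be sketched as follows; the epsilon schedule, reward values, and learning rate are illustrative assumptions, not those of the paper:

```python
import random

def spectrum_access(n_users=2, n_channels=2, steps=5000, alpha=0.1, seed=3):
    """Each user keeps a private per-channel value estimate and never
    communicates; conflicts are only observed through the reward."""
    rng = random.Random(seed)
    q = [[0.0] * n_channels for _ in range(n_users)]
    for t in range(steps):
        eps = max(0.05, 1.0 - t / 1000)   # sufficient exploration early on
        picks = [rng.randrange(n_channels) if rng.random() < eps
                 else row.index(max(row)) for row in q]
        for u, ch in enumerate(picks):
            # Reward reflects conflict status: punish collisions,
            # reward a channel used alone.
            r = -1.0 if picks.count(ch) > 1 else 1.0
            q[u][ch] += alpha * (r - q[u][ch])
    return [row.index(max(row)) for row in q]  # each user's learned channel

print(spectrum_access())
```

    With only private rewards, the conflict punishment is enough to drive the two users onto distinct channels, which is the anti-coordination equilibrium the paper analyzes.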

  10. Simulation-based optimization parametric optimization techniques and reinforcement learning

    CERN Document Server

    Gosavi, Abhijit

    2003-01-01

    Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning introduces the evolving area of simulation-based optimization. The book's objective is two-fold: (1) It examines the mathematical governing principles of simulation-based optimization, thereby providing the reader with the ability to model relevant real-life problems using these techniques. (2) It outlines the computational technology underlying these methods. Taken together these two aspects demonstrate that the mathematical and computational methods discussed in this book do work. Broadly speaking, the book has two parts: (1) parametric (static) optimization and (2) control (dynamic) optimization. Some of the book's special features are: *An accessible introduction to reinforcement learning and parametric-optimization techniques. *A step-by-step description of several algorithms of simulation-based optimization. *A clear and simple introduction to the methodology of neural networks. *A gentle introduction to converg...

  11. Reinforcement active learning in the vibrissae system: optimal object localization.

    Science.gov (United States)

    Gordon, Goren; Dorfman, Nimrod; Ahissar, Ehud

    2013-01-01

    Rats move their whiskers to acquire information about their environment. It has been observed that they palpate novel objects and objects they are required to localize in space. We analyze whisker-based object localization using two complementary paradigms, namely, active learning and intrinsic-reward reinforcement learning. Active learning algorithms select the next training samples according to the hypothesized solution in order to better discriminate between correct and incorrect labels. Intrinsic-reward reinforcement learning uses prediction errors as the reward to an actor-critic design, such that behavior converges to the one that optimizes the learning process. We show that in the context of object localization, the two paradigms result in palpation whisking as their respective optimal solution. These results suggest that rats may employ principles of active learning and/or intrinsic reward in tactile exploration and can guide future research to seek the underlying neuronal mechanisms that implement them. Furthermore, these paradigms are easily transferable to biomimetic whisker-based artificial sensors and can improve the active exploration of their environment. Copyright © 2012 Elsevier Ltd. All rights reserved.
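
    The intrinsic-reward idea, using prediction errors themselves as the reward signal, can be sketched in a few lines (the running-average predictor is an illustrative assumption, not the paper's actor-critic design):

```python
def intrinsic_rewards(observations, lr=0.5):
    """Track a running prediction and emit the absolute prediction error as
    the (intrinsic) reward for each observation."""
    pred, out = 0.0, []
    for obs in observations:
        out.append(abs(obs - pred))   # surprising observations are rewarding
        pred += lr * (obs - pred)     # update the internal prediction
    return out

# A repeated stimulus stops being rewarding once it is well predicted,
# pushing behavior toward whatever still improves the internal model.
rs = intrinsic_rewards([1.0] * 6)
print(rs)  # [1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125]
```

    Driving action selection by such a signal makes the agent seek informative contacts, which is how both paradigms end up favoring palpation-like repeated sampling of an object.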

  12. Human reinforcement learning subdivides structured action spaces by learning effector-specific values

    OpenAIRE

    Gershman, Samuel J.; Pesaran, Bijan; Daw, Nathaniel D.

    2009-01-01

    Humans and animals are endowed with a large number of effectors. Although this enables great behavioral flexibility, it presents an equally formidable reinforcement learning problem of discovering which actions are most valuable, due to the high dimensionality of the action space. An unresolved question is how neural systems for reinforcement learning – such as prediction error signals for action valuation associated with dopamine and the striatum – can cope with this “curse of dimensionality...

  13. Reinforcement learning produces dominant strategies for the Iterated Prisoner's Dilemma.

    Directory of Open Access Journals (Sweden)

    Marc Harper

    Full Text Available We present tournament results and several powerful strategies for the Iterated Prisoner's Dilemma created using reinforcement learning techniques (evolutionary and particle swarm algorithms). These strategies are trained to perform well against a corpus of over 170 distinct opponents, including many well-known and classic strategies. All the trained strategies win standard tournaments against the total collection of other opponents. The trained strategies and one particular human-designed strategy are also the top performers in noisy tournaments.
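
    The tournament setting can be sketched with a minimal iterated prisoner's dilemma match; the two strategies below are classic illustrations, not the trained strategies from the paper:

```python
# Payoffs (row player, column player) for one round of the prisoner's dilemma.
PD = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
      ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(my_hist, opp_hist):
    """Cooperate first, then copy the opponent's last move."""
    return opp_hist[-1] if opp_hist else "C"

def always_defect(my_hist, opp_hist):
    return "D"

def play_match(s1, s2, rounds=200):
    h1, h2, sc1, sc2 = [], [], 0, 0
    for _ in range(rounds):
        a1, a2 = s1(h1, h2), s2(h2, h1)
        p1, p2 = PD[(a1, a2)]
        h1.append(a1); h2.append(a2)
        sc1 += p1; sc2 += p2
    return sc1, sc2

print(play_match(tit_for_tat, tit_for_tat))    # (600, 600): mutual cooperation
print(play_match(tit_for_tat, always_defect))  # (199, 204): one exploited round
```

    A round-robin over a corpus of such strategy functions, with each strategy's total score summed across all matches, is the standard tournament structure the trained strategies are evaluated in.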

  14. Intranasal oxytocin enhances socially-reinforced learning in rhesus monkeys

    Directory of Open Access Journals (Sweden)

    Lisa A Parr

    2014-09-01

    Full Text Available There are currently no drugs approved for the treatment of social deficits associated with autism spectrum disorders (ASD). One hypothesis for these deficits is that individuals with ASD lack the motivation to attend to social cues because those cues are not implicitly rewarding. Therefore, any drug that could enhance the rewarding quality of social stimuli could have a profound impact on the treatment of ASD and other social disorders. Oxytocin (OT) is a neuropeptide that has been effective in enhancing social cognition and social reward in humans. The present study examined the ability of OT to selectively enhance learning after social compared to nonsocial reward in rhesus monkeys, an important species for modeling the neurobiology of social behavior in humans. Monkeys were required to learn an implicit visual matching task after receiving either intranasal oxytocin (IN-OT) or placebo (saline). Correct trials were rewarded with the presentation of positive and negative social (play faces/threat faces) or nonsocial (banana/cage locks) stimuli, plus food. Incorrect trials were not rewarded. Results demonstrated a strong effect of socially-reinforced learning: monkeys performed significantly better when reinforced with social versus nonsocial stimuli. Additionally, socially-reinforced learning was significantly better and occurred faster after IN-OT compared to placebo treatment. Performance in the IN-OT, but not placebo, condition was also significantly better when the reinforcement stimuli were emotionally positive compared to negative facial expressions. These data support the hypothesis that OT may function to enhance prosocial behavior in primates by increasing the rewarding quality of emotionally positive social images compared to emotionally negative or nonsocial images. These data also support the use of the rhesus monkey as a model for exploring the neurobiological basis of social behavior and its impairment.

  15. Reinforcement learning produces dominant strategies for the Iterated Prisoner's Dilemma.

    Science.gov (United States)

    Harper, Marc; Knight, Vincent; Jones, Martin; Koutsovoulos, Georgios; Glynatsi, Nikoleta E; Campbell, Owen

    2017-01-01

    We present tournament results and several powerful strategies for the Iterated Prisoner's Dilemma created using reinforcement learning techniques (evolutionary and particle swarm algorithms). These strategies are trained to perform well against a corpus of over 170 distinct opponents, including many well-known and classic strategies. All the trained strategies win standard tournaments against the total collection of other opponents. The trained strategies and one particular human-designed strategy are also the top performers in noisy tournaments.

  16. Emotion in reinforcement learning agents and robots: A survey

    OpenAIRE

    Moerland, T.M.; Broekens, D.J.; Jonker, C.M.

    2018-01-01

    This article provides the first survey of computational models of emotion in reinforcement learning (RL) agents. The survey focuses on agent/robot emotions, and mostly ignores human user emotions. Emotions are recognized as functional in decision-making by influencing motivation and action selection. Therefore, computational emotion models are usually grounded in the agent's decision making architecture, of which RL is an important subclass. Studying emotions in RL-based agents is useful for ...

  17. Regulated open multi-agent systems (ROMAS) a multi-agent approach for designing normative open systems

    CERN Document Server

    Garcia, Emilia; Botti, Vicente

    2015-01-01

    Addressing the open problem of engineering normative open systems using the multi-agent paradigm, normative open systems are explained as systems in which heterogeneous and autonomous entities and institutions coexist in a complex social and legal framework that can evolve to address the different and often conflicting objectives of the many stakeholders involved. Presenting  a software engineering approach which covers both the analysis and design of these kinds of systems, and which deals with the open issues in the area, ROMAS (Regulated Open Multi-Agent Systems) defines a specific multi-agent architecture, meta-model, methodology and CASE tool. This CASE tool is based on Model-Driven technology and integrates the graphical design with the formal verification of some properties of these systems by means of model checking techniques. Utilizing tables to enhance reader insights into the most important requirements for designing normative open multi-agent systems, the book also provides a detailed and easy t...

  18. Field tests applying multi-agent technology for distributed control. Virtual power plants and wind energy

    Energy Technology Data Exchange (ETDEWEB)

    Schaeffer, G.J.; Warmer, C.J.; Hommelberg, M.P.F.; Kamphuis, I.G.; Kok, J.K. [Energy in the Built Environment and Networks, Petten (Netherlands)

    2007-01-15

    Multi-agent technology is state-of-the-art ICT. It is not yet widely applied in power control systems, but it has a large potential for bottom-up, distributed control of a network with large-scale renewable energy sources (RES) and distributed energy resources (DER) in future power systems. At least two major European R and D projects (MicroGrids and CRISP) have investigated its potential, studying both grid-related and market-related applications. This paper focuses on two field tests, performed in the Netherlands, applying multi-agent control by means of the PowerMatcher concept. The first field test concerns the application of multi-agent technology in a commercial setting, i.e. reducing the need for balancing power in the case of intermittent energy sources such as wind energy. In this case, the flexibility of demand and supply of industrial and residential consumers and producers is used. Imbalance reduction rates of over 40% have been achieved applying the PowerMatcher, and with a proper portfolio even larger rates are expected. In the second field test the multi-agent technology is used in the design and implementation of a virtual power plant (VPP). This VPP digitally connects a number of micro-CHP units, installed in residential dwellings, into a cluster that is controlled to reduce the local peak demand of the common low-voltage grid segment the micro-CHP units are connected to. In this way the VPP supports the local distribution system operator (DSO) in deferring reinforcements in the grid infrastructure (substations and cables).

  19. Field tests applying multi-agent technology for distributed control. Virtual power plants and wind energy

    International Nuclear Information System (INIS)

    Schaeffer, G.J.; Warmer, C.J.; Hommelberg, M.P.F.; Kamphuis, I.G.; Kok, J.K.

    2007-01-01

    Multi-agent technology is state-of-the-art ICT. It is not yet widely applied in power control systems, but it has a large potential for bottom-up, distributed control of a network with large-scale renewable energy sources (RES) and distributed energy resources (DER) in future power systems. At least two major European R and D projects (MicroGrids and CRISP) have investigated its potential, studying both grid-related and market-related applications. This paper focuses on two field tests, performed in the Netherlands, applying multi-agent control by means of the PowerMatcher concept. The first field test concerns the application of multi-agent technology in a commercial setting, i.e. reducing the need for balancing power in the case of intermittent energy sources such as wind energy. In this case, the flexibility of demand and supply of industrial and residential consumers and producers is used. Imbalance reduction rates of over 40% have been achieved applying the PowerMatcher, and with a proper portfolio even larger rates are expected. In the second field test the multi-agent technology is used in the design and implementation of a virtual power plant (VPP). This VPP digitally connects a number of micro-CHP units, installed in residential dwellings, into a cluster that is controlled to reduce the local peak demand of the common low-voltage grid segment the micro-CHP units are connected to. In this way the VPP supports the local distribution system operator (DSO) in deferring reinforcements in the grid infrastructure (substations and cables).

  20. Multiagent Reinforcement Learning with Regret Matching for Robot Soccer

    Directory of Open Access Journals (Sweden)

    Qiang Liu

    2013-01-01

    This paper proposes a novel multiagent reinforcement learning (MARL) algorithm, Nash-Q learning with regret matching, in which regret matching is used to speed up the well-known MARL algorithm Nash-Q learning. Choosing a suitable action-selection strategy to balance exploration and exploitation is critical for enhancing the online learning ability of Nash-Q learning. In a Markov game, the joint action of agents adopting the regret matching algorithm can converge to a set of no-regret points, which can be viewed as a coarse correlated equilibrium that includes Nash equilibrium in essence. It can be inferred that regret matching can guide exploration of the state-action space so that the convergence rate of the Nash-Q learning algorithm can be increased. Simulation results on robot soccer validate that, compared to the original Nash-Q learning algorithm, the use of regret matching during the learning phase of Nash-Q learning yields excellent online learning ability and significant performance in terms of scores, average reward and policy convergence.
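    The regret-matching action-selection rule this abstract builds on can be sketched in a few lines. The two-action setup and payoff numbers below are invented for illustration and are not the paper's robot-soccer formulation.

```python
import random

def regret_matching_policy(cum_regret):
    """Map cumulative regrets to a mixed strategy (Hart & Mas-Colell style)."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total == 0.0:
        # No positive regret yet: play uniformly at random.
        n = len(cum_regret)
        return [1.0 / n] * n
    return [p / total for p in positive]

def update_regret(cum_regret, payoffs, chosen):
    """Accumulate regret for not having played each alternative action."""
    for a, pay in enumerate(payoffs):
        cum_regret[a] += pay - payoffs[chosen]

# Toy 2-action example: action 1 always pays more, so play shifts toward it.
random.seed(0)
regret = [0.0, 0.0]
payoffs = [0.2, 1.0]          # hypothetical per-action payoffs
for _ in range(100):
    policy = regret_matching_policy(regret)
    chosen = random.choices([0, 1], weights=policy)[0]
    update_regret(regret, payoffs, chosen)
print(regret_matching_policy(regret))
```

Actions with positive cumulative regret are played with probability proportional to that regret, which is what underlies the convergence to no-regret (coarse correlated equilibrium) play mentioned above.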

  1. Multi-agent Water Resources Management

    Science.gov (United States)

    Castelletti, A.; Giuliani, M.

    2011-12-01

    Increasing environmental awareness and emerging trends such as water trading, energy markets, deregulation and democratization of water-related services are challenging integrated water resources planning and management worldwide. The traditional approach to water management design based on sector-by-sector optimization has to be reshaped to account for multiple interrelated decision-makers and many stakeholders with increasing decision power. Centralized management, though interesting from a conceptual point of view, is unfeasible in most modern social and institutional contexts, and often economically inefficient. Coordinated management, where different actors interact within a full open trust exchange paradigm under some institutional supervision, is a promising alternative to the ideal centralized solution and the actual uncoordinated practices. This is a significant issue in most of the Southern Alps regulated lakes, where upstream hydropower reservoirs maximize their benefit independently from downstream users; it becomes even more relevant in the case of transboundary systems, where water management upstream affects water availability downstream (e.g. the River Zambesi flowing through Zambia, Zimbabwe and Mozambique, or the Red River flowing from South-Western China through Northern Vietnam). In this study we apply Multi-Agent Systems (MAS) theory to design an optimal management in a decentralized way, considering a set of multiple autonomous agents acting in the same environment and taking into account the pay-off of individual water users, which are inherently distributed along the river and need to coordinate to jointly reach their objectives. In this way each real-world actor, representing the decision-making entity (e.g. the operator of a reservoir or a diversion dam) can be represented one-to-one by a computer agent, defined as a computer system that is situated in some environment and that is capable of autonomous action in this environment in

  2. Modeling Avoidance in Mood and Anxiety Disorders Using Reinforcement Learning.

    Science.gov (United States)

    Mkrtchian, Anahit; Aylward, Jessica; Dayan, Peter; Roiser, Jonathan P; Robinson, Oliver J

    2017-10-01

    Serious and debilitating symptoms of anxiety are the most common mental health problem worldwide, accounting for around 5% of all adult years lived with disability in the developed world. Avoidance behavior-avoiding social situations for fear of embarrassment, for instance-is a core feature of such anxiety. However, as for many other psychiatric symptoms, the biological mechanisms underlying avoidance remain unclear. Reinforcement learning models provide formal and testable characterizations of the mechanisms of decision making; here, we examine avoidance in these terms. A total of 101 healthy participants and individuals with mood and anxiety disorders completed an approach-avoidance go/no-go task under stress induced by threat of unpredictable shock. We show an increased reliance in the mood and anxiety group on a parameter of our reinforcement learning model that characterizes a prepotent (Pavlovian) bias to withhold responding in the face of negative outcomes. This was particularly the case when the mood and anxiety group was under stress. This formal description of avoidance within the reinforcement learning framework provides a new means of linking clinical symptoms with biophysically plausible models of neural circuitry and, as such, takes us closer to a mechanistic understanding of mood and anxiety disorders. Copyright © 2017 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.
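    The kind of Pavlovian-bias parameter described here can be sketched in the style of go/no-go reinforcement learning models. The functional form and parameter names below are assumptions for illustration, not the authors' exact model.

```python
import math

def p_go(q_go, q_nogo, v_state, pav_bias, go_bias=0.0):
    """Probability of responding ('go') under a softmax with a Pavlovian term.

    A negative state value v_state combined with pav_bias > 0 suppresses 'go',
    mimicking a bias to withhold responding in the face of negative outcomes.
    """
    w_go = q_go + go_bias + pav_bias * v_state
    return 1.0 / (1.0 + math.exp(-(w_go - q_nogo)))

# In an aversive state (v_state < 0), a larger Pavlovian bias parameter
# lowers the probability of responding, even with identical learned values.
print(p_go(0.0, 0.0, -1.0, pav_bias=0.0))   # no bias
print(p_go(0.0, 0.0, -1.0, pav_bias=2.0))   # strong bias -> lower p(go)
```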

  3. Towards autonomous neuroprosthetic control using Hebbian reinforcement learning.

    Science.gov (United States)

    Mahmoudi, Babak; Pohlmeyer, Eric A; Prins, Noeline W; Geng, Shijia; Sanchez, Justin C

    2013-12-01

    Our goal was to design an adaptive neuroprosthetic controller that could learn the mapping from neural states to prosthetic actions and automatically adjust adaptation using only a binary evaluative feedback as a measure of desirability/undesirability of performance. Hebbian reinforcement learning (HRL) in a connectionist network was used for the design of the adaptive controller. The method combines the efficiency of supervised learning with the generality of reinforcement learning. The convergence properties of this approach were studied using both closed-loop control simulations and open-loop simulations that used primate neural data from robot-assisted reaching tasks. The HRL controller was able to perform classification and regression tasks using its episodic and sequential learning modes, respectively. In our experiments, the HRL controller quickly achieved convergence to an effective control policy, followed by robust performance. The controller also automatically stopped adapting the parameters after converging to a satisfactory control policy. Additionally, when the input neural vector was reorganized, the controller resumed adaptation to maintain performance. By estimating an evaluative feedback directly from the user, the HRL control algorithm may provide an efficient method for autonomous adaptation of neuroprosthetic systems. This method may enable the user to teach the controller the desired behavior using only a simple feedback signal.

  4. Functional Contour-following via Haptic Perception and Reinforcement Learning.

    Science.gov (United States)

    Hellman, Randall B; Tekin, Cem; van der Schaar, Mihaela; Santos, Veronica J

    2018-01-01

    Many tasks involve the fine manipulation of objects despite limited visual feedback. In such scenarios, tactile and proprioceptive feedback can be leveraged for task completion. We present an approach for real-time haptic perception and decision-making for a haptics-driven, functional contour-following task: the closure of a ziplock bag. This task is challenging for robots because the bag is deformable, transparent, and visually occluded by artificial fingertip sensors that are also compliant. A deep neural net classifier was trained to estimate the state of a zipper within a robot's pinch grasp. A Contextual Multi-Armed Bandit (C-MAB) reinforcement learning algorithm was implemented to maximize cumulative rewards by balancing exploration versus exploitation of the state-action space. The C-MAB learner outperformed a benchmark Q-learner by more efficiently exploring the state-action space while learning a hard-to-code task. The learned C-MAB policy was tested with novel ziplock bag scenarios and contours (wire, rope). Importantly, this work contributes to the development of reinforcement learning approaches that account for limited resources such as hardware life and researcher time. As robots are used to perform complex, physically interactive tasks in unstructured or unmodeled environments, it becomes important to develop methods that enable efficient and effective learning with physical testbeds.
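    A contextual multi-armed bandit of the general kind used here can be sketched with an epsilon-greedy learner over (context, action) values. The context and action names below are invented, and the paper's C-MAB algorithm is more sophisticated than this sketch.

```python
import random
from collections import defaultdict

class EpsilonGreedyCMAB:
    """Minimal contextual bandit: one epsilon-greedy value table per context."""
    def __init__(self, actions, epsilon=0.1):
        self.actions = actions
        self.epsilon = epsilon
        self.values = defaultdict(float)   # (context, action) -> running mean
        self.counts = defaultdict(int)

    def select(self, context):
        if random.random() < self.epsilon:
            return random.choice(self.actions)          # explore
        return max(self.actions,
                   key=lambda a: self.values[(context, a)])  # exploit

    def update(self, context, action, reward):
        key = (context, action)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]

random.seed(1)
bandit = EpsilonGreedyCMAB(actions=["pinch", "slide"], epsilon=0.1)
for _ in range(500):
    ctx = random.choice(["zipper_found", "zipper_lost"])
    act = bandit.select(ctx)
    # Hypothetical reward model: 'slide' is right only when the zipper is found.
    reward = 1.0 if (ctx, act) in {("zipper_found", "slide"),
                                   ("zipper_lost", "pinch")} else 0.0
    bandit.update(ctx, act, reward)
print(bandit.select("zipper_found"))
```

The exploration/exploitation balance is the same trade-off the abstract credits for the C-MAB learner's efficiency over a benchmark Q-learner.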

  5. Reinforcement Learning in the Game of Othello: Learning Against a Fixed Opponent and Learning from Self-Play

    NARCIS (Netherlands)

    van der Ree, Michiel; Wiering, Marco

    2013-01-01

    This paper compares three strategies in using reinforcement learning algorithms to let an artificial agent learn to play the game of Othello. The three strategies that are compared are: learning by self-play, learning from playing against a fixed opponent, and learning from playing against a fixed

  6. A Multi-Agent Based Energy Management Solution for Integrated Buildings and Microgrid System

    DEFF Research Database (Denmark)

    Anvari-Moghaddam, Amjad; Rahimi-Kian, Ashkan; Mirian, Maryam S.

    2017-01-01

    In this paper, an ontology-driven multi-agent based energy management system (EMS) is proposed for monitoring and optimal control of an integrated homes/buildings and microgrid system with various renewable energy resources (RESs) and controllable loads. Different agents ranging from simple-reflex to complex learning agents are designed and implemented to cooperate with each other to reach an optimal operating strategy for the mentioned integrated energy system (IES) while meeting the system’s objectives and related constraints. The optimization process for the EMS is defined as a coordinated distributed generation (DG) and demand response (DR) management problem within the studied environment and is solved by the proposed agent-based approach utilizing cooperation and communication among decision agents. To verify the effectiveness and applicability of the proposed multi-agent based EMS, several...

  7. Learning alternative movement coordination patterns using reinforcement feedback.

    Science.gov (United States)

    Lin, Tzu-Hsiang; Denomme, Amber; Ranganathan, Rajiv

    2018-05-01

    One of the characteristic features of the human motor system is redundancy-i.e., the ability to achieve a given task outcome using multiple coordination patterns. However, once participants settle on using a specific coordination pattern, the process of learning to use a new alternative coordination pattern to perform the same task is still poorly understood. Here, using two experiments, we examined this process of how participants shift from one coordination pattern to another using different reinforcement schedules. Participants performed a virtual reaching task, where they moved a cursor to different targets positioned on the screen. Our goal was to make participants use a coordination pattern with greater trunk motion, and to this end, we provided reinforcement by making the cursor disappear if the trunk motion during the reach did not cross a specified threshold value. In Experiment 1, we compared two reinforcement schedules in two groups of participants-an abrupt group, where the threshold was introduced immediately at the beginning of practice; and a gradual group, where the threshold was introduced gradually with practice. Results showed that both abrupt and gradual groups were effective in shifting their coordination patterns to involve greater trunk motion, but the abrupt group showed greater retention when the reinforcement was removed. In Experiment 2, we examined the basis of this advantage in the abrupt group using two additional control groups. Results showed that the advantage of the abrupt group was because of a greater number of practice trials with the desired coordination pattern. Overall, these results show that reinforcement can be successfully used to shift coordination patterns, which has potential in the rehabilitation of movement disorders.

  8. Manufacturing Scheduling Using Colored Petri Nets and Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Maria Drakaki

    2017-02-01

    Agent-based intelligent manufacturing control systems can efficiently respond and adapt to environmental changes. Manufacturing system adaptation and evolution can be addressed with learning mechanisms that increase the intelligence of agents. In this paper a manufacturing scheduling method is presented based on Timed Colored Petri Nets (CTPNs) and reinforcement learning (RL). CTPNs model the manufacturing system and implement the scheduling. In the search for an optimal solution a scheduling agent uses RL, in particular the Q-learning algorithm. A warehouse order-picking scheduling is presented as a case study to illustrate the method. The proposed scheduling method is compared to existing methods. Simulation and state space results are used to evaluate performance and identify system properties.
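    The Q-learning update at the core of such a scheduling agent can be sketched as follows. The toy 'queue' example, action names, and reward numbers are invented, not the paper's warehouse case study.

```python
import random

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) += alpha * (r + gamma * max_b Q(s',b) - Q(s,a))."""
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)

# Toy one-step scheduling episode: from a 'queue' state, ordering the shorter
# job first yields the better (invented) makespan reward.
random.seed(2)
ACTIONS = ["short_first", "long_first"]
Q = {}
for _ in range(200):
    a = random.choice(ACTIONS)
    r = 1.0 if a == "short_first" else 0.2
    q_update(Q, "queue", a, r, "done", ACTIONS)
print(max(ACTIONS, key=lambda a: Q.get(("queue", a), 0.0)))
```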

  9. Reinforcement Learning Based on the Bayesian Theorem for Electricity Markets Decision Support

    DEFF Research Database (Denmark)

    Sousa, Tiago; Pinto, Tiago; Praca, Isabel

    2014-01-01

    This paper presents the applicability of a reinforcement learning algorithm based on the application of the Bayesian theorem of probability. The proposed reinforcement learning algorithm is an advantageous and indispensable tool for ALBidS (Adaptive Learning strategic Bidding System), a multi...

  10. Reinforcement Learning Based Web Service Compositions for Mobile Business

    Science.gov (United States)

    Zhou, Juan; Chen, Shouming

    In this paper, we propose a new solution to Reactive Web Service Composition, via modeling with Reinforcement Learning, and introducing modified (alterable) QoS variables into the model as elements in the Markov Decision Process tuple. Moreover, we give an example of Reactive-WSC-based mobile banking, to demonstrate the intrinsic capability of the solution in question of obtaining the optimized service composition, characterized by (alterable) target QoS variable sets with optimized values. Consequently, we come to the conclusion that the solution has decent potential in boosting customer experiences and qualities of services in Web Services, and those in applications in the whole electronic commerce and business sector.

  11. Space Objects Maneuvering Detection and Prediction via Inverse Reinforcement Learning

    Science.gov (United States)

    Linares, R.; Furfaro, R.

    This paper determines the behavior of Space Objects (SOs) using inverse Reinforcement Learning (RL) to estimate the reward function that each SO is using for control. The approach discussed in this work can be used to analyze maneuvering of SOs from observational data. The inverse RL problem is solved using the Feature Matching approach. This approach determines the optimal reward function that a SO is using while maneuvering by assuming that the observed trajectories are optimal with respect to the SO's own reward function. This paper uses estimated orbital elements data to determine the behavior of SOs in a data-driven fashion.

  12. Finding intrinsic rewards by embodied evolution and constrained reinforcement learning.

    Science.gov (United States)

    Uchibe, Eiji; Doya, Kenji

    2008-12-01

    Understanding the design principle of reward functions is a substantial challenge both in artificial intelligence and neuroscience. Successful acquisition of a task usually requires not only rewards for goals, but also for intermediate states to promote effective exploration. This paper proposes a method for designing 'intrinsic' rewards of autonomous agents by combining constrained policy gradient reinforcement learning and embodied evolution. To validate the method, we use Cyber Rodent robots, in which collision avoidance, recharging from battery packs, and 'mating' by software reproduction are three major 'extrinsic' rewards. We show in hardware experiments that the robots can find appropriate 'intrinsic' rewards for the vision of battery packs and other robots to promote approach behaviors.

  13. Stochastic abstract policies: generalizing knowledge to improve reinforcement learning.

    Science.gov (United States)

    Koga, Marcelo L; Freire, Valdinei; Costa, Anna H R

    2015-01-01

    Reinforcement learning (RL) enables an agent to learn behavior by acquiring experience through trial-and-error interactions with a dynamic environment. However, knowledge is usually built from scratch and learning to behave may take a long time. Here, we improve the learning performance by leveraging prior knowledge; that is, the learner shows proper behavior from the beginning of a target task, using the knowledge from a set of known, previously solved, source tasks. In this paper, we argue that building stochastic abstract policies that generalize over past experiences is an effective way to provide such improvement and this generalization outperforms the current practice of using a library of policies. We achieve this by contributing a new algorithm, AbsProb-PI-multiple, and a framework for transferring knowledge represented as a stochastic abstract policy to new RL tasks. Stochastic abstract policies offer an effective way to encode knowledge because the abstraction they provide not only generalizes solutions but also facilitates extracting the similarities among tasks. We perform experiments in a robotic navigation environment and analyze the agent's behavior throughout the learning process and also assess the transfer ratio for different amounts of source tasks. We compare our method with the transfer of a library of policies, and experiments show that the use of a generalized policy produces better results by more effectively guiding the agent when learning a target task.

  14. Tackling Error Propagation through Reinforcement Learning: A Case of Greedy Dependency Parsing

    OpenAIRE

    Le, Minh; Fokkens, Antske

    2017-01-01

    Error propagation is a common problem in NLP. Reinforcement learning explores erroneous states during training and can therefore be more robust when mistakes are made early in a process. In this paper, we apply reinforcement learning to greedy dependency parsing which is known to suffer from error propagation. Reinforcement learning improves accuracy of both labeled and unlabeled dependencies of the Stanford Neural Dependency Parser, a high performance greedy parser, while maintaining its eff...

  15. 9th KES Conference on Agent and Multi-Agent Systems : Technologies and Applications

    CERN Document Server

    Howlett, Robert; Jain, Lakhmi

    2015-01-01

    Agents and multi-agent systems are related to a modern software paradigm which has long been recognized as a promising technology for constructing autonomous, complex and intelligent systems. The topics covered in this volume include agent-oriented software engineering, agent co-operation, co-ordination, negotiation, organization and communication, distributed problem solving, specification of agent communication languages, agent privacy, safety and security, formalization of ontologies and conversational agents. The volume highlights new trends and challenges in agent and multi-agent research and includes 38 papers classified in the following specific topics: learning paradigms, agent-based modeling and simulation, business model innovation and disruptive technologies, anthropic-oriented computing, serious games and business intelligence, design and implementation of intelligent agents and multi-agent systems, digital economy, and advances in networked virtual enterprises. Published p...

  16. Human reinforcement learning subdivides structured action spaces by learning effector-specific values.

    Science.gov (United States)

    Gershman, Samuel J; Pesaran, Bijan; Daw, Nathaniel D

    2009-10-28

    Humans and animals are endowed with a large number of effectors. Although this enables great behavioral flexibility, it presents an equally formidable reinforcement learning problem of discovering which actions are most valuable because of the high dimensionality of the action space. An unresolved question is how neural systems for reinforcement learning-such as prediction error signals for action valuation associated with dopamine and the striatum-can cope with this "curse of dimensionality." We propose a reinforcement learning framework that allows for learned action valuations to be decomposed into effector-specific components when appropriate to a task, and test it by studying to what extent human behavior and blood oxygen level-dependent (BOLD) activity can exploit such a decomposition in a multieffector choice task. Subjects made simultaneous decisions with their left and right hands and received separate reward feedback for each hand movement. We found that choice behavior was better described by a learning model that decomposed the values of bimanual movements into separate values for each effector, rather than a traditional model that treated the bimanual actions as unitary with a single value. A decomposition of value into effector-specific components was also observed in value-related BOLD signaling, in the form of lateralized biases in striatal correlates of prediction error and anticipatory value correlates in the intraparietal sulcus. These results suggest that the human brain can use decomposed value representations to "divide and conquer" reinforcement learning over high-dimensional action spaces.
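    The decomposition described here (separate learned values per effector rather than one value per joint bimanual action) can be illustrated with a toy simulation; the action names, rewards, and learning rate are invented.

```python
import itertools
import random

# Two effectors ('hands'), each with two possible movements.
LEFT, RIGHT = ["L1", "L2"], ["R1", "R2"]
alpha = 0.2

q_left = {a: 0.0 for a in LEFT}     # decomposed: one value table per effector
q_right = {a: 0.0 for a in RIGHT}
q_joint = {pair: 0.0 for pair in itertools.product(LEFT, RIGHT)}  # unitary

random.seed(3)
for _ in range(300):
    l, r = random.choice(LEFT), random.choice(RIGHT)
    # Hypothetical separate feedback per hand, as in the task described above.
    r_l = 1.0 if l == "L1" else 0.0
    r_r = 1.0 if r == "R2" else 0.0
    q_left[l] += alpha * (r_l - q_left[l])
    q_right[r] += alpha * (r_r - q_right[r])
    q_joint[(l, r)] += alpha * ((r_l + r_r) - q_joint[(l, r)])

# The decomposed learner reads its best joint action off per hand, with
# 2 + 2 values to learn instead of 2 * 2 joint values.
print(max(q_left, key=q_left.get), max(q_right, key=q_right.get))
```

The parameter-count gap is the "divide and conquer" advantage: with N effectors of k movements each, the decomposed learner estimates N*k values rather than k**N.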

  17. Reinforcement Learning in Distributed Domains: Beyond Team Games

    Science.gov (United States)

    Wolpert, David H.; Sill, Joseph; Tumer, Kagan

    2000-01-01

    Distributed search algorithms are crucial in dealing with large optimization problems, particularly when a centralized approach is not only impractical but infeasible. Many machine learning concepts have been applied to search algorithms in order to improve their effectiveness. In this article we present an algorithm that blends Reinforcement Learning (RL) and hill climbing directly, by using the RL signal to guide the exploration step of a hill climbing algorithm. We apply this algorithm to the domain of a constellation of communication satellites where the goal is to minimize the loss of importance-weighted data. We introduce the concept of 'ghost' traffic, where correctly setting this traffic induces the satellites to act to optimize the world utility. Our results indicate that the bi-utility search introduced in this paper outperforms both traditional hill climbing algorithms and distributed RL approaches such as team games.

  18. Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation.

    Science.gov (United States)

    Kato, Ayaka; Morita, Kenji

    2016-10-01

    It has been suggested that dopamine (DA) represents reward-prediction-error (RPE) defined in reinforcement learning and therefore DA responds to unpredicted but not predicted reward. However, recent studies have found DA response sustained towards predictable reward in tasks involving self-paced behavior, and suggested that this response represents a motivational signal. We have previously shown that RPE can sustain if there is decay/forgetting of learned-values, which can be implemented as decay of synaptic strengths storing learned-values. This account, however, did not explain the suggested link between tonic/sustained DA and motivation. In the present work, we explored the motivational effects of the value-decay in self-paced approach behavior, modeled as a series of 'Go' or 'No-Go' selections towards a goal. Through simulations, we found that the value-decay can enhance motivation, specifically, facilitate fast goal-reaching, albeit counterintuitively. Mathematical analyses revealed that underlying potential mechanisms are twofold: (1) decay-induced sustained RPE creates a gradient of 'Go' values towards a goal, and (2) value-contrasts between 'Go' and 'No-Go' are generated because while chosen values are continually updated, unchosen values simply decay. Our model provides potential explanations for the key experimental findings that suggest DA's roles in motivation: (i) slowdown of behavior by post-training blockade of DA signaling, (ii) observations that DA blockade severely impairs effortful actions to obtain rewards while largely sparing seeking of easily obtainable rewards, and (iii) relationships between the reward amount, the level of motivation reflected in the speed of behavior, and the average level of DA. These results indicate that reinforcement learning with value-decay, or forgetting, provides a parsimonious mechanistic account for the DA's roles in value-learning and motivation. Our results also suggest that when biological systems for value-learning
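    The core value-decay mechanism can be illustrated with a few lines of simulation: with decay, the reward-prediction error toward a fully predictable reward settles at a positive value instead of vanishing. The learning rate, decay rate, and reward size below are arbitrary choices, not the paper's parameters.

```python
def run(decay, trials=500, alpha=0.3, reward=1.0):
    """Repeatedly deliver a fully predictable reward and return the final RPE."""
    v = 0.0
    rpe = 0.0
    for _ in range(trials):
        rpe = reward - v          # prediction error at reward delivery
        v += alpha * rpe          # standard value update
        v *= (1.0 - decay)        # decay/forgetting of the stored value
    return rpe

print(run(decay=0.0))   # without decay, the RPE vanishes
print(run(decay=0.1))   # with decay, a sustained positive RPE remains
```

The sustained positive RPE under decay is the model's candidate for the sustained DA signal toward predictable reward discussed above.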

  19. Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation.

    Directory of Open Access Journals (Sweden)

    Ayaka Kato

    2016-10-01

    It has been suggested that dopamine (DA) represents reward-prediction-error (RPE) defined in reinforcement learning and therefore DA responds to unpredicted but not predicted reward. However, recent studies have found DA response sustained towards predictable reward in tasks involving self-paced behavior, and suggested that this response represents a motivational signal. We have previously shown that RPE can sustain if there is decay/forgetting of learned-values, which can be implemented as decay of synaptic strengths storing learned-values. This account, however, did not explain the suggested link between tonic/sustained DA and motivation. In the present work, we explored the motivational effects of the value-decay in self-paced approach behavior, modeled as a series of 'Go' or 'No-Go' selections towards a goal. Through simulations, we found that the value-decay can enhance motivation, specifically, facilitate fast goal-reaching, albeit counterintuitively. Mathematical analyses revealed that underlying potential mechanisms are twofold: (1) decay-induced sustained RPE creates a gradient of 'Go' values towards a goal, and (2) value-contrasts between 'Go' and 'No-Go' are generated because while chosen values are continually updated, unchosen values simply decay. Our model provides potential explanations for the key experimental findings that suggest DA's roles in motivation: (i) slowdown of behavior by post-training blockade of DA signaling, (ii) observations that DA blockade severely impairs effortful actions to obtain rewards while largely sparing seeking of easily obtainable rewards, and (iii) relationships between the reward amount, the level of motivation reflected in the speed of behavior, and the average level of DA. These results indicate that reinforcement learning with value-decay, or forgetting, provides a parsimonious mechanistic account for the DA's roles in value-learning and motivation. Our results also suggest that when biological systems

  20. Reinforcement learning for dpm of embedded visual sensor nodes

    International Nuclear Information System (INIS)

    Khani, U.; Sadhayo, I. H.

    2014-01-01

    This paper proposes an RL (Reinforcement Learning) based DPM (Dynamic Power Management) technique to learn time out policies during the operation of a visual sensor node that has multiple power/performance states. As opposed to the widely used static time out policies, our proposed DPM policy, also referred to as OLTP (Online Learning of Time out Policies), learns to dynamically change the time out decisions in the different node states, including the non-operational states. The selection of time out values in different power/performance states of a visual sensing platform is based on workload estimates derived from an ML-ANN (Multi-Layer Artificial Neural Network) and an objective function given by weighted performance and power parameters. The DPM approach is also able to dynamically adjust the power-performance weights online to satisfy a given constraint of either power consumption or performance. Results show that the proposed learning algorithm explores the power-performance tradeoff with non-stationary workload and outperforms other DPM policies. It also performs online adjustment of the tradeoff parameters in order to meet a user-specified constraint. (author)
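    A much-simplified sketch of learning a good timeout value from sampled idle periods under a weighted power/latency objective follows. All the candidate timeouts, idle-period samples, and the reward shape are invented stand-ins for the paper's OLTP scheme.

```python
import random

# Candidate timeout values (ms): how long to stay awake before sleeping.
TIMEOUTS = [10, 50, 200]

def reward(timeout, idle):
    """Invented power/latency objective for one idle period."""
    if idle >= timeout:
        # Device slept: pay energy for the timeout window plus a wake-up cost.
        return -0.001 * timeout - 0.5
    # Device stayed awake through a short idle period: pay idle energy only.
    return -0.001 * idle

random.seed(4)
q = {t: 0.0 for t in TIMEOUTS}      # running mean reward per timeout value
n = {t: 0 for t in TIMEOUTS}
for _ in range(3000):
    t = random.choice(TIMEOUTS)      # pure exploration, for the sketch only
    idle = random.choice([5, 20, 500])   # sampled idle-period workload
    n[t] += 1
    q[t] += (reward(t, idle) - q[t]) / n[t]
print(max(q, key=q.get))
```

A middle timeout wins here because a very short one sleeps too eagerly (frequent wake-up costs) and a very long one wastes idle energy, which is the power-performance tradeoff the abstract describes.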

  1. Explicit and implicit reinforcement learning across the psychosis spectrum.

    Science.gov (United States)

    Barch, Deanna M; Carter, Cameron S; Gold, James M; Johnson, Sheri L; Kring, Ann M; MacDonald, Angus W; Pizzagalli, Diego A; Ragland, J Daniel; Silverstein, Steven M; Strauss, Milton E

    2017-07-01

    Motivational and hedonic impairments are core features of a variety of types of psychopathology. An important aspect of motivational function is reinforcement learning (RL), including implicit (i.e., outside of conscious awareness) and explicit (i.e., including explicit representations about potential reward associations) learning, as well as both positive reinforcement (learning about actions that lead to reward) and punishment (learning to avoid actions that lead to loss). Here we present data from paradigms designed to assess both positive and negative components of both implicit and explicit RL, examine performance on each of these tasks among individuals with schizophrenia, schizoaffective disorder, and bipolar disorder with psychosis, and examine their relative relationships to specific symptom domains transdiagnostically. None of the diagnostic groups differed significantly from controls on the implicit RL tasks in either bias toward a rewarded response or bias away from a punished response. However, on the explicit RL task, both the individuals with schizophrenia and schizoaffective disorder performed significantly worse than controls, but the individuals with bipolar disorder did not. Worse performance on the explicit RL task, but not the implicit RL task, was related to worse motivation and pleasure symptoms across all diagnostic categories. Performance on explicit RL, but not implicit RL, was related to working memory, which accounted for some of the diagnostic group differences. However, working memory did not account for the relationship of explicit RL to motivation and pleasure symptoms. These findings suggest transdiagnostic relationships across the spectrum of psychotic disorders between motivation and pleasure impairments and explicit RL. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  2. Theories about architecture and performance of multi-agent systems

    NARCIS (Netherlands)

    Gazendam, Henk W.M.; Jorna, René J.

    1998-01-01

    Multi-agent systems are promising as models of organization because they are based on the idea that most work in human organizations is done based on intelligence, communication, cooperation, and massive parallel processing. They offer an alternative for system theories of organization, which are

  3. Multi-agent simulation of purchasing activities in organizations

    NARCIS (Netherlands)

    Ebben, Mark; de Boer, L.; Sitar-Pop, C.E.; Yucesan, E.; Chen, C.H.; Snowdon, J.L.; Charnes, J.M.

    2002-01-01

    In this paper we present a multi-agent simulation model to investigate purchasing activities in an organizational environment. The starting point is the observation that the majority of purchasing activities in organizations are usually performed without any involvement of the organization's

  4. PDL as a multi-agent strategy logic

    NARCIS (Netherlands)

    D.J.N. van Eijck (Jan); B.C. Schipper

    2013-01-01

    Propositional Dynamic Logic or PDL was invented as a logic for reasoning about regular programming constructs. We propose a new perspective on PDL as a multi-agent strategic logic (MASL). This logic for strategic reasoning has group strategies as first class citizens, and

  5. Information and Intertemporal Choices in Multi-Agent Decision Problems

    OpenAIRE

    Mariagrazia Olivieri; Massimo Squillante; Viviana Ventre

    2016-01-01

    Psychological evidence of impulsivity and the false consensus effect leads to results far from rationality. It is shown that impulsivity modifies the discount function of each individual, and the false consensus effect increases the degree of consensus in a multi-agent decision problem. Analyzing them together, we note that in strategic interactions these two human factors lead to choices that change the equilibria expected by rational individuals.

  6. Logics for Intelligent Agents and Multi-Agent Systems

    NARCIS (Netherlands)

    Meyer, John-Jules Charles

    2014-01-01

    This chapter presents the history of the application of logic in a quite popular paradigm in contemporary computer science and artificial intelligence, viz. the area of intelligent agents and multi-agent systems. In particular we discuss the logics that have been used to specify single agents, the

  7. Multi-agent Pareto appointment exchanging in hospital patient scheduling

    NARCIS (Netherlands)

    I.B. Vermeulen (Ivan); S.M. Bohte (Sander); D.J.A. Somefun (Koye); J.A. La Poutré (Han)

    2007-01-01

    We present a dynamic and distributed approach to the hospital patient scheduling problem, in which patients can have multiple appointments that have to be scheduled to different resources. To efficiently solve this problem we develop a multi-agent Pareto-improvement appointment

  8. Multi-agent Pareto appointment exchanging in hospital patient scheduling

    NARCIS (Netherlands)

    Vermeulen, I.B.; Bohté, S.M.; Somefun, D.J.A.; Poutré, La J.A.

    2007-01-01

    We present a dynamic and distributed approach to the hospital patient scheduling problem, in which patients can have multiple appointments that have to be scheduled to different resources. To efficiently solve this problem we develop a multi-agent Pareto-improvement appointment exchanging algorithm:

  9. Multi-agent Cooperation in a Planning Framework

    NARCIS (Netherlands)

    De Weerdt, M.M.; Bos, A.; Tonino, J.F.M.; Witteveen, C.

    2000-01-01

    The promise of multi-agent systems is that multiple agents can solve problems more efficiently than single agents can. In this paper we propose a method to implement cooperation between agents in the planning phase, in order to achieve more cost-effective solutions than without cooperation. Two

  10. A Resource Logic for Multi-Agent Plan Merging

    NARCIS (Netherlands)

    De Weerdt, M.M.; Bos, A.; Tonino, H.; Witteveen, C.

    2003-01-01

    In a multi-agent system, agents are carrying out certain tasks by executing plans. Consequently, the problem of finding a plan, given a certain goal, has been given a lot of attention in the literature. Instead of concentrating on this problem, the focus of this paper is on cooperation between

  11. Cooperative Epistemic Multi-Agent Planning With Implicit Coordination

    DEFF Research Database (Denmark)

    Engesser, Thorsten; Bolander, Thomas; Mattmüller, Robert

    2015-01-01

    , meaning coordination is only allowed implicitly by means of the available epistemic actions. While this approach can be fruitfully applied to model reasoning in some simple social situations, we also provide some benchmark applications to show that the concept is useful for multi-agent systems in practice....

  12. MATT: Multi Agents Testing Tool Based Nets within Nets

    Directory of Open Access Journals (Sweden)

    Sara Kerraoui

    2016-12-01

    As part of this effort, we propose a model-based testing approach for multi-agent systems built on such a model, called Reference nets, and develop a tool that aims to provide a uniform and automated approach. The feasibility and advantages of the proposed approach are shown through a short case study.

  13. Normative multi-agent programs and their logics

    NARCIS (Netherlands)

    Dastani, M.; Grossi, D.; Meyer, J.-J.C.; Tinnemeier, N.

    2009-01-01

    Multi-agent systems are viewed as consisting of individual agents whose behaviors are regulated by an organization artefact. This paper presents a simplified version of a programming language that is designed to implement norm-based artefacts. Such artefacts are specified in terms of norms being

  14. Implementing a Multi-Agent System in Python

    DEFF Research Database (Denmark)

    Ettienne, Mikko Berggren; Vester, Steen; Villadsen, Jørgen

    2012-01-01

    We describe the solution used by the Python-DTU team in the Multi-Agent Programming Contest 2011, where the scenario was called Agents on Mars. We present our auction-based agreement, area controlling and pathfinding algorithms and discuss our chosen strategy and our choice of technology used...

  15. Reimplementing a Multi-Agent System in Python

    DEFF Research Database (Denmark)

    Villadsen, Jørgen; Jensen, Andreas Schmidt; Ettienne, Mikko Berggren

    2012-01-01

    We provide a brief description of our Python-DTU system, including the overall design, the tools and the algorithms that we used in the Multi-Agent Programming Contest 2012, where the scenario was called Agents on Mars like in 2011. Our solution is an improvement of our Python-DTU system from last...

  16. Reimplementing a Multi-Agent System in Python

    DEFF Research Database (Denmark)

    Villadsen, Jørgen; Jensen, Andreas Schmidt; Ettienne, Mikko Berggren

    2013-01-01

    We provide a brief description of our Python-DTU system, including the overall design, the tools and the algorithms that we used in the Multi-Agent Programming Contest 2012, where the scenario was called Agents on Mars like in 2011. Our solution is an improvement of our Python-DTU system from last...

  17. Characterizing Reinforcement Learning Methods through Parameterized Learning Problems

    Science.gov (United States)

    2011-06-03

    extraneous. The agent could potentially adapt these representational aspects by applying methods from feature selection (Kolter and Ng, 2009; Petrik et al...611–616. AAAI Press. Kolter, J. Z. and Ng, A. Y. (2009). Regularization and feature selection in least-squares temporal difference learning. In A. P

  18. Self-Paced Prioritized Curriculum Learning With Coverage Penalty in Deep Reinforcement Learning.

    Science.gov (United States)

    Ren, Zhipeng; Dong, Daoyi; Li, Huaxiong; Chen, Chunlin

    2018-06-01

    In this paper, a new training paradigm is proposed for deep reinforcement learning using self-paced prioritized curriculum learning with coverage penalty. The proposed deep curriculum reinforcement learning (DCRL) takes full advantage of experience replay by adaptively selecting appropriate transitions from replay memory based on the complexity of each transition. The criteria of complexity in DCRL consist of self-paced priority as well as coverage penalty. The self-paced priority reflects the relationship between the temporal-difference error and the difficulty of the current curriculum for sample efficiency. The coverage penalty is taken into account for sample diversity. In comparison with deep Q-network (DQN) and prioritized experience replay (PER) methods, the DCRL algorithm is evaluated on Atari 2600 games, and the experimental results show that DCRL outperforms DQN and PER on most of these games. More results further show that the proposed curriculum training paradigm of DCRL is also applicable and effective for other memory-based deep reinforcement learning approaches, such as double DQN and dueling network. All the experimental results demonstrate that DCRL can achieve improved training efficiency and robustness for deep reinforcement learning.
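
    The two-part complexity criterion described here, a self-paced priority tied to TD error plus a coverage penalty for over-replayed transitions, can be sketched as a single scoring function. The exact DCRL formulas are not reproduced below; the Gaussian-shaped self-paced term, the logarithmic penalty, and the parameters `lam` and `rho` are all illustrative stand-ins.

    ```python
    import math

    def dcrl_priority(td_error, replay_count, curriculum_difficulty,
                      lam=1.0, rho=0.1):
        """Illustrative replay priority in the spirit of DCRL: a self-paced
        term preferring transitions whose TD-error magnitude matches the
        current curriculum difficulty, minus a coverage penalty that
        down-weights transitions already replayed many times."""
        self_paced = math.exp(-lam * (abs(td_error) - curriculum_difficulty) ** 2)
        coverage = rho * math.log1p(replay_count)
        return max(self_paced - coverage, 0.0)
    ```

    A transition whose TD error matches the curriculum difficulty scores highest; transitions that are too surprising for the current curriculum, or that have been replayed often, are deprioritized, which is the sample-efficiency/diversity trade-off the abstract describes.
    
    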

  19. Deep reinforcement learning for automated radiation adaptation in lung cancer.

    Science.gov (United States)

    Tseng, Huan-Hsin; Luo, Yi; Cui, Sunan; Chien, Jen-Tzung; Ten Haken, Randall K; Naqa, Issam El

    2017-12-01

    To investigate deep reinforcement learning (DRL) based on historical treatment plans for developing automated radiation adaptation protocols for non-small cell lung cancer (NSCLC) patients that aim to maximize tumor local control at reduced rates of radiation pneumonitis grade 2 (RP2). In a retrospective population of 114 NSCLC patients who received radiotherapy, a three-component neural network framework was developed for deep reinforcement learning (DRL) of dose fractionation adaptation. Large-scale patient characteristics included clinical, genetic, and imaging radiomics features in addition to tumor and lung dosimetric variables. First, a generative adversarial network (GAN) was employed to learn patient population characteristics necessary for DRL training from a relatively limited sample size. Second, a radiotherapy artificial environment (RAE) was reconstructed by a deep neural network (DNN) utilizing both original and synthetic data (by GAN) to estimate the transition probabilities for adaptation of personalized radiotherapy patients' treatment courses. Third, a deep Q-network (DQN) was applied to the RAE for choosing the optimal dose in a response-adapted treatment setting. This multicomponent reinforcement learning approach was benchmarked against real clinical decisions that were applied in an adaptive dose escalation clinical protocol, in which 34 patients were treated based on avid PET signal in the tumor and constrained by a 17.2% normal tissue complication probability (NTCP) limit for RP2. The uncomplicated cure probability (P+) was used as a baseline reward function in the DRL. Taking our adaptive dose escalation protocol as a blueprint for the proposed DRL (GAN + RAE + DQN) architecture, we obtained an automated dose adaptation estimate for use at ∼2/3 of the way into the radiotherapy treatment course. By letting the DQN component freely control the estimated adaptive dose per fraction (ranging from 1 to 5 Gy), the DRL automatically favored dose
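
    The decision faced by the DQN stage above, choosing a per-fraction dose under a reward that rewards tumor control and penalizes toxicity, can be illustrated with a tiny tabular stand-in. Everything below is invented for illustration: the dose-response curves, the noise level, and the tabular learner are toy surrogates, not the paper's GAN + RAE + DQN pipeline or its P+ reward.

    ```python
    import random

    def adapt_dose(doses=(1, 2, 3, 4, 5), episodes=5000, alpha=0.1, eps=0.2,
                   seed=5):
        """Toy stand-in for the dose-selection step: learn, by trial and
        error against a synthetic outcome model, which per-fraction dose
        maximizes a P+-like reward (control benefit minus toxicity cost)."""
        rng = random.Random(seed)
        q = {d: 0.0 for d in doses}
        for _ in range(episodes):
            d = rng.choice(doses) if rng.random() < eps else max(q, key=q.get)
            tcp = 1 - 0.85 ** d            # tumor control rises with dose
            ntcp = 0.03 * d ** 2           # toxicity rises faster (invented)
            r = tcp - ntcp + rng.gauss(0, 0.05)
            q[d] += alpha * (r - q[d])
        return q
    ```

    With these synthetic curves an intermediate dose wins: escalating further buys little extra control but a quadratic toxicity cost, which is the trade-off the reward function is meant to encode.
    
    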

  20. Reinforcement Learning for Routing in Cognitive Radio Ad Hoc Networks

    Directory of Open Access Journals (Sweden)

    Hasan A. A. Al-Rawi

    2014-01-01

    Full Text Available Cognitive radio (CR) enables unlicensed users (or secondary users, SUs) to sense for and exploit underutilized licensed spectrum owned by the licensed users (or primary users, PUs). Reinforcement learning (RL) is an artificial intelligence approach that enables a node to observe, learn, and make appropriate decisions on action selection in order to maximize network performance. Routing enables a source node to search for a least-cost route to its destination node. While there have been increasing efforts to enhance the traditional RL approach for routing in wireless networks, this research area remains largely unexplored in the domain of routing in CR networks. This paper applies RL in routing and investigates the effects of various features of RL (i.e., reward function, exploitation and exploration, as well as learning rate) through simulation. New approaches and recommendations are proposed to enhance the features in order to improve the network performance brought about by RL to routing. Simulation results show that the RL parameters of the reward function, exploitation, and exploration, as well as learning rate, must be well regulated, and the new approaches proposed in this paper improve SUs’ network performance without significantly jeopardizing PUs’ network performance, specifically SUs’ interference to PUs.
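
    The RL-for-routing idea underlying this record can be sketched with classic Q-routing: each node learns the expected cost of forwarding toward the destination via each neighbour, with epsilon-greedy exploration. The CR-specific rewards (e.g. penalizing interference to PUs) would replace the unit hop cost used below; the graph, parameters, and plain hop-count cost are illustrative, not the paper's model.

    ```python
    import random

    def q_route(links, source, dest, episodes=500, alpha=0.2, eps=0.2, seed=1):
        """Toy Q-routing: q[(node, nb)] estimates the hop cost of reaching
        dest by forwarding from node to neighbour nb. Each forwarded packet
        does a Bellman-style update with a cost of 1 per hop."""
        rng = random.Random(seed)
        q = {(n, nb): 0.0 for n, nbs in links.items() for nb in nbs}
        for _ in range(episodes):
            node, hops = source, 0
            while node != dest and hops < 20:
                nbs = links[node]
                if rng.random() < eps:
                    nb = rng.choice(nbs)                       # explore
                else:
                    nb = min(nbs, key=lambda m: q[(node, m)])  # exploit
                future = 0.0 if nb == dest else min(q[(nb, m)] for m in links[nb])
                q[(node, nb)] += alpha * (1.0 + future - q[(node, nb)])
                node, hops = nb, hops + 1
        return q
    ```

    On a small diamond network where A-B-D is two hops and A-C-E-D is three, the learned costs correctly rank the two-hop route cheaper.
    
    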

  1. Grounding the meanings in sensorimotor behavior using reinforcement learning

    Directory of Open Access Journals (Sweden)

    Igor Farkaš

    2012-02-01

    Full Text Available The recent outburst of interest in cognitive developmental robotics is fueled by the ambition to propose ecologically plausible mechanisms of how, among other things, a learning agent/robot could ground linguistic meanings in its sensorimotor behaviour. Along this stream, we propose a model that allows the simulated iCub robot to learn the meanings of actions (point, touch, and push) oriented towards objects in the robot's peripersonal space. In our experiments, the iCub learns to execute motor actions and comment on them. Architecturally, the model is composed of three neural-network-based modules that are trained in different ways. The first module, a two-layer perceptron, is trained by back-propagation to attend to the target position in the visual scene, given the low-level visual information and the feature-based target information. The second module, having the form of an actor-critic architecture, is the most distinguishing part of our model, and is trained by a continuous version of reinforcement learning to execute actions as sequences, based on a linguistic command. The third module, an echo-state network, is trained to provide the linguistic description of the executed actions. The trained model generalises well in the case of novel action-target combinations with randomised initial arm positions. It can also promptly adapt its behavior if the action/target suddenly changes during motor execution.

  2. Learning User Preferences in Ubiquitous Systems: A User Study and a Reinforcement Learning Approach

    OpenAIRE

    Zaidenberg , Sofia; Reignier , Patrick; Mandran , Nadine

    2010-01-01

    International audience; Our study concerns a virtual assistant, proposing services to the user based on its current perceived activity and situation (ambient intelligence). Instead of asking the user to define his preferences, we acquire them automatically using a reinforcement learning approach. Experiments showed that our system succeeded in learning user preferences. In order to validate the relevance and usability of such a system, we have first conducted a user study. 26 non-expert s...

  3. Learning to Run challenge solutions: Adapting reinforcement learning methods for neuromusculoskeletal environments

    OpenAIRE

    Kidziński, Łukasz; Mohanty, Sharada Prasanna; Ong, Carmichael; Huang, Zhewei; Zhou, Shuchang; Pechenko, Anton; Stelmaszczyk, Adam; Jarosik, Piotr; Pavlov, Mikhail; Kolesnikov, Sergey; Plis, Sergey; Chen, Zhibo; Zhang, Zhizheng; Chen, Jiale; Shi, Jun

    2018-01-01

    In the NIPS 2017 Learning to Run challenge, participants were tasked with building a controller for a musculoskeletal model to make it run as fast as possible through an obstacle course. Top participants were invited to describe their algorithms. In this work, we present eight solutions that used deep reinforcement learning approaches, based on algorithms such as Deep Deterministic Policy Gradient, Proximal Policy Optimization, and Trust Region Policy Optimization. Many solutions use similar ...

  4. Tackling Error Propagation through Reinforcement Learning: A Case of Greedy Dependency Parsing

    NARCIS (Netherlands)

    Le, M.N.; Fokkens, A.S.

    Error propagation is a common problem in NLP. Reinforcement learning explores erroneous states during training and can therefore be more robust when mistakes are made early in a process. In this paper, we apply reinforcement learning to greedy dependency parsing which is known to suffer from error

  5. Working Memory and Reinforcement Schedule Jointly Determine Reinforcement Learning in Children: Potential Implications for Behavioral Parent Training

    Directory of Open Access Journals (Sweden)

    Elien Segers

    2018-03-01

    Full Text Available Introduction: Behavioral Parent Training (BPT) is often provided for childhood psychiatric disorders. These disorders have been shown to be associated with working memory impairments. BPT is based on operant learning principles, yet how operant principles shape behavior (through the partial reinforcement (PRF) extinction effect, i.e., the greater resistance to extinction created when behavior is reinforced partially rather than continuously) and the potential role of working memory therein is scarcely studied in children. This study explored the PRF extinction effect and the role of working memory therein using experimental tasks in typically developing children. Methods: Ninety-seven children (age 6–10) completed a working memory task and an operant learning task, in which children acquired a response-sequence rule under either continuous or partial reinforcement (120 trials), followed by an extinction phase (80 trials). Data from 88 children were used for analysis. Results: The PRF extinction effect was confirmed: we observed slower acquisition and extinction in the PRF condition as compared to the continuous reinforcement (CRF) condition. Working memory was negatively related to acquisition but not extinction performance. Conclusion: Both reinforcement contingencies and working memory relate to acquisition performance. A potential implication for BPT is that decreasing working memory load may enhance the chance of optimal learning through reinforcement.
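
    The slower acquisition under partial reinforcement reported here falls straight out of a minimal Rescorla-Wagner value update, sketched below. Note the hedge: this toy simulation reproduces only the acquisition half of the effect; the slower *extinction* under PRF needs a richer model (e.g. change detection over outcome statistics) and is not attempted. The learning rate and trial counts are illustrative.

    ```python
    def rescorla_wagner(reward_seq, alpha=0.2):
        """Trial-by-trial associative value under the Rescorla-Wagner rule:
        v <- v + alpha * (reward - v). Returns the value after each trial."""
        v, trace = 0.0, []
        for r in reward_seq:
            v += alpha * (r - v)
            trace.append(v)
        return trace

    # Acquisition: 40 trials of continuous (CRF) vs alternating partial (PRF)
    crf = rescorla_wagner([1.0] * 40)
    prf = rescorla_wagner([1.0, 0.0] * 20)
    ```

    Under CRF the value climbs quickly toward 1.0, while under 50% PRF it oscillates around a much lower asymptote, i.e., acquisition is slower and weaker, consistent with the result above.
    
    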

  6. Curiosity driven reinforcement learning for motion planning on humanoids

    Science.gov (United States)

    Frank, Mikhail; Leitner, Jürgen; Stollenga, Marijn; Förster, Alexander; Schmidhuber, Jürgen

    2014-01-01

    Most previous work on artificial curiosity (AC) and intrinsic motivation focuses on basic concepts and theory. Experimental results are generally limited to toy scenarios, such as navigation in a simulated maze, or control of a simple mechanical system with one or two degrees of freedom. To study AC in a more realistic setting, we embody a curious agent in the complex iCub humanoid robot. Our novel reinforcement learning (RL) framework consists of a state-of-the-art, low-level, reactive control layer, which controls the iCub while respecting constraints, and a high-level curious agent, which explores the iCub's state-action space through information gain maximization, learning a world model from experience, controlling the actual iCub hardware in real-time. To the best of our knowledge, this is the first ever embodied, curious agent for real-time motion planning on a humanoid. We demonstrate that it can learn compact Markov models to represent large regions of the iCub's configuration space, and that the iCub explores intelligently, showing interest in its physical constraints as well as in objects it finds in its environment. PMID:24432001

  7. How we learn to make decisions: rapid propagation of reinforcement learning prediction errors in humans.

    Science.gov (United States)

    Krigolson, Olav E; Hassall, Cameron D; Handy, Todd C

    2014-03-01

    Our ability to make decisions is predicated upon our knowledge of the outcomes of the actions available to us. Reinforcement learning theory posits that actions followed by a reward or punishment acquire value through the computation of prediction errors-discrepancies between the predicted and the actual reward. A multitude of neuroimaging studies have demonstrated that rewards and punishments evoke neural responses that appear to reflect reinforcement learning prediction errors [e.g., Krigolson, O. E., Pierce, L. J., Holroyd, C. B., & Tanaka, J. W. Learning to become an expert: Reinforcement learning and the acquisition of perceptual expertise. Journal of Cognitive Neuroscience, 21, 1833-1840, 2009; Bayer, H. M., & Glimcher, P. W. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47, 129-141, 2005; O'Doherty, J. P. Reward representations and reward-related learning in the human brain: Insights from neuroimaging. Current Opinion in Neurobiology, 14, 769-776, 2004; Holroyd, C. B., & Coles, M. G. H. The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109, 679-709, 2002]. Here, we used the event-related brain potential (ERP) technique to demonstrate that not only do rewards elicit a neural response akin to a prediction error but also that this signal rapidly diminished and propagated to the time of choice presentation with learning. Specifically, in a simple, learnable gambling task, we show that novel rewards elicited a feedback error-related negativity that rapidly decreased in amplitude with learning. Furthermore, we demonstrate the existence of a reward positivity at choice presentation, a previously unreported ERP component that has a similar timing and topography as the feedback error-related negativity that increased in amplitude with learning. The pattern of results we observed mirrored the output of a computational model that we implemented to compute reward

  8. Memory Transformation Enhances Reinforcement Learning in Dynamic Environments.

    Science.gov (United States)

    Santoro, Adam; Frankland, Paul W; Richards, Blake A

    2016-11-30

    Over the course of systems consolidation, there is a switch from a reliance on detailed episodic memories to generalized schematic memories. This switch is sometimes referred to as "memory transformation." Here we demonstrate a previously unappreciated benefit of memory transformation, namely, its ability to enhance reinforcement learning in a dynamic environment. We developed a neural network that is trained to find rewards in a foraging task where reward locations are continuously changing. The network can use memories for specific locations (episodic memories) and statistical patterns of locations (schematic memories) to guide its search. We find that switching from an episodic to a schematic strategy over time leads to enhanced performance due to the tendency for the reward location to be highly correlated with itself in the short-term, but regress to a stable distribution in the long-term. We also show that the statistics of the environment determine the optimal utilization of both types of memory. Our work recasts the theoretical question of why memory transformation occurs, shifting the focus from the avoidance of memory interference toward the enhancement of reinforcement learning across multiple timescales. As time passes, memories transform from a highly detailed state to a more gist-like state, in a process called "memory transformation." Theories of memory transformation speak to its advantages in terms of reducing memory interference, increasing memory robustness, and building models of the environment. However, the role of memory transformation from the perspective of an agent that continuously acts and receives reward in its environment is not well explored. In this work, we demonstrate a view of memory transformation that defines it as a way of optimizing behavior across multiple timescales. Copyright © 2016 the authors.
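
    The environmental logic behind this result, reward location autocorrelated in the short term but regressing to a stable distribution in the long term, can be reproduced with a few lines of simulation. This is not the authors' neural network: the five-site world, the stay probability, and the skewed base distribution are all invented to illustrate the episodic-vs-schematic trade-off.

    ```python
    import random

    def forage(strategy, lag, trials=4000, p_stay=0.8, seed=7):
        """Reward sits at one of 5 sites. Each step it stays put with
        probability p_stay, else jumps to a site drawn from a skewed stable
        distribution favouring site 0. The agent saw the reward `lag` steps
        ago and guesses: 'episodic' -> the remembered site;
        'schematic' -> the long-run modal site (site 0). Returns hit rate."""
        rng = random.Random(seed)
        base = [0.6, 0.1, 0.1, 0.1, 0.1]   # stable long-run distribution
        hits = 0
        for _ in range(trials):
            loc = rng.choices(range(5), weights=base)[0]
            seen = loc                      # episodic memory of this site
            for _ in range(lag):
                if rng.random() > p_stay:
                    loc = rng.choices(range(5), weights=base)[0]
            guess = seen if strategy == 'episodic' else 0
            hits += (guess == loc)
        return hits / trials
    ```

    At short lags the episodic strategy wins (the reward is probably still where it was seen); at long lags the schematic strategy wins (only the stable distribution is informative), matching the switch described in the abstract.
    
    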

  9. Semi-Cooperative Learning in Smart Grid Agents

    Science.gov (United States)

    2013-12-01

    this PhD program, but watching you grow has only made me realize how much more awesome human learning is. You have been a source of profound joy and...which should alleviate concern for scalability along this dimension. • Learning the negotiation model: Figure 6.23 shows single-episode results that...for Semi-cooperative Multi-agent Coordination. In IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning. [Prendergast, 1999

  10. Service orientation in holonic and multi agent manufacturing and robotics

    CERN Document Server

    Thomas, Andre; Trentesaux, Damien

    2013-01-01

    The book covers four research domains representing a trend for modern manufacturing control: Holonic and Multi-agent technologies for industrial systems; Intelligent Product and Product-driven Automation; Service Orientation of Enterprise’s strategic and technical processes; and Distributed Intelligent Automation Systems. These evolution lines have in common concepts related to service orientation derived from the Service Oriented Architecture (SOA) paradigm.     The service-oriented multi-agent systems approach discussed in the book is characterized by the use of a set of distributed autonomous and cooperative agents, embedded in smart components that use the SOA principles, being oriented by offer and request of services, in order to fulfil production systems and value chain goals.   A new integrated vision combining emergent technologies is offered, to create control structures with distributed intelligence supporting the vertical and horizontal enterprise integration and running in truly distributed ...

  11. A Multi-Agent System for Tracking the Intent of Surface Contacts in Ports and Waterways

    National Research Council Canada - National Science Library

    Tan, Kok S

    2005-01-01

    ...) and employ them to identify asymmetric maritime threats in ports and waterways. Each surface track is monitored by a compound multi-agent system that comprises several intent models, each containing a nested multi-agent system...

  12. Specification of Behavioural Requirements within Compositional Multi-Agent System Design

    OpenAIRE

    Herlea, D.E.; Jonker, C.M.; Treur, J.; Wijngaards, N.J.E.

    1999-01-01

    In this paper it is shown how informal and formal specification of behavioural requirements and scenarios for agents and multi-agent systems can be integrated within multi-agent system design. In particular, it is addressed how a compositional

  13. Anticipatory vehicle routing using delegate multi-agent systems

    OpenAIRE

    Weyns, Danny; Holvoet, Tom; Helleboogh, Alexander

    2007-01-01

    This paper presents an agent-based approach, called delegate multi-agent systems, for anticipatory vehicle routing to avoid traffic congestion. In this approach, individual vehicles are represented by agents, which themselves issue light-weight agents that explore alternative routes in the environment on behalf of the vehicles. Based on the evaluation of the alternatives, the vehicles then issue light-weight agents for allocating road segments, spreading the vehicles’ intentions and coordi...

  14. Biomorphic Multi-Agent Architecture for Persistent Computing

    Science.gov (United States)

    Lodding, Kenneth N.; Brewster, Paul

    2009-01-01

    A multi-agent software/hardware architecture, inspired by the multicellular nature of living organisms, has been proposed as the basis of design of a robust, reliable, persistent computing system. Just as a multicellular organism can adapt to changing environmental conditions and can survive despite the failure of individual cells, a multi-agent computing system, as envisioned, could adapt to changing hardware, software, and environmental conditions. In particular, the computing system could continue to function (perhaps at a reduced but still reasonable level of performance) if one or more component(s) of the system were to fail. One of the defining characteristics of a multicellular organism is unity of purpose. In biology, the purpose is survival of the organism. The purpose of the proposed multi-agent architecture is to provide a persistent computing environment in harsh conditions in which repair is difficult or impossible. A multi-agent, organism-like computing system would be a single entity built from agents or cells. Each agent or cell would be a discrete hardware processing unit that would include a data processor with local memory, an internal clock, and a suite of communication equipment capable of both local line-of-sight communications and global broadcast communications. Some cells, denoted specialist cells, could contain such additional hardware as sensors and emitters. Each cell would be independent in the sense that there would be no global clock, no global (shared) memory, no pre-assigned cell identifiers, no pre-defined network topology, and no centralized brain or control structure. Like each cell in a living organism, each agent or cell of the computing system would contain a full description of the system encoded as genes, but in this case, the genes would be components of a software genome.

  15. Negotiation and argumentation in multi-agent systems

    CERN Document Server

    Lopes, Fernando

    2014-01-01

    Multi-agent systems (MAS) composed of autonomous agents representing individuals or organizations and capable of reaching mutually beneficial agreements through negotiation and argumentation are becoming increasingly important and pervasive.Research on both automated negotiation and argumentation in MAS has a vigorous, exciting tradition. However, efforts to integrate both areas have received only selective attention in the academia and the practitioner literature. A symbiotic relationship could significantly strengthen each area's progress and trigger new R&D challenges and prospects toward t

  16. GA-based fuzzy reinforcement learning for control of a magnetic bearing system.

    Science.gov (United States)

    Lin, C T; Jou, C P

    2000-01-01

    This paper proposes a TD (temporal difference) and GA (genetic algorithm)-based reinforcement (TDGAR) learning method and applies it to the control of a real magnetic bearing system. The TDGAR learning scheme is a new hybrid GA, which integrates the TD prediction method and the GA to perform the reinforcement learning task. The TDGAR learning system is composed of two integrated feedforward networks. One neural network acts as a critic network to guide the learning of the other network (the action network) which determines the outputs (actions) of the TDGAR learning system. The action network can be a normal neural network or a neural fuzzy network. Using the TD prediction method, the critic network can predict the external reinforcement signal and provide a more informative internal reinforcement signal to the action network. The action network uses the GA to adapt itself according to the internal reinforcement signal. The key concept of the TDGAR learning scheme is to formulate the internal reinforcement signal as the fitness function for the GA such that the GA can evaluate the candidate solutions (chromosomes) regularly, even during periods without external feedback from the environment. This enables the GA to proceed to new generations regularly without waiting for the arrival of the external reinforcement signal. This can usually accelerate the GA learning since a reinforcement signal may only be available at a time long after a sequence of actions has occurred in the reinforcement learning problem. The proposed TDGAR learning system has been used to control an active magnetic bearing (AMB) system in practice. A systematic design procedure is developed to achieve successful integration of all the subsystems including magnetic suspension, mechanical structure, and controller training. The results show that the TDGAR learning scheme can successfully find a neural controller or a neural fuzzy controller for a self-designed magnetic bearing system.
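
    The scheme's central trick, using the critic's TD-learned internal reinforcement as the GA's fitness function so that chromosomes can be evaluated even between arrivals of external reward, can be sketched in a toy loop. Everything below (scalar-gain chromosomes, the sparse reward, the bin discretisation) is invented for illustration and is not the paper's magnetic-bearing controller:

```python
import random

BINS = 20

def bin_of(g):
    # discretise a chromosome (a control gain in [0, 1]) for the critic
    return min(BINS - 1, int(g * BINS))

def external_reward(g, optimum=0.7):
    # sparse external feedback, only informative near a made-up optimum
    return 1.0 if abs(g - optimum) < 0.1 else 0.0

class TDCritic:
    """Learns to predict reinforcement; its output is the GA's fitness."""
    def __init__(self, alpha=0.3, gamma=0.0):
        self.v = [0.0] * BINS
        self.alpha, self.gamma = alpha, gamma

    def update(self, g, r):
        i = bin_of(g)
        # one-step TD update; with gamma=0 it tracks the expected reward
        self.v[i] += self.alpha * (r + self.gamma * self.v[i] - self.v[i])

    def internal_signal(self, g):
        return self.v[bin_of(g)]

random.seed(0)
critic = TDCritic()
pop = [random.random() for _ in range(30)]       # chromosomes = gains
for gen in range(60):
    for g in pop:                                # interact with the plant
        critic.update(g, external_reward(g))
    # GA step: fitness is the critic's internal reinforcement signal,
    # so selection can proceed even while external reward is absent
    pop.sort(key=critic.internal_signal, reverse=True)
    parents = pop[:10]
    pop = parents + [
        min(1.0, max(0.0, random.choice(parents) + random.gauss(0, 0.05)))
        for _ in range(20)
    ]
best = max(pop, key=critic.internal_signal)
print(round(best, 2))   # settles near the assumed optimum of 0.7
```

    The real TDGAR critic is a neural network predicting reinforcement over system states; the table over gain bins here merely stands in for it.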

  17. Reinforcement learning techniques for controlling resources in power networks

    Science.gov (United States)

    Kowli, Anupama Sunil

    As power grids transition towards increased reliance on renewable generation, energy storage and demand response resources, an effective control architecture is required to harness the full functionalities of these resources. There is a critical need for control techniques that recognize the unique characteristics of the different resources and exploit the flexibility afforded by them to provide ancillary services to the grid. The work presented in this dissertation addresses these needs. Specifically, new algorithms are proposed, which allow control synthesis in settings wherein the precise distribution of the uncertainty and its temporal statistics are not known. These algorithms are based on recent developments in Markov decision theory, approximate dynamic programming and reinforcement learning. They impose minimal assumptions on the system model and allow the control to be "learned" based on the actual dynamics of the system. Furthermore, they can accommodate complex constraints such as capacity and ramping limits on generation resources, state-of-charge constraints on storage resources, comfort-related limitations on demand response resources and power flow limits on transmission lines. Numerical studies demonstrating applications of these algorithms to practical control problems in power systems are discussed. Results demonstrate how the proposed control algorithms can be used to improve the performance and reduce the computational complexity of the economic dispatch mechanism in a power network. We argue that the proposed algorithms are eminently suitable to develop operational decision-making tools for large power grids with many resources and many sources of uncertainty.
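
    As a minimal illustration of how such algorithms can "learn" a dispatch policy under ramping and capacity limits without an explicit uncertainty model, here is a toy Q-learning sketch. The single generator, quadratic cost, random-walk demand, and limits are all invented and far simpler than the dissertation's setting:

```python
import random

LEVELS = range(5)   # generator output levels; capacity limit is 4
RAMP = 1            # ramping limit: at most one level change per step

def feasible(level):
    # constraint handling: only actions within the ramp limit are offered
    return [a for a in LEVELS if abs(a - level) <= RAMP]

def reward(level, demand):
    # negative of a generation cost plus an imbalance penalty
    return -(0.1 * level ** 2 + 2.0 * abs(level - demand))

random.seed(1)
Q = {(l, d): {a: 0.0 for a in feasible(l)} for l in LEVELS for d in LEVELS}
alpha, gamma, eps = 0.2, 0.9, 0.1
level, demand = 0, 2
for t in range(20000):
    acts = Q[(level, demand)]
    a = (random.choice(list(acts)) if random.random() < eps
         else max(acts, key=acts.get))
    nd = min(4, max(0, demand + random.choice((-1, 0, 1))))  # demand walk
    r = reward(a, nd)
    acts[a] += alpha * (r + gamma * max(Q[(a, nd)].values()) - acts[a])
    level, demand = a, nd
# the greedy policy tracks demand while respecting the ramp constraint
policy = {s: max(Q[s], key=Q[s].get) for s in Q}
print(policy[(0, 0)])
```

    Restricting the action set per state is one simple way to honour ramping limits; state-of-charge or line-flow constraints can be masked the same way.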

  18. A Reinforcement Learning Framework for Spiking Networks with Dynamic Synapses

    Directory of Open Access Journals (Sweden)

    Karim El-Laithy

    2011-01-01

    Full Text Available An integration of both Hebbian-based and reinforcement learning (RL) rules is presented for dynamic synapses. The proposed framework permits the Hebbian rule to update the hidden synaptic model parameters regulating the synaptic response, rather than the synaptic weights. This is performed using both the value and the sign of the temporal difference in the reward signal after each trial. Applying this framework, a spiking network with spike-timing-dependent synapses is tested on learning the exclusive-OR computation on a temporally coded basis. Reward values are calculated from the distance between the output spike train of the network and a reference target one. Results show that the network is able to capture the required dynamics and that the proposed framework can indeed yield an integrated version of Hebbian and RL rules. The proposed framework is tractable and less computationally expensive. The framework is applicable to a wide class of synaptic models and is not restricted to the neural representation used. This generality, along with the reported results, supports adopting the introduced approach to benefit from biologically plausible synaptic models in a wide range of intuitive signal processing tasks.

  19. A reinforcement learning model of joy, distress, hope and fear

    Science.gov (United States)

    Broekens, Joost; Jacobs, Elmer; Jonker, Catholijn M.

    2015-07-01

    In this paper we computationally study the relation between adaptive behaviour and emotion. Using the reinforcement learning framework, we propose that learned state utility (the value function) models fear (negative) and hope (positive), based on the fact that both signals are about anticipation of loss or gain. Further, we propose that joy/distress is a signal similar to the error signal. We present agent-based simulation experiments that show that this model replicates psychological and behavioural dynamics of emotion. This work distinguishes itself by assessing the dynamics of emotion in an adaptive agent framework - coupling it to the literature on habituation, development, extinction and hope theory. Our results support the idea that the function of emotion is to provide a complex feedback signal for an organism to adapt its behaviour. Our work is relevant for understanding the relation between emotion and adaptation in animals, as well as for human-robot interaction, in particular how emotional signals can be used to communicate between adaptive agents and humans.
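
    The proposed mapping can be made concrete in a few lines of TD learning. The toy below (a single recurring state, a fixed reward, invented constants) is not the authors' model, but it reproduces the claimed dynamics: joy, read off the TD error, fades as the reward becomes expected (habituation), while hope, read off the positive learned state value, builds up:

```python
# joy/distress <- TD error; hope/fear <- sign of the learned state value
alpha, gamma = 0.3, 0.9
V = 0.0                        # value of the single recurring state
joy_trace = []
for trial in range(30):
    r = 1.0                    # the same reward arrives every trial
    td_error = r + gamma * V - V    # joy if positive, distress if negative
    V += alpha * td_error
    joy_trace.append(td_error)

hope = max(V, 0.0)             # positive anticipation of future gain
fear = max(-V, 0.0)            # would grow under repeated punishment
print(round(joy_trace[0], 2), round(joy_trace[-1], 2), round(hope, 2))
```

    Replacing the reward with a punishment (r = -1.0) makes V negative, so the same readout yields distress followed by growing fear.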

  20. Multi-agent robotic systems and applications for satellite missions

    Science.gov (United States)

    Nunes, Miguel A.

    A revolution in the space sector is happening. It is expected that in the next decade there will be more satellites launched than in the previous sixty years of space exploration. Major challenges are associated with this growth of space assets such as the autonomy and management of large groups of satellites, in particular with small satellites. There are two main objectives for this work. First, a flexible and distributed software architecture is presented to expand the possibilities of spacecraft autonomy and in particular autonomous motion in attitude and position. The approach taken is based on the concept of distributed software agents, also referred to as multi-agent robotic system. Agents are defined as software programs that are social, reactive and proactive to autonomously maximize the chances of achieving the set goals. Part of the work is to demonstrate that a multi-agent robotic system is a feasible approach for different problems of autonomy such as satellite attitude determination and control and autonomous rendezvous and docking. The second main objective is to develop a method to optimize multi-satellite configurations in space, also known as satellite constellations. This automated method generates new optimal mega-constellations designs for Earth observations and fast revisit times on large ground areas. The optimal satellite constellation can be used by researchers as the baseline for new missions. The first contribution of this work is the development of a new multi-agent robotic system for distributing the attitude determination and control subsystem for HiakaSat. The multi-agent robotic system is implemented and tested on the satellite hardware-in-the-loop testbed that simulates a representative space environment. The results show that the newly proposed system for this particular case achieves an equivalent control performance when compared to the monolithic implementation. 
In terms of computational efficiency it is found that the multi-agent

  1. Reinforcement Learning for Predictive Analytics in Smart Cities

    Directory of Open Access Journals (Sweden)

    Kostas Kolomvatsos

    2017-06-01

    Full Text Available The digitization of our lives causes a shift in data production as well as in the required data management. Numerous nodes are capable of producing huge volumes of data in our everyday activities. Sensors, personal smart devices and the Internet of Things (IoT) paradigm lead to a vast infrastructure that covers all aspects of activities in modern societies. In most cases, the critical issue for public authorities (usually local, like municipalities) is the efficient management of data towards the support of novel services. The reason is that analytics provided on top of the collected data could help in the delivery of new applications that will facilitate citizens' lives. However, the provision of analytics demands intelligent techniques for the underlying data management. The best-known technique is the separation of huge volumes of data into a number of parts and their parallel management to limit the time required for the delivery of analytics. Afterwards, analytics requests in the form of queries could be realized to derive the necessary knowledge for supporting intelligent applications. In this paper, we define the concept of a Query Controller (QC) that receives queries for analytics and assigns each of them to a processor placed in front of each data partition. We discuss an intelligent process for query assignments that adopts Machine Learning (ML). We adopt two learning schemes, i.e., Reinforcement Learning (RL) and clustering. We report on the comparison of the two schemes and elaborate on their combination. Our aim is to provide an efficient framework to support the decision making of the QC, which should swiftly select the appropriate processor for each query. We provide mathematical formulations for the discussed problem and present simulation results. Through a comprehensive experimental evaluation, we reveal the advantages of the proposed models and describe the outcomes while comparing them with a
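
    The QC's learning problem can be caricatured as a bandit: each processor is an arm and the reward is the negative response time. The sketch below is only that caricature (simulated latencies, epsilon-greedy selection); the paper's actual state formulation and its clustering scheme are omitted:

```python
import random

random.seed(2)
true_latency = [0.9, 0.4, 0.7]   # hypothetical mean seconds per processor
q = [0.0, 0.0, 0.0]              # QC's estimated reward per processor
counts = [0, 0, 0]
eps = 0.1                        # exploration rate
for query in range(3000):
    if random.random() < eps:
        p = random.randrange(3)                    # explore a processor
    else:
        p = max(range(3), key=lambda i: q[i])      # exploit the fastest
    latency = random.gauss(true_latency[p], 0.05)  # simulated response
    counts[p] += 1
    q[p] += (-latency - q[p]) / counts[p]          # incremental mean reward
best = max(range(3), key=lambda i: q[i])
print(best)   # the QC learns to prefer the fastest processor
```

    With 3000 queries the estimates are tight enough that almost all traffic ends up on the genuinely fastest partition processor.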

  2. Longitudinal investigation on learned helplessness tested under negative and positive reinforcement involving stimulus control.

    Science.gov (United States)

    Oliveira, Emileane C; Hunziker, Maria Helena

    2014-07-01

    In this study, we investigated whether (a) animals demonstrating the learned helplessness effect during an escape contingency also show learning deficits under positive reinforcement contingencies involving stimulus control and (b) the exposure to positive reinforcement contingencies eliminates the learned helplessness effect under an escape contingency. Rats were initially exposed to controllable (C), uncontrollable (U) or no (N) shocks. After 24 h, they were exposed to 60 escapable shocks delivered in a shuttlebox. In the following phase, we selected from each group the four subjects that presented the most typical group pattern: no escape learning (learned helplessness effect) in Group U and escape learning in Groups C and N. All subjects were then exposed to two phases: (1) positive reinforcement for lever pressing under a multiple FR/Extinction schedule and (2) a re-test under negative reinforcement (escape). A fourth group (n=4) was exposed only to the positive reinforcement sessions. All subjects showed discrimination learning under the multiple schedule. In the escape re-test, the learned helplessness effect was maintained for three of the animals in Group U. These results suggest that the learned helplessness effect did not extend to discriminative behavior that is positively reinforced and that the learned helplessness effect did not revert for most subjects after exposure to positive reinforcement. We discuss some theoretical implications as related to learned helplessness as an effect restricted to aversive contingencies and to the absence of reversion after positive reinforcement.

  3. Reinforcement Learning Based Novel Adaptive Learning Framework for Smart Grid Prediction

    Directory of Open Access Journals (Sweden)

    Tian Li

    2017-01-01

    Full Text Available The smart grid is a potential infrastructure for supplying electricity to end users in a safe and reliable manner. With the rapid increase in the share of renewable energy and controllable loads in the smart grid, its operational uncertainty has grown quickly in recent years. Forecasting underpins the safe and economical operation of the smart grid. However, most existing forecasting methods cannot serve the smart grid well because of their inability to adapt to its varying operational conditions. In this paper, reinforcement learning is first exploited to develop an online learning framework for the smart grid. With its capability of multi-time-scale resolution, a wavelet neural network has been adopted in the online learning framework to yield a reinforcement learning and wavelet neural network (RLWNN) based adaptive learning scheme. Simulations on two typical prediction problems in the smart grid, wind power prediction and load forecasting, validate the effectiveness and scalability of the proposed RLWNN-based learning framework and algorithm.
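
    Only as a sketch of the adaptive idea (not of the RLWNN architecture): an RL-style selector can keep a running score per forecaster and lean on whichever has been accurate under the current operating conditions. The two stand-in forecasters and the signal below are invented:

```python
import math, random

random.seed(3)
series = lambda t: math.sin(0.1 * t)        # smooth wind-like signal
forecasters = [
    lambda hist: hist[-1],                  # persistence model
    lambda hist: 2 * hist[-1] - hist[-2],   # linear-trend model
]
score = [0.0, 0.0]
beta = 0.1                                  # score learning rate
hist = [series(0), series(1)]
picks = []
for t in range(2, 300):
    k = max(range(2), key=lambda i: score[i])   # greedy model selection
    actual = series(t) + random.gauss(0, 0.01)  # noisy measurement
    for i, f in enumerate(forecasters):
        err = abs(f(hist) - actual)
        score[i] += beta * (-err - score[i])    # reward = negative error
    picks.append(k)
    hist.append(actual)
print(picks.count(1), picks.count(0))
```

    Because the scores are exponential averages, a change in the signal's character would shift preference to the other model within a few tens of steps, which is the adaptivity the paper aims for.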

  4. The combination of appetitive and aversive reinforcers and the nature of their interaction during auditory learning.

    Science.gov (United States)

    Ilango, A; Wetzel, W; Scheich, H; Ohl, F W

    2010-03-31

    Learned changes in behavior can be elicited by either appetitive or aversive reinforcers. It is, however, not clear whether the two types of motivation (approaching appetitive stimuli and avoiding aversive stimuli) drive learning in the same or different ways, nor is their interaction understood in situations where the two types are combined in a single experiment. To investigate this question we have developed a novel learning paradigm for Mongolian gerbils, which not only allows rewards and punishments to be presented in isolation or in combination with each other, but also can use these opposite reinforcers to drive the same learned behavior. Specifically, we studied learning of tone-conditioned hurdle crossing in a shuttle box driven by either an appetitive reinforcer (brain stimulation reward) or an aversive reinforcer (electrical footshock), or by a combination of both. Combination of the two reinforcers potentiated speed of acquisition, led to maximum possible performance, and delayed extinction as compared to either reinforcer alone. Additional experiments, using partial reinforcement protocols and experiments in which one of the reinforcers was omitted after the animals had previously been trained with the combination of both reinforcers, indicated that appetitive and aversive reinforcers operated together but acted in different ways: in this particular experimental context, punishment appeared to be more effective for initial acquisition and reward more effective for maintaining a high level of conditioned responses (CRs). The results imply that learning mechanisms in problem solving were maximally effective when the initial punishment of mistakes was combined with the subsequent rewarding of correct performance.

  5. Dopamine-Dependent Reinforcement of Motor Skill Learning: Evidence from Gilles de la Tourette Syndrome

    Science.gov (United States)

    Palminteri, Stefano; Lebreton, Mael; Worbe, Yulia; Hartmann, Andreas; Lehericy, Stephane; Vidailhet, Marie; Grabli, David; Pessiglione, Mathias

    2011-01-01

    Reinforcement learning theory has been extensively used to understand the neural underpinnings of instrumental behaviour. A central assumption surrounds dopamine signalling reward prediction errors, so as to update action values and ensure better choices in the future. However, educators may share the intuitive idea that reinforcements not only…

  6. Efficient collective swimming by harnessing vortices through deep reinforcement learning.

    Science.gov (United States)

    Verma, Siddhartha; Novati, Guido; Koumoutsakos, Petros

    2018-06-05

    Fish in schooling formations navigate complex flow fields replete with mechanical energy in the vortex wakes of their companions. Their schooling behavior has been associated with evolutionary advantages including energy savings, yet the underlying physical mechanisms remain unknown. We show that fish can improve their sustained propulsive efficiency by placing themselves in appropriate locations in the wake of other swimmers and judiciously intercepting their shed vortices. This swimming strategy leads to collective energy savings and is revealed through a combination of high-fidelity flow simulations with a deep reinforcement learning (RL) algorithm. The RL algorithm relies on a policy defined by deep, recurrent neural nets with long short-term memory cells, which are essential for capturing the unsteadiness of the two-way interactions between the fish and the vortical flow field. Surprisingly, we find that swimming in-line with a leader is not associated with energetic benefits for the follower. Instead, "smart" swimmers place themselves at off-center positions with respect to the axis of the leader(s) and deform their bodies to synchronize with the momentum of the oncoming vortices, thus enhancing their swimming efficiency at no cost to the leader(s). The results confirm that fish may harvest energy deposited in vortices and support the conjecture that swimming in formation is energetically advantageous. Moreover, this study demonstrates that deep RL can produce navigation algorithms for complex unsteady and vortical flow fields, with promising implications for energy savings in autonomous robotic swarms.

  7. Off-policy reinforcement learning for H∞ control design.

    Science.gov (United States)

    Luo, Biao; Wu, Huai-Ning; Huang, Tingwen

    2015-01-01

    The H∞ control design problem is considered for nonlinear systems with an unknown internal system model. It is known that the nonlinear H∞ control problem can be transformed into solving the so-called Hamilton-Jacobi-Isaacs (HJI) equation, a nonlinear partial differential equation that is generally impossible to solve analytically. Even worse, model-based approaches cannot be used for approximately solving the HJI equation when an accurate system model is unavailable or costly to obtain in practice. To overcome these difficulties, an off-policy reinforcement learning (RL) method is introduced to learn the solution of the HJI equation from real system data instead of a mathematical system model, and its convergence is proved. In the off-policy RL method, the system data can be generated with arbitrary policies rather than the evaluating policy, which is extremely important and promising for practical systems. For implementation purposes, a neural network (NN)-based actor-critic structure is employed and a least-squares NN weight update algorithm is derived based on the method of weighted residuals. Finally, the developed NN-based off-policy RL method is tested on a linear F16 aircraft plant, and further applied to a rotational/translational actuator system.

  8. Joint Extraction of Entities and Relations Using Reinforcement Learning and Deep Learning

    Directory of Open Access Journals (Sweden)

    Yuntian Feng

    2017-01-01

    Full Text Available We use both reinforcement learning and deep learning to simultaneously extract entities and relations from unstructured texts. For reinforcement learning, we model the task as a two-step decision process. Deep learning is used to automatically capture the most important information from unstructured texts, which represents the state in the decision process. By designing the reward function per step, our proposed method can pass the information of entity extraction to relation extraction and obtain feedback in order to extract entities and relations simultaneously. Firstly, we use a bidirectional LSTM to model the context information, which realizes preliminary entity extraction. On the basis of the extraction results, an attention-based method can represent the sentences that include the target entity pair to generate the initial state in the decision process. Then we use a Tree-LSTM to represent relation mentions to generate the transition state in the decision process. Finally, we employ the Q-Learning algorithm to obtain the control policy π in the two-step decision process. Experiments on ACE2005 demonstrate that our method attains better performance than the state-of-the-art method and achieves a 2.4% increase in recall score.
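
    A stripped-down version of the two-step decision process shows the mechanics: step one picks an entity label, step two picks a relation conditioned on it, and the reward arrives only when both are correct. The labels, the single gold pair, and the tabular Q-values below are stand-ins for the paper's LSTM-derived states and ACE2005 data:

```python
import random

random.seed(4)
ENTITIES = ["PER", "ORG"]
RELATIONS = ["works_for", "located_in"]
TRUTH = ("PER", "works_for")     # hypothetical gold annotation

Q1 = {e: 0.0 for e in ENTITIES}                          # step-1 values
Q2 = {(e, r): 0.0 for e in ENTITIES for r in RELATIONS}  # step-2 values
alpha, gamma, eps = 0.3, 0.9, 0.2
for episode in range(2000):
    e = (random.choice(ENTITIES) if random.random() < eps
         else max(Q1, key=Q1.get))
    r = (random.choice(RELATIONS) if random.random() < eps
         else max(RELATIONS, key=lambda x: Q2[(e, x)]))
    reward = 1.0 if (e, r) == TRUTH else 0.0     # joint feedback only
    Q2[(e, r)] += alpha * (reward - Q2[(e, r)])  # terminal step update
    # step 1 is credited through the best relation reachable from e
    Q1[e] += alpha * (gamma * max(Q2[(e, x)] for x in RELATIONS) - Q1[e])
best_e = max(Q1, key=Q1.get)
best_r = max(RELATIONS, key=lambda x: Q2[(best_e, x)])
print(best_e, best_r)
```

    The bootstrapped step-1 update is what "passes the information of entity extraction to relation extraction": an entity label is only valuable if some relation decision built on it earns reward.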

  9. Joint Extraction of Entities and Relations Using Reinforcement Learning and Deep Learning.

    Science.gov (United States)

    Feng, Yuntian; Zhang, Hongjun; Hao, Wenning; Chen, Gang

    2017-01-01

    We use both reinforcement learning and deep learning to simultaneously extract entities and relations from unstructured texts. For reinforcement learning, we model the task as a two-step decision process. Deep learning is used to automatically capture the most important information from unstructured texts, which represents the state in the decision process. By designing the reward function per step, our proposed method can pass the information of entity extraction to relation extraction and obtain feedback in order to extract entities and relations simultaneously. Firstly, we use a bidirectional LSTM to model the context information, which realizes preliminary entity extraction. On the basis of the extraction results, an attention-based method can represent the sentences that include the target entity pair to generate the initial state in the decision process. Then we use a Tree-LSTM to represent relation mentions to generate the transition state in the decision process. Finally, we employ the Q-Learning algorithm to obtain the control policy π in the two-step decision process. Experiments on ACE2005 demonstrate that our method attains better performance than the state-of-the-art method and achieves a 2.4% increase in recall score.

  10. Reviewing Microgrids from a Multi-Agent Systems Perspective

    Directory of Open Access Journals (Sweden)

    Jorge J. Gomez-Sanz

    2014-05-01

    Full Text Available The construction of Smart Grids leads to the main question of what kind of intelligence such grids require and how to build it. Some authors choose an agent-based solution to realize this intelligence. However, there may be some misunderstandings in the way this technology is being applied. This paper presents some considerations on this subject, focusing on the microgrid level, and shows a practical example through the INGENIAS methodology, an agent-oriented development methodology that applies Model-Driven Development techniques to produce fully functional Multi-Agent Systems.

  11. Teamwork in Multi-Agent Systems A Formal Approach

    CERN Document Server

    Dunin-Keplicz, Barbara Maria

    2010-01-01

    What makes teamwork tick? Cooperation matters, in daily life and in complex applications. After all, many tasks need more than a single agent to be effectively performed. Therefore, teamwork rules! Teams are social groups of agents dedicated to the fulfilment of particular persistent tasks. In modern multiagent environments, heterogeneous teams often consist of autonomous software agents, various types of robots and human beings. Teamwork in Multi-agent Systems: A Formal Approach explains teamwork rules in terms of agents' attitudes and their complex interplay. It provides the first comprehe

  12. Cooperative epistemic multi-agent planning for implicit coordination

    DEFF Research Database (Denmark)

    Engesser, Thorsten; Bolander, Thomas; Mattmüller, Robert

    2017-01-01

    framework to include perspective shifts, allowing us to define new notions of sequential and conditional planning with implicit coordination. With these, it is possible to solve planning tasks with joint goals in a decentralized manner without the agents having to negotiate about and commit to a joint...... policy at plan time. First we define the central planning notions and sketch the implementation of a planning system built on those notions. Afterwards we provide some case studies in order to evaluate the planner empirically and to show that the concept is useful for multi-agent systems in practice....

  13. Knowledge-Based Reinforcement Learning for Data Mining

    Science.gov (United States)

    Kudenko, Daniel; Grzes, Marek

    Data Mining is the process of extracting patterns from data. Two general avenues of research in the intersecting areas of agents and data mining can be distinguished. The first approach is concerned with mining an agent’s observation data in order to extract patterns, categorize environment states, and/or make predictions of future states. In this setting, data is normally available as a batch, and the agent’s actions and goals are often independent of the data mining task. The data collection is mainly considered as a side effect of the agent’s activities. Machine learning techniques applied in such situations fall into the class of supervised learning. In contrast, the second scenario occurs where an agent is actively performing the data mining, and is responsible for the data collection itself. For example, a mobile network agent is acquiring and processing data (where the acquisition may incur a certain cost), or a mobile sensor agent is moving in a (perhaps hostile) environment, collecting and processing sensor readings. In these settings, the tasks of the agent and the data mining are highly intertwined and interdependent (or even identical). Supervised learning is not a suitable technique for these cases. Reinforcement Learning (RL) enables an agent to learn from experience (in the form of reward and punishment for explorative actions) and adapt to new situations, without a teacher. RL is an ideal learning technique for these data mining scenarios, because it fits the agent paradigm of continuous sensing and acting, and the RL agent is able to learn to make decisions on the sampling of the environment which provides the data. Nevertheless, RL still suffers from scalability problems, which have prevented its successful use in many complex real-world domains. The more complex the tasks, the longer it takes a reinforcement learning algorithm to converge to a good solution. For many real-world tasks, human expert knowledge is available. For example, human
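
    The record breaks off here, but one standard device for folding such expert knowledge into RL, and a recurring theme in knowledge-based RL work, is potential-based reward shaping: an extra term F = γφ(s') - φ(s), derived from an expert heuristic φ, densifies the reward without changing the optimal policy. A sketch on an invented 1-D corridor:

```python
import random

random.seed(5)
N, GOAL = 10, 9
phi = lambda s: s            # expert hint: progress along the corridor
alpha, gamma, eps = 0.3, 0.95, 0.1
Q = {(s, a): 0.0 for s in range(N) for a in (-1, 1)}
for ep in range(500):
    s = 0
    for _ in range(50):
        a = (random.choice((-1, 1)) if random.random() < eps
             else max((-1, 1), key=lambda x: Q[(s, x)]))
        s2 = min(N - 1, max(0, s + a))
        r = 1.0 if s2 == GOAL else 0.0            # sparse original reward
        shaped = r + gamma * phi(s2) - phi(s)     # add the shaping term F
        nxt = 0.0 if s2 == GOAL else max(Q[(s2, -1)], Q[(s2, 1)])
        Q[(s, a)] += alpha * (shaped + gamma * nxt - Q[(s, a)])
        if s2 == GOAL:
            break
        s = s2
greedy = [max((-1, 1), key=lambda a: Q[(s, a)]) for s in range(N - 1)]
print(greedy)   # the shaped agent heads for the goal from every state
```

    Because F telescopes over any trajectory, the shaped problem has the same optimal policy as the original one; the expert hint only speeds convergence, which addresses exactly the scalability concern raised above.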

  14. Exploring complex dynamics in multi agent-based intelligent systems: Theoretical and experimental approaches using the Multi Agent-based Behavioral Economic Landscape (MABEL) model

    Science.gov (United States)

    Alexandridis, Konstantinos T.

    This dissertation adopts a holistic and detailed approach to modeling spatially explicit agent-based artificial intelligent systems, using the Multi Agent-based Behavioral Economic Landscape (MABEL) model. The research questions it addresses stem from the need to understand and analyze the real-world patterns and dynamics of land use change from a coupled human-environmental systems perspective. Describes the systemic, mathematical, statistical, socio-economic and spatial dynamics of the MABEL modeling framework, and provides a wide array of cross-disciplinary modeling applications within the research, decision-making and policy domains. Establishes the symbolic properties of the MABEL model as a Markov decision process, analyzes the decision-theoretic utility and optimization attributes of agents towards comprising statistically and spatially optimal policies and actions, and explores the probabilistic character of the agents' decision-making and inference mechanisms via the use of Bayesian belief and decision networks. Develops and describes a Monte Carlo methodology for experimental replications of agents' decisions regarding complex spatial parcel acquisition and learning. Recognizes the gap in spatially-explicit accuracy assessment techniques for complex spatial models, and proposes an ensemble of statistical tools designed to address this problem. Advanced information assessment techniques such as the Receiver-Operator Characteristic curve, the impurity entropy and Gini functions, and the Bayesian classification functions are proposed. The theoretical foundation for modular Bayesian inference in spatially-explicit multi-agent artificial intelligent systems, and the ensembles of cognitive and scenario assessment modular tools built for the MABEL model are provided. 
Emphasizes modularity and robustness as valuable qualitative modeling attributes, and examines the role of robust intelligent modeling as a tool for improving policy decisions related to land
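
    The impurity measures named above are compact enough to state directly; these are the standard textbook definitions, not code from the dissertation:

```python
import math

def entropy(p):
    # Shannon entropy of a class-probability vector, in bits
    return -sum(x * math.log2(x) for x in p if x > 0)

def gini(p):
    # Gini impurity: probability that two independent draws disagree
    return 1.0 - sum(x * x for x in p)

# a uniform two-class split is maximally impure; a pure node scores zero
print(entropy([0.5, 0.5]), gini([0.5, 0.5]))   # 1.0 and 0.5
print(entropy([1.0, 0.0]), gini([1.0, 0.0]))
```

    For accuracy assessment of a classified map, p would be the per-cell class distribution; both measures drop to zero as the classification becomes certain.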

  15. Nonparametric bayesian reward segmentation for skill discovery using inverse reinforcement learning

    CSIR Research Space (South Africa)

    Ranchod, P

    2015-10-01

    Full Text Available We present a method for segmenting a set of unstructured demonstration trajectories to discover reusable skills using inverse reinforcement learning (IRL). Each skill is characterised by a latent reward function which the demonstrator is assumed...

  16. Perception-based Co-evolutionary Reinforcement Learning for UAV Sensor Allocation

    National Research Council Canada - National Science Library

    Berenji, Hamid

    2003-01-01

    .... A Perception-based reasoning approach based on co-evolutionary reinforcement learning was developed for jointly addressing sensor allocation on each individual UAV and allocation of a team of UAVs...

  17. Applying reinforcement learning to the weapon assignment problem in air defence

    CSIR Research Space (South Africa)

    Mouton, H

    2011-12-01

    Full Text Available The techniques investigated in this article were two methods from the machine-learning subfield of reinforcement learning (RL), namely a Monte Carlo (MC) control algorithm with exploring starts (MCES), and an off-policy temporal-difference (TD) learning...
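
    For a flavour of the first method, here is Monte Carlo control with exploring starts on a one-weapon, single-step caricature of the assignment problem; the threat values and kill probabilities are invented, and the article's formulation is richer:

```python
import random

random.seed(6)
THREATS = [0, 1]
VALUE = [5.0, 3.0]     # damage prevented if the threat is destroyed
P_KILL = [0.3, 0.9]    # this weapon's kill probability per threat
Q = {t: 0.0 for t in THREATS}
n = {t: 0 for t in THREATS}
for episode in range(5000):
    t = random.choice(THREATS)    # exploring start: random first action
    # single-step episode: the return is the realised damage prevented
    ret = VALUE[t] if random.random() < P_KILL[t] else 0.0
    n[t] += 1
    Q[t] += (ret - Q[t]) / n[t]   # every-visit Monte Carlo average
best = max(Q, key=Q.get)
print(best, round(Q[best], 2))    # expected returns are 1.5 vs 2.7
```

    In a full MCES loop the greedy policy would also generate the rest of each episode; with single-step episodes that tail is empty, so the sketch reduces to averaging returns per action.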

  18. Modeling and simulation of virtual human's coordination based on multi-agent systems

    Science.gov (United States)

    Zhang, Mei; Wen, Jing-Hua; Zhang, Zu-Xuan; Zhang, Jian-Qing

    2006-10-01

    The difficulties and hot topics in current virtual geographic environment (VGE) research are shared spaces and multi-user operation, distributed coordination, and group decision-making. The theories and technologies of MAS provide a brand-new environment for the analysis, design and realization of distributed open systems. This paper takes as its main research object the cooperation among virtual humans in a VGE in which multiple users participate. First, we describe the theoretical foundation of VGE and present a formal description of Multi-Agent Systems (MAS). Then we analyze in detail an algorithm for learning the collective operating behavior of virtual humans, based on an elitist-preserving Genetic Algorithm (GA), and establish a dynamic action model in which multiple agents and objects interact dynamically, together with a group movement strategy. Finally, we design an example showing how three evolutionary agents cooperate to complete the task of collectively pushing a cylindrical box, and we build a virtual-world prototype of virtual humans collectively pushing a box based on V-Realm Builder 2.0; moreover, we perform modeling and dynamic simulation with Simulink 6.

  19. A Multi Agent Based Approach for Prehospital Emergency Management.

    Science.gov (United States)

    Safdari, Reza; Shoshtarian Malak, Jaleh; Mohammadzadeh, Niloofar; Danesh Shahraki, Azimeh

    2017-07-01

To demonstrate an architecture that automates the prehospital emergency process, categorizing specialized care according to the situation at the right time in order to reduce patient mortality and morbidity. The prehospital emergency process was analyzed using existing prehospital management systems and frameworks, and the extracted process was modeled with sequence diagrams in Rational Rose. The main system agents were identified and modeled via component diagrams, considering the main system actors and logically dividing business functionalities; finally, a conceptual architecture for prehospital emergency management was proposed. The proposed architecture was simulated with AnyLogic simulation software, using its Agent Model, State Chart, and Process Model facilities. Multi-agent systems (MAS) have had great success in distributed, complex, and dynamic problem-solving environments, and autonomous agents provide intelligent decision-making capabilities. The proposed architecture covers prehospital management operations. The main identified agents are: EMS Center, Ambulance, Traffic Station, Healthcare Provider, Patient, Consultation Center, National Medical Record System, and a quality-of-service monitoring agent. In a critical condition such as a prehospital emergency, we cope with sophisticated processes: ambulance navigation, healthcare provider and service assignment, consultation, recalling a patient's past medical history through a centralized EHR system, and monitoring healthcare quality in real time. The main advantage of our work has been the use of a multi-agent system. Future work will include implementing the proposed architecture and evaluating its impact on improving patient quality of care.

  20. Service orientation in holonic and multi-agent manufacturing

    CERN Document Server

    Thomas, André; Trentesaux, Damien

    2015-01-01

    This volume gathers the peer reviewed papers presented at the 4th edition of the International Workshop “Service Orientation in Holonic and Multi-agent Manufacturing – SOHOMA’14” organized and hosted on November 5-6, 2014 by the University of Lorraine, France in collaboration with the CIMR Research Centre of the University Politehnica of Bucharest and the TEMPO Laboratory of the University of Valenciennes and Hainaut-Cambrésis.   The book is structured in six parts, each one covering a specific research line which represents a trend in future manufacturing: (1) Holonic and Agent-based Industrial Automation Systems; (2) Service-oriented Management and Control of Manufacturing Systems; (3) Distributed Modelling for Safety and Security in Industrial Systems; (4) Complexity, Big Data and Virtualization in Computing-oriented Manufacturing; (5) Adaptive, Bio-inspired and Self-organizing Multi-Agent Systems for Manufacturing, and (6) Physical Internet Simulation, Modelling and Control.   There is a clear ...

  1. Place preference and vocal learning rely on distinct reinforcers in songbirds.

    Science.gov (United States)

    Murdoch, Don; Chen, Ruidong; Goldberg, Jesse H

    2018-04-30

    In reinforcement learning (RL) agents are typically tasked with maximizing a single objective function such as reward. But it remains poorly understood how agents might pursue distinct objectives at once. In machines, multiobjective RL can be achieved by dividing a single agent into multiple sub-agents, each of which is shaped by agent-specific reinforcement, but it remains unknown if animals adopt this strategy. Here we use songbirds to test if navigation and singing, two behaviors with distinct objectives, can be differentially reinforced. We demonstrate that strobe flashes aversively condition place preference but not song syllables. Brief noise bursts aversively condition song syllables but positively reinforce place preference. Thus distinct behavior-generating systems, or agencies, within a single animal can be shaped by correspondingly distinct reinforcement signals. Our findings suggest that spatially segregated vocal circuits can solve a credit assignment problem associated with multiobjective learning.

  2. A Reinforcement-Based Learning Paradigm Increases Anatomical Learning and Retention—A Neuroeducation Study

    Science.gov (United States)

    Anderson, Sarah J.; Hecker, Kent G.; Krigolson, Olave E.; Jamniczky, Heather A.

    2018-01-01

In anatomy education, a key hurdle to engaging in higher-level discussion in the classroom is recognizing and understanding the extensive terminology used to identify and describe anatomical structures. Given the time-limited classroom environment, seeking methods to impart this foundational knowledge to students in an efficient manner is essential. Just-in-Time Teaching (JiTT) methods incorporate pre-class exercises (typically online) meant to establish foundational knowledge in novice learners so subsequent instructor-led sessions can focus on deeper, more complex concepts. Determining how best to design and assess pre-class exercises requires a detailed examination of learning and retention in an applied educational context. Here we used electroencephalography (EEG) as a quantitative dependent variable to track learning and examine the efficacy of JiTT activities to teach anatomy. Specifically, we examined changes in the amplitude of the N250 and reward positivity event-related brain potential (ERP) components alongside behavioral performance as novice students participated in a series of computerized reinforcement-based learning modules to teach neuroanatomical structures. We found that as students learned to identify anatomical structures, the amplitude of the N250 increased and reward positivity amplitude decreased in response to positive feedback. On both retention and transfer exercises, when learners successfully remembered and translated their knowledge to novel images, the amplitude of the reward positivity remained decreased compared to early learning. Our findings suggest ERPs can be used as a tool to track learning, retention, and transfer of knowledge and that employing the reinforcement learning paradigm is an effective educational approach for developing anatomical expertise. PMID:29467638

  3. A Reinforcement-Based Learning Paradigm Increases Anatomical Learning and Retention-A Neuroeducation Study.

    Science.gov (United States)

    Anderson, Sarah J; Hecker, Kent G; Krigolson, Olave E; Jamniczky, Heather A

    2018-01-01

In anatomy education, a key hurdle to engaging in higher-level discussion in the classroom is recognizing and understanding the extensive terminology used to identify and describe anatomical structures. Given the time-limited classroom environment, seeking methods to impart this foundational knowledge to students in an efficient manner is essential. Just-in-Time Teaching (JiTT) methods incorporate pre-class exercises (typically online) meant to establish foundational knowledge in novice learners so subsequent instructor-led sessions can focus on deeper, more complex concepts. Determining how best to design and assess pre-class exercises requires a detailed examination of learning and retention in an applied educational context. Here we used electroencephalography (EEG) as a quantitative dependent variable to track learning and examine the efficacy of JiTT activities to teach anatomy. Specifically, we examined changes in the amplitude of the N250 and reward positivity event-related brain potential (ERP) components alongside behavioral performance as novice students participated in a series of computerized reinforcement-based learning modules to teach neuroanatomical structures. We found that as students learned to identify anatomical structures, the amplitude of the N250 increased and reward positivity amplitude decreased in response to positive feedback. On both retention and transfer exercises, when learners successfully remembered and translated their knowledge to novel images, the amplitude of the reward positivity remained decreased compared to early learning. Our findings suggest ERPs can be used as a tool to track learning, retention, and transfer of knowledge and that employing the reinforcement learning paradigm is an effective educational approach for developing anatomical expertise.

  4. A Reinforcement-Based Learning Paradigm Increases Anatomical Learning and Retention—A Neuroeducation Study

    Directory of Open Access Journals (Sweden)

    Sarah J. Anderson

    2018-02-01

Full Text Available In anatomy education, a key hurdle to engaging in higher-level discussion in the classroom is recognizing and understanding the extensive terminology used to identify and describe anatomical structures. Given the time-limited classroom environment, seeking methods to impart this foundational knowledge to students in an efficient manner is essential. Just-in-Time Teaching (JiTT) methods incorporate pre-class exercises (typically online) meant to establish foundational knowledge in novice learners so subsequent instructor-led sessions can focus on deeper, more complex concepts. Determining how best to design and assess pre-class exercises requires a detailed examination of learning and retention in an applied educational context. Here we used electroencephalography (EEG) as a quantitative dependent variable to track learning and examine the efficacy of JiTT activities to teach anatomy. Specifically, we examined changes in the amplitude of the N250 and reward positivity event-related brain potential (ERP) components alongside behavioral performance as novice students participated in a series of computerized reinforcement-based learning modules to teach neuroanatomical structures. We found that as students learned to identify anatomical structures, the amplitude of the N250 increased and reward positivity amplitude decreased in response to positive feedback. On both retention and transfer exercises, when learners successfully remembered and translated their knowledge to novel images, the amplitude of the reward positivity remained decreased compared to early learning. Our findings suggest ERPs can be used as a tool to track learning, retention, and transfer of knowledge and that employing the reinforcement learning paradigm is an effective educational approach for developing anatomical expertise.

  5. Identification of animal behavioral strategies by inverse reinforcement learning.

    Directory of Open Access Journals (Sweden)

    Shoichiro Yamaguchi

    2018-05-01

Full Text Available Animals are able to reach a desired state in an environment by controlling various behavioral patterns. Identification of the behavioral strategy used for this control is important for understanding animals' decision-making and is fundamental to dissect information processing done by the nervous system. However, methods for quantifying such behavioral strategies have not been fully established. In this study, we developed an inverse reinforcement-learning (IRL) framework to identify an animal's behavioral strategy from behavioral time-series data. We applied this framework to C. elegans thermotactic behavior; after cultivation at a constant temperature with or without food, fed worms prefer, while starved worms avoid the cultivation temperature on a thermal gradient. Our IRL approach revealed that the fed worms used both the absolute temperature and its temporal derivative and that their behavior involved two strategies: directed migration (DM) and isothermal migration (IM). With DM, worms efficiently reached specific temperatures, which explains their thermotactic behavior when fed. With IM, worms moved along a constant temperature, which reflects isothermal tracking, well-observed in previous studies. In contrast to fed animals, starved worms escaped the cultivation temperature using only the absolute, but not the temporal derivative of temperature. We also investigated the neural basis underlying these strategies, by applying our method to thermosensory neuron-deficient worms. Thus, our IRL-based approach is useful in identifying animal strategies from behavioral time-series data and could be applied to a wide range of behavioral studies, including decision-making, in other organisms.

  6. Optimized Assistive Human-Robot Interaction Using Reinforcement Learning.

    Science.gov (United States)

    Modares, Hamidreza; Ranatunga, Isura; Lewis, Frank L; Popa, Dan O

    2016-03-01

    An intelligent human-robot interaction (HRI) system with adjustable robot behavior is presented. The proposed HRI system assists the human operator to perform a given task with minimum workload demands and optimizes the overall human-robot system performance. Motivated by human factor studies, the presented control structure consists of two control loops. First, a robot-specific neuro-adaptive controller is designed in the inner loop to make the unknown nonlinear robot behave like a prescribed robot impedance model as perceived by a human operator. In contrast to existing neural network and adaptive impedance-based control methods, no information of the task performance or the prescribed robot impedance model parameters is required in the inner loop. Then, a task-specific outer-loop controller is designed to find the optimal parameters of the prescribed robot impedance model to adjust the robot's dynamics to the operator skills and minimize the tracking error. The outer loop includes the human operator, the robot, and the task performance details. The problem of finding the optimal parameters of the prescribed robot impedance model is transformed into a linear quadratic regulator (LQR) problem which minimizes the human effort and optimizes the closed-loop behavior of the HRI system for a given task. To obviate the requirement of the knowledge of the human model, integral reinforcement learning is used to solve the given LQR problem. Simulation results on an x - y table and a robot arm, and experimental implementation results on a PR2 robot confirm the suitability of the proposed method.
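
The outer loop above reduces impedance-parameter tuning to an LQR problem. As a hedged sketch of the LQR machinery itself (not the paper's model-free integral-RL solution, and with a hypothetical scalar system standing in for the robot dynamics), the discrete-time Riccati recursion below computes an optimal feedback gain:

```python
# Discrete-time LQR for a scalar system x' = a*x + b*u with cost sum(q*x^2 + r*u^2).
# Illustrative only: the record's method solves the LQR problem model-free via
# integral reinforcement learning; here we iterate the Riccati equation directly,
# which requires knowing a and b.

def lqr_gain(a, b, q, r, iters=200):
    p = q  # initialize the value-function weight
    for _ in range(iters):
        k = (b * p * a) / (r + b * p * b)   # optimal feedback gain u = -k*x
        p = q + a * p * a - a * p * b * k   # Riccati update
    return k

k = lqr_gain(a=1.0, b=1.0, q=1.0, r=1.0)
assert abs(1.0 - k) < 1.0  # closed loop a - b*k is stable
```

For these unit weights the recursion converges to k ≈ 0.618, the scalar Riccati fixed point.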

  7. Dynamic Sensor Tasking for Space Situational Awareness via Reinforcement Learning

    Science.gov (United States)

    Linares, R.; Furfaro, R.

    2016-09-01

    This paper studies the Sensor Management (SM) problem for optical Space Object (SO) tracking. The tasking problem is formulated as a Markov Decision Process (MDP) and solved using Reinforcement Learning (RL). The RL problem is solved using the actor-critic policy gradient approach. The actor provides a policy which is random over actions and given by a parametric probability density function (pdf). The critic evaluates the policy by calculating the estimated total reward or the value function for the problem. The parameters of the policy action pdf are optimized using gradients with respect to the reward function. Both the critic and the actor are modeled using deep neural networks (multi-layer neural networks). The policy neural network takes the current state as input and outputs probabilities for each possible action. This policy is random, and can be evaluated by sampling random actions using the probabilities determined by the policy neural network's outputs. The critic approximates the total reward using a neural network. The estimated total reward is used to approximate the gradient of the policy network with respect to the network parameters. This approach is used to find the non-myopic optimal policy for tasking optical sensors to estimate SO orbits. The reward function is based on reducing the uncertainty for the overall catalog to below a user specified uncertainty threshold. This work uses a 30 km total position error for the uncertainty threshold. This work provides the RL method with a negative reward as long as any SO has a total position error above the uncertainty threshold. This penalizes policies that take longer to achieve the desired accuracy. A positive reward is provided when all SOs are below the catalog uncertainty threshold. An optimal policy is sought that takes actions to achieve the desired catalog uncertainty in minimum time. This work trains the policy in simulation by letting it task a single sensor to "learn" from its performance
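
The actor-critic policy-gradient loop described here can be sketched in miniature. The toy below uses a one-state, two-action problem in place of the paper's sensor-tasking MDP, and tabular parameters in place of its deep networks; the reward assignment is an assumption for illustration:

```python
import math, random

random.seed(0)

# Toy actor-critic: the actor is a softmax policy over two actions, the critic
# is a single value estimate. Action 0 stands in for the "good" tasking choice.
theta = [0.0, 0.0]   # actor parameters: action preferences
v = 0.0              # critic: value estimate of the single state
alpha, beta = 0.1, 0.1

def softmax(prefs):
    e = [math.exp(p) for p in prefs]
    s = sum(e)
    return [x / s for x in e]

for _ in range(2000):
    probs = softmax(theta)
    a = 0 if random.random() < probs[0] else 1   # sample from the random policy
    reward = 1.0 if a == 0 else 0.0
    td_error = reward - v                        # critic evaluates the policy
    v += beta * td_error                         # critic update
    for i in range(2):                           # actor follows the policy gradient
        grad = (1.0 if i == a else 0.0) - probs[i]
        theta[i] += alpha * td_error * grad

assert softmax(theta)[0] > 0.8   # actor comes to prefer the rewarded action
```

The critic's estimate serves as a baseline, so the actor update pushes probability toward actions that do better than expected, which is the same mechanism the deep networks in the record implement at scale.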

  8. A reward optimization method based on action subrewards in hierarchical reinforcement learning.

    Science.gov (United States)

    Fu, Yuchen; Liu, Quan; Ling, Xionghong; Cui, Zhiming

    2014-01-01

Reinforcement learning (RL) is a kind of interactive learning method whose main characteristics are "trial and error" and "delayed reward." A hierarchical reinforcement learning method based on action subrewards is proposed to address the "curse of dimensionality," in which the state space grows exponentially with the number of features, leading to slow convergence. The method greatly reduces the state space and chooses actions purposefully and efficiently, so as to optimize the reward function and speed up convergence. Applied to online learning in the game of Tetris, the experimental results show that convergence is evidently accelerated by the new method, which combines a hierarchical reinforcement learning algorithm with action subrewards. The "curse of dimensionality" problem is also alleviated to a certain extent by the hierarchical method. Performance under different parameters is compared and analyzed as well.
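
One way to read "action subrewards" is as shaping terms added to a sparse terminal reward. The toy tabular Q-learning run below (a 1-D chain in place of Tetris; all rewards and parameters are assumptions, not the paper's design) shows the mechanism:

```python
import random

random.seed(1)

# Q-learning on a 1-D chain: start at state 0, reach state 5. A small per-action
# "subreward" for stepping toward the goal supplements the sparse terminal reward.
N, GOAL = 6, 5
Q = {(s, a): 0.0 for s in range(N) for a in (-1, 1)}
alpha, gamma, eps = 0.5, 0.9, 0.1

for _ in range(500):
    s = 0
    while s != GOAL:
        a = random.choice((-1, 1)) if random.random() < eps else \
            max((-1, 1), key=lambda x: Q[(s, x)])
        s2 = min(max(s + a, 0), GOAL)
        sub = 0.1 if a == 1 else -0.1            # action subreward: shaping term
        r = sub + (1.0 if s2 == GOAL else 0.0)   # plus the sparse terminal reward
        best = max(Q[(s2, -1)], Q[(s2, 1)]) if s2 != GOAL else 0.0
        Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])
        s = s2

# The greedy policy steps toward the goal from every non-goal state.
assert all(Q[(s, 1)] > Q[(s, -1)] for s in range(GOAL))
```

With the shaping term, useful actions are reinforced on every step rather than only at the goal, which is the convergence-speed effect the record reports.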

  9. Model of interaction in Smart Grid on the basis of multi-agent system

    Science.gov (United States)

    Engel, E. A.; Kovalev, I. V.; Engel, N. E.

    2016-11-01

This paper presents a model of interaction in a Smart Grid on the basis of a multi-agent system. The use of travelling waves in the multi-agent system describes the behavior of the Smart Grid from a local point of view, complementing the conventional approach. The simulation results show that wave absorption in the distributed multi-agent system effectively simulates the interaction in the Smart Grid.

  10. Robot Control Using UML and Multi-agent System

    Directory of Open Access Journals (Sweden)

    Ales Pavliska

    2003-02-01

Full Text Available Increased industrialization and new markets have led to an accumulation of used technical consumer goods, which results in greater exploitation of raw materials, energy and landfill sites. The application of disassembly techniques is a first step towards reducing the use of natural resources, conserving precious energy and limiting the increase in waste volume; these techniques form a reliable and clean approach: "noble" or high-grade recycling. This paper presents a multi-agent system for the disassembly process, implemented in a computer-aided application for supervising the disassembling system: the Interactive Intelligent Interface for Disassembling System. Unified Modeling Language diagrams are used for an internal and external definition of the disassembling system.

  11. Multi-Agent Framework in Visual Sensor Networks

    Directory of Open Access Journals (Sweden)

    J. M. Molina

    2007-01-01

    Full Text Available The recent interest in the surveillance of public, military, and commercial scenarios is increasing the need to develop and deploy intelligent and/or automated distributed visual surveillance systems. Many applications based on distributed resources use the so-called software agent technology. In this paper, a multi-agent framework is applied to coordinate videocamera-based surveillance. The ability to coordinate agents improves the global image and task distribution efficiency. In our proposal, a software agent is embedded in each camera and controls the capture parameters. Then coordination is based on the exchange of high-level messages among agents. Agents use an internal symbolic model to interpret the current situation from the messages from all other agents to improve global coordination.

  12. Multi-agent cooperative systems applied to precision applications

    International Nuclear Information System (INIS)

    McKay, M.D.; Anderson, M.O.; Gunderson, R.W.; Flann, N.; Abbott, B.

    1998-01-01

Regulatory agencies are imposing limits and constraints to protect the operator and/or the environment. While generally necessary, these controls also tend to increase cost and decrease efficiency and productivity. Intelligent computer systems can be made to perform these hazardous tasks with greater efficiency and precision without danger to the operators. The Idaho National Engineering and Environmental Laboratory and the Center for Self-Organizing and Intelligent Systems at Utah State University have developed a series of autonomous all-terrain multi-agent systems capable of performing automated tasks within hazardous environments. This paper discusses the development and application of cooperative small-scale and large-scale robots for use in various activities associated with radiologically contaminated areas, prescription farming, and unexploded ordnance.

  13. Intercell scheduling: A negotiation approach using multi-agent coalitions

    Science.gov (United States)

    Tian, Yunna; Li, Dongni; Zheng, Dan; Jia, Yunde

    2016-10-01

    Intercell scheduling problems arise as a result of intercell transfers in cellular manufacturing systems. Flexible intercell routes are considered in this article, and a coalition-based scheduling (CBS) approach using distributed multi-agent negotiation is developed. Taking advantage of the extended vision of the coalition agents, the global optimization is improved and the communication cost is reduced. The objective of the addressed problem is to minimize mean tardiness. Computational results show that, compared with the widely used combinatorial rules, CBS provides better performance not only in minimizing the objective, i.e. mean tardiness, but also in minimizing auxiliary measures such as maximum completion time, mean flow time and the ratio of tardy parts. Moreover, CBS is better than the existing intercell scheduling approach for the same problem with respect to the solution quality and computational costs.

  14. Electronic negotiation in multi-agent systems

    Directory of Open Access Journals (Sweden)

    MARCELA PASTRANA DAVID

    2008-01-01

    Full Text Available The objective of the work presented in this article is to define a comparison method, based on the application of quality metrics, developed to measure electronic negotiation protocols in multi-agent environments. The following quality criteria were chosen for comparing the protocols: speed, efficiency, scalability and completeness. To apply and validate the comparison method, two electronic negotiation prototypes based on the English and Dutch auctions were implemented on the JADE (Java Agents DEvelopment Framework) platform; preliminary results on their behavior were obtained and analyzed, and the corresponding conclusions were drawn.

  15. Distributed Research Project Scheduling Based on Multi-Agent Methods

    Directory of Open Access Journals (Sweden)

    Constanta Nicoleta Bodea

    2011-01-01

Full Text Available Different project planning and scheduling approaches have been developed. Operational Research (OR) provides two major planning techniques: CPM (Critical Path Method) and PERT (Program Evaluation and Review Technique). Due to project complexity and the difficulty of applying classical methods, new approaches were developed. Artificial Intelligence (AI) initially promoted the automatic planner concept, but model-based planning and scheduling methods emerged later on. The paper addresses the project scheduling optimization problem when projects are seen as Complex Adaptive Systems (CAS). Considering two different approaches to project scheduling optimization, TCPSP (Time-Constrained Project Scheduling) and RCPSP (Resource-Constrained Project Scheduling), the paper focuses on a multi-agent implementation in MATLAB for TCPSP. Using a research project as a case study, the paper includes a comparison between two multi-agent methods: a Genetic Algorithm (GA) and an Ant Colony Algorithm (ACO).

  16. Modeling, Simulation, and Characterization of Distributed Multi-agent Systems

    Directory of Open Access Journals (Sweden)

    Reed F. Young

    2012-04-01

Full Text Available A strategy is described that utilizes a novel application of a potential-force function that includes the tuning of coefficients to control mobile robots orchestrated as a distributed multi-agent system. Control system parameters are manipulated methodically via simulation and hardware experimentation to gain a better understanding of their impact upon mission performance of the multi-agent system as applied to a predetermined task of area exploration and mapping. Also included are descriptions of experiment infrastructure components that afford convenient solutions to research challenges. These consist of a surrogate localization (position and orientation) function utilizing a novel MATLAB executable (MEX) function and a user datagram protocol (UDP)-based communications protocol that facilitates communication among network-based control computers.

  17. Adolescent-specific patterns of behavior and neural activity during social reinforcement learning.

    Science.gov (United States)

    Jones, Rebecca M; Somerville, Leah H; Li, Jian; Ruberry, Erika J; Powers, Alisa; Mehta, Natasha; Dyke, Jonathan; Casey, B J

    2014-06-01

    Humans are sophisticated social beings. Social cues from others are exceptionally salient, particularly during adolescence. Understanding how adolescents interpret and learn from variable social signals can provide insight into the observed shift in social sensitivity during this period. The present study tested 120 participants between the ages of 8 and 25 years on a social reinforcement learning task where the probability of receiving positive social feedback was parametrically manipulated. Seventy-eight of these participants completed the task during fMRI scanning. Modeling trial-by-trial learning, children and adults showed higher positive learning rates than did adolescents, suggesting that adolescents demonstrated less differentiation in their reaction times for peers who provided more positive feedback. Forming expectations about receiving positive social reinforcement correlated with neural activity within the medial prefrontal cortex and ventral striatum across age. Adolescents, unlike children and adults, showed greater insular activity during positive prediction error learning and increased activity in the supplementary motor cortex and the putamen when receiving positive social feedback regardless of the expected outcome, suggesting that peer approval may motivate adolescents toward action. While different amounts of positive social reinforcement enhanced learning in children and adults, all positive social reinforcement equally motivated adolescents. Together, these findings indicate that sensitivity to peer approval during adolescence goes beyond simple reinforcement theory accounts and suggest possible explanations for how peers may motivate adolescent behavior.
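
The trial-by-trial learning modeled in this record rests on a delta-rule update: an expectation of positive social feedback is adjusted by a prediction error scaled by a learning rate. The sketch below is a generic illustration of that update, not the study's fitted model; the feedback sequence and rates are assumptions:

```python
# Delta-rule (Rescorla-Wagner style) expectation update, the building block of
# the trial-by-trial models fit in studies like this one.

def simulate(feedback_seq, lr):
    """Return the learned expectation of positive feedback after a sequence."""
    expectation = 0.5  # start uncertain
    for feedback in feedback_seq:
        expectation += lr * (feedback - expectation)  # prediction-error update
    return expectation

always_positive = [1.0] * 30      # a consistently approving "peer"
e_fast = simulate(always_positive, lr=0.3)
e_slow = simulate(always_positive, lr=0.02)
assert e_fast > e_slow > 0.5      # a higher learning rate adapts faster
```

A group difference in the fitted learning rate, as reported for adolescents versus children and adults, corresponds to how quickly this expectation tracks the feedback history.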

  18. 10th KES Conference on Agent and Multi-Agent Systems : Technologies and Applications

    CERN Document Server

    Chen-Burger, Yun-Heh; Howlett, Robert; Jain, Lakhmi

    2016-01-01

    The modern economy is driven by technologies and knowledge. Digital technologies can free, shift and multiply choices, often intruding on the space of other industries, by providing new ways of conducting business operations and creating values for customers and companies. The topics covered in this volume include software agents, multi-agent systems, agent modelling, mobile and cloud computing, big data analysis, business intelligence, artificial intelligence, social systems, computer embedded systems and nature inspired manufacturing, etc. that contribute to the modern Digital Economy. This volume highlights new trends and challenges in agent, new digital and knowledge economy research and includes 28 papers classified in the following specific topics: business process management, agent-based modeling and simulation, anthropic-oriented computing, learning paradigms, business informatics and gaming, digital economy, and advances in networked virtual enterprises. Published papers were selected for presentatio...

  19. Temporal Memory Reinforcement Learning for the Autonomous Micro-mobile Robot Based-behavior

    Institute of Scientific and Technical Information of China (English)

    Yang Yujun(杨玉君); Cheng Junshi; Chen Jiapin; Li Xiaohai

    2004-01-01

This paper presents temporal memory reinforcement learning for an autonomous behavior-based micro-mobile robot. Human memory exhibits a forgetting process: what is memorized earlier is forgotten earlier, and only what is repeated is remembered firmly. Inspired by this, the robot need not memorize all past states, which also economizes the memory space, limited in the MPU of our AMRobot. The proposed algorithm is an extension of Q-learning, an incremental reinforcement learning method. Simulation results show that the algorithm is valid.

  20. Video Demo: Deep Reinforcement Learning for Coordination in Traffic Light Control

    NARCIS (Netherlands)

    van der Pol, E.; Oliehoek, F.A.; Bosse, T.; Bredeweg, B.

    2016-01-01

    This video demonstration contrasts two approaches to coordination in traffic light control using reinforcement learning: earlier work, based on a deconstruction of the state space into a linear combination of vehicle states, and our own approach based on the Deep Q-learning algorithm.

  1. Study for the design method of multi-agent diagnostic system to improve diagnostic performance for similar abnormality

    International Nuclear Information System (INIS)

    Minowa, Hirotsugu; Gofuku, Akio

    2014-01-01

Accidents at industrial plants cause large human, economic, and social-credibility losses. Recently, diagnostic methods using machine-learning techniques such as support vector machines have been expected to detect the occurrence of abnormality in a plant early and correctly. It has been reported that such diagnostic machines achieve high accuracy in diagnosing the operating state of an industrial plant when a single abnormality occurs. However, the individual diagnostic machines in a multi-agent diagnostic system may misdiagnose similar abnormalities as the same abnormality as the number of abnormalities to diagnose increases. Consequently, a single diagnostic machine may show higher diagnostic performance than a multi-agent diagnostic system, because decision-making that accounts for misdiagnosis is difficult. We therefore study a design method for multi-agent diagnostic systems that diagnoses similar abnormalities correctly. The method aims at the automatic generation of a diagnostic system in which the generation process and the placement of diagnostic machines are optimized to correctly diagnose similar abnormalities, which are identified from the similarity of process signals by a statistical method. This paper explains our design method and reports the results of evaluating the method on process data from the fast-breeder reactor Monju.

  2. Pragmatically Framed Cross-Situational Noun Learning Using Computational Reinforcement Models.

    Science.gov (United States)

    Najnin, Shamima; Banerjee, Bonny

    2018-01-01

    Cross-situational learning and social pragmatic theories are prominent mechanisms for learning word meanings (i.e., word-object pairs). In this paper, the role of reinforcement is investigated for early word-learning by an artificial agent. When exposed to a group of speakers, the agent comes to understand an initial set of vocabulary items belonging to the language used by the group. Both cross-situational learning and social pragmatic theory are taken into account. As social cues, joint attention and prosodic cues in caregiver's speech are considered. During agent-caregiver interaction, the agent selects a word from the caregiver's utterance and learns the relations between that word and the objects in its visual environment. The "novel words to novel objects" language-specific constraint is assumed for computing rewards. The models are learned by maximizing the expected reward using reinforcement learning algorithms [i.e., table-based algorithms: Q-learning, SARSA, SARSA-λ, and neural network-based algorithms: Q-learning for neural network (Q-NN), neural-fitted Q-network (NFQ), and deep Q-network (DQN)]. Neural network-based reinforcement learning models are chosen over table-based models for better generalization and quicker convergence. Simulations are carried out using mother-infant interaction CHILDES dataset for learning word-object pairings. Reinforcement is modeled in two cross-situational learning cases: (1) with joint attention (Attentional models), and (2) with joint attention and prosodic cues (Attentional-prosodic models). Attentional-prosodic models manifest superior performance to Attentional ones for the task of word-learning. The Attentional-prosodic DQN outperforms existing word-learning models for the same task.
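
The table-based variants mentioned in this record can be illustrated as a small word-object pairing learner. The sketch below treats word learning as a contextual bandit with a tabular value update; the vocabulary, reward scheme, and parameters are illustrative assumptions, not the paper's CHILDES setup:

```python
import random

random.seed(3)

# On each interaction the agent "hears" a word, picks an object, and is
# rewarded when the pairing matches the caregiver's intent.
words = ["ball", "cup", "dog"]
objects = ["ball", "cup", "dog"]
Q = {(w, o): 0.0 for w in words for o in objects}
alpha, eps = 0.2, 0.2

for _ in range(1500):
    word = random.choice(words)
    if random.random() < eps:                       # occasional exploration
        obj = random.choice(objects)
    else:                                           # otherwise exploit the table
        obj = max(objects, key=lambda o: Q[(word, o)])
    reward = 1.0 if obj == word else 0.0            # correct pairing reinforced
    Q[(word, obj)] += alpha * (reward - Q[(word, obj)])

# After learning, each word maps to its object.
assert all(max(objects, key=lambda o: Q[(w, o)]) == w for w in words)
```

The neural-network variants in the record (Q-NN, NFQ, DQN) replace this lookup table with a function approximator for better generalization over larger vocabularies.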

  3. The drift diffusion model as the choice rule in reinforcement learning.

    Science.gov (United States)

    Pedersen, Mads Lund; Frank, Michael J; Biele, Guido

    2017-08-01

    Current reinforcement-learning models often assume simplified decision processes that do not fully reflect the dynamic complexities of choice processes. Conversely, sequential-sampling models of decision making account for both choice accuracy and response time, but assume that decisions are based on static decision values. To combine these two computational models of decision making and learning, we implemented reinforcement-learning models in which the drift diffusion model describes the choice process, thereby capturing both within- and across-trial dynamics. To exemplify the utility of this approach, we quantitatively fit data from a common reinforcement-learning paradigm using hierarchical Bayesian parameter estimation, and compared model variants to determine whether they could capture the effects of stimulant medication in adult patients with attention-deficit hyperactivity disorder (ADHD). The model with the best relative fit provided a good description of the learning process, choices, and response times. A parameter recovery experiment showed that the hierarchical Bayesian modeling approach enabled accurate estimation of the model parameters. The model approach described here, using simultaneous estimation of reinforcement-learning and drift diffusion model parameters, shows promise for revealing new insights into the cognitive and neural mechanisms of learning and decision making, as well as the alteration of such processes in clinical groups.
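
    A common way to combine the two model families, in the spirit described above (a hedged sketch; the paper's exact parameterization, priors, and hierarchical fitting procedure are not reproduced here), is to let the trial-wise drift rate be proportional to the difference of the learned option values, simulate the diffusion to threshold to obtain both the choice and the response time, and then update the chosen option's value with a delta rule:

```python
import math
import random

random.seed(1)

ALPHA, SCALING, THRESHOLD, DT, NOISE = 0.1, 2.0, 1.0, 0.001, 1.0

def ddm_trial(drift):
    """One drift-diffusion trial: evidence accumulates from 0 until it
    hits +THRESHOLD (choose option A) or -THRESHOLD (choose option B)."""
    x, t = 0.0, 0.0
    while abs(x) < THRESHOLD:
        x += drift * DT + NOISE * math.sqrt(DT) * random.gauss(0.0, 1.0)
        t += DT
    return (0 if x > 0 else 1), t  # (choice, response time in seconds)

q = [0.0, 0.0]          # learned values of the two options
p_reward = [0.8, 0.2]   # hypothetical reward probabilities
rts = []
for _ in range(300):
    drift = SCALING * (q[0] - q[1])   # value difference sets the drift rate
    choice, rt = ddm_trial(drift)
    rts.append(rt)
    reward = 1.0 if random.random() < p_reward[choice] else 0.0
    q[choice] += ALPHA * (reward - q[choice])   # delta-rule learning update

print([round(v, 2) for v in q], round(sum(rts) / len(rts), 2))
```

    As the value difference grows, the drift increases, so simulated responses become faster and more accurate over learning; estimating such a model's parameters hierarchically from choices and response times is what the abstract describes.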

  4. EFFICIENT SPECTRUM UTILIZATION IN COGNITIVE RADIO THROUGH REINFORCEMENT LEARNING

    Directory of Open Access Journals (Sweden)

    Dhananjay Kumar

    2013-09-01

    Full Text Available Machine learning schemes can be employed in cognitive radio systems to intelligently locate spectrum holes with some knowledge about the operating environment. In this paper, we formulate a variation of the actor-critic learning algorithm known as the Continuous Actor Critic Learning Automaton (CACLA) and compare this scheme with an actor-critic learning scheme and the existing Q-learning scheme. Simulation results show that our CACLA scheme has less execution time and achieves higher throughput than the other two schemes.
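
    CACLA's defining feature (following van Hasselt and Wiering's formulation; the toy one-state problem and all constants below are invented for illustration) is that the actor is moved toward the explored action only when the temporal-difference error is positive, i.e. only actions that beat the critic's expectation change the policy:

```python
import random

random.seed(2)

TARGET = 0.7          # hypothetical optimum of an unknown reward function
ACTOR_LR, CRITIC_LR, SIGMA = 0.1, 0.1, 0.3

actor = 0.0           # mean of the Gaussian exploration policy
critic = -1.0         # value estimate, deliberately pessimistic at the start

for _ in range(2000):
    action = actor + random.gauss(0.0, SIGMA)   # explore around the actor
    reward = -(action - TARGET) ** 2            # one-step episode, gamma = 0
    td_error = reward - critic
    critic += CRITIC_LR * td_error
    if td_error > 0:
        # CACLA's key rule: move the actor toward the action actually
        # taken, but only when it turned out better than expected.
        actor += ACTOR_LR * (action - actor)

print(round(actor, 2))
```

    In the cognitive-radio setting the action would be a continuous transmission parameter (e.g. a power or channel choice) rather than this scalar toy action.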

  5. Adversarial Reinforcement Learning in a Cyber Security Simulation

    NARCIS (Netherlands)

    Elderman, Richard; Pater, Leon; Thie, Albert; Drugan, Madalina; Wiering, Marco

    2017-01-01

    This paper focuses on cyber-security simulations in networks modeled as a Markov game with incomplete information and stochastic elements. The resulting game is an adversarial sequential decision making problem played with two agents, the attacker and defender. The two agents pit one reinforcement

  6. Decision Making in Reinforcement Learning Using a Modified Learning Space Based on the Importance of Sensors

    Directory of Open Access Journals (Sweden)

    Yasutaka Kishima

    2013-01-01

    Full Text Available Many studies have been conducted on the application of reinforcement learning (RL) to robots. A robot built for general purposes has redundant sensors and actuators, because it is difficult to anticipate every environment the robot will face and every task it must execute. In this case, the state space in RL contains redundancy, so the robot needs much time to learn a given task. In this study, we focus on the importance of sensors with regard to a robot's performance of a particular task. The sensors that are relevant to a task differ from task to task. By using the importance of the sensors, we adjust the number of states assigned to each sensor and thereby reduce the size of the state space. In this paper, we define the measure of importance of a sensor for a task as the correlation between the value of that sensor and the reward. The robot calculates the importance of its sensors and shrinks the state space accordingly. We propose this learning-space reduction method and construct a learning system by embedding it in RL. We confirm the effectiveness of the proposed system with an experimental robot.
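
    The importance measure described above (correlation between a sensor's value and the reward) is easy to sketch. In this hypothetical example, synthetic logs contain one task-relevant sensor and one redundant one; the correlation singles out the relevant sensor, and the redundant one is collapsed to a single state (all names and numbers are invented):

```python
import math
import random

random.seed(3)

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical log of 200 steps: sensor "s0" tracks task progress (and hence
# reward), while sensor "s1" is an unrelated, redundant sensor.
log = []
for _ in range(200):
    progress = random.random()
    log.append({"s0": progress + random.gauss(0.0, 0.05),
                "s1": random.random(),
                "reward": progress})

importance = {k: abs(pearson([step[k] for step in log],
                             [step["reward"] for step in log]))
              for k in ("s0", "s1")}

# Give unimportant sensors a single discretization bin (i.e. drop them),
# shrinking the state space the learner must explore.
bins = {k: 8 if imp > 0.5 else 1 for k, imp in importance.items()}
print(importance, bins)
```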

  7. Neural correlates of reinforcement learning and social preferences in competitive bidding.

    Science.gov (United States)

    van den Bos, Wouter; Talwar, Arjun; McClure, Samuel M

    2013-01-30

    In competitive social environments, people often deviate from what rational choice theory prescribes, resulting in losses or suboptimal monetary gains. We investigate how competition affects learning and decision-making in a common value auction task. During the experiment, groups of five human participants were simultaneously scanned using MRI while playing the auction task. We first demonstrate that bidding is well characterized by reinforcement learning with biased reward representations dependent on social preferences. Indicative of reinforcement learning, we found that estimated trial-by-trial prediction errors correlated with activity in the striatum and ventromedial prefrontal cortex. Additionally, we found that individual differences in social preferences were related to activity in the temporal-parietal junction and anterior insula. Connectivity analyses suggest that monetary and social value signals are integrated in the ventromedial prefrontal cortex and striatum. Based on these results, we argue for a novel mechanistic account for the integration of reinforcement history and social preferences in competitive decision-making.
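
    The "biased reward representation" idea above can be sketched as a Rescorla-Wagner update driven by subjective rather than purely monetary utility (a schematic illustration only; the social weight and payoffs below are invented, not the fitted social-preference parameters from the study):

```python
import random

random.seed(4)

ALPHA = 0.2
SOCIAL_WEIGHT = 0.3   # hypothetical "joy of winning" term; 0 recovers plain RL

def subjective_utility(monetary, won):
    # Biased reward representation: a social value of winning is added to
    # the monetary payoff before the prediction error is computed.
    return monetary + (SOCIAL_WEIGHT if won else 0.0)

value = 0.0   # learned value of one particular bid level
for _ in range(500):
    won = random.random() < 0.5                # outcome of the auction round
    monetary = (1.0 - 0.6) if won else 0.0     # common value minus bid if won
    delta = subjective_utility(monetary, won) - value   # prediction error
    value += ALPHA * delta

print(round(value, 2))
```

    A learner with a positive social weight overvalues winning and hence overbids, which is one way reinforcement learning with socially biased rewards can reproduce deviations from rational bidding.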

  8. A Robust Cooperated Control Method with Reinforcement Learning and Adaptive H∞ Control

    Science.gov (United States)

    Obayashi, Masanao; Uchiyama, Shogo; Kuremoto, Takashi; Kobayashi, Kunikazu

    This study proposes a robust cooperative control method that combines reinforcement learning with robust control. A remarkable characteristic of reinforcement learning is that it does not require a model of the system; however, it does not guarantee the stability of the system. Robust control, on the other hand, guarantees stability and robustness, but requires a model. We employ both the actor-critic method, a kind of reinforcement learning that controls continuous-valued actions with a minimal amount of computation, and traditional robust control, namely H∞ control. The proposed method was compared with the conventional control method (the actor-critic method alone) in computer simulations of controlling the angle and position of a crane system, and the simulation results showed the effectiveness of the proposed method.

  9. Learning Similar Actions by Reinforcement or Sensory-Prediction Errors Rely on Distinct Physiological Mechanisms.

    Science.gov (United States)

    Uehara, Shintaro; Mawase, Firas; Celnik, Pablo

    2017-09-14

    Humans can acquire knowledge of new motor behavior via different forms of learning. The two forms most commonly studied have been the development of internal models based on sensory-prediction errors (error-based learning) and success-based feedback (reinforcement learning). Human behavioral studies suggest these are distinct learning processes, though the neurophysiological mechanisms that are involved have not been characterized. Here, we evaluated physiological markers from the cerebellum and the primary motor cortex (M1) using noninvasive brain stimulation while healthy participants trained on finger-reaching tasks. We manipulated the extent to which subjects relied on error-based or reinforcement mechanisms by providing either vector or binary feedback about task performance. Our results demonstrated a double dissociation: learning the task mainly via error-based mechanisms led to cerebellar plasticity modifications but not long-term potentiation (LTP)-like plasticity changes in M1, while learning a similar action via reinforcement mechanisms elicited M1 LTP-like plasticity but not cerebellar plasticity changes. Our findings indicate that learning complex motor behavior is mediated by the interplay of different forms of learning, weighing distinct neural mechanisms in M1 and the cerebellum. Our study provides insights for designing effective interventions to enhance human motor learning. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  10. A Multi-Agent Architecture for an Intelligent Website in Insurance

    NARCIS (Netherlands)

    Jonker, C.M.; Lam, R.A.; Treur, J.

    1999-01-01

    In this paper a multi-agent architecture for intelligent Websites is presented and applied in insurance. The architecture has been designed and implemented using the compositional development method for multi-agent systems DESIRE. The agents within this architecture are based on a generic broker

  11. Research of negotiation in network trade system based on multi-agent

    Science.gov (United States)

    Cai, Jun; Wang, Guozheng; Wu, Haiyan

    2009-07-01

    The construction and implementation technology of a network trade system based on multi-agent techniques is described in this paper. First, we review multi-agent technology; then we discuss consumer behavior and the negotiation between purchaser and bargainer that arises in the traditional business mode, and analyse the key technologies needed to implement a network trade system. Finally, we implement the system.

  12. Multi-agent platform for development of educational games for children with autism

    NARCIS (Netherlands)

    Alers, S.H.M.; Barakova, E.I.

    2009-01-01

    A multi-agent system of autonomous interactive blocks that can display their active state through color and light intensity has been developed. Depending on the individual rules, these autonomous blocks can express emergent behaviors, which form the basis for various educational games. The multi-agent

  13. Distributed Cooperative Control of Nonlinear and Non-identical Multi-agent Systems

    DEFF Research Database (Denmark)

    Bidram, Ali; Lewis, Frank; Davoudi, Ali

    2013-01-01

    This paper exploits input-output feedback linearization technique to implement distributed cooperative control of multi-agent systems with nonlinear and non-identical dynamics. Feedback linearization transforms the synchronization problem for a nonlinear and heterogeneous multi-agent system...... for electric power microgrids. The effectiveness of the proposed control is verified by simulating a microgrid test system....

  14. Multi-agent system-based event-triggered hybrid control scheme for energy internet

    DEFF Research Database (Denmark)

    Dou, Chunxia; Yue, Dong; Han, Qing Long

    2017-01-01

    This paper is concerned with an event-triggered hybrid control for the energy Internet, based on a multi-agent system approach with which renewable energy resources can be fully utilized to meet load demand with high security and good dynamic quality. In the design of the control, a multi-agent system...

  15. Self-learning fuzzy logic controllers based on reinforcement

    International Nuclear Information System (INIS)

    Wang, Z.; Shao, S.; Ding, J.

    1996-01-01

    This paper proposes a new method for learning and tuning fuzzy logic controllers. The self-learning scheme is composed of a bucket-brigade algorithm and a genetic algorithm. The proposed method is tested on the cart-pole system. Simulation results show that our approach has good learning and control performance.

  16. Experiments with Online Reinforcement Learning in Real-Time Strategy Games

    DEFF Research Database (Denmark)

    Toftgaard Andersen, Kresten; Zeng, Yifeng; Dahl Christensen, Dennis

    2009-01-01

    Real-time strategy (RTS) games provide a challenging platform to implement online reinforcement learning (RL) techniques in a real application. Computer, as one game player, monitors opponents' (human or other computers) strategies and then updates its own policy using RL methods. In this article......, we first examine the suitability of applying the online RL in various computer games. Reinforcement learning application depends on both RL complexity and the game features. We then propose a multi-layer framework for implementing online RL in an RTS game. The framework significantly reduces RL...... the effectiveness of our proposed framework and shed light on relevant issues in using online RL in RTS games....

  17. A Model to Explain the Emergence of Reward Expectancy neurons using Reinforcement Learning and Neural Network

    OpenAIRE

    Shinya, Ishii; Munetaka, Shidara; Katsunari, Shibata

    2006-01-01

    In an experiment with a multi-trial task to obtain a reward, reward expectancy neurons, which responded only in the non-reward trials that are necessary to advance toward the reward, have been observed in the anterior cingulate cortex of monkeys. In this paper, to explain the emergence of the reward expectancy neuron in terms of reinforcement learning theory, a model that consists of a recurrent neural network trained based on reinforcement learning is proposed. The analysis of the hi...

  18. Trends in practical applications of heterogeneous multi-agent systems : the PAAMS collection

    CERN Document Server

    Rodríguez, Juan; Mathieu, Philippe; Campbell, Andrew; Ortega, Alfonso; Adam, Emmanuel; Navarro, Elena; Ahrndt, Sebastian; Moreno, María; Julián, Vicente

    2014-01-01

    PAAMS, the International Conference on Practical Applications of Agents and Multi-Agent Systems is an evolution of the International Workshop on Practical Applications of Agents and Multi-Agent Systems. PAAMS is an international yearly tribune to present, to discuss, and to disseminate the latest developments and the most important outcomes related to real-world applications. It provides a unique opportunity to bring multi-disciplinary experts, academics and practitioners together to exchange their experience in the development of Agents and Multi-Agent Systems. This volume presents the papers that have been accepted for the 2014 special sessions: Agents Behaviours and Artificial Markets (ABAM), Agents and Mobile Devices (AM), Bio-Inspired and Multi-Agents Systems: Applications to Languages (BioMAS), Multi-Agent Systems and Ambient Intelligence (MASMAI), Self-Explaining Agents (SEA), Web Mining and Recommender systems (WebMiRes) and Intelligent Educational Systems (SSIES).

  19. Neural prediction errors reveal a risk-sensitive reinforcement-learning process in the human brain.

    Science.gov (United States)

    Niv, Yael; Edlund, Jeffrey A; Dayan, Peter; O'Doherty, John P

    2012-01-11

    Humans and animals are exquisitely, though idiosyncratically, sensitive to risk or variance in the outcomes of their actions. Economic, psychological, and neural aspects of this are well studied when information about risk is provided explicitly. However, we must normally learn about outcomes from experience, through trial and error. Traditional models of such reinforcement learning focus on learning about the mean reward value of cues and ignore higher order moments such as variance. We used fMRI to test whether the neural correlates of human reinforcement learning are sensitive to experienced risk. Our analysis focused on anatomically delineated regions of a priori interest in the nucleus accumbens, where blood oxygenation level-dependent (BOLD) signals have been suggested as correlating with quantities derived from reinforcement learning. We first provide unbiased evidence that the raw BOLD signal in these regions corresponds closely to a reward prediction error. We then derive from this signal the learned values of cues that predict rewards of equal mean but different variance and show that these values are indeed modulated by experienced risk. Moreover, a close neurometric-psychometric coupling exists between the fluctuations of the experience-based evaluations of risky options that we measured neurally and the fluctuations in behavioral risk aversion. This suggests that risk sensitivity is integral to human learning, illuminating economic models of choice, neuroscientific models of affective learning, and the workings of the underlying neural mechanisms.
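
    One standard way to obtain such risk sensitivity in a reinforcement-learning model (used here as an illustrative assumption, not necessarily the exact model fitted in the paper) is to apply different learning rates to positive and negative prediction errors; a risk-averse learner then values a variable reward below a sure reward of the same mean:

```python
import random

random.seed(5)

ALPHA_POS, ALPHA_NEG = 0.05, 0.15   # risk-averse: negative errors weigh more

def learned_value(rewards, n=6000, burn=1000):
    """TD(0)-style value of a single cue with asymmetric learning rates;
    returns the time-averaged value after a burn-in period."""
    v, trace = 0.0, []
    for t in range(n):
        delta = random.choice(rewards) - v   # prediction error
        v += (ALPHA_POS if delta > 0 else ALPHA_NEG) * delta
        if t >= burn:
            trace.append(v)
    return sum(trace) / len(trace)

v_safe = learned_value([0.5])        # sure reward of 0.5
v_risky = learned_value([0.0, 1.0])  # same mean reward, higher variance

print(round(v_safe, 2), round(v_risky, 2))
```

    With these (hypothetical) rates the fixed point for the risky option is ALPHA_POS / (ALPHA_POS + ALPHA_NEG) = 0.25, well below the sure option's 0.5, mirroring the experience-based risk aversion the abstract reports.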

  20. Quadratic stabilisability of multi-agent systems under switching topologies

    Science.gov (United States)

    Guan, Yongqiang; Ji, Zhijian; Zhang, Lin; Wang, Long

    2014-12-01

    This paper addresses the stabilisability of multi-agent systems (MASs) under switching topologies. Necessary and/or sufficient conditions are presented in terms of graph topology. These conditions explicitly reveal how the intrinsic dynamics of the agents, the communication topology and the external control input jointly affect stabilisability. With the appropriate selection of some agents to which the external inputs are applied and the suitable design of neighbour-interaction rules via a switching topology, an MAS is proved to be stabilisable even if each uncertain subsystem alone is not. In addition, a method is proposed to constructively design a switching rule for MASs with norm-bounded time-varying uncertainties. The switching rules designed via this method do not rely on the uncertainties, and the switched MAS is quadratically stabilisable via decentralised external self-feedback for all uncertainties. With respect to applications of the stabilisability results, formation control and cooperative tracking control are addressed. Numerical simulations are presented to demonstrate the effectiveness of the proposed results.
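
    The paper's quadratic-stabilisability machinery does not fit in a snippet, but the central role of switching topologies can be illustrated with a plain consensus iteration (all graphs and numbers below are invented): neither graph is connected on its own, yet alternating between them drives all agents to agreement.

```python
# Two symmetric graphs over four agents; neither is connected by itself,
# but their union is, and the topology switches every 10 steps.
NEIGHBOURS = [
    {0: [1], 1: [0], 2: [3], 3: [2]},   # couples the pairs {0,1} and {2,3}
    {0: [], 1: [2], 2: [1], 3: []},     # bridge between the two pairs
]

x = [0.0, 1.0, 4.0, 5.0]   # initial agent states
EPS = 0.2                  # consensus step size
for t in range(400):
    g = NEIGHBOURS[(t // 10) % 2]
    # Synchronous first-order consensus update toward current neighbours.
    x = [xi + EPS * sum(x[j] - xi for j in g[i]) for i, xi in enumerate(x)]

print([round(v, 2) for v in x])   # all agents agree on the average, 2.5
```

    Joint connectivity of the switching sequence, not connectivity of any single topology, is what makes agreement possible here; the paper's contribution is the far stronger stabilisability analysis under uncertain agent dynamics.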

  1. A Lookahead Behavior Model for Multi-Agent Hybrid Simulation

    Directory of Open Access Journals (Sweden)

    Mei Yang

    2017-10-01

    Full Text Available In the military field, multi-agent simulation (MAS) plays an important role in studying wars statistically. For a military simulation system, which involves large-scale entities and generates a very large number of interactions at runtime, the issue of how to improve running efficiency is of great concern for researchers. Current solutions mainly use hybrid simulation to obtain fewer updates and synchronizations, where some important continuous models are maintained implicitly to keep the system dynamics, and partial resynchronization (PR) is chosen as the preferred state-update mechanism. However, problems such as resynchronization interval selection and cyclic dependency remain unsolved in PR; they easily lead to low update efficiency and infinite looping of the state-update process. To address these problems, this paper proposes a lookahead behavior model (LBM) to implement PR-based hybrid simulation. In LBM, a minimal safe time window is used to predict the interactions between implicit models, upon which the resynchronization interval can be efficiently determined. Moreover, the LBM gives an estimated state value in the lookahead process so as to break the state-dependent cycle. The simulation results show that, compared with traditional mechanisms, LBM requires fewer updates and synchronizations.

  2. Adaptive Multi-Agent Systems for Constrained Optimization

    Science.gov (United States)

    Macready, William; Bieniawski, Stefan; Wolpert, David H.

    2004-01-01

    Product Distribution (PD) theory is a new framework for analyzing and controlling distributed systems. Here we demonstrate its use for distributed stochastic optimization. First we review one motivation of PD theory, as the information-theoretic extension of conventional full-rationality game theory to the case of bounded rational agents. In this extension the equilibrium of the game is the optimizer of a Lagrangian of the (probability distribution of) the joint state of the agents. When the game in question is a team game with constraints, that equilibrium optimizes the expected value of the team game utility, subject to those constraints. The updating of the Lagrange parameters in the Lagrangian can be viewed as a form of automated annealing, which focuses the MAS more and more on the optimal pure strategy. This provides a simple way to map the solution of any constrained optimization problem onto the equilibrium of a Multi-Agent System (MAS). We present computer experiments involving both the Queens problem and K-SAT, validating the predictions of PD theory and its use for off-the-shelf distributed adaptive optimization.

  3. Designing of Roaming Protocol for Bluetooth Equipped Multi Agent Systems

    Science.gov (United States)

    Subhan, Fazli; Hasbullah, Halabi B.

    Bluetooth is an established standard for low-cost, low-power wireless personal area networks. Currently, Bluetooth does not support any roaming protocol in which handoff occurs dynamically when a Bluetooth device moves out of a piconet. If a device is losing its connection to the master device, no provision is made to transfer it to another master. Handoff is not possible within a piconet, as a slave would have to keep the same master in order to stay within the network; so, by definition, intra-piconet handoff is not possible. This research focuses on Bluetooth technology and the design of a roaming protocol for Bluetooth-equipped multi-agent systems. A mathematical model is derived for an agent; the idea behind the model is to know when to initiate the roaming process for the agent. A desired trajectory for the agent is calculated using its x and y coordinates and simulated in SIMULINK. Various roaming techniques are also studied and discussed. The advantage of designing a roaming protocol is to ensure that Bluetooth-enabled roaming devices can move freely inside the network coverage without losing their connections or suffering a break in service when changing base stations.

  4. Multi-Agent Market Modeling of Foreign Exchange Rates

    Science.gov (United States)

    Zimmermann, Georg; Neuneier, Ralph; Grothmann, Ralph

    A market mechanism is basically driven by a superposition of the decisions of many agents optimizing their profit. The economic price dynamics are a consequence of the cumulated excess demand/supply created on this micro level. The behavior of a small number of agents is well understood through game theory. In the case of a large number of agents, one may use the limiting case in which an individual agent has no influence on the market, which allows the aggregation of agents by statistical methods. In contrast to this restriction, we can omit the assumption of an atomic market structure if we model the market through a multi-agent approach. The contribution of the mathematical theory of neural networks to market price formation is mostly seen on the econometric side: neural networks allow the fitting of high-dimensional nonlinear dynamic models. Furthermore, in our opinion, there is a close relationship between economics and the modeling ability of neural networks, because a neuron can be interpreted as a simple model of decision making. With this in mind, a neural network models the interaction of many decisions and, hence, can be interpreted as the price formation mechanism of a market.
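
    The price-formation mechanism described above (micro-level decisions whose cumulated excess demand moves the price) can be sketched without the neural-network machinery; the threshold "agents" with noisy value estimates below are pure inventions for illustration:

```python
import random

random.seed(7)

N_AGENTS, ETA = 50, 0.01
FUNDAMENTAL = 1.2   # hypothetical value the agents try to estimate

price = 1.0
for _ in range(300):
    # Each agent buys (+1) if its noisy value estimate exceeds the current
    # price and sells (-1) otherwise; the cumulated excess demand on this
    # micro level moves the price.
    excess = sum(1 if FUNDAMENTAL + random.gauss(0.0, 0.1) > price else -1
                 for _ in range(N_AGENTS))
    price += ETA * excess / N_AGENTS

print(round(price, 2))
```

    The price settles where excess demand vanishes; in the multi-agent market model of the abstract, each agent's decision rule would instead be a small trained neural network rather than this fixed threshold.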

  5. MAINS: MULTI-AGENT INTELLIGENT SERVICE ARCHITECTURE FOR CLOUD COMPUTING

    Directory of Open Access Journals (Sweden)

    T. Joshva Devadas

    2014-04-01

    Full Text Available Computing has been transformed into a model of commoditized services, modeled similarly to utility services such as water and electricity. The Internet has been stunningly successful over the past three decades in supporting a multitude of distributed applications and a wide variety of network technologies. However, its popularity has become the biggest impediment to its further growth with handheld devices such as mobiles and laptops. Agents are intelligent software systems that work on behalf of others. Agents are incorporated in many innovative applications in order to improve the performance of a system; an agent uses its knowledge to interact with the system and helps to improve its performance. Agents are introduced into cloud computing to minimize the response time when similar requests are raised from end users across the globe. In this paper, we introduce a Multi Agent Intelligent System (MAINS) prior to the cloud service models and test it using a sample dataset. The performance of the MAINS layer was analyzed in three aspects, and the outcome of the analysis proves that the MAINS layer provides a flexible model for creating cloud applications and deploying them in a variety of applications.

  6. Digital Watermark Tracking using Intelligent Multi-Agents System

    Directory of Open Access Journals (Sweden)

    Nagaraj V. DHARWADKAR

    2010-01-01

    Full Text Available E-commerce has become a huge business and a driving factor in the development of the Internet. Online shopping services are well established, and due to the evolution of 2G and 3G mobile networks, they will soon be complemented by their wireless counterparts. Furthermore, in recent years the online delivery of digital media, such as MP3 audio, video or images, has become very popular and will become an increasingly important part of e-commerce. The advantage of the Internet is the sharing of valuable digital data, which also leads to the misuse of that data. To resolve the problem of the misuse of digital data on the Internet, we need a strong digital-rights monitoring system. Digital Rights Management (DRM) is a fairly young discipline, while some of its underlying technologies have been known for many years; the use of DRM for managing and protecting intellectual property rights is a comparatively new field. In this paper we propose a model for copyright protection of an online digital image library based on a watermark tracking system. In our proposed model, the tracking of watermarks on remote host nodes is done using active mobile agents. The multi-agent system architecture is used in watermark tracking, as it supports the coordination of several component tasks across distributed and flexible networks of information sources. Whereas a centralized system is susceptible to system-wide failures and processing bottlenecks, multi-agent systems are more reliable, especially given the likelihood of individual component failures.

  7. A MULTI-AGENT SYSTEM FOR FOREST TRANSPORT ACTIVITY PLANNING

    Directory of Open Access Journals (Sweden)

    Carlos Alberto Araújo Júnior

    2017-09-01

    Full Text Available This study aims to propose and implement a conceptual model of an intelligent system in a georeferenced environment to determine the design of forest transport fleets. For this, we used a tool based on multi-agent systems, a subject of study in distributed artificial intelligence. The proposed model considers the use of plantation mapping (stands and forest roads), as well as information about the different vehicle transport capacities. The system was designed to adapt itself to changes that occur during the forest transport operation, such as the modification of the demanded volume or the inclusion of restrictions on the routes used by the vehicles. For its development, we used the Java programming language together with the LPSolve library for the optimization calculations, the JADE platform to develop the agents, and the ArcGIS Runtime to determine the optimal transport routes. Five agents were modelled: the transporter, controller, router, loader and unloader agents. The model is able to determine the number of trucks, among the different vehicles available, that meets the demand and the availability of routes, with a focus on minimizing the total costs of timber transport. The system can also rearrange itself when the transportation routes change during the process.

  8. Decentralized control of multi-agent aerial transportation system

    KAUST Repository

    Toumi, Noureddine

    2017-04-01

    Autonomous aerial transportation has multiple potential applications, including emergency and rescue missions where ground intervention may be difficult. In this context, the following work addresses the control of a multi-agent Vertical Take-off and Landing (VTOL) aircraft transportation system. We develop a decentralized method; the advantage of such a solution is that it can provide better maneuverability and lifting capabilities compared to existing systems. First, we consider a cooperative group of VTOLs transporting one payload. The main idea is that each agent perceives the interaction with other agents as a disturbance, while assuming a negotiated motion model and imposing certain magnitude bounds on each agent. The theoretical model is then validated using a numerical simulation illustrating the interesting features of the presented control method. Results show that under the specified disturbances, the algorithm is able to guarantee tracking with a minimal error. We describe a toolbox that has been developed for this purpose. Then, a system of multiple VTOLs lifting payloads is studied. The algorithm ensures that the VTOLs are coordinated with minimal communication. Additionally, a novel gripper design for ferrous objects is presented that enables their transportation without a cable. Finally, we discuss potential connections to human-in-the-loop transportation systems.

  9. Exchanging large data object in multi-agent systems

    Science.gov (United States)

    Al-Yaseen, Wathiq Laftah; Othman, Zulaiha Ali; Nazri, Mohd Zakree Ahmad

    2016-08-01

    One of the Business Intelligence solutions currently in use is the Multi-Agent System (MAS). Communication is one of the most important elements in a MAS, especially for exchanging large low-level data between physically distributed agents. The Agent Communication Language in JADE has been offered as a secure method for sending data, whereby the data is defined as an object. However, the object cannot be used to send data to an agent in a different location. Therefore, the aim of this paper is to propose a method for the exchange of large low-level data as an object by creating a proxy agent, known as a Delivery Agent, which temporarily imitates the Receiver Agent. The results showed that the proposed method is able to send large-sized data. The experiments were conducted using 16 datasets ranging from 100,000 to 7 million instances. For the proposed method, the RAM and CPU of the machine running the Receiver Agent had to be slightly increased, but the latency was not significantly different from that of the Java Socket method (non-agent based and less secure). With such results, it was concluded that the proposed method can be used to securely send large data between agents.

  10. A multi-agent system architecture for sensor networks.

    Science.gov (United States)

    Fuentes-Fernández, Rubén; Guijarro, María; Pajares, Gonzalo

    2009-01-01

    The design of the control systems for sensor networks presents important challenges. Besides the traditional problems about how to process the sensor data to obtain the target information, engineers need to consider additional aspects such as the heterogeneity and high number of sensors, and the flexibility of these networks regarding topologies and the sensors in them. Although there are partial approaches for resolving these issues, their integration relies on ad hoc solutions requiring important development efforts. In order to provide an effective approach for this integration, this paper proposes an architecture based on the multi-agent system paradigm with a clear separation of concerns. The architecture considers sensors as devices used by an upper layer of manager agents. These agents are able to communicate and negotiate services to achieve the required functionality. Activities are organized according to roles related with the different aspects to integrate, mainly sensor management, data processing, communication and adaptation to changes in the available devices and their capabilities. This organization largely isolates and decouples the data management from the changing network, while encouraging reuse of solutions. The use of the architecture is facilitated by a specific modelling language developed through metamodelling. A case study concerning a generic distributed system for fire fighting illustrates the approach and the comparison with related work.

  11. A Multi-Agent System Architecture for Sensor Networks

    Directory of Open Access Journals (Sweden)

    María Guijarro

    2009-12-01

    Full Text Available The design of the control systems for sensor networks presents important challenges. Besides the traditional problems about how to process the sensor data to obtain the target information, engineers need to consider additional aspects such as the heterogeneity and high number of sensors, and the flexibility of these networks regarding topologies and the sensors in them. Although there are partial approaches for resolving these issues, their integration relies on ad hoc solutions requiring important development efforts. In order to provide an effective approach for this integration, this paper proposes an architecture based on the multi-agent system paradigm with a clear separation of concerns. The architecture considers sensors as devices used by an upper layer of manager agents. These agents are able to communicate and negotiate services to achieve the required functionality. Activities are organized according to roles related with the different aspects to integrate, mainly sensor management, data processing, communication and adaptation to changes in the available devices and their capabilities. This organization largely isolates and decouples the data management from the changing network, while encouraging reuse of solutions. The use of the architecture is facilitated by a specific modelling language developed through metamodelling. A case study concerning a generic distributed system for fire fighting illustrates the approach and the comparison with related work.

  12. Vehicle-based interactive management with multi-agent approach

    Directory of Open Access Journals (Sweden)

    Yee Ming Chen

    2009-09-01

    Full Text Available Amid the energy crisis and global warming, mass transportation has become more important than ever. Given the disadvantages of mass transit, the high flexibility and efficiency of taxis, and advances in technology, the electric taxi is a promising transportation choice for the metropolis. Among the many taxi service types, the dial-a-ride (DAR) service system benefits both passengers and taxis. However, electricity replenishment is the biggest constraint on an electric-taxi DAR operation system. To manage such a system effectively, avoid the many drawbacks of experimenting on the physical system, and observe the behaviors and interactions within it, multi-agent simulation is the most suitable technique. We use virtual data as the input of the simulation system and analyze the simulation results, obtaining two performance measures: an average waiting time of only 3.93 seconds and a service rate (total transported passengers / total passengers) of 37.073%. These two performance measures can support management decisions. The multi-agent-oriented model put forward in this article is intended, in the long term, to supervise the user information system of an urban transport network.

  13. A Multi Agent Based Model for Airport Service Planning

    Directory of Open Access Journals (Sweden)

    W.H. Ip

    2010-09-01

    Full Text Available The aviation industry is highly dynamic and demanding: time and safety are the two most important factors, and one of the major sources of delay is aircraft ground handling, which is complex and involves much machinery, many vehicles, and a great deal of communication. As one of the aircraft ground-service providers at Hong Kong International Airport (HKIA), China Aircraft Services Limited (CASL) aims to increase its competitiveness by improving its service while also minimizing cost. One way to do so is to optimize the number of maintenance vehicles allocated, in order to minimize both the chance of delay and operating costs. In this paper, an agent-based model is proposed to support decision making in vehicle allocation. An overview of aircraft ground-service procedures is first given, together with different optimization methods suggested by researchers. The agent-based approach is then introduced, and in the latter part of the paper a multi-agent system is built and proposed that supports CASL's decisions in optimizing the allocation of maintenance vehicles. The application provides flexibility in specifying the numbers of different kinds of vehicles, the simulation duration, and the aircraft arrival rate, in order to simulate the different scenarios that occur at HKIA.

  14. Cardiac Concomitants of Feedback and Prediction Error Processing in Reinforcement Learning

    Science.gov (United States)

    Kastner, Lucas; Kube, Jana; Villringer, Arno; Neumann, Jane

    2017-01-01

    Successful learning hinges on the evaluation of positive and negative feedback. We assessed differential learning from reward and punishment in a monetary reinforcement learning paradigm, together with cardiac concomitants of positive and negative feedback processing. On the behavioral level, learning from reward resulted in more advantageous behavior than learning from punishment, suggesting a differential impact of reward and punishment on successful feedback-based learning. On the autonomic level, learning and feedback processing were closely mirrored by phasic cardiac responses on a trial-by-trial basis: (1) Negative feedback was accompanied by faster and prolonged heart rate deceleration compared to positive feedback. (2) Cardiac responses shifted from feedback presentation at the beginning of learning to stimulus presentation later on. (3) Most importantly, the strength of phasic cardiac responses to the presentation of feedback correlated with the strength of prediction error signals that alert the learner to the necessity for behavioral adaptation. Considering participants' weight status and gender revealed obesity-related deficits in learning to avoid negative consequences and less consistent behavioral adaptation in women compared to men. In sum, our results provide strong new evidence for the notion that during learning phasic cardiac responses reflect an internal value and feedback monitoring system that is sensitive to the violation of performance-based expectations. Moreover, inter-individual differences in weight status and gender may affect both behavioral and autonomic responses in reinforcement-based learning. PMID:29163004

  15. Cardiac Concomitants of Feedback and Prediction Error Processing in Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Lucas Kastner

    2017-10-01

    Full Text Available Successful learning hinges on the evaluation of positive and negative feedback. We assessed differential learning from reward and punishment in a monetary reinforcement learning paradigm, together with cardiac concomitants of positive and negative feedback processing. On the behavioral level, learning from reward resulted in more advantageous behavior than learning from punishment, suggesting a differential impact of reward and punishment on successful feedback-based learning. On the autonomic level, learning and feedback processing were closely mirrored by phasic cardiac responses on a trial-by-trial basis: (1) Negative feedback was accompanied by faster and prolonged heart rate deceleration compared to positive feedback. (2) Cardiac responses shifted from feedback presentation at the beginning of learning to stimulus presentation later on. (3) Most importantly, the strength of phasic cardiac responses to the presentation of feedback correlated with the strength of prediction error signals that alert the learner to the necessity for behavioral adaptation. Considering participants' weight status and gender revealed obesity-related deficits in learning to avoid negative consequences and less consistent behavioral adaptation in women compared to men. In sum, our results provide strong new evidence for the notion that during learning phasic cardiac responses reflect an internal value and feedback monitoring system that is sensitive to the violation of performance-based expectations. Moreover, inter-individual differences in weight status and gender may affect both behavioral and autonomic responses in reinforcement-based learning.

  16. Safe robot execution in model-based reinforcement learning

    OpenAIRE

    Martínez Martínez, David; Alenyà Ribas, Guillem; Torras, Carme

    2015-01-01

    Task learning in robotics requires repeatedly executing the same actions in different states to learn the model of the task. However, in real-world domains, there are usually sequences of actions that, if executed, may produce unrecoverable errors (e.g. breaking an object). Robots should avoid repeating such errors when learning, and thus explore the state space in a more intelligent way. This requires identifying dangerous action effects to avoid including such actions in the generated plans...
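
    The core idea of excluding actions with known dangerous effects from generated plans can be sketched in a few lines (the states, actions, and the `DANGEROUS` table below are hypothetical illustrations, not the authors' model):

    ```python
    # Assumed table of (action, state) pairs known to cause unrecoverable errors,
    # e.g. a hard grasp breaking a fragile object.
    DANGEROUS = {("grasp_hard", "fragile_object")}

    def safe_actions(state, actions, dangerous=DANGEROUS):
        """Return only actions not known to cause unrecoverable errors in `state`,
        so the planner never includes them in generated plans."""
        return [a for a in actions if (a, state) not in dangerous]

    actions = ["grasp_soft", "grasp_hard", "push"]
    print(safe_actions("fragile_object", actions))  # grasp_hard is masked out
    ```

    The learning problem the paper addresses is how to populate such a table from experience without actually triggering the unrecoverable errors repeatedly.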

  17. Reinforcement function design and bias for efficient learning in mobile robots

    International Nuclear Information System (INIS)

    Touzet, C.; Santos, J.M.

    1998-01-01

    The main paradigm in the sub-symbolic robot-learning domain is reinforcement learning. Various techniques have been developed to deal with the memorization/generalization problem, demonstrating the superior ability of artificial neural network implementations. In this paper, the authors address the issue of designing the reinforcement function so as to optimize the exploration part of the learning. They also present and summarize work on the use of bias intended to achieve the effective synthesis of the desired behavior. Demonstrative experiments involving a self-organizing map implementation of Q-learning and real mobile robots (Nomad 200 and Khepera) in a task of obstacle-avoidance behavior synthesis are described. 3 figs., 5 tabs
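
    The update rule underlying this work can be illustrated with a toy tabular Q-learning loop driven by a shaped reinforcement function (the states, actions, rewards, and random transition model below are invented for the sketch; the paper uses a self-organizing-map function approximator on real robots):

    ```python
    import random

    ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1
    STATES = ["clear", "obstacle_left", "obstacle_right"]
    ACTIONS = ["forward", "turn_left", "turn_right"]
    Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}

    def reward(state, action):
        # Shaped reinforcement: reward turning away from an obstacle,
        # penalize everything that risks a collision.
        if state == "obstacle_left" and action == "turn_right":
            return 1.0
        if state == "obstacle_right" and action == "turn_left":
            return 1.0
        if state == "clear" and action == "forward":
            return 0.5
        return -1.0

    random.seed(0)
    for _ in range(2000):
        s = random.choice(STATES)
        # Epsilon-greedy exploration.
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[(s, x)])
        r = reward(s, a)
        s2 = random.choice(STATES)  # toy transition model
        best_next = max(Q[(s2, a2)] for a2 in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

    # After learning, the greedy policy turns away from obstacles.
    policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in STATES}
    print(policy["obstacle_left"], policy["obstacle_right"])
    ```

    The shaping of the reward function is exactly the design problem the paper studies: it determines how quickly exploration discovers the desired avoidance behavior.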

  18. Critical factors in the empirical performance of temporal difference and evolutionary methods for reinforcement learning

    NARCIS (Netherlands)

    Whiteson, S.; Taylor, M.E.; Stone, P.

    2010-01-01

    Temporal difference and evolutionary methods are two of the most common approaches to solving reinforcement learning problems. However, there is little consensus on their relative merits and there have been few empirical studies that directly compare their performance. This article aims to address ...

  19. A multi-agent brokerage platform for media content recommendation

    Directory of Open Access Journals (Sweden)

    Veloso Bruno

    2015-09-01

    Full Text Available Near real time media content personalisation is nowadays a major challenge involving media content sources, distributors and viewers. This paper describes an approach to seamless recommendation, negotiation and transaction of personalised media content. It adopts an integrated view of the problem by proposing, on the business-to-business (B2B) side, a brokerage platform to negotiate the media items on behalf of the media content distributors and sources, providing viewers, on the business-to-consumer (B2C) side, with a personalised electronic programme guide (EPG) containing the set of recommended items after negotiation. In this setup, when a viewer connects, the distributor looks up and invites sources to negotiate the contents of the viewer's personal EPG. The proposed multi-agent brokerage platform is structured in four layers, modelling the registration, service agreement, partner lookup, and invitation stages as well as the item recommendation, negotiation and transaction stages of the B2B processes. The recommendation service is a rule-based switch hybrid filter, including six collaborative and two content-based filters. The rule-based system selects, at runtime, the filter(s) to apply as well as the final set of recommendations to present. The filter selection is based on the data available, ranging from the history of items watched to the ratings and/or tags assigned to the items by the viewer. Additionally, this module implements (i) a novel item stereotype to represent newly arrived items, (ii) a standard user stereotype for new users, (iii) a novel passive user tag cloud stereotype for socially passive users, and (iv) a new content-based filter named the collinearity and proximity similarity (CPS). At the end of the paper, we present off-line results and a case study describing how the recommendation service works. The proposed system provides, to our knowledge, an excellent holistic solution to the problem of recommending multimedia contents.
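
    The runtime filter-switch logic can be sketched as a small rule table (the rule names, conditions, and `viewer` dictionary below are illustrative assumptions; the actual system selects among six collaborative and two content-based filters):

    ```python
    def select_filters(viewer):
        """Pick recommendation filters based on which viewer data are available."""
        history = viewer.get("watched", [])
        ratings = viewer.get("ratings", {})
        tags = viewer.get("tags", {})
        if not history:
            return ["user-stereotype"]            # new user: fall back to a stereotype
        if ratings or tags:
            return ["collaborative", "content-based"]
        return ["passive-user-tag-cloud"]         # socially passive viewer

    print(select_filters({}))
    print(select_filters({"watched": ["item1"], "ratings": {"item1": 4}}))
    print(select_filters({"watched": ["item1"]}))
    ```

    The point of the switch design is that the same pipeline degrades gracefully as viewer data gets sparser.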

  20. Energy Management Strategy for a Hybrid Electric Vehicle Based on Deep Reinforcement Learning

    OpenAIRE

    Yue Hu; Weimin Li; Kun Xu; Taimoor Zahid; Feiyan Qin; Chenming Li

    2018-01-01

    An energy management strategy (EMS) is important for hybrid electric vehicles (HEVs) since it plays a decisive role on the performance of the vehicle. However, the variation of future driving conditions deeply influences the effectiveness of the EMS. Most existing EMS methods simply follow predefined rules that are not adaptive to different driving conditions online. Therefore, it is useful that the EMS can learn from the environment or driving cycle. In this paper, a deep reinforcement learn...

  1. Adolescent-specific patterns of behavior and neural activity during social reinforcement learning

    OpenAIRE

    Jones, Rebecca M.; Somerville, Leah H.; Li, Jian; Ruberry, Erika J.; Powers, Alisa; Mehta, Natasha; Dyke, Jonathan; Casey, BJ

    2014-01-01

    Humans are sophisticated social beings. Social cues from others are exceptionally salient, particularly during adolescence. Understanding how adolescents interpret and learn from variable social signals can provide insight into the observed shift in social sensitivity during this period. The current study tested 120 participants between the ages of 8 and 25 years on a social reinforcement learning task where the probability of receiving positive social feedback was parametrically manipulated....

  2. Active-learning strategies: the use of a game to reinforce learning in nursing education. A case study.

    Science.gov (United States)

    Boctor, Lisa

    2013-03-01

    The majority of nursing students are kinesthetic learners, preferring a hands-on, active approach to education. Research shows that active-learning strategies can increase student learning and satisfaction. This study looks at the use of one active-learning strategy, a Jeopardy-style game, 'Nursopardy', to reinforce Fundamentals of Nursing material, aiding in students' preparation for a standardized final exam. The game was created with students' varied learning styles and the NCLEX blueprint in mind. The blueprint was used to create 5 categories, with 26 total questions. Student survey results, using a five-point Likert scale, showed that students found this learning method enjoyable and beneficial to learning. More research is recommended regarding learning outcomes when using active-learning strategies such as games. Copyright © 2012 Elsevier Ltd. All rights reserved.

  3. Gaze-contingent reinforcement learning reveals incentive value of social signals in young children and adults.

    Science.gov (United States)

    Vernetti, Angélina; Smith, Tim J; Senju, Atsushi

    2017-03-15

    While numerous studies have demonstrated that infants and adults preferentially orient to social stimuli, it remains unclear as to what drives such preferential orienting. It has been suggested that the learned association between social cues and subsequent reward delivery might shape such social orienting. Using a novel, spontaneous indication of reinforcement learning (with the use of a gaze contingent reward-learning task), we investigated whether children and adults' orienting towards social and non-social visual cues can be elicited by the association between participants' visual attention and a rewarding outcome. Critically, we assessed whether the engaging nature of the social cues influences the process of reinforcement learning. Both children and adults learned to orient more often to the visual cues associated with reward delivery, demonstrating that cue-reward association reinforced visual orienting. More importantly, when the reward-predictive cue was social and engaging, both children and adults learned the cue-reward association faster and more efficiently than when the reward-predictive cue was social but non-engaging. These new findings indicate that social engaging cues have a positive incentive value. This could possibly be because they usually coincide with positive outcomes in real life, which could partly drive the development of social orienting. © 2017 The Authors.

  4. High and low temperatures have unequal reinforcing properties in Drosophila spatial learning.

    Science.gov (United States)

    Zars, Melissa; Zars, Troy

    2006-07-01

    Small insects regulate their body temperature solely through behavior. Thus, sensing environmental temperature and implementing an appropriate behavioral strategy can be critical for survival. The fly Drosophila melanogaster prefers 24 degrees C, avoiding higher and lower temperatures when tested on a temperature gradient. Furthermore, temperatures above 24 degrees C have negative reinforcing properties. In contrast, we found that flies have a preference in operant learning experiments for a low-temperature-associated position rather than the 24 degrees C alternative in the heat-box. Two additional differences between high- and low-temperature reinforcement, i.e., temperatures above and below 24 degrees C, were found. Temperatures equally above and below 24 degrees C did not reinforce equally and only high temperatures supported increased memory performance with reversal conditioning. Finally, low- and high-temperature reinforced memories are similarly sensitive to two genetic mutations. Together these results indicate the qualitative meaning of temperatures below 24 degrees C depends on the dynamics of the temperatures encountered and that the reinforcing effects of these temperatures depend on at least some common genetic components. Conceptualizing these results using the Wolf-Heisenberg model of operant conditioning, we propose the maximum difference in experienced temperatures determines the magnitude of the reinforcement input to a conditioning circuit.

  5. Consensus of second-order multi-agent dynamic systems with quantized data

    Energy Technology Data Exchange (ETDEWEB)

    Guan, Zhi-Hong, E-mail: zhguan@mail.hust.edu.cn [Department of Control Science and Engineering, Huazhong University of Science and Technology, Wuhan, 430074 (China)]; Meng, Cheng [Department of Control Science and Engineering, Huazhong University of Science and Technology, Wuhan, 430074 (China)]; Liao, Rui-Quan [Petroleum Engineering College, Yangtze University, Jingzhou, 420400 (China)]; Zhang, Ding-Xue, E-mail: zdx7773@163.com [Petroleum Engineering College, Yangtze University, Jingzhou, 420400 (China)]

    2012-01-09

    The consensus problem of second-order multi-agent systems with quantized links is investigated in this Letter. Conditions for the quantized consensus of second-order multi-agent systems are derived using stability theory. Moreover, a result characterizing the relationship between the eigenvalues of the Laplacian matrix and quantized consensus is obtained. Examples are given to illustrate the theoretical analysis. -- Highlights: ► A second-order multi-agent model with quantized data is proposed. ► Two sufficient and necessary conditions are obtained. ► The relationship between the eigenvalues of the Laplacian matrix and quantized consensus is discovered.
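
    A toy discrete-time simulation conveys the flavor of second-order consensus under quantized relative measurements (the ring graph, gains, quantizer resolution, and Euler discretization are all illustrative assumptions; the Letter analyzes the continuous-time model):

    ```python
    DELTA = 0.05                      # quantizer resolution

    def q(y):
        """Uniform quantizer applied to every relative measurement."""
        return DELTA * round(y / DELTA)

    N, DT, K = 4, 0.05, 2.0
    neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}  # ring graph
    x = [0.0, 1.0, 2.0, 3.0]          # positions
    v = [0.5, -0.5, 0.3, -0.3]        # velocities

    for _ in range(4000):
        # Each double-integrator agent uses quantized relative positions
        # and velocities of its neighbors.
        u = [sum(q(x[j] - x[i]) + K * q(v[j] - v[i]) for j in neighbors[i])
             for i in range(N)]
        x = [x[i] + DT * v[i] for i in range(N)]
        v = [v[i] + DT * u[i] for i in range(N)]

    # Practical consensus: the disagreement shrinks to a quantization-limited band.
    print(max(x) - min(x) < 0.3, max(v) - min(v) < 0.3)
    ```

    With quantized data the agents cannot reach exact consensus, only a neighborhood of it whose size scales with the quantizer resolution, which is why the eigenvalue conditions in the Letter matter.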

  6. Consensus of heterogeneous multi-agent systems based on sampled data with a small sampling delay

    International Nuclear Information System (INIS)

    Wang Na; Wu Zhi-Hai; Peng Li

    2014-01-01

    In this paper, consensus problems of heterogeneous multi-agent systems based on sampled data with a small sampling delay are considered. First, a consensus protocol based on sampled data with a small sampling delay is proposed for heterogeneous multi-agent systems. Then, algebraic graph theory, matrix methods, the stability theory of linear systems, and other techniques are employed to derive the necessary and sufficient conditions guaranteeing that heterogeneous multi-agent systems asymptotically achieve the stationary consensus. Finally, simulations are performed to demonstrate the correctness of the theoretical results. (interdisciplinary physics and related areas of science and technology)
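
    The sampled-data mechanism can be sketched with a minimal first-order example: controls are recomputed only at sampling instants, from neighbor states measured one small delay earlier (the path graph, gains, and first-order agents are simplifications made for this sketch; the paper treats heterogeneous first- and second-order agents):

    ```python
    DT = 0.01          # integration step
    H = 0.1            # sampling period
    TAU = 0.02         # small sampling delay (TAU < H)
    neighbors = {0: [1], 1: [0, 2], 2: [1]}   # path graph
    x = [0.0, 2.0, 4.0]
    history = [list(x)]                        # past states, for the delayed read
    u = [0.0, 0.0, 0.0]

    steps_per_sample = int(H / DT)
    delay_steps = int(TAU / DT)
    for k in range(3000):
        if k % steps_per_sample == 0:
            # Control updated only at sampling instants, from delayed data,
            # then held constant until the next sample.
            delayed = history[max(0, len(history) - 1 - delay_steps)]
            u = [sum(delayed[j] - delayed[i] for j in neighbors[i]) for i in range(3)]
        x = [x[i] + DT * u[i] for i in range(3)]
        history.append(list(x))

    # All agents settle at the stationary consensus value (the average, 2.0).
    print(all(abs(xi - 2.0) < 0.05 for xi in x))
    ```

    The paper's conditions characterize exactly how large the sampling period and delay may be before such a protocol stops converging.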

  7. Cerebellar and prefrontal cortex contributions to adaptation, strategies, and reinforcement learning.

    Science.gov (United States)

    Taylor, Jordan A; Ivry, Richard B

    2014-01-01

    Traditionally, motor learning has been studied as an implicit learning process, one in which movement errors are used to improve performance in a continuous, gradual manner. The cerebellum figures prominently in this literature given well-established ideas about the role of this system in error-based learning and the production of automatized skills. Recent developments have brought into focus the relevance of multiple learning mechanisms for sensorimotor learning. These include processes involving repetition, reinforcement learning, and strategy utilization. We examine these developments, considering their implications for understanding cerebellar function and how this structure interacts with other neural systems to support motor learning. Converging lines of evidence from behavioral, computational, and neuropsychological studies suggest a fundamental distinction between processes that use error information to improve action execution or action selection. While the cerebellum is clearly linked to the former, its role in the latter remains an open question. © 2014 Elsevier B.V. All rights reserved.

  8. Advances on Practical Applications of Agents and Multi-Agent Systems 10th International Conference on Practical Applications of Agents and Multi-Agent Systems

    CERN Document Server

    Müller, Jörg; Rodríguez, Juan; Pérez, Javier

    2012-01-01

    Research on Agents and Multi-Agent Systems has matured during the last decade and many effective applications of this technology are now deployed. PAAMS provides an international forum to present and discuss the latest scientific developments and their effective applications, to assess the impact of the approach, and to facilitate technology transfer. PAAMS started as a local initiative, but has since grown to become THE international yearly platform to present, to discuss, and to disseminate the latest developments and the most important outcomes related to real-world applications. It provides a unique opportunity to bring multi-disciplinary experts, academics and practitioners together to exchange their experience in the development and deployment of Agents and Multi-Agent Systems. PAAMS intends to bring together researchers and developers from industry and the academic world to report on the latest scientific and technical advances on the application of multi-agent systems, to discuss and debate the major ...

  9. Highlights on Practical Applications of Agents and Multi-Agent Systems 10th International Conference on Practical Applications of Agents and Multi-Agent Systems

    CERN Document Server

    Sánchez, Miguel; Mathieu, Philippe; Rodríguez, Juan; Adam, Emmanuel; Ortega, Alfonso; Moreno, María; Navarro, Elena; Hirsch, Benjamin; Lopes-Cardoso, Henrique; Julián, Vicente

    2012-01-01

    Research on Agents and Multi-Agent Systems has matured during the last decade and many effective applications of this technology are now deployed. PAAMS provides an international forum to present and discuss the latest scientific developments and their effective applications, to assess the impact of the approach, and to facilitate technology transfer. PAAMS started as a local initiative, but has since grown to become THE international yearly platform to present, to discuss, and to disseminate the latest developments and the most important outcomes related to real-world applications. It provides a unique opportunity to bring multi-disciplinary experts, academics and practitioners together to exchange their experience in the development and deployment of Agents and Multi-Agent Systems. PAAMS intends to bring together researchers and developers from industry and the academic world to report on the latest scientific and technical advances on the application of multi-agent systems, to discuss and debate the major ...

  10. Confirmation bias in human reinforcement learning: Evidence from counterfactual feedback processing

    Science.gov (United States)

    Lefebvre, Germain; Blakemore, Sarah-Jayne

    2017-01-01

    Previous studies suggest that factual learning, that is, learning from obtained outcomes, is biased, such that participants preferentially take into account positive, as compared to negative, prediction errors. However, whether or not the prediction error valence also affects counterfactual learning, that is, learning from forgone outcomes, is unknown. To address this question, we analysed the performance of two groups of participants on reinforcement learning tasks using a computational model that was adapted to test if prediction error valence influences learning. We carried out two experiments: in the factual learning experiment, participants learned from partial feedback (i.e., the outcome of the chosen option only); in the counterfactual learning experiment, participants learned from complete feedback information (i.e., the outcomes of both the chosen and unchosen option were displayed). In the factual learning experiment, we replicated previous findings of a valence-induced bias, whereby participants learned preferentially from positive, relative to negative, prediction errors. In contrast, for counterfactual learning, we found the opposite valence-induced bias: negative prediction errors were preferentially taken into account, relative to positive ones. When considering valence-induced bias in the context of both factual and counterfactual learning, it appears that people tend to preferentially take into account information that confirms their current choice. PMID:28800597
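
    The asymmetric update at the heart of the computational model can be sketched as a Rescorla-Wagner rule with separate learning rates for positive and negative prediction errors (the rate values and outcome sequence below are illustrative assumptions, not the fitted parameters):

    ```python
    def update(value, outcome, alpha_pos=0.4, alpha_neg=0.1):
        """Valence-dependent value update: positive prediction errors
        are weighted more heavily than negative ones (factual-learning bias)."""
        delta = outcome - value                # prediction error
        alpha = alpha_pos if delta > 0 else alpha_neg
        return value + alpha * delta

    v = 0.0
    for outcome in [1, 0, 1, 1, 0, 1]:        # mostly rewarded option
        v = update(v, outcome)
    print(round(v, 3))
    ```

    Because alpha_pos > alpha_neg, the learned value overshoots the true reward rate; swapping the two rates reproduces the opposite bias the study reports for counterfactual learning.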

  11. Confirmation bias in human reinforcement learning: Evidence from counterfactual feedback processing.

    Science.gov (United States)

    Palminteri, Stefano; Lefebvre, Germain; Kilford, Emma J; Blakemore, Sarah-Jayne

    2017-08-01

    Previous studies suggest that factual learning, that is, learning from obtained outcomes, is biased, such that participants preferentially take into account positive, as compared to negative, prediction errors. However, whether or not the prediction error valence also affects counterfactual learning, that is, learning from forgone outcomes, is unknown. To address this question, we analysed the performance of two groups of participants on reinforcement learning tasks using a computational model that was adapted to test if prediction error valence influences learning. We carried out two experiments: in the factual learning experiment, participants learned from partial feedback (i.e., the outcome of the chosen option only); in the counterfactual learning experiment, participants learned from complete feedback information (i.e., the outcomes of both the chosen and unchosen option were displayed). In the factual learning experiment, we replicated previous findings of a valence-induced bias, whereby participants learned preferentially from positive, relative to negative, prediction errors. In contrast, for counterfactual learning, we found the opposite valence-induced bias: negative prediction errors were preferentially taken into account, relative to positive ones. When considering valence-induced bias in the context of both factual and counterfactual learning, it appears that people tend to preferentially take into account information that confirms their current choice.

  12. Frontostriatal development and probabilistic reinforcement learning during adolescence.

    Science.gov (United States)

    DePasque, Samantha; Galván, Adriana

    2017-09-01

    Adolescence has traditionally been viewed as a period of vulnerability to increased risk-taking and adverse outcomes, which have been linked to neurobiological maturation of the frontostriatal reward system. However, growing research on the role of developmental changes in the adolescent frontostriatal system in facilitating learning will provide a more nuanced view of adolescence. In this review, we discuss the implications of existing research on this topic for learning during adolescence, and suggest that the very neural changes that render adolescents vulnerable to social pressure and risky decision making may also stand to play a role in scaffolding the ability to learn from rewards and from performance-related feedback. Copyright © 2017 Elsevier Inc. All rights reserved.

  13. Reinforcement Learning for Constrained Energy Trading Games With Incomplete Information.

    Science.gov (United States)

    Wang, Huiwei; Huang, Tingwen; Liao, Xiaofeng; Abu-Rub, Haitham; Chen, Guo

    2017-10-01

    This paper considers the problem of designing adaptive learning algorithms that seek the Nash equilibrium (NE) of a constrained energy trading game among individually strategic players with incomplete information. In this game, each player uses a learning automaton scheme to generate an action probability distribution based on private information, in order to maximize his or her own averaged utility. It is shown that if one of the admissible mixed strategies converges to the NE with probability one, then the averaged utility and trading quantity almost surely converge to their expected values, respectively. For the given discontinuous pricing function, the utility function has been proved to be upper semicontinuous and payoff secure, properties which guarantee the existence of the mixed-strategy NE. By the strict diagonal concavity of the regularized Lagrange function, the uniqueness of the NE is also guaranteed. Finally, an adaptive learning algorithm is provided to generate the strategy probability distribution for seeking the mixed-strategy NE.
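
    The flavor of a learning-automaton scheme can be sketched with a linear reward-inaction update, which nudges the action-probability vector toward actions that earn a high normalized utility (the actions, payoffs, and learning rate below are invented for the sketch, not the paper's trading game):

    ```python
    import random

    random.seed(1)
    ACTIONS = [0, 1, 2]               # e.g. three candidate trading quantities
    payoff = [0.2, 0.9, 0.4]          # assumed normalized utilities in [0, 1]
    p = [1 / 3] * 3                   # action probability distribution
    LAM = 0.05                        # learning rate

    for _ in range(3000):
        a = random.choices(ACTIONS, weights=p)[0]
        r = payoff[a]                 # environment's normalized response
        # Reward-inaction update: move probability mass toward the chosen
        # action in proportion to the reward it received.
        p = [pi + LAM * r * ((1 if i == a else 0) - pi) for i, pi in enumerate(p)]

    # p remains a valid distribution and concentrates on one action
    # (typically the highest-payoff one).
    print(round(sum(p), 6), max(p) > 0.8)
    ```

    The paper's contribution is proving when such probability updates, run by all players simultaneously under constraints, converge to the mixed-strategy NE rather than to an arbitrary pure strategy.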

  14. Bi-directional effect of increasing doses of baclofen on reinforcement learning

    Directory of Open Access Journals (Sweden)

    Jean eTerrier

    2011-07-01

    Full Text Available In rodents as well as in humans, efficient reinforcement learning depends on dopamine (DA) released from ventral tegmental area (VTA) neurons. It has been shown that in brain slices of mice, GABAB-receptor agonists at low concentrations increase the firing frequency of VTA-DA neurons, while high concentrations reduce the firing frequency. It remains elusive, however, whether baclofen can modulate reinforcement learning. Here, in a double-blind study in 34 healthy human volunteers, we tested the effects of a low and a high dose of oral baclofen in a gambling task associated with monetary reward. A low (20 mg) dose of baclofen increased the efficiency of reward-associated learning but had no effect on the avoidance of monetary loss. A high (50 mg) dose of baclofen, on the other hand, did not affect the learning curve. At the end of the task, subjects who received 20 mg baclofen p.o. were more accurate in choosing the symbol linked to the highest probability of earning money compared to the control group (89.55±1.39% vs 81.07±1.55%, p=0.002). Our results support a model where baclofen, at low concentrations, causes a disinhibition of DA neurons, increases DA levels and thus facilitates reinforcement learning.

  15. Spared internal but impaired external reward prediction error signals in major depressive disorder during reinforcement learning.

    Science.gov (United States)

    Bakic, Jasmina; Pourtois, Gilles; Jepma, Marieke; Duprat, Romain; De Raedt, Rudi; Baeken, Chris

    2017-01-01

    Major depressive disorder (MDD) has debilitating effects on a wide range of cognitive functions, including reinforcement learning (RL). In this study, we sought to assess whether reward processing as such, or alternatively the complex interplay between motivation and reward, might potentially account for the abnormal reward-based learning in MDD. A total of 35 treatment-resistant MDD patients and 44 age-matched healthy controls (HCs) performed a standard probabilistic learning task. RL was titrated using behavioral, computational modeling and event-related brain potential (ERP) data. MDD patients showed a learning rate comparable to that of HCs. However, they showed decreased lose-shift responses as well as blunted subjective evaluations of the reinforcers used during the task, relative to HCs. Moreover, MDD patients showed normal internal (at the level of the error-related negativity, ERN) but abnormal external (at the level of the feedback-related negativity, FRN) reward prediction error (RPE) signals during RL, selectively when additional efforts had to be made to establish learning. Collectively, these results lend support to the assumption that MDD does not impair reward processing per se during RL. Instead, it seems to alter the processing of the emotional value of (external) reinforcers during RL, when additional intrinsic motivational processes have to be engaged. © 2016 Wiley Periodicals, Inc.

  16. DYNAMIC AND INCREMENTAL EXPLORATION STRATEGY IN FUSION ADAPTIVE RESONANCE THEORY FOR ONLINE REINFORCEMENT LEARNING

    Directory of Open Access Journals (Sweden)

    Budhitama Subagdja

    2016-06-01

    One of the fundamental challenges in reinforcement learning is to set up a proper balance between exploration and exploitation to obtain the maximum cumulative reward in the long run. Most protocols for exploration bound the overall values to a convergent level of performance. If new knowledge is inserted or the environment is suddenly changed, the issue becomes more intricate, as the exploration must compromise with the pre-existing knowledge. This paper presents a type of multi-channel adaptive resonance theory (ART) neural network model called fusion ART, which serves as a fuzzy approximator for reinforcement learning with inherent features that can regulate the exploration strategy. This intrinsic regulation is driven by the condition of the knowledge learnt so far by the agent. The model offers stable but incremental reinforcement learning that can involve prior rules as bootstrap knowledge for guiding the agent to select the right action. Experiments in obstacle avoidance and navigation tasks demonstrate that in the configuration of learning wherein the agent learns from scratch, the inherent exploration model in fusion ART is comparable to the basic ε-greedy policy. On the other hand, the model is demonstrated to deal with prior knowledge and strike a balance between exploration and exploitation.
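For reference, the ε-greedy baseline against which the fusion ART exploration model is compared can be stated in a few lines (a generic sketch, not the paper's implementation):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """With probability epsilon explore (uniform random action),
    otherwise exploit (pick the action with the highest value)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Unlike the knowledge-driven regulation in fusion ART, ε here is a fixed external constant, which is exactly the limitation the paper's intrinsic exploration mechanism is designed to address.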

  17. What Can Reinforcement Learning Teach Us About Non-Equilibrium Quantum Dynamics

    Science.gov (United States)

    Bukov, Marin; Day, Alexandre; Sels, Dries; Weinberg, Phillip; Polkovnikov, Anatoli; Mehta, Pankaj

    Equilibrium thermodynamics and statistical physics are the building blocks of modern science and technology. Yet, our understanding of thermodynamic processes away from equilibrium is largely missing. In this talk, I will explore what artificial intelligence can teach us about the complex behaviour of non-equilibrium systems. Specifically, I will discuss the problem of finding optimal drive protocols to prepare a desired target state in quantum mechanical systems by applying ideas from Reinforcement Learning [one can think of Reinforcement Learning as the study of how an agent (e.g. a robot) can learn and perfect a given policy through interactions with an environment]. The driving protocols learnt by our agent suggest that the non-equilibrium world features possibilities easily defying intuition based on equilibrium physics.

  18. Online human training of a myoelectric prosthesis controller via actor-critic reinforcement learning.

    Science.gov (United States)

    Pilarski, Patrick M; Dawson, Michael R; Degris, Thomas; Fahimi, Farbod; Carey, Jason P; Sutton, Richard S

    2011-01-01

    As a contribution toward the goal of adaptable, intelligent artificial limbs, this work introduces a continuous actor-critic reinforcement learning method for optimizing the control of multi-function myoelectric devices. Using a simulated upper-arm robotic prosthesis, we demonstrate how it is possible to derive successful limb controllers from myoelectric data using only a sparse human-delivered training signal, without requiring detailed knowledge about the task domain. This reinforcement-based machine learning framework is well suited for use by both patients and clinical staff, and may be easily adapted to different application domains and the needs of individual amputees. To our knowledge, this is the first myoelectric control approach that facilitates the online learning of new amputee-specific motions based only on a one-dimensional (scalar) feedback signal provided by the user of the prosthesis. © 2011 IEEE
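A minimal continuous actor-critic step of the general kind described, with a linear critic and a Gaussian actor driven by a scalar reward signal, might look as follows. This is an illustrative sketch, not the authors' controller; the function forms and all parameter values are assumptions.

```python
import numpy as np

def actor_critic_step(theta, w, x, reward, x_next, sigma=0.1,
                      alpha_actor=0.01, alpha_critic=0.1, gamma=0.95,
                      rng=None):
    """One continuous actor-critic update.

    Actor : Gaussian policy with mean theta @ x and std sigma.
    Critic: linear value function V(x) = w @ x.
    """
    rng = rng or np.random.default_rng(0)
    mu = theta @ x
    action = mu + sigma * rng.standard_normal()
    # Temporal-difference error from the scalar training signal.
    delta = reward + gamma * (w @ x_next) - (w @ x)
    w = w + alpha_critic * delta * x
    # Policy-gradient step: log-likelihood gradient of the Gaussian actor.
    theta = theta + alpha_actor * delta * (action - mu) / sigma**2 * x
    return theta, w, action
```

The appeal for prosthetic control is visible in the signature: only the feature vector, a scalar reward, and the next feature vector are needed, with no model of the task.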

  19. TEXPLORE temporal difference reinforcement learning for robots and time-constrained domains

    CERN Document Server

    Hester, Todd

    2013-01-01

    This book presents and develops new reinforcement learning methods that enable fast and robust learning on robots in real-time. Robots have the potential to solve many problems in society, because of their ability to work in dangerous places doing necessary jobs that no one wants or is able to do. One barrier to their widespread deployment is that they are mainly limited to tasks where it is possible to hand-program behaviors for every situation that may be encountered. For robots to meet their potential, they need methods that enable them to learn and adapt to novel situations that they were not programmed for. Reinforcement learning (RL) is a paradigm for learning sequential decision making processes and could solve the problems of learning and adaptation on robots. This book identifies four key challenges that must be addressed for an RL algorithm to be practical for robotic control tasks. These RL for Robotics Challenges are: 1) it must learn in very few samples; 2) it must learn in domains with continuou...

  20. A Multi-Agent Framework Manages a Representative Sensor Web

    Science.gov (United States)

    Suri, D.; Schmidt, D.; Biswas, G.; Kinnebrew, J.; Otte, W.; Shankaran, N.

    2008-12-01

    NASA's vision of a Sensor Web (which includes a distributed global observation system) consists of a large number of elements, such as remote spacecraft hosting multiple instruments, in situ terrestrial and oceanic sensor networks, and airborne assets. Researchers and developers of a Sensor Web face a number of challenges that arise from (1) the inherently heterogeneous and geographically distributed nature of the Sensor Web; (2) the myriad mission goals and objectives that must be satisfied by the Sensor Web, ranging from an improved understanding of earth science, weather forecasting, and disaster management to an alleviation of societal problems; and (3) the need to support myriad operational modes, such as long- and short-term monitoring and targeted observations. Resolving these challenges requires some form of autonomy - typically embodied in software. Agent technology has emerged both as a salient purveyor of entities that exhibit autonomous behavior and as a paradigm for constructing complex software systems with a large number of interacting heterogeneous components. This paper describes our experiences integrating the Multi-agent Architecture for Coordinated Responsive Observations (MACRO) into the SouthEast Alaska MOnitoring Network for Science, Telecommunications, Education, and Research (SEAMONSTER). MACRO provides agents at (1) the mission level, where agents interact with users to define science goals and then translate these goals into a set of prioritized tasks that have to be executed to achieve these goals, and (2) the resource level, where agents translate tasks into activities related to data collection, data analysis, and data communication. As a representative small-scale sensor web situated in multiple locations on the Juneau Icefield, SEAMONSTER affords an unparalleled opportunity to develop, mature, and showcase MACRO's multi-level agent capabilities. MACRO is developed by the Lockheed Martin Space System Company's Advanced Technology

  1. Instructional control of reinforcement learning: A behavioral and neurocomputational investigation

    NARCIS (Netherlands)

    Doll, B.B.; Jacobs, W.J.; Sanfey, A.G.; Frank, M.J.

    2009-01-01

    Humans learn how to behave directly through environmental experience and indirectly through rules and instructions. Behavior analytic research has shown that instructions can control behavior, even when such behavior leads to sub-optimal outcomes (Hayes, S (Ed) 1989. Rule-governed behavior:

  2. 10th International Conference on Practical Applications of Agents and Multi-Agent Systems

    CERN Document Server

    Pérez, Javier; Golinska, Paulina; Giroux, Sylvain; Corchuelo, Rafael; Trends in Practical Applications of Agents and Multiagent Systems

    2012-01-01

    PAAMS, the International Conference on Practical Applications of Agents and Multi-Agent Systems, is an evolution of the International Workshop on Practical Applications of Agents and Multi-Agent Systems. PAAMS is an international yearly tribune to present, to discuss, and to disseminate the latest developments and the most important outcomes related to real-world applications. It provides a unique opportunity to bring multi-disciplinary experts, academics and practitioners together to exchange their experience in the development of Agents and Multi-Agent Systems. This volume presents the papers that have been accepted for the 2012 edition in the workshops: Workshop on Agents for Ambient Assisted Living, Workshop on Agent-Based Solutions for Manufacturing and Supply Chain, and Workshop on Agents and Multi-agent systems for Enterprise Integration.

  3. Emergency First Response to a Crisis Event: A Multi-Agent Simulation Approach

    National Research Council Canada - National Science Library

    Roginski, Jonathan W

    2006-01-01

    .... This process led to the development of a multi-agent simulation methodology for emergency first response specifically applied to analyze a notional vehicle bomb attack during a festival in the Baltimore Inner Harbor...

  4. Towards a multi-agent system for regulated information exchange in crime investigations

    NARCIS (Netherlands)

    Dijkstra, Pieter; Prakken, H.; Vey Mestdagh, C.N.J. de

    2005-01-01

    This paper outlines a multi-agent architecture for regulated information exchange of crime investigation data between police forces. Interactions between police officers about information exchange are analysed as negotiation dialogues with embedded persuasion dialogues. An architecture is then

  5. Fault-Tolerant Consensus of Multi-Agent System With Distributed Adaptive Protocol.

    Science.gov (United States)

    Chen, Shun; Ho, Daniel W C; Li, Lulu; Liu, Ming

    2015-10-01

    In this paper, fault-tolerant consensus in a multi-agent system using a distributed adaptive protocol is investigated. Firstly, distributed adaptive online updating strategies for some parameters are proposed based on local information of the network structure. Then, under the online updating parameters, a distributed adaptive protocol is developed to compensate for the fault effects and the uncertainty effects in the leaderless multi-agent system. Based on the local state information of neighboring agents, a distributed updating protocol gain is developed, which leads to a fully distributed continuous adaptive fault-tolerant consensus protocol design for the leaderless multi-agent system. Furthermore, a distributed fault-tolerant leader-follower consensus protocol for the multi-agent system is constructed by the proposed adaptive method. Finally, a simulation example is given to illustrate the effectiveness of the theoretical analysis.
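The baseline (non-adaptive) consensus dynamics that such protocols build on is the standard Laplacian update x_i ← x_i + ε Σ_j a_ij (x_j − x_i); a sketch is shown below. The adaptive, fault-tolerant gains of the paper are not reproduced here.

```python
import numpy as np

def consensus_step(x, adjacency, eps=0.1):
    """One step of the basic consensus protocol on an undirected graph:
    each agent moves toward the states of its neighbors."""
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    return x - eps * laplacian @ x
```

For convergence, ε must be smaller than the reciprocal of the maximum node degree; repeated application then drives all agent states to the average of their initial values.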

  6. A stochastic multi-agent optimization model for energy infrastructure planning under uncertainty and competition.

    Science.gov (United States)

    2017-07-04

    This paper presents a stochastic multi-agent optimization model that supports energy infrastructure planning under uncertainty. The interdependence between different decision entities in the system is captured in an energy supply chain network, w...

  7. Cooperative control of multi-agent systems optimal and adaptive design approaches

    CERN Document Server

    Lewis, Frank L; Hengster-Movric, Kristian; Das, Abhijit

    2014-01-01

    Task complexity, communication constraints, flexibility and energy-saving concerns are all factors that may require a group of autonomous agents to work together in a cooperative manner. Applications involving such complications include mobile robots, wireless sensor networks, unmanned aerial vehicles (UAVs), spacecraft, and so on. In such networked multi-agent scenarios, the restrictions imposed by the communication graph topology can pose severe problems in the design of cooperative feedback control systems.  Cooperative control of multi-agent systems is a challenging topic for both control theorists and practitioners and has been the subject of significant recent research. Cooperative Control of Multi-Agent Systems extends optimal control and adaptive control design methods to multi-agent systems on communication graphs.  It develops Riccati design techniques for general linear dynamics for cooperative state feedback design, cooperative observer design, and cooperative dynamic output feedback design.  B...

  8. A Distributed Framework for Real Time Path Planning in Practical Multi-agent Systems

    KAUST Repository

    Abdelkader, Mohamed; Jaleel, Hassan; Shamma, Jeff S.

    2017-01-01

    We present a framework for distributed, energy efficient, and real time implementable algorithms for path planning in multi-agent systems. The proposed framework is presented in the context of a motivating example of capture the flag which

  9. MULTI AGENT-BASED ENVIRONMENTAL LANDSCAPE (MABEL) - AN ARTIFICIAL INTELLIGENCE SIMULATION MODEL: SOME EARLY ASSESSMENTS

    OpenAIRE

    Alexandridis, Konstantinos T.; Pijanowski, Bryan C.

    2002-01-01

    The Multi Agent-Based Environmental Landscape model (MABEL) introduces a Distributed Artificial Intelligence (DAI) systemic methodology, to simulate land use and transformation changes over time and space. Computational agents represent abstract relations among geographic, environmental, human and socio-economic variables, with respect to land transformation pattern changes. A multi-agent environment is developed providing task-nonspecific problem-solving abilities, flexibility on achieving g...

  10. Trends in Cyber-Physical Multi-Agent Systems. The PAAMS Collection - 15th International Conference

    OpenAIRE

    Fernando De la Prieta; Zita Vale; Luis Antunes; Tiago Pinto; Andrew T. Campbell; Vicente Julián; Antonio J.R. Neves; María N. Moreno

    2017-01-01

    PAAMS, the International Conference on Practical Applications of Agents and Multi-Agent Systems is an evolution of the International Workshop on Practical Applications of Agents and Multi-Agent Systems. PAAMS is an international yearly tribune to present, to discuss, and to disseminate the latest developments and the most important outcomes related to real-world applications. It provides a unique opportunity to bring multi-disciplinary experts, academics and practitioners together to exchange...

  11. Distributed Market-Based Algorithms for Multi-Agent Planning with Shared Resources

    Science.gov (United States)

    2013-02-01

    1 Introduction; 2 Distributed Market-Based Multi-Agent Planning; 2.1 Problem Formulation...over the deterministic planner, on the “test set” of scenarios with changing economies...Multi-agent planning is...representation of the objective (4.2.1). For example, for the supply chain management problem, we assumed a sequence of Bernoulli coin flips, which seems

  12. An Evolutionary Approach for Optimizing Hierarchical Multi-Agent System Organization

    OpenAIRE

    Shen, Zhiqi; Yu, Ling; Yu, Han

    2014-01-01

    It has been widely recognized that the performance of a multi-agent system is highly affected by its organization. A large scale system may have billions of possible ways of organization, which makes it impractical to find an optimal choice of organization using exhaustive search methods. In this paper, we propose a genetic algorithm aided optimization scheme for designing hierarchical structures of multi-agent systems. We introduce a novel algorithm, called the hierarchical genetic algorithm...

  13. A meta-ontological framework for multi-agent systems design

    OpenAIRE

    Sokolova, Marina; Fernández Caballero, Antonio

    2007-01-01

    The paper introduces an approach to using a meta-ontology framework for complex multi-agent systems design, and illustrates it in an application related to ecological-medical issues. The described shared ontology is pooled from private sub-ontologies, which represent a problem area ontology, an agent ontology, a task ontology, an ontology of interactions, and the multi-agent system architecture ontology.

  14. An Intelligent Fleet Condition-Based Maintenance Decision Making Method Based on Multi-Agent

    OpenAIRE

    Bo Sun; Qiang Feng; Songjie Li

    2012-01-01

    According to the demand for condition-based maintenance online decision making among a mission oriented fleet, an intelligent maintenance decision making method based on Multi-agent and heuristic rules is proposed. The process of condition-based maintenance within an aircraft fleet (each containing one or more Line Replaceable Modules) based on multiple maintenance thresholds is analyzed. Then the process is abstracted into a Multi-Agent Model, a 2-layer model structure containing host negoti...

  15. Autonomous Inter-Task Transfer in Reinforcement Learning Domains

    Science.gov (United States)

    2008-08-01

    Mountain Car. However, because the source task uses a car with a motor more than twice as powerful as in the 3D task, the transition function learned in...powerful car motor or changing the surface friction of the hill • s: changing the range of the state variables • si: changing where the car starts...Aamodt and Enric Plaza. Case-based reasoning: Foundational issues, methodological variations, and system approaches, 1994. Mazda Ahmadi, Matthew E

  16. Stress Modulates Reinforcement Learning in Younger and Older Adults

    OpenAIRE

    Lighthall, Nichole R.; Gorlick, Marissa A.; Schoeke, Andrej; Frank, Michael J.; Mather, Mara

    2012-01-01

    Animal research and human neuroimaging studies indicate that stress increases dopamine levels in brain regions involved in reward processing and stress also appears to increase the attractiveness of addictive drugs. The current study tested the hypothesis that stress increases reward salience, leading to more effective learning about positive than negative outcomes in a probabilistic selection task. Changes to dopamine pathways with age raise the question of whether stress effects on incentiv...

  17. Dissociable neural representations of reinforcement and belief prediction errors underlie strategic learning.

    Science.gov (United States)

    Zhu, Lusha; Mathewson, Kyle E; Hsu, Ming

    2012-01-31

    Decision-making in the presence of other competitive intelligent agents is fundamental for social and economic behavior. Such decisions require agents to behave strategically, where in addition to learning about the rewards and punishments available in the environment, they also need to anticipate and respond to actions of others competing for the same rewards. However, whereas we know much about strategic learning at both theoretical and behavioral levels, we know relatively little about the underlying neural mechanisms. Here, we show using a multi-strategy competitive learning paradigm that strategic choices can be characterized by extending the reinforcement learning (RL) framework to incorporate agents' beliefs about the actions of their opponents. Furthermore, using this characterization to generate putative internal values, we used model-based functional magnetic resonance imaging to investigate neural computations underlying strategic learning. We found that the distinct notions of prediction errors derived from our computational model are processed in a partially overlapping but distinct set of brain regions. Specifically, we found that the RL prediction error was correlated with activity in the ventral striatum. In contrast, activity in the ventral striatum, as well as the rostral anterior cingulate (rACC), was correlated with a previously uncharacterized belief-based prediction error. Furthermore, activity in rACC reflected individual differences in degree of engagement in belief learning. These results suggest a model of strategic behavior where learning arises from interaction of dissociable reinforcement and belief-based inputs.
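The two error signals the study dissociates can be written side by side in a simplified form: a reinforcement prediction error over one's own payoff, and a belief-based prediction error over the opponent's observed action. This is an illustrative formulation, not the authors' exact computational model; the shared learning rate `alpha` is an assumption.

```python
def strategic_learning_step(q, reward, belief, alpha=0.2):
    """One trial of simplified strategic learning.

    q      : learned value of the action the player chose
    reward : payoff received on this trial
    belief : predicted probability of the opponent action actually observed
    """
    rl_pe = reward - q        # reinforcement prediction error (ventral striatum)
    belief_pe = 1.0 - belief  # belief-based prediction error (rACC): surprise
                              # at the opponent's observed action
    return rl_pe, belief_pe, q + alpha * rl_pe, belief + alpha * belief_pe
```

The point of the dissociation is that the two deltas are driven by different quantities: the first by one's own reward, the second purely by how unexpected the opponent's move was.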

  18. Learning from demonstration: Teaching a myoelectric prosthesis with an intact limb via reinforcement learning.

    Science.gov (United States)

    Vasan, Gautham; Pilarski, Patrick M

    2017-07-01

    Prosthetic arms should restore and extend the capabilities of someone with an amputation. They should move naturally and be able to perform elegant, coordinated movements that approximate those of a biological arm. Despite these objectives, the control of modern-day prostheses is often nonintuitive and taxing. Existing devices and control approaches do not yet give users the ability to effect highly synergistic movements during their daily-life control of a prosthetic device. As a step towards improving the control of prosthetic arms and hands, we introduce an intuitive approach to training a prosthetic control system that helps a user achieve hard-to-engineer control behaviours. Specifically, we present an actor-critic reinforcement learning method that for the first time promises to allow someone with an amputation to use their non-amputated arm to teach their prosthetic arm how to move through a wide range of coordinated motions and grasp patterns. We evaluate our method during the myoelectric control of a multi-joint robot arm by non-amputee users, and demonstrate that by using our approach a user can train their arm to perform simultaneous gestures and movements in all three degrees of freedom in the robot's hand and wrist based only on information sampled from the robot and the user's above-elbow myoelectric signals. Our results indicate that this learning-from-demonstration paradigm may be well suited to use by both patients and clinicians with minimal technical knowledge, as it allows a user to personalize the control of his or her prosthesis without having to know the underlying mechanics of the prosthetic limb. These preliminary results also suggest that our approach may extend in a straightforward way to next-generation prostheses with precise finger and wrist control, such that these devices may someday allow users to perform fluid and intuitive movements like playing the piano, catching a ball, and comfortably shaking hands.

  19. Multi-agent: a technique to implement geo-visualization of networked virtual reality

    Science.gov (United States)

    Lin, Zhiyong; Li, Wenjing; Meng, Lingkui

    2007-06-01

    Networked Virtual Reality (NVR) is a system based on network connection and shared spatial information, whose demands cannot be fully met by the existing architectures and application patterns of VR. In this paper, we propose a new architecture of NVR based on a Multi-Agent framework, which includes detailed definitions of the various agents and their functions and a full description of the collaboration mechanism. Through a prototype system test with DEM data and 3D model data, the advantages of the Multi-Agent based Networked Virtual Reality system in terms of data loading time, user response time, scene construction time, etc. are verified. First, we introduce the characteristics of Networked Virtual Reality and of the Multi-Agent technique in Section 1. Then we give the architecture design of Networked Virtual Reality based on Multi-Agent in Section 2, which covers the rule of task division, the multi-agent architecture designed to implement Networked Virtual Reality, and the functions of the agents. Section 3 shows the prototype implementation according to the design. Finally, Section 4 discusses the benefits of using Multi-Agent to implement geo-visualization of Networked Virtual Reality.

  20. Measuring reinforcement learning and motivation constructs in experimental animals: relevance to the negative symptoms of schizophrenia

    Science.gov (United States)

    Markou, Athina; Salamone, John D.; Bussey, Timothy; Mar, Adam; Brunner, Daniela; Gilmour, Gary; Balsam, Peter

    2013-01-01

    The present review article summarizes and expands upon the discussions that were initiated during a meeting of the Cognitive Neuroscience Treatment Research to Improve Cognition in Schizophrenia (CNTRICS; http://cntrics.ucdavis.edu). A major goal of the CNTRICS meeting was to identify experimental procedures and measures that can be used in laboratory animals to assess psychological constructs that are related to the psychopathology of schizophrenia. The issues discussed in this review reflect the deliberations of the Motivation Working Group of the CNTRICS meeting, which included most of the authors of this article as well as additional participants. After receiving task nominations from the general research community, this working group was asked to identify experimental procedures in laboratory animals that can assess aspects of reinforcement learning and motivation that may be relevant for research on the negative symptoms of schizophrenia, as well as other disorders characterized by deficits in reinforcement learning and motivation. The tasks described here that assess reinforcement learning are the Autoshaping Task, Probabilistic Reward Learning Tasks, and the Response Bias Probabilistic Reward Task. The tasks described here that assess motivation are Outcome Devaluation and Contingency Degradation Tasks and Effort-Based Tasks. In addition to describing such methods and procedures, the present article provides a working vocabulary for research and theory in this field, as well as an industry perspective about how such tasks may be used in drug discovery. It is hoped that this review can aid investigators who are conducting research in this complex area, promote translational studies by highlighting shared research goals and fostering a common vocabulary across basic and clinical fields, and facilitate the development of medications for the treatment of symptoms mediated by reinforcement learning and motivational deficits. PMID:23994273

  2. Reinforcement learning on slow features of high-dimensional input streams.

    Directory of Open Access Journals (Sweden)

    Robert Legenstein

    Humans and animals are able to learn complex behaviors based on a massive stream of sensory information from different modalities. Early animal studies identified learning mechanisms based on reward and punishment, such that animals tend to avoid actions that lead to punishment whereas rewarded actions are reinforced. However, most algorithms for reward-based learning are only applicable if the dimensionality of the state-space is sufficiently small or its structure is sufficiently simple. Therefore, the question arises how the problem of learning on high-dimensional data is solved in the brain. In this article, we propose a biologically plausible generic two-stage learning system that can directly be applied to raw high-dimensional input streams. The system is composed of a hierarchical slow feature analysis (SFA) network for preprocessing and a simple neural network on top that is trained based on rewards. We demonstrate by computer simulations that this generic architecture is able to learn quite demanding reinforcement learning tasks on high-dimensional visual input streams in a time that is comparable to the time needed when an explicit, highly informative, low-dimensional state-space representation is given instead of the high-dimensional visual input. The learning speed of the proposed architecture in a task similar to the Morris water maze task is comparable to that found in experimental studies with rats. This study thus supports the hypothesis that slowness learning is one important unsupervised learning principle utilized in the brain to form efficient state representations for behavioral learning.
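The first stage of the proposed system, slow feature analysis, can be sketched in its linear form: whiten the input, then take the directions along which the temporal derivative has the least variance. This is the textbook linear formulation, not the hierarchical SFA network used in the article.

```python
import numpy as np

def linear_sfa(x, n_components=1):
    """Minimal linear slow feature analysis.

    x: array of shape (T, D), a time series of D-dimensional inputs.
    Returns the n_components slowest unit-variance projections.
    """
    x = x - x.mean(axis=0)
    # Whiten the input so all directions have unit variance.
    d, e = np.linalg.eigh(np.cov(x.T))
    white = e @ np.diag(1.0 / np.sqrt(d)) @ e.T
    z = x @ white
    # Slowest directions = smallest-variance directions of the
    # temporal difference signal.
    d2, e2 = np.linalg.eigh(np.cov(np.diff(z, axis=0).T))
    return z @ e2[:, :n_components]  # slowest features first
```

Applied to a stream mixing a slowly varying signal with fast noise, the first output recovers the slow signal (up to sign), which is the low-dimensional representation the reward-trained network then learns on.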

  3. A Review of the Relationship between Novelty, Intrinsic Motivation and Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Siddique Nazmul

    2017-11-01

    Full Text Available This paper presents a review of the tripartite relationship between novelty, intrinsic motivation and reinforcement learning. The paper first presents a literature survey on novelty and the different computational models of novelty detection, with a specific focus on the features of stimuli that trigger a hedonic value for generating a novelty signal. It then presents an overview of intrinsic motivation and investigations into different models, with the aim of exploring deeper correlations between specific features of a novelty signal and its effect on intrinsic motivation in producing a reward function. Finally, it presents a survey of reinforcement learning, its different models, and their functional relationship with intrinsic motivation.
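    One common way to couple novelty to intrinsic motivation, as surveyed above, is to use the prediction error of a learned forward model as the novelty signal: transitions the model cannot yet predict yield high intrinsic reward, and the reward decays as the model learns. The linear predictor below is our own toy choice, not a model from the paper.

```python
import numpy as np

class NoveltyReward:
    def __init__(self, dim, lr=0.1):
        self.W = np.zeros((dim, dim))  # linear forward model: s' ≈ W s
        self.lr = lr

    def reward(self, s, s_next):
        pred = self.W @ s
        error = s_next - pred
        # Large prediction error = novel transition = high intrinsic reward.
        r_int = float(np.linalg.norm(error))
        # Online model update: repeated transitions become less novel,
        # so the intrinsic reward for them decays over time.
        self.W += self.lr * np.outer(error, s)
        return r_int
```

    Feeding the same transition repeatedly makes the intrinsic reward shrink toward zero, which is the habituation-to-novelty behavior the reviewed models aim for.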

  4. Online Pedagogical Tutorial Tactics Optimization Using Genetic-Based Reinforcement Learning.

    Science.gov (United States)

    Lin, Hsuan-Ta; Lee, Po-Ming; Hsiao, Tzu-Chien

    2015-01-01

    Tutorial tactics are policies for an Intelligent Tutoring System (ITS) to decide the next action when there are multiple actions available. Recent research has demonstrated that, when the learning content is controlled so as to be the same, different tutorial tactics make a difference in students' learning gains. However, the Reinforcement Learning (RL) techniques that were used in previous studies to induce tutorial tactics are insufficient for large problems and hence were applied in an offline manner. Therefore, we introduce a Genetic-Based Reinforcement Learning (GBML) approach to induce tutorial tactics in an online-learning manner without relying on any preexisting dataset. The introduced method can learn a set of rules from the environment in a manner similar to RL. It includes a genetic-based optimizer for the rule discovery task that generates new rules from the old ones. This increases the scalability of an RL learner for larger problems. The results support our hypothesis about the capability of the GBML method to induce tutorial tactics. This suggests that the GBML method should be favorable for developing real-world ITS applications in the domain of tutorial tactics induction.
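    The genetic rule-discovery loop described above can be sketched minimally: a population of state→action rules is scored by reward, the fittest survive, and new rules are generated from old ones by mutation. The "hidden optimum" environment and every parameter below are our own illustrative choices, not the paper's ITS setup.

```python
import random

def evolve_rules(n_states=4, n_actions=3, generations=150, pop=20, seed=1):
    rng = random.Random(seed)
    optimum = [s % n_actions for s in range(n_states)]  # hidden best action per state

    def reward(rules):  # environment feedback: number of correct decisions
        return sum(r == o for r, o in zip(rules, optimum))

    population = [[rng.randrange(n_actions) for _ in range(n_states)]
                  for _ in range(pop)]
    for _ in range(generations):
        population.sort(key=reward, reverse=True)
        survivors = population[:pop // 2]      # selection keeps the best rules
        children = []
        for parent in survivors:               # mutation: new rules from old ones
            child = parent[:]
            child[rng.randrange(n_states)] = rng.randrange(n_actions)
            children.append(child)
        population = survivors + children
    best = max(population, key=reward)
    return best, reward(best)
```

    Because selection is elitist, reward never decreases across generations; mutation supplies the exploration that tabular RL would get from an ε-greedy policy.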

  5. The Study of Reinforcement Learning for Traffic Self-Adaptive Control under Multiagent Markov Game Environment

    Directory of Open Access Journals (Sweden)

    Lun-Hui Xu

    2013-01-01

    Full Text Available The urban traffic self-adaptive control problem is dynamic and uncertain, so the states of the traffic environment are hard to observe. An efficient agent that controls a single intersection can be discovered automatically via multiagent reinforcement learning. However, in the majority of previous works on this approach, each agent needed perfectly observed information when interacting with the environment and learned individually, with less efficient coordination. This study casts traffic self-adaptive control as a multiagent Markov game problem. The design employs a traffic signal control agent (TSCA) for each signalized intersection that coordinates with neighboring TSCAs. A mathematical model for the TSCAs' interaction is built based on a nonzero-sum Markov game, which is applied to let the TSCAs learn how to cooperate. A multiagent Markov game reinforcement learning approach is constructed on the basis of single-agent Q-learning. This method lets each TSCA learn to update its Q-values under joint actions and imperfect information. The convergence of the proposed algorithm is analyzed theoretically. The simulation results show that the proposed method is convergent and effective in a realistic traffic self-adaptive control setting.
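    The core of such a joint-action extension of Q-learning is that each agent's Q-table is indexed by its own action and its neighbor's action. A minimal tabular sketch, with a toy environment and our own simplifying assumption that the neighbor repeats its action in the bootstrap term (the paper's update is richer):

```python
import numpy as np

def joint_q_update(Q, s, a_self, a_neighbor, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step under the joint action (a_self, a_neighbor).

    Q has shape (n_states, n_own_actions, n_neighbor_actions).
    """
    # Greedy bootstrap over the agent's own next actions, holding the
    # neighbor's action fixed (an illustrative simplification).
    target = r + gamma * np.max(Q[s_next, :, a_neighbor])
    Q[s, a_self, a_neighbor] += alpha * (target - Q[s, a_self, a_neighbor])
    return Q
```

    Indexing by the joint action is what lets each TSCA account for its neighbors' choices instead of treating them as unexplained environment noise.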

  6. Challenges in adapting imitation and reinforcement learning to compliant robots

    Directory of Open Access Journals (Sweden)

    Calinon Sylvain

    2011-12-01

    Full Text Available There is an exponential increase in the range of tasks that robots are forecast to accomplish. (Re)programming these robots becomes a critical issue for their commercialization and for their application to real-world scenarios in which users without expertise in robotics wish to adapt the robot to their needs. This paper addresses the problem of designing user-friendly human-robot interfaces to transfer skills in a fast and efficient manner. It presents recent work conducted by the Learning and Interaction group at ADVR-IIT, ranging from skill acquisition through kinesthetic teaching to self-refinement strategies initiated from demonstrations. Our group has started to explore the use of imitation and exploration strategies that can take advantage of the compliant capabilities of recent robot hardware and control architectures.

  7. Multichannel sound reinforcement systems at work in a learning environment

    Science.gov (United States)

    Malek, John; Campbell, Colin

    2003-04-01

    Many people have experienced the entertaining benefits of a surround sound system, either in their own home or in a movie theater, but another application exists for multichannel sound that has for the most part gone unused. This is the application of multichannel sound systems to the learning environment. By incorporating a 7.1 surround processor and a touch panel interface programmable control system, the main lecture hall at the University of Michigan Taubman College of Architecture and Urban Planning has been converted from an ordinary lecture hall to a working audiovisual laboratory. The multichannel sound system is used in a wide variety of experiments, including exposure to sounds to test listeners' aural perception of the tonal characteristics of varying pitch, reverberation, speech transmission index, and sound-pressure level. The touch panel's custom interface allows a variety of user groups to control different parts of the AV system and provides preset capability that allows for numerous system configurations.

  8. Performance Comparison of Two Reinforcement Learning Algorithms for Small Mobile Robots

    Czech Academy of Sciences Publication Activity Database

    Neruda, Roman; Slušný, Stanislav

    2009-01-01

    Roč. 2, č. 1 (2009), s. 59-68 ISSN 2005-4297 R&D Projects: GA MŠk(CZ) 1M0567 Grant - others:GA UK(CZ) 7637/2007 Institutional research plan: CEZ:AV0Z10300504 Keywords: reinforcement learning * mobile robots * intelligent agents Subject RIV: IN - Informatics, Computer Science http://www.sersc.org/journals/IJCA/vol2_no1/7.pdf

  9. Vision-based Navigation and Reinforcement Learning Path Finding for Social Robots

    OpenAIRE

    Pérez Sala, Xavier

    2010-01-01

    We propose a robust system for automatic Robot Navigation in uncontrolled environments. The system is composed of three main modules: the Artificial Vision module, the Reinforcement Learning module, and the behavior control module. The aim of the system is to allow a robot to automatically find a path that arrives at a prefixed goal. Turn and straight movements in uncontrolled environments are automatically estimated and controlled using the proposed modules. The Artificial Vi...

  10. TensorFlow Agents: Efficient Batched Reinforcement Learning in TensorFlow

    OpenAIRE

    Hafner, Danijar; Davidson, James; Vanhoucke, Vincent

    2017-01-01

    We introduce TensorFlow Agents, an efficient infrastructure paradigm for building parallel reinforcement learning algorithms in TensorFlow. We simulate multiple environments in parallel, and group them to perform the neural network computation on a batch rather than individual observations. This allows the TensorFlow execution engine to parallelize computation, without the need for manual synchronization. Environments are stepped in separate Python processes to progress them in parallel witho...
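    The batching idea can be sketched independently of the actual TensorFlow Agents API: step N environments together and stack their observations so the policy runs once per batch rather than once per environment. The random-walk environment and every name below are our own toy stand-ins.

```python
import numpy as np

class BatchedEnv:
    """N toy environments advanced together with vectorized arithmetic."""
    def __init__(self, n_envs):
        self.state = np.zeros(n_envs)

    def step(self, actions):
        self.state = self.state + actions          # all environments advance at once
        return self.state.copy(), -np.abs(self.state)  # reward: stay near the origin

def rollout(env, policy, steps):
    obs = env.state.copy()
    total = np.zeros_like(obs)
    for _ in range(steps):
        actions = policy(obs)                      # one batched policy call per step
        obs, rewards = env.step(actions)
        total += rewards
    return total
```

    The real library additionally runs each environment in a separate Python process; the point preserved here is that the learner only ever sees batched observations and emits batched actions.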

  11. Learning Multirobot Hose Transportation and Deployment by Distributed Round-Robin Q-Learning.

    Directory of Open Access Journals (Sweden)

    Borja Fernandez-Gauna

    Full Text Available Multi-Agent Reinforcement Learning (MARL) algorithms face two main difficulties: the curse of dimensionality, and environment non-stationarity due to the independent learning processes carried out by the agents concurrently. In this paper we formalize and prove the convergence of a Distributed Round Robin Q-learning (D-RR-QL) algorithm for cooperative systems. The computational complexity of this algorithm increases linearly with the number of agents. Moreover, it eliminates environment non-stationarity by carrying out round-robin scheduling of the action selection and execution. This learning scheme allows the implementation of Modular State-Action Vetoes (MSAV) in cooperative multi-agent systems, which speeds up learning convergence in over-constrained systems by vetoing state-action pairs which lead to undesired termination states (UTS) in the relevant state-action subspace. Each agent's local state-action value function learning is an independent process, including the MSAV policies. Coordination of locally optimal policies to obtain the global optimal joint policy is achieved by a greedy selection procedure using message passing. We show that D-RR-QL improves over state-of-the-art approaches, such as Distributed Q-Learning, Team Q-Learning and Coordinated Reinforcement Learning, in a paradigmatic Linked Multi-Component Robotic System (L-MCRS) control problem: the hose transportation task. L-MCRS are over-constrained systems with many UTS induced by the interaction of the passive linking element and the active mobile robots.
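    The two mechanisms named above, round-robin turn-taking and state-action vetoes, can be sketched together: agents act one at a time in a fixed cycle (so each learner faces a stationary environment during its turn), and vetoed state-action pairs are excluded before the greedy choice. The Q-tables, veto sets, and action names are our own toy stand-ins.

```python
def select_action(agent_id, state, q_tables, vetoes, actions):
    # Vetoed (state, action) pairs lead to undesired termination states
    # and are excluded before the greedy selection.
    allowed = [a for a in actions if (state, a) not in vetoes[agent_id]]
    return max(allowed, key=lambda a: q_tables[agent_id].get((state, a), 0.0))

def round_robin_step(agents, state, q_tables, vetoes, actions):
    # Agents select and execute actions one at a time, in a fixed cycle,
    # removing the non-stationarity caused by simultaneous learning.
    return [(ag, select_action(ag, state, q_tables, vetoes, actions))
            for ag in agents]
```

    In the full algorithm each agent also updates its local Q-table on its own turn; only the scheduling and veto filtering are shown here.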

  12. A Hybrid Cognitive-Reactive Multi-Agent Controller

    National Research Council Canada - National Science Library

    Bugajska, Magdalena D; Schultz, Alan C; Trafton, J. G; Taylor, Matthew; Mintz, Farilee E

    2002-01-01

    ...). In this system, the learning algorithm handles reactive aspects of the task and provides an adaptation mechanism, while the cognitive model handles cognitive aspects of the task and ensures the realism of the behavior...

  13. Intrinsically motivated reinforcement learning for human-robot interaction in the real-world.

    Science.gov (United States)

    Qureshi, Ahmed Hussain; Nakamura, Yutaka; Yoshikawa, Yuichiro; Ishiguro, Hiroshi

    2018-03-26

    For natural social human-robot interaction, it is essential for a robot to learn human-like social skills. However, learning such skills is notoriously hard due to the limited availability of direct instructions from people to teach a robot. In this paper, we propose an intrinsically motivated reinforcement learning framework in which an agent receives intrinsic motivation-based rewards through an action-conditional predictive model. Using the proposed method, the robot learned social skills from human-robot interaction experiences gathered in real uncontrolled environments. The results indicate that the robot not only acquired human-like social skills but also took more human-like decisions, on a test dataset, than a robot which received direct rewards for task achievement. Copyright © 2018 Elsevier Ltd. All rights reserved.

  14. Spike-based decision learning of Nash equilibria in two-player games.

    Directory of Open Access Journals (Sweden)

    Johannes Friedrich

    Full Text Available Humans and animals face decision tasks in an uncertain multi-agent environment where an agent's strategy may change in time due to the co-adaptation of others' strategies. The neuronal substrate and the computational algorithms underlying such adaptive decision making, however, are largely unknown. We propose a population coding model of spiking neurons with a policy gradient procedure that successfully acquires optimal strategies for classical game-theoretical tasks. The suggested population reinforcement learning reproduces data from human behavioral experiments for the blackjack and the inspector game. It performs optimally according to a pure (deterministic) and mixed (stochastic) Nash equilibrium, respectively. In contrast, temporal-difference (TD) learning, covariance-learning, and basic reinforcement learning fail to perform optimally for the stochastic strategy. Spike-based population reinforcement learning, shown to follow the stochastic reward gradient, is therefore a viable candidate to explain automated decision learning of a Nash equilibrium in two-player games.

  15. Reinforcement learning of targeted movement in a spiking neuronal model of motor cortex.

    Directory of Open Access Journals (Sweden)

    George L Chadderdon

    Full Text Available Sensorimotor control has traditionally been considered from a control theory perspective, without relation to neurobiology. In contrast, here we utilized a spiking-neuron model of motor cortex and trained it to perform a simple movement task, which consisted of rotating a single-joint "forearm" to a target. Learning was based on a reinforcement mechanism analogous to that of the dopamine system. This provided a global reward or punishment signal in response to decreasing or increasing distance from hand to target, respectively. Output was partially driven by Poisson motor babbling, creating stochastic movements that could then be shaped by learning. The virtual forearm consisted of a single segment rotated around an elbow joint, controlled by flexor and extensor muscles. The model consisted of 144 excitatory and 64 inhibitory event-based neurons, each with AMPA, NMDA, and GABA synapses. Proprioceptive cell input to this model encoded the 2 muscle lengths. Plasticity was only enabled in feedforward connections between input and output excitatory units, using spike-timing-dependent eligibility traces for synaptic credit or blame assignment. Learning resulted from a global 3-valued signal: reward (+1), no learning (0), or punishment (-1), corresponding to phasic increases, lack of change, or phasic decreases of dopaminergic cell firing, respectively. Successful learning only occurred when both reward and punishment were enabled. In this case, 5 target angles were learned successfully within 180 s of simulation time, with a median error of 8 degrees. Motor babbling allowed exploratory learning, but decreased the stability of the learned behavior, since the hand continued moving after reaching the target. Our model demonstrated that a global reinforcement signal, coupled with eligibility traces for synaptic plasticity, can train a spiking sensorimotor network to perform goal-directed motor behavior.

  16. Reinforcement learning of targeted movement in a spiking neuronal model of motor cortex.

    Science.gov (United States)

    Chadderdon, George L; Neymotin, Samuel A; Kerr, Cliff C; Lytton, William W

    2012-01-01

    Sensorimotor control has traditionally been considered from a control theory perspective, without relation to neurobiology. In contrast, here we utilized a spiking-neuron model of motor cortex and trained it to perform a simple movement task, which consisted of rotating a single-joint "forearm" to a target. Learning was based on a reinforcement mechanism analogous to that of the dopamine system. This provided a global reward or punishment signal in response to decreasing or increasing distance from hand to target, respectively. Output was partially driven by Poisson motor babbling, creating stochastic movements that could then be shaped by learning. The virtual forearm consisted of a single segment rotated around an elbow joint, controlled by flexor and extensor muscles. The model consisted of 144 excitatory and 64 inhibitory event-based neurons, each with AMPA, NMDA, and GABA synapses. Proprioceptive cell input to this model encoded the 2 muscle lengths. Plasticity was only enabled in feedforward connections between input and output excitatory units, using spike-timing-dependent eligibility traces for synaptic credit or blame assignment. Learning resulted from a global 3-valued signal: reward (+1), no learning (0), or punishment (-1), corresponding to phasic increases, lack of change, or phasic decreases of dopaminergic cell firing, respectively. Successful learning only occurred when both reward and punishment were enabled. In this case, 5 target angles were learned successfully within 180 s of simulation time, with a median error of 8 degrees. Motor babbling allowed exploratory learning, but decreased the stability of the learned behavior, since the hand continued moving after reaching the target. Our model demonstrated that a global reinforcement signal, coupled with eligibility traces for synaptic plasticity, can train a spiking sensorimotor network to perform goal-directed motor behavior.
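    The global 3-valued signal described in this abstract reduces to a one-line comparison of successive hand-target distances: +1 when the hand moves toward the target, -1 when it moves away, 0 otherwise, mirroring phasic increases, no change, or phasic decreases of dopaminergic firing. A sketch, with a tolerance parameter of our own for the "no change" band:

```python
def global_signal(prev_dist, new_dist, tol=1e-9):
    """3-valued reinforcement signal from successive hand-to-target distances."""
    if new_dist < prev_dist - tol:
        return +1  # reward: distance to the target decreased
    if new_dist > prev_dist + tol:
        return -1  # punishment: distance increased
    return 0       # no learning
```

    In the model this scalar gates the spike-timing-dependent eligibility traces, so individual synapses are credited or blamed without any local error signal.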

  17. Neural mechanisms of reinforcement learning in unmedicated patients with major depressive disorder.

    Science.gov (United States)

    Rothkirch, Marcus; Tonn, Jonas; Köhler, Stephan; Sterzer, Philipp

    2017-04-01

    According to current concepts, major depressive disorder is strongly related to dysfunctional neural processing of motivational information, entailing impairments in reinforcement learning. While computational modelling can reveal the precise nature of neural learning signals, it has not been used to study learning-related neural dysfunctions in unmedicated patients with major depressive disorder so far. We thus aimed at comparing the neural coding of reward and punishment prediction errors, representing indicators of neural learning-related processes, between unmedicated patients with major depressive disorder and healthy participants. To this end, a group of unmedicated patients with major depressive disorder (n = 28) and a group of age- and sex-matched healthy control participants (n = 30) completed an instrumental learning task involving monetary gains and losses during functional magnetic resonance imaging. The two groups did not differ in their learning performance. Patients and control participants showed the same level of prediction error-related activity in the ventral striatum and the anterior insula. In contrast, neural coding of reward prediction errors in the medial orbitofrontal cortex was reduced in patients. Moreover, neural reward prediction error signals in the medial orbitofrontal cortex and ventral striatum showed negative correlations with anhedonia severity. Using a standard instrumental learning paradigm we found no evidence for an overall impairment of reinforcement learning in medication-free patients with major depressive disorder. Importantly, however, the attenuated neural coding of reward in the medial orbitofrontal cortex and the relation between anhedonia and reduced reward prediction error-signalling in the medial orbitofrontal cortex and ventral striatum likely reflect an impairment in experiencing pleasure from rewarding events as a key mechanism of anhedonia in major depressive disorder. © The Author (2017). Published by Oxford

  18. Research on monitoring system of water resources in irrigation region based on multi-agent

    International Nuclear Information System (INIS)

    Zhao, T H; Wang, D S

    2012-01-01

    Irrigation agriculture is the basis of agriculture and rural economic development in China. Informatization of irrigated-area water resources would allow full use of existing water resources and greatly increase the benefits of irrigation agriculture. However, the water resource information systems of many irrigated areas in our country are still not sound, which leads to considerable waste of water resources. This paper analyzes the existing water resource monitoring systems of irrigated areas, introduces multi-agent theory, and sets up a water resource monitoring system for irrigated areas based on multi-agent technology. The system is composed of a monitoring multi-agent federation, a telemetry multi-agent federation, and the GSM communication network between them. It can make full use of the intelligence and communication coordination within each multi-agent federation, greatly improve the timeliness of dynamic monitoring and control of irrigated-area water resources, provide information services for the sustainable development of irrigated areas, and lay a foundation for full informatization of irrigated-area water resources.

  19. Multi-agent search for source localization in a turbulent medium

    International Nuclear Information System (INIS)

    Hajieghrary, Hadi; Hsieh, M. Ani; Schwartz, Ira B.

    2016-01-01

    We extend the gradient-less search strategy referred to as “infotaxis” to a distributed multi-agent system. “Infotaxis” is a search strategy that uses sporadic sensor measurements to determine the source location of materials dispersed in a turbulent medium. In this work, we leverage the spatio-temporal sensing capabilities of mobile sensing agents to optimize the time spent finding and localizing the position of the source using a multi-agent collaborative search strategy. Our results suggest that the proposed multi-agent collaborative search strategy leverages the team's ability to obtain simultaneous measurements at different locations to speed up the search process. We present a multi-agent collaborative “infotaxis” strategy that uses the relative entropy of the system to synthesize a suitable search strategy for the team. The result is a collaborative information theoretic search strategy that results in control actions that maximize the information gained by the team, and improves estimates of the source position. - Highlights: • We extend the gradient-less infotaxis search strategy to a distributed multi-agent system. • Leveraging the spatio-temporal sensing capabilities of a team of mobile sensing agents speeds up the search process. • The resulting information theoretic search strategy maximizes the information gained and improves the estimate of the source position.
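    The information-theoretic core of infotaxis is choosing the action whose anticipated measurement most reduces the entropy of the belief over source locations. A minimal sketch of that expected-posterior-entropy computation, using a two-hypothesis belief of our own invention rather than the paper's turbulent dispersion model:

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def expected_entropy_after(belief, likelihoods):
    """Expected posterior entropy of the source-location belief.

    likelihoods[k, i] = P(observation k | source at cell i).
    """
    h = 0.0
    for lik in likelihoods:
        evidence = float(belief @ lik)           # P(observation k)
        if evidence > 0:
            posterior = belief * lik / evidence  # Bayes update
            h += evidence * entropy(posterior)
    return h
```

    An infotactic agent evaluates this quantity for each candidate move and takes the move with the smallest value; informative sensing locations score lower than uninformative ones.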

  20. Multi-agent based distributed control architecture for microgrid energy management and optimization

    International Nuclear Information System (INIS)

    Basir Khan, M. Reyasudin; Jidin, Razali; Pasupuleti, Jagadeesh

    2016-01-01

    Highlights: • A new multi-agent based distributed control architecture for energy management. • Multi-agent coordination based on non-cooperative game theory. • A microgrid model comprised of renewable energy generation systems. • Performance comparison of distributed with conventional centralized control. - Abstract: Most energy management systems are based on a centralized controller, which makes it difficult to satisfy criteria such as fault tolerance and adaptability. Therefore, a new multi-agent based distributed energy management system architecture is proposed in this paper. The distributed generation system is composed of several distributed energy resources and a group of loads. A multi-agent system based decentralized control architecture was developed in order to provide control for the complex energy management of the distributed generation system. Then, non-cooperative game theory was used for the multi-agent coordination in the system. The distributed generation system was assessed by simulation under renewable resource fluctuations, seasonal load demand and grid disturbances. The simulation results show that the new energy management system provides more robust and higher-performance control than conventional centralized energy management systems.

  1. Blended learning for reinforcing dental pharmacology in the clinical years: A qualitative analysis.

    Science.gov (United States)

    Eachempati, Prashanti; Kiran Kumar, K S; Sumanth, K N

    2016-10-01

    Blended learning has become the method of choice in educational institutions because of its systematic integration of traditional classroom teaching and online components. This study aims to analyze students' reflections regarding blended learning in dental pharmacology. A cross-sectional study was conducted in the Faculty of Dentistry, Melaka-Manipal Medical College among 3rd- and 4th-year BDS students. A total of 145 dental students who consented participated in the study. Students were divided into 14 groups. Nine online sessions followed by nine face-to-face discussions were held. Each session addressed topics related to oral lesions and orofacial pain with pharmacological applications. After each week, students were asked to reflect on blended learning. On completion of 9 weeks, reflections were collected and analyzed. Qualitative analysis was done using the thematic analysis model suggested by Braun and Clarke. Four main themes were identified, namely, merits of blended learning, skill in writing prescriptions for oral diseases, dosages of drugs, and identification of strengths and weaknesses. In general, the participants gave positive feedback regarding blended learning. Students felt more confident in drug selection and prescription writing. They could recollect the doses better after the online and face-to-face sessions. Most interestingly, the students reflected that they were able to identify their strengths and weaknesses after the blended learning sessions. A blended learning module was successfully implemented for reinforcing dental pharmacology. The results obtained in this study enable us to plan future comparative studies to assess the effectiveness of blended learning in dental pharmacology.

  2. Reinforcement learning using a continuous time actor-critic framework with spiking neurons.

    Directory of Open Access Journals (Sweden)

    Nicolas Frémaux

    2013-04-01

    Full Text Available Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only been partially elucidated. On one hand, experimental evidence shows that the neuromodulator dopamine carries information about rewards and affects synaptic plasticity. On the other hand, the theory of reinforcement learning provides a framework for reward-based learning. Recent models of reward-modulated spike-timing-dependent plasticity have made first steps towards bridging the gap between the two approaches, but faced two problems. First, reinforcement learning is typically formulated in a discrete framework, ill-adapted to the description of natural situations. Second, biologically plausible models of reward-modulated spike-timing-dependent plasticity require precise calculation of the reward prediction error, yet it remains to be shown how this can be computed by neurons. Here we propose a solution to these problems by extending the continuous temporal difference (TD) learning of Doya (2000) to the case of spiking neurons in an actor-critic network operating in continuous time, and with continuous state and action representations. In our model, the critic learns to predict expected future rewards in real time. Its activity, together with actual rewards, conditions the delivery of a neuromodulatory TD signal to itself and to the actor, which is responsible for action choice. In simulations, we show that such an architecture can solve a Morris water-maze-like navigation task, in a number of trials consistent with reported animal performance. We also use our model to solve the acrobot and the cartpole problems, two complex motor control tasks. Our model provides a plausible way of computing reward prediction error in the brain. Moreover, the analytically derived learning rule is consistent with experimental evidence for dopamine-modulated spike-timing-dependent plasticity.
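    The TD signal at the heart of such actor-critic schemes is easiest to see in its discrete-time form (the paper itself works in continuous time with spiking neurons; this simplification and all parameters are ours): the critic's prediction error δ = r + γV(s') − V(s) conditions updates of both the critic and the actor.

```python
def td_error(V, s, r, s_next, gamma=0.95):
    """Reward prediction error from the critic's value estimates."""
    return r + gamma * V[s_next] - V[s]

def critic_update(V, s, delta, alpha=0.1):
    """Nudge the critic's prediction for state s toward the TD target."""
    V[s] += alpha * delta
    return V
```

    In the spiking model this same δ is broadcast as a neuromodulatory signal, analogous to phasic dopamine, rather than computed symbolically.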

  3. Study on collaborative optimization control of ventilation and radon reduction system based on multi-agent technology

    International Nuclear Information System (INIS)

    Dai Jianyong; Meng Lingcong; Zou Shuliang

    2015-01-01

    Based on the radiation safety characteristics of radon and its progeny, and combined with ventilation system theory, the structure of a multi-agent system for the ventilation and radon reduction system is constructed using multi-agent technology. The functions of the key agents and the connections between the nodes in the multi-agent system are analyzed to establish the distributed autonomous logic structure and negotiation mechanism of the multi-agent system for ventilation and radon reduction, and thus implement coordinated optimal control of the multi-agent system. An example analysis shows that this system structure and its collaborative mechanism can improve and optimize the control of radioactive pollutants, which provides a theoretical basis and an important application prospect. (authors)

  4. 'Proactive' use of cue-context congruence for building reinforcement learning's reward function.

    Science.gov (United States)

    Zsuga, Judit; Biro, Klara; Tajti, Gabor; Szilasi, Magdolna Emma; Papp, Csaba; Juhasz, Bela; Gesztelyi, Rudolf

    2016-10-28

    Reinforcement learning is a fundamental form of learning that may be formalized using the Bellman equation. Accordingly, an agent determines the state value as the sum of the immediate reward and the discounted value of future states. Thus the value of a state is determined by agent-related attributes (action set, policy, discount factor) and the agent's knowledge of the environment, embodied by the reward function and hidden environmental factors given by the transition probability. The central objective of reinforcement learning is to solve these two functions outside the agent's control, either using or not using a model. In the present paper, using the proactive model of reinforcement learning we offer insight on how the brain creates simplified representations of the environment, and how these representations are organized to support the identification of relevant stimuli and action. Furthermore, we identify neurobiological correlates of our model by suggesting that the reward and policy functions, attributes of the Bellman equation, are built by the orbitofrontal cortex (OFC) and the anterior cingulate cortex (ACC), respectively. Based on this we propose that the OFC assesses cue-context congruence to activate the most relevant context frame. Furthermore, given the bidirectional neuroanatomical link between the OFC and model-free structures, we suggest that model-based input is incorporated into the reward prediction error (RPE) signal, and conversely the RPE signal may be used to update the reward-related information of context frames and the policy underlying action selection in the OFC and ACC, respectively. Finally, clinical implications for cognitive behavioral interventions are discussed.
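    The Bellman relation stated above, state value as immediate reward plus the discounted value of successor states, can be written out as fixed-policy value iteration on a tiny two-state chain of our own invention:

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, iters=200):
    """P[s, s'] = transition probability under the policy, R[s] = immediate reward."""
    V = np.zeros(len(R))
    for _ in range(iters):
        V = R + gamma * P @ V  # V(s) = R(s) + γ Σ_s' P(s, s') V(s')
    return V
```

    Because γ < 1 the iteration is a contraction, so repeated application converges to the unique fixed point of the Bellman equation regardless of the initial values.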

  5. Multi-agent cooperation rescue algorithm based on influence degree and state prediction

    Science.gov (United States)

    Zheng, Yanbin; Ma, Guangfu; Wang, Linlin; Xi, Pengxue

    2018-04-01

    Aiming at multi-agent cooperative rescue in disasters, a multi-agent cooperative rescue algorithm based on influence degree and state prediction is proposed. Firstly, based on the influence of the information in the scene on the collaborative task, an influence degree function is used to filter the information. Secondly, the selected information is used to predict the state of the system and agent behavior. Finally, according to the prediction results, the cooperative behavior of the agents is guided, improving the efficiency of individual collaboration. The simulation results show that this algorithm can effectively solve the multi-agent cooperative rescue problem and ensure the efficient completion of the task.

  6. Adaptive Synchronization for Heterogeneous Multi-Agent Systems with Switching Topologies

    Directory of Open Access Journals (Sweden)

    Muhammad Ridho Rosa

    2018-02-01

    Full Text Available This work provides a multi-agent extension of output-feedback model reference adaptive control (MRAC), designed to synchronize a network of heterogeneous uncertain agents. The implementation of this scheme is based on multi-agent matching conditions. The practical advantage of the proposed MRAC is the possibility of handling agents with unknown dynamics using only the outputs and control inputs of their neighbors. In addition, it is reasonable to consider the case when the communication topology is time-varying. In this work, the time-varying communication leads to a switching control structure that depends on the number of predecessors of each agent. By using the switching control structure to handle the time-varying topologies, we show that synchronization can be achieved. The multi-agent adaptive switching controller is first analyzed, and numerical simulations based on formation control of simplified quadcopter dynamics are provided.

  7. Distributed Consensus of Stochastic Delayed Multi-agent Systems Under Asynchronous Switching.

    Science.gov (United States)

    Wu, Xiaotai; Tang, Yang; Cao, Jinde; Zhang, Wenbing

    2016-08-01

    In this paper, the distributed exponential consensus of stochastic delayed multi-agent systems with nonlinear dynamics is investigated under asynchronous switching. The asynchronous switching considered here accounts for the time needed to identify the active mode of the multi-agent system. The matched controller can be applied only after the mode switch has been confirmed, which means that the controller's switching time in each node usually lags behind the system's switching time. In order to handle the coexistence of switched signals and stochastic disturbances, a comparison principle for stochastic switched delayed systems is first proved. By means of this extended comparison principle, several easily verifiable conditions for the existence of an asynchronously switched distributed controller are derived, such that stochastic delayed multi-agent systems with asynchronous switching and nonlinear dynamics can achieve global exponential consensus. Two examples are given to illustrate the effectiveness of the proposed method.

  8. A Plant Control Technology Using Reinforcement Learning Method with Automatic Reward Adjustment

    Science.gov (United States)

    Eguchi, Toru; Sekiai, Takaaki; Yamada, Akihiro; Shimizu, Satoru; Fukai, Masayuki

    A control technology using Reinforcement Learning (RL) and a Radial Basis Function (RBF) Network has been developed to reduce environmental load substances exhausted from power and industrial plants. This technology consists of a statistical model using an RBF Network, which estimates plant characteristics with respect to environmental load substances, and an RL agent, which learns the control logic for the plants using the statistical model. To control plants flexibly, it is necessary to design an appropriate reward function for the agent according to operating conditions and control goals. Therefore, we propose an automatic reward adjusting method of RL for plant control. This method adjusts the reward function automatically using information from the statistical model obtained during its learning process. In the simulations, it is confirmed that the proposed method adjusts the reward function adaptively for several test functions and achieves robust control of a thermal power plant under changing operating conditions and control goals.

  9. Optimal medication dosing from suboptimal clinical examples: a deep reinforcement learning approach.

    Science.gov (United States)

    Nemati, Shamim; Ghassemi, Mohammad M; Clifford, Gari D

    2016-08-01

    Misdosing medications with sensitive therapeutic windows, such as heparin, can place patients at unnecessary risk, increase length of hospital stay, and lead to wasted hospital resources. In this work, we present a clinician-in-the-loop sequential decision making framework, which provides an individualized dosing policy adapted to each patient's evolving clinical phenotype. We employed retrospective data from the publicly available MIMIC II intensive care unit database, and developed a deep reinforcement learning algorithm that learns an optimal heparin dosing policy from sample dosing trials and their associated outcomes in large electronic medical records. Using separate training and testing datasets, our model was observed to be effective in proposing heparin doses that resulted in better expected outcomes than the clinical guidelines. Our results demonstrate that a sequential modeling approach, learned from retrospective data, could potentially be used at the bedside to derive individualized patient dosing policies.

  10. Bio-robots automatic navigation with graded electric reward stimulation based on Reinforcement Learning.

    Science.gov (United States)

    Zhang, Chen; Sun, Chao; Gao, Liqiang; Zheng, Nenggan; Chen, Weidong; Zheng, Xiaoxiang

    2013-01-01

    Bio-robots based on brain-computer interfaces (BCI) suffer from a failure to consider the characteristics of the animals in navigation. This paper proposes a new method for the automatic navigation of bio-robots that combines a reward-generating algorithm based on Reinforcement Learning (RL) with the learning intelligence of the animals themselves. Given a graded electrical reward, the animal, e.g. a rat, seeks the maximum reward while exploring an unknown environment. Since the rat has excellent spatial recognition, the rat-robot and the RL algorithm can converge to an optimal route through co-learning. This work offers significant inspiration for the practical development of bio-robot navigation with hybrid intelligence.
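
    The graded-reward idea — a reward that grows with progress toward the goal rather than being all-or-nothing — can be sketched with tabular Q-learning on a toy corridor. This is a hedged sketch under invented assumptions (the 1-D environment, the linear reward grading, and all parameters are illustrative, not from the paper):

    ```python
    import random

    # Hedged sketch: tabular Q-learning with a graded reward proportional to
    # progress along a 6-state corridor (goal at state 5). Environment and
    # parameters are illustrative assumptions only.

    N = 6
    Q = [[0.0, 0.0] for _ in range(N)]   # actions: 0 = left, 1 = right

    def graded_reward(s_next):
        return s_next / (N - 1)          # reward grows with progress to the goal

    random.seed(1)
    alpha, gamma_, eps = 0.5, 0.9, 0.1
    for _ in range(500):                 # episodes
        s = 0
        while s != N - 1:
            # epsilon-greedy action selection
            if random.random() < eps:
                a = random.choice([0, 1])
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s_next = max(0, s - 1) if a == 0 else min(N - 1, s + 1)
            r = graded_reward(s_next)
            Q[s][a] += alpha * (r + gamma_ * max(Q[s_next]) - Q[s][a])
            s = s_next

    policy = [0 if q[0] > q[1] else 1 for q in Q]
    print(policy)
    ```

    With the graded reward, every step toward the goal is reinforced, so the greedy policy in all non-terminal states becomes "move right".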

  11. Off-policy integral reinforcement learning optimal tracking control for continuous-time chaotic systems

    International Nuclear Information System (INIS)

    Wei Qing-Lai; Song Rui-Zhuo; Xiao Wen-Dong; Sun Qiu-Ye

    2015-01-01

    This paper presents an off-policy integral reinforcement learning (IRL) algorithm to obtain the optimal tracking control of unknown chaotic systems. Off-policy IRL can learn the solution of the Hamilton–Jacobi–Bellman (HJB) equation from system data generated by an arbitrary control. Moreover, off-policy IRL can be regarded as a direct learning method, which avoids identification of the system dynamics. In this paper, the performance index function is first given based on the system tracking error and control error. To solve the HJB equation, an off-policy IRL algorithm is proposed. It is proven that the iterative control makes the tracking error system asymptotically stable and that the iterative performance index function is convergent. A simulation study demonstrates the effectiveness of the developed tracking control method. (paper)
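
    As a hedged illustration of the kind of objective involved (the weighting matrices $Q$, $R$ and the error dynamics $f$, $g$ are generic placeholders, not the paper's actual quantities), a quadratic tracking performance index and its associated HJB equation can be sketched as:

    ```latex
    % Sketch only: e is the tracking error, u_e the control error,
    % Q \succeq 0 and R \succ 0 are assumed weighting matrices, and the
    % error dynamics are taken as \dot{e} = f(e) + g(e)\, u_e.
    J(e(t)) = \int_{t}^{\infty} \left( e^{\top}(\tau)\, Q\, e(\tau)
            + u_e^{\top}(\tau)\, R\, u_e(\tau) \right) \mathrm{d}\tau ,
    \qquad
    0 = \min_{u_e} \left[ e^{\top} Q e + u_e^{\top} R u_e
            + \nabla V(e)^{\top} \left( f(e) + g(e)\, u_e \right) \right] .
    ```

    The IRL iteration then approximates the value function $V$ and the minimizing control from measured data instead of solving the HJB equation analytically.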

  12. Research and Implementation of Key Technologies in Multi-Agent System to Support Distributed Workflow

    Science.gov (United States)

    Pan, Tianheng

    2018-01-01

    In recent years, the combination of workflow management systems and multi-agent technology has become a hot research field. The lack of flexibility in workflow management systems can be remedied by introducing multi-agent collaborative management. The workflow management system adopts a distributed structure, which avoids the fragility of the traditional centralized workflow structure. In this paper, the agents of a distributed workflow management system are divided according to function, the execution process of each type of agent is analyzed, and key technologies such as process execution and resource management are discussed.

  13. From fault classification to fault tolerance for multi-agent systems

    CERN Document Server

    Potiron, Katia; Taillibert, Patrick

    2013-01-01

    Faults are a concern for Multi-Agent Systems (MAS) designers, especially if the MAS are built for industrial or military use, because there must be some guarantee of dependability. Fault classifications exist for classical systems and are used to define faults. When dependability is at stake, such a fault classification may be used from the beginning of a system's conception to define fault classes and specify which types of faults are expected. Thus, one may want to use fault classification for MAS; however, From Fault Classification to Fault Tolerance for Multi-Agent Systems argues that

  14. Event-triggered hybrid control based on multi-Agent systems for Microgrids

    DEFF Research Database (Denmark)

    Dou, Chun-xia; Liu, Bin; Guerrero, Josep M.

    2014-01-01

    This paper is focused on a multi-agent-system-based event-triggered hybrid control for intelligently restructuring the operating mode of a microgrid (MG) to ensure energy supply with high security, stability and cost effectiveness. Since the microgrid is composed of different types...... of distributed energy resources, it is a typical hybrid dynamic network. Considering the complex hybrid behaviors, a hierarchical decentralized coordinated control scheme is first constructed based on a multi-agent system; then, the hybrid model of the microgrid is built by using differential hybrid Petri...

  15. Multi-agent based modeling for electric vehicle integration in a distribution network operation

    DEFF Research Database (Denmark)

    Hu, Junjie; Morais, Hugo; Lind, Morten

    2016-01-01

    The purpose of this paper is to present a multi-agent based modeling technology for simulating and operating the hierarchical energy management of a power distribution system, with a focus on EV integration. The proposed multi-agent system consists of four types of agents: i) Distribution system...... operator (DSO) technical agents and ii) DSO market agents, which both belong to the top layer of the hierarchy and whose roles are to manage the distribution network by avoiding grid congestion and by using congestion prices to coordinate the scheduled energy; iii) Electric vehicle virtual power plant agents...

  16. Second-Order Controllability of Multi-Agent Systems with Multiple Leaders

    International Nuclear Information System (INIS)

    Liu Bo; Han Xiao; Shi Yun-Tao; Su Hou-Sheng

    2016-01-01

    This paper proposes a new second-order continuous-time multi-agent model and analyzes the controllability of second-order multi-agent system with multiple leaders based on the asymmetric topology. This paper considers the more general case: velocity coupling topology is different from location coupling topology. Some sufficient and necessary conditions are presented for the controllability of the system with multiple leaders. In addition, the paper studies the controllability of the system with velocity damping gain. Simulation results are given to illustrate the correctness of theoretical results. (paper)

  17. Opportunities of creating multi-agent systems in the service sector

    Directory of Open Access Journals (Sweden)

    Shatsky A.A.

    2017-03-01

    Full Text Available The paper seeks to examine opportunities to create multi-agent systems (MAS) in the service sector. Using methods of theoretical analysis and synthesis, the author applies multi-agent technology to the description of a socio-economic system, namely the service sector. As a result, the author identifies three types of MAS in the service sector, based on different types of architecture of intelligent information systems. The research shows that the problem posed requires further study and clarification of results

  18. A Multi-Agent Framework for Coordination of Intelligent Assistive Technologies

    DEFF Research Database (Denmark)

    Valente, Pedro Ricardo da Nova; Hossain, S.; Groenbaek, B.

    2010-01-01

    Intelligent care for the future is the IntelliCare project's main priority. This paper describes the design of a generic multi-agent framework for coordination of intelligent assistive technologies. The paper overviews technologies and software systems suitable for context awareness...... and housekeeping tasks, especially for performing a multi-robot cleaning-task activity. It also describes conducted work in the design of a multi-agent platform for coordination of intelligent assistive technologies. Instead of using traditional robot odometry estimation methods, we have tested an independent...

  19. Human-Robot Teaming in a Multi-Agent Space Assembly Task

    Science.gov (United States)

    Rehnmark, Fredrik; Currie, Nancy; Ambrose, Robert O.; Culbert, Christopher

    2004-01-01

    NASA's Human Space Flight program depends heavily on spacewalks performed by pairs of suited human astronauts. These Extra-Vehicular Activities (EVAs) are severely restricted in both duration and scope by consumables and available manpower. An expanded multi-agent EVA team combining the information-gathering and problem-solving skills of humans with the survivability and physical capabilities of robots is proposed and illustrated by example. Such teams are useful for large-scale, complex missions requiring dispersed manipulation, locomotion and sensing capabilities. To study collaboration modalities within a multi-agent EVA team, a 1-g test is conducted with humans and robots working together in various supporting roles.

  20. Dynamical Consensus Algorithm for Second-Order Multi-Agent Systems Subjected to Communication Delay

    International Nuclear Information System (INIS)

    Liu Chenglin; Liu Fei

    2013-01-01

    To solve the dynamical consensus problem of second-order multi-agent systems with communication delay, delay-dependent compensations are added to the normal asynchronously-coupled consensus algorithm so that the agents achieve dynamical consensus. Based on frequency-domain analysis, sufficient conditions are obtained for second-order multi-agent systems with communication delay under leaderless and leader-following consensus algorithms, respectively. Simulations illustrate the correctness of the results. (interdisciplinary physics and related areas of science and technology)
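
    The baseline protocol being compensated here can be illustrated in a delay-free toy simulation. This is a hedged sketch: the topology, coupling gain, and step size are assumptions, and the paper's delay and delay-dependent compensation terms are deliberately omitted.

    ```python
    import numpy as np

    # Delay-free sketch of a second-order consensus protocol
    #   u_i = sum_j a_ij [ (x_j - x_i) + gamma (v_j - v_i) ],
    # with double-integrator agents x' = v, v' = u. Parameters are
    # illustrative assumptions, not the paper's values.

    def simulate(adj, x, v, gamma=1.0, dt=0.01, steps=5000):
        x, v = x.copy(), v.copy()
        n = len(x)
        for _ in range(steps):
            u = np.zeros(n)
            for i in range(n):
                for j in range(n):
                    u[i] += adj[i, j] * ((x[j] - x[i]) + gamma * (v[j] - v[i]))
            x += dt * v          # Euler integration of the double integrators
            v += dt * u
        return x, v

    adj = np.ones((3, 3)) - np.eye(3)      # complete graph on three agents
    xf, vf = simulate(adj, np.array([0.0, 2.0, -1.0]),
                      np.array([0.5, -0.5, 0.0]))
    print(np.ptp(xf), np.ptp(vf))          # both spreads shrink toward zero
    ```

    With communication delay, this plain coupling can oscillate or diverge, which is exactly the gap the paper's delay-dependent compensation terms address.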

  1. Universal effect of dynamical reinforcement learning mechanism in spatial evolutionary games

    International Nuclear Information System (INIS)

    Zhang, Hai-Feng; Wu, Zhi-Xi; Wang, Bing-Hong

    2012-01-01

    One of the prototypical mechanisms in understanding the ubiquitous cooperation in social dilemma situations is the win–stay, lose–shift rule. In this work, a generalized win–stay, lose–shift learning model—a reinforcement learning model with a dynamic aspiration level—is proposed to describe how humans adapt their social behaviors based on their social experiences. In the model, the players incorporate the information of the outcomes in previous rounds with time-dependent aspiration payoffs to regulate the probability of choosing cooperation. By investigating such a reinforcement learning rule in the spatial prisoner's dilemma game and the public goods game, the most noteworthy finding is that moderate greediness (i.e. a moderate aspiration level) best favors the development and organization of collective cooperation. The generality of this observation is tested against different regulation strengths and different types of interaction network as well. We also make comparisons with two recently proposed models to highlight the importance of the mechanism of an adaptive aspiration level in supporting cooperation in structured populations
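
    A rule of this family can be sketched concretely. The Bush–Mosteller-style update, the sensitivity `beta`, and the habituation rate `h` below are illustrative assumptions in the spirit of the abstract, not the paper's exact equations.

    ```python
    import math

    # Hedged sketch of a generalized win-stay, lose-shift rule with a
    # dynamic aspiration level. All parameters are illustrative assumptions.

    def update_p(p_coop, last_was_coop, payoff, aspiration, beta=0.5):
        """Satisfaction (payoff above aspiration) reinforces the last action."""
        stimulus = math.tanh(beta * (payoff - aspiration))   # in (-1, 1)
        if last_was_coop:
            p = (p_coop + (1 - p_coop) * stimulus if stimulus >= 0
                 else p_coop + p_coop * stimulus)
        else:
            p = (p_coop - p_coop * stimulus if stimulus >= 0
                 else p_coop - (1 - p_coop) * stimulus)
        return min(1.0, max(0.0, p))

    def update_aspiration(aspiration, payoff, h=0.2):
        """Time-dependent aspiration: a moving average of recent payoffs."""
        return (1 - h) * aspiration + h * payoff

    # A cooperator paid above its aspiration cooperates more next round,
    # while its aspiration level drifts upward toward the payoff.
    p = update_p(0.5, True, payoff=3.0, aspiration=1.0)
    a = update_aspiration(1.0, 3.0)
    print(p, a)
    ```

    "Greediness" in the abstract corresponds to how high the aspiration level sits relative to achievable payoffs; a moderate level leaves cooperation satisfying often enough to be reinforced.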

  2. Reinforcement Learning with Autonomous Small Unmanned Aerial Vehicles in Cluttered Environments

    Science.gov (United States)

    Tran, Loc; Cross, Charles; Montague, Gilbert; Motter, Mark; Neilan, James; Qualls, Garry; Rothhaar, Paul; Trujillo, Anna; Allen, B. Danette

    2015-01-01

    We present ongoing work in the Autonomy Incubator at NASA Langley Research Center (LaRC) exploring the efficacy of a data set aggregation approach to reinforcement learning for small unmanned aerial vehicle (sUAV) flight in dense and cluttered environments with reactive obstacle avoidance. The goal is to learn an autonomous flight model using training experiences from a human piloting a sUAV around static obstacles. The training approach uses video data from a forward-facing camera that records the human pilot's flight. Various computer-vision-based features relating to edge and gradient information are extracted from the video. The recorded human control inputs are used to train an autonomous control model that correlates the extracted feature vector to a yaw command. As part of the reinforcement learning approach, the autonomous control model is iteratively updated with feedback from a human agent who corrects undesired model output. This data-driven approach to autonomous obstacle avoidance is explored for simulated forest environments, furthering research on autonomous flight under the tree canopy. This enables flight in previously inaccessible environments which are of interest to NASA researchers in Earth and Atmospheric sciences.
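
    The data set aggregation loop described here can be sketched in miniature. This is a hedged sketch: the scalar feature, the "expert" policy standing in for the human pilot, and the 1-nearest-neighbour controller standing in for the learned model are all invented for illustration.

    ```python
    import random

    # Hedged sketch of the aggregation loop: fly with the current model,
    # have the expert (stand-in for the human pilot) relabel the visited
    # states, and retrain on the aggregated (feature, yaw) data.

    def expert_yaw(feature):
        """Stand-in for the pilot's corrective yaw command."""
        return 1.0 if feature > 0.5 else -1.0

    def predict(dataset, feature):
        """1-NN controller 'trained' on the aggregated (feature, yaw) pairs."""
        return min(dataset, key=lambda fy: abs(fy[0] - feature))[1]

    random.seed(0)
    dataset = [(f, expert_yaw(f)) for f in (0.1, 0.9)]    # initial demos
    for _ in range(5):                                    # aggregation rounds
        visited = [random.random() for _ in range(20)]    # states seen in flight
        dataset += [(f, expert_yaw(f)) for f in visited]  # expert relabels them

    disagreements = sum(predict(dataset, i / 100) != expert_yaw(i / 100)
                        for i in range(100))
    print(len(dataset), disagreements)
    ```

    Because the expert labels the states the learner actually visits, the controller's disagreement with the expert shrinks as the data set grows.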

  3. Fuzzy OLAP association rules mining-based modular reinforcement learning approach for multiagent systems.

    Science.gov (United States)

    Kaya, Mehmet; Alhajj, Reda

    2005-04-01

    Multiagent systems and data mining have recently attracted considerable attention in the field of computing. Reinforcement learning is the most commonly used learning process for multiagent systems. However, it still has some drawbacks: other learning agents present in the domain must be modeled as part of the state of the environment, some states are experienced much less than others, and some state-action pairs are never visited during the learning phase. Further, before the learning process is complete, an agent cannot exhibit a certain behavior in some states even though they may have been experienced sufficiently. In this study, we propose a novel multiagent learning approach to handle these problems. Our approach is based on utilizing the mining process for modular cooperative learning systems. It incorporates fuzziness and online analytical processing (OLAP) based mining to effectively process the information reported by agents. First, we describe a fuzzy data cube OLAP architecture which facilitates effective storage and processing of the state information reported by agents. This way, the action of another agent, even one outside the visual environment of the agent under consideration, can simply be predicted by extracting online association rules, a well-known data mining technique, from the constructed data cube. Second, we present a new action selection model, which is also based on association rules mining. Third, we generalize insufficiently experienced states by mining multilevel association rules from the proposed fuzzy data cube. Experimental results obtained on two different versions of a well-known pursuit domain show the robustness and effectiveness of the proposed fuzzy OLAP mining based modular learning approach. Finally, we tested the scalability of the approach presented in this paper and compared it with our previous work on modular fuzzy Q-learning and ordinary Q-learning.

  4. Dynamic pricing and automated resource allocation for complex information services reinforcement learning and combinatorial auctions

    CERN Document Server

    Schwind, Michael; Fandel, G

    2007-01-01

    Many firms provide their customers with online information products which require limited resources such as server capacity. This book develops allocation mechanisms that aim to ensure an efficient resource allocation in modern IT-services. Recent methods of artificial intelligence, such as neural networks and reinforcement learning, and nature-oriented optimization methods, such as genetic algorithms and simulated annealing, are advanced and applied to allocation processes in distributed IT-infrastructures, e.g. grid systems. The author presents two methods, both of which using the users' w

  5. Brain Circuits of Methamphetamine Place Reinforcement Learning: The Role of the Hippocampus-VTA Loop.

    Science.gov (United States)

    Keleta, Yonas B; Martinez, Joe L

    2012-03-01

    The reinforcing effects of addictive drugs, including methamphetamine (METH), involve the midbrain ventral tegmental area (VTA). The VTA is the primary source of dopamine (DA) to the nucleus accumbens (NAc) and the ventral hippocampus (VHC). These three brain regions are functionally connected through the hippocampal-VTA loop, which includes two main neural pathways: the bottom-up pathway and the top-down pathway. In this paper, we take the view that addiction is a learning process. Therefore, we tested the involvement of the hippocampus in reinforcement learning by studying conditioned place preference (CPP) learning, sequentially conditioning each of the three nuclei in either the bottom-up order of conditioning (VTA, then VHC, finally NAc) or the top-down order (VHC, then VTA, finally NAc). Following habituation, the rats underwent experimental modules consisting of two conditioning trials, each followed by immediate testing (test 1 and test 2), and two additional tests 24 h (test 3) and/or 1 week following conditioning (test 4). The module was repeated three times for each nucleus. The results showed that METH, but not Ringer's, produced positive CPP following conditioning of each brain area in the bottom-up order. In the top-down order, METH, but not Ringer's, produced either an aversive CPP or no learning effect following conditioning of each nucleus of interest. In addition, METH place aversion was antagonized by coadministration of the N-methyl-d-aspartate (NMDA) receptor antagonist MK801, suggesting that the aversion learning was an NMDA receptor activation-dependent process. We conclude that the hippocampus is a critical structure in the reward circuit and hence suggest that the development of target-specific therapeutics for the control of addiction should emphasize the hippocampus-VTA top-down connection.

  6. Learning to reach by reinforcement learning using a receptive field based function approximation approach with continuous actions.

    Science.gov (United States)

    Tamosiunaite, Minija; Asfour, Tamim; Wörgötter, Florentin

    2009-03-01

    Reinforcement learning methods can be used in robotics applications, especially for specific target-oriented problems such as the reward-based recalibration of goal-directed actions. To this end, relatively large and continuous state-action spaces still need to be handled efficiently. The goal of this paper is thus to develop a novel, rather simple method that uses reinforcement learning with function approximation in conjunction with different reward strategies for solving such problems. To test our method, we use a four-degree-of-freedom reaching problem in 3D space, simulated by a two-joint robot arm system with two DOF each. Function approximation is based on 4D overlapping kernels (receptive fields), and the state-action space contains about 10,000 of these. Different types of reward structures are compared, for example, reward-on-touching-only against reward-on-approach. Furthermore, forbidden joint configurations are punished. A continuous action space is used. In spite of the rather large number of states and the continuous action space, these reward/punishment strategies allow the system to find a good solution, usually within about 20 trials. The efficiency of our method demonstrated in this test scenario suggests that it might be possible to use it on a real robot for problems where mixed rewards can be defined in situations where other types of learning might be difficult.
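
    Receptive-field function approximation of the kind described can be sketched in one dimension (the paper uses 4-D kernels). This is a hedged sketch under invented assumptions: the kernel centres, width, learning rate, and the linear target being fitted are all illustrative.

    ```python
    import math

    # Hedged 1-D sketch of value approximation with overlapping Gaussian
    # receptive fields; centres, width, and learning rate are assumptions.

    centers = [i / 10 for i in range(11)]        # kernel centres on [0, 1]
    sigma = 0.1
    weights = [0.0] * len(centers)

    def phi(s):
        """Activations of all receptive fields for state s."""
        return [math.exp(-((s - c) ** 2) / (2 * sigma ** 2)) for c in centers]

    def value(s):
        """Approximated value: weighted sum of kernel activations."""
        return sum(w * f for w, f in zip(phi(s), weights))

    def td_update(s, target, alpha=0.2):
        """Move the approximation toward a (reward-derived) target value."""
        feats = phi(s)
        err = target - value(s)
        for i, f in enumerate(feats):
            weights[i] += alpha * err * f

    for _ in range(200):                         # fit a simple target v*(s) = s
        for s in [i / 20 for i in range(21)]:
            td_update(s, s)

    print(abs(value(0.3) - 0.3), abs(value(0.75) - 0.75))
    ```

    Because neighbouring kernels overlap, each update also improves the estimate for nearby states, which is what keeps learning tractable in the large state-action space the abstract mentions.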

  7. MULTI-AGENT APPROACH TO BUILDING AN INTELLIGENT VEHICLE MAINTENANCE AND REPAIR SYSTEM

    Directory of Open Access Journals (Sweden)

    V. Pavlenko

    2017-12-01

    Full Text Available To ensure vehicle reliability, failures must be detected early and their occurrence and development prevented, in order to reduce maintenance and repair costs. Multi-agent technologies make it possible to raise the technical reliability of vehicles and to minimize the costs of repair and maintenance operations.

  8. Multi-agent simulation of competitive electricity markets: Autonomous systems cooperation for European market modeling

    International Nuclear Information System (INIS)

    Santos, Gabriel; Pinto, Tiago; Morais, Hugo; Sousa, Tiago M.; Pereira, Ivo F.; Fernandes, Ricardo; Praça, Isabel; Vale, Zita

    2015-01-01

    Highlights: • Definition of an ontology allowing communication between multi-agent systems. • Social welfare evaluation in different electricity markets. • Demonstration of the use of the proposed ontology between two multi-agent systems. • Strategic bidding in electricity markets. • European electricity markets comparison. - Abstract: The electricity market restructuring, and its worldwide evolution into regional and even continental scales, along with the increasing necessity for an adequate integration of renewable energy sources, is resulting in rising complexity in power systems operation. Several power system simulators have been developed in recent years with the purpose of helping operators, regulators, and involved players to understand and deal with this complex and constantly changing environment. The main contribution of this paper is the integration of several electricity market and power system models, corresponding to the reality of different countries. This integration is done through the development of an upper ontology which integrates the essential concepts necessary to interpret all the available information. The continuous development of the Multi-Agent System for Competitive Electricity Markets platform provides the means for exemplifying the usefulness of this ontology. A case study using the proposed multi-agent platform is presented, considering a scenario based on real data that simulates the European Electricity Market environment, and comparing its performance under different market mechanisms. The main goal is to demonstrate the advantages that the integration of various market models and simulation platforms brings to the study of the electricity markets’ evolution

  9. Delay-Induced Consensus and Quasi-Consensus in Multi-Agent Dynamical Systems

    NARCIS (Netherlands)

    Yu, Wenwu; Chen, Guanrong; Cao, Ming; Ren, Wei

    2013-01-01

    This paper studies consensus and quasi-consensus in multi-agent dynamical systems. A linear consensus protocol in the second-order dynamics is designed where both the current and delayed position information is utilized. Time delay, in a common perspective, can induce periodic oscillations or even

  10. Achieving semantic interoperability in multi-agent systems: A dialogue-based approach

    NARCIS (Netherlands)

    Diggelen, J. van

    2007-01-01

    Software agents sharing the same ontology can exchange their knowledge fluently as their knowledge representations are compatible with respect to the concepts regarded as relevant and with respect to the names given to these concepts. However, in open heterogeneous multi-agent systems, this scenario

  11. Development and evaluation of multi-agent models predicting Twitter trends in multiple domains

    NARCIS (Netherlands)

    Attema, T.; Maanen, P.P. van; Meeuwissen, E.

    2015-01-01

    This paper concerns multi-agent models predicting Twitter trends. We use a step-wise approach to develop a novel agent-based model with the following properties: (1) it uses individual behavior parameters for a set of Twitter users and (2) it uses a retweet graph to model the underlying social

  12. Multi-Agent Rendezvousing with a Finite Set of Candidate Rendezvous Points

    NARCIS (Netherlands)

    Fang, J.; Morse, A. S.; Cao, M.

    2008-01-01

    The discrete multi-agent rendezvous problem we consider in this paper is concerned with a specified set of points in the plane, called “dwell-points,” and a set of mobile autonomous agents with limited sensing range. Each agent is initially positioned at some dwell-point, and is able to determine

  13. Design of a Multi Agent Architecture for Robot Soccer. A Case Study

    NARCIS (Netherlands)

    Poel, Mannes; Seesink, R.A.; Schoute, Albert L.; Dierssen, W.; Kooij, N.

    A Multi Agent System (MAS) for the FIRA Mirosot League is presented. This MAS allows a general number of players and is used in the 5 against 5 and 7 against 7 competitions. In the MAS there is a coach agent and n (the number of robots in the team) player agents. There is a one-to-one correspondence

  14. Multi-agent system for energy resource scheduling of integrated microgrids in a distributed system

    International Nuclear Information System (INIS)

    Logenthiran, T.; Srinivasan, Dipti; Khambadkone, Ashwin M.

    2011-01-01

    This paper proposes a multi-agent system for energy resource scheduling of an islanded power system with distributed resources, which consists of integrated microgrids and lumped loads. Distributed intelligent multi-agent technology is applied to make the power system more reliable, efficient and capable of exploiting and integrating alternative sources of energy. The algorithm behind the proposed energy resource scheduling has three stages. The first stage is to schedule each microgrid individually to satisfy its internal demand. The next stage involves finding the best possible bids for exporting power to the network and competing in a wholesale energy market. The final stage is to reschedule each microgrid individually to satisfy the total demand, which is the sum of the internal demand and the demand resulting from the wholesale energy market simulation. The simulation results of a power system with distributed resources comprising three microgrids and five lumped loads show that the proposed multi-agent system allows efficient management of micro-sources with minimum operational cost. The case studies demonstrate that the system is successfully monitored, controlled and operated by means of the developed multi-agent system. (author)
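
    The three-stage structure described above can be sketched as follows. This is a hedged sketch only: the microgrid capacities, demands, external load, and the proportional market allocation are invented for illustration and bear no relation to the paper's case studies.

    ```python
    # Hedged sketch of the three-stage scheduling idea: (1) schedule for
    # internal demand, (2) bid surplus into a wholesale market, (3) reschedule
    # for internal demand plus awarded exports. All data are invented.

    microgrids = {                       # capacity (kW), internal demand (kW)
        "mg1": {"capacity": 120.0, "demand": 80.0},
        "mg2": {"capacity": 100.0, "demand": 95.0},
        "mg3": {"capacity": 150.0, "demand": 70.0},
    }
    lumped_load = 90.0                   # external demand settled in the market

    # Stage 1: each microgrid schedules generation for its internal demand.
    schedule = {k: min(v["capacity"], v["demand"]) for k, v in microgrids.items()}

    # Stage 2: surplus capacity becomes an export bid to the wholesale market.
    bids = {k: v["capacity"] - schedule[k] for k, v in microgrids.items()}

    # Simplified market clearing: the lumped load is allocated to the
    # microgrids in proportion to their offered surplus.
    total_surplus = sum(bids.values())
    awarded = {k: lumped_load * b / total_surplus for k, b in bids.items()}

    # Stage 3: reschedule for internal demand plus the awarded export.
    final = {k: schedule[k] + awarded[k] for k in microgrids}
    print(final)
    ```

    In the paper the market stage is a genuine bidding process rather than this proportional split, but the balance property is the same: the final schedules cover internal demand plus the cleared external load without exceeding any capacity.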

  15. The elaboration of a manufacturing flow connectivity model, based on Multi Agent System

    Directory of Open Access Journals (Sweden)

    Fahhama Lamyae

    2017-01-01

    The aim of this paper is to establish a model of industrial flow connectivity. We then detail a network configuration model based on multi-agent systems, in order to study the interactions between all the actors and give a more realistic view of manufacturing coordination in the supply chain.

  16. Multi-agent target tracking using particle filters enhanced with context data

    CSIR Research Space (South Africa)

    Claessens, R

    2015-05-01

    Full Text Available The proposed framework for Multi-Agent Target Tracking supports i) tracking of objects and ii) search and rescue based on the fusion of very heterogeneous data. The system is based on a novel approach to fusing sensory observations, intelligence...

  17. A Comparison of Organization-Centered and Agent-Centered Multi-Agent Systems

    DEFF Research Database (Denmark)

    Jensen, Andreas Schmidt; Villadsen, Jørgen

    2013-01-01

    Whereas most classical multi-agent systems have the agent in center, there has recently been a development towards focusing more on the organization of the system, thereby allowing the designer to focus on what the system goals are, without considering how the goals should be fulfilled. We have d...

  18. Artificial force fields for multi-agent simulations of maritime traffic and risk estimation

    NARCIS (Netherlands)

    Xiao, F.; Ligteringen, H.; Van Gulijk, C.; Ale, B.J.M.

    2012-01-01

    A probabilistic risk model is designed to estimate probabilities of collisions for shipping accidents in busy waterways. We propose a method based on multi-agent simulation that uses an artificial force field to model ship maneuvers. The artificial force field is calibrated by AIS data (Automatic

  19. Multi-Agent Programming Contest 2013: The Teams and the Design of Their Systems

    DEFF Research Database (Denmark)

    Ahlbrecht, Tobias; Bender-Saebelkampf, Christian; Brito, Maiquel

    2013-01-01

    Five teams participated in the Multi-Agent Programming Contest in 2013: All of them gained experience in 2012 already. In order to better understand which paradigms they used, which techniques they considered important and how much work they invested, the organisers of the contest compiled together...

  20. Diagnosis of multi-agent systems and its application to public administration

    NARCIS (Netherlands)

    Boer, A.; van Engers, T.; Abramowicz, W.; Maciaszek, L.; Węcel, K.

    2011-01-01

    In this paper we present a model-based diagnosis view on the complex social systems in which large public administration organizations operate. The purpose of diagnosis as presented in this paper is to identify agent role instances that are not conforming to expectations in a multi-agent system

  1. Multi-agent system for energy resource scheduling of integrated microgrids in a distributed system

    Energy Technology Data Exchange (ETDEWEB)

    Logenthiran, T.; Srinivasan, Dipti; Khambadkone, Ashwin M. [Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117576 (Singapore)

    2011-01-15

    This paper proposes a multi-agent system for energy resource scheduling of an islanded power system with distributed resources, which consists of integrated microgrids and lumped loads. Distributed intelligent multi-agent technology is applied to make the power system more reliable, efficient and capable of exploiting and integrating alternative sources of energy. The algorithm behind the proposed energy resource scheduling has three stages. The first stage is to schedule each microgrid individually to satisfy its internal demand. The next stage involves finding the best possible bids for exporting power to the network and competing in a wholesale energy market. The final stage is to reschedule each microgrid individually to satisfy the total demand, which is the sum of the internal demand and the demand resulting from the wholesale energy market simulation. The simulation results of a power system with distributed resources comprising three microgrids and five lumped loads show that the proposed multi-agent system allows efficient management of micro-sources with minimum operational cost. The case studies demonstrate that the system is successfully monitored, controlled and operated by means of the developed multi-agent system. (author)
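
    The three-stage procedure described above can be sketched as a simple control loop. All class interfaces and the pro-rata market clearing below are illustrative assumptions, not the paper's actual algorithm.

```python
class Microgrid:
    """Minimal microgrid stub for illustrating the three-stage flow
    (hypothetical interface, not the paper's implementation)."""

    def __init__(self, capacity, internal_demand):
        self.capacity = capacity
        self.internal_demand = internal_demand
        self.scheduled = 0.0

    def schedule_internal(self):
        # Stage 1: schedule units to satisfy internal demand
        self.scheduled = min(self.capacity, self.internal_demand)

    def export_bid(self):
        # Stage 2: bid the capacity left over after internal demand
        return max(0.0, self.capacity - self.scheduled)

    def reschedule(self, awarded):
        # Stage 3: total demand = internal demand + awarded market demand
        self.scheduled = min(self.capacity, self.internal_demand + awarded)


def run_three_stage(microgrids, market_demand):
    for mg in microgrids:
        mg.schedule_internal()
    bids = [mg.export_bid() for mg in microgrids]
    total = sum(bids)
    # stand-in for the wholesale market simulation: pro-rata clearing
    awards = [market_demand * b / total if total else 0.0 for b in bids]
    for mg, award in zip(microgrids, awards):
        mg.reschedule(award)
    return [mg.scheduled for mg in microgrids]
```

    The stub keeps the key invariant of the abstract: final schedules cover internal demand plus whatever the market awarded, subject to each microgrid's capacity.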

  2. Distributed Scheduling to Support a Call Centre: a Co-operative Multi-Agent Approach

    NARCIS (Netherlands)

    Brazier, F.M.; Jonker, C.M.; Jungen, F.J.; Treur, J.; Nwana, H.S.

    1998-01-01

    This paper describes a multi-agent system architecture to increase the value of 24 hour a day call centre service. This system supports call centres in making appointments with clients on the basis of knowledge of employees and their schedules. Relevant activities of employees are scheduled for

  3. Towards a multi-agent system for visualising simulated behaviour within the built environment

    NARCIS (Netherlands)

    Dijkstra, J.; Timmermans, H.J.P.; Vries, de B.; Timmermans, H.J.P.; Vries, de B.

    2000-01-01

    This paper describes the outline of a multi-agent system approach for visualising simulated user behaviour within a building. This system can be used to support the assessment of design performance. Visualisation is of critical importance in improving the readability of design representations.

  4. A Distributed Multi-agent Control System for Power Consumption in Buildings

    DEFF Research Database (Denmark)

    Kosek, Anna Magdalena; Gehrke, Oliver

    2012-01-01

    This paper presents a distributed controller for adjusting the electrical consumption of a residential building in response to an external power setpoint in Watts. The controller is based on a multi-agent system and has been implemented in JCSP. It is modularly built, capable of self-configuratio...

  5. Organization of the secure distributed computing based on multi-agent system

    Science.gov (United States)

    Khovanskov, Sergey; Rumyantsev, Konstantin; Khovanskova, Vera

    2018-04-01

    Developing methods for distributed computing has recently received much attention, and one such method is the use of multi-agent systems. Distributed computing organized over a conventional network of computers is exposed to security threats arising from the computational processes themselves. The authors have developed a unified agent algorithm for controlling the operation of computing-network nodes, with ordinary networked PCs serving as the computing nodes. The proposed multi-agent control system makes it possible to quickly harness the processing power of the computers on any existing network to solve large tasks through distributed computation. The agents deployed on the network can configure the distributed computing system, distribute the computational load among the computers they operate, and optimize the system according to the computing power of the machines on the network. The overall processing power can be increased by connecting additional computers to the system. Adding a central agent to the multi-agent system increases the security of the distributed computation. This organization of the distributed computing system reduces problem-solving time and increases the fault tolerance (vitality) of the computing processes in a changing computing environment, e.g., when the number of computers on the network changes dynamically. The developed multi-agent system also detects falsification of the results of the distributed computation, which could otherwise lead to wrong decisions, and checks and corrects erroneous results.

  6. Implementing a Multi-Agent System in Python with an Auction-Based Agreement Approach

    DEFF Research Database (Denmark)

    Ettienne, Mikko Berggren; Vester, Steen; Villadsen, Jørgen

    2012-01-01

    We describe the solution used by the Python-DTU team in the Multi-Agent Programming Contest 2011, where the scenario was called Agents on Mars. We present our auction-based agreement algorithm and discuss our chosen strategy and our choice of technology used for implementing the system. Finally, we...

  7. Linking Individual Learning Styles to Approach-Avoidance Motivational Traits and Computational Aspects of Reinforcement Learning.

    Directory of Open Access Journals (Sweden)

    Kristoffer Carl Aberg

    Learning how to gain rewards (approach learning) and avoid punishments (avoidance learning) is fundamental for everyday life. While individual differences in approach and avoidance learning styles have been related to genetics and aging, the contribution of personality factors, such as traits, remains undetermined. Moreover, little is known about the computational mechanisms mediating differences in learning styles. Here, we used a probabilistic selection task with positive and negative feedbacks, in combination with computational modelling, to show that individuals displaying better approach (vs. avoidance) learning scored higher on measures of approach (vs. avoidance) trait motivation, but, paradoxically, also displayed reduced learning speed following positive (vs. negative) outcomes. These data suggest that learning different types of information depend on associated reward values and internal motivational drives, possibly determined by personality traits.

  8. Linking Individual Learning Styles to Approach-Avoidance Motivational Traits and Computational Aspects of Reinforcement Learning

    Science.gov (United States)

    Carl Aberg, Kristoffer; Doell, Kimberly C.; Schwartz, Sophie

    2016-01-01

    Learning how to gain rewards (approach learning) and avoid punishments (avoidance learning) is fundamental for everyday life. While individual differences in approach and avoidance learning styles have been related to genetics and aging, the contribution of personality factors, such as traits, remains undetermined. Moreover, little is known about the computational mechanisms mediating differences in learning styles. Here, we used a probabilistic selection task with positive and negative feedbacks, in combination with computational modelling, to show that individuals displaying better approach (vs. avoidance) learning scored higher on measures of approach (vs. avoidance) trait motivation, but, paradoxically, also displayed reduced learning speed following positive (vs. negative) outcomes. These data suggest that learning different types of information depend on associated reward values and internal motivational drives, possibly determined by personality traits. PMID:27851807

  9. An analysis of intergroup rivalry using Ising model and reinforcement learning

    Science.gov (United States)

    Zhao, Feng-Fei; Qin, Zheng; Shao, Zhuo

    2014-01-01

    Modeling of intergroup rivalry can help us better understand economic competitions, political elections and other similar activities. The result of intergroup rivalry depends on the co-evolution of individual behavior within one group and the impact from the rival group. In this paper, we model the rivalry behavior using the Ising model. Different from other simulation studies using the Ising model, the evolution rules of each individual in our model are not static but have the ability to learn from historical experience using a reinforcement learning technique, which makes the simulation closer to real human behavior. We studied the phase transition in intergroup rivalry and focused on the impact of the degree of social freedom, the personality of group members and the social experience of individuals. The results of computer simulation show that a society with a low degree of social freedom and highly educated, experienced individuals is more likely to be one-sided in intergroup rivalry.
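
    A minimal version of "spins that learn their update rule" can be sketched as follows. The state encoding (the sign of the neighborhood majority) and the alignment reward are illustrative choices, not the paper's exact model.

```python
import random


class IsingAgent:
    """Spin agent that learns its flip rule by reinforcement instead of
    using a fixed Glauber update (illustrative sketch of the idea)."""

    def __init__(self, alpha=0.2, eps=0.1):
        self.spin = random.choice([-1, 1])
        self.q = {}                  # (neighborhood majority, action) -> value
        self.alpha, self.eps = alpha, eps

    def act(self, majority):
        # action: adopt spin +1 or -1 given the sign of the local field
        if random.random() < self.eps:
            return random.choice([-1, 1])
        return max((-1, 1), key=lambda a: self.q.get((majority, a), 0.0))

    def learn(self, majority, action, reward):
        # move the stored value toward the received reward
        key = (majority, action)
        old = self.q.get(key, 0.0)
        self.q[key] = old + self.alpha * (reward - old)
```

    Rewarding alignment with the local majority recovers conformist dynamics; a different reward schedule would let agents learn contrarian rules instead, which is what makes the learned-rule variant richer than a static Ising update.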

  10. Pedunculopontine tegmental nucleus lesions impair stimulus-reward learning in autoshaping and conditioned reinforcement paradigms.

    Science.gov (United States)

    Inglis, W L; Olmstead, M C; Robbins, T W

    2000-04-01

    The role of the pedunculopontine tegmental nucleus (PPTg) in stimulus-reward learning was assessed by testing the effects of PPTg lesions on performance in visual autoshaping and conditioned reinforcement (CRf) paradigms. Rats with PPTg lesions were unable to learn an association between a conditioned stimulus (CS) and a primary reward in either paradigm. In the autoshaping experiment, PPTg-lesioned rats approached the CS+ and CS- with equal frequency, and the latencies to respond to the two stimuli did not differ. PPTg lesions also disrupted discriminated approaches to an appetitive CS in the CRf paradigm and completely abolished the acquisition of responding with CRf. These data are discussed in the context of a possible cognitive function of the PPTg, particularly in terms of lesion-induced disruptions of attentional processes that are mediated by the thalamus.

  11. Reinforcement learning solution for HJB equation arising in constrained optimal control problem.

    Science.gov (United States)

    Luo, Biao; Wu, Huai-Ning; Huang, Tingwen; Liu, Derong

    2015-11-01

    The constrained optimal control problem depends on the solution of the complicated Hamilton-Jacobi-Bellman equation (HJBE). In this paper, a data-based off-policy reinforcement learning (RL) method is proposed, which learns the solution of the HJBE and the optimal control policy from real system data. One important feature of the off-policy RL is that its policy evaluation can be realized with data generated by other behavior policies, not necessarily the target policy, which solves the insufficient exploration problem. The convergence of the off-policy RL is proved by demonstrating its equivalence to the successive approximation approach. Its implementation procedure is based on the actor-critic neural networks structure, where the function approximation is conducted with linearly independent basis functions. Subsequently, the convergence of the implementation procedure with function approximation is also proved. Finally, its effectiveness is verified through computer simulations. Copyright © 2015 Elsevier Ltd. All rights reserved.

  12. Reinforcement learning for a biped robot based on a CPG-actor-critic method.

    Science.gov (United States)

    Nakamura, Yutaka; Mori, Takeshi; Sato, Masa-aki; Ishii, Shin

    2007-08-01

    Animals' rhythmic movements, such as locomotion, are considered to be controlled by neural circuits called central pattern generators (CPGs), which generate oscillatory signals. Motivated by this biological mechanism, studies have been conducted on the rhythmic movements controlled by CPG. As an autonomous learning framework for a CPG controller, we propose in this article a reinforcement learning method we call the "CPG-actor-critic" method. This method introduces a new architecture to the actor, and its training is roughly based on a stochastic policy gradient algorithm presented recently. We apply this method to an automatic acquisition problem of control for a biped robot. Computer simulations show that training of the CPG can be successfully performed by our method, thus allowing the biped robot to not only walk stably but also adapt to environmental changes.

  13. The role of multiple neuromodulators in reinforcement learning that is based on competition between eligibility traces

    Directory of Open Access Journals (Sweden)

    Marco A Huertas

    2016-12-01

    The ability to maximize reward and avoid punishment is essential for animal survival. Reinforcement learning (RL) refers to the algorithms used by biological or artificial systems to learn how to maximize reward or avoid negative outcomes based on past experiences. While RL is also important in machine learning, the types of mechanistic constraints encountered by biological machinery might be different than those for artificial systems. Two major problems encountered by RL are how to relate a stimulus with a reinforcing signal that is delayed in time (temporal credit assignment), and how to stop learning once the target behaviors are attained (stopping rule). To address the first problem, synaptic eligibility traces were introduced, bridging the temporal gap between a stimulus and its reward. Although these were mere theoretical constructs, recent experiments have provided evidence of their existence. These experiments also reveal that the presence of specific neuromodulators converts the traces into changes in synaptic efficacy. A mechanistic implementation of the stopping rule usually assumes the inhibition of the reward nucleus; however, recent experimental results have shown that learning terminates at the appropriate network state even in setups where the reward nucleus cannot be inhibited. In an effort to describe a learning rule that solves the temporal credit assignment problem and implements a biologically plausible stopping rule, we proposed a model based on two separate synaptic eligibility traces, one for long-term potentiation (LTP) and one for long-term depression (LTD), each obeying different dynamics and having different effective magnitudes. The model has been shown to successfully generate stable learning in recurrent networks. Although the model assumes the presence of a single neuromodulator, evidence indicates that there are different neuromodulators for expressing the different traces. What could be the role of different

  14. The Role of Multiple Neuromodulators in Reinforcement Learning That Is Based on Competition between Eligibility Traces.

    Science.gov (United States)

    Huertas, Marco A; Schwettmann, Sarah E; Shouval, Harel Z

    2016-01-01

    The ability to maximize reward and avoid punishment is essential for animal survival. Reinforcement learning (RL) refers to the algorithms used by biological or artificial systems to learn how to maximize reward or avoid negative outcomes based on past experiences. While RL is also important in machine learning, the types of mechanistic constraints encountered by biological machinery might be different than those for artificial systems. Two major problems encountered by RL are how to relate a stimulus with a reinforcing signal that is delayed in time (temporal credit assignment), and how to stop learning once the target behaviors are attained (stopping rule). To address the first problem, synaptic eligibility traces were introduced, bridging the temporal gap between a stimulus and its reward. Although these were mere theoretical constructs, recent experiments have provided evidence of their existence. These experiments also reveal that the presence of specific neuromodulators converts the traces into changes in synaptic efficacy. A mechanistic implementation of the stopping rule usually assumes the inhibition of the reward nucleus; however, recent experimental results have shown that learning terminates at the appropriate network state even in setups where the reward nucleus cannot be inhibited. In an effort to describe a learning rule that solves the temporal credit assignment problem and implements a biologically plausible stopping rule, we proposed a model based on two separate synaptic eligibility traces, one for long-term potentiation (LTP) and one for long-term depression (LTD), each obeying different dynamics and having different effective magnitudes. The model has been shown to successfully generate stable learning in recurrent networks. Although the model assumes the presence of a single neuromodulator, evidence indicates that there are different neuromodulators for expressing the different traces. What could be the role of different neuromodulators for
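
    The competition between an LTP and an LTD trace can be illustrated with a toy calculation. The exponential trace shapes, time constants, and magnitudes below are arbitrary stand-ins, not the model's fitted dynamics.

```python
import math


def net_synaptic_change(reward_delay, tau_ltp=0.4, tau_ltd=0.2,
                        g_ltp=1.0, g_ltd=1.5):
    """Toy competition between two eligibility traces.

    A stimulus at t=0 starts both traces; a reward arriving after
    `reward_delay` converts them into a net weight change
    dw = g_ltp * T_ltp(delay) - g_ltd * T_ltd(delay).
    The LTD trace is stronger but decays faster, so the sign of dw
    depends on the reward timing.
    """
    t_ltp = math.exp(-reward_delay / tau_ltp)  # slow-decaying LTP trace
    t_ltd = math.exp(-reward_delay / tau_ltd)  # fast-decaying LTD trace
    return g_ltp * t_ltp - g_ltd * t_ltd
```

    With these numbers, an early reward yields net depression and a later reward net potentiation, capturing how the relative dynamics and magnitudes of the two traces, rather than either trace alone, determine the learning outcome.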

  15. Optimal and Scalable Caching for 5G Using Reinforcement Learning of Space-Time Popularities

    Science.gov (United States)

    Sadeghi, Alireza; Sheikholeslami, Fatemeh; Giannakis, Georgios B.

    2018-02-01

    Small basestations (SBs) equipped with caching units have the potential to handle the unprecedented demand growth in heterogeneous networks. Through low-rate backhaul connections with the backbone, SBs can prefetch popular files during off-peak traffic hours and serve them to the edge at peak periods. To prefetch intelligently, each SB must learn what and when to cache, while taking into account SB memory limitations, the massive number of available contents, the unknown popularity profiles, and the space-time popularity dynamics of user file requests. In this work, local and global Markov processes model user requests, and a reinforcement learning (RL) framework is put forth for finding the optimal caching policy when the transition probabilities involved are unknown. Joint consideration of global and local popularity demands along with cache-refreshing costs allows for a simple yet practical asynchronous caching approach. The novel RL-based caching relies on a Q-learning algorithm to implement the optimal policy in an online fashion, thus enabling the cache control unit at the SB to learn, track, and possibly adapt to the underlying dynamics. To endow the algorithm with scalability, a linear function approximation of the proposed Q-learning scheme is introduced, offering faster convergence as well as reduced complexity and memory requirements. Numerical tests corroborate the merits of the proposed approach in various realistic settings.
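
    The Q-learning backbone of such a caching controller can be sketched as follows. The state and action encodings and the reward shape here are illustrative simplifications (a single cache slot, a tabular value function), not the paper's exact formulation.

```python
import random
from collections import defaultdict


class QCache:
    """Tabular Q-learning for a single basestation's cache decisions.

    State: tuple of locally observed popularity levels (illustrative).
    Action: which file to keep in a single cache slot (simplified).
    """

    def __init__(self, n_files, alpha=0.1, gamma=0.9, eps=0.1):
        self.q = defaultdict(float)        # (state, action) -> value
        self.n_files = n_files
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def act(self, state):
        if random.random() < self.eps:     # explore a random caching choice
            return random.randrange(self.n_files)
        # exploit: best cached-file choice for this popularity state
        return max(range(self.n_files), key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # reward would be, e.g., cache hits minus refreshing cost
        best_next = max(self.q[(next_state, a)] for a in range(self.n_files))
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```

    The paper's scalable variant would replace the table with a linear function of state features; the update rule keeps the same temporal-difference form.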

  16. Energy Management Strategy for a Hybrid Electric Vehicle Based on Deep Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Yue Hu

    2018-01-01

    An energy management strategy (EMS) is important for hybrid electric vehicles (HEVs), since it plays a decisive role in the performance of the vehicle. However, the variation of future driving conditions deeply influences the effectiveness of the EMS. Most existing EMS methods simply follow predefined rules that are not adaptive to different driving conditions online. Therefore, it is useful for the EMS to learn from the environment or driving cycle. In this paper, a deep reinforcement learning (DRL)-based EMS is designed such that it can learn to select actions directly from the states without any prediction or predefined rules. Furthermore, a DRL-based online learning architecture is presented, which is significant for applying the DRL algorithm to HEV energy management under different driving conditions. Simulation experiments have been conducted using MATLAB and Advanced Vehicle Simulator (ADVISOR) co-simulation. Experimental results validate the effectiveness of the DRL-based EMS compared with the rule-based EMS in terms of fuel economy. The online learning architecture is also proved to be effective. The proposed method ensures optimality, as well as real-time applicability, in HEVs.

  17. A Day-to-Day Route Choice Model Based on Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Fangfang Wei

    2014-01-01

    Day-to-day traffic dynamics are generated by individual travelers' route choice and route adjustment behaviors, which are appropriate to research using agent-based models and learning theory. In this paper, we propose a day-to-day route choice model based on reinforcement learning and multiagent simulation. Travelers' memory, learning rate, and experience cognition are taken into account. The model is then verified and analyzed. Results show that the network flow can converge to user equilibrium (UE) if travelers can remember all the travel times they have experienced, but this is not necessarily the case under limited memory; a higher learning rate strengthens flow fluctuation, whereas memory has the opposite effect; moreover, a high learning rate results in cyclical oscillation during the process of flow evolution. Finally, the scenarios of link capacity degradation and random link capacity are both used to illustrate the model's applications. Analyses and applications of the model demonstrate that it is reasonable and useful for studying day-to-day traffic dynamics.
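
    One simple reading of such a traveler agent (perceived route costs updated at a learning rate over a bounded memory of experienced travel times) can be sketched as follows. The exploration rate and the exact update form are illustrative assumptions, not the paper's equations.

```python
import random


class Traveler:
    """Day-to-day route choice with bounded memory and a learning rate
    (illustrative reading of the model, not the paper's exact equations)."""

    def __init__(self, n_routes, learning_rate=0.3, memory=10):
        self.perceived = [0.0] * n_routes          # perceived travel times
        self.history = [[] for _ in range(n_routes)]
        self.lr = learning_rate
        self.memory = memory

    def choose(self):
        # occasionally explore, otherwise take the best-perceived route
        if random.random() < 0.05:
            return random.randrange(len(self.perceived))
        return min(range(len(self.perceived)), key=lambda r: self.perceived[r])

    def learn(self, route, travel_time):
        h = self.history[route]
        h.append(travel_time)
        if len(h) > self.memory:                   # limited memory window
            h.pop(0)
        experienced = sum(h) / len(h)
        # blend recent experience into perception at the learning rate
        self.perceived[route] += self.lr * (experienced - self.perceived[route])
```

    With unlimited `memory` the averages stabilize and flows can settle toward equilibrium; shrinking the window or raising `learning_rate` makes perceptions, and hence route flows, more volatile, matching the qualitative findings in the abstract.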

  18. Intrinsic interactive reinforcement learning - Using error-related potentials for real world human-robot interaction.

    Science.gov (United States)

    Kim, Su Kyoung; Kirchner, Elsa Andrea; Stefes, Arne; Kirchner, Frank

    2017-12-14

    Reinforcement learning (RL) enables robots to learn their optimal behavioral strategies in dynamic environments based on feedback. Explicit human feedback during robot RL is advantageous, since an explicit reward function can be easily adapted. However, it is very demanding and tiresome for a human to continuously and explicitly generate feedback. Therefore, the development of implicit approaches is of high relevance. In this paper, we used an error-related potential (ErrP), an event-related activity in the human electroencephalogram (EEG), as intrinsically generated implicit feedback (rewards) for RL. Initially, we validated our approach with seven subjects in a simulated robot learning scenario. ErrPs were detected online in single trials with a balanced accuracy (bACC) of 91%, which was sufficient to learn to recognize gestures and the correct mapping between human gestures and robot actions in parallel. Finally, we validated our approach in a real robot scenario, in which seven subjects freely chose gestures and the real robot correctly learned the mapping between gestures and actions (ErrP detection: 90% bACC). In this paper, we demonstrated that intrinsically generated EEG-based human feedback in RL can successfully be used to implicitly improve gesture-based robot control during human-robot interaction. We call our approach intrinsic interactive RL.

  19. Multiobjective Reinforcement Learning for Traffic Signal Control Using Vehicular Ad Hoc Network

    Directory of Open Access Journals (Sweden)

    Houli Duan

    2010-01-01

    We propose a new multiobjective control algorithm based on reinforcement learning for urban traffic signal control, named multi-RL. A multiagent structure is used to describe the traffic system, and a vehicular ad hoc network is used for data exchange among agents. A reinforcement learning algorithm is applied to predict the overall value of the optimization objective given the vehicles' states. The policy that minimizes the cumulative value of the optimization objective is regarded as the optimal one. To make the method adaptive to various traffic conditions, we also introduce a multiobjective control scheme in which the optimization objective is selected adaptively according to real-time traffic states. The optimization objectives include the number of vehicle stops, the average waiting time, and the maximum queue length of the next intersection. In addition, we incorporate priority control for buses and emergency vehicles in our model. The simulation results indicate that our algorithm performs more efficiently than traditional traffic light control methods.

  20. FMRQ-A Multiagent Reinforcement Learning Algorithm for Fully Cooperative Tasks.

    Science.gov (United States)

    Zhang, Zhen; Zhao, Dongbin; Gao, Junwei; Wang, Dongqing; Dai, Yujie

    2017-06-01

    In this paper, we propose a multiagent reinforcement learning algorithm for fully cooperative tasks. The algorithm is called frequency of the maximum reward Q-learning (FMRQ). FMRQ aims to achieve one of the optimal Nash equilibria so as to optimize the performance index in multiagent systems. The frequency of obtaining the highest global immediate reward, instead of the immediate reward itself, is used as the reinforcement signal. With FMRQ, each agent does not need to observe the other agents' actions and only shares its state and reward at each step. We validate FMRQ through case studies of repeated games: four cases of two-player two-action games and one case of a three-player two-action game. It is demonstrated that FMRQ can converge to one of the optimal Nash equilibria in these cases. Moreover, comparison experiments on tasks with multiple states and finite steps are conducted: one is box-pushing and the other is a distributed sensor network problem. Experimental results show that the proposed algorithm outperforms the other methods.
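
    The core reinforcement signal, i.e., the frequency with which an action has coincided with the highest global reward observed so far, can be sketched in a simplified single-state form. This is a loose illustration of the idea, not the published FMRQ algorithm.

```python
from collections import defaultdict


class FMRQAgent:
    """Sketch of the FMRQ idea: value an action by how often it has
    produced the maximum global reward seen so far, rather than by the
    raw immediate reward (simplified single-state version)."""

    def __init__(self, actions, alpha=0.1):
        self.actions = list(actions)
        self.alpha = alpha
        self.q = defaultdict(float)        # action -> frequency estimate
        self.max_reward = float("-inf")    # highest global reward observed

    def observe(self, action, global_reward):
        if global_reward > self.max_reward:
            self.max_reward = global_reward
        hit = 1.0 if global_reward >= self.max_reward else 0.0
        # move the frequency estimate toward "did we hit the max?"
        self.q[action] += self.alpha * (hit - self.q[action])

    def best(self):
        return max(self.actions, key=lambda a: self.q[a])
```

    Because the signal is a frequency rather than a payoff, actions that only occasionally coincide with the best joint outcome are discounted, which is what pulls the agents toward an optimal equilibrium without observing each other's actions.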

  1. Reinforcement Learning Based Data Self-Destruction Scheme for Secured Data Management

    Directory of Open Access Journals (Sweden)

    Young Ki Kim

    2018-04-01

    As technologies and services that leverage cloud computing have evolved, the number of businesses and individuals who use them is increasing rapidly. In the course of using cloud services, as users store and use data that include personal information, research on privacy protection models to protect sensitive information in the cloud environment is becoming more important. As a solution to this problem, a self-destructing scheme has been proposed that prevents the decryption of encrypted user data after a certain period of time using a Distributed Hash Table (DHT) network. However, the existing self-destructing scheme does not specify how to set the number of key shares and the threshold value with respect to the environment of a dynamic DHT network. This paper proposes a method to set the parameters that generate the key shares needed for the self-destructing scheme, considering the availability and security of the data. The proposed method defines the state, action, and reward of a reinforcement learning model based on the similarity of the graph, and applies the self-destructing scheme process by updating the parameters based on the reinforcement learning model. Through the proposed technique, key-sharing parameters can be set in consideration of data availability and security in dynamic DHT network environments.

  2. Reinforcement learning of self-regulated β-oscillations for motor restoration in chronic stroke

    Directory of Open Access Journals (Sweden)

    Georgios eNaros

    2015-07-01

    Neurofeedback training of motor imagery-related brain states with brain-machine interfaces (BMIs) is currently being explored prior to standard physiotherapy to improve the motor outcome of stroke rehabilitation. Pilot studies suggest that such a priming intervention before physiotherapy might increase the responsiveness of the brain to the subsequent physiotherapy, thereby improving the clinical outcome. However, there is little evidence so far that these BMI-based interventions have achieved operant conditioning of specific brain states that facilitate task-specific functional gains beyond the practice of primed physiotherapy. In this context, we argue that BMI technology needs to target physiological features relevant to the intended behavioral gain. Moreover, this therapeutic intervention has to be informed by concepts of reinforcement learning to develop its full potential. Such a refined neurofeedback approach would need to address the following issues: (1) defining a physiological feedback target specific to the intended behavioral gain, e.g., β-band oscillations for cortico-muscular communication (this targeted brain state could well be different from the brain state optimal for the neurofeedback task); (2) selecting a BMI classification and thresholding approach on the basis of learning principles, i.e., balancing challenge and reward of the neurofeedback task instead of maximizing the classification accuracy of the feedback device; and (3) adjusting the feedback in the course of the training period to account for the cognitive load and the learning experience of the participant. The proposed neurofeedback strategy provides evidence for the feasibility of the suggested approach by demonstrating that dynamic threshold adaptation based on reinforcement learning may lead to frequency-specific operant conditioning of β-band oscillations paralleled by task-specific motor improvement; a proposal that requires investigation in a larger cohort of stroke

  3. Multi-Objective Reinforcement Learning-Based Deep Neural Networks for Cognitive Space Communications

    Science.gov (United States)

    Ferreria, Paulo Victor R.; Paffenroth, Randy; Wyglinski, Alexander M.; Hackett, Timothy M.; Bilen, Sven G.; Reinhart, Richard C.; Mortensen, Dale J.

    2017-01-01

    Future communication subsystems of space exploration missions can potentially benefit from software-defined radios (SDRs) controlled by machine learning algorithms. In this paper, we propose a novel hybrid radio resource allocation management control algorithm that integrates multi-objective reinforcement learning and deep artificial neural networks. The objective is to efficiently manage communications system resources by monitoring performance functions with common dependent variables that result in conflicting goals. The uncertainty in the performance of thousands of different possible combinations of radio parameters makes the trade-off between exploration and exploitation in reinforcement learning (RL) much more challenging for future critical space-based missions. Thus, the system should spend as little time as possible on exploring actions, and whenever it explores an action, it should perform at acceptable levels most of the time. The proposed approach enables on-line learning by interactions with the environment and restricts poor resource allocation performance through virtual environment exploration. Improvements in the multiobjective performance can be achieved via transmitter parameter adaptation on a packet-basis, with poorly predicted performance promptly resulting in rejected decisions. Simulations presented in this work considered the DVB-S2 standard adaptive transmitter parameters and additional ones expected to be present in future adaptive radio systems. Performance results are provided by analysis of the proposed hybrid algorithm when operating across a satellite communication channel from Earth to GEO orbit during clear sky conditions. The proposed approach constitutes part of the core cognitive engine proof-of-concept to be delivered to the NASA Glenn Research Center SCaN Testbed located onboard the International Space Station.

  4. Variability in Dopamine Genes Dissociates Model-Based and Model-Free Reinforcement Learning.

    Science.gov (United States)

    Doll, Bradley B; Bath, Kevin G; Daw, Nathaniel D; Frank, Michael J

    2016-01-27

    Considerable evidence suggests that multiple learning systems can drive behavior. Choice can proceed reflexively from previous actions and their associated outcomes, as captured by "model-free" learning algorithms, or flexibly from prospective consideration of outcomes that might occur, as captured by "model-based" learning algorithms. However, differential contributions of dopamine to these systems are poorly understood. Dopamine is widely thought to support model-free learning by modulating plasticity in striatum. Model-based learning may also be affected by these striatal effects, or by other dopaminergic effects elsewhere, notably on prefrontal working memory function. Indeed, prominent demonstrations linking striatal dopamine to putatively model-free learning did not rule out model-based effects, whereas other studies have reported dopaminergic modulation of verifiably model-based learning, but without distinguishing a prefrontal versus striatal locus. To clarify the relationships between dopamine, neural systems, and learning strategies, we combine a genetic association approach in humans with two well-studied reinforcement learning tasks: one isolating model-based from model-free behavior and the other sensitive to key aspects of striatal plasticity. Prefrontal function was indexed by a polymorphism in the COMT gene, differences of which reflect dopamine levels in the prefrontal cortex. This polymorphism has been associated with differences in prefrontal activity and working memory. Striatal function was indexed by a gene coding for DARPP-32, which is densely expressed in the striatum where it is necessary for synaptic plasticity. We found evidence for our hypothesis that variations in prefrontal dopamine relate to model-based learning, whereas variations in striatal dopamine function relate to model-free learning. 
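The model-based/model-free distinction at issue in this record can be illustrated with a toy sketch: a hypothetical two-action environment (an assumption for illustration, not either task used in the study) in which a swap of outcome values is picked up immediately by model-based evaluation but only slowly by model-free values.

```python
import random

# A hedged toy contrast between the two systems described above: a
# hypothetical two-action environment (not the task used in the study).
# After outcome values are swapped, a model-based evaluation flips
# immediately, while model-free Q-values lag until re-trained.
random.seed(0)

T = {0: "A", 1: "B"}              # known action -> state transitions
reward = {"A": 1.0, "B": 0.0}     # current outcome values

alpha = 0.1
q_mf = [0.0, 0.0]                 # model-free action values

def mb_values():
    # Model-based: evaluate each action by looking through the model.
    return [reward[T[a]] for a in (0, 1)]

# Learning phase: both systems come to favor action 0.
for _ in range(100):
    action = random.choice([0, 1])
    r = reward[T[action]]
    q_mf[action] += alpha * (r - q_mf[action])

# "Devaluation": outcome values are swapped without new training.
reward = {"A": 0.0, "B": 1.0}

print(mb_values())   # model-based now prefers action 1
print(q_mf)          # model-free still prefers action 0
```

The lag of the Q-values after the swap is the behavioral signature that such tasks exploit to dissociate the two systems.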

  5. Investigation of Drive-Reinforcement Learning and Application of Learning to Flight Control

    Science.gov (United States)

    1993-08-01

WL-TR-93-1153 (AD-A277 442). Investigation of Drive-Reinforcement Learning and Application of Learning to Flight Control, Walter L. Baker (ed.) and Stephen C. Atkins. [OCR of the scanned report documentation page; only the report identifiers, title, author names, and fragments of the reference list (e.g., Feigenbaum and Feldman (eds.), Computers and Thought, McGraw-Hill, New York, 1959; Holland, J. H., "Escaping Brittleness: The Possibility...") are recoverable.]

  6. Within- and across-trial dynamics of human EEG reveal cooperative interplay between reinforcement learning and working memory.

    Science.gov (United States)

    Collins, Anne G E; Frank, Michael J

    2018-03-06

    Learning from rewards and punishments is essential to survival and facilitates flexible human behavior. It is widely appreciated that multiple cognitive and reinforcement learning systems contribute to decision-making, but the nature of their interactions is elusive. Here, we leverage methods for extracting trial-by-trial indices of reinforcement learning (RL) and working memory (WM) in human electro-encephalography to reveal single-trial computations beyond that afforded by behavior alone. Neural dynamics confirmed that increases in neural expectation were predictive of reduced neural surprise in the following feedback period, supporting central tenets of RL models. Within- and cross-trial dynamics revealed a cooperative interplay between systems for learning, in which WM contributes expectations to guide RL, despite competition between systems during choice. Together, these results provide a deeper understanding of how multiple neural systems interact for learning and decision-making and facilitate analysis of their disruption in clinical populations.

  7. Integral reinforcement learning for continuous-time input-affine nonlinear systems with simultaneous invariant explorations.

    Science.gov (United States)

    Lee, Jae Young; Park, Jin Bae; Choi, Yoon Ho

    2015-05-01

This paper focuses on a class of reinforcement learning (RL) algorithms, named integral RL (I-RL), that solve continuous-time (CT) nonlinear optimal control problems with input-affine system dynamics. First, we extend the concepts of exploration, integral temporal difference, and invariant admissibility to the target CT nonlinear system that is governed by a control policy plus a probing signal called an exploration. Then, we show input-to-state stability (ISS) and invariant admissibility of the closed-loop systems with the policies generated by integral policy iteration (I-PI) or invariantly admissible PI (IA-PI) method. Based on these, three online I-RL algorithms named explorized I-PI and integral Q-learning I and II are proposed, all of which generate the same convergent sequences as I-PI and IA-PI under the required excitation condition on the exploration. All the proposed methods are partially or completely model free, and can simultaneously explore the state space in a stable manner during the online learning processes. ISS, invariant admissibility, and convergence properties of the proposed methods are also investigated, and related with these, we show the design principles of the exploration for safe learning. Neural-network-based implementation methods for the proposed schemes are also presented in this paper. Finally, several numerical simulations are carried out to verify the effectiveness of the proposed methods.
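The integral policy-iteration idea in this record can be sketched on a scalar linear-quadratic problem, where the optimal gain is known in closed form. This is a simplified, assumed setup (an Euler-simulated trajectory stands in for the measured data the method would use online), not the paper's nonlinear algorithms.

```python
# A hedged sketch of integral policy iteration on a scalar
# linear-quadratic problem. Plant, cost weights, and reinforcement
# interval are illustrative assumptions.
a, b, q, r = 1.0, 1.0, 1.0, 1.0     # plant dx/dt = a*x + b*u, cost q*x^2 + r*u^2
K = 3.0                              # initial admissible (stabilizing) gain
T, dt = 0.5, 1e-4                    # reinforcement interval and Euler step

for _ in range(10):
    # Policy evaluation via the integral temporal difference:
    # P*x(t)^2 = integral over [t, t+T] of (q*x^2 + r*u^2) + P*x(t+T)^2.
    # The plant parameter `a` is used only to generate the trajectory
    # (standing in for measurements), not in the evaluation itself.
    x, cost = 1.0, 0.0
    for _ in range(round(T / dt)):
        u = -K * x
        cost += (q * x * x + r * u * u) * dt
        x += (a * x + b * u) * dt
    P = cost / (1.0 - x * x)
    K = b * P / r                    # policy improvement

print(K)   # approaches the Riccati solution 1 + sqrt(2) ≈ 2.414
```

The evaluation step solves for the value parameter P using only the accumulated running cost and two state samples, which is the sense in which the iteration is (partially) model free.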

  8. Reinforcement learning for partially observable dynamic processes: adaptive dynamic programming using measured output data.

    Science.gov (United States)

    Lewis, F L; Vamvoudakis, Kyriakos G

    2011-02-01

Approximate dynamic programming (ADP) is a class of reinforcement learning methods that have shown their importance in a variety of applications, including feedback control of dynamical systems. ADP generally requires full information about the system internal states, which is usually not available in practical situations. In this paper, we show how to implement ADP methods using only measured input/output data from the system. Linear dynamical systems with deterministic behavior are considered herein, which are systems of great interest in the control system community. In control system theory, these types of methods are referred to as output feedback (OPFB). The stochastic equivalent of the systems dealt with in this paper is a class of partially observable Markov decision processes. We develop both policy iteration and value iteration algorithms that converge to an optimal controller that requires only OPFB. It is shown that, similar to Q-learning, the new methods have the important advantage that knowledge of the system dynamics is not needed for the implementation of these learning algorithms or for the OPFB control. Only the order of the system, as well as an upper bound on its "observability index," must be known. The learned OPFB controller is in the form of a polynomial autoregressive moving-average controller that has equivalent performance with the optimal state variable feedback gain.

  9. Design issues of a reinforcement-based self-learning fuzzy controller for petrochemical process control

    Science.gov (United States)

    Yen, John; Wang, Haojin; Daugherity, Walter C.

    1992-01-01

    Fuzzy logic controllers have some often-cited advantages over conventional techniques such as PID control, including easier implementation, accommodation to natural language, and the ability to cover a wider range of operating conditions. One major obstacle that hinders the broader application of fuzzy logic controllers is the lack of a systematic way to develop and modify their rules; as a result the creation and modification of fuzzy rules often depends on trial and error or pure experimentation. One of the proposed approaches to address this issue is a self-learning fuzzy logic controller (SFLC) that uses reinforcement learning techniques to learn the desirability of states and to adjust the consequent part of its fuzzy control rules accordingly. Due to the different dynamics of the controlled processes, the performance of a self-learning fuzzy controller is highly contingent on its design. The design issue has not received sufficient attention. The issues related to the design of a SFLC for application to a petrochemical process are discussed, and its performance is compared with that of a PID and a self-tuning fuzzy logic controller.

  10. DAT1-Genotype and Menstrual Cycle, but Not Hormonal Contraception, Modulate Reinforcement Learning: Preliminary Evidence.

    Science.gov (United States)

    Jakob, Kristina; Ehrentreich, Hanna; Holtfrerich, Sarah K C; Reimers, Luise; Diekhof, Esther K

    2018-01-01

Hormone by genotype interactions have been widely ignored by cognitive neuroscience. Yet, the dependence of cognitive performance on both baseline dopamine (DA) and current 17β-estradiol (E2) level argues for their combined effect also in the context of reinforcement learning. Here, we assessed how the interaction between the natural rise of E2 in the late follicular phase (FP) and the 40 base-pair variable number tandem repeat polymorphism of the dopamine transporter (DAT1) affects reinforcement learning capacity. 30 women with a regular menstrual cycle performed a probabilistic feedback learning task twice during the early and late FP. In addition, 39 women, who took hormonal contraceptives (HC) to suppress natural ovulation, were tested during the "pill break" and the intake phase of HC. The present data show that DAT1-genotype may interact with transient hormonal state, but only in women with a natural menstrual cycle. We found that carriers of the 9-repeat allele (9RP) experienced a significant decrease in the ability to avoid punishment from early to late FP. Neither homozygote subjects of the 10RP allele, nor subjects from the HC group showed a change in behavior between phases. These data are consistent with neurobiological studies that found that rising E2 may reverse DA transporter function and could enhance DA efflux, which would in turn reduce punishment sensitivity particularly in subjects with a higher transporter density to begin with. Taken together, the present results, although based on a small sample, add to the growing understanding of the complex interplay between different physiological modulators of dopaminergic transmission. They may not only point out the necessity to control for hormonal state in behavioral genetic research, but may offer new starting points for studies in clinical settings.

  11. DAT1-Genotype and Menstrual Cycle, but Not Hormonal Contraception, Modulate Reinforcement Learning: Preliminary Evidence

    Directory of Open Access Journals (Sweden)

    Kristina Jakob

    2018-02-01

Full Text Available Hormone by genotype interactions have been widely ignored by cognitive neuroscience. Yet, the dependence of cognitive performance on both baseline dopamine (DA) and current 17β-estradiol (E2) level argues for their combined effect also in the context of reinforcement learning. Here, we assessed how the interaction between the natural rise of E2 in the late follicular phase (FP) and the 40 base-pair variable number tandem repeat polymorphism of the dopamine transporter (DAT1) affects reinforcement learning capacity. 30 women with a regular menstrual cycle performed a probabilistic feedback learning task twice during the early and late FP. In addition, 39 women, who took hormonal contraceptives (HC) to suppress natural ovulation, were tested during the “pill break” and the intake phase of HC. The present data show that DAT1-genotype may interact with transient hormonal state, but only in women with a natural menstrual cycle. We found that carriers of the 9-repeat allele (9RP) experienced a significant decrease in the ability to avoid punishment from early to late FP. Neither homozygote subjects of the 10RP allele, nor subjects from the HC group showed a change in behavior between phases. These data are consistent with neurobiological studies that found that rising E2 may reverse DA transporter function and could enhance DA efflux, which would in turn reduce punishment sensitivity particularly in subjects with a higher transporter density to begin with. Taken together, the present results, although based on a small sample, add to the growing understanding of the complex interplay between different physiological modulators of dopaminergic transmission. They may not only point out the necessity to control for hormonal state in behavioral genetic research, but may offer new starting points for studies in clinical settings.

  12. Long term effects of aversive reinforcement on colour discrimination learning in free-flying bumblebees.

    Directory of Open Access Journals (Sweden)

    Miguel A Rodríguez-Gironés

Full Text Available The results of behavioural experiments provide important information about the structure and information-processing abilities of the visual system. Nevertheless, if we want to infer from behavioural data how the visual system operates, it is important to know how different learning protocols affect performance and to devise protocols that minimise noise in the response of experimental subjects. The purpose of this work was to investigate how reinforcement schedule and individual variability affect the learning process in a colour discrimination task. Free-flying bumblebees were trained to discriminate between two perceptually similar colours. The target colour was associated with sucrose solution, and the distractor could be associated with water or quinine solution throughout the experiment, or with one substance during the first half of the experiment and the other during the second half. Both acquisition and final performance of the discrimination task (measured as proportion of correct choices) were determined by the choice of reinforcer during the first half of the experiment: regardless of whether bees were trained with water or quinine during the second half of the experiment, bees trained with quinine during the first half learned the task faster and performed better during the whole experiment. Our results confirm that the choice of stimuli used during training affects the rate at which colour discrimination tasks are acquired and show that early contact with a strongly aversive stimulus can be sufficient to maintain high levels of attention during several hours. On the other hand, bees which took more time to decide on which flower to alight were more likely to make correct choices than bees which made fast decisions. This result supports the existence of a trade-off between foraging speed and accuracy, and highlights the importance of measuring choice latencies during behavioural experiments focusing on cognitive abilities.

  13. Stochastic collusion and the power law of learning: a general reinforcement learning model of cooperation

    NARCIS (Netherlands)

    Flache, A.

    2002-01-01

    Concerns about models of cultural adaptation as analogs of genetic selection have led cognitive game theorists to explore learning-theoretic specifications. Two prominent examples, the Bush-Mosteller stochastic learning model and the Roth-Erev payoff-matching model, are aligned and integrated as
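The Bush-Mosteller model aligned in this work updates a choice probability toward 1 after satisfactory payoffs and toward 0 after dissatisfaction, relative to an aspiration level. A minimal sketch of that update and of "stochastic collusion" in a repeated Prisoner's Dilemma follows; the payoffs, aspiration level, and learning rate are illustrative assumptions, not the paper's calibration.

```python
import random

def bm_update(p, stimulus, alpha):
    # Bush-Mosteller linear-operator update of a choice probability:
    # positive stimulus (payoff above aspiration) moves p toward 1,
    # negative stimulus moves it toward 0. stimulus is scaled to [-1, 1].
    if stimulus >= 0:
        return p + alpha * stimulus * (1.0 - p)
    return p + alpha * stimulus * p

# Hedged sketch of stochastic collusion in a repeated Prisoner's
# Dilemma: the aspiration sits between the punishment and reward
# payoffs, so mutual cooperation satisfies and mutual defection
# dissatisfies. All parameters are illustrative.
R, S, T, P = 3.0, 0.0, 5.0, 1.0
aspiration, alpha = 2.0, 0.5
scale = max(abs(v - aspiration) for v in (R, S, T, P))

def payoff(mine, other):
    return {(1, 1): R, (1, 0): S, (0, 1): T, (0, 0): P}[(mine, other)]

random.seed(1)
p1 = p2 = 0.5                     # each agent's probability of cooperating
for _ in range(500):
    a1 = int(random.random() < p1)
    a2 = int(random.random() < p2)
    s1 = (payoff(a1, a2) - aspiration) / scale
    s2 = (payoff(a2, a1) - aspiration) / scale
    # Reinforce the probability of the action actually taken.
    p1 = bm_update(p1, s1, alpha) if a1 else 1.0 - bm_update(1.0 - p1, s1, alpha)
    p2 = bm_update(p2, s2, alpha) if a2 else 1.0 - bm_update(1.0 - p2, s2, alpha)

print(round(p1, 3), round(p2, 3))
```

Because the update is linear in p, repeated reinforcement of the same action produces the decelerating, power-law-like learning curves the title refers to.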

14. Model of a ubiquitous, adaptive, and context-aware multi-agent system for delivering personalized recommendations of educational resources based on ontologies

    OpenAIRE

    Salazar Ospina, Oscar Mauricio

    2015-01-01

The great diversity of information-access mechanisms offered by today's mobile technology leaves traditional computing unable to meet users' needs. From this arises the need to develop e-learning systems that can obtain information adapted to user profiles, information that is reliable and retrieved in real time according to the requirements of the environment. The Multi-Agent System (SMA) proposed in this thesis aims to...

  15. The "proactive" model of learning: Integrative framework for model-free and model-based reinforcement learning utilizing the associative learning-based proactive brain concept.

    Science.gov (United States)

    Zsuga, Judit; Biro, Klara; Papp, Csaba; Tajti, Gabor; Gesztelyi, Rudolf

    2016-02-01

Reinforcement learning (RL) is a powerful concept underlying forms of associative learning governed by the use of a scalar reward signal, with learning taking place if expectations are violated. RL may be assessed using model-based and model-free approaches. Model-based reinforcement learning involves the amygdala, the hippocampus, and the orbitofrontal cortex (OFC). The model-free system involves the pedunculopontine-tegmental nucleus (PPTgN), the ventral tegmental area (VTA) and the ventral striatum (VS). Based on the functional connectivity of the VS, the model-free and model-based RL systems center on the VS, which computes value by integrating model-free signals (received as reward prediction errors) with model-based reward-related input. Using the concept of a reinforcement learning agent, we propose that the VS serves as the value-function component of the RL agent. Regarding the model utilized for model-based computations, we turned to the proactive brain concept, which offers a ubiquitous function for the default network based on its great functional overlap with contextual associative areas. Hence, by means of the default network the brain continuously organizes its environment into context frames, enabling the formulation of analogy-based associations that are turned into predictions of what to expect. The OFC integrates reward-related information into context frames upon computing reward expectation by compiling stimulus-reward and context-reward information offered by the amygdala and hippocampus, respectively. Furthermore, we suggest that the integration of model-based expectations regarding reward into the value signal is further supported by the efferents of the OFC that reach structures canonical for model-free learning (e.g., the PPTgN, VTA, and VS). (c) 2016 APA, all rights reserved.

  16. Research on monitoring system of water resources in Shiyang River Basin based on Multi-agent

    Science.gov (United States)

    Zhao, T. H.; Yin, Z.; Song, Y. Z.

    2012-11-01

The Shiyang River Basin is the most populous inland river basin in the Hexi region of Gansu Province; its economy is relatively developed, its water resources are developed and utilized to the highest degree, its water conflicts are the most prominent, and its ecological and environmental problems are among the most severe. The contradiction between people and water in the basin is constantly worsening. This paper combines multi-agent technology with water-resource monitoring: a management center, a telemetry Agent Federation, and the communication network between them together compose the Shiyang River Basin water resources monitoring system. The intelligence and communication coordination of the multi-agent system are exploited to improve the timeliness of water resources monitoring in the basin.

  17. Controllability of multi-agent systems with time-delay in state and switching topology

    Science.gov (United States)

    Ji, Zhijian; Wang, Zidong; Lin, Hai; Wang, Zhen

    2010-02-01

In this article, the controllability issue is addressed for an interconnected system of multiple agents. The network associated with the system has a leader-follower structure, with some agents taking the leader role and others being followers interconnected via the neighbour-based rule. Sufficient conditions are derived for the controllability of multi-agent systems with time-delay in state, and a graph-based uncontrollability topology structure is revealed. Both single and double integrator dynamics are considered. For switching topology, two algebraic necessary and sufficient conditions are derived for the controllability of multi-agent systems. Several examples are also presented to illustrate how to control the system to shape into the desired configurations.
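In the delay-free, fixed-topology case, the leader-follower controllability question reduces to a Kalman rank test on a partition of the graph Laplacian. A small sketch under that simplification (the path-graph example is illustrative, not taken from the article):

```python
import numpy as np

def follower_controllable(L, leaders):
    # Followers obey x' = -Lff x - Lfl u under the neighbour-based rule,
    # with the leaders' states acting as inputs; the system is
    # controllable iff the pair (-Lff, -Lfl) passes the Kalman rank
    # test. Delay-free, fixed topology: a simplified sketch of the
    # setting studied in the article.
    n = L.shape[0]
    F = [i for i in range(n) if i not in leaders]
    A = -L[np.ix_(F, F)]
    B = -L[np.ix_(F, leaders)]
    ctrb = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(len(F))])
    return np.linalg.matrix_rank(ctrb) == len(F)

# Path graph 0-1-2 (hypothetical example).
L = np.array([[1., -1., 0.],
              [-1., 2., -1.],
              [0., -1., 1.]])
print(follower_controllable(L, [0]))  # leader at an end: controllable
print(follower_controllable(L, [1]))  # leader in the middle: symmetric followers, not controllable
```

The middle-leader failure illustrates the graph-based uncontrollability structure: the two followers are interchangeable by a graph symmetry, so no input can drive them apart.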

  18. Consensus pursuit of heterogeneous multi-agent systems under a directed acyclic graph

    Science.gov (United States)

    Yan, Jing; Guan, Xin-Ping; Luo, Xiao-Yuan

    2011-04-01

This paper is concerned with the cooperative target pursuit problem by multiple agents based on a directed acyclic graph. The target appears at a random location and moves only when sensed by the agents, and the agents pursue the target once they detect its existence. Since the ability of each agent may be different, we consider heterogeneous multi-agent systems. According to the topology of the multi-agent systems, a novel consensus-based control law is proposed, where the target and the agents are modeled as a leader and followers, respectively. Based on Mason's rule and signal flow graph analysis, convergence conditions are provided to show that the agents can catch the target in finite time. Finally, simulation studies are provided to verify the effectiveness of the proposed approach.
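The neighbour-based pursuit law can be sketched in discrete time with a static target; the topology, gains, and static-target simplification are assumptions for illustration (the paper treats a moving target and derives finite-time capture conditions via Mason's rule).

```python
# Hedged sketch of consensus-based pursuit: a static target (leader) and
# three heterogeneous followers connected by a directed acyclic graph.
# Gains and topology are illustrative assumptions.
target = 5.0
x = [0.0, 1.0, -2.0]                 # follower positions
gain = [0.20, 0.15, 0.25]            # heterogeneous agent gains
# neighbours[i]: indices follower i listens to (-1 denotes the target);
# the graph is acyclic: no follower feeds back to the target.
neighbours = [[-1], [0, -1], [1]]

def value(j):
    return target if j == -1 else x[j]

for _ in range(300):
    nxt = list(x)
    for i, nbrs in enumerate(neighbours):
        nxt[i] = x[i] + gain[i] * sum(value(j) - x[i] for j in nbrs) / len(nbrs)
    x = nxt

print([round(v, 4) for v in x])   # all followers close on the target at 5.0
```

Because the information graph is acyclic and rooted at the target, each follower's error contracts geometrically once its upstream neighbours have converged.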

  19. Formation of Robust Multi-Agent Networks through Self-Organizing Random Regular Graphs

    KAUST Repository

    Yasin Yazicioǧlu, A.; Egerstedt, Magnus; Shamma, Jeff S.

    2015-01-01

Multi-Agent networks are often modeled as interaction graphs, where the nodes represent the agents and the edges denote some direct interactions. The robustness of a multi-Agent network to perturbations such as failures, noise, or malicious attacks largely depends on the corresponding graph. In many applications, networks are desired to have well-connected interaction graphs with a relatively small number of links. One family of such graphs is the random regular graphs. In this paper, we present a decentralized scheme for transforming any connected interaction graph with a possibly non-integer average degree of k into a connected random m-regular graph for some m ∈ [k, k+2]. Accordingly, the agents improve the robustness of the network while maintaining a similar number of links as the initial configuration by locally adding or removing some edges. © 2015 IEEE.

  20. Consensus of Multi-Agent Systems with Prestissimo Scale-Free Networks

    International Nuclear Information System (INIS)

    Yang Hongyong; Lu Lan; Cao Kecai; Zhang Siying

    2010-01-01

In this paper, the relation between network topology and the speed of consensus in multi-agent systems is studied. A consensus-prestissimo scale-free network model with static preferential-consensus attachment is presented by rewiring the links of a regular network. The effects of the static preferential-consensus BA network on the algebraic connectivity of the topology graph are compared with those of the regular network. The robustness gain to delay is analyzed for variable network topologies of the same scale. The time to reach consensus is studied for the dynamic network with and without communication delays. Computer simulations validate that the speed of convergence of multi-agent systems can be greatly improved in the preferential-consensus BA network model with different configurations. (interdisciplinary physics and related areas of science and technology)
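The quantity connecting topology to consensus speed here is the algebraic connectivity: the second-smallest eigenvalue of the graph Laplacian. A small sketch, using shortcut edges added to a ring as a crude stand-in for preferential rewiring (an assumption for illustration, not the paper's attachment rule):

```python
import numpy as np

def laplacian(n, edges):
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1; L[j, j] += 1
        L[i, j] -= 1; L[j, i] -= 1
    return L

def algebraic_connectivity(L):
    # Second-smallest Laplacian eigenvalue (Fiedler value); it lower-
    # bounds the consensus convergence rate and is 0 iff the graph is
    # disconnected.
    return np.linalg.eigvalsh(L)[1]

n = 10
ring = [(i, (i + 1) % n) for i in range(n)]
shortcuts = [(0, 5), (2, 7), (4, 9)]          # illustrative added links
lam_ring = algebraic_connectivity(laplacian(n, ring))
lam_aug = algebraic_connectivity(laplacian(n, ring + shortcuts))
print(lam_ring, lam_aug)   # shortcuts raise the Fiedler value
```

A larger Fiedler value for the same network scale is exactly the effect the prestissimo construction aims for: faster convergence and more robustness to delay.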

  1. Endogenous Price Bubbles in a Multi-Agent System of the Housing Market.

    Science.gov (United States)

    Kouwenberg, Roy; Zwinkels, Remco C J

    2015-01-01

    Economic history shows a large number of boom-bust cycles, with the U.S. real estate market as one of the latest examples. Classical economic models have not been able to provide a full explanation for this type of market dynamics. Therefore, we analyze home prices in the U.S. using an alternative approach, a multi-agent complex system. Instead of the classical assumptions of agent rationality and market efficiency, agents in the model are heterogeneous, adaptive, and boundedly rational. We estimate the multi-agent system with historical house prices for the U.S. market. The model fits the data well and a deterministic version of the model can endogenously produce boom-and-bust cycles on the basis of the estimated coefficients. This implies that trading between agents themselves can create major price swings in absence of fundamental news.
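A stylized fundamentalist-chartist version of such a heterogeneous-agent price model can be sketched as follows; the parameters and the switching rule are illustrative assumptions, not the coefficients estimated in the paper.

```python
# Hedged, stylized heterogeneous-agent housing market: fundamentalists
# push the price toward a fundamental value F, chartists extrapolate the
# last price change, and chartists withdraw (boundedly rational
# switching) as mispricing grows. All parameters are illustrative.
F = 100.0                    # fundamental house price
a, b, c = 0.3, 1.1, 10.0     # reversion speed, extrapolation, switch scale
prices = [101.0, 101.0]

for _ in range(500):
    p, p_prev = prices[-1], prices[-2]
    e = p - F
    damp = c * c / (c * c + e * e)   # chartist weight falls with |e|
    demand = a * (F - p) + damp * b * (p - p_prev)
    prices.append(p + demand)

print(min(prices), max(prices))   # endogenous swings around F = 100
```

Near the fundamental, trend extrapolation dominates and destabilizes the price; far from it, the switching rule restores mean reversion, so the deterministic dynamics cycle endogenously rather than settle, mimicking boom-bust behavior without any news shocks.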

  2. Endogenous Price Bubbles in a Multi-Agent System of the Housing Market.

    Directory of Open Access Journals (Sweden)

    Roy Kouwenberg

    Full Text Available Economic history shows a large number of boom-bust cycles, with the U.S. real estate market as one of the latest examples. Classical economic models have not been able to provide a full explanation for this type of market dynamics. Therefore, we analyze home prices in the U.S. using an alternative approach, a multi-agent complex system. Instead of the classical assumptions of agent rationality and market efficiency, agents in the model are heterogeneous, adaptive, and boundedly rational. We estimate the multi-agent system with historical house prices for the U.S. market. The model fits the data well and a deterministic version of the model can endogenously produce boom-and-bust cycles on the basis of the estimated coefficients. This implies that trading between agents themselves can create major price swings in absence of fundamental news.

  3. Robust Consensus of Multi-Agent Systems with Uncertain Exogenous Disturbances

    International Nuclear Information System (INIS)

    Yang Hong-Yong; Guo Lei; Han Chao

    2011-01-01

The objective of this paper is to investigate the consensus of multi-agent systems with nonlinear coupling functions and external disturbances. The disturbance consists of two parts: one part is supposed to be generated by an exogenous system, which is not required to be neutrally stable as in output regulation theory; the other part is the modeling uncertainty in the exogenous disturbance system. A novel composite disturbance-observer-based control (DOBC) and H∞ control scheme is presented, so that the disturbance generated by the exogenous system can be estimated and compensated and the consensus of multi-agent systems with fixed and switching graphs can be reached by using the H∞ control law. Simulations demonstrate the advantages of the proposed DOBC and H∞ control scheme. (interdisciplinary physics and related areas of science and technology)

  4. Formation of Robust Multi-Agent Networks through Self-Organizing Random Regular Graphs

    KAUST Repository

    Yasin Yazicioǧlu, A.

    2015-11-25

Multi-Agent networks are often modeled as interaction graphs, where the nodes represent the agents and the edges denote some direct interactions. The robustness of a multi-Agent network to perturbations such as failures, noise, or malicious attacks largely depends on the corresponding graph. In many applications, networks are desired to have well-connected interaction graphs with a relatively small number of links. One family of such graphs is the random regular graphs. In this paper, we present a decentralized scheme for transforming any connected interaction graph with a possibly non-integer average degree of k into a connected random m-regular graph for some m ∈ [k, k+2]. Accordingly, the agents improve the robustness of the network while maintaining a similar number of links as the initial configuration by locally adding or removing some edges. © 2015 IEEE.

  5. Endogenous Price Bubbles in a Multi-Agent System of the Housing Market

    Science.gov (United States)

    2015-01-01

    Economic history shows a large number of boom-bust cycles, with the U.S. real estate market as one of the latest examples. Classical economic models have not been able to provide a full explanation for this type of market dynamics. Therefore, we analyze home prices in the U.S. using an alternative approach, a multi-agent complex system. Instead of the classical assumptions of agent rationality and market efficiency, agents in the model are heterogeneous, adaptive, and boundedly rational. We estimate the multi-agent system with historical house prices for the U.S. market. The model fits the data well and a deterministic version of the model can endogenously produce boom-and-bust cycles on the basis of the estimated coefficients. This implies that trading between agents themselves can create major price swings in absence of fundamental news. PMID:26107740

  6. Research on monitoring system of water resources in Shiyang River Basin based on Multi-agent

    International Nuclear Information System (INIS)

Zhao, T H; Yin, Z; Song, Y Z

    2012-01-01

The Shiyang River Basin is the most populous inland river basin in the Hexi region of Gansu Province; its economy is relatively developed, its water resources are developed and utilized to the highest degree, its water conflicts are the most prominent, and its ecological and environmental problems are among the most severe. The contradiction between people and water in the basin is constantly worsening. This paper combines multi-agent technology with water-resource monitoring: a management center, a telemetry Agent Federation, and the communication network between them together compose the Shiyang River Basin water resources monitoring system. The intelligence and communication coordination of the multi-agent system are exploited to improve the timeliness of water resources monitoring in the basin.

  7. Multi-agent approach for power system in a smart grid protection context

    DEFF Research Database (Denmark)

    Abedini, Reza; Pinto, Tiago; Morais, Hugo

    2013-01-01

With the increasing penetration of electricity applications in society and the dependence of most appliances on electricity, a high level of reliability becomes essential. On one hand, the deregulation of the electricity market in production, transmission, and distribution has led to the emergence of competitive electricity markets; on the other hand, the penetration of Distributed Generation (DG) is increasing because of environmental issues, diminishing fossil fuel reserves, and rising fuel prices. Together these developments have made microgrids more attractive. Microgrids are considered part of the SmartGrid system, accommodating DGs as well... This paper proposes a new approach for protection in a microgrid environment as a part of a SmartGrid: the Multi-agent system for Protections Coordination (MAS-ProteC), integrated in MASGriP (Multi-Agent Smart Grid Platform), providing protection services within network operation in the SmartGrid electricity market.

  8. A Novel Secondary Control for Microgrid Based on Synergetic Control of Multi-Agent System

    Directory of Open Access Journals (Sweden)

    Zhiwen Yu

    2016-03-01

Full Text Available In power systems, secondary control is a very useful way to restore the system frequency and voltage to their rated values. This paper proposes a secondary frequency and voltage control of islanded microgrids based on the distributed synergetic control of multi-agent systems. Since each distributed generation unit requires only its own information and that of its neighbors, the proposed secondary control is fully distributed. The system is more reliable because the central controller and the complex communication network are not required in the distributed structure. Based on multi-agent systems, the dynamic model is established, and distributed synergetic control algorithms are given to design the secondary control of the islanded microgrid. Meanwhile, the system is globally asymptotically stable under the proposed control, which is proved by the direct Lyapunov method. Simulation results for a test microgrid are given to verify the effectiveness of the proposed control.
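The distributed idea (each unit correcting its set-point using only its own and its neighbours' information) can be sketched with a simple pinned-consensus update; the gains, topology, droop deviations, and linear update law are illustrative assumptions, not the paper's synergetic controller.

```python
# Hedged sketch of distributed secondary frequency control: each DG adds
# a correction u_i to its droop-controlled frequency, updated from its
# own deviation (only if "pinned" to the nominal value) and from its
# neighbours. All values are illustrative assumptions.
nominal = 50.0
droop_dev = [-0.3, -0.5, -0.4]          # steady-state droop deviations (Hz)
u = [0.0, 0.0, 0.0]                     # secondary corrections
neighbours = [[1], [0, 2], [1]]         # line communication graph 0-1-2
pinned = [1.0, 0.0, 0.0]                # only DG 0 knows the nominal value
eps = 0.2

def freq(i):
    return nominal + droop_dev[i] + u[i]

for _ in range(400):
    nxt = list(u)
    for i in range(3):
        err = pinned[i] * (freq(i) - nominal)
        err += sum(freq(i) - freq(j) for j in neighbours[i])
        nxt[i] = u[i] - eps * err
    u = nxt

print([round(freq(i), 4) for i in range(3)])   # restored to ~50.0 Hz
```

At the fixed point all frequencies agree and the pinned term forces the common value to the nominal one, which is why a single central reference suffices even though the protocol is fully distributed.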

  9. 11th International Conference on Practical Applications of Agents and Multi-Agent Systems

    CERN Document Server

    Hermoso, Ramon; Moreno, María; Rodríguez, Juan; Hirsch, Benjamin; Mathieu, Philippe; Campbell, Andrew; Suarez-Figueroa, Mari; Ortega, Alfonso; Adam, Emmanuel; Navarro, Elena

    2013-01-01

Research on Agents and Multi-agent Systems has matured during the last decade and many effective applications of this technology are now deployed. PAAMS provides an international forum to present and discuss the latest scientific developments and their effective applications, to assess the impact of the approach, and to facilitate technology transfer. PAAMS started as a local initiative but has since grown to become the international yearly platform to present, discuss, and disseminate the latest developments and the most important outcomes related to real-world applications. It provides a unique opportunity to bring multi-disciplinary experts, academics and practitioners together to exchange their experience in the development and deployment of agents and multi-agent systems. PAAMS intends to bring together researchers and developers from industry and the academic world to report on the latest scientific and technical advances on the application of multi-agent systems, to discuss and debate the major iss...

  10. Coordination between Generation and Transmission Maintenance Scheduling by Means of Multi-agent Technique

    Science.gov (United States)

    Nagata, Takeshi; Tao, Yasuhiro; Utatani, Masahiro; Sasaki, Hiroshi; Fujita, Hideki

This paper proposes a multi-agent approach to maintenance scheduling in restructured power systems. The restructuring of the electric power industry has resulted in market-based approaches for unbundling a multitude of services provided by self-interested entities such as power generating companies (GENCOs), transmission providers (TRANSCOs) and distribution companies (DISCOs). The Independent System Operator (ISO) is responsible for the security of the system operation. The schedules submitted to the ISO by GENCOs and TRANSCOs should satisfy security and reliability constraints. The proposed method consists of several GENCO Agents (GAGs), TRANSCO Agents (TAGs) and an ISO Agent (IAG). The IAG's role in maintenance scheduling is limited to ensuring that the submitted schedules do not cause transmission congestion or endanger the system reliability. From the simulation results, it can be seen that the proposed multi-agent approach could coordinate generation and transmission maintenance schedules.

  11. From pattern formation to material computation multi-agent modelling of physarum polycephalum

    CERN Document Server

    Jones, Jeff

    2015-01-01

    This book addresses topics of mobile multi-agent systems, pattern formation, biological modelling, artificial life, unconventional computation, and robotics. The behaviour of a simple organism which is capable of remarkable biological and computational feats that seem to transcend its simple component parts is examined and modelled. In this book the following question is asked: How can something as simple as Physarum polycephalum - a giant amoeboid single-celled organism which does not possess any neural tissue, fixed skeleton or organised musculature - approximate complex computational behaviour during its foraging, growth and adaptation of its amorphous body plan, and with such limited resources? To answer this question, the same apparent limitations as faced by the organism are applied: using only simple components with local interactions. A synthesis approach is adopted and a mobile multi-agent system with very simple individual behaviours is employed. It is shown that their interactions yield emergent beha...

  12. Agent and multi-Agent systems in distributed systems digital economy and e-commerce

    CERN Document Server

    Hartung, Ronald

    2013-01-01

    Information and communication technology, in particular artificial intelligence, can be used to support the economy and commerce using digital means. This book is about agents and multi-agent distributed systems applied to the digital economy and e-commerce, in order to meet, improve, and overcome the challenges of that sphere. Agent and multi-agent solutions are applied in real-life developments that address the problems of distributed systems. The book presents solutions for both technology and applications, illustrating the possible uses of agents in the enterprise domain and covering the design and analytic methods needed to provide the solid foundation required for practical systems. More specifically, the book provides solutions for the digital economy, e-sourcing clusters in the network economy, and knowledge exchange between agents, applicable to online trading agents, together with security solutions for both the digital economy and e-commerce. Furthermore, it offers soluti...

  13. 6th Workshop on Service Orientation in Holonic and Multi-Agent Manufacturing

    CERN Document Server

    Trentesaux, Damien; Thomas, André; Leitão, Paulo; Oliveira, José

    2017-01-01

    The book offers an integrated vision on Cloud and HPC, Big Data, Analytics and virtualization in computing-oriented manufacturing, combining information and communication technologies, service-oriented control of holonic architectures as well as enterprise integration solutions based on SOA principles. It is structured in eight parts, each one grouping research and trends in digital manufacturing and service oriented manufacturing control: Cloud and Cyber-Physical Systems for Smart Manufacturing, Reconfigurable and Self-organized Multi-Agent Systems for Industry and Service, Sustainability Issues in Intelligent Manufacturing Systems, Holonic and Multi-agent System Design for Industry and Service, Should Intelligent Manufacturing Systems be Dependable and Safe?, Service-oriented Management and Control of Manufacturing Systems, Engineering and Human Integration in Flexible and Reconfigurable Industrial Systems, Virtualization and Simulation in Computing-oriented Industry and Service.

  14. A multi-agent based intelligent configuration method for aircraft fleet maintenance personnel

    Directory of Open Access Journals (Sweden)

    Feng Qiang

    2014-04-01

    Full Text Available A multi-agent based fleet maintenance personnel configuration method is proposed to solve the mission-oriented aircraft fleet maintenance personnel configuration problem. The maintenance process of an aircraft fleet is analyzed first. In this process, each aircraft contains multiple parts, and different parts are repaired by personnel with different majors and levels. The factors involved in the maintenance process, and the relationships among them, are analyzed and discussed. Then the whole maintenance process is described as a 3-layer multi-agent system (MAS) model. A communication and reasoning strategy among the agents is put forward, and a fleet maintenance personnel configuration algorithm is proposed based on the contract net protocol (CNP). Finally, a fleet of 10 aircraft is studied for verification purposes, with a mission type of 3 waves of continuous dispatch. Compared with traditional methods that can only provide configuration results, the proposed method can provide optimal maintenance strategies as well.
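
    The contract-net style of allocation mentioned above (announce a task, collect bids, award to the best bidder) can be sketched in a few lines. The `RepairTask` and `PersonnelAgent` classes and the bid rule below are invented for illustration and are not the paper's algorithm or the FIPA specification.

```python
# Minimal sketch of contract net protocol (CNP) task allocation for
# maintenance personnel. All names and the bid function are illustrative
# assumptions, not the paper's method.
from dataclasses import dataclass

@dataclass
class RepairTask:
    part: str          # part needing repair
    major: str         # required maintenance specialty
    level: int         # minimum skill level

@dataclass
class PersonnelAgent:
    name: str
    major: str
    level: int
    busy: bool = False

    def bid(self, task: RepairTask):
        """Return a bid (lower is better) or None if unqualified or busy."""
        if self.busy or self.major != task.major or self.level < task.level:
            return None
        # Prefer the least over-qualified available worker.
        return self.level - task.level

def contract_net(task: RepairTask, agents: list) -> "PersonnelAgent | None":
    """Manager announces the task, collects bids, awards to the best bidder."""
    bids = [(a.bid(task), a) for a in agents]
    bids = [(b, a) for b, a in bids if b is not None]
    if not bids:
        return None
    _, winner = min(bids, key=lambda ba: ba[0])
    winner.busy = True
    return winner
```

    A manager would call `contract_net` once per announced repair task; unqualified or busy agents simply decline to bid.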

  15. Realization on the interactive remote video conference system based on multi-Agent

    Directory of Open Access Journals (Sweden)

    Zheng Yan

    2016-01-01

    Full Text Available To enable people at different places to participate in the same conference and to speak and discuss freely, an interactive remote video conferencing system is designed and realized based on multi-Agent collaboration. FEC (forward error correction) and tree P2P technology are first used to build a live conference structure for transferring audio and video data; a branch conference node can then speak and discuss by applying to become an interactive focus; and the introduction of multi-Agent collaboration technology improves the system's robustness. Experiments showed that, under normal network conditions, the system can support 350 branch conference nodes broadcasting live simultaneously, with smooth audio and video quality, and can therefore support large-scale remote video conferences.

  16. A Reinforcement Learning Approach to Call Admission Control in HAPS Communication System

    Directory of Open Access Journals (Sweden)

    Ni Shu Yan

    2017-01-01

    Full Text Available In a communication system based on a high altitude platform station (HAPS), large changes in link capacity and in the number of users, caused by the movement of both the platform and the users, result in a high handover dropping rate and reduced resource utilization. To solve these problems, this paper proposes an adaptive call admission control strategy based on a reinforcement learning approach. The goal of this strategy is to maximize the long-term gain of the system, with the introduction of cross-layer interaction and service downgrading. To admit different traffic types adaptively, the access utilities of handover traffic and new-call traffic are designed for the different states of the communication system. Numerical simulation results show that the proposed call admission control strategy can enhance bandwidth resource utilization and the performance of handover traffic.
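
    The idea of learning an admission policy that favors handover traffic can be sketched with tabular Q-learning. The (free-channels, call-type) state, the reward values, and the 5-channel departure model below are invented stand-ins for the paper's HAPS-specific design, not its actual algorithm.

```python
# Hedged sketch: Q-learning call admission control on a toy channel model.
# Handover calls (ctype 1) are weighted more heavily than new calls (ctype 0).
import random

ACTIONS = ("accept", "reject")

def q_learning_cac(episodes=2000, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = {}
    def q(s, a):
        return Q.get((s, a), 0.0)
    for _ in range(episodes):
        free = 5                      # free channels
        ctype = rng.randint(0, 1)     # 0 = new call, 1 = handover call
        for _ in range(20):           # one episode = 20 call arrivals
            s = (free, ctype)
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: q(s, x))
            if a == "accept" and free > 0:
                free -= 1
                r = 2.0 if ctype == 1 else 1.0    # dropping handovers is costlier
            elif a == "accept":
                r = -5.0                           # no capacity: forced block
            else:
                r = -2.0 if ctype == 1 else -0.5   # rejection penalty
            if free < 5 and rng.random() < 0.3:    # an ongoing call may depart
                free += 1
            ctype = rng.randint(0, 1)
            s2 = (free, ctype)
            target = r + gamma * max(q(s2, x) for x in ACTIONS)
            Q[(s, a)] = q(s, a) + alpha * (target - q(s, a))
    return Q
```

    After training, the learned Q-values encode the admission policy: accept while capacity remains, and never attempt an admission that would be blocked.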

  17. Understanding dopamine and reinforcement learning: the dopamine reward prediction error hypothesis.

    Science.gov (United States)

    Glimcher, Paul W

    2011-09-13

    A number of recent advances have been achieved in the study of midbrain dopaminergic neurons. Understanding these advances and how they relate to one another requires a deep understanding of the computational models that serve as an explanatory framework and guide ongoing experimental inquiry. This intertwining of theory and experiment now suggests very clearly that the phasic activity of the midbrain dopamine neurons provides a global mechanism for synaptic modification. These synaptic modifications, in turn, provide the mechanistic underpinning for a specific class of reinforcement learning mechanisms that now seem to underlie much of human and animal behavior. This review describes both the critical empirical findings that are at the root of this conclusion and the fantastic theoretical advances from which this conclusion is drawn.
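
    The reward-prediction-error account described above can be illustrated numerically: in TD(0) learning, the error signal delta = r + gamma*V(s') - V(s) plays the role the hypothesis assigns to phasic dopamine. The cue/reward task below is an invented toy example, not data from the review.

```python
# Toy TD(0) learner: the prediction error at reward delivery is large early
# in training and nearly vanishes once the cue predicts the reward,
# mirroring the shift reported for dopamine responses.
def td0(steps, gamma=1.0, alpha=0.1, episodes=500):
    """steps: list of (state, reward, next_state) tuples forming one episode."""
    V = {}
    reward_deltas = []
    for _ in range(episodes):
        for s, r, s2 in steps:
            delta = r + gamma * V.get(s2, 0.0) - V.get(s, 0.0)
            V[s] = V.get(s, 0.0) + alpha * delta
        reward_deltas.append(delta)   # error at the reward step this episode
    return V, reward_deltas

# A cue ("light") reliably precedes a unit reward.
episode = [("light", 0.0, "reward"), ("reward", 1.0, "end")]
V, deltas = td0(episode)
```

    As training proceeds, the value of the cue state approaches the reward magnitude, and the error migrates from the reward to the cue, which is exactly the empirical pattern the hypothesis was built to explain.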

  18. Off-Policy Reinforcement Learning: Optimal Operational Control for Two-Time-Scale Industrial Processes.

    Science.gov (United States)

    Li, Jinna; Kiumarsi, Bahare; Chai, Tianyou; Lewis, Frank L; Fan, Jialu

    2017-12-01

    Industrial flow lines are composed of unit processes operating on a fast time scale and performance measurements known as operational indices measured at a slower time scale. This paper presents a model-free optimal solution to a class of two time-scale industrial processes using off-policy reinforcement learning (RL). First, the lower-layer unit process control loop with a fast sampling period and the upper-layer operational index dynamics at a slow time scale are modeled. Second, a general optimal operational control problem is formulated to optimally prescribe the set-points for the unit industrial process. Then, a zero-sum game off-policy RL algorithm is developed to find the optimal set-points by using data measured in real-time. Finally, a simulation experiment is employed for an industrial flotation process to show the effectiveness of the proposed method.

  19. Reinforcement learning for adaptive optimal control of unknown continuous-time nonlinear systems with input constraints

    Science.gov (United States)

    Yang, Xiong; Liu, Derong; Wang, Ding

    2014-03-01

    In this paper, an adaptive reinforcement learning-based solution is developed for the infinite-horizon optimal control problem of constrained-input continuous-time nonlinear systems in the presence of nonlinearities with unknown structures. Two different types of neural networks (NNs) are employed to approximate the Hamilton-Jacobi-Bellman equation. That is, a recurrent NN is constructed to identify the unknown dynamical system, and two feedforward NNs are used as the actor and the critic to approximate the optimal control and the optimal cost, respectively. Based on this framework, the action NN and the critic NN are tuned simultaneously, without the requirement for the knowledge of system drift dynamics. Moreover, by using Lyapunov's direct method, the weights of the action NN and the critic NN are guaranteed to be uniformly ultimately bounded, while keeping the closed-loop system stable. To demonstrate the effectiveness of the present approach, simulation results are illustrated.
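
    The simultaneous actor/critic tuning described above can be illustrated at a much reduced scale in a discrete-time sketch: a tabular critic learns values from the TD error while a softmax actor adjusts its action preferences with the same error signal. The two-state toy problem, step sizes, and update rules below are illustrative assumptions; the paper's recurrent identifier NN and continuous-time Hamilton-Jacobi-Bellman machinery are omitted.

```python
# Discrete-time actor-critic sketch on a two-state toy problem.
# Action 1 moves to the rewarded state; action 0 moves to the other state.
import math
import random

def actor_critic(episodes=3000, alpha_v=0.1, alpha_p=0.05, gamma=0.9, seed=1):
    rng = random.Random(seed)
    V = [0.0, 0.0]        # critic: one value estimate per state
    theta = [0.0, 0.0]    # actor: preference for action 1 in each state
    for _ in range(episodes):
        s = 0
        for _ in range(10):
            p1 = 1.0 / (1.0 + math.exp(-theta[s]))   # P(action 1 | s)
            a = 1 if rng.random() < p1 else 0
            s2 = 1 if a == 1 else 0                   # toy dynamics
            r = 1.0 if s2 == 1 else 0.0               # state 1 pays reward
            delta = r + gamma * V[s2] - V[s]          # TD error
            V[s] += alpha_v * delta                    # critic update
            grad = (1 - p1) if a == 1 else -p1        # d log pi / d theta[s]
            theta[s] += alpha_p * delta * grad        # actor update, same error
            s = s2
    return V, theta
```

    Both networks in the paper are tuned by a shared error quantity in an analogous way; here the "critic" and "actor" are just lookup tables, which keeps the coupling between the two updates easy to see.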

  20. A Reinforcement Learning Approach to Access Management in Wireless Cellular Networks

    Directory of Open Access Journals (Sweden)

    Jihun Moon

    2017-01-01

    Full Text Available In smart city applications, huge numbers of devices need to be connected in an autonomous manner. 3rd Generation Partnership Project (3GPP) specifies that Machine Type Communication (MTC) should be used to handle data transmission among a large number of devices. However, the data transmission rates are highly variable, and this brings about a congestion problem. To tackle this problem, the use of Access Class Barring (ACB) is recommended to restrict the number of access attempts allowed in data transmission by utilizing strategic parameters. In this paper, we model the problem of determining the strategic parameters with a reinforcement learning algorithm. In our model, the system evolves to minimize both the collision rate and the access delay. The experimental results show that our scheme improves system performance in terms of the access success rate, the failure rate, the collision rate, and the access delay.
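
    Learning an ACB parameter by reinforcement can be sketched with a stateless (bandit-style) Q-learning agent that chooses a barring factor. The slotted contention model below, where an access slot succeeds only when exactly one device transmits, is a textbook slotted-ALOHA simplification, not the 3GPP procedure or the paper's exact scheme.

```python
# Hedged sketch: a Q-learning agent selects an ACB barring factor so that
# contention among backlogged devices succeeds as often as possible.
import random

FACTORS = [0.1, 0.3, 0.5, 0.7, 0.9]   # candidate ACB barring factors

def learn_acb(n_devices=20, slots=20000, alpha=0.01, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = [0.0] * len(FACTORS)           # estimated success rate per factor
    for _ in range(slots):
        if rng.random() < eps:
            i = rng.randrange(len(FACTORS))            # explore
        else:
            i = max(range(len(FACTORS)), key=Q.__getitem__)  # exploit
        p = FACTORS[i]
        # Each backlogged device passes the ACB check with probability p.
        transmitters = sum(1 for _ in range(n_devices) if rng.random() < p)
        reward = 1.0 if transmitters == 1 else 0.0     # collision/idle otherwise
        Q[i] += alpha * (reward - Q[i])                # stateless Q update
    return FACTORS[max(range(len(FACTORS)), key=Q.__getitem__)]
```

    With 20 backlogged devices, the success probability is maximized near a barring factor of 1/20, so the agent should settle on the smallest candidate factor; with fewer devices a larger factor becomes preferable, which is the adaptivity the paper's scheme targets.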