WorldWideScience

Sample records for multi-agent reinforcement learning

  1. Multi-agent machine learning a reinforcement approach

    CERN Document Server

    Schwartz, H M

    2014-01-01

    The book begins with a chapter on traditional methods of supervised learning, covering recursive least squares learning, mean square error methods, and stochastic approximation. Chapter 2 covers single agent reinforcement learning. Topics include learning value functions, Markov games, and TD learning with eligibility traces. Chapter 3 discusses two player games including two player matrix games with both pure and mixed strategies. Numerous algorithms and examples are presented. Chapter 4 covers learning in multi-player games, stochastic games, and Markov games, focusing on learning multi-pla

  2. Switching dynamics of multi-agent learning

    NARCIS (Netherlands)

    Vrancx, P.; Tuyls, K.P.; Westra, R.

    2008-01-01

    This paper presents the dynamics of multi-agent reinforcement learning in multiple state problems. We extend previous work that formally modelled the relation between reinforcement learning agents and replicator dynamics in stateless multi-agent games. More precisely, in this work we use a

  3. Fast Conflict Resolution Based on Reinforcement Learning in Multi-agent System

    Institute of Scientific and Technical Information of China (English)

    PIAO Songhao; HONG Bingrong; CHU Haitao

    2004-01-01

    In a multi-agent system where each agent has a different goal (even if the team of agents has the same goal), agents must be able to resolve conflicts arising in the process of achieving their goals. Many researchers have presented methods for conflict resolution, e.g., reinforcement learning (RL), but conventional RL requires a large computation cost because every agent must learn, and at the same time the overlap of actions selected by each agent results in local conflict. Therefore, in this paper we propose a novel method to solve these problems. In order to deal with conflict within the multi-agent system, the concept of a potential-field-function-based action selection priority level (ASPL) is brought forward. In this method, all kinds of environmental factors that may influence the priority are effectively computed with the potential field function, so the priority to access the local resource can be decided rapidly. By avoiding the complex coordination mechanism used in general multi-agent systems, conflict in the multi-agent system is settled more efficiently. Our system consists of an RL with ASPL module and a generalized rules module. Using ASPL, the RL module chooses a proper cooperative behavior, and the generalized rules module can accelerate the learning process. By applying the proposed method to Robot Soccer, the learning process can be accelerated. The results of simulation and real experiments indicate the effectiveness of the method.

  4. Optimal control in microgrid using multi-agent reinforcement learning.

    Science.gov (United States)

    Li, Fu-Dong; Wu, Min; He, Yong; Chen, Xin

    2012-11-01

    This paper presents an improved reinforcement learning method to minimize electricity costs on the premise of satisfying the power balance and generation limit of units in a microgrid with grid-connected mode. Firstly, the microgrid control requirements are analyzed and the objective function of optimal control for microgrid is proposed. Then, a state variable "Average Electricity Price Trend" which is used to express the most possible transitions of the system is developed so as to reduce the complexity and randomicity of the microgrid, and a multi-agent architecture including agents, state variables, action variables and reward function is formulated. Furthermore, dynamic hierarchical reinforcement learning, based on change rate of key state variable, is established to carry out optimal policy exploration. The analysis shows that the proposed method is beneficial to handle the problem of "curse of dimensionality" and speed up learning in the unknown large-scale world. Finally, the simulation results under JADE (Java Agent Development Framework) demonstrate the validity of the presented method in optimal control for a microgrid with grid-connected mode. Copyright © 2012 ISA. Published by Elsevier Ltd. All rights reserved.

  5. Adaptive Load Balancing of Parallel Applications with Multi-Agent Reinforcement Learning on Heterogeneous Systems

    Directory of Open Access Journals (Sweden)

    Johan Parent

    2004-01-01

    We report on the improvements that can be achieved by applying machine learning techniques, in particular reinforcement learning, for the dynamic load balancing of parallel applications. The applications being considered in this paper are coarse grain data intensive applications. Such applications put high pressure on the interconnect of the hardware. Synchronization and load balancing in complex, heterogeneous networks need fast, flexible, adaptive load balancing algorithms. Viewing a parallel application as a one-state coordination game in the framework of multi-agent reinforcement learning, and by using a recently introduced multi-agent exploration technique, we are able to improve upon the classic job farming approach. The improvements are achieved with limited computation and communication overhead.

  6. Concurrent Learning of Control in Multi-agent Sequential Decision Tasks

    Science.gov (United States)

    2018-04-17

    The overall objective of this project was to develop multi-agent reinforcement... learning (MARL) approaches for intelligent agents to autonomously learn distributed control policies in decentralized partially observable... learning of policies in Dec-POMDPs, established performance bounds, evaluated these algorithms both theoretically and empirically, ...

  7. Strategic farsighted learning in competitive multi-agent games

    NARCIS (Netherlands)

    't Hoen, P.J.; Bohté, S.M.; La Poutré, J.A.; Brewka, G.; Coradeschi, S.; Perini, A.

    2006-01-01

    We describe a generalized Q-learning type algorithm for reinforcement learning in competitive multi-agent games. We make the observation that in a competitive setting with adaptive agents an agent's actions will (likely) result in changes in the opponents' policies. In addition to accounting for the

  8. Reinforcement learning in supply chains.

    Science.gov (United States)

    Valluri, Annapurna; North, Michael J; Macal, Charles M

    2009-10-01

    Effective management of supply chains creates value and can strategically position companies. In practice, human beings have been found to be both surprisingly successful and disappointingly inept at managing supply chains. The related fields of cognitive psychology and artificial intelligence have postulated a variety of potential mechanisms to explain this behavior. One of the leading candidates is reinforcement learning. This paper applies agent-based modeling to investigate the comparative behavioral consequences of three simple reinforcement learning algorithms in a multi-stage supply chain. For the first time, our findings show that the specific algorithm that is employed can have dramatic effects on the results obtained. Reinforcement learning is found to be valuable in multi-stage supply chains with several learning agents, as independent agents can learn to coordinate their behavior. However, learning in multi-stage supply chains using these postulated approaches from cognitive psychology and artificial intelligence takes extremely long time periods to achieve stability, which raises questions about their ability to explain behavior in real supply chains. The fact that it takes thousands of periods for agents to learn in this simple multi-agent setting provides new evidence that real-world decision makers are unlikely to be using strict reinforcement learning in practice.

  9. Decentralized Reinforcement Learning of robot behaviors

    NARCIS (Netherlands)

    Leottau, David L.; Ruiz-del-Solar, Javier; Babuska, R.

    2018-01-01

    A multi-agent methodology is proposed for Decentralized Reinforcement Learning (DRL) of individual behaviors in problems where multi-dimensional action spaces are involved. When using this methodology, sub-tasks are learned in parallel by individual agents working toward a common goal. In

  10. Reinforcement learning agents providing advice in complex video games

    Science.gov (United States)

    Taylor, Matthew E.; Carboni, Nicholas; Fachantidis, Anestis; Vlahavas, Ioannis; Torrey, Lisa

    2014-01-01

    This article introduces a teacher-student framework for reinforcement learning, synthesising and extending material that appeared in conference proceedings [Torrey, L., & Taylor, M. E. (2013). Teaching on a budget: Agents advising agents in reinforcement learning. Proceedings of the international conference on autonomous agents and multiagent systems] and in a non-archival workshop paper [Carboni, N., & Taylor, M. E. (2013, May). Preliminary results for 1 vs. 1 tactics in StarCraft. Proceedings of the adaptive and learning agents workshop (at AAMAS-13)]. In this framework, a teacher agent instructs a student agent by suggesting actions the student should take as it learns. However, the teacher may only give such advice a limited number of times. We present several novel algorithms that teachers can use to budget their advice effectively, and we evaluate them in two complex video games: StarCraft and Pac-Man. Our results show that the same amount of advice, given at different moments, can have different effects on student learning, and that teachers can significantly affect student learning even when students use different learning methods and state representations.
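    The budgeted-advice idea in this record can be illustrated with a minimal sketch. The abstract does not spell out the algorithms, so the rule below (advise only when the gap between the teacher's best and worst action values exceeds a threshold) is an assumption chosen for illustration, and the names `BudgetedTeacher`, `budget`, and `importance_threshold` are hypothetical.

```python
from typing import Optional, Sequence


class BudgetedTeacher:
    """Hypothetical teacher that spends a fixed advice budget on 'important' states."""

    def __init__(self, budget: int, importance_threshold: float) -> None:
        self.budget = budget
        self.importance_threshold = importance_threshold

    def advise(self, teacher_q_values: Sequence[float]) -> Optional[int]:
        """Return the teacher's greedy action, or None when no advice is given."""
        if self.budget <= 0:
            return None  # budget exhausted: the student acts on its own
        # Assumed importance measure: spread between best and worst action value.
        importance = max(teacher_q_values) - min(teacher_q_values)
        if importance < self.importance_threshold:
            return None  # state judged unimportant: save the advice
        self.budget -= 1
        return max(range(len(teacher_q_values)), key=lambda a: teacher_q_values[a])
```

    A student loop would call `advise` at each step and fall back to its own policy whenever `None` is returned, so the same budget can be spent early, late, or only in high-stakes states.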

  11. Emotion in reinforcement learning agents and robots : A survey

    NARCIS (Netherlands)

    Moerland, T.M.; Broekens, D.J.; Jonker, C.M.

    2018-01-01

    This article provides the first survey of computational models of emotion in reinforcement learning (RL) agents. The survey focuses on agent/robot emotions, and mostly ignores human user emotions. Emotions are recognized as functional in decision-making by influencing motivation and action

  12. Reinforcement Learning for a New Piano Mover

    Directory of Open Access Journals (Sweden)

    Yuko Ishiwaka

    2005-08-01

    We attempt to achieve cooperative behavior of autonomous decentralized agents constructed via Q-Learning, which is a type of reinforcement learning. As such, in the present paper, we examine the piano mover's problem. We propose a multi-agent architecture that has a training agent, learning agents and an intermediate agent. Learning agents are heterogeneous and can communicate with each other. The movement of an object with three kinds of agent depends on the composition of the actions of the learning agents. By learning its own shape through the learning agents, avoidance of obstacles by the object is expected. We simulate the proposed method in a two-dimensional continuous world. Results obtained in the present investigation reveal the effectiveness of the proposed method.

  13. Multi-Agent Framework for Virtual Learning Spaces.

    Science.gov (United States)

    Sheremetov, Leonid; Nunez, Gustavo

    1999-01-01

    Discussion of computer-supported collaborative learning, distributed artificial intelligence, and intelligent tutoring systems focuses on the concept of agents, and describes a virtual learning environment that has a multi-agent system. Describes a model of interactions in collaborative learning and discusses agents for Web-based virtual…

  14. Emotion in reinforcement learning agents and robots: A survey

    OpenAIRE

    Moerland, T.M.; Broekens, D.J.; Jonker, C.M.

    2018-01-01

    This article provides the first survey of computational models of emotion in reinforcement learning (RL) agents. The survey focuses on agent/robot emotions, and mostly ignores human user emotions. Emotions are recognized as functional in decision-making by influencing motivation and action selection. Therefore, computational emotion models are usually grounded in the agent's decision making architecture, of which RL is an important subclass. Studying emotions in RL-based agents is useful for ...

  15. Collective Machine Learning: Team Learning and Classification in Multi-Agent Systems

    Science.gov (United States)

    Gifford, Christopher M.

    2009-01-01

    This dissertation focuses on the collaboration of multiple heterogeneous, intelligent agents (hardware or software) which collaborate to learn a task and are capable of sharing knowledge. The concept of collaborative learning in multi-agent and multi-robot systems is largely under studied, and represents an area where further research is needed to…

  16. Pareto Optimal Solutions for Network Defense Strategy Selection Simulator in Multi-Objective Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Yang Sun

    2018-01-01

    Using Pareto optimization in Multi-Objective Reinforcement Learning (MORL) leads to better learning results for network defense games. This is particularly useful for network security agents, who must often balance several goals when choosing what action to take in defense of a network. If the defender knows his preferred reward distribution, the advantages of Pareto optimization can be retained by using a scalarization algorithm prior to the implementation of the MORL. In this paper, we simulate a network defense scenario by creating a multi-objective zero-sum game and using Pareto optimization and MORL to determine optimal solutions and compare those solutions to different scalarization approaches. We build a Pareto Defense Strategy Selection Simulator (PDSSS) system for assisting network administrators on decision-making, specifically, on defense strategy selection, and the experiment results show that the Satisficing Trade-Off Method (STOM) scalarization approach performs better than linear scalarization or the GUESS method. The results of this paper can aid network security agents attempting to find an optimal defense policy for network security games.
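    The contrast between Pareto optimization and scalarization mentioned in this record can be sketched as follows. This is a generic illustration, not the PDSSS implementation: the function names and the sample reward vectors are made up, all objectives are assumed to be maximized, and only linear scalarization is shown (STOM and GUESS are not modeled).

```python
from typing import List, Sequence, Tuple

Reward = Tuple[float, ...]


def dominates(a: Reward, b: Reward) -> bool:
    """True if a is at least as good as b on every objective and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: Sequence[Reward]) -> List[Reward]:
    """Keep only the reward vectors that no other vector dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def linear_scalarization(point: Reward, weights: Sequence[float]) -> float:
    """Collapse a reward vector to a single number with fixed preference weights."""
    return sum(w * x for w, x in zip(weights, point))


# Hypothetical (objective 1, objective 2) payoffs for four defense strategies.
candidates = [(1.0, 5.0), (3.0, 3.0), (2.0, 2.0), (5.0, 1.0)]
front = pareto_front(candidates)
preferred = max(front, key=lambda p: linear_scalarization(p, (0.8, 0.2)))
```

    The Pareto front keeps every non-dominated trade-off, while scalarization commits to one preference up front and returns a single strategy.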

  17. "Notice of Violation of IEEE Publication Principles" Multiobjective Reinforcement Learning: A Comprehensive Overview.

    Science.gov (United States)

    Liu, Chunming; Xu, Xin; Hu, Dewen

    2013-04-29

    Reinforcement learning is a powerful mechanism for enabling agents to learn in an unknown environment, and most reinforcement learning algorithms aim to maximize some numerical value, which represents only one long-term objective. However, multiple long-term objectives are exhibited in many real-world decision and control problems; therefore, recently, there has been growing interest in solving multiobjective reinforcement learning (MORL) problems with multiple conflicting objectives. The aim of this paper is to present a comprehensive overview of MORL. In this paper, the basic architecture, research topics, and naive solutions of MORL are introduced at first. Then, several representative MORL approaches and some important directions of recent research are reviewed. The relationships between MORL and other related research are also discussed, which include multiobjective optimization, hierarchical reinforcement learning, and multi-agent reinforcement learning. Finally, research challenges and open problems of MORL techniques are highlighted.

  18. Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving

    OpenAIRE

    Shalev-Shwartz, Shai; Shammah, Shaked; Shashua, Amnon

    2016-01-01

    Autonomous driving is a multi-agent setting where the host vehicle must apply sophisticated negotiation skills with other road users when overtaking, giving way, merging, taking left and right turns and while pushing ahead in unstructured urban roadways. Since there are many possible scenarios, manually tackling all possible cases will likely yield a too simplistic policy. Moreover, one must balance between unexpected behavior of other drivers/pedestrians and at the same time not to be too de...

  19. An Online Q-learning Based Multi-Agent LFC for a Multi-Area Multi-Source Power System Including Distributed Energy Resources

    Directory of Open Access Journals (Sweden)

    H. Shayeghi

    2017-12-01

    This paper presents an online two-stage Q-learning based multi-agent (MA) controller for load frequency control (LFC) in an interconnected multi-area multi-source power system integrated with distributed energy resources (DERs). The proposed control strategy consists of two stages. The first stage employs a PID controller whose parameters are designed using the sine cosine optimization (SCO) algorithm and are fixed. The second is a reinforcement learning (RL) based supplementary controller that has a flexible structure and improves the output of the first stage adaptively based on the system's dynamical behavior. Because this strategy integrates the RL paradigm with a PID controller, it is called the RL-PID controller. The primary motivation for integrating the RL technique with the PID controller is to keep the existing local controllers in the industry compatible, reducing control effort and system costs. This novel control strategy combines the advantages of the PID controller with the adaptive behavior of MA to achieve the desired level of robust performance under different kinds of uncertainties caused by stochastic power generation of DERs, plant operational condition changes, and physical nonlinearities of the system. The suggested decentralized controller is composed of autonomous intelligent agents that learn the optimal control policy from interaction with the system. These agents update their knowledge about the system dynamics continuously to achieve good frequency oscillation damping under various severe disturbances without any knowledge of them. This leads to an adaptive control structure for solving the LFC problem in the multi-source power system with stochastic DERs. The performance of the RL-PID controller, in comparison with the traditional PID and fuzzy-PID controllers, is verified in a multi-area power system integrated with DERs through some performance indices.
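    The two-stage structure described in this record can be sketched roughly as below. The abstract only says that the RL stage "improves the output of the first stage", so combining the two by simple addition is an assumption; the class `PID`, the function `rl_pid_output`, and all gain values are hypothetical, and the learning of the correction itself is not shown.

```python
class PID:
    """Discrete-time PID controller with fixed gains (stage one)."""

    def __init__(self, kp: float, ki: float, kd: float, dt: float) -> None:
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error: float) -> float:
        """Return the control signal for the current error sample."""
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def rl_pid_output(pid: PID, error: float, rl_correction: float) -> float:
    """Stage two (assumed additive): a learned correction adjusts the PID output."""
    return pid.step(error) + rl_correction
```

    In such a layout the fixed PID loop keeps working unchanged if the learned correction is zero, which matches the stated motivation of staying compatible with existing local controllers.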

  20. Learning Sequences of Actions in Collectives of Autonomous Agents

    Science.gov (United States)

    Turner, Kagan; Agogino, Adrian K.; Wolpert, David H.; Clancy, Daniel (Technical Monitor)

    2001-01-01

    In this paper we focus on the problem of designing a collective of autonomous agents that individually learn sequences of actions such that the resultant sequence of joint actions achieves a predetermined global objective. We are particularly interested in instances of this problem where centralized control is either impossible or impractical. For single agent systems in similar domains, machine learning methods (e.g., reinforcement learners) have been successfully used. However, applying such solutions directly to multi-agent systems often proves problematic, as agents may work at cross-purposes, or have difficulty in evaluating their contribution to achievement of the global objective, or both. Accordingly, the crucial design step in multiagent systems centers on determining the private objectives of each agent so that as the agents strive for those objectives, the system reaches a good global solution. In this work we consider a version of this problem involving multiple autonomous agents in a grid world. We use concepts from collective intelligence to design goals for the agents that are 'aligned' with the global goal, and are 'learnable' in that agents can readily see how their behavior affects their utility. We show that reinforcement learning agents using those goals outperform both 'natural' extensions of single agent algorithms and global reinforcement learning solutions based on 'team games'.

  1. Reinforcement Learning Multi-Agent Modeling of Decision-Making Agents for the Study of Transboundary Surface Water Conflicts with Application to the Syr Darya River Basin

    Science.gov (United States)

    Riegels, N.; Siegfried, T.; Pereira Cardenal, S. J.; Jensen, R. A.; Bauer-Gottwein, P.

    2008-12-01

    In most economics-driven approaches to optimizing water use at the river basin scale, the system is modelled deterministically with the goal of maximizing overall benefits. However, actual operation and allocation decisions must be made under hydrologic and economic uncertainty. In addition, river basins often cross political boundaries, and different states may not be motivated to cooperate so as to maximize basin-scale benefits. Even within states, competing agents such as irrigation districts, municipal water agencies, and large industrial users may not have incentives to cooperate to realize efficiency gains identified in basin-level studies. More traditional simulation-optimization approaches assume pre-commitment by individual agents and stakeholders and unconditional compliance on each side. While this can help determine attainable gains and tradeoffs from efficient management, such hardwired policies do not account for dynamic feedback between agents themselves or between agents and their environments (e.g. due to climate change etc.). In reality, however, we are dealing with an out-of-equilibrium multi-agent system, where there is neither global knowledge nor global control, but rather continuous strategic interaction between decision-making agents. Based on the theory of stochastic games, we present a computational framework that allows for studying the dynamic feedback between decision-making agents themselves and an inherently uncertain environment in a spatially and temporally distributed manner. Agents with decision-making control over water allocation such as countries, irrigation districts, and municipalities are represented by reinforcement learning agents and coupled to a detailed hydrologic-economic model. This approach emphasizes learning by agents from their continuous interaction with other agents and the environment. It provides a convenient framework for the solution of the problem of dynamic decision-making in a mixed cooperative / non

  2. Semi-Cooperative Learning in Smart Grid Agents

    Science.gov (United States)

    2013-12-01

    this PhD program, but watching you grow has only made me realize how much more awesome human learning is. You have been a source of profound joy and... which should alleviate concern for scalability along this dimension. • Learning the negotiation model: Figure 6.23 shows single-episode results that... for Semi-cooperative Multi-agent Coordination. In IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning. [Prendergast, 1999

  3. An Improved Reinforcement Learning System Using Affective Factors

    Directory of Open Access Journals (Sweden)

    Takashi Kuremoto

    2013-07-01

    As a powerful and intelligent machine learning method, reinforcement learning (RL) has been widely used in many fields such as game theory, adaptive control, multi-agent systems, nonlinear forecasting, and so on. The main contribution of this technique is its exploration and exploitation approaches to find the optimal solution or semi-optimal solution of goal-directed problems. However, when RL is applied to multi-agent systems (MASs), problems such as “curse of dimension”, “perceptual aliasing problem”, and uncertainty of the environment constitute high hurdles to RL. Meanwhile, although RL is inspired by behavioral psychology and reward/punishment from the environment is used, higher mental factors such as affects, emotions, and motivations are rarely adopted in the learning procedure of RL. In this paper, to address agent learning in MASs, we propose a computational motivation function, which adopts two principal affective factors, “Arousal” and “Pleasure”, of Russell’s circumplex model of affects, to improve the learning performance of a conventional RL algorithm named Q-learning (QL). Computer simulations of pursuit problems with static and dynamic preys were carried out to compare the proposed method with the conventional QL, and the results showed that the proposed method results in agents having a faster and more stable learning performance.
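    For reference, the conventional Q-learning baseline named in this record can be sketched with the standard tabular update. The abstract does not define the proposed motivation function, so it is not modeled here; the function name, the state and action labels, and the default `alpha` and `gamma` values are illustrative assumptions.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_learning_update(
    q: QTable,
    state: Hashable,
    action: Hashable,
    reward: float,
    next_state: Hashable,
    actions: Sequence[Hashable],
    alpha: float = 0.1,
    gamma: float = 0.9,
) -> float:
    """Apply Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Usage with made-up labels: one update after moving from "s0" to "s1".
q: QTable = defaultdict(float)
q[("s1", "right")] = 1.0
new_value = q_learning_update(q, "s0", "left", 1.0, "s1", ["left", "right"], alpha=0.5)
```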

  4. The Reinforcement Learning Competition 2014

    OpenAIRE

    Dimitrakakis, Christos; Li, Guangliang; Tziortziotis, Nikolaos

    2014-01-01

    Reinforcement learning is one of the most general problems in artificial intelligence. It has been used to model problems in automated experiment design, control, economics, game playing, scheduling and telecommunications. The aim of the reinforcement learning competition is to encourage the development of very general learning agents for arbitrary reinforcement learning problems and to provide a test-bed for the unbiased evaluation of algorithms.

  5. Construction of multi-agent mobile robots control system in the problem of persecution with using a modified reinforcement learning method based on neural networks

    Science.gov (United States)

    Patkin, M. L.; Rogachev, G. N.

    2018-02-01

    A method for constructing a multi-agent control system for mobile robots based on reinforcement learning with deep neural networks is considered. The control system is synthesized using reinforcement learning and a modified Actor-Critic method, in which the Actor module is divided into an Action Actor and a Communication Actor in order to simultaneously control the mobile robots and communicate with partners. Communication is carried out by sending partners, at each step, a vector of real numbers that is added to the observation vector and affects the behaviour. The functions of the Actors and the Critic are approximated by deep neural networks. The Critic's value function is trained using the TD-error method and the Actor's function using DDPG. The Communication Actor's neural network is trained through gradients received from partner agents. An environment with cooperative multi-agent interaction was developed, and a computer simulation of this method was carried out for the control problem of two robots pursuing two goals.
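    Two ingredients of this record, message passing between agents and the Critic's TD error, can be sketched without any neural network. Two assumptions are made here: that "added to the observation vector" means concatenation, and that the Critic uses the standard one-step TD error. The function names and the default `gamma` are hypothetical.

```python
from typing import List, Sequence


def augment_observation(
    own_observation: Sequence[float],
    partner_messages: Sequence[Sequence[float]],
) -> List[float]:
    """Append every partner's message vector to the agent's own observation."""
    augmented = list(own_observation)
    for message in partner_messages:
        augmented.extend(message)
    return augmented


def td_error(reward: float, value: float, next_value: float, gamma: float = 0.99) -> float:
    """One-step TD error: delta = r + gamma * V(s') - V(s)."""
    return reward + gamma * next_value - value
```

    In a full system the augmented vector would be the input to the Action Actor, and the TD error would drive the Critic's update; both networks are omitted in this sketch.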

  6. TensorFlow Agents: Efficient Batched Reinforcement Learning in TensorFlow

    OpenAIRE

    Hafner, Danijar; Davidson, James; Vanhoucke, Vincent

    2017-01-01

    We introduce TensorFlow Agents, an efficient infrastructure paradigm for building parallel reinforcement learning algorithms in TensorFlow. We simulate multiple environments in parallel, and group them to perform the neural network computation on a batch rather than individual observations. This allows the TensorFlow execution engine to parallelize computation, without the need for manual synchronization. Environments are stepped in separate Python processes to progress them in parallel witho...
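    The batching idea in this record, one policy call for a whole group of environments instead of one call per observation, can be shown with a toy sketch in plain Python. It does not use TensorFlow and does not run environments in separate processes as the abstract describes; `CounterEnv`, `batched_step`, and the threshold policy are hypothetical stand-ins.

```python
from typing import Callable, List, Sequence


class CounterEnv:
    """Toy environment: the observation is a running total of the actions taken."""

    def __init__(self, start: int = 0) -> None:
        self.total = start

    def step(self, action: int) -> int:
        self.total += action
        return self.total


def batched_step(
    envs: Sequence[CounterEnv],
    observations: Sequence[int],
    batch_policy: Callable[[Sequence[int]], List[int]],
) -> List[int]:
    """Compute all actions in one policy call, then step every environment."""
    actions = batch_policy(observations)  # single call over the whole batch
    return [env.step(action) for env, action in zip(envs, actions)]


def threshold_policy(batch: Sequence[int]) -> List[int]:
    """Made-up policy: push each total toward 5."""
    return [1 if obs < 5 else -1 for obs in batch]


envs = [CounterEnv(0), CounterEnv(10)]
obs = batched_step(envs, [0, 10], threshold_policy)
obs = batched_step(envs, obs, threshold_policy)
```

    With a neural network policy, the single batched call is where the computation would be parallelized, which is the efficiency gain the abstract points to.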

  7. Learning from induced changes in opponent (re)actions in multi-agent games

    NARCIS (Netherlands)

    P.J. 't Hoen (Pieter Jan); S.M. Bohte (Sander); J.A. La Poutré (Han)

    2005-01-01

    Multi-agent learning is a growing area of research. An important topic is to formulate how an agent can learn a good policy in the face of adaptive, competitive opponents. Most research has focused on extensions of single agent learning techniques originally designed for agents in more

  8. Human-level control through deep reinforcement learning

    Science.gov (United States)

    Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A.; Veness, Joel; Bellemare, Marc G.; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K.; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis

    2015-02-01

    The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.

  9. Human-level control through deep reinforcement learning.

    Science.gov (United States)

    Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A; Veness, Joel; Bellemare, Marc G; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis

    2015-02-26

    The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. 
This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
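
    The abstract above does not spell out the learning rule, so the following is only a minimal sketch of the standard one-step Q-learning target that a deep Q-network is typically regressed towards. The function name `td_targets`, the plain-list representation of Q-values, and the discount `gamma=0.99` are illustrative assumptions, not details taken from the paper.

```python
def td_targets(rewards, next_q_values, dones, gamma=0.99):
    """Return one-step Q-learning targets r + gamma * max_a' Q(s', a').

    rewards:        list of floats, one per transition
    next_q_values:  list of lists, Q(s', a') for every action a' (assumed layout)
    dones:          list of bools, True if s' is terminal (no bootstrapping)
    gamma:          discount factor (assumed value, not from the abstract)
    """
    targets = []
    for r, q_next, done in zip(rewards, next_q_values, dones):
        # Terminal transitions contribute only their immediate reward.
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(r + bootstrap)
    return targets
```

    In a full agent these targets would be computed from a network's outputs and used as regression labels; the sketch isolates only the arithmetic.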

  10. OWL model of multi-agent Smart-system of distance learning for people with vision disabilities

    Directory of Open Access Journals (Sweden)

    Galina A. Samigulina

    2017-01-01

    Full Text Available The aim of the study is to develop an ontological model of a multi-agent smart-system of distance learning for visually impaired people, based on the Java Agent Development Framework, for obtaining high-quality engineering education in laboratories of joint use on modern equipment. Materials and methods of research. In developing a multi-agent smart-system of distance learning, it is important to use various agents based on cognitive, ontological, statistical and intellectual methods. It is more convenient to implement this task in the form of software using a multi-agent approach and the Java Agent Development Framework. The main advantages of the platform are stability of operation, a clear interface, simplicity of creating agents and an extensive user database. In multi-agent systems, the solution is obtained automatically as a result of the interaction of many independent, purposeful agents. Each agent can perform certain tasks and pursue specified goals. Intellectual multi-agent systems and practical applications in distance learning based on them are considered. Results. The structural diagram of the functioning of a smart-system of distance learning for visually impaired people, using various agents based on the system approach and the multi-agent platform Java Agent Development Framework, is developed. A complex approach to distance learning of visually impaired people for obtaining high-quality engineering education in laboratories of joint use on modern equipment is offered. The ontological model of the multi-agent smart-system, with a detailed description of the functions of the following agents, is created: personal, manager, ontological, cognitive, statistical, intellectual, shared laboratory agent, health agent, assistant to the agent and state agent. These agents execute their individual functions and provide a quality environment for learning. Conclusion. Thus, the proposed smart-system of distance learning for visually impaired people can significantly improve effectiveness and

  11. Adaptive representations for reinforcement learning

    NARCIS (Netherlands)

    Whiteson, S.

    2010-01-01

    This book presents new algorithms for reinforcement learning, a form of machine learning in which an autonomous agent seeks a control policy for a sequential decision task. Since current methods typically rely on manually designed solution representations, agents that automatically adapt their own

  12. IMPLEMENTATION OF MULTIAGENT REINFORCEMENT LEARNING MECHANISM FOR OPTIMAL ISLANDING OPERATION OF DISTRIBUTION NETWORK

    DEFF Research Database (Denmark)

    Saleem, Arshad; Lind, Morten

    2008-01-01

    among electric power utilities to utilize modern information and communication technologies (ICT) in order to improve the automation of the distribution system. In this paper we present our work for the implementation of a dynamic multi-agent based distributed reinforcement learning mechanism...

  13. Effect of reinforcement learning on coordination of multiagent systems

    Science.gov (United States)

    Bukkapatnam, Satish T. S.; Gao, Greg

    2000-12-01

    For effective coordination of distributed environments involving multiagent systems, the learning ability of each agent in the environment plays a crucial role. In this paper, we develop a simple group learning method based on reinforcement, and study its effect on coordination through application to a supply chain procurement scenario involving a computer manufacturer. Here, all parties are represented by self-interested, autonomous agents, each capable of performing specific simple tasks. They negotiate with each other to perform complex tasks and thus coordinate supply chain procurement. Reinforcement learning is intended to enable each agent to reach the best negotiable price within the shortest possible time. Our simulations of the application scenario under different learning strategies reveal the positive effects of reinforcement learning on an agent's as well as the system's performance.

  14. DYNAMIC AND INCREMENTAL EXPLORATION STRATEGY IN FUSION ADAPTIVE RESONANCE THEORY FOR ONLINE REINFORCEMENT LEARNING

    Directory of Open Access Journals (Sweden)

    Budhitama Subagdja

    2016-06-01

    Full Text Available One of the fundamental challenges in reinforcement learning is to set up a proper balance between exploration and exploitation to obtain the maximum cumulative reward in the long run. Most protocols for exploration bound the overall values to a convergent level of performance. If new knowledge is inserted or the environment is suddenly changed, the issue becomes more intricate as the exploration must compromise the pre-existing knowledge. This paper presents a type of multi-channel adaptive resonance theory (ART) neural network model called fusion ART, which serves as a fuzzy approximator for reinforcement learning with inherent features that can regulate the exploration strategy. This intrinsic regulation is driven by the condition of the knowledge learnt so far by the agent. The model offers a stable but incremental reinforcement learning that can involve prior rules as bootstrap knowledge for guiding the agent to select the right action. Experiments in obstacle avoidance and navigation tasks demonstrate that, in the configuration of learning wherein the agent learns from scratch, the inherent exploration model in the fusion ART model is comparable to the basic ε-greedy policy. On the other hand, the model is demonstrated to deal with prior knowledge and strike a balance between exploration and exploitation.
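
    For readers unfamiliar with the epsilon-greedy baseline that the abstract compares against, a minimal sketch follows. It is the textbook rule, not the fusion ART mechanism; the function name `epsilon_greedy`, the list of per-action values, and the optional `rng` argument are illustrative assumptions.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from a list of estimated action values.

    With probability `epsilon` a uniformly random action is explored;
    otherwise the action with the highest estimated value is exploited.
    `rng` is any object offering random() and randrange() (assumed interface).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    # Exploit: index of the largest value (first one wins on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

    Setting `epsilon` to 0 gives a purely greedy agent and 1 gives a purely random one; a fixed intermediate value is the static trade-off that adaptive schemes try to improve on.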

  15. Reinforcement Learning State-of-the-Art

    CERN Document Server

    Wiering, Marco

    2012-01-01

    Reinforcement learning encompasses both a science of adaptive behavior of rational beings in uncertain environments and a computational methodology for finding optimal behaviors for challenging problems in control, optimization and adaptive behavior of intelligent agents. As a field, reinforcement learning has progressed tremendously in the past decade. The main goal of this book is to present an up-to-date series of survey articles on the main contemporary sub-fields of reinforcement learning. This includes surveys on partially observable environments, hierarchical task decompositions, relational knowledge representation and predictive state representations. Furthermore, topics such as transfer, evolutionary methods and continuous spaces in reinforcement learning are surveyed. In addition, several chapters review reinforcement learning methods in robotics, in games, and in computational neuroscience. In total seventeen different subfields are presented by mostly young experts in those areas, and together the...

  16. Belief reward shaping in reinforcement learning

    CSIR Research Space (South Africa)

    Marom, O

    2018-02-01

    Full Text Available A key challenge in many reinforcement learning problems is delayed rewards, which can significantly slow down learning. Although reward shaping has previously been introduced to accelerate learning by bootstrapping an agent with additional...

  17. Reinforcement learning or active inference?

    Science.gov (United States)

    Friston, Karl J; Daunizeau, Jean; Kiebel, Stefan J

    2009-07-29

    This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and sampling of the environment to minimize their free-energy. Such agents learn causal structure in the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming; namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain.

  18. Reinforcement learning or active inference?

    Directory of Open Access Journals (Sweden)

    Karl J Friston

    2009-07-01

    Full Text Available This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and sampling of the environment to minimize their free-energy. Such agents learn causal structure in the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming; namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain.

  19. Multi-agents and learning: Implications for Webusage mining

    Science.gov (United States)

    Lotfy, Hewayda M.S.; Khamis, Soheir M.S.; Aboghazalah, Maie M.

    2015-01-01

    Characterization of user activities is an important issue in the design and maintenance of websites. Server weblog files have abundant information about the user’s current interests. This information can be mined and analyzed so that administrators can guide users in their browsing activity, helping them obtain relevant information in a shorter span of time and improving user satisfaction. Web-based technology facilitates the creation of personally meaningful and socially useful knowledge through supportive interactions, communication and collaboration among educators, learners and information. This paper suggests a new methodology based on learning techniques for a Web-based Multiagent-based application to discover the hidden patterns in the user’s visited links. It presents a new approach that involves unsupervised learning, reinforcement learning, and cooperation between agents. It is utilized to discover patterns that represent the user’s profiles in a sample website into specific categories of materials using significance percentages. These profiles are used to make recommendations of interesting links and categories to the user. The experimental results of the approach showed successful user pattern recognition, and cooperative learning among agents to obtain user profiles. They indicate that combining different learning algorithms is capable of improving user satisfaction, as indicated by the percentage of precision, recall, the progressive category weight and F1-measure. PMID:26966569

  20. A strategy learning model for autonomous agents based on classification

    Directory of Open Access Journals (Sweden)

    Śnieżyński Bartłomiej

    2015-09-01

    Full Text Available In this paper we propose a strategy learning model for autonomous agents based on classification. In the literature, the most commonly used learning method in agent-based systems is reinforcement learning. In our opinion, classification can be considered a good alternative. This type of supervised learning can be used to generate a classifier that allows the agent to choose an appropriate action for execution. Experimental results show that this model can be successfully applied for strategy generation even if rewards are delayed. We compare the efficiency of the proposed model and reinforcement learning using the farmer-pest domain and configurations of various complexity. In complex environments, supervised learning can improve the performance of agents much faster than reinforcement learning. If an appropriate knowledge representation is used, the learned knowledge may be analyzed by humans, which allows tracking the learning process

  1. Dissociable neural representations of reinforcement and belief prediction errors underlie strategic learning.

    Science.gov (United States)

    Zhu, Lusha; Mathewson, Kyle E; Hsu, Ming

    2012-01-31

    Decision-making in the presence of other competitive intelligent agents is fundamental for social and economic behavior. Such decisions require agents to behave strategically, where in addition to learning about the rewards and punishments available in the environment, they also need to anticipate and respond to actions of others competing for the same rewards. However, whereas we know much about strategic learning at both theoretical and behavioral levels, we know relatively little about the underlying neural mechanisms. Here, we show using a multi-strategy competitive learning paradigm that strategic choices can be characterized by extending the reinforcement learning (RL) framework to incorporate agents' beliefs about the actions of their opponents. Furthermore, using this characterization to generate putative internal values, we used model-based functional magnetic resonance imaging to investigate neural computations underlying strategic learning. We found that the distinct notions of prediction errors derived from our computational model are processed in a partially overlapping but distinct set of brain regions. Specifically, we found that the RL prediction error was correlated with activity in the ventral striatum. In contrast, activity in the ventral striatum, as well as the rostral anterior cingulate (rACC), was correlated with a previously uncharacterized belief-based prediction error. Furthermore, activity in rACC reflected individual differences in degree of engagement in belief learning. These results suggest a model of strategic behavior where learning arises from interaction of dissociable reinforcement and belief-based inputs.

  2. Pragmatically Framed Cross-Situational Noun Learning Using Computational Reinforcement Models.

    Science.gov (United States)

    Najnin, Shamima; Banerjee, Bonny

    2018-01-01

    Cross-situational learning and social pragmatic theories are prominent mechanisms for learning word meanings (i.e., word-object pairs). In this paper, the role of reinforcement is investigated for early word-learning by an artificial agent. When exposed to a group of speakers, the agent comes to understand an initial set of vocabulary items belonging to the language used by the group. Both cross-situational learning and social pragmatic theory are taken into account. As social cues, joint attention and prosodic cues in caregiver's speech are considered. During agent-caregiver interaction, the agent selects a word from the caregiver's utterance and learns the relations between that word and the objects in its visual environment. The "novel words to novel objects" language-specific constraint is assumed for computing rewards. The models are learned by maximizing the expected reward using reinforcement learning algorithms [i.e., table-based algorithms: Q-learning, SARSA, SARSA-λ, and neural network-based algorithms: Q-learning for neural network (Q-NN), neural-fitted Q-network (NFQ), and deep Q-network (DQN)]. Neural network-based reinforcement learning models are chosen over table-based models for better generalization and quicker convergence. Simulations are carried out using mother-infant interaction CHILDES dataset for learning word-object pairings. Reinforcement is modeled in two cross-situational learning cases: (1) with joint attention (Attentional models), and (2) with joint attention and prosodic cues (Attentional-prosodic models). Attentional-prosodic models manifest superior performance to Attentional ones for the task of word-learning. The Attentional-prosodic DQN outperforms existing word-learning models for the same task.
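
    The table-based algorithms named in the abstract differ mainly in how they bootstrap from the next state. The sketch below contrasts the standard Q-learning and SARSA updates on a dictionary-backed table; the state and action labels, the step size `alpha`, and the discount `gamma` are hypothetical, since the abstract does not describe how words and objects are encoded.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the best action in the next state."""
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually taken next."""
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])


# Hypothetical usage: an unseen (state, action) pair defaults to 0.0.
Q = defaultdict(float)
q_learning_update(Q, "s0", "x", 1.0, "s1", ["x", "y"])
```

    The two rules coincide whenever the next action chosen by the behaviour policy is also the greedy one.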

  3. Place preference and vocal learning rely on distinct reinforcers in songbirds.

    Science.gov (United States)

    Murdoch, Don; Chen, Ruidong; Goldberg, Jesse H

    2018-04-30

    In reinforcement learning (RL) agents are typically tasked with maximizing a single objective function such as reward. But it remains poorly understood how agents might pursue distinct objectives at once. In machines, multiobjective RL can be achieved by dividing a single agent into multiple sub-agents, each of which is shaped by agent-specific reinforcement, but it remains unknown if animals adopt this strategy. Here we use songbirds to test if navigation and singing, two behaviors with distinct objectives, can be differentially reinforced. We demonstrate that strobe flashes aversively condition place preference but not song syllables. Brief noise bursts aversively condition song syllables but positively reinforce place preference. Thus distinct behavior-generating systems, or agencies, within a single animal can be shaped by correspondingly distinct reinforcement signals. Our findings suggest that spatially segregated vocal circuits can solve a credit assignment problem associated with multiobjective learning.

  4. Personalised learning object based on multi-agent model and learners’ learning styles

    Directory of Open Access Journals (Sweden)

    Noppamas Pukkhem

    2011-09-01

    Full Text Available A multi-agent model is proposed in which learning styles and a word analysis technique to create a learning object recommendation system are used. On the basis of a learning style-based design, a concept map combination model is proposed to filter out unsuitable learning concepts from a given course. Our learner model classifies learners into eight styles and implements compatible computational methods consisting of three recommendations: (i) non-personalised, (ii) preferred feature-based, and (iii) neighbour-based collaborative filtering. The analysis of preference error (PE) was performed by comparing the actual preferred learning object with the predicted one. In our experiments, the feature-based recommendation algorithm has the fewest PE.

  5. Adversarial Reinforcement Learning in a Cyber Security Simulation

    OpenAIRE

    Elderman, Richard; Pater, Leon; Thie, Albert; Drugan, Madalina; Wiering, Marco

    2017-01-01

    This paper focuses on cyber-security simulations in networks modeled as a Markov game with incomplete information and stochastic elements. The resulting game is an adversarial sequential decision making problem played with two agents, the attacker and defender. The two agents pit one reinforcement learning technique, like neural networks, Monte Carlo learning and Q-learning, against each other and examine their effectiveness against learning opponents. The results showed that Monte Carlo lear...

  6. A Multi-Agent Control Architecture for a Robotic Wheelchair

    Directory of Open Access Journals (Sweden)

    C. Galindo

    2006-01-01

    Full Text Available Assistant robots like robotic wheelchairs can perform effective and valuable work in our daily lives. However, they eventually may need external help from humans in the robot environment (particularly, the driver in the case of a wheelchair) to accomplish safely and efficiently some tasks that are tricky for current technology, e.g. opening a locked door, traversing a crowded area, etc. This article proposes a control architecture for assistant robots designed under a multi-agent perspective that facilitates the participation of humans in the robotic system and improves the overall performance of the robot as well as its dependability. Within our design, agents have their own intentions and beliefs, have different abilities (that include algorithmic behaviours and human skills) and also learn autonomously the most convenient method to carry out their actions through reinforcement learning. The proposed architecture is illustrated with a real assistant robot: a robotic wheelchair that provides mobility to impaired or elderly people.

  7. Reinforcement Learning Based on the Bayesian Theorem for Electricity Markets Decision Support

    DEFF Research Database (Denmark)

    Sousa, Tiago; Pinto, Tiago; Praca, Isabel

    2014-01-01

    This paper presents the applicability of a reinforcement learning algorithm based on the application of the Bayesian theorem of probability. The proposed reinforcement learning algorithm is an advantageous and indispensable tool for ALBidS (Adaptive Learning strategic Bidding System), a multi...

  8. Learning in engineered multi-agent systems

    Science.gov (United States)

    Menon, Anup

    Consider the problem of maximizing the total power produced by a wind farm. Due to aerodynamic interactions between wind turbines, each turbine maximizing its individual power (as is the case in present-day wind farms) does not lead to optimal farm-level power capture. Further, there are no good models to capture the said aerodynamic interactions, rendering model-based optimization techniques ineffective. Thus, model-free distributed algorithms are needed that help turbines adapt their power production on-line so as to maximize farm-level power capture. Motivated by such problems, the main focus of this dissertation is a distributed model-free optimization problem in the context of multi-agent systems. The set-up comprises a fixed number of agents, each of which can pick an action and observe the value of its individual utility function. An individual's utility function may depend on the collective action taken by all agents. The exact functional form (or model) of the agent utility functions, however, is unknown; an agent can only measure the numeric value of its utility. The objective of the multi-agent system is to optimize the welfare function (i.e. sum of the individual utility functions). Such a collaborative task requires communications between agents and we allow for the possibility of such inter-agent communications. We also pay attention to the role played by the pattern of such information exchange on certain aspects of performance. We develop two algorithms to solve this problem. The first one, engineered Interactive Trial and Error Learning (eITEL) algorithm, is based on a line of work in the Learning in Games literature and applies when agent actions are drawn from finite sets. While in a model-free setting, we introduce a novel qualitative graph-theoretic framework to encode known directed interactions of the form "which agents' action affect which others' payoff" (interaction graph). 
We encode explicit inter-agent communications in a directed

  9. Learning Agent for a Heat-Pump Thermostat with a Set-Back Strategy Using Model-Free Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Frederik Ruelens

    2015-08-01

    Full Text Available The conventional control paradigm for a heat pump with a less efficient auxiliary heating element is to keep its temperature set point constant during the day. This constant temperature set point ensures that the heat pump operates in its more efficient heat-pump mode and minimizes the risk of activating the less efficient auxiliary heating element. As an alternative to a constant set-point strategy, this paper proposes a learning agent for a thermostat with a set-back strategy. This set-back strategy relaxes the set-point temperature during convenient moments, e.g., when the occupants are not at home. Finding an optimal set-back strategy requires solving a sequential decision-making process under uncertainty, which presents two challenges. The first challenge is that for most residential buildings, a description of the thermal characteristics of the building is unavailable and challenging to obtain. The second challenge is that the relevant information on the state, i.e., the building envelope, cannot be measured by the learning agent. In order to overcome these two challenges, our paper proposes an auto-encoder coupled with a batch reinforcement learning technique. The proposed approach is validated for two building types with different thermal characteristics for heating in the winter and cooling in the summer. The simulation results indicate that the proposed learning agent can reduce the energy consumption by 4%–9% during 100 winter days and by 9%–11% during 80 summer days compared to the conventional constant set-point strategy.

  10. Study and Application of Reinforcement Learning in Cooperative Strategy of the Robot Soccer Based on BDI Model

    Directory of Open Access Journals (Sweden)

    Wu Bo-ying

    2009-11-01

    Full Text Available A dynamic multi-Agent cooperation model is formed by combining reinforcement learning with the BDI model. In this model, the concept of individual optimization loses its meaning, because the payoff of each Agent depends not only on itself but also on the choices of other Agents. All Agents can pursue a common optimum solution and try to realize the united intention as a whole to a maximum limit. The robot moves to its goal, depending on the present positions of the other robots that cooperate with it and the present position of the ball. One of the robots cooperating with it is controlled by a human with a joystick. In this way, the Agent can be ensured to search each state-action pair as frequently as possible when it chooses movements, shortening the time spent searching the movement space so that the convergence speed of reinforcement learning can be improved. The validity of the proposed cooperative strategy for robot soccer has been proved by combining theoretical analysis with a simulated robot soccer match (11 vs 11).

  11. Efficient abstraction selection in reinforcement learning

    NARCIS (Netherlands)

    Seijen, H. van; Whiteson, S.; Kester, L.

    2013-01-01

    This paper introduces a novel approach for abstraction selection in reinforcement learning problems modelled as factored Markov decision processes (MDPs), for which a state is described via a set of state components. In abstraction selection, an agent must choose an abstraction from a set of

  12. Multi-Objective Reinforcement Learning-Based Deep Neural Networks for Cognitive Space Communications

    Science.gov (United States)

    Ferreria, Paulo Victor R.; Paffenroth, Randy; Wyglinski, Alexander M.; Hackett, Timothy M.; Bilen, Sven G.; Reinhart, Richard C.; Mortensen, Dale J.

    2017-01-01

    Future communication subsystems of space exploration missions can potentially benefit from software-defined radios (SDRs) controlled by machine learning algorithms. In this paper, we propose a novel hybrid radio resource allocation management control algorithm that integrates multi-objective reinforcement learning and deep artificial neural networks. The objective is to efficiently manage communications system resources by monitoring performance functions with common dependent variables that result in conflicting goals. The uncertainty in the performance of thousands of different possible combinations of radio parameters makes the trade-off between exploration and exploitation in reinforcement learning (RL) much more challenging for future critical space-based missions. Thus, the system should spend as little time as possible on exploring actions, and whenever it explores an action, it should perform at acceptable levels most of the time. The proposed approach enables on-line learning by interactions with the environment and restricts poor resource allocation performance through virtual environment exploration. Improvements in the multiobjective performance can be achieved via transmitter parameter adaptation on a packet-basis, with poorly predicted performance promptly resulting in rejected decisions. Simulations presented in this work considered the DVB-S2 standard adaptive transmitter parameters and additional ones expected to be present in future adaptive radio systems. Performance results are provided by analysis of the proposed hybrid algorithm when operating across a satellite communication channel from Earth to GEO orbit during clear sky conditions. The proposed approach constitutes part of the core cognitive engine proof-of-concept to be delivered to the NASA Glenn Research Center SCaN Testbed located onboard the International Space Station.

  13. Reinforcement Learning in the Game of Othello: Learning Against a Fixed Opponent and Learning from Self-Play

    NARCIS (Netherlands)

    van der Ree, Michiel; Wiering, Marco

    2013-01-01

    This paper compares three strategies in using reinforcement learning algorithms to let an artificial agent learn to play the game of Othello. The three strategies that are compared are: Learning by self-play, learning from playing against a fixed opponent, and learning from playing against a fixed

  14. Multiobjective Reinforcement Learning for Traffic Signal Control Using Vehicular Ad Hoc Network

    Directory of Open Access Journals (Sweden)

    Houli Duan

    2010-01-01

    Full Text Available We propose a new multiobjective control algorithm based on reinforcement learning for urban traffic signal control, named multi-RL. A multiagent structure is used to describe the traffic system. A vehicular ad hoc network is used for the data exchange among agents. A reinforcement learning algorithm is applied to predict the overall value of the optimization objective given vehicles' states. The policy which minimizes the cumulative value of the optimization objective is regarded as the optimal one. In order to make the method adaptive to various traffic conditions, we also introduce a multiobjective control scheme in which the optimization objective is selected adaptively to real-time traffic states. The optimization objectives include the vehicle stops, the average waiting time, and the maximum queue length of the next intersection. In addition, we also accommodate a priority control to the buses and the emergency vehicles through our model. The simulation results indicated that our algorithm could perform more efficiently than traditional traffic light control methods.

  15. Proposed Methodology for Application of Human-like gradual Multi-Agent Q-Learning (HuMAQ) for Multi-robot Exploration

    International Nuclear Information System (INIS)

    Ray, Dip Narayan; Majumder, Somajyoti

    2014-01-01

    Several attempts have been made by researchers around the world to develop autonomous exploration techniques for robots, but developing such algorithms for unstructured and unknown environments has always been an important issue. Human-like gradual Multi-agent Q-learning (HuMAQ) is a technique developed for autonomous robotic exploration in unknown (and even unimaginable) environments. It has been successfully implemented in a multi-agent single robotic system. HuMAQ uses the concept of Subsumption architecture, a well-known Behaviour-based architecture, for prioritizing the agents of the multi-agent system and executes only the most common action out of all the different actions recommended by different agents. Instead of using a new state-action table (Q-table) each time, HuMAQ uses the immediate past table for efficient and faster exploration. The proof of learning has also been established both theoretically and practically. HuMAQ has the potential to be used in different and difficult situations as well as applications. The same architecture has been modified for use in multi-robot exploration in an environment. Apart from all other existing agents used in the single robotic system, agents for inter-robot communication and coordination/co-operation with the other similar robots have been introduced in the present research. Current work uses a series of indigenously developed identical autonomous robotic systems, communicating with each other through ZigBee protocol
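
    The abstract says only that the most common recommended action is executed and that agents are prioritized. One possible reading is sketched below as a majority vote with ties broken in favour of the higher-priority agent. The function name `select_action`, the priority-ordered input list, and the tie-break rule are all assumptions made for illustration, not the published HuMAQ procedure.

```python
from collections import Counter


def select_action(recommendations):
    """Return the most frequently recommended action.

    `recommendations` lists one action per agent, ordered from the
    highest-priority agent to the lowest (assumed convention). Ties in
    the vote are resolved in favour of the higher-priority agent.
    """
    if not recommendations:
        raise ValueError("at least one recommendation is required")
    counts = Counter(recommendations)
    top = max(counts.values())
    # Scan in priority order so the first action reaching the top count wins.
    for action in recommendations:
        if counts[action] == top:
            return action
```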

  16. Behavior Self-Organization in Multi-Agent Learning

    National Research Council Canada - National Science Library

    Bay, John

    1999-01-01

    There are four primary results of the first year of the project: It was discovered that clustering algorithms for pre-sorting high-dimensional datasets were not effective in improving subsequent processing by reinforcement learning methods...

  17. Product Distribution Theory for Control of Multi-Agent Systems

    Science.gov (United States)

    Lee, Chia Fan; Wolpert, David H.

    2004-01-01

    Product Distribution (PD) theory is a new framework for controlling Multi-Agent Systems (MAS's). First we review one motivation of PD theory, as the information-theoretic extension of conventional full-rationality game theory to the case of bounded rational agents. In this extension the equilibrium of the game is the optimizer of a Lagrangian of the (probability distribution of the) joint state of the agents. Accordingly we can consider a team game in which the shared utility is a performance measure of the behavior of the MAS. For such a scenario the game is at equilibrium - the Lagrangian is optimized - when the joint distribution of the agents optimizes the system's expected performance. One common way to find that equilibrium is to have each agent run a reinforcement learning algorithm. Here we investigate the alternative of exploiting PD theory to run gradient descent on the Lagrangian. We present computer experiments validating some of the predictions of PD theory for how best to do that gradient descent. We also demonstrate how PD theory can improve performance even when we are not allowed to rerun the MAS from different initial conditions, a requirement implicit in some previous work.

  18. Optimal Wonderful Life Utility Functions in Multi-Agent Systems

    Science.gov (United States)

    Wolpert, David H.; Tumer, Kagan; Swanson, Keith (Technical Monitor)

    2000-01-01

    The mathematics of Collective Intelligence (COINs) is concerned with the design of multi-agent systems so as to optimize an overall global utility function when those systems lack centralized communication and control. Typically in COINs each agent runs a distinct Reinforcement Learning (RL) algorithm, so that much of the design problem reduces to how best to initialize/update each agent's private utility function, as far as the ensuing value of the global utility is concerned. Traditional team game solutions to this problem assign to each agent the global utility as its private utility function. In previous work we used the COIN framework to derive the alternative Wonderful Life Utility (WLU), and experimentally established that having the agents use it induces global utility performance up to orders of magnitude superior to that induced by use of the team game utility. The WLU has a free parameter (the clamping parameter) which we simply set to zero in that previous work. Here we derive the optimal value of the clamping parameter, and demonstrate experimentally that using that optimal value can result in significantly improved performance over that of clamping to zero, over and above the improvement beyond traditional approaches.
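
    The abstract does not spell out the WLU formula. A form commonly given in the COIN literature is the global utility of the actual joint state minus the global utility obtained when the agent's own component is clamped to a fixed value, and the sketch below assumes that form. The function names, the list-based joint state, and the toy global utility are illustrative assumptions; the optimal clamping value derived in the paper is not reproduced here.

```python
def wonderful_life_utility(global_utility, joint_state, agent, clamp_value=0):
    """Assumed form: WLU_i(z) = G(z) - G(z with component i clamped).

    global_utility: callable mapping a joint state (list) to a number.
    joint_state:    list with one entry per agent (hypothetical encoding).
    agent:          index of the agent whose private utility is computed.
    clamp_value:    the clamping parameter; 0 mirrors "clamping to zero".
    """
    clamped = list(joint_state)
    clamped[agent] = clamp_value
    return global_utility(joint_state) - global_utility(clamped)


# Toy global utility: the sum of all agents' contributions.
wlu = wonderful_life_utility(sum, [1, 2, 3], agent=1)
```

    With an additive global utility, as in this toy case, the WLU of an agent reduces to that agent's own contribution, which is why it can be a cleaner learning signal than the shared team utility.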

  19. Multi-issue Agent Negotiation Based on Fairness

    Science.gov (United States)

    Zuo, Baohe; Zheng, Sue; Wu, Hong

    Agent-based e-commerce services have become a research hotspot. The main research direction in this area is how to make the agent negotiation process fast and efficient. In the multi-issue model, MAUT (Multi-Attribute Utility Theory) and its derived theories usually give little consideration to fairness between the two negotiators. This work presents a general model of agent negotiation that considers the satisfaction of both negotiators via autonomous learning. The model can evaluate offers from the opponent agent based on the degree of satisfaction, learn online about the opponent from historical interaction instances and from the current negotiation, and make concessions dynamically based on a fairness objective. By building the optimal negotiation model, the bilateral negotiation achieves higher efficiency and a fairer deal.

  20. Multiagent cooperation and competition with deep reinforcement learning.

    Directory of Open Access Journals (Sweden)

    Ardi Tampuu

    Evolution of cooperation and competition can appear when multiple adaptive agents share a biological, social, or technological niche. In the present work we study how cooperation and competition emerge between autonomous agents that learn by reinforcement while using only their raw visual input as the state representation. In particular, we extend the Deep Q-Learning framework to multiagent environments to investigate the interaction between two learning agents in the well-known video game Pong. By manipulating the classical rewarding scheme of Pong we show how competitive and collaborative behaviors emerge. We also describe the progression from competitive to collaborative behavior when the incentive to cooperate is increased. Finally we show how learning by playing against another adaptive agent, instead of against a hard-wired algorithm, results in more robust strategies. The present work shows that Deep Q-Networks can become a useful tool for studying decentralized learning of multiagent systems coping with high-dimensional environments.

  1. Multiagent cooperation and competition with deep reinforcement learning

    Science.gov (United States)

    Kodelja, Dorian; Kuzovkin, Ilya; Korjus, Kristjan; Aru, Juhan; Aru, Jaan; Vicente, Raul

    2017-01-01

    Evolution of cooperation and competition can appear when multiple adaptive agents share a biological, social, or technological niche. In the present work we study how cooperation and competition emerge between autonomous agents that learn by reinforcement while using only their raw visual input as the state representation. In particular, we extend the Deep Q-Learning framework to multiagent environments to investigate the interaction between two learning agents in the well-known video game Pong. By manipulating the classical rewarding scheme of Pong we show how competitive and collaborative behaviors emerge. We also describe the progression from competitive to collaborative behavior when the incentive to cooperate is increased. Finally we show how learning by playing against another adaptive agent, instead of against a hard-wired algorithm, results in more robust strategies. The present work shows that Deep Q-Networks can become a useful tool for studying decentralized learning of multiagent systems coping with high-dimensional environments. PMID:28380078

  2. Multiagent cooperation and competition with deep reinforcement learning.

    Science.gov (United States)

    Tampuu, Ardi; Matiisen, Tambet; Kodelja, Dorian; Kuzovkin, Ilya; Korjus, Kristjan; Aru, Juhan; Aru, Jaan; Vicente, Raul

    2017-01-01

    Evolution of cooperation and competition can appear when multiple adaptive agents share a biological, social, or technological niche. In the present work we study how cooperation and competition emerge between autonomous agents that learn by reinforcement while using only their raw visual input as the state representation. In particular, we extend the Deep Q-Learning framework to multiagent environments to investigate the interaction between two learning agents in the well-known video game Pong. By manipulating the classical rewarding scheme of Pong we show how competitive and collaborative behaviors emerge. We also describe the progression from competitive to collaborative behavior when the incentive to cooperate is increased. Finally we show how learning by playing against another adaptive agent, instead of against a hard-wired algorithm, results in more robust strategies. The present work shows that Deep Q-Networks can become a useful tool for studying decentralized learning of multiagent systems coping with high-dimensional environments.

  3. An Analysis of Stochastic Game Theory for Multiagent Reinforcement Learning

    National Research Council Canada - National Science Library

    Bowling, Michael

    2000-01-01

    .... In this paper we contribute a comprehensive presentation of the relevant techniques for solving stochastic games from both the game theory community and reinforcement learning communities. We examine the assumptions and limitations of these algorithms, and identify similarities between these algorithms, single agent reinforcement learners, and basic game theory techniques.

  4. A Two-Stage Multi-Agent Based Assessment Approach to Enhance Students' Learning Motivation through Negotiated Skills Assessment

    Science.gov (United States)

    Chadli, Abdelhafid; Bendella, Fatima; Tranvouez, Erwan

    2015-01-01

    In this paper we present an Agent-based evaluation approach in a context of Multi-agent simulation learning systems. Our evaluation model is based on a two stage assessment approach: (1) a Distributed skill evaluation combining agents and fuzzy sets theory; and (2) a Negotiation based evaluation of students' performance during a training…

  5. 14th International Conference on Practical Applications of Agents and Multi-Agent Systems : Special Sessions

    CERN Document Server

    Escalona, María; Corchuelo, Rafael; Mathieu, Philippe; Vale, Zita; Campbell, Andrew; Rossi, Silvia; Adam, Emmanuel; Jiménez-López, María; Navarro, Elena; Moreno, María

    2016-01-01

    PAAMS, the International Conference on Practical Applications of Agents and Multi-Agent Systems, is an evolution of the International Workshop on Practical Applications of Agents and Multi-Agent Systems. PAAMS is an international yearly tribune to present, to discuss, and to disseminate the latest developments and the most important outcomes related to real-world applications. It provides a unique opportunity to bring multi-disciplinary experts, academics and practitioners together to exchange their experience in the development of Agents and Multi-Agent Systems. This volume presents the papers that have been accepted for the 2016 edition in the special sessions: Agents Behaviours and Artificial Markets (ABAM); Advances on Demand Response and Renewable Energy Sources in Agent Based Smart Grids (ADRESS); Agents and Mobile Devices (AM); Agent Methodologies for Intelligent Robotics Applications (AMIRA); Learning, Agents and Formal Languages (LAFLang); Multi-Agent Systems and Ambient Intelligence (MASMAI); Web Mining and ...

  6. Iterative learning control for multi-agent systems coordination

    CERN Document Server

    Yang, Shiping; Li, Xuefang; Shen, Dong

    2016-01-01

    A timely guide using iterative learning control (ILC) as a solution for multi-agent systems (MAS) challenges, this book showcases recent advances and industrially relevant applications. Readers are first given a comprehensive overview of the intersection between ILC and MAS, then introduced to a range of topics that include both basic and advanced theoretical discussions, rigorous mathematics, engineering practice, and both linear and nonlinear systems. Through systematic discussion of network theory and intelligent control, the authors explore future research possibilities, develop new tools, and provide numerous applications such as power grids, communication and sensor networks, intelligent transportation systems, and formation control. Readers will gain a roadmap of the latest advances in the fields and can use their newfound knowledge to design their own algorithms.

  7. What Can Reinforcement Learning Teach Us About Non-Equilibrium Quantum Dynamics

    Science.gov (United States)

    Bukov, Marin; Day, Alexandre; Sels, Dries; Weinberg, Phillip; Polkovnikov, Anatoli; Mehta, Pankaj

    Equilibrium thermodynamics and statistical physics are the building blocks of modern science and technology. Yet, our understanding of thermodynamic processes away from equilibrium is largely missing. In this talk, I will reveal the potential of what artificial intelligence can teach us about the complex behaviour of non-equilibrium systems. Specifically, I will discuss the problem of finding optimal drive protocols to prepare a desired target state in quantum mechanical systems by applying ideas from Reinforcement Learning [one can think of Reinforcement Learning as the study of how an agent (e.g. a robot) can learn and perfect a given policy through interactions with an environment.]. The driving protocols learnt by our agent suggest that the non-equilibrium world features possibilities easily defying intuition based on equilibrium physics.

  8. Reinforcement learning: Solving two case studies

    Science.gov (United States)

    Duarte, Ana Filipa; Silva, Pedro; dos Santos, Cristina Peixoto

    2012-09-01

    Reinforcement Learning algorithms offer interesting features for the control of autonomous systems, such as the ability to learn from direct interaction with the environment, and the use of a simple reward signal as opposed to the input-output pairs used in classic supervised learning. The reward signal indicates the success or failure of the actions executed by the agent in the environment. In this work, RL algorithms applied to two case studies are described: the Crawler robot and the widely known inverted pendulum. We explore RL capabilities to autonomously learn a basic locomotion pattern in the Crawler, and approach the balancing problem of biped locomotion using the inverted pendulum.

  9. Learning Multirobot Hose Transportation and Deployment by Distributed Round-Robin Q-Learning.

    Directory of Open Access Journals (Sweden)

    Borja Fernandez-Gauna

    Multi-Agent Reinforcement Learning (MARL) algorithms face two main difficulties: the curse of dimensionality, and environment non-stationarity due to the independent learning processes carried out by the agents concurrently. In this paper we formalize and prove the convergence of a Distributed Round Robin Q-learning (D-RR-QL) algorithm for cooperative systems. The computational complexity of this algorithm increases linearly with the number of agents. Moreover, it eliminates environment non-stationarity by carrying out a round-robin scheduling of the action selection and execution. This learning scheme allows the implementation of Modular State-Action Vetoes (MSAV) in cooperative multi-agent systems, which speeds up learning convergence in over-constrained systems by vetoing state-action pairs which lead to undesired termination states (UTS) in the relevant state-action subspace. Each agent's local state-action value function learning is an independent process, including the MSAV policies. Coordination of locally optimal policies to obtain the global optimal joint policy is achieved by a greedy selection procedure using message passing. We show that D-RR-QL improves over state-of-the-art approaches, such as Distributed Q-Learning, Team Q-Learning and Coordinated Reinforcement Learning, in a paradigmatic Linked Multi-Component Robotic System (L-MCRS) control problem: the hose transportation task. L-MCRS are over-constrained systems with many UTS induced by the interaction of the passive linking element and the active mobile robots.

  10. A self-taught artificial agent for multi-physics computational model personalization.

    Science.gov (United States)

    Neumann, Dominik; Mansi, Tommaso; Itu, Lucian; Georgescu, Bogdan; Kayvanpour, Elham; Sedaghat-Hamedani, Farbod; Amr, Ali; Haas, Jan; Katus, Hugo; Meder, Benjamin; Steidl, Stefan; Hornegger, Joachim; Comaniciu, Dorin

    2016-12-01

    Personalization is the process of fitting a model to patient data, a critical step towards application of multi-physics computational models in clinical practice. Designing robust personalization algorithms is often a tedious, time-consuming, model- and data-specific process. We propose to use artificial intelligence concepts to learn this task, inspired by how human experts manually perform it. The problem is reformulated in terms of reinforcement learning. In an off-line phase, Vito, our self-taught artificial agent, learns a representative decision process model through exploration of the computational model: it learns how the model behaves under change of parameters. The agent then automatically learns an optimal strategy for on-line personalization. The algorithm is model-independent; applying it to a new model requires only adjusting few hyper-parameters of the agent and defining the observations to match. The full knowledge of the model itself is not required. Vito was tested in a synthetic scenario, showing that it could learn how to optimize cost functions generically. Then Vito was applied to the inverse problem of cardiac electrophysiology and the personalization of a whole-body circulation model. The obtained results suggested that Vito could achieve equivalent, if not better goodness of fit than standard methods, while being more robust (up to 11% higher success rates) and with faster (up to seven times) convergence rate. Our artificial intelligence approach could thus make personalization algorithms generalizable and self-adaptable to any patient and any model. Copyright © 2016. Published by Elsevier B.V.

  11. Algorithms for Reinforcement Learning

    CERN Document Server

    Szepesvari, Csaba

    2010-01-01

    Reinforcement learning is a learning paradigm concerned with learning to control a system so as to maximize a numerical performance measure that expresses a long-term objective. What distinguishes reinforcement learning from supervised learning is that only partial feedback is given to the learner about the learner's predictions. Further, the predictions may have long term effects through influencing the future state of the controlled system. Thus, time plays a special role. The goal in reinforcement learning is to develop efficient learning algorithms, as well as to understand the algorithms'

  12. 9th KES Conference on Agent and Multi-Agent Systems : Technologies and Applications

    CERN Document Server

    Howlett, Robert; Jain, Lakhmi

    2015-01-01

    Agents and multi-agent systems are related to a modern software paradigm which has long been recognized as a promising technology for constructing autonomous, complex and intelligent systems. The topics covered in this volume include agent-oriented software engineering, agent co-operation, co-ordination, negotiation, organization and communication, distributed problem solving, specification of agent communication languages, agent privacy, safety and security, formalization of ontologies and conversational agents. The volume highlights new trends and challenges in agent and multi-agent research and includes 38 papers classified in the following specific topics: learning paradigms, agent-based modeling and simulation, business model innovation and disruptive technologies, anthropic-oriented computing, serious games and business intelligence, design and implementation of intelligent agents and multi-agent systems, digital economy, and advances in networked virtual enterprises. Published p...

  13. Reinforcement learning for microgrid energy management

    International Nuclear Information System (INIS)

    Kuznetsova, Elizaveta; Li, Yan-Fu; Ruiz, Carlos; Zio, Enrico; Ault, Graham; Bell, Keith

    2013-01-01

    We consider a microgrid for energy distribution, with a local consumer, a renewable generator (wind turbine) and a storage facility (battery), connected to the external grid via a transformer. We propose a 2 steps-ahead reinforcement learning algorithm to plan the battery scheduling, which plays a key role in the achievement of the consumer goals. The underlying framework is one of multi-criteria decision-making by an individual consumer who has the goals of increasing the utilization rate of the battery during high electricity demand (so as to decrease the electricity purchase from the external grid) and increasing the utilization rate of the wind turbine for local use (so as to increase the consumer independence from the external grid). Predictions of available wind power feed the reinforcement learning algorithm for selecting the optimal battery scheduling actions. The embedded learning mechanism enhances the consumer's knowledge about the optimal actions for battery scheduling under different time-dependent environmental conditions. The developed framework gives intelligent consumers the capability to learn the stochastic environment and make use of the experience to select optimal energy management actions. - Highlights: • A consumer exploits a 2 steps-ahead reinforcement learning for battery scheduling. • The Q-learning based mechanism is fed by the predictions of available wind power. • Wind speed state evolutions are modeled with a Markov chain model. • Optimal scheduling actions are learned through the occurrence of similar scenarios. • The consumer shows a continuous enhancement of knowledge about optimal actions

  14. Manufacturing Scheduling Using Colored Petri Nets and Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Maria Drakaki

    2017-02-01

    Agent-based intelligent manufacturing control systems are capable of efficiently responding and adapting to environmental changes. Manufacturing system adaptation and evolution can be addressed with learning mechanisms that increase the intelligence of agents. In this paper a manufacturing scheduling method is presented based on Timed Colored Petri Nets (CTPNs) and reinforcement learning (RL). CTPNs model the manufacturing system and implement the scheduling. In the search for an optimal solution a scheduling agent uses RL and in particular the Q-learning algorithm. A warehouse order-picking scheduling is presented as a case study to illustrate the method. The proposed scheduling method is compared to existing methods. Simulation and state space results are used to evaluate performance and identify system properties.
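
    For readers unfamiliar with the Q-learning algorithm named in this abstract, the standard tabular update rule can be sketched as follows. This is the textbook rule under stated assumptions, not the paper's scheduling agent: the state and action labels, the dictionary-based Q-table, and the learning-rate and discount values are hypothetical, and the Petri net model is not represented.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    """
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical order-picking example: unseen state-action pairs start at 0.
q_table = defaultdict(float)
actions = ["pick_order_a", "pick_order_b"]
new_value = q_update(q_table, "queue_2", "pick_order_a", reward=1.0,
                     next_state="queue_1", actions=actions, alpha=0.5)
```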

  15. An Interactive Tool for Creating Multi-Agent Systems and Interactive Agent-based Games

    DEFF Research Database (Denmark)

    Lund, Henrik Hautop; Pagliarini, Luigi

    2011-01-01

    Utilizing principles from parallel and distributed processing combined with inspiration from modular robotics, we developed the modular interactive tiles. As an educational tool, the modular interactive tiles facilitate the learning of multi-agent systems and interactive agent-based games...

  16. FY1995 distributed control of man-machine cooperative multi agent systems; 1995 nendo ningen kyochogata multi agent kikai system no jiritsu seigyo

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1997-03-01

    In the near future, distributed autonomous systems will be practical in many settings, e.g., interactive production systems, hazardous environments, nursing homes, and individual houses. The agents that make up the distributed system must not harm human beings and should operate economically. In this project, man-machine cooperative multi-agent systems are studied from many perspectives, and basic design technology and basic control techniques are developed by establishing fundamental theories and by constructing experimental systems. Theoretical and experimental studies are conducted in the following sub-projects: (1) distributed cooperative control in multi-agent type actuation systems; (2) control of non-holonomic systems; (3) man-machine cooperative systems; (4) robot systems learning human skills; (5) robust force control of constrained systems. The sub-projects studied the cooperative nature between machine agent systems and human beings, interference between artificial multi-agents and the environment, the emergence of new functions in the coordination of the multi-agents and the environment, robust force control against the environment, control methods for non-holonomic systems, and robot systems which can mimic and learn human skills. In each sub-project, some problems were highlighted and solutions to them were given based on the construction of experimental systems. (NEDO)

  17. Cooperative learning neural network output feedback control of uncertain nonlinear multi-agent systems under directed topologies

    Science.gov (United States)

    Wang, W.; Wang, D.; Peng, Z. H.

    2017-09-01

    Without assuming that the communication topologies among the neural network (NN) weights are to be undirected and the states of each agent are measurable, the cooperative learning NN output feedback control is addressed for uncertain nonlinear multi-agent systems with identical structures in strict-feedback form. By establishing directed communication topologies among NN weights to share their learned knowledge, NNs with cooperative learning laws are employed to identify the uncertainties. By designing NN-based κ-filter observers to estimate the unmeasurable states, a new cooperative learning output feedback control scheme is proposed to guarantee that the system outputs can track nonidentical reference signals with bounded tracking errors. A simulation example is given to demonstrate the effectiveness of the theoretical results.

  18. An adaptive multi-agent-based approach to smart grids control and optimization

    Energy Technology Data Exchange (ETDEWEB)

    Carvalho, Marco [Florida Institute of Technology, Melbourne, FL (United States); Perez, Carlos; Granados, Adrian [Institute for Human and Machine Cognition, Ocala, FL (United States)

    2012-03-15

    In this paper, we describe a reinforcement learning-based approach to power management in smart grids. The scenarios we consider are smart grid settings where renewable power sources (e.g. Photovoltaic panels) have unpredictable variations in power output due, for example, to weather or cloud transient effects. Our approach builds on a multi-agent system (MAS)-based infrastructure for the monitoring and coordination of smart grid environments with renewable power sources and configurable energy storage devices (battery banks). Software agents are responsible for tracking and reporting power flow variations at different points in the grid, and to optimally coordinate the engagement of battery banks (i.e. charge/idle/discharge modes) to maintain energy requirements to end-users. Agents are able to share information and coordinate control actions through a parallel communications infrastructure, and are also capable of learning, from experience, how to improve their response strategies for different operational conditions. In this paper we describe our approach and address some of the challenges associated with the communications infrastructure for distributed coordination. We also present some preliminary results of our first simulations using the GridLAB-D simulation environment, created by the US Department of Energy (DoE) at Pacific Northwest National Laboratory (PNNL). (orig.)

  19. 'Proactive' use of cue-context congruence for building reinforcement learning's reward function.

    Science.gov (United States)

    Zsuga, Judit; Biro, Klara; Tajti, Gabor; Szilasi, Magdolna Emma; Papp, Csaba; Juhasz, Bela; Gesztelyi, Rudolf

    2016-10-28

    Reinforcement learning is a fundamental form of learning that may be formalized using the Bellman equation. Accordingly an agent determines the state value as the sum of immediate reward and of the discounted value of future states. Thus the value of state is determined by agent related attributes (action set, policy, discount factor) and the agent's knowledge of the environment embodied by the reward function and hidden environmental factors given by the transition probability. The central objective of reinforcement learning is to solve these two functions outside the agent's control either using, or not using a model. In the present paper, using the proactive model of reinforcement learning we offer insight on how the brain creates simplified representations of the environment, and how these representations are organized to support the identification of relevant stimuli and action. Furthermore, we identify neurobiological correlates of our model by suggesting that the reward and policy functions, attributes of the Bellman equation, are built by the orbitofrontal cortex (OFC) and the anterior cingulate cortex (ACC), respectively. Based on this we propose that the OFC assesses cue-context congruence to activate the most context frame. Furthermore given the bidirectional neuroanatomical link between the OFC and model-free structures, we suggest that model-based input is incorporated into the reward prediction error (RPE) signal, and conversely RPE signal may be used to update the reward-related information of context frames and the policy underlying action selection in the OFC and ACC, respectively. Furthermore clinical implications for cognitive behavioral interventions are discussed.
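
    For reference, the Bellman equation for the state value under a policy is usually written as below. The notation is the standard textbook one and is an assumption here, since the abstract gives no symbols: \pi is the policy, P the transition probability, R the reward function, and \gamma the discount factor, matching the attributes the abstract lists.

```latex
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)\,
             \bigl[ R(s, a, s') + \gamma\, V^{\pi}(s') \bigr]
```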

  20. Adaptive, Distributed Control of Constrained Multi-Agent Systems

    Science.gov (United States)

    Bieniawski, Stefan; Wolpert, David H.

    2004-01-01

    Product Distribution (PD) theory was recently developed as a broad framework for analyzing and optimizing distributed systems. Here we demonstrate its use for adaptive distributed control of Multi-Agent Systems (MAS's), i.e., for distributed stochastic optimization using MAS's. First we review one motivation of PD theory, as the information-theoretic extension of conventional full-rationality game theory to the case of bounded rational agents. In this extension the equilibrium of the game is the optimizer of a Lagrangian of the (probability distribution on the) joint state of the agents. When the game in question is a team game with constraints, that equilibrium optimizes the expected value of the team game utility, subject to those constraints. One common way to find that equilibrium is to have each agent run a Reinforcement Learning (RL) algorithm. PD theory reveals this to be a particular type of search algorithm for minimizing the Lagrangian. Typically that algorithm is quite inefficient. A more principled alternative is to use a variant of Newton's method to minimize the Lagrangian. Here we compare this alternative to RL-based search in three sets of computer experiments. These are the N Queens problem and bin-packing problem from the optimization literature, and the Bar problem from the distributed RL literature. Our results confirm that the PD-theory-based approach outperforms the RL-based scheme in all three domains.

  1. Reusable Reinforcement Learning via Shallow Trails.

    Science.gov (United States)

    Yu, Yang; Chen, Shi-Yong; Da, Qing; Zhou, Zhi-Hua

    2018-06-01

    Reinforcement learning has shown great success in helping learning agents accomplish tasks autonomously from environment interactions. Meanwhile, in many real-world applications, an agent needs to accomplish not only a fixed task but also a range of tasks. For this goal, an agent can learn a metapolicy over a set of training tasks that are drawn from an underlying distribution. By maximizing the total reward summed over all the training tasks, the metapolicy can then be reused in accomplishing test tasks from the same distribution. However, in practice, we face two major obstacles to training and reusing metapolicies well. The first is how to identify tasks that are unrelated or even opposed to each other, in order to avoid their mutual interference in the training. The second is how to characterize task features, according to which a metapolicy can be reused. In this paper, we propose the MetA-Policy LEarning (MAPLE) approach that overcomes the two difficulties by introducing the shallow trail. It probes a task by running a roughly trained policy. Using the rewards of the shallow trail, MAPLE automatically groups similar tasks. Moreover, when the task parameters are unknown, the rewards of the shallow trail also serve as task features. Empirical studies on several control tasks verify that MAPLE can train metapolicies well and receive high reward on test tasks.

  2. The Study of Reinforcement Learning for Traffic Self-Adaptive Control under Multiagent Markov Game Environment

    Directory of Open Access Journals (Sweden)

    Lun-Hui Xu

    2013-01-01

    The urban traffic self-adaptive control problem is dynamic and uncertain, so the states of the traffic environment are hard to observe. An efficient agent that controls a single intersection can be discovered automatically via multiagent reinforcement learning. However, in the majority of the previous works on this approach, each agent needed perfectly observed information when interacting with the environment and learned individually with less efficient coordination. This study casts traffic self-adaptive control as a multiagent Markov game problem. The design employs a traffic signal control agent (TSCA) for each signalized intersection that coordinates with neighboring TSCAs. A mathematical model for TSCAs’ interaction is built based on a nonzero-sum Markov game, which has been applied to let TSCAs learn how to cooperate. A multiagent Markov game reinforcement learning approach is constructed on the basis of single-agent Q-learning. This method lets each TSCA learn to update its Q-values under the joint actions and imperfect information. The convergence of the proposed algorithm is analyzed theoretically. The simulation results show that the proposed method is convergent and effective in a realistic traffic self-adaptive control setting.

  3. Online Behavior Acquisition of an Agent based on Coaching as Learning Assistance

    Science.gov (United States)

    Hirokawa, Masakazu; Suzuki, Kenji

    This paper describes a novel methodology, namely "Coaching", which allows humans to give a subjective evaluation to an agent in an iterative manner. This is an interactive learning method to improve the reinforcement learning by modifying a reward function dynamically according to given evaluations by a trainer and the learning situation of the agent. We demonstrate that the agent can learn different reward functions by given instructions such as "good or bad" by human's observation, and can also obtain a set of behavior based on the learnt reward functions through several experiments.

  4. Reinforcement learning account of network reciprocity.

    Science.gov (United States)

    Ezaki, Takahiro; Masuda, Naoki

    2017-01-01

    Evolutionary game theory predicts that cooperation in social dilemma games is promoted when agents are connected as a network. However, when networks are fixed over time, humans do not necessarily show enhanced mutual cooperation. Here we show that reinforcement learning (specifically, the so-called Bush-Mosteller model) approximately explains the experimentally observed network reciprocity and the lack thereof in a parameter region spanned by the benefit-to-cost ratio and the node's degree. Thus, we significantly extend previously obtained numerical results.
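
    As a rough illustration of the kind of update rule the abstract refers to, the sketch below implements one common aspiration-based form of Bush-Mosteller learning for a two-action (cooperate/defect) game. The tanh-shaped stimulus, the aspiration level, the learning rate, and the function names are assumptions chosen for illustration; they are not taken from the paper.

```python
import math


def stimulus(payoff, aspiration, sensitivity=1.0):
    """Map the gap between payoff and aspiration to a stimulus in (-1, 1).

    The tanh shape and the `sensitivity` parameter are illustrative assumptions.
    """
    return math.tanh(sensitivity * (payoff - aspiration))


def bm_update(p_coop, cooperated, s, learning_rate=0.5):
    """Return the new probability of cooperating after one round.

    A satisfying outcome (s >= 0) reinforces the action just taken;
    a dissatisfying outcome (s < 0) pushes the agent away from it.
    """
    if cooperated:
        if s >= 0:
            return p_coop + (1 - p_coop) * learning_rate * s
        return p_coop + p_coop * learning_rate * s
    if s >= 0:
        return p_coop - p_coop * learning_rate * s
    return p_coop - (1 - p_coop) * learning_rate * s
```

    For example, an agent with `p_coop = 0.5` that cooperated and received a payoff above its aspiration would move toward cooperating more often, while the same agent defecting and being disappointed would also move toward cooperation.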

  5. Experiments with Online Reinforcement Learning in Real-Time Strategy Games

    DEFF Research Database (Denmark)

    Toftgaard Andersen, Kresten; Zeng, Yifeng; Dahl Christensen, Dennis

    2009-01-01

    Real-time strategy (RTS) games provide a challenging platform to implement online reinforcement learning (RL) techniques in a real application. Computer, as one game player, monitors opponents' (human or other computers) strategies and then updates its own policy using RL methods. In this article......, we first examine the suitability of applying the online RL in various computer games. Reinforcement learning application depends on both RL complexity and the game features. We then propose a multi-layer framework for implementing online RL in an RTS game. The framework significantly reduces RL...... the effectiveness of our proposed framework and shed light on relevant issues in using online RL in RTS games....

  6. Homeostatic Agent for General Environment

    Science.gov (United States)

    Yoshida, Naoto

    2018-03-01

    One of the essential aspects of biological agents is dynamic stability. This aspect, called homeostasis, is widely discussed in ethology, neuroscience and during the early stages of artificial intelligence. Ashby's homeostats are general-purpose learning machines for stabilizing essential variables of the agent in the face of general environments. However, despite their generality, the original homeostats couldn't be scaled because they searched their parameters randomly. In this paper, first we re-define the objective of homeostats as the maximization of a multi-step survival probability from the viewpoint of sequential decision theory and probability theory. Then we show that this optimization problem can be treated by using reinforcement learning algorithms with special agent architectures and theoretically-derived intrinsic reward functions. Finally we empirically demonstrate that agents with our architecture automatically learn to survive in a given environment, including environments with visual stimuli. Our survival agents can learn to eat food, avoid poison and stabilize essential variables through theoretically-derived single intrinsic reward formulations.

  7. Joy, Distress, Hope, and Fear in Reinforcement Learning (Extended Abstract)

    NARCIS (Netherlands)

    Jacobs, E.J.; Broekens, J.; Jonker, C.M.

    2014-01-01

    In this paper we present a mapping between joy, distress, hope and fear, and Reinforcement Learning primitives. Joy / distress is a signal that is derived from the RL update signal, while hope/fear is derived from the utility of the current state. Agent-based simulation experiments replicate

  8. 2015 Special Sessions of the 13th International Conference on Practical Applications of Agents and Multi-Agent Systems

    CERN Document Server

    Hernández, Josefa; Mathieu, Philippe; Campbell, Andrew; Fernández-Caballero, Antonio; Moreno, María; Julián, Vicente; Alonso-Betanzos, Amparo; Jiménez-López, María; Botti, Vicente; Trends in Practical Applications of Agents, Multi-Agent Systems and Sustainability : the PAAMS Collection

    2015-01-01

    This volume presents the papers that have been accepted for the 2015 special sessions of the 13th International Conference on Practical Applications of Agents and Multi-Agent Systems, held at the University of Salamanca, Spain, from 3rd to 5th June, 2015: Agents Behaviours and Artificial Markets (ABAM); Agents and Mobile Devices (AM); Multi-Agent Systems and Ambient Intelligence (MASMAI); Web Mining and Recommender systems (WebMiRes); Learning, Agents and Formal Languages (LAFLang); Agent-based Modeling of Sustainable Behavior and Green Economies (AMSBGE); Emotional Software Agents (SSESA) and Intelligent Educational Systems (SSIES). The volume also includes the paper accepted for the Doctoral Consortium in PAAMS 2015. PAAMS, the International Conference on Practical Applications of Agents and Multi-Agent Systems, is an evolution of the International Workshop on Practical Applications of Agents and Multi-Agent Systems. PAAMS is an international yearly tribune to present, to discuss, and to disseminate the latest develo...

  9. Reinforcement Learning for Ramp Control: An Analysis of Learning Parameters

    Directory of Open Access Journals (Sweden)

    Chao Lu

    2016-08-01

    Reinforcement Learning (RL) has been proposed to deal with ramp control problems under dynamic traffic conditions; however, there is a lack of sufficient research on the behaviour and impacts of different learning parameters. This paper describes a ramp control agent based on the RL mechanism and thoroughly analyzes the influence of three learning parameters, namely, learning rate, discount rate and action selection parameter, on the algorithm performance. Two indices for the learning speed and convergence stability were used to measure the algorithm performance, based on which a series of simulation-based experiments were designed and conducted by using a macroscopic traffic flow model. Simulation results showed that, compared with the discount rate, the learning rate and action selection parameter had a more remarkable impact on the algorithm performance. Based on the analysis, some suggestions about how to select suitable parameter values that can achieve a superior performance were provided.

  10. Reinforcement learning account of network reciprocity.

    Directory of Open Access Journals (Sweden)

    Takahiro Ezaki

    Evolutionary game theory predicts that cooperation in social dilemma games is promoted when agents are connected as a network. However, when networks are fixed over time, humans do not necessarily show enhanced mutual cooperation. Here we show that reinforcement learning (specifically, the so-called Bush-Mosteller model) approximately explains the experimentally observed network reciprocity and the lack thereof in a parameter region spanned by the benefit-to-cost ratio and the node's degree. Thus, we significantly extend previously obtained numerical results.

  11. Autonomous parsing of behavior in a multi-agent setting

    NARCIS (Netherlands)

    Vanderelst, D.; Barakova, E.I.; Rutkowski, L.; Tadeusiewicz, R.

    2008-01-01

    Imitation learning is a promising route to instruct robotic multi-agent systems. However, imitating agents should be able to decide autonomously what behavior, observed in others, is interesting to copy. Here we investigate whether a simple recurrent network (Elman Net) can be used to extract

  12. Multi-agent sequential hypothesis testing

    KAUST Repository

    Kim, Kwang-Ki K.

    2014-12-15

    This paper considers multi-agent sequential hypothesis testing and presents a framework for strategic learning in sequential games with explicit consideration of both temporal and spatial coordination. The associated Bayes risk functions explicitly incorporate costs of taking private/public measurements, costs of time-difference and disagreement in actions of agents, and costs of false declaration/choices in the sequential hypothesis testing. The corresponding sequential decision processes have well-defined value functions with respect to (a) the belief states for the case of conditional independent private noisy measurements that are also assumed to be independent identically distributed over time, and (b) the information states for the case of correlated private noisy measurements. A sequential investment game of strategic coordination and delay is also discussed as an application of the proposed strategic learning rules.

  13. Value learning through reinforcement : The basics of dopamine and reinforcement learning

    NARCIS (Netherlands)

    Daw, N.D.; Tobler, P.N.; Glimcher, P.W.; Fehr, E.

    2013-01-01

    This chapter provides an overview of reinforcement learning and temporal difference learning and relates these topics to the firing properties of midbrain dopamine neurons. First, we review the Rescorla-Wagner learning rule and basic learning phenomena, such as blocking, which the rule explains. Then
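
    To make the link between the Rescorla-Wagner rule and blocking concrete, here is a minimal sketch. It assumes a single shared learning rate for all cues and a simple two-phase design (cue 0 alone, then cues 0 and 1 together); the function name, trial encoding, and parameter values are illustrative and do not come from the chapter.

```python
def rescorla_wagner(trials, alpha=0.3, n_cues=2):
    """Learn associative strengths with the Rescorla-Wagner rule.

    Each trial is (present_cues, reward). All cues present on a trial share
    one prediction error: reward minus the summed strength of present cues.
    """
    v = [0.0] * n_cues
    for present, reward in trials:
        prediction = sum(v[i] for i in present)
        error = reward - prediction
        for i in present:
            v[i] += alpha * error
    return v


# Phase 1: cue 0 alone predicts reward. Phase 2: cues 0 and 1 together.
phase1 = [((0,), 1.0)] * 50
phase2 = [((0, 1), 1.0)] * 50
strengths = rescorla_wagner(phase1 + phase2)
```

    Because cue 0 already predicts the reward almost perfectly by the end of phase 1, the prediction error in phase 2 is close to zero and cue 1 acquires almost no strength, which is the blocking effect.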

  14. Construction of a Learning Agent Handling Its Rewards According to Environmental Situations

    Science.gov (United States)

    Moriyama, Koichi; Numao, Masayuki

    The authors aim at constructing an agent which learns appropriate actions in a Multi-Agent environment with and without social dilemmas. For this aim, the agent must have nonrationality that makes it give up its own profit when it should do that. Since there are many studies on rational learning that brings more and more profit, it is desirable to utilize them for constructing the agent. Therefore, we use a reward-handling manner that makes internal evaluation from the agent's rewards, and then the agent learns actions by a rational learning method with the internal evaluation. If the agent has only a fixed manner, however, it does not act well in the environment with and without dilemmas. Thus, the authors equip the agent with several reward-handling manners and criteria for selecting an effective one for the environmental situation. In the case of humans, what generates the internal evaluation is usually called emotion. Hence, this study also aims at throwing light on emotional activities of humans from a constructive view. In this paper, we divide a Multi-Agent environment into three situations and construct an agent having the reward-handling manners and the criteria. We observe that the agent acts well in all the three Multi-Agent situations composed of homogeneous agents.

  15. A novel multi-agent decentralized win or learn fast policy hill-climbing with eligibility trace algorithm for smart generation control of interconnected complex power grids

    International Nuclear Information System (INIS)

    Xi, Lei; Yu, Tao; Yang, Bo; Zhang, Xiaoshun

    2015-01-01

    Highlights: • Proposing a decentralized smart generation control scheme for the automatic generation control coordination. • A novel multi-agent learning algorithm is developed to resolve stochastic control problems in power systems. • A variable learning rate is introduced based on the framework of stochastic games. • A simulation platform is developed to test the performance of different algorithms. - Abstract: This paper proposes a multi-agent smart generation control scheme for the automatic generation control coordination in interconnected complex power systems. A novel multi-agent decentralized win or learn fast policy hill-climbing with eligibility trace algorithm is developed, which can effectively identify the optimal average policies via a variable learning rate under various operation conditions. Based on control performance standards, the proposed approach is implemented in a flexible multi-agent stochastic dynamic game-based smart generation control simulation platform. Based on the mixed strategy and average policy, it is highly adaptive in stochastic non-Markov environments and large time-delay systems, and can fulfill automatic generation control coordination in interconnected complex power systems in the presence of increasing penetration of decentralized renewable energy. Two case studies, on a two-area load–frequency control power system and on the China Southern Power Grid model, were carried out. Simulation results verify that the multi-agent smart generation control scheme based on the proposed approach can obtain optimal average policies and thus improve the closed-loop system performance, and can achieve a fast convergence rate with significant robustness compared with other methods.
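
    The "win or learn fast" idea behind the variable learning rate can be sketched for the simplest case, a single-state repeated game. This is only a generic WoLF policy hill-climbing step under stated assumptions: it omits the eligibility traces and the power-system model of the paper, uses no discounting, and the step sizes `delta_win` and `delta_lose` are hypothetical values.

```python
def wolf_phc_step(q, pi, pi_avg, visits, action, reward,
                  alpha=0.1, delta_win=0.01, delta_lose=0.04):
    """One stateless WoLF-PHC update; q, pi and pi_avg are modified in place.

    Returns the updated visit count.
    """
    n = len(q)
    # Value update for the action just played (no next state, no discount).
    q[action] += alpha * (reward - q[action])

    # Running average of the policies played so far.
    visits += 1
    for a in range(n):
        pi_avg[a] += (pi[a] - pi_avg[a]) / visits

    # "Winning" means the current policy scores better than the average one.
    current_value = sum(p * x for p, x in zip(pi, q))
    average_value = sum(p * x for p, x in zip(pi_avg, q))
    delta = delta_win if current_value > average_value else delta_lose

    # Move probability mass toward the greedy action, slowly when winning
    # and quickly when losing.
    best = max(range(n), key=lambda a: q[a])
    for a in range(n):
        if a != best:
            step = min(pi[a], delta / (n - 1))
            pi[a] -= step
            pi[best] += step
    return visits
```

    The only difference from plain policy hill-climbing is the choice between two step sizes, which is what lets the learner adapt cautiously when it is doing well and quickly when it is not.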

  16. Deep Reinforcement Learning: An Overview

    OpenAIRE

    Li, Yuxi

    2017-01-01

    We give an overview of recent exciting achievements of deep reinforcement learning (RL). We discuss six core elements, six important mechanisms, and twelve applications. We start with background of machine learning, deep learning and reinforcement learning. Next we discuss core RL elements, including value function, in particular, Deep Q-Network (DQN), policy, reward, model, planning, and exploration. After that, we discuss important mechanisms for RL, including attention and memory, unsuperv...

  17. Data Mining Process Optimization in Computational Multi-agent Systems

    OpenAIRE

    Kazík, O.; Neruda, R. (Roman)

    2015-01-01

    In this paper, we present an agent-based solution of metalearning problem which focuses on optimization of data mining processes. We exploit the framework of computational multi-agent systems in which various meta-learning problems have been already studied, e.g. parameter-space search or simple method recommendation. In this paper, we examine the effect of data preprocessing for machine learning problems. We perform the set of experiments in the search-space of data mining processes which is...

  18. FMRQ-A Multiagent Reinforcement Learning Algorithm for Fully Cooperative Tasks.

    Science.gov (United States)

    Zhang, Zhen; Zhao, Dongbin; Gao, Junwei; Wang, Dongqing; Dai, Yujie

    2017-06-01

    In this paper, we propose a multiagent reinforcement learning algorithm dealing with fully cooperative tasks. The algorithm is called frequency of the maximum reward Q-learning (FMRQ). FMRQ aims to achieve one of the optimal Nash equilibria so as to optimize the performance index in multiagent systems. The frequency of obtaining the highest global immediate reward instead of immediate reward is used as the reinforcement signal. With FMRQ each agent does not need the observation of the other agents' actions and only shares its state and reward at each step. We validate FMRQ through case studies of repeated games: four cases of two-player two-action and one case of three-player two-action. It is demonstrated that FMRQ can converge to one of the optimal Nash equilibria in these cases. Moreover, comparison experiments on tasks with multiple states and finite steps are conducted. One is box-pushing and the other is the distributed sensor network problem. Experimental results show that the proposed algorithm outperforms the others.

  19. Knowledge-Based Reinforcement Learning for Data Mining

    Science.gov (United States)

    Kudenko, Daniel; Grzes, Marek

    Data Mining is the process of extracting patterns from data. Two general avenues of research in the intersecting areas of agents and data mining can be distinguished. The first approach is concerned with mining an agent’s observation data in order to extract patterns, categorize environment states, and/or make predictions of future states. In this setting, data is normally available as a batch, and the agent’s actions and goals are often independent of the data mining task. The data collection is mainly considered as a side effect of the agent’s activities. Machine learning techniques applied in such situations fall into the class of supervised learning. In contrast, the second scenario occurs where an agent is actively performing the data mining, and is responsible for the data collection itself. For example, a mobile network agent is acquiring and processing data (where the acquisition may incur a certain cost), or a mobile sensor agent is moving in a (perhaps hostile) environment, collecting and processing sensor readings. In these settings, the tasks of the agent and the data mining are highly intertwined and interdependent (or even identical). Supervised learning is not a suitable technique for these cases. Reinforcement Learning (RL) enables an agent to learn from experience (in form of reward and punishment for explorative actions) and adapt to new situations, without a teacher. RL is an ideal learning technique for these data mining scenarios, because it fits the agent paradigm of continuous sensing and acting, and the RL agent is able to learn to make decisions on the sampling of the environment which provides the data. Nevertheless, RL still suffers from scalability problems, which have prevented its successful use in many complex real-world domains. The more complex the tasks, the longer it takes a reinforcement learning algorithm to converge to a good solution. For many real-world tasks, human expert knowledge is available. For example, human

  20. Rational and Mechanistic Perspectives on Reinforcement Learning

    Science.gov (United States)

    Chater, Nick

    2009-01-01

    This special issue describes important recent developments in applying reinforcement learning models to capture neural and cognitive function. But reinforcement learning, as a theoretical framework, can apply at two very different levels of description: "mechanistic" and "rational." Reinforcement learning is often viewed in mechanistic terms--as…

  1. Multi-agent models of spatial cognition, learning and complex choice behavior in urban environments

    NARCIS (Netherlands)

    Arentze, Theo; Timmermans, Harry; Portugali, J.

    2006-01-01

    This chapter provides an overview of ongoing research projects in the DDSS research program at TUE related to multi-agents. Projects include (a) the use of multi-agent models and concepts of artificial intelligence to develop models of activity-travel behavior; (b) the use of a multi-agent model to

  2. Off-Policy Reinforcement Learning for Synchronization in Multiagent Graphical Games.

    Science.gov (United States)

    Li, Jinna; Modares, Hamidreza; Chai, Tianyou; Lewis, Frank L; Xie, Lihua

    2017-10-01

    This paper develops an off-policy reinforcement learning (RL) algorithm to solve optimal synchronization of multiagent systems. This is accomplished by using the framework of graphical games. In contrast to traditional control protocols, which require complete knowledge of agent dynamics, the proposed off-policy RL algorithm is a model-free approach, in that it solves the optimal synchronization problem without requiring any knowledge of the agent dynamics. A prescribed control policy, called the behavior policy, is applied to each agent to generate and collect data for learning. An off-policy Bellman equation is derived for each agent to learn the value function for the policy under evaluation, called the target policy, and to find an improved policy simultaneously. Actor and critic neural networks, along with a least-squares approach, are employed to approximate target control policies and value functions using the data generated by applying prescribed behavior policies. Finally, an off-policy RL algorithm is presented that is implemented in real time and gives the approximate optimal control policy for each agent using only measured data. It is shown that the optimal distributed policies found by the proposed algorithm satisfy the global Nash equilibrium and synchronize all agents to the leader. Simulation results illustrate the effectiveness of the proposed method.

  3. Stochastic abstract policies: generalizing knowledge to improve reinforcement learning.

    Science.gov (United States)

    Koga, Marcelo L; Freire, Valdinei; Costa, Anna H R

    2015-01-01

    Reinforcement learning (RL) enables an agent to learn behavior by acquiring experience through trial-and-error interactions with a dynamic environment. However, knowledge is usually built from scratch and learning to behave may take a long time. Here, we improve the learning performance by leveraging prior knowledge; that is, the learner shows proper behavior from the beginning of a target task, using the knowledge from a set of known, previously solved, source tasks. In this paper, we argue that building stochastic abstract policies that generalize over past experiences is an effective way to provide such improvement and that this generalization outperforms the current practice of using a library of policies. We achieve this by contributing a new algorithm, AbsProb-PI-multiple, and a framework for transferring knowledge represented as a stochastic abstract policy to new RL tasks. Stochastic abstract policies offer an effective way to encode knowledge because the abstraction they provide not only generalizes solutions but also facilitates extracting the similarities among tasks. We perform experiments in a robotic navigation environment, analyze the agent's behavior throughout the learning process, and assess the transfer ratio for different numbers of source tasks. We compare our method with the transfer of a library of policies, and experiments show that the use of a generalized policy produces better results by more effectively guiding the agent when learning a target task.

  4. Reinforcement learning in computer vision

    Science.gov (United States)

    Bernstein, A. V.; Burnaev, E. V.

    2018-04-01

    Nowadays, machine learning has become one of the basic technologies used in solving various computer vision tasks such as feature detection, image segmentation, object recognition and tracking. In many applications, various complex systems such as robots are equipped with visual sensors from which they learn the state of the surrounding environment by solving corresponding computer vision tasks. Solutions of these tasks are used for making decisions about possible future actions. It is not surprising that when solving computer vision tasks we should take into account special aspects of their subsequent application in model-based predictive control. Reinforcement learning is one of the modern machine learning technologies in which learning is carried out through interaction with the environment. In recent years, reinforcement learning has been used both for solving such applied tasks as processing and analysis of visual information, and for solving specific computer vision problems such as filtering, extracting image features, localizing objects in scenes, and many others. The paper briefly describes the reinforcement learning technology and its use for solving computer vision problems.

  5. Learning to trade via direct reinforcement.

    Science.gov (United States)

    Moody, J; Saffell, M

    2001-01-01

    We present methods for optimizing portfolios, asset allocations, and trading systems based on direct reinforcement (DR). In this approach, investment decision-making is viewed as a stochastic control problem, and strategies are discovered directly. We present an adaptive algorithm called recurrent reinforcement learning (RRL) for discovering investment policies. The need to build forecasting models is eliminated, and better trading performance is obtained. The direct reinforcement approach differs from dynamic programming and reinforcement algorithms such as TD-learning and Q-learning, which attempt to estimate a value function for the control problem. We find that the RRL direct reinforcement framework enables a simpler problem representation, avoids Bellman's curse of dimensionality and offers compelling advantages in efficiency. We demonstrate how direct reinforcement can be used to optimize risk-adjusted investment returns (including the differential Sharpe ratio), while accounting for the effects of transaction costs. In extensive simulation work using real financial data, we find that our approach based on RRL produces better trading strategies than systems utilizing Q-learning (a value function method). Real-world applications include an intra-daily currency trader and a monthly asset allocation system for the S&P 500 Stock Index and T-Bills.

  6. A Model to Explain the Emergence of Reward Expectancy neurons using Reinforcement Learning and Neural Network

    OpenAIRE

    Shinya, Ishii; Munetaka, Shidara; Katsunari, Shibata

    2006-01-01

    In an experiment of multi-trial task to obtain a reward, reward expectancy neurons, which responded only in the non-reward trials that are necessary to advance toward the reward, have been observed in the anterior cingulate cortex of monkeys. In this paper, to explain the emergence of the reward expectancy neuron in terms of reinforcement learning theory, a model that consists of a recurrent neural network trained based on reinforcement learning is proposed. The analysis of the hi...

  7. Curiosity driven reinforcement learning for motion planning on humanoids

    Science.gov (United States)

    Frank, Mikhail; Leitner, Jürgen; Stollenga, Marijn; Förster, Alexander; Schmidhuber, Jürgen

    2014-01-01

    Most previous work on artificial curiosity (AC) and intrinsic motivation focuses on basic concepts and theory. Experimental results are generally limited to toy scenarios, such as navigation in a simulated maze, or control of a simple mechanical system with one or two degrees of freedom. To study AC in a more realistic setting, we embody a curious agent in the complex iCub humanoid robot. Our novel reinforcement learning (RL) framework consists of a state-of-the-art, low-level, reactive control layer, which controls the iCub while respecting constraints, and a high-level curious agent, which explores the iCub's state-action space through information gain maximization, learning a world model from experience, controlling the actual iCub hardware in real-time. To the best of our knowledge, this is the first ever embodied, curious agent for real-time motion planning on a humanoid. We demonstrate that it can learn compact Markov models to represent large regions of the iCub's configuration space, and that the iCub explores intelligently, showing interest in its physical constraints as well as in objects it finds in its environment. PMID:24432001

  8. Functional Contour-following via Haptic Perception and Reinforcement Learning.

    Science.gov (United States)

    Hellman, Randall B; Tekin, Cem; van der Schaar, Mihaela; Santos, Veronica J

    2018-01-01

    Many tasks involve the fine manipulation of objects despite limited visual feedback. In such scenarios, tactile and proprioceptive feedback can be leveraged for task completion. We present an approach for real-time haptic perception and decision-making for a haptics-driven, functional contour-following task: the closure of a ziplock bag. This task is challenging for robots because the bag is deformable, transparent, and visually occluded by artificial fingertip sensors that are also compliant. A deep neural net classifier was trained to estimate the state of a zipper within a robot's pinch grasp. A Contextual Multi-Armed Bandit (C-MAB) reinforcement learning algorithm was implemented to maximize cumulative rewards by balancing exploration versus exploitation of the state-action space. The C-MAB learner outperformed a benchmark Q-learner by more efficiently exploring the state-action space while learning a hard-to-code task. The learned C-MAB policy was tested with novel ziplock bag scenarios and contours (wire, rope). Importantly, this work contributes to the development of reinforcement learning approaches that account for limited resources such as hardware life and researcher time. As robots are used to perform complex, physically interactive tasks in unstructured or unmodeled environments, it becomes important to develop methods that enable efficient and effective learning with physical testbeds.
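
    The exploration-versus-exploitation trade-off mentioned in the abstract can be illustrated with a generic tabular contextual bandit. This is a simple epsilon-greedy sketch and not the C-MAB algorithm used by the authors; the class name, the context labels in the usage note, and the `epsilon` value are assumptions made for illustration.

```python
import random


class ContextualBandit:
    """Epsilon-greedy bandit keeping one running-mean reward per (context, action)."""

    def __init__(self, n_actions, epsilon=0.1, seed=0):
        self.n_actions = n_actions
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.values = {}  # context -> list of mean rewards, one per action
        self.counts = {}  # context -> list of pull counts, one per action

    def select(self, context):
        """Explore with probability epsilon, otherwise pick the best-known action."""
        values = self.values.setdefault(context, [0.0] * self.n_actions)
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: values[a])

    def update(self, context, action, reward):
        """Fold the observed reward into the running mean for (context, action)."""
        values = self.values.setdefault(context, [0.0] * self.n_actions)
        counts = self.counts.setdefault(context, [0] * self.n_actions)
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
```

    In a setting like the one described, the context could be a discrete label from the classifier (for example a hypothetical `"zipper_left"`), and each action a candidate fingertip motion; unlike a Q-learner, this learner keeps no estimate of future state values.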

  9. Accelerating Multiagent Reinforcement Learning by Equilibrium Transfer.

    Science.gov (United States)

    Hu, Yujing; Gao, Yang; An, Bo

    2015-07-01

    An important approach in multiagent reinforcement learning (MARL) is equilibrium-based MARL, which adopts equilibrium solution concepts in game theory and requires agents to play equilibrium strategies at each state. However, most existing equilibrium-based MARL algorithms cannot scale due to a large number of computationally expensive equilibrium computations (e.g., computing Nash equilibria is PPAD-hard) during learning. For the first time, this paper finds that during the learning process of equilibrium-based MARL, the one-shot games corresponding to each state's successive visits often have the same or similar equilibria (for some states more than 90% of games corresponding to successive visits have similar equilibria). Inspired by this observation, this paper proposes to use equilibrium transfer to accelerate equilibrium-based MARL. The key idea of equilibrium transfer is to reuse previously computed equilibria when each agent has a small incentive to deviate. By introducing transfer loss and transfer condition, a novel framework called equilibrium transfer-based MARL is proposed. We prove that although equilibrium transfer brings transfer loss, equilibrium-based MARL algorithms can still converge to an equilibrium policy under certain assumptions. Experimental results in widely used benchmarks (e.g., grid world game, soccer game, and wall game) show that the proposed framework: 1) not only significantly accelerates equilibrium-based MARL (up to 96.7% reduction in learning time), but also achieves higher average rewards than algorithms without equilibrium transfer and 2) scales significantly better than algorithms without equilibrium transfer when the state/action space grows and the number of agents increases.
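
    The key idea of reusing an equilibrium "when each agent has a small incentive to deviate" can be sketched as an approximate-equilibrium check for a two-player game. The threshold `epsilon`, the function names, and the payoff layout are assumptions for illustration; the paper's own definitions of transfer loss and transfer condition may differ.

```python
def deviation_gain(payoff, own, other):
    """Largest gain a player can get by unilaterally switching strategy.

    payoff[i][j] is this player's payoff for own action i and opponent action j;
    `own` and `other` are mixed strategies (lists of probabilities).
    """
    action_values = [
        sum(row[j] * other[j] for j in range(len(other))) for row in payoff
    ]
    current = sum(p * v for p, v in zip(own, action_values))
    return max(action_values) - current


def can_transfer(payoff_a, payoff_b, strat_a, strat_b, epsilon=0.05):
    """Check whether a previously computed profile can be reused in a new game.

    Both payoff matrices are indexed [action of A][action of B]. The profile is
    reused if neither player can gain more than epsilon by deviating.
    """
    payoff_b_own = [list(col) for col in zip(*payoff_b)]  # rows become B's actions
    gain_a = deviation_gain(payoff_a, strat_a, strat_b)
    gain_b = deviation_gain(payoff_b_own, strat_b, strat_a)
    return max(gain_a, gain_b) <= epsilon
```

    When the check passes, the expensive equilibrium computation for the current visit can be skipped and the stored strategies reused; when it fails, a fresh equilibrium would be computed as usual.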

  10. Learning Natural Selection in 4th Grade with Multi-Agent-Based Computational Models

    Science.gov (United States)

    Dickes, Amanda Catherine; Sengupta, Pratim

    2013-01-01

    In this paper, we investigate how elementary school students develop multi-level explanations of population dynamics in a simple predator-prey ecosystem, through scaffolded interactions with a multi-agent-based computational model (MABM). The term "agent" in an MABM indicates individual computational objects or actors (e.g., cars), and these…

  11. The "proactive" model of learning: Integrative framework for model-free and model-based reinforcement learning utilizing the associative learning-based proactive brain concept.

    Science.gov (United States)

    Zsuga, Judit; Biro, Klara; Papp, Csaba; Tajti, Gabor; Gesztelyi, Rudolf

    2016-02-01

    Reinforcement learning (RL) is a powerful concept underlying forms of associative learning governed by the use of a scalar reward signal, with learning taking place if expectations are violated. RL may be assessed using model-based and model-free approaches. Model-based reinforcement learning involves the amygdala, the hippocampus, and the orbitofrontal cortex (OFC). The model-free system involves the pedunculopontine-tegmental nucleus (PPTgN), the ventral tegmental area (VTA) and the ventral striatum (VS). Based on the functional connectivity of the VS, model-free and model-based RL systems center on the VS, which computes value by integrating model-free signals (received as reward prediction error) and model-based reward-related input. Using the concept of a reinforcement learning agent, we propose that the VS serves as the value function component of the RL agent. Regarding the model utilized for model-based computations, we turned to the proactive brain concept, which offers a ubiquitous function for the default network based on its great functional overlap with contextual associative areas. Hence, by means of the default network the brain continuously organizes its environment into context frames, enabling the formulation of analogy-based associations that are turned into predictions of what to expect. The OFC integrates reward-related information into context frames upon computing reward expectation by compiling stimulus-reward and context-reward information offered by the amygdala and hippocampus, respectively. Furthermore, we suggest that the integration of model-based expectations regarding reward into the value signal is further supported by the efferents of the OFC that reach structures canonical for model-free learning (e.g., the PPTgN, VTA, and VS). (c) 2016 APA, all rights reserved.

  12. A reinforcement learning model of joy, distress, hope and fear

    Science.gov (United States)

    Broekens, Joost; Jacobs, Elmer; Jonker, Catholijn M.

    2015-07-01

    In this paper we computationally study the relation between adaptive behaviour and emotion. Using the reinforcement learning framework, we propose that learned state utility, V(s), models fear (negative) and hope (positive) based on the fact that both signals are about anticipation of loss or gain. Further, we propose that joy/distress is a signal similar to the error signal. We present agent-based simulation experiments that show that this model replicates psychological and behavioural dynamics of emotion. This work distinguishes itself by assessing the dynamics of emotion in an adaptive agent framework - coupling it to the literature on habituation, development, extinction and hope theory. Our results support the idea that the function of emotion is to provide a complex feedback signal for an organism to adapt its behaviour. Our work is relevant for understanding the relation between emotion and adaptation in animals, as well as for human-robot interaction, in particular how emotional signals can be used to communicate between adaptive agents and humans.
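
    The proposed mapping, in which the sign of the learned state utility reads as hope or fear and the sign of the error signal reads as joy or distress, can be sketched with a tabular TD(0) update. The choice of TD(0), the learning-rate and discount values, and the function names are assumptions for illustration; the abstract does not state which update rule or thresholds the authors used.

```python
def td0_step(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    """One TD(0) update on a dict of state values; returns the TD error."""
    td_error = reward + gamma * values.get(next_state, 0.0) - values.get(state, 0.0)
    values[state] = values.get(state, 0.0) + alpha * td_error
    return td_error


def label_signals(state_value, td_error):
    """Assumed labelling: value sign -> hope/fear, error sign -> joy/distress."""
    if state_value > 0:
        anticipation = "hope"
    elif state_value < 0:
        anticipation = "fear"
    else:
        anticipation = "neutral"
    if td_error > 0:
        outcome = "joy"
    elif td_error < 0:
        outcome = "distress"
    else:
        outcome = "neutral"
    return anticipation, outcome


values = {}
error = td0_step(values, "a", 1.0, "b", alpha=0.5, gamma=0.9)
signals = label_signals(values["a"], error)
```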

  13. Performance Comparison of Two Reinforcement Learning Algorithms for Small Mobile Robots

    Czech Academy of Sciences Publication Activity Database

    Neruda, Roman; Slušný, Stanislav

    2009-01-01

    Vol. 2, No. 1 (2009), pp. 59-68. ISSN 2005-4297. R&D Projects: GA MŠk(CZ) 1M0567. Grant - others: GA UK(CZ) 7637/2007. Institutional research plan: CEZ:AV0Z10300504. Keywords: reinforcement learning * mobile robots * intelligent agents. Subject RIV: IN - Informatics, Computer Science. http://www.sersc.org/journals/IJCA/vol2_no1/7.pdf

  14. Multi-Agent Systems for E-Commerce

    OpenAIRE

    Solodukha, T. V.; Sosnovskiy, O. A.; Zhelezko, B. A.

    2009-01-01

    The article focuses on multi-agent systems (MAS) and domains that can benefit from multi-agent technology. In the last few years, the agent based modeling (ABM) community has developed several practical agent based modeling toolkits that enable individuals to develop agent-based applications. The comparison of agent-based modeling toolkits is given. Multi-agent systems are designed to handle changing and dynamic business processes. Any organization with complex and distributed business pro...

  15. Game-theoretic learning and distributed optimization in memoryless multi-agent systems

    CERN Document Server

    Tatarenko, Tatiana

    2017-01-01

    This book presents new efficient methods for optimization in realistic large-scale, multi-agent systems. These methods do not require the agents to have full information about the system, but instead allow them to make their local decisions based only on local information, possibly obtained during communication with their local neighbors. The book, primarily aimed at researchers in optimization and control, considers three different information settings in multi-agent systems: oracle-based, communication-based, and payoff-based. For each of these information types, an efficient optimization algorithm is developed, which leads the system to an optimal state. The optimization problems are set without such restrictive assumptions as convexity of the objective functions, complicated communication topologies, closed-form expressions for costs and utilities, and finiteness of the system’s state space.

  16. Enhanced risk management by an emerging multi-agent architecture

    Science.gov (United States)

    Lin, Sin-Jin; Hsu, Ming-Fu

    2014-07-01

    Classification in imbalanced datasets has attracted much attention from researchers in the field of machine learning. Most existing techniques tend not to perform well on minority class instances when the dataset is highly skewed because they focus on minimising the forecasting error without considering the relative distribution of each class. This investigation proposes an emerging multi-agent architecture, grounded on cooperative learning, to solve the class-imbalanced classification problem. Additionally, this study deals further with the obscure nature of the multi-agent architecture and expresses comprehensive rules for auditors. The results from this study indicate that the presented model performs satisfactorily in risk management and is able to tackle a highly class-imbalanced dataset comparatively well. Furthermore, the knowledge visualised process, supported by real examples, can assist both internal and external auditors who must allocate limited detecting resources; they can take the rules as roadmaps to modify the auditing programme.

  17. Spike-based decision learning of Nash equilibria in two-player games.

    Directory of Open Access Journals (Sweden)

    Johannes Friedrich

    Humans and animals face decision tasks in an uncertain multi-agent environment where an agent's strategy may change in time due to the co-adaptation of others' strategies. The neuronal substrate and the computational algorithms underlying such adaptive decision making, however, are largely unknown. We propose a population coding model of spiking neurons with a policy gradient procedure that successfully acquires optimal strategies for classical game-theoretical tasks. The suggested population reinforcement learning reproduces data from human behavioral experiments for the blackjack and the inspector game. It performs optimally according to a pure (deterministic) and mixed (stochastic) Nash equilibrium, respectively. In contrast, temporal-difference (TD) learning, covariance learning, and basic reinforcement learning fail to perform optimally for the stochastic strategy. Spike-based population reinforcement learning, shown to follow the stochastic reward gradient, is therefore a viable candidate to explain automated decision learning of a Nash equilibrium in two-player games.

  18. Research and application of multi-agent genetic algorithm in tower defense game

    Science.gov (United States)

    Jin, Shaohua

    2018-04-01

    In this paper, a new multi-agent genetic algorithm based on orthogonal experiment is proposed, which combines a multi-agent system, a genetic algorithm, and orthogonal experimental design. The design comprises a neighborhood competition operator, an orthogonal crossover operator, and Son and self-learning operators. The new algorithm is applied to a mobile tower defense game: mathematical models are established according to the characteristics of the game, and finally the value of the game's monster is increased.

  19. Tank War Using Online Reinforcement Learning

    DEFF Research Database (Denmark)

    Toftgaard Andersen, Kresten; Zeng, Yifeng; Dahl Christensen, Dennis

    2009-01-01

    Real-Time Strategy (RTS) games provide a challenging platform to implement online reinforcement learning (RL) techniques in a real application. The computer, as one player, monitors opponents' (human or other computers) strategies and then updates its own policy using RL methods. In this paper, we propose...... a multi-layer framework for implementing online RL in an RTS game. The framework significantly reduces the RL computational complexity by decomposing the state space in a hierarchical manner. We implement the RTS game Tank General and perform a thorough test of the proposed framework. The results...... show the effectiveness of our proposed framework and shed light on relevant issues in using RL in RTS games....

  20. Multiagent Reinforcement Learning with Regret Matching for Robot Soccer

    Directory of Open Access Journals (Sweden)

    Qiang Liu

    2013-01-01

    This paper proposes a novel multiagent reinforcement learning (MARL) algorithm, Nash-Q learning with regret matching, in which regret matching is used to speed up the well-known MARL algorithm Nash-Q learning. Choosing a suitable strategy for action selection, one that balances exploration and exploitation, is critical to enhancing the online learning ability of Nash-Q learning. In a Markov game, the joint action of agents adopting the regret matching algorithm can converge to a set of no-regret points that can be viewed as a coarse correlated equilibrium, which in essence includes Nash equilibrium. It can be inferred that regret matching can guide exploration of the state-action space so that the rate of convergence of the Nash-Q learning algorithm can be increased. Simulation results on robot soccer validate that, compared to the original Nash-Q learning algorithm, the use of regret matching during the learning phase gives excellent online learning ability and results in significant performance in terms of scores, average reward and policy convergence.
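
    Regret matching itself is compact enough to sketch: each action is played with probability proportional to its positive cumulative regret. The code below is a generic single-agent version for a one-shot game with hypothetical function names; how the paper couples it to the learning algorithm's action selection is not described in the abstract and is not shown here.

```python
def regret_matching_strategy(cum_regret):
    """Mixed strategy proportional to the positive part of cumulative regrets."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    n = len(cum_regret)
    if total <= 0.0:
        return [1.0 / n] * n          # no positive regret yet: play uniformly
    return [p / total for p in positive]


def update_regrets(cum_regret, payoffs, played):
    """Accumulate regret for not having played each alternative action.

    payoffs[a] is what action a would have earned this round with the other
    agents' actions held fixed; played is the index of the action taken.
    """
    return [r + payoffs[a] - payoffs[played] for a, r in enumerate(cum_regret)]


regrets = update_regrets([0.0, 0.0], payoffs=[1.0, 3.0], played=0)
strategy = regret_matching_strategy(regrets)
```

    After one round in which the second action would have paid more, all probability mass shifts to that action, which is how the rule steers exploration toward actions the agent regrets not having tried.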

  1. Fairness in multi-agent systems

    NARCIS (Netherlands)

    Jong, de S.; Tuyls, K.P.; Verbeeck, K.

    2008-01-01

    Multi-agent systems are complex systems in which multiple autonomous entities, called agents, cooperate in order to achieve a common or personal goal. These entities may be computer software, robots, and also humans. In fact, many multi-agent systems are intended to operate in cooperation with or as

  2. Online human training of a myoelectric prosthesis controller via actor-critic reinforcement learning.

    Science.gov (United States)

    Pilarski, Patrick M; Dawson, Michael R; Degris, Thomas; Fahimi, Farbod; Carey, Jason P; Sutton, Richard S

    2011-01-01

    As a contribution toward the goal of adaptable, intelligent artificial limbs, this work introduces a continuous actor-critic reinforcement learning method for optimizing the control of multi-function myoelectric devices. Using a simulated upper-arm robotic prosthesis, we demonstrate how it is possible to derive successful limb controllers from myoelectric data using only a sparse human-delivered training signal, without requiring detailed knowledge about the task domain. This reinforcement-based machine learning framework is well suited for use by both patients and clinical staff, and may be easily adapted to different application domains and the needs of individual amputees. To our knowledge, this is the first myoelectric control approach that facilitates the online learning of new amputee-specific motions based only on a one-dimensional (scalar) feedback signal provided by the user of the prosthesis. © 2011 IEEE

  3. Multi-agent and complex systems

    CERN Document Server

    Ren, Fenghui; Fujita, Katsuhide; Zhang, Minjie; Ito, Takayuki

    2017-01-01

    This book provides a description of advanced multi-agent and artificial intelligence technologies for the modeling and simulation of complex systems, as well as an overview of the latest scientific efforts in this field. A complex system features a large number of interacting components, whose aggregate activities are nonlinear and self-organized. A multi-agent system is a group or society of agents which interact with others cooperatively and/or competitively in order to reach their individual or common goals. Multi-agent systems are suitable for modeling and simulation of complex systems, which is difficult to accomplish using traditional computational approaches.

  4. Multiagent-Based Simulation of Temporal-Spatial Characteristics of Activity-Travel Patterns Using Interactive Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Min Yang

    2014-01-01

    We propose a multiagent-based reinforcement learning algorithm, in which the interactions between travelers and the environment are considered to simulate temporal-spatial characteristics of activity-travel patterns in a city. Road congestion degree is added to the reinforcement learning algorithm as a medium that passes the influence of one traveler’s decision to others. Meanwhile, the agents used in the algorithm are initialized from typical activity patterns extracted from the travel survey diary data of Shangyu city in China. In the simulation, both macroscopic activity-travel characteristics such as traffic flow spatial-temporal distribution and microscopic characteristics such as activity-travel schedules of each agent are obtained. Comparing the simulation results with the survey data, we find that deviation of the peak-hour traffic flow is less than 5%, while the correlation of the simulated versus survey location choice distribution is over 0.9.

  5. Framework for robot skill learning using reinforcement learning

    Science.gov (United States)

    Wei, Yingzi; Zhao, Mingyang

    2003-09-01

    Skill acquisition by a robot is a process similar to human skill learning. Reinforcement learning (RL) is an online actor-critic method for a robot to develop its skill. The reinforcement function has become the critical component because of its role in evaluating actions and guiding the learning process. We present an augmented reward function that provides a new way for the RL controller to incorporate prior knowledge and experience. The difference form of the augmented reward function is also considered carefully. The additional reward beyond the conventional reward provides more heuristic information for RL. In this paper, we present a strategy for the task of complex skill learning. The automatic robot shaping policy decomposes the complex skill into a hierarchical learning process. A new form of value function is introduced to attain smooth and swift motion switching. We present a formal, but practical, framework for robot skill learning and also illustrate with an example the utility of the method for learning skilled robot control online.

  6. Multi-Agent Inference in Social Networks: A Finite Population Learning Approach.

    Science.gov (United States)

    Fan, Jianqing; Tong, Xin; Zeng, Yao

    When people in a society want to make inferences about some parameter, each person may want to use data collected by other people. Information (data) exchange in social networks is usually costly, so to make reliable statistical decisions, people need to trade off the benefits and costs of information acquisition. Conflicts of interest and coordination problems will arise in the process. Classical statistics does not consider people's incentives and interactions in the data collection process. To address this imperfection, this work explores multi-agent Bayesian inference problems with a game theoretic social network model. Motivated by our interest in aggregate inference at the societal level, we propose a new concept, finite population learning, to address whether, with high probability, a large fraction of people in a given finite population network can make "good" inference. Serving as a foundation, this concept enables us to study the long run trend of aggregate inference quality as population grows.

  7. 10th KES Conference on Agent and Multi-Agent Systems : Technologies and Applications

    CERN Document Server

    Chen-Burger, Yun-Heh; Howlett, Robert; Jain, Lakhmi

    2016-01-01

    The modern economy is driven by technologies and knowledge. Digital technologies can free, shift and multiply choices, often intruding on the space of other industries, by providing new ways of conducting business operations and creating values for customers and companies. The topics covered in this volume include software agents, multi-agent systems, agent modelling, mobile and cloud computing, big data analysis, business intelligence, artificial intelligence, social systems, computer embedded systems and nature inspired manufacturing, etc. that contribute to the modern Digital Economy. This volume highlights new trends and challenges in agent, new digital and knowledge economy research and includes 28 papers classified in the following specific topics: business process management, agent-based modeling and simulation, anthropic-oriented computing, learning paradigms, business informatics and gaming, digital economy, and advances in networked virtual enterprises. Published papers were selected for presentatio...

  8. An agent-based approach equipped with game theory. Strategic collaboration among learning agents during a dynamic market change in the California electricity crisis

    Energy Technology Data Exchange (ETDEWEB)

    Sueyoshi, Toshiyuki [Department of Management, New Mexico Institute of Mining and Technology, Socorro, NM 87801 (United States); Department of Industrial and Information Management, National Cheng Kung University, Tainan (China)

    2010-09-15

    An agent-based approach is a numerical (computer-intensive) method to explore the complex characteristics and dynamics of microeconomics. Using the agent-based approach, this study investigates the learning speed of traders and their strategic collaboration in a dynamic market change of electricity. An example of such a market change can be found in the California electricity crisis (2000-2001). This study incorporates the concept of partial reinforcement learning into trading agents and finds that they have two learning components: learning from a dynamic market change and learning from collaboration with other traders. The learning speed of traders becomes slow when a large fluctuation occurs in the power exchange market. The learning speed depends upon the type of traders, their learning capabilities and the fluctuation of market fundamentals. The degree of collaboration among traders gradually reduces during the electricity crisis. The strategic collaboration among traders is examined by a large simulator equipped with multiple learning capabilities. (author)

  9. An agent-based approach equipped with game theory. Strategic collaboration among learning agents during a dynamic market change in the California electricity crisis

    International Nuclear Information System (INIS)

    Sueyoshi, Toshiyuki

    2010-01-01

    An agent-based approach is a numerical (computer-intensive) method to explore the complex characteristics and dynamics of microeconomics. Using the agent-based approach, this study investigates the learning speed of traders and their strategic collaboration in a dynamic market change of electricity. An example of such a market change can be found in the California electricity crisis (2000-2001). This study incorporates the concept of partial reinforcement learning into trading agents and finds that they have two learning components: learning from a dynamic market change and learning from collaboration with other traders. The learning speed of traders becomes slow when a large fluctuation occurs in the power exchange market. The learning speed depends upon the type of traders, their learning capabilities and the fluctuation of market fundamentals. The degree of collaboration among traders gradually reduces during the electricity crisis. The strategic collaboration among traders is examined by a large simulator equipped with multiple learning capabilities. (author)

  10. Advances on Practical Applications of Agents and Multi-Agent Systems 10th International Conference on Practical Applications of Agents and Multi-Agent Systems

    CERN Document Server

    Müller, Jörg; Rodríguez, Juan; Pérez, Javier

    2012-01-01

    Research on Agents and Multi-Agent Systems has matured during the last decade and many effective applications of this technology are now deployed. PAAMS provides an international forum to present and discuss the latest scientific developments and their effective applications, to assess the impact of the approach, and to facilitate technology transfer. PAAMS started as a local initiative, but has since grown to become THE international yearly platform to present, to discuss, and to disseminate the latest developments and the most important outcomes related to real-world applications. It provides a unique opportunity to bring multi-disciplinary experts, academics and practitioners together to exchange their experience in the development and deployment of Agents and Multi-Agent Systems. PAAMS intends to bring together researchers and developers from industry and the academic world to report on the latest scientific and technical advances on the application of multi-agent systems, to discuss and debate the major ...

  11. Highlights on Practical Applications of Agents and Multi-Agent Systems 10th International Conference on Practical Applications of Agents and Multi-Agent Systems

    CERN Document Server

    Sánchez, Miguel; Mathieu, Philippe; Rodríguez, Juan; Adam, Emmanuel; Ortega, Alfonso; Moreno, María; Navarro, Elena; Hirsch, Benjamin; Lopes-Cardoso, Henrique; Julián, Vicente

    2012-01-01

    Research on Agents and Multi-Agent Systems has matured during the last decade and many effective applications of this technology are now deployed. PAAMS provides an international forum to present and discuss the latest scientific developments and their effective applications, to assess the impact of the approach, and to facilitate technology transfer. PAAMS started as a local initiative, but has since grown to become THE international yearly platform to present, to discuss, and to disseminate the latest developments and the most important outcomes related to real-world applications. It provides a unique opportunity to bring multi-disciplinary experts, academics and practitioners together to exchange their experience in the development and deployment of Agents and Multi-Agent Systems. PAAMS intends to bring together researchers and developers from industry and the academic world to report on the latest scientific and technical advances on the application of multi-agent systems, to discuss and debate the major ...

  12. SCAFFOLDING AND REINFORCEMENT: USING DIGITAL LOGBOOKS IN LEARNING VOCABULARY

    OpenAIRE

    Khalifa, Salma Hasan Almabrouk; Shabdin, Ahmad Affendi

    2016-01-01

    Reinforcement and scaffolding are tested approaches to enhance learning achievements. Keeping a record of the learning process as well as the new learned words functions as scaffolding to help learners build a comprehensive vocabulary. Similarly, repetitive learning of new words reinforces permanent learning for long-term memory. Paper-based logbooks may prove to be good records of the learning process, but if learners use digital logbooks, the results may be even better. Digital logbooks wit...

  13. Fuzzy OLAP association rules mining-based modular reinforcement learning approach for multiagent systems.

    Science.gov (United States)

    Kaya, Mehmet; Alhajj, Reda

    2005-04-01

    Multiagent systems and data mining have recently attracted considerable attention in the field of computing. Reinforcement learning is the most commonly used learning process for multiagent systems. However, it still has some drawbacks: other learning agents present in the domain are modeled as part of the state of the environment, and some states are experienced much less than others, or some state-action pairs are never visited during the learning phase. Further, before completing the learning process, an agent cannot exhibit a certain behavior in some states that may be experienced sufficiently. In this study, we propose a novel multiagent learning approach to handle these problems. Our approach is based on utilizing the mining process for modular cooperative learning systems. It incorporates fuzziness and online analytical processing (OLAP) based mining to effectively process the information reported by agents. First, we describe a fuzzy data cube OLAP architecture which facilitates effective storage and processing of the state information reported by agents. This way, the action of another agent, even one not in the visual environment of the agent under consideration, can simply be predicted by extracting online association rules, a well-known data mining technique, from the constructed data cube. Second, we present a new action selection model, which is also based on association rules mining. Finally, we generalize insufficiently experienced states by mining multilevel association rules from the proposed fuzzy data cube. Experimental results obtained on two different versions of a well-known pursuit domain show the robustness and effectiveness of the proposed fuzzy OLAP mining based modular learning approach. We also tested the scalability of the approach presented in this paper and compared it with our previous work on modular-fuzzy Q-learning and ordinary Q-learning.

  14. An intelligent agent for optimal river-reservoir system management

    Science.gov (United States)

    Rieker, Jeffrey D.; Labadie, John W.

    2012-09-01

    A generalized software package is presented for developing an intelligent agent for stochastic optimization of complex river-reservoir system management and operations. Reinforcement learning is an approach to artificial intelligence for developing a decision-making agent that learns the best operational policies without the need for explicit probabilistic models of hydrologic system behavior. The agent learns these strategies experientially in a Markov decision process through observational interaction with the environment and simulation of the river-reservoir system using well-calibrated models. The graphical user interface for the reinforcement learning process controller includes numerous learning method options and dynamic displays for visualizing the adaptive behavior of the agent. As a case study, the generalized reinforcement learning software is applied to developing an intelligent agent for optimal management of water stored in the Truckee river-reservoir system of California and Nevada for the purpose of streamflow augmentation for water quality enhancement. The intelligent agent successfully learns long-term reservoir operational policies that specifically focus on mitigating water temperature extremes during persistent drought periods that jeopardize the survival of threatened and endangered fish species.

  15. Episodic reinforcement learning control approach for biped walking

    Directory of Open Access Journals (Sweden)

    Katić Duško

    2012-01-01

    This paper presents a hybrid dynamic control approach to the realization of humanoid biped robotic walk, focusing on policy gradient episodic reinforcement learning with fuzzy evaluative feedback. The proposed controller structure involves two feedback loops: a conventional computed torque controller and an episodic reinforcement learning controller. The reinforcement learning part includes fuzzy information about Zero-Moment-Point errors. Simulation tests using a medium-size 36-DOF humanoid robot MEXONE were performed to demonstrate the effectiveness of our method.

  16. Reinforcement learning in complementarity game and population dynamics.

    Science.gov (United States)

    Jost, Jürgen; Li, Wei

    2014-02-01

    We systematically test and compare different reinforcement learning schemes in a complementarity game [J. Jost and W. Li, Physica A 345, 245 (2005)] played between members of two populations. More precisely, we study the Roth-Erev, Bush-Mosteller, and SoftMax reinforcement learning schemes. A modified version of Roth-Erev with a power exponent of 1.5, as opposed to 1 in the standard version, performs best. We also compare these reinforcement learning strategies with evolutionary schemes. This gives insight into aspects like the issue of quick adaptation as opposed to systematic exploration or the role of learning rates.
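
    A basic Roth-Erev learner keeps one propensity per action, reinforces the chosen action by the reward received, and chooses actions in proportion to propensities. The sketch below assumes that the "power exponent" of the modified version enters the choice rule by raising each propensity to that power; the abstract does not say where the exponent is applied, so this placement and the function names are assumptions.

```python
def roth_erev_probs(propensities, exponent=1.0):
    """Choice probabilities proportional to propensity ** exponent.

    exponent=1.0 gives the standard proportional rule; other values
    (e.g. 1.5) are an assumed form of the modified scheme.
    """
    powered = [q ** exponent for q in propensities]
    total = sum(powered)
    return [p / total for p in powered]


def roth_erev_update(propensities, action, reward):
    """Reinforce the chosen action's propensity by the (non-negative) reward."""
    updated = list(propensities)
    updated[action] += reward
    return updated


props = roth_erev_update([1.0, 1.0], action=1, reward=3.0)   # -> [1.0, 4.0]
standard = roth_erev_probs(props, exponent=1.0)
sharpened = roth_erev_probs(props, exponent=1.5)
```

    Under this assumption an exponent above 1 concentrates probability on already-reinforced actions faster than the standard rule, trading some exploration for quicker adaptation.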

  17. 10th International Conference on Practical Applications of Agents and Multi-Agent Systems

    CERN Document Server

    Pérez, Javier; Golinska, Paulina; Giroux, Sylvain; Corchuelo, Rafael; Trends in Practical Applications of Agents and Multiagent Systems

    2012-01-01

    PAAMS, the International Conference on Practical Applications of Agents and Multi-Agent Systems, is an evolution of the International Workshop on Practical Applications of Agents and Multi-Agent Systems. PAAMS is an international yearly tribune to present, to discuss, and to disseminate the latest developments and the most important outcomes related to real-world applications. It provides a unique opportunity to bring multi-disciplinary experts, academics and practitioners together to exchange their experience in the development of Agents and Multi-Agent Systems. This volume presents the papers that have been accepted for the 2012 edition in the following workshops: Workshop on Agents for Ambient Assisted Living, Workshop on Agent-Based Solutions for Manufacturing and Supply Chain, and Workshop on Agents and Multi-agent systems for Enterprise Integration.

  18. Building Multi-Agent Systems Using Jason

    DEFF Research Database (Denmark)

    Boss, Niklas Skamriis; Jensen, Andreas Schmidt; Villadsen, Jørgen

    2010-01-01

    We provide a detailed description of the Jason-DTU system, including the methodology and tools used as well as the team strategy. We also discuss the experience gathered in the contest. In spring 2009 the course “Artificial Intelligence and Multi-Agent Systems” was held for the first time...... on the Technical University of Denmark (DTU). A part of this course was a short introduction to the multi-agent framework Jason, which is an interpreter for AgentSpeak, an agent-oriented programming language. As the final project in this course a solution to the Multi-Agent Programming Contest from 2007, the Gold...

  19. A Multi-Agent Based Energy Management Solution for Integrated Buildings and Microgrid System

    DEFF Research Database (Denmark)

    Anvari-Moghaddam, Amjad; Rahimi-Kian, Ashkan; Mirian, Maryam S.

    2017-01-01

    In this paper, an ontology-driven multi-agent based energy management system (EMS) is proposed for monitoring and optimal control of an integrated homes/buildings and microgrid system with various renewable energy resources (RESs) and controllable loads. Different agents ranging from simple-reflex to complex learning agents are designed and implemented to cooperate with each other to reach an optimal operating strategy for the mentioned integrated energy system (IES) while meeting the system’s objectives and related constraints. The optimization process for the EMS is defined as a coordinated distributed generation (DG) and demand response (DR) management problem within the studied environment and is solved by the proposed agent-based approach utilizing cooperation and communication among decision agents. To verify the effectiveness and applicability of the proposed multi-agent based EMS, several...

  20. Punishment Insensitivity and Impaired Reinforcement Learning in Preschoolers

    Science.gov (United States)

    Briggs-Gowan, Margaret J.; Nichols, Sara R.; Voss, Joel; Zobel, Elvira; Carter, Alice S.; McCarthy, Kimberly J.; Pine, Daniel S.; Blair, James; Wakschlag, Lauren S.

    2014-01-01

    Background: Youth and adults with psychopathic traits display disrupted reinforcement learning. Advances in measurement now enable examination of this association in preschoolers. The current study examines relations between reinforcement learning in preschoolers and parent ratings of reduced responsiveness to socialization, conceptualized as a…

  1. Reinforcement learning in continuous state and action spaces

    NARCIS (Netherlands)

    H. P. van Hasselt (Hado); M.A. Wiering; M. van Otterlo

    2012-01-01

    Many traditional reinforcement-learning algorithms have been designed for problems with small finite state and action spaces. Learning in such discrete problems can be difficult, due to noise and delayed reinforcements. However, many real-world problems have continuous state or action

  2. Regulated open multi-agent systems (ROMAS) a multi-agent approach for designing normative open systems

    CERN Document Server

    Garcia, Emilia; Botti, Vicente

    2015-01-01

    Addressing the open problem of engineering normative open systems using the multi-agent paradigm, normative open systems are explained as systems in which heterogeneous and autonomous entities and institutions coexist in a complex social and legal framework that can evolve to address the different and often conflicting objectives of the many stakeholders involved. Presenting  a software engineering approach which covers both the analysis and design of these kinds of systems, and which deals with the open issues in the area, ROMAS (Regulated Open Multi-Agent Systems) defines a specific multi-agent architecture, meta-model, methodology and CASE tool. This CASE tool is based on Model-Driven technology and integrates the graphical design with the formal verification of some properties of these systems by means of model checking techniques. Utilizing tables to enhance reader insights into the most important requirements for designing normative open multi-agent systems, the book also provides a detailed and easy t...

  3. Ontology-based multi-agent systems

    Energy Technology Data Exchange (ETDEWEB)

    Hadzic, Maja; Wongthongtham, Pornpit; Dillon, Tharam; Chang, Elizabeth [Digital Ecosystems and Business Intelligence Institute, Perth, WA (Australia)

    2009-07-01

    The Semantic web has given a great deal of impetus to the development of ontologies and multi-agent systems. Several books have appeared which discuss the development of ontologies or of multi-agent systems separately on their own. The growing interaction between agents and ontologies has highlighted the need for integrated development of these. This book is unique in being the first to provide an integrated treatment of the modeling, design and implementation of such combined ontology/multi-agent systems. It provides clear exposition of this integrated modeling and design methodology. It further illustrates this with two detailed case studies in (a) the biomedical area and (b) the software engineering area. The book is, therefore, of interest to researchers, graduate students and practitioners in the semantic web and web science area. (orig.)

  4. Neural Basis of Reinforcement Learning and Decision Making

    Science.gov (United States)

    Lee, Daeyeol; Seo, Hyojung; Jung, Min Whan

    2012-01-01

    Reinforcement learning is an adaptive process in which an animal utilizes its previous experience to improve the outcomes of future choices. Computational theories of reinforcement learning play a central role in the newly emerging areas of neuroeconomics and decision neuroscience. In this framework, actions are chosen according to their value functions, which describe how much future reward is expected from each action. Value functions can be adjusted not only through reward and penalty, but also by the animal’s knowledge of its current environment. Studies have revealed that a large proportion of the brain is involved in representing and updating value functions and using them to choose an action. However, how the nature of a behavioral task affects the neural mechanisms of reinforcement learning remains incompletely understood. Future studies should uncover the principles by which different computational elements of reinforcement learning are dynamically coordinated across the entire brain. PMID:22462543
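    The idea of choosing actions according to value functions that are adjusted by reward can be sketched with a simple delta-rule update and a softmax choice rule. This is a generic textbook-style illustration, not the model of any particular study reviewed in the record; the function names, learning rate, and temperature are assumptions.

```python
import math

def softmax(values, temperature=1.0):
    """Turn action values into choice probabilities; higher value means more likely."""
    m = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp((v - m) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def update_value(values, action, reward, learning_rate=0.1):
    """Move the chosen action's value toward the received reward (delta rule)."""
    prediction_error = reward - values[action]
    values[action] += learning_rate * prediction_error
    return prediction_error

# Toy usage: action 1 is rewarded with 1.0 on every trial.
values = [0.0, 0.0]
for _ in range(50):
    update_value(values, action=1, reward=1.0)
choice_probs = softmax(values)
```

    After repeated reward, the value of the rewarded action approaches the reward magnitude and the softmax rule favours that action.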

  5. A Plant Control Technology Using Reinforcement Learning Method with Automatic Reward Adjustment

    Science.gov (United States)

    Eguchi, Toru; Sekiai, Takaaki; Yamada, Akihiro; Shimizu, Satoru; Fukai, Masayuki

    A control technology using Reinforcement Learning (RL) and a Radial Basis Function (RBF) Network has been developed to reduce environmental load substances exhausted from power and industrial plants. This technology consists of a statistical model using the RBF Network, which estimates characteristics of plants with respect to environmental load substances, and an RL agent, which learns the control logic for the plants using the statistical model. In this technology, it is necessary to design an appropriate reward function given to the agent immediately according to operation conditions and control goals to control plants flexibly. Therefore, we propose an automatic reward adjusting method of RL for plant control. This method adjusts the reward function automatically using information of the statistical model obtained in its learning process. In the simulations, it is confirmed that the proposed method can adjust the reward function adaptively for several test functions, and executes robust control toward the thermal power plant considering the change of operation conditions and control goals.

  6. Autonomous reinforcement learning with experience replay.

    Science.gov (United States)

    Wawrzyński, Paweł; Tanwani, Ajay Kumar

    2013-05-01

    This paper considers the issues of efficiency and autonomy that are required to make reinforcement learning suitable for real-life control tasks. A real-time reinforcement learning algorithm is presented that repeatedly adjusts the control policy with the use of previously collected samples, and autonomously estimates the appropriate step-sizes for the learning updates. The algorithm is based on the actor-critic with experience replay whose step-sizes are determined on-line by an enhanced fixed point algorithm for on-line neural network training. An experimental study with simulated octopus arm and half-cheetah demonstrates the feasibility of the proposed algorithm to solve difficult learning control problems in an autonomous way within reasonably short time. Copyright © 2012 Elsevier Ltd. All rights reserved.

  7. Reinforcement Learning with Autonomous Small Unmanned Aerial Vehicles in Cluttered Environments

    Science.gov (United States)

    Tran, Loc; Cross, Charles; Montague, Gilbert; Motter, Mark; Neilan, James; Qualls, Garry; Rothhaar, Paul; Trujillo, Anna; Allen, B. Danette

    2015-01-01

    We present ongoing work in the Autonomy Incubator at NASA Langley Research Center (LaRC) exploring the efficacy of a data set aggregation approach to reinforcement learning for small unmanned aerial vehicle (sUAV) flight in dense and cluttered environments with reactive obstacle avoidance. The goal is to learn an autonomous flight model using training experiences from a human piloting a sUAV around static obstacles. The training approach uses video data from a forward-facing camera that records the human pilot's flight. Various computer vision based features are extracted from the video relating to edge and gradient information. The recorded human-controlled inputs are used to train an autonomous control model that correlates the extracted feature vector to a yaw command. As part of the reinforcement learning approach, the autonomous control model is iteratively updated with feedback from a human agent who corrects undesired model output. This data driven approach to autonomous obstacle avoidance is explored for simulated forest environments furthering autonomous flight under the tree canopy research. This enables flight in previously inaccessible environments which are of interest to NASA researchers in Earth and Atmospheric sciences.

  8. Reinforcement Learning in Repeated Portfolio Decisions

    OpenAIRE

    Diao, Linan; Rieskamp, Jörg

    2011-01-01

    How do people make investment decisions when they receive outcome feedback? We examined how well the standard mean-variance model and two reinforcement models predict people's portfolio decisions. The basic reinforcement model predicts a learning process that relies solely on the portfolio's overall return, whereas the proposed extended reinforcement model also takes the risk and covariance of the investments into account. The experimental results illustrate that people reacted sensitively to...
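    For orientation, the standard mean-variance evaluation of a portfolio can be written as expected return minus a risk penalty on the portfolio variance. The sketch below uses that common textbook form; the risk-aversion parameter, the function names, and the example numbers are assumptions and are not taken from the study.

```python
def portfolio_mean(weights, means):
    """Expected portfolio return: weighted sum of the asset means."""
    return sum(w * m for w, m in zip(weights, means))

def portfolio_variance(weights, cov):
    """Portfolio variance w' * Sigma * w, including the covariance terms."""
    n = len(weights)
    return sum(weights[i] * cov[i][j] * weights[j] for i in range(n) for j in range(n))

def mean_variance_utility(weights, means, cov, risk_aversion=2.0):
    """Expected return penalised by half the risk aversion times the variance."""
    return portfolio_mean(weights, means) - 0.5 * risk_aversion * portfolio_variance(weights, cov)

# Hypothetical two-asset example with uncorrelated returns.
means = [0.05, 0.10]
cov = [[0.01, 0.0], [0.0, 0.04]]
utility = mean_variance_utility([0.5, 0.5], means, cov)
```

    A reinforcement model that relies only on the portfolio's overall return would, by contrast, ignore the variance and covariance terms computed here.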

  9. Reinforcement learning improves behaviour from evaluative feedback

    Science.gov (United States)

    Littman, Michael L.

    2015-05-01

    Reinforcement learning is a branch of machine learning concerned with using experience gained through interacting with the world and evaluative feedback to improve a system's ability to make behavioural decisions. It has been called the artificial intelligence problem in a microcosm because learning algorithms must act autonomously to perform well and achieve their goals. Partly driven by the increasing availability of rich data, recent years have seen exciting advances in the theory and practice of reinforcement learning, including developments in fundamental technical areas such as generalization, planning, exploration and empirical methodology, leading to increasing applicability to real-life problems.

  10. Logics for Intelligent Agents and Multi-Agent Systems

    NARCIS (Netherlands)

    Meyer, John-Jules Charles

    2014-01-01

    This chapter presents the history of the application of logic in a quite popular paradigm in contemporary computer science and artificial intelligence, viz. the area of intelligent agents and multi-agent systems. In particular we discuss the logics that have been used to specify single agents, the

  11. Reinforcement learning for dpm of embedded visual sensor nodes

    International Nuclear Information System (INIS)

    Khani, U.; Sadhayo, I. H.

    2014-01-01

    This paper proposes an RL (Reinforcement Learning) based DPM (Dynamic Power Management) technique to learn time out policies during a visual sensor node's operation which has multiple power/performance states. As opposed to the widely used static time out policies, our proposed DPM policy, which is also referred to as OLTP (Online Learning of Time out Policies), learns to dynamically change the time out decisions in the different node states including the non-operational states. The selection of time out values in different power/performance states of a visual sensing platform is based on the workload estimates derived from an ML-ANN (Multi-Layer Artificial Neural Network) and an objective function given by weighted performance and power parameters. The DPM approach is also able to dynamically adjust the power-performance weights online to satisfy a given constraint of either power consumption or performance. Results show that the proposed learning algorithm explores the power-performance tradeoff with non-stationary workload and outperforms other DPM policies. It also performs the online adjustment of the tradeoff parameters in order to meet a user-specified constraint. (author)

  12. Reinforcement mechanism of multi-anchor wall with double wall facing

    Science.gov (United States)

    Suzuki, Kouta; Kobayashi, Makoto; Miura, Kinya; Konami, Takeharu; Hayashi, Taketo

    2017-10-01

    Reinforced soil walls are generally known to have high seismic performance. However, their seismic behavior has not yet been clarified accurately, especially for multi-anchor walls with double wall facing. The uncertain behavior of reinforced soil walls during earthquakes complicates their adoption for abutments, because the arrangement of anchor plates used as reinforcement often differs with the width of the road. In this study, a series of centrifuge model tests was carried out to investigate the reinforcement mechanism of the multi-anchor wall with double wall facing from the perspective of the vertical earth pressure. Several types of reinforcement arrangement and a rigid wall were applied in order to verify the arch function in the reinforced regions. The test results show unique behavior of the vertical earth pressure, which was affected by arch action. All vertical earth pressures measured behind the facing panels are larger than those in the middle part between the facing panels, despite the friction between the backfill and the facing panels. Similar results were obtained in the case using a rigid wall. On the other hand, the vertical earth pressure measured 3 cm above the bottom of the model container is larger than that at the bottom. These results show the existence of arch action between the double walls. In addition, they imply that the wall facing of such a soil structure confines the backfill as a pseudo wall, which is the very reason that the multi-anchor wall with double wall facing has high seismic performance.

  13. Solution to reinforcement learning problems with artificial potential field

    Institute of Scientific and Technical Information of China (English)

    XIE Li-juan; XIE Guang-rong; CHEN Huan-wen; LI Xiao-li

    2008-01-01

    A novel method was designed to solve reinforcement learning problems with artificial potential field. Firstly, a reinforcement learning problem was transformed into a path planning problem by using artificial potential field (APF), which was a very appropriate method to model a reinforcement learning problem. Secondly, a new APF algorithm was proposed to overcome the local minimum problem in the potential field methods with a virtual water-flow concept. The performance of this new method was tested on a gridworld problem named the key and door maze. The experimental results show that within 45 trials, good and deterministic policies are found in almost all simulations. In comparison with Wiering's HQ-learning system, which needs 20 000 trials for a stable solution, the proposed new method can obtain an optimal and stable policy far more quickly than HQ-learning. Therefore, the new method is simple and effective in giving an optimal solution to the reinforcement learning problem.

  14. Applications of Multi-Agent Technology to Power Systems

    Science.gov (United States)

    Nagata, Takeshi

    Currently, agents are the focus of intense interest in many sub-fields of computer science and artificial intelligence. Agents are being used in an increasingly wide variety of applications. Many important computing applications such as planning, process control, communication networks and concurrent systems will benefit from using a multi-agent system approach. A multi-agent system is a structure given by an environment together with a set of artificial agents capable of acting on this environment. Multi-agent models are oriented towards interactions, collaborative phenomena, and autonomy. This article presents the applications of multi-agent technology to power systems.

  15. Multi-target consensus circle pursuit for multi-agent systems via a distributed multi-flocking method

    Science.gov (United States)

    Pei, Huiqin; Chen, Shiming; Lai, Qiang

    2016-12-01

    This paper studies the multi-target consensus pursuit problem of multi-agent systems. For solving the problem, a distributed multi-flocking method is designed based on the partial information exchange, which is employed to realise the pursuit of multi-target and the uniform distribution of the number of pursuing agents with the dynamic target. Combining with the proposed circle formation control strategy, agents can adaptively choose the target to form the different circle formation groups accomplishing a multi-target pursuit. The speed state of pursuing agents in each group converges to the same value. A Lyapunov approach is utilised to analyse the stability of multi-agent systems. In addition, a sufficient condition is given for achieving the dynamic target consensus pursuit, and which is then analysed. Finally, simulation results verify the effectiveness of the proposed approaches.
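    The statement that the speed state of the agents in a group converges to a common value can be illustrated with a generic discrete-time consensus update, in which each agent nudges its speed toward those of its neighbours. This is a standard consensus sketch, not the distributed multi-flocking law of the paper; the gain, the graph, and the function name are assumptions.

```python
def consensus_step(speeds, neighbors, gain=0.2):
    """One synchronous update: each agent moves toward its neighbours' speeds."""
    return [
        v + gain * sum(speeds[j] - v for j in neighbors[i])
        for i, v in enumerate(speeds)
    ]

# Hypothetical group of three agents connected in a line: 0 - 1 - 2.
speeds = [0.0, 1.0, 4.0]
neighbors = {0: [1], 1: [0, 2], 2: [1]}
for _ in range(200):
    speeds = consensus_step(speeds, neighbors)
```

    With symmetric links and a sufficiently small gain, all speeds approach the average of the initial values, here 5/3.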

  16. Reinforcement Learning in Autism Spectrum Disorder

    Directory of Open Access Journals (Sweden)

    Manuela Schuetze

    2017-11-01

    Early behavioral interventions are recognized as integral to standard care in autism spectrum disorder (ASD), and often focus on reinforcing desired behaviors (e.g., eye contact) and reducing the presence of atypical behaviors (e.g., echoing others' phrases). However, efficacy of these programs is mixed. Reinforcement learning relies on neurocircuitry that has been reported to be atypical in ASD: prefrontal-sub-cortical circuits, amygdala, brainstem, and cerebellum. Thus, early behavioral interventions rely on neurocircuitry that may function atypically in at least a subset of individuals with ASD. Recent work has investigated physiological, behavioral, and neural responses to reinforcers to uncover differences in motivation and learning in ASD. We will synthesize this work to identify promising avenues for future research that ultimately can be used to enhance the efficacy of early intervention.

  17. Reinforcement and inference in cross-situational word learning.

    Science.gov (United States)

    Tilles, Paulo F C; Fontanari, José F

    2013-01-01

    Cross-situational word learning is based on the notion that a learner can determine the referent of a word by finding something in common across many observed uses of that word. Here we propose an adaptive learning algorithm that contains a parameter that controls the strength of the reinforcement applied to associations between concurrent words and referents, and a parameter that regulates inference, which includes built-in biases, such as mutual exclusivity, and information of past learning events. By adjusting these parameters so that the model predictions agree with data from representative experiments on cross-situational word learning, we were able to explain the learning strategies adopted by the participants of those experiments in terms of a trade-off between reinforcement and inference. These strategies can vary wildly depending on the conditions of the experiments. For instance, for fast mapping experiments (i.e., the correct referent could, in principle, be inferred in a single observation) inference is prevalent, whereas for segregated contextual diversity experiments (i.e., the referents are separated in groups and are exhibited with members of their groups only) reinforcement is predominant. Other experiments are explained with more balanced doses of reinforcement and inference.

  18. Study on state grouping and opportunity evaluation for reinforcement learning methods; Kyoka gakushuho no tame no jotai grouping to opportunity hyoka ni kansuru kenkyu

    Energy Technology Data Exchange (ETDEWEB)

    Yu, W.; Yokoi, H.; Kakazu, Y. [Hokkaido University, Sapporo (Japan)

    1997-08-20

    In this paper, we propose the State Grouping scheme for coping with the problem of scaling up the Reinforcement Learning Algorithm to real, large-size applications. The grouping scheme is based on geographical and trial-error information, and is made up of state generating, state combining, state splitting, and state forgetting procedures, with a corresponding action selecting module and learning module. Also, we discuss the Labeling Based Evaluation scheme, which can evaluate the opportunity of a state-action pair and therefore use better experience to guide the exploration of the state space effectively. Incorporating the Labeling Based Evaluation and State Grouping schemes into the Reinforcement Learning Algorithm, we get an approach that can generate an organized state space for Reinforcement Learning and do problem solving as well. We argue that an approach with this kind of ability is necessary for an autonomous agent: such an agent cannot act depending on any pre-defined map; instead, it should search the environment as well as find the optimal problem solution autonomously and simultaneously. By solving the large state-size 3-DOF and 4-link manipulator problem, we show the efficiency of the proposed approach, i.e., the agent can achieve the optimal or sub-optimal path with less memory and less time. 14 refs., 11 figs., 3 tabs.

  19. Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening

    OpenAIRE

    He, Frank S.; Liu, Yang; Schwing, Alexander G.; Peng, Jian

    2016-01-01

    We propose a novel training algorithm for reinforcement learning which combines the strength of deep Q-learning with a constrained optimization approach to tighten optimality and encourage faster reward propagation. Our novel technique makes deep reinforcement learning more practical by drastically reducing the training time. We evaluate the performance of our approach on the 49 games of the challenging Arcade Learning Environment, and report significant improvements in both training time and...

  20. Intrinsically motivated reinforcement learning for human-robot interaction in the real-world.

    Science.gov (United States)

    Qureshi, Ahmed Hussain; Nakamura, Yutaka; Yoshikawa, Yuichiro; Ishiguro, Hiroshi

    2018-03-26

    For a natural social human-robot interaction, it is essential for a robot to learn the human-like social skills. However, learning such skills is notoriously hard due to the limited availability of direct instructions from people to teach a robot. In this paper, we propose an intrinsically motivated reinforcement learning framework in which an agent gets the intrinsic motivation-based rewards through the action-conditional predictive model. By using the proposed method, the robot learned the social skills from the human-robot interaction experiences gathered in the real uncontrolled environments. The results indicate that the robot not only acquired human-like social skills but also took more human-like decisions, on a test dataset, than a robot which received direct rewards for the task achievement. Copyright © 2018 Elsevier Ltd. All rights reserved.

  1. Self-Paced Prioritized Curriculum Learning With Coverage Penalty in Deep Reinforcement Learning.

    Science.gov (United States)

    Ren, Zhipeng; Dong, Daoyi; Li, Huaxiong; Chen, Chunlin

    2018-06-01

    In this paper, a new training paradigm is proposed for deep reinforcement learning using self-paced prioritized curriculum learning with coverage penalty. The proposed deep curriculum reinforcement learning (DCRL) takes the most advantage of experience replay by adaptively selecting appropriate transitions from replay memory based on the complexity of each transition. The criteria of complexity in DCRL consist of self-paced priority as well as coverage penalty. The self-paced priority reflects the relationship between the temporal-difference error and the difficulty of the current curriculum for sample efficiency. The coverage penalty is taken into account for sample diversity. With comparison to deep Q network (DQN) and prioritized experience replay (PER) methods, the DCRL algorithm is evaluated on Atari 2600 games, and the experimental results show that DCRL outperforms DQN and PER on most of these games. More results further show that the proposed curriculum training paradigm of DCRL is also applicable and effective for other memory-based deep reinforcement learning approaches, such as double DQN and dueling network. All the experimental results demonstrate that DCRL can achieve improved training efficiency and robustness for deep reinforcement learning.
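    For context on the prioritized experience replay (PER) baseline mentioned in this record, the sketch below shows proportional prioritization, where a transition is sampled with probability proportional to its absolute temporal-difference error raised to a power. It does not implement DCRL's self-paced priority or coverage penalty, whose exact form is not given in the abstract; the parameter values and function names are assumptions.

```python
import random

def sampling_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritization: P(i) is proportional to (|delta_i| + eps) ** alpha."""
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]

def sample_indices(td_errors, batch_size, alpha=0.6, rng=random):
    """Draw replay-memory indices (with replacement) according to their priorities."""
    probs = sampling_probabilities(td_errors, alpha)
    return rng.choices(range(len(td_errors)), weights=probs, k=batch_size)

# Hypothetical TD errors for four stored transitions.
td_errors = [0.1, -2.0, 0.5, 0.0]
probs = sampling_probabilities(td_errors)
batch = sample_indices(td_errors, batch_size=8, rng=random.Random(0))
```

    Setting `alpha=0` recovers uniform sampling, which corresponds to the plain replay memory used by DQN.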

  2. Manifold Regularized Reinforcement Learning.

    Science.gov (United States)

    Li, Hongliang; Liu, Derong; Wang, Ding

    2018-04-01

    This paper introduces a novel manifold regularized reinforcement learning scheme for continuous Markov decision processes. Smooth feature representations for value function approximation can be automatically learned using the unsupervised manifold regularization method. The learned features are data-driven, and can be adapted to the geometry of the state space. Furthermore, the scheme provides a direct basis representation extension for novel samples during policy learning and control. The performance of the proposed scheme is evaluated on two benchmark control tasks, i.e., the inverted pendulum and the energy storage problem. Simulation results illustrate the concepts of the proposed scheme and show that it can obtain excellent performance.

  3. Modeling and simulation of virtual human's coordination based on multi-agent systems

    Science.gov (United States)

    Zhang, Mei; Wen, Jing-Hua; Zhang, Zu-Xuan; Zhang, Jian-Qing

    2006-10-01

    The difficult and active topics in current virtual geographic environment (VGE) research are shared space and multi-user operation, distributed coordination and group decision-making. The theories and technologies of MAS provide a brand-new environment for the analysis, design and realization of distributed open systems. This paper takes cooperation among virtual humans in a multi-user VGE as its main object of study. First, we describe the theoretical foundation of VGE and present a formal description of the Multi-Agent System (MAS). Then we analyze in detail an algorithm for learning the collective operating behavior of virtual humans based on a best-held Genetic Algorithm (GA), and establish a dynamic action model in which multiple agents and objects interact dynamically, together with a colony movement strategy. Finally, we design an example which shows how 3 evolutionary agents cooperate to complete the task of collectively pushing a column box, design a virtual world prototype of virtual humans pushing a box collectively based on V-Realm Builder 2.0, and carry out modeling and dynamic simulation with Simulink 6.

  4. Switching Reinforcement Learning for Continuous Action Space

    Science.gov (United States)

    Nagayoshi, Masato; Murao, Hajime; Tamaki, Hisashi

    Reinforcement Learning (RL) attracts much attention as a technique for realizing computational intelligence such as adaptive and autonomous decentralized systems. In general, however, it is not easy to put RL into practical use. This difficulty includes the problem of designing a suitable action space for an agent, i.e., satisfying two requirements in trade-off: (i) to keep the characteristics (or structure) of an original search space as much as possible in order to seek strategies that lie close to the optimal, and (ii) to reduce the search space as much as possible in order to expedite the learning process. In order to design a suitable action space adaptively, we propose a switching RL model to mimic the process of an infant's motor development, in which gross motor skills develop before fine motor skills. Then, a method for switching controllers is constructed by introducing and referring to the “entropy”. Further, through computational experiments using robot navigation problems with one- and two-dimensional continuous action spaces, the validity of the proposed method has been confirmed.

  5. Multi-agent systems simulation and applications

    CERN Document Server

    Uhrmacher, Adelinde M

    2009-01-01

    Methodological Guidelines for Modeling and Developing MAS-Based Simulations. The intersection of agents, modeling, simulation, and application domains has been the subject of active research for over two decades. Although agents and simulation have been used effectively in a variety of application domains, much of the supporting research remains scattered in the literature, too often leaving scientists to develop multi-agent system (MAS) models and simulations from scratch. Multi-Agent Systems: Simulation and Applications provides an overdue review of the wide ranging facets of MAS simulation, i

  6. Continuous residual reinforcement learning for traffic signal control optimization

    NARCIS (Netherlands)

    Aslani, Mohammad; Seipel, Stefan; Wiering, Marco

    2018-01-01

    Traffic signal control can be naturally regarded as a reinforcement learning problem. Unfortunately, it is one of the most difficult classes of reinforcement learning problems owing to its large state space. A straightforward approach to address this challenge is to control traffic signals based on

  7. A Distributed Multi-Agent System for Collaborative Information Management and Learning

    Science.gov (United States)

    Chen, James R.; Wolfe, Shawn R.; Wragg, Stephen D.; Koga, Dennis (Technical Monitor)

    2000-01-01

    In this paper, we present DIAMS, a system of distributed, collaborative agents to help users access, manage, share and exchange information. A DIAMS personal agent helps its owner find information most relevant to current needs. It provides tools and utilities for users to manage their information repositories with dynamic organization and virtual views. Flexible hierarchical display is integrated with indexed query search to support effective information access. Automatic indexing methods are employed to support user queries and communication between agents. Contents of a repository are kept in object-oriented storage to facilitate information sharing. Collaboration between users is aided by easy sharing utilities as well as automated information exchange. Matchmaker agents are designed to establish connections between users with similar interests and expertise. DIAMS agents provide needed services for users to share and learn information from one another on the World Wide Web.

  8. Can model-free reinforcement learning explain deontological moral judgments?

    Science.gov (United States)

    Ayars, Alisabeth

    2016-05-01

    Dual-systems frameworks propose that moral judgments are derived from both an immediate emotional response, and controlled/rational cognition. Recently Cushman (2013) proposed a new dual-system theory based on model-free and model-based reinforcement learning. Model-free learning attaches values to actions based on their history of reward and punishment, and explains some deontological, non-utilitarian judgments. Model-based learning involves the construction of a causal model of the world and allows for far-sighted planning; this form of learning fits well with utilitarian considerations that seek to maximize certain kinds of outcomes. I present three concerns regarding the use of model-free reinforcement learning to explain deontological moral judgment. First, many actions that humans find aversive from model-free learning are not judged to be morally wrong. Moral judgment must require something in addition to model-free learning. Second, there is a dearth of evidence for central predictions of the reinforcement account, e.g., that people with different reinforcement histories will, all else equal, make different moral judgments. Finally, to account for the effect of intention within the framework requires certain assumptions which lack support. These challenges are reasonable foci for future empirical/theoretical work on the model-free/model-based framework. Copyright © 2016 Elsevier B.V. All rights reserved.

  9. Social Cognition as Reinforcement Learning: Feedback Modulates Emotion Inference.

    Science.gov (United States)

    Zaki, Jamil; Kallman, Seth; Wimmer, G Elliott; Ochsner, Kevin; Shohamy, Daphna

    2016-09-01

    Neuroscientific studies of social cognition typically employ paradigms in which perceivers draw single-shot inferences about the internal states of strangers. Real-world social inference features much different parameters: People often encounter and learn about particular social targets (e.g., friends) over time and receive feedback about whether their inferences are correct or incorrect. Here, we examined this process and, more broadly, the intersection between social cognition and reinforcement learning. Perceivers were scanned using fMRI while repeatedly encountering three social targets who produced conflicting visual and verbal emotional cues. Perceivers guessed how targets felt and received feedback about whether they had guessed correctly. Visual cues reliably predicted one target's emotion, verbal cues predicted a second target's emotion, and neither reliably predicted the third target's emotion. Perceivers successfully used this information to update their judgments over time. Furthermore, trial-by-trial learning signals, estimated using two reinforcement learning models, tracked activity in ventral striatum and ventromedial pFC, structures associated with reinforcement learning, and regions associated with updating social impressions, including TPJ. These data suggest that learning about others' emotions, like other forms of feedback learning, relies on domain-general reinforcement mechanisms as well as domain-specific social information processing.

  10. Human demonstrations for fast and safe exploration in reinforcement learning

    NARCIS (Netherlands)

    Schonebaum, G.K.; Junell, J.L.; van Kampen, E.

    2017-01-01

    Reinforcement learning is a promising framework for controlling complex vehicles with a high level of autonomy, since it does not need a dynamic model of the vehicle, and it is able to adapt to changing conditions. When learning from scratch, the performance of a reinforcement learning controller

  11. GA-based fuzzy reinforcement learning for control of a magnetic bearing system.

    Science.gov (United States)

    Lin, C T; Jou, C P

    2000-01-01

    This paper proposes a TD (temporal difference) and GA (genetic algorithm)-based reinforcement (TDGAR) learning method and applies it to the control of a real magnetic bearing system. The TDGAR learning scheme is a new hybrid GA, which integrates the TD prediction method and the GA to perform the reinforcement learning task. The TDGAR learning system is composed of two integrated feedforward networks. One neural network acts as a critic network to guide the learning of the other network (the action network) which determines the outputs (actions) of the TDGAR learning system. The action network can be a normal neural network or a neural fuzzy network. Using the TD prediction method, the critic network can predict the external reinforcement signal and provide a more informative internal reinforcement signal to the action network. The action network uses the GA to adapt itself according to the internal reinforcement signal. The key concept of the TDGAR learning scheme is to formulate the internal reinforcement signal as the fitness function for the GA such that the GA can evaluate the candidate solutions (chromosomes) regularly, even during periods without external feedback from the environment. This enables the GA to proceed to new generations regularly without waiting for the arrival of the external reinforcement signal. This can usually accelerate the GA learning since a reinforcement signal may only be available at a time long after a sequence of actions has occurred in the reinforcement learning problem. The proposed TDGAR learning system has been used to control an active magnetic bearing (AMB) system in practice. A systematic design procedure is developed to achieve successful integration of all the subsystems including magnetic suspension, mechanical structure, and controller training. The results show that the TDGAR learning scheme can successfully find a neural controller or a neural fuzzy controller for a self-designed magnetic bearing system.

  12. Reinforcement Learning in Continuous Action Spaces

    NARCIS (Netherlands)

    Hasselt, H. van; Wiering, M.A.

    2007-01-01

    Quite some research has been done on Reinforcement Learning in continuous environments, but the research on problems where the actions can also be chosen from a continuous space is much more limited. We present a new class of algorithms named Continuous Actor Critic Learning Automaton (CACLA)
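
    The abstract above is truncated before it describes the algorithm, so the following is only a minimal sketch of the actor-critic idea commonly associated with CACLA: the critic is updated with the temporal-difference (TD) error, and the actor is moved toward the explored action only when that error is positive. The scalar single-state setting, the function name `cacla_step`, and all parameter values are assumptions made for illustration and are not taken from the record.

```python
def cacla_step(v_state, v_next, actor_output, explored_action, reward,
               gamma=0.9, alpha=0.1, beta=0.1):
    """One illustrative CACLA-style update for scalar values (assumed setting).

    v_state, v_next  : critic estimates for the current and next state
    actor_output     : the action the actor currently proposes
    explored_action  : the action actually taken (actor output plus noise)
    """
    # Temporal-difference error computed by the critic.
    td_error = reward + gamma * v_next - v_state
    # The critic always moves toward the TD target.
    new_v_state = v_state + alpha * td_error
    # The actor moves toward the explored action only if it did better than expected.
    new_actor_output = actor_output
    if td_error > 0:
        new_actor_output = actor_output + beta * (explored_action - actor_output)
    return new_v_state, new_actor_output
```

    Because the actor update uses the sign of the TD error rather than its size, the step toward the explored action stays within the action space, which is one reason this style of update suits continuous actions.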

  13. Conversational Agents in E-Learning

    Science.gov (United States)

    Kerry, Alice; Ellis, Richard; Bull, Susan

    This paper discusses the use of natural language or 'conversational' agents in e-learning environments. We describe and contrast the various applications of conversational agent technology represented in the e-learning literature, including tutors, learning companions, language practice and systems to encourage reflection. We offer two more detailed examples of conversational agents, one which provides learning support, and the other support for self-assessment. Issues and challenges for developers of conversational agent systems for e-learning are identified and discussed.

  14. Improving Multi-Agent Systems Using Jason

    DEFF Research Database (Denmark)

    Vester, Steen; Boss, Niklas Skamriis; Jensen, Andreas Schmidt

    2011-01-01

    We describe the approach used to develop the multi-agent system of herders that competed as the Jason-DTU team at the Multi-Agent Programming Contest 2010. We also participated in 2009 with a system developed in the agent-oriented programming language Jason which is an extension of AgentSpeak. We ...... used the implementation from 2009 as a foundation and therefore much of the work done this year was on improving that implementation. We present a description which includes design and analysis of the system as well as the main features of our agent team strategy. In addition we discuss...

  15. Adversarial Reinforcement Learning in a Cyber Security Simulation

    NARCIS (Netherlands)

    Elderman, Richard; Pater, Leon; Thie, Albert; Drugan, Madalina; Wiering, Marco

    2017-01-01

    This paper focuses on cyber-security simulations in networks modeled as a Markov game with incomplete information and stochastic elements. The resulting game is an adversarial sequential decision making problem played with two agents, the attacker and defender. The two agents pit one reinforcement

  16. Human reinforcement learning subdivides structured action spaces by learning effector-specific values.

    Science.gov (United States)

    Gershman, Samuel J; Pesaran, Bijan; Daw, Nathaniel D

    2009-10-28

    Humans and animals are endowed with a large number of effectors. Although this enables great behavioral flexibility, it presents an equally formidable reinforcement learning problem of discovering which actions are most valuable because of the high dimensionality of the action space. An unresolved question is how neural systems for reinforcement learning (such as prediction error signals for action valuation associated with dopamine and the striatum) can cope with this "curse of dimensionality." We propose a reinforcement learning framework that allows for learned action valuations to be decomposed into effector-specific components when appropriate to a task, and test it by studying to what extent human behavior and blood oxygen level-dependent (BOLD) activity can exploit such a decomposition in a multieffector choice task. Subjects made simultaneous decisions with their left and right hands and received separate reward feedback for each hand movement. We found that choice behavior was better described by a learning model that decomposed the values of bimanual movements into separate values for each effector, rather than a traditional model that treated the bimanual actions as unitary with a single value. A decomposition of value into effector-specific components was also observed in value-related BOLD signaling, in the form of lateralized biases in striatal correlates of prediction error and anticipatory value correlates in the intraparietal sulcus. These results suggest that the human brain can use decomposed value representations to "divide and conquer" reinforcement learning over high-dimensional action spaces.
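
    The contrast between the decomposed and unitary models described above can be sketched as two delta-rule updates. This is a minimal illustration rather than the authors' fitted model: the function names, the two-option-per-hand setup, and the learning rate `alpha` are all assumptions.

```python
def update_decomposed(q_left, q_right, action, rewards, alpha=0.1):
    """Effector-specific learning: each hand has its own values and its own reward."""
    left_choice, right_choice = action
    reward_left, reward_right = rewards
    q_left[left_choice] += alpha * (reward_left - q_left[left_choice])
    q_right[right_choice] += alpha * (reward_right - q_right[right_choice])


def update_unitary(q_joint, action, rewards, alpha=0.1):
    """Unitary learning: one value per bimanual action, trained on the summed reward."""
    q_joint[action] += alpha * (sum(rewards) - q_joint[action])


def joint_value(q_left, q_right, action):
    """Value of a bimanual action under the decomposed model."""
    left_choice, right_choice = action
    return q_left[left_choice] + q_right[right_choice]
```

    Under the decomposed update, a reward earned by the left hand changes the value of every bimanual action that shares that left-hand choice, whereas the unitary table only changes the single joint entry that was tried. With n options per hand, the decomposed model tracks 2n values instead of n squared.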

  17. Evolutionary computation for reinforcement learning

    NARCIS (Netherlands)

    Whiteson, S.; Wiering, M.; van Otterlo, M.

    2012-01-01

    Algorithms for evolutionary computation, which simulate the process of natural selection to solve optimization problems, are an effective tool for discovering high-performing reinforcement-learning policies. Because they can automatically find good representations, handle continuous action spaces,

  18. Cooperative heuristic multi-agent planning

    NARCIS (Netherlands)

    De Weerdt, M.M.; Tonino, J.F.M.; Witteveen, C.

    2001-01-01

    In this paper we will use the framework to study cooperative heuristic multi-agent planning. During the construction of their plans, the agents use a heuristic function inspired by the FF planner (l3l). At any time in the process of planning the agents may exchange available resources, or they may

  19. A Day-to-Day Route Choice Model Based on Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Fangfang Wei

    2014-01-01

    Day-to-day traffic dynamics are generated by individual travelers’ route choice and route adjustment behaviors, which are well suited to study with agent-based models and learning theory. In this paper, we propose a day-to-day route choice model based on reinforcement learning and multiagent simulation. Travelers’ memory, learning rate, and experience cognition are taken into account. Then the model is verified and analyzed. Results show that the network flow can converge to user equilibrium (UE) if travelers can remember all the travel times they have experienced, but this is not necessarily the case under limited memory; learning rate can strengthen the flow fluctuation, whereas memory has the opposite effect; moreover, a high learning rate results in cyclical oscillation during the process of flow evolution. Finally, scenarios of link capacity degradation and random link capacity are used to illustrate the model’s applications. Analyses and applications of our model demonstrate that the model is reasonable and useful for studying day-to-day traffic dynamics.
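
    The abstract does not give the model's equations, so the sketch below only illustrates the two ingredients it names, a learning rate and a bounded memory, using textbook forms. The exponential-smoothing update, the class `TravelerMemory`, and the averaging rule are assumptions for illustration, not the paper's formulation.

```python
from collections import deque


def update_perceived_time(perceived, experienced, learning_rate):
    """Blend yesterday's perceived travel time with today's experienced time."""
    return (1.0 - learning_rate) * perceived + learning_rate * experienced


class TravelerMemory:
    """Stores experienced travel times on one route; capacity=None means unlimited memory."""

    def __init__(self, capacity=None):
        # A deque with maxlen silently discards the oldest entry when full.
        self.times = deque(maxlen=capacity)

    def remember(self, travel_time):
        self.times.append(travel_time)

    def expected_time(self):
        """Average over whatever the traveler still remembers."""
        return sum(self.times) / len(self.times)
```

    A high `learning_rate` makes the perceived time jump toward the latest experience, which is consistent with the larger flow fluctuations the abstract reports, while a longer memory averages over more days and damps them.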

  20. Human reinforcement learning subdivides structured action spaces by learning effector-specific values

    OpenAIRE

    Gershman, Samuel J.; Pesaran, Bijan; Daw, Nathaniel D.

    2009-01-01

    Humans and animals are endowed with a large number of effectors. Although this enables great behavioral flexibility, it presents an equally formidable reinforcement learning problem of discovering which actions are most valuable, due to the high dimensionality of the action space. An unresolved question is how neural systems for reinforcement learning – such as prediction error signals for action valuation associated with dopamine and the striatum – can cope with this “curse of dimensionality...

  1. Field tests applying multi-agent technology for distributed control. Virtual power plants and wind energy

    Energy Technology Data Exchange (ETDEWEB)

    Schaeffer, G.J.; Warmer, C.J.; Hommelberg, M.P.F.; Kamphuis, I.G.; Kok, J.K. [Energy in the Built Environment and Networks, Petten (Netherlands)

    2007-01-15

    Multi-agent technology is state of the art ICT. It is not yet widely applied in power control systems. However, it has a large potential for bottom-up, distributed control of a network with large-scale renewable energy sources (RES) and distributed energy resources (DER) in future power systems. At least two major European R&D projects (MicroGrids and CRISP) have investigated its potential. Both grid-related and market-related applications have been studied. This paper will focus on two field tests, performed in the Netherlands, applying multi-agent control by means of the PowerMatcher concept. The first field test focuses on the application of multi-agent technology in a commercial setting, i.e. by reducing the need for balancing power in the case of intermittent energy sources, such as wind energy. In this case the flexibility of demand and supply of industrial and residential consumers and producers is used. Imbalance reduction rates of over 40% have been achieved applying the PowerMatcher, and with a proper portfolio even larger rates are expected. In the second field test the multi-agent technology is used in the design and implementation of a virtual power plant (VPP). This VPP digitally connects a number of micro-CHP units, installed in residential dwellings, into a cluster that is controlled to reduce the local peak demand of the common low-voltage grid segment the micro-CHP units are connected to. In this way the VPP supports the local distribution system operator (DSO) to defer reinforcements in the grid infrastructure (substations and cables).

  2. Field tests applying multi-agent technology for distributed control. Virtual power plants and wind energy

    International Nuclear Information System (INIS)

    Schaeffer, G.J.; Warmer, C.J.; Hommelberg, M.P.F.; Kamphuis, I.G.; Kok, J.K.

    2007-01-01

    Multi-agent technology is state of the art ICT. It is not yet widely applied in power control systems. However, it has a large potential for bottom-up, distributed control of a network with large-scale renewable energy sources (RES) and distributed energy resources (DER) in future power systems. At least two major European R&D projects (MicroGrids and CRISP) have investigated its potential. Both grid-related and market-related applications have been studied. This paper will focus on two field tests, performed in the Netherlands, applying multi-agent control by means of the PowerMatcher concept. The first field test focuses on the application of multi-agent technology in a commercial setting, i.e. by reducing the need for balancing power in the case of intermittent energy sources, such as wind energy. In this case the flexibility of demand and supply of industrial and residential consumers and producers is used. Imbalance reduction rates of over 40% have been achieved applying the PowerMatcher, and with a proper portfolio even larger rates are expected. In the second field test the multi-agent technology is used in the design and implementation of a virtual power plant (VPP). This VPP digitally connects a number of micro-CHP units, installed in residential dwellings, into a cluster that is controlled to reduce the local peak demand of the common low-voltage grid segment the micro-CHP units are connected to. In this way the VPP supports the local distribution system operator (DSO) to defer reinforcements in the grid infrastructure (substations and cables).

  3. Reinforcement Learning Based Novel Adaptive Learning Framework for Smart Grid Prediction

    Directory of Open Access Journals (Sweden)

    Tian Li

    2017-01-01

    Smart grid is a potential infrastructure to supply electricity demand for end users in a safe and reliable manner. With the rapid increase of the share of renewable energy and controllable loads in smart grid, the operation uncertainty of smart grid has increased briskly during recent years. Forecasting is responsible for the safe and economic operation of the smart grid. However, most existing forecast methods cannot account for the smart grid due to their inability to adapt to the varying operational conditions. In this paper, reinforcement learning is first exploited to develop an online learning framework for the smart grid. With the capability of multitime scale resolution, a wavelet neural network has been adopted in the online learning framework to yield a reinforcement learning and wavelet neural network (RLWNN) based adaptive learning scheme. The simulations on two typical prediction problems in smart grid, including wind power prediction and load forecast, validate the effectiveness and the scalability of the proposed RLWNN based learning framework and algorithm.

  4. Multi-Agent Modeling in Managing Six Sigma Projects

    Directory of Open Access Journals (Sweden)

    K. Y. Chau

    2009-10-01

    In this paper, a multi-agent model is proposed for considering the human resources factor in decision making in relation to the six sigma project. The proposed multi-agent system is expected to increase the accuracy of project prioritization and to stabilize the human resources service level. A simulation of the proposed multi-agent model is conducted. The results show that a multi-agent model which takes into consideration human resources when making decisions about project selection and project team formation is important in enabling efficient and effective project management. The multi-agent modeling approach provides an alternative approach for improving communication and the autonomy of six sigma projects in business organizations.

  5. Instructional control of reinforcement learning: a behavioral and neurocomputational investigation.

    Science.gov (United States)

    Doll, Bradley B; Jacobs, W Jake; Sanfey, Alan G; Frank, Michael J

    2009-11-24

    Humans learn how to behave directly through environmental experience and indirectly through rules and instructions. Behavior analytic research has shown that instructions can control behavior, even when such behavior leads to sub-optimal outcomes (Hayes, S. (Ed.). 1989. Rule-governed behavior: cognition, contingencies, and instructional control. Plenum Press). Here we examine the control of behavior through instructions in a reinforcement learning task known to depend on striatal dopaminergic function. Participants selected between probabilistically reinforced stimuli, and were (incorrectly) told that a specific stimulus had the highest (or lowest) reinforcement probability. Despite experience to the contrary, instructions drove choice behavior. We present neural network simulations that capture the interactions between instruction-driven and reinforcement-driven behavior via two potential neural circuits: one in which the striatum is inaccurately trained by instruction representations coming from prefrontal cortex/hippocampus (PFC/HC), and another in which the striatum learns the environmentally based reinforcement contingencies, but is "overridden" at decision output. Both models capture the core behavioral phenomena but, because they differ fundamentally on what is learned, make distinct predictions for subsequent behavioral and neuroimaging experiments. Finally, we attempt to distinguish between the proposed computational mechanisms governing instructed behavior by fitting a series of abstract "Q-learning" and Bayesian models to subject data. The best-fitting model supports one of the neural models, suggesting the existence of a "confirmation bias" in which the PFC/HC system trains the reinforcement system by amplifying outcomes that are consistent with instructions while diminishing inconsistent outcomes.
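
    The "confirmation bias" idea in the final sentence above can be sketched as a Q-learning update whose prediction error is rescaled for the instructed stimulus. This is a hedged illustration under simplifying assumptions, not the fitted model from the study: it covers only the case where the instruction says the stimulus is the best one, and the function name `biased_q_update` and the parameters `amplify` and `diminish` are hypothetical.

```python
def biased_q_update(q, reward, instructed, alpha=0.2, amplify=2.0, diminish=0.5):
    """Delta-rule update with an assumed confirmation bias for the instructed stimulus.

    q          : current value estimate of the chosen stimulus
    reward     : observed outcome (e.g., 1.0 for a win, 0.0 for a loss)
    instructed : True if the participant was told this stimulus is the best
    """
    prediction_error = reward - q
    if instructed:
        # Better-than-expected outcomes agree with the instruction and are amplified;
        # worse-than-expected outcomes contradict it and are diminished.
        prediction_error *= amplify if prediction_error > 0 else diminish
    return q + alpha * prediction_error
```

    With these assumed settings, an instructed stimulus gains value faster after wins and loses it more slowly after losses than an uninstructed one, so its estimate can stay inflated despite contrary experience.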

  6. Adaptive learning in agents behaviour: A framework for electricity markets simulation

    DEFF Research Database (Denmark)

    Pinto, Tiago; Vale, Zita; Sousa, Tiago M.

    2014-01-01

    decision support to MASCEM's negotiating agents so that they can properly achieve their goals. ALBidS uses artificial intelligence methodologies and data analysis algorithms to provide effective adaptive learning capabilities to such negotiating entities. The main contribution is provided by a methodology...... that combines several distinct strategies to build actions proposals, so that the best can be chosen at each time, depending on the context and simulation circumstances. The choosing process includes reinforcement learning algorithms, a mechanism for negotiating contexts analysis, a mechanism for the management...... allows integrating different strategic approaches for electricity market negotiations, and choosing the most appropriate one at each time, for each different negotiation context. This methodology is integrated in ALBidS (Adaptive Learning strategic Bidding System) – a multiagent system that provides...

  7. Cloud Computing and Multi Agent System to improve Learning Object Paradigm

    Directory of Open Access Journals (Sweden)

    Ana B. Gil

    2015-05-01

    The paradigm of Learning Object provides Educators and Learners with the ability to access an extensive number of learning resources. To do so, this paradigm provides different technologies and tools, such as federated search platforms and storage repositories, in order to obtain information ubiquitously and on demand. However, the vast amount and variety of educational content, which is distributed among several repositories, and the existence of various and incompatible standards, technologies and interoperability layers among repositories, constitute a real problem for the expansion of this paradigm. This study presents an agent-based architecture that uses the advantages provided by Cloud Computing platforms to deal with the open issues on the Learning Object paradigm.

  8. Reinforcement Learning Based Artificial Immune Classifier

    Directory of Open Access Journals (Sweden)

    Mehmet Karakose

    2013-01-01

    Artificial immune systems are among the widely used methods for classification, which is a decision-making process. Artificial immune systems based on the natural immune system can be successfully applied for classification, optimization, recognition, and learning in real-world problems. In this study, a reinforcement learning based artificial immune classifier is proposed as a new approach. This approach uses reinforcement learning to find better antibodies with immune operators. The proposed approach offers several advantages over other methods in the literature, such as effectiveness, fewer memory cells, high accuracy, speed, and data adaptability. The performance of the proposed approach is demonstrated by simulation and experimental results using real data in Matlab and FPGA. Some benchmark data and remote image data are used for experimental results. Comparative results with supervised/unsupervised artificial immune systems, the negative selection classifier, and the resource limited artificial immune classifier are given to demonstrate the effectiveness of the proposed method.

  9. Multi-physics corrosion modeling for sustainability assessment of steel reinforced high performance fiber reinforced cementitious composites

    DEFF Research Database (Denmark)

    Lepech, M.; Michel, Alexander; Geiker, Mette

    2016-01-01

    and widespread depassivation, are the mechanism behind experimental results of HPFRCC steel corrosion studies found in the literature. Such results provide an indication of the fundamental mechanisms by which steel reinforced HPFRCC materials may be more durable than traditional reinforced concrete and other......Using a newly developed multi-physics transport, corrosion, and cracking model, which models these phenomena as coupled physiochemical processes, the role of HPFRCC crack control and formation in regulating steel reinforcement corrosion is investigated. This model describes transport of water...... and chemical species, the electric potential distribution in the HPFRCC, the electrochemical propagation of steel corrosion, and the role of microcracks in the HPFRCC material. Numerical results show that the reduction in anode and cathode size on the reinforcing steel surface, due to multiple crack formation...

  10. Online reinforcement learning control for aerospace systems

    NARCIS (Netherlands)

    Zhou, Y.

    2018-01-01

    Reinforcement Learning (RL) methods are relatively new in the field of aerospace guidance, navigation, and control. This dissertation aims to exploit RL methods to improve the autonomy and online learning of aerospace systems with respect to the a priori unknown system and environment, dynamical

  11. Structure identification in fuzzy inference using reinforcement learning

    Science.gov (United States)

    Berenji, Hamid R.; Khedkar, Pratap

    1993-01-01

    In our previous work on the GARIC architecture, we have shown that the system can start with surface structure of the knowledge base (i.e., the linguistic expression of the rules) and learn the deep structure (i.e., the fuzzy membership functions of the labels used in the rules) by using reinforcement learning. Assuming the surface structure, GARIC refines the fuzzy membership functions used in the consequents of the rules using a gradient descent procedure. This hybrid fuzzy logic and reinforcement learning approach can learn to balance a cart-pole system and to backup a truck to its docking location after a few trials. In this paper, we discuss how to do structure identification using reinforcement learning in fuzzy inference systems. This involves identifying both surface as well as deep structure of the knowledge base. The term set of fuzzy linguistic labels used in describing the values of each control variable must be derived. In this process, splitting a label refers to creating new labels which are more granular than the original label and merging two labels creates a more general label. Splitting and merging of labels directly transform the structure of the action selection network used in GARIC by increasing or decreasing the number of hidden layer nodes.

  12. Finding intrinsic rewards by embodied evolution and constrained reinforcement learning.

    Science.gov (United States)

    Uchibe, Eiji; Doya, Kenji

    2008-12-01

    Understanding the design principle of reward functions is a substantial challenge both in artificial intelligence and neuroscience. Successful acquisition of a task usually requires not only rewards for goals, but also for intermediate states to promote effective exploration. This paper proposes a method for designing 'intrinsic' rewards of autonomous agents by combining constrained policy gradient reinforcement learning and embodied evolution. To validate the method, we use Cyber Rodent robots, in which collision avoidance, recharging from battery packs, and 'mating' by software reproduction are three major 'extrinsic' rewards. We show in hardware experiments that the robots can find appropriate 'intrinsic' rewards for the vision of battery packs and other robots to promote approach behaviors.

  13. Using a board game to reinforce learning.

    Science.gov (United States)

    Yoon, Bona; Rodriguez, Leslie; Faselis, Charles J; Liappis, Angelike P

    2014-03-01

    Experiential gaming strategies offer a variation on traditional learning. A board game was used to present synthesized content of fundamental catheter care concepts and reinforce evidence-based practices relevant to nursing. Board games are innovative educational tools that can enhance active learning. Copyright 2014, SLACK Incorporated.

  14. Exploiting Best-Match Equations for Efficient Reinforcement Learning

    NARCIS (Netherlands)

    van Seijen, Harm; Whiteson, Shimon; van Hasselt, Hado; Wiering, Marco

    This article presents and evaluates best-match learning, a new approach to reinforcement learning that trades off the sample efficiency of model-based methods with the space efficiency of model-free methods. Best-match learning works by approximating the solution to a set of best-match equations,

  15. Tackling Error Propagation through Reinforcement Learning: A Case of Greedy Dependency Parsing

    OpenAIRE

    Le, Minh; Fokkens, Antske

    2017-01-01

    Error propagation is a common problem in NLP. Reinforcement learning explores erroneous states during training and can therefore be more robust when mistakes are made early in a process. In this paper, we apply reinforcement learning to greedy dependency parsing which is known to suffer from error propagation. Reinforcement learning improves accuracy of both labeled and unlabeled dependencies of the Stanford Neural Dependency Parser, a high performance greedy parser, while maintaining its eff...

  16. A Trading Agent for a Multi-Issue Clearing House

    Science.gov (United States)

    Debenham, John

    The potential size of the electronic business market offers great incentives to trading agents that can bargain, bid in auctions and trade in exchanges. Much of business negotiation is multi-issue. A generic 'information-based' agent is proposed for multi-issue negotiation. Successful negotiation depends on shrewd strategies driven by the right information. This agent has machinery to value information and to manage its integrity. A multi-issue, many-to-many clearing house, and an agent to trade in it, are proposed.

  17. Longitudinal investigation on learned helplessness tested under negative and positive reinforcement involving stimulus control.

    Science.gov (United States)

    Oliveira, Emileane C; Hunziker, Maria Helena

    2014-07-01

    In this study, we investigated whether (a) animals demonstrating the learned helplessness effect during an escape contingency also show learning deficits under positive reinforcement contingencies involving stimulus control and (b) the exposure to positive reinforcement contingencies eliminates the learned helplessness effect under an escape contingency. Rats were initially exposed to controllable (C), uncontrollable (U) or no (N) shocks. After 24 h, they were exposed to 60 escapable shocks delivered in a shuttlebox. In the following phase, we selected from each group the four subjects that presented the most typical group pattern: no escape learning (learned helplessness effect) in Group U and escape learning in Groups C and N. All subjects were then exposed to two phases, the (1) positive reinforcement for lever pressing under a multiple FR/Extinction schedule and (2) a re-test under negative reinforcement (escape). A fourth group (n=4) was exposed only to the positive reinforcement sessions. All subjects showed discrimination learning under multiple schedule. In the escape re-test, the learned helplessness effect was maintained for three of the animals in Group U. These results suggest that the learned helplessness effect did not extend to discriminative behavior that is positively reinforced and that the learned helplessness effect did not revert for most subjects after exposure to positive reinforcement. We discuss some theoretical implications as related to learned helplessness as an effect restricted to aversive contingencies and to the absence of reversion after positive reinforcement. Copyright © 2014. Published by Elsevier B.V.

  18. Adaptive Trajectory Tracking Control using Reinforcement Learning for Quadrotor

    Directory of Open Access Journals (Sweden)

    Wenjie Lou

    2016-02-01

    Inaccurate system parameters and unpredicted external disturbances affect the performance of non-linear controllers. In this paper, a new adaptive control algorithm under the reinforcement framework is proposed to stabilize a quadrotor helicopter. Based on a command-filtered non-linear control algorithm, adaptive elements are added and learned by policy-search methods. To predict the inaccurate system parameters, a new kernel-based regression learning method is provided. In addition, Policy learning by Weighting Exploration with the Returns (PoWER) and Return Weighted Regression (RWR) are utilized to learn the appropriate parameters for adaptive elements in order to cancel the effect of external disturbance. Furthermore, numerical simulations under several conditions are performed, and the ability of adaptive trajectory-tracking control with reinforcement learning is demonstrated.

  19. Improving Multi-Instance Multi-Label Learning by Extreme Learning Machine

    Directory of Open Access Journals (Sweden)

    Ying Yin

    2016-05-01

    Multi-instance multi-label learning is a learning framework, where every object is represented by a bag of instances and associated with multiple labels simultaneously. The existing degeneration strategy-based methods often suffer from some common drawbacks: (1) the user-specified parameter for the number of clusters may cause an effectiveness problem; (2) SVM may bring a high computational cost when utilized as the classifier builder. In this paper, we propose an algorithm, namely multi-instance multi-label (MIML)-extreme learning machine (ELM), to address the problems. To the best of our knowledge, we are the first to utilize ELM in the MIML problem and to conduct the comparison of ELM and SVM on MIML. Extensive experiments have been conducted on real datasets and synthetic datasets. The results show that MIMLELM tends to achieve better generalization performance at a higher learning speed.

  20. Enriching behavioral ecology with reinforcement learning methods.

    Science.gov (United States)

    Frankenhuis, Willem E; Panchanathan, Karthik; Barto, Andrew G

    2018-02-13

    This article focuses on the division of labor between evolution and development in solving sequential, state-dependent decision problems. Currently, behavioral ecologists tend to use dynamic programming methods to study such problems. These methods are successful at predicting animal behavior in a variety of contexts. However, they depend on a distinct set of assumptions. Here, we argue that behavioral ecology will benefit from drawing more than it currently does on a complementary collection of tools, called reinforcement learning methods. These methods allow for the study of behavior in highly complex environments, which conventional dynamic programming methods do not feasibly address. In addition, reinforcement learning methods are well-suited to studying how biological mechanisms solve developmental and learning problems. For instance, we can use them to study simple rules that perform well in complex environments. Or to investigate under what conditions natural selection favors fixed, non-plastic traits (which do not vary across individuals), cue-driven-switch plasticity (innate instructions for adaptive behavioral development based on experience), or developmental selection (the incremental acquisition of adaptive behavior based on experience). If natural selection favors developmental selection, which includes learning from environmental feedback, we can also make predictions about the design of reward systems. Our paper is written in an accessible manner and for a broad audience, though we believe some novel insights can be drawn from our discussion. We hope our paper will help advance the emerging bridge connecting the fields of behavioral ecology and reinforcement learning. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.

  1. A reusable multi-agent architecture for active intelligent websites

    NARCIS (Netherlands)

    Jonker, C.M.; Lam, R.A.; Treur, J.

    In this paper a reusable multi-agent architecture for intelligent Websites is presented and illustrated for an electronic department store. The architecture has been designed and implemented using the compositional design method for multi-agent systems DESIRE. The agents within this architecture are

  2. How we learn to make decisions: rapid propagation of reinforcement learning prediction errors in humans.

    Science.gov (United States)

    Krigolson, Olav E; Hassall, Cameron D; Handy, Todd C

    2014-03-01

    Our ability to make decisions is predicated upon our knowledge of the outcomes of the actions available to us. Reinforcement learning theory posits that actions followed by a reward or punishment acquire value through the computation of prediction errors-discrepancies between the predicted and the actual reward. A multitude of neuroimaging studies have demonstrated that rewards and punishments evoke neural responses that appear to reflect reinforcement learning prediction errors [e.g., Krigolson, O. E., Pierce, L. J., Holroyd, C. B., & Tanaka, J. W. Learning to become an expert: Reinforcement learning and the acquisition of perceptual expertise. Journal of Cognitive Neuroscience, 21, 1833-1840, 2009; Bayer, H. M., & Glimcher, P. W. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47, 129-141, 2005; O'Doherty, J. P. Reward representations and reward-related learning in the human brain: Insights from neuroimaging. Current Opinion in Neurobiology, 14, 769-776, 2004; Holroyd, C. B., & Coles, M. G. H. The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109, 679-709, 2002]. Here, we used the brain ERP technique to demonstrate that not only do rewards elicit a neural response akin to a prediction error but also that this signal rapidly diminished and propagated to the time of choice presentation with learning. Specifically, in a simple, learnable gambling task, we show that novel rewards elicited a feedback error-related negativity that rapidly decreased in amplitude with learning. Furthermore, we demonstrate the existence of a reward positivity at choice presentation, a previously unreported ERP component that has a similar timing and topography as the feedback error-related negativity that increased in amplitude with learning. The pattern of results we observed mirrored the output of a computational model that we implemented to compute reward
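    The prediction-error computation described in this abstract can be sketched with a textbook delta-rule (Rescorla-Wagner style) update. This is not the computational model used in the study, which the record does not specify; the function name, the learning rate `alpha`, and the constant reward sequence are assumptions made for illustration.

```python
def rescorla_wagner(rewards, alpha=0.3, v0=0.0):
    """Return (values, prediction_errors) for a sequence of rewards."""
    v = v0
    values, errors = [], []
    for r in rewards:
        delta = r - v          # prediction error: actual minus predicted reward
        v = v + alpha * delta  # move the estimate toward the observed outcome
        errors.append(delta)
        values.append(v)
    return values, errors

# Ten consecutive rewarded trials in a learnable task.
values, errors = rescorla_wagner([1.0] * 10)
print([round(e, 3) for e in errors])
```

    With a constant reward the prediction error shrinks geometrically from trial to trial, which parallels the reported decrease of the feedback signal as learning progresses.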

  3. Multi-agent Cooperation in a Planning Framework

    NARCIS (Netherlands)

    De Weerdt, M.M.; Bos, A.; Tonino, J.F.M.; Witteveen, C.

    2000-01-01

    The promise of multi-agent systems is that multiple agents can solve problems more efficiently than single agents can. In this paper we propose a method to implement cooperation between agents in the planning phase, in order to achieve more cost-effective solutions than without cooperation. Two

  4. Agendas for Multi-Agent Learning

    National Research Council Canada - National Science Library

    Gordon, Geoffrey J

    2006-01-01

    .... We then consider research goals for modelling, design, and learning, and identify the problem of finding learning algorithms that guarantee convergence to Pareto-dominant equilibria against a wide range of opponents...

  5. Autonomous Formations of Multi-Agent Systems

    Science.gov (United States)

    Dhali, Sanjana; Joshi, Suresh M.

    2013-01-01

    Autonomous formation control of multi-agent dynamic systems has a number of applications that include ground-based and aerial robots and satellite formations. For air vehicles, formation flight ("flocking") has the potential to significantly increase airspace utilization as well as fuel efficiency. This presentation addresses two main problems in multi-agent formations: optimal role assignment to minimize the total cost (e.g., combined distance traveled by all agents); and maintaining formation geometry during flock motion. The Kuhn-Munkres ("Hungarian") algorithm is used for optimal assignment, and consensus-based leader-follower type control architecture is used to maintain formation shape despite the leader's independent movements. The methods are demonstrated by animated simulations.
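    The role-assignment problem mentioned in this abstract can be illustrated with a small sketch. For readability it uses exhaustive search over permutations, which returns the same minimum-cost assignment that the Kuhn-Munkres algorithm finds in polynomial time but is practical only for a handful of agents. The agent and formation-slot coordinates are hypothetical.

```python
from itertools import permutations
from math import hypot

def optimal_assignment(agents, slots):
    """Return (best_perm, total_cost) minimising summed Euclidean distance.

    best_perm[i] is the index of the slot assigned to agent i.
    """
    n = len(agents)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        cost = sum(
            hypot(agents[i][0] - slots[j][0], agents[i][1] - slots[j][1])
            for i, j in enumerate(perm)
        )
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost

agents = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]   # current agent positions
slots = [(4.0, 1.0), (0.0, 4.0), (1.0, 0.0)]    # desired formation positions
assignment, cost = optimal_assignment(agents, slots)
print(assignment, cost)  # (2, 0, 1) 3.0
```

    Each agent here ends up one unit away from its slot, so the combined distance is 3.0; any other pairing forces at least one agent to travel three or more units.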

  6. Flow Navigation by Smart Microswimmers via Reinforcement Learning

    Science.gov (United States)

    Colabrese, Simona; Biferale, Luca; Celani, Antonio; Gustavsson, Kristian

    2017-11-01

    We have numerically modeled active particles which are able to acquire some limited knowledge of the fluid environment from simple mechanical cues and exert a control on their preferred steering direction. We show that those swimmers can learn effective strategies just by experience, using a reinforcement learning algorithm. As an example, we focus on smart gravitactic swimmers. These are active particles whose task is to reach the highest altitude within some time horizon, exploiting the underlying flow whenever possible. The reinforcement learning algorithm allows particles to learn effective strategies even in difficult situations when, in the absence of control, they would end up being trapped by flow structures. These strategies are highly nontrivial and cannot be easily guessed in advance. This work paves the way towards the engineering of smart microswimmers that solve difficult navigation problems. ERC AdG NewTURB 339032.

  7. The drift diffusion model as the choice rule in reinforcement learning.

    Science.gov (United States)

    Pedersen, Mads Lund; Frank, Michael J; Biele, Guido

    2017-08-01

    Current reinforcement-learning models often assume simplified decision processes that do not fully reflect the dynamic complexities of choice processes. Conversely, sequential-sampling models of decision making account for both choice accuracy and response time, but assume that decisions are based on static decision values. To combine these two computational models of decision making and learning, we implemented reinforcement-learning models in which the drift diffusion model describes the choice process, thereby capturing both within- and across-trial dynamics. To exemplify the utility of this approach, we quantitatively fit data from a common reinforcement-learning paradigm using hierarchical Bayesian parameter estimation, and compared model variants to determine whether they could capture the effects of stimulant medication in adult patients with attention-deficit hyperactivity disorder (ADHD). The model with the best relative fit provided a good description of the learning process, choices, and response times. A parameter recovery experiment showed that the hierarchical Bayesian modeling approach enabled accurate estimation of the model parameters. The model approach described here, using simultaneous estimation of reinforcement-learning and drift diffusion model parameters, shows promise for revealing new insights into the cognitive and neural mechanisms of learning and decision making, as well as the alteration of such processes in clinical groups.
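    The combination described in this abstract can be sketched as a simulation in which a drift diffusion process replaces the usual softmax choice rule and its drift rate is set by the difference between learned action values. This is an illustrative forward simulation only, not the hierarchical Bayesian estimation used in the paper; the two-armed task, the symmetric boundaries, and every parameter value (`alpha`, `scale`, `threshold`, `ndt`) are assumptions.

```python
import random

def ddm_trial(drift, threshold=1.0, ndt=0.3, dt=0.001, noise=1.0, rng=random):
    """Simulate one drift-diffusion trial; return (choice, response_time)."""
    x, t = 0.0, 0.0
    while abs(x) < threshold:
        # Euler step: deterministic drift plus Gaussian diffusion noise.
        x += drift * dt + noise * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        t += dt
    # Upper boundary means option 0, lower boundary means option 1.
    return (0 if x > 0 else 1), t + ndt

def run_rlddm(n_trials=200, p_reward=(0.8, 0.2), alpha=0.1, scale=3.0, seed=0):
    """Q-learning on a two-armed bandit with the DDM as the choice rule."""
    rng = random.Random(seed)
    q = [0.0, 0.0]
    history = []
    for _ in range(n_trials):
        drift = scale * (q[0] - q[1])           # value difference drives the drift
        choice, rt = ddm_trial(drift, rng=rng)
        reward = 1.0 if rng.random() < p_reward[choice] else 0.0
        q[choice] += alpha * (reward - q[choice])  # delta-rule value update
        history.append((choice, rt, reward))
    return q, history

q_values, history = run_rlddm()
print(q_values, sum(rt for _, rt, _ in history) / len(history))
```

    Because each trial yields both a choice and a response time, the same value difference shapes accuracy and speed together, which is the within-trial and across-trial coupling the abstract refers to.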

  8. Embedded Incremental Feature Selection for Reinforcement Learning

    Science.gov (United States)

    2012-05-01

    Prior to this work, feature selection for reinforcement learning has focused on linear value function approximation (Kolter and Ng, 2009; Parr et al...In Proceedings of the 23rd International Conference on Machine Learning, pages 449–456. Kolter, J. Z. and Ng, A. Y. (2009). Regularization and feature

  9. Trends in practical applications of heterogeneous multi-agent systems : the PAAMS collection

    CERN Document Server

    Rodríguez, Juan; Mathieu, Philippe; Campbell, Andrew; Ortega, Alfonso; Adam, Emmanuel; Navarro, Elena; Ahrndt, Sebastian; Moreno, María; Julián, Vicente

    2014-01-01

    PAAMS, the International Conference on Practical Applications of Agents and Multi-Agent Systems is an evolution of the International Workshop on Practical Applications of Agents and Multi-Agent Systems. PAAMS is an international yearly tribune to present, to discuss, and to disseminate the latest developments and the most important outcomes related to real-world applications. It provides a unique opportunity to bring multi-disciplinary experts, academics and practitioners together to exchange their experience in the development of Agents and Multi-Agent Systems. This volume presents the papers that have been accepted for the 2014 special sessions: Agents Behaviours and Artificial Markets (ABAM), Agents and Mobile Devices (AM), Bio-Inspired and Multi-Agents Systems: Applications to Languages (BioMAS), Multi-Agent Systems and Ambient Intelligence (MASMAI), Self-Explaining Agents (SEA), Web Mining and Recommender systems (WebMiRes) and Intelligent Educational Systems (SSIES).

  10. Learning-based diagnosis and repair

    NARCIS (Netherlands)

    Roos, Nico

    2017-01-01

    This paper proposes a new form of diagnosis and repair based on reinforcement learning. Self-interested agents learn locally which agents may provide a low quality of service for a task. The correctness of learned assessments of other agents is proved under conditions on exploration versus

  11. Flexible Heuristic Dynamic Programming for Reinforcement Learning in Quadrotors

    NARCIS (Netherlands)

    Helmer, Alexander; de Visser, C.C.; van Kampen, E.

    2018-01-01

    Reinforcement learning is a paradigm for learning decision-making tasks from interaction with the environment. Function approximators solve a part of the curse of dimensionality when learning in high-dimensional state and/or action spaces. It can be a time-consuming process to learn a good policy in

  12. Working Memory and Reinforcement Schedule Jointly Determine Reinforcement Learning in Children: Potential Implications for Behavioral Parent Training

    Directory of Open Access Journals (Sweden)

    Elien Segers

    2018-03-01

    Introduction: Behavioral Parent Training (BPT) is often provided for childhood psychiatric disorders. These disorders have been shown to be associated with working memory impairments. BPT is based on operant learning principles, yet how operant principles shape behavior (through the partial reinforcement (PRF) extinction effect, i.e., greater resistance to extinction that is created when behavior is reinforced partially rather than continuously) and the potential role of working memory therein is scarcely studied in children. This study explored the PRF extinction effect and the role of working memory therein using experimental tasks in typically developing children. Methods: Ninety-seven children (age 6–10) completed a working memory task and an operant learning task, in which children acquired a response-sequence rule under either continuous or PRF (120 trials), followed by an extinction phase (80 trials). Data of 88 children were used for analysis. Results: The PRF extinction effect was confirmed: We observed slower acquisition and extinction in the PRF condition as compared to the continuous reinforcement (CRF) condition. Working memory was negatively related to acquisition but not extinction performance. Conclusion: Both reinforcement contingencies and working memory relate to acquisition performance. Potential implications for BPT are that decreasing working memory load may enhance the chance of optimally learning through reinforcement.

  13. Exploring complex dynamics in multi agent-based intelligent systems: Theoretical and experimental approaches using the Multi Agent-based Behavioral Economic Landscape (MABEL) model

    Science.gov (United States)

    Alexandridis, Konstantinos T.

    This dissertation adopts a holistic and detailed approach to modeling spatially explicit agent-based artificial intelligent systems, using the Multi Agent-based Behavioral Economic Landscape (MABEL) model. The research questions it addresses stem from the need to understand and analyze the real-world patterns and dynamics of land use change from a coupled human-environmental systems perspective. Describes the systemic, mathematical, statistical, socio-economic and spatial dynamics of the MABEL modeling framework, and provides a wide array of cross-disciplinary modeling applications within the research, decision-making and policy domains. Establishes the symbolic properties of the MABEL model as a Markov decision process, analyzes the decision-theoretic utility and optimization attributes of agents towards comprising statistically and spatially optimal policies and actions, and explores the probabilogic character of the agents' decision-making and inference mechanisms via the use of Bayesian belief and decision networks. Develops and describes a Monte Carlo methodology for experimental replications of agents' decisions regarding complex spatial parcel acquisition and learning. Recognizes the gap in spatially-explicit accuracy assessment techniques for complex spatial models, and proposes an ensemble of statistical tools designed to address this problem. Advanced information assessment techniques such as the Receiver-Operator Characteristic curve, the impurity entropy and Gini functions, and the Bayesian classification functions are proposed. The theoretical foundation for modular Bayesian inference in spatially-explicit multi-agent artificial intelligent systems, and the ensembles of cognitive and scenario assessment modular tools built for the MABEL model are provided. Emphasizes the modularity and robustness as valuable qualitative modeling attributes, and examines the role of robust intelligent modeling as a tool for improving policy-decisions related to land

  14. E-Learning Agents

    Science.gov (United States)

    Gregg, Dawn G.

    2007-01-01

    Purpose: The purpose of this paper is to illustrate the advantages of using intelligent agents to facilitate the location and customization of appropriate e-learning resources and to foster collaboration in e-learning environments. Design/methodology/approach: This paper proposes an e-learning environment that can be used to provide customized…

  15. Grounding the meanings in sensorimotor behavior using reinforcement learning

    Directory of Open Access Journals (Sweden)

    Igor Farkaš

    2012-02-01

    The recent outburst of interest in cognitive developmental robotics is fueled by the ambition to propose ecologically plausible mechanisms of how, among other things, a learning agent/robot could ground linguistic meanings in its sensorimotor behaviour. Along this stream, we propose a model that allows the simulated iCub robot to learn the meanings of actions (point, touch and push) oriented towards objects in the robot's peripersonal space. In our experiments, the iCub learns to execute motor actions and comment on them. Architecturally, the model is composed of three neural-network-based modules that are trained in different ways. The first module, a two-layer perceptron, is trained by back-propagation to attend to the target position in the visual scene, given the low-level visual information and the feature-based target information. The second module, having the form of an actor-critic architecture, is the most distinguishing part of our model, and is trained by a continuous version of reinforcement learning to execute actions as sequences, based on a linguistic command. The third module, an echo-state network, is trained to provide the linguistic description of the executed actions. The trained model generalises well in case of novel action-target combinations with randomised initial arm positions. It can also promptly adapt its behavior if the action/target suddenly changes during motor execution.

  16. Mansion, A Distributed Multi-Agent System

    NARCIS (Netherlands)

    van 't Noordende, G.; Brazier, F.M.; Tanenbaum, A.S.

    2001-01-01

    In this position summary we present work in progress on a worldwide, scalable multi-agent system, based on a paradigm of hyperlinked rooms. The framework offers facilities for managing distribution, security and mobility aspects for both active elements (agents) and passive elements (objects) in the

  17. Systems control with generalized probabilistic fuzzy-reinforcement learning

    NARCIS (Netherlands)

    Hinojosa, J.; Nefti, S.; Kaymak, U.

    2011-01-01

    Reinforcement learning (RL) is a valuable learning method when the systems require a selection of control actions whose consequences emerge over long periods for which input-output data are not available. In most combinations of fuzzy systems and RL, the environment is considered to be

  18. TACtic- A Multi Behavioral Agent for Trading Agent Competition

    Science.gov (United States)

    Khosravi, Hassan; Shiri, Mohammad E.; Khosravi, Hamid; Iranmanesh, Ehsan; Davoodi, Alireza

    Software agents are increasingly being used to represent humans in online auctions. Such agents have the advantages of being able to systematically monitor a wide variety of auctions and then make rapid decisions about what bids to place in what auctions. They can do this continuously and repetitively without losing concentration. To provide a means of evaluating and comparing (benchmarking) research methods in this area, the trading agent competition (TAC) was established. This paper describes the design of TACtic. Our agent uses multi behavioral techniques at the heart of its decision making to make bidding decisions in the face of uncertainty, to make predictions about the likely outcomes of auctions, and to alter the agent's bidding strategy in response to the prevailing market conditions.

  19. Multi-Agent Information Classification Using Dynamic Acquaintance Lists.

    Science.gov (United States)

    Mukhopadhyay, Snehasis; Peng, Shengquan; Raje, Rajeev; Palakal, Mathew; Mostafa, Javed

    2003-01-01

    Discussion of automated information services focuses on information classification and collaborative agents, i.e. intelligent computer programs. Highlights include multi-agent systems; distributed artificial intelligence; thesauri; document representation and classification; agent modeling; acquaintances, or remote agents discovered through…

  20. Biomorphic Multi-Agent Architecture for Persistent Computing

    Science.gov (United States)

    Lodding, Kenneth N.; Brewster, Paul

    2009-01-01

    A multi-agent software/hardware architecture, inspired by the multicellular nature of living organisms, has been proposed as the basis of design of a robust, reliable, persistent computing system. Just as a multicellular organism can adapt to changing environmental conditions and can survive despite the failure of individual cells, a multi-agent computing system, as envisioned, could adapt to changing hardware, software, and environmental conditions. In particular, the computing system could continue to function (perhaps at a reduced but still reasonable level of performance) if one or more component(s) of the system were to fail. One of the defining characteristics of a multicellular organism is unity of purpose. In biology, the purpose is survival of the organism. The purpose of the proposed multi-agent architecture is to provide a persistent computing environment in harsh conditions in which repair is difficult or impossible. A multi-agent, organism-like computing system would be a single entity built from agents or cells. Each agent or cell would be a discrete hardware processing unit that would include a data processor with local memory, an internal clock, and a suite of communication equipment capable of both local line-of-sight communications and global broadcast communications. Some cells, denoted specialist cells, could contain such additional hardware as sensors and emitters. Each cell would be independent in the sense that there would be no global clock, no global (shared) memory, no pre-assigned cell identifiers, no pre-defined network topology, and no centralized brain or control structure. Like each cell in a living organism, each agent or cell of the computing system would contain a full description of the system encoded as genes, but in this case, the genes would be components of a software genome.

  1. Specification of Behavioural Requirements within Compositional Multi-Agent System Design

    OpenAIRE

    Herlea, D.E.; Jonker, C.M.; Treur, J.; Wijngaards, N.J.E.

    1999-01-01

    In this paper it is shown how informal and formal specification of behavioural requirements and scenarios for agents and multi-agent systems can be integrated within multi-agent system design. In particular, it is addressed how a compositional

  2. Smart: sistemas multi-agente robótico

    Directory of Open Access Journals (Sweden)

    JOVANI ALBERTO JIMÉNEZ BUILES

    2008-01-01

    This article aims to give an overview of Multi-Agent Robotic Systems (MARS) by explaining the areas related to the topic, and then presents the Robotic Multi-Agent System (SMART). SMART is an intelligent swarm made up of a mother robot and three beacon-type robots (guides) that collaboratively navigate a structured scenario.

  3. The role of family planning communications--an agent of reinforcement or change.

    Science.gov (United States)

    Chen, E C

    1981-12-01

    Results are presented of a multiple classification analysis of responses to a 1972 KAP survey in Taiwan of 2013 married women aged 18-34 designed to determine whether family planning communication is primarily a reinforcement agent or a change agent. 2 types of independent variables, social demographic variables including age, number of children, residence, education, employment status, and duration of marriage; and social climate variables including ever receiving family planning information from mass media and ever discussing family planning with others, were used. KAP levels, the dependent variables, were measured by 2 variables each: awareness of effective methods and awareness of government supply of contraceptives for knowledge, wish for additional children and approve of 2-child family for attitude, and never use contraception and neither want children nor use contraception for practice. Social demographic and attitudinal variables were found to be the critical ones, while social climate and knowledge variables had only negligible effects on various stages of family planning adoption, indicating that family planning communications functioned primarily as a reinforcement agent. The effects of social demographic variables were prominent in all stages of contraceptive adoption. Examination of effects of individual variables on various stages of family planning adoption still supported the argument that family planning communications played a reinforcement role. Family planning communications functioned well in diffusing family planning knowledge and accessibility, but social demographic variables and desire for additional children were the most decisive influences on use of contraception.

  4. Self-healing in single and multiple fiber(s) reinforced polymer composites

    Directory of Open Access Journals (Sweden)

    Woldesenbet E.

    2010-06-01

    Polymer composites have been an attractive medium for introducing the autonomic healing concept into modern-day engineering materials. To date, there has been significant research in self-healing polymeric materials including several studies specifically in fiber reinforced polymers. Even though several methods have been suggested in autonomic healing materials, the concept of repair by bleeding of enclosed functional agents has garnered wide attention by the scientific community. A self-healing fiber reinforced polymer composite has been developed. Tensile tests are carried out on specimens that are fabricated by using the following components: hollow and solid glass fibers, healing agent, catalysts, multi-walled carbon nanotubes, and a polymer resin matrix. The test results have demonstrated that single fiber polymer composites and multiple fiber reinforced polymer matrix composites with healing agents and catalysts have provided 90.7% and 76.55% restoration of the original tensile strength, respectively. Incorporation of functionalized multi-walled carbon nanotubes in the healing medium of the single fiber polymer composite has provided additional efficiency. Healing is found to be localized, allowing multiple healing in the presence of several cracks.

  5. Memory Transformation Enhances Reinforcement Learning in Dynamic Environments.

    Science.gov (United States)

    Santoro, Adam; Frankland, Paul W; Richards, Blake A

    2016-11-30

    Over the course of systems consolidation, there is a switch from a reliance on detailed episodic memories to generalized schematic memories. This switch is sometimes referred to as "memory transformation." Here we demonstrate a previously unappreciated benefit of memory transformation, namely, its ability to enhance reinforcement learning in a dynamic environment. We developed a neural network that is trained to find rewards in a foraging task where reward locations are continuously changing. The network can use memories for specific locations (episodic memories) and statistical patterns of locations (schematic memories) to guide its search. We find that switching from an episodic to a schematic strategy over time leads to enhanced performance due to the tendency for the reward location to be highly correlated with itself in the short-term, but regress to a stable distribution in the long-term. We also show that the statistics of the environment determine the optimal utilization of both types of memory. Our work recasts the theoretical question of why memory transformation occurs, shifting the focus from the avoidance of memory interference toward the enhancement of reinforcement learning across multiple timescales. As time passes, memories transform from a highly detailed state to a more gist-like state, in a process called "memory transformation." Theories of memory transformation speak to its advantages in terms of reducing memory interference, increasing memory robustness, and building models of the environment. However, the role of memory transformation from the perspective of an agent that continuously acts and receives reward in its environment is not well explored. In this work, we demonstrate a view of memory transformation that defines it as a way of optimizing behavior across multiple timescales. Copyright © 2016 the authors 0270-6474/16/3612228-15$15.00/0.

  6. Adolescent-specific patterns of behavior and neural activity during social reinforcement learning.

    Science.gov (United States)

    Jones, Rebecca M; Somerville, Leah H; Li, Jian; Ruberry, Erika J; Powers, Alisa; Mehta, Natasha; Dyke, Jonathan; Casey, B J

    2014-06-01

    Humans are sophisticated social beings. Social cues from others are exceptionally salient, particularly during adolescence. Understanding how adolescents interpret and learn from variable social signals can provide insight into the observed shift in social sensitivity during this period. The present study tested 120 participants between the ages of 8 and 25 years on a social reinforcement learning task where the probability of receiving positive social feedback was parametrically manipulated. Seventy-eight of these participants completed the task during fMRI scanning. Modeling trial-by-trial learning, children and adults showed higher positive learning rates than did adolescents, suggesting that adolescents demonstrated less differentiation in their reaction times for peers who provided more positive feedback. Forming expectations about receiving positive social reinforcement correlated with neural activity within the medial prefrontal cortex and ventral striatum across age. Adolescents, unlike children and adults, showed greater insular activity during positive prediction error learning and increased activity in the supplementary motor cortex and the putamen when receiving positive social feedback regardless of the expected outcome, suggesting that peer approval may motivate adolescents toward action. While different amounts of positive social reinforcement enhanced learning in children and adults, all positive social reinforcement equally motivated adolescents. Together, these findings indicate that sensitivity to peer approval during adolescence goes beyond simple reinforcement theory accounts and suggest possible explanations for how peers may motivate adolescent behavior.

  7. Time representation in reinforcement learning models of the basal ganglia

    Directory of Open Access Journals (Sweden)

    Samuel Joseph Gershman

    2014-01-01

    Reinforcement learning models have been influential in understanding many aspects of basal ganglia function, from reward prediction to action selection. Time plays an important role in these models, but there is still no theoretical consensus about what kind of time representation is used by the basal ganglia. We review several theoretical accounts and their supporting evidence. We then discuss the relationship between reinforcement learning models and the timing mechanisms that have been attributed to the basal ganglia. We hypothesize that a single computational system may underlie both reinforcement learning and interval timing, the perception of duration in the range of seconds to hours. This hypothesis, which extends earlier models by incorporating a time-sensitive action selection mechanism, may have important implications for understanding disorders like Parkinson's disease in which both decision making and timing are impaired.

  8. Agent and multi-Agent systems in distributed systems digital economy and e-commerce

    CERN Document Server

    Hartung, Ronald

    2013-01-01

    Information and communication technology, in particular artificial intelligence, can be used to support economy and commerce using digital means. This book is about agents and multi-agent distributed systems applied to digital economy and e-commerce to meet, improve, and overcome challenges in the digital economy and e-commerce sphere. Agent and multi-agent solutions are applied in implementing real-life, exciting developments associated with the need to eliminate problems of distributed systems.   The book presents solutions for both technology and applications, illustrating the possible uses of agents in the enterprise domain, covering design and analytic methods, needed to provide a solid foundation required for practical systems. More specifically, the book provides solutions for the digital economy, e-sourcing clusters in network economy, and knowledge exchange between agents applicable to online trading agents, and security solutions to both digital economy and e-commerce. Furthermore, it offers soluti...

  9. Safe Exploration of State and Action Spaces in Reinforcement Learning

    OpenAIRE

    Garcia, Javier; Fernandez, Fernando

    2014-01-01

    In this paper, we consider the important problem of safe exploration in reinforcement learning. While reinforcement learning is well-suited to domains with complex transition dynamics and high-dimensional state-action spaces, an additional challenge is posed by the need for safe and efficient exploration. Traditional exploration techniques are not particularly useful for solving dangerous tasks, where the trial and error process may lead to the selection of actions whose execution in some sta...

  10. Multi-View Multi-Instance Learning Based on Joint Sparse Representation and Multi-View Dictionary Learning.

    Science.gov (United States)

    Li, Bing; Yuan, Chunfeng; Xiong, Weihua; Hu, Weiming; Peng, Houwen; Ding, Xinmiao; Maybank, Steve

    2017-12-01

    In multi-instance learning (MIL), the relations among instances in a bag convey important contextual information in many applications. Previous studies on MIL either ignore such relations or simply model them with a fixed graph structure so that the overall performance inevitably degrades in complex environments. To address this problem, this paper proposes a novel multi-view multi-instance learning algorithm (M²IL) that combines multiple context structures in a bag into a unified framework. The novel aspects are: (i) we propose a sparse ε-graph model that can generate different graphs with different parameters to represent various context relations in a bag, (ii) we propose a multi-view joint sparse representation that integrates these graphs into a unified framework for bag classification, and (iii) we propose a multi-view dictionary learning algorithm to obtain a multi-view graph dictionary that considers cues from all views simultaneously to improve the discrimination of the M²IL. Experiments and analyses in many practical applications prove the effectiveness of the M²IL.

  11. A meta-ontological framework for multi-agent systems design

    OpenAIRE

    Sokolova, Marina; Fernández Caballero, Antonio

    2007-01-01

    The paper introduces an approach to using a meta-ontology framework for complex multi-agent systems design, and illustrates it in an application related to ecological-medical issues. The described shared ontology is pooled from private sub-ontologies, which represent a problem area ontology, an agent ontology, a task ontology, an ontology of interactions, and the multi-agent system architecture ontology.

  12. 11th International Conference on Practical Applications of Agents and Multi-Agent Systems

    CERN Document Server

    Hermoso, Ramon; Moreno, María; Rodríguez, Juan; Hirsch, Benjamin; Mathieu, Philippe; Campbell, Andrew; Suarez-Figueroa, Mari; Ortega, Alfonso; Adam, Emmanuel; Navarro, Elena

    2013-01-01

    Research on Agents and Multi-agent Systems has matured during the last decade and many effective applications of this technology are now deployed. PAAMS provides an international forum to present and discuss the latest scientific developments and their effective applications, to assess the impact of the approach, and to facilitate technology transfer. PAAMS started as a local initiative, but has since grown to become the international yearly platform to present, to discuss, and to disseminate the latest developments and the most important outcomes related to real-world applications. It provides a unique opportunity to bring multi-disciplinary experts, academics and practitioners together to exchange their experience in the development and deployment of Agents and Multi-agent systems. PAAMS intends to bring together researchers and developers from industry and the academic world to report on the latest scientific and technical advances on the application of multi-agent systems, to discuss and debate the major iss...

  13. Tackling Error Propagation through Reinforcement Learning: A Case of Greedy Dependency Parsing

    NARCIS (Netherlands)

    Le, M.N.; Fokkens, A.S.

    Error propagation is a common problem in NLP. Reinforcement learning explores erroneous states during training and can therefore be more robust when mistakes are made early in a process. In this paper, we apply reinforcement learning to greedy dependency parsing which is known to suffer from error

  14. The Computational Development of Reinforcement Learning during Adolescence.

    Directory of Open Access Journals (Sweden)

    Stefano Palminteri

    2016-06-01

    Adolescence is a period of life characterised by changes in learning and decision-making. Learning and decision-making do not rely on a unitary system, but instead require the coordination of different cognitive processes that can be mathematically formalised as dissociable computational modules. Here, we aimed to trace the developmental time-course of the computational modules responsible for learning from reward or punishment, and learning from counterfactual feedback. Adolescents and adults carried out a novel reinforcement learning paradigm in which participants learned the association between cues and probabilistic outcomes, where the outcomes differed in valence (reward versus punishment) and feedback was either partial or complete (either the outcome of the chosen option only, or the outcomes of both the chosen and unchosen option, were displayed). Computational strategies changed during development: whereas adolescents' behaviour was better explained by a basic reinforcement learning algorithm, adults' behaviour integrated increasingly complex computational features, namely a counterfactual learning module (enabling enhanced performance in the presence of complete feedback) and a value contextualisation module (enabling symmetrical reward and punishment learning). Unlike adults, adolescent performance did not benefit from counterfactual (complete) feedback. In addition, while adults learned symmetrically from both reward and punishment, adolescents learned from reward but were less likely to learn from punishment. This tendency to rely on rewards and not to consider alternative consequences of actions might contribute to our understanding of decision-making in adolescence.
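
    The difference between partial and complete feedback described in the abstract above can be illustrated with a minimal sketch. It assumes a simple delta-rule value update, which the abstract does not spell out; the function name `update_values`, the option labels, and the learning-rate values are hypothetical and are not taken from the paper.

```python
def update_values(q, chosen, unchosen, r_chosen, r_unchosen=None,
                  alpha_factual=0.3, alpha_counterfactual=0.3):
    """Return a copy of the value table `q` after one trial.

    Assumed delta rule: value += learning_rate * (outcome - value).
    With partial feedback only the chosen option is updated; with
    complete feedback (r_unchosen given) the unchosen option is
    updated as well, which is the counterfactual learning step.
    """
    q = dict(q)  # leave the caller's table untouched
    q[chosen] += alpha_factual * (r_chosen - q[chosen])
    if r_unchosen is not None:
        q[unchosen] += alpha_counterfactual * (r_unchosen - q[unchosen])
    return q


q0 = {"A": 0.0, "B": 0.0}
# Partial feedback: only the outcome of the chosen option is shown.
partial = update_values(q0, "A", "B", r_chosen=1.0)
# Complete feedback: the outcome of the unchosen option is shown too.
complete = update_values(q0, "A", "B", r_chosen=1.0, r_unchosen=-1.0)
```

    Setting `alpha_counterfactual` to zero reduces the sketch to a basic learner that ignores the unchosen option's outcome.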

  15. Agents in E-learning

    Directory of Open Access Journals (Sweden)

    S. Mencke

    2007-12-01

    This paper presents a framework to describe the crossover domain of e-learning and agent technology. Furthermore it is used to classify existing work and possible starting points for the future development of agent techniques and technologies in order to enhance the performance and the effectiveness of several aspects of e-learning systems. Agents are not a new concept but their use in the field of e-learning constitutes a basis for consequential advances.

  16. A multiplicative reinforcement learning model capturing learning dynamics and interindividual variability in mice

    OpenAIRE

    Bathellier, Brice; Tee, Sui Poh; Hrovat, Christina; Rumpel, Simon

    2013-01-01

    Learning speed can strongly differ across individuals. This is seen in humans and animals. Here, we measured learning speed in mice performing a discrimination task and developed a theoretical model based on the reinforcement learning framework to account for differences between individual mice. We found that, when using a multiplicative learning rule, the starting connectivity values of the model strongly determine the shape of learning curves. This is in contrast to current learning models ...

  17. Analysis of Foreign Exchange Interventions by Intervention Agent with an Artificial Market Approach

    Science.gov (United States)

    Matsui, Hiroki; Tojo, Satoshi

    We propose a multi-agent system which learns intervention policies and evaluates the effect of interventions in an artificial foreign exchange market. Izumi et al. had presented a system called AGEDASI TOF to simulate an artificial market, together with a support system for the government to decide foreign exchange policies. However, the system needed to fix the amount of governmental intervention prior to the simulation, and was not realistic. In addition, the interventions in the system did not affect supply and demand of currencies; thus we could not discuss the effect of intervention correctly. First, we improve the system so as to make much of the weights of influential factors. Thereafter, we introduce an intervention agent that has the role of the central bank to stabilize the market. We could show that the agent learned effective intervention policies through reinforcement learning, and that the exchange rate converged to a certain extent in the expected range. We could also estimate the amount of intervention, showing the efficacy of signaling. In this model, in order to investigate the aliasing of the perception of the intervention agent, we introduced a pseudo-agent who was supposed to be able to observe all the behaviors of dealer agents; with this super-agent, we discussed the adequate granularity for a market state description.

  18. A Multi-Agent Architecture for an Intelligent Website in Insurance

    NARCIS (Netherlands)

    Jonker, C.M.; Lam, R.A.; Treur, J.

    1999-01-01

    In this paper a multi-agent architecture for intelligent Websites is presented and applied in insurance. The architecture has been designed and implemented using the compositional development method for multi-agent systems DESIRE. The agents within this architecture are based on a generic broker

  19. A Multi-Agent Traffic Control Model Based on Distributed System

    Directory of Open Access Journals (Sweden)

    Qian WU

    2014-06-01

    With the development of urbanization, urban travel has become a thorny and pressing problem. Some previous research on large urban traffic systems easily turns into NP-complete (NPC) problems. We propose a multi-agent inductive control model based on the distributed approach. To describe the real traffic scene, this model designs four different types of intelligent agents, i.e. we regard each lane, route, intersection and traffic region as different types of intelligent agents. Each agent can obtain real-time traffic data from its neighbor agents, and decision-making agents establish real-time traffic signal plans through the communication between local agents and their neighbor agents. To evaluate the traffic system, this paper takes the average delay, the stopped time and the average speed as performance parameters. Finally, the distributed multi-agent system is simulated on the VISSIM simulation platform; the simulation results show that the multi-agent system is more effective than the adaptive control system in solving traffic congestion.

  20. A Comparison of Organization-Centered and Agent-Centered Multi-Agent Systems

    DEFF Research Database (Denmark)

    Jensen, Andreas Schmidt; Villadsen, Jørgen

    2013-01-01

    Whereas most classical multi-agent systems have the agent in center, there has recently been a development towards focusing more on the organization of the system, thereby allowing the designer to focus on what the system goals are, without considering how the goals should be fulfilled. We have d...

  1. Reinforcement Learning Explains Conditional Cooperation and Its Moody Cousin.

    Directory of Open Access Journals (Sweden)

    Takahiro Ezaki

    2016-07-01

    Direct reciprocity, or repeated interaction, is a main mechanism to sustain cooperation under social dilemmas involving two individuals. For larger groups and networks, which are probably more relevant to understanding and engineering our society, experiments employing repeated multiplayer social dilemma games have suggested that humans often show conditional cooperation behavior and its moody variant. Mechanisms underlying these behaviors largely remain unclear. Here we provide a proximate account for this behavior by showing that individuals adopting a type of reinforcement learning, called aspiration learning, phenomenologically behave as conditional cooperators. By definition, individuals are satisfied if and only if the obtained payoff is larger than a fixed aspiration level. They reinforce actions that have resulted in satisfactory outcomes and anti-reinforce those yielding unsatisfactory outcomes. The results obtained in the present study are general in that they explain extant experimental results obtained for both so-called moody and non-moody conditional cooperation, prisoner's dilemma and public goods games, and well-mixed groups and networks. Different from the previous theory, individuals are assumed to have no access to information about what other individuals are doing such that they cannot explicitly use conditional cooperation rules. In this sense, myopic aspiration learning in which the unconditional propensity of cooperation is modulated in every discrete time step explains conditional behavior of humans. Aspiration learners showing (moody) conditional cooperation obeyed a noisy GRIM-like strategy. This is different from the Pavlov, a reinforcement learning strategy promoting mutual cooperation in two-player situations.
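
    The reinforce/anti-reinforce rule in the abstract above can be sketched as follows. The abstract does not give the update equations, so this assumes a Bush-Mosteller-style rule acting on a single cooperation probability; the function name, the `tanh` stimulus, and the parameter values are illustrative assumptions only.

```python
import math


def aspiration_update(p_coop, action, payoff, aspiration=1.0, sensitivity=1.0):
    """One assumed Bush-Mosteller-style step on the cooperation probability.

    action is "C" (cooperate) or "D" (defect). The stimulus is positive
    when the payoff exceeds the aspiration level (satisfied) and negative
    otherwise (unsatisfied).
    """
    stimulus = math.tanh(sensitivity * (payoff - aspiration))  # in (-1, 1)
    if action == "C":
        if stimulus >= 0:
            return p_coop + (1.0 - p_coop) * stimulus  # reinforce cooperation
        return p_coop + p_coop * stimulus              # anti-reinforce cooperation
    if stimulus >= 0:
        return p_coop - p_coop * stimulus              # reinforce defection
    return p_coop - (1.0 - p_coop) * stimulus          # anti-reinforce defection


# Satisfied cooperator becomes more cooperative; unsatisfied one less so.
after_good_c = aspiration_update(0.5, "C", payoff=2.0)
after_bad_c = aspiration_update(0.5, "C", payoff=0.0)
```

    Note that the update uses only the learner's own action and payoff, in line with the abstract's statement that individuals have no access to what others are doing.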

  2. Analysis of Bullying in Cooperative Multi-agent Systems’ Communications

    Directory of Open Access Journals (Sweden)

    Celia Gutiérrez

    2013-12-01

    Cooperative Multi-agent Systems frameworks do not include modules to test communications yet. The proposed framework incorporates robust analysis tools using IDKAnalysis2.0 to evaluate bullying effect in communications. The present work is based on ICARO-T. This platform follows the Adaptive Multi-agent Systems paradigm. Experimentation with ICARO-T includes two deployments: the equitative and the authoritative. Results confirm the usefulness of the analysis tools when exporting to Cooperative Multi-agent Systems that use different configurations. Besides, ICARO-T is provided with new functionality by a set of tools for communication analysis.

  3. Multi-agent search for source localization in a turbulent medium

    International Nuclear Information System (INIS)

    Hajieghrary, Hadi; Hsieh, M. Ani; Schwartz, Ira B.

    2016-01-01

    We extend the gradient-less search strategy referred to as “infotaxis” to a distributed multi-agent system. “Infotaxis” is a search strategy that uses sporadic sensor measurements to determine the source location of materials dispersed in a turbulent medium. In this work, we leverage the spatio-temporal sensing capabilities of mobile sensing agents to optimize the time spent finding and localizing the position of the source using a multi-agent collaborative search strategy. Our results suggest that the proposed multi-agent collaborative search strategy leverages the team's ability to obtain simultaneous measurements at different locations to speed up the search process. We present a multi-agent collaborative “infotaxis” strategy that uses the relative entropy of the system to synthesize a suitable search strategy for the team. The result is a collaborative information theoretic search strategy that results in control actions that maximize the information gained by the team, and improves estimates of the source position. - Highlights: • We extend the gradient-less infotaxis search strategy to a distributed multi-agent system. • Leveraging the spatio-temporal sensing capabilities of a team of mobile sensing agents speeds up the search process. • The resulting information theoretic search strategy maximizes the information gained and improves the estimate of the source position.
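
    The idea of choosing actions by information gained, as described in the abstract above, can be illustrated with a small sketch. It is a simplification under stated assumptions: a discrete belief over four candidate source cells, a single Bayesian update from one detection, and Shannon entropy reduction as the information measure (the paper itself refers to relative entropy across a team). The likelihood values are hypothetical.

```python
import math


def entropy(belief):
    """Shannon entropy (in nats) of a discrete belief over source cells."""
    return -sum(p * math.log(p) for p in belief if p > 0)


def bayes_update(belief, likelihood):
    """Posterior over source cells after one measurement.

    likelihood[i] is the assumed probability of the observed
    measurement if the source were located in cell i.
    """
    unnormalized = [p * l for p, l in zip(belief, likelihood)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


prior = [0.25, 0.25, 0.25, 0.25]        # nothing known about the source yet
likelihood = [0.9, 0.3, 0.1, 0.1]       # hypothetical P(detection | source in cell i)
posterior = bayes_update(prior, likelihood)
information_gain = entropy(prior) - entropy(posterior)
```

    An infotaxis-style searcher would compare the expected value of such a gain across candidate moves and pick the move with the largest expected reduction in uncertainty.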

  4. Deep reinforcement learning for automated radiation adaptation in lung cancer.

    Science.gov (United States)

    Tseng, Huan-Hsin; Luo, Yi; Cui, Sunan; Chien, Jen-Tzung; Ten Haken, Randall K; Naqa, Issam El

    2017-12-01

    escalation/de-escalation between 1.5 and 3.8 Gy, a range similar to that used in the clinical protocol. The same DQN yielded two patterns of dose escalation for the 34 test patients, but with different reward variants. First, using the baseline P+ reward function, individual adaptive fraction doses of the DQN had similar tendencies to the clinical data with an RMSE = 0.76 Gy; but adaptations suggested by the DQN were generally lower in magnitude (less aggressive). Second, by adjusting the P+ reward function with higher emphasis on mitigating local failure, better matching of doses between the DQN and the clinical protocol was achieved with an RMSE = 0.5 Gy. Moreover, the decisions selected by the DQN seemed to have better concordance with patients' eventual outcomes. In comparison, the traditional temporal difference (TD) algorithm for reinforcement learning yielded an RMSE = 3.3 Gy due to numerical instabilities and lack of sufficient learning. We demonstrated that automated dose adaptation by DRL is a feasible and promising approach for achieving similar results to those chosen by clinicians. The process may require customization of the reward function if individual cases were to be considered. However, development of this framework into a fully credible autonomous system for clinical decision support would require further validation on larger multi-institutional datasets. © 2017 American Association of Physicists in Medicine.

  5. Modeling Multi-Mobile Agents System Based on Coalition Signature Mechanism Using UML

    Institute of Scientific and Technical Information of China (English)

    SUN Zhixin; HUANG Haiping; WANG Ruchuan

    2004-01-01

    With the development of electronic commerce and agent techniques, multi-mobile agents cooperation can not only improve the efficiency of electronic business trade, but more importantly, it has a comprehensive applicative value in solving the security issues of mobile agent systems. This paper first describes the mechanism of multi-mobile agents coalition signature aimed at system security. Subsequently it brings forward a basic architecture of the Multi-mobile agents system (MMAS) based on the design pattern of multi-mobile agents. The paper uses UML diagrams, such as the use case diagram, class diagram and sequence diagram, to build the detailed model of the coalition signature and multi-mobile agents cooperation results. Through security analysis, we find that multi-mobile agents cooperation and interaction can solve some security problems of mobile agents in transfer, and can also improve the efficiency of business trade. These results indicate that MMAS has a high security performance and can be widely used in E-commerce trade.

  6. A reward optimization method based on action subrewards in hierarchical reinforcement learning.

    Science.gov (United States)

    Fu, Yuchen; Liu, Quan; Ling, Xionghong; Cui, Zhiming

    2014-01-01

    Reinforcement learning (RL) is a kind of interactive learning method. Its main characteristics are "trial and error" and "related reward." A hierarchical reinforcement learning method based on action subrewards is proposed to address two problems: the "curse of dimensionality," which means that the state space grows exponentially with the number of features, and low convergence speed. The method can greatly reduce the state space and choose actions with favorable purpose and efficiency so as to optimize the reward function and enhance convergence speed. The method is applied to online learning in the game Tetris, and the experimental results show that convergence speed is clearly improved by the new method, which combines a hierarchical reinforcement learning algorithm with action subrewards. The "curse of dimensionality" problem is also solved to a certain extent by the hierarchical method. Performance under different parameters is compared and analyzed as well.

  7. Joint Extraction of Entities and Relations Using Reinforcement Learning and Deep Learning

    Directory of Open Access Journals (Sweden)

    Yuntian Feng

    2017-01-01

    We use both reinforcement learning and deep learning to simultaneously extract entities and relations from unstructured texts. For reinforcement learning, we model the task as a two-step decision process. Deep learning is used to automatically capture the most important information from unstructured texts, which represent the state in the decision process. By designing the reward function per step, our proposed method can pass the information of entity extraction to relation extraction and obtain feedback in order to extract entities and relations simultaneously. Firstly, we use bidirectional LSTM to model the context information, which realizes preliminary entity extraction. On the basis of the extraction results, an attention-based method can represent the sentences that include the target entity pair to generate the initial state in the decision process. Then we use Tree-LSTM to represent relation mentions to generate the transition state in the decision process. Finally, we employ the Q-learning algorithm to get the control policy π in the two-step decision process. Experiments on ACE2005 demonstrate that our method attains better performance than the state-of-the-art method and gets a 2.4% increase in recall-score.
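
    The Q-learning step in a two-step decision process, as mentioned in the abstract above, can be sketched in tabular form. This is a simplification: the paper builds states from LSTM representations, whereas the sketch uses plain string labels. The state names, action names, rewards, and hyperparameters are hypothetical placeholders.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, next_actions,
                      alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update; returns the new Q(state, action).

    An empty `next_actions` list marks a terminal transition, in which
    case the bootstrap term is zero.
    """
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)
# Step 2 (relation decision) is terminal and yields the reward.
q_learning_update(q, "s_relation", "label_rel", 1.0, None, [])
# Step 1 (entity decision) bootstraps from the value of step 2.
q_learning_update(q, "s_entity", "accept", 0.0,
                  "s_relation", ["label_rel", "no_rel"])
```

    The second call shows how reward obtained at the relation step flows back to the entity step through the bootstrap term.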

  8. Joint Extraction of Entities and Relations Using Reinforcement Learning and Deep Learning.

    Science.gov (United States)

    Feng, Yuntian; Zhang, Hongjun; Hao, Wenning; Chen, Gang

    2017-01-01

    We use both reinforcement learning and deep learning to simultaneously extract entities and relations from unstructured texts. For reinforcement learning, we model the task as a two-step decision process. Deep learning is used to automatically capture the most important information from unstructured texts, which represent the state in the decision process. By designing the reward function per step, our proposed method can pass the information of entity extraction to relation extraction and obtain feedback in order to extract entities and relations simultaneously. Firstly, we use bidirectional LSTM to model the context information, which realizes preliminary entity extraction. On the basis of the extraction results, an attention-based method can represent the sentences that include the target entity pair to generate the initial state in the decision process. Then we use Tree-LSTM to represent relation mentions to generate the transition state in the decision process. Finally, we employ the Q-learning algorithm to get the control policy π in the two-step decision process. Experiments on ACE2005 demonstrate that our method attains better performance than the state-of-the-art method and gets a 2.4% increase in recall-score.

  9. Pleasurable music affects reinforcement learning according to the listener

    Science.gov (United States)

    Gold, Benjamin P.; Frank, Michael J.; Bogert, Brigitte; Brattico, Elvira

    2013-01-01

    Mounting evidence links the enjoyment of music to brain areas implicated in emotion and the dopaminergic reward system. In particular, dopamine release in the ventral striatum seems to play a major role in the rewarding aspect of music listening. Striatal dopamine also influences reinforcement learning, such that subjects with greater dopamine efficacy learn better to approach rewards while those with lesser dopamine efficacy learn better to avoid punishments. In this study, we explored the practical implications of musical pleasure through its ability to facilitate reinforcement learning via non-pharmacological dopamine elicitation. Subjects from a wide variety of musical backgrounds chose a pleasurable and a neutral piece of music from an experimenter-compiled database, and then listened to one or both of these pieces (according to pseudo-random group assignment) as they performed a reinforcement learning task dependent on dopamine transmission. We assessed musical backgrounds as well as typical listening patterns with the new Helsinki Inventory of Music and Affective Behaviors (HIMAB), and separately investigated behavior for the training and test phases of the learning task. Subjects with more musical experience trained better with neutral music and tested better with pleasurable music, while those with less musical experience exhibited the opposite effect. HIMAB results regarding listening behaviors and subjective music ratings indicate that these effects arose from different listening styles: namely, more affective listening in non-musicians and more analytical listening in musicians. In conclusion, musical pleasure was able to influence task performance, and the shape of this effect depended on group and individual factors. These findings have implications in affective neuroscience, neuroaesthetics, learning, and music therapy. PMID:23970875

  10. Normative multi-agent programs and their logics

    NARCIS (Netherlands)

    Dastani, M.; Grossi, D.; Meyer, J.-J.C.; Tinnemeier, N.

    2009-01-01

    Multi-agent systems are viewed as consisting of individual agents whose behaviors are regulated by an organization artefact. This paper presents a simplified version of a programming language that is designed to implement norm-based artefacts. Such artefacts are specified in terms of norms being

  11. Adaptive hierarchical multi-agent organizations

    NARCIS (Netherlands)

    Ghijsen, M.; Jansweijer, W.N.H.; Wielinga, B.J.; Babuška, R.; Groen, F.C.A.

    2010-01-01

    In this chapter, we discuss the design of adaptive hierarchical organizations for multi-agent systems (MAS). Hierarchical organizations have a number of advantages such as their ability to handle complex problems and their scalability to large organizations. By introducing adaptivity in the

  12. Modeling of a production system using the multi-agent approach

    Science.gov (United States)

    Gwiazda, A.; Sękala, A.; Banaś, W.

    2017-08-01

    Multi-agent simulation is a method that allows the analysis of complex systems. Multi-agent simulation (agent-based modeling and simulation, ABMS) models complex systems consisting of independent agents. In a model of a production system, agents may be manufactured pieces, set apart from other types of agents such as machine tools, conveyors, or replacement stands. Magazines and buffers are also agents. More generally, the agents in the model can be single individuals, but collective entities can also be defined as agents. Hierarchical structures are allowed, meaning that a single agent can belong to a certain class. Depending on the needs, an agent may also be a natural or physical resource. From a technical point of view, an agent is a bundle of data and rules describing its behavior in different situations. Agents can be autonomous or non-autonomous in making the decision about the types of classes of agents, class sizes and types of connections between elements of the system. Multi-agent modeling is a very flexible technique for creating models in a convention that can be adapted to any research problem analyzed from different points of view. One of the major problems associated with the organization of production is the spatial organization of the production process. A second is optimal scheduling. A multi-purpose approach can be used for this. In this regard, the model of the production process will refer to the design and scheduling of production space for four different elements. The program was developed in the NetLogo environment. Elements of artificial intelligence were also used. The main agent represents the manufactured pieces that, according to previously assumed rules, generate the technological route and allow the schedule of that line to be preprinted. Machine lines, reorientation stands, conveyors and transport devices also represent the

  13. Discrete-time online learning control for a class of unknown nonaffine nonlinear systems using reinforcement learning.

    Science.gov (United States)

    Yang, Xiong; Liu, Derong; Wang, Ding; Wei, Qinglai

    2014-07-01

    In this paper, a reinforcement-learning-based direct adaptive control is developed to deliver a desired tracking performance for a class of discrete-time (DT) nonlinear systems with unknown bounded disturbances. We investigate multi-input-multi-output unknown nonaffine nonlinear DT systems and employ two neural networks (NNs). By using Implicit Function Theorem, an action NN is used to generate the control signal and it is also designed to cancel the nonlinearity of unknown DT systems, for purpose of utilizing feedback linearization methods. On the other hand, a critic NN is applied to estimate the cost function, which satisfies the recursive equations derived from heuristic dynamic programming. The weights of both the action NN and the critic NN are directly updated online instead of offline training. By utilizing Lyapunov's direct method, the closed-loop tracking errors and the NN estimated weights are demonstrated to be uniformly ultimately bounded. Two numerical examples are provided to show the effectiveness of the present approach. Copyright © 2014 Elsevier Ltd. All rights reserved.

  14. Multi-agent: a technique to implement geo-visualization of networked virtual reality

    Science.gov (United States)

    Lin, Zhiyong; Li, Wenjing; Meng, Lingkui

    2007-06-01

    Networked Virtual Reality (NVR) is a system based on network connections and shared spatial information, whose demands cannot be fully met by the existing architectures and application patterns of VR. In this paper, we propose a new architecture of NVR based on a Multi-Agent framework, which includes the detailed definition of various agents and their functions and a full description of the collaboration mechanism. Through a prototype system test with DEM data and 3D model data, the advantages of the Multi-Agent based Networked Virtual Reality system in terms of data loading time, user response time, scene construction time, etc. are verified. First, we introduce the characteristics of Networked Virtual Reality and of the Multi-Agent technique in Section 1. Then we give the architecture design of Networked Virtual Reality based on Multi-Agent in Section 2. Section 2 includes the rule of task division, the multi-agent architecture design to implement Networked Virtual Reality, and the functions of the agents. Section 3 shows the prototype implementation according to the design. Finally, Section 4 discusses the benefits of using Multi-Agent to implement geovisualization of Networked Virtual Reality.

  15. Artificial agents learning human fairness

    NARCIS (Netherlands)

    Jong, de S.; Tuyls, K.P.; Verbeeck, K.; Padgham, xx; Parkes, xx

    2008-01-01

    Recent advances in technology allow multi-agent systems to be deployed in cooperation with or as a service for humans. Typically, those systems are designed assuming individually rational agents, according to the principles of classical game theory. However, research in the field of behavioral

  16. Using Spatial Reinforcement Learning to Build Forest Wildfire Dynamics Models From Satellite Images

    Directory of Open Access Journals (Sweden)

    Sriram Ganapathi Subramanian

    2018-04-01

    Machine learning algorithms have increased tremendously in power in recent years but have yet to be fully utilized in many ecology and sustainable resource management domains such as wildlife reserve design, forest fire management, and invasive species spread. One thing these domains have in common is that they contain dynamics that can be characterized as a spatially spreading process (SSP), which requires many parameters to be set precisely to model the dynamics, spread rates, and directional biases of the elements which are spreading. We present related work in artificial intelligence and machine learning for SSP sustainability domains including forest wildfire prediction. We then introduce a novel approach for learning in SSP domains using reinforcement learning (RL), where fire is the agent at any cell in the landscape and the set of actions the fire can take from a location at any point in time includes spreading north, south, east, or west or not spreading. This approach inverts the usual RL setup since the dynamics of the corresponding Markov Decision Process (MDP) is a known function for immediate wildfire spread. Meanwhile, we learn an agent policy for a predictive model of the dynamics of a complex spatial process. Rewards are provided for correctly classifying which cells are on fire or not compared with satellite and other related data. We examine the behavior of five RL algorithms on this problem: value iteration, policy iteration, Q-learning, Monte Carlo Tree Search, and Asynchronous Advantage Actor-Critic (A3C). We compare to a Gaussian process-based supervised learning approach and also discuss the relation of our approach to manually constructed, state-of-the-art methods from forest wildfire modeling. We validate our approach with satellite image data of two massive wildfire events in Northern Alberta, Canada; the Fort McMurray fire of 2016 and the Richardson fire of 2011. The results show that we can learn predictive, agent
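
    The fire-as-agent formulation in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the names `ACTIONS`, `spread` and `reward`, the grid orientation, and the +1/-1 per-cell reward are not taken from the paper, which only states that rewards are given for correctly classifying burning cells.

```python
# Hypothetical names throughout; grid rows grow southward (row 0 is the north edge).
ACTIONS = {
    "north": (-1, 0),
    "south": (1, 0),
    "east": (0, 1),
    "west": (0, -1),
    "stay": (0, 0),  # the "not spreading" action
}

def spread(burning, cell, action, shape):
    """Return the set of burning cells after the fire at `cell` takes `action`."""
    d_row, d_col = ACTIONS[action]
    row, col = cell[0] + d_row, cell[1] + d_col
    updated = set(burning)
    # Spreading off the edge of the landscape has no effect.
    if 0 <= row < shape[0] and 0 <= col < shape[1]:
        updated.add((row, col))
    return updated

def reward(predicted, observed, shape):
    """Assumed reward: +1 for each cell classified correctly, -1 otherwise."""
    total = 0
    for row in range(shape[0]):
        for col in range(shape[1]):
            cell = (row, col)
            total += 1 if (cell in predicted) == (cell in observed) else -1
    return total
```

    In this sketch the observed set stands in for the satellite-derived burn map, so a policy that reproduces the observed spread collects the maximum reward for the grid.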

  17. Implementing a Multi-Agent System in Python

    DEFF Research Database (Denmark)

    Ettienne, Mikko Berggren; Vester, Steen; Villadsen, Jørgen

    2012-01-01

    We describe the solution used by the Python-DTU team in the Multi-Agent Programming Contest 2011, where the scenario was called Agents on Mars. We present our auction-based agreement, area controlling and pathfinding algorithms and discuss our chosen strategy and our choice of technology used...

  18. Fault-Tolerant Consensus of Multi-Agent System With Distributed Adaptive Protocol.

    Science.gov (United States)

    Chen, Shun; Ho, Daniel W C; Li, Lulu; Liu, Ming

    2015-10-01

    In this paper, fault-tolerant consensus in multi-agent system using distributed adaptive protocol is investigated. Firstly, distributed adaptive online updating strategies for some parameters are proposed based on local information of the network structure. Then, under the online updating parameters, a distributed adaptive protocol is developed to compensate the fault effects and the uncertainty effects in the leaderless multi-agent system. Based on the local state information of neighboring agents, a distributed updating protocol gain is developed which leads to a fully distributed continuous adaptive fault-tolerant consensus protocol design for the leaderless multi-agent system. Furthermore, a distributed fault-tolerant leader-follower consensus protocol for multi-agent system is constructed by the proposed adaptive method. Finally, a simulation example is given to illustrate the effectiveness of the theoretical analysis.
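
    For readers unfamiliar with leaderless consensus, the sketch below shows the standard fixed-gain discrete-time consensus update that adaptive protocols of this kind build on. It is not the fault-tolerant adaptive protocol of the paper: the function names, the constant `step_size`, and the absence of faults and uncertainties are simplifying assumptions.

```python
def consensus_step(states, neighbors, step_size):
    """One synchronous update: each agent moves toward its neighbors' states."""
    return [
        x_i + step_size * sum(states[j] - x_i for j in neighbors[i])
        for i, x_i in enumerate(states)
    ]

def run_consensus(states, neighbors, step_size=0.3, rounds=200):
    """Iterate the update; step_size should stay below 1 / (maximum degree)."""
    for _ in range(rounds):
        states = consensus_step(states, neighbors, step_size)
    return states

# Path graph 0 - 1 - 2: each agent only uses local neighbor information.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
final_states = run_consensus([0.0, 2.0, 7.0], neighbors)
```

    On an undirected connected graph this update preserves the average of the initial states, so the three agents above settle near 3.0. The paper replaces the fixed gain with gains that are updated online from local information.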

  19. Multi-Agent Pathfinding with n Agents on Graphs with n Vertices

    DEFF Research Database (Denmark)

    Förster, Klaus-Tycho; Groner, Linus; Hoefler, Torsten

    2017-01-01

    We investigate the multi-agent pathfinding (MAPF) problem with $n$ agents on graphs with $n$ vertices: Each agent has a unique start and goal vertex, with the objective of moving all agents in parallel movements to their goal such that each vertex and each edge may only be used by one agent at a time....... We give a combinatorial classification of all graphs where this problem is solvable in general, including cases where the solvability depends on the initial agent placement. Furthermore, we present an algorithm solving the MAPF problem in our setting, requiring O(n²) rounds, or O(n³) moves...... of individual agents. Complementing these results, we show that there are graphs where Omega(n²) rounds and Omega(n³) moves are required for any algorithm....
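
    The movement constraint in this abstract can be made concrete with a small checker for a single round of parallel moves. This is a sketch under one reading of the constraint (no two agents end on the same vertex, and no edge is traversed by more than one agent in a round); the function name and the data layout are hypothetical.

```python
def is_valid_round(edges, before, after):
    """Check one round of parallel moves.

    edges:  set of frozenset({u, v}) undirected edges
    before: dict agent -> vertex occupied before the round
    after:  dict agent -> vertex occupied after the round
    """
    used_edges = set()
    for agent, u in before.items():
        v = after[agent]
        if u == v:
            continue  # the agent waits in place
        edge = frozenset((u, v))
        if edge not in edges:
            return False  # agents may only move along edges of the graph
        if edge in used_edges:
            return False  # an edge carries at most one agent per round
        used_edges.add(edge)
    # Each vertex holds at most one agent once the round is complete.
    return len(set(after.values())) == len(after)

triangle = {frozenset(pair) for pair in [("a", "b"), ("b", "c"), ("c", "a")]}
start = {1: "a", 2: "b", 3: "c"}
rotation = {1: "b", 2: "c", 3: "a"}  # all agents rotate around the cycle
swap = {1: "b", 2: "a", 3: "c"}      # agents 1 and 2 exchange places
```

    With $n$ agents on $n$ vertices there is no free vertex, so under this reading progress has to come from rotations along cycles: the rotation above passes the check, while the swap is rejected because both agents would use the same edge.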

  20. Reinforcement function design and bias for efficient learning in mobile robots

    International Nuclear Information System (INIS)

    Touzet, C.; Santos, J.M.

    1998-01-01

    The main paradigm in sub-symbolic learning robot domain is the reinforcement learning method. Various techniques have been developed to deal with the memorization/generalization problem, demonstrating the superior ability of artificial neural network implementations. In this paper, the authors address the issue of designing the reinforcement so as to optimize the exploration part of the learning. They also present and summarize works relative to the use of bias intended to achieve the effective synthesis of the desired behavior. Demonstrative experiments involving a self-organizing map implementation of the Q-learning and real mobile robots (Nomad 200 and Khepera) in a task of obstacle avoidance behavior synthesis are described. 3 figs., 5 tabs

  1. Towards autonomous neuroprosthetic control using Hebbian reinforcement learning.

    Science.gov (United States)

    Mahmoudi, Babak; Pohlmeyer, Eric A; Prins, Noeline W; Geng, Shijia; Sanchez, Justin C

    2013-12-01

    Our goal was to design an adaptive neuroprosthetic controller that could learn the mapping from neural states to prosthetic actions and automatically adjust adaptation using only a binary evaluative feedback as a measure of desirability/undesirability of performance. Hebbian reinforcement learning (HRL) in a connectionist network was used for the design of the adaptive controller. The method combines the efficiency of supervised learning with the generality of reinforcement learning. The convergence properties of this approach were studied using both closed-loop control simulations and open-loop simulations that used primate neural data from robot-assisted reaching tasks. The HRL controller was able to perform classification and regression tasks using its episodic and sequential learning modes, respectively. In our experiments, the HRL controller quickly achieved convergence to an effective control policy, followed by robust performance. The controller also automatically stopped adapting the parameters after converging to a satisfactory control policy. Additionally, when the input neural vector was reorganized, the controller resumed adaptation to maintain performance. By estimating an evaluative feedback directly from the user, the HRL control algorithm may provide an efficient method for autonomous adaptation of neuroprosthetic systems. This method may enable the user to teach the controller the desired behavior using only a simple feedback signal.

  2. A Multi-Agent Environment for Negotiation

    Science.gov (United States)

    Hindriks, Koen V.; Jonker, Catholijn M.; Tykhonov, Dmytro

    In this chapter we introduce the System for Analysis of Multi-Issue Negotiation (SAMIN). SAMIN offers a negotiation environment that supports and facilitates the configuration of various negotiation setups. The environment has been designed to analyse negotiation processes between human negotiators, between human and software agents, and between software agents. It offers a range of different agents, different domains, and other options useful to define a negotiation setup. The environment has been used to test and evaluate a range of negotiation strategies in various domains playing against other negotiating agents as well as humans. We discuss some of the results obtained by means of these experiments.

  3. Neurofeedback in Learning Disabled Children: Visual versus Auditory Reinforcement.

    Science.gov (United States)

    Fernández, Thalía; Bosch-Bayard, Jorge; Harmony, Thalía; Caballero, María I; Díaz-Comas, Lourdes; Galán, Lídice; Ricardo-Garcell, Josefina; Aubert, Eduardo; Otero-Ojeda, Gloria

    2016-03-01

    Children with learning disabilities (LD) frequently have an EEG characterized by an excess of theta and a deficit of alpha activities. NFB using an auditory stimulus as reinforcer has proven to be a useful tool to treat LD children by positively reinforcing decreases of the theta/alpha ratio. The aim of the present study was to optimize the NFB procedure by comparing the efficacy of visual (with eyes open) versus auditory (with eyes closed) reinforcers. Twenty LD children with an abnormally high theta/alpha ratio were randomly assigned to the Auditory or the Visual group, where a 500 Hz tone or a visual stimulus (a white square), respectively, was used as a positive reinforcer when the value of the theta/alpha ratio was reduced. Both groups had signs consistent with EEG maturation, but only the Auditory Group showed behavioral/cognitive improvements. In conclusion, the auditory reinforcer was more efficacious in reducing the theta/alpha ratio, and it improved the cognitive abilities more than the visual reinforcer.

  4. Reinforcement learning for optimal control of low exergy buildings

    International Nuclear Information System (INIS)

    Yang, Lei; Nagy, Zoltan; Goffin, Philippe; Schlueter, Arno

    2015-01-01

    Highlights: • Implementation of reinforcement learning control for LowEx Building systems. • Learning allows adaptation to local environment without prior knowledge. • Presentation of reinforcement learning control for real-life applications. • Discussion of the applicability for real-life situations. - Abstract: Over a third of the anthropogenic greenhouse gas (GHG) emissions stem from cooling and heating buildings, due to their fossil fuel based operation. Low exergy building systems are a promising approach to reduce energy consumption as well as GHG emissions. They consist of renewable energy technologies, such as PV, PV/T and heat pumps. Since careful tuning of parameters is required, a manual setup may result in sub-optimal operation. A model predictive control approach is unnecessarily complex due to the required model identification. Therefore, in this work we present a reinforcement learning control (RLC) approach. The studied building consists of a PV/T array for solar heat and electricity generation, as well as geothermal heat pumps. We present RLC for the PV/T array, and the full building model. Two methods, Tabular Q-learning and Batch Q-learning with Memory Replay, are implemented with real building settings and actual weather conditions in a Matlab/Simulink framework. The performance is evaluated against standard rule-based control (RBC). We investigated different neural network structures and found that some outperformed RBC already during the learning phase. Overall, every RLC strategy for PV/T outperformed RBC by over 10% after the third year. Likewise, for the full building, RLC outperforms RBC in terms of meeting the heating demand, maintaining the optimal operation temperature and compensating more effectively for ground heat. This makes it possible to reduce the engineering costs associated with the setup of these systems, as well as to decrease the return-on-investment period, both of which are necessary to create a sustainable, zero-emission building
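
    Tabular Q-learning, one of the two methods named in this abstract, can be sketched on a toy problem. The chain environment, reward, and hyperparameters below are illustrative assumptions and have nothing to do with the building model, which the authors implemented in Matlab/Simulink.

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning on a toy chain: action 0 moves left, action 1 moves right.

    Reaching the last state ends the episode with reward 1; all other
    transitions give reward 0. Behaviour is uniformly random, which is
    allowed because Q-learning is an off-policy method.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            action = rng.randrange(2)
            next_state = max(state - 1, 0) if action == 0 else state + 1
            if next_state == goal:
                target = 1.0  # terminal transition: no bootstrap term
            else:
                target = gamma * max(q[next_state])
            # Move the estimate a fraction alpha toward the one-step target.
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q_table = q_learning()
greedy_policy = [row.index(max(row)) for row in q_table[:-1]]
```

    The table-based update needs no model of the environment, which is the property the abstract contrasts with model predictive control. In this toy chain the greedy policy read from the table moves right in every non-terminal state.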

  5. Quantum Speedup for Active Learning Agents

    Directory of Open Access Journals (Sweden)

    Giuseppe Davide Paparo

    2014-07-01

    Can quantum mechanics help us build intelligent learning agents? A defining signature of intelligent behavior is the capacity to learn from experience. However, a major bottleneck for agents to learn in real-life situations is the size and complexity of the corresponding task environment. Even in a moderately realistic environment, it may simply take too long to rationally respond to a given situation. If the environment is impatient, allowing only a certain time for a response, an agent may then be unable to cope with the situation and to learn at all. Here, we show that quantum physics can help and provide a quadratic speedup for active learning as a genuine problem of artificial intelligence. This result will be particularly relevant for applications involving complex task environments.

  6. Modelling complex systems of heterogeneous agents to better design sustainability transitions policy

    NARCIS (Netherlands)

    Mercure, J.F.A.; Pollitt, H.; Bassi, A.M.; Viñuales, J.E.; Edwards, N.R.

    2016-01-01

    This article proposes a fundamental methodological shift in the modelling of policy interventions for sustainability transitions in order to account for complexity (e.g. self-reinforcing mechanisms, such as technology lock-ins, arising from multi-agent interactions) and agent heterogeneity (e.g.

  7. Intranasal oxytocin enhances socially-reinforced learning in rhesus monkeys

    Directory of Open Access Journals (Sweden)

    Lisa A Parr

    2014-09-01

    There are currently no drugs approved for the treatment of social deficits associated with autism spectrum disorders (ASD). One hypothesis for these deficits is that individuals with ASD lack the motivation to attend to social cues because those cues are not implicitly rewarding. Therefore, any drug that could enhance the rewarding quality of social stimuli could have a profound impact on the treatment of ASD, and other social disorders. Oxytocin (OT) is a neuropeptide that has been effective in enhancing social cognition and social reward in humans. The present study examined the ability of OT to selectively enhance learning after social compared to nonsocial reward in rhesus monkeys, an important species for modeling the neurobiology of social behavior in humans. Monkeys were required to learn an implicit visual matching task after receiving either intranasal (IN) OT or Placebo (saline). Correct trials were rewarded with the presentation of positive and negative social (play faces/threat faces) or nonsocial (banana/cage locks) stimuli, plus food. Incorrect trials were not rewarded. Results demonstrated a strong effect of socially-reinforced learning: monkeys performed significantly better when reinforced with social versus nonsocial stimuli. Additionally, socially-reinforced learning was significantly better and occurred faster after IN-OT compared to placebo treatment. Performance in the IN-OT, but not Placebo, condition was also significantly better when the reinforcement stimuli were emotionally positive compared to negative facial expressions. These data support the hypothesis that OT may function to enhance prosocial behavior in primates by increasing the rewarding quality of emotionally positive, social compared to emotionally negative or nonsocial images. These data also support the use of the rhesus monkey as a model for exploring the neurobiological basis of social behavior and its impairment.

  8. Applying reinforcement learning to the weapon assignment problem

    African Journals Online (AJOL)

    ismith

    Carlo (MC) control algorithm with exploring starts (MCES), and an off-policy ..... closest to the threat should fire (that weapon also had the highest probability to ... Monte Carlo ..... “Reinforcement learning: Theory, methods and application to.

  9. Applying reinforcement learning to the weapon assignment problem in air defence

    CSIR Research Space (South Africa)

    Mouton, H

    2011-12-01

    The techniques investigated in this article were two methods from the machine-learning subfield of reinforcement learning (RL), namely a Monte Carlo (MC) control algorithm with exploring starts (MCES), and an off-policy temporal-difference (TD) learning...

  10. Anticipatory vehicle routing using delegate multi-agent systems

    OpenAIRE

    Weyns, Danny; Holvoet, Tom; Helleboogh, Alexander

    2007-01-01

    This paper presents an agent-based approach, called delegate multi-agent systems, for anticipatory vehicle routing to avoid traffic congestion. In this approach, individual vehicles are represented by agents, which themselves issue light-weight agents that explore alternative routes in the environment on behalf of the vehicles. Based on the evaluation of the alternatives, the vehicles then issue light-weight agents for allocating road segments, spreading the vehicles’ intentions and coordi...

  11. The combination of appetitive and aversive reinforcers and the nature of their interaction during auditory learning.

    Science.gov (United States)

    Ilango, A; Wetzel, W; Scheich, H; Ohl, F W

    2010-03-31

    Learned changes in behavior can be elicited by either appetitive or aversive reinforcers. It is, however, not clear whether the two types of motivation (approaching appetitive stimuli and avoiding aversive stimuli) drive learning in the same or different ways, nor is their interaction understood in situations where the two types are combined in a single experiment. To investigate this question we have developed a novel learning paradigm for Mongolian gerbils, which not only allows rewards and punishments to be presented in isolation or in combination with each other, but also can use these opposite reinforcers to drive the same learned behavior. Specifically, we studied learning of tone-conditioned hurdle crossing in a shuttle box driven by either an appetitive reinforcer (brain stimulation reward) or an aversive reinforcer (electrical footshock), or by a combination of both. Combination of the two reinforcers potentiated speed of acquisition, led to maximum possible performance, and delayed extinction as compared to either reinforcer alone. Additional experiments, using partial reinforcement protocols and experiments in which one of the reinforcers was omitted after the animals had been previously trained with the combination of both reinforcers, indicated that appetitive and aversive reinforcers operated together but acted in different ways: in this particular experimental context, punishment appeared to be more effective for initial acquisition and reward more effective to maintain a high level of conditioned responses (CRs). The results imply that learning mechanisms in problem solving were maximally effective when the initial punishment of mistakes was combined with the subsequent rewarding of correct performance. Copyright 2010 IBRO. Published by Elsevier Ltd. All rights reserved.

  12. Adaptive Synchronization for Heterogeneous Multi-Agent Systems with Switching Topologies

    Directory of Open Access Journals (Sweden)

    Muhammad Ridho Rosa

    2018-02-01

    This work provides a multi-agent extension of output-feedback model reference adaptive control (MRAC), designed to synchronize a network of heterogeneous uncertain agents. The implementation of this scheme is based on multi-agent matching conditions. The practical advantage of the proposed MRAC is that it can handle agents with unknown dynamics using only the outputs and control inputs of their neighbors. In addition, it is reasonable to consider the case when the communication topology is time-varying. In this work, the time-varying communication leads to a switching control structure that depends on the number of predecessors of each agent. By using the switching control structure to handle the time-varying topologies, we show that synchronization can be achieved. The multi-agent adaptive switching controller is first analyzed, and numerical simulations based on formation control of simplified quadcopter dynamics are provided.

  13. Argumentation and Multi-Agent Decision Making

    OpenAIRE

    Parsons, S.; Jennings, N. R.

    1998-01-01

    This paper summarises our on-going work on mixed- initiative decision making which extends both classical decision theory and a symbolic theory of decision making based on argumentation to a multi-agent domain.

  14. Multi-Agent Software Engineering

    International Nuclear Information System (INIS)

    Mohamed, A.H.

    2014-01-01

    This paper proposes a map-based, multi-agent alarm-monitoring system for people. The system monitors the user's physical context using their mobile phone. The agents on the mobile phones are responsible for collecting, processing and sending data to the server. They determine the parameters of their environment using sensors. On the server side, a set of agents stores these data and checks the preconditions of the restrictions associated with the user in order to trigger the appropriate alarms. These alarms are sent not only to the user, who is alerted so as to avoid the restriction that has arisen, but also to his supervisor. The proposed system is a general-purpose alarm system that can be used in different critical application areas. It has been applied to monitoring the workers at radiation sites, so that these workers can carry out their tasks in radiation environments safely.

  15. Reimplementing a Multi-Agent System in Python

    DEFF Research Database (Denmark)

    Villadsen, Jørgen; Jensen, Andreas Schmidt; Ettienne, Mikko Berggren

    2012-01-01

    We provide a brief description of our Python-DTU system, including the overall design, the tools and the algorithms that we used in the Multi-Agent Programming Contest 2012, where the scenario was called Agents on Mars like in 2011. Our solution is an improvement of our Python-DTU system from last...

  16. Reimplementing a Multi-Agent System in Python

    DEFF Research Database (Denmark)

    Villadsen, Jørgen; Jensen, Andreas Schmidt; Ettienne, Mikko Berggren

    2013-01-01

    We provide a brief description of our Python-DTU system, including the overall design, the tools and the algorithms that we used in the Multi-Agent Programming Contest 2012, where the scenario was called Agents on Mars like in 2011. Our solution is an improvement of our Python-DTU system from last...

  17. Reinforcement and Systemic Machine Learning for Decision Making

    CERN Document Server

    Kulkarni, Parag

    2012-01-01

    There are always difficulties in making machines that learn from experience. Complete information is not always available, or it becomes available in bits and pieces over a period of time. With respect to systemic learning, there is a need to understand the impact of decisions and actions on a system over that period of time. This book takes a holistic approach to addressing that need and presents a new paradigm, creating new learning applications and, ultimately, more intelligent machines. The first book of its kind in this new an

  18. Reinforcement active learning in the vibrissae system: optimal object localization.

    Science.gov (United States)

    Gordon, Goren; Dorfman, Nimrod; Ahissar, Ehud

    2013-01-01

    Rats move their whiskers to acquire information about their environment. It has been observed that they palpate novel objects and objects they are required to localize in space. We analyze whisker-based object localization using two complementary paradigms, namely, active learning and intrinsic-reward reinforcement learning. Active learning algorithms select the next training samples according to the hypothesized solution in order to better discriminate between correct and incorrect labels. Intrinsic-reward reinforcement learning uses prediction errors as the reward to an actor-critic design, such that behavior converges to the one that optimizes the learning process. We show that in the context of object localization, the two paradigms result in palpation whisking as their respective optimal solution. These results suggest that rats may employ principles of active learning and/or intrinsic reward in tactile exploration and can guide future research to seek the underlying neuronal mechanisms that implement them. Furthermore, these paradigms are easily transferable to biomimetic whisker-based artificial sensors and can improve the active exploration of their environment. Copyright © 2012 Elsevier Ltd. All rights reserved.

  19. Cooperative control of multi-agent systems optimal and adaptive design approaches

    CERN Document Server

    Lewis, Frank L; Hengster-Movric, Kristian; Das, Abhijit

    2014-01-01

    Task complexity, communication constraints, flexibility and energy-saving concerns are all factors that may require a group of autonomous agents to work together in a cooperative manner. Applications involving such complications include mobile robots, wireless sensor networks, unmanned aerial vehicles (UAVs), spacecraft, and so on. In such networked multi-agent scenarios, the restrictions imposed by the communication graph topology can pose severe problems in the design of cooperative feedback control systems.  Cooperative control of multi-agent systems is a challenging topic for both control theorists and practitioners and has been the subject of significant recent research. Cooperative Control of Multi-Agent Systems extends optimal control and adaptive control design methods to multi-agent systems on communication graphs.  It develops Riccati design techniques for general linear dynamics for cooperative state feedback design, cooperative observer design, and cooperative dynamic output feedback design.  B...

  20. Neural correlates of reinforcement learning and social preferences in competitive bidding.

    Science.gov (United States)

    van den Bos, Wouter; Talwar, Arjun; McClure, Samuel M

    2013-01-30

    In competitive social environments, people often deviate from what rational choice theory prescribes, resulting in losses or suboptimal monetary gains. We investigate how competition affects learning and decision-making in a common value auction task. During the experiment, groups of five human participants were simultaneously scanned using MRI while playing the auction task. We first demonstrate that bidding is well characterized by reinforcement learning with biased reward representations dependent on social preferences. Indicative of reinforcement learning, we found that estimated trial-by-trial prediction errors correlated with activity in the striatum and ventromedial prefrontal cortex. Additionally, we found that individual differences in social preferences were related to activity in the temporal-parietal junction and anterior insula. Connectivity analyses suggest that monetary and social value signals are integrated in the ventromedial prefrontal cortex and striatum. Based on these results, we argue for a novel mechanistic account for the integration of reinforcement history and social preferences in competitive decision-making.

  1. Continuum deformation of multi-agent systems

    CERN Document Server

    Rastgoftar, Hossein

    2016-01-01

    This monograph presents new algorithms for formation control of multi-agent systems (MAS) based on principles of continuum mechanics. Beginning with an overview of traditional methods, the author then introduces an innovative new approach whereby agents of an MAS are considered as particles in a continuum evolving in ℝⁿ whose desired configuration is required to satisfy an admissible deformation function. The necessary theory and its validation on a mobile-agent-based swarm test bed are considered for two primary tasks: homogeneous transformation of the MAS and deployment of a random distribution of agents on a desired configuration. The framework for this model is based on homogeneous transformations for the evolution of an MAS under no inter-agent communication, local inter-agent communication, and intelligent perception by agents. Different communication protocols for MAS evolution, the robustness of tracking of a desired motion by an MAS evolving in ℝⁿ, and the effect of communication delays in an MAS...
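
    As a rough illustration of a homogeneous transformation of agent positions, the sketch below assumes the usual affine form r_i(t) = Q(t) r_i(0) + D(t), with the same matrix Q and displacement D applied to every agent. The abstract does not spell out this form, and the function name and the restriction to the plane are assumptions.

```python
def homogeneous_transform(positions, jacobian, displacement):
    """Map each initial planar position r0 to Q r0 + D.

    positions:    list of (x, y) initial agent positions
    jacobian:     2x2 matrix Q given as ((q11, q12), (q21, q22))
    displacement: rigid-body displacement D as (d1, d2)
    """
    (q11, q12), (q21, q22) = jacobian
    d1, d2 = displacement
    return [
        (q11 * x + q12 * y + d1, q21 * x + q22 * y + d2)
        for x, y in positions
    ]

# Three agents forming a triangle, stretched along x and then shifted.
formation = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
deformed = homogeneous_transform(formation, ((2.0, 0.0), (0.0, 1.0)), (1.0, 1.0))
```

    Because every agent is mapped by the same Q and D, straight lines in the initial formation remain straight lines in the deformed one, which is what lets the swarm be treated as a deforming continuum rather than as independent particles.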

  2. Multi-Agent Framework in Visual Sensor Networks

    Directory of Open Access Journals (Sweden)

    J. M. Molina

    2007-01-01

    The recent interest in the surveillance of public, military, and commercial scenarios is increasing the need to develop and deploy intelligent and/or automated distributed visual surveillance systems. Many applications based on distributed resources use the so-called software agent technology. In this paper, a multi-agent framework is applied to coordinate videocamera-based surveillance. The ability to coordinate agents improves the global image and task distribution efficiency. In our proposal, a software agent is embedded in each camera and controls the capture parameters. Then coordination is based on the exchange of high-level messages among agents. Agents use an internal symbolic model to interpret the current situation from the messages from all other agents to improve global coordination.

  3. Bi-directional effect of increasing doses of baclofen on reinforcement learning

    Directory of Open Access Journals (Sweden)

    Jean Terrier

    2011-07-01

    In rodents as well as in humans, efficient reinforcement learning depends on dopamine (DA) released from ventral tegmental area (VTA) neurons. It has been shown that in brain slices of mice, GABAB-receptor agonists at low concentrations increase the firing frequency of VTA-DA neurons, while high concentrations reduce the firing frequency. It remains however elusive whether baclofen can modulate reinforcement learning. Here, in a double blind study in 34 healthy human volunteers, we tested the effects of a low and a high concentration of oral baclofen in a gambling task associated with monetary reward. A low (20 mg) dose of baclofen increased the efficiency of reward-associated learning but had no effect on the avoidance of monetary loss. A high (50 mg) dose of baclofen on the other hand did not affect the learning curve. At the end of the task, subjects who received 20 mg baclofen p.o. were more accurate in choosing the symbol linked to the highest probability of earning money compared to the control group (89.55±1.39% vs 81.07±1.55%, p=0.002). Our results support a model where baclofen, at low concentrations, causes a disinhibition of DA neurons, increases DA levels and thus facilitates reinforcement learning.

  4. Traffic light control by multiagent reinforcement learning systems

    NARCIS (Netherlands)

    Bakker, B.; Whiteson, S.; Kester, L.; Groen, F.C.A.; Babuška, R.; Groen, F.C.A.

    2010-01-01

    Traffic light control is one of the main means of controlling road traffic. Improving traffic control is important because it can lead to higher traffic throughput and reduced traffic congestion. This chapter describes multiagent reinforcement learning techniques for automatic optimization of

  5. Traffic Light Control by Multiagent Reinforcement Learning Systems

    NARCIS (Netherlands)

    Bakker, B.; Whiteson, S.; Kester, L.J.H.M.; Groen, F.C.A.

    2010-01-01

    Traffic light control is one of the main means of controlling road traffic. Improving traffic control is important because it can lead to higher traffic throughput and reduced traffic congestion. This chapter describes multiagent reinforcement learning techniques for automatic optimization of

  6. A Robust Cooperated Control Method with Reinforcement Learning and Adaptive H∞ Control

    Science.gov (United States)

    Obayashi, Masanao; Uchiyama, Shogo; Kuremoto, Takashi; Kobayashi, Kunikazu

    This study proposes a robust cooperated control method that combines reinforcement learning with robust control. A notable characteristic of reinforcement learning is that it does not require a model formula; however, it does not guarantee the stability of the system. Robust control, on the other hand, guarantees stability and robustness but requires a model formula. We employ both the actor-critic method, a kind of reinforcement learning that needs a minimal amount of computation to control continuous-valued actions, and traditional robust control, namely H∞ control. The proposed method was compared with a conventional control method that uses the actor-critic alone, through computer simulation of controlling the angle and the position of a crane system, and the simulation results showed the effectiveness of the proposed method.

  7. Investigation Characteristics Of Pulp Fibers AS Green Potential Polymer Reinforcing Agents

    OpenAIRE

    Masruchin, Nanang; Subyakto

    2012-01-01

    Three kinds of pulp fiber (i.e. kenaf, pineapple and coconut fiber) were characterized as reinforcing agents in composite materials to be applied in the automotive interior industry. A better understanding of the characteristics of fiber will lead to enhanced interface adhesion between fiber and matrices. Furthermore, it will improve the properties of polymer significantly. Chemical, surface compositions as well as morphology of pulp fiber were investigated using TAPPI standard test method, Fourier Transf...

  8. TEXPLORE temporal difference reinforcement learning for robots and time-constrained domains

    CERN Document Server

    Hester, Todd

    2013-01-01

    This book presents and develops new reinforcement learning methods that enable fast and robust learning on robots in real-time. Robots have the potential to solve many problems in society, because of their ability to work in dangerous places doing necessary jobs that no one wants or is able to do. One barrier to their widespread deployment is that they are mainly limited to tasks where it is possible to hand-program behaviors for every situation that may be encountered. For robots to meet their potential, they need methods that enable them to learn and adapt to novel situations that they were not programmed for. Reinforcement learning (RL) is a paradigm for learning sequential decision making processes and could solve the problems of learning and adaptation on robots. This book identifies four key challenges that must be addressed for an RL algorithm to be practical for robotic control tasks. These RL for Robotics Challenges are: 1) it must learn in very few samples; 2) it must learn in domains with continuou...

  9. Perceptual learning rules based on reinforcers and attention

    NARCIS (Netherlands)

    Roelfsema, Pieter R.; van Ooyen, Arjen; Watanabe, Takeo

    2010-01-01

    How does the brain learn those visual features that are relevant for behavior? In this article, we focus on two factors that guide plasticity of visual representations. First, reinforcers cause the global release of diffusive neuromodulatory signals that gate plasticity. Second, attentional feedback

  10. Optimizing microstimulation using a reinforcement learning framework.

    Science.gov (United States)

    Brockmeier, Austin J; Choi, John S; Distasio, Marcello M; Francis, Joseph T; Príncipe, José C

    2011-01-01

    The ability to provide sensory feedback is desired to enhance the functionality of neuroprosthetics. Somatosensory feedback provides closed-loop control to the motor system, which is lacking in feedforward neuroprosthetics. In the case of existing somatosensory function, a template of the natural response can be used as a template of the desired response elicited by electrical microstimulation. In the case of no initial training data, microstimulation parameters that produce responses close to the template must be selected in an online manner. We propose using reinforcement learning as a framework to balance the exploration of the parameter space and the continued selection of promising parameters for further stimulation. This approach avoids an explicit model of the neural response from stimulation. We explore a preliminary architecture, treating the task as a k-armed bandit, using offline data recorded for natural touch and thalamic microstimulation, and we examine the method's efficiency in exploring the parameter space while concentrating on promising parameter forms. The best matching stimulation parameters, from k = 68 different forms, are selected by the reinforcement learning algorithm consistently after 334 realizations.
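
The k-armed bandit framing in this abstract can be illustrated with a minimal sketch. It assumes an epsilon-greedy selection rule and an incremental-mean value estimate, neither of which is stated in the abstract; the function name `epsilon_greedy_bandit` and the `reward_fns` interface are hypothetical. In the microstimulation setting, each arm would presumably be one stimulation parameter form and the reward some similarity between the evoked response and the template.

```python
import random

def epsilon_greedy_bandit(reward_fns, steps, epsilon=0.1, seed=0):
    """Epsilon-greedy k-armed bandit (illustrative; not the paper's algorithm).

    reward_fns: list of k callables, each taking a random.Random and
                returning a float reward for pulling that arm.
    Returns (estimates, counts) after `steps` pulls.
    """
    rng = random.Random(seed)
    k = len(reward_fns)
    estimates = [0.0] * k  # running mean reward per arm
    counts = [0] * k       # number of pulls per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: pick a random arm
        else:
            arm = max(range(k), key=lambda a: estimates[a])  # exploit best estimate
        reward = reward_fns[arm](rng)
        counts[arm] += 1
        # Incremental update of the sample mean for the pulled arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts
```

With more exploration (larger `epsilon`) the learner samples the parameter space more evenly; with less, it concentrates pulls on the arm that currently looks best, which is the trade-off the abstract describes.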

  11. Layered Learning in Multi-Agent Systems

    Science.gov (United States)

    1998-12-15

    project almost from the beginning has tirelessly experimented with different robot architectures, always managing to pull things together and create... [Figure: team member agent architecture; labels include Midfielder, Left; Goalie, Center; Home Coordinates; Home Range; Max Range]

  12. Temporal Memory Reinforcement Learning for the Autonomous Micro-mobile Robot Based-behavior

    Institute of Scientific and Technical Information of China (English)

    Yang Yujun(杨玉君); Cheng Junshi; Chen Jiapin; Li Xiaohai

    2004-01-01

    This paper presents temporal memory reinforcement learning for the behavior-based autonomous micro-mobile robot. Human beings have a process of memory oblivion: the earlier something is memorized, the earlier it is forgotten, and only repeated things are remembered firmly. Inspired by this, the robot need not memorize all past states, which also economizes the EMS memory space, which is insufficient in the MPU of our AMRobot. The proposed algorithm is an extension of Q-learning, which is an incremental reinforcement learning method. Simulation results show that the algorithm is valid.
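
For orientation, the standard tabular Q-learning update that the abstract says the algorithm extends can be sketched as follows. This shows only the textbook rule; the paper's forgetting mechanism is not described in the abstract and is not modeled here. The function name `q_update`, the dictionary-based table, and the default `alpha` and `gamma` values are assumptions for illustration.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """One standard Q-learning step (textbook rule, not the paper's extension).

    q: mapping from (state, action) to value; missing entries default to 0.0.
    Returns the updated Q(state, action).
    """
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]

# Usage: a table that starts at zero and is updated incrementally.
q_table = defaultdict(float)
q_update(q_table, "s0", "right", 1.0, "s1", ["left", "right"])
```

Because the table is updated one transition at a time, no history of past states has to be stored, which is the sense in which Q-learning is "incremental".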

  13. Learning to Run challenge solutions: Adapting reinforcement learning methods for neuromusculoskeletal environments

    OpenAIRE

    Kidziński, Łukasz; Mohanty, Sharada Prasanna; Ong, Carmichael; Huang, Zhewei; Zhou, Shuchang; Pechenko, Anton; Stelmaszczyk, Adam; Jarosik, Piotr; Pavlov, Mikhail; Kolesnikov, Sergey; Plis, Sergey; Chen, Zhibo; Zhang, Zhizheng; Chen, Jiale; Shi, Jun

    2018-01-01

    In the NIPS 2017 Learning to Run challenge, participants were tasked with building a controller for a musculoskeletal model to make it run as fast as possible through an obstacle course. Top participants were invited to describe their algorithms. In this work, we present eight solutions that used deep reinforcement learning approaches, based on algorithms such as Deep Deterministic Policy Gradient, Proximal Policy Optimization, and Trust Region Policy Optimization. Many solutions use similar ...

  14. Multi-agent cooperation rescue algorithm based on influence degree and state prediction

    Science.gov (United States)

    Zheng, Yanbin; Ma, Guangfu; Wang, Linlin; Xi, Pengxue

    2018-04-01

    Aiming at multi-agent cooperative rescue in disasters, a multi-agent cooperative rescue algorithm based on impact degree and state prediction is proposed. Firstly, based on the influence of the information in the scene on the collaborative task, the influence degree function is used to filter the information. Secondly, the selected information is used to predict the state of the system and agent behavior. Finally, according to the result of the forecast, the cooperative behavior of the agents is guided, improving the efficiency of individual collaboration. The simulation results show that this algorithm can effectively solve the multi-agent cooperative rescue problem and ensure the efficient completion of the task.

  15. Multi-Source Multi-Target Dictionary Learning for Prediction of Cognitive Decline.

    Science.gov (United States)

    Zhang, Jie; Li, Qingyang; Caselli, Richard J; Thompson, Paul M; Ye, Jieping; Wang, Yalin

    2017-06-01

    Alzheimer's Disease (AD) is the most common type of dementia. Identifying correct biomarkers may determine pre-symptomatic AD subjects and enable early intervention. Recently, Multi-task sparse feature learning has been successfully applied to many computer vision and biomedical informatics researches. It aims to improve the generalization performance by exploiting the shared features among different tasks. However, most of the existing algorithms are formulated as a supervised learning scheme. Its drawback is with either insufficient feature numbers or missing label information. To address these challenges, we formulate an unsupervised framework for multi-task sparse feature learning based on a novel dictionary learning algorithm. To solve the unsupervised learning problem, we propose a two-stage Multi-Source Multi-Target Dictionary Learning (MMDL) algorithm. In stage 1, we propose a multi-source dictionary learning method to utilize the common and individual sparse features in different time slots. In stage 2, supported by a rigorous theoretical analysis, we develop a multi-task learning method to solve the missing label problem. Empirical studies on an N = 3970 longitudinal brain image data set, which involves 2 sources and 5 targets, demonstrate the improved prediction accuracy and speed efficiency of MMDL in comparison with other state-of-the-art algorithms.

  16. Online Pedagogical Tutorial Tactics Optimization Using Genetic-Based Reinforcement Learning.

    Science.gov (United States)

    Lin, Hsuan-Ta; Lee, Po-Ming; Hsiao, Tzu-Chien

    2015-01-01

    Tutorial tactics are policies for an Intelligent Tutoring System (ITS) to decide the next action when there are multiple actions available. Recent research has demonstrated that when the learning contents were controlled so as to be the same, different tutorial tactics would make a difference in students' learning gains. However, the Reinforcement Learning (RL) techniques that were used in previous studies to induce tutorial tactics are insufficient when encountering large problems and hence were used in an offline manner. Therefore, we introduced a Genetic-Based Reinforcement Learning (GBML) approach to induce tutorial tactics in an online-learning manner without relying on any preexisting dataset. The introduced method can learn a set of rules from the environment in a manner similar to RL. It includes a genetic-based optimizer for the rule discovery task, generating new rules from the old ones. This increases the scalability of an RL learner for larger problems. The results support our hypothesis about the capability of the GBML method to induce tutorial tactics. This suggests that the GBML method should be favorable in developing real-world ITS applications in the domain of tutorial tactics induction.

  17. Multi Agent System Based Wide Area Protection against Cascading Events

    DEFF Research Database (Denmark)

    Liu, Zhou; Chen, Zhe; Liu, Leo

    2012-01-01

    In this paper, a multi-agent system based wide area protection scheme is proposed in order to prevent long term voltage instability induced cascading events. The distributed relays and controllers work as a device agent which not only executes the normal function automatically but also can...... the effectiveness of proposed protection strategy. The simulation results indicate that the proposed multi agent control system can effectively coordinate the distributed relays and controllers to prevent the long term voltage instability induced cascading events....

  18. Assessing climate impact on reinforced concrete durability with a multi-physics model

    DEFF Research Database (Denmark)

    Michel, Alexander; Flint, Madeleine M.

    to shorter-term fluctuations in boundary conditions and therefore may underestimate climate change impacts. A highly sensitive fully-coupled, validated, multi-physics model for heat, moisture and ion transport and corrosion was used to assess a reinforced concrete structure located in coastal Norfolk...

  19. Teamwork in Multi-Agent Systems A Formal Approach

    CERN Document Server

    Dunin-Keplicz, Barbara Maria

    2010-01-01

    What makes teamwork tick? Cooperation matters, in daily life and in complex applications. After all, many tasks need more than a single agent to be effectively performed. Therefore, teamwork rules! Teams are social groups of agents dedicated to the fulfilment of particular persistent tasks. In modern multiagent environments, heterogeneous teams often consist of autonomous software agents, various types of robots and human beings. Teamwork in Multi-agent Systems: A Formal Approach explains teamwork rules in terms of agents' attitudes and their complex interplay. It provides the first comprehe

  20. Preparing culture change agents for academic medicine in a multi-institutional consortium: the C - change learning action network.

    Science.gov (United States)

    Pololi, Linda H; Krupat, Edward; Schnell, Eugene R; Kern, David E

    2013-01-01

    Research suggests an ongoing need for change in the culture of academic medicine. This article describes the structure, activities and evaluation of a culture change project: the C - Change Learning Action Network (LAN) and its impact on participants. The LAN was developed to create the experience of a culture that would prepare participants to facilitate a culture in academic medicine that would be more collaborative, inclusive, relational, and that supports the humanity and vitality of faculty. Purposefully diverse faculty, leaders, and deans from 5 US medical schools convened in 2 1/2-day meetings biannually over 4 years. LAN meetings employed experiential, cognitive, and affective learning modes; innovative dialogue strategies; and reflective practice aimed at facilitating deep dialogue, relationship formation, collaboration, authenticity, and transformative learning to help members experience the desired culture. Robust aggregated qualitative and quantitative data collected from the 5 schools were used to inform and stimulate culture-change plans. Quantitative and qualitative evaluation methods were used. Participants indicated that a safe, supportive, inclusive, collaborative culture was established in LAN and highly valued. LAN members reported a deepened understanding of organizational change, new and valued interpersonal connections, increased motivation and resilience, new skills and approaches, increased self-awareness and personal growth, emotional connection to the issues of diversity and inclusion, and application of new learnings in their work. A carefully designed multi-institutional learning community can transform the way participants experience and view institutional culture. It can motivate and prepare them to be change agents in their own institutions. 
Copyright © 2013 The Alliance for Continuing Education in the Health Professions, the Society for Academic Continuing Medical Education, and the Council on CME, Association for Hospital Medical

  1. Challenges in the Verification of Reinforcement Learning Algorithms

    Science.gov (United States)

    Van Wesel, Perry; Goodloe, Alwyn E.

    2017-01-01

    Machine learning (ML) is increasingly being applied to a wide array of domains from search engines to autonomous vehicles. These algorithms, however, are notoriously complex and hard to verify. This work looks at the assumptions underlying machine learning algorithms as well as some of the challenges in trying to verify ML algorithms. Furthermore, we focus on the specific challenges of verifying reinforcement learning algorithms. These are highlighted using a specific example. Ultimately, we do not offer a solution to the complex problem of ML verification, but point out possible approaches for verification and interesting research opportunities.

  2. A Multi-Agent Framework for Coordination of Intelligent Assistive Technologies

    DEFF Research Database (Denmark)

    Valente, Pedro Ricardo da Nova; Hossain, S.; Groenbaek, B.

    2010-01-01

    Intelligent care for the future is the IntelliCare project's main priority. This paper describes the design of a generic multi-agent framework for coordination of intelligent assistive technologies. The paper overviews technologies and software systems suitable for context awareness...... and housekeeping tasks, especially for performing a multi-robot cleaning-task activity. It also describes conducted work in the design of a multi-agent platform for coordination of intelligent assistive technologies. Instead of using traditional robot odometry estimation methods, we have tested an independent...

  3. Consensus of second-order multi-agent dynamic systems with quantized data

    Energy Technology Data Exchange (ETDEWEB)

    Guan, Zhi-Hong, E-mail: zhguan@mail.hust.edu.cn [Department of Control Science and Engineering, Huazhong University of Science and Technology, Wuhan, 430074 (China); Meng, Cheng [Department of Control Science and Engineering, Huazhong University of Science and Technology, Wuhan, 430074 (China); Liao, Rui-Quan [Petroleum Engineering College,Yangtze University, Jingzhou, 420400 (China); Zhang, Ding-Xue, E-mail: zdx7773@163.com [Petroleum Engineering College,Yangtze University, Jingzhou, 420400 (China)

    2012-01-09

    The consensus problem of second-order multi-agent systems with quantized links is investigated in this Letter. Some conditions are derived for the quantized consensus of the second-order multi-agent systems by the stability theory. Moreover, a result characterizing the relationship between the eigenvalues of the Laplacian matrix and the quantized consensus is obtained. Examples are given to illustrate the theoretical analysis. -- Highlights: ► A second-order multi-agent model with quantized data is proposed. ► Two sufficient and necessary conditions are obtained. ► The relationship between the eigenvalues of the Laplacian matrix and the quantized consensus is discovered.

  4. Study for the design method of multi-agent diagnostic system to improve diagnostic performance for similar abnormality

    International Nuclear Information System (INIS)

    Minowa, Hirotsugu; Gofuku, Akio

    2014-01-01

    Accidents at industrial plants cause large human and economic losses and damage social credibility. Recently, diagnostic methods using machine learning techniques such as support vector machines have been studied in the expectation that they can detect abnormalities in a plant early and correctly. These diagnostic machines have been reported to diagnose the operating state of an industrial plant with high accuracy when a single abnormality occurs. However, each diagnostic machine in a multi-agent diagnostic system may misdiagnose similar abnormalities as the same abnormality as the number of abnormalities to diagnose increases. As a result, a single diagnostic machine may show higher diagnostic performance than a multi-agent diagnostic system, because decision-making that accounts for misdiagnosis is difficult. Therefore, we study a design method for multi-agent diagnostic systems that diagnoses similar abnormalities correctly. The method aims to generate the diagnostic system automatically, optimizing the generation process and the placement of diagnostic machines so that similar abnormalities, identified from the similarity of process signals by a statistical method, are diagnosed correctly. This paper explains our design method and reports the results of applying it to process data from the fast-breeder reactor Monju.

  5. Multi-agent based distributed control architecture for microgrid energy management and optimization

    International Nuclear Information System (INIS)

    Basir Khan, M. Reyasudin; Jidin, Razali; Pasupuleti, Jagadeesh

    2016-01-01

    Highlights: • A new multi-agent based distributed control architecture for energy management. • Multi-agent coordination based on non-cooperative game theory. • A microgrid model comprised of renewable energy generation systems. • Performance comparison of distributed with conventional centralized control. - Abstract: Most energy management systems are based on a centralized controller that is difficult to satisfy criteria such as fault tolerance and adaptability. Therefore, a new multi-agent based distributed energy management system architecture is proposed in this paper. The distributed generation system is composed of several distributed energy resources and a group of loads. A multi-agent system based decentralized control architecture was developed in order to provide control for the complex energy management of the distributed generation system. Then, non-cooperative game theory was used for the multi-agent coordination in the system. The distributed generation system was assessed by simulation under renewable resource fluctuations, seasonal load demand and grid disturbances. The simulation results show that the implementation of the new energy management system proved to provide more robust and high performance controls than conventional centralized energy management systems.

  6. Semiotics, Multi-Agent Systems and Organizations

    NARCIS (Netherlands)

    Gazendam, H.W.M.; Jorna, René J.

    1998-01-01

    Multi-agent systems are promising as models of organization because they are based on the idea that most work in human organizations is done based on intelligence, communication, cooperation, and massive parallel processing. They offer an alternative for system theories of organization, which are

  7. Social Learning, Reinforcement and Crime: Evidence from Three European Cities

    Science.gov (United States)

    Tittle, Charles R.; Antonaccio, Olena; Botchkovar, Ekaterina

    2012-01-01

    This study reports a cross-cultural test of Social Learning Theory using direct measures of social learning constructs and focusing on the causal structure implied by the theory. Overall, the results strongly confirm the main thrust of the theory. Prior criminal reinforcement and current crime-favorable definitions are highly related in all three…

  8. Modeling Avoidance in Mood and Anxiety Disorders Using Reinforcement Learning.

    Science.gov (United States)

    Mkrtchian, Anahit; Aylward, Jessica; Dayan, Peter; Roiser, Jonathan P; Robinson, Oliver J

    2017-10-01

    Serious and debilitating symptoms of anxiety are the most common mental health problem worldwide, accounting for around 5% of all adult years lived with disability in the developed world. Avoidance behavior (avoiding social situations for fear of embarrassment, for instance) is a core feature of such anxiety. However, as for many other psychiatric symptoms, the biological mechanisms underlying avoidance remain unclear. Reinforcement learning models provide formal and testable characterizations of the mechanisms of decision making; here, we examine avoidance in these terms. A total of 101 healthy participants and individuals with mood and anxiety disorders completed an approach-avoidance go/no-go task under stress induced by threat of unpredictable shock. We show an increased reliance in the mood and anxiety group on a parameter of our reinforcement learning model that characterizes a prepotent (Pavlovian) bias to withhold responding in the face of negative outcomes. This was particularly the case when the mood and anxiety group was under stress. This formal description of avoidance within the reinforcement learning framework provides a new means of linking clinical symptoms with biophysically plausible models of neural circuitry and, as such, takes us closer to a mechanistic understanding of mood and anxiety disorders. Copyright © 2017 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.

  9. Reinforcement Learning for Online Control of Evolutionary Algorithms

    NARCIS (Netherlands)

    Eiben, A.; Horvath, Mark; Kowalczyk, Wojtek; Schut, Martijn

    2007-01-01

    The research reported in this paper is concerned with assessing the usefulness of reinforcement learning (RL) for on-line calibration of parameters in evolutionary algorithms (EA). We are running an RL procedure and the EA simultaneously and the RL is changing the EA parameters on-the-fly. We

  10. A Resource Logic for Multi-Agent Plan Merging

    NARCIS (Netherlands)

    De Weerdt, M.M.; Bos, A.; Tonino, H.; Witteveen, C.

    2003-01-01

    In a multi-agent system, agents are carrying out certain tasks by executing plans. Consequently, the problem of finding a plan, given a certain goal, has been given a lot of attention in the literature. Instead of concentrating on this problem, the focus of this paper is on cooperation between

  11. Relay tracking control for second-order multi-agent systems with damaged agents.

    Science.gov (United States)

    Dong, Lijing; Li, Jing; Liu, Qin

    2017-11-01

    This paper investigates a situation where smart agents capable of sensing and mobility are deployed to monitor a designated area. A preset number of agents start tracking when a target intrudes into this area. Some of the tracking agents may fail during the course of tracking. Thus, we propose a cooperative relay tracking strategy to ensure successful tracking in the presence of damaged agents. Relay means that, when a tracking agent quits tracking due to malfunction, one of the nearby deployed agents replaces it to continue the tracking task. This results in jumps in the tracking errors and dynamic switching of the topology of the multi-agent system. A switched-system technique is employed to solve this specific problem. Finally, the effectiveness of the proposed tracking strategy and the validity of the theoretical results are verified by conducting a numerical simulation. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.

  12. Video Demo: Deep Reinforcement Learning for Coordination in Traffic Light Control

    NARCIS (Netherlands)

    van der Pol, E.; Oliehoek, F.A.; Bosse, T.; Bredeweg, B.

    2016-01-01

    This video demonstration contrasts two approaches to coordination in traffic light control using reinforcement learning: earlier work, based on a deconstruction of the state space into a linear combination of vehicle states, and our own approach based on the Deep Q-learning algorithm.

  13. Organization of the secure distributed computing based on multi-agent system

    Science.gov (United States)

    Khovanskov, Sergey; Rumyantsev, Konstantin; Khovanskova, Vera

    2018-04-01

    Developing methods for distributed computing currently receives much attention. One such method is the use of multi-agent systems. Distributed computing organized on conventional networked computers can be exposed to security threats from computational processes. The authors have developed a unified agent algorithm for a system that controls the operation of computing network nodes, with network PCs used as the computing nodes. The proposed multi-agent control system makes it possible to quickly harness the processing power of the computers in any existing network to solve large tasks by creating a distributed computing system. Agents deployed on a computer network can configure a distributed computing system, distribute the computational load among the computers operated by agents, and optimize the distributed computing system according to the computing power of the computers on the network. The number of computers connected to the network can be increased by connecting new computers to the system, which increases the overall processing power. Adding a central agent to the multi-agent system increases the security of distributed computing. This organization of the distributed computing system reduces problem-solving time and increases the fault tolerance (vitality) of computing processes in a changing computing environment (dynamic change in the number of computers on the network). The developed multi-agent system detects cases of falsification of the results of a distributed system, which may lead to wrong decisions. In addition, the system checks and corrects wrong results.

  14. Simulation-based optimization parametric optimization techniques and reinforcement learning

    CERN Document Server

    Gosavi, Abhijit

    2003-01-01

    Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning introduces the evolving area of simulation-based optimization. The book's objective is two-fold: (1) It examines the mathematical governing principles of simulation-based optimization, thereby providing the reader with the ability to model relevant real-life problems using these techniques. (2) It outlines the computational technology underlying these methods. Taken together these two aspects demonstrate that the mathematical and computational methods discussed in this book do work. Broadly speaking, the book has two parts: (1) parametric (static) optimization and (2) control (dynamic) optimization. Some of the book's special features are: *An accessible introduction to reinforcement learning and parametric-optimization techniques. *A step-by-step description of several algorithms of simulation-based optimization. *A clear and simple introduction to the methodology of neural networks. *A gentle introduction to converg...

  15. Perception-based Co-evolutionary Reinforcement Learning for UAV Sensor Allocation

    National Research Council Canada - National Science Library

    Berenji, Hamid

    2003-01-01

    .... A Perception-based reasoning approach based on co-evolutionary reinforcement learning was developed for jointly addressing sensor allocation on each individual UAV and allocation of a team of UAVs...

  16. A Reinforcement-Based Learning Paradigm Increases Anatomical Learning and Retention-A Neuroeducation Study.

    Science.gov (United States)

    Anderson, Sarah J; Hecker, Kent G; Krigolson, Olave E; Jamniczky, Heather A

    2018-01-01

    In anatomy education, a key hurdle to engaging in higher-level discussion in the classroom is recognizing and understanding the extensive terminology used to identify and describe anatomical structures. Given the time-limited classroom environment, seeking methods to impart this foundational knowledge to students in an efficient manner is essential. Just-in-Time Teaching (JiTT) methods incorporate pre-class exercises (typically online) meant to establish foundational knowledge in novice learners so subsequent instructor-led sessions can focus on deeper, more complex concepts. Determining how best to design and assess pre-class exercises requires a detailed examination of learning and retention in an applied educational context. Here we used electroencephalography (EEG) as a quantitative dependent variable to track learning and examine the efficacy of JiTT activities to teach anatomy. Specifically, we examined changes in the amplitude of the N250 and reward positivity event-related brain potential (ERP) components alongside behavioral performance as novice students participated in a series of computerized reinforcement-based learning modules to teach neuroanatomical structures. We found that as students learned to identify anatomical structures, the amplitude of the N250 increased and reward positivity amplitude decreased in response to positive feedback. On both a retention and a transfer exercise, when learners successfully remembered and translated their knowledge to novel images, the amplitude of the reward positivity remained decreased compared to early learning. Our findings suggest ERPs can be used as a tool to track learning, retention, and transfer of knowledge and that employing the reinforcement learning paradigm is an effective educational approach for developing anatomical expertise.

  17. Nondestructive Intervention to Multi-Agent Systems through an Intelligent Agent

    Science.gov (United States)

    Han, Jing; Wang, Lin

    2013-01-01

    For a given multi-agent system where the local interaction rule of the existing agents cannot be redesigned, one way to intervene in the collective behavior of the system is to add one or a few special agents into the group which are still treated as normal agents by the existing ones. We study how to lead a Vicsek-like flocking model to reach synchronization by adding special agents. A popular method is to add some simple leaders (fixed-heading agents). However, we add one intelligent agent, called ‘shill’, which uses online feedback information of the group to decide the shill's moving direction at each step. A novel strategy for the shill to coordinate the group is proposed. It is rigorously proved that a shill with this strategy and a limited speed can synchronize every agent in the group. The computer simulations show the effectiveness of this strategy in different scenarios, including different group sizes, shill speeds, and with or without noise. Compared to the method of adding some fixed-heading leaders, our method can guarantee synchronization for any initial configuration in the deterministic scenario and improve the synchronization level significantly in low-density groups or in models with noise. This suggests the advantage and power of feedback information in the intervention of collective behavior. PMID:23658695
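
    The flocking model named in this abstract can be illustrated with a minimal sketch. It shows only a noise-free, Vicsek-style heading update and an order parameter for measuring synchronization; the shill's feedback strategy is not described in the record and is not reproduced here. The function names, the open (non-periodic) plane, and the parameters `radius` and `speed` are assumptions made for illustration.

```python
import math

def vicsek_step(positions, headings, radius, speed):
    """One noise-free Vicsek-style update (illustrative assumption):
    each agent adopts the mean heading of all agents within `radius`
    (itself included), then moves forward by `speed`."""
    new_headings = []
    for (xi, yi) in positions:
        sx = sy = 0.0
        for (xj, yj), theta in zip(positions, headings):
            if math.hypot(xi - xj, yi - yj) <= radius:
                sx += math.cos(theta)
                sy += math.sin(theta)
        new_headings.append(math.atan2(sy, sx))
    new_positions = [
        (x + speed * math.cos(t), y + speed * math.sin(t))
        for (x, y), t in zip(positions, new_headings)
    ]
    return new_positions, new_headings

def order_parameter(headings):
    """Length of the mean heading vector; 1.0 means all headings agree."""
    sx = sum(math.cos(t) for t in headings)
    sy = sum(math.sin(t) for t in headings)
    return math.hypot(sx, sy) / len(headings)
```

    An added agent (a fixed-heading leader or a shill) would simply be one more entry in `positions` and `headings` whose heading is set by its own rule rather than by the neighborhood average.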

  18. Nondestructive intervention to multi-agent systems through an intelligent agent.

    Directory of Open Access Journals (Sweden)

    Jing Han

    Full Text Available For a given multi-agent system where the local interaction rule of the existing agents cannot be redesigned, one way to intervene in the collective behavior of the system is to add one or a few special agents into the group which are still treated as normal agents by the existing ones. We study how to lead a Vicsek-like flocking model to reach synchronization by adding special agents. A popular method is to add some simple leaders (fixed-heading agents). However, we add one intelligent agent, called 'shill', which uses online feedback information of the group to decide the shill's moving direction at each step. A novel strategy for the shill to coordinate the group is proposed. It is rigorously proved that a shill with this strategy and a limited speed can synchronize every agent in the group. The computer simulations show the effectiveness of this strategy in different scenarios, including different group sizes, shill speeds, and with or without noise. Compared to the method of adding some fixed-heading leaders, our method can guarantee synchronization for any initial configuration in the deterministic scenario and improve the synchronization level significantly in low-density groups or in models with noise. This suggests the advantage and power of feedback information in the intervention of collective behavior.

  19. Personalized E-learning System Based on Intelligent Agent

    Science.gov (United States)

    Duo, Sun; Ying, Zhou Cai

    Lack of personalized learning is the key shortcoming of traditional e-learning systems. This paper analyzes personal characteristics in e-learning activity. To support personalized e-learning, a personalized e-learning system based on an intelligent agent is proposed and implemented. The paper introduces the system structure, the work process, and the design and realization of the intelligent agent. After trial use of the system by a network school, we found that the system could improve learners' active participation and provide learners with a personalized knowledge service. Thus, we think it may be a practical solution for self-learning and self-promotion in the age of lifelong education.

  20. Preparing Students for Future Learning with Teachable Agents

    Science.gov (United States)

    Chin, Doris B.; Dohmen, Ilsa M.; Cheng, Britte H.; Oppezzo, Marily A.; Chase, Catherine C.; Schwartz, Daniel L.

    2010-01-01

    One valuable goal of instructional technologies in K-12 education is to prepare students for future learning. Two classroom studies examined whether Teachable Agents (TA) achieves this goal. TA is an instructional technology that draws on the social metaphor of teaching a computer agent to help students learn. Students teach their agent by…

  1. Reinforcement learning on slow features of high-dimensional input streams.

    Directory of Open Access Journals (Sweden)

    Robert Legenstein

    Full Text Available Humans and animals are able to learn complex behaviors based on a massive stream of sensory information from different modalities. Early animal studies have identified learning mechanisms that are based on reward and punishment such that animals tend to avoid actions that lead to punishment whereas rewarded actions are reinforced. However, most algorithms for reward-based learning are only applicable if the dimensionality of the state-space is sufficiently small or its structure is sufficiently simple. Therefore, the question arises how the problem of learning on high-dimensional data is solved in the brain. In this article, we propose a biologically plausible generic two-stage learning system that can directly be applied to raw high-dimensional input streams. The system is composed of a hierarchical slow feature analysis (SFA) network for preprocessing and a simple neural network on top that is trained based on rewards. We demonstrate by computer simulations that this generic architecture is able to learn quite demanding reinforcement learning tasks on high-dimensional visual input streams in a time that is comparable to the time needed when an explicit highly informative low-dimensional state-space representation is given instead of the high-dimensional visual input. The learning speed of the proposed architecture in a task similar to the Morris water maze task is comparable to that found in experimental studies with rats. This study thus supports the hypothesis that slowness learning is one important unsupervised learning principle utilized in the brain to form efficient state representations for behavioral learning.

  2. Agent Based Fuzzy T-S Multi-Model System and Its Applications

    Directory of Open Access Journals (Sweden)

    Xiaopeng Zhao

    2015-11-01

    Full Text Available Based on the basic concepts of agent and fuzzy T-S model, an agent based fuzzy T-S multi-model (ABFT-SMM) system is proposed in this paper. Different from the traditional method, the parameters and the membership value of the agent can be adjusted along with the process. In this system, each agent can be described as a dynamic equation, which can be seen as the local part of the multi-model, and it can execute the task alone or collaborate with other agents to accomplish a fixed goal. It is proved in this paper that the agent based fuzzy T-S multi-model system can approximate any linear or nonlinear system at arbitrary accuracy. The applications to the benchmark problem of chaotic time series prediction, a water heater system and a waste heat utilizing process illustrate the viability and the efficiency of the mentioned approach. At the same time, the method can be easily applied to a number of engineering fields, including identification, nonlinear control, fault diagnostics and performance analysis.
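
    The idea of combining local models through membership values can be sketched as follows. This is a generic Takagi-Sugeno style blend, not the ABFT-SMM system itself: the Gaussian membership functions, the scalar input, the static local models `y = a*x + b`, and the name `ts_output` are all assumptions made for illustration.

```python
import math

def ts_output(x, rules):
    """Blend local linear models by normalized membership.

    rules: list of (center, width, a, b) tuples (hypothetical layout);
    each rule has a Gaussian membership around `center` and a local
    model y = a * x + b.
    """
    weights = [math.exp(-((x - c) / s) ** 2) for c, s, _, _ in rules]
    total = sum(weights)
    blended = sum(w * (a * x + b) for w, (_, _, a, b) in zip(weights, rules))
    return blended / total
```

    With two rules centered at 0 and 2, an input halfway between them receives equal membership in both, so the output is the plain average of the two local models.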

  3. Online constrained model-based reinforcement learning

    CSIR Research Space (South Africa)

    Van Niekerk, B

    2017-08-01

    Full Text Available Constrained Model-based Reinforcement Learning Benjamin van Niekerk School of Computer Science University of the Witwatersrand South Africa Andreas Damianou∗ Amazon.com Cambridge, UK Benjamin Rosman Council for Scientific and Industrial Research, and School... MULTIPLE SHOOTING Using direct multiple shooting (Bock and Plitt, 1984), problem (1) can be transformed into a structured nonlinear program (NLP). First, the time horizon [t0, t0 + T] is partitioned into N equal subintervals [tk, tk+1] for k = 0...
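
    The partitioning step mentioned in this excerpt can be sketched in a few lines. The code below builds the N+1 shooting nodes and evaluates the continuity defects that a multiple-shooting NLP would constrain to zero. The scalar state, the explicit-Euler integrator, and all function names are assumptions for illustration; the record does not show the authors' dynamics or solver.

```python
def shooting_grid(t0, horizon, n):
    """Nodes t_k = t0 + k * T / N splitting [t0, t0 + T] into N equal parts."""
    h = horizon / n
    return [t0 + k * h for k in range(n + 1)]

def integrate(f, x, u, t_start, t_end, steps=10):
    """Explicit-Euler integration of dx/dt = f(x, u) over one subinterval
    (an assumed integrator, chosen only for brevity)."""
    dt = (t_end - t_start) / steps
    for _ in range(steps):
        x = x + dt * f(x, u)
    return x

def defects(f, nodes, xs, us):
    """Continuity defects x_{k+1} - F(x_k, u_k), one per subinterval.
    They are all zero when the node states form a consistent trajectory."""
    return [
        xs[k + 1] - integrate(f, xs[k], us[k], nodes[k], nodes[k + 1])
        for k in range(len(nodes) - 1)
    ]
```

    In an NLP formulation the node states `xs` and controls `us` would be decision variables and the defects would be equality constraints.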

  4. VigilAgent for the development of agent-based multi-robot surveillance systems

    OpenAIRE

    Gascueña Noheda, José Manuel; Navarro Martínez, Elena María; Fernández Caballero, Antonio

    2011-01-01

    Usually, surveillance applications are developed following an ad-hoc approach instead of using a methodology to guide stakeholders in achieving quality standards expected from commercial software. To close this gap, our conjecture is that surveillance applications can be fully developed from their initial design stages by means of agent-based methodologies. Specifically, this paper describes the experience and the results of using a multi-agent systems approach according to the process provid...

  5. Model of interaction in Smart Grid on the basis of multi-agent system

    Science.gov (United States)

    Engel, E. A.; Kovalev, I. V.; Engel, N. E.

    2016-11-01

    This paper presents a model of interaction in the Smart Grid based on a multi-agent system. The use of travelling waves in the multi-agent system describes the behavior of the Smart Grid from a local viewpoint, complementing the conventional approach. The simulation results show that wave absorption in the distributed multi-agent system effectively simulates the interaction in the Smart Grid.

  6. Trends in Cyber-Physical Multi-Agent Systems. The PAAMS Collection - 15th International Conference

    OpenAIRE

    Fernando De la Prieta; Zita Vale; Luis Antunes; Tiago Pinto; Andrew T. Campbell; Vicente Julián; Antonio J.R. Neves; María N. Moreno

    2017-01-01

    PAAMS, the International Conference on Practical Applications of Agents and Multi-Agent Systems is an evolution of the International Workshop on Practical Applications of Agents and Multi-Agent Systems. PAAMS is an international yearly tribune to present, to discuss, and to disseminate the latest developments and the most important outcomes related to real-world applications. It provides a unique opportunity to bring multi-disciplinary experts, academics and practitioners together to exchange...

  7. Agent-Based Optimization

    CERN Document Server

    Jędrzejowicz, Piotr; Kacprzyk, Janusz

    2013-01-01

    This volume presents a collection of original research works by leading specialists focusing on novel and promising approaches in which the multi-agent system paradigm is used to support, enhance or replace traditional approaches to solving difficult optimization problems. The editors have invited several well-known specialists to present their solutions, tools, and models falling under the common denominator of agent-based optimization. The book consists of eight chapters covering examples of application of the multi-agent paradigm and respective customized tools to solve difficult optimization problems arising in different areas such as machine learning, scheduling, transportation and, more generally, distributed and cooperative problem solving.

  8. Co-Labeling for Multi-View Weakly Labeled Learning.

    Science.gov (United States)

    Xu, Xinxing; Li, Wen; Xu, Dong; Tsang, Ivor W

    2016-06-01

    It is often expensive and time consuming to collect labeled training samples in many real-world applications. To reduce human effort on annotating training samples, many machine learning techniques (e.g., semi-supervised learning (SSL), multi-instance learning (MIL), etc.) have been studied to exploit weakly labeled training samples. Meanwhile, when the training data is represented with multiple types of features, many multi-view learning methods have shown that classifiers trained on different views can help each other to better utilize the unlabeled training samples for the SSL task. In this paper, we study a new learning problem called multi-view weakly labeled learning, in which we aim to develop a unified approach to learn robust classifiers by effectively utilizing different types of weakly labeled multi-view data from a broad range of tasks including SSL, MIL and relative outlier detection (ROD). We propose an effective approach called co-labeling to solve the multi-view weakly labeled learning problem. Specifically, we model the learning problem on each view as a weakly labeled learning problem, which aims to learn an optimal classifier from a set of pseudo-label vectors generated by using the classifiers trained from other views. Unlike traditional co-training approaches using a single pseudo-label vector for training each classifier, our co-labeling approach explores different strategies to utilize the predictions from different views, biases and iterations for generating the pseudo-label vectors, making our approach more robust for real-world applications. Moreover, to further improve the weakly labeled learning on each view, we also exploit the inherent group structure in the pseudo-label vectors generated from different strategies, which leads to a new multi-layer multiple kernel learning problem. Promising results for text-based image retrieval on the NUS-WIDE dataset as well as news classification and text categorization on several real-world multi

  9. Distributed Economic Dispatch in Microgrids Based on Cooperative Reinforcement Learning.

    Science.gov (United States)

    Liu, Weirong; Zhuang, Peng; Liang, Hao; Peng, Jun; Huang, Zhiwu

    2018-06-01

    Microgrids incorporating distributed generation (DG) units and energy storage (ES) devices are expected to play increasingly important roles in future power systems. Yet, achieving efficient distributed economic dispatch in microgrids is a challenging issue due to the randomness and nonlinear characteristics of DG units and loads. This paper proposes a cooperative reinforcement learning algorithm for distributed economic dispatch in microgrids. Utilizing the learning algorithm can avoid the difficulty of stochastic modeling and high computational complexity. In the cooperative reinforcement learning algorithm, function approximation is leveraged to deal with the large and continuous state spaces, and a diffusion strategy is incorporated to coordinate the actions of DG units and ES devices. Based on the proposed algorithm, each node in the microgrid only needs to communicate with its local neighbors, without relying on any centralized controllers. Algorithm convergence is analyzed, and simulations based on real-world meteorological and load data are conducted to validate the performance of the proposed algorithm.
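
    The combination of local learning with neighbor-only communication can be sketched as an adapt-then-combine loop. The code below is not the authors' algorithm: the semi-gradient TD(0) update, the uniform averaging weights, and the names `local_td_update` and `diffuse` are assumptions chosen to illustrate the general shape of a diffusion strategy with linear function approximation.

```python
def local_td_update(w, phi, reward, phi_next, alpha=0.1, gamma=0.9):
    """Adapt step: semi-gradient TD(0) for a linear value v(s) = w . phi(s)."""
    v = sum(wi * pi for wi, pi in zip(w, phi))
    v_next = sum(wi * pi for wi, pi in zip(w, phi_next))
    delta = reward + gamma * v_next - v  # temporal-difference error
    return [wi + alpha * delta * pi for wi, pi in zip(w, phi)]

def diffuse(weights, neighbors):
    """Combine step: each node averages the intermediate weights of itself
    and its listed neighbors (uniform weights are an assumption)."""
    combined = []
    for k, nbrs in enumerate(neighbors):
        group = [k] + list(nbrs)
        dim = len(weights[k])
        combined.append(
            [sum(weights[j][d] for j in group) / len(group) for d in range(dim)]
        )
    return combined
```

    Each node would call `local_td_update` on its own observations and then `diffuse` once per round, so information spreads through the network without a central controller.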

  10. A Neuro-Control Design Based on Fuzzy Reinforcement Learning

    DEFF Research Database (Denmark)

    Katebi, S.D.; Blanke, M.

    This paper describes a neuro-control fuzzy critic design procedure based on reinforcement learning. An important component of the proposed intelligent control configuration is the fuzzy credit assignment unit which acts as a critic, and through fuzzy implications provides adjustment mechanisms....... The fuzzy credit assignment unit comprises a fuzzy system with the appropriate fuzzification, knowledge base and defuzzification components. When an external reinforcement signal (a failure signal) is received, sequences of control actions are evaluated and modified by the action applier unit. The desirable...... ones instruct the neuro-control unit to adjust its weights and are simultaneously stored in the memory unit during the training phase. In response to the internal reinforcement signal (set point threshold deviation), the stored information is retrieved by the action applier unit and utilized for re...

  11. Distributed Cooperative Control of Nonlinear and Non-identical Multi-agent Systems

    DEFF Research Database (Denmark)

    Bidram, Ali; Lewis, Frank; Davoudi, Ali

    2013-01-01

    This paper exploits input-output feedback linearization technique to implement distributed cooperative control of multi-agent systems with nonlinear and non-identical dynamics. Feedback linearization transforms the synchronization problem for a nonlinear and heterogeneous multi-agent system...... for electric power microgrids. The effectiveness of the proposed control is verified by simulating a microgrid test system....

  12. Characterizing Reinforcement Learning Methods through Parameterized Learning Problems

    Science.gov (United States)

    2011-06-03

    extraneous. The agent could potentially adapt these representational aspects by applying methods from feature selection (Kolter and Ng, 2009; Petrik et al...611–616. AAAI Press. Kolter, J. Z. and Ng, A. Y. (2009). Regularization and feature selection in least-squares temporal difference learning. In A. P

  13. Study on collaborative optimization control of ventilation and radon reduction system based on multi-agent technology

    International Nuclear Information System (INIS)

    Dai Jianyong; Meng Lingcong; Zou Shuliang

    2015-01-01

    Based on the radiological safety characteristics of radon and its progeny, combined with ventilation system theory, the structure of a multi-agent system for ventilation and radon reduction is constructed using multi-agent technology. The functional attributes of the key agents and the connections between nodes in the multi-agent system are analyzed to establish the distributed autonomous logic structure and negotiation mechanism of the system, and thus to implement coordinated optimization control. The example analysis shows that the multi-agent system structure for ventilation and radon reduction and its collaborative mechanism can improve and optimize the control of radioactive pollutants, which provides a theoretical basis and has important application prospects. (authors)

  14. An Evolutionary Approach for Optimizing Hierarchical Multi-Agent System Organization

    OpenAIRE

    Shen, Zhiqi; Yu, Ling; Yu, Han

    2014-01-01

    It has been widely recognized that the performance of a multi-agent system is highly affected by its organization. A large scale system may have billions of possible ways of organization, which makes it impractical to find an optimal choice of organization using exhaustive search methods. In this paper, we propose a genetic algorithm aided optimization scheme for designing hierarchical structures of multi-agent systems. We introduce a novel algorithm, called the hierarchical genetic algorithm...

  15. An analysis of multi-agent diagnosis

    NARCIS (Netherlands)

    Roos, Nico; Ten Teije, Annette; Bos, André; Witteveen, Cees; Castelfranchi, C.; Johnson, W.L.

    2002-01-01

    This paper analyzes the use of a Multi-Agent System for Model-Based Diagnosis. In a large dynamical system, it is often infeasible or even impossible to maintain a model of the whole system. Instead, several incomplete models of the system have to be used to establish a diagnosis and to detect

  16. Automatic Generation of Agents using Reusable Soft Computing Code Libraries to develop Multi Agent System for Healthcare

    OpenAIRE

    Priti Srinivas Sajja

    2015-01-01

    This paper illustrates an architecture for a multi-agent system in the healthcare domain. The architecture is generic and designed in the form of multiple layers. One of the layers of the architecture contains many proactive, co-operative and intelligent agents such as resource management agent, query agent, pattern detection agent and patient management agent. Another layer of the architecture is a collection of libraries to auto-generate code for agents using soft computing techni...

  17. A Reinforcement-Based Learning Paradigm Increases Anatomical Learning and Retention—A Neuroeducation Study

    Science.gov (United States)

    Anderson, Sarah J.; Hecker, Kent G.; Krigolson, Olave E.; Jamniczky, Heather A.

    2018-01-01

    In anatomy education, a key hurdle to engaging in higher-level discussion in the classroom is recognizing and understanding the extensive terminology used to identify and describe anatomical structures. Given the time-limited classroom environment, seeking methods to impart this foundational knowledge to students in an efficient manner is essential. Just-in-Time Teaching (JiTT) methods incorporate pre-class exercises (typically online) meant to establish foundational knowledge in novice learners so subsequent instructor-led sessions can focus on deeper, more complex concepts. Determining how best to design and assess pre-class exercises requires a detailed examination of learning and retention in an applied educational context. Here we used electroencephalography (EEG) as a quantitative dependent variable to track learning and examine the efficacy of JiTT activities to teach anatomy. Specifically, we examined changes in the amplitude of the N250 and reward positivity event-related brain potential (ERP) components alongside behavioral performance as novice students participated in a series of computerized reinforcement-based learning modules to teach neuroanatomical structures. We found that as students learned to identify anatomical structures, the amplitude of the N250 increased and reward positivity amplitude decreased in response to positive feedback. On both a retention and a transfer exercise, when learners successfully remembered and translated their knowledge to novel images, the amplitude of the reward positivity remained decreased compared to early learning. Our findings suggest ERPs can be used as a tool to track learning, retention, and transfer of knowledge and that employing the reinforcement learning paradigm is an effective educational approach for developing anatomical expertise. PMID:29467638

  18. A Reinforcement-Based Learning Paradigm Increases Anatomical Learning and Retention—A Neuroeducation Study

    Directory of Open Access Journals (Sweden)

    Sarah J. Anderson

    2018-02-01

    Full Text Available In anatomy education, a key hurdle to engaging in higher-level discussion in the classroom is recognizing and understanding the extensive terminology used to identify and describe anatomical structures. Given the time-limited classroom environment, seeking methods to impart this foundational knowledge to students in an efficient manner is essential. Just-in-Time Teaching (JiTT) methods incorporate pre-class exercises (typically online) meant to establish foundational knowledge in novice learners so subsequent instructor-led sessions can focus on deeper, more complex concepts. Determining how best to design and assess pre-class exercises requires a detailed examination of learning and retention in an applied educational context. Here we used electroencephalography (EEG) as a quantitative dependent variable to track learning and examine the efficacy of JiTT activities to teach anatomy. Specifically, we examined changes in the amplitude of the N250 and reward positivity event-related brain potential (ERP) components alongside behavioral performance as novice students participated in a series of computerized reinforcement-based learning modules to teach neuroanatomical structures. We found that as students learned to identify anatomical structures, the amplitude of the N250 increased and reward positivity amplitude decreased in response to positive feedback. On both a retention and a transfer exercise, when learners successfully remembered and translated their knowledge to novel images, the amplitude of the reward positivity remained decreased compared to early learning. Our findings suggest ERPs can be used as a tool to track learning, retention, and transfer of knowledge and that employing the reinforcement learning paradigm is an effective educational approach for developing anatomical expertise.

  19. Optimal Control via Reinforcement Learning with Symbolic Policy Approximation

    NARCIS (Netherlands)

    Kubalík, Jiří; Alibekov, Eduard; Babuska, R.; Dochain, Denis; Henrion, Didier; Peaucelle, Dimitri

    2017-01-01

    Model-based reinforcement learning (RL) algorithms can be used to derive optimal control laws for nonlinear dynamic systems. With continuous-valued state and input variables, RL algorithms have to rely on function approximators to represent the value function and policy mappings. This paper

  20. Learning Similar Actions by Reinforcement or Sensory-Prediction Errors Rely on Distinct Physiological Mechanisms.

    Science.gov (United States)

    Uehara, Shintaro; Mawase, Firas; Celnik, Pablo

    2017-09-14

    Humans can acquire knowledge of new motor behavior via different forms of learning. The two forms most commonly studied have been the development of internal models based on sensory-prediction errors (error-based learning) and success-based feedback (reinforcement learning). Human behavioral studies suggest these are distinct learning processes, though the neurophysiological mechanisms that are involved have not been characterized. Here, we evaluated physiological markers from the cerebellum and the primary motor cortex (M1) using noninvasive brain stimulations while healthy participants trained finger-reaching tasks. We manipulated the extent to which subjects rely on error-based or reinforcement by providing either vector or binary feedback about task performance. Our results demonstrated a double dissociation where learning the task mainly via error-based mechanisms leads to cerebellar plasticity modifications but not long-term potentiation (LTP)-like plasticity changes in M1; while learning a similar action via reinforcement mechanisms elicited M1 LTP-like plasticity but not cerebellar plasticity changes. Our findings indicate that learning complex motor behavior is mediated by the interplay of different forms of learning, weighing distinct neural mechanisms in M1 and the cerebellum. Our study provides insights for designing effective interventions to enhance human motor learning.

  1. Multi-agent platform for development of educational games for children with autism

    NARCIS (Netherlands)

    Alers, S.H.M.; Barakova, E.I.

    2009-01-01

    A multi-agent system of autonomous interactive blocks that can display their active state through color and light intensity has been developed. Depending on the individual rules, these autonomous blocks can express emergent behaviors, which are a basis for various educational games. The multi-agent

  2. Service orientation in holonic and multi-agent manufacturing

    CERN Document Server

    Thomas, André; Trentesaux, Damien

    2015-01-01

    This volume gathers the peer reviewed papers presented at the 4th edition of the International Workshop “Service Orientation in Holonic and Multi-agent Manufacturing – SOHOMA’14” organized and hosted on November 5-6, 2014 by the University of Lorraine, France in collaboration with the CIMR Research Centre of the University Politehnica of Bucharest and the TEMPO Laboratory of the University of Valenciennes and Hainaut-Cambrésis. The book is structured in six parts, each one covering a specific research line which represents a trend in future manufacturing: (1) Holonic and Agent-based Industrial Automation Systems; (2) Service-oriented Management and Control of Manufacturing Systems; (3) Distributed Modelling for Safety and Security in Industrial Systems; (4) Complexity, Big Data and Virtualization in Computing-oriented Manufacturing; (5) Adaptive, Bio-inspired and Self-organizing Multi-Agent Systems for Manufacturing, and (6) Physical Internet Simulation, Modelling and Control. There is a clear ...

  3. Cardiac Concomitants of Feedback and Prediction Error Processing in Reinforcement Learning

    Science.gov (United States)

    Kastner, Lucas; Kube, Jana; Villringer, Arno; Neumann, Jane

    2017-01-01

    Successful learning hinges on the evaluation of positive and negative feedback. We assessed differential learning from reward and punishment in a monetary reinforcement learning paradigm, together with cardiac concomitants of positive and negative feedback processing. On the behavioral level, learning from reward resulted in more advantageous behavior than learning from punishment, suggesting a differential impact of reward and punishment on successful feedback-based learning. On the autonomic level, learning and feedback processing were closely mirrored by phasic cardiac responses on a trial-by-trial basis: (1) Negative feedback was accompanied by faster and prolonged heart rate deceleration compared to positive feedback. (2) Cardiac responses shifted from feedback presentation at the beginning of learning to stimulus presentation later on. (3) Most importantly, the strength of phasic cardiac responses to the presentation of feedback correlated with the strength of prediction error signals that alert the learner to the necessity for behavioral adaptation. Considering participants' weight status and gender revealed obesity-related deficits in learning to avoid negative consequences and less consistent behavioral adaptation in women compared to men. In sum, our results provide strong new evidence for the notion that during learning phasic cardiac responses reflect an internal value and feedback monitoring system that is sensitive to the violation of performance-based expectations. Moreover, inter-individual differences in weight status and gender may affect both behavioral and autonomic responses in reinforcement-based learning. PMID:29163004
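
    The notion of a prediction error that signals the need for behavioral adaptation can be made concrete with a small sketch. The record does not state which computational model the authors fitted, so the delta rule below (a common textbook Rescorla-Wagner style update) is an assumption, as are the function name and the learning rate `alpha`.

```python
def rescorla_wagner(rewards, alpha=0.2, v0=0.0):
    """Track an expected value V over a sequence of outcomes.

    Returns the list of prediction errors and the final expectation.
    """
    v = v0
    errors = []
    for r in rewards:
        delta = r - v        # prediction error: outcome minus expectation
        v += alpha * delta   # move the expectation toward the outcome
        errors.append(delta)
    return errors, v
```

    Under this rule the prediction error shrinks as the expectation approaches the delivered outcome, which is why large errors mark the trials where adaptation is most needed.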

  4. Cardiac Concomitants of Feedback and Prediction Error Processing in Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Lucas Kastner

    2017-10-01

    Full Text Available Successful learning hinges on the evaluation of positive and negative feedback. We assessed differential learning from reward and punishment in a monetary reinforcement learning paradigm, together with cardiac concomitants of positive and negative feedback processing. On the behavioral level, learning from reward resulted in more advantageous behavior than learning from punishment, suggesting a differential impact of reward and punishment on successful feedback-based learning. On the autonomic level, learning and feedback processing were closely mirrored by phasic cardiac responses on a trial-by-trial basis: (1) Negative feedback was accompanied by faster and prolonged heart rate deceleration compared to positive feedback. (2) Cardiac responses shifted from feedback presentation at the beginning of learning to stimulus presentation later on. (3) Most importantly, the strength of phasic cardiac responses to the presentation of feedback correlated with the strength of prediction error signals that alert the learner to the necessity for behavioral adaptation. Considering participants' weight status and gender revealed obesity-related deficits in learning to avoid negative consequences and less consistent behavioral adaptation in women compared to men. In sum, our results provide strong new evidence for the notion that during learning phasic cardiac responses reflect an internal value and feedback monitoring system that is sensitive to the violation of performance-based expectations. Moreover, inter-individual differences in weight status and gender may affect both behavioral and autonomic responses in reinforcement-based learning.

  5. Research of negotiation in network trade system based on multi-agent

    Science.gov (United States)

    Cai, Jun; Wang, Guozheng; Wu, Haiyan

    2009-07-01

    This paper describes the construction and implementation of a network trade system based on multi-agent technology. First, we study multi-agent technology; then we discuss consumer behavior and the negotiation between purchaser and bargainer that arises in the traditional business mode, and analyse the key technologies for implementing the network trade system. Finally, we implement the system.

  6. Multi-agent based modeling for electric vehicle integration in a distribution network operation

    DEFF Research Database (Denmark)

    Hu, Junjie; Morais, Hugo; Lind, Morten

    2016-01-01

    The purpose of this paper is to present a multi-agent based modeling technology for simulating and operating a hierarchical energy management of a power distribution system with focus on EVs integration. The proposed multi-agent system consists of four types of agents: i) Distribution system...... operator (DSO) technical agent and ii) DSO market agents that both belong to the top layer of the hierarchy and their roles are to manage the distribution network by avoiding grid congestions and using congestion prices to coordinate the energy scheduled; iii) Electric vehicle virtual power plant agents...

  7. Negotiation and argumentation in multi-agent systems

    CERN Document Server

    Lopes, Fernando

    2014-01-01

    Multi-agent systems (MAS) composed of autonomous agents representing individuals or organizations and capable of reaching mutually beneficial agreements through negotiation and argumentation are becoming increasingly important and pervasive. Research on both automated negotiation and argumentation in MAS has a vigorous, exciting tradition. However, efforts to integrate both areas have received only selective attention in the academic and practitioner literature. A symbiotic relationship could significantly strengthen each area's progress and trigger new R&D challenges and prospects toward t

  8. E-learning paradigms and applications agent-based approach

    CERN Document Server

    Jain, Lakhmi

    2014-01-01

    Teaching and learning paradigms have attracted increased attention, especially in the last decade. Immense developments of different ICT technologies and services have paved the way for alternative but effective approaches in educational processes. Many concepts of agent technology, such as intelligence, autonomy, and cooperation, have had a direct positive impact on many of the requirements imposed on modern e-learning systems and educational processes. This book presents the state of the art in e-learning and tutoring systems, and discusses the capabilities and benefits that stem from integrating software agents. We hope that the presented work will be of great use to our colleagues and researchers interested in e-learning and agent technology.

  9. Multi-agent robotic systems and applications for satellite missions

    Science.gov (United States)

    Nunes, Miguel A.

    A revolution in the space sector is happening. It is expected that in the next decade there will be more satellites launched than in the previous sixty years of space exploration. Major challenges are associated with this growth of space assets such as the autonomy and management of large groups of satellites, in particular with small satellites. There are two main objectives for this work. First, a flexible and distributed software architecture is presented to expand the possibilities of spacecraft autonomy and in particular autonomous motion in attitude and position. The approach taken is based on the concept of distributed software agents, also referred to as multi-agent robotic system. Agents are defined as software programs that are social, reactive and proactive to autonomously maximize the chances of achieving the set goals. Part of the work is to demonstrate that a multi-agent robotic system is a feasible approach for different problems of autonomy such as satellite attitude determination and control and autonomous rendezvous and docking. The second main objective is to develop a method to optimize multi-satellite configurations in space, also known as satellite constellations. This automated method generates new optimal mega-constellations designs for Earth observations and fast revisit times on large ground areas. The optimal satellite constellation can be used by researchers as the baseline for new missions. The first contribution of this work is the development of a new multi-agent robotic system for distributing the attitude determination and control subsystem for HiakaSat. The multi-agent robotic system is implemented and tested on the satellite hardware-in-the-loop testbed that simulates a representative space environment. The results show that the newly proposed system for this particular case achieves an equivalent control performance when compared to the monolithic implementation. 
    In terms of computational efficiency it is found that the multi-agent

  10. Impacts of Pedagogical Agent Gender in an Accessible Learning Environment

    Science.gov (United States)

    Schroeder, Noah L.; Adesope, Olusola O.

    2015-01-01

    Advances in information technologies have resulted in the use of pedagogical agents to facilitate learning. Although several studies have been conducted to examine the effects of pedagogical agents on learning, little is known about gender stereotypes of agents and how those stereotypes influence student learning and attitudes. This study…

  11. Ensemble Network Architecture for Deep Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Xi-liang Chen

    2018-01-01

    The popular deep Q-learning algorithm is known to be unstable because of oscillating Q-values and the overestimation of action values under certain conditions. These issues tend to adversely affect its performance. In this paper, we develop an ensemble network architecture for deep reinforcement learning based on value function approximation. The temporal ensemble stabilizes the training process by reducing the variance of the target approximation error, and the ensemble of target values reduces overestimation and improves performance by estimating more accurate Q-values. Our results show that this architecture leads to statistically significantly better value evaluation and more stable and better performance on several classical control tasks in the OpenAI Gym environment.
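
    The effect of averaging target values can be illustrated with a minimal sketch. It assumes the ensemble is simply a list of K per-action value estimates for the next state; the function name `ensemble_td_target` and all parameters are hypothetical and are not taken from the paper, whose exact architecture is not described in this abstract.

```python
from statistics import mean


def ensemble_td_target(reward, next_q_estimates, gamma=0.99, done=False):
    """Return a TD target built from the mean of K target estimators.

    next_q_estimates is a list of K lists; each inner list holds one
    estimator's Q-values for every action in the next state.
    """
    if done:
        return reward  # no bootstrapping from a terminal state
    n_actions = len(next_q_estimates[0])
    # Average each action's value across the estimators first ...
    avg_q = [mean(q[a] for q in next_q_estimates) for a in range(n_actions)]
    # ... then bootstrap from the best averaged value.
    return reward + gamma * max(avg_q)


# Two estimators that disagree about which action is best.
estimates = [[1.0, 4.0], [4.0, 1.0]]
single_target = 1.0 + 0.5 * max(estimates[0])               # 3.0
ensemble_target = ensemble_td_target(1.0, estimates, 0.5)   # 2.25
```

    In this toy case each estimator alone would bootstrap from its own largest (possibly noisy) value, while the averaged target is lower, which is the intuition behind reduced overestimation.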

  12. Consensus pursuit of heterogeneous multi-agent systems under a directed acyclic graph

    Science.gov (United States)

    Yan, Jing; Guan, Xin-Ping; Luo, Xiao-Yuan

    2011-04-01

    This paper is concerned with the cooperative target pursuit problem by multiple agents based on directed acyclic graph. The target appears at a random location and moves only when sensed by the agents, and agents will pursue the target once they detect its existence. Since the ability of each agent may be different, we consider the heterogeneous multi-agent systems. According to the topology of the multi-agent systems, a novel consensus-based control law is proposed, where the target and agents are modeled as a leader and followers, respectively. Based on Mason's rule and signal flow graph analysis, the convergence conditions are provided to show that the agents can catch the target in a finite time. Finally, simulation studies are provided to verify the effectiveness of the proposed approach.
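
    As a rough illustration of leader-follower consensus on a directed acyclic graph, the sketch below uses a standard discrete-time consensus update with scalar positions and a stationary target. It is not the control law of the paper (which is not given in this abstract), and it converges asymptotically rather than in finite time; `consensus_step`, `in_neighbors` and `gain` are hypothetical names.

```python
def consensus_step(positions, in_neighbors, leader=0, gain=0.5):
    """One synchronous update; the leader (target) is held fixed."""
    new_positions = list(positions)
    for i, neighbors in in_neighbors.items():
        if i == leader:
            continue
        # Move toward the agents this follower listens to.
        correction = sum(positions[j] - positions[i] for j in neighbors)
        new_positions[i] = positions[i] + gain * correction
    return new_positions


# Directed acyclic graph: target 0 -> follower 1 -> follower 2.
in_neighbors = {0: [], 1: [0], 2: [1]}
positions = [10.0, 0.0, -5.0]
for _ in range(100):
    positions = consensus_step(positions, in_neighbors)
```

    Because information flows only from the leader down the graph, each follower's error depends only on agents upstream of it, so the followers settle on the leader's position one layer after another.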

  13. Distributed Market-Based Algorithms for Multi-Agent Planning with Shared Resources

    Science.gov (United States)

    2013-02-01

    1 Introduction 1 2 Distributed Market-Based Multi-Agent Planning 5 2.1 Problem Formulation...over the deterministic planner, on the “test set” of scenarios with changing economies. . . 50 xi xii Chapter 1 Introduction Multi-agent planning is...representation of the objective (4.2.1). For example, for the supply chain management problem, we assumed a sequence of Bernoulli coin flips, which seems

  14. Integrating distributed Bayesian inference and reinforcement learning for sensor management

    NARCIS (Netherlands)

    Grappiolo, C.; Whiteson, S.; Pavlin, G.; Bakker, B.

    2009-01-01

    This paper introduces a sensor management approach that integrates distributed Bayesian inference (DBI) and reinforcement learning (RL). DBI is implemented using distributed perception networks (DPNs), a multiagent approach to performing efficient inference, while RL is used to automatically

  15. Multi-Agent Cooperative Target Search

    Directory of Open Access Journals (Sweden)

    Jinwen Hu

    2014-05-01

    This paper addresses a vision-based cooperative search for multiple mobile ground targets by a group of unmanned aerial vehicles (UAVs) with limited sensing and communication capabilities. The airborne camera on each UAV has a limited field of view and its target discriminability varies as a function of altitude. First, by dividing the whole surveillance region into cells, a probability map can be formed for each UAV indicating the probability of target existence within each cell. Then, we propose a distributed probability map updating model which includes the fusion of measurement information, information sharing among neighboring agents, information decay and transmission due to environmental changes such as the target movement. Furthermore, we formulate the target search problem as a multi-agent cooperative coverage control problem by optimizing the collective coverage area and the detection performance. The proposed map updating model and the cooperative control scheme are distributed, i.e., assuming that each agent only communicates with its neighbors within its communication range. Finally, the effectiveness of the proposed algorithms is illustrated by simulation.
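
    The measurement-fusion part of such a probability map can be sketched with a plain Bayes update per cell. This is a minimal illustration under stated assumptions, not the paper's updating model: the detection and false-alarm probabilities `p_detect` and `p_false` are hypothetical fixed values (in the paper, discriminability varies with altitude), and information sharing and decay are left out.

```python
def update_cell(prior, detected, p_detect=0.9, p_false=0.1):
    """Bayes update of P(target in cell) after one binary observation."""
    if detected:
        likelihood_target, likelihood_empty = p_detect, p_false
    else:
        likelihood_target, likelihood_empty = 1.0 - p_detect, 1.0 - p_false
    numerator = likelihood_target * prior
    return numerator / (numerator + likelihood_empty * (1.0 - prior))


def update_map(prob_map, observations, p_detect=0.9, p_false=0.1):
    """Update only the cells inside the field of view.

    observations maps (row, col) -> True/False for each observed cell.
    """
    return [
        [
            update_cell(p, observations[(r, c)], p_detect, p_false)
            if (r, c) in observations else p
            for c, p in enumerate(row)
        ]
        for r, row in enumerate(prob_map)
    ]


prob_map = [[0.5, 0.5], [0.5, 0.5]]
prob_map = update_map(prob_map, {(0, 0): True, (0, 1): False})
```

    Cells outside the field of view keep their prior, so the second row of the map is unchanged in this example.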

  16. Learning User Preferences in Ubiquitous Systems: A User Study and a Reinforcement Learning Approach

    OpenAIRE

    Zaidenberg, Sofia; Reignier, Patrick; Mandran, Nadine

    2010-01-01

    Our study concerns a virtual assistant that proposes services to the user based on the user's currently perceived activity and situation (ambient intelligence). Instead of asking the user to define his preferences, we acquire them automatically using a reinforcement learning approach. Experiments showed that our system succeeded in learning user preferences. In order to validate the relevance and usability of such a system, we have first conducted a user study. 26 non-expert s...

  17. Multi-agent system-based event-triggered hybrid control scheme for energy internet

    DEFF Research Database (Denmark)

    Dou, Chunxia; Yue, Dong; Han, Qing Long

    2017-01-01

    This paper is concerned with an event-triggered hybrid control for the energy Internet based on a multi-agent system approach with which renewable energy resources can be fully utilized to meet load demand with high security and good dynamic quality. In the design of control, a multi-agent system...

  18. SIMULATING AN EVOLUTIONARY MULTI-AGENT BASED MODEL OF THE STOCK MARKET

    Directory of Open Access Journals (Sweden)

    Diana MARICA

    2015-08-01

    The paper focuses on artificial stock market simulations using a multi-agent model incorporating 2,000 heterogeneous agents interacting on the artificial market. The agents' interaction is due to trading activity on the market through a call auction trading mechanism. The multi-agent model uses evolutionary techniques such as genetic programming in order to generate an adaptive and evolving population of agents. Each artificial agent is endowed with wealth and a genetic programming induced trading strategy. The trading strategy evolves and adapts to the new market conditions through a process called breeding, which implies that at each simulation step, new agents with better trading strategies are generated by the model, from recombining the best performing trading strategies and replacing the agents which have the worst performing trading strategies. The simulation model was built with the help of the simulation software Altreva Adaptive Modeler which offers a suitable platform for financial market simulations of evolutionary agent based models, the S&P500 composite index being used as a benchmark for the simulation results.

  19. Minimum Information Loss Based Multi-kernel Learning for Flagellar Protein Recognition in Trypanosoma Brucei

    KAUST Repository

    Wang, Jim Jing-Yan

    2014-12-01

    Trypanosoma brucei (T. Brucei) is an important pathogenic agent of African trypanosomiasis. The flagellum is an essential and multifunctional organelle of T. Brucei, thus it is very important to recognize the flagellar proteins from T. Brucei proteins for the purposes of both biological research and drug design. In this paper, we investigate computationally recognizing flagellar proteins in T. Brucei by pattern recognition methods. It is argued that an optimal decision function can be obtained as the difference of probability functions of flagella protein and the non-flagellar protein for the purpose of flagella protein recognition. We propose to learn a multi-kernel classification function to approximate this optimal decision function, by minimizing the information loss of such approximation which is measured by the Kullback-Leibler (KL) divergence. An iterative multi-kernel classifier learning algorithm is developed to minimize the KL divergence for the problem of T. Brucei flagella protein recognition. Experiments show its advantage over other T. Brucei flagellar protein recognition and multi-kernel learning methods. © 2014 IEEE.

  20. PDL as a multi-agent strategy logic

    NARCIS (Netherlands)

    D.J.N. van Eijck (Jan); B.C. Schipper

    2013-01-01

    Propositional Dynamic Logic or PDL was invented as a logic for reasoning about regular programming constructs. We propose a new perspective on PDL as a multi-agent strategic logic (MASL). This logic for strategic reasoning has group strategies as first class citizens, and

  1. Code-specific learning rules improve action selection by populations of spiking neurons.

    Science.gov (United States)

    Friedrich, Johannes; Urbanczik, Robert; Senn, Walter

    2014-08-01

    Population coding is widely regarded as a key mechanism for achieving reliable behavioral decisions. We previously introduced reinforcement learning for population-based decision making by spiking neurons. Here we generalize population reinforcement learning to spike-based plasticity rules that take account of the postsynaptic neural code. We consider spike/no-spike, spike count and spike latency codes. The multi-valued and continuous-valued features in the postsynaptic code allow for a generalization of binary decision making to multi-valued decision making and continuous-valued action selection. We show that code-specific learning rules speed up learning both for the discrete classification and the continuous regression tasks. The suggested learning rules also speed up with increasing population size as opposed to standard reinforcement learning rules. Continuous action selection is further shown to explain realistic learning speeds in the Morris water maze. Finally, we introduce the concept of action perturbation as opposed to the classical weight- or node-perturbation as an exploration mechanism underlying reinforcement learning. Exploration in the action space greatly increases the speed of learning as compared to exploration in the neuron or weight space.

  2. Micro/Nanomechanical characterization of multi-walled carbon nanotubes reinforced epoxy composite.

    Science.gov (United States)

    Cui, Peng; Wang, Xinnan; Tangpong, X W

    2012-11-01

    In this paper, the mechanical properties of 1 wt.% multi-walled carbon nanotubes (MWCNTs) reinforced epoxy nanocomposites were characterized using a self-designed micro/nano three point bending tester that was on an atomic force microscope (AFM) to in situ observe MWCNTs movement on the sample surface under loading. The migration of an individual MWCNT at the surface of the nanocomposite was tracked to address the nanomechanical reinforcing mechanism of the nanocomposites. Through morphology analysis of the nanocomposite via scanning electron microscopy, AFM, and digital image correlation technique, it was found that the MWCNTs agglomerate and the bundles were the main factors for limiting the bending strength of the composites. The agglomeration/bundle effect was included in the Halpin-Tsai model to account for the elastic modulus of the nanocomposites.

  3. A Secured Cognitive Agent based Multi-strategic Intelligent Search System

    Directory of Open Access Journals (Sweden)

    Neha Gulati

    2018-04-01

    Search Engine (SE) is the most preferred and ubiquitously used information retrieval tool. In spite of the vast scale of user involvement with SEs, their limited capability to understand the user/searcher context and emotions places a high cognitive, perceptual and learning load on the user to maintain the search momentum. In this regard, the present work discusses a Cognitive Agent (CA) based approach to support the user in the Web-based search process. The work suggests a framework called Secured Cognitive Agent based Multi-strategic Intelligent Search System (CAbMsISS) to assist the user in the search process. It helps to reduce the contextual and emotional mismatch between the SEs and the user. After implementation of the proposed framework, performance analysis shows that the CAbMsISS framework improves Query Retrieval Time (QRT) and effectiveness for retrieving relevant results as compared to the Present Search Engine (PSE). Supplementary to this, it also provides search suggestions when the user accesses a resource previously tagged with negative emotions. Overall, the goal of the system is to enhance the search experience and keep the user motivated. The framework provides suggestions through the search log that tracks the queries searched, resources accessed and emotions experienced during the search. The implemented framework also considers user security. Keywords: BDI model, Cognitive Agent, Emotion, Information retrieval, Intelligent search, Search Engine

  4. Multi-agent Water Resources Management

    Science.gov (United States)

    Castelletti, A.; Giuliani, M.

    2011-12-01

    Increasing environmental awareness and emerging trends such as water trading, energy market, deregulation and democratization of water-related services are challenging integrated water resources planning and management worldwide. The traditional approach to water management design based on sector-by-sector optimization has to be reshaped to account for multiple interrelated decision-makers and many stakeholders with increasing decision power. Centralized management, though interesting from a conceptual point of view, is unfeasible in most of the modern social and institutional contexts, and often economically inefficient. Coordinated management, where different actors interact within a full open trust exchange paradigm under some institutional supervision, is a promising alternative to the ideal centralized solution and the actual uncoordinated practices. This is a significant issue in most of the Southern Alps regulated lakes, where upstream hydropower reservoirs maximize their benefit independently from downstream users; it becomes even more relevant in the case of transboundary systems, where water management upstream affects water availability downstream (e.g. the River Zambesi flowing through Zambia, Zimbabwe and Mozambique or the Red River flowing from South-Western China through Northern Vietnam). In this study we apply Multi-Agent Systems (MAS) theory to design an optimal management in a decentralized way, considering a set of multiple autonomous agents acting in the same environment and taking into account the pay-off of individual water users, which are inherently distributed along the river and need to coordinate to jointly reach their objectives. In this way each real-world actor, representing the decision-making entity (e.g. the operator of a reservoir or a diversion dam) can be represented one-to-one by a computer agent, defined as a computer system that is situated in some environment and that is capable of autonomous action in this environment in

  5. Optimal and Autonomous Control Using Reinforcement Learning: A Survey.

    Science.gov (United States)

    Kiumarsi, Bahare; Vamvoudakis, Kyriakos G; Modares, Hamidreza; Lewis, Frank L

    2018-06-01

    This paper reviews the current state of the art on reinforcement learning (RL)-based feedback control solutions to optimal regulation and tracking of single and multiagent systems. Existing RL solutions to both optimal and control problems, as well as graphical games, will be reviewed. RL methods learn the solution to optimal control and game problems online and using measured data along the system trajectories. We discuss Q-learning and the integral RL algorithm as core algorithms for discrete-time (DT) and continuous-time (CT) systems, respectively. Moreover, we discuss a new direction of off-policy RL for both CT and DT systems. Finally, we review several applications.
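
    For readers unfamiliar with Q-learning, the core discrete-time update can be sketched in its textbook tabular form. This is a generic illustration, not the feedback-control formulation reviewed in the survey; the function name and the list-of-lists Q-table are assumptions made for the example.

```python
def q_learning_update(q, state, action, reward, next_state,
                      alpha=0.1, gamma=0.9, done=False):
    """Apply one tabular Q-learning update in place; return the TD error."""
    best_next = 0.0 if done else max(q[next_state])
    td_error = reward + gamma * best_next - q[state][action]
    q[state][action] += alpha * td_error
    return td_error


# Q-table for 2 states x 2 actions, initialised to zero.
q = [[0.0, 0.0], [0.0, 0.0]]
q[1][0] = 2.0  # pretend state 1 is already known to be valuable
delta = q_learning_update(q, state=0, action=1, reward=1.0, next_state=1,
                          alpha=0.5, gamma=0.5)
```

    The update uses only the measured transition (state, action, reward, next state), which is why the method can learn online along system trajectories without a model of the dynamics.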

  6. An Approach for Autonomy: A Collaborative Communication Framework for Multi-Agent Systems

    Science.gov (United States)

    Dufrene, Warren Russell, Jr.

    2005-01-01

    Research done during the last three years has studied the emersion properties of Complex Adaptive Systems (CAS). The deployment of Artificial Intelligence (AI) techniques applied to remote Unmanned Aerial Vehicles has led the author to investigate applications of CAS within the field of Autonomous Multi-Agent Systems. The core objective of current research efforts is focused on the simplicity of Intelligent Agents (IA) and the modeling of these agents within complex systems. This research effort looks at the communication, interaction, and adaptability of multi-agents as applied to complex systems control. The embodiment concept applied to robotics has application possibilities within multi-agent frameworks. A new framework for agent awareness within a virtual 3D world concept is possible where the vehicle is composed of collaborative agents. This approach has many possibilities for applications to complex systems. This paper describes the development of an approach to apply this virtual framework to the NASA Goddard Space Flight Center (GSFC) tetrahedron structure developed under the Autonomous Nano Technology Swarm (ANTS) program and the Super Miniaturized Addressable Reconfigurable Technology (SMART) architecture program. These projects represent an innovative set of novel concepts deploying adaptable, self-organizing structures composed of many tetrahedrons. This technology is pushing current applied Agents Concepts to new levels of requirements and adaptability.

  7. Unicorn: Continual Learning with a Universal, Off-policy Agent

    OpenAIRE

    Mankowitz, Daniel J.; Žídek, Augustin; Barreto, André; Horgan, Dan; Hessel, Matteo; Quan, John; Oh, Junhyuk; van Hasselt, Hado; Silver, David; Schaul, Tom

    2018-01-01

    Some real-world domains are best characterized as a single task, but for others this perspective is limiting. Instead, some tasks continually grow in complexity, in tandem with the agent's competence. In continual learning, also referred to as lifelong learning, there are no explicit task boundaries or curricula. As learning agents have become more powerful, continual learning remains one of the frontiers that has resisted quick progress. To test continual learning capabilities we consider a ...

  8. Universal effect of dynamical reinforcement learning mechanism in spatial evolutionary games

    International Nuclear Information System (INIS)

    Zhang, Hai-Feng; Wu, Zhi-Xi; Wang, Bing-Hong

    2012-01-01

    One of the prototypical mechanisms in understanding the ubiquitous cooperation in social dilemma situations is the win–stay, lose–shift rule. In this work, a generalized win–stay, lose–shift learning model (a reinforcement learning model with dynamic aspiration level) is proposed to describe how humans adapt their social behaviors based on their social experiences. In the model, the players incorporate the information of the outcomes in previous rounds with time-dependent aspiration payoffs to regulate the probability of choosing cooperation. By investigating such a reinforcement learning rule in the spatial prisoner's dilemma game and public goods game, a most noteworthy viewpoint is that moderate greediness (i.e. moderate aspiration level) favors best the development and organization of collective cooperation. The generality of this observation is tested against different regulation strengths and different types of network of interaction as well. We also make comparisons with two recently proposed models to highlight the importance of the mechanism of adaptive aspiration level in supporting cooperation in structured populations.
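
    The idea of regulating the cooperation probability by comparing payoffs with a moving aspiration level can be sketched as follows. The update below is a Bush-Mosteller-style rule chosen for illustration, not the exact model of the paper (whose equations are not given in this abstract); `beta` and `habituation` are hypothetical parameters.

```python
import math


def update_cooperation_prob(p, cooperated, payoff, aspiration, beta=1.0):
    """Win-stay, lose-shift as a smooth update of P(cooperate)."""
    # Stimulus in (-1, 1): positive when the payoff beats the aspiration.
    s = math.tanh(beta * (payoff - aspiration))
    if cooperated:
        # Satisfied: reinforce cooperation; dissatisfied: move away from it.
        return p + (1.0 - p) * s if s >= 0 else p + p * s
    # Satisfied defector: reinforce defection; dissatisfied: shift to cooperation.
    return p - p * s if s >= 0 else p - (1.0 - p) * s


def update_aspiration(aspiration, payoff, habituation=0.2):
    """Time-dependent aspiration tracks recent payoffs."""
    return (1.0 - habituation) * aspiration + habituation * payoff


p, aspiration = 0.5, 1.0
p = update_cooperation_prob(p, cooperated=True, payoff=3.0, aspiration=aspiration)
aspiration = update_aspiration(aspiration, payoff=3.0)
```

    A player whose payoff exceeds the aspiration repeats the last action with higher probability, while a disappointed player shifts away from it; the probability always stays within [0, 1].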

  9. A multi-agent design for a pressurized water reactor (P.W.R.) control system; Modelisation multi-agents pour la conduite d'un reacteur a eau sous pression (REP)

    Energy Technology Data Exchange (ETDEWEB)

    Aimar-Lichtenberger, M. [Paris-11 Univ., 91 - Orsay (France)

    1999-01-01

    This PhD work concerns the control of complex industrial processes. The starting point is the analysis of control principles in a Pressurized Water Reactor (P.W.R). In order to cope with the limits of the present control procedures, a new control organisation by objectives and means is defined. This functional organisation is based on the state approach and is characterized by the parallel management of control functions to ensure the continuous control of the installation's essential variables. Given the complexity of this system, we search for the most suitable computer modeling approach. We show that a multi-agent system approach brings an interesting answer to manage the distribution and parallelism of control decisions and tasks. We present a synthetic study of multi-agent systems and their application fields. The choice of a multi-agent approach proceeds with the design of an agent model. This model gains experience from other applications. This model is implemented in a computer environment which combines the mechanisms of an object language with Prolog. We propose in this frame a multi-agent modeling of the control system where each function is represented by an agent. The agents are structured in a hierarchical organisation and deal with different abstraction levels of the problem. Following a prototype process, the validation is realized by an implementation and by a coupling to a reactor simulator. The essential contributions of an agent approach lie in the mastery of the system complexity, the openness, the robustness and the potentialities of human-machine cooperation. (author)

  10. Neural prediction errors reveal a risk-sensitive reinforcement-learning process in the human brain.

    Science.gov (United States)

    Niv, Yael; Edlund, Jeffrey A; Dayan, Peter; O'Doherty, John P

    2012-01-11

    Humans and animals are exquisitely, though idiosyncratically, sensitive to risk or variance in the outcomes of their actions. Economic, psychological, and neural aspects of this are well studied when information about risk is provided explicitly. However, we must normally learn about outcomes from experience, through trial and error. Traditional models of such reinforcement learning focus on learning about the mean reward value of cues and ignore higher order moments such as variance. We used fMRI to test whether the neural correlates of human reinforcement learning are sensitive to experienced risk. Our analysis focused on anatomically delineated regions of a priori interest in the nucleus accumbens, where blood oxygenation level-dependent (BOLD) signals have been suggested as correlating with quantities derived from reinforcement learning. We first provide unbiased evidence that the raw BOLD signal in these regions corresponds closely to a reward prediction error. We then derive from this signal the learned values of cues that predict rewards of equal mean but different variance and show that these values are indeed modulated by experienced risk. Moreover, a close neurometric-psychometric coupling exists between the fluctuations of the experience-based evaluations of risky options that we measured neurally and the fluctuations in behavioral risk aversion. This suggests that risk sensitivity is integral to human learning, illuminating economic models of choice, neuroscientific models of affective learning, and the workings of the underlying neural mechanisms.
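
    One common way to make a reinforcement-learning value sensitive to variance is to use different learning rates for positive and negative prediction errors. The sketch below illustrates that idea only; the learning rates `eta_plus` and `eta_minus`, the reward values and the alternating outcome schedule are assumptions made for the example, not values from the study.

```python
def risk_sensitive_update(value, reward, eta_plus=0.1, eta_minus=0.3):
    """Update a cue's value with asymmetric learning rates."""
    delta = reward - value  # prediction error for a one-step outcome
    eta = eta_plus if delta > 0 else eta_minus
    return value + eta * delta


safe_value = risky_value = 0.0
for trial in range(200):
    safe_value = risk_sensitive_update(safe_value, 20.0)
    # Risky cue: same mean (20) but outcomes alternate between 40 and 0.
    risky_reward = 40.0 if trial % 2 == 0 else 0.0
    risky_value = risk_sensitive_update(risky_value, risky_reward)
```

    Because negative prediction errors are weighted more heavily here, the learned value of the risky cue settles below that of the safe cue even though both have the same mean reward, which corresponds to risk-averse behavior.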

  11. Planning of Autonomous Multi-agent Intersection

    Directory of Open Access Journals (Sweden)

    Viksnin Ilya I.

    2016-01-01

    In this paper, we propose a traffic management system with agents acting on behalf of autonomous vehicles at crossroads. As an alternative to existing solutions based on semi-autonomous control systems with a control unit, the algorithm proposed in this paper applies the principles of decentralized multi-agent control. During their collaboration, agents generate an intersection plan and determine the optimal order of crossing the intersection for a given criterion, based on the exchange of information about themselves and their environment. The paper contains optimization criteria for selecting among possible routes and experiments performed to evaluate the proposed model. Experimental results show that this model can significantly reduce traffic density compared to traditional traffic management systems. Moreover, the efficiency of the proposed algorithm increases with road traffic density. Furthermore, the availability of a control unit in the system significantly reduces the negative impact of possible failures and hacker attacks.

  12. Optimal Sequential Resource Sharing and Exchange in Multi-Agent Systems

    OpenAIRE

    Xiao, Yuanzhang

    2014-01-01

    Central to the design of many engineering systems and social networks is to solve the underlying resource sharing and exchange problems, in which multiple decentralized agents make sequential decisions over time to optimize some long-term performance metrics. It is challenging for the decentralized agents to make optimal sequential decisions because of the complicated coupling among the agents and across time. In this dissertation, we mainly focus on three important classes of multi-agent seq...

  13. Multi-task Vector Field Learning.

    Science.gov (United States)

    Lin, Binbin; Yang, Sen; Zhang, Chiyuan; Ye, Jieping; He, Xiaofei

    2012-01-01

    Multi-task learning (MTL) aims to improve generalization performance by learning multiple related tasks simultaneously and identifying the shared information among tasks. Most existing MTL methods focus on learning linear models under the supervised setting. We propose a novel semi-supervised and nonlinear approach for MTL using vector fields. A vector field is a smooth mapping from the manifold to the tangent spaces which can be viewed as a directional derivative of functions on the manifold. We argue that vector fields provide a natural way to exploit the geometric structure of data as well as the shared differential structure of tasks, both of which are crucial for semi-supervised multi-task learning. In this paper, we develop multi-task vector field learning (MTVFL) which learns the predictor functions and the vector fields simultaneously. MTVFL has the following key properties. (1) The vector fields MTVFL learns are close to the gradient fields of the predictor functions. (2) Within each task, the vector field is required to be as parallel as possible, which is expected to span a low dimensional subspace. (3) The vector fields from all tasks share a low dimensional subspace. We formalize our idea in a regularization framework and also provide a convex relaxation method to solve the original non-convex problem. The experimental results on synthetic and real data demonstrate the effectiveness of our proposed approach.

  14. From fault classification to fault tolerance for multi-agent systems

    CERN Document Server

    Potiron, Katia; Taillibert, Patrick

    2013-01-01

    Faults are a concern for Multi-Agent Systems (MAS) designers, especially if the MAS are built for industrial or military use because there must be some guarantee of dependability. Some fault classification exists for classical systems, and is used to define faults. When dependability is at stake, such fault classification may be used from the beginning of the system's conception to define fault classes and specify which types of faults are expected. Thus, one may want to use fault classification for MAS; however, From Fault Classification to Fault Tolerance for Multi-Agent Systems argues that

  15. Second-Order Controllability of Multi-Agent Systems with Multiple Leaders

    International Nuclear Information System (INIS)

    Liu Bo; Han Xiao; Shi Yun-Tao; Su Hou-Sheng

    2016-01-01

    This paper proposes a new second-order continuous-time multi-agent model and analyzes the controllability of second-order multi-agent systems with multiple leaders based on the asymmetric topology. This paper considers the more general case in which the velocity coupling topology is different from the location coupling topology. Some sufficient and necessary conditions are presented for the controllability of the system with multiple leaders. In addition, the paper studies the controllability of the system with velocity damping gain. Simulation results are given to illustrate the correctness of the theoretical results.

  16. Opportunities of creating multi-agent systems in the service sector

    Directory of Open Access Journals (Sweden)

    Shatsky A.A.

    2017-03-01

    The paper seeks to examine opportunities to create multi-agent systems (MAS) in the service sector. Using methods of theoretical analysis and synthesis, the author attempts to apply multi-agent technology to the description of a socio-economic system such as the service sector. As a result, the author identifies three types of MAS in the service sector based on different types of architecture of intelligent information systems. The research shows that the problem posed by the author requires further study and clarification of results.

  17. Reviewing Microgrids from a Multi-Agent Systems Perspective

    Directory of Open Access Journals (Sweden)

    Jorge J. Gomez-Sanz

    2014-05-01

    The construction of Smart Grids leads to the main question of what kind of intelligence such grids require and how to build it. Some authors choose an agent based solution to realize this intelligence. However, there may be some misunderstandings in the way this technology is being applied. This paper exposes some considerations of this subject, focusing on the Microgrid level, and shows a practical example through INGENIAS methodology, which is a methodology for the development of Agent Oriented systems that applies Model Driven Development techniques to produce fully functional Multi-Agent Systems.

  18. Research on monitoring system of water resources in irrigation region based on multi-agent

    International Nuclear Information System (INIS)

    Zhao, T H; Wang, D S

    2012-01-01

    Irrigation agriculture is the basis of agricultural and rural economic development in China. Making water resource information for irrigated areas available would allow existing water resources to be used fully and would greatly increase the benefit of irrigation agriculture. However, the water resource information systems of many irrigated areas in China are still not very sound, which leads to the waste of large amounts of water. This paper analyzes the existing water resource monitoring systems of irrigated areas, introduces multi-agent theory, and sets up a multi-agent-based water resource monitoring system for irrigated areas. The system is composed of a monitoring multi-agent federation, a telemetry multi-agent federation, and the GSM communication network between them. It can make full use of the intelligence and communication coordination within each multi-agent federation, greatly improve the timeliness of dynamic monitoring and control of water resources in irrigated areas, provide information services for the sustainable development of irrigated areas, and lay a foundation for a high level of informatization of their water resources.

  19. Active Multi-Field Learning for Spam Filtering

    OpenAIRE

    Wuying Liu; Lin Wang; Mianzhu Yi; Nan Xie

    2015-01-01

    Ubiquitous spam messages cause a serious waste of time and resources. This paper addresses the practical spam filtering problem, and proposes a universal approach to fight with various spam messages. The proposed active multi-field learning approach is based on: 1) It is cost-sensitive to obtain a label for a real-world spam filter, which suggests an active learning idea; and 2) Different messages often have a similar multi-field text structure, which suggests a multi-field learning idea. The...

  20. A Multi Agent Based Approach for Prehospital Emergency Management.

    Science.gov (United States)

    Safdari, Reza; Shoshtarian Malak, Jaleh; Mohammadzadeh, Niloofar; Danesh Shahraki, Azimeh

    2017-07-01

    To demonstrate an architecture that automates the prehospital emergency process and categorizes specialized care according to the situation at the right time, in order to reduce patient mortality and morbidity. The prehospital emergency process was analyzed using existing prehospital management systems and frameworks, and the extracted process was modeled using sequence diagrams in Rational Rose software. The main system agents were identified and modeled via a component diagram, considering the main system actors and logically dividing business functionalities; finally, a conceptual architecture for prehospital emergency management was proposed. The proposed architecture was simulated using Anylogic simulation software. The Anylogic Agent Model, State Chart and Process Model were used to model the system. Multi-agent systems (MAS) have had great success in distributed, complex and dynamic problem-solving environments, and utilizing autonomous agents provides intelligent decision-making capabilities. The proposed architecture presents prehospital management operations. The main identified agents are: EMS Center, Ambulance, Traffic Station, Healthcare Provider, Patient, Consultation Center, National Medical Record System and a quality-of-service monitoring agent. In a critical condition like a prehospital emergency, we are coping with sophisticated processes such as ambulance navigation, healthcare provider and service assignment, consultation, recalling a patient's past medical history through a centralized EHR system, and monitoring healthcare quality in real time. The main advantage of our work has been the utilization of a multi-agent system. Our future work will include implementing the proposed architecture and evaluating its impact on improving the quality of patient care.

  1. Multiagent Reinforcement Learning Dynamic Spectrum Access in Cognitive Radios

    Directory of Open Access Journals (Sweden)

    Wu Chun

    2014-02-01

    A multiuser independent Q-learning method that needs no information interaction is proposed for multiuser dynamic spectrum access in cognitive radios. The method adopts a self-learning paradigm, in which each CR user performs reinforcement learning only by observing its individual performance reward, without spending communication resources on information interaction with others. The reward is defined to reflect channel quality and channel conflict status. A learning strategy of sufficient exploration, preference for good channels, and punishment for channel conflict is designed to implement multiuser dynamic spectrum access. For the two-user, two-channel scenario, a fast learning algorithm is proposed and its convergence to the maximal total reward is proved. The simulation results show that, with the proposed method, the CR system converges to a Nash equilibrium with high probability and achieves good total-reward performance.
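
    The independent-learning idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the stateless Q-update, the epsilon-greedy exploration, the channel qualities (1.0 and 0.8), the collision penalty of -1, and all function names are assumptions chosen for illustration only.

```python
import random


def train_independent_q(quality=(1.0, 0.8), n_users=2, steps=5000,
                        alpha=0.1, epsilon=0.1, seed=0):
    """Each user learns its own Q-table from its own reward only."""
    rng = random.Random(seed)
    n_channels = len(quality)
    # One row of action values per user; no information is shared.
    q = [[0.0] * n_channels for _ in range(n_users)]
    for _ in range(steps):
        choices = []
        for u in range(n_users):
            if rng.random() < epsilon:
                # Exploration: try a random channel.
                choices.append(rng.randrange(n_channels))
            else:
                # Exploitation: prefer the channel that looks best so far.
                choices.append(max(range(n_channels), key=lambda c: q[u][c]))
        for u, c in enumerate(choices):
            collided = choices.count(c) > 1
            # Assumed reward: channel quality, or a penalty on conflict.
            reward = -1.0 if collided else quality[c]
            q[u][c] += alpha * (reward - q[u][c])
    return q


def greedy_channels(q):
    """Channel each user would pick without exploration."""
    return [max(range(len(row)), key=lambda c: row[c]) for row in q]


if __name__ == "__main__":
    print(greedy_channels(train_independent_q()))
```

    Under these assumptions the two users typically settle on different channels, because a shared channel is punished for both users until exploration breaks the symmetry.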

  2. A Review of the Relationship between Novelty, Intrinsic Motivation and Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Siddique Nazmul

    2017-11-01

    This paper presents a review on the tri-partite relationship between novelty, intrinsic motivation and reinforcement learning. The paper first presents a literature survey on novelty and the different computational models of novelty detection, with a specific focus on the features of stimuli that trigger a Hedonic value for generating a novelty signal. It then presents an overview of intrinsic motivation and investigations into different models with the aim of exploring deeper co-relationships between specific features of a novelty signal and its effect on intrinsic motivation in producing a reward function. Finally, it presents survey results on reinforcement learning, different models and their functional relationship with intrinsic motivation.

  3. Scale-free memory model for multiagent reinforcement learning. Mean field approximation and rock-paper-scissors dynamics

    Science.gov (United States)

    Lubashevsky, I.; Kanemoto, S.

    2010-07-01

    A continuous time model for multiagent systems governed by reinforcement learning with scale-free memory is developed. The agents are assumed to act independently of one another in optimizing their choice of possible actions via trial-and-error search. To gain awareness about the action value the agents accumulate in their memory the rewards obtained from taking a specific action at each moment of time. The contribution of the rewards in the past to the agent's current perception of action value is described by an integral operator with a power-law kernel. Finally a fractional differential equation governing the system dynamics is obtained. The agents are considered to interact with one another implicitly via the reward of one agent depending on the choice of the other agents. The pairwise interaction model is adopted to describe this effect. As a specific example of systems with non-transitive interactions, two-agent and three-agent systems of the rock-paper-scissors type are analyzed in detail, including the stability analysis and numerical simulation. Scale-free memory is demonstrated to cause complex dynamics of the systems at hand. In particular, it is shown that there can be simultaneously two modes of the system instability undergoing subcritical and supercritical bifurcation, with the latter one exhibiting anomalous oscillations with the amplitude and period growing with time. Besides, the instability onset via this supercritical mode may be regarded as “altruism self-organization”. For the three-agent system the instability dynamics is found to be rather irregular and can be composed of alternate fragments of oscillations different in their properties.

  4. Distributed Consensus of Stochastic Delayed Multi-agent Systems Under Asynchronous Switching.

    Science.gov (United States)

    Wu, Xiaotai; Tang, Yang; Cao, Jinde; Zhang, Wenbing

    2016-08-01

    In this paper, the distributed exponential consensus of stochastic delayed multi-agent systems with nonlinear dynamics is investigated under asynchronous switching. The asynchronous switching considered here is to account for the time of identifying the active modes of multi-agent systems. After receipt of confirmation of a mode switch, the matched controller can be applied, which means that the switching time of the matched controller in each node usually lags behind that of system switching. In order to handle the coexistence of switched signals and stochastic disturbances, a comparison principle of stochastic switched delayed systems is first proved. By means of this extended comparison principle, several easy-to-verify conditions for the existence of an asynchronously switched distributed controller are derived such that stochastic delayed multi-agent systems with asynchronous switching and nonlinear dynamics can achieve global exponential consensus. Two examples are given to illustrate the effectiveness of the proposed method.

  5. Deep imitation learning for 3D navigation tasks.

    Science.gov (United States)

    Hussein, Ahmed; Elyan, Eyad; Gaber, Mohamed Medhat; Jayne, Chrisina

    2018-01-01

    Deep learning techniques have shown success in learning from raw high-dimensional data in various applications. While deep reinforcement learning is recently gaining popularity as a method to train intelligent agents, utilizing deep learning in imitation learning has been scarcely explored. Imitation learning can be an efficient method to teach intelligent agents by providing a set of demonstrations to learn from. However, generalizing to situations that are not represented in the demonstrations can be challenging, especially in 3D environments. In this paper, we propose a deep imitation learning method to learn navigation tasks from demonstrations in a 3D environment. The supervised policy is refined using active learning in order to generalize to unseen situations. This approach is compared to two popular deep reinforcement learning techniques: deep-Q-networks and Asynchronous actor-critic (A3C). The proposed method as well as the reinforcement learning methods employ deep convolutional neural networks and learn directly from raw visual input. Methods for combining learning from demonstrations and experience are also investigated. This combination aims to join the generalization ability of learning by experience with the efficiency of learning by imitation. The proposed methods are evaluated on 4 navigation tasks in a 3D simulated environment. Navigation tasks are a typical problem that is relevant to many real applications. They pose the challenge of requiring demonstrations of long trajectories to reach the target and only providing delayed rewards (usually terminal) to the agent. The experiments show that the proposed method can successfully learn navigation tasks from raw visual input while learning from experience methods fail to learn an effective policy. Moreover, it is shown that active learning can significantly improve the performance of the initially learned policy using a small number of active samples.

  6. Dopamine-Dependent Reinforcement of Motor Skill Learning: Evidence from Gilles de la Tourette Syndrome

    Science.gov (United States)

    Palminteri, Stefano; Lebreton, Mael; Worbe, Yulia; Hartmann, Andreas; Lehericy, Stephane; Vidailhet, Marie; Grabli, David; Pessiglione, Mathias

    2011-01-01

    Reinforcement learning theory has been extensively used to understand the neural underpinnings of instrumental behaviour. A central assumption surrounds dopamine signalling reward prediction errors, so as to update action values and ensure better choices in the future. However, educators may share the intuitive idea that reinforcements not only…

  7. Learning and innovative elements of strategy adoption rules expand cooperative network topologies.

    Science.gov (United States)

    Wang, Shijun; Szalay, Máté S; Zhang, Changshui; Csermely, Peter

    2008-04-09

    Cooperation plays a key role in the evolution of complex systems. However, the level of cooperation extensively varies with the topology of agent networks in the widely used models of repeated games. Here we show that cooperation remains rather stable by applying the reinforcement learning strategy adoption rule, Q-learning, on a variety of random, regular, small-world, scale-free and modular network models in repeated, multi-agent Prisoner's Dilemma and Hawk-Dove games. Furthermore, we found that using the above model systems other long-term learning strategy adoption rules also promote cooperation, while introducing a low level of noise (as a model of innovation) to the strategy adoption rules makes the level of cooperation less dependent on the actual network topology. Our results demonstrate that long-term learning and random elements in the strategy adoption rules, when acting together, extend the range of network topologies enabling the development of cooperation at a wider range of costs and temptations. These results suggest that a balanced duo of learning and innovation may help to preserve cooperation during the re-organization of real-world networks, and may play a prominent role in the evolution of self-organizing, complex systems.

  8. Applications of Deep Learning and Reinforcement Learning to Biological Data.

    Science.gov (United States)

    Mahmud, Mufti; Kaiser, Mohammed Shamim; Hussain, Amir; Vassanelli, Stefano

    2018-06-01

    Rapid advances in hardware-based technologies during the past decades have opened up new possibilities for life scientists to gather multimodal data in various application domains, such as omics, bioimaging, medical imaging, and (brain/body)-machine interfaces. These have generated novel opportunities for development of dedicated data-intensive machine learning techniques. In particular, recent research in deep learning (DL), reinforcement learning (RL), and their combination (deep RL) promises to revolutionize the future of artificial intelligence. The growth in computational power, accompanied by faster and increased data storage and declining computing costs, has already allowed scientists in various fields to apply these techniques to data sets that were previously intractable owing to their size and complexity. This paper provides a comprehensive survey on the application of DL, RL, and deep RL techniques in mining biological data. In addition, we compare the performances of DL techniques when applied to different data sets across various application domains. Finally, we outline open issues in this challenging research area and discuss future development perspectives.

  9. Multi-Agent Rendezvousing with a Finite Set of Candidate Rendezvous Points

    NARCIS (Netherlands)

    Fang, J.; Morse, A. S.; Cao, M.

    2008-01-01

    The discrete multi-agent rendezvous problem we consider in this paper is concerned with a specified set of points in the plane, called “dwell-points,” and a set of mobile autonomous agents with limited sensing range. Each agent is initially positioned at some dwell-point, and is able to determine

  10. Gaze-contingent reinforcement learning reveals incentive value of social signals in young children and adults.

    Science.gov (United States)

    Vernetti, Angélina; Smith, Tim J; Senju, Atsushi

    2017-03-15

    While numerous studies have demonstrated that infants and adults preferentially orient to social stimuli, it remains unclear as to what drives such preferential orienting. It has been suggested that the learned association between social cues and subsequent reward delivery might shape such social orienting. Using a novel, spontaneous indication of reinforcement learning (with the use of a gaze contingent reward-learning task), we investigated whether children and adults' orienting towards social and non-social visual cues can be elicited by the association between participants' visual attention and a rewarding outcome. Critically, we assessed whether the engaging nature of the social cues influences the process of reinforcement learning. Both children and adults learned to orient more often to the visual cues associated with reward delivery, demonstrating that cue-reward association reinforced visual orienting. More importantly, when the reward-predictive cue was social and engaging, both children and adults learned the cue-reward association faster and more efficiently than when the reward-predictive cue was social but non-engaging. These new findings indicate that social engaging cues have a positive incentive value. This could possibly be because they usually coincide with positive outcomes in real life, which could partly drive the development of social orienting. © 2017 The Authors.

  11. Learning alternative movement coordination patterns using reinforcement feedback.

    Science.gov (United States)

    Lin, Tzu-Hsiang; Denomme, Amber; Ranganathan, Rajiv

    2018-05-01

    One of the characteristic features of the human motor system is redundancy-i.e., the ability to achieve a given task outcome using multiple coordination patterns. However, once participants settle on using a specific coordination pattern, the process of learning to use a new alternative coordination pattern to perform the same task is still poorly understood. Here, using two experiments, we examined this process of how participants shift from one coordination pattern to another using different reinforcement schedules. Participants performed a virtual reaching task, where they moved a cursor to different targets positioned on the screen. Our goal was to make participants use a coordination pattern with greater trunk motion, and to this end, we provided reinforcement by making the cursor disappear if the trunk motion during the reach did not cross a specified threshold value. In Experiment 1, we compared two reinforcement schedules in two groups of participants-an abrupt group, where the threshold was introduced immediately at the beginning of practice; and a gradual group, where the threshold was introduced gradually with practice. Results showed that both abrupt and gradual groups were effective in shifting their coordination patterns to involve greater trunk motion, but the abrupt group showed greater retention when the reinforcement was removed. In Experiment 2, we examined the basis of this advantage in the abrupt group using two additional control groups. Results showed that the advantage of the abrupt group was because of a greater number of practice trials with the desired coordination pattern. Overall, these results show that reinforcement can be successfully used to shift coordination patterns, which has potential in the rehabilitation of movement disorders.

  12. Multi-agent simulation of purchasing activities in organizations

    NARCIS (Netherlands)

    Ebben, Mark; de Boer, L.; Sitar-Pop, C.E.; Yucesan, E.; Chen, C.H.; Snowdon, J.L.; Charnes, J.M.

    2002-01-01

    In this paper we present a multi-agent simulation model to investigate purchasing activities in an organizational environment. The starting point is the observation that the majority of purchasing activities in organizations are usually performed without any involvement of the organization's

  13. Multi-population genomic prediction using a multi-task Bayesian learning model.

    Science.gov (United States)

    Chen, Liuhong; Li, Changxi; Miller, Stephen; Schenkel, Flavio

    2014-05-03

    Genomic prediction in multiple populations can be viewed as a multi-task learning problem where tasks are to derive prediction equations for each population and multi-task learning property can be improved by sharing information across populations. The goal of this study was to develop a multi-task Bayesian learning model for multi-population genomic prediction with a strategy to effectively share information across populations. Simulation studies and real data from Holstein and Ayrshire dairy breeds with phenotypes on five milk production traits were used to evaluate the proposed multi-task Bayesian learning model and compare with a single-task model and a simple data pooling method. A multi-task Bayesian learning model was proposed for multi-population genomic prediction. Information was shared across populations through a common set of latent indicator variables while SNP effects were allowed to vary in different populations. Both simulation studies and real data analysis showed the effectiveness of the multi-task model in improving genomic prediction accuracy for the smaller Ayrshire breed. Simulation studies suggested that the multi-task model was most effective when the number of QTL was small (n = 20), with an increase of accuracy by up to 0.09 when QTL effects were lowly correlated between two populations (ρ = 0.2), and up to 0.16 when QTL effects were highly correlated (ρ = 0.8). When QTL genotypes were included for training and validation, the improvements were 0.16 and 0.22, respectively, for scenarios of the low and high correlation of QTL effects between two populations. When the number of QTL was large (n = 200), improvement was small with a maximum of 0.02 when QTL genotypes were not included for genomic prediction. Reduction in accuracy was observed for the simple pooling method when the number of QTL was small and correlation of QTL effects between the two populations was low. For the real data, the multi-task model achieved an

  14. MULTI AGENT-BASED ENVIRONMENTAL LANDSCAPE (MABEL) - AN ARTIFICIAL INTELLIGENCE SIMULATION MODEL: SOME EARLY ASSESSMENTS

    OpenAIRE

    Alexandridis, Konstantinos T.; Pijanowski, Bryan C.

    2002-01-01

    The Multi Agent-Based Environmental Landscape model (MABEL) introduces a Distributed Artificial Intelligence (DAI) systemic methodology, to simulate land use and transformation changes over time and space. Computational agents represent abstract relations among geographic, environmental, human and socio-economic variables, with respect to land transformation pattern changes. A multi-agent environment is developed providing task-nonspecific problem-solving abilities, flexibility on achieving g...

  15. Nonparametric bayesian reward segmentation for skill discovery using inverse reinforcement learning

    CSIR Research Space (South Africa)

    Ranchod, P

    2015-10-01

    We present a method for segmenting a set of unstructured demonstration trajectories to discover reusable skills using inverse reinforcement learning (IRL). Each skill is characterised by a latent reward function which the demonstrator is assumed...

  16. Agent-specific learning signals for self-other distinction during mentalising.

    Directory of Open Access Journals (Sweden)

    Sam Ereira

    2018-04-01

    Humans have a remarkable ability to simulate the minds of others. How the brain distinguishes between mental states attributed to self and mental states attributed to someone else is unknown. Here, we investigated how fundamental neural learning signals are selectively attributed to different agents. Specifically, we asked whether learning signals are encoded in agent-specific neural patterns or whether a self-other distinction depends on encoding agent identity separately from this learning signal. To examine this, we tasked subjects to learn continuously 2 models of the same environment, such that one was selectively attributed to self and the other was selectively attributed to another agent. Combining computational modelling with magnetoencephalography (MEG) enabled us to track neural representations of prediction errors (PEs) and beliefs attributed to self, and of simulated PEs and beliefs attributed to another agent. We found that the representational pattern of a PE reliably predicts the identity of the agent to whom the signal is attributed, consistent with a neural self-other distinction implemented via agent-specific learning signals. Strikingly, subjects exhibiting a weaker neural self-other distinction also had a reduced behavioural capacity for self-other distinction and displayed more marked subclinical psychopathological traits. The neural self-other distinction was also modulated by social context, evidenced in a significantly reduced decoding of agent identity in a nonsocial control task. Thus, we show that self-other distinction is realised through an encoding of agent identity intrinsic to fundamental learning signals. The observation that the fidelity of this encoding predicts psychopathological traits is of interest as a potential neurocomputational psychiatric biomarker.

  17. Adaptive tracking control of leader-following linear multi-agent systems with external disturbances

    Science.gov (United States)

    Lin, Hanquan; Wei, Qinglai; Liu, Derong; Ma, Hongwen

    2016-10-01

    In this paper, the consensus problem for leader-following linear multi-agent systems with external disturbances is investigated. Brownian motions are used to describe exogenous disturbances. A distributed tracking controller based on Riccati inequalities with an adaptive law for adjusting coupling weights between neighbouring agents is designed for leader-following multi-agent systems under fixed and switching topologies. In traditional distributed static controllers, the coupling weights depend on the communication graph. However, coupling weights associated with the feedback gain matrix in our method are updated by state errors between neighbouring agents. We further present the stability analysis of leader-following multi-agent systems with stochastic disturbances under switching topology. Most traditional literature requires the graph to be connected all the time, while the communication graph is only assumed to be jointly connected in this paper. The design technique is based on Riccati inequalities and algebraic graph theory. Finally, simulations are given to show the validity of our method.

  18. Feature selection for domain knowledge representation through multitask learning

    CSIR Research Space (South Africa)

    Rosman, Benjamin S

    2014-10-01

    ...represent stimuli of interest, and rich feature sets which increase the dimensionality of the space and thus the difficulty of the learning problem. We focus on a multitask reinforcement learning setting, where the agent is learning domain knowledge...

  19. Explicit and implicit reinforcement learning across the psychosis spectrum.

    Science.gov (United States)

    Barch, Deanna M; Carter, Cameron S; Gold, James M; Johnson, Sheri L; Kring, Ann M; MacDonald, Angus W; Pizzagalli, Diego A; Ragland, J Daniel; Silverstein, Steven M; Strauss, Milton E

    2017-07-01

    Motivational and hedonic impairments are core features of a variety of types of psychopathology. An important aspect of motivational function is reinforcement learning (RL), including implicit (i.e., outside of conscious awareness) and explicit (i.e., including explicit representations about potential reward associations) learning, as well as both positive reinforcement (learning about actions that lead to reward) and punishment (learning to avoid actions that lead to loss). Here we present data from paradigms designed to assess both positive and negative components of both implicit and explicit RL, examine performance on each of these tasks among individuals with schizophrenia, schizoaffective disorder, and bipolar disorder with psychosis, and examine their relative relationships to specific symptom domains transdiagnostically. None of the diagnostic groups differed significantly from controls on the implicit RL tasks in either bias toward a rewarded response or bias away from a punished response. However, on the explicit RL task, both the individuals with schizophrenia and schizoaffective disorder performed significantly worse than controls, but the individuals with bipolar did not. Worse performance on the explicit RL task, but not the implicit RL task, was related to worse motivation and pleasure symptoms across all diagnostic categories. Performance on explicit RL, but not implicit RL, was related to working memory, which accounted for some of the diagnostic group differences. However, working memory did not account for the relationship of explicit RL to motivation and pleasure symptoms. These findings suggest transdiagnostic relationships across the spectrum of psychotic disorders between motivation and pleasure impairments and explicit RL. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  20. Dynamical Consensus Algorithm for Second-Order Multi-Agent Systems Subjected to Communication Delay

    International Nuclear Information System (INIS)

    Liu Chenglin; Liu Fei

    2013-01-01

    To solve the dynamical consensus problem of second-order multi-agent systems with communication delay, delay-dependent compensations are added into the normal asynchronously-coupled consensus algorithm so as to make the agents achieve a dynamical consensus. Based on frequency-domain analysis, sufficient conditions are gained for second-order multi-agent systems with communication delay under leaderless and leader-following consensus algorithms respectively. Simulation illustrates the correctness of the results. (interdisciplinary physics and related areas of science and technology)

  1. Use of frontal lobe hemodynamics as reinforcement signals to an adaptive controller.

    Directory of Open Access Journals (Sweden)

    Marcello M DiStasio

    Decision-making ability in the frontal lobe (among other brain structures) relies on the assignment of value to states of the animal and its environment. Then higher valued states can be pursued and lower (or negative) valued states avoided. The same principle forms the basis for computational reinforcement learning controllers, which have been fruitfully applied both as models of value estimation in the brain, and as artificial controllers in their own right. This work shows how state desirability signals decoded from frontal lobe hemodynamics, as measured with near-infrared spectroscopy (NIRS), can be applied as reinforcers to an adaptable artificial learning agent in order to guide its acquisition of skills. A set of experiments carried out on an alert macaque demonstrates that both oxy- and deoxyhemoglobin concentrations in the frontal lobe show differences in response to both primarily and secondarily desirable (versus undesirable) stimuli. This difference allows a NIRS signal classifier to serve successfully as a reinforcer for an adaptive controller performing a virtual tool-retrieval task. The agent's adaptability allows its performance to exceed the limits of the NIRS classifier decoding accuracy. We also show that decoding state desirabilities is more accurate when using relative concentrations of both oxyhemoglobin and deoxyhemoglobin, rather than either species alone.

  2. A Distributed Intelligent E-Learning System

    Science.gov (United States)

    Kristensen, Terje

    2016-01-01

    An E-learning system based on a multi-agent (MAS) architecture combined with the Dynamic Content Manager (DCM) model of E-learning, is presented. We discuss the benefits of using such a multi-agent architecture. Finally, the MAS architecture is compared with a pure service-oriented architecture (SOA). This MAS architecture may also be used within…

  3. The Multi-Agent Transport Simulation MATSim

    OpenAIRE

    Horni Andreas; Nagel Kai; Axhausen Kay W.

    2016-01-01

    "The MATSim (Multi-Agent Transport Simulation) software project was started around 2006 with the goal of generating traffic and congestion patterns by following individual synthetic travelers through their daily or weekly activity programme. It has since then evolved from a collection of stand-alone C++ programs to an integrated Java-based framework which is publicly hosted, open-source available, automatically regression tested. It is currently used by about 40 groups throughout the world. T...

  4. Exchanging large data object in multi-agent systems

    Science.gov (United States)

    Al-Yaseen, Wathiq Laftah; Othman, Zulaiha Ali; Nazri, Mohd Zakree Ahmad

    2016-08-01

    One of the Business Intelligent solutions that is currently in use is the Multi-Agent System (MAS). Communication is one of the most important elements in MAS, especially for exchanging large low level data between distributed agents (physically). The Agent Communication Language in JADE has been offered as a secure method for sending data, whereby the data is defined as an object. However, the object cannot be used to send data to another agent in a different location. Therefore, the aim of this paper was to propose a method for the exchange of large low level data as an object by creating a proxy agent known as a Delivery Agent, which temporarily imitates the Receiver Agent. The results showed that the proposed method is able to send large-sized data. The experiments were conducted using 16 datasets ranging from 100,000 to 7 million instances. However, for the proposed method, the RAM and the CPU machine had to be slightly increased for the Receiver Agent, but the latency time was not significantly different compared to the use of the Java Socket method (non-agent and less secure). With such results, it was concluded that the proposed method can be used to securely send large data between agents.

  5. Service orientation in holonic and multi agent manufacturing and robotics

    CERN Document Server

    Thomas, Andre; Trentesaux, Damien

    2013-01-01

    The book covers four research domains representing a trend for modern manufacturing control: Holonic and Multi-agent technologies for industrial systems; Intelligent Product and Product-driven Automation; Service Orientation of Enterprise’s strategic and technical processes; and Distributed Intelligent Automation Systems. These evolution lines have in common concepts related to service orientation derived from the Service Oriented Architecture (SOA) paradigm. The service-oriented multi-agent systems approach discussed in the book is characterized by the use of a set of distributed autonomous and cooperative agents, embedded in smart components that use the SOA principles, being oriented by offer and request of services, in order to fulfil production systems and value chain goals. A new integrated vision combining emergent technologies is offered, to create control structures with distributed intelligence supporting the vertical and horizontal enterprise integration and running in truly distributed ...

  6. Multi-Agent System-Based Microgrid Operation Strategy for Demand Response

    OpenAIRE

    Cha, Hee-Jun; Won, Dong-Jun; Kim, Sang-Hyuk; Chung, Il-Yop; Han, Byung-Moon

    2015-01-01

    The microgrid and demand response (DR) are important technologies for future power grids. Among the variety of microgrid operations, the multi-agent system (MAS) has attracted considerable attention. In a microgrid with MAS, the agents installed on the microgrid components operate optimally by communicating with each other. This paper proposes an operation algorithm for the individual agents of a test microgrid that consists of a battery energy storage system (BESS) and an intelligent load. A...

  7. Enhancing E-Learning through Web Service and Intelligent Agents

    Directory of Open Access Journals (Sweden)

    Nasir Hussain

    2006-04-01

    E-learning is basically the integration of various technologies. E-Learning technology is now maturing and we can find a multiplicity of standards. New technologies such as agents and web services are promising better results. In this paper we have proposed an e-learning architecture that is dependent on intelligent agent systems and web services. These communication technologies will make the architecture more robust, scalable and efficient.

  8. Distributed multi-agent scheme for reactive power management with renewable energy

    International Nuclear Information System (INIS)

    Rahman, M.S.; Mahmud, M.A.; Pota, H.R.; Hossain, M.J.

    2014-01-01

    Highlights: • A distributed multi-agent scheme is proposed to enhance the dynamic voltage stability. • A control agent is designed where control actions are performed through a PI controller. • The proposed scheme is compared with the conventional approach with DSTATCOM. • The proposed scheme adapts the capability of estimation and control under various operating conditions. - Abstract: This paper presents a new distributed multi-agent scheme for reactive power management in smart coordinated distribution networks with renewable energy sources (RESs) to enhance the dynamic voltage stability, which is mainly based on controlling distributed static synchronous compensators (DSTATCOMs). The proposed control scheme is incorporated in a multi-agent framework where the intelligent agents simultaneously coordinate with each other and represent various physical models to provide information and energy flow among different physical processes. The reactive power is estimated from the topology of distribution networks and, with this information, the necessary control actions are performed through the proposed proportional integral (PI) controller. The performance of the proposed scheme is evaluated on an 8-bus distribution network under various operating conditions. The performance of the proposed scheme is validated through simulation results, and these results are compared to those of a conventional PI-based DSTATCOM control scheme. From the simulation results, it is found that the distributed MAS provides excellent performance for improving voltage profiles by managing reactive power in a smarter way.

  9. A multi-agent safety response model in the construction industry.

    Science.gov (United States)

    Meliá, José L

    2015-01-01

    The construction industry is one of the sectors with the highest accident rates and the most serious accidents. A multi-agent safety response approach allows a useful diagnostic tool in order to understand factors affecting risk and accidents. The special features of the construction sector can influence the relationships among safety responses along the model of safety influences. The purpose of this paper is to test a model explaining risk and work-related accidents in the construction industry as a result of the safety responses of the organization, the supervisors, the co-workers and the worker. 374 construction employees belonging to 64 small Spanish construction companies working for two main companies participated in the study. Safety responses were measured using a 45-item Likert-type questionnaire. The structure of the measure was analyzed using factor analysis and the model of effects was tested using a structural equation model. Factor analysis clearly identifies the multi-agent safety dimensions hypothesized. The proposed safety response model of work-related accidents, involving construction specific results, showed a good fit. The multi-agent safety response approach to safety climate is a useful framework for the assessment of organizational and behavioral risks in construction.

  10. Optimizing Chemical Reactions with Deep Reinforcement Learning.

    Science.gov (United States)

    Zhou, Zhenpeng; Li, Xiaocheng; Zare, Richard N

    2017-12-27

    Deep reinforcement learning was employed to optimize chemical reactions. Our model iteratively records the results of a chemical reaction and chooses new experimental conditions to improve the reaction outcome. This model outperformed a state-of-the-art blackbox optimization algorithm by using 71% fewer steps on both simulations and real reactions. Furthermore, we introduced an efficient exploration strategy by drawing the reaction conditions from certain probability distributions, which resulted in an improvement on regret from 0.062 to 0.039 compared with a deterministic policy. Combining the efficient exploration policy with accelerated microdroplet reactions, optimal reaction conditions were determined in 30 min for the four reactions considered, and a better understanding of the factors that control microdroplet reactions was reached. Moreover, our model showed a better performance after training on reactions with similar or even dissimilar underlying mechanisms, which demonstrates its learning ability.

  11. Human-Robot Teaming in a Multi-Agent Space Assembly Task

    Science.gov (United States)

    Rehnmark, Fredrik; Currie, Nancy; Ambrose, Robert O.; Culbert, Christopher

    2004-01-01

    NASA's Human Space Flight program depends heavily on spacewalks performed by pairs of suited human astronauts. These Extra-Vehicular Activities (EVAs) are severely restricted in both duration and scope by consumables and available manpower. An expanded multi-agent EVA team combining the information-gathering and problem-solving skills of humans with the survivability and physical capabilities of robots is proposed and illustrated by example. Such teams are useful for large-scale, complex missions requiring dispersed manipulation, locomotion and sensing capabilities. To study collaboration modalities within a multi-agent EVA team, a 1-g test is conducted with humans and robots working together in various supporting roles.

  12. Endogenous Price Bubbles in a Multi-Agent System of the Housing Market.

    Science.gov (United States)

    Kouwenberg, Roy; Zwinkels, Remco C J

    2015-01-01

    Economic history shows a large number of boom-bust cycles, with the U.S. real estate market as one of the latest examples. Classical economic models have not been able to provide a full explanation for this type of market dynamics. Therefore, we analyze home prices in the U.S. using an alternative approach, a multi-agent complex system. Instead of the classical assumptions of agent rationality and market efficiency, agents in the model are heterogeneous, adaptive, and boundedly rational. We estimate the multi-agent system with historical house prices for the U.S. market. The model fits the data well and a deterministic version of the model can endogenously produce boom-and-bust cycles on the basis of the estimated coefficients. This implies that trading between agents themselves can create major price swings in absence of fundamental news.

  13. Endogenous Price Bubbles in a Multi-Agent System of the Housing Market.

    Directory of Open Access Journals (Sweden)

    Roy Kouwenberg

    Economic history shows a large number of boom-bust cycles, with the U.S. real estate market as one of the latest examples. Classical economic models have not been able to provide a full explanation for this type of market dynamics. Therefore, we analyze home prices in the U.S. using an alternative approach, a multi-agent complex system. Instead of the classical assumptions of agent rationality and market efficiency, agents in the model are heterogeneous, adaptive, and boundedly rational. We estimate the multi-agent system with historical house prices for the U.S. market. The model fits the data well and a deterministic version of the model can endogenously produce boom-and-bust cycles on the basis of the estimated coefficients. This implies that trading between agents themselves can create major price swings in absence of fundamental news.

  14. Endogenous Price Bubbles in a Multi-Agent System of the Housing Market

    Science.gov (United States)

    2015-01-01

    Economic history shows a large number of boom-bust cycles, with the U.S. real estate market as one of the latest examples. Classical economic models have not been able to provide a full explanation for this type of market dynamics. Therefore, we analyze home prices in the U.S. using an alternative approach, a multi-agent complex system. Instead of the classical assumptions of agent rationality and market efficiency, agents in the model are heterogeneous, adaptive, and boundedly rational. We estimate the multi-agent system with historical house prices for the U.S. market. The model fits the data well and a deterministic version of the model can endogenously produce boom-and-bust cycles on the basis of the estimated coefficients. This implies that trading between agents themselves can create major price swings in absence of fundamental news. PMID:26107740

  15. What Learning Systems do Intelligent Agents Need? Complementary Learning Systems Theory Updated.

    Science.gov (United States)

    Kumaran, Dharshan; Hassabis, Demis; McClelland, James L

    2016-07-01

    We update complementary learning systems (CLS) theory, which holds that intelligent agents must possess two learning systems, instantiated in mammalians in neocortex and hippocampus. The first gradually acquires structured knowledge representations while the second quickly learns the specifics of individual experiences. We broaden the role of replay of hippocampal memories in the theory, noting that replay allows goal-dependent weighting of experience statistics. We also address recent challenges to the theory and extend it by showing that recurrent activation of hippocampal traces can support some forms of generalization and that neocortical learning can be rapid for information that is consistent with known structure. Finally, we note the relevance of the theory to the design of artificial intelligent agents, highlighting connections between neuroscience and machine learning. Copyright © 2016 Elsevier Ltd. All rights reserved.

  16. L{sup 1} group consensus of multi-agent systems with switching topologies and stochastic inputs

    Energy Technology Data Exchange (ETDEWEB)

    Shang, Yilun, E-mail: shylmath@hotmail.com [Institute for Cyber Security, University of Texas at San Antonio, TX 78249 (United States); SUTD-MIT International Design Center, Singapore University of Technology and Design, Singapore 138682 (Singapore)

    2013-10-01

    Understanding how interacting subsystems of an overall system lead to cluster/group consensus is a key issue in the investigation of multi-agent systems. In this Letter, we study the L{sup 1} group consensus problem of discrete-time multi-agent systems with external stochastic inputs. Based on ergodicity theory and matrix analysis, L{sup 1} group consensus criteria are obtained for multi-agent systems with switching topologies. Some numerical examples are provided to illustrate the effectiveness and feasibility of the theoretical results.

  17. Instance annotation for multi-instance multi-label learning

    Science.gov (United States)

    F. Briggs; X.Z. Fern; R. Raich; Q. Lou

    2013-01-01

    Multi-instance multi-label learning (MIML) is a framework for supervised classification where the objects to be classified are bags of instances associated with multiple labels. For example, an image can be represented as a bag of segments and associated with a list of objects it contains. Prior work on MIML has focused on predicting label sets for previously unseen...

  18. A Reinforcement Learning Model Equipped with Sensors for Generating Perception Patterns: Implementation of a Simulated Air Navigation System Using ADS-B (Automatic Dependent Surveillance-Broadcast) Technology.

    Science.gov (United States)

    Álvarez de Toledo, Santiago; Anguera, Aurea; Barreiro, José M; Lara, Juan A; Lizcano, David

    2017-01-19

    Over the last few decades, a number of reinforcement learning techniques have emerged, and different reinforcement learning-based applications have proliferated. However, such techniques tend to specialize in a particular field. This is an obstacle to their generalization and extrapolation to other areas. Besides, neither the reward-punishment (r-p) learning process nor the convergence of results is fast and efficient enough. To address these obstacles, this research proposes a general reinforcement learning model. This model is independent of input and output types and based on general bioinspired principles that help to speed up the learning process. The model is composed of a perception module based on sensors whose specific perceptions are mapped as perception patterns. In this manner, similar perceptions (even if perceived at different positions in the environment) are accounted for by the same perception pattern. Additionally, the model includes a procedure that statistically associates perception-action pattern pairs depending on the positive or negative results output by executing the respective action in response to a particular perception during the learning process. To do this, the model is fitted with a mechanism that reacts positively or negatively to particular sensory stimuli in order to rate results. The model is supplemented by an action module that can be configured depending on the maneuverability of each specific agent. The model has been applied in the air navigation domain, a field with strong safety restrictions, which led us to implement a simulated system equipped with the proposed model. Accordingly, the perception sensors were based on Automatic Dependent Surveillance-Broadcast (ADS-B) technology, which is described in this paper. The results were quite satisfactory, and it outperformed traditional methods existing in the literature with respect to learning reliability and efficiency.

  19. Event-triggered hybrid control based on multi-Agent systems for Microgrids

    DEFF Research Database (Denmark)

    Dou, Chun-xia; Liu, Bin; Guerrero, Josep M.

    2014-01-01

    This paper is focused on a multi-agent system based event-triggered hybrid control for intelligently restructuring the operating mode of a microgrid (MG) to ensure the energy supply with high security, stability and cost effectiveness. Because the microgrid is composed of different types...... of distributed energy resources, it is a typical hybrid dynamic network. Considering the complex hybrid behaviors, a hierarchical decentralized coordinated control scheme is firstly constructed based on a multi-agent system; then, the hybrid model of the microgrid is built by using differential hybrid Petri...

  20. [Effect of amount of silane coupling agent on flexural strength of dental composite resins reinforced with aluminium borate whisker].

    Science.gov (United States)

    Zhu, Ming-yi; Zhang, Xiu-yin

    2015-06-01

    To evaluate the effect of the amount of silane coupling agent on the flexural strength of dental composite resins reinforced with aluminium borate whisker (ABW). ABW was surface-treated with 0%, 1%, 2%, 3% and 4% silane coupling agent (γ-MPS), and mixed with resin matrix to synthesize 5 groups of composite resins. After heat-curing at 120 degrees centigrade for 1 h, specimens were tested in three-point flexure to measure strength according to ISO-4049. One specimen was selected randomly from each group and observed under a scanning electron microscope (SEM). The data were analyzed with the SAS 9.2 software package. The flexural strength (117.93±11.9 MPa) of the group treated with 2% silane coupling agent was the highest, and significantly different from that of the other 4 groups (α=0.01). The amount of silane coupling agent has an impact on the flexural strength of dental composite resins reinforced with whiskers; the flexural strength is reduced whenever the amount is higher or lower than the threshold. Supported by Research Fund of Science and Technology Committee of Shanghai Municipality (08DZ2271100).

  1. Cooperative Epistemic Multi-Agent Planning With Implicit Coordination

    DEFF Research Database (Denmark)

    Engesser, Thorsten; Bolander, Thomas; Mattmüller, Robert

    2015-01-01

    , meaning coordination is only allowed implicitly by means of the available epistemic actions. While this approach can be fruitfully applied to model reasoning in some simple social situations, we also provide some benchmark applications to show that the concept is useful for multi-agent systems in practice....

  2. Multi agent gathering waste system

    Directory of Open Access Journals (Sweden)

    Álvaro LOZANO MURCIEGO

    2016-07-01

    In this paper, we present a new multi-agent-based system to gather waste in cities and villages. We have developed a low-cost wireless sensor prototype to measure the volume level of the containers. Furthermore, a route system is developed to optimize the routes of the trucks, and a mobile application has been developed to help drivers in their working days. To evaluate and validate the proposed system, a practical case study in a real city environment is modeled using available open data, with the purpose of identifying limitations of the system.

  3. Achieving semantic interoperability in multi-agent systems: A dialogue-based approach

    NARCIS (Netherlands)

    Diggelen, J. van

    2007-01-01

    Software agents sharing the same ontology can exchange their knowledge fluently as their knowledge representations are compatible with respect to the concepts regarded as relevant and with respect to the names given to these concepts. However, in open heterogeneous multi-agent systems, this scenario

  4. The dispersion of SWCNTs treated by coupling and dispersing agents in fiber reinforced polymer composites

    Science.gov (United States)

    Duan, Yuexin; Yuan, Lu; Zhao, Yan; Guan, Fengxia

    2007-07-01

    A major obstacle to applying carbon nanotubes (CNTs) in fiber reinforced polymer composites is dispersing them at the nano level, particularly for single-wall carbon nanotubes (SWCNTs). In this paper, SWCNTs were treated with a coupling agent such as Volan and a dispersing agent such as BYK to improve their dispersion in Glass Fiber/Epoxy composites. The dispersion of SWCNTs in the composites was observed by scanning electron microscopy (SEM). The glass transition temperature (Tg) of the composites with treated and untreated SWCNTs was then obtained by Dynamic Mechanical Thermal Analysis (DMTA). Moreover, the bending properties of these composites were tested.

  5. Research and Implementation of Key Technologies in Multi-Agent System to Support Distributed Workflow

    Science.gov (United States)

    Pan, Tianheng

    2018-01-01

    In recent years, the combination of workflow management system and Multi-agent technology is a hot research field. The problem of lack of flexibility in workflow management system can be improved by introducing multi-agent collaborative management. The workflow management system adopts distributed structure. It solves the problem that the traditional centralized workflow structure is fragile. In this paper, the agent of Distributed workflow management system is divided according to its function. The execution process of each type of agent is analyzed. The key technologies such as process execution and resource management are analyzed.

  6. A Multi-Agent System for Tracking the Intent of Surface Contacts in Ports and Waterways

    National Research Council Canada - National Science Library

    Tan, Kok S

    2005-01-01

    ...) and employ them to identify asymmetric maritime threats in port and waterways. Each surface track is monitored by a compound multi-agent system that comprise of the several intent models, each containing a nested multi-agent system...

  7. Nash Equilibrium of Social-Learning Agents in a Restless Multiarmed Bandit Game.

    Science.gov (United States)

    Nakayama, Kazuaki; Hisakado, Masato; Mori, Shintaro

    2017-05-16

    We study a simple model for social-learning agents in a restless multiarmed bandit (rMAB). The bandit has one good arm that changes to a bad one with a certain probability. Each agent stochastically selects one of the two methods, random search (individual learning) or copying information from other agents (social learning), using which he/she seeks the good arm. Fitness of an agent is the probability to know the good arm in the steady state of the agent system. In this model, we explicitly construct the unique Nash equilibrium state and show that the corresponding strategy for each agent is an evolutionarily stable strategy (ESS) in the sense of Thomas. It is shown that the fitness of an agent with ESS is superior to that of an asocial learner when the success probability of social learning is greater than a threshold determined from the probability of success of individual learning, the probability of change of state of the rMAB, and the number of agents. The ESS Nash equilibrium is a solution to Rogers' paradox.
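    The setting described in this abstract can be illustrated with a small simulation. The following is a minimal sketch only, not the authors' model: the function name `simulate`, all parameter names and default values, the rule that agents copy a randomly chosen agent's knowledge from the previous step, and the rule that all knowledge becomes stale when the good arm changes are assumptions introduced here for illustration. Fitness is estimated as the long-run fraction of agent-steps in which an agent knows the good arm.

```python
import random


def simulate(n_agents=50, p_social=0.5, p_change=0.05,
             q_ind=0.1, q_soc=0.8, n_steps=5000, seed=1):
    """Estimate the fraction of time an agent knows the good arm.

    All parameters are hypothetical illustration values:
      p_social: probability that an agent tries social learning (copying)
      p_change: probability per step that the good arm turns bad
      q_ind:    success probability of individual (random) search
      q_soc:    success probability of copying from an informed agent
    """
    rng = random.Random(seed)
    knows = [False] * n_agents
    informed_agent_steps = 0
    for _ in range(n_steps):
        # Restless bandit: when the good arm changes, old knowledge is stale.
        if rng.random() < p_change:
            knows = [False] * n_agents
        snapshot = knows[:]  # agents copy information from the previous state
        for i in range(n_agents):
            if snapshot[i]:
                continue  # already knows the good arm
            if rng.random() < p_social:
                # Social learning: copy a randomly chosen agent.
                j = rng.randrange(n_agents)
                if snapshot[j] and rng.random() < q_soc:
                    knows[i] = True
            elif rng.random() < q_ind:
                # Individual learning: random search succeeds.
                knows[i] = True
        informed_agent_steps += sum(knows)
    return informed_agent_steps / (n_steps * n_agents)


if __name__ == "__main__":
    for p in (0.0, 0.5, 1.0):
        print(f"p_social={p:.1f}  fitness={simulate(p_social=p):.3f}")
```

    In this sketch a population of pure social learners (`p_social=1.0`) never acquires any information, because nobody searches individually. This is the intuition behind Rogers' paradox that the abstract refers to; the quantitative threshold derived in the paper is not reproduced here.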

  8. Active Learning for Autonomous Intelligent Agents: Exploration, Curiosity, and Interaction

    OpenAIRE

    Lopes, Manuel; Montesano, Luis

    2014-01-01

    In this survey we present different approaches that allow an intelligent agent to autonomously explore its environment to gather information and learn multiple tasks. Different communities proposed different solutions that are, in many cases, similar and/or complementary. These solutions include active learning, exploration/exploitation, online-learning and social learning. The common aspect of all these approaches is that it is the agent that selects and decides what information to gather next. ...

  9. Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation.

    Science.gov (United States)

    Kato, Ayaka; Morita, Kenji

    2016-10-01

    It has been suggested that dopamine (DA) represents reward-prediction-error (RPE) defined in reinforcement learning and therefore DA responds to unpredicted but not predicted reward. However, recent studies have found DA response sustained towards predictable reward in tasks involving self-paced behavior, and suggested that this response represents a motivational signal. We have previously shown that RPE can sustain if there is decay/forgetting of learned-values, which can be implemented as decay of synaptic strengths storing learned-values. This account, however, did not explain the suggested link between tonic/sustained DA and motivation. In the present work, we explored the motivational effects of the value-decay in self-paced approach behavior, modeled as a series of 'Go' or 'No-Go' selections towards a goal. Through simulations, we found that the value-decay can enhance motivation, specifically, facilitate fast goal-reaching, albeit counterintuitively. Mathematical analyses revealed that underlying potential mechanisms are twofold: (1) decay-induced sustained RPE creates a gradient of 'Go' values towards a goal, and (2) value-contrasts between 'Go' and 'No-Go' are generated because while chosen values are continually updated, unchosen values simply decay. Our model provides potential explanations for the key experimental findings that suggest DA's roles in motivation: (i) slowdown of behavior by post-training blockade of DA signaling, (ii) observations that DA blockade severely impairs effortful actions to obtain rewards while largely sparing seeking of easily obtainable rewards, and (iii) relationships between the reward amount, the level of motivation reflected in the speed of behavior, and the average level of DA. These results indicate that reinforcement learning with value-decay, or forgetting, provides a parsimonious mechanistic account for the DA's roles in value-learning and motivation. 
Our results also suggest that when biological systems for value-learning
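    The idea of reinforcement learning with value decay on a 'Go'/'No-Go' track can be sketched in a few lines. The following is a simplified illustration under stated assumptions, not the authors' model: the function name `run_trials`, the Q-learning style update, the softmax (logistic) choice rule, and every parameter value are hypothetical. Each state holds a 'No-Go' value (stay) and a 'Go' value (advance); after every time step all stored values are multiplied by `1 - decay`, so unchosen values simply decay while chosen values are also updated by the reward-prediction error.

```python
import math
import random


def run_trials(decay, n_states=5, n_trials=200, alpha=0.5,
               gamma=0.97, beta=5.0, seed=0, max_steps=10000):
    """Simulate self-paced approach to a goal with forgetting of values.

    Returns the final value table and the number of time steps per trial.
    All parameter values are hypothetical illustration choices.
    """
    rng = random.Random(seed)
    # q[s][0] is the 'No-Go' value and q[s][1] the 'Go' value in state s.
    q = [[0.0, 0.0] for _ in range(n_states)]
    steps_per_trial = []
    for _ in range(n_trials):
        s, steps = 0, 0
        while s < n_states and steps < max_steps:
            # Logistic choice between 'Go' and 'No-Go'.
            p_go = 1.0 / (1.0 + math.exp(-beta * (q[s][1] - q[s][0])))
            a = 1 if rng.random() < p_go else 0
            s_next = s + a
            at_goal = s_next == n_states
            reward = 1.0 if at_goal else 0.0
            future = 0.0 if at_goal else max(q[s_next])
            # Reward-prediction error (RPE) for the chosen action.
            rpe = reward + gamma * future - q[s][a]
            q[s][a] += alpha * rpe
            # Forgetting: every stored value decays at each time step.
            for row in q:
                row[0] *= 1.0 - decay
                row[1] *= 1.0 - decay
            s = s_next
            steps += 1
        steps_per_trial.append(steps)
    return q, steps_per_trial


if __name__ == "__main__":
    for d in (0.0, 0.01):
        q, steps = run_trials(decay=d)
        mean_late = sum(steps[-50:]) / 50
        print(f"decay={d}: mean steps over last 50 trials = {mean_late:.2f}")
```

    Comparing the printed step counts for `decay=0.0` and a small positive decay gives a rough feel for how forgetting changes the speed of goal-reaching in this toy setting; whether it matches the effect reported in the abstract depends on the assumed parameters.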

  10. Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation.

    Directory of Open Access Journals (Sweden)

    Ayaka Kato

    2016-10-01

    It has been suggested that dopamine (DA) represents reward-prediction-error (RPE) defined in reinforcement learning and therefore DA responds to unpredicted but not predicted reward. However, recent studies have found DA response sustained towards predictable reward in tasks involving self-paced behavior, and suggested that this response represents a motivational signal. We have previously shown that RPE can sustain if there is decay/forgetting of learned-values, which can be implemented as decay of synaptic strengths storing learned-values. This account, however, did not explain the suggested link between tonic/sustained DA and motivation. In the present work, we explored the motivational effects of the value-decay in self-paced approach behavior, modeled as a series of 'Go' or 'No-Go' selections towards a goal. Through simulations, we found that the value-decay can enhance motivation, specifically, facilitate fast goal-reaching, albeit counterintuitively. Mathematical analyses revealed that underlying potential mechanisms are twofold: (1) decay-induced sustained RPE creates a gradient of 'Go' values towards a goal, and (2) value-contrasts between 'Go' and 'No-Go' are generated because while chosen values are continually updated, unchosen values simply decay. Our model provides potential explanations for the key experimental findings that suggest DA's roles in motivation: (i) slowdown of behavior by post-training blockade of DA signaling, (ii) observations that DA blockade severely impairs effortful actions to obtain rewards while largely sparing seeking of easily obtainable rewards, and (iii) relationships between the reward amount, the level of motivation reflected in the speed of behavior, and the average level of DA. These results indicate that reinforcement learning with value-decay, or forgetting, provides a parsimonious mechanistic account for the DA's roles in value-learning and motivation.
Our results also suggest that when biological systems
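
    The 'Go'/'No-Go' model with value-decay described in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the linear track of `n_states` positions, the Q-learning-style target, the logistic choice rule, the parameter values, and the function name `simulate_value_decay`. The only features drawn from the abstract are that chosen values are updated by a prediction error and that all learned values decay at every step.

```python
import math
import random


def simulate_value_decay(n_states=7, n_trials=200, alpha=0.5, gamma=0.97,
                         beta=5.0, decay=0.01, reward=1.0, seed=0):
    """Hypothetical Go/No-Go track: 'Go' moves one state closer to the goal,
    'No-Go' stays in place. All learned values decay after every step."""
    rng = random.Random(seed)
    q_go = [0.0] * n_states
    q_nogo = [0.0] * n_states
    steps_per_trial = []
    for _ in range(n_trials):
        s, steps = 0, 0
        while s < n_states:
            # Logistic (two-option softmax) choice between Go and No-Go.
            p_go = 1.0 / (1.0 + math.exp(-beta * (q_go[s] - q_nogo[s])))
            go = rng.random() < p_go
            s_next = s + 1 if go else s
            if s_next == n_states:
                target = reward  # goal reached, episode ends
            else:
                target = gamma * max(q_go[s_next], q_nogo[s_next])
            # Update only the chosen action with the prediction error.
            if go:
                q_go[s] += alpha * (target - q_go[s])
            else:
                q_nogo[s] += alpha * (target - q_nogo[s])
            # Value-decay (forgetting): every stored value shrinks.
            q_go = [(1.0 - decay) * v for v in q_go]
            q_nogo = [(1.0 - decay) * v for v in q_nogo]
            s = s_next
            steps += 1
        steps_per_trial.append(steps)
    return q_go, q_nogo, steps_per_trial


if __name__ == "__main__":
    go_values, nogo_values, steps = simulate_value_decay()
    print("Go values along the track:", [round(v, 3) for v in go_values])
    print("Mean steps to goal (last 50 trials):", sum(steps[-50:]) / 50)
```

    Running the sketch with `decay=0.0` and with a positive `decay` gives a rough way to compare steps-to-goal under the two settings; whether the comparison reproduces the paper's findings depends on parameters that this sketch does not take from the source.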

  11. Designing multi-targeted agents: An emerging anticancer drug discovery paradigm.

    Science.gov (United States)

    Fu, Rong-Geng; Sun, Yuan; Sheng, Wen-Bing; Liao, Duan-Fang

    2017-08-18

    The dominant paradigm in drug discovery is to design ligands with maximum selectivity to act on individual drug targets. With the target-based approach, many new chemical entities have been discovered, developed, and further approved as drugs. However, there are a large number of complex diseases such as cancer that cannot be effectively treated or cured only with one medicine to modulate the biological function of a single target. As simultaneous intervention of two (or multiple) cancer progression relevant targets has shown improved therapeutic efficacy, the innovation of multi-targeted drugs has become a promising and prevailing research topic and numerous multi-targeted anticancer agents are currently at various developmental stages. However, most multi-pharmacophore scaffolds are usually discovered by serendipity or screening, while rational design by combining existing pharmacophore scaffolds remains an enormous challenge. In this review, four types of multi-pharmacophore modes are discussed, and the examples from literature will be used to introduce attractive lead compounds with the capability of simultaneously interfering with different enzyme or signaling pathway of cancer progression, which will reveal the trends and insights to help the design of the next generation multi-targeted anticancer agents. Copyright © 2017 Elsevier Masson SAS. All rights reserved.

  12. Pedagogical Agents as Learning Companions: The Impact of Agent Emotion and Gender

    Science.gov (United States)

    Kim, Yanghee; Baylor, A. L.; Shen, E.

    2007-01-01

    The potential of emotional interaction between human and computer has recently interested researchers in human-computer interaction. The instructional impact of this interaction in learning environments has not been established, however. This study examined the impact of emotion and gender of a pedagogical agent as a learning companion (PAL) on…

  13. An Empirical Study of AI Population Dynamics with Million-agent Reinforcement Learning

    OpenAIRE

    Yang, Yaodong; Yu, Lantao; Bai, Yiwei; Wang, Jun; Zhang, Weinan; Wen, Ying; Yu, Yong

    2017-01-01

    In this paper, we conduct an empirical study on discovering the ordered collective dynamics obtained by a population of artificial intelligence (AI) agents. Our intention is to put AI agents into a simulated natural context, and then to understand their induced dynamics at the population level. In particular, we aim to verify if the principles developed in the real world could also be used in understanding an artificially-created intelligent population. To achieve this, we simulate a large-sc...

  14. Vicarious reinforcement learning signals when instructing others.

    Science.gov (United States)

    Apps, Matthew A J; Lesage, Elise; Ramnani, Narender

    2015-02-18

    Reinforcement learning (RL) theory posits that learning is driven by discrepancies between the predicted and actual outcomes of actions (prediction errors [PEs]). In social environments, learning is often guided by similar RL mechanisms. For example, teachers monitor the actions of students and provide feedback to them. This feedback evokes PEs in students that guide their learning. We report the first study that investigates the neural mechanisms that underpin RL signals in the brain of a teacher. Neurons in the anterior cingulate cortex (ACC) signal PEs when learning from the outcomes of one's own actions but also signal information when outcomes are received by others. Does a teacher's ACC signal PEs when monitoring a student's learning? Using fMRI, we studied brain activity in human subjects (teachers) as they taught a confederate (student) action-outcome associations by providing positive or negative feedback. We examined activity time-locked to the students' responses, when teachers infer student predictions and know actual outcomes. We fitted an RL-based computational model to the behavior of the student to characterize their learning, and examined whether a teacher's ACC signals when a student's predictions are wrong. In line with our hypothesis, activity in the teacher's ACC covaried with the PE values in the model. Additionally, activity in the teacher's insula and ventromedial prefrontal cortex covaried with the predicted value according to the student. Our findings highlight that the ACC signals PEs vicariously for others' erroneous predictions, when monitoring and instructing their learning. These results suggest that RL mechanisms, processed vicariously, may underpin and facilitate teaching behaviors. Copyright © 2015 Apps et al.

  15. An Immune Agent for Web-Based AI Course

    Science.gov (United States)

    Gong, Tao; Cai, Zixing

    2006-01-01

    To overcome weakness and faults of a web-based e-learning course such as Artificial Intelligence (AI), an immune agent was proposed, simulating a natural immune mechanism against a virus. The immune agent was built on the multi-dimension education agent model and immune algorithm. The web-based AI course was comprised of many files, such as HTML…

  16. Diagnosis of multi-agent systems and its application to public administration

    NARCIS (Netherlands)

    Boer, A.; van Engers, T.; Abramowicz, W.; Maciaszek, L.; Węcel, K.

    2011-01-01

    In this paper we present a model-based diagnosis view on the complex social systems in which large public administration organizations operate. The purpose of diagnosis as presented in this paper is to identify agent role instances that are not conforming to expectations in a multi-agent system

  17. Controllability of multi-agent systems with time-delay in state and switching topology

    Science.gov (United States)

    Ji, Zhijian; Wang, Zidong; Lin, Hai; Wang, Zhen

    2010-02-01

    In this article, the controllability issue is addressed for an interconnected system of multiple agents. The network associated with the system is of the leader-follower structure, with some agents taking the leader role and others being followers interconnected via the neighbour-based rule. Sufficient conditions are derived for the controllability of multi-agent systems with time-delay in state, and a graph-based uncontrollability topology structure is revealed. Both single and double integrator dynamics are considered. For switching topology, two algebraic necessary and sufficient conditions are derived for the controllability of multi-agent systems. Several examples are also presented to illustrate how to control the system to shape into the desired configurations.

  18. Project Based Learning in Multi-Grade Class

    Science.gov (United States)

    Ciftci, Sabahattin; Baykan, Ayse Aysun

    2013-01-01

    The purpose of this study is to evaluate project based learning in multi-grade classes. This study, based on a student-centered learning approach, aims to analyze students' and parents' interpretations. The study was done in a primary village school belonging to the Centre of Batman, already adapting multi-grade classes in their education system,…

  19. A multi-agent design for a pressurized water reactor (P.W.R.) control system

    International Nuclear Information System (INIS)

    Aimar-Lichtenberger, M.

    1999-01-01

    This PhD work addresses the control of complex industrial processes. The starting point is an analysis of the control principles of a Pressurized Water Reactor (P.W.R.). To cope with the limits of the present control procedures, a new control organisation by objectives and means is defined. This functional organisation is based on the state approach and is characterized by the parallel management of control functions to ensure continuous control of the installation's essential variables. Given the complexity of this system, we look for the best-suited computer model. We show that a multi-agent system approach offers an interesting way to manage the distribution and parallelism of control decisions and tasks. We present a synthetic study of multi-agent systems and their application fields. The choice of a multi-agent approach leads to the design of an agent model, which draws on experience from other applications. This model is implemented in a computer environment that combines the mechanisms of an object language with Prolog. Within this framework, we propose a multi-agent model of the control system in which each function is represented by an agent. The agents are structured in a hierarchical organisation and deal with different abstraction levels of the problem. Following a prototyping process, validation is carried out through an implementation coupled to a reactor simulator. The essential contributions of the agent approach concern mastery of the system's complexity, openness, robustness, and the potential for human-machine cooperation. (author)

  20. Amygdala and ventral striatum make distinct contributions to reinforcement learning

    Science.gov (United States)

    Costa, Vincent D.; Monte, Olga Dal; Lucas, Daniel R.; Murray, Elisabeth A.; Averbeck, Bruno B.

    2016-01-01

    Reinforcement learning (RL) theories posit that dopaminergic signals are integrated within the striatum to associate choices with outcomes. Often overlooked is that the amygdala also receives dopaminergic input and is involved in Pavlovian processes that influence choice behavior. To determine the relative contributions of the ventral striatum (VS) and amygdala to appetitive RL we tested rhesus macaques with VS or amygdala lesions on deterministic and stochastic versions of a two-arm bandit reversal learning task. When learning was characterized with an RL model relative to controls, amygdala lesions caused general decreases in learning from positive feedback and choice consistency. By comparison, VS lesions only affected learning in the stochastic task. Moreover, the VS lesions hastened the monkeys' choice reaction times, which emphasized a speed-accuracy tradeoff that accounted for errors in deterministic learning. These results update standard accounts of RL by emphasizing distinct contributions of the amygdala and VS to RL. PMID:27720488

  1. Perturbation of Fractional Multi-Agent Systems in Cloud Entropy Computing

    Directory of Open Access Journals (Sweden)

    Rabha W. Ibrahim

    2016-01-01

    A perturbed multi-agent system is a scheme composed of multiple networking agents within a location. This scheme can be used to discuss problems that are impossible or difficult for a specific agent to solve. Intelligence cloud entropy management systems involve functions, methods, procedural approaches, and algorithms. In this study, we introduce a new perturbed algorithm based on the fractional Poisson process. The discrete dynamics are suggested by using fractional entropy and fractional type Tsallis entropy. Moreover, we study the algorithm stability.

  2. Multi-Agent Design and Implementation for an Online Peer Help System

    Science.gov (United States)

    Meng, Anbo

    2014-01-01

    With the rapid advance of e-learning, the online peer help is playing increasingly important role. This paper explores the application of MAS to an online peer help system (MAPS). In the design phase, the architecture of MAPS is proposed, which consists of a set of agents including the personal agent, the course agent, the diagnosis agent, the DF…

  3. Multi-instance dictionary learning via multivariate performance measure optimization

    KAUST Repository

    Wang, Jim Jing-Yan

    2016-12-29

    The multi-instance dictionary plays a critical role in multi-instance data representation. Meanwhile, different multi-instance learning applications are evaluated by specific multivariate performance measures. For example, multi-instance ranking reports the precision and recall. It is not difficult to see that to obtain different optimal performance measures, different dictionaries are needed. This observation motivates us to learn performance-optimal dictionaries for this problem. In this paper, we propose a novel joint framework for learning the multi-instance dictionary and the classifier to optimize a given multivariate performance measure, such as the F1 score and precision at rank k. We propose to represent the bags as bag-level features via the bag-instance similarity, and learn a classifier in the bag-level feature space to optimize the given performance measure. We propose to minimize the upper bound of a multivariate loss corresponding to the performance measure, the complexity of the classifier, and the complexity of the dictionary, simultaneously, with regard to both the dictionary and the classifier parameters. In this way, the dictionary learning is regularized by the performance optimization, and a performance-optimal dictionary is obtained. We develop an iterative algorithm to solve this minimization problem efficiently using a cutting-plane algorithm and a coordinate descent method. Experiments on multi-instance benchmark data sets show its advantage over both traditional multi-instance learning and performance optimization methods.

  4. Multi-instance dictionary learning via multivariate performance measure optimization

    KAUST Repository

    Wang, Jim Jing-Yan; Tsang, Ivor Wai-Hung; Cui, Xuefeng; Lu, Zhiwu; Gao, Xin

    2016-01-01

    The multi-instance dictionary plays a critical role in multi-instance data representation. Meanwhile, different multi-instance learning applications are evaluated by specific multivariate performance measures. For example, multi-instance ranking reports the precision and recall. It is not difficult to see that to obtain different optimal performance measures, different dictionaries are needed. This observation motivates us to learn performance-optimal dictionaries for this problem. In this paper, we propose a novel joint framework for learning the multi-instance dictionary and the classifier to optimize a given multivariate performance measure, such as the F1 score and precision at rank k. We propose to represent the bags as bag-level features via the bag-instance similarity, and learn a classifier in the bag-level feature space to optimize the given performance measure. We propose to minimize the upper bound of a multivariate loss corresponding to the performance measure, the complexity of the classifier, and the complexity of the dictionary, simultaneously, with regard to both the dictionary and the classifier parameters. In this way, the dictionary learning is regularized by the performance optimization, and a performance-optimal dictionary is obtained. We develop an iterative algorithm to solve this minimization problem efficiently using a cutting-plane algorithm and a coordinate descent method. Experiments on multi-instance benchmark data sets show its advantage over both traditional multi-instance learning and performance optimization methods.

  5. Active-learning strategies: the use of a game to reinforce learning in nursing education. A case study.

    Science.gov (United States)

    Boctor, Lisa

    2013-03-01

    The majority of nursing students are kinesthetic learners, preferring a hands-on, active approach to education. Research shows that active-learning strategies can increase student learning and satisfaction. This study looks at the use of one active-learning strategy, a Jeopardy-style game, 'Nursopardy', to reinforce Fundamentals of Nursing material, aiding in students' preparation for a standardized final exam. The game was created keeping students' varied learning styles and the NCLEX blueprint in mind. The blueprint was used to create 5 categories, with 26 total questions. Student survey results, using a five-point Likert scale, showed that they did find this learning method enjoyable and beneficial to learning. More research is recommended regarding learning outcomes when using active-learning strategies such as games. Copyright © 2012 Elsevier Ltd. All rights reserved.

  6. Cooperative epistemic multi-agent planning for implicit coordination

    DEFF Research Database (Denmark)

    Engesser, Thorsten; Bolander, Thomas; Mattmüller, Robert

    2017-01-01

    framework to include perspective shifts, allowing us to define new notions of sequential and conditional planning with implicit coordination. With these, it is possible to solve planning tasks with joint goals in a decentralized manner without the agents having to negotiate about and commit to a joint...... policy at plan time. First we define the central planning notions and sketch the implementation of a planning system built on those notions. Afterwards we provide some case studies in order to evaluate the planner empirically and to show that the concept is useful for multi-agent systems in practice....

  7. Formation of Robust Multi-Agent Networks through Self-Organizing Random Regular Graphs

    KAUST Repository

    Yasin Yazicioǧlu, A.; Egerstedt, Magnus; Shamma, Jeff S.

    2015-01-01

    Multi-agent networks are often modeled as interaction graphs, where the nodes represent the agents and the edges denote some direct interactions. The robustness of a multi-agent network to perturbations such as failures, noise, or malicious attacks largely depends on the corresponding graph. In many applications, networks are desired to have well-connected interaction graphs with a relatively small number of links. One family of such graphs is the random regular graphs. In this paper, we present a decentralized scheme for transforming any connected interaction graph with a possibly non-integer average degree of k into a connected random m-regular graph for some m ∈ [k, k+2]. Accordingly, the agents improve the robustness of the network while maintaining a similar number of links as the initial configuration by locally adding or removing some edges. © 2015 IEEE.

  8. Formation of Robust Multi-Agent Networks through Self-Organizing Random Regular Graphs

    KAUST Repository

    Yasin Yazicioǧlu, A.

    2015-11-25

    Multi-agent networks are often modeled as interaction graphs, where the nodes represent the agents and the edges denote some direct interactions. The robustness of a multi-agent network to perturbations such as failures, noise, or malicious attacks largely depends on the corresponding graph. In many applications, networks are desired to have well-connected interaction graphs with a relatively small number of links. One family of such graphs is the random regular graphs. In this paper, we present a decentralized scheme for transforming any connected interaction graph with a possibly non-integer average degree of k into a connected random m-regular graph for some m ∈ [k, k+2]. Accordingly, the agents improve the robustness of the network while maintaining a similar number of links as the initial configuration by locally adding or removing some edges. © 2015 IEEE.

  9. Adolescent-specific patterns of behavior and neural activity during social reinforcement learning

    OpenAIRE

    Jones, Rebecca M.; Somerville, Leah H.; Li, Jian; Ruberry, Erika J.; Powers, Alisa; Mehta, Natasha; Dyke, Jonathan; Casey, BJ

    2014-01-01

    Humans are sophisticated social beings. Social cues from others are exceptionally salient, particularly during adolescence. Understanding how adolescents interpret and learn from variable social signals can provide insight into the observed shift in social sensitivity during this period. The current study tested 120 participants between the ages of 8 and 25 years on a social reinforcement learning task where the probability of receiving positive social feedback was parametrically manipulated....

  10. Design of a Multi Agent Architecture for Robot Soccer. A Case Study

    NARCIS (Netherlands)

    Poel, Mannes; Seesink, R.A.; Schoute, Albert L.; Dierssen, W.; Kooij, N.

    A Multi Agent System (MAS) for the FIRA Mirosot League is presented. This MAS allows a general number of players and is used in the 5 against 5 and 7 against 7 competitions. In the MAS there is a coach agent and n (the number of robots in the team) player agents. There is a one to one correspondence

  11. Consensus of heterogeneous multi-agent systems based on sampled data with a small sampling delay

    International Nuclear Information System (INIS)

    Wang Na; Wu Zhi-Hai; Peng Li

    2014-01-01

    In this paper, consensus problems of heterogeneous multi-agent systems based on sampled data with a small sampling delay are considered. First, a consensus protocol based on sampled data with a small sampling delay for heterogeneous multi-agent systems is proposed. Then, algebraic graph theory, the matrix method, the stability theory of linear systems, and some other techniques are employed to derive the necessary and sufficient conditions guaranteeing that heterogeneous multi-agent systems asymptotically achieve the stationary consensus. Finally, simulations are performed to demonstrate the correctness of the theoretical results. (interdisciplinary physics and related areas of science and technology)

  12. FATMAS: a methodology to design fault-tolerant multi-agent systems

    OpenAIRE

    Mellouli, Sehl

    2005-01-01

    A multi-agent system (MAS) is a system in which several agents operate and interact. Each agent is responsible for executing tasks. However, each agent may, for various reasons, encounter problems while executing its tasks, which can cause the MAS to malfunction. The MAS must nevertheless be able to detect the sources of problems (errors) in order to control them and thus continue executing correctly. Such a MAS is called a...

  13. High and low temperatures have unequal reinforcing properties in Drosophila spatial learning.

    Science.gov (United States)

    Zars, Melissa; Zars, Troy

    2006-07-01

    Small insects regulate their body temperature solely through behavior. Thus, sensing environmental temperature and implementing an appropriate behavioral strategy can be critical for survival. The fly Drosophila melanogaster prefers 24 degrees C, avoiding higher and lower temperatures when tested on a temperature gradient. Furthermore, temperatures above 24 degrees C have negative reinforcing properties. In contrast, we found that flies have a preference in operant learning experiments for a low-temperature-associated position rather than the 24 degrees C alternative in the heat-box. Two additional differences between high- and low-temperature reinforcement, i.e., temperatures above and below 24 degrees C, were found. Temperatures equally above and below 24 degrees C did not reinforce equally and only high temperatures supported increased memory performance with reversal conditioning. Finally, low- and high-temperature reinforced memories are similarly sensitive to two genetic mutations. Together these results indicate the qualitative meaning of temperatures below 24 degrees C depends on the dynamics of the temperatures encountered and that the reinforcing effects of these temperatures depend on at least some common genetic components. Conceptualizing these results using the Wolf-Heisenberg model of operant conditioning, we propose the maximum difference in experienced temperatures determines the magnitude of the reinforcement input to a conditioning circuit.

  14. Spared internal but impaired external reward prediction error signals in major depressive disorder during reinforcement learning.

    Science.gov (United States)

    Bakic, Jasmina; Pourtois, Gilles; Jepma, Marieke; Duprat, Romain; De Raedt, Rudi; Baeken, Chris

    2017-01-01

    Major depressive disorder (MDD) creates debilitating effects on a wide range of cognitive functions, including reinforcement learning (RL). In this study, we sought to assess whether reward processing as such, or alternatively the complex interplay between motivation and reward, might potentially account for the abnormal reward-based learning in MDD. A total of 35 treatment-resistant MDD patients and 44 age-matched healthy controls (HCs) performed a standard probabilistic learning task. RL was titrated using behavioral, computational modeling and event-related brain potentials (ERPs) data. MDD patients showed a learning rate comparable to that of HCs. However, they showed decreased lose-shift responses as well as blunted subjective evaluations of the reinforcers used during the task, relative to HCs. Moreover, MDD patients showed normal internal (at the level of error-related negativity, ERN) but abnormal external (at the level of feedback-related negativity, FRN) reward prediction error (RPE) signals during RL, selectively when additional efforts had to be made to establish learning. Collectively, these results lend support to the assumption that MDD does not impair reward processing per se during RL. Instead, it seems to alter the processing of the emotional value of (external) reinforcers during RL, when additional intrinsic motivational processes have to be engaged. © 2016 Wiley Periodicals, Inc.

  15. Synchronization of multi-agent systems with metric-topological interactions.

    Science.gov (United States)

    Wang, Lin; Chen, Guanrong

    2016-09-01

    A hybrid multi-agent systems model integrating the advantages of both metric interaction and topological interaction rules, called the metric-topological model, is developed. This model describes planar motions of mobile agents, where each agent can interact with all the agents within a circle of a constant radius, and can furthermore interact with some distant agents to reach a pre-assigned number of neighbors, if needed. Some sufficient conditions imposed only on system parameters and agent initial states are presented, which ensure synchronization of the whole group of agents. The analysis reveals the intrinsic relationships among the interaction range, the speed, the initial heading, and the density of the group. Moreover, robustness against variations of interaction range, density, and speed is investigated by comparing the motion patterns and performances of the hybrid metric-topological interaction model with the conventional metric-only and topological-only interaction models. In practically all cases, the hybrid metric-topological interaction model has the best performance in the sense of achieving the highest frequency of synchronization, the fastest convergence rate, and the smallest heading difference.
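
    The neighbor rule described in this abstract (all agents within a fixed radius, topped up with distant agents until a pre-assigned neighbor count is reached) can be sketched as follows. This is an illustrative reading only: the function name, the argument names, and the choice to top up with the nearest out-of-range agents are assumptions, since the abstract does not say which distant agents are selected.

```python
import math


def metric_topological_neighbors(positions, i, radius, min_neighbors):
    """Return the indices of agent i's neighbors under a hybrid rule.

    Metric part: every other agent within `radius` of agent i.
    Topological part (assumed here): if fewer than `min_neighbors` agents
    are in range, add the nearest out-of-range agents until the count
    is reached (or no agents remain).
    """
    xi, yi = positions[i]
    # Other agents sorted by planar distance to agent i.
    by_distance = sorted(
        (math.hypot(x - xi, y - yi), j)
        for j, (x, y) in enumerate(positions)
        if j != i
    )
    in_range = [j for d, j in by_distance if d <= radius]
    if len(in_range) >= min_neighbors:
        return in_range
    # The list is sorted, so this prefix holds all in-range agents
    # plus the closest distant ones.
    return [j for _, j in by_distance[:min_neighbors]]


if __name__ == "__main__":
    agents = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (9.0, 0.0)]
    print(metric_topological_neighbors(agents, 0, radius=2.0, min_neighbors=2))
```

    Setting `min_neighbors=0` reduces the rule to a metric-only model, and setting `radius=0` reduces it to a topological-only (nearest-neighbors) model, which mirrors the two baselines the abstract compares against.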

  16. Sample efficient multiagent learning in the presence of Markovian agents

    CERN Document Server

    Chakraborty, Doran

    2014-01-01

    The problem of Multiagent Learning (or MAL) is concerned with the study of how intelligent entities can learn and adapt in the presence of other such entities that are simultaneously adapting. The problem is often studied in the stylized settings provided by repeated matrix games (a.k.a. normal form games). The goal of this book is to develop MAL algorithms for such a setting that achieve a new set of objectives which have not been previously achieved. In particular this book deals with learning in the presence of a new class of agent behavior that has not been studied or modeled before in a MAL context: Markovian agent behavior. Several new challenges arise when interacting with this particular class of agents. The book takes a series of steps towards building completely autonomous learning algorithms that maximize utility while interacting with such agents. Each algorithm is meticulously specified with a thorough formal treatment that elucidates its key theoretical properties.

  17. Multi-dimensional information diffusion and balancing market supply: an agent-based approach

    NARCIS (Netherlands)

    Osinga, S.A.; Kramer, M.R.; Hofstede, G.J.; Beulens, A.J.M.

    2013-01-01

    This agent-based information management model is designed to explore how multi-dimensional information, spreading through a population of agents (for example farmers) affects market supply. Farmers make quality decisions that must be aligned with available markets. Markets distinguish themselves by

  18. Coordination between Generation and Transmission Maintenance Scheduling by Means of Multi-agent Technique

    Science.gov (United States)

    Nagata, Takeshi; Tao, Yasuhiro; Utatani, Masahiro; Sasaki, Hiroshi; Fujita, Hideki

    This paper proposes a multi-agent approach to maintenance scheduling in restructured power systems. The restructuring of the electric power industry has resulted in market-based approaches for unbundling a multitude of services provided by self-interested entities such as power generating companies (GENCOs), transmission providers (TRANSCOs) and distribution companies (DISCOs). The Independent System Operator (ISO) is responsible for the security of the system operation. The schedules submitted to the ISO by GENCOs and TRANSCOs should satisfy security and reliability constraints. The proposed method consists of several GENCO Agents (GAGs), TRANSCO Agents (TAGs) and an ISO Agent (IAG). The IAG's role in maintenance scheduling is limited to ensuring that the submitted schedules do not cause transmission congestion or endanger the system reliability. The simulation results show that the proposed multi-agent approach can coordinate generation and transmission maintenance schedules.

  19. Courseware Development with Animated Pedagogical Agents in Learning System to Improve Learning Motivation

    Science.gov (United States)

    Chin, Kai-Yi; Hong, Zeng-Wei; Huang, Yueh-Min; Shen, Wei-Wei; Lin, Jim-Min

    2016-01-01

    The addition of animated pedagogical agents (APAs) in computer-assisted learning (CAL) systems could successfully enhance students' learning motivation and engagement in learning activities. Conventionally, APA-incorporated multimedia materials are constructed through the cooperation of teachers and software programmers. However, the thinking…

  20. Multi-agent Pareto appointment exchanging in hospital patient scheduling

    NARCIS (Netherlands)

    I.B. Vermeulen (Ivan); S.M. Bohte (Sander); D.J.A. Somefun (Koye); J.A. La Poutré (Han)

    2007-01-01

    We present a dynamic and distributed approach to the hospital patient scheduling problem, in which patients can have multiple appointments that have to be scheduled to different resources. To efficiently solve this problem we develop a multi-agent Pareto-improvement appointment

  1. Multi-agent Pareto appointment exchanging in hospital patient scheduling

    NARCIS (Netherlands)

    Vermeulen, I.B.; Bohté, S.M.; Somefun, D.J.A.; Poutré, La J.A.

    2007-01-01

    We present a dynamic and distributed approach to the hospital patient scheduling problem, in which patients can have multiple appointments that have to be scheduled to different resources. To efficiently solve this problem we develop a multi-agent Pareto-improvement appointment exchanging algorithm:

  2. Best Response Bayesian Reinforcement Learning for Multiagent Systems with State Uncertainty

    NARCIS (Netherlands)

    Oliehoek, F.A.; Amato, C.

    2014-01-01

    It is often assumed that agents in multiagent systems with state uncertainty have full knowledge of the model of dynamics and sensors, but in many cases this is not feasible. A more realistic assumption is that agents must learn about the environment and other agents while acting. Bayesian methods

  3. An Intelligent Fleet Condition-Based Maintenance Decision Making Method Based on Multi-Agent

    OpenAIRE

    Bo Sun; Qiang Feng; Songjie Li

    2012-01-01

    According to the demand for condition-based maintenance online decision making among a mission oriented fleet, an intelligent maintenance decision making method based on Multi-agent and heuristic rules is proposed. The process of condition-based maintenance within an aircraft fleet (each containing one or more Line Replaceable Modules) based on multiple maintenance thresholds is analyzed. Then the process is abstracted into a Multi-Agent Model, a 2-layer model structure containing host negoti...

  4. Reinforcement learning signals in the human striatum distinguish learners from nonlearners during reward-based decision making.

    Science.gov (United States)

    Schönberg, Tom; Daw, Nathaniel D; Joel, Daphna; O'Doherty, John P

    2007-11-21

    The computational framework of reinforcement learning has been used to forward our understanding of the neural mechanisms underlying reward learning and decision-making behavior. It is known that humans vary widely in their performance in decision-making tasks. Here, we used a simple four-armed bandit task in which subjects are almost evenly split into two groups on the basis of their performance: those who do learn to favor choice of the optimal action and those who do not. Using models of reinforcement learning we sought to determine the neural basis of these intrinsic differences in performance by scanning both groups with functional magnetic resonance imaging. We scanned 29 subjects while they performed the reward-based decision-making task. Our results suggest that these two groups differ markedly in the degree to which reinforcement learning signals in the striatum are engaged during task performance. While the learners showed robust prediction error signals in both the ventral and dorsal striatum during learning, the nonlearner group showed a marked absence of such signals. Moreover, the magnitude of prediction error signals in a region of dorsal striatum correlated significantly with a measure of behavioral performance across all subjects. These findings support a crucial role of prediction error signals, likely originating from dopaminergic midbrain neurons, in enabling learning of action selection preferences on the basis of obtained rewards. Thus, spontaneously observed individual differences in decision making performance demonstrate the suggested dependence of this type of learning on the functional integrity of the dopaminergic striatal system in humans.
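
    The prediction error signal discussed in this abstract can be illustrated with a generic delta-rule learner on a four-armed bandit. This is a minimal sketch only: the abstract does not specify the study's actual model, so the function name `run_bandit`, the epsilon-greedy choice rule, and the parameters `alpha` and `epsilon` are all assumptions made for illustration.

```python
import random


def run_bandit(reward_probs, alpha=0.1, epsilon=0.1, trials=2000, seed=0):
    """Delta-rule value learning on a multi-armed bandit (illustrative only).

    reward_probs: probability of a unit reward for each arm (hypothetical values).
    Returns the learned value per arm and the list of prediction errors.
    """
    rng = random.Random(seed)
    values = [0.0] * len(reward_probs)
    errors = []
    for _ in range(trials):
        # Epsilon-greedy action selection (an assumed choice rule).
        if rng.random() < epsilon:
            arm = rng.randrange(len(values))
        else:
            arm = max(range(len(values)), key=values.__getitem__)
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        # Prediction error: obtained reward minus expected reward.
        delta = reward - values[arm]
        values[arm] += alpha * delta
        errors.append(delta)
    return values, errors


if __name__ == "__main__":
    learned, deltas = run_bandit([0.2, 0.4, 0.6, 0.8])
    print("learned values:", [round(v, 2) for v in learned])
    print("mean |prediction error|:", sum(abs(d) for d in deltas) / len(deltas))
```

    In a sketch like this, a simulated agent whose updates ignore `delta` would never come to favor the best arm, which loosely mirrors the learner versus nonlearner contrast the abstract describes.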

  5. Design and simulation of material-integrated distributed sensor processing with a code-based agent platform and mobile multi-agent systems.

    Science.gov (United States)

    Bosse, Stefan

    2015-02-16

    Multi-agent systems (MAS) can be used for decentralized and self-organizing data processing in a distributed system, like a resource-constrained sensor network, enabling distributed information extraction, for example, based on pattern recognition and self-organization, by decomposing complex tasks in simpler cooperative agents. Reliable MAS-based data processing approaches can aid the material-integration of structural-monitoring applications, with agent processing platforms scaled to the microchip level. The agent behavior, based on a dynamic activity-transition graph (ATG) model, is implemented with program code storing the control and the data state of an agent, which is novel. The program code can be modified by the agent itself using code morphing techniques and is capable of migrating in the network between nodes. The program code is a self-contained unit (a container) and embeds the agent data, the initialization instructions and the ATG behavior implementation. The microchip agent processing platform used for the execution of the agent code is a standalone multi-core stack machine with a zero-operand instruction format, leading to a small-sized agent program code, low system complexity and high system performance. The agent processing is token-queue-based, similar to Petri-nets. The agent platform can be implemented in software, too, offering compatibility at the operational and code level, supporting agent processing in strong heterogeneous networks. In this work, the agent platform embedded in a large-scale distributed sensor network is simulated at the architectural level by using agent-based simulation techniques.

  6. Design and Simulation of Material-Integrated Distributed Sensor Processing with a Code-Based Agent Platform and Mobile Multi-Agent Systems

    Directory of Open Access Journals (Sweden)

    Stefan Bosse

    2015-02-01

    Multi-agent systems (MAS) can be used for decentralized and self-organizing data processing in a distributed system, like a resource-constrained sensor network, enabling distributed information extraction, for example, based on pattern recognition and self-organization, by decomposing complex tasks in simpler cooperative agents. Reliable MAS-based data processing approaches can aid the material-integration of structural-monitoring applications, with agent processing platforms scaled to the microchip level. The agent behavior, based on a dynamic activity-transition graph (ATG) model, is implemented with program code storing the control and the data state of an agent, which is novel. The program code can be modified by the agent itself using code morphing techniques and is capable of migrating in the network between nodes. The program code is a self-contained unit (a container) and embeds the agent data, the initialization instructions and the ATG behavior implementation. The microchip agent processing platform used for the execution of the agent code is a standalone multi-core stack machine with a zero-operand instruction format, leading to a small-sized agent program code, low system complexity and high system performance. The agent processing is token-queue-based, similar to Petri-nets. The agent platform can be implemented in software, too, offering compatibility at the operational and code level, supporting agent processing in strong heterogeneous networks. In this work, the agent platform embedded in a large-scale distributed sensor network is simulated at the architectural level by using agent-based simulation techniques.

  7. Optimized Sensor Network and Multi-Agent Decision Support for Smart Traffic Light Management.

    Science.gov (United States)

    Cruz-Piris, Luis; Rivera, Diego; Fernandez, Susel; Marsa-Maestre, Ivan

    2018-02-02

    One of the biggest challenges in modern societies is to solve vehicular traffic problems. Sensor networks in traffic environments have contributed to improving the decision-making process of Intelligent Transportation Systems. However, one of the limiting factors for the effectiveness of these systems is in the deployment of sensors to provide accurate information about the traffic. Our proposal is to use the centrality measurement of a graph as a basis for finding the best locations for sensor installation in a traffic network. After integrating these sensors in a simulation scenario, we define a Multi-Agent System composed of three types of agents: traffic light management agents, traffic jam detection agents, and agents that control the traffic lights at an intersection. The ultimate goal of this Multi-Agent System is to improve the trip duration for vehicles in the network. To validate our solution, we have developed the needed elements for modelling the sensors and agents in the simulation environment. We have carried out experiments using the Simulation of Urban MObility (SUMO) traffic simulator and the Travel and Activity PAtterns Simulation (TAPAS) Cologne traffic scenario. The obtained results show that our proposal allows the sensor network to be reduced while still obtaining relevant information to have a global view of the environment. Finally, regarding the Multi-Agent System, we have carried out experiments that show that our proposal is able to improve on other existing solutions such as conventional traffic light management systems (static or dynamic) in terms of reduction of vehicle trip duration and reduction of the message exchange overhead in the sensor network.

  8. Theories about architecture and performance of multi-agent systems

    NARCIS (Netherlands)

    Gazendam, Henk W.M.; Jorna, René J.

    1998-01-01

    Multi-agent systems are promising as models of organization because they are based on the idea that most work in human organizations is done based on intelligence, communication, cooperation, and massive parallel processing. They offer an alternative for system theories of organization, which are

  9. Multi-agent system for energy resource scheduling of integrated microgrids in a distributed system

    International Nuclear Information System (INIS)

    Logenthiran, T.; Srinivasan, Dipti; Khambadkone, Ashwin M.

    2011-01-01

    This paper proposes a multi-agent system for energy resource scheduling of an islanded power system with distributed resources, which consists of integrated microgrids and lumped loads. Distributed intelligent multi-agent technology is applied to make the power system more reliable, efficient and capable of exploiting and integrating alternative sources of energy. The algorithm behind the proposed energy resource scheduling has three stages. The first stage is to schedule each microgrid individually to satisfy its internal demand. The next stage involves finding the best possible bids for exporting power to the network and competing in a wholesale energy market. The final stage is to reschedule each microgrid individually to satisfy the total demand, which is the sum of the internal demand and the demand resulting from the wholesale energy market simulation. The simulation results of a power system with distributed resources comprising three microgrids and five lumped loads show that the proposed multi-agent system allows efficient management of micro-sources with minimum operational cost. The case studies demonstrate that the system is successfully monitored, controlled and operated by means of the developed multi-agent system.

  10. Multi-agent system for energy resource scheduling of integrated microgrids in a distributed system

    Energy Technology Data Exchange (ETDEWEB)

    Logenthiran, T.; Srinivasan, Dipti; Khambadkone, Ashwin M. [Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117576 (Singapore)]

    2011-01-15

    This paper proposes a multi-agent system for energy resource scheduling of an islanded power system with distributed resources, which consists of integrated microgrids and lumped loads. Distributed intelligent multi-agent technology is applied to make the power system more reliable, efficient and capable of exploiting and integrating alternative sources of energy. The algorithm behind the proposed energy resource scheduling has three stages. The first stage is to schedule each microgrid individually to satisfy its internal demand. The next stage involves finding the best possible bids for exporting power to the network and competing in a wholesale energy market. The final stage is to reschedule each microgrid individually to satisfy the total demand, which is the sum of the internal demand and the demand resulting from the wholesale energy market simulation. The simulation results of a power system with distributed resources comprising three microgrids and five lumped loads show that the proposed multi-agent system allows efficient management of micro-sources with minimum operational cost. The case studies demonstrate that the system is successfully monitored, controlled and operated by means of the developed multi-agent system.
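
    The three stages named in this abstract can be sketched roughly as follows. This is not the authors' algorithm: the merit-order dispatch, the rule that each unit bids its spare capacity at its own cost, and the names `dispatch` and `three_stage_schedule` are assumptions chosen only to make the stage structure concrete.

```python
def dispatch(generators, demand):
    """Merit-order dispatch: serve demand from the cheapest units first.

    generators: list of (capacity_kw, cost_per_kwh) tuples (hypothetical data).
    Returns (output per unit, total cost).
    """
    order = sorted(range(len(generators)), key=lambda i: generators[i][1])
    outputs = [0.0] * len(generators)
    remaining = demand
    for i in order:
        capacity, _ = generators[i]
        outputs[i] = min(capacity, remaining)
        remaining -= outputs[i]
    if remaining > 1e-9:
        raise ValueError("demand exceeds installed capacity")
    cost = sum(out * generators[i][1] for i, out in enumerate(outputs))
    return outputs, cost


def three_stage_schedule(microgrids, market_demand):
    """microgrids: {name: {"generators": [...], "demand": float}}."""
    # Stage 1: schedule each microgrid individually for its internal demand.
    stage1 = {name: dispatch(mg["generators"], mg["demand"])[0]
              for name, mg in microgrids.items()}

    # Stage 2: offer spare capacity to the market (assumed bid = unit cost);
    # the market accepts the cheapest bids until the lumped load is covered.
    bids = []
    for name, mg in microgrids.items():
        for (capacity, price), out in zip(mg["generators"], stage1[name]):
            spare = capacity - out
            if spare > 0:
                bids.append((price, name, spare))
    bids.sort()
    awarded = {name: 0.0 for name in microgrids}
    remaining = market_demand
    for price, name, spare in bids:
        take = min(spare, remaining)
        awarded[name] += take
        remaining -= take

    # Stage 3: reschedule each microgrid for internal plus awarded demand.
    final = {name: dispatch(mg["generators"], mg["demand"] + awarded[name])
             for name, mg in microgrids.items()}
    return awarded, final, remaining  # remaining = unmet market demand


if __name__ == "__main__":
    grids = {
        "A": {"generators": [(50.0, 0.10), (50.0, 0.30)], "demand": 40.0},
        "B": {"generators": [(60.0, 0.20)], "demand": 30.0},
    }
    print(three_stage_schedule(grids, market_demand=30.0))
```

    A real implementation would run each stage inside separate agents that exchange messages; the single-process version here only shows how the output of one stage feeds the next.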

  11. Reinforcement Learning in Distributed Domains: Beyond Team Games

    Science.gov (United States)

    Wolpert, David H.; Sill, Joseph; Turner, Kagan

    2000-01-01

    Distributed search algorithms are crucial in dealing with large optimization problems, particularly when a centralized approach is not only impractical but infeasible. Many machine learning concepts have been applied to search algorithms in order to improve their effectiveness. In this article we present an algorithm that blends Reinforcement Learning (RL) and hill climbing directly, by using the RL signal to guide the exploration step of a hill climbing algorithm. We apply this algorithm to the domain of a constellation of communication satellites where the goal is to minimize the loss of importance weighted data. We introduce the concept of 'ghost' traffic, where correctly setting this traffic induces the satellites to act to optimize the world utility. Our results indicated that the bi-utility search introduced in this paper outperforms both traditional hill climbing algorithms and distributed RL approaches such as team games.

  12. Multi-Agent Market Modeling of Foreign Exchange Rates

    Science.gov (United States)

    Zimmermann, Georg; Neuneier, Ralph; Grothmann, Ralph

    A market mechanism is basically driven by a superposition of decisions of many agents optimizing their profit. The economic price dynamic is a consequence of the cumulated excess demand/supply created on this micro level. The behavior analysis of a small number of agents is well understood through game theory. In case of a large number of agents one may use the limiting case that an individual agent does not have an influence on the market, which allows the aggregation of agents by statistical methods. In contrast to this restriction, we can omit the assumption of an atomic market structure, if we model the market through a multi-agent approach. The contribution of the mathematical theory of neural networks to the market price formation is mostly seen on the econometric side: neural networks allow the fitting of high dimensional nonlinear dynamic models. Furthermore, in our opinion, there is a close relationship between economics and the modeling ability of neural networks because a neuron can be interpreted as a simple model of decision making. With this in mind, a neural network models the interaction of many decisions and, hence, can be interpreted as the price formation mechanism of a market.

  13. Effect of hybrid fiber reinforcement on the cracking process in fiber reinforced cementitious composites

    DEFF Research Database (Denmark)

    Pereira, Eduardo B.; Fischer, Gregor; Barros, Joaquim A.O.

    2012-01-01

    The simultaneous use of different types of fibers as reinforcement in cementitious matrix composites is typically motivated by the underlying principle of a multi-scale nature of the cracking processes in fiber reinforced cementitious composites. It has been hypothesized that while undergoing...... tensile deformations in the composite, the fibers with different geometrical and mechanical properties restrain the propagation and further development of cracking at different scales from the micro- to the macro-scale. The optimized design of the fiber reinforcing systems requires the objective...... materials is carried out by assessing directly their tensile stress-crack opening behavior. The efficiency of hybrid fiber reinforcements and the multi-scale nature of cracking processes are discussed based on the experimental results obtained, as well as the micro-mechanisms underlying the contribution...

  14. Projective Simulation compared to reinforcement learning

    OpenAIRE

    Bjerland, Øystein Førsund

    2015-01-01

    This thesis explores the model of projective simulation (PS), a novel approach for an artificial intelligence (AI) agent. The model of PS learns by interacting with the environment it is situated in, and allows for simulating actions before real action is taken. The action selection is based on a random walk through the episodic & compositional memory (ECM), which is a network of clips that represent previous experienced percepts. The network takes percepts as inpu...

  15. Balancing Two-Player Stochastic Games with Soft Q-Learning

    OpenAIRE

    Grau-Moya, Jordi; Leibfried, Felix; Bou-Ammar, Haitham

    2018-01-01

    Within the context of video games the notion of perfectly rational agents can be undesirable as it leads to uninteresting situations, where humans face tough adversarial decision makers. Current frameworks for stochastic games and reinforcement learning prohibit tuneable strategies as they seek optimal performance. In this paper, we enable such tuneable behaviour by generalising soft Q-learning to stochastic games, where more than one agent interact strategically. We contribute both theoretic...
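
    The tuneable behaviour mentioned here can be illustrated with the single-agent soft value operator, in which a temperature-like parameter interpolates between worst-case, average, and best-case evaluation of the available actions. The two-player formulation of the paper is not given in this truncated abstract, so the sketch below assumes a uniform prior over actions, and the name `soft_value` and parameter `beta` are illustrative only.

```python
import math


def soft_value(q_values, beta):
    """Soft value (1/beta) * log(mean(exp(beta * Q))) under a uniform prior.

    beta -> +inf approaches max(Q), beta -> 0 approaches mean(Q),
    beta -> -inf approaches min(Q).
    """
    if beta == 0:
        return sum(q_values) / len(q_values)
    scaled = [beta * q for q in q_values]
    shift = max(scaled)  # log-sum-exp shift for numerical stability
    log_sum = shift + math.log(sum(math.exp(s - shift) for s in scaled))
    return (log_sum - math.log(len(q_values))) / beta


if __name__ == "__main__":
    q = [1.0, 2.0, 4.0]
    for beta in (-50.0, -1.0, 0.0, 1.0, 50.0):
        print(beta, round(soft_value(q, beta), 3))
```

    Sweeping `beta` from negative to positive moves the evaluation smoothly from pessimistic to optimistic, which is one way an opponent's strength could be dialled up or down.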

  16. Cerebellar and prefrontal cortex contributions to adaptation, strategies, and reinforcement learning.

    Science.gov (United States)

    Taylor, Jordan A; Ivry, Richard B

    2014-01-01

    Traditionally, motor learning has been studied as an implicit learning process, one in which movement errors are used to improve performance in a continuous, gradual manner. The cerebellum figures prominently in this literature given well-established ideas about the role of this system in error-based learning and the production of automatized skills. Recent developments have brought into focus the relevance of multiple learning mechanisms for sensorimotor learning. These include processes involving repetition, reinforcement learning, and strategy utilization. We examine these developments, considering their implications for understanding cerebellar function and how this structure interacts with other neural systems to support motor learning. Converging lines of evidence from behavioral, computational, and neuropsychological studies suggest a fundamental distinction between processes that use error information to improve action execution or action selection. While the cerebellum is clearly linked to the former, its role in the latter remains an open question. © 2014 Elsevier B.V. All rights reserved.

  17. Formalizing Knowledge in Multi-Scale Agent-Based Simulations.

    Science.gov (United States)

    Somogyi, Endre; Sluka, James P; Glazier, James A

    2016-10-01

    Multi-scale, agent-based simulations of cellular and tissue biology are increasingly common. These simulations combine and integrate a range of components from different domains. Simulations continuously create, destroy and reorganize constituent elements causing their interactions to dynamically change. For example, the multi-cellular tissue development process coordinates molecular, cellular and tissue scale objects with biochemical, biomechanical, spatial and behavioral processes to form a dynamic network. Different domain specific languages can describe these components in isolation, but cannot describe their interactions. No current programming language is designed to represent in human readable and reusable form the domain specific knowledge contained in these components and interactions. We present a new hybrid programming language paradigm that naturally expresses the complex multi-scale objects and dynamic interactions in a unified way and allows domain knowledge to be captured, searched, formalized, extracted and reused.

  18. Incremental learning of skill collections based on intrinsic motivation

    Science.gov (United States)

    Metzen, Jan H.; Kirchner, Frank

    2013-01-01

    Life-long learning of reusable, versatile skills is a key prerequisite for embodied agents that act in a complex, dynamic environment and are faced with different tasks over their lifetime. We address the question of how an agent can learn useful skills efficiently during a developmental period, i.e., when no task is imposed on him and no external reward signal is provided. Learning of skills in a developmental period needs to be incremental and self-motivated. We propose a new incremental, task-independent skill discovery approach that is suited for continuous domains. Furthermore, the agent learns specific skills based on intrinsic motivation mechanisms that determine on which skills learning is focused at a given point in time. We evaluate the approach in a reinforcement learning setup in two continuous domains with complex dynamics. We show that an intrinsically motivated, skill learning agent outperforms an agent which learns task solutions from scratch. Furthermore, we compare different intrinsic motivation mechanisms and how efficiently they make use of the agent's developmental period. PMID:23898265

  19. Emerging medical informatics with case-based reasoning for aiding clinical decision in multi-agent system.

    Science.gov (United States)

    Shen, Ying; Colloc, Joël; Jacquet-Andrieu, Armelle; Lei, Kai

    2015-08-01

    This research aims to depict the methodological steps and tools about the combined operation of case-based reasoning (CBR) and multi-agent system (MAS) to expose the ontological application in the field of clinical decision support. The multi-agent architecture works for the consideration of the whole cycle of clinical decision-making adaptable to many medical aspects such as the diagnosis, prognosis, treatment, therapeutic monitoring of gastric cancer. In the multi-agent architecture, the ontological agent type employs the domain knowledge to ease the extraction of similar clinical cases and provide treatment suggestions to patients and physicians. Ontological agent is used for the extension of domain hierarchy and the interpretation of input requests. Case-based reasoning memorizes and restores experience data for solving similar problems, with the help of matching approach and defined interfaces of ontologies. A typical case is developed to illustrate the implementation of the knowledge acquisition and restitution of medical experts. Copyright © 2015 Elsevier Inc. All rights reserved.

  20. Multi-Agent Model of Trust in a Human Game

    NARCIS (Netherlands)

    Bosse, T.; Jonker, C.M.; Meij, L. van der; Robu, V.; Treur, J.; Calisti, M.; Klusch, M.; Unland, R.

    2005-01-01

    This paper presents a System for Analysis of Multi-Issue Negotiation (SAMIN). The agents in this system conduct one-to-one negotiations, in which the values across multiple issues are negotiated on simultaneously. It is demonstrated how the system supports both automated negotiation (i.e., conducted

  1. Learning Networks: connecting people, organizations, autonomous agents and learning resources to establish the emergence of effective lifelong learning

    NARCIS (Netherlands)

    Koper, Rob; Sloep, Peter

    2003-01-01

    Koper, E.J.R., Sloep, P.B. (2002) Learning Networks connecting people, organizations, autonomous agents and learning resources to establish the emergence of effective lifelong learning. RTD Programma into Learning Technologies 2003-2008. More is different… Heerlen, Nederland: Open Universiteit

  2. A flocking algorithm for multi-agent systems with connectivity preservation under hybrid metric-topological interactions.

    Science.gov (United States)

    He, Chenlong; Feng, Zuren; Ren, Zhigang

    2018-01-01

    In this paper, we propose a connectivity-preserving flocking algorithm for multi-agent systems in which the neighbor set of each agent is determined by the hybrid metric-topological distance so that the interaction topology can be represented as the range-limited Delaunay graph, which combines the properties of the commonly used disk graph and Delaunay graph. As a result, the proposed flocking algorithm has the following advantages over the existing ones. First, range-limited Delaunay graph is sparser than the disk graph so that the information exchange among agents is reduced significantly. Second, some links irrelevant to the connectivity can be dynamically deleted during the evolution of the system. Thus, the proposed flocking algorithm is more flexible than existing algorithms, where links are not allowed to be disconnected once they are created. Finally, the multi-agent system spontaneously generates a regular quasi-lattice formation without imposing the constraint on the ratio of the sensing range of the agent to the desired distance between two adjacent agents. With the interaction topology induced by the hybrid distance, the proposed flocking algorithm can still be implemented in a distributed manner. We prove that the proposed flocking algorithm can steer the multi-agent system to a stable flocking motion, provided the initial interaction topology of multi-agent systems is connected and the hysteresis in link addition is smaller than a derived upper bound. The correctness and effectiveness of the proposed algorithm are verified by extensive numerical simulations, where the flocking algorithms based on the disk and Delaunay graph are compared.

  3. A flocking algorithm for multi-agent systems with connectivity preservation under hybrid metric-topological interactions.

    Directory of Open Access Journals (Sweden)

    Chenlong He

    In this paper, we propose a connectivity-preserving flocking algorithm for multi-agent systems in which the neighbor set of each agent is determined by the hybrid metric-topological distance so that the interaction topology can be represented as the range-limited Delaunay graph, which combines the properties of the commonly used disk graph and Delaunay graph. As a result, the proposed flocking algorithm has the following advantages over the existing ones. First, range-limited Delaunay graph is sparser than the disk graph so that the information exchange among agents is reduced significantly. Second, some links irrelevant to the connectivity can be dynamically deleted during the evolution of the system. Thus, the proposed flocking algorithm is more flexible than existing algorithms, where links are not allowed to be disconnected once they are created. Finally, the multi-agent system spontaneously generates a regular quasi-lattice formation without imposing the constraint on the ratio of the sensing range of the agent to the desired distance between two adjacent agents. With the interaction topology induced by the hybrid distance, the proposed flocking algorithm can still be implemented in a distributed manner. We prove that the proposed flocking algorithm can steer the multi-agent system to a stable flocking motion, provided the initial interaction topology of multi-agent systems is connected and the hysteresis in link addition is smaller than a derived upper bound. The correctness and effectiveness of the proposed algorithm are verified by extensive numerical simulations, where the flocking algorithms based on the disk and Delaunay graph are compared.

  4. A Multi-task Principal Agent Model for Knowledge Contribution of Enterprise Staff

    Directory of Open Access Journals (Sweden)

    Chengyi LE

    2016-10-01

    According to the different behavior characteristics of knowledge contribution of enterprise employees, a multi-task principal-agent relationship of knowledge contribution between enterprise and employees is established based on principal-agent theory, analyzing staff’s knowledge contribution behavior of knowledge creation and knowledge participation. Based on this, a multi-task principal agent model for knowledge contribution of enterprise staff is developed to formulate the asymmetry of information in knowledge contribution. Then, a set of incentive measures are derived from the theoretic model, aiming to prompt the knowledge contribution in enterprise. The result shows that staff’s knowledge creation behavior and positive participation behavior can influence and further promote each other. Enterprise should set up respective target levels of both knowledge creation contribution and knowledge participation contribution and make them irreplaceable to each other. This work contributes primarily to the development of the literature on knowledge management and principal-agent theory. In addition, the applicability of the findings will be improved by further empirical analysis.

  5. An Active Learning Exercise for Introducing Agent-Based Modeling

    Science.gov (United States)

    Pinder, Jonathan P.

    2013-01-01

    Recent developments in agent-based modeling as a method of systems analysis and optimization indicate that students in business analytics need an introduction to the terminology, concepts, and framework of agent-based modeling. This article presents an active learning exercise for MBA students in business analytics that demonstrates agent-based…

  6. Adaptive Multi-Agent Systems for Constrained Optimization

    Science.gov (United States)

    Macready, William; Bieniawski, Stefan; Wolpert, David H.

    2004-01-01

    Product Distribution (PD) theory is a new framework for analyzing and controlling distributed systems. Here we demonstrate its use for distributed stochastic optimization. First we review one motivation of PD theory, as the information-theoretic extension of conventional full-rationality game theory to the case of bounded rational agents. In this extension the equilibrium of the game is the optimizer of a Lagrangian of the (probability distribution of) the joint state of the agents. When the game in question is a team game with constraints, that equilibrium optimizes the expected value of the team game utility, subject to those constraints. The updating of the Lagrange parameters in the Lagrangian can be viewed as a form of automated annealing, that focuses the MAS more and more on the optimal pure strategy. This provides a simple way to map the solution of any constrained optimization problem onto the equilibrium of a Multi-Agent System (MAS). We present computer experiments involving both the Queen's problem and K-SAT validating the predictions of PD theory and its use for off-the-shelf distributed adaptive optimization.

  7. Learning automaton network and its dynamics. Gakushu automaton network to sono dynamics

    Energy Technology Data Exchange (ETDEWEB)

    Quan, F [Hiroshima-Denki Institute of Technology, Hiroshima (Japan)]; Unno, F; Hirata, H [Chiba Univ., Chiba (Japan)]

    1991-10-20

    In order to construct a distributed processing system having learning automata as autonomous elements, a reinforcement learning network of the automaton is proposed and its dynamics is investigated. In this paper, it is attempted to add another level of meaning to computational cooperativity by using a reinforcement learning network with generalized learning automata. The collection of learning automata in the team situation acts as self-interested agents that work toward improving their performance with respect to their individual preference ordering. In the global state space of the network, the case of partially synchronous stochastic process is considered. In this case, the existence of mean field is shown and a reinforcement learning algorithm which can make the dynamics on the average reinforcement trajectory is presented. This algorithm is shown to have a high convergence speed as a result of a simple experiment. 14 refs., 9 figs.

  8. Multi-Objective Patch Optimization with Integrated Kinematic Draping Simulation for Continuous–Discontinuous Fiber-Reinforced Composite Structures

    Directory of Open Access Journals (Sweden)

    Benedikt Fengler

    2018-03-01

    Discontinuous fiber-reinforced polymers (DiCoFRP) in combination with local continuous fiber reinforced polymers (CoFRP) provide both a high design freedom and high weight-specific mechanical properties. For the optimization of CoFRP patches on complexly shaped DiCoFRP structures, an optimization strategy is needed which considers manufacturing constraints during the optimization procedure. Therefore, a genetic algorithm is combined with a kinematic draping simulation. To determine the optimal patch position with regard to structural performance and overall material consumption, a multi-objective optimization strategy is used. The resulting Pareto front and a corresponding heat-map of the patch position are useful tools for the design engineer to choose the right amount of reinforcement. The proposed patch optimization procedure is applied to two example structures and the effect of different optimization setups is demonstrated.

  9. Neural mechanisms of reinforcement learning in unmedicated patients with major depressive disorder.

    Science.gov (United States)

    Rothkirch, Marcus; Tonn, Jonas; Köhler, Stephan; Sterzer, Philipp

    2017-04-01

    According to current concepts, major depressive disorder is strongly related to dysfunctional neural processing of motivational information, entailing impairments in reinforcement learning. While computational modelling can reveal the precise nature of neural learning signals, it has not been used to study learning-related neural dysfunctions in unmedicated patients with major depressive disorder so far. We thus aimed at comparing the neural coding of reward and punishment prediction errors, representing indicators of neural learning-related processes, between unmedicated patients with major depressive disorder and healthy participants. To this end, a group of unmedicated patients with major depressive disorder (n = 28) and a group of age- and sex-matched healthy control participants (n = 30) completed an instrumental learning task involving monetary gains and losses during functional magnetic resonance imaging. The two groups did not differ in their learning performance. Patients and control participants showed the same level of prediction error-related activity in the ventral striatum and the anterior insula. In contrast, neural coding of reward prediction errors in the medial orbitofrontal cortex was reduced in patients. Moreover, neural reward prediction error signals in the medial orbitofrontal cortex and ventral striatum showed negative correlations with anhedonia severity. Using a standard instrumental learning paradigm we found no evidence for an overall impairment of reinforcement learning in medication-free patients with major depressive disorder. Importantly, however, the attenuated neural coding of reward in the medial orbitofrontal cortex and the relation between anhedonia and reduced reward prediction error-signalling in the medial orbitofrontal cortex and ventral striatum likely reflect an impairment in experiencing pleasure from rewarding events as a key mechanism of anhedonia in major depressive disorder. © The Author (2017). Published by Oxford
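
    For readers unfamiliar with prediction errors, the following sketch shows the standard delta-rule definition used in many computational models of instrumental learning. It is a generic illustration, not the specific model fitted in the study; the learning rate and the outcome sequence are assumed values.

```python
def prediction_errors(outcomes, alpha=0.5, v0=0.0):
    """Return trial-wise prediction errors and the final value estimate.

    On each trial the error is delta = outcome - expected value, and the
    expectation is then nudged toward the outcome by alpha * delta.
    """
    value = v0
    errors = []
    for outcome in outcomes:
        delta = outcome - value
        errors.append(delta)
        value += alpha * delta
    return errors, value


if __name__ == "__main__":
    # Two wins followed by an omission: the last error is negative.
    print(prediction_errors([1.0, 1.0, 0.0]))
```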

  10. Optimized Sensor Network and Multi-Agent Decision Support for Smart Traffic Light Management

    Directory of Open Access Journals (Sweden)

    Luis Cruz-Piris

    2018-02-01

    One of the biggest challenges in modern societies is to solve vehicular traffic problems. Sensor networks in traffic environments have contributed to improving the decision-making process of Intelligent Transportation Systems. However, one of the limiting factors for the effectiveness of these systems is the deployment of sensors to provide accurate information about the traffic. We propose using a graph centrality measure as the basis for identifying the best locations for sensor installation in a traffic network. After integrating these sensors in a simulation scenario, we define a Multi-Agent System composed of three types of agents: traffic light management agents, traffic jam detection agents, and agents that control the traffic lights at an intersection. The ultimate goal of this Multi-Agent System is to improve the trip duration for vehicles in the network. To validate our solution, we have developed the elements needed for modelling the sensors and agents in the simulation environment. We have carried out experiments using the Simulation of Urban MObility (SUMO) traffic simulator and the Travel and Activity PAtterns Simulation (TAPAS) Cologne traffic scenario. The results show that our proposal makes it possible to reduce the sensor network while still obtaining relevant information for a global view of the environment. Finally, regarding the Multi-Agent System, we have carried out experiments showing that our proposal improves on other existing solutions, such as conventional traffic light management systems (static or dynamic), in terms of reduced vehicle trip duration and reduced message exchange overhead in the sensor network.
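
    The idea of ranking candidate sensor sites by graph centrality can be sketched as follows. The abstract does not say which centrality measure is used, so degree centrality is assumed here purely for simplicity; the toy road network and the function names are hypothetical.

```python
def degree_centrality(edges):
    """Degree of each node divided by (number of nodes - 1), for an undirected graph."""
    nodes = {n for edge in edges for n in edge}
    degree = {n: 0 for n in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    scale = max(len(nodes) - 1, 1)
    return {n: d / scale for n, d in degree.items()}


def pick_sensor_sites(edges, k):
    """Return the k most central intersections, breaking ties by name."""
    centrality = degree_centrality(edges)
    return sorted(centrality, key=lambda n: (-centrality[n], n))[:k]


if __name__ == "__main__":
    # Intersections are nodes, road segments are edges (toy data).
    roads = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
    print(pick_sensor_sites(roads, 2))
```

    A measure such as betweenness centrality may be a better proxy for traffic flow, at the cost of a more involved computation.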

  11. Fuzzy-probabilistic multi agent system for breast cancer risk assessment and insurance premium assignment.

    Science.gov (United States)

    Tatari, Farzaneh; Akbarzadeh-T, Mohammad-R; Sabahi, Ahmad

    2012-12-01

    In this paper, we present an agent-based system for distributed risk assessment of breast cancer development employing fuzzy and probabilistic computing. The proposed fuzzy multi agent system consists of multiple fuzzy agents that benefit from fuzzy set theory to demonstrate their soft information (linguistic information). Fuzzy risk assessment is quantified by two linguistic variables of high and low. Through fuzzy computations, the multi agent system computes the fuzzy probabilities of breast cancer development based on various risk factors. By such ranking of high risk and low risk fuzzy probabilities, the multi agent system (MAS) decides whether the risk of breast cancer development is high or low. This information is then fed into an insurance premium adjuster in order to provide preventive decision making as well as to make appropriate adjustment of insurance premium and risk. This final step of insurance analysis also provides a numeric measure to demonstrate the utility of the approach. Furthermore, actual data are gathered from two hospitals in Mashhad during 1 year. The results are then compared with a fuzzy distributed approach. Copyright © 2012 Elsevier Inc. All rights reserved.

  12. Incremental Learning of Skill Collections based on Intrinsic Motivation

    Directory of Open Access Journals (Sweden)

    Jan Hendrik Metzen

    2013-07-01

    Life-long learning of reusable, versatile skills is a key prerequisite for embodied agents that act in a complex, dynamic environment and are faced with different tasks over their lifetime. We address the question of how an agent can learn useful skills efficiently during a developmental period, i.e., when no task is imposed on it and no external reward signal is provided. Learning of skills in a developmental period needs to be incremental and self-motivated. We propose a new incremental, task-independent skill discovery approach that is suited for continuous domains. Furthermore, the agent learns specific skills based on intrinsic motivation mechanisms that determine on which skills learning is focused at a given point in time. We evaluate the approach in a reinforcement learning setup in two continuous domains with complex dynamics. We show that an intrinsically motivated, skill learning agent outperforms an agent which learns task solutions from scratch. Furthermore, we compare different intrinsic motivation mechanisms and how efficiently they make use of the agent's developmental period.

  13. Measuring reinforcement learning and motivation constructs in experimental animals: relevance to the negative symptoms of schizophrenia

    Science.gov (United States)

    Markou, Athina; Salamone, John D.; Bussey, Timothy; Mar, Adam; Brunner, Daniela; Gilmour, Gary; Balsam, Peter

    2013-01-01

    The present review article summarizes and expands upon the discussions that were initiated during a meeting of the Cognitive Neuroscience Treatment Research to Improve Cognition in Schizophrenia (CNTRICS; http://cntrics.ucdavis.edu). A major goal of the CNTRICS meeting was to identify experimental procedures and measures that can be used in laboratory animals to assess psychological constructs that are related to the psychopathology of schizophrenia. The issues discussed in this review reflect the deliberations of the Motivation Working Group of the CNTRICS meeting, which included most of the authors of this article as well as additional participants. After receiving task nominations from the general research community, this working group was asked to identify experimental procedures in laboratory animals that can assess aspects of reinforcement learning and motivation that may be relevant for research on the negative symptoms of schizophrenia, as well as other disorders characterized by deficits in reinforcement learning and motivation. The tasks described here that assess reinforcement learning are the Autoshaping Task, Probabilistic Reward Learning Tasks, and the Response Bias Probabilistic Reward Task. The tasks described here that assess motivation are Outcome Devaluation and Contingency Degradation Tasks and Effort-Based Tasks. In addition to describing such methods and procedures, the present article provides a working vocabulary for research and theory in this field, as well as an industry perspective about how such tasks may be used in drug discovery. It is hoped that this review can aid investigators who are conducting research in this complex area, promote translational studies by highlighting shared research goals and fostering a common vocabulary across basic and clinical fields, and facilitate the development of medications for the treatment of symptoms mediated by reinforcement learning and motivational deficits. PMID:23994273

  14. Measuring reinforcement learning and motivation constructs in experimental animals: relevance to the negative symptoms of schizophrenia.

    Science.gov (United States)

    Markou, Athina; Salamone, John D; Bussey, Timothy J; Mar, Adam C; Brunner, Daniela; Gilmour, Gary; Balsam, Peter

    2013-11-01

    The present review article summarizes and expands upon the discussions that were initiated during a meeting of the Cognitive Neuroscience Treatment Research to Improve Cognition in Schizophrenia (CNTRICS; http://cntrics.ucdavis.edu). A major goal of the CNTRICS meeting was to identify experimental procedures and measures that can be used in laboratory animals to assess psychological constructs that are related to the psychopathology of schizophrenia. The issues discussed in this review reflect the deliberations of the Motivation Working Group of the CNTRICS meeting, which included most of the authors of this article as well as additional participants. After receiving task nominations from the general research community, this working group was asked to identify experimental procedures in laboratory animals that can assess aspects of reinforcement learning and motivation that may be relevant for research on the negative symptoms of schizophrenia, as well as other disorders characterized by deficits in reinforcement learning and motivation. The tasks described here that assess reinforcement learning are the Autoshaping Task, Probabilistic Reward Learning Tasks, and the Response Bias Probabilistic Reward Task. The tasks described here that assess motivation are Outcome Devaluation and Contingency Degradation Tasks and Effort-Based Tasks. In addition to describing such methods and procedures, the present article provides a working vocabulary for research and theory in this field, as well as an industry perspective about how such tasks may be used in drug discovery. It is hoped that this review can aid investigators who are conducting research in this complex area, promote translational studies by highlighting shared research goals and fostering a common vocabulary across basic and clinical fields, and facilitate the development of medications for the treatment of symptoms mediated by reinforcement learning and motivational deficits. Copyright © 2013 Elsevier

  15. Robust Consensus of Multi-Agent Systems with Uncertain Exogenous Disturbances

    International Nuclear Information System (INIS)

    Yang Hong-Yong; Guo Lei; Han Chao

    2011-01-01

    The objective of this paper is to investigate the consensus of multi-agent systems with a nonlinear coupling function and external disturbances. The disturbance includes two parts: one part is assumed to be generated by an exogenous system, which is not required to be neutrally stable as in output regulation theory; the other part is the modeling uncertainty in the exogenous disturbance system. A novel composite disturbance observer based control (DOBC) and H∞ control scheme is presented so that the disturbance from the exogenous system can be estimated and compensated, and the consensus of multi-agent systems with fixed and switching graphs can be reached by using the H∞ control law. Simulations demonstrate the advantages of the proposed DOBC and H∞ control scheme. (interdisciplinary physics and related areas of science and technology)

  16. Consensus of Multi-Agent Systems with Prestissimo Scale-Free Networks

    International Nuclear Information System (INIS)

    Yang Hongyong; Lu Lan; Cao Kecai; Zhang Siying

    2010-01-01

    In this paper, the relations of the network topology and the moving consensus of multi-agent systems are studied. A consensus-prestissimo scale-free network model with the static preferential-consensus attachment is presented on the rewired link of the regular network. The effects of the static preferential-consensus BA network on the algebraic connectivity of the topology graph are compared with the regular network. The robustness gain to delay is analyzed for variable network topology with the same scale. The time to reach the consensus is studied for the dynamic network with and without communication delays. By applying the computer simulations, it is validated that the speed of the convergence of multi-agent systems can be greatly improved in the preferential-consensus BA network model with different configuration. (interdisciplinary physics and related areas of science and technology)
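
    For context, the basic consensus protocol that such studies build on can be sketched in a few lines. This is the plain discrete-time averaging rule on a fixed undirected graph, without delays, disturbances, or the scale-free topology discussed above; the graph, step size `eps`, and initial states are assumptions chosen for the example.

```python
def consensus_step(x, neighbors, eps):
    """One update: each agent moves toward its neighbours' states."""
    return [xi + eps * sum(x[j] - xi for j in neighbors[i])
            for i, xi in enumerate(x)]


def run_consensus(x, neighbors, eps=0.25, steps=100):
    """Iterate the protocol; on a connected graph with small eps the states converge to the initial average."""
    for _ in range(steps):
        x = consensus_step(x, neighbors, eps)
    return x


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 with initial states 0, 3, 6 (average 3).
    path = {0: [1], 1: [0, 2], 2: [1]}
    print(run_consensus([0.0, 3.0, 6.0], path))
```

    How fast the states agree depends on the graph: for this protocol the convergence rate is governed by the algebraic connectivity (the second-smallest Laplacian eigenvalue), which is why topology changes can speed up consensus.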

  17. An Agent Based Modelling Approach for Multi-Stakeholder Analysis of City Logistics Solutions

    NARCIS (Netherlands)

    Anand, N.

    2015-01-01

    This thesis presents a comprehensive framework for multi-stakeholder analysis of city logistics solutions using agent based modeling. The framework describes different stages for the systematic development of an agent based model for the city logistics domain. The framework includes a

  18. Multi-agent Negotiation Mechanisms for Statistical Target Classification in Wireless Multimedia Sensor Networks

    Science.gov (United States)

    Wang, Xue; Bi, Dao-wei; Ding, Liang; Wang, Sheng

    2007-01-01

    The recent availability of low cost and miniaturized hardware has allowed wireless sensor networks (WSNs) to retrieve audio and video data in real world applications, which has fostered the development of wireless multimedia sensor networks (WMSNs). Resource constraints and challenging multimedia data volume make development of efficient algorithms to perform in-network processing of multimedia contents imperative. This paper proposes solving problems in the domain of WMSNs from the perspective of multi-agent systems. The multi-agent framework enables flexible network configuration and efficient collaborative in-network processing. The focus is placed on target classification in WMSNs where audio information is retrieved by microphones. To deal with the uncertainties related to audio information retrieval, the statistical approaches of power spectral density estimates, principal component analysis and Gaussian process classification are employed. A multi-agent negotiation mechanism is specially developed to efficiently utilize limited resources and simultaneously enhance classification accuracy and reliability. The negotiation is composed of two phases, where an auction based approach is first exploited to allocate the classification task among the agents and then individual agent decisions are combined by the committee decision mechanism. Simulation experiments with real world data are conducted and the results show that the proposed statistical approaches and negotiation mechanism not only reduce memory and computation requirements in WMSNs but also significantly enhance classification accuracy and reliability. PMID:28903223
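
    The two-phase negotiation described above (auction, then committee decision) might look roughly like the sketch below. The bid values, the choice of a simple majority vote, and all names are hypothetical; the paper's actual bidding function and committee mechanism are not given in the abstract.

```python
from collections import Counter


def run_auction(bids, k):
    """Phase 1: award the classification task to the k highest bidders (ties by name)."""
    return sorted(bids, key=lambda agent: (-bids[agent], agent))[:k]


def committee_decision(votes):
    """Phase 2: combine the winners' individual labels by majority vote."""
    return Counter(votes.values()).most_common(1)[0][0]


if __name__ == "__main__":
    # A bid could reflect remaining energy or signal quality (assumed here).
    bids = {"n1": 0.9, "n2": 0.4, "n3": 0.7, "n4": 0.8}
    winners = run_auction(bids, 3)
    # Hypothetical per-node classifier outputs for the winning nodes.
    labels = {"n1": "truck", "n4": "truck", "n3": "car"}
    print(winners, committee_decision({n: labels[n] for n in winners}))
```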

  19. Neural Control of a Tracking Task via Attention-Gated Reinforcement Learning for Brain-Machine Interfaces.

    Science.gov (United States)

    Wang, Yiwen; Wang, Fang; Xu, Kai; Zhang, Qiaosheng; Zhang, Shaomin; Zheng, Xiaoxiang

    2015-05-01

    Reinforcement learning (RL)-based brain-machine interfaces (BMIs) enable the user to learn from the environment through interactions to complete the task without desired signals, which is promising for clinical applications. Previous studies exploited Q-learning techniques to discriminate neural states into simple directional actions, given the initial timing of each trial. However, the movements in BMI applications can be quite complicated, and the action timing explicitly shows the intention of when to move. The rich actions and the corresponding neural states form a large state-action space, imposing generalization difficulty on Q-learning. In this paper, we propose to adopt attention-gated reinforcement learning (AGREL) as a new learning scheme for BMIs to adaptively decode high-dimensional neural activities into seven distinct movements (directional moves, holdings and resting), thanks to its efficient weight updating. We apply AGREL to neural data recorded from M1 of a monkey to directly predict a seven-action set in a time sequence to reconstruct the trajectory of a center-out task. Compared to Q-learning techniques, AGREL could improve the target acquisition rate to 90.16% on average, with faster convergence and more stability in following neural activity over multiple days, indicating the potential to achieve better online decoding performance for more complicated BMI tasks.
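
    The Q-learning baseline referred to above can be summarised by its tabular update rule, sketched here. This illustrates only the standard rule, not AGREL or the neural decoder used in the study; the state names, action set, and hyperparameters are assumptions.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the TD error."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return td_error


if __name__ == "__main__":
    q = defaultdict(float)  # unseen state-action pairs start at 0
    actions = ["left", "right", "hold"]
    q_update(q, "s0", "right", 1.0, "s1", actions)
    q_update(q, "s1", "hold", 0.0, "s0", actions)
    print(dict(q))
```

    Because the table needs one entry per state-action pair, it grows quickly as states and actions multiply, which is the generalization difficulty the abstract attributes to Q-learning.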

  20. Multi-task feature learning by using trace norm regularization

    Directory of Open Access Journals (Sweden)

    Jiangmei Zhang

    2017-11-01

    Multi-task learning can exploit the correlation among multiple related machine learning problems to improve performance. This paper considers applying the multi-task learning method to learn a single task. We propose a new learning approach, which employs a mixture-of-experts model to divide a learning task into several related sub-tasks, and then uses trace norm regularization to extract a common feature representation of these sub-tasks. A nonlinear extension of this approach using kernels is also provided. Experiments conducted on both simulated and real data sets demonstrate the advantage of the proposed approach.
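
    For reference, a generic trace-norm-regularized multi-task objective is often written as below. This is the textbook form, not necessarily the exact formulation of the paper; the symbols (weight matrix W with one column w_t per sub-task, loss ℓ, regularization weight λ, sample counts n_t, feature dimension d) are assumptions introduced for illustration.

```latex
\min_{W = [w_1, \dots, w_T]} \;
  \sum_{t=1}^{T} \sum_{i=1}^{n_t} \ell\big(y_{t,i},\, w_t^{\top} x_{t,i}\big)
  \;+\; \lambda \, \|W\|_{*},
\qquad
\|W\|_{*} = \sum_{k=1}^{\min(d,\,T)} \sigma_k(W)
```

    Here σ_k(W) are the singular values of W. Penalizing their sum encourages a low-rank W, which means the sub-tasks share a small common feature subspace.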