WorldWideScience

Sample records for reinforcement learning environment

  1. Reinforcement Learning with Autonomous Small Unmanned Aerial Vehicles in Cluttered Environments

    Science.gov (United States)

    Tran, Loc; Cross, Charles; Montague, Gilbert; Motter, Mark; Neilan, James; Qualls, Garry; Rothhaar, Paul; Trujillo, Anna; Allen, B. Danette

    2015-01-01

    We present ongoing work in the Autonomy Incubator at NASA Langley Research Center (LaRC) exploring the efficacy of a data set aggregation approach to reinforcement learning for small unmanned aerial vehicle (sUAV) flight in dense and cluttered environments with reactive obstacle avoidance. The goal is to learn an autonomous flight model using training experiences from a human piloting a sUAV around static obstacles. The training approach uses video data from a forward-facing camera that records the human pilot's flight. Various computer-vision-based features relating to edge and gradient information are extracted from the video. The recorded human-controlled inputs are used to train an autonomous control model that correlates the extracted feature vector to a yaw command. As part of the reinforcement learning approach, the autonomous control model is iteratively updated with feedback from a human agent who corrects undesired model output. This data-driven approach to autonomous obstacle avoidance is explored in simulated forest environments, furthering research on autonomous flight under the tree canopy. This enables flight in previously inaccessible environments that are of interest to NASA researchers in the Earth and atmospheric sciences.
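
    A minimal sketch (in Python) of the data-aggregation training loop described above. The feature extractor, human-correction oracle, and fake camera frames are hypothetical stand-ins for the NASA system's components, not its actual code.

        import numpy as np

        rng = np.random.default_rng(0)

        def extract_features(frame):
            # Stand-in for the edge/gradient features computed from camera video.
            return frame.flatten()

        def human_yaw_correction(features):
            # Stand-in for the human agent who corrects undesired model output.
            return float(np.tanh(features.mean()))

        X, y = [], []          # dataset aggregated across iterations
        weights = None         # linear control model: features -> yaw command

        for iteration in range(5):
            for step in range(20):
                frame = rng.normal(size=(8, 8))    # fake camera frame
                phi = extract_features(frame)
                yaw = 0.0 if weights is None else float(phi @ weights)  # would steer the sUAV
                X.append(phi)
                y.append(human_yaw_correction(phi))  # corrected label is kept
            # Retrain the control model on ALL data gathered so far (aggregation).
            weights, *_ = np.linalg.lstsq(np.asarray(X), np.asarray(y), rcond=None)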

  2. Reinforcement Learning State-of-the-Art

    CERN Document Server

    Wiering, Marco

    2012-01-01

    Reinforcement learning encompasses both a science of adaptive behavior of rational beings in uncertain environments and a computational methodology for finding optimal behaviors for challenging problems in control, optimization and adaptive behavior of intelligent agents. As a field, reinforcement learning has progressed tremendously in the past decade. The main goal of this book is to present an up-to-date series of survey articles on the main contemporary sub-fields of reinforcement learning. This includes surveys on partially observable environments, hierarchical task decompositions, relational knowledge representation and predictive state representations. Furthermore, topics such as transfer, evolutionary methods and continuous spaces in reinforcement learning are surveyed. In addition, several chapters review reinforcement learning methods in robotics, in games, and in computational neuroscience. In total seventeen different subfields are presented by mostly young experts in those areas, and together the...

  3. Memory Transformation Enhances Reinforcement Learning in Dynamic Environments.

    Science.gov (United States)

    Santoro, Adam; Frankland, Paul W; Richards, Blake A

    2016-11-30

    Over the course of systems consolidation, there is a switch from a reliance on detailed episodic memories to generalized schematic memories. This switch is sometimes referred to as "memory transformation." Here we demonstrate a previously unappreciated benefit of memory transformation, namely, its ability to enhance reinforcement learning in a dynamic environment. We developed a neural network that is trained to find rewards in a foraging task where reward locations are continuously changing. The network can use memories for specific locations (episodic memories) and statistical patterns of locations (schematic memories) to guide its search. We find that switching from an episodic to a schematic strategy over time leads to enhanced performance due to the tendency for the reward location to be highly correlated with itself in the short-term, but regress to a stable distribution in the long-term. We also show that the statistics of the environment determine the optimal utilization of both types of memory. Our work recasts the theoretical question of why memory transformation occurs, shifting the focus from the avoidance of memory interference toward the enhancement of reinforcement learning across multiple timescales. As time passes, memories transform from a highly detailed state to a more gist-like state, in a process called "memory transformation." Theories of memory transformation speak to its advantages in terms of reducing memory interference, increasing memory robustness, and building models of the environment. However, the role of memory transformation from the perspective of an agent that continuously acts and receives reward in its environment is not well explored. In this work, we demonstrate a view of memory transformation that defines it as a way of optimizing behavior across multiple timescales. Copyright © 2016 the authors.

  4. Reinforcement learning in computer vision

    Science.gov (United States)

    Bernstein, A. V.; Burnaev, E. V.

    2018-04-01

    Nowadays, machine learning has become one of the basic technologies used in solving various computer vision tasks such as feature detection, image segmentation, object recognition and tracking. In many applications, various complex systems such as robots are equipped with visual sensors from which they learn the state of the surrounding environment by solving corresponding computer vision tasks. Solutions of these tasks are used for making decisions about possible future actions. It is not surprising that, when solving computer vision tasks, we should take into account special aspects of their subsequent application in model-based predictive control. Reinforcement learning is a modern machine learning technology in which learning is carried out through interaction with the environment. In recent years, reinforcement learning has been used both for solving applied tasks such as processing and analysis of visual information, and for solving specific computer vision problems such as filtering, extracting image features, localizing objects in scenes, and many others. The paper briefly describes the reinforcement learning technology and its use for solving computer vision problems.

  5. The Study of Reinforcement Learning for Traffic Self-Adaptive Control under Multiagent Markov Game Environment

    Directory of Open Access Journals (Sweden)

    Lun-Hui Xu

    2013-01-01

    Full Text Available Urban traffic self-adaptive control is a dynamic and uncertain problem, so the states of the traffic environment are hard to observe. Efficient agents that control single intersections can be discovered automatically via multiagent reinforcement learning. However, in the majority of previous works on this approach, each agent needed perfectly observed information when interacting with the environment and learned individually, with less efficient coordination. This study casts traffic self-adaptive control as a multiagent Markov game problem. The design employs a traffic signal control agent (TSCA) for each signalized intersection that coordinates with neighboring TSCAs. A mathematical model for the TSCAs' interaction is built based on a nonzero-sum Markov game, which is applied to let TSCAs learn how to cooperate. A multiagent Markov game reinforcement learning approach is constructed on the basis of single-agent Q-learning. This method lets each TSCA learn to update its Q-values under joint actions and imperfect information. The convergence of the proposed algorithm is analyzed theoretically. The simulation results show that the proposed method is convergent and effective in a realistic traffic self-adaptive control setting.
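
    As a hedged illustration of the Q-learning-over-joint-actions idea this record describes, the Python fragment below updates a Q-table indexed by an agent's own action and a neighboring agent's action. The state and action encodings and the reward are toy assumptions, not the paper's model.

        import numpy as np

        n_states, n_actions = 4, 2          # toy sizes: signal phases per intersection
        alpha, gamma = 0.1, 0.9
        # Q-table indexed by (state, own action, neighboring TSCA's action).
        Q = np.zeros((n_states, n_actions, n_actions))

        def q_update(s, a_own, a_nbr, r, s_next):
            # Optimistic max over the next joint action; other Markov-game
            # operators (e.g., a Nash value) could replace this max.
            target = r + gamma * Q[s_next].max()
            Q[s, a_own, a_nbr] += alpha * (target - Q[s, a_own, a_nbr])

        q_update(s=0, a_own=1, a_nbr=0, r=-3.0, s_next=2)   # r: e.g., negative queue length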

  6. Human-level control through deep reinforcement learning

    Science.gov (United States)

    Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A.; Veness, Joel; Bellemare, Marc G.; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K.; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis

    2015-02-01

    The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
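
    The core update reported here can be written compactly: y = r + gamma * max_a' Q_target(s', a'), with gradient descent on the squared error (y - Q(s, a))^2. In the numpy sketch below, a linear Q-function stands in for the paper's convolutional network, and the replay buffer is omitted for brevity.

        import numpy as np

        rng = np.random.default_rng(1)
        n_features, n_actions = 16, 4
        gamma, lr = 0.99, 1e-2
        W = rng.normal(scale=0.01, size=(n_actions, n_features))   # online Q-network
        W_target = W.copy()                                        # frozen target copy

        def dqn_step(phi, a, r, phi_next, done):
            q_next = 0.0 if done else (W_target @ phi_next).max()
            target = r + gamma * q_next        # y = r + gamma * max_a' Q_target
            td_error = target - W[a] @ phi
            W[a] += lr * td_error * phi        # SGD on the squared TD error

        dqn_step(rng.normal(size=n_features), a=2, r=1.0,
                 phi_next=rng.normal(size=n_features), done=False)
        # Every C steps DQN syncs the target network: W_target[:] = W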

  7. Human-level control through deep reinforcement learning.

    Science.gov (United States)

    Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A; Veness, Joel; Bellemare, Marc G; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis

    2015-02-26

    The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.

  8. Effect of reinforcement learning on coordination of multiagent systems

    Science.gov (United States)

    Bukkapatnam, Satish T. S.; Gao, Greg

    2000-12-01

    For effective coordination of distributed environments involving multiagent systems, the learning ability of each agent in the environment plays a crucial role. In this paper, we develop a simple group learning method based on reinforcement and study its effect on coordination through application to a supply chain procurement scenario involving a computer manufacturer. Here, all parties are represented by self-interested, autonomous agents, each capable of performing specific simple tasks. They negotiate with each other to perform complex tasks and thus coordinate supply chain procurement. Reinforcement learning is intended to enable each agent to reach the best negotiable price within the shortest possible time. Our simulations of the application scenario under different learning strategies reveal the positive effects of reinforcement learning on an agent's as well as the system's performance.

  9. Reinforcement learning or active inference?

    Science.gov (United States)

    Friston, Karl J; Daunizeau, Jean; Kiebel, Stefan J

    2009-07-29

    This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and sampling of the environment to minimize their free-energy. Such agents learn causal structure in the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming; namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain.

  10. Reinforcement learning or active inference?

    Directory of Open Access Journals (Sweden)

    Karl J Friston

    2009-07-01

    Full Text Available This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and sampling of the environment to minimize their free-energy. Such agents learn causal structure in the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming; namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain.

  11. Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening

    OpenAIRE

    He, Frank S.; Liu, Yang; Schwing, Alexander G.; Peng, Jian

    2016-01-01

    We propose a novel training algorithm for reinforcement learning which combines the strength of deep Q-learning with a constrained optimization approach to tighten optimality and encourage faster reward propagation. Our novel technique makes deep reinforcement learning more practical by drastically reducing the training time. We evaluate the performance of our approach on the 49 games of the challenging Arcade Learning Environment, and report significant improvements in both training time and...
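
    A sketch of the bound-tightening idea under stated assumptions: rewards observed along a trajectory yield a lower bound on Q(s, a), and the training loss penalizes estimates that violate it. The penalty weight and the omission of the symmetric upper bound are simplifications, not the authors' exact formulation.

        gamma = 0.99

        def lower_bound(rewards, q_boot):
            # L = r_t + gamma*r_{t+1} + ... + gamma^k * max_a Q(s_{t+k}, a)
            k = len(rewards)
            return sum(gamma**i * r for i, r in enumerate(rewards)) + gamma**k * q_boot

        def tightened_loss(q_sa, td_target, L, penalty=4.0):
            base = (td_target - q_sa) ** 2
            violation = max(0.0, L - q_sa)   # penalize only estimates below the bound
            return base + penalty * violation ** 2

        print(tightened_loss(q_sa=0.5, td_target=1.0, L=lower_bound([1.0, 0.0], 0.8)))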

  12. Reinforcement Learning in Continuous Action Spaces

    NARCIS (Netherlands)

    Hasselt, H. van; Wiering, M.A.

    2007-01-01

    Considerable research has been done on reinforcement learning in continuous environments, but research on problems where the actions can also be chosen from a continuous space is much more limited. We present a new class of algorithms named Continuous Actor Critic Learning Automaton (CACLA)
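
    A minimal sketch of the CACLA idea as it is usually summarized: the critic learns V(s) by temporal differences, and the actor moves toward an explored action only when that action produced a positive TD error. The linear function approximators and Gaussian exploration are illustrative assumptions.

        import numpy as np

        rng = np.random.default_rng(2)
        n_features = 8
        alpha_v, alpha_p, gamma, sigma = 0.1, 0.05, 0.95, 0.3
        v = np.zeros(n_features)    # critic weights: V(s) = v . phi(s)
        p = np.zeros(n_features)    # actor weights: mean action = p . phi(s)

        def cacla_step(phi, phi_next, r):
            a_mean = p @ phi
            a = a_mean + rng.normal(scale=sigma)           # Gaussian exploration
            delta = r + gamma * (v @ phi_next) - v @ phi   # TD error
            v[:] = v + alpha_v * delta * phi               # critic update
            if delta > 0:      # CACLA: move the actor only on positive TD errors
                p[:] = p + alpha_p * (a - a_mean) * phi
            return a

        cacla_step(rng.normal(size=n_features), rng.normal(size=n_features), r=0.5)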

  13. Neural Basis of Reinforcement Learning and Decision Making

    Science.gov (United States)

    Lee, Daeyeol; Seo, Hyojung; Jung, Min Whan

    2012-01-01

    Reinforcement learning is an adaptive process in which an animal utilizes its previous experience to improve the outcomes of future choices. Computational theories of reinforcement learning play a central role in the newly emerging areas of neuroeconomics and decision neuroscience. In this framework, actions are chosen according to their value functions, which describe how much future reward is expected from each action. Value functions can be adjusted not only through reward and penalty, but also by the animal’s knowledge of its current environment. Studies have revealed that a large proportion of the brain is involved in representing and updating value functions and using them to choose an action. However, how the nature of a behavioral task affects the neural mechanisms of reinforcement learning remains incompletely understood. Future studies should uncover the principles by which different computational elements of reinforcement learning are dynamically coordinated across the entire brain. PMID:22462543

  14. Online reinforcement learning control for aerospace systems

    NARCIS (Netherlands)

    Zhou, Y.

    2018-01-01

    Reinforcement Learning (RL) methods are relatively new in the field of aerospace guidance, navigation, and control. This dissertation aims to exploit RL methods to improve the autonomy and online learning of aerospace systems with respect to the a priori unknown system and environment, dynamical

  15. Algorithms for Reinforcement Learning

    CERN Document Server

    Szepesvari, Csaba

    2010-01-01

    Reinforcement learning is a learning paradigm concerned with learning to control a system so as to maximize a numerical performance measure that expresses a long-term objective. What distinguishes reinforcement learning from supervised learning is that only partial feedback is given to the learner about the learner's predictions. Further, the predictions may have long term effects through influencing the future state of the controlled system. Thus, time plays a special role. The goal in reinforcement learning is to develop efficient learning algorithms, as well as to understand the algorithms'

  16. Reinforcement learning: Solving two case studies

    Science.gov (United States)

    Duarte, Ana Filipa; Silva, Pedro; dos Santos, Cristina Peixoto

    2012-09-01

    Reinforcement Learning algorithms offer interesting features for the control of autonomous systems, such as the ability to learn from direct interaction with the environment and the use of a simple reward signal, as opposed to the input-output pairs used in classic supervised learning. The reward signal indicates the success or failure of the actions executed by the agent in the environment. In this work, we describe RL algorithms applied to two case studies: the Crawler robot and the widely known inverted pendulum. We explore RL capabilities to autonomously learn a basic locomotion pattern in the Crawler, and approach the balancing problem of biped locomotion using the inverted pendulum.
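
    For case studies of this kind, the underlying learner is often plain tabular Q-learning. The loop below is an illustrative sketch; the state discretization, dynamics, and reward are placeholders rather than the authors' setup.

        import numpy as np

        rng = np.random.default_rng(3)
        n_states, n_actions = 50, 3
        Q = np.zeros((n_states, n_actions))
        alpha, gamma, eps = 0.2, 0.98, 0.1

        def env_step(s, a):
            # Placeholder dynamics: reward +1 near the "balanced" middle states.
            s_next = (s + a - 1) % n_states
            r = 1.0 if abs(s_next - n_states // 2) < 3 else 0.0
            return s_next, r

        s = rng.integers(n_states)
        for t in range(10_000):
            a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
            s_next, r = env_step(s, a)
            Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
            s = s_next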

  17. Learning to Run challenge solutions: Adapting reinforcement learning methods for neuromusculoskeletal environments

    OpenAIRE

    Kidziński, Łukasz; Mohanty, Sharada Prasanna; Ong, Carmichael; Huang, Zhewei; Zhou, Shuchang; Pechenko, Anton; Stelmaszczyk, Adam; Jarosik, Piotr; Pavlov, Mikhail; Kolesnikov, Sergey; Plis, Sergey; Chen, Zhibo; Zhang, Zhizheng; Chen, Jiale; Shi, Jun

    2018-01-01

    In the NIPS 2017 Learning to Run challenge, participants were tasked with building a controller for a musculoskeletal model to make it run as fast as possible through an obstacle course. Top participants were invited to describe their algorithms. In this work, we present eight solutions that used deep reinforcement learning approaches, based on algorithms such as Deep Deterministic Policy Gradient, Proximal Policy Optimization, and Trust Region Policy Optimization. Many solutions use similar ...

  18. Flexible Heuristic Dynamic Programming for Reinforcement Learning in Quadrotors

    NARCIS (Netherlands)

    Helmer, Alexander; de Visser, C.C.; van Kampen, E.

    2018-01-01

    Reinforcement learning is a paradigm for learning decision-making tasks from interaction with the environment. Function approximators solve a part of the curse of dimensionality when learning in high-dimensional state and/or action spaces. It can be a time-consuming process to learn a good policy in

  19. Systems control with generalized probabilistic fuzzy-reinforcement learning

    NARCIS (Netherlands)

    Hinojosa, J.; Nefti, S.; Kaymak, U.

    2011-01-01

    Reinforcement learning (RL) is a valuable learning method when the systems require a selection of control actions whose consequences emerge over long periods for which input-output data are not available. In most combinations of fuzzy systems and RL, the environment is considered to be

  20. The Reinforcement Learning Competition 2014

    OpenAIRE

    Dimitrakakis, Christos; Li, Guangliang; Tziortziotis, Nikoalos

    2014-01-01

    Reinforcement learning is one of the most general problems in artificial intelligence. It has been used to model problems in automated experiment design, control, economics, game playing, scheduling and telecommunications. The aim of the reinforcement learning competition is to encourage the development of very general learning agents for arbitrary reinforcement learning problems and to provide a test-bed for the unbiased evaluation of algorithms.

  1. Enriching behavioral ecology with reinforcement learning methods.

    Science.gov (United States)

    Frankenhuis, Willem E; Panchanathan, Karthik; Barto, Andrew G

    2018-02-13

    This article focuses on the division of labor between evolution and development in solving sequential, state-dependent decision problems. Currently, behavioral ecologists tend to use dynamic programming methods to study such problems. These methods are successful at predicting animal behavior in a variety of contexts. However, they depend on a distinct set of assumptions. Here, we argue that behavioral ecology will benefit from drawing more than it currently does on a complementary collection of tools, called reinforcement learning methods. These methods allow for the study of behavior in highly complex environments, which conventional dynamic programming methods do not feasibly address. In addition, reinforcement learning methods are well-suited to studying how biological mechanisms solve developmental and learning problems. For instance, we can use them to study simple rules that perform well in complex environments, or to investigate under what conditions natural selection favors fixed, non-plastic traits (which do not vary across individuals), cue-driven-switch plasticity (innate instructions for adaptive behavioral development based on experience), or developmental selection (the incremental acquisition of adaptive behavior based on experience). If natural selection favors developmental selection, which includes learning from environmental feedback, we can also make predictions about the design of reward systems. Our paper is written in an accessible manner and for a broad audience, though we believe some novel insights can be drawn from our discussion. We hope our paper will help advance the emerging bridge connecting the fields of behavioral ecology and reinforcement learning. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.

  2. Reinforcement learning for microgrid energy management

    International Nuclear Information System (INIS)

    Kuznetsova, Elizaveta; Li, Yan-Fu; Ruiz, Carlos; Zio, Enrico; Ault, Graham; Bell, Keith

    2013-01-01

    We consider a microgrid for energy distribution, with a local consumer, a renewable generator (wind turbine) and a storage facility (battery), connected to the external grid via a transformer. We propose a 2-steps-ahead reinforcement learning algorithm to plan the battery scheduling, which plays a key role in the achievement of the consumer goals. The underlying framework is one of multi-criteria decision-making by an individual consumer who has the goals of increasing the utilization rate of the battery during high electricity demand (so as to decrease the electricity purchase from the external grid) and increasing the utilization rate of the wind turbine for local use (so as to increase the consumer independence from the external grid). Predictions of available wind power feed the reinforcement learning algorithm for selecting the optimal battery scheduling actions. The embedded learning mechanism makes it possible to enhance the consumer's knowledge about the optimal actions for battery scheduling under different time-dependent environmental conditions. The developed framework gives intelligent consumers the capability to learn the stochastic environment and to use that experience to select optimal energy management actions. - Highlights: • A consumer exploits a 2-steps-ahead reinforcement learning for battery scheduling. • The Q-learning based mechanism is fed by the predictions of available wind power. • Wind speed state evolutions are modeled with a Markov chain model. • Optimal scheduling actions are learned through the occurrence of similar scenarios. • The consumer continuously enhances its knowledge about optimal actions
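
    A hedged sketch of the kind of Q-learning loop such a framework implies: a Markov chain over wind states drives transitions, and the learner values charge/idle/discharge actions. The transition matrix, reward shape, and state set are illustrative assumptions, not the paper's model.

        import numpy as np

        rng = np.random.default_rng(4)
        wind_states = 3                          # low / medium / high wind
        P_wind = np.array([[0.7, 0.2, 0.1],      # Markov chain over wind speed states
                           [0.2, 0.6, 0.2],
                           [0.1, 0.3, 0.6]])
        actions = ["charge", "idle", "discharge"]
        Q = np.zeros((wind_states, len(actions)))
        alpha, gamma, eps = 0.1, 0.9, 0.1

        def reward(w, a):
            # Toy objective: discharge when wind is low (battery covers demand),
            # charge when wind is high (store the local surplus).
            return float([(0, 1, 2), (1, 1, 1), (2, 1, 0)][w][a])

        w = 0
        for t in range(20_000):
            a = rng.integers(3) if rng.random() < eps else int(Q[w].argmax())
            w_next = rng.choice(wind_states, p=P_wind[w])
            Q[w, a] += alpha * (reward(w, a) + gamma * Q[w_next].max() - Q[w, a])
            w = w_next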

  3. "Notice of Violation of IEEE Publication Principles" Multiobjective Reinforcement Learning: A Comprehensive Overview.

    Science.gov (United States)

    Liu, Chunming; Xu, Xin; Hu, Dewen

    2013-04-29

    Reinforcement learning is a powerful mechanism for enabling agents to learn in an unknown environment, and most reinforcement learning algorithms aim to maximize some numerical value, which represents only one long-term objective. However, multiple long-term objectives are exhibited in many real-world decision and control problems; therefore, recently, there has been growing interest in solving multiobjective reinforcement learning (MORL) problems with multiple conflicting objectives. The aim of this paper is to present a comprehensive overview of MORL. In this paper, the basic architecture, research topics, and naive solutions of MORL are introduced first. Then, several representative MORL approaches and some important directions of recent research are reviewed. The relationships between MORL and other related research are also discussed, which include multiobjective optimization, hierarchical reinforcement learning, and multi-agent reinforcement learning. Finally, research challenges and open problems of MORL techniques are highlighted.

  4. Flow Navigation by Smart Microswimmers via Reinforcement Learning

    Science.gov (United States)

    Colabrese, Simona; Biferale, Luca; Celani, Antonio; Gustavsson, Kristian

    2017-11-01

    We have numerically modeled active particles which are able to acquire some limited knowledge of the fluid environment from simple mechanical cues and exert a control on their preferred steering direction. We show that those swimmers can learn effective strategies just by experience, using a reinforcement learning algorithm. As an example, we focus on smart gravitactic swimmers. These are active particles whose task is to reach the highest altitude within some time horizon, exploiting the underlying flow whenever possible. The reinforcement learning algorithm allows particles to learn effective strategies even in difficult situations when, in the absence of control, they would end up being trapped by flow structures. These strategies are highly nontrivial and cannot be easily guessed in advance. This work paves the way towards the engineering of smart microswimmers that solve difficult navigation problems. ERC AdG NewTURB 339032.

  5. Value learning through reinforcement : The basics of dopamine and reinforcement learning

    NARCIS (Netherlands)

    Daw, N.D.; Tobler, P.N.; Glimcher, P.W.; Fehr, E.

    2013-01-01

    This chapter provides an overview of reinforcement learning and temporal difference learning and relates these topics to the firing properties of midbrain dopamine neurons. First, we review the Rescorla-Wagner learning rule and basic learning phenomena, such as blocking, which the rule explains. Then

  6. GA-based fuzzy reinforcement learning for control of a magnetic bearing system.

    Science.gov (United States)

    Lin, C T; Jou, C P

    2000-01-01

    This paper proposes a TD (temporal difference) and GA (genetic algorithm)-based reinforcement (TDGAR) learning method and applies it to the control of a real magnetic bearing system. The TDGAR learning scheme is a new hybrid GA, which integrates the TD prediction method and the GA to perform the reinforcement learning task. The TDGAR learning system is composed of two integrated feedforward networks. One neural network acts as a critic network to guide the learning of the other network (the action network) which determines the outputs (actions) of the TDGAR learning system. The action network can be a normal neural network or a neural fuzzy network. Using the TD prediction method, the critic network can predict the external reinforcement signal and provide a more informative internal reinforcement signal to the action network. The action network uses the GA to adapt itself according to the internal reinforcement signal. The key concept of the TDGAR learning scheme is to formulate the internal reinforcement signal as the fitness function for the GA such that the GA can evaluate the candidate solutions (chromosomes) regularly, even during periods without external feedback from the environment. This enables the GA to proceed to new generations regularly without waiting for the arrival of the external reinforcement signal. This can usually accelerate the GA learning since a reinforcement signal may only be available at a time long after a sequence of actions has occurred in the reinforcement learning problem. The proposed TDGAR learning system has been used to control an active magnetic bearing (AMB) system in practice. A systematic design procedure is developed to achieve successful integration of all the subsystems including magnetic suspension, mechanical structure, and controller training. The results show that the TDGAR learning scheme can successfully find a neural controller or a neural fuzzy controller for a self-designed magnetic bearing system.

  7. Reinforcement active learning in the vibrissae system: optimal object localization.

    Science.gov (United States)

    Gordon, Goren; Dorfman, Nimrod; Ahissar, Ehud

    2013-01-01

    Rats move their whiskers to acquire information about their environment. It has been observed that they palpate novel objects and objects they are required to localize in space. We analyze whisker-based object localization using two complementary paradigms, namely, active learning and intrinsic-reward reinforcement learning. Active learning algorithms select the next training samples according to the hypothesized solution in order to better discriminate between correct and incorrect labels. Intrinsic-reward reinforcement learning uses prediction errors as the reward to an actor-critic design, such that behavior converges to the one that optimizes the learning process. We show that in the context of object localization, the two paradigms result in palpation whisking as their respective optimal solution. These results suggest that rats may employ principles of active learning and/or intrinsic reward in tactile exploration and can guide future research to seek the underlying neuronal mechanisms that implement them. Furthermore, these paradigms are easily transferable to biomimetic whisker-based artificial sensors and can improve the active exploration of their environment. Copyright © 2012 Elsevier Ltd. All rights reserved.

  8. Deep Reinforcement Learning: An Overview

    OpenAIRE

    Li, Yuxi

    2017-01-01

    We give an overview of recent exciting achievements of deep reinforcement learning (RL). We discuss six core elements, six important mechanisms, and twelve applications. We start with background of machine learning, deep learning and reinforcement learning. Next we discuss core RL elements, including value function, in particular, Deep Q-Network (DQN), policy, reward, model, planning, and exploration. After that, we discuss important mechanisms for RL, including attention and memory, unsuperv...

  9. Pragmatically Framed Cross-Situational Noun Learning Using Computational Reinforcement Models.

    Science.gov (United States)

    Najnin, Shamima; Banerjee, Bonny

    2018-01-01

    Cross-situational learning and social pragmatic theories are prominent mechanisms for learning word meanings (i.e., word-object pairs). In this paper, the role of reinforcement is investigated for early word-learning by an artificial agent. When exposed to a group of speakers, the agent comes to understand an initial set of vocabulary items belonging to the language used by the group. Both cross-situational learning and social pragmatic theory are taken into account. As social cues, joint attention and prosodic cues in caregiver's speech are considered. During agent-caregiver interaction, the agent selects a word from the caregiver's utterance and learns the relations between that word and the objects in its visual environment. The "novel words to novel objects" language-specific constraint is assumed for computing rewards. The models are learned by maximizing the expected reward using reinforcement learning algorithms [i.e., table-based algorithms: Q-learning, SARSA, SARSA-λ, and neural network-based algorithms: Q-learning for neural network (Q-NN), neural-fitted Q-network (NFQ), and deep Q-network (DQN)]. Neural network-based reinforcement learning models are chosen over table-based models for better generalization and quicker convergence. Simulations are carried out using mother-infant interaction CHILDES dataset for learning word-object pairings. Reinforcement is modeled in two cross-situational learning cases: (1) with joint attention (Attentional models), and (2) with joint attention and prosodic cues (Attentional-prosodic models). Attentional-prosodic models manifest superior performance to Attentional ones for the task of word-learning. The Attentional-prosodic DQN outperforms existing word-learning models for the same task.

  10. Reinforcement learning in supply chains.

    Science.gov (United States)

    Valluri, Annapurna; North, Michael J; Macal, Charles M

    2009-10-01

    Effective management of supply chains creates value and can strategically position companies. In practice, human beings have been found to be both surprisingly successful and disappointingly inept at managing supply chains. The related fields of cognitive psychology and artificial intelligence have postulated a variety of potential mechanisms to explain this behavior. One of the leading candidates is reinforcement learning. This paper applies agent-based modeling to investigate the comparative behavioral consequences of three simple reinforcement learning algorithms in a multi-stage supply chain. For the first time, our findings show that the specific algorithm that is employed can have dramatic effects on the results obtained. Reinforcement learning is found to be valuable in multi-stage supply chains with several learning agents, as independent agents can learn to coordinate their behavior. However, learning in multi-stage supply chains using these postulated approaches from cognitive psychology and artificial intelligence takes extremely long time periods to achieve stability, which raises questions about their ability to explain behavior in real supply chains. The fact that it takes thousands of periods for agents to learn in this simple multi-agent setting provides new evidence that real-world decision makers are unlikely to be using strict reinforcement learning in practice.

  11. Rational and Mechanistic Perspectives on Reinforcement Learning

    Science.gov (United States)

    Chater, Nick

    2009-01-01

    This special issue describes important recent developments in applying reinforcement learning models to capture neural and cognitive function. But reinforcement learning, as a theoretical framework, can apply at two very different levels of description: "mechanistic" and "rational." Reinforcement learning is often viewed in mechanistic terms--as…

  12. Learning to trade via direct reinforcement.

    Science.gov (United States)

    Moody, J; Saffell, M

    2001-01-01

    We present methods for optimizing portfolios, asset allocations, and trading systems based on direct reinforcement (DR). In this approach, investment decision-making is viewed as a stochastic control problem, and strategies are discovered directly. We present an adaptive algorithm called recurrent reinforcement learning (RRL) for discovering investment policies. The need to build forecasting models is eliminated, and better trading performance is obtained. The direct reinforcement approach differs from dynamic programming and reinforcement algorithms such as TD-learning and Q-learning, which attempt to estimate a value function for the control problem. We find that the RRL direct reinforcement framework enables a simpler problem representation, avoids Bellman's curse of dimensionality and offers compelling advantages in efficiency. We demonstrate how direct reinforcement can be used to optimize risk-adjusted investment returns (including the differential Sharpe ratio), while accounting for the effects of transaction costs. In extensive simulation work using real financial data, we find that our approach based on RRL produces better trading strategies than systems utilizing Q-learning (a value function method). Real-world applications include an intra-daily currency trader and a monthly asset allocation system for the S&P 500 Stock Index and T-Bills.
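
    The differential Sharpe ratio mentioned above has a compact closed form worth spelling out. With exponential moving averages A of returns and B of squared returns, and adaptation rate eta, Moody and Saffell's per-step learning signal is D_t = (B*dA - 0.5*A*dB) / (B - A^2)^(3/2), where dA = R_t - A and dB = R_t^2 - B. A small Python sketch (the synthetic returns are illustrative):

        import numpy as np

        def differential_sharpe(returns, eta=0.01):
            A, B = 0.0, 0.0
            out = []
            for R in returns:
                dA, dB = R - A, R * R - B
                denom = (B - A * A) ** 1.5
                out.append((B * dA - 0.5 * A * dB) / denom if denom > 1e-12 else 0.0)
                A += eta * dA            # update the moving averages last
                B += eta * dB
            return np.asarray(out)

        rng = np.random.default_rng(5)
        D = differential_sharpe(rng.normal(0.001, 0.01, size=1000))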

  13. Multiagent cooperation and competition with deep reinforcement learning.

    Directory of Open Access Journals (Sweden)

    Ardi Tampuu

    Full Text Available Evolution of cooperation and competition can appear when multiple adaptive agents share a biological, social, or technological niche. In the present work we study how cooperation and competition emerge between autonomous agents that learn by reinforcement while using only their raw visual input as the state representation. In particular, we extend the Deep Q-Learning framework to multiagent environments to investigate the interaction between two learning agents in the well-known video game Pong. By manipulating the classical rewarding scheme of Pong we show how competitive and collaborative behaviors emerge. We also describe the progression from competitive to collaborative behavior when the incentive to cooperate is increased. Finally we show how learning by playing against another adaptive agent, instead of against a hard-wired algorithm, results in more robust strategies. The present work shows that Deep Q-Networks can become a useful tool for studying decentralized learning of multiagent systems coping with high-dimensional environments.

  14. Multiagent cooperation and competition with deep reinforcement learning

    Science.gov (United States)

    Kodelja, Dorian; Kuzovkin, Ilya; Korjus, Kristjan; Aru, Juhan; Aru, Jaan; Vicente, Raul

    2017-01-01

    Evolution of cooperation and competition can appear when multiple adaptive agents share a biological, social, or technological niche. In the present work we study how cooperation and competition emerge between autonomous agents that learn by reinforcement while using only their raw visual input as the state representation. In particular, we extend the Deep Q-Learning framework to multiagent environments to investigate the interaction between two learning agents in the well-known video game Pong. By manipulating the classical rewarding scheme of Pong we show how competitive and collaborative behaviors emerge. We also describe the progression from competitive to collaborative behavior when the incentive to cooperate is increased. Finally we show how learning by playing against another adaptive agent, instead of against a hard-wired algorithm, results in more robust strategies. The present work shows that Deep Q-Networks can become a useful tool for studying decentralized learning of multiagent systems coping with high-dimensional environments. PMID:28380078

  15. Multiagent cooperation and competition with deep reinforcement learning.

    Science.gov (United States)

    Tampuu, Ardi; Matiisen, Tambet; Kodelja, Dorian; Kuzovkin, Ilya; Korjus, Kristjan; Aru, Juhan; Aru, Jaan; Vicente, Raul

    2017-01-01

    Evolution of cooperation and competition can appear when multiple adaptive agents share a biological, social, or technological niche. In the present work we study how cooperation and competition emerge between autonomous agents that learn by reinforcement while using only their raw visual input as the state representation. In particular, we extend the Deep Q-Learning framework to multiagent environments to investigate the interaction between two learning agents in the well-known video game Pong. By manipulating the classical rewarding scheme of Pong we show how competitive and collaborative behaviors emerge. We also describe the progression from competitive to collaborative behavior when the incentive to cooperate is increased. Finally we show how learning by playing against another adaptive agent, instead of against a hard-wired algorithm, results in more robust strategies. The present work shows that Deep Q-Networks can become a useful tool for studying decentralized learning of multiagent systems coping with high-dimensional environments.

  16. Neural correlates of reinforcement learning and social preferences in competitive bidding.

    Science.gov (United States)

    van den Bos, Wouter; Talwar, Arjun; McClure, Samuel M

    2013-01-30

    In competitive social environments, people often deviate from what rational choice theory prescribes, resulting in losses or suboptimal monetary gains. We investigate how competition affects learning and decision-making in a common value auction task. During the experiment, groups of five human participants were simultaneously scanned using MRI while playing the auction task. We first demonstrate that bidding is well characterized by reinforcement learning with biased reward representations dependent on social preferences. Indicative of reinforcement learning, we found that estimated trial-by-trial prediction errors correlated with activity in the striatum and ventromedial prefrontal cortex. Additionally, we found that individual differences in social preferences were related to activity in the temporal-parietal junction and anterior insula. Connectivity analyses suggest that monetary and social value signals are integrated in the ventromedial prefrontal cortex and striatum. Based on these results, we argue for a novel mechanistic account for the integration of reinforcement history and social preferences in competitive decision-making.

  17. What Can Reinforcement Learning Teach Us About Non-Equilibrium Quantum Dynamics

    Science.gov (United States)

    Bukov, Marin; Day, Alexandre; Sels, Dries; Weinberg, Phillip; Polkovnikov, Anatoli; Mehta, Pankaj

    Equilibrium thermodynamics and statistical physics are the building blocks of modern science and technology. Yet, our understanding of thermodynamic processes away from equilibrium is largely missing. In this talk, I will reveal the potential of what artificial intelligence can teach us about the complex behaviour of non-equilibrium systems. Specifically, I will discuss the problem of finding optimal drive protocols to prepare a desired target state in quantum mechanical systems by applying ideas from Reinforcement Learning [one can think of Reinforcement Learning as the study of how an agent (e.g. a robot) can learn and perfect a given policy through interactions with an environment.]. The driving protocols learnt by our agent suggest that the non-equilibrium world features possibilities easily defying intuition based on equilibrium physics.

  18. TensorFlow Agents: Efficient Batched Reinforcement Learning in TensorFlow

    OpenAIRE

    Hafner, Danijar; Davidson, James; Vanhoucke, Vincent

    2017-01-01

    We introduce TensorFlow Agents, an efficient infrastructure paradigm for building parallel reinforcement learning algorithms in TensorFlow. We simulate multiple environments in parallel, and group them to perform the neural network computation on a batch rather than individual observations. This allows the TensorFlow execution engine to parallelize computation, without the need for manual synchronization. Environments are stepped in separate Python processes to progress them in parallel witho...
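
    A toy sketch of the batching idea (not the TensorFlow Agents API): observations from many environments are stacked so the policy runs once per batch instead of once per environment. The random linear policy and stand-in environment are assumptions for illustration.

        import numpy as np

        rng = np.random.default_rng(6)
        n_envs, obs_dim, n_actions = 8, 4, 2
        W_pi = rng.normal(size=(obs_dim, n_actions))    # toy linear policy weights

        def policy_batch(obs_batch):
            # One forward pass over the stacked batch instead of n_envs calls.
            return (obs_batch @ W_pi).argmax(axis=1)

        def step_batch(obs_batch, actions):
            # Stand-in for environments stepped in parallel worker processes.
            rewards = (actions == (obs_batch.sum(axis=1) > 0)).astype(float)
            return rng.normal(size=obs_batch.shape), rewards

        obs = rng.normal(size=(n_envs, obs_dim))
        for t in range(100):
            actions = policy_batch(obs)
            obs, rewards = step_batch(obs, actions)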

  19. Framework for robot skill learning using reinforcement learning

    Science.gov (United States)

    Wei, Yingzi; Zhao, Mingyang

    2003-09-01

    Robot skill acquisition is a process similar to human skill learning. Reinforcement learning (RL), here in an on-line actor-critic form, is a method for a robot to develop its skill. The reinforcement function is the critical component because of its role in evaluating actions and guiding the learning process. We present an augmented reward function that provides a new way for the RL controller to incorporate prior knowledge and experience. The difference form of the augmented reward function is also considered carefully. The additional reward, beyond the conventional reward, provides more heuristic information for RL. In this paper, we present a strategy for the task of complex skill learning: an automatic robot-shaping policy that decomposes the complex skill into a hierarchical learning process. A new form of value function is introduced to attain smooth motion switching swiftly. We present a formal, but practical, framework for robot skill learning and illustrate with an example the utility of the method for learning skilled robot control on line.

  20. Ensemble Network Architecture for Deep Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Xi-liang Chen

    2018-01-01

    Full Text Available The popular deep Q-learning algorithm is known to be unstable because of Q-value oscillation and overestimation of action values under certain conditions. These issues tend to adversely affect its performance. In this paper, we develop an ensemble network architecture for deep reinforcement learning which is based on value function approximation. The temporal ensemble stabilizes the training process by reducing the variance of the target approximation error, and the ensemble of target values reduces overestimation and achieves better performance by estimating more accurate Q-values. Our results show that this architecture leads to statistically significantly better value estimation and more stable and better performance on several classical control tasks in the OpenAI Gym environment.
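
    One way to read the "ensemble of target values" idea is sketched below: bootstrap targets from several Q-estimates are averaged to damp overestimation and variance. This is a hedged reconstruction of the mechanism, not the authors' exact architecture.

        import numpy as np

        gamma = 0.99

        def ensemble_target(r, q_next_list, done):
            # q_next_list: next-state action values from K ensemble members,
            # shape (K, n_actions). Average the K bootstrap targets.
            if done:
                return r
            targets = [r + gamma * q.max() for q in q_next_list]
            return float(np.mean(targets))

        rng = np.random.default_rng(7)
        y = ensemble_target(1.0, rng.normal(size=(5, 4)), done=False)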

  1. Online Pedagogical Tutorial Tactics Optimization Using Genetic-Based Reinforcement Learning.

    Science.gov (United States)

    Lin, Hsuan-Ta; Lee, Po-Ming; Hsiao, Tzu-Chien

    2015-01-01

    Tutorial tactics are policies for an Intelligent Tutoring System (ITS) to decide the next action when there are multiple actions available. Recent research has demonstrated that, when the learning contents were controlled so as to be the same, different tutorial tactics made a difference in students' learning gains. However, the Reinforcement Learning (RL) techniques that were used in previous studies to induce tutorial tactics are insufficient for large problems and hence were used in an offline manner. Therefore, we introduced a Genetic-Based Reinforcement Learning (GBML) approach to induce tutorial tactics in an online-learning manner without relying on any preexisting dataset. The introduced method can learn a set of rules from the environment in a manner similar to RL. It includes a genetic-based optimizer for the rule-discovery task, generating new rules from old ones. This increases the scalability of an RL learner for larger problems. The results support our hypothesis about the capability of the GBML method to induce tutorial tactics. This suggests that the GBML method should be favorable for developing real-world ITS applications in the domain of tutorial tactics induction.

  2. SCAFFOLDING AND REINFORCEMENT: USING DIGITAL LOGBOOKS IN LEARNING VOCABULARY

    OpenAIRE

    Khalifa, Salma Hasan Almabrouk; Shabdin, Ahmad Affendi

    2016-01-01

    Reinforcement and scaffolding are tested approaches to enhance learning achievements. Keeping a record of the learning process as well as the new learned words functions as scaffolding to help learners build a comprehensive vocabulary. Similarly, repetitive learning of new words reinforces permanent learning for long-term memory. Paper-based logbooks may prove to be good records of the learning process, but if learners use digital logbooks, the results may be even better. Digital logbooks wit...

  3. Episodic reinforcement learning control approach for biped walking

    Directory of Open Access Journals (Sweden)

    Katić Duško

    2012-01-01

    Full Text Available This paper presents a hybrid dynamic control approach to the realization of humanoid biped robotic walking, focusing on policy-gradient episodic reinforcement learning with fuzzy evaluative feedback. The proposed controller structure involves two feedback loops: a conventional computed-torque controller and an episodic reinforcement learning controller. The reinforcement learning part includes fuzzy information about Zero-Moment-Point errors. Simulation tests using a medium-size 36-DOF humanoid robot MEXONE were performed to demonstrate the effectiveness of our method.

  4. Reinforcement learning in complementarity game and population dynamics.

    Science.gov (United States)

    Jost, Jürgen; Li, Wei

    2014-02-01

    We systematically test and compare different reinforcement learning schemes in a complementarity game [J. Jost and W. Li, Physica A 345, 245 (2005)] played between members of two populations. More precisely, we study the Roth-Erev, Bush-Mosteller, and SoftMax reinforcement learning schemes. A modified version of Roth-Erev with a power exponent of 1.5, as opposed to 1 in the standard version, performs best. We also compare these reinforcement learning strategies with evolutionary schemes. This gives insight into aspects like the issue of quick adaptation as opposed to systematic exploration or the role of learning rates.
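
    A sketch of a Roth-Erev-style scheme with the power exponent applied when propensities are turned into choice probabilities. Where exactly the exponent of 1.5 enters in the paper's modified rule is an assumption here, as are the forgetting rate and reward function.

        import numpy as np

        rng = np.random.default_rng(8)
        n_actions = 4
        forget, lam = 0.05, 1.5     # forgetting rate; power exponent (1.5 vs 1.0)
        q = np.ones(n_actions)      # action propensities

        def choose_and_update(reward_fn):
            probs = q**lam / (q**lam).sum()   # exponent sharpens the choice rule
            a = rng.choice(n_actions, p=probs)
            q[:] = (1 - forget) * q           # gradual forgetting of old propensities
            q[a] += reward_fn(a)              # reinforce the chosen action
            return a

        for _ in range(1000):
            choose_and_update(lambda a: 1.0 if a == 2 else 0.1)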

  5. Reinforcement learning for optimal control of low exergy buildings

    International Nuclear Information System (INIS)

    Yang, Lei; Nagy, Zoltan; Goffin, Philippe; Schlueter, Arno

    2015-01-01

    Highlights: • Implementation of reinforcement learning control for LowEx Building systems. • Learning allows adaptation to local environment without prior knowledge. • Presentation of reinforcement learning control for real-life applications. • Discussion of the applicability for real-life situations. - Abstract: Over a third of the anthropogenic greenhouse gas (GHG) emissions stem from cooling and heating buildings, due to their fossil fuel based operation. Low exergy building systems are a promising approach to reduce energy consumption as well as GHG emissions. They consist of renewable energy technologies, such as PV, PV/T and heat pumps. Since careful tuning of parameters is required, a manual setup may result in sub-optimal operation. A model predictive control approach is unnecessarily complex due to the required model identification. Therefore, in this work we present a reinforcement learning control (RLC) approach. The studied building consists of a PV/T array for solar heat and electricity generation, as well as geothermal heat pumps. We present RLC for the PV/T array, and the full building model. Two methods, Tabular Q-learning and Batch Q-learning with Memory Replay, are implemented with real building settings and actual weather conditions in a Matlab/Simulink framework. The performance is evaluated against standard rule-based control (RBC). We investigated different neural network structures and found that some outperformed RBC already during the learning phase. Overall, every RLC strategy for PV/T outperformed RBC by over 10% after the third year. Likewise, for the full building, RLC outperforms RBC in terms of meeting the heating demand, maintaining the optimal operation temperature and compensating more effectively for ground heat. This makes it possible to reduce the engineering costs associated with the setup of these systems, as well as to decrease the return-on-investment period, both of which are necessary to create a sustainable, zero-emission building.

  6. An Improved Reinforcement Learning System Using Affective Factors

    Directory of Open Access Journals (Sweden)

    Takashi Kuremoto

    2013-07-01

    Full Text Available As a powerful and intelligent machine learning method, reinforcement learning (RL) has been widely used in many fields such as game theory, adaptive control, multi-agent systems, nonlinear forecasting, and so on. The main contribution of this technique is its exploration and exploitation approaches to finding the optimal or a semi-optimal solution of goal-directed problems. However, when RL is applied to multi-agent systems (MASs), problems such as the "curse of dimensionality", the "perceptual aliasing problem", and uncertainty of the environment constitute high hurdles to RL. Meanwhile, although RL is inspired by behavioral psychology and uses reward/punishment from the environment, higher mental factors such as affects, emotions, and motivations are rarely adopted in the learning procedure of RL. In this paper, to address the challenges agents face when learning in MASs, we propose a computational motivation function, which adopts the two principal affective factors "Arousal" and "Pleasure" of Russell's circumplex model of affects, to improve the learning performance of a conventional RL algorithm named Q-learning (QL). Computer simulations of pursuit problems with static and dynamic prey were carried out to compare the proposed method with conventional QL, and the results showed that the proposed method gives agents faster and more stable learning performance.

  7. Functional Contour-following via Haptic Perception and Reinforcement Learning.

    Science.gov (United States)

    Hellman, Randall B; Tekin, Cem; van der Schaar, Mihaela; Santos, Veronica J

    2018-01-01

    Many tasks involve the fine manipulation of objects despite limited visual feedback. In such scenarios, tactile and proprioceptive feedback can be leveraged for task completion. We present an approach for real-time haptic perception and decision-making for a haptics-driven, functional contour-following task: the closure of a ziplock bag. This task is challenging for robots because the bag is deformable, transparent, and visually occluded by artificial fingertip sensors that are also compliant. A deep neural net classifier was trained to estimate the state of a zipper within a robot's pinch grasp. A Contextual Multi-Armed Bandit (C-MAB) reinforcement learning algorithm was implemented to maximize cumulative rewards by balancing exploration versus exploitation of the state-action space. The C-MAB learner outperformed a benchmark Q-learner by more efficiently exploring the state-action space while learning a hard-to-code task. The learned C-MAB policy was tested with novel ziplock bag scenarios and contours (wire, rope). Importantly, this work contributes to the development of reinforcement learning approaches that account for limited resources such as hardware life and researcher time. As robots are used to perform complex, physically interactive tasks in unstructured or unmodeled environments, it becomes important to develop methods that enable efficient and effective learning with physical testbeds.
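
    An epsilon-greedy contextual bandit of the kind the C-MAB learner generalizes can be sketched in a few lines; the linear per-arm reward model, exploration rate, and toy context are illustrative assumptions, not the authors' algorithm.

        import numpy as np

        rng = np.random.default_rng(9)
        n_arms, ctx_dim, eps, lr = 3, 5, 0.1, 0.05
        W = np.zeros((n_arms, ctx_dim))      # per-arm linear reward model

        def select_arm(ctx):
            if rng.random() < eps:           # explore
                return int(rng.integers(n_arms))
            return int((W @ ctx).argmax())   # exploit current estimates

        def update(arm, ctx, reward):
            # SGD on squared error between predicted and observed reward.
            W[arm] += lr * (reward - W[arm] @ ctx) * ctx

        for t in range(500):
            ctx = rng.normal(size=ctx_dim)
            arm = select_arm(ctx)
            update(arm, ctx, reward=float(arm == 0))   # toy reward: arm 0 is best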

  8. Belief reward shaping in reinforcement learning

    CSIR Research Space (South Africa)

    Marom, O

    2018-02-01

    Full Text Available A key challenge in many reinforcement learning problems is delayed rewards, which can significantly slow down learning. Although reward shaping has previously been introduced to accelerate learning by bootstrapping an agent with additional...

  9. Adaptive representations for reinforcement learning

    NARCIS (Netherlands)

    Whiteson, S.

    2010-01-01

    This book presents new algorithms for reinforcement learning, a form of machine learning in which an autonomous agent seeks a control policy for a sequential decision task. Since current methods typically rely on manually designed solution representations, agents that automatically adapt their own

  10. Punishment Insensitivity and Impaired Reinforcement Learning in Preschoolers

    Science.gov (United States)

    Briggs-Gowan, Margaret J.; Nichols, Sara R.; Voss, Joel; Zobel, Elvira; Carter, Alice S.; McCarthy, Kimberly J.; Pine, Daniel S.; Blair, James; Wakschlag, Lauren S.

    2014-01-01

    Background: Youth and adults with psychopathic traits display disrupted reinforcement learning. Advances in measurement now enable examination of this association in preschoolers. The current study examines relations between reinforcement learning in preschoolers and parent ratings of reduced responsiveness to socialization, conceptualized as a…

  11. Reinforcement learning in continuous state and action spaces

    NARCIS (Netherlands)

    H. P. van Hasselt (Hado); M.A. Wiering; M. van Otterlo

    2012-01-01

    Many traditional reinforcement-learning algorithms have been designed for problems with small finite state and action spaces. Learning in such discrete problems can be difficult, due to noise and delayed reinforcements. However, many real-world problems have continuous state or action

  12. Playing SNES in the Retro Learning Environment

    OpenAIRE

    Bhonker, Nadav; Rozenberg, Shai; Hubara, Itay

    2016-01-01

    Mastering a video game requires skill, tactics and strategy. While these attributes may be acquired naturally by human players, teaching them to a computer program is a far more challenging task. In recent years, extensive research was carried out in the field of reinforcement learning and numerous algorithms were introduced, aiming to learn how to perform human tasks such as playing video games. As a result, the Arcade Learning Environment (ALE) (Bellemare et al., 2013) has become a commonly...

  13. Autonomous reinforcement learning with experience replay.

    Science.gov (United States)

    Wawrzyński, Paweł; Tanwani, Ajay Kumar

    2013-05-01

    This paper considers the issues of efficiency and autonomy that are required to make reinforcement learning suitable for real-life control tasks. A real-time reinforcement learning algorithm is presented that repeatedly adjusts the control policy with the use of previously collected samples, and autonomously estimates the appropriate step-sizes for the learning updates. The algorithm is based on the actor-critic with experience replay whose step-sizes are determined on-line by an enhanced fixed point algorithm for on-line neural network training. An experimental study with a simulated octopus arm and half-cheetah demonstrates the feasibility of the proposed algorithm to solve difficult learning control problems in an autonomous way within a reasonably short time. Copyright © 2012 Elsevier Ltd. All rights reserved.
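
    A bare-bones sketch of the experience replay mechanism this record relies on is given below; the buffer and batch sizes are arbitrary illustrative values, and the paper's actor-critic and step-size estimation are not shown:

        import random
        from collections import deque

        buffer = deque(maxlen=100_000)   # stores (s, a, r, s_next, done) tuples

        def store(transition):
            buffer.append(transition)

        def sample_batch(batch_size=32):
            # Uniformly re-sample stored transitions so that each collected
            # sample can drive many learning updates.
            return random.sample(list(buffer), min(batch_size, len(buffer)))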

  14. Reinforcement Learning in Repeated Portfolio Decisions

    OpenAIRE

    Diao, Linan; Rieskamp, Jörg

    2011-01-01

    How do people make investment decisions when they receive outcome feedback? We examined how well the standard mean-variance model and two reinforcement models predict people's portfolio decisions. The basic reinforcement model predicts a learning process that relies solely on the portfolio's overall return, whereas the proposed extended reinforcement model also takes the risk and covariance of the investments into account. The experimental results illustrate that people reacted sensitively to...

  15. Reinforcement learning improves behaviour from evaluative feedback

    Science.gov (United States)

    Littman, Michael L.

    2015-05-01

    Reinforcement learning is a branch of machine learning concerned with using experience gained through interacting with the world and evaluative feedback to improve a system's ability to make behavioural decisions. It has been called the artificial intelligence problem in a microcosm because learning algorithms must act autonomously to perform well and achieve their goals. Partly driven by the increasing availability of rich data, recent years have seen exciting advances in the theory and practice of reinforcement learning, including developments in fundamental technical areas such as generalization, planning, exploration and empirical methodology, leading to increasing applicability to real-life problems.

  16. 'Proactive' use of cue-context congruence for building reinforcement learning's reward function.

    Science.gov (United States)

    Zsuga, Judit; Biro, Klara; Tajti, Gabor; Szilasi, Magdolna Emma; Papp, Csaba; Juhasz, Bela; Gesztelyi, Rudolf

    2016-10-28

    Reinforcement learning is a fundamental form of learning that may be formalized using the Bellman equation. Accordingly, an agent determines the state value as the sum of the immediate reward and the discounted value of future states. Thus the value of a state is determined by agent-related attributes (action set, policy, discount factor) and by the agent's knowledge of the environment, embodied by the reward function and by hidden environmental factors given by the transition probability. The central objective of reinforcement learning is to solve these two functions outside the agent's control, either using or not using a model. In the present paper, using the proactive model of reinforcement learning, we offer insight on how the brain creates simplified representations of the environment, and how these representations are organized to support the identification of relevant stimuli and actions. Furthermore, we identify neurobiological correlates of our model by suggesting that the reward and policy functions, attributes of the Bellman equation, are built by the orbitofrontal cortex (OFC) and the anterior cingulate cortex (ACC), respectively. Based on this we propose that the OFC assesses cue-context congruence to activate the most relevant context frame. Furthermore, given the bidirectional neuroanatomical link between the OFC and model-free structures, we suggest that model-based input is incorporated into the reward prediction error (RPE) signal, and conversely that the RPE signal may be used to update the reward-related information of context frames and the policy underlying action selection in the OFC and ACC, respectively. Finally, clinical implications for cognitive behavioral interventions are discussed.
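
    The recursion described in the first sentences is the Bellman equation; in standard notation (assumed here, not quoted from the record), the value of a state s under a policy \pi is

        V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a) \left[ R(s, a, s') + \gamma V^{\pi}(s') \right]

    where R is the reward function, P the transition probability, and \gamma the discount factor, matching the agent attributes and environmental factors listed in the abstract.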

  17. Solution to reinforcement learning problems with artificial potential field

    Institute of Scientific and Technical Information of China (English)

    XIE Li-juan; XIE Guang-rong; CHEN Huan-wen; LI Xiao-li

    2008-01-01

    A novel method was designed to solve reinforcement learning problems with an artificial potential field. Firstly, a reinforcement learning problem was transformed into a path planning problem by using an artificial potential field (APF), which is a very appropriate way to model a reinforcement learning problem. Secondly, a new APF algorithm was proposed to overcome the local minimum problem in potential field methods using a virtual water-flow concept. The performance of the new method was tested on a gridworld problem called the key-and-door maze. The experimental results show that within 45 trials, good and deterministic policies are found in almost all simulations. In comparison with WIERING's HQ-learning system, which needs 20 000 trials for a stable solution, the proposed method obtains an optimal and stable policy far more quickly. Therefore, the new method is a simple and effective way to give an optimal solution to the reinforcement learning problem.

  18. Reinforcement Learning in Autism Spectrum Disorder

    Directory of Open Access Journals (Sweden)

    Manuela Schuetze

    2017-11-01

    Full Text Available Early behavioral interventions are recognized as integral to standard care in autism spectrum disorder (ASD), and often focus on reinforcing desired behaviors (e.g., eye contact) and reducing the presence of atypical behaviors (e.g., echoing others' phrases). However, efficacy of these programs is mixed. Reinforcement learning relies on neurocircuitry that has been reported to be atypical in ASD: prefrontal-subcortical circuits, amygdala, brainstem, and cerebellum. Thus, early behavioral interventions rely on neurocircuitry that may function atypically in at least a subset of individuals with ASD. Recent work has investigated physiological, behavioral, and neural responses to reinforcers to uncover differences in motivation and learning in ASD. We will synthesize this work to identify promising avenues for future research that ultimately can be used to enhance the efficacy of early intervention.

  19. Reinforcement and inference in cross-situational word learning.

    Science.gov (United States)

    Tilles, Paulo F C; Fontanari, José F

    2013-01-01

    Cross-situational word learning is based on the notion that a learner can determine the referent of a word by finding something in common across many observed uses of that word. Here we propose an adaptive learning algorithm that contains a parameter that controls the strength of the reinforcement applied to associations between concurrent words and referents, and a parameter that regulates inference, which includes built-in biases, such as mutual exclusivity, and information of past learning events. By adjusting these parameters so that the model predictions agree with data from representative experiments on cross-situational word learning, we were able to explain the learning strategies adopted by the participants of those experiments in terms of a trade-off between reinforcement and inference. These strategies can vary wildly depending on the conditions of the experiments. For instance, for fast mapping experiments (i.e., the correct referent could, in principle, be inferred in a single observation) inference is prevalent, whereas for segregated contextual diversity experiments (i.e., the referents are separated in groups and are exhibited with members of their groups only) reinforcement is predominant. Other experiments are explained with more balanced doses of reinforcement and inference.

  20. A Reinforcement-Based Learning Paradigm Increases Anatomical Learning and Retention-A Neuroeducation Study.

    Science.gov (United States)

    Anderson, Sarah J; Hecker, Kent G; Krigolson, Olave E; Jamniczky, Heather A

    2018-01-01

    In anatomy education, a key hurdle to engaging in higher-level discussion in the classroom is recognizing and understanding the extensive terminology used to identify and describe anatomical structures. Given the time-limited classroom environment, seeking methods to impart this foundational knowledge to students in an efficient manner is essential. Just-in-Time Teaching (JiTT) methods incorporate pre-class exercises (typically online) meant to establish foundational knowledge in novice learners so subsequent instructor-led sessions can focus on deeper, more complex concepts. Determining how best to design and assess pre-class exercises requires a detailed examination of learning and retention in an applied educational context. Here we used electroencephalography (EEG) as a quantitative dependent variable to track learning and examine the efficacy of JiTT activities to teach anatomy. Specifically, we examined changes in the amplitude of the N250 and reward positivity event-related brain potential (ERP) components alongside behavioral performance as novice students participated in a series of computerized reinforcement-based learning modules to teach neuroanatomical structures. We found that as students learned to identify anatomical structures, the amplitude of the N250 increased and reward positivity amplitude decreased in response to positive feedback. On both a retention and a transfer exercise, when learners successfully remembered and translated their knowledge to novel images, the amplitude of the reward positivity remained decreased compared to early learning. Our findings suggest ERPs can be used as a tool to track learning, retention, and transfer of knowledge and that employing the reinforcement learning paradigm is an effective educational approach for developing anatomical expertise.

  1. Self-Paced Prioritized Curriculum Learning With Coverage Penalty in Deep Reinforcement Learning.

    Science.gov (United States)

    Ren, Zhipeng; Dong, Daoyi; Li, Huaxiong; Chen, Chunlin

    2018-06-01

    In this paper, a new training paradigm is proposed for deep reinforcement learning using self-paced prioritized curriculum learning with coverage penalty. The proposed deep curriculum reinforcement learning (DCRL) takes full advantage of experience replay by adaptively selecting appropriate transitions from replay memory based on the complexity of each transition. The criteria of complexity in DCRL consist of self-paced priority as well as coverage penalty. The self-paced priority reflects the relationship between the temporal-difference error and the difficulty of the current curriculum, for sample efficiency. The coverage penalty is taken into account for sample diversity. In comparison with deep Q network (DQN) and prioritized experience replay (PER) methods, the DCRL algorithm is evaluated on Atari 2600 games, and the experimental results show that DCRL outperforms DQN and PER on most of these games. Further results show that the proposed curriculum training paradigm of DCRL is also applicable and effective for other memory-based deep reinforcement learning approaches, such as double DQN and dueling network. All the experimental results demonstrate that DCRL can achieve improved training efficiency and robustness for deep reinforcement learning.
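
    A speculative sketch of a complexity score of the kind described, combining a TD-error-based self-paced priority with a coverage penalty, is shown below. The exact functional forms used in DCRL differ; everything here is an illustrative assumption:

        import math

        def complexity_score(td_error, replay_count, pace=1.0, penalty_weight=0.1):
            # Self-paced priority: prefer transitions whose TD error is close
            # to the current curriculum "pace" (neither trivial nor too hard).
            priority = math.exp(-((abs(td_error) - pace) ** 2))
            # Coverage penalty: discourage transitions that have already been
            # replayed often, keeping the training batches diverse.
            return priority - penalty_weight * replay_count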

  2. Manifold Regularized Reinforcement Learning.

    Science.gov (United States)

    Li, Hongliang; Liu, Derong; Wang, Ding

    2018-04-01

    This paper introduces a novel manifold regularized reinforcement learning scheme for continuous Markov decision processes. Smooth feature representations for value function approximation can be automatically learned using the unsupervised manifold regularization method. The learned features are data-driven, and can be adapted to the geometry of the state space. Furthermore, the scheme provides a direct basis representation extension for novel samples during policy learning and control. The performance of the proposed scheme is evaluated on two benchmark control tasks, i.e., the inverted pendulum and the energy storage problem. Simulation results illustrate the concepts of the proposed scheme and show that it can obtain excellent performance.

  3. Reinforcement Learning Based Data Self-Destruction Scheme for Secured Data Management

    Directory of Open Access Journals (Sweden)

    Young Ki Kim

    2018-04-01

    Full Text Available As technologies and services that leverage cloud computing have evolved, the number of businesses and individuals who use them is increasing rapidly. In the course of using cloud services, as users store and use data that include personal information, research on privacy protection models to protect sensitive information in the cloud environment is becoming more important. As a solution to this problem, a self-destructing scheme has been proposed that prevents the decryption of encrypted user data after a certain period of time using a Distributed Hash Table (DHT) network. However, the existing self-destructing scheme does not specify how to set the number of key shares and the threshold value considering the environment of the dynamic DHT network. This paper proposes a method to set the parameters needed to generate the key shares for the self-destructing scheme, considering the availability and security of data. The proposed method defines the state, action, and reward of a reinforcement learning model based on graph similarity, and applies the self-destructing scheme by updating the parameters based on the reinforcement learning model. Through the proposed technique, key-sharing parameters can be set in consideration of data availability and security in dynamic DHT network environments.

  4. Decentralized Reinforcement Learning of robot behaviors

    NARCIS (Netherlands)

    Leottau, David L.; Ruiz-del-Solar, Javier; Babuska, R.

    2018-01-01

    A multi-agent methodology is proposed for Decentralized Reinforcement Learning (DRL) of individual behaviors in problems where multi-dimensional action spaces are involved. When using this methodology, sub-tasks are learned in parallel by individual agents working toward a common goal. In

  5. Continuous residual reinforcement learning for traffic signal control optimization

    NARCIS (Netherlands)

    Aslani, Mohammad; Seipel, Stefan; Wiering, Marco

    2018-01-01

    Traffic signal control can be naturally regarded as a reinforcement learning problem. Unfortunately, it is one of the most difficult classes of reinforcement learning problems owing to its large state space. A straightforward approach to address this challenge is to control traffic signals based on

  6. Tunnel Ventilation Control Using Reinforcement Learning Methodology

    Science.gov (United States)

    Chu, Baeksuk; Kim, Dongnam; Hong, Daehie; Park, Jooyoung; Chung, Jin Taek; Kim, Tae-Hyung

    The main purpose of a tunnel ventilation system is to maintain CO pollutant concentration and VI (visibility index) at an adequate level to provide drivers with a comfortable and safe driving environment. Moreover, it is necessary to minimize the power consumed to operate the ventilation system. To achieve these objectives, the control algorithm used in this research is the reinforcement learning (RL) method. RL is goal-directed learning of a mapping from situations to actions without relying on exemplary supervision or complete models of the environment. The goal of RL is to maximize a reward, which is evaluative feedback from the environment. In the process of constructing the reward for the tunnel ventilation system, the two objectives listed above are included, that is, maintaining an adequate level of pollutants and minimizing power consumption. An RL algorithm based on an actor-critic architecture and a gradient-following algorithm is applied to the tunnel ventilation system. Simulation results obtained with real data collected from an existing tunnel ventilation system, together with real experimental verification, are provided in this paper. It is confirmed that with the suggested controller, the pollutant level inside the tunnel was well maintained under the allowable limit and energy consumption was improved compared to the conventional control scheme.

  7. Can model-free reinforcement learning explain deontological moral judgments?

    Science.gov (United States)

    Ayars, Alisabeth

    2016-05-01

    Dual-systems frameworks propose that moral judgments are derived from both an immediate emotional response and controlled/rational cognition. Recently Cushman (2013) proposed a new dual-system theory based on model-free and model-based reinforcement learning. Model-free learning attaches values to actions based on their history of reward and punishment, and explains some deontological, non-utilitarian judgments. Model-based learning involves the construction of a causal model of the world and allows for far-sighted planning; this form of learning fits well with utilitarian considerations that seek to maximize certain kinds of outcomes. I present three concerns regarding the use of model-free reinforcement learning to explain deontological moral judgment. First, many actions that humans find aversive from model-free learning are not judged to be morally wrong. Moral judgment must require something in addition to model-free learning. Second, there is a dearth of evidence for central predictions of the reinforcement account, e.g., that people with different reinforcement histories will, all else equal, make different moral judgments. Finally, accounting for the effect of intention within the framework requires certain assumptions which lack support. These challenges are reasonable foci for future empirical/theoretical work on the model-free/model-based framework. Copyright © 2016 Elsevier B.V. All rights reserved.

  8. Social Cognition as Reinforcement Learning: Feedback Modulates Emotion Inference.

    Science.gov (United States)

    Zaki, Jamil; Kallman, Seth; Wimmer, G Elliott; Ochsner, Kevin; Shohamy, Daphna

    2016-09-01

    Neuroscientific studies of social cognition typically employ paradigms in which perceivers draw single-shot inferences about the internal states of strangers. Real-world social inference features much different parameters: people often encounter and learn about particular social targets (e.g., friends) over time and receive feedback about whether their inferences are correct or incorrect. Here, we examined this process and, more broadly, the intersection between social cognition and reinforcement learning. Perceivers were scanned using fMRI while repeatedly encountering three social targets who produced conflicting visual and verbal emotional cues. Perceivers guessed how targets felt and received feedback about whether they had guessed correctly. Visual cues reliably predicted one target's emotion, verbal cues predicted a second target's emotion, and neither reliably predicted the third target's emotion. Perceivers successfully used this information to update their judgments over time. Furthermore, trial-by-trial learning signals, estimated using two reinforcement learning models, tracked activity in the ventral striatum and ventromedial pFC, structures associated with reinforcement learning, as well as regions associated with updating social impressions, including the TPJ. These data suggest that learning about others' emotions, like other forms of feedback learning, relies on domain-general reinforcement mechanisms as well as domain-specific social information processing.

  9. A Reinforcement-Based Learning Paradigm Increases Anatomical Learning and Retention—A Neuroeducation Study

    Science.gov (United States)

    Anderson, Sarah J.; Hecker, Kent G.; Krigolson, Olave E.; Jamniczky, Heather A.

    2018-01-01

    In anatomy education, a key hurdle to engaging in higher-level discussion in the classroom is recognizing and understanding the extensive terminology used to identify and describe anatomical structures. Given the time-limited classroom environment, seeking methods to impart this foundational knowledge to students in an efficient manner is essential. Just-in-Time Teaching (JiTT) methods incorporate pre-class exercises (typically online) meant to establish foundational knowledge in novice learners so subsequent instructor-led sessions can focus on deeper, more complex concepts. Determining how best to design and assess pre-class exercises requires a detailed examination of learning and retention in an applied educational context. Here we used electroencephalography (EEG) as a quantitative dependent variable to track learning and examine the efficacy of JiTT activities to teach anatomy. Specifically, we examined changes in the amplitude of the N250 and reward positivity event-related brain potential (ERP) components alongside behavioral performance as novice students participated in a series of computerized reinforcement-based learning modules to teach neuroanatomical structures. We found that as students learned to identify anatomical structures, the amplitude of the N250 increased and reward positivity amplitude decreased in response to positive feedback. On both a retention and a transfer exercise, when learners successfully remembered and translated their knowledge to novel images, the amplitude of the reward positivity remained decreased compared to early learning. Our findings suggest ERPs can be used as a tool to track learning, retention, and transfer of knowledge and that employing the reinforcement learning paradigm is an effective educational approach for developing anatomical expertise. PMID:29467638

  10. A Reinforcement-Based Learning Paradigm Increases Anatomical Learning and Retention—A Neuroeducation Study

    Directory of Open Access Journals (Sweden)

    Sarah J. Anderson

    2018-02-01

    Full Text Available In anatomy education, a key hurdle to engaging in higher-level discussion in the classroom is recognizing and understanding the extensive terminology used to identify and describe anatomical structures. Given the time-limited classroom environment, seeking methods to impart this foundational knowledge to students in an efficient manner is essential. Just-in-Time Teaching (JiTT) methods incorporate pre-class exercises (typically online) meant to establish foundational knowledge in novice learners so subsequent instructor-led sessions can focus on deeper, more complex concepts. Determining how best to design and assess pre-class exercises requires a detailed examination of learning and retention in an applied educational context. Here we used electroencephalography (EEG) as a quantitative dependent variable to track learning and examine the efficacy of JiTT activities to teach anatomy. Specifically, we examined changes in the amplitude of the N250 and reward positivity event-related brain potential (ERP) components alongside behavioral performance as novice students participated in a series of computerized reinforcement-based learning modules to teach neuroanatomical structures. We found that as students learned to identify anatomical structures, the amplitude of the N250 increased and reward positivity amplitude decreased in response to positive feedback. On both a retention and a transfer exercise, when learners successfully remembered and translated their knowledge to novel images, the amplitude of the reward positivity remained decreased compared to early learning. Our findings suggest ERPs can be used as a tool to track learning, retention, and transfer of knowledge and that employing the reinforcement learning paradigm is an effective educational approach for developing anatomical expertise.

  11. Human demonstrations for fast and safe exploration in reinforcement learning

    NARCIS (Netherlands)

    Schonebaum, G.K.; Junell, J.L.; van Kampen, E.

    2017-01-01

    Reinforcement learning is a promising framework for controlling complex vehicles with a high level of autonomy, since it does not need a dynamic model of the vehicle, and it is able to adapt to changing conditions. When learning from scratch, the performance of a reinforcement learning controller

  12. DYNAMIC AND INCREMENTAL EXPLORATION STRATEGY IN FUSION ADAPTIVE RESONANCE THEORY FOR ONLINE REINFORCEMENT LEARNING

    Directory of Open Access Journals (Sweden)

    Budhitama Subagdja

    2016-06-01

    Full Text Available One of the fundamental challenges in reinforcement learning is to set up a proper balance between exploration and exploitation to obtain the maximum cumulative reward in the long run. Most protocols for exploration bound the overall values to a convergent level of performance. If new knowledge is inserted or the environment suddenly changes, the issue becomes more intricate, as the exploration must be reconciled with the pre-existing knowledge. This paper presents a type of multi-channel adaptive resonance theory (ART) neural network model called fusion ART, which serves as a fuzzy approximator for reinforcement learning with inherent features that can regulate the exploration strategy. This intrinsic regulation is driven by the state of the knowledge learnt so far by the agent. The model offers stable but incremental reinforcement learning that can incorporate prior rules as bootstrap knowledge for guiding the agent to select the right action. Experiments in obstacle avoidance and navigation tasks demonstrate that, in the configuration where the agent learns from scratch, the inherent exploration model in fusion ART is comparable to the basic ε-greedy policy. On the other hand, the model is demonstrated to deal with prior knowledge and strike a balance between exploration and exploitation.
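
    For reference, the baseline ε-greedy policy mentioned at the end can be written in a few lines (a generic sketch, unrelated to the fusion ART mechanism itself):

        import random

        def epsilon_greedy(q_values, epsilon=0.1):
            # With probability epsilon pick a random action (explore);
            # otherwise pick the action with the highest estimate (exploit).
            if random.random() < epsilon:
                return random.randrange(len(q_values))
            return max(range(len(q_values)), key=lambda a: q_values[a])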

  13. Human reinforcement learning subdivides structured action spaces by learning effector-specific values.

    Science.gov (United States)

    Gershman, Samuel J; Pesaran, Bijan; Daw, Nathaniel D

    2009-10-28

    Humans and animals are endowed with a large number of effectors. Although this enables great behavioral flexibility, it presents an equally formidable reinforcement learning problem of discovering which actions are most valuable because of the high dimensionality of the action space. An unresolved question is how neural systems for reinforcement learning-such as prediction error signals for action valuation associated with dopamine and the striatum-can cope with this "curse of dimensionality." We propose a reinforcement learning framework that allows for learned action valuations to be decomposed into effector-specific components when appropriate to a task, and test it by studying to what extent human behavior and blood oxygen level-dependent (BOLD) activity can exploit such a decomposition in a multieffector choice task. Subjects made simultaneous decisions with their left and right hands and received separate reward feedback for each hand movement. We found that choice behavior was better described by a learning model that decomposed the values of bimanual movements into separate values for each effector, rather than a traditional model that treated the bimanual actions as unitary with a single value. A decomposition of value into effector-specific components was also observed in value-related BOLD signaling, in the form of lateralized biases in striatal correlates of prediction error and anticipatory value correlates in the intraparietal sulcus. These results suggest that the human brain can use decomposed value representations to "divide and conquer" reinforcement learning over high-dimensional action spaces.

  14. Evolutionary computation for reinforcement learning

    NARCIS (Netherlands)

    Whiteson, S.; Wiering, M.; van Otterlo, M.

    2012-01-01

    Algorithms for evolutionary computation, which simulate the process of natural selection to solve optimization problems, are an effective tool for discovering high-performing reinforcement-learning policies. Because they can automatically find good representations, handle continuous action spaces,

  15. Stochastic abstract policies: generalizing knowledge to improve reinforcement learning.

    Science.gov (United States)

    Koga, Marcelo L; Freire, Valdinei; Costa, Anna H R

    2015-01-01

    Reinforcement learning (RL) enables an agent to learn behavior by acquiring experience through trial-and-error interactions with a dynamic environment. However, knowledge is usually built from scratch and learning to behave may take a long time. Here, we improve the learning performance by leveraging prior knowledge; that is, the learner shows proper behavior from the beginning of a target task, using the knowledge from a set of known, previously solved, source tasks. In this paper, we argue that building stochastic abstract policies that generalize over past experiences is an effective way to provide such improvement, and that this generalization outperforms the current practice of using a library of policies. We achieve this by contributing a new algorithm, AbsProb-PI-multiple, and a framework for transferring knowledge represented as a stochastic abstract policy to new RL tasks. Stochastic abstract policies offer an effective way to encode knowledge because the abstraction they provide not only generalizes solutions but also facilitates extracting the similarities among tasks. We perform experiments in a robotic navigation environment and analyze the agent's behavior throughout the learning process, and also assess the transfer ratio for different numbers of source tasks. We compare our method with the transfer of a library of policies, and the experiments show that the use of a generalized policy produces better results by more effectively guiding the agent when learning a target task.

  16. Energy Management Strategy for a Hybrid Electric Vehicle Based on Deep Reinforcement Learning

    OpenAIRE

    Yue Hu; Weimin Li; Kun Xu; Taimoor Zahid; Feiyan Qin; Chenming Li

    2018-01-01

    An energy management strategy (EMS) is important for hybrid electric vehicles (HEVs) since it plays a decisive role on the performance of the vehicle. However, the variation of future driving conditions deeply influences the effectiveness of the EMS. Most existing EMS methods simply follow predefined rules that are not adaptive to different driving conditions online. Therefore, it is useful that the EMS can learn from the environment or driving cycle. In this paper, a deep reinforcement learn...

  17. Human reinforcement learning subdivides structured action spaces by learning effector-specific values

    OpenAIRE

    Gershman, Samuel J.; Pesaran, Bijan; Daw, Nathaniel D.

    2009-01-01

    Humans and animals are endowed with a large number of effectors. Although this enables great behavioral flexibility, it presents an equally formidable reinforcement learning problem of discovering which actions are most valuable, due to the high dimensionality of the action space. An unresolved question is how neural systems for reinforcement learning – such as prediction error signals for action valuation associated with dopamine and the striatum – can cope with this “curse of dimensionality...

  18. Reinforcement Learning Based Novel Adaptive Learning Framework for Smart Grid Prediction

    Directory of Open Access Journals (Sweden)

    Tian Li

    2017-01-01

    Full Text Available The smart grid is a promising infrastructure to supply electricity to end users in a safe and reliable manner. With the rapid increase of the share of renewable energy and controllable loads in the smart grid, its operational uncertainty has grown considerably in recent years. Forecasting is essential for the safe and economical operation of the smart grid. However, most existing forecast methods cannot cope with the smart grid because of their inability to adapt to varying operational conditions. In this paper, reinforcement learning is first exploited to develop an online learning framework for the smart grid. With its capability for multi-time-scale resolution, a wavelet neural network has been adopted in the online learning framework to yield a reinforcement learning and wavelet neural network (RLWNN) based adaptive learning scheme. Simulations on two typical prediction problems in the smart grid, wind power prediction and load forecasting, validate the effectiveness and scalability of the proposed RLWNN-based learning framework and algorithm.

  19. Reusable Reinforcement Learning via Shallow Trails.

    Science.gov (United States)

    Yu, Yang; Chen, Shi-Yong; Da, Qing; Zhou, Zhi-Hua

    2018-06-01

    Reinforcement learning has shown great success in helping learning agents accomplish tasks autonomously from environment interactions. Meanwhile, in many real-world applications, an agent needs to accomplish not only a fixed task but also a range of tasks. For this goal, an agent can learn a metapolicy over a set of training tasks that are drawn from an underlying distribution. By maximizing the total reward summed over all the training tasks, the metapolicy can then be reused to accomplish test tasks from the same distribution. However, in practice, we face two major obstacles to training and reusing metapolicies well. First, how to identify tasks that are unrelated or even opposed to each other, in order to avoid their mutual interference during training. Second, how to characterize task features, according to which a metapolicy can be reused. In this paper, we propose the MetA-Policy LEarning (MAPLE) approach that overcomes these two difficulties by introducing the shallow trail. It probes a task by running a roughly trained policy. Using the rewards of the shallow trail, MAPLE automatically groups similar tasks. Moreover, when the task parameters are unknown, the rewards of the shallow trail also serve as task features. Empirical studies on several control tasks verify that MAPLE can train metapolicies well and achieves high reward on test tasks.

  20. Instructional control of reinforcement learning: a behavioral and neurocomputational investigation.

    Science.gov (United States)

    Doll, Bradley B; Jacobs, W Jake; Sanfey, Alan G; Frank, Michael J

    2009-11-24

    Humans learn how to behave directly through environmental experience and indirectly through rules and instructions. Behavior analytic research has shown that instructions can control behavior, even when such behavior leads to sub-optimal outcomes (Hayes, S. (Ed.). 1989. Rule-governed behavior: cognition, contingencies, and instructional control. Plenum Press.). Here we examine the control of behavior through instructions in a reinforcement learning task known to depend on striatal dopaminergic function. Participants selected between probabilistically reinforced stimuli, and were (incorrectly) told that a specific stimulus had the highest (or lowest) reinforcement probability. Despite experience to the contrary, instructions drove choice behavior. We present neural network simulations that capture the interactions between instruction-driven and reinforcement-driven behavior via two potential neural circuits: one in which the striatum is inaccurately trained by instruction representations coming from prefrontal cortex/hippocampus (PFC/HC), and another in which the striatum learns the environmentally based reinforcement contingencies, but is "overridden" at decision output. Both models capture the core behavioral phenomena but, because they differ fundamentally on what is learned, make distinct predictions for subsequent behavioral and neuroimaging experiments. Finally, we attempt to distinguish between the proposed computational mechanisms governing instructed behavior by fitting a series of abstract "Q-learning" and Bayesian models to subject data. The best-fitting model supports one of the neural models, suggesting the existence of a "confirmation bias" in which the PFC/HC system trains the reinforcement system by amplifying outcomes that are consistent with instructions while diminishing inconsistent outcomes.

  1. Reinforcement Learning Based Artificial Immune Classifier

    Directory of Open Access Journals (Sweden)

    Mehmet Karakose

    2013-01-01

    Full Text Available Artificial immune systems are among the widely used methods for classification, a decision-making process. Artificial immune systems, which are based on the natural immune system, can be successfully applied to classification, optimization, recognition, and learning in real-world problems. In this study, a reinforcement learning based artificial immune classifier is proposed as a new approach. This approach uses reinforcement learning to find better antibodies with immune operators. The proposed approach offers several advantages over other methods in the literature, such as effectiveness, fewer memory cells, high accuracy, speed, and data adaptability. The performance of the proposed approach is demonstrated by simulation and experimental results using real data in Matlab and on an FPGA. Benchmark data and remote image data are used for the experimental results. Comparative results with supervised/unsupervised artificial immune systems, a negative selection classifier, and a resource-limited artificial immune classifier are given to demonstrate the effectiveness of the proposed method.

  2. Multi-agent machine learning a reinforcement approach

    CERN Document Server

    Schwartz, H M

    2014-01-01

    The book begins with a chapter on traditional methods of supervised learning, covering recursive least squares learning, mean square error methods, and stochastic approximation. Chapter 2 covers single agent reinforcement learning. Topics include learning value functions, Markov games, and TD learning with eligibility traces. Chapter 3 discusses two player games including two player matrix games with both pure and mixed strategies. Numerous algorithms and examples are presented. Chapter 4 covers learning in multi-player games, stochastic games, and Markov games, focusing on learning multi-pla

  3. Structure identification in fuzzy inference using reinforcement learning

    Science.gov (United States)

    Berenji, Hamid R.; Khedkar, Pratap

    1993-01-01

    In our previous work on the GARIC architecture, we have shown that the system can start with the surface structure of the knowledge base (i.e., the linguistic expression of the rules) and learn the deep structure (i.e., the fuzzy membership functions of the labels used in the rules) by using reinforcement learning. Assuming the surface structure, GARIC refines the fuzzy membership functions used in the consequents of the rules using a gradient descent procedure. This hybrid fuzzy logic and reinforcement learning approach can learn to balance a cart-pole system and to back up a truck to its docking location after a few trials. In this paper, we discuss how to do structure identification using reinforcement learning in fuzzy inference systems. This involves identifying both the surface and the deep structure of the knowledge base. The term set of fuzzy linguistic labels used in describing the values of each control variable must be derived. In this process, splitting a label refers to creating new labels that are more granular than the original label, and merging two labels creates a more general label. Splitting and merging of labels directly transform the structure of the action selection network used in GARIC by increasing or decreasing the number of hidden layer nodes.

  4. Development and Evaluation of Mechatronics Learning System in a Web-Based Environment

    Science.gov (United States)

    Shyr, Wen-Jye

    2011-01-01

    The development of remote laboratories suitable for reinforcing undergraduate-level teaching of mechatronics is important. For this reason, a Web-based mechatronics learning system, called RECOLAB (REmote COntrol LABoratory), for remote learning in engineering education has been developed in this study. The web-based environment is an…

  5. Reinforcement Learning Based on the Bayesian Theorem for Electricity Markets Decision Support

    DEFF Research Database (Denmark)

    Sousa, Tiago; Pinto, Tiago; Praca, Isabel

    2014-01-01

    This paper presents the applicability of a reinforcement learning algorithm based on the Bayesian theorem of probability. The proposed reinforcement learning algorithm is an advantageous and indispensable tool for ALBidS (Adaptive Learning strategic Bidding System), a multi

  6. Using a board game to reinforce learning.

    Science.gov (United States)

    Yoon, Bona; Rodriguez, Leslie; Faselis, Charles J; Liappis, Angelike P

    2014-03-01

    Experiential gaming strategies offer a variation on traditional learning. A board game was used to present synthesized content of fundamental catheter care concepts and reinforce evidence-based practices relevant to nursing. Board games are innovative educational tools that can enhance active learning. Copyright 2014, SLACK Incorporated.

  7. Exploiting Best-Match Equations for Efficient Reinforcement Learning

    NARCIS (Netherlands)

    van Seijen, Harm; Whiteson, Shimon; van Hasselt, Hado; Wiering, Marco

    This article presents and evaluates best-match learning, a new approach to reinforcement learning that trades off the sample efficiency of model-based methods with the space efficiency of model-free methods. Best-match learning works by approximating the solution to a set of best-match equations,

  8. Multi-Objective Reinforcement Learning-Based Deep Neural Networks for Cognitive Space Communications

    Science.gov (United States)

    Ferreria, Paulo Victor R.; Paffenroth, Randy; Wyglinski, Alexander M.; Hackett, Timothy M.; Bilen, Sven G.; Reinhart, Richard C.; Mortensen, Dale J.

    2017-01-01

    Future communication subsystems of space exploration missions can potentially benefit from software-defined radios (SDRs) controlled by machine learning algorithms. In this paper, we propose a novel hybrid radio resource allocation management control algorithm that integrates multi-objective reinforcement learning and deep artificial neural networks. The objective is to efficiently manage communications system resources by monitoring performance functions with common dependent variables that result in conflicting goals. The uncertainty in the performance of thousands of different possible combinations of radio parameters makes the trade-off between exploration and exploitation in reinforcement learning (RL) much more challenging for future critical space-based missions. Thus, the system should spend as little time as possible on exploring actions, and whenever it explores an action, it should perform at acceptable levels most of the time. The proposed approach enables on-line learning by interactions with the environment and restricts poor resource allocation performance through virtual environment exploration. Improvements in the multi-objective performance can be achieved via transmitter parameter adaptation on a packet basis, with poorly predicted performance promptly resulting in rejected decisions. Simulations presented in this work considered the DVB-S2 standard adaptive transmitter parameters and additional ones expected to be present in future adaptive radio systems. Performance results are provided by analysis of the proposed hybrid algorithm when operating across a satellite communication channel from Earth to GEO orbit during clear sky conditions. The proposed approach constitutes part of the core cognitive engine proof-of-concept to be delivered to the NASA Glenn Research Center SCaN Testbed located onboard the International Space Station.

  9. Tackling Error Propagation through Reinforcement Learning: A Case of Greedy Dependency Parsing

    OpenAIRE

    Le, Minh; Fokkens, Antske

    2017-01-01

    Error propagation is a common problem in NLP. Reinforcement learning explores erroneous states during training and can therefore be more robust when mistakes are made early in a process. In this paper, we apply reinforcement learning to greedy dependency parsing which is known to suffer from error propagation. Reinforcement learning improves accuracy of both labeled and unlabeled dependencies of the Stanford Neural Dependency Parser, a high performance greedy parser, while maintaining its eff...

  10. Knowledge-Based Reinforcement Learning for Data Mining

    Science.gov (United States)

    Kudenko, Daniel; Grzes, Marek

    Data Mining is the process of extracting patterns from data. Two general avenues of research in the intersecting areas of agents and data mining can be distinguished. The first approach is concerned with mining an agent’s observation data in order to extract patterns, categorize environment states, and/or make predictions of future states. In this setting, data is normally available as a batch, and the agent’s actions and goals are often independent of the data mining task. The data collection is mainly considered as a side effect of the agent’s activities. Machine learning techniques applied in such situations fall into the class of supervised learning. In contrast, the second scenario occurs where an agent is actively performing the data mining, and is responsible for the data collection itself. For example, a mobile network agent is acquiring and processing data (where the acquisition may incur a certain cost), or a mobile sensor agent is moving in a (perhaps hostile) environment, collecting and processing sensor readings. In these settings, the tasks of the agent and the data mining are highly intertwined and interdependent (or even identical). Supervised learning is not a suitable technique for these cases. Reinforcement Learning (RL) enables an agent to learn from experience (in form of reward and punishment for explorative actions) and adapt to new situations, without a teacher. RL is an ideal learning technique for these data mining scenarios, because it fits the agent paradigm of continuous sensing and acting, and the RL agent is able to learn to make decisions on the sampling of the environment which provides the data. Nevertheless, RL still suffers from scalability problems, which have prevented its successful use in many complex real-world domains. The more complex the tasks, the longer it takes a reinforcement learning algorithm to converge to a good solution. For many real-world tasks, human expert knowledge is available. For example, human

  11. Vision-based Navigation and Reinforcement Learning Path Finding for Social Robots

    OpenAIRE

    Pérez Sala, Xavier

    2010-01-01

    We propose a robust system for automatic Robot Navigation in uncontrolled environments. The system is composed of three main modules: the Artificial Vision module, the Reinforcement Learning module, and the behavior control module. The aim of the system is to allow a robot to automatically find a path that arrives at a prefixed goal. Turn and straight movements in uncontrolled environments are automatically estimated and controlled using the proposed modules. The Artificial Vi...

  12. Longitudinal investigation on learned helplessness tested under negative and positive reinforcement involving stimulus control.

    Science.gov (United States)

    Oliveira, Emileane C; Hunziker, Maria Helena

    2014-07-01

    In this study, we investigated whether (a) animals demonstrating the learned helplessness effect during an escape contingency also show learning deficits under positive reinforcement contingencies involving stimulus control and (b) exposure to positive reinforcement contingencies eliminates the learned helplessness effect under an escape contingency. Rats were initially exposed to controllable (C), uncontrollable (U) or no (N) shocks. After 24 h, they were exposed to 60 escapable shocks delivered in a shuttlebox. In the following phase, we selected from each group the four subjects that presented the most typical group pattern: no escape learning (learned helplessness effect) in Group U and escape learning in Groups C and N. All subjects were then exposed to two phases: (1) positive reinforcement for lever pressing under a multiple FR/Extinction schedule and (2) a re-test under negative reinforcement (escape). A fourth group (n=4) was exposed only to the positive reinforcement sessions. All subjects showed discrimination learning under the multiple schedule. In the escape re-test, the learned helplessness effect was maintained for three of the animals in Group U. These results suggest that the learned helplessness effect did not extend to discriminative behavior that is positively reinforced and that the learned helplessness effect did not revert for most subjects after exposure to positive reinforcement. We discuss some theoretical implications related to learned helplessness as an effect restricted to aversive contingencies and to the absence of reversion after positive reinforcement. Copyright © 2014. Published by Elsevier B.V.

  13. Adaptive Trajectory Tracking Control using Reinforcement Learning for Quadrotor

    Directory of Open Access Journals (Sweden)

    Wenjie Lou

    2016-02-01

    Full Text Available Inaccurate system parameters and unpredicted external disturbances affect the performance of non-linear controllers. In this paper, a new adaptive control algorithm within the reinforcement learning framework is proposed to stabilize a quadrotor helicopter. Based on a command-filtered non-linear control algorithm, adaptive elements are added and learned by policy-search methods. To predict the inaccurate system parameters, a new kernel-based regression learning method is provided. In addition, Policy learning by Weighting Exploration with the Returns (PoWER) and Return Weighted Regression (RWR) are utilized to learn the appropriate parameters for the adaptive elements in order to cancel the effect of external disturbance. Furthermore, numerical simulations under several conditions are performed, and the ability of adaptive trajectory-tracking control with reinforcement learning is demonstrated.

  14. Dissociable neural representations of reinforcement and belief prediction errors underlie strategic learning.

    Science.gov (United States)

    Zhu, Lusha; Mathewson, Kyle E; Hsu, Ming

    2012-01-31

    Decision-making in the presence of other competitive intelligent agents is fundamental for social and economic behavior. Such decisions require agents to behave strategically, where in addition to learning about the rewards and punishments available in the environment, they also need to anticipate and respond to actions of others competing for the same rewards. However, whereas we know much about strategic learning at both theoretical and behavioral levels, we know relatively little about the underlying neural mechanisms. Here, we show using a multi-strategy competitive learning paradigm that strategic choices can be characterized by extending the reinforcement learning (RL) framework to incorporate agents' beliefs about the actions of their opponents. Furthermore, using this characterization to generate putative internal values, we used model-based functional magnetic resonance imaging to investigate neural computations underlying strategic learning. We found that the distinct notions of prediction errors derived from our computational model are processed in a partially overlapping but distinct set of brain regions. Specifically, we found that the RL prediction error was correlated with activity in the ventral striatum. In contrast, activity in the ventral striatum, as well as the rostral anterior cingulate (rACC), was correlated with a previously uncharacterized belief-based prediction error. Furthermore, activity in rACC reflected individual differences in degree of engagement in belief learning. These results suggest a model of strategic behavior where learning arises from interaction of dissociable reinforcement and belief-based inputs.

  15. Bio-robots automatic navigation with graded electric reward stimulation based on Reinforcement Learning.

    Science.gov (United States)

    Zhang, Chen; Sun, Chao; Gao, Liqiang; Zheng, Nenggan; Chen, Weidong; Zheng, Xiaoxiang

    2013-01-01

    Bio-robots based on brain-computer interfaces (BCI) suffer from a failure to consider the characteristics of the animals in navigation. This paper proposes a new method for bio-robots' automatic navigation that combines a reward-generating algorithm based on Reinforcement Learning (RL) with the learning intelligence of animals. Given a graded electrical reward, the animal, e.g., a rat, intends to seek the maximum reward while exploring an unknown environment. Since the rat has excellent spatial recognition, the rat-robot and the RL algorithm can converge on an optimal route by co-learning. This work provides significant inspiration for the practical development of bio-robot navigation with hybrid intelligence.

  16. How we learn to make decisions: rapid propagation of reinforcement learning prediction errors in humans.

    Science.gov (United States)

    Krigolson, Olav E; Hassall, Cameron D; Handy, Todd C

    2014-03-01

    Our ability to make decisions is predicated upon our knowledge of the outcomes of the actions available to us. Reinforcement learning theory posits that actions followed by a reward or punishment acquire value through the computation of prediction errors, i.e., discrepancies between the predicted and the actual reward. A multitude of neuroimaging studies have demonstrated that rewards and punishments evoke neural responses that appear to reflect reinforcement learning prediction errors [e.g., Krigolson, O. E., Pierce, L. J., Holroyd, C. B., & Tanaka, J. W. Learning to become an expert: Reinforcement learning and the acquisition of perceptual expertise. Journal of Cognitive Neuroscience, 21, 1833-1840, 2009; Bayer, H. M., & Glimcher, P. W. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47, 129-141, 2005; O'Doherty, J. P. Reward representations and reward-related learning in the human brain: Insights from neuroimaging. Current Opinion in Neurobiology, 14, 769-776, 2004; Holroyd, C. B., & Coles, M. G. H. The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109, 679-709, 2002]. Here, we used the event-related brain potential (ERP) technique to demonstrate not only that rewards elicit a neural response akin to a prediction error but also that this signal rapidly diminished and propagated to the time of choice presentation with learning. Specifically, in a simple, learnable gambling task, we show that novel rewards elicited a feedback error-related negativity that rapidly decreased in amplitude with learning. Furthermore, we demonstrate the existence of a reward positivity at choice presentation, a previously unreported ERP component that has a similar timing and topography as the feedback error-related negativity and that increased in amplitude with learning. The pattern of results we observed mirrored the output of a computational model that we implemented to compute reward
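
    In the notation commonly used in this literature (an assumption, not quoted from the record), the reward prediction error on trial t is the difference between the received and the predicted reward,

        \delta_t = r_t - \hat{r}_t,

    and learning amounts to updating the prediction in proportion to that error, \hat{r}_{t+1} = \hat{r}_t + \alpha \delta_t, which is consistent with the diminishing feedback signal reported above.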

  17. The "proactive" model of learning: Integrative framework for model-free and model-based reinforcement learning utilizing the associative learning-based proactive brain concept.

    Science.gov (United States)

    Zsuga, Judit; Biro, Klara; Papp, Csaba; Tajti, Gabor; Gesztelyi, Rudolf

    2016-02-01

    Reinforcement learning (RL) is a powerful concept underlying forms of associative learning governed by the use of a scalar reward signal, with learning taking place if expectations are violated. RL may be assessed using model-based and model-free approaches. Model-based reinforcement learning involves the amygdala, the hippocampus, and the orbitofrontal cortex (OFC). The model-free system involves the pedunculopontine-tegmental nucleus (PPTgN), the ventral tegmental area (VTA) and the ventral striatum (VS). Based on the functional connectivity of the VS, the model-free and model-based RL systems center on the VS, which computes value by integrating model-free signals (received as reward prediction errors) and model-based, reward-related input. Using the concept of a reinforcement learning agent, we propose that the VS serves as the value-function component of the RL agent. Regarding the model utilized for model-based computations, we turn to the proactive brain concept, which offers a ubiquitous function for the default network based on its large functional overlap with contextual associative areas. Hence, by means of the default network, the brain continuously organizes its environment into context frames, enabling the formulation of analogy-based associations that are turned into predictions of what to expect. The OFC integrates reward-related information into context frames by computing reward expectation, compiling the stimulus-reward and context-reward information offered by the amygdala and hippocampus, respectively. Furthermore, we suggest that the integration of model-based reward expectations into the value signal is further supported by efferents of the OFC that reach structures canonical for model-free learning (e.g., the PPTgN, VTA, and VS). (c) 2016 APA, all rights reserved.

  18. The drift diffusion model as the choice rule in reinforcement learning.

    Science.gov (United States)

    Pedersen, Mads Lund; Frank, Michael J; Biele, Guido

    2017-08-01

    Current reinforcement-learning models often assume simplified decision processes that do not fully reflect the dynamic complexities of choice processes. Conversely, sequential-sampling models of decision making account for both choice accuracy and response time, but assume that decisions are based on static decision values. To combine these two computational models of decision making and learning, we implemented reinforcement-learning models in which the drift diffusion model describes the choice process, thereby capturing both within- and across-trial dynamics. To exemplify the utility of this approach, we quantitatively fit data from a common reinforcement-learning paradigm using hierarchical Bayesian parameter estimation, and compared model variants to determine whether they could capture the effects of stimulant medication in adult patients with attention-deficit hyperactivity disorder (ADHD). The model with the best relative fit provided a good description of the learning process, choices, and response times. A parameter recovery experiment showed that the hierarchical Bayesian modeling approach enabled accurate estimation of the model parameters. The model approach described here, using simultaneous estimation of reinforcement-learning and drift diffusion model parameters, shows promise for revealing new insights into the cognitive and neural mechanisms of learning and decision making, as well as the alteration of such processes in clinical groups.
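    The core idea can be sketched directly: a Q-learning rule supplies trial-wise values, and a drift diffusion process whose drift is the scaled value difference generates both choices and response times. The reward probabilities, parameter values, and names below are illustrative assumptions, not the fitted hierarchical model.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    alpha, scale, bound, t0, dt = 0.1, 2.0, 1.0, 0.3, 0.001
    p_reward = [0.8, 0.2]                  # option 0 pays off more often
    Q = np.zeros(2)

    def ddm_trial(drift):
        """Diffuse to +/- bound/2; return (choice, response time)."""
        x, t = 0.0, 0.0
        while abs(x) < bound / 2:
            x += drift * dt + rng.normal(0, np.sqrt(dt))
            t += dt
        return (0 if x > 0 else 1), t0 + t

    for trial in range(200):
        drift = scale * (Q[0] - Q[1])      # drift from the value difference
        choice, rt = ddm_trial(drift)
        r = float(rng.random() < p_reward[choice])
        Q[choice] += alpha * (r - Q[choice])   # RL update of the values
    ```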

  19. Intrinsically motivated reinforcement learning for human-robot interaction in the real-world.

    Science.gov (United States)

    Qureshi, Ahmed Hussain; Nakamura, Yutaka; Yoshikawa, Yuichiro; Ishiguro, Hiroshi

    2018-03-26

    For natural social human-robot interaction, it is essential for a robot to learn human-like social skills. However, learning such skills is notoriously hard due to the limited availability of direct instruction from people. In this paper, we propose an intrinsically motivated reinforcement learning framework in which an agent receives intrinsic motivation-based rewards through an action-conditional predictive model. Using the proposed method, the robot learned social skills from human-robot interaction experiences gathered in real, uncontrolled environments. The results indicate that the robot not only acquired human-like social skills but also made more human-like decisions, on a test dataset, than a robot that received direct rewards for task achievement. Copyright © 2018 Elsevier Ltd. All rights reserved.
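    One common way to realize such intrinsic motivation, sketched here with assumed dimensions and a simple linear forward model, is to reward the agent in proportion to the prediction error of an action-conditional model of the next state; this illustrates the idea and is not the paper's architecture.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    state_dim, action_dim = 8, 3           # assumed sizes
    W = rng.normal(0, 0.1, (state_dim, state_dim + action_dim))
    lr, beta = 0.01, 0.5

    def intrinsic_reward(s, a_onehot, s_next):
        global W
        x = np.concatenate([s, a_onehot])
        err = s_next - W @ x               # forward-model prediction error
        W += lr * np.outer(err, x)         # keep improving the model online
        return beta * float(err @ err)     # surprise as intrinsic reward

    s, s2 = rng.normal(size=state_dim), rng.normal(size=state_dim)
    print(intrinsic_reward(s, np.eye(action_dim)[0], s2))
    ```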

  20. Embedded Incremental Feature Selection for Reinforcement Learning

    Science.gov (United States)

    2012-05-01

    Prior to this work, feature selection for reinforcement learning has focused on linear value function approximation (Kolter and Ng, 2009; Parr et al.). …

  1. Working Memory and Reinforcement Schedule Jointly Determine Reinforcement Learning in Children: Potential Implications for Behavioral Parent Training

    Directory of Open Access Journals (Sweden)

    Elien Segers

    2018-03-01

    Full Text Available Introduction: Behavioral Parent Training (BPT) is often provided for childhood psychiatric disorders. These disorders have been shown to be associated with working memory impairments. BPT is based on operant learning principles, yet how operant principles shape behavior (through the partial reinforcement (PRF) extinction effect, i.e., the greater resistance to extinction created when behavior is reinforced partially rather than continuously) and the potential role of working memory therein is scarcely studied in children. This study explored the PRF extinction effect and the role of working memory therein using experimental tasks in typically developing children. Methods: Ninety-seven children (age 6–10) completed a working memory task and an operant learning task, in which children acquired a response-sequence rule under either continuous or partial reinforcement (120 trials), followed by an extinction phase (80 trials). Data of 88 children were used for analysis. Results: The PRF extinction effect was confirmed: we observed slower acquisition and extinction in the PRF condition as compared to the continuous reinforcement (CRF) condition. Working memory was negatively related to acquisition but not extinction performance. Conclusion: Both reinforcement contingencies and working memory relate to acquisition performance. A potential implication for BPT is that decreasing working memory load may enhance the chance of optimal learning through reinforcement.

  2. Vicarious reinforcement learning signals when instructing others.

    Science.gov (United States)

    Apps, Matthew A J; Lesage, Elise; Ramnani, Narender

    2015-02-18

    Reinforcement learning (RL) theory posits that learning is driven by discrepancies between the predicted and actual outcomes of actions (prediction errors [PEs]). In social environments, learning is often guided by similar RL mechanisms. For example, teachers monitor the actions of students and provide feedback to them. This feedback evokes PEs in students that guide their learning. We report the first study that investigates the neural mechanisms that underpin RL signals in the brain of a teacher. Neurons in the anterior cingulate cortex (ACC) signal PEs when learning from the outcomes of one's own actions but also signal information when outcomes are received by others. Does a teacher's ACC signal PEs when monitoring a student's learning? Using fMRI, we studied brain activity in human subjects (teachers) as they taught a confederate (student) action-outcome associations by providing positive or negative feedback. We examined activity time-locked to the students' responses, when teachers infer student predictions and know actual outcomes. We fitted a RL-based computational model to the behavior of the student to characterize their learning, and examined whether a teacher's ACC signals when a student's predictions are wrong. In line with our hypothesis, activity in the teacher's ACC covaried with the PE values in the model. Additionally, activity in the teacher's insula and ventromedial prefrontal cortex covaried with the predicted value according to the student. Our findings highlight that the ACC signals PEs vicariously for others' erroneous predictions, when monitoring and instructing their learning. These results suggest that RL mechanisms, processed vicariously, may underpin and facilitate teaching behaviors. Copyright © 2015 Apps et al.

  3. Visual reinforcement shapes eye movements in visual search.

    Science.gov (United States)

    Paeye, Céline; Schütz, Alexander C; Gegenfurtner, Karl R

    2016-08-01

    We use eye movements to gain information about our visual environment; this information can indirectly be used to affect the environment. Whereas eye movements are affected by explicit rewards such as points or money, it is not clear whether the information gained by finding a hidden target has a similar reward value. Here we tested whether finding a visual target can reinforce eye movements in visual search performed in a noise background, which conforms to natural scene statistics and contains a large number of possible target locations. First we tested whether presenting the target more often in one specific quadrant would modify eye movement search behavior. Surprisingly, participants did not learn to search for the target more often in high probability areas. Presumably, participants could not learn the reward structure of the environment. In two subsequent experiments we used a gaze-contingent display to gain full control over the reinforcement schedule. The target was presented more often after saccades into a specific quadrant or a specific direction. The proportions of saccades meeting the reinforcement criteria increased considerably, and participants matched their search behavior to the relative reinforcement rates of targets. Reinforcement learning seems to serve as the mechanism to optimize search behavior with respect to the statistics of the task.

  4. Efficient abstraction selection in reinforcement learning

    NARCIS (Netherlands)

    Seijen, H. van; Whiteson, S.; Kester, L.

    2013-01-01

    This paper introduces a novel approach for abstraction selection in reinforcement learning problems modelled as factored Markov decision processes (MDPs), for which a state is described via a set of state components. In abstraction selection, an agent must choose an abstraction from a set of candidate abstractions.

  5. Reinforcement Learning in the Game of Othello: Learning Against a Fixed Opponent and Learning from Self-Play

    NARCIS (Netherlands)

    van der Ree, Michiel; Wiering, Marco

    2013-01-01

    This paper compares three strategies in using reinforcement learning algorithms to let an artificial agent learn to play the game of Othello. The three strategies that are compared are: learning by self-play, learning from playing against a fixed opponent, and learning from playing against a fixed opponent while learning from self-play.

  6. Neural Control of a Tracking Task via Attention-Gated Reinforcement Learning for Brain-Machine Interfaces.

    Science.gov (United States)

    Wang, Yiwen; Wang, Fang; Xu, Kai; Zhang, Qiaosheng; Zhang, Shaomin; Zheng, Xiaoxiang

    2015-05-01

    Reinforcement learning (RL)-based brain-machine interfaces (BMIs) enable the user to learn from the environment through interaction and complete the task without a desired signal, which is promising for clinical applications. Previous studies exploited Q-learning techniques to discriminate neural states into simple directional actions, given the trial's initial timing. However, the movements in BMI applications can be quite complicated, and the action timing explicitly shows the intention of when to move. The rich action set and the corresponding neural states form a large state-action space, imposing generalization difficulties on Q-learning. In this paper, we propose to adopt attention-gated reinforcement learning (AGREL) as a new learning scheme for BMIs to adaptively decode high-dimensional neural activity into seven distinct movements (directional moves, holding, and resting), owing to its efficient weight updating. We apply AGREL to neural data recorded from M1 of a monkey to directly predict a seven-action set in a time sequence and reconstruct the trajectory of a center-out task. Compared to Q-learning techniques, AGREL improved the target acquisition rate to 90.16% on average, with faster convergence and more stability in following neural activity over multiple days, indicating the potential to achieve better online decoding performance for more complicated BMI tasks.
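    A much-simplified sketch of an attention-gated update in the AGREL spirit: only the synapses feeding the selected action are modified, gated by a reward prediction error. The single-layer form, layer sizes, and the use of the softmax probability as the reward expectation are illustrative simplifications, not the paper's decoder.

    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    n_features, n_actions, lr = 32, 7, 0.05   # e.g., 7 movement classes
    W = rng.normal(0, 0.1, (n_actions, n_features))

    def step(x, correct_action):
        logits = W @ x
        p = np.exp(logits - logits.max()); p /= p.sum()
        a = rng.choice(n_actions, p=p)
        r = 1.0 if a == correct_action else 0.0
        delta = r - p[a]                   # reward prediction error
        W[a] += lr * delta * x             # gated: chosen action's row only
        return r

    for t in range(1000):                  # toy stream of feature vectors
        step(rng.normal(size=n_features), t % n_actions)
    ```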

  7. Adolescent-specific patterns of behavior and neural activity during social reinforcement learning.

    Science.gov (United States)

    Jones, Rebecca M; Somerville, Leah H; Li, Jian; Ruberry, Erika J; Powers, Alisa; Mehta, Natasha; Dyke, Jonathan; Casey, B J

    2014-06-01

    Humans are sophisticated social beings. Social cues from others are exceptionally salient, particularly during adolescence. Understanding how adolescents interpret and learn from variable social signals can provide insight into the observed shift in social sensitivity during this period. The present study tested 120 participants between the ages of 8 and 25 years on a social reinforcement learning task where the probability of receiving positive social feedback was parametrically manipulated. Seventy-eight of these participants completed the task during fMRI scanning. Modeling trial-by-trial learning, children and adults showed higher positive learning rates than did adolescents, and adolescents demonstrated less differentiation in their reaction times for peers who provided more positive feedback. Forming expectations about receiving positive social reinforcement correlated with neural activity within the medial prefrontal cortex and ventral striatum across age. Adolescents, unlike children and adults, showed greater insular activity during positive prediction error learning and increased activity in the supplementary motor cortex and the putamen when receiving positive social feedback regardless of the expected outcome, suggesting that peer approval may motivate adolescents toward action. While different amounts of positive social reinforcement enhanced learning in children and adults, all positive social reinforcement equally motivated adolescents. Together, these findings indicate that sensitivity to peer approval during adolescence goes beyond simple reinforcement theory accounts and suggest possible explanations for how peers may motivate adolescent behavior.

  8. Time representation in reinforcement learning models of the basal ganglia

    Directory of Open Access Journals (Sweden)

    Samuel Joseph Gershman

    2014-01-01

    Full Text Available Reinforcement learning models have been influential in understanding many aspects of basal ganglia function, from reward prediction to action selection. Time plays an important role in these models, but there is still no theoretical consensus about what kind of time representation is used by the basal ganglia. We review several theoretical accounts and their supporting evidence. We then discuss the relationship between reinforcement learning models and the timing mechanisms that have been attributed to the basal ganglia. We hypothesize that a single computational system may underlie both reinforcement learning and interval timing—the perception of duration in the range of seconds to hours. This hypothesis, which extends earlier models by incorporating a time-sensitive action selection mechanism, may have important implications for understanding disorders like Parkinson's disease in which both decision making and timing are impaired.
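    One concrete candidate the review discusses is the "complete serial compound" (tapped delay line) code, in which each post-cue time step has its own feature. The toy TD(0) model below, with an assumed trial length and reward time, learns a value that ramps toward the reward, so the prediction error migrates back toward the cue.

    ```python
    import numpy as np

    T, alpha, gamma, reward_time = 15, 0.2, 0.98, 10
    w = np.zeros(T)                  # one value weight per post-cue step

    for trial in range(300):
        for t in range(T):
            V_t = w[t]
            V_next = w[t + 1] if t + 1 < T else 0.0
            r = 1.0 if t == reward_time else 0.0
            delta = r + gamma * V_next - V_t   # TD error at step t
            w[t] += alpha * delta

    print(np.round(w, 2))            # value ramps up to the reward time
    ```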

  9. Safe Exploration of State and Action Spaces in Reinforcement Learning

    OpenAIRE

    Garcia, Javier; Fernandez, Fernando

    2014-01-01

    In this paper, we consider the important problem of safe exploration in reinforcement learning. While reinforcement learning is well-suited to domains with complex transition dynamics and high-dimensional state-action spaces, an additional challenge is posed by the need for safe and efficient exploration. Traditional exploration techniques are not particularly useful for solving dangerous tasks, where the trial and error process may lead to the selection of actions whose execution in some sta...

  10. Adversarial Reinforcement Learning in a Cyber Security Simulation

    OpenAIRE

    Elderman, Richard; Pater, Leon; Thie, Albert; Drugan, Madalina; Wiering, Marco

    2017-01-01

    This paper focuses on cyber-security simulations in networks modeled as a Markov game with incomplete information and stochastic elements. The resulting game is an adversarial sequential decision-making problem played by two agents, the attacker and the defender. Each agent pits a reinforcement learning technique, such as neural networks, Monte Carlo learning, or Q-learning, against the other, and we examine their effectiveness against learning opponents. The results showed that Monte Carlo lear...

  11. Tackling Error Propagation through Reinforcement Learning: A Case of Greedy Dependency Parsing

    NARCIS (Netherlands)

    Le, M.N.; Fokkens, A.S.

    Error propagation is a common problem in NLP. Reinforcement learning explores erroneous states during training and can therefore be more robust when mistakes are made early in a process. In this paper, we apply reinforcement learning to greedy dependency parsing, which is known to suffer from error propagation.

  12. The Computational Development of Reinforcement Learning during Adolescence.

    Directory of Open Access Journals (Sweden)

    Stefano Palminteri

    2016-06-01

    Full Text Available Adolescence is a period of life characterised by changes in learning and decision-making. Learning and decision-making do not rely on a unitary system, but instead require the coordination of different cognitive processes that can be mathematically formalised as dissociable computational modules. Here, we aimed to trace the developmental time-course of the computational modules responsible for learning from reward or punishment, and learning from counterfactual feedback. Adolescents and adults carried out a novel reinforcement learning paradigm in which participants learned the association between cues and probabilistic outcomes, where the outcomes differed in valence (reward versus punishment) and feedback was either partial or complete (either the outcome of the chosen option only, or the outcomes of both the chosen and unchosen options, were displayed). Computational strategies changed during development: whereas adolescents' behaviour was better explained by a basic reinforcement learning algorithm, adults' behaviour integrated increasingly complex computational features, namely a counterfactual learning module (enabling enhanced performance in the presence of complete feedback) and a value contextualisation module (enabling symmetrical reward and punishment learning). Unlike adults, adolescent performance did not benefit from counterfactual (complete) feedback. In addition, while adults learned symmetrically from both reward and punishment, adolescents learned from reward but were less likely to learn from punishment. This tendency to rely on rewards and not to consider alternative consequences of actions might contribute to our understanding of decision-making in adolescence.
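    The two adult-specific modules can be written down compactly. In the assumed sketch below, a counterfactual update learns from the unchosen option when complete feedback is available, and a contextualisation term centres outcomes on the context's running average, making reward and punishment learning symmetrical; the rates and centring scheme are illustrative, not the study's fitted model.

    ```python
    import numpy as np

    alpha = 0.2
    Q = np.zeros(2)          # option values within one context
    ctx = 0.0                # running average outcome of the context

    def update(chosen, r_chosen, r_unchosen=None):
        global ctx
        seen = [r for r in (r_chosen, r_unchosen) if r is not None]
        ctx += alpha * (np.mean(seen) - ctx)
        # learn centred (relative) outcomes rather than absolute ones
        Q[chosen] += alpha * ((r_chosen - ctx) - Q[chosen])
        if r_unchosen is not None:          # counterfactual module
            Q[1 - chosen] += alpha * ((r_unchosen - ctx) - Q[1 - chosen])

    update(0, 1.0, -1.0)     # complete feedback: both options updated
    update(1, -1.0)          # partial feedback: chosen option only
    ```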

  13. Reinforcement learning agents providing advice in complex video games

    Science.gov (United States)

    Taylor, Matthew E.; Carboni, Nicholas; Fachantidis, Anestis; Vlahavas, Ioannis; Torrey, Lisa

    2014-01-01

    This article introduces a teacher-student framework for reinforcement learning, synthesising and extending material that appeared in conference proceedings [Torrey, L., & Taylor, M. E. (2013). Teaching on a budget: Agents advising agents in reinforcement learning. Proceedings of the international conference on autonomous agents and multiagent systems] and in a non-archival workshop paper [Carboni, N., & Taylor, M. E. (2013, May). Preliminary results for 1 vs. 1 tactics in StarCraft. Proceedings of the adaptive and learning agents workshop (at AAMAS-13)]. In this framework, a teacher agent instructs a student agent by suggesting actions the student should take as it learns. However, the teacher may only give such advice a limited number of times. We present several novel algorithms that teachers can use to budget their advice effectively, and we evaluate them in two complex video games: StarCraft and Pac-Man. Our results show that the same amount of advice, given at different moments, can have different effects on student learning, and that teachers can significantly affect student learning even when students use different learning methods and state representations.
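    One budgeting idea in this line of work, often called importance advising, can be sketched as follows: the teacher spends a unit of its advice budget only in states where the gap between its best and worst Q-values is large. The function below is a hedged illustration; the names, the dict-of-lists Q representation, and the threshold are assumptions.

    ```python
    def maybe_advise(teacher_q, state, budget, threshold=0.5):
        """Return (advised action or None, remaining budget)."""
        if budget <= 0:
            return None, budget
        qs = teacher_q[state]              # teacher's Q-values in this state
        importance = max(qs) - min(qs)     # how much the choice matters here
        if importance >= threshold:
            return qs.index(max(qs)), budget - 1   # spend one advice unit
        return None, budget

    teacher_q = {"s": [0.1, 0.9, 0.3]}     # toy example
    advice, budget = maybe_advise(teacher_q, "s", budget=10)
    ```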

  14. A multiplicative reinforcement learning model capturing learning dynamics and interindividual variability in mice

    OpenAIRE

    Bathellier, Brice; Tee, Sui Poh; Hrovat, Christina; Rumpel, Simon

    2013-01-01

    Learning speed can differ strongly across individuals, in humans as well as in animals. Here, we measured learning speed in mice performing a discrimination task and developed a theoretical model based on the reinforcement learning framework to account for differences between individual mice. We found that, when using a multiplicative learning rule, the starting connectivity values of the model strongly determine the shape of the learning curves. This is in contrast to current learning models ...

  15. BOOK REVIEW STUDENT-TEACHER INTERACTION IN ONLINE LEARNING ENVIRONMENTS

    Directory of Open Access Journals (Sweden)

    Harun SERPIL

    2017-04-01

    Full Text Available As online learning environments do not lend themselves to face-to-face interaction between teachers and students, it is essential to understand how to ensure healthy social presence in online learning. This book provides a useful selection of both commonly used and recently developed theories by discussing current research and giving examples of social presence in latest Online Learning Environments (OLEs. The book examines how the appropriate use of technological tools can relate instructors, peers, and course content. The reports on successful implementations are reinforced with research involving pre-service teachers. Both experienced and inexperienced educators will benefit by being informed about the effective use of many valuable tools exemplified here. The last six chapters present an array of new models that support social presence, and demonstrate how traditional paradigms can be used to create online social presence.

  16. Reinforcement Learning Explains Conditional Cooperation and Its Moody Cousin.

    Directory of Open Access Journals (Sweden)

    Takahiro Ezaki

    2016-07-01

    Full Text Available Direct reciprocity, or repeated interaction, is a main mechanism to sustain cooperation under social dilemmas involving two individuals. For larger groups and networks, which are probably more relevant to understanding and engineering our society, experiments employing repeated multiplayer social dilemma games have suggested that humans often show conditional cooperation behavior and its moody variant. The mechanisms underlying these behaviors largely remain unclear. Here we provide a proximate account of this behavior by showing that individuals adopting a type of reinforcement learning, called aspiration learning, phenomenologically behave as conditional cooperators. By definition, individuals are satisfied if and only if the obtained payoff is larger than a fixed aspiration level. They reinforce actions that have resulted in satisfactory outcomes and anti-reinforce those yielding unsatisfactory outcomes. The results obtained in the present study are general in that they explain extant experimental results obtained for both so-called moody and non-moody conditional cooperation, prisoner's dilemma and public goods games, and well-mixed groups and networks. Different from previous theory, individuals are assumed to have no access to information about what other individuals are doing, such that they cannot explicitly use conditional cooperation rules. In this sense, myopic aspiration learning, in which the unconditional propensity of cooperation is modulated at every discrete time step, explains the conditional behavior of humans. Aspiration learners showing (moody) conditional cooperation obeyed a noisy GRIM-like strategy. This differs from Pavlov, a reinforcement learning strategy promoting mutual cooperation in two-player situations.
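    Aspiration learning of the kind described can be captured by a Bush-Mosteller-style rule: reinforce the action just taken if the payoff met the aspiration level, and anti-reinforce it otherwise. The prisoner's dilemma payoffs, aspiration level, and learning rate below are illustrative assumptions, not the paper's fitted parameters.

    ```python
    import numpy as np

    rng = np.random.default_rng(4)
    R, T, S, P = 3.0, 5.0, 0.0, 1.0    # standard PD payoffs
    aspiration, lr = 2.0, 0.3
    p_coop = 0.5                       # unconditional cooperation propensity

    def play(opponent_cooperates):
        global p_coop
        i_cooperate = rng.random() < p_coop
        payoff = (R if opponent_cooperates else S) if i_cooperate \
                 else (T if opponent_cooperates else P)
        satisfied = payoff >= aspiration
        # move toward repeating a satisfying action, away from an
        # unsatisfying one
        target = 1.0 if i_cooperate == satisfied else 0.0
        p_coop += lr * (target - p_coop)
        return i_cooperate

    for t in range(100):
        play(opponent_cooperates=bool(rng.random() < 0.5))
    ```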

  17. Reinforcement Learning for a New Piano Mover

    Directory of Open Access Journals (Sweden)

    Yuko Ishiwaka

    2005-08-01

    Full Text Available We attempt to achieve cooperative behavior among autonomous decentralized agents constructed via Q-learning, a type of reinforcement learning. Specifically, we examine the piano mover's problem. We propose a multi-agent architecture comprising a training agent, learning agents, and an intermediate agent. The learning agents are heterogeneous and can communicate with each other. The movement of an object by the three kinds of agent depends on the composition of the actions of the learning agents. By learning its own shape through the learning agents, the object is expected to avoid obstacles. We simulate the proposed method in a two-dimensional continuous world. The results obtained in the present investigation reveal the effectiveness of the proposed method.

  18. Place preference and vocal learning rely on distinct reinforcers in songbirds.

    Science.gov (United States)

    Murdoch, Don; Chen, Ruidong; Goldberg, Jesse H

    2018-04-30

    In reinforcement learning (RL) agents are typically tasked with maximizing a single objective function such as reward. But it remains poorly understood how agents might pursue distinct objectives at once. In machines, multiobjective RL can be achieved by dividing a single agent into multiple sub-agents, each of which is shaped by agent-specific reinforcement, but it remains unknown if animals adopt this strategy. Here we use songbirds to test if navigation and singing, two behaviors with distinct objectives, can be differentially reinforced. We demonstrate that strobe flashes aversively condition place preference but not song syllables. Brief noise bursts aversively condition song syllables but positively reinforce place preference. Thus distinct behavior-generating systems, or agencies, within a single animal can be shaped by correspondingly distinct reinforcement signals. Our findings suggest that spatially segregated vocal circuits can solve a credit assignment problem associated with multiobjective learning.

  19. A reward optimization method based on action subrewards in hierarchical reinforcement learning.

    Science.gov (United States)

    Fu, Yuchen; Liu, Quan; Ling, Xionghong; Cui, Zhiming

    2014-01-01

    Reinforcement learning (RL) is a kind of interactive learning method whose main characteristics are "trial and error" and "related reward." A hierarchical reinforcement learning method based on action subrewards is proposed to address the "curse of dimensionality," in which the state space grows exponentially with the number of features, leading to low convergence speed. The method can greatly reduce the state space and choose actions purposefully and efficiently, so as to optimize the reward function and enhance convergence speed. Applied to online learning in the game of Tetris, the experimental results show that convergence speed is evidently enhanced by the new method, which combines a hierarchical reinforcement learning algorithm with action subrewards. The "curse of dimensionality" problem is also solved to a certain extent by the hierarchical method. The performance under different parameters is compared and analyzed as well.
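    In its simplest form, the action-subreward idea amounts to shaping each primitive Q-update with a small progress term for the current subtask. The helper below is an illustrative sketch with assumed names and a dict-of-dicts Q-table, not the authors' implementation.

    ```python
    def q_update(Q, s, a, s_next, global_r, subgoal_progress,
                 alpha=0.1, gamma=0.9, w=0.1):
        # shaped reward: task reward plus a small action subreward for
        # progress on the current subtask
        r = global_r + w * subgoal_progress
        best_next = max(Q[s_next].values()) if Q[s_next] else 0.0
        Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])

    Q = {"s0": {"a": 0.0}, "s1": {}}       # toy Q-table
    q_update(Q, "s0", "a", "s1", global_r=0.0, subgoal_progress=1.0)
    ```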

  1. Joint Extraction of Entities and Relations Using Reinforcement Learning and Deep Learning.

    Science.gov (United States)

    Feng, Yuntian; Zhang, Hongjun; Hao, Wenning; Chen, Gang

    2017-01-01

    We use both reinforcement learning and deep learning to simultaneously extract entities and relations from unstructured texts. For reinforcement learning, we model the task as a two-step decision process. Deep learning is used to automatically capture the most important information from unstructured texts, which represents the state in the decision process. By designing the reward function per step, our proposed method can pass the information of entity extraction to relation extraction and obtain feedback in order to extract entities and relations simultaneously. Firstly, we use bidirectional LSTM to model the context information, which realizes preliminary entity extraction. On the basis of the extraction results, an attention-based method can represent the sentences that include the target entity pair to generate the initial state in the decision process. Then we use Tree-LSTM to represent relation mentions to generate the transition state in the decision process. Finally, we employ the Q-Learning algorithm to get the control policy π in the two-step decision process. Experiments on ACE2005 demonstrate that our method attains better performance than the state-of-the-art method and gets a 2.4% increase in recall score.

  2. Multiagent-Based Simulation of Temporal-Spatial Characteristics of Activity-Travel Patterns Using Interactive Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Min Yang

    2014-01-01

    Full Text Available We propose a multiagent-based reinforcement learning algorithm in which the interactions between travelers and the environment are considered in order to simulate the temporal-spatial characteristics of activity-travel patterns in a city. Road congestion degree is added to the reinforcement learning algorithm as a medium that passes the influence of one traveler's decision to others. Meanwhile, the agents used in the algorithm are initialized from typical activity patterns extracted from the travel survey diary data of Shangyu city in China. In the simulation, both macroscopic activity-travel characteristics, such as the spatial-temporal distribution of traffic flow, and microscopic characteristics, such as the activity-travel schedules of each agent, are obtained. Comparing the simulation results with the survey data, we find that the deviation of the peak-hour traffic flow is less than 5%, while the correlation of the simulated versus surveyed location choice distribution is over 0.9.

  3. Pleasurable music affects reinforcement learning according to the listener

    Science.gov (United States)

    Gold, Benjamin P.; Frank, Michael J.; Bogert, Brigitte; Brattico, Elvira

    2013-01-01

    Mounting evidence links the enjoyment of music to brain areas implicated in emotion and the dopaminergic reward system. In particular, dopamine release in the ventral striatum seems to play a major role in the rewarding aspect of music listening. Striatal dopamine also influences reinforcement learning, such that subjects with greater dopamine efficacy learn better to approach rewards while those with lesser dopamine efficacy learn better to avoid punishments. In this study, we explored the practical implications of musical pleasure through its ability to facilitate reinforcement learning via non-pharmacological dopamine elicitation. Subjects from a wide variety of musical backgrounds chose a pleasurable and a neutral piece of music from an experimenter-compiled database, and then listened to one or both of these pieces (according to pseudo-random group assignment) as they performed a reinforcement learning task dependent on dopamine transmission. We assessed musical backgrounds as well as typical listening patterns with the new Helsinki Inventory of Music and Affective Behaviors (HIMAB), and separately investigated behavior for the training and test phases of the learning task. Subjects with more musical experience trained better with neutral music and tested better with pleasurable music, while those with less musical experience exhibited the opposite effect. HIMAB results regarding listening behaviors and subjective music ratings indicate that these effects arose from different listening styles: namely, more affective listening in non-musicians and more analytical listening in musicians. In conclusion, musical pleasure was able to influence task performance, and the shape of this effect depended on group and individual factors. These findings have implications in affective neuroscience, neuroaesthetics, learning, and music therapy. PMID:23970875

  4. Curiosity driven reinforcement learning for motion planning on humanoids

    Science.gov (United States)

    Frank, Mikhail; Leitner, Jürgen; Stollenga, Marijn; Förster, Alexander; Schmidhuber, Jürgen

    2014-01-01

    Most previous work on artificial curiosity (AC) and intrinsic motivation focuses on basic concepts and theory. Experimental results are generally limited to toy scenarios, such as navigation in a simulated maze, or control of a simple mechanical system with one or two degrees of freedom. To study AC in a more realistic setting, we embody a curious agent in the complex iCub humanoid robot. Our novel reinforcement learning (RL) framework consists of a state-of-the-art, low-level, reactive control layer, which controls the iCub while respecting constraints, and a high-level curious agent, which explores the iCub's state-action space through information gain maximization, learning a world model from experience, controlling the actual iCub hardware in real-time. To the best of our knowledge, this is the first ever embodied, curious agent for real-time motion planning on a humanoid. We demonstrate that it can learn compact Markov models to represent large regions of the iCub's configuration space, and that the iCub explores intelligently, showing interest in its physical constraints as well as in objects it finds in its environment. PMID:24432001

  5. An Analysis of Stochastic Game Theory for Multiagent Reinforcement Learning

    National Research Council Canada - National Science Library

    Bowling, Michael

    2000-01-01

    In this paper we contribute a comprehensive presentation of the relevant techniques for solving stochastic games from both the game theory and reinforcement learning communities. We examine the assumptions and limitations of these algorithms, and identify similarities between these algorithms, single-agent reinforcement learners, and basic game theory techniques.

  6. Reinforcement function design and bias for efficient learning in mobile robots

    International Nuclear Information System (INIS)

    Touzet, C.; Santos, J.M.

    1998-01-01

    The main paradigm in the sub-symbolic learning robot domain is the reinforcement learning method. Various techniques have been developed to deal with the memorization/generalization problem, demonstrating the superior ability of artificial neural network implementations. In this paper, the authors address the issue of designing the reinforcement so as to optimize the exploration part of the learning. They also present and summarize work on the use of bias intended to achieve effective synthesis of the desired behavior. Demonstrative experiments involving a self-organizing map implementation of Q-learning and real mobile robots (Nomad 200 and Khepera) in an obstacle avoidance behavior synthesis task are described. 3 figs., 5 tabs

  7. Towards autonomous neuroprosthetic control using Hebbian reinforcement learning.

    Science.gov (United States)

    Mahmoudi, Babak; Pohlmeyer, Eric A; Prins, Noeline W; Geng, Shijia; Sanchez, Justin C

    2013-12-01

    Our goal was to design an adaptive neuroprosthetic controller that could learn the mapping from neural states to prosthetic actions and automatically adjust adaptation using only a binary evaluative feedback as a measure of desirability/undesirability of performance. Hebbian reinforcement learning (HRL) in a connectionist network was used for the design of the adaptive controller. The method combines the efficiency of supervised learning with the generality of reinforcement learning. The convergence properties of this approach were studied using both closed-loop control simulations and open-loop simulations that used primate neural data from robot-assisted reaching tasks. The HRL controller was able to perform classification and regression tasks using its episodic and sequential learning modes, respectively. In our experiments, the HRL controller quickly achieved convergence to an effective control policy, followed by robust performance. The controller also automatically stopped adapting the parameters after converging to a satisfactory control policy. Additionally, when the input neural vector was reorganized, the controller resumed adaptation to maintain performance. By estimating an evaluative feedback directly from the user, the HRL control algorithm may provide an efficient method for autonomous adaptation of neuroprosthetic systems. This method may enable the user to teach the controller the desired behavior using only a simple feedback signal.
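    The essence of a Hebbian reinforcement learning update, sketched here with assumed layer sizes, is a correlational weight change gated by the binary evaluative feedback (+1 desirable, -1 undesirable); this illustrates the idea rather than reproducing the paper's controller.

    ```python
    import numpy as np

    rng = np.random.default_rng(6)
    n_in, n_out, lr = 20, 4, 0.05          # assumed sizes
    W = rng.normal(0, 0.1, (n_out, n_in))

    def hrl_step(x, feedback):
        """x: neural state vector; feedback: +1 or -1 from the user."""
        y = np.tanh(W @ x) + rng.normal(0, 0.1, n_out)   # noisy exploration
        a = int(y.argmax())                              # chosen action
        post = np.zeros(n_out); post[a] = y[a]
        W += lr * feedback * np.outer(post, x)  # Hebbian term, reward-gated
        return a

    hrl_step(rng.normal(size=n_in), feedback=+1)
    ```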

  8. Neurofeedback in Learning Disabled Children: Visual versus Auditory Reinforcement.

    Science.gov (United States)

    Fernández, Thalía; Bosch-Bayard, Jorge; Harmony, Thalía; Caballero, María I; Díaz-Comas, Lourdes; Galán, Lídice; Ricardo-Garcell, Josefina; Aubert, Eduardo; Otero-Ojeda, Gloria

    2016-03-01

    Children with learning disabilities (LD) frequently have an EEG characterized by an excess of theta and a deficit of alpha activity. Neurofeedback (NFB) using an auditory stimulus as a reinforcer has proven to be a useful tool to treat LD children by positively reinforcing decreases in the theta/alpha ratio. The aim of the present study was to optimize the NFB procedure by comparing the efficacy of visual (with eyes open) versus auditory (with eyes closed) reinforcers. Twenty LD children with an abnormally high theta/alpha ratio were randomly assigned to the Auditory or the Visual group, where a 500 Hz tone or a visual stimulus (a white square), respectively, was used as a positive reinforcer when the value of the theta/alpha ratio was reduced. Both groups had signs consistent with EEG maturation, but only the Auditory group showed behavioral/cognitive improvements. In conclusion, the auditory reinforcer was more efficacious in reducing the theta/alpha ratio, and it improved cognitive abilities more than the visual reinforcer.

  9. Intranasal oxytocin enhances socially-reinforced learning in rhesus monkeys

    Directory of Open Access Journals (Sweden)

    Lisa A Parr

    2014-09-01

    Full Text Available There are currently no drugs approved for the treatment of social deficits associated with autism spectrum disorders (ASD). One hypothesis for these deficits is that individuals with ASD lack the motivation to attend to social cues because those cues are not implicitly rewarding. Therefore, any drug that could enhance the rewarding quality of social stimuli could have a profound impact on the treatment of ASD and other social disorders. Oxytocin (OT) is a neuropeptide that has been effective in enhancing social cognition and social reward in humans. The present study examined the ability of OT to selectively enhance learning after social compared to nonsocial reward in rhesus monkeys, an important species for modeling the neurobiology of social behavior in humans. Monkeys were required to learn an implicit visual matching task after receiving either intranasal (IN) OT or placebo (saline). Correct trials were rewarded with the presentation of positive or negative social (play faces/threat faces) or nonsocial (banana/cage locks) stimuli, plus food. Incorrect trials were not rewarded. Results demonstrated a strong effect of socially reinforced learning: monkeys performed significantly better when reinforced with social versus nonsocial stimuli. Additionally, socially reinforced learning was significantly better and occurred faster after IN-OT compared to placebo treatment. Performance in the IN-OT, but not placebo, condition was also significantly better when the reinforcement stimuli were emotionally positive compared to negative facial expressions. These data support the hypothesis that OT may function to enhance prosocial behavior in primates by increasing the rewarding quality of emotionally positive social images compared to emotionally negative or nonsocial images. These data also support the use of the rhesus monkey as a model for exploring the neurobiological basis of social behavior and its impairment.

  10. applying reinforcement learning to the weapon assignment problem

    African Journals Online (AJOL)

    …a Monte Carlo (MC) control algorithm with exploring starts (MCES), and an off-policy … closest to the threat should fire (that weapon also had the highest probability to … "Reinforcement learning: Theory, methods and application to…"

  11. Reinforcement Learning for Ramp Control: An Analysis of Learning Parameters

    Directory of Open Access Journals (Sweden)

    Chao Lu

    2016-08-01

    Full Text Available Reinforcement Learning (RL) has been proposed to deal with ramp control problems under dynamic traffic conditions; however, there is a lack of sufficient research on the behaviour and impacts of different learning parameters. This paper describes a ramp control agent based on the RL mechanism and thoroughly analyzes the influence of three learning parameters, namely the learning rate, discount rate and action-selection parameter, on algorithm performance. Two indices for learning speed and convergence stability were used to measure algorithm performance, based on which a series of simulation-based experiments were designed and conducted using a macroscopic traffic flow model. Simulation results showed that, compared with the discount rate, the learning rate and action-selection parameter had more remarkable impacts on algorithm performance. Based on the analysis, some suggestions about how to select suitable parameter values that can achieve superior performance are provided.
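    The kind of sweep reported can be expressed as a simple grid over the three parameters, scoring each combination on learning speed and convergence stability. run_ramp_agent below is a hypothetical stand-in for the macroscopic traffic simulation; the value ranges are illustrative, not the paper's.

    ```python
    import itertools, random

    def run_ramp_agent(alpha, gamma, eps):
        # hypothetical placeholder: a real implementation would run the
        # RL-controlled ramp-metering simulation and return the two indices
        random.seed(hash((alpha, gamma, eps)) % 2**32)
        return random.random(), random.random()

    for alpha, gamma, eps in itertools.product([0.05, 0.1, 0.3],
                                               [0.5, 0.8, 0.95],
                                               [0.01, 0.1, 0.3]):
        speed, stability = run_ramp_agent(alpha, gamma, eps)
        print(f"alpha={alpha} gamma={gamma} eps={eps}: "
              f"speed={speed:.2f} stability={stability:.2f}")
    ```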

  12. Applying reinforcement learning to the weapon assignment problem in air defence

    CSIR Research Space (South Africa)

    Mouton, H

    2011-12-01

    Full Text Available The techniques investigated in this article were two methods from the machine-learning subfield of reinforcement learning (RL), namely a Monte Carlo (MC) control algorithm with exploring starts (MCES), and an off-policy temporal-difference (TD) learning...

  13. The combination of appetitive and aversive reinforcers and the nature of their interaction during auditory learning.

    Science.gov (United States)

    Ilango, A; Wetzel, W; Scheich, H; Ohl, F W

    2010-03-31

    Learned changes in behavior can be elicited by either appetitive or aversive reinforcers. It is, however, not clear whether the two types of motivation (approaching appetitive stimuli and avoiding aversive stimuli) drive learning in the same or different ways, nor is their interaction understood in situations where the two types are combined in a single experiment. To investigate this question we have developed a novel learning paradigm for Mongolian gerbils, which not only allows rewards and punishments to be presented in isolation or in combination with each other, but also can use these opposite reinforcers to drive the same learned behavior. Specifically, we studied learning of tone-conditioned hurdle crossing in a shuttle box driven by either an appetitive reinforcer (brain stimulation reward) or an aversive reinforcer (electrical footshock), or by a combination of both. Combining the two reinforcers potentiated the speed of acquisition, led to the maximum possible performance, and delayed extinction as compared to either reinforcer alone. Additional experiments, using partial reinforcement protocols and experiments in which one of the reinforcers was omitted after the animals had previously been trained with the combination of both reinforcers, indicated that appetitive and aversive reinforcers operated together but acted in different ways: in this particular experimental context, punishment appeared to be more effective for initial acquisition and reward more effective for maintaining a high level of conditioned responses (CRs). The results imply that learning mechanisms in problem solving were maximally effective when the initial punishment of mistakes was combined with the subsequent rewarding of correct performance. Copyright 2010 IBRO. Published by Elsevier Ltd. All rights reserved.

  14. Continuous theta-burst stimulation (cTBS) over the lateral prefrontal cortex alters reinforcement learning bias.

    Science.gov (United States)

    Ott, Derek V M; Ullsperger, Markus; Jocham, Gerhard; Neumann, Jane; Klein, Tilmann A

    2011-07-15

    The prefrontal cortex is known to play a key role in higher-order cognitive functions. Recently, we showed that this brain region is active in reinforcement learning, during which subjects constantly have to integrate trial outcomes in order to optimize performance. To further elucidate the role of the dorsolateral prefrontal cortex (DLPFC) in reinforcement learning, we applied continuous theta-burst stimulation (cTBS) either to the left or right DLPFC, or to the vertex as a control region, respectively, prior to the performance of a probabilistic learning task in an fMRI environment. While there was no influence of cTBS on learning performance per se, we observed a stimulation-dependent modulation of reward vs. punishment sensitivity: Left-hemispherical DLPFC stimulation led to a more reward-guided performance, while right-hemispherical cTBS induced a more avoidance-guided behavior. FMRI results showed enhanced prediction error coding in the ventral striatum in subjects stimulated over the left as compared to the right DLPFC. Both behavioral and imaging results are in line with recent findings that left, but not right-hemispherical stimulation can trigger a release of dopamine in the ventral striatum, which has been suggested to increase the relative impact of rewards rather than punishment on behavior. Copyright © 2011 Elsevier Inc. All rights reserved.

  15. Reinforcement and Systemic Machine Learning for Decision Making

    CERN Document Server

    Kulkarni, Parag

    2012-01-01

    Reinforcement and Systemic Machine Learning for Decision Making. There are always difficulties in making machines that learn from experience. Complete information is not always available, or it becomes available in bits and pieces over a period of time. With respect to systemic learning, there is a need to understand the impact of decisions and actions on a system over that period of time. This book takes a holistic approach to addressing that need and presents a new paradigm: creating new learning applications and, ultimately, more intelligent machines. The first book of its kind in this new an

  16. Bi-directional effect of increasing doses of baclofen on reinforcement learning

    Directory of Open Access Journals (Sweden)

    Jean eTerrier

    2011-07-01

    Full Text Available In rodents as well as in humans, efficient reinforcement learning depends on dopamine (DA) released from ventral tegmental area (VTA) neurons. It has been shown that in brain slices from mice, GABAB-receptor agonists at low concentrations increase the firing frequency of VTA DA neurons, while high concentrations reduce the firing frequency. It remains elusive, however, whether baclofen can modulate reinforcement learning. Here, in a double-blind study of 34 healthy human volunteers, we tested the effects of a low and a high dose of oral baclofen in a gambling task associated with monetary reward. A low dose (20 mg) of baclofen increased the efficiency of reward-associated learning but had no effect on the avoidance of monetary loss. A high dose (50 mg) of baclofen, on the other hand, did not affect the learning curve. At the end of the task, subjects who received 20 mg baclofen p.o. were more accurate in choosing the symbol linked to the highest probability of earning money compared to the control group (89.55±1.39% vs. 81.07±1.55%, p=0.002). Our results support a model in which baclofen, at low concentrations, causes a disinhibition of DA neurons, increases DA levels and thus facilitates reinforcement learning.

  17. Traffic light control by multiagent reinforcement learning systems

    NARCIS (Netherlands)

    Bakker, B.; Whiteson, S.; Kester, L.; Groen, F.C.A.; Babuška, R.; Groen, F.C.A.

    2010-01-01

    Traffic light control is one of the main means of controlling road traffic. Improving traffic control is important because it can lead to higher traffic throughput and reduced traffic congestion. This chapter describes multiagent reinforcement learning techniques for the automatic optimization of traffic light controllers.

  19. A Robust Cooperated Control Method with Reinforcement Learning and Adaptive H∞ Control

    Science.gov (United States)

    Obayashi, Masanao; Uchiyama, Shogo; Kuremoto, Takashi; Kobayashi, Kunikazu

    This study proposes a robust cooperative control method combining reinforcement learning with robust control. A remarkable characteristic of reinforcement learning is that it does not require a model formula; however, it does not guarantee the stability of the system. On the other hand, a robust control system guarantees stability and robustness, but it requires a model formula. We employ both the actor-critic method, a kind of reinforcement learning requiring a minimal amount of computation to control continuous-valued actions, and traditional robust control, that is, H∞ control. The proposed method was compared with the conventional control method (the actor-critic only) through computer simulations of controlling the angle and position of a crane system, and the simulation results showed the effectiveness of the proposed method.

  20. Deep reinforcement learning for automated radiation adaptation in lung cancer.

    Science.gov (United States)

    Tseng, Huan-Hsin; Luo, Yi; Cui, Sunan; Chien, Jen-Tzung; Ten Haken, Randall K; Naqa, Issam El

    2017-12-01

    To investigate deep reinforcement learning (DRL) based on historical treatment plans for developing automated radiation adaptation protocols for non-small cell lung cancer (NSCLC) patients that aim to maximize tumor local control at reduced rates of radiation pneumonitis grade 2 (RP2). In a retrospective population of 114 NSCLC patients who received radiotherapy, a three-component neural network framework was developed for deep reinforcement learning (DRL) of dose fractionation adaptation. Large-scale patient characteristics included clinical, genetic, and imaging radiomics features in addition to tumor and lung dosimetric variables. First, a generative adversarial network (GAN) was employed to learn patient population characteristics necessary for DRL training from a relatively limited sample size. Second, a radiotherapy artificial environment (RAE) was reconstructed by a deep neural network (DNN) utilizing both original and synthetic data (by GAN) to estimate the transition probabilities for adaptation of personalized radiotherapy patients' treatment courses. Third, a deep Q-network (DQN) was applied to the RAE for choosing the optimal dose in a response-adapted treatment setting. This multicomponent reinforcement learning approach was benchmarked against real clinical decisions that were applied in an adaptive dose escalation clinical protocol, in which 34 patients were treated based on avid PET signal in the tumor and constrained by a 17.2% normal tissue complication probability (NTCP) limit for RP2. The uncomplicated cure probability (P+) was used as a baseline reward function in the DRL. Taking our adaptive dose escalation protocol as a blueprint for the proposed DRL (GAN + RAE + DQN) architecture, we obtained an automated dose adaptation estimate for use at ∼2/3 of the way into the radiotherapy treatment course. By letting the DQN component freely control the estimated adaptive dose per fraction (ranging from 1-5 Gy), the DRL automatically favored dose...

  1. Service-life prediction of reinforced concrete structures in subsurface environment

    Energy Technology Data Exchange (ETDEWEB)

    Kwon, Ki Jung; Jung, Hae Ryong; Park, Joo Wan [Korea Radioactive Waste Agency, Daejeon (Korea, Republic of)

    2016-03-15

    This paper focuses on the estimation of the durability and service-life of reinforced concrete structures at the Wolsong Low- and Intermediate-Level Waste Disposal Center (WLDC) in Korea. There are six disposal silos located in a saturated environment. The silo concrete is degraded by reactions with groundwater and by chemical attack, and it will finally lose its properties as a transport barrier. The infiltration of sulfate and magnesium, the leaching of potassium hydroxide, and chlorine-induced corrosion are the most significant factors in the degradation of reinforced concrete structures in an underground environment. From the evaluation of the degradation time for each factor, the degradation rate of the reinforced concrete due to sulfate and magnesium is 1.308×10⁻³ cm/yr, and full degradation is estimated to take 48,000 years, while potassium hydroxide is leached to a depth of less than 1.5 cm at 1,000 years after the initiation of degradation. In the case of chlorine-induced corrosion, it takes 1,648 years to initiate corrosion in the main reinforcing bar and 2,288 years to reach the lifetime limit of structural integrity, and it is thus evaluated as the most significant factor.
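    A quick back-of-envelope check of the quoted figures: at a degradation rate of 1.308×10⁻³ cm/yr, full degradation in roughly 48,000 years corresponds to about 63 cm of concrete consumed.

    ```python
    rate_cm_per_yr = 1.308e-3       # quoted sulfate/magnesium degradation rate
    t_full_yr = 48_000              # quoted time to full degradation
    print(rate_cm_per_yr * t_full_yr)   # ~62.8 cm of effective barrier
    ```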

  2. TEXPLORE temporal difference reinforcement learning for robots and time-constrained domains

    CERN Document Server

    Hester, Todd

    2013-01-01

    This book presents and develops new reinforcement learning methods that enable fast and robust learning on robots in real time. Robots have the potential to solve many problems in society, because of their ability to work in dangerous places doing necessary jobs that no one wants or is able to do. One barrier to their widespread deployment is that they are mainly limited to tasks where it is possible to hand-program behaviors for every situation that may be encountered. For robots to meet their potential, they need methods that enable them to learn and adapt to novel situations that they were not programmed for. Reinforcement learning (RL) is a paradigm for learning sequential decision making processes and could solve the problems of learning and adaptation on robots. This book identifies four key challenges that must be addressed for an RL algorithm to be practical for robotic control tasks. These RL for Robotics Challenges are: 1) it must learn in very few samples; 2) it must learn in domains with continuous...

  3. Perceptual learning rules based on reinforcers and attention

    NARCIS (Netherlands)

    Roelfsema, Pieter R.; van Ooyen, Arjen; Watanabe, Takeo

    2010-01-01

    How does the brain learn those visual features that are relevant for behavior? In this article, we focus on two factors that guide plasticity of visual representations. First, reinforcers cause the global release of diffusive neuromodulatory signals that gate plasticity. Second, attentional feedback

  4. Optimizing microstimulation using a reinforcement learning framework.

    Science.gov (United States)

    Brockmeier, Austin J; Choi, John S; Distasio, Marcello M; Francis, Joseph T; Príncipe, José C

    2011-01-01

    The ability to provide sensory feedback is desired to enhance the functionality of neuroprosthetics. Somatosensory feedback provides closed-loop control to the motor system, which is lacking in feedforward neuroprosthetics. In the case of existing somatosensory function, a template of the natural response can be used as the desired response to be elicited by electrical microstimulation. In the case of no initial training data, microstimulation parameters that produce responses close to the template must be selected in an online manner. We propose using reinforcement learning as a framework to balance the exploration of the parameter space and the continued selection of promising parameters for further stimulation. This approach avoids an explicit model of the neural response to stimulation. We explore a preliminary architecture--treating the task as a k-armed bandit--using offline data recorded for natural touch and thalamic microstimulation, and we examine the method's efficiency in exploring the parameter space while concentrating on promising parameter forms. The best matching stimulation parameters, from k = 68 different forms, are selected by the reinforcement learning algorithm consistently after 334 realizations.
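    A minimal sketch of the k-armed bandit treatment the abstract describes might look as follows, with each arm standing for one of the k = 68 stimulation parameter forms and the reward standing for the similarity between the evoked response and the natural-touch template; the noisy reward model, the epsilon-greedy rule, and all constants are illustrative assumptions rather than the paper's exact algorithm.

```python
# Hedged sketch: each arm is one of k = 68 stimulation parameter forms;
# the reward stands for the similarity between the evoked response and
# the natural-touch template. The noisy reward model and epsilon-greedy
# rule are illustrative assumptions, not the paper's exact algorithm.
import numpy as np

rng = np.random.default_rng(1)
k = 68
true_similarity = rng.uniform(0, 1, size=k)  # unknown template match per arm
Q, counts = np.zeros(k), np.zeros(k)
eps = 0.1

for trial in range(2000):
    arm = int(rng.integers(k)) if rng.random() < eps else int(np.argmax(Q))
    r = true_similarity[arm] + 0.1 * rng.normal()  # noisy similarity reward
    counts[arm] += 1
    Q[arm] += (r - Q[arm]) / counts[arm]           # incremental sample mean

print("identified best arm:", int(np.argmax(Q)) == int(np.argmax(true_similarity)))
```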

  5. Experiments with Online Reinforcement Learning in Real-Time Strategy Games

    DEFF Research Database (Denmark)

    Toftgaard Andersen, Kresten; Zeng, Yifeng; Dahl Christensen, Dennis

    2009-01-01

    Real-time strategy (RTS) games provide a challenging platform for implementing online reinforcement learning (RL) techniques in a real application. The computer, as one game player, monitors its opponents' (human or other computer) strategies and then updates its own policy using RL methods. In this article..., we first examine the suitability of applying online RL in various computer games. Reinforcement learning application depends on both RL complexity and the game features. We then propose a multi-layer framework for implementing online RL in an RTS game. The framework significantly reduces RL... the effectiveness of our proposed framework and shed light on relevant issues in using online RL in RTS games...

  6. Temporal Memory Reinforcement Learning for the Autonomous Micro-mobile Robot Based-behavior

    Institute of Scientific and Technical Information of China (English)

    Yang Yujun(杨玉君); Cheng Junshi; Chen Jiapin; Li Xiaohai

    2004-01-01

    This paper presents temporal memory reinforcement learning for behavior-based autonomous micro-mobile robots. Human memory has an oblivion process: the earlier something is memorized, the earlier it is forgotten, and only repeated things are remembered firmly. Inspired by this, the robot need not memorize all of its past states, which also economizes the memory space that is scarce in the MPU of our AMRobot. The proposed algorithm is an extension of Q-learning, an incremental reinforcement learning method. Simulation results show that the algorithm is valid.

  7. Challenges in the Verification of Reinforcement Learning Algorithms

    Science.gov (United States)

    Van Wesel, Perry; Goodloe, Alwyn E.

    2017-01-01

    Machine learning (ML) is increasingly being applied to a wide array of domains from search engines to autonomous vehicles. These algorithms, however, are notoriously complex and hard to verify. This work looks at the assumptions underlying machine learning algorithms as well as some of the challenges in trying to verify ML algorithms. Furthermore, we focus on the specific challenges of verifying reinforcement learning algorithms. These are highlighted using a specific example. Ultimately, we do not offer a solution to the complex problem of ML verification, but point out possible approaches for verification and interesting research opportunities.

  8. Scheduled power tracking control of the wind-storage hybrid system based on the reinforcement learning theory

    Science.gov (United States)

    Li, Ze

    2017-09-01

    To address the intermittency and uncertainty of wind power, energy storage and a wind generator are combined into a hybrid system to improve the controllability of the output power. A scheduled power tracking control method is proposed based on reinforcement learning theory and the Q-learning algorithm. In this method, the state space of the environment is formed from two key factors: the state of charge of the energy storage and the difference between the actual wind power and the scheduled power. The feasible actions are the output power levels of the energy storage, and the corresponding immediate reward function is designed to reflect the rationality of each control action. By interacting with the environment and learning from the immediate reward, the optimal control strategy is gradually formed, after which it can be applied to the scheduled power tracking control of the hybrid system. Finally, the rationality and validity of the method are verified through simulation examples.
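    A minimal sketch of the tabular Q-learning setup the abstract describes could look as follows, with the state formed from the discretized storage state of charge (SOC) and the wind-power deviation, the action being the storage output power, and a reward that penalizes tracking error; the discretizations, the toy wind model, and the SOC penalty are assumptions, not the authors' implementation.

```python
# Hedged sketch of the described Q-learning setup: state = (storage state
# of charge, deviation between actual and scheduled wind power), action =
# storage output power. Discretizations, the toy wind model, and the SOC
# penalty are assumptions, not the paper's implementation.
import numpy as np

rng = np.random.default_rng(2)
N_SOC, N_DEV, N_ACT = 10, 10, 5            # bin counts (assumed)
actions = np.linspace(-1.0, 1.0, N_ACT)    # storage output, per-unit (assumed)
Q = np.zeros((N_SOC, N_DEV, N_ACT))
alpha, gamma, eps = 0.1, 0.95, 0.1

def discretize(x, lo, hi, n):
    """Map a continuous value onto one of n bins."""
    return int(np.clip((x - lo) / (hi - lo) * n, 0, n - 1))

soc = 0.5
for step in range(20000):
    dev = rng.uniform(-1.0, 1.0)           # actual minus scheduled wind power
    s = (discretize(soc, 0, 1, N_SOC), discretize(dev, -1, 1, N_DEV))
    a = int(rng.integers(N_ACT)) if rng.random() < eps else int(np.argmax(Q[s]))
    p_store = actions[a]                   # >0 discharge, <0 charge
    track_err = abs(dev + p_store)         # residual deviation from schedule
    reward = -track_err - 0.5 * max(0.0, abs(soc - 0.5) - 0.4)  # SOC penalty
    soc = float(np.clip(soc - 0.05 * p_store, 0.0, 1.0))
    dev2 = rng.uniform(-1.0, 1.0)
    s2 = (discretize(soc, 0, 1, N_SOC), discretize(dev2, -1, 1, N_DEV))
    Q[s][a] += alpha * (reward + gamma * Q[s2].max() - Q[s][a])
```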

  9. Effect of hot-dry environment on fiber-reinforced self-compacting concrete

    Science.gov (United States)

    Tioua, Tahar; Kriker, Abdelouahed; Salhi, Aimad; Barluenga, Gonzalo

    2016-07-01

    Drying shrinkage can be a major reason for the deterioration of concrete structures. Variations in ambient temperature and relative humidity cause changes in the properties of hardened concrete which can affect its mechanical and drying shrinkage characteristics. The present study investigated the mechanical strength and particularly the drying shrinkage properties of self-compacting concretes (SCC) reinforced with date palm fiber and exposed to a hot and dry environment. A total of nine different fiber-reinforced self-compacting concrete (FRSCC) mixtures and one mixture without fiber were prepared. The fiber volume fractions were 0.1, 0.2, and 0.3%, and the fiber lengths were 10, 20, and 30 mm. It was observed that drying shrinkage lessened when a low volume fraction of short fibers was added under the curing condition (T = 20 °C and RH = 50 ± 5%), but increased in the hot and dry environment.

  10. Social Learning, Reinforcement and Crime: Evidence from Three European Cities

    Science.gov (United States)

    Tittle, Charles R.; Antonaccio, Olena; Botchkovar, Ekaterina

    2012-01-01

    This study reports a cross-cultural test of Social Learning Theory using direct measures of social learning constructs and focusing on the causal structure implied by the theory. Overall, the results strongly confirm the main thrust of the theory. Prior criminal reinforcement and current crime-favorable definitions are highly related in all three…

  11. Modeling Avoidance in Mood and Anxiety Disorders Using Reinforcement Learning.

    Science.gov (United States)

    Mkrtchian, Anahit; Aylward, Jessica; Dayan, Peter; Roiser, Jonathan P; Robinson, Oliver J

    2017-10-01

    Serious and debilitating symptoms of anxiety are the most common mental health problem worldwide, accounting for around 5% of all adult years lived with disability in the developed world. Avoidance behavior (avoiding social situations for fear of embarrassment, for instance) is a core feature of such anxiety. However, as for many other psychiatric symptoms, the biological mechanisms underlying avoidance remain unclear. Reinforcement learning models provide formal and testable characterizations of the mechanisms of decision making; here, we examine avoidance in these terms. A total of 101 healthy participants and individuals with mood and anxiety disorders completed an approach-avoidance go/no-go task under stress induced by threat of unpredictable shock. We show an increased reliance in the mood and anxiety group on a parameter of our reinforcement learning model that characterizes a prepotent (Pavlovian) bias to withhold responding in the face of negative outcomes. This was particularly the case when the mood and anxiety group was under stress. This formal description of avoidance within the reinforcement learning framework provides a new means of linking clinical symptoms with biophysically plausible models of neural circuitry and, as such, takes us closer to a mechanistic understanding of mood and anxiety disorders. Copyright © 2017 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.
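    For orientation, a minimal sketch of a go/no-go reinforcement learning model with a Pavlovian bias parameter, of the general kind referred to above, is given below; the functional form (Q-values plus a Pavlovian weight on state value) follows common go/no-go models, and the parameter names, values, and toy contingency are assumptions rather than the study's fitted model.

```python
# Hedged sketch of a go/no-go RL model with a Pavlovian bias parameter,
# in the spirit of the model described above. The functional form
# (Q-values plus a Pavlovian weight on state value) follows common
# go/no-go models; parameters and the toy contingency are assumptions.
import numpy as np

rng = np.random.default_rng(3)
alpha, beta, pi = 0.2, 4.0, 0.5    # learning rate, inverse temp., Pavlovian bias

Q = {"go": 0.0, "nogo": 0.0}       # instrumental action values
V = 0.0                            # Pavlovian state value

for trial in range(100):
    w_go = Q["go"] + pi * V        # negative V suppresses responding
    p_go = 1.0 / (1.0 + np.exp(-beta * (w_go - Q["nogo"])))
    action = "go" if rng.random() < p_go else "nogo"
    # Toy contingency: responding avoids a loss 80% of the time (assumed).
    outcome = 0.0 if (action == "go" and rng.random() < 0.8) else -1.0
    Q[action] += alpha * (outcome - Q[action])   # instrumental update
    V += alpha * (outcome - V)                   # Pavlovian update
```

    With a negative state value V, the Pavlovian term pi * V pulls the "go" weight down, reproducing the bias to withhold responding in the face of negative outcomes that the abstract describes.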

  12. Reinforcement Learning for Online Control of Evolutionary Algorithms

    NARCIS (Netherlands)

    Eiben, A.; Horvath, Mark; Kowalczyk, Wojtek; Schut, Martijn

    2007-01-01

    The research reported in this paper is concerned with assessing the usefulness of reinforcement learning (RL) for on-line calibration of parameters in evolutionary algorithms (EA). We are running an RL procedure and the EA simultaneously and the RL is changing the EA parameters on-the-fly. We

  13. Video Demo: Deep Reinforcement Learning for Coordination in Traffic Light Control

    NARCIS (Netherlands)

    van der Pol, E.; Oliehoek, F.A.; Bosse, T.; Bredeweg, B.

    2016-01-01

    This video demonstration contrasts two approaches to coordination in traffic light control using reinforcement learning: earlier work, based on a deconstruction of the state space into a linear combination of vehicle states, and our own approach based on the Deep Q-learning algorithm.

  14. Simulation-based optimization parametric optimization techniques and reinforcement learning

    CERN Document Server

    Gosavi, Abhijit

    2003-01-01

    Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning introduces the evolving area of simulation-based optimization. The book's objective is two-fold: (1) It examines the mathematical governing principles of simulation-based optimization, thereby providing the reader with the ability to model relevant real-life problems using these techniques. (2) It outlines the computational technology underlying these methods. Taken together these two aspects demonstrate that the mathematical and computational methods discussed in this book do work. Broadly speaking, the book has two parts: (1) parametric (static) optimization and (2) control (dynamic) optimization. Some of the book's special features are: *An accessible introduction to reinforcement learning and parametric-optimization techniques. *A step-by-step description of several algorithms of simulation-based optimization. *A clear and simple introduction to the methodology of neural networks. *A gentle introduction to converg...

  15. Perception-based Co-evolutionary Reinforcement Learning for UAV Sensor Allocation

    National Research Council Canada - National Science Library

    Berenji, Hamid

    2003-01-01

    .... A Perception-based reasoning approach based on co-evolutionary reinforcement learning was developed for jointly addressing sensor allocation on each individual UAV and allocation of a team of UAVs...

  16. Reinforcement learning on slow features of high-dimensional input streams.

    Directory of Open Access Journals (Sweden)

    Robert Legenstein

    Full Text Available Humans and animals are able to learn complex behaviors based on a massive stream of sensory information from different modalities. Early animal studies have identified learning mechanisms that are based on reward and punishment such that animals tend to avoid actions that lead to punishment whereas rewarded actions are reinforced. However, most algorithms for reward-based learning are only applicable if the dimensionality of the state-space is sufficiently small or its structure is sufficiently simple. Therefore, the question arises how the problem of learning on high-dimensional data is solved in the brain. In this article, we propose a biologically plausible generic two-stage learning system that can directly be applied to raw high-dimensional input streams. The system is composed of a hierarchical slow feature analysis (SFA) network for preprocessing and a simple neural network on top that is trained based on rewards. We demonstrate by computer simulations that this generic architecture is able to learn quite demanding reinforcement learning tasks on high-dimensional visual input streams in a time that is comparable to the time needed when an explicit highly informative low-dimensional state-space representation is given instead of the high-dimensional visual input. The learning speed of the proposed architecture in a task similar to the Morris water maze task is comparable to that found in experimental studies with rats. This study thus supports the hypothesis that slowness learning is one important unsupervised learning principle utilized in the brain to form efficient state representations for behavioral learning.
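    A rough sketch of the two-stage idea, under strong simplifying assumptions, appears below: linear SFA (solving the generalized eigenproblem cov(dX) w = lambda cov(X) w for the slowest directions) stands in for the hierarchical SFA network, and a reward-modulated delta rule stands in for the reward-trained network on top; the toy data and all constants are assumptions.

```python
# Rough sketch under strong simplifications: linear SFA stands in for the
# hierarchical SFA network, and a reward-modulated delta rule stands in
# for the reward-trained network on top. Toy data and constants assumed.
import numpy as np

rng = np.random.default_rng(4)
T, D, K = 5000, 20, 3                       # samples, input dim, slow features

# Toy data: one slow latent drives many noisy fast channels.
slow = np.sin(np.linspace(0, 8 * np.pi, T))[:, None]
X = slow @ rng.normal(size=(1, D)) + 0.5 * rng.normal(size=(T, D))
X -= X.mean(axis=0)

# Linear SFA: minimize the variance of the temporal derivative subject to
# unit variance, i.e. the generalized eigenproblem cov(dX) w = lambda cov(X) w;
# the smallest eigenvalues correspond to the slowest directions.
dX = np.diff(X, axis=0)
A, B = np.cov(dX.T), np.cov(X.T)
eigvals, eigvecs = np.linalg.eig(np.linalg.solve(B, A))
order = np.argsort(eigvals.real)
W = eigvecs[:, order[:K]].real              # K slowest feature directions
features = X @ W

# Stage 2: reward-modulated delta rule on the slow features (assumed form).
w_out, lr = np.zeros(K), 0.01
for t in range(T):
    pred = features[t] @ w_out
    r = slow[t, 0]                          # toy reward tied to the latent
    w_out += lr * (r - pred) * features[t]
```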

  17. Online constrained model-based reinforcement learning

    CSIR Research Space (South Africa)

    Van Niekerk, B

    2017-08-01

    Full Text Available Constrained Model-based Reinforcement Learning. Benjamin van Niekerk (School of Computer Science, University of the Witwatersrand, South Africa); Andreas Damianou (Amazon.com, Cambridge, UK); Benjamin Rosman (Council for Scientific and Industrial Research, and School...). MULTIPLE SHOOTING: Using direct multiple shooting (Bock and Plitt, 1984), problem (1) can be transformed into a structured nonlinear program (NLP). First, the time horizon [t0, t0 + T] is partitioned into N equal subintervals [tk, tk+1] for k = 0...

  18. Reinforcement learning account of network reciprocity.

    Science.gov (United States)

    Ezaki, Takahiro; Masuda, Naoki

    2017-01-01

    Evolutionary game theory predicts that cooperation in social dilemma games is promoted when agents are connected as a network. However, when networks are fixed over time, humans do not necessarily show enhanced mutual cooperation. Here we show that reinforcement learning (specifically, the so-called Bush-Mosteller model) approximately explains the experimentally observed network reciprocity and the lack thereof in a parameter region spanned by the benefit-to-cost ratio and the node's degree. Thus, we significantly extend previously obtained numerical results.

  19. Fuzzy OLAP association rules mining-based modular reinforcement learning approach for multiagent systems.

    Science.gov (United States)

    Kaya, Mehmet; Alhajj, Reda

    2005-04-01

    Multiagent systems and data mining have recently attracted considerable attention in the field of computing. Reinforcement learning is the most commonly used learning process for multiagent systems. However, it still has some drawbacks, including the need to model other learning agents present in the domain as part of the state of the environment, the fact that some states are experienced much less than others, and the fact that some state-action pairs are never visited during the learning phase. Further, before completing the learning process, an agent cannot exhibit a certain behavior even in states that have been experienced sufficiently. In this study, we propose a novel multiagent learning approach to handle these problems. Our approach is based on utilizing the mining process for modular cooperative learning systems. It incorporates fuzziness and online analytical processing (OLAP) based mining to effectively process the information reported by agents. First, we describe a fuzzy data cube OLAP architecture which facilitates effective storage and processing of the state information reported by agents. This way, the action of another agent, even one outside the visual environment of the agent under consideration, can simply be predicted by extracting online association rules, a well-known data mining technique, from the constructed data cube. Second, we present a new action selection model, which is also based on association rules mining. Third, we generalize insufficiently experienced states by mining multilevel association rules from the proposed fuzzy data cube. Experimental results obtained on two different versions of a well-known pursuit domain show the robustness and effectiveness of the proposed fuzzy OLAP mining based modular learning approach. Finally, we tested the scalability of the approach presented in this paper and compared it with our previous work on modular-fuzzy Q-learning and ordinary Q-learning.

  20. Decision Making in Reinforcement Learning Using a Modified Learning Space Based on the Importance of Sensors

    Directory of Open Access Journals (Sweden)

    Yasutaka Kishima

    2013-01-01

    Full Text Available Many studies have been conducted on the application of reinforcement learning (RL) to robots. A robot which is made for general purposes has redundant sensors or actuators, because it is difficult to anticipate the environment the robot will face and the task it must execute. In this case, the learning space in RL contains redundancy, so the robot must take much time to learn a given task. In this study, we focus on the importance of sensors with regard to a robot's performance of a particular task. The sensors that are applicable to a task differ according to the task. By using the importance of the sensors, we try to adjust the number of states assigned to each sensor and thereby reduce the size of the learning space. In this paper, we define the measure of importance of a sensor for a task as the correlation between the value of that sensor and the reward. The robot calculates the importance of its sensors and makes the learning space smaller. We propose a method that reduces the learning space and construct the learning system by incorporating it into RL. In this paper, we confirm the effectiveness of our proposed system with an experimental robot.
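    A minimal sketch of the sensor-importance idea is shown below: each sensor is scored by the absolute correlation between its readings and the reward, and unimportant sensors are given coarser discretizations so the state space shrinks; the threshold, bin counts, and toy data are illustrative assumptions.

```python
# Sketch of the sensor-importance idea: score each sensor by the absolute
# correlation between its readings and the reward, then shrink the state
# space by discretizing unimportant sensors more coarsely. Thresholds and
# bin counts are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(5)
T, n_sensors = 1000, 6
readings = rng.normal(size=(T, n_sensors))
reward = 2.0 * readings[:, 0] - readings[:, 2] + 0.1 * rng.normal(size=T)

# Importance = |Pearson correlation| between each sensor and the reward.
importance = np.array([
    abs(np.corrcoef(readings[:, i], reward)[0, 1]) for i in range(n_sensors)
])

# Important sensors keep fine discretization; others get a single bin,
# which multiplies down the number of states the learner must visit.
bins = np.where(importance > 0.3, 8, 1)
print("importance:", importance.round(2))
print("states before:", 8 ** n_sensors, "after:", int(np.prod(bins)))
```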

  1. Distributed Economic Dispatch in Microgrids Based on Cooperative Reinforcement Learning.

    Science.gov (United States)

    Liu, Weirong; Zhuang, Peng; Liang, Hao; Peng, Jun; Huang, Zhiwu

    2018-06-01

    Microgrids incorporated with distributed generation (DG) units and energy storage (ES) devices are expected to play more and more important roles in future power systems. Yet, achieving efficient distributed economic dispatch in microgrids is a challenging issue due to the randomness and nonlinear characteristics of DG units and loads. This paper proposes a cooperative reinforcement learning algorithm for distributed economic dispatch in microgrids. Utilizing the learning algorithm avoids the difficulty of stochastic modeling and high computational complexity. In the cooperative reinforcement learning algorithm, function approximation is leveraged to deal with the large and continuous state spaces, and a diffusion strategy is incorporated to coordinate the actions of DG units and ES devices. Based on the proposed algorithm, each node in the microgrid only needs to communicate with its local neighbors, without relying on any centralized controllers. Algorithm convergence is analyzed, and simulations based on real-world meteorological and load data are conducted to validate the performance of the proposed algorithm.
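    One plausible reading of the neighbor-coordination structure described above is sketched below: each node takes a local gradient step on a linear function approximator and then diffuses (averages) its weight vector with its neighbors, so no central controller is needed; the topology, features, and targets are toy assumptions, not the authors' implementation.

```python
# Hedged sketch of an adapt-then-combine ("diffusion") scheme consistent
# with the description above: each node updates a linear function
# approximator locally, then averages its weights with neighbors, so no
# central controller is required. Topology, features, targets assumed.
import numpy as np

rng = np.random.default_rng(6)
n_nodes, n_feat = 4, 3
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # line topology (assumed)
W = rng.normal(size=(n_nodes, n_feat))               # per-node weight vectors
lr = 0.05

for step in range(200):
    # 1) Adapt: each node takes a gradient step on its own local data.
    for i in range(n_nodes):
        phi = rng.normal(size=n_feat)                # local state features
        target = phi @ np.ones(n_feat)               # toy local value target
        W[i] += lr * (target - phi @ W[i]) * phi
    # 2) Combine: diffuse weights by averaging with neighbors.
    W_new = W.copy()
    for i, nbrs in neighbors.items():
        W_new[i] = W[[i] + nbrs].mean(axis=0)
    W = W_new
```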

  2. The Integration of Personal Learning Environments & Open Network Learning Environments

    Science.gov (United States)

    Tu, Chih-Hsiung; Sujo-Montes, Laura; Yen, Cherng-Jyh; Chan, Junn-Yih; Blocher, Michael

    2012-01-01

    Learning management systems traditionally provide structures to guide online learners to achieve their learning goals. Web 2.0 technology empowers learners to create, share, and organize their personal learning environments in open network environments; and allows learners to engage in social networking and collaborating activities. Advanced…

  3. A Neuro-Control Design Based on Fuzzy Reinforcement Learning

    DEFF Research Database (Denmark)

    Katebi, S.D.; Blanke, M.

    This paper describes a neuro-control fuzzy critic design procedure based on reinforcement learning. An important component of the proposed intelligent control configuration is the fuzzy credit assignment unit, which acts as a critic and through fuzzy implications provides adjustment mechanisms... The fuzzy credit assignment unit comprises a fuzzy system with the appropriate fuzzification, knowledge base and defuzzification components. When an external reinforcement signal (a failure signal) is received, sequences of control actions are evaluated and modified by the action applier unit. The desirable... ones instruct the neuro-control unit to adjust its weights and are simultaneously stored in the memory unit during the training phase. In response to the internal reinforcement signal (set point threshold deviation), the stored information is retrieved by the action applier unit and utilized for re...

  4. Energy Management Strategy for a Hybrid Electric Vehicle Based on Deep Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Yue Hu

    2018-01-01

    Full Text Available An energy management strategy (EMS) is important for hybrid electric vehicles (HEVs), since it plays a decisive role in the performance of the vehicle. However, variation in future driving conditions deeply influences the effectiveness of the EMS. Most existing EMS methods simply follow predefined rules that are not adaptive to different driving conditions online. Therefore, it is useful for the EMS to learn from the environment or driving cycle. In this paper, a deep reinforcement learning (DRL)-based EMS is designed such that it can learn to select actions directly from the states without any prediction or predefined rules. Furthermore, a DRL-based online learning architecture is presented, which is significant for applying the DRL algorithm to HEV energy management under different driving conditions. Simulation experiments have been conducted using MATLAB and Advanced Vehicle Simulator (ADVISOR) co-simulation. Experimental results validate the effectiveness of the DRL-based EMS compared with a rule-based EMS in terms of fuel economy. The online learning architecture is also proved to be effective. The proposed method ensures optimality, as well as real-time applicability, in HEVs.

  5. Emotion in reinforcement learning agents and robots : A survey

    NARCIS (Netherlands)

    Moerland, T.M.; Broekens, D.J.; Jonker, C.M.

    2018-01-01

    This article provides the first survey of computational models of emotion in reinforcement learning (RL) agents. The survey focuses on agent/robot emotions, and mostly ignores human user emotions. Emotions are recognized as functional in decision-making by influencing motivation and action

  6. Reinforcement learning account of network reciprocity.

    Directory of Open Access Journals (Sweden)

    Takahiro Ezaki

    Full Text Available Evolutionary game theory predicts that cooperation in social dilemma games is promoted when agents are connected as a network. However, when networks are fixed over time, humans do not necessarily show enhanced mutual cooperation. Here we show that reinforcement learning (specifically, the so-called Bush-Mosteller model) approximately explains the experimentally observed network reciprocity and the lack thereof in a parameter region spanned by the benefit-to-cost ratio and the node's degree. Thus, we significantly extend previously obtained numerical results.

  7. Mechanical and Electrochemical Performance of Carbon Fiber Reinforced Polymer in Oxygen Evolution Environment

    Directory of Open Access Journals (Sweden)

    Ji-Hua Zhu

    2016-11-01

    Full Text Available Carbon fiber-reinforced polymer (CFRP) is recognized as a promising anode material to prevent steel corrosion in reinforced concrete. However, the electrochemical performance of CFRP itself is unclear. This paper focuses on understanding the electrochemical and mechanical properties of CFRP in an oxygen evolution environment by conducting accelerated polarization tests. Different current densities were applied in polarization tests with various test durations, and the feeding voltage and potential were measured. Afterwards, tensile tests were carried out to investigate the failure modes of the post-polarization CFRP specimens. Results show that the CFRP specimens had two typical tensile-failure modes and a stable anodic performance in an oxygen evolution environment. As such, CFRP can potentially be used as an anode material for impressed current cathodic protection (ICCP) of reinforced concrete structures, in addition to its use in strengthening the structural properties of reinforced concrete.

  8. Optimal Control via Reinforcement Learning with Symbolic Policy Approximation

    NARCIS (Netherlands)

    Kubalìk, Jiřì; Alibekov, Eduard; Babuska, R.; Dochain, Denis; Henrion, Didier; Peaucelle, Dimitri

    2017-01-01

    Model-based reinforcement learning (RL) algorithms can be used to derive optimal control laws for nonlinear dynamic systems. With continuous-valued state and input variables, RL algorithms have to rely on function approximators to represent the value function and policy mappings. This paper

  9. Learning Similar Actions by Reinforcement or Sensory-Prediction Errors Rely on Distinct Physiological Mechanisms.

    Science.gov (United States)

    Uehara, Shintaro; Mawase, Firas; Celnik, Pablo

    2017-09-14

    Humans can acquire knowledge of new motor behavior via different forms of learning. The two forms most commonly studied have been the development of internal models based on sensory-prediction errors (error-based learning) and success-based feedback (reinforcement learning). Human behavioral studies suggest these are distinct learning processes, though the neurophysiological mechanisms involved have not been characterized. Here, we evaluated physiological markers from the cerebellum and the primary motor cortex (M1) using noninvasive brain stimulation while healthy participants trained on finger-reaching tasks. We manipulated the extent to which subjects relied on error-based or reinforcement mechanisms by providing either vector or binary feedback about task performance. Our results demonstrate a double dissociation: learning the task mainly via error-based mechanisms led to cerebellar plasticity modifications but not long-term potentiation (LTP)-like plasticity changes in M1, while learning a similar action via reinforcement mechanisms elicited M1 LTP-like plasticity but not cerebellar plasticity changes. Our findings indicate that learning complex motor behavior is mediated by the interplay of different forms of learning, weighing distinct neural mechanisms in M1 and the cerebellum. Our study provides insights for designing effective interventions to enhance human motor learning. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  10. Optimal control in microgrid using multi-agent reinforcement learning.

    Science.gov (United States)

    Li, Fu-Dong; Wu, Min; He, Yong; Chen, Xin

    2012-11-01

    This paper presents an improved reinforcement learning method to minimize electricity costs on the premise of satisfying the power balance and generation limits of units in a microgrid operating in grid-connected mode. Firstly, the microgrid control requirements are analyzed and the objective function of optimal control for the microgrid is proposed. Then, a state variable, "Average Electricity Price Trend," which expresses the most probable transitions of the system, is developed so as to reduce the complexity and randomness of the microgrid, and a multi-agent architecture including agents, state variables, action variables and a reward function is formulated. Furthermore, dynamic hierarchical reinforcement learning, based on the change rate of a key state variable, is established to carry out optimal policy exploration. The analysis shows that the proposed method is beneficial in handling the problem of the "curse of dimensionality" and speeds up learning in an unknown large-scale world. Finally, the simulation results under JADE (Java Agent Development Framework) demonstrate the validity of the presented method for optimal control of a microgrid in grid-connected mode. Copyright © 2012 ISA. Published by Elsevier Ltd. All rights reserved.

  11. Cardiac Concomitants of Feedback and Prediction Error Processing in Reinforcement Learning

    Science.gov (United States)

    Kastner, Lucas; Kube, Jana; Villringer, Arno; Neumann, Jane

    2017-01-01

    Successful learning hinges on the evaluation of positive and negative feedback. We assessed differential learning from reward and punishment in a monetary reinforcement learning paradigm, together with cardiac concomitants of positive and negative feedback processing. On the behavioral level, learning from reward resulted in more advantageous behavior than learning from punishment, suggesting a differential impact of reward and punishment on successful feedback-based learning. On the autonomic level, learning and feedback processing were closely mirrored by phasic cardiac responses on a trial-by-trial basis: (1) Negative feedback was accompanied by faster and prolonged heart rate deceleration compared to positive feedback. (2) Cardiac responses shifted from feedback presentation at the beginning of learning to stimulus presentation later on. (3) Most importantly, the strength of phasic cardiac responses to the presentation of feedback correlated with the strength of prediction error signals that alert the learner to the necessity for behavioral adaptation. Considering participants' weight status and gender revealed obesity-related deficits in learning to avoid negative consequences and less consistent behavioral adaptation in women compared to men. In sum, our results provide strong new evidence for the notion that during learning phasic cardiac responses reflect an internal value and feedback monitoring system that is sensitive to the violation of performance-based expectations. Moreover, inter-individual differences in weight status and gender may affect both behavioral and autonomic responses in reinforcement-based learning. PMID:29163004

  12. Cardiac Concomitants of Feedback and Prediction Error Processing in Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Lucas Kastner

    2017-10-01

    Full Text Available Successful learning hinges on the evaluation of positive and negative feedback. We assessed differential learning from reward and punishment in a monetary reinforcement learning paradigm, together with cardiac concomitants of positive and negative feedback processing. On the behavioral level, learning from reward resulted in more advantageous behavior than learning from punishment, suggesting a differential impact of reward and punishment on successful feedback-based learning. On the autonomic level, learning and feedback processing were closely mirrored by phasic cardiac responses on a trial-by-trial basis: (1) Negative feedback was accompanied by faster and prolonged heart rate deceleration compared to positive feedback. (2) Cardiac responses shifted from feedback presentation at the beginning of learning to stimulus presentation later on. (3) Most importantly, the strength of phasic cardiac responses to the presentation of feedback correlated with the strength of prediction error signals that alert the learner to the necessity for behavioral adaptation. Considering participants' weight status and gender revealed obesity-related deficits in learning to avoid negative consequences and less consistent behavioral adaptation in women compared to men. In sum, our results provide strong new evidence for the notion that during learning phasic cardiac responses reflect an internal value and feedback monitoring system that is sensitive to the violation of performance-based expectations. Moreover, inter-individual differences in weight status and gender may affect both behavioral and autonomic responses in reinforcement-based learning.

  13. Reinforced Ultra-Tightly Coupled GPS/INS System for Challenging Environment

    Directory of Open Access Journals (Sweden)

    Xueyun Wang

    2014-01-01

    Full Text Available Among all integration levels currently available for Global Positioning System (GPS) and Inertial Navigation System (INS) integrated systems, the ultra-tightly coupled (UTC) GPS/INS system is the best choice for accurate and reliable navigation. Nevertheless, the performance of a UTC GPS/INS system degrades in challenging environments, such as jamming, changing noise of GPS signals, and high dynamic maneuvers. When low-end Inertial Measurement Units (IMUs) based on MEMS sensors are employed, the performance degradation is more severe. To solve this problem, a reinforced UTC GPS/INS system is proposed. Two techniques are adopted to deal with jamming and high dynamics. Firstly, an adaptive integration Kalman filter (IKF) based on fuzzy logic is developed to reinforce the anti-jamming ability. The parameters of the membership functions (MFs) are adjusted and optimized through a self-developed neural network. Secondly, a Doppler frequency error estimator based on a Kalman filter is designed to improve navigation performance under high dynamics. A complete simulation platform is established to evaluate the reinforced system. Results demonstrate that the proposed system architecture significantly improves navigation performance in challenging environments and is a more advanced solution for accurate and reliable navigation than the traditional UTC GPS/INS system.

  14. Integrating distributed Bayesian inference and reinforcement learning for sensor management

    NARCIS (Netherlands)

    Grappiolo, C.; Whiteson, S.; Pavlin, G.; Bakker, B.

    2009-01-01

    This paper introduces a sensor management approach that integrates distributed Bayesian inference (DBI) and reinforcement learning (RL). DBI is implemented using distributed perception networks (DPNs), a multiagent approach to performing efficient inference, while RL is used to automatically

  15. Learning User Preferences in Ubiquitous Systems: A User Study and a Reinforcement Learning Approach

    OpenAIRE

    Zaidenberg , Sofia; Reignier , Patrick; Mandran , Nadine

    2010-01-01

    International audience; Our study concerns a virtual assistant that proposes services to the user based on the user's currently perceived activity and situation (ambient intelligence). Instead of asking the user to define his preferences, we acquire them automatically using a reinforcement learning approach. Experiments showed that our system succeeded in learning user preferences. In order to validate the relevance and usability of such a system, we first conducted a user study. 26 non-expert s...

  16. Fast Conflict Resolution Based on Reinforcement Learning in Multi-agent System

    Institute of Scientific and Technical Information of China (English)

    PIAO Songhao; HONG Bingrong; CHU Haitao

    2004-01-01

    In a multi-agent system where each agent has a different goal (even when the team of agents has the same goal), agents must be able to resolve conflicts arising in the process of achieving their goals. Many researchers have presented methods for conflict resolution, e.g., reinforcement learning (RL), but conventional RL requires a large computation cost because every agent must learn; at the same time, the overlap of actions selected by each agent results in local conflict. Therefore, in this paper, we propose a novel method to solve these problems. In order to deal with conflict within the multi-agent system, the concept of a potential-field-function-based Action selection priority level (ASPL) is brought forward. In this method, all kinds of environmental factors that may influence the priority are effectively computed with the potential field function, so the priority to access a local resource can be decided rapidly. By avoiding the complex coordination mechanisms used in general multi-agent systems, conflict in the multi-agent system is settled more efficiently. Our system consists of an RL with ASPL module and a generalized rules module. Using ASPL, the RL module chooses a proper cooperative behavior, and the generalized rules module can accelerate the learning process. By applying the proposed method to robot soccer, the learning process is accelerated. The results of simulation and real experiments indicate the effectiveness of the method.

  17. Individual Learner Differences In Web-based Learning Environments: From Cognitive, Affective and Social-cultural Perspectives

    Directory of Open Access Journals (Sweden)

    Mustafa KOC

    2005-10-01

    Full Text Available Individual Learner Differences in Web-based Learning Environments: From Cognitive, Affective and Social-cultural Perspectives. Mustafa KOC (Ph.D Candidate, Instructional Technology, University of Illinois at Urbana-Champaign, Urbana, IL, USA). ABSTRACT: Throughout the paper, the issues of individual differences in web-based learning, also known as online instruction, online training or distance education, are examined and implications for designing distance education are discussed. Although the main purpose was to identify differences in learners' characteristics such as cognitive, affective, physiological and social factors that affect learning in a web-enhanced environment, the questions of how the web could be used to reinforce learning, what kinds of development ideas, theories and models are currently being used to design and deliver online instruction, and finally what evidence for the effectiveness of using the World Wide Web (WWW) for learning and instruction has been reported, were also analyzed to extend theoretical and epistemological understanding of web-based learning.

  18. Pervasive Learning Environments

    DEFF Research Database (Denmark)

    Hundebøl, Jesper; Helms, Niels Henrik

    2006-01-01

    The potentials of pervasive communication in learning within industry and education are right now being explored through different R&D projects. This paper outlines the background for and the possible learning potentials in what we describe as pervasive learning environments (PLE). PLE's differ... from virtual learning environments (VLE) primarily because in PLE's the learning content is very much related to the actual context in which the learner finds himself. Two local (Denmark) cases illustrate various aspects of pervasive learning. One is the eBag, a pervasive digital portfolio used...

  19. Optimal and Autonomous Control Using Reinforcement Learning: A Survey.

    Science.gov (United States)

    Kiumarsi, Bahare; Vamvoudakis, Kyriakos G; Modares, Hamidreza; Lewis, Frank L

    2018-06-01

    This paper reviews the current state of the art on reinforcement learning (RL)-based feedback control solutions to optimal regulation and tracking of single and multiagent systems. Existing RL solutions to both optimal H2 and H∞ control problems, as well as graphical games, will be reviewed. RL methods learn the solution to optimal control and game problems online and using measured data along the system trajectories. We discuss Q-learning and the integral RL algorithm as core algorithms for discrete-time (DT) and continuous-time (CT) systems, respectively. Moreover, we discuss a new direction of off-policy RL for both CT and DT systems. Finally, we review several applications.

  20. Universal effect of dynamical reinforcement learning mechanism in spatial evolutionary games

    International Nuclear Information System (INIS)

    Zhang, Hai-Feng; Wu, Zhi-Xi; Wang, Bing-Hong

    2012-01-01

    One of the prototypical mechanisms in understanding the ubiquitous cooperation in social dilemma situations is the win-stay, lose-shift rule. In this work, a generalized win-stay, lose-shift learning model, a reinforcement learning model with a dynamic aspiration level, is proposed to describe how humans adapt their social behaviors based on their social experiences. In the model, the players incorporate the information of the outcomes in previous rounds with time-dependent aspiration payoffs to regulate the probability of choosing cooperation. By investigating such a reinforcement learning rule in the spatial prisoner's dilemma game and the public goods game, a most noteworthy finding is that moderate greediness (i.e. a moderate aspiration level) best favors the development and organization of collective cooperation. The generality of this observation is tested against different regulation strengths and different types of interaction network as well. We also make comparisons with two recently proposed models to highlight the importance of the mechanism of an adaptive aspiration level in supporting cooperation in structured populations
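    A minimal sketch of such a generalized win-stay, lose-shift rule appears below: a Bush-Mosteller-style probability update whose aspiration level itself drifts toward recently experienced payoffs, so satisfaction is judged against moving expectations; the habituation rate, stimulus function, and toy payoffs are assumed values, not the paper's parameterization.

```python
# Hedged sketch of a win-stay, lose-shift (Bush-Mosteller-style) rule
# with a dynamic aspiration level, as described above: satisfaction is
# judged against an aspiration that drifts toward experienced payoffs.
# The habituation rate h, stimulus function, and payoffs are assumptions.
import numpy as np

rng = np.random.default_rng(7)
p_coop, aspiration = 0.5, 0.5
beta, h = 0.5, 0.1                           # stimulus strength, habituation rate

for t in range(100):
    coop = rng.random() < p_coop
    payoff = float(rng.choice([1.0, 0.0]))   # toy payoff from the game round
    s = np.tanh(beta * (payoff - aspiration))  # signed, bounded stimulus
    if coop:
        p_coop += (1 - p_coop) * s if s >= 0 else p_coop * s
    else:  # a satisfying defection makes cooperation less likely
        p_coop += -p_coop * s if s >= 0 else -(1 - p_coop) * s
    p_coop = float(np.clip(p_coop, 0.01, 0.99))
    # Dynamic aspiration: drift toward the payoffs actually experienced.
    aspiration = (1 - h) * aspiration + h * payoff
```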

  1. The Potential of Sumatran Pine Rosin for Reinforcement-Steel Coating in Wet Environment

    Directory of Open Access Journals (Sweden)

    Rudi Hartono

    2018-01-01

    Full Text Available The corrosion of reinforcement steel is commonly prevented by applying a hydrophobic coating. In this work, the potential of the residual product from Sumatran pine sap distillation, known as Sumatran pine rosin or gondorukem, as a natural and environmentally-friendly resource for coating reinforcement steel was investigated, together with an initial assessment of its capability to prevent corrosion in a wet environment. The experiments were performed using two types of commercially available gondorukem, namely types T and U. The coated reinforcement-steel samples were immersed in collected rainwater and their physical changes were observed periodically for 60 days. The results showed that gondorukem improves the durability of the reinforcement steel against corrosion under severe rainwater contact. Keywords: corrosion, coating, gondorukem, hydrophobic, pine rosin, reinforcement bar

  2. Neural prediction errors reveal a risk-sensitive reinforcement-learning process in the human brain.

    Science.gov (United States)

    Niv, Yael; Edlund, Jeffrey A; Dayan, Peter; O'Doherty, John P

    2012-01-11

    Humans and animals are exquisitely, though idiosyncratically, sensitive to risk or variance in the outcomes of their actions. Economic, psychological, and neural aspects of this are well studied when information about risk is provided explicitly. However, we must normally learn about outcomes from experience, through trial and error. Traditional models of such reinforcement learning focus on learning about the mean reward value of cues and ignore higher order moments such as variance. We used fMRI to test whether the neural correlates of human reinforcement learning are sensitive to experienced risk. Our analysis focused on anatomically delineated regions of a priori interest in the nucleus accumbens, where blood oxygenation level-dependent (BOLD) signals have been suggested as correlating with quantities derived from reinforcement learning. We first provide unbiased evidence that the raw BOLD signal in these regions corresponds closely to a reward prediction error. We then derive from this signal the learned values of cues that predict rewards of equal mean but different variance and show that these values are indeed modulated by experienced risk. Moreover, a close neurometric-psychometric coupling exists between the fluctuations of the experience-based evaluations of risky options that we measured neurally and the fluctuations in behavioral risk aversion. This suggests that risk sensitivity is integral to human learning, illuminating economic models of choice, neuroscientific models of affective learning, and the workings of the underlying neural mechanisms.
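    To make the idea of risk-sensitive learning concrete, the sketch below implements one common formalization (not necessarily the authors' model): a value update with asymmetric learning rates for positive and negative prediction errors, under which two cues of equal mean reward but different variance acquire different learned values; the learning rates and payoffs are assumptions.

```python
# One common formalization of risk-sensitive learning (not necessarily
# the authors' model): asymmetric learning rates for positive vs negative
# prediction errors make equal-mean cues of different variance acquire
# different values. Rates and payoffs are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(8)
alpha_pos, alpha_neg = 0.1, 0.2          # risk-averse: losses weighted more
V = {"safe": 0.0, "risky": 0.0}

for t in range(2000):
    rewards = {"safe": 0.5,                              # certain payoff
               "risky": float(rng.choice([0.0, 1.0]))}   # same mean, higher variance
    for cue, r in rewards.items():
        delta = r - V[cue]                               # prediction error
        V[cue] += (alpha_pos if delta > 0 else alpha_neg) * delta

# With these rates the risky cue settles near 1/3 rather than 0.5,
# a value-level signature of risk aversion.
print(V)
```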

  3. Metacognitive components in smart learning environment

    Science.gov (United States)

    Sumadyo, M.; Santoso, H. B.; Sensuse, D. I.

    2018-03-01

    Metacognitive ability in a digital learning process helps students achieve their learning goals, so a digital learning environment should provide metacognitive components as built-in facilities. A Smart Learning Environment is a learning environment with more advanced components than an ordinary digital learning environment. This study examines the metacognitive components of a smart learning environment that support the learning process. A review of the metacognitive literature was conducted to examine the components involved in metacognitive learning strategies. A review was also conducted of smart learning environment studies, ranging from design to context in building smart learning. Metacognitive learning strategies require the support of adaptable, responsive and personalized learning environments, in accordance with the principles of smart learning. The current study proposes roles for metacognitive components in a smart learning environment, which are useful as a basis for further research on building such environments.

  4. Multichannel sound reinforcement systems at work in a learning environment

    Science.gov (United States)

    Malek, John; Campbell, Colin

    2003-04-01

    Many people have experienced the entertaining benefits of a surround sound system, either in their own home or in a movie theater, but another application exists for multichannel sound that has for the most part gone unused. This is the application of multichannel sound systems to the learning environment. By incorporating a 7.1 surround processor and a touch panel interface programmable control system, the main lecture hall at the University of Michigan Taubman College of Architecture and Urban Planning has been converted from an ordinary lecture hall to a working audiovisual laboratory. The multichannel sound system is used in a wide variety of experiments, including exposure to sounds to test listeners' aural perception of the tonal characteristics of varying pitch, reverberation, speech transmission index, and sound-pressure level. The touch panel's custom interface allows a variety of user groups to control different parts of the AV system and provides preset capability that allows for numerous system configurations.

  5. Multiagent Reinforcement Learning with Regret Matching for Robot Soccer

    Directory of Open Access Journals (Sweden)

    Qiang Liu

    2013-01-01

    Full Text Available This paper proposes a novel multiagent reinforcement learning (MARL) algorithm, Nash-Q learning with regret matching, in which regret matching is used to speed up the well-known MARL algorithm Nash-Q learning. It is critical to choose a suitable action-selection strategy that harmonizes the relation between exploration and exploitation, so as to enhance the online learning ability of Nash-Q learning. In a Markov game, the joint actions of agents adopting the regret matching algorithm converge to a set of no-regret points, which can be viewed as a coarse correlated equilibrium that in essence includes the Nash equilibrium. It can be inferred that regret matching guides the exploration of the state-action space, so that the rate of convergence of the Nash-Q learning algorithm can be increased. Simulation results on robot soccer validate that, compared to the original Nash-Q learning algorithm, the use of regret matching during the learning phase of Nash-Q learning yields excellent online learning ability and results in significant performance in terms of scores, average reward and policy convergence.
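    The sketch below isolates regret matching as an action-selection rule, the ingredient the abstract credits with speeding up Nash-Q learning: actions are chosen with probability proportional to their positive cumulative regret. The fixed toy payoffs are an assumption; the actual method couples this rule to the Nash-Q update.

```python
# Sketch of regret matching in isolation, the action-selection rule the
# abstract uses to speed up Nash-Q learning: play each action with
# probability proportional to its positive cumulative regret. The fixed
# toy payoffs are an assumption; the paper couples this to Nash-Q.
import numpy as np

rng = np.random.default_rng(9)
n_actions = 3
payoff = rng.uniform(0, 1, size=n_actions)   # stationary toy payoffs
regret = np.zeros(n_actions)

for t in range(1000):
    positive = np.maximum(regret, 0.0)
    if positive.sum() > 0:
        probs = positive / positive.sum()
    else:
        probs = np.full(n_actions, 1.0 / n_actions)
    a = int(rng.choice(n_actions, p=probs))
    # Regret for action b: what b would have paid minus what we received.
    regret += payoff - payoff[a]

print("play concentrates on the best action:", int(np.argmax(payoff)))
```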

  6. Multiagent Reinforcement Learning Dynamic Spectrum Access in Cognitive Radios

    Directory of Open Access Journals (Sweden)

    Wu Chun

    2014-02-01

    Full Text Available A multiuser independent Q-learning method which does not need information interaction is proposed for multiuser dynamic spectrum access in cognitive radios. The method adopts a self-learning paradigm, in which each CR user performs reinforcement learning only by observing its individual performance reward, without spending communication resources on information interaction with others. The reward is defined suitably to represent channel quality and channel conflict status. A learning strategy of sufficient exploration, preference for good channels, and punishment for channel conflicts is designed to implement multiuser dynamic spectrum access. In a two-user, two-channel scenario, a fast learning algorithm is proposed and its convergence to the maximal whole reward is proved. The simulation results show that, with the proposed method, the CR system converges to a Nash equilibrium with large probability and achieves great whole-reward performance.
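    A minimal sketch of the interaction-free learning scheme might look as follows: each user keeps its own Q-table over channels and learns only from its own reward, with conflicts punished and good channels preferred; the channel qualities, reward values, and stateless (bandit-style) simplification are assumptions rather than the paper's exact formulation.

```python
# Hedged sketch of interaction-free multiuser learning: each cognitive
# radio user keeps an independent (stateless, bandit-style) Q-table over
# channels and learns from its own reward only, with channel conflicts
# punished. Channel qualities, rewards, and parameters are assumptions.
import numpy as np

rng = np.random.default_rng(10)
n_users, n_channels = 2, 2
quality = np.array([1.0, 0.6])           # per-channel reward when unshared
Q = np.zeros((n_users, n_channels))      # one independent table per user
alpha, eps = 0.1, 0.1

for t in range(5000):
    picks = [int(rng.integers(n_channels)) if rng.random() < eps
             else int(np.argmax(Q[u])) for u in range(n_users)]
    for u in range(n_users):
        conflict = any(picks[v] == picks[u] for v in range(n_users) if v != u)
        r = -1.0 if conflict else quality[picks[u]]  # punish conflicts
        Q[u, picks[u]] += alpha * (r - Q[u, picks[u]])
```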

  7. Study on state grouping and opportunity evaluation for reinforcement learning methods; Kyoka gakushuho no tame no jotai grouping to opportunity hyoka ni kansuru kenkyu

    Energy Technology Data Exchange (ETDEWEB)

    Yu, W.; Yokoi, H.; Kakazu, Y. [Hokkaido University, Sapporo (Japan)

    1997-08-20

    In this paper, we propose the State Grouping scheme for coping with the problem of scaling up the Reinforcement Learning Algorithm to real, large-size applications. The grouping scheme is based on geographical and trial-error information, and is made up of state generating, state combining, state splitting and state forgetting procedures, with corresponding action selecting and learning modules. We also discuss the Labeling Based Evaluation scheme, which can evaluate the opportunity of a state-action pair and therefore use better experience to guide the exploration of the state space effectively. Incorporating the Labeling Based Evaluation and State Grouping schemes into the Reinforcement Learning Algorithm, we obtain an approach that can generate an organized state space for Reinforcement Learning and do problem solving as well. We argue that an approach with this kind of ability is necessary for an autonomous agent; namely, the agent cannot act depending on any pre-defined map, but should instead search the environment and find the optimal problem solution autonomously and simultaneously. By solving the large state-size 3-DOF and 4-link manipulator problem, we show the efficiency of the proposed approach, i.e., the agent can achieve the optimal or sub-optimal path with less memory and less time. 14 refs., 11 figs., 3 tabs.

  8. A Review of the Relationship between Novelty, Intrinsic Motivation and Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Siddique Nazmul

    2017-11-01

    Full Text Available This paper presents a review of the tri-partite relationship between novelty, intrinsic motivation and reinforcement learning. The paper first presents a literature survey on novelty and the different computational models of novelty detection, with a specific focus on the features of stimuli that trigger a hedonic value for generating a novelty signal. It then presents an overview of intrinsic motivation and investigations into different models, with the aim of exploring deeper correlations between specific features of a novelty signal and its effect on intrinsic motivation in producing a reward function. Finally, it presents survey results on reinforcement learning, different models and their functional relationship with intrinsic motivation.

  9. Dopamine-Dependent Reinforcement of Motor Skill Learning: Evidence from Gilles de la Tourette Syndrome

    Science.gov (United States)

    Palminteri, Stefano; Lebreton, Mael; Worbe, Yulia; Hartmann, Andreas; Lehericy, Stephane; Vidailhet, Marie; Grabli, David; Pessiglione, Mathias

    2011-01-01

    Reinforcement learning theory has been extensively used to understand the neural underpinnings of instrumental behaviour. A central assumption surrounds dopamine signalling reward prediction errors, so as to update action values and ensure better choices in the future. However, educators may share the intuitive idea that reinforcements not only…

  10. Pervasive Learning Environments

    DEFF Research Database (Denmark)

    Helms, Niels Henrik; Hundebøl, Jesper

    2006-01-01

    The potentials of pervasive communication in learning within industry and education are right now being explored through different R&D projects. This paper outlines the background for and the possible learning potentials in what we describe as pervasive learning environments (PLE). PLE's differ... from virtual learning environments (VLE) primarily because in PLE's the learning content is very much related to the actual context in which the learner finds himself. Two local (Denmark) cases illustrate various aspects of pervasive learning. One is the eBag, a pervasive digital portfolio used... in schools. The other is moreover related to work-based learning in that it foresees a community of practitioners accessing, sharing and adding to knowledge and learning objects held within a pervasive business intelligence system. Limitations and needed developments of these and other systems are discussed...

  11. Joy, Distress, Hope, and Fear in Reinforcement Learning (Extended Abstract)

    NARCIS (Netherlands)

    Jacobs, E.J.; Broekens, J.; Jonker, C.M.

    2014-01-01

    In this paper we present a mapping between joy, distress, hope and fear, and Reinforcement Learning primitives. Joy / distress is a signal that is derived from the RL update signal, while hope/fear is derived from the utility of the current state. Agent-based simulation experiments replicate

  12. Applications of Deep Learning and Reinforcement Learning to Biological Data.

    Science.gov (United States)

    Mahmud, Mufti; Kaiser, Mohammed Shamim; Hussain, Amir; Vassanelli, Stefano

    2018-06-01

    Rapid advances in hardware-based technologies during the past decades have opened up new possibilities for life scientists to gather multimodal data in various application domains, such as omics, bioimaging, medical imaging, and (brain/body)-machine interfaces. These have generated novel opportunities for development of dedicated data-intensive machine learning techniques. In particular, recent research in deep learning (DL), reinforcement learning (RL), and their combination (deep RL) promise to revolutionize the future of artificial intelligence. The growth in computational power accompanied by faster and increased data storage, and declining computing costs have already allowed scientists in various fields to apply these techniques on data sets that were previously intractable owing to their size and complexity. This paper provides a comprehensive survey on the application of DL, RL, and deep RL techniques in mining biological data. In addition, we compare the performances of DL techniques when applied to different data sets across various application domains. Finally, we outline open issues in this challenging research area and discuss future development perspectives.

  13. Gaze-contingent reinforcement learning reveals incentive value of social signals in young children and adults.

    Science.gov (United States)

    Vernetti, Angélina; Smith, Tim J; Senju, Atsushi

    2017-03-15

    While numerous studies have demonstrated that infants and adults preferentially orient to social stimuli, it remains unclear as to what drives such preferential orienting. It has been suggested that the learned association between social cues and subsequent reward delivery might shape such social orienting. Using a novel, spontaneous indication of reinforcement learning (with the use of a gaze contingent reward-learning task), we investigated whether children and adults' orienting towards social and non-social visual cues can be elicited by the association between participants' visual attention and a rewarding outcome. Critically, we assessed whether the engaging nature of the social cues influences the process of reinforcement learning. Both children and adults learned to orient more often to the visual cues associated with reward delivery, demonstrating that cue-reward association reinforced visual orienting. More importantly, when the reward-predictive cue was social and engaging, both children and adults learned the cue-reward association faster and more efficiently than when the reward-predictive cue was social but non-engaging. These new findings indicate that social engaging cues have a positive incentive value. This could possibly be because they usually coincide with positive outcomes in real life, which could partly drive the development of social orienting. © 2017 The Authors.

  14. Adaptive Load Balancing of Parallel Applications with Multi-Agent Reinforcement Learning on Heterogeneous Systems

    Directory of Open Access Journals (Sweden)

    Johan Parent

    2004-01-01

    We report on the improvements that can be achieved by applying machine learning techniques, in particular reinforcement learning, to the dynamic load balancing of parallel applications. The applications considered in this paper are coarse-grain, data-intensive applications. Such applications put high pressure on the interconnect of the hardware. Synchronization and load balancing in complex, heterogeneous networks need fast, flexible, adaptive load-balancing algorithms. By viewing a parallel application as a one-state coordination game in the framework of multi-agent reinforcement learning, and by using a recently introduced multi-agent exploration technique, we are able to improve upon the classic job-farming approach. The improvements are achieved with limited computation and communication overhead.

  15. Learning alternative movement coordination patterns using reinforcement feedback.

    Science.gov (United States)

    Lin, Tzu-Hsiang; Denomme, Amber; Ranganathan, Rajiv

    2018-05-01

    One of the characteristic features of the human motor system is redundancy, i.e., the ability to achieve a given task outcome using multiple coordination patterns. However, once participants settle on using a specific coordination pattern, the process of learning to use a new alternative coordination pattern to perform the same task is still poorly understood. Here, using two experiments, we examined this process of how participants shift from one coordination pattern to another using different reinforcement schedules. Participants performed a virtual reaching task, where they moved a cursor to different targets positioned on the screen. Our goal was to make participants use a coordination pattern with greater trunk motion, and to this end, we provided reinforcement by making the cursor disappear if the trunk motion during the reach did not cross a specified threshold value. In Experiment 1, we compared two reinforcement schedules in two groups of participants: an abrupt group, where the threshold was introduced immediately at the beginning of practice; and a gradual group, where the threshold was introduced gradually with practice. Results showed that both abrupt and gradual groups were effective in shifting their coordination patterns to involve greater trunk motion, but the abrupt group showed greater retention when the reinforcement was removed. In Experiment 2, we examined the basis of this advantage in the abrupt group using two additional control groups. Results showed that the advantage of the abrupt group was because of a greater number of practice trials with the desired coordination pattern. Overall, these results show that reinforcement can be successfully used to shift coordination patterns, which has potential in the rehabilitation of movement disorders.

  16. Nonparametric bayesian reward segmentation for skill discovery using inverse reinforcement learning

    CSIR Research Space (South Africa)

    Ranchod, P

    2015-10-01

    Full Text Available We present a method for segmenting a set of unstructured demonstration trajectories to discover reusable skills using inverse reinforcement learning (IRL). Each skill is characterised by a latent reward function which the demonstrator is assumed...

  17. Pervasive Learning Environments

    DEFF Research Database (Denmark)

    Hundebøl, Jesper; Helms, Niels Henrik

    Abstract: The potentials of pervasive communication in learning within industry and education are right now being explored through different R&D projects. This paper outlines the background for, and the possible learning potentials in, what we describe as pervasive learning environments (PLE). PLEs differ from virtual learning environments (VLE) primarily because in PLEs the learning content is closely related to the actual context in which the learner finds himself. Two local (Denmark) cases illustrate various aspects of pervasive learning. One is the eBag, a pervasive digital portfolio used in schools. The other relates to work-based learning in that it foresees a community of practitioners accessing, sharing and adding to knowledge and learning objects held within a pervasive business intelligence system. Limitations and needed developments of these and other systems are discussed...

  18. Explicit and implicit reinforcement learning across the psychosis spectrum.

    Science.gov (United States)

    Barch, Deanna M; Carter, Cameron S; Gold, James M; Johnson, Sheri L; Kring, Ann M; MacDonald, Angus W; Pizzagalli, Diego A; Ragland, J Daniel; Silverstein, Steven M; Strauss, Milton E

    2017-07-01

    Motivational and hedonic impairments are core features of a variety of types of psychopathology. An important aspect of motivational function is reinforcement learning (RL), including implicit (i.e., outside of conscious awareness) and explicit (i.e., including explicit representations about potential reward associations) learning, as well as both positive reinforcement (learning about actions that lead to reward) and punishment (learning to avoid actions that lead to loss). Here we present data from paradigms designed to assess both positive and negative components of both implicit and explicit RL, examine performance on each of these tasks among individuals with schizophrenia, schizoaffective disorder, and bipolar disorder with psychosis, and examine their relative relationships to specific symptom domains transdiagnostically. None of the diagnostic groups differed significantly from controls on the implicit RL tasks in either bias toward a rewarded response or bias away from a punished response. However, on the explicit RL task, both the individuals with schizophrenia and schizoaffective disorder performed significantly worse than controls, but the individuals with bipolar disorder did not. Worse performance on the explicit RL task, but not the implicit RL task, was related to worse motivation and pleasure symptoms across all diagnostic categories. Performance on explicit RL, but not implicit RL, was related to working memory, which accounted for some of the diagnostic group differences. However, working memory did not account for the relationship of explicit RL to motivation and pleasure symptoms. These findings suggest transdiagnostic relationships across the spectrum of psychotic disorders between motivation and pleasure impairments and explicit RL. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  19. Intrinsic interactive reinforcement learning - Using error-related potentials for real world human-robot interaction.

    Science.gov (United States)

    Kim, Su Kyoung; Kirchner, Elsa Andrea; Stefes, Arne; Kirchner, Frank

    2017-12-14

    Reinforcement learning (RL) enables robots to learn their optimal behavioral strategy in dynamic environments based on feedback. Explicit human feedback during robot RL is advantageous, since an explicit reward function can be easily adapted. However, it is very demanding and tiresome for a human to continuously and explicitly generate feedback. Therefore, the development of implicit approaches is of high relevance. In this paper, we used an error-related potential (ErrP), an event-related activity in the human electroencephalogram (EEG), as an intrinsically generated implicit feedback signal (reward) for RL. Initially, we validated our approach with seven subjects in a simulated robot learning scenario. ErrPs were detected online in single trials with a balanced accuracy (bACC) of 91%, which was sufficient to learn to recognize gestures and the correct mapping between human gestures and robot actions in parallel. Finally, we validated our approach in a real robot scenario, in which seven subjects freely chose gestures and the real robot correctly learned the mapping between gestures and actions (ErrP detection: 90% bACC). In this paper, we demonstrated that intrinsically generated EEG-based human feedback in RL can successfully be used to implicitly improve gesture-based robot control during human-robot interaction. We call our approach intrinsic interactive RL.

  20. Enhancing Learning within the 3-D Virtual Learning Environment

    OpenAIRE

    Shirin Shafieiyoun; Akbar Moazen Safaei

    2013-01-01

    The use of virtual learning environments is becoming increasingly prominent in education. The potential of virtual learning environments has frequently been related to the expanded sense of social presence experienced by students and educators. This study investigated the effectiveness of social presence within virtual learning environments and analysed the impact of social presence on increasing learning satisfaction within virtual learning environments. Second Life, as an example of ...

  1. Manufacturing Scheduling Using Colored Petri Nets and Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Maria Drakaki

    2017-02-01

    Agent-based intelligent manufacturing control systems are capable of responding and adapting efficiently to environmental changes. Manufacturing system adaptation and evolution can be addressed with learning mechanisms that increase the intelligence of agents. In this paper a manufacturing scheduling method is presented based on Timed Colored Petri Nets (CTPNs) and reinforcement learning (RL). CTPNs model the manufacturing system and implement the scheduling. In the search for an optimal solution a scheduling agent uses RL, in particular the Q-learning algorithm. A warehouse order-picking scheduling is presented as a case study to illustrate the method. The proposed scheduling method is compared to existing methods. Simulation and state space results are used to evaluate performance and identify system properties.

  2. Online human training of a myoelectric prosthesis controller via actor-critic reinforcement learning.

    Science.gov (United States)

    Pilarski, Patrick M; Dawson, Michael R; Degris, Thomas; Fahimi, Farbod; Carey, Jason P; Sutton, Richard S

    2011-01-01

    As a contribution toward the goal of adaptable, intelligent artificial limbs, this work introduces a continuous actor-critic reinforcement learning method for optimizing the control of multi-function myoelectric devices. Using a simulated upper-arm robotic prosthesis, we demonstrate how it is possible to derive successful limb controllers from myoelectric data using only a sparse human-delivered training signal, without requiring detailed knowledge about the task domain. This reinforcement-based machine learning framework is well suited for use by both patients and clinical staff, and may be easily adapted to different application domains and the needs of individual amputees. To our knowledge, this is the first myoelectric control approach that facilitates the online learning of new amputee-specific motions based only on a one-dimensional (scalar) feedback signal provided by the user of the prosthesis. © 2011 IEEE
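
    The actor-critic scheme described above can be sketched in a few lines. This is a minimal illustration under assumed definitions (a linear critic and a Gaussian policy over a feature vector, with assumed parameter values); it shows the general update structure, not the authors' controller.

        import numpy as np

        # Minimal actor-critic sketch driven by a scalar (human-delivered)
        # reward. The critic learns a linear state value; the actor learns
        # the mean of a Gaussian policy via the policy-gradient rule.
        rng = np.random.default_rng(0)
        n = 8                            # feature dimension (assumed)
        w = np.zeros(n)                  # critic weights
        theta = np.zeros(n)              # actor weights (policy mean)
        sigma = 0.1                      # fixed exploration noise (assumed)
        alpha_w, alpha_th, gamma = 0.1, 0.01, 0.95

        def step(x, x_next, reward):
            """One update from features x to x_next under a scalar reward."""
            mu = theta @ x
            a = mu + sigma * rng.standard_normal()           # sampled action
            delta = reward + gamma * (w @ x_next) - (w @ x)  # TD error
            w[:] += alpha_w * delta * x                      # critic update
            theta[:] += alpha_th * delta * (a - mu) / sigma**2 * x  # actor update
            return a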

  3. Optimizing Chemical Reactions with Deep Reinforcement Learning.

    Science.gov (United States)

    Zhou, Zhenpeng; Li, Xiaocheng; Zare, Richard N

    2017-12-27

    Deep reinforcement learning was employed to optimize chemical reactions. Our model iteratively records the results of a chemical reaction and chooses new experimental conditions to improve the reaction outcome. This model outperformed a state-of-the-art blackbox optimization algorithm by using 71% fewer steps on both simulations and real reactions. Furthermore, we introduced an efficient exploration strategy by drawing the reaction conditions from certain probability distributions, which resulted in an improvement on regret from 0.062 to 0.039 compared with a deterministic policy. Combining the efficient exploration policy with accelerated microdroplet reactions, optimal reaction conditions were determined in 30 min for the four reactions considered, and a better understanding of the factors that control microdroplet reactions was reached. Moreover, our model showed a better performance after training on reactions with similar or even dissimilar underlying mechanisms, which demonstrates its learning ability.

  4. Stimulating Deep Learning Using Active Learning Techniques

    Science.gov (United States)

    Yew, Tee Meng; Dawood, Fauziah K. P.; a/p S. Narayansany, Kannaki; a/p Palaniappa Manickam, M. Kamala; Jen, Leong Siok; Hoay, Kuan Chin

    2016-01-01

    When students and teachers behave in ways that reinforce learning as a spectator sport, the result can often be a classroom and overall learning environment that is mostly limited to transmission of information and rote learning rather than deep approaches towards meaningful construction and application of knowledge. A group of college instructors…

  5. Designing Learning Resources in Synchronous Learning Environments

    DEFF Research Database (Denmark)

    Christiansen, Rene B

    2015-01-01

    Computer-mediated Communication (CMC) and synchronous learning environments offer new solutions for teachers and students that transcend the singular one-way transmission of content knowledge from teacher to student. CMC makes it possible not only to teach computer-mediated but also to design and create new learning resources targeted to a specific group of learners. This paper addresses the possibilities of designing learning resources within synchronous learning environments. The empirical basis is a cross-country study involving students and teachers in primary schools in three Nordic countries (Denmark, Sweden and Norway). On the basis of these empirical studies a set of design examples is drawn with the purpose of showing how the design fulfills the dual purpose of functioning as a remote, synchronous learning environment and - using the learning materials used and recordings...

  6. Effective Learning Environments in Relation to Different Learning Theories

    OpenAIRE

    Guney, Ali; Al, Selda

    2012-01-01

    There are diverse learning theories that explain learning processes, which are discussed in this paper through the cognitive structure of the learning process. Learning environments are usually described in terms of pedagogical philosophy, curriculum design and social climate. There have been only a few studies about how the physical environment is related to the learning process. Many researchers generally consider teaching and learning issues as if independent from the physical environment, whereas p...

  7. Maximize Producer Rewards in Distributed Windmill Environments: A Q-Learning Approach

    Directory of Open Access Journals (Sweden)

    Bei Li

    2015-03-01

    In Smart Grid environments, homes equipped with windmills are encouraged to generate energy and sell it back to utilities. Time-of-Use pricing and the introduction of storage devices would greatly influence a user in deciding when to sell back energy and how much to sell. Therefore, a study of sequential decision-making algorithms that can optimize the total payoff for the user is necessary. In this paper, reinforcement learning is used to tackle this optimization problem. The problem of determining when to sell back energy is formulated as a Markov decision process and the model is learned adaptively using Q-learning. Experiments are done with varying sizes of storage capacities and under periodic energy generation rates of different levels of fluctuations. The results show a notable increase in discounted total rewards from selling back energy with the proposed approach.
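
    A minimal sketch of the kind of Q-learning formulation described above follows. The state, action and reward definitions here are illustrative assumptions (e.g., a discretized storage level and tariff period), not the paper's exact model.

        import random
        from collections import defaultdict

        # Q-learning for deciding when to sell stored energy under
        # time-of-use pricing. State: (storage_bucket, price_period);
        # reward: revenue from energy sold at the current tariff.
        Q = defaultdict(float)                  # (state, action) -> value
        alpha, gamma, epsilon = 0.1, 0.95, 0.1  # assumed parameters
        ACTIONS = ("sell", "hold")

        def choose(state):
            """Epsilon-greedy action selection."""
            if random.random() < epsilon:
                return random.choice(ACTIONS)
            return max(ACTIONS, key=lambda a: Q[(state, a)])

        def update(state, action, reward, next_state):
            """Standard one-step Q-learning backup."""
            best_next = max(Q[(next_state, a)] for a in ACTIONS)
            Q[(state, action)] += alpha * (reward + gamma * best_next
                                           - Q[(state, action)])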

  8. Tank War Using Online Reinforcement Learning

    DEFF Research Database (Denmark)

    Toftgaard Andersen, Kresten; Zeng, Yifeng; Dahl Christensen, Dennis

    2009-01-01

    Real-Time Strategy (RTS) games provide a challenging platform for implementing online reinforcement learning (RL) techniques in a real application. The computer, as one player, monitors opponents' (human or other computers') strategies and then updates its own policy using RL methods. In this paper, we propose a multi-layer framework for implementing online RL in an RTS game. The framework significantly reduces the RL computational complexity by decomposing the state space in a hierarchical manner. We implement the RTS game - Tank General - and perform a thorough test of the proposed framework. The results show the effectiveness of our proposed framework and shed light on relevant issues in using RL in RTS games.

  9. A Model to Explain the Emergence of Reward Expectancy neurons using Reinforcement Learning and Neural Network

    OpenAIRE

    Shinya, Ishii; Munetaka, Shidara; Katsunari, Shibata

    2006-01-01

    In an experiment with a multi-trial task to obtain a reward, reward expectancy neurons, which responded only in the non-reward trials that are necessary to advance toward the reward, have been observed in the anterior cingulate cortex of monkeys. In this paper, to explain the emergence of the reward expectancy neuron in terms of reinforcement learning theory, a model that consists of a recurrent neural network trained based on reinforcement learning is proposed. The analysis of the hi...

  10. FMRQ-A Multiagent Reinforcement Learning Algorithm for Fully Cooperative Tasks.

    Science.gov (United States)

    Zhang, Zhen; Zhao, Dongbin; Gao, Junwei; Wang, Dongqing; Dai, Yujie

    2017-06-01

    In this paper, we propose a multiagent reinforcement learning algorithm dealing with fully cooperative tasks. The algorithm is called frequency of the maximum reward Q-learning (FMRQ). FMRQ aims to achieve one of the optimal Nash equilibria so as to optimize the performance index in multiagent systems. The frequency of obtaining the highest global immediate reward, instead of the immediate reward itself, is used as the reinforcement signal. With FMRQ each agent does not need to observe the other agents' actions and only shares its state and reward at each step. We validate FMRQ through case studies of repeated games: four cases of two-player two-action games and one case of a three-player two-action game. It is demonstrated that FMRQ can converge to one of the optimal Nash equilibria in these cases. Moreover, comparison experiments on tasks with multiple states and finite steps are conducted: one is box-pushing and the other is a distributed sensor network problem. Experimental results show that the proposed algorithm outperforms others with higher performance.

  11. GEOMETRIC AND MATERIAL NONLINEAR ANALYSIS OF REINFORCED CONCRETE SLABS AT FIRE ENVIRONMENT

    Directory of Open Access Journals (Sweden)

    Ayad A. Abdul -Razzak

    2013-05-01

    In the present study a nonlinear finite element analysis is presented to predict the fire resistance of reinforced concrete slabs in a fire environment. An eight-node layered degenerated shell element utilizing Mindlin/Reissner thick plate theory is employed. The proposed model considers cracking, crushing and yielding of concrete and steel at elevated temperatures. The layered approach is used to represent the steel reinforcement and to discretize the concrete slab through the thickness. The reinforcement steel is represented as a smeared layer of equivalent thickness with uniaxial strength and rigidity properties. Geometric nonlinear analysis may play an important role in the behavior of reinforced concrete slabs at high temperature. Geometrical nonlinearity in the layered approach is considered in the mathematical model, which is based on the total Lagrangian approach taking into account Von Karman assumptions. Finally, two examples for which experimental results are available are analyzed using the proposed model. The comparison showed good agreement with experimental results.

  12. Mapping Students’ Informal Learning Using Personal Learning Environment

    Directory of Open Access Journals (Sweden)

    Jelena Anđelković Labrović

    2014-07-01

    Personal learning environments are a widespread way of learning, especially in informal learning processes. The aim of this research is to identify the elements of students' personal learning environments and the extent to which students use modern technology for learning as part of their non-formal learning. A mapping system was used for gathering data, and an analysis of percentages and frequency counts was used for data analysis in SPSS. The results show that students' personal learning environments include the following elements in 75% of all cases: Wikipedia, Google, YouTube and Facebook; an interesting fact is that all of them belong to the group of Web 2.0 tools and applications.

  13. Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation.

    Science.gov (United States)

    Kato, Ayaka; Morita, Kenji

    2016-10-01

    It has been suggested that dopamine (DA) represents reward-prediction-error (RPE) defined in reinforcement learning and therefore DA responds to unpredicted but not predicted reward. However, recent studies have found DA response sustained towards predictable reward in tasks involving self-paced behavior, and suggested that this response represents a motivational signal. We have previously shown that RPE can sustain if there is decay/forgetting of learned-values, which can be implemented as decay of synaptic strengths storing learned-values. This account, however, did not explain the suggested link between tonic/sustained DA and motivation. In the present work, we explored the motivational effects of the value-decay in self-paced approach behavior, modeled as a series of 'Go' or 'No-Go' selections towards a goal. Through simulations, we found that the value-decay can enhance motivation, specifically, facilitate fast goal-reaching, albeit counterintuitively. Mathematical analyses revealed that underlying potential mechanisms are twofold: (1) decay-induced sustained RPE creates a gradient of 'Go' values towards a goal, and (2) value-contrasts between 'Go' and 'No-Go' are generated because while chosen values are continually updated, unchosen values simply decay. Our model provides potential explanations for the key experimental findings that suggest DA's roles in motivation: (i) slowdown of behavior by post-training blockade of DA signaling, (ii) observations that DA blockade severely impairs effortful actions to obtain rewards while largely sparing seeking of easily obtainable rewards, and (iii) relationships between the reward amount, the level of motivation reflected in the speed of behavior, and the average level of DA. These results indicate that reinforcement learning with value-decay, or forgetting, provides a parsimonious mechanistic account for the DA's roles in value-learning and motivation. Our results also suggest that when biological systems for value-learning
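
    The core mechanism, sustained prediction errors from value decay, can be sketched directly. The following is a minimal illustration on a linear track of 'Go' steps, with assumed parameter values rather than those of the paper.

        import numpy as np

        # Temporal-difference learning with value decay ("forgetting"):
        # because stored values decay between updates, the reward
        # prediction error stays positive at steady state even for a
        # fully predicted reward, creating a value gradient to the goal.
        alpha, gamma, phi = 0.5, 0.97, 0.01  # learning, discount, decay rates
        n_states = 10                        # 'Go' steps before the goal
        V = np.zeros(n_states + 1)

        for episode in range(500):
            for s in range(n_states):
                r = 1.0 if s == n_states - 1 else 0.0   # reward at the goal
                delta = r + gamma * V[s + 1] - V[s]     # prediction error
                V[s] += alpha * delta
            V *= (1.0 - phi)   # all stored values decay each episode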

  14. Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation.

    Directory of Open Access Journals (Sweden)

    Ayaka Kato

    2016-10-01

    It has been suggested that dopamine (DA) represents reward-prediction-error (RPE) defined in reinforcement learning and therefore DA responds to unpredicted but not predicted reward. However, recent studies have found DA response sustained towards predictable reward in tasks involving self-paced behavior, and suggested that this response represents a motivational signal. We have previously shown that RPE can sustain if there is decay/forgetting of learned-values, which can be implemented as decay of synaptic strengths storing learned-values. This account, however, did not explain the suggested link between tonic/sustained DA and motivation. In the present work, we explored the motivational effects of the value-decay in self-paced approach behavior, modeled as a series of 'Go' or 'No-Go' selections towards a goal. Through simulations, we found that the value-decay can enhance motivation, specifically, facilitate fast goal-reaching, albeit counterintuitively. Mathematical analyses revealed that underlying potential mechanisms are twofold: (1) decay-induced sustained RPE creates a gradient of 'Go' values towards a goal, and (2) value-contrasts between 'Go' and 'No-Go' are generated because while chosen values are continually updated, unchosen values simply decay. Our model provides potential explanations for the key experimental findings that suggest DA's roles in motivation: (i) slowdown of behavior by post-training blockade of DA signaling, (ii) observations that DA blockade severely impairs effortful actions to obtain rewards while largely sparing seeking of easily obtainable rewards, and (iii) relationships between the reward amount, the level of motivation reflected in the speed of behavior, and the average level of DA. These results indicate that reinforcement learning with value-decay, or forgetting, provides a parsimonious mechanistic account for the DA's roles in value-learning and motivation. Our results also suggest that when biological systems

  15. Mathematical modeling for corrosion environment estimation based on concrete resistivity measurement directly above reinforcement

    International Nuclear Information System (INIS)

    Lim, Young-Chul; Lee, Han-Seung; Noguchi, Takafumi

    2009-01-01

    This study aims to formulate a resistivity model whereby the concrete resistivity expressing the environment of the steel reinforcement can be directly estimated and evaluated from measurements taken immediately above the reinforcement, as a method of evaluating corrosion deterioration in reinforced concrete structures. It also aims to provide a theoretical basis for the feasibility of durability evaluation by electric non-destructive techniques with no need for chipping of the cover concrete. This Resistivity Estimation Model (REM), a mathematical model based on the mirror method, combines conventional four-electrode measurement of resistivity with geometric parameters including cover depth, bar diameter, and electrode intervals. The model was verified by comparing estimates obtained with it directly above the reinforcement against resistivity measured in areas unaffected by reinforcement. Both results strongly correlated, proving the validity of the model. It is expected to be applicable to laboratory study and field diagnosis regarding reinforcement corrosion. (author)
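
    For reference, a conventional four-electrode measurement (commonly the Wenner arrangement) computes an apparent resistivity from the electrode spacing a, the injected current I and the measured potential difference V; in LaTeX notation:

        \rho_{\mathrm{app}} = 2 \pi a \, \frac{V}{I}

    The geometric corrections for cover depth and bar diameter are specific to the REM described above and are not reproduced here.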

  16. Conditions for Productive Learning in Network Learning Environments

    DEFF Research Database (Denmark)

    Ponti, M.; Dirckinck-Holmfeld, Lone; Lindström, B.

    2004-01-01

    The Kaleidoscope1 Jointly Executed Integrating Research Project (JEIRP) on Conditions for Productive Networked Learning Environments is developing and elaborating conceptual understandings of Computer Supported Collaborative Learning (CSCL), emphasizing the use of cross-cultural comparative...: pedagogical design and the dialectics of the digital artefacts, the concept of collaboration, ethics/trust, identity and the role of scaffolding of networked learning environments. The JEIRP is motivated by the fact that many networked learning environments in various European educational settings... are designed without a deep understanding of the pedagogical, communicative and collaborative conditions embedded in networked learning, despite the existence of good theoretical views pointing to a social understanding of learning, rather than a traditional individualistic and information-processing approach...

  17. Students’ Motivation for Learning in Virtual Learning Environments

    OpenAIRE

    Beluce, Andrea Carvalho; Oliveira, Katya Luciane de

    2015-01-01

    The specific characteristics of online education require engagement and autonomy of the student, factors which are related to motivation for learning. This study investigated students' motivation in virtual learning environments (VLEs). For this, it used the Teaching and Learning Strategy and Motivation to Learn Scale in Virtual Learning Environments (TLSM-VLE). The scale presented 32 items and six dimensions, three of which aimed to measure the variables of autonomous motivation, controlled ...

  18. Amygdala and ventral striatum make distinct contributions to reinforcement learning

    Science.gov (United States)

    Costa, Vincent D.; Monte, Olga Dal; Lucas, Daniel R.; Murray, Elisabeth A.; Averbeck, Bruno B.

    2016-01-01

    Reinforcement learning (RL) theories posit that dopaminergic signals are integrated within the striatum to associate choices with outcomes. Often overlooked is that the amygdala also receives dopaminergic input and is involved in Pavlovian processes that influence choice behavior. To determine the relative contributions of the ventral striatum (VS) and amygdala to appetitive RL, we tested rhesus macaques with VS or amygdala lesions on deterministic and stochastic versions of a two-arm bandit reversal learning task. When learning was characterized with an RL model, relative to controls, amygdala lesions caused general decreases in learning from positive feedback and choice consistency. By comparison, VS lesions only affected learning in the stochastic task. Moreover, the VS lesions hastened the monkeys' choice reaction times, which emphasized a speed-accuracy tradeoff that accounted for errors in deterministic learning. These results update standard accounts of RL by emphasizing distinct contributions of the amygdala and VS to RL. PMID:27720488
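
    The kind of RL model used for this characterization can be sketched as follows: values updated from feedback with a learning rate, and choices generated by a softmax whose inverse temperature captures choice consistency. The parameter values and reversal schedule below are illustrative assumptions, not the fitted values from the study.

        import numpy as np

        # Two-arm bandit learner with reversal: delta-rule value updates
        # and softmax choice. Higher beta -> more consistent choices.
        rng = np.random.default_rng(1)
        alpha, beta = 0.3, 3.0            # learning rate, inverse temperature
        Q = np.zeros(2)                   # values of the two arms
        p_reward = np.array([0.8, 0.2])   # stochastic reward schedule

        for trial in range(200):
            p1 = 1.0 / (1.0 + np.exp(-beta * (Q[1] - Q[0])))  # softmax
            choice = int(rng.random() < p1)
            reward = float(rng.random() < p_reward[choice])
            Q[choice] += alpha * (reward - Q[choice])         # delta rule
            if trial == 99:
                p_reward = p_reward[::-1]                     # reversal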

  19. Active-learning strategies: the use of a game to reinforce learning in nursing education. A case study.

    Science.gov (United States)

    Boctor, Lisa

    2013-03-01

    The majority of nursing students are kinesthetic learners, preferring a hands-on, active approach to education. Research shows that active-learning strategies can increase student learning and satisfaction. This study looks at the use of one active-learning strategy, a Jeopardy-style game, 'Nursopardy', to reinforce Fundamentals of Nursing material, aiding in students' preparation for a standardized final exam. The game was created keeping students' varied learning styles and the NCLEX blueprint in mind. The blueprint was used to create 5 categories, with 26 total questions. Student survey results, using a five-point Likert scale, showed that students found this learning method enjoyable and beneficial to learning. More research is recommended regarding learning outcomes when using active-learning strategies such as games. Copyright © 2012 Elsevier Ltd. All rights reserved.

  20. Characterizing Response-Reinforcer Relations in the Natural Environment: Exploratory Matching Analyses

    Science.gov (United States)

    Sy, Jolene R.; Borrero, John C.; Borrero, Carrie S. W.

    2010-01-01

    We assessed problem and appropriate behavior in the natural environment from a matching perspective. Problem and appropriate behavior were conceptualized as concurrently available responses, the occurrence of which was thought to be determined by the relative rates or durations of reinforcement. We also assessed whether response allocation could…

  1. Reinforcement learning for dpm of embedded visual sensor nodes

    International Nuclear Information System (INIS)

    Khani, U.; Sadhayo, I. H.

    2014-01-01

    This paper proposes an RL (Reinforcement Learning) based DPM (Dynamic Power Management) technique to learn timeout policies during the operation of a visual sensor node that has multiple power/performance states. As opposed to the widely used static timeout policies, our proposed DPM policy, also referred to as OLTP (Online Learning of Timeout Policies), learns to dynamically change the timeout decisions in the different node states, including the non-operational states. The selection of timeout values in different power/performance states of a visual sensing platform is based on workload estimates derived from an ML-ANN (Multi-Layer Artificial Neural Network) and an objective function given by weighted performance and power parameters. The DPM approach is also able to dynamically adjust the power-performance weights online to satisfy a given constraint of either power consumption or performance. Results show that the proposed learning algorithm explores the power-performance tradeoff under non-stationary workload and outperforms other DPM policies. It also performs online adjustment of the tradeoff parameters in order to meet a user-specified constraint. (author)

  2. Adolescent-specific patterns of behavior and neural activity during social reinforcement learning

    OpenAIRE

    Jones, Rebecca M.; Somerville, Leah H.; Li, Jian; Ruberry, Erika J.; Powers, Alisa; Mehta, Natasha; Dyke, Jonathan; Casey, BJ

    2014-01-01

    Humans are sophisticated social beings. Social cues from others are exceptionally salient, particularly during adolescence. Understanding how adolescents interpret and learn from variable social signals can provide insight into the observed shift in social sensitivity during this period. The current study tested 120 participants between the ages of 8 and 25 years on a social reinforcement learning task where the probability of receiving positive social feedback was parametrically manipulated....

  3. Exploring Collaborative Learning Effect in Blended Learning Environments

    Science.gov (United States)

    Sun, Z.; Liu, R.; Luo, L.; Wu, M.; Shi, C.

    2017-01-01

    The use of new technology encouraged exploration of the effectiveness and difference of collaborative learning in blended learning environments. This study investigated the social interactive network of students, level of knowledge building and perception level on usefulness in online and mobile collaborative learning environments in higher…

  4. The learning environment and learning styles: a guide for mentors.

    Science.gov (United States)

    Vinales, James Jude

    The learning environment provides crucial exposure for the pre-registration nursing student. It is during this time that the student nurse develops his or her repertoire of skills, knowledge, attitudes and behaviour in order to meet competencies and gain registration with the Nursing and Midwifery Council. The role of the mentor is vital within the learning environment for aspiring nurses. The learning environment is a fundamental platform for student learning, with mentors key to identifying what is conducive to learning. This article will consider the learning environment and learning styles, and how these two essential elements guide the mentor in making sure they are conducive to learning.

  5. High and low temperatures have unequal reinforcing properties in Drosophila spatial learning.

    Science.gov (United States)

    Zars, Melissa; Zars, Troy

    2006-07-01

    Small insects regulate their body temperature solely through behavior. Thus, sensing environmental temperature and implementing an appropriate behavioral strategy can be critical for survival. The fly Drosophila melanogaster prefers 24 degrees C, avoiding higher and lower temperatures when tested on a temperature gradient. Furthermore, temperatures above 24 degrees C have negative reinforcing properties. In contrast, we found that flies have a preference in operant learning experiments for a low-temperature-associated position rather than the 24 degrees C alternative in the heat-box. Two additional differences between high- and low-temperature reinforcement, i.e., temperatures above and below 24 degrees C, were found. Temperatures equally above and below 24 degrees C did not reinforce equally and only high temperatures supported increased memory performance with reversal conditioning. Finally, low- and high-temperature reinforced memories are similarly sensitive to two genetic mutations. Together these results indicate the qualitative meaning of temperatures below 24 degrees C depends on the dynamics of the temperatures encountered and that the reinforcing effects of these temperatures depend on at least some common genetic components. Conceptualizing these results using the Wolf-Heisenberg model of operant conditioning, we propose the maximum difference in experienced temperatures determines the magnitude of the reinforcement input to a conditioning circuit.

  6. Spared internal but impaired external reward prediction error signals in major depressive disorder during reinforcement learning.

    Science.gov (United States)

    Bakic, Jasmina; Pourtois, Gilles; Jepma, Marieke; Duprat, Romain; De Raedt, Rudi; Baeken, Chris

    2017-01-01

    Major depressive disorder (MDD) creates debilitating effects on a wide range of cognitive functions, including reinforcement learning (RL). In this study, we sought to assess whether reward processing as such, or alternatively the complex interplay between motivation and reward, might account for the abnormal reward-based learning in MDD. A total of 35 treatment-resistant MDD patients and 44 age-matched healthy controls (HCs) performed a standard probabilistic learning task. RL was titrated using behavioral, computational modeling and event-related brain potential (ERP) data. MDD patients showed a learning rate comparable to HCs. However, they showed decreased lose-shift responses as well as blunted subjective evaluations of the reinforcers used during the task, relative to HCs. Moreover, MDD patients showed normal internal (at the level of error-related negativity, ERN) but abnormal external (at the level of feedback-related negativity, FRN) reward prediction error (RPE) signals during RL, selectively when additional efforts had to be made to establish learning. Collectively, these results lend support to the assumption that MDD does not impair reward processing per se during RL. Instead, it seems to alter the processing of the emotional value of (external) reinforcers during RL when additional intrinsic motivational processes have to be engaged. © 2016 Wiley Periodicals, Inc.

  7. Emotion in reinforcement learning agents and robots: A survey

    OpenAIRE

    Moerland, T.M.; Broekens, D.J.; Jonker, C.M.

    2018-01-01

    This article provides the first survey of computational models of emotion in reinforcement learning (RL) agents. The survey focuses on agent/robot emotions, and mostly ignores human user emotions. Emotions are recognized as functional in decision-making by influencing motivation and action selection. Therefore, computational emotion models are usually grounded in the agent's decision making architecture, of which RL is an important subclass. Studying emotions in RL-based agents is useful for ...

  8. Reinforcement learning signals in the human striatum distinguish learners from nonlearners during reward-based decision making.

    Science.gov (United States)

    Schönberg, Tom; Daw, Nathaniel D; Joel, Daphna; O'Doherty, John P

    2007-11-21

    The computational framework of reinforcement learning has been used to forward our understanding of the neural mechanisms underlying reward learning and decision-making behavior. It is known that humans vary widely in their performance in decision-making tasks. Here, we used a simple four-armed bandit task in which subjects are almost evenly split into two groups on the basis of their performance: those who do learn to favor choice of the optimal action and those who do not. Using models of reinforcement learning we sought to determine the neural basis of these intrinsic differences in performance by scanning both groups with functional magnetic resonance imaging. We scanned 29 subjects while they performed the reward-based decision-making task. Our results suggest that these two groups differ markedly in the degree to which reinforcement learning signals in the striatum are engaged during task performance. While the learners showed robust prediction error signals in both the ventral and dorsal striatum during learning, the nonlearner group showed a marked absence of such signals. Moreover, the magnitude of prediction error signals in a region of dorsal striatum correlated significantly with a measure of behavioral performance across all subjects. These findings support a crucial role of prediction error signals, likely originating from dopaminergic midbrain neurons, in enabling learning of action selection preferences on the basis of obtained rewards. Thus, spontaneously observed individual differences in decision making performance demonstrate the suggested dependence of this type of learning on the functional integrity of the dopaminergic striatal system in humans.

  9. Alkali-resistant glass fiber reinforced high strength concrete in simulated aggressive environment

    International Nuclear Information System (INIS)

    Kwan, W.H.; Cheah, C.B.; Ramli, M.; Chang, K.Y.

    2018-01-01

    The durability of alkali-resistant (AR) glass fiber reinforced concrete (GFRC) in three simulated aggressive environments, namely tropical climate, cyclic air and seawater exposure, and seawater immersion, was investigated. Durability examinations included chloride diffusion, gas permeability, X-ray diffraction (XRD) and scanning electron microscopy (SEM). The fiber content was in the range of 0.6% to 2.4%. Results reveal that the specimens containing the highest AR glass fiber content suffered severe strength loss in the seawater environment and relatively milder strength loss under cyclic conditions. The permeability property was found to become more inferior with the increase in the fiber content of the concrete. This suggests that AR glass fiber is not suitable for use as fiber reinforcement in concrete exposed to seawater. However, in both the tropical climate and cyclic wetting and drying, the incorporation of AR glass fiber prevented a drastic increase in permeability.

  10. Reinforcement Learning in Distributed Domains: Beyond Team Games

    Science.gov (United States)

    Wolpert, David H.; Sill, Joseph; Turner, Kagan

    2000-01-01

    Distributed search algorithms are crucial in dealing with large optimization problems, particularly when a centralized approach is not only impractical but infeasible. Many machine learning concepts have been applied to search algorithms in order to improve their effectiveness. In this article we present an algorithm that blends Reinforcement Learning (RL) and hill climbing directly, by using the RL signal to guide the exploration step of a hill climbing algorithm. We apply this algorithm to the domain of a constellation of communication satellites, where the goal is to minimize the loss of importance-weighted data. We introduce the concept of 'ghost' traffic, where correctly setting this traffic induces the satellites to act to optimize the world utility. Our results indicate that the bi-utility search introduced in this paper outperforms both traditional hill climbing algorithms and distributed RL approaches such as team games.

  11. Cerebellar and prefrontal cortex contributions to adaptation, strategies, and reinforcement learning.

    Science.gov (United States)

    Taylor, Jordan A; Ivry, Richard B

    2014-01-01

    Traditionally, motor learning has been studied as an implicit learning process, one in which movement errors are used to improve performance in a continuous, gradual manner. The cerebellum figures prominently in this literature given well-established ideas about the role of this system in error-based learning and the production of automatized skills. Recent developments have brought into focus the relevance of multiple learning mechanisms for sensorimotor learning. These include processes involving repetition, reinforcement learning, and strategy utilization. We examine these developments, considering their implications for understanding cerebellar function and how this structure interacts with other neural systems to support motor learning. Converging lines of evidence from behavioral, computational, and neuropsychological studies suggest a fundamental distinction between processes that use error information to improve action execution or action selection. While the cerebellum is clearly linked to the former, its role in the latter remains an open question. © 2014 Elsevier B.V. All rights reserved.

  12. ABOUT INFLUENCE OF DIFFERENT SCHEMES IMPACT RADIATION ENVIRONMENTS AND LOADS ON REINFORCED LAMELLAR STRUCTURAL MEMBERS

    Directory of Open Access Journals (Sweden)

    Rafail B. Garibov

    2017-12-01

    The article discusses a model of the deformation of a fiber-reinforced concrete rectangular plate under the influence of radiation environments. In the calculation of the plate, different schemes of impact of the applied external loads and radiation environments were considered.

  13. School and workplace as learning environments

    DEFF Research Database (Denmark)

    Jørgensen, Christian Helms

    In vocational education and training the school and the workplace are two different learning environments. But how should we conceive of a learning environment, and what characterizes the school and the workplace respectively as learning environments? And how can the two environments be linked? These questions are treated in this paper. School and workplace are assessed using the same analytical approach. Thereby it is pointed out how different forms of learning are encouraged in each of them and how different forms of knowledge are valued. On this basis suggestions are made about how to understand...

  14. Neural mechanisms of reinforcement learning in unmedicated patients with major depressive disorder.

    Science.gov (United States)

    Rothkirch, Marcus; Tonn, Jonas; Köhler, Stephan; Sterzer, Philipp

    2017-04-01

    According to current concepts, major depressive disorder is strongly related to dysfunctional neural processing of motivational information, entailing impairments in reinforcement learning. While computational modelling can reveal the precise nature of neural learning signals, it has not been used to study learning-related neural dysfunctions in unmedicated patients with major depressive disorder so far. We thus aimed at comparing the neural coding of reward and punishment prediction errors, representing indicators of neural learning-related processes, between unmedicated patients with major depressive disorder and healthy participants. To this end, a group of unmedicated patients with major depressive disorder (n = 28) and a group of age- and sex-matched healthy control participants (n = 30) completed an instrumental learning task involving monetary gains and losses during functional magnetic resonance imaging. The two groups did not differ in their learning performance. Patients and control participants showed the same level of prediction error-related activity in the ventral striatum and the anterior insula. In contrast, neural coding of reward prediction errors in the medial orbitofrontal cortex was reduced in patients. Moreover, neural reward prediction error signals in the medial orbitofrontal cortex and ventral striatum showed negative correlations with anhedonia severity. Using a standard instrumental learning paradigm we found no evidence for an overall impairment of reinforcement learning in medication-free patients with major depressive disorder. Importantly, however, the attenuated neural coding of reward in the medial orbitofrontal cortex and the relation between anhedonia and reduced reward prediction error-signalling in the medial orbitofrontal cortex and ventral striatum likely reflect an impairment in experiencing pleasure from rewarding events as a key mechanism of anhedonia in major depressive disorder. © The Author (2017). Published by Oxford

  15. Multiobjective Reinforcement Learning for Traffic Signal Control Using Vehicular Ad Hoc Network

    Directory of Open Access Journals (Sweden)

    Houli Duan

    2010-01-01

    We propose a new multiobjective control algorithm based on reinforcement learning for urban traffic signal control, named multi-RL. A multiagent structure is used to describe the traffic system. A vehicular ad hoc network is used for the data exchange among agents. A reinforcement learning algorithm is applied to predict the overall value of the optimization objective given vehicles' states. The policy which minimizes the cumulative value of the optimization objective is regarded as the optimal one. In order to make the method adaptive to various traffic conditions, we also introduce a multiobjective control scheme in which the optimization objective is selected adaptively according to real-time traffic states. The optimization objectives include the number of vehicle stops, the average waiting time, and the maximum queue length of the next intersection. In addition, we also accommodate priority control for buses and emergency vehicles in our model. The simulation results indicate that our algorithm performs more efficiently than traditional traffic light control methods.

  16. Corrosion performance of epoxy-coated reinforcement in aggressive environments

    Science.gov (United States)

    Vaca Cortes, Enrique

    The objective of this research was to investigate the integrity and corrosion performance of epoxy-coated reinforcement in aggressive environments. A series of experimental studies were conducted: (a) hot water immersion and knife adhesion testing for assessment of coating adhesion; (b) materials and procedures for repairing coating damage; (c) degree of mechanical damage caused during concrete placement when using metal head and rubber head vibrators; (d) accelerated corrosion of coated bars embedded in macrocell and beam specimens placed in a corrosive environment for more than four years. The effects of coating condition and amount of damage, repaired vs. unrepaired damage, bar fabrication, and concrete cracking were studied. Regardless of coating condition, the performance of epoxy-coated bars was better than that of uncoated bars. Unlike black bars, coated bars did not exhibit deep pitting or substantial loss of cross section at crack locations. Damage to epoxy coating was the most significant factor affecting corrosion performance. Bars with coating in good condition, without any visible damage, performed best. The greater the size and frequency of damage, the more severe and extensive the amount of corrosion. The performance of bars that were fabricated or bent after coating was worse than that of coated straight bars. Mixing coated and uncoated bars in the same concrete member led to undesirable performance. Patching damaged coating reduced but did not prevent corrosion, particularly at bar ends. The most important factor in coating repair was the type and properties of the patching material. Surface preparation prior to coating had little effect. The absence of cracks in the concrete delayed, but did not prevent the onset of corrosion of coated bars. During consolidation of concrete, rubber head vibrators caused less damage to epoxy-coated reinforcement than did comparable metal heads. Hot water and adhesion tests were useful and practical for evaluating

  17. Designing Creative Learning Environments

    Directory of Open Access Journals (Sweden)

    Thomas Cochrane

    2015-05-01

    Designing creative learning environments involves not only facilitating student creativity, but also modeling creative pedagogical practice. In this paper we explore the implementation of a framework for designing creative learning environments using mobile social media as a catalyst for redefining lecturer pedagogical practice as well as redesigning the curriculum around student-generated m-portfolios.

  18. Oxytocin attenuates trust as a subset of more general reinforcement learning, with altered reward circuit functional connectivity in males.

    Science.gov (United States)

    Ide, Jaime S; Nedic, Sanja; Wong, Kin F; Strey, Shmuel L; Lawson, Elizabeth A; Dickerson, Bradford C; Wald, Lawrence L; La Camera, Giancarlo; Mujica-Parodi, Lilianne R

    2018-07-01

    Oxytocin (OT) is an endogenous neuropeptide that, while originally thought to promote trust, has more recently been found to be context-dependent. Here we extend experimental paradigms previously restricted to de novo decision-to-trust, to a more realistic environment in which social relationships evolve in response to iterative feedback over twenty interactions. In a randomized, double blind, placebo-controlled within-subject/crossover experiment of human adult males, we investigated the effects of a single dose of intranasal OT (40 IU) on Bayesian expectation updating and reinforcement learning within a social context, with associated brain circuit dynamics. Subjects participated in a neuroeconomic task (Iterative Trust Game) designed to probe iterative social learning while their brains were scanned using ultra-high field (7T) fMRI. We modeled each subject's behavior using Bayesian updating of belief-states ("willingness to trust") as well as canonical measures of reinforcement learning (learning rate, inverse temperature). Behavioral trajectories were then used as regressors within fMRI activation and connectivity analyses to identify corresponding brain network functionality affected by OT. Behaviorally, OT reduced feedback learning, without bias with respect to positive versus negative reward. Neurobiologically, reduced learning under OT was associated with muted communication between three key nodes within the reward circuit: the orbitofrontal cortex, amygdala, and lateral (limbic) habenula. Our data suggest that OT, rather than inspiring feelings of generosity, instead attenuates the brain's encoding of prediction error and therefore its ability to modulate pre-existing beliefs. This effect may underlie OT's putative role in promoting what has typically been reported as 'unjustified trust' in the face of information that suggests likely betrayal, while also resolving apparent contradictions with regard to OT's context-dependent behavioral effects. Copyright

  19. Prediction of the Service Life of a Reinforced Concrete Column under Chloride Environment

    Directory of Open Access Journals (Sweden)

    Mohammad K. Alkam

    2015-01-01

    In the present investigation, the service life of a reinforced concrete column exposed to a chloride environment has been predicted. This study is based on numerical simulation of chloride ion diffusion in a concrete column during its anticipated life span. The simulation process includes replacement of the concrete cover whenever the chloride ion concentration reaches the critical threshold value at the reinforcement surface. Repair scheduling of the concrete column under consideration is discussed. Effects of the concrete cover thickness and the water-cement ratio on the service life of the concrete column at hand are presented. A new approach for arranging the locations of reinforcement steel bars is introduced. This approach is intended to prolong the service life of the concrete column under consideration against chloride-induced corrosion.
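
    Such service-life predictions are commonly built on Fick's second law of diffusion. A minimal sketch using the standard error-function solution follows; the surface concentration, diffusion coefficient and threshold below are illustrative assumptions, and the paper's numerical scheme (including cover replacement) is not reproduced here.

        import math

        # Chloride ingress via the error-function solution of Fick's
        # second law: C(x, t) = Cs * (1 - erf(x / (2 * sqrt(D * t)))).
        Cs = 0.5        # surface chloride concentration (assumed units)
        C_crit = 0.05   # critical threshold at the bar surface
        D = 1e-12       # apparent diffusion coefficient (m^2/s)
        cover = 0.05    # concrete cover thickness (m)

        def concentration(x, t):
            return Cs * (1.0 - math.erf(x / (2.0 * math.sqrt(D * t))))

        # Step forward year by year until the threshold is reached.
        year = 365.25 * 24 * 3600.0
        t = year
        while concentration(cover, t) < C_crit:
            t += year
        print(f"estimated corrosion initiation: {t / year:.0f} years")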

  20. Measuring reinforcement learning and motivation constructs in experimental animals: relevance to the negative symptoms of schizophrenia

    Science.gov (United States)

    Markou, Athina; Salamone, John D.; Bussey, Timothy; Mar, Adam; Brunner, Daniela; Gilmour, Gary; Balsam, Peter

    2013-01-01

    The present review article summarizes and expands upon the discussions that were initiated during a meeting of the Cognitive Neuroscience Treatment Research to Improve Cognition in Schizophrenia (CNTRICS; http://cntrics.ucdavis.edu). A major goal of the CNTRICS meeting was to identify experimental procedures and measures that can be used in laboratory animals to assess psychological constructs that are related to the psychopathology of schizophrenia. The issues discussed in this review reflect the deliberations of the Motivation Working Group of the CNTRICS meeting, which included most of the authors of this article as well as additional participants. After receiving task nominations from the general research community, this working group was asked to identify experimental procedures in laboratory animals that can assess aspects of reinforcement learning and motivation that may be relevant for research on the negative symptoms of schizophrenia, as well as other disorders characterized by deficits in reinforcement learning and motivation. The tasks described here that assess reinforcement learning are the Autoshaping Task, Probabilistic Reward Learning Tasks, and the Response Bias Probabilistic Reward Task. The tasks described here that assess motivation are Outcome Devaluation and Contingency Degradation Tasks and Effort-Based Tasks. In addition to describing such methods and procedures, the present article provides a working vocabulary for research and theory in this field, as well as an industry perspective about how such tasks may be used in drug discovery. It is hoped that this review can aid investigators who are conducting research in this complex area, promote translational studies by highlighting shared research goals and fostering a common vocabulary across basic and clinical fields, and facilitate the development of medications for the treatment of symptoms mediated by reinforcement learning and motivational deficits. PMID:23994273

  1. Measuring reinforcement learning and motivation constructs in experimental animals: relevance to the negative symptoms of schizophrenia.

    Science.gov (United States)

    Markou, Athina; Salamone, John D; Bussey, Timothy J; Mar, Adam C; Brunner, Daniela; Gilmour, Gary; Balsam, Peter

    2013-11-01

    The present review article summarizes and expands upon the discussions that were initiated during the Cognitive Neuroscience Treatment Research to Improve Cognition in Schizophrenia (CNTRICS; http://cntrics.ucdavis.edu) meeting. A major goal of the CNTRICS meeting was to identify experimental procedures and measures that can be used in laboratory animals to assess psychological constructs that are related to the psychopathology of schizophrenia. The issues discussed in this review reflect the deliberations of the Motivation Working Group of the CNTRICS meeting, which included most of the authors of this article as well as additional participants. After receiving task nominations from the general research community, this working group was asked to identify experimental procedures in laboratory animals that can assess aspects of reinforcement learning and motivation that may be relevant for research on the negative symptoms of schizophrenia, as well as other disorders characterized by deficits in reinforcement learning and motivation. The tasks described here that assess reinforcement learning are the Autoshaping Task, Probabilistic Reward Learning Tasks, and the Response Bias Probabilistic Reward Task. The tasks described here that assess motivation are Outcome Devaluation and Contingency Degradation Tasks and Effort-Based Tasks. In addition to describing such methods and procedures, the present article provides a working vocabulary for research and theory in this field, as well as an industry perspective about how such tasks may be used in drug discovery. It is hoped that this review can aid investigators who are conducting research in this complex area, promote translational studies by highlighting shared research goals and fostering a common vocabulary across basic and clinical fields, and facilitate the development of medications for the treatment of symptoms mediated by reinforcement learning and motivational deficits. Copyright © 2013 Elsevier

  2. Learning Environment And Pupils Academic Performance ...

    African Journals Online (AJOL)

    Learning Environment And Pupils Academic Performance: Implications For Counselling. ... facilities as well as learning materials to make teaching and learning easy. In addition, teachers should provide conducive classroom environment to ...

  3. Creating a flexible learning environment.

    Science.gov (United States)

    Taylor, B A; Jones, S; Winters, P

    1990-01-01

    Lack of classroom space is a common problem for many hospital-based nurse educators. This article describes how nursing educators in one institution redesigned fixed classroom space into a flexible learning center that accommodates their various programs. Using the nursing process, the educators assessed their needs, planned the learning environment, implemented changes in the interior design, and evaluated the outcome of the project. The result was a learning environment conducive to teaching and learning.

  4. Group Modeling in Social Learning Environments

    Science.gov (United States)

    Stankov, Slavomir; Glavinic, Vlado; Krpan, Divna

    2012-01-01

    Students' collaboration while learning could provide better learning environments. Collaboration assumes social interactions which occur in student groups. Social theories emphasize positive influence of such interactions on learning. In order to create an appropriate learning environment that enables social interactions, it is important to…

  5. Switching Reinforcement Learning for Continuous Action Space

    Science.gov (United States)

    Nagayoshi, Masato; Murao, Hajime; Tamaki, Hisashi

    Reinforcement Learning (RL) attracts much attention as a technique for realizing computational intelligence such as adaptive and autonomous decentralized systems. In general, however, it is not easy to put RL into practical use. One difficulty lies in designing a suitable action space for an agent, i.e., satisfying two requirements that trade off against each other: (i) keeping the characteristics (or structure) of the original search space as much as possible, in order to seek strategies that lie close to the optimal, and (ii) reducing the search space as much as possible, in order to expedite the learning process. In order to design a suitable action space adaptively, we propose a switching RL model that mimics the process of an infant's motor development, in which gross motor skills develop before fine motor skills. A method for switching controllers is then constructed by introducing and referring to the “entropy”. Through computational experiments on robot navigation problems with one- and two-dimensional continuous action spaces, the validity of the proposed method has been confirmed.
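
    The abstract does not spell out the switching rule; one plausible reading (a sketch under assumptions — the threshold, the softmax policy, and the coarse-to-fine action sets are mine, not the authors') is to start with a coarse action set and switch to a finer one once the policy entropy at a state drops, indicating the coarse controller has converged there:

```python
import numpy as np

def policy_entropy(q_values, tau=1.0):
    """Entropy of the softmax policy induced by a row of Q-values."""
    p = np.exp((q_values - q_values.max()) / tau)
    p /= p.sum()
    return -np.sum(p * np.log(p + 1e-12))

# Coarse and fine discretisations of a 1-D continuous action (e.g., turn rate).
coarse_actions = np.linspace(-1.0, 1.0, 3)   # gross motor stage
fine_actions = np.linspace(-1.0, 1.0, 15)    # fine motor stage

n_states = 10
q_coarse = np.zeros((n_states, len(coarse_actions)))
q_fine = np.zeros((n_states, len(fine_actions)))
use_fine = np.zeros(n_states, dtype=bool)    # per-state controller switch

ENTROPY_THRESHOLD = 0.5  # assumed value; switch once the policy is confident

def select_action(state):
    """Pick an action from the currently active controller for this state."""
    if not use_fine[state] and policy_entropy(q_coarse[state]) < ENTROPY_THRESHOLD:
        use_fine[state] = True  # coarse policy has converged here: refine
    if use_fine[state]:
        return fine_actions[int(np.argmax(q_fine[state]))]
    return coarse_actions[int(np.argmax(q_coarse[state]))]

print(select_action(0))  # all-zero Q row has maximal entropy, so starts coarse
```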

  6. Confirmation bias in human reinforcement learning: Evidence from counterfactual feedback processing

    Science.gov (United States)

    Lefebvre, Germain; Blakemore, Sarah-Jayne

    2017-01-01

    Previous studies suggest that factual learning, that is, learning from obtained outcomes, is biased, such that participants preferentially take into account positive, as compared to negative, prediction errors. However, whether or not the prediction error valence also affects counterfactual learning, that is, learning from forgone outcomes, is unknown. To address this question, we analysed the performance of two groups of participants on reinforcement learning tasks using a computational model that was adapted to test if prediction error valence influences learning. We carried out two experiments: in the factual learning experiment, participants learned from partial feedback (i.e., the outcome of the chosen option only); in the counterfactual learning experiment, participants learned from complete feedback information (i.e., the outcomes of both the chosen and unchosen option were displayed). In the factual learning experiment, we replicated previous findings of a valence-induced bias, whereby participants learned preferentially from positive, relative to negative, prediction errors. In contrast, for counterfactual learning, we found the opposite valence-induced bias: negative prediction errors were preferentially taken into account, relative to positive ones. When considering valence-induced bias in the context of both factual and counterfactual learning, it appears that people tend to preferentially take into account information that confirms their current choice. PMID:28800597
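
    The computational model "adapted to test if prediction error valence influences learning" is, in the usual formulation of this literature, a delta-rule update with separate learning rates for positive and negative prediction errors. A minimal sketch (variable names and the example values are assumptions, not the paper's exact code):

```python
def update_q(q, outcome, alpha_pos, alpha_neg):
    """Delta-rule update with valence-dependent learning rates.

    A confirmation bias appears as alpha_pos > alpha_neg for the chosen
    (factual) option and alpha_neg > alpha_pos for the forgone
    (counterfactual) option, as the study reports.
    """
    delta = outcome - q  # prediction error
    alpha = alpha_pos if delta > 0 else alpha_neg
    return q + alpha * delta

# Factual update for the chosen option ...
q_chosen = update_q(0.5, outcome=1.0, alpha_pos=0.4, alpha_neg=0.2)
# ... and counterfactual update for the unchosen option under complete
# feedback, with the asymmetry reversed.
q_unchosen = update_q(0.5, outcome=0.0, alpha_pos=0.2, alpha_neg=0.4)
print(q_chosen, q_unchosen)
```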

  7. Confirmation bias in human reinforcement learning: Evidence from counterfactual feedback processing.

    Science.gov (United States)

    Palminteri, Stefano; Lefebvre, Germain; Kilford, Emma J; Blakemore, Sarah-Jayne

    2017-08-01

    Previous studies suggest that factual learning, that is, learning from obtained outcomes, is biased, such that participants preferentially take into account positive, as compared to negative, prediction errors. However, whether or not the prediction error valence also affects counterfactual learning, that is, learning from forgone outcomes, is unknown. To address this question, we analysed the performance of two groups of participants on reinforcement learning tasks using a computational model that was adapted to test if prediction error valence influences learning. We carried out two experiments: in the factual learning experiment, participants learned from partial feedback (i.e., the outcome of the chosen option only); in the counterfactual learning experiment, participants learned from complete feedback information (i.e., the outcomes of both the chosen and unchosen option were displayed). In the factual learning experiment, we replicated previous findings of a valence-induced bias, whereby participants learned preferentially from positive, relative to negative, prediction errors. In contrast, for counterfactual learning, we found the opposite valence-induced bias: negative prediction errors were preferentially taken into account, relative to positive ones. When considering valence-induced bias in the context of both factual and counterfactual learning, it appears that people tend to preferentially take into account information that confirms their current choice.

  8. Within- and across-trial dynamics of human EEG reveal cooperative interplay between reinforcement learning and working memory.

    Science.gov (United States)

    Collins, Anne G E; Frank, Michael J

    2018-03-06

    Learning from rewards and punishments is essential to survival and facilitates flexible human behavior. It is widely appreciated that multiple cognitive and reinforcement learning systems contribute to decision-making, but the nature of their interactions is elusive. Here, we leverage methods for extracting trial-by-trial indices of reinforcement learning (RL) and working memory (WM) in human electro-encephalography to reveal single-trial computations beyond that afforded by behavior alone. Neural dynamics confirmed that increases in neural expectation were predictive of reduced neural surprise in the following feedback period, supporting central tenets of RL models. Within- and cross-trial dynamics revealed a cooperative interplay between systems for learning, in which WM contributes expectations to guide RL, despite competition between systems during choice. Together, these results provide a deeper understanding of how multiple neural systems interact for learning and decision-making and facilitate analysis of their disruption in clinical populations.

  9. Reinforcement learning using a continuous time actor-critic framework with spiking neurons.

    Directory of Open Access Journals (Sweden)

    Nicolas Frémaux

    2013-04-01

    Full Text Available Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only been partially elucidated. On one hand, experimental evidence shows that the neuromodulator dopamine carries information about rewards and affects synaptic plasticity. On the other hand, the theory of reinforcement learning provides a framework for reward-based learning. Recent models of reward-modulated spike-timing-dependent plasticity have made first steps towards bridging the gap between the two approaches, but faced two problems. First, reinforcement learning is typically formulated in a discrete framework, ill-adapted to the description of natural situations. Second, biologically plausible models of reward-modulated spike-timing-dependent plasticity require precise calculation of the reward prediction error, yet it remains to be shown how this can be computed by neurons. Here we propose a solution to these problems by extending the continuous temporal difference (TD) learning of Doya (2000) to the case of spiking neurons in an actor-critic network operating in continuous time, and with continuous state and action representations. In our model, the critic learns to predict expected future rewards in real time. Its activity, together with actual rewards, conditions the delivery of a neuromodulatory TD signal to itself and to the actor, which is responsible for action choice. In simulations, we show that such an architecture can solve a Morris water-maze-like navigation task, in a number of trials consistent with reported animal performance. We also use our model to solve the acrobot and the cartpole problems, two complex motor control tasks. Our model provides a plausible way of computing reward prediction error in the brain. Moreover, the analytically derived learning rule is consistent with experimental evidence for dopamine-modulated spike-timing-dependent plasticity.
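
    The paper's contribution is a continuous-time, spiking implementation; the underlying actor-critic TD idea is easier to see in a discrete-time, rate-based sketch (everything below is a simplified analogue with an invented toy environment, not the spiking model itself):

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions = 5, 2
V = np.zeros(n_states)                 # critic: predicted future reward
H = np.zeros((n_states, n_actions))    # actor: action preferences
gamma, alpha_v, alpha_h = 0.9, 0.1, 0.1

def step(s, a):
    """Toy chain environment: moving right from the penultimate state pays."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if (s == n_states - 2 and a == 1) else 0.0
    return s_next, r

for episode in range(200):
    s = 0
    for _ in range(20):
        p = np.exp(H[s]) / np.exp(H[s]).sum()   # softmax action choice
        a = rng.choice(n_actions, p=p)
        s_next, r = step(s, a)
        delta = r + gamma * V[s_next] - V[s]    # TD error: the shared
        V[s] += alpha_v * delta                 # neuromodulatory signal that
        H[s, a] += alpha_h * delta              # trains critic and actor alike
        s = s_next

print(V.round(2))  # values grow toward the rewarded transition
```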

  10. Performance Comparison of Two Reinforcement Learning Algorithms for Small Mobile Robots

    Czech Academy of Sciences Publication Activity Database

    Neruda, Roman; Slušný, Stanislav

    2009-01-01

    Vol. 2, No. 1 (2009), pp. 59-68 ISSN 2005-4297 R&D Projects: GA MŠk(CZ) 1M0567 Grant - others:GA UK(CZ) 7637/2007 Institutional research plan: CEZ:AV0Z10300504 Keywords: reinforcement learning * mobile robots * intelligent agents Subject RIV: IN - Informatics, Computer Science http://www.sersc.org/journals/IJCA/vol2_no1/7.pdf

  11. IMPLEMENTATION OF MULTIAGENT REINFORCEMENT LEARNING MECHANISM FOR OPTIMAL ISLANDING OPERATION OF DISTRIBUTION NETWORK

    DEFF Research Database (Denmark)

    Saleem, Arshad; Lind, Morten

    2008-01-01

    among electric power utilities to utilize modern information and communication technologies (ICT) in order to improve the automation of the distribution system. In this paper we present our work for the implementation of a dynamic multi-agent based distributed reinforcement learning mechanism...

  12. A Reinforcement Learning Model Equipped with Sensors for Generating Perception Patterns: Implementation of a Simulated Air Navigation System Using ADS-B (Automatic Dependent Surveillance-Broadcast) Technology.

    Science.gov (United States)

    Álvarez de Toledo, Santiago; Anguera, Aurea; Barreiro, José M; Lara, Juan A; Lizcano, David

    2017-01-19

    Over the last few decades, a number of reinforcement learning techniques have emerged, and different reinforcement learning-based applications have proliferated. However, such techniques tend to specialize in a particular field. This is an obstacle to their generalization and extrapolation to other areas. Besides, neither the reward-punishment (r-p) learning process nor the convergence of results is fast and efficient enough. To address these obstacles, this research proposes a general reinforcement learning model. This model is independent of input and output types and is based on general bioinspired principles that help to speed up the learning process. The model is composed of a perception module based on sensors whose specific perceptions are mapped as perception patterns. In this manner, similar perceptions (even if perceived at different positions in the environment) are accounted for by the same perception pattern. Additionally, the model includes a procedure that statistically associates perception-action pattern pairs depending on the positive or negative results output by executing the respective action in response to a particular perception during the learning process. To do this, the model is fitted with a mechanism that reacts positively or negatively to particular sensory stimuli in order to rate results. The model is supplemented by an action module that can be configured depending on the maneuverability of each specific agent. The model has been applied in the air navigation domain, a field with strong safety restrictions, which led us to implement a simulated system equipped with the proposed model. Accordingly, the perception sensors were based on Automatic Dependent Surveillance-Broadcast (ADS-B) technology, which is described in this paper. The results were quite satisfactory: the model outperformed traditional methods in the literature with respect to learning reliability and efficiency.
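
    A minimal sketch of the pattern-association idea described above (the quantisation scheme, scoring rule, and all names are assumptions for illustration): similar sensor readings map to one perception pattern, and each (pattern, action) pair accumulates reward-punishment statistics that drive action choice:

```python
from collections import defaultdict
import random

# (pattern, action) -> [sum of r-p outcomes, count]
stats = defaultdict(lambda: [0.0, 0])
ACTIONS = ["left", "straight", "right"]

def perceive(sensor_vector, bin_size=0.25):
    """Quantise raw sensor readings so that similar perceptions, even if
    made at different positions in the environment, share one pattern."""
    return tuple(round(v / bin_size) for v in sensor_vector)

def choose(pattern, epsilon=0.1):
    """Pick the action with the best r-p average, exploring occasionally."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    def score(action):
        total, n = stats[(pattern, action)]
        return total / n if n else 0.0
    return max(ACTIONS, key=score)

def reinforce(pattern, action, outcome):
    """outcome > 0 for reward, < 0 for punishment."""
    entry = stats[(pattern, action)]
    entry[0] += outcome
    entry[1] += 1

p = perceive([0.51, 0.49, 0.98])
a = choose(p)
reinforce(p, a, outcome=+1.0)
```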

  13. SCAFFOLDING IN CONNECTIVIST MOBILE LEARNING ENVIRONMENT

    Directory of Open Access Journals (Sweden)

    Ozlem OZAN

    2013-04-01

    Full Text Available Social networks and mobile technologies are transforming the learning ecology. In this changing learning environment, we find a variety of new learner needs. The aim of this study is to investigate how to provide scaffolding to learners in a connectivist mobile learning environment: to learn in a networked environment, to manage their networked learning process, to interact in a networked society, and to use the tools belonging to the network society. The researcher describes how Vygotsky's “scaffolding” concept, Berge's “learner support” strategies, and Siemens' “connectivism” approach can be used together to satisfy mobile learners' needs. A connectivist mobile learning environment was designed for the research, which was executed as a mixed-method study. Data collection tools were Facebook wall entries, personal messages, and chat records; Twitter, Diigo, and blog entries; emails, mobile learning management system statistics, a perceived-learning survey, and a demographic information survey. Results showed four major aspects of scaffolding in a connectivist mobile learning environment: its type, its provider, its timing, and its strategies. Participants mostly preferred social scaffolding, followed by managerial, instructional, and technical scaffolding. Social scaffolding was mostly provided by peers, and managerial scaffolding was mostly provided by the instructor. Use of mobile devices increased learner motivation and interest. Some participants stated that learning was more permanent when using mobile technologies. Social networks and mobile technologies made it easier to manage the learning process and had a positive impact on perceived learning.

  14. Accelerating Multiagent Reinforcement Learning by Equilibrium Transfer.

    Science.gov (United States)

    Hu, Yujing; Gao, Yang; An, Bo

    2015-07-01

    An important approach in multiagent reinforcement learning (MARL) is equilibrium-based MARL, which adopts equilibrium solution concepts in game theory and requires agents to play equilibrium strategies at each state. However, most existing equilibrium-based MARL algorithms cannot scale due to a large number of computationally expensive equilibrium computations (e.g., computing Nash equilibria is PPAD-hard) during learning. For the first time, this paper finds that during the learning process of equilibrium-based MARL, the one-shot games corresponding to each state's successive visits often have the same or similar equilibria (for some states more than 90% of games corresponding to successive visits have similar equilibria). Inspired by this observation, this paper proposes to use equilibrium transfer to accelerate equilibrium-based MARL. The key idea of equilibrium transfer is to reuse previously computed equilibria when each agent has a small incentive to deviate. By introducing transfer loss and transfer condition, a novel framework called equilibrium transfer-based MARL is proposed. We prove that although equilibrium transfer brings transfer loss, equilibrium-based MARL algorithms can still converge to an equilibrium policy under certain assumptions. Experimental results in widely used benchmarks (e.g., grid world game, soccer game, and wall game) show that the proposed framework: 1) not only significantly accelerates equilibrium-based MARL (up to 96.7% reduction in learning time), but also achieves higher average rewards than algorithms without equilibrium transfer and 2) scales significantly better than algorithms without equilibrium transfer when the state/action space grows and the number of agents increases.
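
    The transfer condition described above boils down to checking, before solving a new one-shot game, whether the previously computed equilibrium still leaves every agent only a small incentive to deviate. A sketch for a two-player matrix game (epsilon and all names are assumptions, not the paper's notation):

```python
import numpy as np

def deviation_incentive(payoff, pi_self, pi_other, row_player):
    """Max gain a player could get by unilaterally deviating from pi_self.

    payoff -- this player's payoff matrix; rows index the row player's
    actions, columns index the column player's actions.
    """
    per_action = payoff @ pi_other if row_player else payoff.T @ pi_other
    current = pi_self @ per_action
    return per_action.max() - current

def can_transfer(payoff_a, payoff_b, pi_a, pi_b, epsilon=0.05):
    """Reuse the old equilibrium (pi_a, pi_b) if both incentives are small,
    which bounds the transfer loss by epsilon."""
    inc_a = deviation_incentive(payoff_a, pi_a, pi_b, row_player=True)
    inc_b = deviation_incentive(payoff_b, pi_b, pi_a, row_player=False)
    return max(inc_a, inc_b) <= epsilon

# Matching-pennies-like game: the old uniform equilibrium still transfers
# after a tiny perturbation of the payoffs between successive visits.
A = np.array([[1.0, -1.0], [-1.0, 1.01]])
pi = np.array([0.5, 0.5])
print(can_transfer(A, -A, pi, pi))  # True: reuse, skip the costly solve
```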

  15. Shrinkage modeling of concrete reinforced by palm fibres in hot dry environments

    Science.gov (United States)

    Akchiche, Hamida; Kriker, Abdelouahed

    2017-02-01

    Cementitious materials such as conventional concrete and mortar offer very little resistance to tension and cracking; shrinkage in these hydraulic materials induces large volume changes and cracks in structures. In hot dry environments, such as the Saharan regions of Algeria, concrete structures are very fragile and exhibit high shrinkage. Reinforcing these materials with fibres can provide technical solutions for improving their mechanical performance. The aim of this study is, firstly, to reduce the shrinkage of conventional concrete by reinforcing it with date palm fibres. Algeria has extraordinary resources in natural fibres (from palm, abaca, and hemp) but little valorization of them in practical areas, especially in building materials. Secondly, the study aims to model the shrinkage behavior of concrete reinforced by date palm fibres. In the literature, several models exist for steel-fibre concrete, but few are offered for natural-fibre concretes. To this end, a steel-fibre concrete model by YOUNG and CHERN was used. According to the results, reinforcement with date palm fibres reduced shrinkage, and the modified YOUNG-CHERN model captured the shrinkage of date-palm-reinforced concrete well, with a good correlation between the experimental data and the model.

  16. Judgments of Learning in Collaborative Learning Environments

    NARCIS (Netherlands)

    Helsdingen, Anne

    2010-01-01

    Helsdingen, A. S. (2010, March). Judgments of Learning in Collaborative Learning Environments. Poster presented at the 1st International Air Transport and Operations Symposium (ATOS 2010), Delft, The Netherlands: Delft University of Technology.

  17. Reinforcement learning produces dominant strategies for the Iterated Prisoner's Dilemma.

    Science.gov (United States)

    Harper, Marc; Knight, Vincent; Jones, Martin; Koutsovoulos, Georgios; Glynatsi, Nikoleta E; Campbell, Owen

    2017-01-01

    We present tournament results and several powerful strategies for the Iterated Prisoner's Dilemma created using reinforcement learning techniques (evolutionary and particle swarm algorithms). These strategies are trained to perform well against a corpus of over 170 distinct opponents, including many well-known and classic strategies. All the trained strategies win standard tournaments against the total collection of other opponents. The trained strategies and one particular human-designed strategy are also the top performers in noisy tournaments.

  18. Dynamic Resource Allocation with Integrated Reinforcement Learning for a D2D-Enabled LTE-A Network with Access to Unlicensed Band

    Directory of Open Access Journals (Sweden)

    Alia Asheralieva

    2016-01-01

    Full Text Available We propose a dynamic resource allocation algorithm for device-to-device (D2D) communication underlying a Long Term Evolution Advanced (LTE-A) network, with reinforcement learning (RL) applied for unlicensed channel allocation. In the considered system, the inband and outband resources are assigned by the LTE evolved NodeB (eNB) to different device pairs to maximize the network utility subject to target signal-to-interference-and-noise ratio (SINR) constraints. Because of the absence of an established control link between the unlicensed and cellular radio interfaces, the eNB cannot acquire any information about the quality and availability of unlicensed channels. As a result, the considered problem becomes a stochastic optimization problem that can be dealt with by deploying learning theory (to estimate the random unlicensed channel environment). Consequently, we formulate the outband D2D access as a dynamic single-player game in which the player (eNB) estimates its possible strategy and expected utility for all of its actions based only on its own local observations, using a joint utility and strategy estimation based reinforcement learning (JUSTE-RL) with regret algorithm. The proposed approach for resource allocation demonstrates near-optimal performance after a small number of RL iterations and surpasses the other comparable methods in terms of energy efficiency and throughput maximization.

  19. Experiential Learning and Learning Environments: The Case of Active Listening Skills

    Science.gov (United States)

    Huerta-Wong, Juan Enrique; Schoech, Richard

    2010-01-01

    Social work education research frequently has suggested an interaction between teaching techniques and learning environments. However, this interaction has never been tested. This study compared virtual and face-to-face learning environments and included active listening concepts to test whether the effectiveness of learning environments depends…

  20. Reinforcement learning of targeted movement in a spiking neuronal model of motor cortex.

    Directory of Open Access Journals (Sweden)

    George L Chadderdon

    Full Text Available Sensorimotor control has traditionally been considered from a control theory perspective, without relation to neurobiology. In contrast, here we utilized a spiking-neuron model of motor cortex and trained it to perform a simple movement task, which consisted of rotating a single-joint "forearm" to a target. Learning was based on a reinforcement mechanism analogous to that of the dopamine system. This provided a global reward or punishment signal in response to decreasing or increasing distance from hand to target, respectively. Output was partially driven by Poisson motor babbling, creating stochastic movements that could then be shaped by learning. The virtual forearm consisted of a single segment rotated around an elbow joint, controlled by flexor and extensor muscles. The model consisted of 144 excitatory and 64 inhibitory event-based neurons, each with AMPA, NMDA, and GABA synapses. Proprioceptive cell input to this model encoded the 2 muscle lengths. Plasticity was only enabled in feedforward connections between input and output excitatory units, using spike-timing-dependent eligibility traces for synaptic credit or blame assignment. Learning resulted from a global 3-valued signal: reward (+1), no learning (0), or punishment (-1), corresponding to phasic increases, lack of change, or phasic decreases of dopaminergic cell firing, respectively. Successful learning only occurred when both reward and punishment were enabled. In this case, 5 target angles were learned successfully within 180 s of simulation time, with a median error of 8 degrees. Motor babbling allowed exploratory learning, but decreased the stability of the learned behavior, since the hand continued moving after reaching the target. Our model demonstrated that a global reinforcement signal, coupled with eligibility traces for synaptic plasticity, can train a spiking sensorimotor network to perform goal-directed motor behavior.

  1. Reinforcement learning of targeted movement in a spiking neuronal model of motor cortex.

    Science.gov (United States)

    Chadderdon, George L; Neymotin, Samuel A; Kerr, Cliff C; Lytton, William W

    2012-01-01

    Sensorimotor control has traditionally been considered from a control theory perspective, without relation to neurobiology. In contrast, here we utilized a spiking-neuron model of motor cortex and trained it to perform a simple movement task, which consisted of rotating a single-joint "forearm" to a target. Learning was based on a reinforcement mechanism analogous to that of the dopamine system. This provided a global reward or punishment signal in response to decreasing or increasing distance from hand to target, respectively. Output was partially driven by Poisson motor babbling, creating stochastic movements that could then be shaped by learning. The virtual forearm consisted of a single segment rotated around an elbow joint, controlled by flexor and extensor muscles. The model consisted of 144 excitatory and 64 inhibitory event-based neurons, each with AMPA, NMDA, and GABA synapses. Proprioceptive cell input to this model encoded the 2 muscle lengths. Plasticity was only enabled in feedforward connections between input and output excitatory units, using spike-timing-dependent eligibility traces for synaptic credit or blame assignment. Learning resulted from a global 3-valued signal: reward (+1), no learning (0), or punishment (-1), corresponding to phasic increases, lack of change, or phasic decreases of dopaminergic cell firing, respectively. Successful learning only occurred when both reward and punishment were enabled. In this case, 5 target angles were learned successfully within 180 s of simulation time, with a median error of 8 degrees. Motor babbling allowed exploratory learning, but decreased the stability of the learned behavior, since the hand continued moving after reaching the target. Our model demonstrated that a global reinforcement signal, coupled with eligibility traces for synaptic plasticity, can train a spiking sensorimotor network to perform goal-directed motor behavior.
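
    The learning-rule family described here — eligibility traces gated by a global three-valued reinforcement signal — can be sketched compactly. The following is a rate-based toy with assumed constants, not the published spiking model:

```python
import numpy as np

rng = np.random.default_rng(2)
n_in, n_out = 8, 4
W = rng.uniform(0.0, 0.1, size=(n_out, n_in))  # plastic feedforward weights
trace = np.zeros_like(W)                        # synaptic eligibility traces
TRACE_DECAY, LR = 0.9, 0.05

def step(pre_activity, reinforcement):
    """One update: traces tag recently co-active synapses; a global signal
    in {+1, 0, -1} (reward / no learning / punishment) converts the tagged
    credit or blame into an actual weight change."""
    global W, trace
    post_activity = W @ pre_activity
    trace = TRACE_DECAY * trace + np.outer(post_activity, pre_activity)
    W += LR * reinforcement * trace
    return post_activity

pre = rng.random(n_in)
step(pre, reinforcement=+1)   # phasic dopamine increase: strengthen tagged
step(pre, reinforcement=-1)   # phasic decrease: weaken tagged synapses
```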

  2. Self-organized Learning Environments

    DEFF Research Database (Denmark)

    Dalsgaard, Christian; Mathiasen, Helle

    2007-01-01

    The purpose of the paper is to discuss the potentials of using a conference system in support of a project based university course. We use the concept of a self-organized learning environment to describe the shape of the course. In the paper we argue that educational technology, such as conference systems, has a potential to support students’ development of self-organized learning environments and facilitate self-governed activities in higher education. The paper is based on an empirical study of two project groups’ use of a conference system. The study showed that the students used the conference system actively. The two groups used the system in their own way to support their specific activities and ways of working. The paper concludes that self-organized learning environments can strengthen the development of students’ academic as well as social qualifications. Further, the paper identifies…

  3. Blended Learning in Personalized Assistive Learning Environments

    Science.gov (United States)

    Marinagi, Catherine; Skourlas, Christos

    2013-01-01

    In this paper, the special needs/requirements of disabled students and cost-benefits for applying blended learning in Personalized Educational Learning Environments (PELE) in Higher Education are studied. The authors describe how blended learning can form an attractive and helpful framework for assisting Deaf and Hard-of-Hearing (D-HH) students to…

  4. Learning environment, learning styles and conceptual understanding

    Science.gov (United States)

    Ferrer, Lourdes M.

    1990-01-01

    In recent years there have been many studies on learners' developing conceptions of natural phenomena. However, so far there have been few attempts to investigate how the characteristics of the learners and their environment influence such conceptions. This study began with an attempt to use an instrument developed by McCarthy (1981) to describe learners in Malaysian primary schools. This proved inappropriate, as Asian primary classrooms do not provide the same kind of environment as US classrooms. It was therefore decided to develop a learning style checklist suited to the local context, which could be used to describe differences between learners that teachers could appreciate and use. The checklist included four dimensions: perceptual, process, self-confidence and motivation. The validated instrument was used to determine the learning style preferences of primary four pupils in Penang, Malaysia. Later, an analysis was made of the influence of learning environment and learning styles on conceptual understanding of the topics of food, respiration and excretion. This study was replicated in the Philippines to investigate the relationship between learning styles and achievement in science, covering the same topics of food, respiration and excretion. A number of significant relationships were observed in these two studies.

  5. Corrosion of reinforcement induced by environment containing ...

    Indian Academy of Sciences (India)

    Unknown

    carbonation and chlorides causing corrosion of steel reinforcement. ... interesting and important when the evaluation of the service life of the ... preferably in the areas of industrial and transport activities. ... For controlling the embedded corrosion sensors, elec- .... danger of corrosion of reinforcement seems to be more.

  6. Learning to reach by reinforcement learning using a receptive field based function approximation approach with continuous actions.

    Science.gov (United States)

    Tamosiunaite, Minija; Asfour, Tamim; Wörgötter, Florentin

    2009-03-01

    Reinforcement learning methods can be used in robotics applications, especially for specific target-oriented problems, for example the reward-based recalibration of goal-directed actions. To this end, still relatively large and continuous state-action spaces need to be efficiently handled. The goal of this paper is, thus, to develop a novel, rather simple method which uses reinforcement learning with function approximation in conjunction with different reward strategies for solving such problems. For the testing of our method, we use a four degree-of-freedom reaching problem in 3D space simulated by a two-joint robot arm system with two DOF each. Function approximation is based on 4D, overlapping kernels (receptive fields), and the state-action space contains about 10,000 of these. Different types of reward structures are compared, for example, reward-on-touching-only against reward-on-approach. Furthermore, forbidden joint configurations are punished. A continuous action space is used. In spite of a rather large number of states and the continuous action space, these reward/punishment strategies allow the system to find a good solution usually within about 20 trials. The efficiency of our method demonstrated in this test scenario suggests that it might be possible to use it on a real robot for problems where mixed rewards can be defined in situations where other types of learning might be difficult.
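
    The function approximation scheme described above — overlapping kernels (receptive fields) whose activations weight a value estimate — can be sketched as normalized Gaussian kernels over the state-action space. Dimensions, widths, and the update rule details below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
N_KERNELS, DIM = 200, 4            # receptive fields over a 4-D state-action space
centers = rng.uniform(0.0, 1.0, size=(N_KERNELS, DIM))
WIDTH = 0.15                       # kernel radius; neighbouring fields overlap
weights = np.zeros(N_KERNELS)      # learned value contribution of each field

def activations(x):
    """Gaussian receptive-field activations for a state-action point x."""
    d2 = ((centers - x) ** 2).sum(axis=1)
    return np.exp(-d2 / (2.0 * WIDTH ** 2))

def value(x):
    """Normalized weighted sum over the overlapping receptive fields."""
    a = activations(x)
    return (weights * a).sum() / (a.sum() + 1e-12)

def update(x, target, lr=0.1):
    """Move the estimate at x toward a reward-derived target value."""
    global weights
    a = activations(x)
    weights += lr * (target - value(x)) * a / (a.sum() + 1e-12)

x = rng.random(DIM)
update(x, target=1.0)   # e.g., reward-on-touching
print(round(value(x), 3))
```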

  7. Learning Networks Distributed Environment

    NARCIS (Netherlands)

    Martens, Harrie; Vogten, Hubert; Koper, Rob; Tattersall, Colin; Van Rosmalen, Peter; Sloep, Peter; Van Bruggen, Jan; Spoelstra, Howard

    2005-01-01

    Learning Networks Distributed Environment is a prototype of an architecture that allows the sharing and modification of learning materials through a number of transport protocols. The prototype implements a P2P protocol using JXTA.

  8. New Educational Environments Aimed at Developing Intercultural Understanding While Reinforcing the Use of English in Experience-Based Learning

    Directory of Open Access Journals (Sweden)

    Leonard R. Bruguier

    2012-07-01

    Full Text Available New learning environments with communication and information tools are increasingly accessible with technology playing a crucial role in expanding and reconceptualizing student learning experiences. This paper reviews the outcome of an innovative course offered by four universities in three countries: Canada, the United States, and Mexico. Course objectives focused on broadening the understanding of indigenous and non-indigenous peoples primarily in relation to identity as it encouraged students to reflect on their own identity while improving their English skills in an interactive and experiential manner and thus enhancing their intercultural competence.

  9. Learning Environment and Student Effort

    Science.gov (United States)

    Hopland, Arnt O.; Nyhus, Ole Henning

    2016-01-01

    Purpose: The purpose of this paper is to explore the relationship between satisfaction with learning environment and student effort, both in class and with homework assignments. Design/methodology/approach: The authors use data from a nationwide and compulsory survey to analyze the relationship between learning environment and student effort. The…

  10. The Learning Impact of a 4-Dimensional Digital Construction Learning Environment

    OpenAIRE

    Chris Landorf; Stephen Ward

    2017-01-01

    This paper addresses a virtual environment approach to work integrated learning for students in construction-related disciplines. The virtual approach provides a safe and pedagogically rigorous environment where students can apply theoretical knowledge in a simulated real-world context. The paper describes the development of a 4-dimensional digital construction environment and associated learning activities funded by the Australian Office for Learning and Teaching. The environment was trialle...

  11. Reinforcement learning produces dominant strategies for the Iterated Prisoner's Dilemma.

    Directory of Open Access Journals (Sweden)

    Marc Harper

    Full Text Available We present tournament results and several powerful strategies for the Iterated Prisoner's Dilemma created using reinforcement learning techniques (evolutionary and particle swarm algorithms). These strategies are trained to perform well against a corpus of over 170 distinct opponents, including many well-known and classic strategies. All the trained strategies win standard tournaments against the total collection of other opponents. The trained strategies and one particular human-designed strategy are also the top performers in noisy tournaments.

  12. Students’ digital learning environments

    DEFF Research Database (Denmark)

    Caviglia, Francesco; Dalsgaard, Christian; Davidsen, Jacob

    2018-01-01

    The objective of the paper is to examine the nature of students’ digital learning environments to understand the interplay of institutional systems and tools that are managed by the students themselves. The paper is based on a study of 128 students’ digital learning environments. The objectives of the study are 1) to provide an overview of tools for students’ study activities, 2) to identify the most used and most important tools for students and 3) to discover which activities the tools are used for. The empirical study reveals that the students have a varied use of digital media. Some of the most used tools in the students’ digital learning environments are Facebook, Google Drive, tools for taking notes, and institutional systems. Additionally, the study shows that the tools meet some very basic demands of the students in relation to collaboration, communication, and feedback. Finally…

  13. Creating a supportive learning environment for students with learning difficulties

    OpenAIRE

    Grah, Jana

    2013-01-01

    The co-construction of a supportive learning environment for learners with learning difficulties is one of the elements of the 21st-century inclusive school. Since the physical presence of learners with learning difficulties in the classroom does not by itself lead to effective co-operation and the implementation of a 21st-century inclusive school, I have dedicated my doctoral thesis to the establishment of a supportive learning environment for learners with learning difficulties in primary school…

  14. Personal Learning Environments for Language Learning

    Directory of Open Access Journals (Sweden)

    Panagiotis Panagiotidis

    2013-02-01

    Full Text Available The advent of web 2.0 and the developments it has introduced both in everyday practice and in education have generated discussion and reflection concerning the technologies which higher education should rely on in order to provide the appropriate e-learning services to future students. In this context, the Virtual Learning Environments (VLEs), which are widely used in universities around the world to provide online courses in every specific knowledge area and of course in foreign languages, have started to appear rather outdated. Extensive research is in progress concerning the ways in which educational practice will follow the philosophy of web 2.0 by adopting the more learner-centred and collaborative approach of e-learning 2.0 applications, without abandoning the existing investment of the academic institutions in VLEs, which belong to the e-learning 1.0 generation and thus serve a teacher- or course-centred approach. Towards this direction, a notably promising solution seems to be the exploitation of web 2.0 tools in order to form Personal Learning Environments (PLEs). These are systems specifically designed or created by the combined use of various external applications or tools that can be used independently or act as a supplement to existing VLE platforms, creating a personalized learning environment. In a PLE, students have the opportunity to form their own personal way of working, using the tools they feel are most appropriate to achieve their purpose. Regarding the subject of foreign language in particular, the creation of such personalized and adaptable learning environments that extend the traditional approach of a course seems to promise a more holistic response to students’ needs, who, functioning in the PLE, could combine learning with their daily practice, communicating and collaborating with others, thus increasing the possibilities of access to multiple sources, informal communication and practice and eventually…

  15. Personal Learning Environments for Language Learning

    Directory of Open Access Journals (Sweden)

    Panagiotis Panagiotidis

    2012-12-01

    Full Text Available The advent of web 2.0 and the developments it has introduced both in everyday practice and in education have generated discussion and reflection concerning the technologies which higher education should rely on in order to provide the appropriate e-learning services to future students. In this context, the Virtual Learning Environments (VLEs), which are widely used in universities around the world to provide online courses in every specific knowledge area and of course in foreign languages, have started to appear rather outdated. Extensive research is in progress concerning the ways in which educational practice will follow the philosophy of web 2.0 by adopting the more learner-centred and collaborative approach of e-learning 2.0 applications, without abandoning the existing investment of the academic institutions in VLEs, which belong to the e-learning 1.0 generation and thus serve a teacher- or course-centred approach. Towards this direction, a notably promising solution seems to be the exploitation of web 2.0 tools in order to form Personal Learning Environments (PLEs). These are systems specifically designed or created by the combined use of various external applications or tools that can be used independently or act as a supplement to existing VLE platforms, creating a personalized learning environment. In a PLE, students have the opportunity to form their own personal way of working, using the tools they feel are most appropriate to achieve their purpose. Regarding the subject of foreign language in particular, the creation of such personalized and adaptable learning environments that extend the traditional approach of a course seems to promise a more holistic response to students’ needs, who, functioning in the PLE, could combine learning with their daily practice, communicating and collaborating with others, thus increasing the possibilities of access to multiple sources, informal communication and practice and eventually acquiring…

  16. Profiling medical school learning environments in Malaysia: a validation study of the Johns Hopkins Learning Environment Scale

    Directory of Open Access Journals (Sweden)

    Sean Tackett

    2015-07-01

    Full Text Available Purpose: While a strong learning environment is critical to medical student education, the assessment of medical school learning environments has confounded researchers. Our goal was to assess the validity and utility of the Johns Hopkins Learning Environment Scale (JHLES) for preclinical students at three Malaysian medical schools with distinct educational and institutional models. Two schools were new international partnerships, and the third was a school-leaver program established without international partnership. Methods: First- and second-year students responded anonymously to surveys at the end of the academic year. The surveys included the JHLES, a 28-item survey using five-point Likert scale response options; the Dundee Ready Educational Environment Measure (DREEM), the most widely used method to assess learning environments internationally; a personal growth scale; and single-item global learning environment assessment variables. Results: The overall response rate was 369/429 (86%). After adjusting for the medical school year, gender, and ethnicity of the respondents, the JHLES detected differences across institutions in four out of seven domains (57%), with each school having a unique domain profile. The DREEM detected differences in one out of five categories (20%). The JHLES was more strongly correlated than the DREEM with two thirds of the single-item variables and the personal growth scale. The JHLES showed high internal reliability for the total score (α=0.92) and the seven domains (α=0.56-0.85). Conclusion: The JHLES detected variation between learning environment domains across three educational settings, thereby creating unique learning environment profiles. Interpretation of these profiles may allow schools to understand how they are currently supporting trainees and identify areas needing attention.

  17. Extending the Peak Bandwidth of Parameters for Softmax Selection in Reinforcement Learning.

    Science.gov (United States)

    Iwata, Kazunori

    2016-05-11

    Softmax selection is one of the most popular methods for action selection in reinforcement learning. Although various recently proposed methods may be more effective with full parameter tuning, implementing a complicated method that requires the tuning of many parameters can be difficult. Thus, softmax selection is still worth revisiting, considering the cost savings of its implementation and tuning. In fact, this method works adequately in practice with only one parameter appropriately set for the environment. The aim of this paper is to improve the variable setting of this method to extend the bandwidth of good parameters, thereby reducing the cost of implementation and parameter tuning. To achieve this, we take advantage of the asymptotic equipartition property in a Markov decision process to extend the peak bandwidth of softmax selection. Using a variety of episodic tasks, we show that our setting is effective in extending the bandwidth and that it yields a better policy in terms of stability. The bandwidth is quantitatively assessed in a series of statistical tests.
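
    Softmax (Boltzmann) selection, the method being tuned here, has a one-parameter form. A minimal sketch (the temperature value is an arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(4)

def softmax_select(q_values, tau):
    """Sample an action with probability proportional to exp(Q/tau).

    tau is the temperature: high tau approaches uniform exploration,
    low tau approaches greedy selection. Subtracting the max keeps the
    exponentials numerically stable.
    """
    prefs = np.exp((q_values - np.max(q_values)) / tau)
    probs = prefs / prefs.sum()
    return rng.choice(len(q_values), p=probs), probs

action, probs = softmax_select(np.array([0.1, 0.5, 0.2]), tau=0.25)
print(action, probs.round(3))
```

    The paper's concern is how wide the band of tau values that yields good behavior is; its proposed setting uses the asymptotic equipartition property of the Markov decision process so that a single parameter choice works across more environments.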

  18. Sequential decisions: a computational comparison of observational and reinforcement accounts.

    Directory of Open Access Journals (Sweden)

    Nazanin Mohammadi Sepahvand

    Full Text Available Right brain damaged patients show impairments in sequential decision making tasks for which healthy people do not show any difficulty. We hypothesized that this difficulty could be due to the failure of right brain damaged patients to develop well-matched models of the world. Our motivation is the idea that to navigate uncertainty, humans use models of the world to direct the decisions they make when interacting with their environment. The better the model is, the better their decisions are. To explore the model building and updating process in humans and the basis for impairment after brain injury, we used a computational model of non-stationary sequence learning. RELPH (Reinforcement and Entropy Learned Pruned Hypothesis space) was able to qualitatively and quantitatively reproduce the results of left and right brain damaged patient groups and healthy controls playing a sequential version of Rock, Paper, Scissors. Our results suggest that, in general, humans employ a sub-optimal reinforcement based learning method rather than an objectively better statistical learning approach, and that differences between right brain damaged and healthy control groups can be explained by different exploration policies, rather than qualitatively different learning mechanisms.

  19. A Day-to-Day Route Choice Model Based on Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Fangfang Wei

    2014-01-01

    Full Text Available Day-to-day traffic dynamics are generated by individual travelers’ route choice and route adjustment behaviors, which are appropriately researched using agent-based models and learning theory. In this paper, we propose a day-to-day route choice model based on reinforcement learning and multiagent simulation. Travelers’ memory, learning rate, and experience cognition are taken into account. The model is then verified and analyzed. Results show that the network flow can converge to user equilibrium (UE) if travelers can remember all the travel times they have experienced, but this is not necessarily the case under limited memory; the learning rate can strengthen flow fluctuation, whereas memory dampens it; moreover, a high learning rate results in cyclical oscillation during the flow evolution process. Finally, the scenarios of link capacity degradation and random link capacity are used to illustrate the model’s applications. Analyses and applications of our model demonstrate that it is reasonable and useful for studying day-to-day traffic dynamics.
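
    The convergence behavior described above is easy to reproduce in a toy two-route network. A sketch (the network, parameters, and the exponential-smoothing memory are illustrative assumptions, not the paper's exact specification):

```python
import numpy as np

rng = np.random.default_rng(5)
N_TRAVELERS, DAYS, LR, THETA = 100, 60, 0.3, 0.5
free_flow = np.array([10.0, 12.0])   # two parallel routes between one OD pair

def travel_time(flows):
    """Simple congestion effect: a route slows as more travelers use it."""
    return free_flow + 0.05 * flows

est = np.tile(free_flow, (N_TRAVELERS, 1))  # each traveler's learned costs
idx = np.arange(N_TRAVELERS)

for day in range(DAYS):
    u = np.exp(-THETA * est)                 # logit route choice based on
    p0 = u[:, 0] / u.sum(axis=1)             # remembered travel times
    route = np.where(rng.random(N_TRAVELERS) < p0, 0, 1)
    flows = np.bincount(route, minlength=2)
    tt = travel_time(flows)
    # Reinforcement-style update of the experienced route only; LR plays the
    # role of the learning rate, the smoothing plays the role of memory.
    est[idx, route] += LR * (tt[route] - est[idx, route])

print(flows, tt.round(2))  # flows settle where route times roughly equalize
```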

  20. Durability of precast prestressed concrete piles in marine environment : reinforcement corrosion and mitigation - Part 1.

    Science.gov (United States)

    2011-06-01

    Research conducted in Part 1 has verified that precast prestressed concrete piles in Georgia's marine environment are deteriorating. The concrete is subjected to sulfate and biological attack, and the prestressed and nonprestressed reinforcement...

  1. Deep imitation learning for 3D navigation tasks.

    Science.gov (United States)

    Hussein, Ahmed; Elyan, Eyad; Gaber, Mohamed Medhat; Jayne, Chrisina

    2018-01-01

    Deep learning techniques have shown success in learning from raw high-dimensional data in various applications. While deep reinforcement learning is recently gaining popularity as a method to train intelligent agents, utilizing deep learning in imitation learning has been scarcely explored. Imitation learning can be an efficient method to teach intelligent agents by providing a set of demonstrations to learn from. However, generalizing to situations that are not represented in the demonstrations can be challenging, especially in 3D environments. In this paper, we propose a deep imitation learning method to learn navigation tasks from demonstrations in a 3D environment. The supervised policy is refined using active learning in order to generalize to unseen situations. This approach is compared to two popular deep reinforcement learning techniques: deep Q-networks (DQN) and asynchronous advantage actor-critic (A3C). The proposed method as well as the reinforcement learning methods employ deep convolutional neural networks and learn directly from raw visual input. Methods for combining learning from demonstrations and experience are also investigated. This combination aims to join the generalization ability of learning by experience with the efficiency of learning by imitation. The proposed methods are evaluated on 4 navigation tasks in a 3D simulated environment. Navigation tasks are a typical problem that is relevant to many real applications. They pose the challenge of requiring demonstrations of long trajectories to reach the target and only providing delayed rewards (usually terminal) to the agent. The experiments show that the proposed method can successfully learn navigation tasks from raw visual input, while learning-from-experience methods fail to learn an effective policy. Moreover, it is shown that active learning can significantly improve the performance of the initially learned policy using a small number of active samples.
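
    The combination sketched above — a supervised imitation policy refined by querying an expert on states the learner actually visits — follows the DAgger pattern (a named stand-in; the paper describes its own active-learning refinement). A toy, dependency-light sketch with 1-D states, a linear policy, and a synthetic expert, all assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(6)

def expert_action(state):
    """Stand-in demonstrator: steer toward the target at 0."""
    return 0 if state > 0 else 1   # 0 = move left, 1 = move right

def rollout(policy_w, steps=20):
    """Run the current policy and record the states it actually visits."""
    s, visited = rng.uniform(-5, 5), []
    for _ in range(steps):
        visited.append(s)
        a = int(np.dot(policy_w, [s, 1.0]) > 0)
        s += -0.5 if a == 0 else 0.5
    return visited

def fit(states, actions):
    """Least-squares surrogate for the supervised policy-learning step."""
    X = np.column_stack([states, np.ones(len(states))])
    w, *_ = np.linalg.lstsq(X, 2 * np.asarray(actions) - 1, rcond=None)
    return w

# Initial demonstrations, then aggregation: label the states the learner
# itself reaches and retrain on the growing dataset.
states = list(rng.uniform(-5, 5, 50))
actions = [expert_action(s) for s in states]
w = fit(states, actions)
for _ in range(5):
    new_states = rollout(w)
    states += new_states
    actions += [expert_action(s) for s in new_states]
    w = fit(states, actions)
print(w.round(2))  # learned policy: move right when the state is negative
```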

  2. Interactive learning environments in augmented reality technology

    Directory of Open Access Journals (Sweden)

    Rafał Wojciechowski

    2010-01-01

    Full Text Available In this paper, the problem of creating learning environments based on augmented reality (AR) is considered. The concept of AR is presented as a tool for safe and cheap experimental learning. In AR learning environments, students may acquire knowledge by personally carrying out experiments on virtual objects, manipulating real objects located in real environments. A new approach to the creation of interactive educational scenarios, called Augmented Reality Interactive Scenario Modeling (ARISM), is introduced. In this approach, the process of building learning environments is divided into three stages, each of them performed by users with different technical and domain knowledge. The ARISM approach enables teachers who are not computer science experts to create AR learning environments adapted to the needs of their students.

  3. Constructivist learning theories and complex learning environments

    NARCIS (Netherlands)

    R-J. Simons; Dr. S. Bolhuis

    2004-01-01

    Learning theories broadly characterised as constructivist agree on the importance of the environment to learning, but differ on what exactly constitutes this importance. Accordingly, they also differ on the educational consequences to be drawn from the theoretical perspective. Cognitive…

  4. Effective Learning Environments in Relation to Different Learning Theories

    NARCIS (Netherlands)

    Guney, A.; Al, S.

    2012-01-01

    There are diverse learning theories which explain learning processes which are discussed within this paper, through cognitive structure of learning process. Learning environments are usually described in terms of pedagogical philosophy, curriculum design and social climate. There have been only just

  5. Effects of Multisensory Environments on Stereotyped Behaviours Assessed as Maintained by Automatic Reinforcement

    Science.gov (United States)

    Hill, Lindsay; Trusler, Karen; Furniss, Frederick; Lancioni, Giulio

    2012-01-01

    Background: The aim of the present study was to evaluate the effects of the sensory equipment provided in a multi-sensory environment (MSE) and the level of social contact provided on levels of stereotyped behaviours assessed as being maintained by automatic reinforcement. Method: Stereotyped and engaged behaviours of two young people with severe…

  6. A Q-Learning Approach to Flocking With UAVs in a Stochastic Environment.

    Science.gov (United States)

    Hung, Shao-Ming; Givigi, Sidney N

    2017-01-01

    In the past two decades, unmanned aerial vehicles (UAVs) have demonstrated their efficacy in supporting both military and civilian applications, where tasks can be dull, dirty, dangerous, or simply too costly with conventional methods. Many of the applications contain tasks that can be executed in parallel, hence the natural progression is to deploy multiple UAVs working together as a force multiplier. However, to do so requires autonomous coordination among the UAVs, similar to swarming behaviors seen in animals and insects. This paper looks at flocking with small fixed-wing UAVs in the context of a model-free reinforcement learning problem. In particular, Peng's Q(λ) with a variable learning rate is employed by the followers to learn a control policy that facilitates flocking in a leader-follower topology. The problem is structured as a Markov decision process, where the agents are modeled as small fixed-wing UAVs that experience stochasticity due to disturbances such as winds and control noises, as well as weight and balance issues. Learned policies are compared to ones solved using stochastic optimal control (i.e., dynamic programming) by evaluating the average cost incurred during flight according to a cost function. Simulation results demonstrate the feasibility of the proposed learning approach at enabling agents to learn how to flock in a leader-follower topology, while operating in a nonstationary stochastic environment.
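
    Peng's Q(λ), as used above, blends one-step Q-learning with eligibility traces. A generic tabular Q(λ) sketch (the toy environment and constants are placeholders; trace cutting on exploratory actions is omitted, in the spirit of Peng's variant rather than Watkins's):

```python
import numpy as np

rng = np.random.default_rng(7)
N_STATES, N_ACTIONS = 16, 4
GAMMA, LAMBDA, ALPHA, EPS = 0.95, 0.8, 0.1, 0.1

Q = np.zeros((N_STATES, N_ACTIONS))
E = np.zeros_like(Q)          # eligibility traces over state-action pairs

def env_step(s, a):
    """Placeholder environment; the real task would supply the follower's
    relative-position state and a cost-based reward for tracking the leader."""
    return (s + a + 1) % N_STATES, float((s + a) % N_STATES == 0)

s = 0
for t in range(1000):
    a = int(rng.integers(N_ACTIONS)) if rng.random() < EPS else int(Q[s].argmax())
    s_next, r = env_step(s, a)
    # TD error bootstraps on the greedy value of the next state.
    delta = r + GAMMA * Q[s_next].max() - Q[s, a]
    E[s, a] += 1.0                       # accumulate trace for the visited pair
    Q += ALPHA * delta * E               # credit all recently visited pairs
    E *= GAMMA * LAMBDA                  # decay traces
    s = s_next
```

    The paper's variable learning rate would correspond to adapting ALPHA online as the disturbance level (winds, control noise) changes.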

  7. Influences of Formal Learning, Personal Learning Orientation, and Supportive Learning Environment on Informal Learning

    Science.gov (United States)

    Choi, Woojae; Jacobs, Ronald L.

    2011-01-01

    While workplace learning includes formal and informal learning, the relationship between the two has been overlooked, because they have been viewed as separate entities. This study investigated the effects of formal learning, personal learning orientation, and supportive learning environment on informal learning among 203 middle managers in Korean…

  8. A strategy learning model for autonomous agents based on classification

    Directory of Open Access Journals (Sweden)

    Śnieżyński Bartłomiej

    2015-09-01

    Full Text Available In this paper we propose a strategy learning model for autonomous agents based on classification. In the literature, the most commonly used learning method in agent-based systems is reinforcement learning. In our opinion, classification can be considered a good alternative. This type of supervised learning can be used to generate a classifier that allows the agent to choose an appropriate action for execution. Experimental results show that this model can be successfully applied for strategy generation even if rewards are delayed. We compare the efficiency of the proposed model and reinforcement learning using the farmer-pest domain and configurations of various complexity. In complex environments, supervised learning can improve the performance of agents much faster than reinforcement learning. If an appropriate knowledge representation is used, the learned knowledge may be analyzed by humans, which allows tracking of the learning process.
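
    The classification-as-strategy idea is straightforward to sketch: label past situations with the action that worked, train a standard classifier, and use its predictions as the policy. The feature set, labeling rule, and use of scikit-learn below are illustrative assumptions, not the paper's setup:

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training set: situation features observed by the agent
# (e.g., normalized distances to a pest and to the farm) paired with the
# action that eventually received a positive reward in that situation.
situations = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.9], [0.1, 0.7], [0.5, 0.5]]
good_actions = ["chase", "chase", "guard", "guard", "patrol"]

# The learned tree *is* the agent's strategy, and unlike a Q-table it can
# be inspected by a human, supporting the interpretability point above.
policy = DecisionTreeClassifier(max_depth=3).fit(situations, good_actions)
print(policy.predict([[0.85, 0.2]]))  # -> likely "chase"
```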

  9. Architecture for Collaborative Learning Activities in Hybrid Learning Environments

    OpenAIRE

    Ibáñez, María Blanca; Maroto, David; García Rueda, José Jesús; Leony, Derick; Delgado Kloos, Carlos

    2012-01-01

    3D virtual worlds are recognized as collaborative learning environments. However, the underlying technology is not sufficiently mature and the virtual worlds look cartoonish, unlinked to reality. Thus, it is important to enrich them with elements from the real world to enhance student engagement in learning activities. Our approach is to build learning environments where participants can either be in the real world or in its mirror world while sharing the same hybrid space in a collaborative ...

  10. Collaborations in Open Learning Environments

    NARCIS (Netherlands)

    Spoelstra, Howard

    2015-01-01

    This thesis researches automated services for professionals aiming at starting collaborative learning projects in open learning environments, such as MOOCs. It investigates the theoretical backgrounds of team formation for collaborative learning. Based on the outcomes, a model is developed

  11. Personalized learning Ecologies in Problem and Project Based Learning Environments

    DEFF Research Database (Denmark)

    Rongbutsri, Nikorn; Ryberg, Thomas; Zander, Pär-Ola

    2012-01-01

    … is in contrast to an artificial learning setting often found in traditional education. Like many other higher education institutions, Aalborg University aims at providing learning environments that support the underlying pedagogical approach employed, and which can lead to different online and offline learning … e.g. coordination, communication, negotiation, document sharing, calendars, meetings and version control. Furthermore, the pedagogical fabric of LMSs/VLEs has recently been called into question and critiqued by proponents of Personal Learning Environments (PLEs) (Ryberg, Buus, & Georgsen, 2011). In sum … making it important to understand and conceptualise students' use of technology. Ecology is the study of the relationships between organisms and their environment, the set of circumstances surrounding those organisms. Learning ecologies are the study of the relationship of a learner or a group of learners…

  12. Web-Based Learning Environment Based on Students’ Needs

    Science.gov (United States)

    Hamzah, N.; Ariffin, A.; Hamid, H.

    2017-08-01

    Traditional learning needs to be improved since it does not involve active learning among students. Therefore, in the twenty-first century, the development of internet technology in the learning environment has become a main need of each student. One of the learning environments that meets the needs of the teaching and learning process is a web-based learning environment. This study aims to identify the characteristics of a web-based learning environment that supports students' learning needs. The study involved 542 students from fifteen faculties in a public higher education institution in Malaysia. A quantitative method was used to collect the data via a randomly distributed questionnaire survey. The findings indicate that the characteristics of a web-based learning environment that support students' needs in the process of learning are online discussion forums, lecture notes, assignments, portfolios, and chat. In conclusion, the students overwhelmingly agreed that the online discussion forum is the most important requirement, because the tool can provide a space for students and teachers to share knowledge and experiences related to teaching and learning.

  13. EFFICIENT SPECTRUM UTILIZATION IN COGNITIVE RADIO THROUGH REINFORCEMENT LEARNING

    Directory of Open Access Journals (Sweden)

    Dhananjay Kumar

    2013-09-01

    Machine learning schemes can be employed in cognitive radio systems to intelligently locate spectrum holes with some knowledge about the operating environment. In this paper, we formulate a variation of the Actor Critic Learning algorithm known as the Continuous Actor Critic Learning Automaton (CACLA) and compare this scheme with the Actor Critic Learning scheme and the existing Q-learning scheme. Simulation results show that our CACLA scheme has a shorter execution time and achieves higher throughput compared to the other two schemes.
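
    CACLA's core rule is simple enough to show concretely: a critic learns state values by temporal-difference updates, and the actor is pulled toward an explored action only when the TD error is positive. The sketch below is a minimal tabular-state, continuous-action rendering of van Hasselt and Wiering's algorithm; the environment API and all names are assumptions, not the paper's code.

```python
# Minimal CACLA sketch: discrete states, one continuous action dimension.
# Illustrative only; not the cognitive-radio simulator from the record.
import numpy as np

def cacla(env, n_states, episodes=500, gamma=0.95,
          alpha_v=0.1, alpha_a=0.1, sigma=0.3, seed=0):
    rng = np.random.default_rng(seed)
    V = np.zeros(n_states)          # critic: state values
    actor = np.zeros(n_states)      # actor: mean action per state
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = actor[s] + sigma * rng.standard_normal()   # Gaussian exploration
            s2, r, done = env.step(a)
            delta = r + gamma * V[s2] * (not done) - V[s]  # TD error
            V[s] += alpha_v * delta
            if delta > 0:            # CACLA's key rule: reinforce only improvements
                actor[s] += alpha_a * (a - actor[s])
            s = s2
    return actor, V
```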

  14. Students’ Motivation for Learning in Virtual Learning Environments

    Directory of Open Access Journals (Sweden)

    Andrea Carvalho Beluce

    2015-04-01

    The specific characteristics of online education require engagement and autonomy of the student, factors which are related to motivation for learning. This study investigated students' motivation in virtual learning environments (VLEs). For this, it used the Teaching and Learning Strategy and Motivation to Learn Scale in Virtual Learning Environments (TLSM-VLE). The scale presented 32 items and six dimensions, three of which aimed to measure the variables of autonomous motivation, controlled motivation, and demotivation. The participants were 572 students from the Brazilian state of Paraná, enrolled in higher and continuing education courses. The results revealed significant rates of autonomous motivational behavior. It is considered that the results obtained may provide contributions for the educators and psychologists who work with VLEs, leading to further studies of the area that provide information on the issue investigated in this study.

  15. An Energy-Efficient Spectrum-Aware Reinforcement Learning-Based Clustering Algorithm for Cognitive Radio Sensor Networks.

    Science.gov (United States)

    Mustapha, Ibrahim; Mohd Ali, Borhanuddin; Rasid, Mohd Fadlee A; Sali, Aduwati; Mohamad, Hafizal

    2015-08-13

    It is well known that clustering partitions a network into logical groups of nodes in order to achieve energy efficiency and to enhance dynamic channel access in cognitive radio through cooperative sensing. While the topic of energy efficiency has been well investigated in conventional wireless sensor networks, it has not been extensively explored in cognitive radio sensor networks. In this paper, we propose a reinforcement learning-based spectrum-aware clustering algorithm that allows a member node to learn the energy and cooperative sensing costs for neighboring clusters to achieve an optimal solution. Each member node selects an optimal cluster that satisfies pairwise constraints, minimizes network energy consumption and enhances channel sensing performance through an exploration technique. We first model the network energy consumption and then determine the optimal number of clusters for the network. The problem of selecting an optimal cluster is formulated as a Markov Decision Process (MDP) in the algorithm, and the obtained simulation results show convergence, learning and adaptability of the algorithm to a dynamic environment towards achieving an optimal solution. Performance comparisons of our algorithm with the Groupwise Spectrum Aware (GWSA)-based algorithm in terms of Sum of Square Error (SSE), complexity, network energy consumption and probability of detection indicate improved performance from the proposed approach. The results further reveal that energy savings of 9% and a significant Primary User (PU) detection improvement can be achieved with the proposed approach.

  16. Relationship between learning environment characteristics and academic engagement

    NARCIS (Netherlands)

    Opdenakker, Marie-Christine; Minnaert, Alexander

    The relationship between learning environment characteristics and academic engagement of 777 Grade 6 children located in 41 learning environments was explored. Questionnaires were used to tap learning environment perceptions of children, their academic engagement, and their ethnic-cultural

  17. Rats bred for helplessness exhibit positive reinforcement learning deficits which are not alleviated by an antidepressant dose of the MAO-B inhibitor deprenyl.

    Science.gov (United States)

    Schulz, Daniela; Henn, Fritz A; Petri, David; Huston, Joseph P

    2016-08-04

    Principles of negative reinforcement learning may play a critical role in the etiology and treatment of depression. We examined the integrity of positive reinforcement learning in congenitally helpless (cH) rats, an animal model of depression, using a random ratio schedule and a devaluation-extinction procedure. Furthermore, we tested whether an antidepressant dose of the monoamine oxidase (MAO)-B inhibitor deprenyl would reverse any deficits in positive reinforcement learning. We found that cH rats (n=9) were impaired in the acquisition of even simple operant contingencies, such as a fixed interval (FI) 20 schedule. cH rats exhibited no apparent deficits in appetite or reward sensitivity. They reacted to the devaluation of food in a manner consistent with a dose-response relationship. Reinforcer motivation as assessed by lever pressing across sessions with progressively decreasing reward probabilities was highest in congenitally non-helpless (cNH, n=10) rats as long as the reward probabilities remained relatively high. cNH compared to wild-type (n=10) rats were also more resistant to extinction across sessions. Compared to saline (n=5), deprenyl (n=5) reduced the duration of immobility of cH rats in the forced swimming test, indicative of antidepressant effects, but did not restore any deficits in the acquisition of a FI 20 schedule. We conclude that positive reinforcement learning was impaired in rats bred for helplessness, possibly due to motivational impairments but not deficits in reward sensitivity, and that deprenyl exerted antidepressant effects but did not reverse the deficits in positive reinforcement learning. Copyright © 2016 IBRO. Published by Elsevier Ltd. All rights reserved.

  18. School and workplace as learning environments in VET

    DEFF Research Database (Denmark)

    Jørgensen, Christian Helms

    The aim of this paper is to present an analytical model to study school and workplace as different learning environments and to discuss some findings from the application of the model to a case study. First the paper tries to answer the question: what is a learning environment? In most other studies, schools and workplaces are not only considered to be different learning environments, but are also analysed using different approaches. In this paper I propose a common model to analyse and compare the two learning environments, drawing on the sociology of work (Kern & Schumann 1984; Braverman 1976) … as limitations for learning, and thus frame the opportunities for learning. The second, the socio-cultural learning environment, is constituted by the social and cultural relations and communities in the workplace and in school. I distinguish between three different types of social relations in the workplace…

  19. Factors Influencing Learning Environments in an Integrated Experiential Program

    Science.gov (United States)

    Koci, Peter

    The research conducted for this dissertation examined the learning environment of a specific high school program that delivered the explicit curriculum through an integrated experiential manner, which utilized field and outdoor experiences. The program ran over one semester (five months) and it integrated the grade 10 British Columbian curriculum in five subjects. A mixed methods approach was employed to identify the students' perceptions and provide richer descriptions of their experiences related to their unique learning environment. Quantitative instruments were used to assess changes in students' perspectives of their learning environment, as well as other supporting factors including students' mindfulness, and behaviours towards the environment. Qualitative data collection included observations, open-ended questions, and impromptu interviews with the teacher. The qualitative data describe the factors and processes that influenced the learning environment and give a richer, deeper interpretation which complements the quantitative findings. The research results showed positive scores on all the quantitative measures conducted, and the qualitative data provided further insight into descriptions of learning environment constructs that the students perceived as most important. A major finding was that the group cohesion measure was perceived by students as the most important attribute of their preferred learning environment. A flow chart was developed to help the researcher conceptualize how the learning environment, learning process, and outcomes relate to one another in the studied program. This research attempts to explain through the consideration of this case study: how learning environments can influence behavioural change and how an interconnectedness among several factors in the learning process is influenced by the type of learning environment facilitated. Considerably more research is needed in this area to understand fully the complexity learning

  20. Reinforcement Learning Based Web Service Compositions for Mobile Business

    Science.gov (United States)

    Zhou, Juan; Chen, Shouming

    In this paper, we propose a new solution to Reactive Web Service Composition, by modeling it with Reinforcement Learning and introducing modified (alterable) QoS variables into the model as elements of the Markov Decision Process tuple. Moreover, we give an example of Reactive-WSC-based mobile banking to demonstrate the solution's capability of obtaining an optimized service composition, characterized by (alterable) target QoS variable sets with optimized values. Consequently, we come to the conclusion that the solution has considerable potential for boosting customer experience and quality of service in Web Services, and in applications across the whole electronic commerce and business sector.

  1. Global reinforcement training of CrossNets

    Science.gov (United States)

    Ma, Xiaolong

    2007-10-01

    Hybrid "CMOL" integrated circuits, incorporating advanced CMOS devices for neural cell bodies, nanowires as axons and dendrites, and latching switches as synapses, may be used for the hardware implementation of extremely dense (107 cells and 1012 synapses per cm2) neuromorphic networks, operating up to 10 6 times faster than their biological prototypes. We are exploring several "Cross- Net" architectures that accommodate the limitations imposed by CMOL hardware and should allow effective training of the networks without a direct external access to individual synapses. Our studies have show that CrossNets based on simple (two-terminal) crosspoint devices can work well in at least two modes: as Hop-field networks for associative memory and multilayer perceptrons for classification tasks. For more intelligent tasks (such as robot motion control or complex games), which do not have "examples" for supervised learning, more advanced training methods such as the global reinforcement learning are necessary. For application of global reinforcement training algorithms to CrossNets, we have extended Williams's REINFORCE learning principle to a more general framework and derived several learning rules that are more suitable for CrossNet hardware implementation. The results of numerical experiments have shown that these new learning rules can work well for both classification tasks and reinforcement tasks such as the cartpole balancing control problem. Some limitations imposed by the CMOL hardware need to be carefully addressed for the the successful application of in situ reinforcement training to CrossNets.

  2. Grounding the meanings in sensorimotor behavior using reinforcement learning

    Directory of Open Access Journals (Sweden)

    Igor Farkaš

    2012-02-01

    The recent outburst of interest in cognitive developmental robotics is fueled by the ambition to propose ecologically plausible mechanisms of how, among other things, a learning agent/robot could ground linguistic meanings in its sensorimotor behaviour. Along this stream, we propose a model that allows the simulated iCub robot to learn the meanings of actions (point, touch and push) oriented towards objects in the robot's peripersonal space. In our experiments, the iCub learns to execute motor actions and comment on them. Architecturally, the model is composed of three neural-network-based modules that are trained in different ways. The first module, a two-layer perceptron, is trained by back-propagation to attend to the target position in the visual scene, given the low-level visual information and the feature-based target information. The second module, having the form of an actor-critic architecture, is the most distinguishing part of our model, and is trained by a continuous version of reinforcement learning to execute actions as sequences, based on a linguistic command. The third module, an echo-state network, is trained to provide the linguistic description of the executed actions. The trained model generalises well in the case of novel action-target combinations with randomised initial arm positions. It can also promptly adapt its behavior if the action/target suddenly changes during motor execution.

  3. The VREST learning environment.

    Science.gov (United States)

    Kunst, E E; Geelkerken, R H; Sanders, A J B

    2005-01-01

    The VREST learning environment is an integrated architecture to improve the education of health care professionals. It is a combination of a learning, content and assessment management system based on virtual reality. The generic architecture is now being built and tested around the Lichtenstein protocol for hernia inguinalis repair.

  4. How People Learn in an Asynchronous Online Learning Environment: The Relationships between Graduate Students' Learning Strategies and Learning Satisfaction

    Science.gov (United States)

    Choi, Beomkyu

    2016-01-01

    The purpose of this study was to examine the relationships between learners' learning strategies and learning satisfaction in an asynchronous online learning environment. In an attempt to shed some light on how people learn in an online learning environment, one hundred and sixteen graduate students who were taking online learning courses…

  5. Student-Teacher Interaction in Online Learning Environments

    Science.gov (United States)

    Wright, Robert D., Ed.

    2015-01-01

    As face-to-face interaction between student and instructor is not present in online learning environments, it is increasingly important to understand how to establish and maintain social presence in online learning. "Student-Teacher Interaction in Online Learning Environments" provides successful strategies and procedures for developing…

  6. A Well Designed School Environment Facilitates Brain Learning.

    Science.gov (United States)

    Chan, Tak Cheung; Petrie, Garth

    2000-01-01

    Examines how school design facilitates learning by complementing how the brain learns. How the brain learns is discussed and how an artistic environment, spaciousness in the learning areas, color and lighting, and optimal thermal and acoustical environments aid student learning. School design suggestions conclude the article. (GR)

  7. Critical factors in the empirical performance of temporal difference and evolutionary methods for reinforcement learning

    NARCIS (Netherlands)

    Whiteson, S.; Taylor, M.E.; Stone, P.

    2010-01-01

    Temporal difference and evolutionary methods are two of the most common approaches to solving reinforcement learning problems. However, there is little consensus on their relative merits and there have been few empirical studies that directly compare their performance. This article aims to address

  8. CLEW: A Cooperative Learning Environment for the Web.

    Science.gov (United States)

    Ribeiro, Marcelo Blois; Noya, Ricardo Choren; Fuks, Hugo

    This paper outlines CLEW (collaborative learning environment for the Web). The project combines MUD (Multi-User Dimension), workflow, VRML (Virtual Reality Modeling Language) and educational concepts like constructivism in a learning environment where students actively participate in the learning process. The MUD shapes the environment structure.…

  9. DynaLearn-An Intelligent Learning Environment for Learning Conceptual Knowledge

    NARCIS (Netherlands)

    Bredeweg, Bert; Liem, Jochem; Beek, Wouter; Linnebank, Floris; Gracia, Jorge; Lozano, Esther; Wißner, Michael; Bühling, René; Salles, Paulo; Noble, Richard; Zitek, Andreas; Borisova, Petya; Mioduser, David

    2013-01-01

    Articulating thought in computer-based media is a powerful means for humans to develop their understanding of phenomena. We have created DynaLearn, an intelligent learning environment that allows learners to acquire conceptual knowledge by constructing and simulating qualitative models of how systems

  10. The use of Twitter to facilitate engagement and reflection in a constructionist learning environment.

    Science.gov (United States)

    Desselle, Shane P

    Determine students' self-reported use of Twitter in a health systems course and gauge their perceptions of its value and utility for self-guided supplementation of course material, and evaluate the quality of students' reflections from information they found on Twitter. Students in a health systems course created a Twitter account to remain abreast of current developments in pharmacy and health systems. They were afforded the autonomy to follow organizations/individuals they chose and to write reflective mini-papers on selected tweets from their Twitter feed prior to each course session. A self-administered survey solicited students' favor toward various aspects of the Twitter reflection assignment. An examination of students' reflections as the course progressed was also undertaken. Approximately 2/3 of the students enrolled in the course responded to the survey. Student perceptions of the Twitter assignment were quite favorable, with highest favor related to facets regarding the construction of their own learning and continuation of engagement throughout the course. Responses to open-ended questions corroborated students' perceptions of their own learning, as did the content and quality of their reflections during the progression of the course. The course design reinforced previous claims outside of pharmacy that Twitter can be a useful tool to reinforce or create new learning paradigms, especially under the auspices of established theory, such as a constructivist environment employing constructionism pedagogy. A course like health systems in programs of academic pharmacy might especially benefit from the use of Twitter and such course design. Copyright © 2017 Elsevier Inc. All rights reserved.

  11. Learning Environments Designed According to Learning Styles and Its Effects on Mathematics Achievement

    Science.gov (United States)

    Özerem, Aysen; Akkoyunlu, Buket

    2015-01-01

    Problem Statement: While designing a learning environment it is vital to think about learner characteristics (learning styles, approaches, motivation, interests… etc.) in order to promote effective learning. The learning environment and learning process should be designed not to enable students to learn in the same manner and at the same level,…

  12. Distribution majorization of corner points by reinforcement learning for moving object detection

    Science.gov (United States)

    Wu, Hao; Yu, Hao; Zhou, Dongxiang; Cheng, Yongqiang

    2018-04-01

    Corner points play an important role in moving object detection, especially in the case of a free-moving camera. Corner points provide more accurate information than other pixels and reduce unnecessary computation. Previous works use only intensity information to locate the corner points; however, the information provided by preceding frames can also be used. We utilize this information to focus on more valuable areas and ignore less valuable ones. The proposed algorithm is based on reinforcement learning, which regards the detection of corner points as a Markov process. In the Markov model, the video to be detected is regarded as the environment, the selections of blocks for one corner point are regarded as actions, and the performance of detection is regarded as the state. Corner points are assigned to blocks that are separated from the original whole image. Experimentally, we select a conventional method, which uses matching and the Random Sample Consensus algorithm to obtain objects, as the main framework, and utilize our algorithm to improve the result. The comparison between the conventional method and the same one with our algorithm shows that our algorithm reduces false detections by 70%.

  13. From Creatures of Habit to Goal-Directed Learners: Tracking the Developmental Emergence of Model-Based Reinforcement Learning.

    Science.gov (United States)

    Decker, Johannes H; Otto, A Ross; Daw, Nathaniel D; Hartley, Catherine A

    2016-06-01

    Theoretical models distinguish two decision-making strategies that have been formalized in reinforcement-learning theory. A model-based strategy leverages a cognitive model of potential actions and their consequences to make goal-directed choices, whereas a model-free strategy evaluates actions based solely on their reward history. Research in adults has begun to elucidate the psychological mechanisms and neural substrates underlying these learning processes and factors that influence their relative recruitment. However, the developmental trajectory of these evaluative strategies has not been well characterized. In this study, children, adolescents, and adults performed a sequential reinforcement-learning task that enabled estimation of model-based and model-free contributions to choice. Whereas a model-free strategy was apparent in choice behavior across all age groups, a model-based strategy was absent in children, became evident in adolescents, and strengthened in adults. These results suggest that recruitment of model-based valuation systems represents a critical cognitive component underlying the gradual maturation of goal-directed behavior. © The Author(s) 2016.
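
    The two strategies the study contrasts can be stated in a few lines of code: a model-free learner caches action values directly from reward history, while a model-based learner estimates a transition/reward model and plans over it. The sketch below is a generic illustration with hypothetical names and a toy tabular MDP interface, not the sequential task used in the paper.

```python
# Model-free vs. model-based evaluation on a small discrete MDP.
# counts[s, a, s2] are observed transition counts; rewards[s, a] are
# accumulated rewards for each state-action pair. All names assumed.
import numpy as np

def model_free_update(Q, s, a, r, s2, alpha=0.1, gamma=0.95):
    # Model-free: evaluate actions solely from reward history (Q-learning).
    Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])

def model_based_values(counts, rewards, gamma=0.95, sweeps=50):
    # Model-based: plan over a learned model T(s,a,s') and R(s,a).
    T = counts / np.maximum(counts.sum(axis=2, keepdims=True), 1)
    R = rewards / np.maximum(counts.sum(axis=2), 1)
    Q = np.zeros(R.shape)
    for _ in range(sweeps):               # value iteration over the model
        Q = R + gamma * T @ np.max(Q, axis=1)
    return Q
```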

  14. Reinforcement Learning for Routing in Cognitive Radio Ad Hoc Networks

    Directory of Open Access Journals (Sweden)

    Hasan A. A. Al-Rawi

    2014-01-01

    Cognitive radio (CR) enables unlicensed users (or secondary users, SUs) to sense for and exploit underutilized licensed spectrum owned by the licensed users (or primary users, PUs). Reinforcement learning (RL) is an artificial intelligence approach that enables a node to observe, learn, and make appropriate decisions on action selection in order to maximize network performance. Routing enables a source node to search for a least-cost route to its destination node. While there have been increasing efforts to enhance the traditional RL approach for routing in wireless networks, this research area remains largely unexplored in the domain of routing in CR networks. This paper applies RL in routing and investigates through simulation the effects of various features of RL, i.e., the reward function, exploitation and exploration, as well as the learning rate. New approaches and recommendations are proposed to enhance these features in order to improve the network performance brought about by RL to routing. Simulation results show that the RL parameters of the reward function, exploitation and exploration, as well as the learning rate, must be well regulated, and that the new approaches proposed in this paper improve SUs' network performance without significantly jeopardizing PUs' network performance, specifically SUs' interference to PUs.
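
    To make the tuned features concrete (reward function, exploitation versus exploration, learning rate), here is a sketch in the spirit of classic Q-routing, where each node learns the estimated delivery cost to a destination via each neighbor. The data structures and names are assumptions for illustration, not the paper's simulator.

```python
# Q-routing sketch: node_q[dest][nbr] = estimated cost to dest via nbr.
# The reward signal, epsilon, and alpha are the knobs the paper studies.
import random

def q_route_update(node_q, nbr_q, dest, nbr, link_delay, alpha=0.5):
    # Bellman backup: experienced link delay (the reward/cost signal) plus
    # the neighbor's best remaining estimate is the update target.
    best_from_nbr = min(nbr_q[dest].values()) if nbr_q[dest] else 0.0
    target = link_delay + best_from_nbr
    node_q[dest][nbr] += alpha * (target - node_q[dest][nbr])

def choose_next_hop(node_q, dest, eps=0.1):
    # Exploitation vs. exploration trade-off.
    nbrs = list(node_q[dest])
    if random.random() < eps:
        return random.choice(nbrs)           # explore a random neighbor
    return min(nbrs, key=node_q[dest].get)   # exploit the lowest-cost one
```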

  15. Learning motor skills from algorithms to robot experiments

    CERN Document Server

    Kober, Jens

    2014-01-01

    This book presents the state of the art in reinforcement learning applied to robotics both in terms of novel algorithms and applications. It discusses recent approaches that allow robots to learn motor skills and presents tasks that need to take into account the dynamic behavior of the robot and its environment, where a kinematic movement plan is not sufficient. The book illustrates a method that learns to generalize parameterized motor plans, obtained by imitation or reinforcement learning, by adapting a small set of global parameters, and appropriate kernel-based reinforcement learning algorithms. The presented applications explore highly dynamic tasks and exhibit a very efficient learning process. All proposed approaches have been extensively validated with benchmark tasks, in simulation, and on real robots. These tasks correspond to sports and games, but the presented techniques are also applicable to more mundane household tasks. The book is based on the first author’s doctoral thesis, which wo...

  16. Learning Object Metadata in a Web-Based Learning Environment

    NARCIS (Netherlands)

    Avgeriou, Paris; Koutoumanos, Anastasios; Retalis, Symeon; Papaspyrou, Nikolaos

    2000-01-01

    The plethora and variance of learning resources embedded in modern web-based learning environments require a mechanism to enable their structured administration. This goal can be achieved by defining metadata on them and constructing a system that manages the metadata in the context of the learning

  17. Space Objects Maneuvering Detection and Prediction via Inverse Reinforcement Learning

    Science.gov (United States)

    Linares, R.; Furfaro, R.

    This paper determines the behavior of Space Objects (SOs) using inverse Reinforcement Learning (RL) to estimate the reward function that each SO is using for control. The approach discussed in this work can be used to analyze maneuvering of SOs from observational data. The inverse RL problem is solved using the Feature Matching approach. This approach determines the optimal reward function that a SO is using while maneuvering by assuming that the observed trajectories are optimal with respect to the SO's own reward function. This paper uses estimated orbital elements data to determine the behavior of SOs in a data-driven fashion.

  18. Student Motivation in Constructivist Learning Environment

    Science.gov (United States)

    Cetin-Dindar, Ayla

    2016-01-01

    The purpose of this study was to investigate the relation between constructivist learning environment and students'motivation to learn science by testing whether students' self-efficacy in learning science, intrinsically and extrinsically motivated science learning increase and students' anxiety about science assessment decreases when more…

  19. The Predicaments of Language Learners in Traditional Learning Environments

    Science.gov (United States)

    Shafie, Latisha Asmaak; Mansor, Mahani

    2009-01-01

    Some public universities in developing countries have traditional language learning environments such as classrooms with only blackboards and furniture which do not provide conducive learning environments. These traditional environments are unable to cater for digital learners who need to learn with learning technologies. In order to create…

  20. Continuous theta-burst stimulation (cTBS) over the lateral prefrontal cortex alters reinforcement learning bias

    NARCIS (Netherlands)

    Ott, D.V.M.; Ullsperger, M.; Jocham, G.; Neumann, J.; Klein, T.A.

    2011-01-01

    The prefrontal cortex is known to play a key role in higher-order cognitive functions. Recently, we showed that this brain region is active in reinforcement learning, during which subjects constantly have to integrate trial outcomes in order to optimize performance. To further elucidate the role of

  1. Beyond the Art Lesson: Free-Choice Learning Centers

    Science.gov (United States)

    Werth, Laurie

    2010-01-01

    In this article, the author emphasizes that by providing learning centers in the art studio environment and by providing "free-choice time," art educators can encourage and reinforce the natural learning styles of students. Learning centers give elementary students the freedom to pursue individual artistic expression. They give students an…

  2. Smile: Student Modification in Learning Environments. Establishing Congruence between Actual and Preferred Classroom Learning Environment.

    Science.gov (United States)

    Yarrow, Allan; Millwater, Jan

    1995-01-01

    This study investigated whether classroom psychosocial environment, as perceived by student teachers, could be improved to their preferred level. Students completed the College and University Classroom Environment Inventory, discussed interventions, then completed it again. Significant deficiencies surfaced in the learning environment early in the…

  3. A collaborative learning environment for Management Education based on Experiential Learning

    DEFF Research Database (Denmark)

    Lidón, Iván; Rebollar, Rubén; Møller, Charles

    2011-01-01

    … from a student learning perspective. This paper presents the design and the operating principles of a learning environment that has been formulated in a joint development by teachers and researchers of the universities of Zaragoza (Spain) and Aalborg (Denmark). We describe what the learning environment consists of, beginning by presenting the theoretical foundation considered for its design, and then describing and presenting it in detail. Finally, we discuss the implications of this environment for research and teaching in this field, and gather the conclusions…

  4. Clinical Learning Environment at Shiraz Medical School

    Directory of Open Access Journals (Sweden)

    Sedigheh Ebrahimi

    2013-01-01

    Clinical learning occurs in the context of a dynamic environment. The learning environment has been found to be one of the most important factors in determining the success of an effective teaching program. The aim was to investigate, from the attending and resident's perspective, factors that may affect student learning in the educational hospital setting at Shiraz University of Medical Sciences (SUMS). This study combined qualitative and quantitative methods to determine factors affecting effective learning in the clinical setting. Residents evaluated the perceived effectiveness of the university hospital learning environment. Fifty-two faculty members and 132 residents participated in this study. Key determinants that contribute to effective clinical teaching were autonomy, supervision, social support, workload, role clarity, learning opportunity, work diversity and physical facilities. In a good clinical setting, residents should be appreciated and given appropriate opportunities to study in order to meet their objectives. They require a supportive environment to consolidate their knowledge, skills and judgment.

  5. Clinical learning environment at Shiraz Medical School.

    Science.gov (United States)

    Rezaee, Rita; Ebrahimi, Sedigheh

    2013-01-01

    Clinical learning occurs in the context of a dynamic environment. The learning environment has been found to be one of the most important factors in determining the success of an effective teaching program. The aim was to investigate, from the attending and resident's perspective, factors that may affect student learning in the educational hospital setting at Shiraz University of Medical Sciences (SUMS). This study combined qualitative and quantitative methods to determine factors affecting effective learning in the clinical setting. Residents evaluated the perceived effectiveness of the university hospital learning environment. Fifty-two faculty members and 132 residents participated in this study. Key determinants that contribute to effective clinical teaching were autonomy, supervision, social support, workload, role clarity, learning opportunity, work diversity and physical facilities. In a good clinical setting, residents should be appreciated and given appropriate opportunities to study in order to meet their objectives. They require a supportive environment to consolidate their knowledge, skills and judgment. © 2013 Tehran University of Medical Sciences. All rights reserved.

  6. Active Learning Environment with Lenses in Geometric Optics

    Science.gov (United States)

    Tural, Güner

    2015-01-01

    Geometric optics is one of the difficult topics for students within the physics discipline. Students learn better in student-centered active learning environments than in teacher-centered ones. This study therefore aimed to present a guide for middle school teachers to teach lenses in geometric optics via an active learning environment…

  7. Learning styles: individualizing computer-based learning environments

    Directory of Open Access Journals (Sweden)

    Tim Musson

    1995-12-01

    While the need to adapt teaching to the needs of a student is generally acknowledged (see Corno and Snow, 1986, for a wide review of the literature), little is known about the impact of individual learner differences on the quality of learning attained within computer-based learning environments (CBLEs). What evidence there is appears to support the notion that individual differences have implications for the degree of success or failure experienced by students (Ford and Ford, 1992) and by trainee end-users of software packages (Bostrom et al, 1990). The problem is to identify the way in which specific individual characteristics of a student interact with particular features of a CBLE, and how the interaction affects the quality of the resultant learning. Teaching in a CBLE is likely to require a subset of teaching strategies different from that subset appropriate to more traditional environments, and the use of a machine may elicit different behaviours from those normally arising in a classroom context.

  8. The sociability of computer-supported collaborative learning environments

    NARCIS (Netherlands)

    Kreijns, C.J.; Kirschner, P.A.; Jochems, W.M.G.

    2002-01-01

    There is much positive research on computer-supported collaborative learning (CSCL) environments in asynchronous distributed learning groups (DLGs). There is also research that shows that contemporary CSCL environments do not completely fulfil expectations on supporting interactive group learning,

  9. Optimal medication dosing from suboptimal clinical examples: a deep reinforcement learning approach.

    Science.gov (United States)

    Nemati, Shamim; Ghassemi, Mohammad M; Clifford, Gari D

    2016-08-01

    Misdosing medications with sensitive therapeutic windows, such as heparin, can place patients at unnecessary risk, increase length of hospital stay, and lead to wasted hospital resources. In this work, we present a clinician-in-the-loop sequential decision making framework, which provides an individualized dosing policy adapted to each patient's evolving clinical phenotype. We employed retrospective data from the publicly available MIMIC II intensive care unit database, and developed a deep reinforcement learning algorithm that learns an optimal heparin dosing policy from sample dosing trials and their associated outcomes in large electronic medical records. Using separate training and testing datasets, our model was observed to be effective in proposing heparin doses that resulted in better expected outcomes than the clinical guidelines. Our results demonstrate that a sequential modeling approach, learned from retrospective data, could potentially be used at the bedside to derive individualized patient dosing policies.
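
    Learning a policy from retrospective records, as described above, is batch reinforcement learning. The paper uses a deep network on MIMIC II data; the sketch below substitutes a much simpler linear fitted-Q-iteration over discretized dose bins, purely to illustrate the batch Bellman-backup idea. All names, array shapes, and the regression choice are assumptions, not the authors' model.

```python
# Linear fitted-Q-iteration over a batch of transitions
# (state features S, dose-bin actions A, outcome rewards R, next states S2).
import numpy as np

def fitted_q_iteration(S, A, R, S2, n_actions, gamma=0.9, iters=30):
    n, d = S.shape
    W = np.zeros((n_actions, d))                  # one linear Q head per dose bin
    for _ in range(iters):
        q_next = S2 @ W.T                         # Q(s', a') for every action
        targets = R + gamma * q_next.max(axis=1)  # Bellman targets from the batch
        for a in range(n_actions):
            mask = (A == a)
            if mask.any():                        # regress targets per action
                W[a], *_ = np.linalg.lstsq(S[mask], targets[mask], rcond=None)
    return W  # greedy dose policy: argmax over (W @ state_features)
```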

  10. Context-aware Cloud Computing for Personal Learning Environment

    OpenAIRE

    Chen, Feng; Al-Bayatti, Ali Hilal; Siewe, Francois

    2016-01-01

    Virtual learning means learning from social interactions in a virtual platform that enables people to study anywhere and at any time. Current Virtual Learning Environments (VLEs) are a range of integrated web-based applications to support and enhance education. Normally, VLEs are institution-centric: they are owned by the institutions and are designed to support formal learning, and they do not support lifelong learning. These limitations led to the research of Personal Learning Environments (PLE...

  11. Off-policy integral reinforcement learning optimal tracking control for continuous-time chaotic systems

    International Nuclear Information System (INIS)

    Wei Qing-Lai; Song Rui-Zhuo; Xiao Wen-Dong; Sun Qiu-Ye

    2015-01-01

    This paper presents an off-policy integral reinforcement learning (IRL) algorithm to obtain the optimal tracking control of unknown chaotic systems. Off-policy IRL can learn the solution of the Hamilton–Jacobi–Bellman (HJB) equation from the system data generated by an arbitrary control. Moreover, off-policy IRL can be regarded as a direct learning method, which avoids the identification of system dynamics. In this paper, the performance index function is first given based on the system tracking error and control error. For solving the HJB equation, an off-policy IRL algorithm is proposed. It is proven that the iterative control makes the tracking error system asymptotically stable, and that the iterative performance index function is convergent. A simulation study demonstrates the effectiveness of the developed tracking control method. (paper)
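
    For readers unfamiliar with the setup, a generic continuous-time optimal tracking formulation of the kind described looks as follows; the notation is assumed here for illustration and is not copied from the paper.

```latex
% Tracking error e = x - x_d, steady-state control u_d; a performance index
% built from tracking error and control error, and the HJB condition on the
% optimal value function V^* (generic form, assumed notation):
J(e,u) \;=\; \int_{t}^{\infty} \Big( e^{\top} Q\, e \;+\; (u-u_d)^{\top} R\,(u-u_d) \Big)\, d\tau ,
\qquad
0 \;=\; \min_{u} \Big[\, e^{\top} Q\, e + (u-u_d)^{\top} R\,(u-u_d) + (\nabla V^{*})^{\top} \dot e \,\Big].
```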

  12. Heads for learning, tails for memory: Reward, reinforcement and a role of dopamine in determining behavioural relevance across multiple timescales

    Directory of Open Access Journals (Sweden)

    Mathieu Baudonnat

    2013-10-01

    Dopamine has long been tightly associated with aspects of reinforcement learning and motivation in simple situations where there are a limited number of stimuli to guide behaviour and a constrained range of outcomes. In naturalistic situations, however, there are many potential cues and foraging strategies that could be adopted, and it is critical that animals determine what might be behaviourally relevant in such complex environments. This requires not only detecting discrepancies with what they have recently experienced, but also identifying similarities with past experiences stored in memory. Here, we review what role dopamine might play in determining how and when to learn about the world, and how to develop choice policies appropriate to the situation faced. We discuss evidence that dopamine is shaped by motivation and memory and in turn shapes reward-based memory formation. In particular, we suggest that hippocampal-striatal-dopamine networks may interact to determine how surprising the world is and to either inhibit or promote actions at times of behavioural uncertainty.

  13. A reinforcement learning model of joy, distress, hope and fear

    Science.gov (United States)

    Broekens, Joost; Jacobs, Elmer; Jonker, Catholijn M.

    2015-07-01

    In this paper we computationally study the relation between adaptive behaviour and emotion. Using the reinforcement learning framework, we propose that the learned state utility models fear (negative) and hope (positive), based on the fact that both signals are about the anticipation of loss or gain. Further, we propose that joy/distress is a signal similar to the error signal. We present agent-based simulation experiments that show that this model replicates psychological and behavioural dynamics of emotion. This work distinguishes itself by assessing the dynamics of emotion in an adaptive agent framework, coupling it to the literature on habituation, development, extinction and hope theory. Our results support the idea that the function of emotion is to provide a complex feedback signal for an organism to adapt its behaviour. Our work is relevant for understanding the relation between emotion and adaptation in animals, as well as for human-robot interaction, in particular how emotional signals can be used to communicate between adaptive agents and humans.
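
    As I read the proposal, hope/fear derive from the sign of the learned state utility and joy/distress from the sign of the error signal. A minimal sketch of that mapping, with assumed names (the paper's exact formulation may differ):

```python
# Emotion signals read off a TD learner's quantities: hope/fear from the
# learned state value V, joy/distress from the TD error. Illustrative only.
def emotion_signals(V, s, s2, r, gamma=0.95):
    delta = r + gamma * V[s2] - V[s]     # TD error for this transition
    return {
        "hope":     max(V[s], 0.0),      # anticipated gain
        "fear":     max(-V[s], 0.0),     # anticipated loss
        "joy":      max(delta, 0.0),     # outcome better than expected
        "distress": max(-delta, 0.0),    # outcome worse than expected
    }
```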

  14. Learning Multirobot Hose Transportation and Deployment by Distributed Round-Robin Q-Learning.

    Directory of Open Access Journals (Sweden)

    Borja Fernandez-Gauna

    Multi-Agent Reinforcement Learning (MARL) algorithms face two main difficulties: the curse of dimensionality, and environment non-stationarity due to the independent learning processes carried out by the agents concurrently. In this paper we formalize and prove the convergence of a Distributed Round Robin Q-learning (D-RR-QL) algorithm for cooperative systems. The computational complexity of this algorithm increases linearly with the number of agents. Moreover, it eliminates environment non-stationarity by carrying out a round-robin scheduling of the action selection and execution. This learning scheme allows the implementation of Modular State-Action Vetoes (MSAV) in cooperative multi-agent systems, which speeds up learning convergence in over-constrained systems by vetoing state-action pairs which lead to undesired termination states (UTS) in the relevant state-action subspace. Each agent's local state-action value function learning is an independent process, including the MSAV policies. Coordination of locally optimal policies to obtain the global optimal joint policy is achieved by a greedy selection procedure using message passing. We show that D-RR-QL improves over state-of-the-art approaches, such as Distributed Q-Learning, Team Q-Learning and Coordinated Reinforcement Learning, in a paradigmatic Linked Multi-Component Robotic System (L-MCRS) control problem: the hose transportation task. L-MCRS are over-constrained systems with many UTS induced by the interaction of the passive linking element and the active mobile robots.

  15. Phasic dopamine as a prediction error of intrinsic and extrinsic reinforcements driving both action acquisition and reward maximization: a simulated robotic study.

    Science.gov (United States)

    Mirolli, Marco; Santucci, Vieri G; Baldassarre, Gianluca

    2013-03-01

    An important issue of recent neuroscientific research is to understand the functional role of the phasic release of dopamine in the striatum, and in particular its relation to reinforcement learning. The literature is split between two alternative hypotheses: one considers phasic dopamine as a reward prediction error similar to the computational TD-error, whose function is to guide an animal to maximize future rewards; the other holds that phasic dopamine is a sensory prediction error signal that lets the animal discover and acquire novel actions. In this paper we propose an original hypothesis that integrates these two contrasting positions: according to our view phasic dopamine represents a TD-like reinforcement prediction error learning signal determined by both unexpected changes in the environment (temporary, intrinsic reinforcements) and biological rewards (permanent, extrinsic reinforcements). Accordingly, dopamine plays the functional role of driving both the discovery and acquisition of novel actions and the maximization of future rewards. To validate our hypothesis we perform a series of experiments with a simulated robotic system that has to learn different skills in order to get rewards. We compare different versions of the system in which we vary the composition of the learning signal. The results show that only the system reinforced by both extrinsic and intrinsic reinforcements is able to reach high performance in sufficiently complex conditions. Copyright © 2013 Elsevier Ltd. All rights reserved.
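
    The integrated hypothesis above lends itself to a small sketch: the learning signal is computed over a reward that sums permanent extrinsic rewards and temporary intrinsic ones, where the intrinsic part is a prediction error that fades as events become predictable. The class below is an assumption-laden illustration, not the authors' simulated robotic system; the decay scheme in particular is invented.

```python
# Combined reward: permanent extrinsic reward plus a temporary intrinsic
# reward (surprise about environmental events that habituates with learning).
import numpy as np

class IntrinsicExtrinsicReward:
    def __init__(self, n_events, decay=0.05):
        self.pred = np.zeros(n_events)   # learned prediction of each event
        self.decay = decay

    def reward(self, event_id, event_intensity, extrinsic_r):
        surprise = event_intensity - self.pred[event_id]   # intrinsic, temporary
        self.pred[event_id] += self.decay * surprise       # habituation
        return extrinsic_r + max(surprise, 0.0)            # combined signal
```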

  16. Homeostatic Agent for General Environment

    Science.gov (United States)

    Yoshida, Naoto

    2018-03-01

    One of the essential aspects of biological agents is dynamic stability. This aspect, called homeostasis, is widely discussed in ethology, in neuroscience and during the early stages of artificial intelligence. Ashby's homeostats are general-purpose learning machines for stabilizing essential variables of the agent in the face of general environments. However, despite their generality, the original homeostats couldn't be scaled because they searched their parameters randomly. In this paper, we first re-define the objective of homeostats as the maximization of a multi-step survival probability from the viewpoint of sequential decision theory and probability theory. Then we show that this optimization problem can be treated by using reinforcement learning algorithms with special agent architectures and theoretically derived intrinsic reward functions. Finally we empirically demonstrate that agents with our architecture automatically learn to survive in a given environment, including environments with visual stimuli. Our survival agents can learn to eat food, avoid poison and stabilize essential variables through theoretically derived single intrinsic reward formulations.
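
    One way to make the re-defined objective concrete is an intrinsic reward that scores how viable an essential variable currently is, whose log the agent maximizes over time as a proxy for multi-step survival probability. The sigmoid form below is my assumption for illustration; the paper derives its own reward formulations.

```python
# Intrinsic homeostatic reward sketch: a smooth "probability of staying
# viable" for one essential variable (e.g., energy), viable in [low, high].
import numpy as np

def homeostatic_reward(essential, low, high, sharpness=5.0):
    mid, half = (low + high) / 2.0, (high - low) / 2.0
    # Sigmoid drops off as the reading leaves the viable range.
    p_viable = 1.0 / (1.0 + np.exp(sharpness * (abs(essential - mid) - half)))
    return float(np.log(p_viable + 1e-12))   # log-survival proxy as reward
```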

  17. Language Learning in Outdoor Environments: Perspectives of preschool staff

    Directory of Open Access Journals (Sweden)

    Martina Norling

    2015-03-01

    The language environment is highlighted as an important area in the early childhood education sector. The term language environment refers to language-promoting aspects of education, such as preschool staff's use of verbal language in interacting with the children. There is a lack of research about language learning in outdoor environments, so accounts of children's language learning are mostly based on the indoor physical environment. The aim of this study is therefore to explore, analyse, and describe how preschool staff perceive language learning in outdoor environments. The data consist of focus-group interviews with 165 preschool staff members, conducted in three cities in Sweden. The results contribute knowledge regarding preschool staff's understandings of language learning in outdoor environments and develop insights to help preschool staff stimulate children's language learning in outdoor environments.

  18. Identification of animal behavioral strategies by inverse reinforcement learning.

    Directory of Open Access Journals (Sweden)

    Shoichiro Yamaguchi

    2018-05-01

    Animals are able to reach a desired state in an environment by controlling various behavioral patterns. Identification of the behavioral strategy used for this control is important for understanding animals' decision-making and is fundamental to dissecting the information processing done by the nervous system. However, methods for quantifying such behavioral strategies have not been fully established. In this study, we developed an inverse reinforcement-learning (IRL) framework to identify an animal's behavioral strategy from behavioral time-series data. We applied this framework to C. elegans thermotactic behavior; after cultivation at a constant temperature with or without food, fed worms prefer, while starved worms avoid, the cultivation temperature on a thermal gradient. Our IRL approach revealed that the fed worms used both the absolute temperature and its temporal derivative and that their behavior involved two strategies: directed migration (DM) and isothermal migration (IM). With DM, worms efficiently reached specific temperatures, which explains their thermotactic behavior when fed. With IM, worms moved along a constant temperature, which reflects isothermal tracking, well observed in previous studies. In contrast to fed animals, starved worms escaped the cultivation temperature using only the absolute temperature, but not its temporal derivative. We also investigated the neural basis underlying these strategies by applying our method to thermosensory-neuron-deficient worms. Thus, our IRL-based approach is useful in identifying animal strategies from behavioral time-series data and could be applied to a wide range of behavioral studies, including on decision-making, in other organisms.

  19. Theoretical Foundations of Learning Environments. Second Edition

    Science.gov (United States)

    Jonassen, David, Ed.; Land, Susan, Ed.

    2012-01-01

    "Theoretical Foundations of Learning Environments" provides students, faculty, and instructional designers with a clear, concise introduction to the major pedagogical and psychological theories and their implications for the design of new learning environments for schools, universities, or corporations. Leading experts describe the most…

  20. Projective Simulation compared to reinforcement learning

    OpenAIRE

    Bjerland, Øystein Førsund

    2015-01-01

    This thesis explores the model of projective simulation (PS), a novel approach for an artificial intelligence (AI) agent. The model of PS learns by interacting with the environment it is situated in, and allows for simulating actions before real action is taken. The action selection is based on a random walk through the episodic & compositional memory (ECM), which is a network of clips that represent previously experienced percepts. The network takes percepts as inpu...

  1. Toward Project-based Learning and Team Formation in Open Learning Environments

    NARCIS (Netherlands)

    Spoelstra, Howard; Van Rosmalen, Peter; Sloep, Peter

    2014-01-01

    Open Learning Environments, MOOCs, as well as Social Learning Networks, embody a new approach to learning. Although both emphasise interactive participation, somewhat surprisingly, they do not readily support bond creating and motivating collaborative learning opportunities. Providing project-based

  2. The Internet: A Learning Environment.

    Science.gov (United States)

    McGreal, Rory

    1997-01-01

    The Internet environment is suitable for many types of learning activities and teaching and learning styles. Every World Wide Web-based course should provide: home page; introduction; course overview; course requirements, vital information; roles and responsibilities; assignments; schedule; resources; sample tests; teacher biography; course…

  3. Depression, Activity, and Evaluation of Reinforcement

    Science.gov (United States)

    Hammen, Constance L.; Glass, David R., Jr.

    1975-01-01

    This research attempted to find the causal relation between mood and level of reinforcement. An effort was made to learn what mood change might occur if depressed subjects increased their levels of participation in reinforcing activities. (Author/RK)

  4. The Relationship among Self-Regulated Learning, Procrastination, and Learning Behaviors in Blended Learning Environment

    Science.gov (United States)

    Yamada, Masanori; Goda, Yoshiko; Matsuda, Takeshi; Kato, Hiroshi; Miyagawa, Hiroyuki

    2015-01-01

    This research aims to investigate the relationship among the awareness of self-regulated learning (SRL), procrastination, and learning behaviors in a blended learning environment. One hundred seventy-nine freshmen participated in this research, conducted in a blended-learning-style class using a learning management system. Data collection was…

  5. EDUCATION REFORMS TOWARDS 21ST CENTURY SKILLS: TRANSFORMING STUDENTS' LEARNING EXPERIENCES THROUGH EFFECTIVE LEARNING ENVIRONMENTS

    OpenAIRE

    Harriet Wambui Njui

    2018-01-01

    This paper reviews literature on learning environments with a view to making recommendations on how teachers could create effective and high-quality learning environments that provide learners with transformative learning experiences as they go through the process of education. An effective learning environment is critical because quality education, which is essential to real learning and human development, is influenced by factors both inside and outside the classroom. Learning institutions ...

  6. A Design Framework for Personal Learning Environments

    NARCIS (Netherlands)

    Rahimi, E.

    2015-01-01

    The purpose of our research was to develop a PLE (personal learning environment) design framework for workplace settings. By doing such, the research has answered this research question, how should a technology-based personal learning environment be designed, aiming at supporting learners to gain

  7. A Study on Students’ Views On Blended Learning Environment

    Directory of Open Access Journals (Sweden)

    Meryem YILMAZ SOYLU

    2006-07-01

    In the 21st century, information and communication technologies (ICT) have developed rapidly and have influenced most fields, including education. ICT have offered a favorable environment for the development and use of various methods and tools. With the developments in technology, blended learning has gained considerable popularity in recent years, bringing with it descriptions of particular forms of teaching with technology. Blended learning is defined simply as a learning environment that combines technology with face-to-face learning; in other words, blended learning means using a variety of delivery methods to best meet the course objectives by combining face-to-face teaching in a traditional classroom with teaching online. This article examines students' views on a blended learning environment. The study was conducted with 64 students from the Department of Computer Education and Instructional Technologies, enrolled in Instructional Design and Authoring Languages in PC Environment at Hacettepe University in the 2005–2006 fall semester. The results showed that the students enjoyed taking part in the blended learning environment. Students' achievement levels and their frequency of participation in the forum affected their views about the blended learning environment. Face-to-face interaction in the blended learning application had the highest score. This result demonstrates the importance of interaction and communication for the success of online learning.

  8. Study Circles in Online Learning Environment in the Spirit of Learning-Centered Approach

    Directory of Open Access Journals (Sweden)

    Simándi Szilvia

    2017-08-01

    Full Text Available Introduction: In the era of the information society and knowledge economy, learning in non-formal environments takes on a highlighted role: it can supplement, replace or raise to a higher level the knowledge and skills gained in the school system (Forray & Juhász, 2008), as so-called “valid” knowledge changes significantly due to the acceleration of development. With the appearance of information technology means and their booming development, the possibilities of gaining information have widened and, according to forecasts, the role of learning communities will grow. Purpose: Our starting point is that today, with the involvement of community sites (e.g. Google+, Facebook, etc.), there is a new possibility for inspiring learning communities: by utilizing the power of community and the possibilities of network-based learning (Ollé & Lévai, 2013). Methods: We intend to make a synthesis based on former research and literature focusing on the learning-centered approach, online learning environments, learning communities and study circles (Noesgaard & Ørngreen, 2015; Biggs & Tang, 2007; Kindström, 2010). Conclusions: The online learning environment can be well utilized for community learning. In the online learning environment, the process of learning is built on activity-oriented work, for which active participation and intensive, initiative-taking communication are necessary, and cooperative and collaborative learning play an important role.

  9. A Plant Control Technology Using Reinforcement Learning Method with Automatic Reward Adjustment

    Science.gov (United States)

    Eguchi, Toru; Sekiai, Takaaki; Yamada, Akihiro; Shimizu, Satoru; Fukai, Masayuki

    A control technology using Reinforcement Learning (RL) and a Radial Basis Function (RBF) Network has been developed to reduce the environmental load substances exhausted from power and industrial plants. This technology consists of a statistic model using an RBF Network, which estimates the characteristics of plants with respect to environmental load substances, and an RL agent, which learns the control logic for the plants using the statistic model. In this technology, it is necessary to design an appropriate reward function, given to the agent according to operating conditions and control goals, in order to control plants flexibly. Therefore, we propose an automatic reward adjusting method of RL for plant control. This method adjusts the reward function automatically using information from the statistic model obtained during its learning process. In simulations, it is confirmed that the proposed method can adjust the reward function adaptively for several test functions, and executes robust control of a thermal power plant under changing operating conditions and control goals.
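
    The record above describes the method only at a high level; as a concrete illustration, the fragment below pairs a toy RBF surrogate model of a plant with a reward whose scale is re-derived from the model's own output statistics. Everything here (the 1-D input, the Gaussian features, the normalisation rule) is an assumption for illustration, not the authors' implementation.

        import numpy as np

        def rbf_features(x, centers, width=0.5):
            # Gaussian radial basis activations for a scalar operating variable x
            return np.exp(-((x - centers) ** 2) / (2 * width ** 2))

        centers = np.linspace(0.0, 1.0, 10)      # RBF centers over the input range (assumed)
        weights = np.random.randn(10) * 0.1      # model weights; trained in the real method

        def plant_model(x):
            # statistic model: predicted environmental load for control input x
            return rbf_features(x, centers) @ weights

        # Automatic reward adjustment: normalise the raw objective by the spread of
        # model predictions seen during learning, so the reward keeps a usable scale
        # when operating conditions or control goals change.
        samples = np.array([plant_model(x) for x in np.random.rand(200)])
        scale = samples.std() + 1e-8

        def reward(x, target):
            return -abs(plant_model(x) - target) / scale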

  10. The effects of different learning environments on students' motivation for learning and their achievement.

    Science.gov (United States)

    Baeten, Marlies; Dochy, Filip; Struyven, Katrien

    2013-09-01

    Research in higher education on the effects of student-centred versus lecture-based learning environments generally does not take into account the psychological need support provided in these learning environments. From a self-determination theory perspective, need support is important to study because it has been associated with benefits such as autonomous motivation and achievement. The purpose of the study is to investigate the effects of different learning environments on students' motivation for learning and achievement, while taking into account the perceived need support. First-year student teachers (N= 1,098) studying a child development course completed questionnaires assessing motivation and perceived need support. In addition, a prior knowledge test and case-based assessment were administered. A quasi-experimental pre-test/post-test design was set up consisting of four learning environments: (1) lectures, (2) case-based learning (CBL), (3) alternation of lectures and CBL, and (4) gradual implementation with lectures making way for CBL. Autonomous motivation and achievement were higher in the gradually implemented CBL environment, compared to the CBL environment. Concerning achievement, two additional effects were found; students in the lecture-based learning environment scored higher than students in the CBL environment, and students in the gradually implemented CBL environment scored higher than students in the alternated learning environment. Additionally, perceived need support was positively related to autonomous motivation, and negatively to controlled motivation. The study shows the importance of gradually introducing students to CBL, in terms of their autonomous motivation and achievement. Moreover, the study emphasizes the importance of perceived need support for students' motivation. © 2012 The British Psychological Society.

  11. Personal Learning Environments: A Solution for Self-Directed Learners

    Science.gov (United States)

    Haworth, Ryan

    2016-01-01

    In this paper I discuss "personal learning environments" and their diverse benefits, uses, and implications for life-long learning. Personal Learning Environments (PLEs) are Web 2.0 and social media technologies that enable individual learners to manage their own learning. Self-directed learning is explored as a foundation…

  12. Information literacy experiences inside virtual learning environments

    Directory of Open Access Journals (Sweden)

    Patricia Hernández Salazar

    2016-03-01

    Full Text Available Objective. To suggest the use of virtual learning environments as an Information Literacy (IL) alternative. Method. Analysis of the main elements of web sites. To achieve this purpose the article covers the relationship between IL and virtual learning environments (by defining both phrases); the phases of creating virtual IL programs; the processes of elaborating didactic media; the applications that may support such plans; and a description of eleven examples of IL experiences in virtual learning environments from four countries (Mexico, the United States of America, Spain and the United Kingdom) that fulfill the conditions expressed. Results. We obtained four comparative tables examining five elements of each experience: objectives; target community; institution; country; and platform used. Conclusions. Any IL proposal should have a clear definition; IL experiences have to follow a systematic didactic process; the described experiences are based on an IL definition; the experiences analyzed are similar; and virtual learning environments can be used as IL alternatives.

  13. Measuring the clinical learning environment in anaesthesia.

    Science.gov (United States)

    Smith, N A; Castanelli, D J

    2015-03-01

    The learning environment describes the way that trainees perceive the culture of their workplace. We audited the learning environment for trainees throughout Australia and New Zealand in the early stages of curriculum reform. A questionnaire was developed and sent electronically to a large random sample of Australian and New Zealand College of Anaesthetists trainees, with a 26% final response rate. This new instrument demonstrated good psychometric properties, with Cronbach's α ranging from 0.81 to 0.91 for each domain. The median score was equivalent to 78%, with the majority of trainees giving scores in the medium range. Introductory respondents scored their learning environment more highly than all other levels of respondents (P=0.001 for almost all comparisons). We present a simple questionnaire instrument that can be used to determine characteristics of the anaesthesia learning environment. The instrument can be used to help assess curricular change over time, alignment of the formal and informal curricula and strengths and weaknesses of individual departments.

  14. Brain Circuits of Methamphetamine Place Reinforcement Learning: The Role of the Hippocampus-VTA Loop.

    Science.gov (United States)

    Keleta, Yonas B; Martinez, Joe L

    2012-03-01

    The reinforcing effects of addictive drugs, including methamphetamine (METH), involve the midbrain ventral tegmental area (VTA). The VTA is the primary source of dopamine (DA) to the nucleus accumbens (NAc) and the ventral hippocampus (VHC). These three brain regions are functionally connected through the hippocampal-VTA loop, which includes two main neural pathways: the bottom-up pathway and the top-down pathway. In this paper, we take the view that addiction is a learning process. Therefore, we tested the involvement of the hippocampus in reinforcement learning by studying conditioned place preference (CPP) learning, sequentially conditioning each of the three nuclei in either the bottom-up order (VTA, then VHC, finally NAc) or the top-down order (VHC, then VTA, finally NAc). Following habituation, the rats underwent experimental modules consisting of two conditioning trials, each followed by immediate testing (test 1 and test 2), and two additional tests 24 h (test 3) and/or 1 week following conditioning (test 4). The module was repeated three times for each nucleus. The results showed that METH, but not Ringer's, produced positive CPP following conditioning of each brain area in the bottom-up order. In the top-down order, METH, but not Ringer's, produced either an aversive CPP or no learning effect following conditioning of each nucleus of interest. In addition, METH place aversion was antagonized by coadministration of the N-methyl-d-aspartate (NMDA) receptor antagonist MK801, suggesting that the aversion learning was an NMDA receptor activation-dependent process. We conclude that the hippocampus is a critical structure in the reward circuit and hence suggest that the development of target-specific therapeutics for the control of addiction should emphasize the hippocampus-VTA top-down connection.

  15. Reinforcement learning of self-regulated β-oscillations for motor restoration in chronic stroke

    Directory of Open Access Journals (Sweden)

    Georgios eNaros

    2015-07-01

    Full Text Available Neurofeedback training of motor imagery-related brain states with brain-machine interfaces (BMI) is currently being explored prior to standard physiotherapy to improve the motor outcome of stroke rehabilitation. Pilot studies suggest that such a priming intervention before physiotherapy might increase the responsiveness of the brain to the subsequent physiotherapy, thereby improving the clinical outcome. However, there is little evidence up to now that these BMI-based interventions have achieved operant conditioning of specific brain states that facilitate task-specific functional gains beyond the practice of primed physiotherapy. In this context, we argue that BMI technology needs to aim at physiological features relevant for the targeted behavioral gain. Moreover, this therapeutic intervention has to be informed by concepts of reinforcement learning to develop its full potential. Such a refined neurofeedback approach would need to address the following issues: (1) defining a physiological feedback target specific to the intended behavioral gain, e.g. β-band oscillations for cortico-muscular communication; this targeted brain state could well be different from the brain state optimal for the neurofeedback task; (2) selecting a BMI classification and thresholding approach on the basis of learning principles, i.e. balancing challenge and reward of the neurofeedback task instead of maximizing the classification accuracy of the feedback device; (3) adjusting the feedback in the course of the training period to account for the cognitive load and the learning experience of the participant. The proposed neurofeedback strategy provides evidence for the feasibility of the suggested approach by demonstrating that dynamic threshold adaptation based on reinforcement learning may lead to frequency-specific operant conditioning of β-band oscillations paralleled by task-specific motor improvement; a proposal that requires investigation in a larger cohort of stroke patients.
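
    One way to read the "dynamic threshold adaptation" above is as an adaptive-staircase rule that keeps the participant's success rate near a target balancing challenge and reward. The sketch below is a hypothetical rendering under that reading; the simulated β-power distribution, target rate and step size are all assumed values, not the study's parameters.

        import numpy as np

        rng = np.random.default_rng(0)
        threshold, target_rate, step = 0.5, 0.7, 0.01   # assumed values
        rate = target_rate                              # running success-rate estimate

        for trial in range(1000):
            beta_power = rng.normal(0.55, 0.15)         # simulated per-trial feature value
            success = float(beta_power > threshold)     # feedback is given on success
            rate = 0.95 * rate + 0.05 * success
            threshold += step * (rate - target_rate)    # harder when too easy, easier when too hard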

  16. Beyond adaptive-critic creative learning for intelligent mobile robots

    Science.gov (United States)

    Liao, Xiaoqun; Cao, Ming; Hall, Ernest L.

    2001-10-01

    Intelligent industrial and mobile robots may be considered proven technology in structured environments. Teach programming and supervised learning methods permit solutions to a variety of applications. However, we believe that extending the operation of these machines to more unstructured environments requires a new learning method. Both unsupervised learning and reinforcement learning are potential candidates for these new tasks. The adaptive critic method has been shown to provide useful approximations or even optimal control policies for non-linear systems. The purpose of this paper is to explore the use of new learning methods that go beyond the adaptive critic method for unstructured environments. The adaptive critic is a form of reinforcement learning: a critic element provides only high-level grading corrections to a cognition module that controls the action module. In the proposed system the critic's grades are modeled and forecasted, so that an anticipated set of sub-grades is available to the cognition module. The forecast grades are interpolated and are available on the time scale needed by the action module. The success of the system is highly dependent on the accuracy of the forecast grades and the adaptability of the action module. Examples from the guidance of a mobile robot are provided to illustrate the method for simple line following and for the more complex navigation and control in an unstructured environment. The theory presented that goes beyond the adaptive critic may be called creative theory. Creative theory is a form of learning that models the highest level of human learning - imagination. Creative theory appears to apply not only to mobile robots but also to many other forms of human endeavor, such as educational learning and business forecasting. Reinforcement learning such as the adaptive critic may be applied to known problems to aid in the discovery of their solutions. The significance of creative theory is that it

  17. University Libraries and Digital Learning Environments

    OpenAIRE

    2011-01-01

    University libraries around the world have embraced the possibilities of the digital learning environment, facilitating its use and proactively seeking to develop the provision of electronic resources and services. The digital environment offers opportunities and challenges for librarians in all aspects of their work – in information literacy, virtual reference, institutional repositories, e-learning, managing digital resources and social media. The authors in this timely book are leading exp...

  18. Learning How to Design a Technology Supported Inquiry-Based Learning Environment

    Science.gov (United States)

    Hakverdi-Can, Meral; Sonmez, Duygu

    2012-01-01

    This paper describes a study focusing on pre-service teachers' experience of learning how to design a technology supported inquiry-based learning environment using the Internet. As part of their elective course, pre-service science teachers were asked to develop a WebQuest environment targeting middle school students. A WebQuest is an…

  19. Reading a Story: Different Degrees of Learning in Different Learning Environments

    Directory of Open Access Journals (Sweden)

    Anna Maria Giannini

    2017-10-01

    Full Text Available The learning environment in which material is acquired may produce differences in delayed recall and in the elements that individuals focus on. These differences may appear even during development. In the present study, we compared three different learning environments in 450 normally developing 7-year-old children subdivided into three groups according to the type of learning environment. Specifically, children were asked to learn the same material shown in three different learning environments: reading illustrated books (TB); interacting with the same text displayed on a PC monitor and enriched with interactive activities (PC-IA); reading the same text on a PC monitor but not enriched with interactive narratives (PC-NoIA). Our results demonstrated that TB and PC-NoIA elicited better verbal memory recall. In contrast, PC-IA and PC-NoIA produced higher scores for visuo-spatial memory, enhancing memory for spatial relations, positions and colors with respect to TB. Interestingly, only TB seemed to produce a deeper comprehension of the story’s moral. Our results indicated that PC-IA offered a different type of learning that favored visual details. In this sense, interactive activities demonstrate certain limitations, probably due to information overabundance, emotional mobilization, emphasis on images and effort exerted in interactive activities. Thus, interactive activities, although entertaining, act as disruptive elements which interfere with verbal memory and deep moral comprehension.

  20. Reading a Story: Different Degrees of Learning in Different Learning Environments.

    Science.gov (United States)

    Giannini, Anna Maria; Cordellieri, Pierluigi; Piccardi, Laura

    2017-01-01

    The learning environment in which material is acquired may produce differences in delayed recall and in the elements that individuals focus on. These differences may appear even during development. In the present study, we compared three different learning environments in 450 normally developing 7-year-old children subdivided into three groups according to the type of learning environment. Specifically, children were asked to learn the same material shown in three different learning environments: reading illustrated books (TB); interacting with the same text displayed on a PC monitor and enriched with interactive activities (PC-IA); reading the same text on a PC monitor but not enriched with interactive narratives (PC-NoIA). Our results demonstrated that TB and PC-NoIA elicited better verbal memory recall. In contrast, PC-IA and PC-NoIA produced higher scores for visuo-spatial memory, enhancing memory for spatial relations, positions and colors with respect to TB. Interestingly, only TB seemed to produce a deeper comprehension of the story's moral. Our results indicated that PC-IA offered a different type of learning that favored visual details. In this sense, interactive activities demonstrate certain limitations, probably due to information overabundance, emotional mobilization, emphasis on images and effort exerted in interactive activities. Thus, interactive activities, although entertaining, act as disruptive elements which interfere with verbal memory and deep moral comprehension.

  1. The fluidities of digital learning environments and resources

    DEFF Research Database (Denmark)

    Hansbøl, Mikala

    2012-01-01

    The research project “Educational cultures and serious games on a global market place” (2009-2011) dealt with the challenge of the digital learning environment, and hence its educational development space, always existing outside the present space and hence scope of activities. With a reference...... and establishments of the virtual universe called Mingoville.com, the research shows a need to include in researchers’ conceptualizations of digital learning environments and resources their shifting materialities and platformations, and hence emerging (often unpredictable) agencies and educational development...... spaces. Keywords: Fluidity, digital learning environment, digital learning resource, educational development space...

  2. E-Learning Systems, Environments and Approaches

    OpenAIRE

    Isaias, P.; Spector, J.M.; Ifenthaler, D.; Sampson, D.G.

    2015-01-01

    The volume consists of twenty-five chapters selected from among peer-reviewed papers presented at the CELDA (Cognition and Exploratory Learning in the Digital Age) 2013 Conference held in Fort Worth, Texas, USA, in October 2013 and also from world class scholars in e-learning systems, environments and approaches. The following sub-topics are included: Exploratory Learning Technologies (Part I), e-Learning social web design (Part II), Learner communities through e-Learning implementations (Par...

  3. Students’ Preferred Characteristics of Learning Environments in Vocational Secondary Education

    OpenAIRE

    Ingeborg Placklé; Karen D. Könings; Wolfgang Jacquet; Katrien Struyven; Arno Libotton; Jeroen J. G. van Merriënboer; Nadine Engels

    2014-01-01

    If teachers and teacher educators are willing to support the learning of students, it is important for them to learn what motivates students to engage in learning. Students have their own preferences on design characteristics of powerful learning environments in vocational education. We developed an instrument – the Inventory Powerful Learning Environments in Vocational Education - to measure students’ preferences on characteristics of powerful learning environments in vocational education. W...

  4. Students Preferred Characteristics of Learning Environments in Vocational Secondary Education

    OpenAIRE

    Placklé, Ingeborg

    2014-01-01

    If teachers and teacher educators are willing to support the learning of students, it is important for them to learn what motivates students to engage in learning. Students have their own preferences on design characteristics of powerful learning environments in vocational education. We developed an instrument - the Inventory Powerful Learning Environments in Vocational Education - to measure students' preferences on characteristics of powerful learning environments in vocational education. ...

  5. Computer-Assisted Language Learning: Diversity in Research and Practice

    Science.gov (United States)

    Stockwell, Glenn, Ed.

    2012-01-01

    Computer-assisted language learning (CALL) is an approach to teaching and learning languages that uses computers and other technologies to present, reinforce, and assess material to be learned, or to create environments where teachers and learners can interact with one another and the outside world. This book provides a much-needed overview of the…

  6. The clinical learning environment in nursing education: a concept analysis.

    Science.gov (United States)

    Flott, Elizabeth A; Linden, Lois

    2016-03-01

    The aim of this study was to report an analysis of the clinical learning environment concept. Nursing students are evaluated in clinical learning environments where skills and knowledge are applied to patient care. These environments affect achievement of learning outcomes, and have an impact on preparation for practice and student satisfaction with the nursing profession. Providing clarity of this concept for nursing education will assist in identifying antecedents, attributes and consequences affecting student transition to practice. The clinical learning environment was investigated using Walker and Avant's concept analysis method. A literature search was conducted using WorldCat, MEDLINE and CINAHL databases using the keywords clinical learning environment, clinical environment and clinical education. Articles reviewed were written in English and published in peer-reviewed journals between 1995-2014. All data were analysed for recurring themes and terms to determine possible antecedents, attributes and consequences of this concept. The clinical learning environment contains four attribute characteristics affecting student learning experiences. These include: (1) the physical space; (2) psychosocial and interaction factors; (3) the organizational culture and (4) teaching and learning components. These attributes often determine achievement of learning outcomes and student self-confidence. With better understanding of attributes comprising the clinical learning environment, nursing education programmes and healthcare agencies can collaborate to create meaningful clinical experiences and enhance student preparation for the professional nurse role. © 2015 John Wiley & Sons Ltd.

  7. Advanced Training Technologies and Learning Environments

    Science.gov (United States)

    Noor, Ahmed K. (Compiler); Malone, John B. (Compiler)

    1999-01-01

    This document contains the proceedings of the Workshop on Advanced Training Technologies and Learning Environments held at NASA Langley Research Center, Hampton, Virginia, March 9-10, 1999. The workshop was jointly sponsored by the University of Virginia's Center for Advanced Computational Technology and NASA. Workshop attendees were from NASA, other government agencies, industry, and universities. The objective of the workshop was to assess the status and effectiveness of different advanced training technologies and learning environments.

  8. Perceived Satisfaction, Perceived Usefulness and Interactive Learning Environments as Predictors to Self-Regulation in e-Learning Environments

    Science.gov (United States)

    Liaw, Shu-Sheng; Huang, Hsiu-Mei

    2013-01-01

    The research purpose is to investigate learner self-regulation in e-learning environments. In order to better understand learner attitudes toward e-learning, 196 university students answered a questionnaire survey after using an e-learning system for a few months. The statistical results showed that perceived satisfaction, perceived usefulness, and…

  9. Off-Policy Reinforcement Learning for Synchronization in Multiagent Graphical Games.

    Science.gov (United States)

    Li, Jinna; Modares, Hamidreza; Chai, Tianyou; Lewis, Frank L; Xie, Lihua

    2017-10-01

    This paper develops an off-policy reinforcement learning (RL) algorithm to solve the optimal synchronization of multiagent systems. This is accomplished by using the framework of graphical games. In contrast to traditional control protocols, which require complete knowledge of the agent dynamics, the proposed off-policy RL algorithm is a model-free approach, in that it solves the optimal synchronization problem without requiring any knowledge of the agent dynamics. A prescribed control policy, called the behavior policy, is applied to each agent to generate and collect data for learning. An off-policy Bellman equation is derived for each agent to learn the value function for the policy under evaluation, called the target policy, and to find an improved policy simultaneously. Actor and critic neural networks, along with a least-squares approach, are employed to approximate the target control policies and value functions using the data generated by applying the prescribed behavior policies. Finally, an off-policy RL algorithm is presented that is implemented in real time and gives the approximate optimal control policy for each agent using only measured data. It is shown that the optimal distributed policies found by the proposed algorithm satisfy the global Nash equilibrium and synchronize all agents to the leader. Simulation results illustrate the effectiveness of the proposed method.
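
    The behavior/target-policy split described above is the defining feature of off-policy RL. The toy fragment below shows that split in its simplest tabular form (a random behavior policy collects data while a greedy target policy is evaluated and improved); the chain MDP is an assumption for illustration and has none of the paper's continuous multiagent structure.

        import numpy as np

        n_states, n_actions = 5, 2
        Q = np.zeros((n_states, n_actions))
        alpha, gamma = 0.1, 0.9
        rng = np.random.default_rng(0)

        def step(s, a):
            # toy chain: action 1 moves right, action 0 moves left; reward at the right end
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            return s2, float(s2 == n_states - 1)

        s = 0
        for _ in range(5000):
            a = rng.integers(n_actions)          # behavior policy: purely exploratory
            s2, r = step(s, a)
            # the max over Q[s2] evaluates the greedy *target* policy: off-policy learning
            Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
            s = 0 if s2 == n_states - 1 else s2

        print(Q.argmax(axis=1))                  # greedy target policy per state (1 = right)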

  10. The Effects of Integrating Social Learning Environment with Online Learning

    Science.gov (United States)

    Raspopovic, Miroslava; Cvetanovic, Svetlana; Medan, Ivana; Ljubojevic, Danijela

    2017-01-01

    The aim of this paper is to present the learning and teaching styles using the Social Learning Environment (SLE), which was developed based on the computer supported collaborative learning approach. To avoid burdening learners with multiple platforms and tools, SLE was designed and developed in order to integrate existing systems, institutional…

  11. Incremental learning of concept drift in nonstationary environments.

    Science.gov (United States)

    Elwell, Ryan; Polikar, Robi

    2011-10-01

    We introduce an ensemble-of-classifiers approach for incremental learning of concept drift, characterized by nonstationary environments (NSEs), where the underlying data distributions change over time. The proposed algorithm, named Learn++.NSE, learns from consecutive batches of data without making any assumptions on the nature or rate of drift; it can learn from environments that experience a constant or variable rate of drift, addition or deletion of concept classes, as well as cyclical drift. The algorithm learns incrementally, like other members of the Learn++ family of algorithms, that is, without requiring access to previously seen data. Learn++.NSE trains one new classifier for each batch of data it receives, and combines these classifiers using a dynamically weighted majority vote. The novelty of the approach is in determining the voting weights, based on each classifier's time-adjusted accuracy on current and past environments. This approach allows the algorithm to recognize, and act accordingly to, changes in the underlying data distributions, as well as to a possible reoccurrence of an earlier distribution. We evaluate the algorithm on several synthetic datasets designed to simulate a variety of nonstationary environments, as well as a real-world weather prediction dataset. Comparisons with several other approaches are also included. Results indicate that Learn++.NSE can track the changing environments very closely, regardless of the type of concept drift. To allow future use, comparison and benchmarking by interested researchers, we also release the data used in this paper. © 2011 IEEE.
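
    A minimal sketch of the dynamically weighted majority vote described above, assuming binary labels and reducing the published time-adjusted weighting (which averages per-batch errors through a sigmoid) to a plain recency-weighted accuracy; scikit-learn decision trees stand in for the base classifiers. This illustrates the voting idea, not the published Learn++.NSE code.

        import numpy as np
        from sklearn.tree import DecisionTreeClassifier

        class SimpleNSE:
            def __init__(self, decay=0.5):
                self.models, self.weights, self.decay = [], [], decay

            def partial_fit(self, X, y):
                # re-score earlier experts on the newest batch: recency-weighted accuracy
                for i, m in enumerate(self.models):
                    acc = float((m.predict(X) == y).mean())
                    self.weights[i] = self.decay * self.weights[i] + (1 - self.decay) * acc
                self.models.append(DecisionTreeClassifier(max_depth=3).fit(X, y))
                self.weights.append(1.0)         # a fresh expert starts fully trusted

            def predict(self, X):
                # dynamically weighted majority vote over all experts (binary labels)
                for_1 = sum(w * (m.predict(X) == 1) for m, w in zip(self.models, self.weights))
                for_0 = sum(w * (m.predict(X) == 0) for m, w in zip(self.models, self.weights))
                return (for_1 > for_0).astype(int)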

  12. A SIMULTANEOUS MOBILE E-LEARNING ENVIRONMENT AND APPLICATION

    Directory of Open Access Journals (Sweden)

    Hasan KARAL

    2010-04-01

    Full Text Available The purpose of the present study was to design a mobile learning environment that enables the use of a teleconference application used in simultaneous e-learning with mobile devices, and to evaluate this mobile learning environment based on students’ views. With the mobile learning environment developed in the study, students are able to follow a teleconference application realized by using appropriate mobile devices. The study was carried out with 8 post-graduate students enrolled at Karadeniz Technical University (KTU), Department of Computer Education and Instructional Technologies (CEIT), Graduate School of Natural and Applied Science. The students utilized this teleconference application using mobile devices supporting internet access and Adobe Flash technology. Of the 8 students, 4 accessed the system using EDGE technology and 4 used wireless internet technology. At the end of the application, the audio and display were delayed by 4-5 seconds with EDGE technology, and by 7-8 seconds with wireless internet technology. Based on the students’ views, it was concluded that the environment had some deficiencies in terms of quality, especially in terms of screen resolution. Despite this, the students reported that this environment could provide more flexibility in terms of space and time when compared to other simultaneous distance education applications. Although the environment enables interaction, the problem of resolution caused by screen size, in particular, is a disadvantage for the system. When this mobile learning application is compared to conventional education environments, it was found that mobile learning does have a role in helping students overcome the problems of participating in learning activities caused by time and space constraints.

  13. Sociocultural Perspective of Science in Online Learning Environments. Communities of Practice in Online Learning Environments

    Science.gov (United States)

    Erdogan, Niyazi

    2016-01-01

    The present study reviews empirical research studies related to learning science in online learning environments as a community. Studies published between 1995 and 2015 were searched using the ERIC and EBSCOhost databases. As a result, fifteen studies were selected for review. The identified studies were analyzed using a qualitative content analysis method…

  14. Investigation of the Relationship between Learning Process and Learning Outcomes in E-Learning Environments

    Science.gov (United States)

    Yurdugül, Halil; Menzi Çetin, Nihal

    2015-01-01

    Problem Statement: Learners can access and participate in online learning environments regardless of time and geographical barriers. This brings up the umbrella concept of learner autonomy that contains self-directed learning, self-regulated learning and the studying process. Motivation and learning strategies are also part of this umbrella…

  15. Finding intrinsic rewards by embodied evolution and constrained reinforcement learning.

    Science.gov (United States)

    Uchibe, Eiji; Doya, Kenji

    2008-12-01

    Understanding the design principle of reward functions is a substantial challenge both in artificial intelligence and neuroscience. Successful acquisition of a task usually requires not only rewards for goals, but also for intermediate states to promote effective exploration. This paper proposes a method for designing 'intrinsic' rewards of autonomous agents by combining constrained policy gradient reinforcement learning and embodied evolution. To validate the method, we use Cyber Rodent robots, in which collision avoidance, recharging from battery packs, and 'mating' by software reproduction are three major 'extrinsic' rewards. We show in hardware experiments that the robots can find appropriate 'intrinsic' rewards for the vision of battery packs and other robots to promote approach behaviors.
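
    Structurally, the method combines an outer evolutionary search over intrinsic-reward parameters with an inner RL learner whose fitness is judged on extrinsic reward alone. The fragment below sketches that two-loop structure on a toy three-action task; the task, the learner and all constants are assumptions, and the constrained policy gradient and hardware embodiment of the actual study are omitted.

        import numpy as np

        rng = np.random.default_rng(5)

        def inner_rl(intrinsic, episodes=200):
            # toy 3-action task: only action 2 ("recharge") yields extrinsic reward;
            # the evolved intrinsic rewards bias which actions the learner values
            q, extrinsic_total = np.zeros(3), 0.0
            for _ in range(episodes):
                a = q.argmax() if rng.random() > 0.2 else rng.integers(3)
                r_ext = 1.0 if a == 2 else 0.0
                extrinsic_total += r_ext
                q[a] += 0.1 * (r_ext + intrinsic[a] - q[a])
            return extrinsic_total               # fitness counts extrinsic reward only

        pop = rng.normal(0.0, 0.5, size=(10, 3)) # population of intrinsic-reward vectors
        for gen in range(30):
            fitness = np.array([inner_rl(w) for w in pop])
            parents = pop[np.argsort(fitness)[-5:]]    # keep the better half
            pop = np.vstack([parents, parents + rng.normal(0, 0.1, parents.shape)])

        print(pop.mean(axis=0))                  # evolved intrinsic-reward weights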

  16. Spike-based decision learning of Nash equilibria in two-player games.

    Directory of Open Access Journals (Sweden)

    Johannes Friedrich

    Full Text Available Humans and animals face decision tasks in an uncertain multi-agent environment where an agent's strategy may change in time due to the co-adaptation of others' strategies. The neuronal substrate and the computational algorithms underlying such adaptive decision making, however, are largely unknown. We propose a population coding model of spiking neurons with a policy gradient procedure that successfully acquires optimal strategies for classical game-theoretical tasks. The suggested population reinforcement learning reproduces data from human behavioral experiments for the blackjack and the inspector game. It performs optimally according to a pure (deterministic) and mixed (stochastic) Nash equilibrium, respectively. In contrast, temporal-difference (TD) learning, covariance learning, and basic reinforcement learning fail to perform optimally for the stochastic strategy. Spike-based population reinforcement learning, shown to follow the stochastic reward gradient, is therefore a viable candidate to explain automated decision learning of a Nash equilibrium in two-player games.
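
    The mixed (stochastic) Nash equilibrium result can be illustrated without spiking neurons: in the sketch below, two plain REINFORCE learners playing matching pennies follow the stochastic reward gradient and hover around the 50/50 mixed equilibrium. This is a toy analogue of the paper's claim, not its population-coding model.

        import numpy as np

        rng = np.random.default_rng(1)
        theta = np.zeros(2)                      # logits of P(heads) for the two players
        lr = 0.05

        for _ in range(20000):
            p = 1.0 / (1.0 + np.exp(-theta))     # each player's probability of heads
            a = (rng.random(2) < p).astype(float)
            r0 = 1.0 if a[0] == a[1] else -1.0   # the matcher wins when choices agree
            r = np.array([r0, -r0])              # zero-sum payoffs
            theta += lr * r * (a - p)            # REINFORCE: reward times grad-log-policy

        print(1.0 / (1.0 + np.exp(-theta)))      # hovers near [0.5, 0.5], the mixed equilibrium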

  17. Towards an intelligent environment for distance learning

    Directory of Open Access Journals (Sweden)

    Rafael Morales

    2009-12-01

    Full Text Available Mainstream distance learning nowadays is heavily influenced by traditional educational approaches that produce homogenised learning scenarios for all learners through learning management systems. Any differentiation between learners and personalisation of their learning scenarios is left to the teacher, who gets minimum support from the system in this respect. This way, the truly digital native, the computer, is left out of the move, unable to better support the teaching-learning processes because it is not provided with the means to transform into knowledge all the information that it stores and manages. I believe learning management systems should care for supporting adaptation and personalisation of both individual learning and the formation of communities of learning. Open learner modelling and intelligent collaborative learning environments are proposed as a means to care. The proposal is complemented with a general architecture for an intelligent environment for distance learning and an educational model based on the principles of self-management, creativity, significance and participation.

  18. Technically Speaking: Transforming Language Learning through Virtual Learning Environments (MOOs).

    Science.gov (United States)

    von der Emde, Silke; Schneider, Jeffrey; Kotter, Markus

    2001-01-01

    Draws on experiences from a 7-week exchange between students learning German at an American college and advanced students of English at a German university. Maps out the benefits of using a MOO (multiple user domains object-oriented) for language learning: a student-centered learning environment structured by such objectives as peer teaching,…

  19. INTUITEL and the Hypercube Model - Developing Adaptive Learning Environments

    Directory of Open Access Journals (Sweden)

    Kevin Fuchs

    2016-06-01

    Full Text Available In this paper we introduce an approach for the creation of adaptive learning environments that give human-like recommendations to a learner in the form of a virtual tutor. We use ontologies defining pedagogical, didactic and learner-specific data describing a learner's progress, learning history, capabilities and the learner's current state within the learning environment. Learning recommendations are based on a reasoning process on these ontologies and can be provided in real-time. The ontologies may describe learning content from any domain of knowledge. Furthermore, we describe an approach to store learning histories as spatio-temporal trajectories and to correlate them with influencing didactic factors. We show how such analysis of spatiotemporal data can be used for learning analytics to improve future adaptive learning environments.

  20. Engaging students in a community of learning: Renegotiating the learning environment.

    Science.gov (United States)

    Theobald, Karen A; Windsor, Carol A; Forster, Elizabeth M

    2018-03-01

    Promoting student engagement in a student led environment can be challenging. This article reports on the process of design, implementation and evaluation of a student led learning approach in a small group tutorial environment in a three year Bachelor of Nursing program at an Australian university. The research employed three phases of data collection. The first phase explored student perceptions of learning and engagement in tutorials. The results informed the development of a web based learning resource. Phase two centred on implementation of a community of learning approach where students were supported to lead tutorial learning with peers. The final phase constituted an evaluation of the new approach. Findings suggest that students have the capacity to lead and engage in a community of learning and to assume greater ownership and responsibility where scaffolding is provided. Nonetheless, an ongoing whole of course approach to pedagogical change would better support this form of teaching and learning innovation. Copyright © 2018 Elsevier Ltd. All rights reserved.

  1. Creating Dynamic Learning Environment to Enhance Students’ Engagement in Learning Geometry

    Science.gov (United States)

    Sariyasa

    2017-04-01

    Learning geometry gives many benefits to students. It strengthens the development of deductive thinking and reasoning; it also provides an opportunity to improve visualisation and spatial ability. Some studies, however, have pointed out the difficulties that students encounter when learning geometry. A preliminary study by the author in Bali revealed that one of the main problems was teachers’ difficulty in delivering geometry instruction, partly due to the lack of appropriate instructional media. Coupled with dynamic geometry software, dynamic learning environments are a promising solution to this problem. Employing GeoGebra software supported by a well-designed instructional process may result in more meaningful learning and, consequently, students are motivated to engage in the learning process more deeply and actively. In this paper, we provide some examples of GeoGebra-aided learning activities that allow students to interactively explore and investigate geometry concepts and the properties of geometry objects. Thus, it is expected that such a learning environment will enhance students’ internalisation process of geometry concepts.

  2. Soft Systems Methodology for Personalized Learning Environment

    Science.gov (United States)

    Nair, Uday

    2015-01-01

    There are two sides to the coin when it comes to implementing technology at universities: on one side, the university uses technologies via a virtual learning environment that seems outdated relative to the digital needs of the students, and on the other side, while implementing technology in the university learning environment the focus…

  3. Dopaminergic control of motivation and reinforcement learning: a closed-circuit account for reward-oriented behavior.

    Science.gov (United States)

    Morita, Kenji; Morishima, Mieko; Sakai, Katsuyuki; Kawaguchi, Yasuo

    2013-05-15

    Humans and animals take actions quickly when they expect that the actions lead to reward, reflecting their motivation. Injection of dopamine receptor antagonists into the striatum has been shown to slow such reward-seeking behavior, suggesting that dopamine is involved in the control of motivational processes. Meanwhile, neurophysiological studies have revealed that phasic response of dopamine neurons appears to represent reward prediction error, indicating that dopamine plays central roles in reinforcement learning. However, previous attempts to elucidate the mechanisms of these dopaminergic controls have not fully explained how the motivational and learning aspects are related and whether they can be understood by the way the activity of dopamine neurons itself is controlled by their upstream circuitries. To address this issue, we constructed a closed-circuit model of the corticobasal ganglia system based on recent findings regarding intracortical and corticostriatal circuit architectures. Simulations show that the model could reproduce the observed distinct motivational effects of D1- and D2-type dopamine receptor antagonists. Simultaneously, our model successfully explains the dopaminergic representation of reward prediction error as observed in behaving animals during learning tasks and could also explain distinct choice biases induced by optogenetic stimulation of the D1 and D2 receptor-expressing striatal neurons. These results indicate that the suggested roles of dopamine in motivational control and reinforcement learning can be understood in a unified manner through a notion that the indirect pathway of the basal ganglia represents the value of states/actions at a previous time point, an empirically driven key assumption of our model.
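
    The reward-prediction-error signal at the core of this account is the standard TD(0) error δ = r + γV(s′) − V(s). The sketch below shows, on an assumed cue-to-reward chain, how learning drives value (and hence the dopamine-like error) backward from the reward toward the cue; it is a textbook illustration, not the authors' corticobasal ganglia circuit model.

        import numpy as np

        n_states, gamma, alpha = 5, 0.95, 0.1
        V = np.zeros(n_states)                   # value of each step in a cue-to-reward chain

        for episode in range(500):
            for s in range(n_states - 1):
                r = 1.0 if s == n_states - 2 else 0.0  # reward arrives on the final transition
                delta = r + gamma * V[s + 1] - V[s]    # prediction error (dopamine-like signal)
                V[s] += alpha * delta

        print(np.round(V, 2))   # value, and hence the error, has propagated back toward the cue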

  4. Evaluation of a Learning Object Based Learning Environment in Different Dimensions

    Directory of Open Access Journals (Sweden)

    Ünal Çakıroğlu

    2009-11-01

    Full Text Available Learning Objects (LOs are web based learning resources presented by Learning Object Repositories (LOR. For recent years LOs have begun to take place on web and it is suggested that appropriate design of LOs can make positive impact on learning. In order to support learning, research studies recommends LOs should have been evaluated pedagogically and technologically, and the content design created by using LOs should have been designed through appropriate instructional models. Since the use of LOs have recently begun, an exact pedagogical model about efficient use of LOs has not been developed. In this study a LOR is designed in order to be used in mathematics education. The LOs in this LOR have been evaluated pedagogically and technologically by mathematics teachers and field experts. In order to evaluate the designed LO based environment, two different questionnaires have been used. These questionnaires are developed by using the related literature about web based learning environments evaluation criteria and also the items are discussed with the field experts for providing the validity. The reliability of the questionnaires is calculated cronbach alpha = 0.715 for the design properties evaluation survey and cronbach alpha =0.726 for pedagogic evaluation. Both of two questionnaires are five point Likert type. The first questionnaire has the items about “Learning Support of LOs, Competency of LOR, The importance of LOs in mathematics education, the usability of LOs by students”. “The activities on LOs are related to outcomes of subjects, there are activities for students have different learning styles. There are activities for wondering students.” are examples for items about learning support of LOs. “System helps for exploration of mathematical relations”, “I think teaching mathematics with this system will be enjoyable.” are example items for importance of LOs in mathematics education. In the competency of LOR title,

  5. Trading Rules on Stock Markets Using Genetic Network Programming with Reinforcement Learning and Importance Index

    Science.gov (United States)

    Mabu, Shingo; Hirasawa, Kotaro; Furuzuki, Takayuki

    Genetic Network Programming (GNP) is an evolutionary computation method which represents its solutions using graph structures. Since GNP can create quite compact programs and has an implicit memory function, it has been shown that GNP works well especially in dynamic environments. In addition, a study on creating trading rules for stock markets using GNP with an Importance Index (GNP-IMX) has been done. IMX is a new element which serves as a criterion for decision making. In this paper, we combine GNP-IMX with Actor-Critic (GNP-IMX&AC) to create trading rules for stock markets. Evolution-based methods evolve their programs only after a sufficient period of time because they must calculate fitness values; reinforcement learning, however, can change programs during the period, so the trading rules can be created efficiently. In the simulation, the proposed method is trained using the stock prices of 10 brands in 2002 and 2003. Then the generalization ability is tested using the stock prices in 2004. The simulation results show that the proposed method can obtain larger profits than GNP-IMX without AC and Buy&Hold.
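
    The Actor-Critic ingredient can be illustrated apart from the GNP graph representation: below, a tabular actor-critic learns whether to hold or stay out of a synthetic stock from a simple trend feature. The price model, the two-state feature and all constants are assumptions; the GNP program structure and the Importance Index are omitted.

        import numpy as np

        rng = np.random.default_rng(2)
        prices = 100 + np.cumsum(rng.normal(0.02, 1.0, 2000))  # synthetic price series
        pref = np.zeros((2, 2))   # actor preferences: trend state (down/up) x action (out/hold)
        V = np.zeros(2)           # critic values per state
        alpha, beta, gamma = 0.05, 0.05, 0.9

        def state(t):
            # 1 when the price sits above its 20-step mean (a stand-in feature, not IMX)
            return int(prices[t] > prices[t - 20:t].mean())

        for t in range(20, len(prices) - 1):
            s = state(t)
            p = np.exp(pref[s]) / np.exp(pref[s]).sum()         # softmax policy
            a = rng.choice(2, p=p)
            r = prices[t + 1] - prices[t] if a == 1 else 0.0    # profit only while holding
            s2 = state(t + 1)
            delta = r + gamma * V[s2] - V[s]                    # critic's TD error
            V[s] += beta * delta
            grad = -p.copy(); grad[a] += 1.0                    # gradient of log softmax
            pref[s] += alpha * delta * grad                     # actor update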

  6. An analysis of intergroup rivalry using Ising model and reinforcement learning

    Science.gov (United States)

    Zhao, Feng-Fei; Qin, Zheng; Shao, Zhuo

    2014-01-01

    Modeling of intergroup rivalry can help us better understand economic competitions, political elections and other similar activities. The result of intergroup rivalry depends on the co-evolution of individual behavior within one group and the impact from the rival group. In this paper, we model rivalry behavior using the Ising model. Different from other simulation studies using the Ising model, the evolution rules of each individual in our model are not static but have the ability to learn from historical experience using a reinforcement learning technique, which makes the simulation closer to real human behavior. We studied the phase transition in intergroup rivalry and focused on the impact of the degree of social freedom, the personality of group members and the social experience of individuals. The results of computer simulation show that a society with a low degree of social freedom and highly educated, experienced individuals is more likely to be one-sided in intergroup rivalry.
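
    A hypothetical rendering of the combination named in the abstract: Glauber dynamics on an Ising ring, with temperature playing the role of the degree of social freedom, plus a per-agent bias adjusted by simple reinforcement from past payoffs. All specifics are assumptions rather than the authors' model.

        import numpy as np

        rng = np.random.default_rng(3)
        N, T, lr = 100, 1.5, 0.05            # agents, temperature (social freedom), learning rate
        spins = rng.choice([-1, 1], size=N)  # +1 / -1 = allegiance to the rival groups
        bias = np.zeros(N)                   # learned individual tendency

        for _ in range(20000):
            i = rng.integers(N)
            neighbors = spins[(i - 1) % N] + spins[(i + 1) % N]
            field = neighbors + bias[i]
            p_up = 1.0 / (1.0 + np.exp(-2.0 * field / T))   # Glauber flip probability
            new = 1 if rng.random() < p_up else -1
            # reinforcement: agreeing with the local majority pays off
            payoff = float(new * np.sign(neighbors)) if neighbors != 0 else 0.0
            bias[i] += lr * payoff * new     # strengthen the bias toward rewarded choices
            spins[i] = new

        print(abs(spins.mean()))             # magnetisation: how one-sided the rivalry became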

  7. Construction of a Digital Learning Environment Based on Cloud Computing

    Science.gov (United States)

    Ding, Jihong; Xiong, Caiping; Liu, Huazhong

    2015-01-01

    Constructing the digital learning environment for ubiquitous learning and asynchronous distributed learning has opened up immense amounts of concrete research. However, current digital learning environments do not fully fulfill the expectations on supporting interactive group learning, shared understanding and social construction of knowledge.…

  8. A new computational account of cognitive control over reinforcement-based decision-making: Modeling of a probabilistic learning task.

    Science.gov (United States)

    Zendehrouh, Sareh

    2015-11-01

    Recent work in the decision-making field offers an account of dual-system theory for the decision-making process. This theory holds that this process is conducted by two main controllers: a goal-directed system and a habitual system. In the reinforcement learning (RL) domain, habitual behaviors are connected with model-free methods, in which appropriate actions are learned through trial-and-error experiences. Goal-directed behaviors, however, are associated with model-based methods of RL, in which actions are selected using a model of the environment. Studies on cognitive control also suggest that during processes like decision-making, some cortical and subcortical structures work in concert to monitor the consequences of decisions and to adjust control according to current task demands. Here a computational model is presented based on dual-system theory and the cognitive control perspective of decision-making. The proposed model is used to simulate human performance on a variant of a probabilistic learning task. The basic proposal is that the brain implements a dual controller, while an accompanying monitoring system detects some kinds of conflict, including a hypothetical cost-conflict one. The simulation results address existing theories about two event-related potentials, namely error-related negativity (ERN) and feedback-related negativity (FRN), and explore the best account of them. Based on the results, some testable predictions are also presented. Copyright © 2015 Elsevier Ltd. All rights reserved.
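
    The dual-controller core of such models can be sketched compactly: a model-free Q-table learned by TD runs alongside model-based values computed from a learned transition and reward model, with a fixed weight arbitrating between them. The toy MDP and the fixed arbitration weight below are assumptions; the paper's conflict-monitoring component is not modeled.

        import numpy as np

        rng = np.random.default_rng(4)
        nS, nA, gamma, alpha, w = 3, 2, 0.9, 0.1, 0.5
        Q_mf = np.zeros((nS, nA))            # habitual (model-free) values
        T_counts = np.ones((nS, nA, nS))     # transition model: Laplace-smoothed counts
        R_hat = np.zeros((nS, nA))           # learned reward model

        def true_step(s, a):
            s2 = (s + 1) % nS if a == 1 else s
            return s2, float(s2 == nS - 1 and a == 1)

        for _ in range(10000):
            s = rng.integers(nS)
            # goal-directed values: one-step lookahead through the learned model
            T = T_counts / T_counts.sum(axis=2, keepdims=True)
            Q_mb = R_hat + gamma * T @ Q_mf.max(axis=1)
            Q = w * Q_mb + (1 - w) * Q_mf    # arbitration between the two controllers
            a = Q[s].argmax() if rng.random() > 0.1 else rng.integers(nA)
            s2, r = true_step(s, a)
            Q_mf[s, a] += alpha * (r + gamma * Q_mf[s2].max() - Q_mf[s, a])
            T_counts[s, a, s2] += 1
            R_hat[s, a] += alpha * (r - R_hat[s, a])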

  9. Clinical learning environments: place, artefacts and rhythm.

    Science.gov (United States)

    Sheehan, Dale; Jowsey, Tanisha; Parwaiz, Mariam; Birch, Mark; Seaton, Philippa; Shaw, Susan; Duggan, Alison; Wilkinson, Tim

    2017-10-01

    Health care practitioners learn through experience in clinical environments in which supervision is a key component, but how that learning occurs outside the supervision relationship remains largely unknown. This study explores the environmental factors that inform and support workplace learning within a clinical environment. An observational study drawing on ethnographic methods was undertaken in a general medicine ward. Observers paid attention to interactions among staff members that involved potential teaching and learning moments that occurred and were visible in the course of routine work. General purpose thematic analysis of field notes was undertaken. A total of 376 observations were undertaken and documented. The findings suggest that place (location of interaction), rhythm (regularity of activities occurring in the ward) and artefacts (objects and equipment) were strong influences on the interactions and exchanges that occurred. Each of these themes had inherent tensions that could promote or inhibit engagement and therefore learning opportunities. Although many learning opportunities were available, not all were taken up or recognised by the participants. We describe and make explicit how the natural environment of a medical ward and flow of work through patient care contribute to the learning architecture, and how this creates or inhibits opportunities for learning. Awareness of learning opportunities was often tacit and not explicit for either supervisor or learner. We identify strategies through which tensions inherent within space, artefacts and the rhythms of work can be resolved and learning opportunities maximised. © 2017 John Wiley & Sons Ltd and The Association for the Study of Medical Education.

  10. Neural systems underlying aversive conditioning in humans with primary and secondary reinforcers

    Directory of Open Access Journals (Sweden)

    Mauricio R Delgado

    2011-05-01

    Full Text Available Money is a secondary reinforcer commonly used across a range of disciplines in experimental paradigms investigating reward learning and decision-making. The effectiveness of monetary reinforcers during aversive learning and its neural basis, however, remains a topic of debate. Specifically, it is unclear if the initial acquisition of aversive representations of monetary losses depends on similar neural systems as more traditional aversive conditioning that involves primary reinforcers. This study contrasts the efficacy of a biologically defined primary reinforcer (shock) and a socially defined secondary reinforcer (money) during aversive learning and its associated neural circuitry. During a two-part experiment, participants first played a gambling game where wins and losses were based on performance to gain an experimental bank. Participants were then exposed to two separate aversive conditioning sessions. In one session, a primary reinforcer (mild shock) served as an unconditioned stimulus (US) and was paired with one of two colored squares, the conditioned stimuli (CS+ and CS-, respectively). In another session, a secondary reinforcer (loss of money) served as the US and was paired with one of two different CS. Skin conductance responses were greater for CS+ compared to CS- trials irrespective of the type of reinforcer. Neuroimaging results revealed that the striatum, a region typically linked with reward-related processing, was involved in the acquisition of aversive conditioned responses irrespective of reinforcer type. In contrast, the amygdala was involved during aversive conditioning with primary reinforcers, as suggested by both an exploratory fMRI analysis and a follow-up case study with a patient with bilateral amygdala damage. Taken together, these results suggest that learning about potential monetary losses may depend on reinforcement learning related systems, rather than on typical structures involved in more biologically based

  11. USING PCU-CAMEL, A WEB-BASED LEARNING ENVIRONMENT, IN EVALUATING TEACHING-LEARNING PROCESS

    Directory of Open Access Journals (Sweden)

    Arlinah Imam Rahardjo

    2008-01-01

    Full Text Available PCU-CAMEL (Petra Christian University-Computer Aided Mechanical Engineering Department Learning Environment) has been developed to integrate the use of this web-based learning environment into the traditional, face-to-face setting of class activities. This integrated learning method is designed as an effort to enrich and improve the teaching-learning process at Petra Christian University. A study was conducted to introduce the use of PCU-CAMEL as a tool for evaluating the teaching-learning process. The study of this evaluation method was conducted through a case analysis of the integration of PCU-CAMEL into the traditional face-to-face meetings of the LIS (Library Information System) class at the Informatics Engineering Department of Petra Christian University. Students’ responses documented in some features of PCU-CAMEL were measured and analyzed to evaluate the effectiveness of this integrated system in developing the intrinsic motivation to learn of the LIS students of the first and second semester of 2004/2005. It is believed that intrinsic motivation can drive students to learn more. From the study conducted, it is concluded that besides its capability in developing intrinsic motivation, PCU-CAMEL, as a web-based learning environment, can also serve as an effective tool for both students and instructors to evaluate the teaching-learning process. However, some weaknesses did exist in using this method of evaluating the teaching-learning process: the free-style and unstructured form of the documentation features of this web-based learning environment can lead to ineffective evaluation results.

  12. Evaluation of students' perception of their learning environment and approaches to learning

    Science.gov (United States)

    Valyrakis, Manousos; Cheng, Ming

    2015-04-01

    This work presents the results of two case studies designed to assess the various approaches undergraduate and postgraduate students undertake in their education. The first study describes the results and evaluation of an undergraduate course in Water Engineering which aims to develop the fundamental background knowledge of students on introductory practical applications relevant to the practice of water and hydraulic engineering. The study assesses the effectiveness of the course design and learning environment from the perception of students, using a questionnaire addressing several aspects that may affect student learning, performance and satisfaction, such as students' motivation, factors in effective learning, and methods of communication and assessment. The second study investigates the effectiveness of supervisory arrangements based on the perceptions of engineering undergraduate and postgraduate students. Effective supervision requires leadership skills that are not taught in the University, yet there is rarely a chance to get feedback, evaluate this process and reflect. Even though the results are very encouraging, there are significant lessons to be learned in improving one's practice and developing an effective learning environment for student support and guidance. The findings from these studies suggest that students with a high level of intrinsic motivation are deep learners and are also top performers in a student-centered learning environment. A supportive teaching environment with a plethora of resources and feedback made available over different platforms, addressing students' need for direct communication and feedback, has the potential to improve student satisfaction and their learning experience. Finally, incorporating a multitude of assessment methods is also important in promoting deep learning. These results have deep implications for student learning and can be used to further improve course design and delivery in the future.

  13. Learning Design Patterns for Hybrid Synchronous Video-Mediated Learning Environments

    DEFF Research Database (Denmark)

    Weitze, Charlotte Lærke

    2016-01-01

    This article describes an innovative learning environment where remote and face-to-face full-time general upper secondary adult students jointly participate in the same live classes at VUC Storstrøm, an adult learning centre in Denmark. The teachers developed new learning designs as a part of the...... activating and equal learning designs for the students. This article is written on the basis of a chapter in the PhD–thesis by the author....

  14. Ubiquitous Learning Environments in Higher Education: A Scoping Literature Review

    Science.gov (United States)

    Virtanen, Mari Aulikki; Haavisto, Elina; Liikanen, Eeva; Kääriäinen, Maria

    2018-01-01

    Ubiquitous learning and the use of ubiquitous learning environments heralds a new era in higher education. Ubiquitous learning environments enhance context-aware and seamless learning experiences available from any location at any time. They support smooth interaction between authentic and digital learning resources and provide personalized…

  15. Students’ Preferred Characteristics of Learning Environments in Vocational Secondary Education

    Directory of Open Access Journals (Sweden)

    Ingeborg Placklé

    2014-12-01

    If teachers and teacher educators are willing to support the learning of students, it is important for them to learn what motivates students to engage in learning. Students have their own preferences regarding design characteristics of powerful learning environments in vocational education. We developed an instrument - the Inventory of Powerful Learning Environments in Vocational Education - to measure students’ preferences on characteristics of powerful learning environments in vocational education. We investigated whether student preferences on the design of their learning environments are in line with what is described in the literature as beneficial for learning. Data from 544 students show that the preferences of students support most characteristics of PLEs in vocational education. Looking through the eyes of students, teachers have to challenge their students and encourage them to take their learning into their own hands. Adaptive learning support is needed. Remarkably, students do not prefer having reflective dialogues with teachers or peers.

  16. U-CrAc Flexible Interior Doctrine, Agile Learning Environments

    DEFF Research Database (Denmark)

    Poulsen, Søren Bolvig; Rosenstand, Claus Andreas Foss

    2012-01-01

    The research domain of this article is flexible learning environment for immediate use. The research question is: How can the learning environment support an agile learning process? The research contribution of this article is a flexible interior doctrine. The research method is action research...

  17. [Learning about social determinants of health through chronicles, using a virtual learning environment].

    Science.gov (United States)

    Restrepo-Palacio, Sonia; Amaya-Guio, Jairo

    2016-01-01

    To describe the contributions of a pedagogical strategy based on the construction of chronicles, using a Virtual Learning Environment, for training medical students from Universidad de La Sabana on social determinants of health. Descriptive study with a qualitative approach. Design and implementation of a Virtual Learning Environment based on the ADDIE instructional model. A Virtual Learning Environment was implemented with an instructional design based on the five phases of the ADDIE model, grounded in meaningful learning and social constructivism, and using the narration of chronicles or life stories as a pedagogical strategy. During the course, the structural and intermediary determinants were addressed, and nine chronicles were produced by working groups made up of four or five students, who demonstrated meaningful learning from real life stories, presented a coherent sequence, and kept a thread; 82% of these students incorporated in their contents most of the social determinants of health, emphasizing the concepts of equity or inequity, equality or inequality, justice or injustice, and social cohesion. A Virtual Learning Environment based on an appropriate instructional design makes it possible to facilitate learning of social determinants of health through a constructivist pedagogical approach, by analyzing chronicles or life stories created by ninth-semester students of medicine from Universidad de La Sabana.

  18. Nursing students' perceptions of learning in practice environments: a review.

    Science.gov (United States)

    Henderson, Amanda; Cooke, Marie; Creedy, Debra K; Walker, Rachel

    2012-04-01

    Effective clinical learning requires integration of nursing students into ward activities, staff engagement to address individual student learning needs, and innovative teaching approaches. Assessing characteristics of practice environments can provide useful insights for development. This study identified predominant features of clinical learning environments from nursing students' perspectives across studies using the same measure in different countries over the last decade. Six studies, from three different countries, using the Clinical Learning Environment Inventory (CLEI) were reviewed. The studies showed consistent trends in perceptions of the learning environment. Students rated sense of task accomplishment high. Affiliation also rated highly, though it was influenced by models of care. Feedback, measuring whether students' individual needs and views were accommodated, consistently rated lower. Across different countries students report similar perceptions about learning environments. Clinical learning environments are most effective in promoting safe practice and are inclusive of student learners, but not readily open to innovation and challenges to routine practices.

  19. The corrosion pattern of reinforcement and its influence on serviceability of reinforced concrete members in chloride environment

    International Nuclear Information System (INIS)

    Zhang Ruijin; Castel, Arnaud; Francois, Raoul

    2009-01-01

    This paper deals with two corroded reinforced concrete beams, which were stored under sustained load in a chloride environment for 14 and 23 years respectively. The evolution of the corrosion pattern of the reinforcement and its influence on serviceability are studied. In the chloride-induced corrosion process, corrosion cracking significantly affects the corrosion pattern. During the corrosion cracking initiation period, only local pitting corrosion occurs. At the early stage of crack propagation, localized pitting corrosion is still predominant, as crack widths are very small and cracks are not interconnected, but general corrosion slowly develops as the cracks widen. At the late cracking stage, interconnected cracking with wide widths develops along large parts of the beam, leading to a general corrosion pattern. Macrocell and microcell concepts are used for the interpretation of the results. Mechanical experiments and corrosion simulation tests are performed to clarify the influence of this corrosion pattern evolution on the serviceability of the beams (deflection increase). Experimental results show that, when the corrosion is localized (early cracking stage), the loss of steel-concrete bond is the main factor affecting the beams' serviceability. The local cross-section loss resulting from pitting attack does not significantly influence the deflection of the beam. When corrosion is generalized (late cracking stage), as the steel-concrete bond is already lost, the generalized steel cross-section reduction becomes the main factor affecting the beams' serviceability. However, at this stage, the deflection increases more slowly due to the low general corrosion rate.

  20. Personal Learning Environment – a Conceptual Study

    Directory of Open Access Journals (Sweden)

    Herbert Mühlburger

    2010-01-01

    The influence of digital technologies and the World Wide Web on education is rising dramatically. In former years, Learning Management Systems (LMS) were introduced at educational institutions to address the needs of both the institutions and their lecturers. Nowadays a shift from an institution-centered approach to a learner-centered one is becoming necessary, to allow individuality throughout the learning process and to rethink learning strategies in general. In this paper a first approach to a Personal Learning Environment (PLE) is described. The technological concept is pointed out, as well as a study about the graphical user interface done at Graz University of Technology (TU Graz). It can be concluded that PLEs are the next generation of learning environments, which will help to improve learning and teaching behavior.

  1. Creating sustainable empowering learning environments through ...

    African Journals Online (AJOL)

    ... as these impede optimal learning especially among rural and immigrant communities in South Africa, Canada and the world over. The primary focus of all papers herein therefore is on the creation of sustainable empowering learning environments through engaged scholarship spearheaded by the university.

  2. Create a good learning environment and motivate active learning enthusiasm

    Science.gov (United States)

    Bi, Weihong; Fu, Guangwei; Fu, Xinghu; Zhang, Baojun; Liu, Qiang; Jin, Wa

    2017-08-01

    In view of the currently poor learning initiative of undergraduates, the idea of creating a good learning environment to motivate active learning enthusiasm is proposed. In practice, a professional tutor is allocated and a professional introduction course is offered to college freshmen. This promotes communication between the professional teachers and students as early as possible, and guides students to engage with professional knowledge from the start. Practice results show that these measures can improve students' interest and initiative in learning, so that active learning and self-learning become a habit in the classroom.

  3. Students’ digital learning environments

    DEFF Research Database (Denmark)

    Caviglia, Francesco; Dalsgaard, Christian; Davidsen, Jacob

    2018-01-01

    used tools in the students’ digital learning environments are Facebook, Google Drive, tools for taking notes, and institutional systems. Additionally, the study shows that the tools meet some very basic demands of the students in relation to collaboration, communication, and feedback. Finally...

  4. Invited Reaction: Influences of Formal Learning, Personal Learning Orientation, and Supportive Learning Environment on Informal Learning

    Science.gov (United States)

    Cseh, Maria; Manikoth, Nisha N.

    2011-01-01

    As the authors of the preceding article (Choi and Jacobs, 2011) have noted, the workplace learning literature shows evidence of the complementary and integrated nature of formal and informal learning in the development of employee competencies. The importance of supportive learning environments in the workplace and of employees' personal learning…

  5. Mobile Learning for Higher Education in Problem-Based Learning Environments

    DEFF Research Database (Denmark)

    Rongbutsri, Nikorn

    2011-01-01

    This paper describes a PhD project on Mobile Learning for Higher Education in a Problem-Based Learning Environment, which aims to understand how students benefit from using mobile devices for project work collaboration. It presents the research questions, theoretical perspective...

  6. What students really learn: contrasting medical and nursing students' experiences of the clinical learning environment.

    Science.gov (United States)

    Liljedahl, Matilda; Boman, Lena Engqvist; Fält, Charlotte Porthén; Bolander Laksov, Klara

    2015-08-01

    This paper explores and contrasts undergraduate medical and nursing students' experiences of the clinical learning environment. Using a sociocultural perspective of learning and an interpretative approach, 15 in-depth interviews with medical and nursing students were analysed with content analysis. Students' experiences are described using a framework of 'before', 'during' and 'after' clinical placements. Three major themes emerged from the analysis, contrasting the medical and nursing students' experiences of the clinical learning environment: (1) expectations of the placement; (2) relationship with the supervisor; and (3) focus of learning. The findings offer an increased understanding of how medical and nursing students learn in the clinical setting; they also show that the clinical learning environment contributes to the socialisation process of students not only into their future profession, but also into their role as learners. Differences between the two professions should be taken into consideration when designing interprofessional learning activities. Also, the findings can be used as a tool for clinical supervisors in the reflection on how student learning in the clinical learning environment can be improved.

  7. Enhancing the Learning Environment by Learning all the Students' Names

    DEFF Research Database (Denmark)

    Jørgensen, Anker Helms

    Short abstract: This paper describes how the teaching environment can be enhanced significantly by a simple method: learning the names of all the students. The method is time-efficient: in a course with 33 students I used 65 minutes in total. My own view of the effect was confirmed in a small study: the students felt more valued, secure and respected. They also made an effort to learn each other's names. Long abstract: In high school, teachers know the students' names very soon - anything else is unthinkable (Wiberg, 2011). Not so in universities, where knowing the names of all the students is the exception... the method to learn all the students' names enhances the learning environment substantially. References: Cranton, Patricia (2001) Becoming an authentic teacher in higher education. Malabar, Florida: Krieger Pub. Co.; Wiberg, Merete (2011): Personal email communication, June 22, 2011; Woodhead, M. M. and Baddeley...

  8. Blended learning in paediatric emergency medicine: preliminary analysis of a virtual learning environment.

    Science.gov (United States)

    Spedding, Ruth; Jenner, Rachel; Potier, Katherine; Mackway-Jones, Kevin; Carley, Simon

    2013-04-01

    Paediatric emergency medicine (PEM) currently faces many competing educational challenges. Recent changes to working patterns have made the delivery of effective teaching to trainees extremely difficult. We developed a virtual learning environment, on the basis of socioconstructivist principles, which allows learning to take place regardless of time or location. The aim was to evaluate the effectiveness of a blended e-learning approach for PEM training. We evaluated the experiences of ST3 trainees in PEM using a multimodal approach. We classified and analysed message board discussions over a 6-month period to look for evidence of practice change and learning. We conducted semistructured qualitative interviews with trainees approximately 5 months after they completed the course. Trainees embraced the virtual learning environment and had positive experiences of the blended approach to learning. Socioconstructivist learning did take place through the use of message boards on the virtual learning environment. Despite their initial unfamiliarity with the online learning system, the participants found it easy to access and use. The participants found the learning relevant and there was an overlap between shop floor learning and the online content. Clinical discussion was often led by trainees on the forums and these were described as enjoyable and informative. A blended approach to e-learning in basic PEM is effective and enjoyable to trainees.

  9. A Preliminary Investigation of Self-Directed Learning Activities in a Non-Formal Blended Learning Environment

    Science.gov (United States)

    Schwier, Richard A.; Morrison, Dirk; Daniel, Ben K.

    2009-01-01

    This research considers how professional participants in a non-formal self-directed learning environment (NFSDL) made use of self-directed learning activities in a blended face-to-face and on line learning professional development course. The learning environment for the study was a professional development seminar on teaching in higher education…

  10. The new learning environment is personal

    NARCIS (Netherlands)

    De Vries, P.

    2013-01-01

    In a traditional sense the learning environment is qualified as the institutional setting for the teaching and learning to take place. This comprises the students, the teachers, management, the services and all the buildings, the classrooms, the equipment, the tools and laboratories that constitute

  11. Technology-supported environments for learning through cognitive conflict

    Directory of Open Access Journals (Sweden)

    Anne McDougall

    2002-12-01

    This paper examines ways in which the idea of cognitive conflict is used to facilitate learning, looking at the design and use of learning environments for this purpose. Drawing on previous work in science education and educational computing, three approaches to the design of learning environments utilizing cognitive conflict are introduced. These approaches are described as confrontational, guiding and explanatory, based on the level of the designer's concern with learners' pre-existing understanding, the extent of modification to the learner's conceptual structures intended by the designer, and the directness of steering the learner to the desired understanding. The examples used to illustrate the three approaches are taken from science education, specifically software for learning about Newtonian physics; it is contended, however, that the argument of the paper applies more broadly, to learning environments across many curriculum areas, at school level and in higher education.

  12. Mobile e-Learning for Next Generation Communication Environment

    Science.gov (United States)

    Wu, Tin-Yu; Chao, Han-Chieh

    2008-01-01

    This article develops an environment for mobile e-learning that includes an interactive course, virtual online labs, an interactive online test, and a lab-exercise training platform on the fourth-generation mobile communication system. The Next Generation Learning Environment (NeGL) promotes the term "knowledge economy." Inter-networking…

  13. Social Networks as Learning Environments for Higher Education

    Directory of Open Access Journals (Sweden)

    J.A.Cortés

    2014-09-01

    Learning is considered a social activity: a student does not learn only from the teacher and the textbook or only in the classroom, but also from many other agents, including the media, peers and society in general. Since the explosion of the Internet, information is within everyone's reach, and this is where the main opportunity for new technologies applied to education lies, taking advantage of recent socialization trends that can be leveraged not only to inform daily practices but also as a tool to explore different branches of education research. One can foresee the future of higher education as a social learning environment, open and collaborative, where people construct knowledge in interaction with others, in a comprehensive manner. The mobility and ubiquity provided by mobile devices enable connection from anywhere and at any time. In modern educational settings, mobile devices can be expected to extend the classroom into digital environments, so that students and teachers can build the teaching-learning process collectively. This work derives in part from a research project approved by CONADI at the “Universidad Cooperativa de Colombia”: "Social Networks: A teaching strategy in learning environments in higher education."

  14. Designing Virtual Learning Environments

    DEFF Research Database (Denmark)

    Veirum, Niels Einar

    2003-01-01

    The main objective of this working paper is to present a conceptual model for media integrated communication in virtual learning environments. The model for media integrated communication is very simple and identifies the necessary building blocks for virtual place making in a synthesis of methods...

  15. Environmental Impact: Reinforce a Culture of Continuous Learning with These Key Elements

    Science.gov (United States)

    Edwards, Brian; Gammell, Jessica

    2017-01-01

    Fostering a robust professional learning culture in schools is vital for attracting and retaining high-caliber talent. Education leaders are looking for guidance on how to establish and sustain an environment that fosters continuous learning. Based on their experience in helping educators design and implement professional learning systems, the…

  16. Students’ perception of the learning environment in a distributed medical programme

    Directory of Open Access Journals (Sweden)

    Kiran Veerapen

    2010-09-01

    Background: The learning environment of a medical school has a significant impact on students’ achievements and learning outcomes. The importance of equitable learning environments across programme sites is implicit in distributed undergraduate medical programmes being developed and implemented. Purpose: To study the learning environment and its equity across two classes and three geographically separate sites of a distributed medical programme at the University of British Columbia Medical School that commenced in 2004. Method: The validated Dundee Ready Educational Environment Survey was sent to all students in their 2nd and 3rd year (classes graduating in 2009 and 2008) of the programme. The domains of the learning environment surveyed were: students’ perceptions of learning, students’ perceptions of teachers, students’ academic self-perceptions, students’ perceptions of the atmosphere, and students’ social self-perceptions. Mean scores, frequency distribution of responses, and inter- and intrasite differences were calculated. Results: The perception of the global learning environment at all sites was more positive than negative. It was characterised by a strongly positive perception of teachers. The work load and emphasis on factual learning were perceived negatively. Intersite differences within domains of the learning environment were more evident in the pioneer class (2008) of the programme. Intersite differences consistent across classes were largely related to on-site support for students. Conclusions: Shared strengths and weaknesses in the learning environment at UBC sites were evident in areas that were managed by the parent institution, such as the attributes of shared faculty and curriculum. A greater divergence in the perception of the learning environment was found in domains dependent on local arrangements and social factors that are less amenable to central regulation. This study underlines the need for ongoing

  17. Students' perception of the learning environment in a distributed medical programme.

    Science.gov (United States)

    Veerapen, Kiran; McAleer, Sean

    2010-09-24

    The learning environment of a medical school has a significant impact on students' achievements and learning outcomes. The importance of equitable learning environments across programme sites is implicit in distributed undergraduate medical programmes being developed and implemented. To study the learning environment and its equity across two classes and three geographically separate sites of a distributed medical programme at the University of British Columbia Medical School that commenced in 2004. The validated Dundee Ready Educational Environment Survey was sent to all students in their 2nd and 3rd year (classes graduating in 2009 and 2008) of the programme. The domains of the learning environment surveyed were: students' perceptions of learning, students' perceptions of teachers, students' academic self-perceptions, students' perceptions of the atmosphere, and students' social self-perceptions. Mean scores, frequency distribution of responses, and inter- and intrasite differences were calculated. The perception of the global learning environment at all sites was more positive than negative. It was characterised by a strongly positive perception of teachers. The work load and emphasis on factual learning were perceived negatively. Intersite differences within domains of the learning environment were more evident in the pioneer class (2008) of the programme. Intersite differences consistent across classes were largely related to on-site support for students. Shared strengths and weaknesses in the learning environment at UBC sites were evident in areas that were managed by the parent institution, such as the attributes of shared faculty and curriculum. A greater divergence in the perception of the learning environment was found in domains dependent on local arrangements and social factors that are less amenable to central regulation. This study underlines the need for ongoing comparative evaluation of the learning environment at the distributed sites and

  18. Use of frontal lobe hemodynamics as reinforcement signals to an adaptive controller.

    Directory of Open Access Journals (Sweden)

    Marcello M DiStasio

    Decision-making ability in the frontal lobe (among other brain structures) relies on the assignment of value to states of the animal and its environment. Then higher-valued states can be pursued and lower (or negative) valued states avoided. The same principle forms the basis for computational reinforcement learning controllers, which have been fruitfully applied both as models of value estimation in the brain, and as artificial controllers in their own right. This work shows how state desirability signals decoded from frontal lobe hemodynamics, as measured with near-infrared spectroscopy (NIRS), can be applied as reinforcers to an adaptable artificial learning agent in order to guide its acquisition of skills. A set of experiments carried out on an alert macaque demonstrate that both oxy- and deoxyhemoglobin concentrations in the frontal lobe show differences in response to both primarily and secondarily desirable (versus undesirable) stimuli. This difference allows a NIRS signal classifier to serve successfully as a reinforcer for an adaptive controller performing a virtual tool-retrieval task. The agent's adaptability allows its performance to exceed the limits of the NIRS classifier decoding accuracy. We also show that decoding state desirabilities is more accurate when using relative concentrations of both oxyhemoglobin and deoxyhemoglobin, rather than either species alone.
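
    The closed loop this abstract describes, a physiological decoder supplying the reward for an otherwise standard learning agent, can be sketched compactly. Everything below (the linear decoder, the toy task, all names and constants) is an illustrative assumption, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def decode_desirability(features, weights):
    """Hypothetical linear NIRS decoder: maps oxy-/deoxyhemoglobin
    features to a +1 (desirable) or -1 (undesirable) reinforcer."""
    return 1.0 if features @ weights > 0 else -1.0

# Tabular agent for a small stand-in for the virtual tool-retrieval task.
n_states, n_actions = 10, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1
decoder_w = rng.normal(size=8)        # assumed pre-trained decoder weights

def step(state, action):
    """Placeholder environment transition; the real task dynamics
    are not specified in the abstract."""
    return (state + action) % n_states

state = 0
for t in range(10000):
    greedy = int(Q[state].argmax())
    action = rng.integers(n_actions) if rng.random() < epsilon else greedy
    next_state = step(state, action)
    features = rng.normal(size=8)     # placeholder for measured hemodynamics
    r = decode_desirability(features, decoder_w)   # decoded reinforcer
    Q[state, action] += alpha * (r + gamma * Q[next_state].max() - Q[state, action])
    state = next_state
```

    The point of the sketch is architectural: the agent never sees a ground-truth reward, only the decoder's output, so learning can tolerate (and, as the abstract notes, exceed) the decoder's classification accuracy.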

  19. Reinforcement learning for a biped robot based on a CPG-actor-critic method.

    Science.gov (United States)

    Nakamura, Yutaka; Mori, Takeshi; Sato, Masa-aki; Ishii, Shin

    2007-08-01

    Animals' rhythmic movements, such as locomotion, are considered to be controlled by neural circuits called central pattern generators (CPGs), which generate oscillatory signals. Motivated by this biological mechanism, studies have been conducted on the rhythmic movements controlled by CPG. As an autonomous learning framework for a CPG controller, we propose in this article a reinforcement learning method we call the "CPG-actor-critic" method. This method introduces a new architecture to the actor, and its training is roughly based on a stochastic policy gradient algorithm presented recently. We apply this method to an automatic acquisition problem of control for a biped robot. Computer simulations show that training of the CPG can be successfully performed by our method, thus allowing the biped robot to not only walk stably but also adapt to environmental changes.
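
    The CPG architecture itself is specific to the authors, but the training signal they describe, a stochastic policy gradient with a critic-like baseline, can be shown on a one-parameter toy problem. The "plant" and all constants below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for the CPG-controlled robot: reward peaks when a scalar
# controller parameter matches an unknown optimum (here 0.7).
OPT = 0.7
def rollout(theta):
    return -(theta - OPT) ** 2 + rng.normal(scale=0.01)

mu, sigma = 0.0, 0.5      # Gaussian "actor" over the CPG parameter
baseline, lr = 0.0, 0.05  # running reward baseline acts as a crude critic

for episode in range(5000):
    theta = rng.normal(mu, sigma)                    # stochastic exploration
    r = rollout(theta)
    advantage = r - baseline
    mu += lr * advantage * (theta - mu) / sigma**2   # REINFORCE update for mu
    baseline += 0.05 * (r - baseline)

print(round(mu, 2))   # typically close to 0.7
```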

  20. Effects of prior knowledge on learning from different compositions of representations in a mobile learning environment

    NARCIS (Netherlands)

    T.-C. Liu (Tzu-Chien); Y.-C. Lin (Yi-Chun); G.W.C. Paas (Fred)

    2014-01-01

    Two experiments examined the effects of prior knowledge on learning from different compositions of multiple representations in a mobile learning environment on plant leaf morphology for primary school students. Experiment 1 compared the learning effects of a mobile learning environment

  1. Virtual Learning Environment for Interactive Engagement with Advanced Quantum Mechanics

    Science.gov (United States)

    Pedersen, Mads Kock; Skyum, Birk; Heck, Robert; Müller, Romain; Bason, Mark; Lieberoth, Andreas; Sherson, Jacob F.

    2016-01-01

    A virtual learning environment can engage university students in the learning process in ways that the traditional lectures and lab formats cannot. We present our virtual learning environment "StudentResearcher," which incorporates simulations, multiple-choice quizzes, video lectures, and gamification into a learning path for quantum…

  2. Reinforcement learning solution for HJB equation arising in constrained optimal control problem.

    Science.gov (United States)

    Luo, Biao; Wu, Huai-Ning; Huang, Tingwen; Liu, Derong

    2015-11-01

    The constrained optimal control problem depends on the solution of the complicated Hamilton-Jacobi-Bellman equation (HJBE). In this paper, a data-based off-policy reinforcement learning (RL) method is proposed, which learns the solution of the HJBE and the optimal control policy from real system data. One important feature of the off-policy RL is that its policy evaluation can be realized with data generated by other behavior policies, not necessarily the target policy, which solves the insufficient exploration problem. The convergence of the off-policy RL is proved by demonstrating its equivalence to the successive approximation approach. Its implementation procedure is based on the actor-critic neural networks structure, where the function approximation is conducted with linearly independent basis functions. Subsequently, the convergence of the implementation procedure with function approximation is also proved. Finally, its effectiveness is verified through computer simulations.
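
    The paper itself treats the continuous-time constrained HJB equation; as a rough discrete-time stand-in, the sketch below combines the two ingredients the abstract names: policy evaluation from data logged under a different behavior policy, using linearly independent basis functions, inside a successive-approximation (policy iteration) loop. The toy MDP and every name are assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Tiny MDP: 5 states, 2 actions. Data are logged under a uniform-random
# behavior policy, i.e. off-policy with respect to the target policy.
nS, nA, gamma = 5, 2, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # transition kernel
R = rng.normal(size=(nS, nA))                   # reward table

def phi(s, a):
    """Linearly independent basis functions: one-hot over (state, action)."""
    f = np.zeros(nS * nA)
    f[s * nA + a] = 1.0
    return f

data, s = [], 0
for _ in range(5000):
    a = rng.integers(nA)                        # behavior policy, not the target
    s2 = rng.choice(nS, p=P[s, a])
    data.append((s, a, R[s, a], s2))
    s = s2

policy = np.zeros(nS, dtype=int)
for _ in range(20):                             # successive approximation
    A = np.zeros((nS * nA, nS * nA))
    b = np.zeros(nS * nA)
    for (s, a, r, s2) in data:                  # least-squares evaluation of `policy`
        f, f2 = phi(s, a), phi(s2, policy[s2])
        A += np.outer(f, f - gamma * f2)
        b += r * f
    w = np.linalg.solve(A + 1e-6 * np.eye(nS * nA), b)
    policy = w.reshape(nS, nA).argmax(axis=1)   # greedy improvement
```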

  3. Investigating the Relationship between Instructors' Use of Active-Learning Strategies and Students' Conceptual Understanding and Affective Changes in Introductory Biology: A Comparison of Two Active-Learning Environments.

    Science.gov (United States)

    Cleveland, Lacy M; Olimpo, Jeffrey T; DeChenne-Peters, Sue Ellen

    2017-01-01

    In response to calls for reform in undergraduate biology education, we conducted research examining how varying active-learning strategies impacted students' conceptual understanding, attitudes, and motivation in two sections of a large-lecture introductory cell and molecular biology course. Using a quasi-experimental design, we collected quantitative data to compare participants' conceptual understanding, attitudes, and motivation in the biological sciences across two contexts that employed different active-learning strategies and that were facilitated by unique instructors. Students participated in either graphic organizer/worksheet activities or clicker-based case studies. After controlling for demographic and presemester affective differences, we found that students in both active-learning environments displayed similar and significant learning gains. In terms of attitudinal and motivational data, significant differences were observed for two attitudinal measures. Specifically, those students who had participated in graphic organizer/worksheet activities demonstrated more expert-like attitudes related to their enjoyment of biology and ability to make real-world connections. However, all motivational and most attitudinal data were not significantly different between the students in the two learning environments. These data reinforce the notion that active learning is associated with conceptual change and suggests that more research is needed to examine the differential effects of varying active-learning strategies on students' attitudes and motivation in the domain.

  4. Using Interactive Animations to Enhance Teaching, Learning, and Retention of Respiration Pathway Concepts in Face-to-Face and Online High School, Undergraduate, and Continuing Education Learning Environments

    Directory of Open Access Journals (Sweden)

    Sederick C. Rice

    2013-02-01

    One major tool set teachers/instructors can use is online interactive animations, which present content in a way that helps pique students' interest and differentiates instructional content. The Virtual Cell Animation Collections (VCAC), developed by the Molecular and Cellular Biology Learning Center, is a series of online interactive animations that provide teachers/instructors and students with immersive learning tools for studying and understanding respiration processes. These virtual tools work as powerful instructional devices to help explain and reinforce concepts of metabolic pathways that would traditionally be taught using static textbook pages or mnemonic flashcards. High school, undergraduate, and continuing education students of today learn and retain knowledge differently than their predecessors. Teachers now face new challenges: they must engage and assess students within a small window during classroom instruction, and also have the skills to provide useful content in distance learning environments. Educators have to keep up with changing trends in education as a result of technological advances, higher student/teacher ratios, and the influence of social media on education. It is critical for teachers/instructors to be able to present content that not only keeps students interested but also helps bridge learning gaps. VCAC provides high school, undergraduate, and continuing education biology and life science teachers/instructors with classroom strategies and tools for introducing respiration content through free, open-source online resources. VCAC content supports the development of more inquiry-based classroom and distance-learning environments, facilitated by teachers/instructors, which helps improve retention of important respiration subject content and problem-based learning skills for students.

  5. Early results of experiments with responsive open learning environments

    OpenAIRE

    Friedrich, M.; Wolpers, M.; Shen, R.; Ullrich, C.; Klamma, R.; Renzel, D.; Richert, A.; Heiden, B. von der

    2011-01-01

    Responsive open learning environments (ROLEs) are the next generation of personal learning environments (PLEs). While PLEs rely on the simple aggregation of existing content and services mainly using Web 2.0 technologies, ROLEs are transforming lifelong learning by introducing a new infrastructure on a global scale while dealing with existing learning management systems, institutions, and technologies. The requirements engineering process in highly populated test-beds is as important as the t...

  6. Interactive learning environments to support independent learning: the impact of discernability of embedded support devices

    NARCIS (Netherlands)

    Martens, Rob; Valcke, Martin; Portier, Stanley

    2017-01-01

    In this article the effectiveness of prototypes of interactive learning environments (ILE) is investigated. These computer-based environments are used for independent learning. In the learning materials, represented in the prototypes, a clear distinction is made between the basic content and embedded

  7. A Collaborative Model for Ubiquitous Learning Environments

    Science.gov (United States)

    Barbosa, Jorge; Barbosa, Debora; Rabello, Solon

    2016-01-01

    Use of mobile devices and widespread adoption of wireless networks have enabled the emergence of Ubiquitous Computing. Application of this technology to improving education strategies gave rise to Ubiquitous e-Learning, also known as Ubiquitous Learning. There are several approaches to organizing ubiquitous learning environments, but most of them…

  8. Learning Design for a Successful Blended E-learning Environment: Cultural Dimensions

    OpenAIRE

    Al-Huwail, N.; Gulf Univ. for Science & Technology; Al-Sharhan, S.; Gulf Univ. for Science & Technology; Al-Hunaiyyan, A.; Gulf Univ. for Science & Technology

    2007-01-01

    Blended e-learning is becoming an important educational issue, especially with the recent development of e-learning technology and globalization. This paper presents a new framework for the delivery environment in blended e-learning. In addition, new concepts related to learning strategies and multimedia design in blended e-learning are introduced. The work focuses on the critical cultural factors that affect a blended e-learning system. Since it is common that good systems may fail due to cultural issues, ...

  9. Students’ digital learning environments

    DEFF Research Database (Denmark)

    Caviglia, Francesco; Dalsgaard, Christian; Davidsen, Jacob

    2018-01-01

    The objective of the paper is to examine the nature of students’ digital learning environments to understand the interplay of institutional systems and tools that are managed by the students themselves. The paper is based on a study of 128 students’ digital learning environments. The objectives of the study are 1) to provide an overview of tools for students’ study activities, 2) to identify the most used and most important tools for students and 3) to discover which activities the tools are used for. The empirical study reveals that the students have a varied use of digital media. Some of the most... ...the study shows that most of the important tools are not related to the systems provided by the educational institutions. Based on the study, the paper concludes with a discussion of how institutional systems connect to the other tools in the students’ practices, and how we can qualify students’ digital...

  10. Vicarious Reinforcement In Rhesus Macaques (Macaca mulatta)

    Directory of Open Access Journals (Sweden)

    Steve W. C. Chang

    2011-03-01

    What happens to others profoundly influences our own behavior. Such other-regarding outcomes can drive observational learning, as well as motivate cooperation, charity, empathy, and even spite. Vicarious reinforcement may serve as one of the critical mechanisms mediating the influence of other-regarding outcomes on behavior and decision-making in groups. Here we show that rhesus macaques spontaneously derive vicarious reinforcement from observing rewards given to another monkey, and that this reinforcement can motivate them to subsequently deliver or withhold rewards from the other animal. We exploited Pavlovian and instrumental conditioning to associate rewards to self (M1) and/or rewards to another monkey (M2) with visual cues. M1s made more errors in the instrumental trials when cues predicted reward to M2 compared to when cues predicted reward to M1, but made even more errors when cues predicted reward to no one. In subsequent preference tests between pairs of conditioned cues, M1s preferred cues paired with reward to M2 over cues paired with reward to no one. By contrast, M1s preferred cues paired with reward to self over cues paired with reward to both monkeys simultaneously. Rates of attention to M2 strongly predicted the strength and valence of vicarious reinforcement. These patterns of behavior, which were absent in nonsocial control trials, are consistent with vicarious reinforcement based upon sensitivity to observed, or counterfactual, outcomes with respect to another individual. Vicarious reward may play a critical role in shaping cooperation and competition, as well as motivating observational learning and group coordination in rhesus macaques, much as it does in humans. We propose that vicarious reinforcement signals mediate these behaviors via homologous neural circuits involved in reinforcement learning and decision-making.

  11. Vicarious reinforcement in rhesus macaques (macaca mulatta).

    Science.gov (United States)

    Chang, Steve W C; Winecoff, Amy A; Platt, Michael L

    2011-01-01

    What happens to others profoundly influences our own behavior. Such other-regarding outcomes can drive observational learning, as well as motivate cooperation, charity, empathy, and even spite. Vicarious reinforcement may serve as one of the critical mechanisms mediating the influence of other-regarding outcomes on behavior and decision-making in groups. Here we show that rhesus macaques spontaneously derive vicarious reinforcement from observing rewards given to another monkey, and that this reinforcement can motivate them to subsequently deliver or withhold rewards from the other animal. We exploited Pavlovian and instrumental conditioning to associate rewards to self (M1) and/or rewards to another monkey (M2) with visual cues. M1s made more errors in the instrumental trials when cues predicted reward to M2 compared to when cues predicted reward to M1, but made even more errors when cues predicted reward to no one. In subsequent preference tests between pairs of conditioned cues, M1s preferred cues paired with reward to M2 over cues paired with reward to no one. By contrast, M1s preferred cues paired with reward to self over cues paired with reward to both monkeys simultaneously. Rates of attention to M2 strongly predicted the strength and valence of vicarious reinforcement. These patterns of behavior, which were absent in non-social control trials, are consistent with vicarious reinforcement based upon sensitivity to observed, or counterfactual, outcomes with respect to another individual. Vicarious reward may play a critical role in shaping cooperation and competition, as well as motivating observational learning and group coordination in rhesus macaques, much as it does in humans. We propose that vicarious reinforcement signals mediate these behaviors via homologous neural circuits involved in reinforcement learning and decision-making.

  12. Practical Applications and Experiences in K-20 Blended Learning Environments

    Science.gov (United States)

    Kyei-Blankson, Lydia, Ed.; Ntuli, Esther, Ed.

    2014-01-01

    Learning environments continue to change considerably and is no longer confined to the face-to-face classroom setting. As learning options have evolved, educators must adopt a variety of pedagogical strategies and innovative technologies to enable learning. "Practical Applications and Experiences in K-20 Blended Learning Environments"…

  13. Distributed Scaffolding: Synergy in Technology-Enhanced Learning Environments

    Science.gov (United States)

    Ustunel, Hale H.; Tokel, Saniye Tugba

    2018-01-01

    When technology is employed, challenges increase in learning environments. Kim et al. ("Sci Educ" 91(6):1010-1030, 2007) presented a pedagogical framework that provides a valid technology-enhanced learning environment. The purpose of the present design-based study was to investigate the micro context dimension of this framework and to…

  14. Digital Communication Applications in the Online Learning Environment

    Science.gov (United States)

    Lambeth, Krista Jill

    2011-01-01

    Scope and method of study. The purpose of this study was for the researcher to obtain a better understanding of the online learning environment, to explore the various ways online class instructors have incorporated digital communication applications to try and provide learner-centered online learning environments, and to examine students'…

  15. A Reinforcement Learning Framework for Spiking Networks with Dynamic Synapses

    Directory of Open Access Journals (Sweden)

    Karim El-Laithy

    2011-01-01

    An integration of both Hebbian-based and reinforcement learning (RL) rules is presented for dynamic synapses. The proposed framework permits the Hebbian rule to update the hidden synaptic model parameters regulating the synaptic response, rather than the synaptic weights. This is performed using both the value and the sign of the temporal difference in the reward signal after each trial. Applying this framework, a spiking network with spike-timing-dependent synapses is tested to learn the exclusive-OR computation on a temporally coded basis. Reward values are calculated from the distance between the output spike train of the network and a reference target one. Results show that the network is able to capture the required dynamics and that the proposed framework can indeed yield an integrated version of Hebbian learning and RL. The proposed framework is tractable and less computationally expensive. The framework is applicable to a wide class of synaptic models and is not restricted to the neural representation used here. This generality, along with the reported results, supports adopting the introduced approach to benefit from biologically plausible synaptic models in a wide range of intuitive signal processing tasks.
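
    The abstract's key move, gating a Hebbian-style parameter update by the value and sign of the trial-to-trial temporal difference in reward, can be illustrated with a rate-based stand-in for the spiking network. The sketch below uses simple weight perturbation on the XOR task mentioned in the abstract; the network, constants, and reward definition are illustrative assumptions, not the authors' dynamic-synapse model:

```python
import numpy as np

rng = np.random.default_rng(3)

# XOR task, as in the paper, but rate-coded instead of spike-timing coded.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
T = np.array([0, 1, 1, 0], float)

def reward(W1, W2):
    out = np.tanh(np.tanh(X @ W1) @ W2)
    return -np.sum((out - T) ** 2)        # negative distance to target output

W1 = rng.normal(scale=0.5, size=(2, 6))
W2 = rng.normal(scale=0.5, size=6)
r_base = reward(W1, W2)                   # running reward baseline
lr, noise = 0.5, 0.05

for trial in range(20000):
    dW1 = rng.normal(scale=noise, size=W1.shape)   # trial-wise perturbation
    dW2 = rng.normal(scale=noise, size=W2.shape)
    r = reward(W1 + dW1, W2 + dW2)
    delta = r - r_base                    # temporal difference in reward:
    W1 += lr * delta * dW1 / noise        # its sign sets the update direction,
    W2 += lr * delta * dW2 / noise        # its value scales the step size
    r_base += 0.1 * (r - r_base)
```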

  16. Student-Centred Learning Environments: An Investigation into Student Teachers' Instructional Preferences and Approaches to Learning

    Science.gov (United States)

    Baeten, Marlies; Dochy, Filip; Struyven, Katrien; Parmentier, Emmeline; Vanderbruggen, Anne

    2016-01-01

    The use of student-centred learning environments in education has increased. This study investigated student teachers' instructional preferences for these learning environments and how these preferences are related to their approaches to learning. Participants were professional Bachelor students in teacher education. Instructional preferences and…

  17. Preparing Teachers for Emerging Blended Learning Environments

    Science.gov (United States)

    Oliver, Kevin M.; Stallings, Dallas T.

    2014-01-01

    Blended learning environments that merge learning strategies, resources, and modes have been implemented in higher education settings for nearly two decades, and research has identified many positive effects. More recently, K-12 traditional and charter schools have begun to experiment with blended learning, but to date, research on the effects of…

  18. Digital Learning Environments: New possibilities and opportunities

    Directory of Open Access Journals (Sweden)

    Otto Peters

    2000-06-01

    This paper deals with the general problem of whether and, if so, how far the impact of the digitised learning environment on our traditional distance education will change the way in which teachers teach and learners learn. Are the dramatic innovations a menace to established ways of learning and teaching, or are they the panacea to overcome some of the difficulties of our system of higher learning and to solve some of our educational problems caused by the big and far-reaching educational paradigm shift? This paper will not deal with technical or technological achievements in the field of information and communication, which are, of course, revolutionary and to be acknowledged and admired. Rather, the digital learning environment will be analysed from a pedagogical point of view in order to find out what exactly are the didactic possibilities and opportunities and what are its foreseeable disadvantages.

  19. Learning in the e-environment: new media and learning for the future

    Directory of Open Access Journals (Sweden)

    Milan Matijević

    2015-03-01

    We live in times of rapid change in all areas of science, technology, communication and social life. Every day we are asked to what extent school prepares us for these changes and for life in a new, multimedia environment. Children and adolescents spend less time at school or in other settings of learning than they do outdoors or within other social communities (family, clubs, societies, religious institutions and the like). Experts must constantly inquire about what exactly influences learning and development in our rich media environment. The list of the most important life competences has significantly changed and expanded since the last century. Educational experts are attempting to predict changes in the content and methodology of learning at the beginning of the 21st century. Answers are sought to key questions such as: what should one learn; how should one learn; where should one learn; why should one learn; and how do these answers relate to the new learning environment? In his examination of the way children and young people learn and grow up, the author places special attention on the relationship between personal and non-personal communication (e.g. the internet, mobile phones and different types of e-learning). He deals with today's questions by looking back to some of the more prominent authors and studies of the past fifty years that tackled identical or similar questions (Alvin Toffler, Ivan Illich, George Orwell, and the members of the Club of Rome). The conclusion reached is that in today's world of rapid and continuous change, it is much more crucial than in the last century both to be able to learn, and to adapt to learning with the help of new media.

  20. Investigation of a reinforcement-based toilet training procedure for children with autism.

    Science.gov (United States)

    Cicero, Frank R; Pfadt, Al

    2002-01-01

    Independent toileting is an important developmental skill which individuals with developmental disabilities often find a challenge to master. Effective toilet training interventions have been designed which rely on a combination of basic operant principles of positive reinforcement and punishment. In the present study, the effectiveness of a reinforcement-based toilet training intervention was investigated with three children with a diagnosis of autism. Procedures included a combination of positive reinforcement, graduated guidance, scheduled practice trials and forward prompting, all implemented in response to urination accidents. All three participants reduced urination accidents to zero and learned to spontaneously request use of the bathroom within 7-11 days of training. Gains were maintained over 6-month and 1-year follow-ups. Findings suggest that the proposed procedure is an effective and rapid method of toilet training, which can be implemented within a structured school setting with generalization to the home environment.

  1. Gendered learning environments in managerial work

    OpenAIRE

    Gustavsson, Maria; Fogelberg Eriksson, Anna

    2010-01-01

    The aim is to investigate female and male managers’ learning environments with particular focus on their opportunities for and barriers to learning and career development in the managerial work of a male-dominated industrial company. In the case study 42 managers, 15 women and 27 men in the company were interviewed. The findings demonstrate that the male managers were provided with significantly richer opportunities to participate in activities conducive to learning and career development tha...

  2. Implementation of real-time energy management strategy based on reinforcement learning for hybrid electric vehicles and simulation validation.

    Science.gov (United States)

    Kong, Zehui; Zou, Yuan; Liu, Teng

    2017-01-01

    To further improve the fuel economy of series hybrid electric tracked vehicles, a reinforcement learning (RL)-based real-time energy management strategy is developed in this paper. In order to utilize the statistical characteristics of the online driving schedule effectively, a recursive algorithm for the transition probability matrix (TPM) of power-request is derived. Reinforcement learning (RL) is applied to calculate and update the control policy at regular time, adapting to the varying driving conditions. A facing-forward powertrain model is built in detail, including the engine-generator model, battery model and vehicle dynamical model. The robustness and adaptability of the real-time energy management strategy are validated through comparison with a stationary control strategy based on an initial transition probability matrix (TPM) generated from a long naturalistic driving cycle in the simulation. Results indicate that the proposed method achieves better fuel economy than the stationary one and is more effective in real-time control.
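
    The two algorithmic ingredients named in the abstract, a recursive estimate of the power-request transition probability matrix (TPM) and a policy recomputed "at regular time", can be sketched as follows. The discretization, the toy cost model, and all names are illustrative assumptions, not the authors' powertrain model:

```python
import numpy as np

rng = np.random.default_rng(4)

n_power = 8                                   # discretized power-request levels
counts = np.ones((n_power, n_power))          # transition counts (Laplace prior)

def update_tpm(counts, p_prev, p_curr):
    """Recursive TPM estimate: bump the observed transition count,
    then renormalize rows into probabilities."""
    counts[p_prev, p_curr] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)

def recompute_policy(tpm, sweeps=50, gamma=0.95):
    """Value iteration on the power-request chain. Toy cost: engine fuel
    use plus mismatch between engine power (action) and request (state)."""
    V = np.zeros(n_power)
    for _ in range(sweeps):
        Q = np.empty((n_power, n_power))
        for a in range(n_power):
            cost = 0.1 * a + np.abs(np.arange(n_power) - a)
            Q[:, a] = -cost + gamma * tpm @ V
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

p_prev, policy = 0, np.zeros(n_power, dtype=int)
for t in range(1, 10001):
    p_curr = rng.integers(n_power)            # placeholder for measured request
    tpm = update_tpm(counts, p_prev, p_curr)
    if t % 1000 == 0:                         # update the policy at regular time
        policy = recompute_policy(tpm)
    engine_power = policy[p_curr]             # command applied to the powertrain
    p_prev = p_curr
```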

  3. Implementation of real-time energy management strategy based on reinforcement learning for hybrid electric vehicles and simulation validation.

    Directory of Open Access Journals (Sweden)

    Zehui Kong

    To further improve the fuel economy of series hybrid electric tracked vehicles, a reinforcement learning (RL)-based real-time energy management strategy is developed in this paper. In order to utilize the statistical characteristics of the online driving schedule effectively, a recursive algorithm for the transition probability matrix (TPM) of power-request is derived. Reinforcement learning (RL) is applied to calculate and update the control policy at regular time, adapting to the varying driving conditions. A facing-forward powertrain model is built in detail, including the engine-generator model, battery model and vehicle dynamical model. The robustness and adaptability of the real-time energy management strategy are validated through comparison with a stationary control strategy based on an initial transition probability matrix (TPM) generated from a long naturalistic driving cycle in the simulation. Results indicate that the proposed method achieves better fuel economy than the stationary one and is more effective in real-time control.

  4. Education for Knowledge Society: Learning and Scientific Innovation Environment

    Directory of Open Access Journals (Sweden)

    Alexander O. Karpov

    2017-11-01

    A cognitively active, research-type learning environment is the fundamental component of the education system for the knowledge society. The purpose of the research is the development of conceptual bases and a constructional model of a cognitively active learning environment that stimulates the creation of new knowledge and its socio-economic application. Research methods include epistemic-didactic analysis of empirical material collected during the study of research environments at schools and universities, and conceptualization and theoretical modeling of the cognitively active surroundings, which provide an infrastructure for the research-type cognitive process. The empirical material summarized in this work was collected in the research-cognitive space of the “Step into the Future” program, which is one of the most powerful systems of research education in present-day Russia. The article presents key points of the author's concept of generative learning environments and a model of a learning and scientific innovation environment implemented at Russian schools and universities.

  5. From Personal to Social: Learning Environments that Work

    Science.gov (United States)

    Camacho, Mar; Guilana, Sonia

    2011-01-01

    VLEs (Virtual Learning Environments) are rapidly falling short of meeting the demands of a networked society. Web 2.0 and social networks are proving to offer a more personalized, open environment for students to learn formally as they are already doing informally. With the irruption of social media into society, and therefore education, many voices…

  6. Design issues of a reinforcement-based self-learning fuzzy controller for petrochemical process control

    Science.gov (United States)

    Yen, John; Wang, Haojin; Daugherity, Walter C.

    1992-01-01

    Fuzzy logic controllers have some often-cited advantages over conventional techniques such as PID control, including easier implementation, accommodation to natural language, and the ability to cover a wider range of operating conditions. One major obstacle that hinders the broader application of fuzzy logic controllers is the lack of a systematic way to develop and modify their rules; as a result the creation and modification of fuzzy rules often depends on trial and error or pure experimentation. One of the proposed approaches to address this issue is a self-learning fuzzy logic controller (SFLC) that uses reinforcement learning techniques to learn the desirability of states and to adjust the consequent part of its fuzzy control rules accordingly. Due to the different dynamics of the controlled processes, the performance of a self-learning fuzzy controller is highly contingent on its design. The design issue has not received sufficient attention. The issues related to the design of an SFLC for application to a petrochemical process are discussed, and its performance is compared with that of a PID and a self-tuning fuzzy logic controller.
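
    The core SFLC idea, reinforcement adjusting the consequent part of each fuzzy rule in proportion to how strongly the rule fired, can be sketched on a toy first-order plant. The memberships, plant, and constants below are illustrative assumptions, not the petrochemical process controller of the article:

```python
import numpy as np

rng = np.random.default_rng(5)

centers = np.linspace(-1.0, 1.0, 5)   # rule antecedent centers over the error
consequents = np.zeros(5)             # rule consequents, tuned by reinforcement

def fire(error, width=0.5):
    """Triangular membership degree of each rule for this error."""
    return np.clip(1.0 - np.abs(error - centers) / width, 0.0, 1.0)

y, setpoint, lr = 1.0, 0.0, 0.5
for step in range(5000):
    e = y - setpoint
    mu = fire(np.clip(e, -1.0, 1.0))
    w = mu / (mu.sum() + 1e-9)                     # normalized firing strengths
    u_nominal = w @ consequents                    # Sugeno-style defuzzification
    u = u_nominal + rng.normal(scale=0.1)          # exploratory control action
    y = 0.9 * y + 0.1 * u + rng.normal(scale=0.01) # toy first-order plant
    reinforcement = abs(e) - abs(y - setpoint)     # improvement in desirability
    # Rules that fired harder get more credit; consequents move toward the
    # explored action when it improved the state, away from it otherwise.
    consequents += lr * reinforcement * w * (u - u_nominal)
```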

  7. Application of Reinforcement Learning in Cognitive Radio Networks: Models and Algorithms

    Directory of Open Access Journals (Sweden)

    Kok-Lim Alvin Yau

    2014-01-01

    Cognitive radio (CR) enables unlicensed users to exploit underutilized portions of the licensed spectrum whilst minimizing interference to licensed users. Reinforcement learning (RL), an artificial intelligence approach, has been applied to enable each unlicensed user to observe and carry out optimal actions for performance enhancement in a wide range of schemes in CR, such as dynamic channel selection and channel sensing. This paper presents new discussions of RL in the context of CR networks. It provides an extensive review of how most schemes have been approached using traditional and enhanced RL algorithms through state, action, and reward representations. Examples of enhancements to RL, which do not appear in the traditional RL approach, are rules and cooperative learning. This paper also reviews performance enhancements brought about by the RL algorithms, along with open issues. This paper aims to establish a foundation in order to spark new research interest in this area. Our discussion is presented in a tutorial manner so that it is comprehensible to readers outside the specialty of RL and CR.
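
    As a concrete example of the state-action-reward formulation the review discusses, a minimal tabular Q-learning loop for dynamic channel selection might look like the sketch below. The sense callback and the reward convention (positive for a successful transmission, negative for interfering with a licensed user) are assumptions.

    ```python
    import random

    def q_channel_selection(n_channels, sense, steps=10000,
                            alpha=0.1, gamma=0.0, eps=0.1):
        """Tabular Q-learning over channels; with gamma=0 this reduces to a
        bandit-style learner, a common simplification for channel selection."""
        q = [0.0] * n_channels
        for _ in range(steps):
            # Epsilon-greedy action selection over the available channels.
            ch = random.randrange(n_channels) if random.random() < eps \
                else max(range(n_channels), key=q.__getitem__)
            r = sense(ch)  # assumed environment callback returning a reward
            q[ch] += alpha * (r + gamma * max(q) - q[ch])
        return q
    ```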

  8. Axial Compression Tests on Corroded Reinforced Concrete Columns Consolidated with Fibre Reinforced Polymers

    Directory of Open Access Journals (Sweden)

    Bin Ding

    2017-06-01

    Reinforced concrete structures, featuring strong bearing capacity, high rigidity, good integrity, good fire resistance, and broad applicability, occupy a mainstream position in contemporary architecture. However, with socio-economic development, demands on architectural structures have risen; durability, in particular, has been extensively researched. Because of these higher demands on building materials, ordinary reinforced concrete structures can no longer satisfy them. As a result, some new materials and structures have emerged, for example fibre reinforced polymers. Compared to steel reinforcement, fibre reinforced polymers have many advantages, such as high tensile strength, good durability, good shock absorption, low weight, and simple construction. The application of fibre reinforced polymers in architectural structures can effectively improve the durability of the concrete structure and lower maintenance, reinforcement, and construction costs in severe environments. Based on the concepts of steel tube concrete, fibre reinforced composite material confined concrete, and fibre reinforced composite material tubed concrete, this study proposes a novel composite structure, i.e., a fibre reinforced composite material and steel tube concrete composite structure. The structure was developed by pasting fibre around steel tube concrete and confining the core concrete with fibre reinforced composite material and steel tubes. The bearing capacity and ultimate deformation capacity of the structure were tested using column axial compression tests.

  9. Virtual Learning Environments and Learning Forms -experiments in ICT-based learning

    DEFF Research Database (Denmark)

    Helbo, Jan; Knudsen, Morten

    2004-01-01

    This paper reports the main results of a three-year experiment in ICT-based distance learning. The results are based on a full-scale experiment in the Master of Industrial Information Technology (MII) education, one of many projects deeply rooted in the project Virtual Learning Environments...... and Learning forms (ViLL). The experiment was to transfer a well-functioning on-campus engineering programme based on project-organized collaborative learning to a technology-supported distance education programme. After three years the experiments indicate that adjustments are required in this transformation....... The main problem is that we do not find the same self-regulating learning effect in the group work among the off-campus students as is the case for on-campus students. Based on feedback from evaluation questionnaires and discussions with the students, didactic adjustments have been made. The revised...

  10. Ethnography in the Danish Veterinary Learning Environment

    Directory of Open Access Journals (Sweden)

    Camilla Kirketerp Nielsen

    2015-11-01

    The overall objective of this project is the research-based development, implementation and evaluation of a game-based learning concept to be used in veterinary education. Herd visits and animal contact are essential for the development of veterinary competences and skills during education, yet veterinary students have little occasion to attain a proper level of confidence in their own skills and abilities, as they have limited "training facilities" (Kneebone & Baillie, 2008). One possible solution might be to provide a safe, virtual (game-based) environment where students could practise interdisciplinary clinical skills in an easily accessible, interactive setting. A playable demo using Classical Swine Fever in a pig herd as an example has been produced for this purpose. In order to tailor the game concept to the specific veterinary learning environment and to ensure compliance with both the learning objectives and the actual learning processes of the veterinary students, the project contains both a developmental aspect (game development) and an exploration of the academic (scholastic) and profession (practice)-oriented learning context. The initial phase of the project was a preliminary exploration of the actual learning context, providing an important starting point for the upcoming phase, in which I will concentrate on research-based development, implementation and evaluation of a game-based virtual environment in this course context. In the academic (scholastic) and profession (practice)-oriented learning context of a veterinary course in Herd Health Management (Pig module), ethnographic studies have been conducted using multiple data collection methods: participant observation, spontaneous dialogues and interviews (Borgnakke, 1996; Hammersley & Atkinson, 2007). All course-related activities in the different learning spaces (commercial pig herds, auditoriums, post-mortem examinations, independent group work) were followed. This paper will

  11. Students' experiences with collaborative learning in asynchronous computer-supported collaborative learning environments.

    NARCIS (Netherlands)

    Dewiyanti, Silvia; Brand-Gruwel, Saskia; Jochems, Wim; Broers, Nick

    2008-01-01

    Dewiyanti, S., Brand-Gruwel, S., Jochems, W., & Broers, N. (2007). Students' experiences with collaborative learning in asynchronous computer-supported collaborative learning environments. Computers in Human Behavior, 23, 496-514.

  12. Shifting workplace behavior to inspire learning: a journey to building a learning culture.

    Science.gov (United States)

    Schoonbeek, Sue; Henderson, Amanda

    2011-01-01

    This article discusses the process of building a learning culture. It began with establishing acceptance of, and connection with, the nurse unit manager and the ward team. In the early phases of developing rapport, bullying became apparent. Because bullying undermines sharing and trust, the hallmarks of learning environments, the early intervention work assisted staff in recognizing and counteracting bullying behaviors. When predominantly positive relationships were restored, interactions that facilitate open communication, including asking questions and providing feedback (behaviors commensurate with learning in the workplace), were developed during regular in-service sessions. Staff participated in role-play and role-modeling of desired behaviors. Once staff became knowledgeable about positive learning interactions, reward and recognition strategies began to reinforce attitudes and behaviors that align with learning. Through rewards, all nurses had the opportunity to be recognized for their contribution. Nurses who excelled were invited to become champions, continuing to engage the key stakeholders to further build the learning environment. Copyright 2011, SLACK Incorporated.

  13. Hipatia: a hypermedia learning environment in mathematics

    Directory of Open Access Journals (Sweden)

    Marisol Cueli

    2016-01-01

    The literature reveals the benefits of different instruments for the development of mathematical competence, problem solving, self-regulated learning, affective-motivational aspects, and intervention for students with specific difficulties in mathematics. However, no single tool combined all these variables. The aim of this study is to present and describe the design and development of a hypermedia tool, Hipatia. Hypermedia environments are, by definition, adaptive learning systems, usually web-based application programs that provide a personalized learning environment. This paper describes the principles on which Hipatia is based, as well as a review of available technologies developed for different academic subjects. Hipatia was created to boost self-regulated learning, develop specific math skills, and promote effective problem solving. It was targeted toward fifth and sixth grade students with and without learning difficulties in mathematics. After developing the tool, we concluded that it aligned well with the logic underlying the principles of self-regulated learning. Future research is needed to test the efficacy of Hipatia with an empirical methodology.

  14. Virtual language learning environments: the standardization of evaluation

    Directory of Open Access Journals (Sweden)

    Francesca Romero Forteza

    2014-03-01

    Nowadays there are many approaches aimed at helping learners acquire knowledge through the Internet. Virtual Learning Environments (VLEs) facilitate the acquisition and practice of skills, but some of these learning platforms are not evaluated, or do not follow a standard that guarantees the quality of the tasks involved. In this paper, we set out a proposal for the standardization of the evaluation of VLEs available on the World Wide Web. The main objective of this study is thus to establish an evaluation template with which to test whether a VLE is appropriate for computer-assisted language learning (CALL). In the methodology section, a learning platform is analysed and tested to establish the characteristics learning platforms must have. Having established the design of the template for language learning environments, we conclude that a VLE must be versatile enough for application with different language learning and teaching approaches.

  15. Blended learning for reinforcing dental pharmacology in the clinical years: A qualitative analysis.

    Science.gov (United States)

    Eachempati, Prashanti; Kiran Kumar, K S; Sumanth, K N

    2016-10-01

    Blended learning has become a method of choice in educational institutions because of its systematic integration of traditional classroom teaching and online components. This study aims to analyze students' reflections regarding blended learning in dental pharmacology. A cross-sectional study was conducted in the Faculty of Dentistry, Melaka-Manipal Medical College, among 3rd and 4th year BDS students. A total of 145 dental students who consented participated in the study. Students were divided into 14 groups. Nine online sessions followed by nine face-to-face discussions were held. Each session addressed topics related to oral lesions and orofacial pain with pharmacological applications. After each week, students were asked to reflect on blended learning. On completion of 9 weeks, reflections were collected and analyzed. Qualitative analysis was done using the thematic analysis model suggested by Braun and Clarke. Four main themes were identified, namely, merits of blended learning, skill in writing prescriptions for oral diseases, dosages of drugs, and identification of strengths and weaknesses. In general, the participants gave positive feedback regarding blended learning. Students felt more confident in drug selection and prescription writing, and could recollect the doses better after the online and face-to-face sessions. Most interestingly, the students reflected that they were able to identify their strengths and weaknesses after the blended learning sessions. The blended learning module was successfully implemented for reinforcing dental pharmacology. The results obtained in this study enable us to plan future comparative studies on the effectiveness of blended learning in dental pharmacology.

  16. CAPES: Unsupervised Storage Performance Tuning Using Neural Network-Based Deep Reinforcement Learning

    CERN Multimedia

    CERN. Geneva

    2017-01-01

    Parameter tuning is an important task in storage performance optimization. Current practice usually involves numerous tweak-benchmark cycles that are slow and costly. To address this issue, we developed CAPES, a model-less, deep reinforcement learning-based unsupervised parameter tuning system driven by a deep neural network (DNN). It is designed to find the optimal values of tunable parameters in computer systems, from a simple client-server system to a large data center, where human tuning can be costly and often cannot achieve optimal performance. CAPES takes periodic measurements of a target computer system's state, and trains a DNN which uses Q-learning to suggest changes to the system's current parameter values. CAPES is minimally intrusive, and can be deployed into a production system to collect training data and suggest tuning actions during the system's daily operation. Evaluation of a prototype on a Lustre system demonstrates an increase in I/O throughput of up to 45% at the saturation point.
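
    CAPES itself uses a deep neural network, but the Q-learning loop it is built on can be conveyed with a heavily simplified tabular stand-in. The measure and apply_delta callbacks (a binned throughput reading and a single-parameter nudge) are assumptions for illustration, not the CAPES interface.

    ```python
    import random
    from collections import defaultdict

    def tune(measure, apply_delta, steps=5000, alpha=0.1, gamma=0.9, eps=0.1):
        """Tabular Q-learning stand-in for a DNN-based tuner (sketch only)."""
        actions = (-1, 0, +1)        # nudge one tunable parameter down/hold/up
        q = defaultdict(float)
        state, _ = measure()         # e.g. a binned I/O throughput reading
        for _ in range(steps):
            a = random.choice(actions) if random.random() < eps \
                else max(actions, key=lambda x: q[(state, x)])
            apply_delta(a)           # assumed callback: apply the nudge
            nxt, reward = measure()
            best_next = max(q[(nxt, x)] for x in actions)
            q[(state, a)] += alpha * (reward + gamma * best_next - q[(state, a)])
            state = nxt
        return q
    ```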

  17. A Semi-Open Learning Environment for Mobile Robotics

    Directory of Open Access Journals (Sweden)

    Enrique Sucar

    2007-05-01

    We have developed a semi-open learning environment for mobile robotics that supports learning through free exploration, but with specific performance criteria that guide the learning process. The environment includes virtual and remote robotics laboratories, and an intelligent virtual assistant that guides students in using the labs. A series of experiments in the virtual and remote labs is designed to gradually teach the basics of mobile robotics. Each experiment considers exploration and performance aspects, which are evaluated by the virtual assistant, giving feedback to the user. The virtual laboratory has been incorporated into a course in mobile robotics and used by a group of students. A preliminary evaluation shows that the intelligent tutor combined with the virtual laboratory can improve the learning process.

  18. Lung Nodule Detection via Deep Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Issa Ali

    2018-04-01

    Lung cancer is the most common cause of cancer-related death globally. As a preventive measure, the United States Preventive Services Task Force (USPSTF) recommends annual screening of high-risk individuals with low-dose computed tomography (CT). The resulting volume of CT scans from millions of people will pose a significant challenge for radiologists to interpret. To fill this gap, computer-aided detection (CAD) algorithms may prove to be the most promising solution. A crucial first step in the analysis of lung cancer screening results using CAD is the detection of pulmonary nodules, which may represent early-stage lung cancer. The objective of this work is to develop and validate a reinforcement learning model based on deep artificial neural networks for early detection of lung nodules in thoracic CT images. Inspired by the AlphaGo system, our deep learning algorithm takes a raw CT image as input, views it as a collection of states, and outputs a classification of whether a nodule is present or not. The dataset used to train our model is the LIDC/IDRI database hosted by the lung nodule analysis (LUNA) challenge. In total, there are 888 CT scans with annotations based on agreement from at least three out of four radiologists. Of these, 590 individuals have one or more nodules, and 298 have none. Our training results yielded an overall accuracy of 99.1% [sensitivity 99.2%, specificity 99.1%, positive predictive value (PPV) 99.1%, negative predictive value (NPV) 99.2%]. In our test, the results yielded an overall accuracy of 64.4% (sensitivity 58.9%, specificity 55.3%, PPV 54.2%, and NPV 60.0%). These early results show promise in addressing the major issue of false positives in CT screening for lung nodules, and may help to avoid unnecessary follow-up tests and expenditures.
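
    For readers who want to recompute such screening metrics, the values reported above are standard functions of the confusion-matrix counts. The sketch below states only the definitions; it does not reproduce the study's counts.

    ```python
    def screening_metrics(tp, fp, tn, fn):
        """Standard detection metrics from confusion-matrix counts
        (assumes each denominator is nonzero)."""
        return {
            "accuracy":    (tp + tn) / (tp + tn + fp + fn),
            "sensitivity": tp / (tp + fn),   # true positive rate
            "specificity": tn / (tn + fp),   # true negative rate
            "ppv":         tp / (tp + fp),   # positive predictive value
            "npv":         tn / (tn + fn),   # negative predictive value
        }
    ```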

  19. Gendered Learning Environments in Managerial Work

    Science.gov (United States)

    Gustavsson, Maria; Eriksson, Anna Fogelberg

    2010-01-01

    The aim is to investigate female and male managers' learning environments, with particular focus on their opportunities for and barriers to learning and career development in the managerial work of a male-dominated industrial company. In the case study, 42 managers in the company (15 women and 27 men) were interviewed. The findings demonstrate that…

  20. The influence of the damaged reinforcing bars on the stress-strain state of the reinforced concrete beams

    Directory of Open Access Journals (Sweden)

    Zenoviy Blikharskyy

    2017-04-01

    This article provides an overview of experimental research on reinforced concrete beams under the simultaneous influence of a corrosive environment and loading. Tests were carried out on reinforced concrete specimens subjected to corrosion in an acid environment, namely 10% H2SO4, taken as a model of the aggressive environment. The beams have a span of 1.9 m, with series differing in tensile reinforcement, concrete compressive strength, and length of corrosion exposure (continuous and local). The influence of the simultaneous action of the aggressive environment and loading on the strength of reinforced concrete beams is described. For a detailed study of the effect of individual components, additional experimental modelling was proposed in which only the tensile reinforcement is damaged, without concrete damage, in order to investigate this factor independently of the concrete.

  1. Appreciation of learning environment and development of higher-order learning skills in a problem-based learning medical curriculum.

    Science.gov (United States)

    Mala-Maung; Abdullah, Azman; Abas, Zoraini W

    2011-12-01

    This cross-sectional study determined the appreciation of the learning environment and the development of higher-order learning skills among students attending the medical curriculum at the International Medical University, Malaysia, which provides traditional and e-learning resources with an emphasis on problem-based learning (PBL) and self-directed learning. Of the 708 participants, the majority preferred traditional over e-resources. Students who highly appreciated PBL demonstrated a higher appreciation of e-resources. Appreciation of PBL is positively and significantly correlated with higher-order learning skills, reflecting the inculcation of self-directed learning traits. Implementers must be sensitive to the progress of learners adapting to the higher-education environment and innovations, and must address limitations as relevant.

  2. Nigerian Physiotherapy Clinical Students' Perception of Their Learning Environment Measured by the Dundee Ready Education Environment Measure Inventory

    Science.gov (United States)

    Odole, Adesola C.; Oyewole, Olufemi O.; Ogunmola, Oluwasolape T.

    2014-01-01

    The identification of the learning environment and an understanding of how students learn will help teachers facilitate learning and plan a curriculum that achieves the intended learning outcomes. The purpose of this study was to investigate undergraduate physiotherapy clinical students' perception of the University of Ibadan's learning environment. Using the…

  3. A Novel Dynamic Spectrum Access Framework Based on Reinforcement Learning for Cognitive Radio Sensor Networks

    Directory of Open Access Journals (Sweden)

    Yun Lin

    2016-10-01

    Cognitive radio sensor networks are one kind of application where cognitive techniques can be adopted, and they present many potential applications, challenges and future research trends. According to research surveys, dynamic spectrum access is an important and necessary technology for future cognitive sensor networks. Traditional methods of dynamic spectrum access are based on spectrum holes and have some drawbacks, such as low accessibility and high interruptibility, which negatively affect the transmission performance of the sensor networks. To address this problem, this paper proposes a new initialization mechanism to establish a communication link and set up a sensor network without adopting spectrum holes to convey control information. Specifically, a transmission channel model for analyzing the maximum accessible capacity for three different policies in a fading environment is first discussed. Secondly, a hybrid spectrum access algorithm based on a reinforcement learning model is proposed for the power allocation problem of both the transmission channel and the control channel. Finally, extensive simulations have been conducted, and the results show that the new algorithm provides a significant improvement in the tradeoff between control channel reliability and transmission channel efficiency.
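
    A bandit-style reduction of the power-allocation step might look like the following sketch. The joint (channel, power) action space and the reward_fn callback encoding the throughput/reliability trade-off are illustrative assumptions, not the paper's algorithm.

    ```python
    import random

    def hybrid_access(n_channels, power_levels, reward_fn, steps=20000,
                      alpha=0.05, eps=0.1):
        """Learn a running value estimate for each (channel, power) pair and
        return the best one found (illustrative sketch)."""
        q = {(c, p): 0.0 for c in range(n_channels) for p in power_levels}
        for _ in range(steps):
            # Epsilon-greedy exploration over joint channel/power actions.
            if random.random() < eps:
                c, p = random.choice(list(q))
            else:
                c, p = max(q, key=q.get)
            # Incremental update toward the observed reward.
            q[(c, p)] += alpha * (reward_fn(c, p) - q[(c, p)])
        return max(q, key=q.get)
    ```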

  4. Students' Conception of Learning Environment and Their Approach to Learning and Its Implication on Quality Education

    Science.gov (United States)

    Belaineh, Matheas Shemelis

    2017-01-01

    Quality of education in higher institutions can be affected by different factors. It partly rests on the learning environment created by teachers and the learning approach students employ during their learning. The main purpose of this study is to examine the learning environment at Mizan Tepi University from the students' perspective and their…

  5. Students' perceptions of learning environment in Guilan University of Medical Sciences

    Directory of Open Access Journals (Sweden)

    Mahdokht Taheri

    2013-05-01

    Background and purpose: There has been increasing interest in, and concern regarding, the role of the learning environment in undergraduate medical education in recent years. The educational environment is one of the most important factors determining the success of an effective curriculum, and its quality has been identified as crucial for effective learning. We compared the perceptions of basic-sciences and clinical-phase students regarding the learning environment, and also identified gender-related differences in their perceptions. Method: The Dundee Ready Education Environment Measure (DREEM) inventory was used; the total score for all subscales is 200. DREEM was administered to undergraduate medical students in the basic sciences (n = 120) and the clinical phase (n = 100), and the scores were compared using a nonparametric test. Results: Of the two groups, basic-sciences students were found to be more satisfied with the learning environment at GUMS than clinical-phase students. Gender-wise, there was not much difference in the students' perceptions. Conclusion: This study revealed that both groups of students perceived the learning environment as relatively more negative than positive at GUMS. It is essential for faculty members to put more effort into observing the principles of instructional design and to create an appropriate educational environment in order to provide better learning for students. Keywords: learning environment, medical school.

  6. Online EEG-Based Workload Adaptation of an Arithmetic Learning Environment.

    Science.gov (United States)

    Walter, Carina; Rosenstiel, Wolfgang; Bogdan, Martin; Gerjets, Peter; Spüler, Martin

    2017-01-01

    In this paper, we demonstrate a closed-loop EEG-based learning environment that adapts instructional learning material online to improve learning success in students during arithmetic learning. The amount of cognitive workload during learning is crucial for successful learning and should be held in the optimal range for each learner. Based on EEG data from 10 subjects, we created a prediction model that estimates the learner's workload to obtain an unobtrusive workload measure. Furthermore, we developed an interactive learning environment that uses the prediction model to estimate the learner's workload online based on the EEG data, and adapts the difficulty of the learning material to keep the learner's workload in an optimal range. The EEG-based learning environment was used by 13 subjects to learn arithmetic addition in the octal number system, leading to a significant learning effect. The results suggest that it is feasible to use EEG as an unobtrusive measure of cognitive workload to adapt the learning content. Furthermore, they demonstrate that prompt workload prediction is possible using a generalized prediction model, without the need for user-specific calibration.
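
    The closed-loop adaptation logic can be summarized in a few lines. The workload band, the predict_workload stand-in for the paper's EEG regression model, and the present_task callback are assumptions for illustration.

    ```python
    def adapt_difficulty(predict_workload, present_task, difficulty=3,
                         low=0.4, high=0.7, n_trials=60):
        """Keep estimated workload inside an assumed optimal band [low, high]
        by stepping task difficulty up or down (illustrative sketch)."""
        for _ in range(n_trials):
            eeg = present_task(difficulty)     # record EEG for one task
            w = predict_workload(eeg)          # estimated workload in [0, 1]
            if w > high:
                difficulty = max(1, difficulty - 1)  # too hard: ease off
            elif w < low:
                difficulty += 1                      # too easy: step up
        return difficulty
    ```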

  7. Effect of Chloride on Tensile and Bending Capacities of Basalt FRP Mesh Reinforced Cementitious Thin Plates under Indoor and Marine Environments

    Directory of Open Access Journals (Sweden)

    Yan Xie

    2016-01-01

    This paper presents an experimental durability study of thin basalt fiber reinforced polymer (BFRP) mesh reinforced cementitious plates under indoor and marine environments. The marine environment was simulated by wetting/drying cycles (wetting in salt water and drying in hot air). After 12 months of exposure, the effects of chloride on the tensile and bending behaviors of the thin plates were investigated. In addition to penetrating from the salt water, the chloride in the thin plate could also come from the sea sand, which is a component of the plate. Experimental results showed that the effect of indoor exposure on the tensile capacity of the plate is not pronounced, while marine exposure reduced the tensile capacity significantly. The bending capacity of the thin plates was remarkably reduced by both indoor and marine environmental exposure, with the effect of the marine environment being more severe. The tensile capacity of meshes extracted from the thin plates was tested, as well as that of meshes immersed in salt solution for 30, 60, and 90 days. The test results confirmed that chloride is the cause of the BFRP mesh deterioration. Moreover, as a comparison, a steel mesh reinforced thin plate was also tested and showed similar durability performance.

  8. Improvement of Inquiry in a Complex Technology-Enhanced Learning Environment

    NARCIS (Netherlands)

    Pedaste, Margus; Kori, Külli; Maeots, Mario; de Jong, Anthonius J.M.; Riopel, Martin; Smyrnaiou, Zacharoula

    2016-01-01

    Inquiry learning is an effective approach in science education. Complex technology-enhanced learning environments are needed to apply inquiry worldwide to support knowledge gain and improvement of inquiry skills. In our study, we applied an ecology mission in the SCY-Lab learning environment and

  9. Personal Learning Environments for Supporting Out-of-Class Language Learning

    Science.gov (United States)

    Reinders, Hayo

    2014-01-01

    A Personal Learning Environment (PLE) is a combination of tools (usually digital) and resources chosen by the learner to support different aspects of the learning process, from goal setting to materials selection to assessment. The importance of PLEs for teachers lies in their ability to help students develop autonomy and prepare them for…

  10. Learning in Non-Stationary Environments Methods and Applications

    CERN Document Server

    Lughofer, Edwin

    2012-01-01

    Recent decades have seen rapid advances in automatization processes, supported by modern machines and computers. The result is significant increases in system complexity and state changes, information sources, the need for faster data handling and the integration of environmental influences. Intelligent systems, equipped with a taxonomy of data-driven system identification and machine learning algorithms, can handle these problems partially. Conventional learning algorithms in a batch off-line setting fail whenever dynamic changes of the process appear due to non-stationary environments and external influences.   Learning in Non-Stationary Environments: Methods and Applications offers a wide-ranging, comprehensive review of recent developments and important methodologies in the field. The coverage focuses on dynamic learning in unsupervised problems, dynamic learning in supervised classification and dynamic learning in supervised regression problems. A later section is dedicated to applications in which dyna...

  11. Personal Learning Environments in Black and White

    NARCIS (Netherlands)

    Kalz, Marco

    2010-01-01

    Kalz, M. (2010, 22 January). Personal Learning Environments in Black and White. Presentation provided during the workshop "Informal Learning and the use of social software in veterinary medicine" of the Noviceproject (http://www.noviceproject.eu), Utrecht, The Netherlands.

  12. Supporting Student Learning in Computer Science Education via the Adaptive Learning Environment ALMA

    Directory of Open Access Journals (Sweden)

    Alexandra Gasparinatou

    2015-10-01

    This study presents the ALMA environment (Adaptive Learning Models from texts and Activities). ALMA supports the processes of learning and assessment via: (1) texts differing in local and global cohesion for students with low, medium, and high background knowledge; (2) activities corresponding to different levels of comprehension, which prompt the student to practically implement different text-reading strategies, with the recommended activity sequence adapted to the student's learning style; (3) an overall framework for informing, guiding, and supporting students in performing the activities; and (4) individualized support and guidance according to student-specific characteristics. ALMA also supports students in distance learning or in blended learning, in which face-to-face learning is supported by computer technology. The adaptive techniques provided via ALMA are (a) adaptive presentation and (b) adaptive navigation. Digital learning material, in accordance with the text comprehension model described by Kintsch, was introduced into the ALMA environment. This material can be exploited in either distance or blended learning.

  13. Nursing students' satisfaction of the clinical learning environment: a research study.

    Science.gov (United States)

    Papastavrou, Evridiki; Dimitriadou, Maria; Tsangari, Haritini; Andreou, Christos

    2016-01-01

    The acquisition of quality clinical experience within a supportive and pedagogically adjusted clinical learning environment is a significant concern for educational institutions. The quality of clinical learning usually reflects the quality of the curriculum structure, and the assessment of clinical settings as learning environments is a significant concern within contemporary nursing education. Nursing students' satisfaction is considered an important factor in such assessment, contributing to any potential reforms aimed at optimizing the learning activities and achievements within clinical settings. The aim of the study was to investigate nursing students' satisfaction with the clinical settings as learning environments. A quantitative descriptive, correlational design was used. A sample of 463 undergraduate nursing students from three universities in Cyprus participated. Data were collected using the Clinical Learning Environment, Supervision and Nurse Teacher (CLES + T) scale. Nursing students were highly satisfied with the clinical learning environment, and their satisfaction was positively related to all clinical learning environment constructs, namely the pedagogical atmosphere, the Ward Manager's leadership style, the premises of nursing in the ward, the supervisory relationship (mentor) and the role of the Nurse Teacher. The frequency of meetings between the students and the mentors increased the students' satisfaction with the clinical learning environment. It was also revealed that 1st-year students were more satisfied than students in other years. The supervisory relationship was evaluated by the students as the most influential factor in their satisfaction with the clinical learning environment. Students' acceptance within the nursing team and well-documented individual nursing care were also related to students' satisfaction. The pedagogical atmosphere is considered pivotal, with reference to

  14. Mobile Learning Environment System (MLES): The Case of Android-based Learning Application on Undergraduates' Learning

    OpenAIRE

    Hanafi, Hafizul Fahri; Samsudin, Khairulanuar

    2012-01-01

    Of late, mobile technology has introduced a novel environment that can be capitalized on to further enrich the teaching and learning process in classrooms. Taking cognizance of this promising setting, a study was undertaken to investigate the impact of such an environment, enabled by the Android platform, on the learning process among undergraduates of Sultan Idris Education University, Malaysia; in particular, this paper discusses critical aspects of the design and implementation of the android le...

  15. COOPERATIVE LEARNING ENVIRONMENT WITH THE WEB 2.0 TOOL E-PORTFOLIOS

    Directory of Open Access Journals (Sweden)

    Soh OR KAN

    2011-07-01

    In recent years, the development of information and communication technology (ICT) in the world, and in Malaysia in particular, has created a significant impact on the methods of communicating information and knowledge to learners; consequently, innovative teaching techniques have evolved to change the ways teachers teach and the ways students learn. This study focuses on developing a cooperative learning environment to promote active learning in Malaysian smart schools. Within this learning process, multimedia technology and Web 2.0 tools, namely MyPortfolio, were integrated to allow students to learn on their own as well as to document their progress and experience within this cooperative learning environment. The core purpose of this study is to establish the impact on student learning, and the perceptions and learning experiences, of a cooperative learning environment using Web 2.0 tools among smart secondary school students in Malaysia. Surveys were conducted with students to ascertain their reactions towards these learning environment activities. The results of this project were encouraging, as the students managed to work with each other to reach their common goal. The use of blogs acted as an important tool to enhance team cooperation and to foster a learning community within the class.

  16. Personalised Peer-Supported Learning: The Peer-to-Peer Learning Environment (P2PLE)

    Science.gov (United States)

    Corneli, Joseph; Mikroyannidis, Alexander

    2011-01-01

    The Peer-to-Peer Learning Environment (P2PLE) is a proposed approach to helping learners co-construct their learning environment using recommendations about people, content, and tools. The work draws on current research on PLEs, and participant observation at the Peer-to-Peer University (P2PU). We are particularly interested in ways of eliciting…

  17. Personalized e-Learning Environments: Considering Students' Contexts

    Science.gov (United States)

    Eyharabide, Victoria; Gasparini, Isabela; Schiaffino, Silvia; Pimenta, Marcelo; Amandi, Analía

    Personalization in e-learning systems is vital since they are used by a wide variety of students with different characteristics. There are several approaches that aim at personalizing e-learning environments. However, they focus mainly on technological and/or networking aspects without attending to contextual aspects, considering only a limited version of context while providing personalization. In our work, the objective is to improve e-learning environment personalization by making use of a better understanding and modeling of the user's educational and technological context, using ontologies. We show an example of the use of our proposal in the AdaptWeb system, in which content and navigation recommendations are provided depending on the student's context.

  18. The partial-reinforcement extinction effect and the contingent-sampling hypothesis.

    Science.gov (United States)

    Hochman, Guy; Erev, Ido

    2013-12-01

    The partial-reinforcement extinction effect (PREE) implies that learning under partial reinforcements is more robust than learning under full reinforcements. While the advantages of partial reinforcements have been well-documented in laboratory studies, field research has failed to support this prediction. In the present study, we aimed to clarify this pattern. Experiment 1 showed that partial reinforcements increase the tendency to select the promoted option during extinction; however, this effect is much smaller than the negative effect of partial reinforcements on the tendency to select the promoted option during the training phase. Experiment 2 demonstrated that the overall effect of partial reinforcements varies inversely with the attractiveness of the alternative to the promoted behavior: The overall effect is negative when the alternative is relatively attractive, and positive when the alternative is relatively unattractive. These results can be captured with a contingent-sampling model assuming that people select options that provided the best payoff in similar past experiences. The best fit was obtained under the assumption that similarity is defined by the sequence of the last four outcomes.
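
    A minimal implementation of the contingent-sampling idea, under an assumed history layout of (chosen option, payoff) pairs, could look like this; the paper reports the best fit when similarity is defined by the sequence of the last four outcomes, hence the default k=4.

    ```python
    from collections import defaultdict

    def contingent_sampling_choice(history, options, k=4):
        """Choose the option with the best mean payoff on past trials that
        followed the same sequence of the last k outcomes (sketch only;
        `history` is an assumed list of (option, payoff) pairs, oldest first)."""
        payoffs = [p for _, p in history]
        if len(history) <= k:
            return options[0]                      # not enough experience yet
        current = tuple(payoffs[-k:])
        sums, counts = defaultdict(float), defaultdict(int)
        for i in range(k, len(history)):
            if tuple(payoffs[i - k:i]) == current:  # a similar past situation
                opt, payoff = history[i]
                sums[opt] += payoff
                counts[opt] += 1
        if not counts:
            return options[0]                      # no similar experiences
        return max(counts, key=lambda o: sums[o] / counts[o])
    ```

    Under this rule, the short-term autocorrelation of outcomes during training and the regression to a stable distribution during extinction jointly produce the mixed effects of partial reinforcement the experiments report.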

  19. Miscellany of Students' Satisfaction in an Asynchronous Learning Environment

    Science.gov (United States)

    Larbi-Siaw, Otu; Owusu-Agyeman, Yaw

    2017-01-01

    This study investigates the determinants of students' satisfaction in an asynchronous learning environment using seven key considerations: the e-learning environment, student-content interaction, student-student interaction, student-teacher interaction, group cohesion and timely participation, knowledge of Internet usage, and satisfaction. The…

  20. Virtual Learning Environments and Learning Forms -experiments in ICT-based learning

    DEFF Research Database (Denmark)

    Helbo, Jan; Knudsen, Morten

    2004-01-01

    This paper reports the main results of a three-year experiment in ICT-based distance learning. The results are based on a full-scale experiment in the Master of Industrial Information Technology (MII) education, one of many projects deeply rooted in the project Virtual Learning Environments...... didactic model has until now been a positive experience........ The main problem is that we do not find the same self-regulating learning effect in the group work among the off-campus students as is the case for on-campus students. Based on feedback from evaluation questionnaires and discussions with the students, didactic adjustments have been made. The revised...

  1. The role of multiple neuromodulators in reinforcement learning that is based on competition between eligibility traces

    Directory of Open Access Journals (Sweden)

    Marco A Huertas

    2016-12-01

    The ability to maximize reward and avoid punishment is essential for animal survival. Reinforcement learning (RL) refers to the algorithms used by biological or artificial systems to learn how to maximize reward or avoid negative outcomes based on past experiences. While RL is also important in machine learning, the types of mechanistic constraints encountered by biological machinery might be different than those for artificial systems. Two major problems encountered by RL are how to relate a stimulus with a reinforcing signal that is delayed in time (temporal credit assignment), and how to stop learning once the target behaviors are attained (stopping rule). To address the first problem, synaptic eligibility traces were introduced, bridging the temporal gap between a stimulus and its reward. Although these were mere theoretical constructs, recent experiments have provided evidence of their existence. These experiments also reveal that the presence of specific neuromodulators converts the traces into changes in synaptic efficacy. A mechanistic implementation of the stopping rule usually assumes the inhibition of the reward nucleus; however, recent experimental results have shown that learning terminates at the appropriate network state even in setups where the reward cannot be inhibited. In an effort to describe a learning rule that solves the temporal credit assignment problem and implements a biologically plausible stopping rule, we proposed a model based on two separate synaptic eligibility traces, one for long-term potentiation (LTP) and one for long-term depression (LTD), each obeying different dynamics and having different effective magnitudes. The model has been shown to successfully generate stable learning in recurrent networks. Although the model assumes the presence of a single neuromodulator, evidence indicates that there are different neuromodulators for expressing the different traces. What could be the role of different

  2. The Role of Multiple Neuromodulators in Reinforcement Learning That Is Based on Competition between Eligibility Traces.

    Science.gov (United States)

    Huertas, Marco A; Schwettmann, Sarah E; Shouval, Harel Z

    2016-01-01

    The ability to maximize reward and avoid punishment is essential for animal survival. Reinforcement learning (RL) refers to the algorithms used by biological or artificial systems to learn how to maximize reward or avoid negative outcomes based on past experiences. While RL is also important in machine learning, the types of mechanistic constraints encountered by biological machinery might be different than those for artificial systems. Two major problems encountered by RL are how to relate a stimulus with a reinforcing signal that is delayed in time (temporal credit assignment), and how to stop learning once the target behaviors are attained (stopping rule). To address the first problem, synaptic eligibility traces were introduced, bridging the temporal gap between a stimulus and its reward. Although these were mere theoretical constructs, recent experiments have provided evidence of their existence. These experiments also reveal that the presence of specific neuromodulators converts the traces into changes in synaptic efficacy. A mechanistic implementation of the stopping rule usually assumes the inhibition of the reward nucleus; however, recent experimental results have shown that learning terminates at the appropriate network state even in setups where the reward nucleus cannot be inhibited. In an effort to describe a learning rule that solves the temporal credit assignment problem and implements a biologically plausible stopping rule, we proposed a model based on two separate synaptic eligibility traces, one for long-term potentiation (LTP) and one for long-term depression (LTD), each obeying different dynamics and having different effective magnitudes. The model has been shown to successfully generate stable learning in recurrent networks. Although the model assumes the presence of a single neuromodulator, evidence indicates that there are different neuromodulators for expressing the different traces. What could be the role of different neuromodulators for
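
    A toy version of the two-trace rule described in these two records might be written as follows. All time constants, magnitudes, and the outer-product Hebbian term are illustrative assumptions rather than the published model's equations.

    ```python
    import numpy as np

    def dual_trace_update(w, pre, post, reward, T_ltp, T_ltd,
                          tau_ltp=0.5, tau_ltd=1.5, eta_ltp=1.0, eta_ltd=0.6,
                          dt=0.1, lr=0.01):
        """One step of a two-eligibility-trace rule: separate LTP and LTD
        traces with different decay dynamics and magnitudes; a reward-linked
        neuromodulatory signal converts their difference into weight change."""
        hebb = np.outer(post, pre)                           # coincident activity
        T_ltp += dt * (-T_ltp / tau_ltp + eta_ltp * hebb)    # faster, stronger trace
        T_ltd += dt * (-T_ltd / tau_ltd + eta_ltd * hebb)    # slower, weaker trace
        w += lr * reward * (T_ltp - T_ltd)   # learning stops as the traces balance
        return w, T_ltp, T_ltd
    ```

    The stopping rule falls out of the subtraction: once the two traces settle into balance at the target behavior, the net weight change vanishes even though the reward signal itself is not inhibited.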

  3. Deep Reinforcement Fuzzing

    OpenAIRE

    Böttinger, Konstantin; Godefroid, Patrice; Singh, Rishabh

    2018-01-01

    Fuzzing is the process of finding security vulnerabilities in input-processing code by repeatedly testing the code with modified inputs. In this paper, we formalize fuzzing as a reinforcement learning problem using the concept of Markov decision processes. This in turn allows us to apply state-of-the-art deep Q-learning algorithms that optimize rewards, which we define from runtime properties of the program under test. By observing the rewards caused by mutating with a specific set of actions...
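
    Under the paper's MDP framing, a simplified (tabular rather than deep) Q-learning fuzzing loop can be sketched as below. The mutate and run callbacks, with reward derived from runtime feedback such as newly covered code, are assumptions for illustration, not the paper's interface.

    ```python
    import random

    def rl_fuzz(seed, mutate, run, actions, rounds=10000,
                alpha=0.1, gamma=0.9, eps=0.2):
        """States abstract the current input, actions are mutation operators,
        and rewards come from runtime properties of the program under test."""
        q = {}
        inp = seed
        state, _ = run(inp)          # run(input) -> (state_abstraction, reward)
        for _ in range(rounds):
            a = random.choice(actions) if random.random() < eps \
                else max(actions, key=lambda x: q.get((state, x), 0.0))
            inp = mutate(inp, a)     # apply the chosen mutation operator
            nxt, reward = run(inp)   # e.g. reward > 0 for new coverage
            best = max(q.get((nxt, x), 0.0) for x in actions)
            cur = q.get((state, a), 0.0)
            q[(state, a)] = cur + alpha * (reward + gamma * best - cur)
            state = nxt
        return q
    ```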

  4. Creative and Playful Learning: Learning through Game Co-Creation and Games in a Playful Learning Environment

    Science.gov (United States)

    Kangas, Marjaana

    2010-01-01

    This paper reports on a pilot study in which children aged 7-12 (N = 68) had an opportunity to study in a novel formal and informal learning setting. The learning activities were extended from the classroom to the playful learning environment (PLE), an innovative playground enriched by technological tools. Curriculum-based learning was intertwined…

  5. Glass FRP reinforcement in rehabilitation of concrete marine infrastructure

    International Nuclear Information System (INIS)

    Newhook, John P.

    2006-01-01

    Fiber reinforced polymer (FRP) reinforcements for concrete structures are gaining wide acceptance as a suitable alternative to steel reinforcement. The primary advantage is that they do not suffer corrosion, and hence they promise to be more durable in environments where steel reinforced concrete has a limited life span. Concrete wharves and jetties are examples of structures subjected to such harsh environments, and they represent the general class of marine infrastructure in which glass FRP (GFRP) reinforcement should be used for improved durability and service life. General design considerations that make glass FRP suitable for use in marine concrete rehabilitation projects are discussed. A case study of a recent wharf rehabilitation project in Canada is used to reinforce these considerations. The structure consisted of a GFRP reinforced concrete deck panel and a steel-GFRP hybrid reinforced concrete pile cap. A design methodology is developed for the hybrid reinforcement design and verified through testing. The results of a field monitoring program are used to establish the satisfactory field performance of the GFRP reinforcement. The design concepts presented in the paper are applicable to many concrete marine components and other structures where steel reinforcement corrosion is a problem. (author)

  6. Applying a Framework for Student Modeling in Exploratory Learning Environments: Comparing Data Representation Granularity to Handle Environment Complexity

    Science.gov (United States)

    Fratamico, Lauren; Conati, Cristina; Kardan, Samad; Roll, Ido

    2017-01-01

    Interactive simulations can facilitate inquiry learning. However, similarly to other Exploratory Learning Environments, students may not always learn effectively in these unstructured environments. Thus, providing adaptive support has great potential to help improve student learning with these rich activities. Providing adaptive support requires a…

  7. Postgraduate trainees' perceptions of the learning environment in a ...

    African Journals Online (AJOL)

    Increased performance in both areas requires routine assessment of the learning environment to identify components that need attention. Objective. To evaluate the perception of junior doctors undergoing specialist training regarding the learning environment in a teaching hospital. Methods. This was a single-centre, ...

  8. Understanding dopamine and reinforcement learning: the dopamine reward prediction error hypothesis.

    Science.gov (United States)

    Glimcher, Paul W

    2011-09-13

    A number of recent advances have been achieved in the study of midbrain dopaminergic neurons. Understanding these advances and how they relate to one another requires a deep understanding of the computational models that serve as an explanatory framework and guide ongoing experimental inquiry. This intertwining of theory and experiment now suggests very clearly that the phasic activity of the midbrain dopamine neurons provides a global mechanism for synaptic modification. These synaptic modifications, in turn, provide the mechanistic underpinning for a specific class of reinforcement learning mechanisms that now seem to underlie much of human and animal behavior. This review describes both the critical empirical findings that are at the root of this conclusion and the fantastic theoretical advances from which this conclusion is drawn.
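
    The computational core of the hypothesis is the temporal-difference error, in which phasic dopamine activity is modeled as delta = r + gamma * V(s') - V(s). A minimal sketch of the corresponding value update (pseudocode in Python, not a claim about any specific experiment):

    ```python
    def td_update(V, s, r, s_next, alpha=0.1, gamma=0.95):
        """Temporal-difference value update; `V` maps states to values."""
        delta = r + gamma * V[s_next] - V[s]   # reward prediction error
        V[s] += alpha * delta                  # the synaptic modification step
        return delta
    ```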

  9. Comparative learning theory and its application in the training of horses.

    Science.gov (United States)

    Cooper, J J

    1998-11-01

    Training can best be explained as a process that occurs through stimulus-response-reinforcement chains, whereby animals are conditioned to associate cues in their environment with specific behavioural responses and their rewarding consequences. Research into learning in horses has concentrated on their powers of discrimination and on primary positive reinforcement schedules, where the correct response is paired with a desirable consequence such as food. In contrast, a number of other learning processes that are used in training have been widely studied in other species, but have received little scientific investigation in the horse. These include: negative reinforcement, where performance of the correct response is followed by the removal of, or a decrease in the intensity of, an unpleasant stimulus; punishment, where an incorrect response is paired with an undesirable consequence, but without consistent prior warning; secondary conditioning, where a natural primary reinforcer such as food is closely associated with an arbitrary secondary reinforcer such as vocal praise; and variable or partial conditioning, where, once the correct response has been learnt, reinforcement is presented according to an intermittent schedule to increase resistance to extinction outside of training.

  10. ADILE: Architecture of a database-supported learning environment

    NARCIS (Netherlands)

    Hiddink, G.W.

    2001-01-01

    This article proposes an architecture for distributed learning environments that use databases to store learning material. As the layout of learning material can inhibit reuse, the architecture implements the notion of "separation of layout and structure" using XML technology. Also, the

  11. Learning from data for aquatic and geotechnical environments

    NARCIS (Netherlands)

    Bhattacharya, B.

    2005-01-01

    The book presents machine learning as an approach to build models that learn from data, and that can be used to complement the existing modelling practice in aquatic and geotechnical environments. It provides concepts of learning from data, and identifies segmentation (clustering), classification,

  12. Burnout and the learning environment of anaesthetic trainees.

    Science.gov (United States)

    Castanelli, D J; Wickramaarachchi, S A; Wallis, S

    2017-11-01

    Burnout has a high prevalence among healthcare workers and is increasingly recognised as an environmental problem rather than as reflecting a personal inability to cope with work stress. We distributed an electronic survey, which included the Maslach Burnout Inventory Health Services Survey and a previously validated learning environment instrument, to 281 Victorian anaesthetic trainees. The response rate was 50%. We found significantly raised rates of burnout in two of three subscales. Ninety-one respondents (67%) displayed evidence of burnout in at least one domain, with 67 (49%) reporting high emotional exhaustion and 57 (42%) reporting high depersonalisation. The clinical learning environment tool demonstrated a significant negative correlation with burnout (r = -0.56). Burnout was significantly more common than when previously measured in Victoria in 2008 (62% versus 38%). Trainees rated examination preparation the most stressful aspect of the training program. There is a high prevalence of burnout among Victorian anaesthetic trainees. We have shown that a significant correlation exists between the clinical learning environment measure and the presence of burnout. This correlation supports the development of interventions to improve the clinical learning environment as a means to improve trainee wellbeing and address the high prevalence of burnout.

  13. Pedunculopontine tegmental nucleus lesions impair stimulus--reward learning in autoshaping and conditioned reinforcement paradigms.

    Science.gov (United States)

    Inglis, W L; Olmstead, M C; Robbins, T W

    2000-04-01

    The role of the pedunculopontine tegmental nucleus (PPTg) in stimulus-reward learning was assessed by testing the effects of PPTg lesions on performance in visual autoshaping and conditioned reinforcement (CRf) paradigms. Rats with PPTg lesions were unable to learn an association between a conditioned stimulus (CS) and a primary reward in either paradigm. In the autoshaping experiment, PPTg-lesioned rats approached the CS+ and CS- with equal frequency, and the latencies to respond to the two stimuli did not differ. PPTg lesions also disrupted discriminated approaches to an appetitive CS in the CRf paradigm and completely abolished the acquisition of responding with CRf. These data are discussed in the context of a possible cognitive function of the PPTg, particularly in terms of lesion-induced disruptions of attentional processes that are mediated by the thalamus.

  14. Corrosion of reinforcement induced by environment containing ...

    Indian Academy of Sciences (India)

    ... the action of chloride solutions may intensify the process of corrosion of steel reinforcement in comparison to the converse sequence of the action of the mentioned media. At the same time, the sodium chloride solution has been shown to be a more aggressive medium than the calcium and magnesium chloride solutions.

  15. Virtual Learning Environment for Interactive Engagement with Advanced Quantum Mechanics

    Science.gov (United States)

    Pedersen, Mads Kock; Skyum, Birk; Heck, Robert; Müller, Romain; Bason, Mark; Lieberoth, Andreas; Sherson, Jacob F.

    2016-06-01

    A virtual learning environment can engage university students in the learning process in ways that the traditional lectures and lab formats cannot. We present our virtual learning environment StudentResearcher, which incorporates simulations, multiple-choice quizzes, video lectures, and gamification into a learning path for quantum mechanics at the advanced university level. StudentResearcher is built upon the experiences gathered from workshops with the citizen science game Quantum Moves at the high-school and university level, where the games were used extensively to illustrate the basic concepts of quantum mechanics. The first test of this new virtual learning environment was a 2014 course in advanced quantum mechanics at Aarhus University with 47 enrolled students. We found increased learning for the students who were more active on the platform independent of their previous performances.

  16. Reinforcement Learning for Predictive Analytics in Smart Cities

    Directory of Open Access Journals (Sweden)

    Kostas Kolomvatsos

    2017-06-01

    The digitization of our lives causes a shift in data production as well as in the required data management. Numerous nodes are capable of producing huge volumes of data in our everyday activities. Sensors, personal smart devices and the Internet of Things (IoT) paradigm lead to a vast infrastructure that covers all aspects of the activities of modern societies. In most cases, the critical issue for public authorities (usually local, such as municipalities) is the efficient management of data towards the support of novel services. The reason is that analytics provided on top of the collected data can help deliver new applications that facilitate citizens' lives. However, the provision of analytics demands intelligent techniques for the underlying data management. The best-known technique is the separation of huge volumes of data into a number of parts and their parallel management to limit the time required for the delivery of analytics. Afterwards, analytics requests in the form of queries can be realized to derive the necessary knowledge for supporting intelligent applications. In this paper, we define the concept of a Query Controller (QC) that receives queries for analytics and assigns each of them to a processor placed in front of each data partition. We discuss an intelligent process for query assignments that adopts Machine Learning (ML). We adopt two learning schemes, i.e., Reinforcement Learning (RL) and clustering. We report on the comparison of the two schemes and elaborate on their combination. Our aim is to provide an efficient framework to support the decision making of the QC, which should swiftly select the appropriate processor for each query. We provide mathematical formulations for the discussed problem and present simulation results. Through a comprehensive experimental evaluation, we reveal the advantages of the proposed models and describe the outcomes while comparing them with a
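
    The QC's assignment decision can be sketched as a bandit-style learner over (query class, processor) pairs. The execute callback, the negative-latency reward, and the cls label (which the paper obtains via clustering) are assumptions for illustration.

    ```python
    import random

    def assign_queries(queries, n_processors, execute, eps=0.1):
        """Learn, per query class, which data-partition processor answers
        fastest (illustrative sketch; `execute(q, p)` returns elapsed time
        and `q.cls` is an assumed query-class label)."""
        q_val = {}   # (query_class, processor) -> running mean of -latency
        n_obs = {}
        for q in queries:
            keys = [(q.cls, p) for p in range(n_processors)]
            # Epsilon-greedy choice of processor for this query's class.
            key = random.choice(keys) if random.random() < eps \
                else max(keys, key=lambda k: q_val.get(k, 0.0))
            reward = -execute(q, key[1])       # faster responses score higher
            n = n_obs.get(key, 0) + 1
            q_val[key] = q_val.get(key, 0.0) + (reward - q_val.get(key, 0.0)) / n
            n_obs[key] = n
        return q_val
    ```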

  17. The Effects of Different Learning Environments on Students' Motivation for Learning and Their Achievement

    Science.gov (United States)

    Baeten, Marlies; Dochy, Filip; Struyven, Katrien

    2013-01-01

    Background: Research in higher education on the effects of student-centred versus lecture-based learning environments generally does not take into account the psychological need support provided in these learning environments. From a self-determination theory perspective, need support is important to study because it has been associated with…

  18. NNETS - NEURAL NETWORK ENVIRONMENT ON A TRANSPUTER SYSTEM

    Science.gov (United States)

    Villarreal, J.

    1994-01-01

    The primary purpose of NNETS (Neural Network Environment on a Transputer System) is to provide users a high degree of flexibility in creating and manipulating a wide variety of neural network topologies at processing speeds not found in conventional computing environments. To accomplish this purpose, NNETS supports back propagation and back propagation related algorithms. The back propagation algorithm used is an implementation of Rumelhart's Generalized Delta Rule. NNETS was developed on the INMOS Transputer. NNETS predefines a Back Propagation Network, a Jordan Network, and a Reinforcement Network to assist users in learning and defining their own networks. The program also allows users to configure other neural network paradigms from the NNETS basic architecture. The Jordan network is basically a feed forward network that has the outputs connected to a pseudo input layer. The state of the network is dependent on the inputs from the environment plus the network's own previous state. The Reinforcement network learns via a scalar feedback signal called reinforcement. The network propagates forward randomly, and the environment examines the outputs of the network to produce a reinforcement signal that is fed back to the network. NNETS was written for the INMOS C compiler D711B version 1.3 or later (MS-DOS version). A small portion of the software was written in the OCCAM language to perform the communications routing between processors. NNETS is configured to operate on a 4 x 10 array of Transputers in sequence with a Transputer-based graphics processor controlled by a master IBM PC 286 (or better). An RGB monitor capable of 512 x 512 resolution is required; it must be able to receive red, green, and blue signals via BNC connectors. NNETS is meant for experienced Transputer users only. The program is distributed on 5.25 inch 1.2Mb MS-DOS format diskettes. NNETS was developed in 1991. Transputer and OCCAM are registered trademarks of Inmos Corporation. MS-DOS is a registered trademark of Microsoft Corporation.
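
    NNETS itself is written in C and OCCAM for the Transputer, but the Jordan topology described above is compact enough to sketch in Python: a feed-forward network whose outputs are copied into a pseudo input (context) layer, so each response depends on the current environment inputs plus the network's own previous state. The layer sizes, names, and activation choice below are assumptions for illustration, not NNETS code:

        import numpy as np

        class JordanNetwork:
            """Sketch of the Jordan topology: the previous outputs are fed
            back through a pseudo input (context) layer alongside the
            real environment inputs."""

            def __init__(self, n_in, n_hidden, n_out, seed=0):
                rng = np.random.default_rng(seed)
                # Hidden weights see the real inputs plus n_out context units.
                self.W_h = rng.normal(scale=0.1, size=(n_hidden, n_in + n_out))
                self.W_o = rng.normal(scale=0.1, size=(n_out, n_hidden))
                self.context = np.zeros(n_out)  # pseudo input layer (last outputs)

            def step(self, x):
                z = np.concatenate([x, self.context])  # inputs + network state
                h = np.tanh(self.W_h @ z)
                y = np.tanh(self.W_o @ h)
                self.context = y                       # state carried forward
                return y

        # Example: the same input gives different outputs across time steps,
        # because the context layer carries the network's state.
        net = JordanNetwork(n_in=3, n_hidden=5, n_out=2)
        y1 = net.step(np.array([0.1, 0.2, 0.3]))
        y2 = net.step(np.array([0.1, 0.2, 0.3]))  # differs from y1 via context

    The predefined Reinforcement network differs mainly in how it learns: rather than per-output error terms, the environment collapses its judgement of the outputs into a single scalar reinforcement signal that drives the weight updates.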

  19. Theoretical framework on selected core issues on conditions for productive learning in networked learning environments

    DEFF Research Database (Denmark)

    Dirckinck-Holmfeld, Lone; Svendsen, Brian Møller; Ponti, Marisa

    The report documents and summarises the elements and dimensions that have been identified to describe and analyse the case studies collected in the Kaleidoscope Jointly Executed Integrating Research Project (JEIRP) on Conditions for productive learning in networked learning environments.

  20. Long term effects of aversive reinforcement on colour discrimination learning in free-flying bumblebees.

    Directory of Open Access Journals (Sweden)

    Miguel A Rodríguez-Gironés

    The results of behavioural experiments provide important information about the structure and information-processing abilities of the visual system. Nevertheless, if we want to infer from behavioural data how the visual system operates, it is important to know how different learning protocols affect performance and to devise protocols that minimise noise in the response of experimental subjects. The purpose of this work was to investigate how reinforcement schedule and individual variability affect the learning process in a colour discrimination task. Free-flying bumblebees were trained to discriminate between two perceptually similar colours. The target colour was associated with sucrose solution, and the distractor could be associated with water or quinine solution throughout the experiment, or with one substance during the first half of the experiment and the other during the second half. Both acquisition and final performance of the discrimination task (measured as proportion of correct choices) were determined by the choice of reinforcer during the first half of the experiment: regardless of whether bees were trained with water or quinine during the second half of the experiment, bees trained with quinine during the first half learned the task faster and performed better during the whole experiment. Our results confirm that the choice of stimuli used during training affects the rate at which colour discrimination tasks are acquired and show that early contact with a strongly aversive stimulus can be sufficient to maintain high levels of attention during several hours. On the other hand, bees that took more time to decide on which flower to alight were more likely to make correct choices than bees that made fast decisions. This result supports the existence of a trade-off between foraging speed and accuracy, and highlights the importance of measuring choice latencies during behavioural experiments focusing on cognitive abilities.