WorldWideScience

Sample records for real-time reinforcement learning

  1. Experiments with Online Reinforcement Learning in Real-Time Strategy Games

    DEFF Research Database (Denmark)

    Toftgaard Andersen, Kresten; Zeng, Yifeng; Dahl Christensen, Dennis

    2009-01-01

    Real-time strategy (RTS) games provide a challenging platform to implement online reinforcement learning (RL) techniques in a real application. The computer, as one game player, monitors opponents' (human or other computers) strategies and then updates its own policy using RL methods. In this article......, we first examine the suitability of applying online RL in various computer games. The application of reinforcement learning depends on both RL complexity and the game features. We then propose a multi-layer framework for implementing online RL in an RTS game. The framework significantly reduces RL...... the effectiveness of our proposed framework and shed light on relevant issues in using online RL in RTS games....

  2. Implementation of real-time energy management strategy based on reinforcement learning for hybrid electric vehicles and simulation validation.

    Science.gov (United States)

    Kong, Zehui; Zou, Yuan; Liu, Teng

    2017-01-01

    To further improve the fuel economy of series hybrid electric tracked vehicles, a reinforcement learning (RL)-based real-time energy management strategy is developed in this paper. In order to utilize the statistical characteristics of the online driving schedule effectively, a recursive algorithm for the transition probability matrix (TPM) of power-request is derived. RL is applied to calculate and update the control policy at regular intervals, adapting to the varying driving conditions. A forward-facing powertrain model is built in detail, including the engine-generator model, battery model and vehicle dynamical model. The robustness and adaptability of the real-time energy management strategy are validated in simulation through comparison with a stationary control strategy based on an initial TPM generated from a long naturalistic driving cycle. Results indicate that the proposed method has better fuel economy than the stationary one and is more effective in real-time control.
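
    The abstract above mentions a recursive algorithm for the transition probability matrix but does not give its form. As a minimal sketch only, assuming the power request is discretized into a small number of levels, one common way to update such a matrix online is a running-average update of the row belonging to the state just left. The class name `RecursiveTPM`, the helper `estimate_tpm`, and the uniform initial guess are illustrative assumptions, not the authors' method.

```python
class RecursiveTPM:
    """Running estimate of a Markov transition probability matrix.

    Hypothetical illustration: states are indices of discretized
    power-request levels; none of these names come from the paper.
    """

    def __init__(self, n_states):
        self.n = n_states
        # Uniform initial guess; a row is overwritten on its first visit.
        self.P = [[1.0 / n_states] * n_states for _ in range(n_states)]
        self.visits = [0] * n_states

    def update(self, i, j):
        """Fold one observed transition i -> j into row i."""
        self.visits[i] += 1
        step = 1.0 / self.visits[i]
        row = self.P[i]
        for k in range(self.n):
            target = 1.0 if k == j else 0.0
            # Incremental mean: the row stays a valid probability vector.
            row[k] += step * (target - row[k])


def estimate_tpm(levels, n_states):
    """Build the estimate from a sequence of observed state indices."""
    tpm = RecursiveTPM(n_states)
    for i, j in zip(levels, levels[1:]):
        tpm.update(i, j)
    return tpm
```

    With a step size of 1/visits each row equals the empirical transition frequencies seen so far; a constant step size would instead weight recent driving conditions more heavily, which may suit a non-stationary schedule better.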

  3. Implementation of real-time energy management strategy based on reinforcement learning for hybrid electric vehicles and simulation validation.

    Directory of Open Access Journals (Sweden)

    Zehui Kong

    To further improve the fuel economy of series hybrid electric tracked vehicles, a reinforcement learning (RL)-based real-time energy management strategy is developed in this paper. In order to utilize the statistical characteristics of the online driving schedule effectively, a recursive algorithm for the transition probability matrix (TPM) of power-request is derived. RL is applied to calculate and update the control policy at regular intervals, adapting to the varying driving conditions. A forward-facing powertrain model is built in detail, including the engine-generator model, battery model and vehicle dynamical model. The robustness and adaptability of the real-time energy management strategy are validated in simulation through comparison with a stationary control strategy based on an initial TPM generated from a long naturalistic driving cycle. Results indicate that the proposed method has better fuel economy than the stationary one and is more effective in real-time control.

  4. Reinforcement learning in supply chains.

    Science.gov (United States)

    Valluri, Annapurna; North, Michael J; Macal, Charles M

    2009-10-01

    Effective management of supply chains creates value and can strategically position companies. In practice, human beings have been found to be both surprisingly successful and disappointingly inept at managing supply chains. The related fields of cognitive psychology and artificial intelligence have postulated a variety of potential mechanisms to explain this behavior. One of the leading candidates is reinforcement learning. This paper applies agent-based modeling to investigate the comparative behavioral consequences of three simple reinforcement learning algorithms in a multi-stage supply chain. For the first time, our findings show that the specific algorithm that is employed can have dramatic effects on the results obtained. Reinforcement learning is found to be valuable in multi-stage supply chains with several learning agents, as independent agents can learn to coordinate their behavior. However, learning in multi-stage supply chains using these postulated approaches from cognitive psychology and artificial intelligence takes extremely long time periods to achieve stability, which raises questions about their ability to explain behavior in real supply chains. The fact that it takes thousands of periods for agents to learn in this simple multi-agent setting provides new evidence that real-world decision makers are unlikely to be using strict reinforcement learning in practice.

  5. Autonomous reinforcement learning with experience replay.

    Science.gov (United States)

    Wawrzyński, Paweł; Tanwani, Ajay Kumar

    2013-05-01

    This paper considers the issues of efficiency and autonomy that are required to make reinforcement learning suitable for real-life control tasks. A real-time reinforcement learning algorithm is presented that repeatedly adjusts the control policy with the use of previously collected samples, and autonomously estimates the appropriate step-sizes for the learning updates. The algorithm is based on the actor-critic with experience replay whose step-sizes are determined on-line by an enhanced fixed point algorithm for on-line neural network training. An experimental study with simulated octopus arm and half-cheetah demonstrates the feasibility of the proposed algorithm to solve difficult learning control problems in an autonomous way within reasonably short time. Copyright © 2012 Elsevier Ltd. All rights reserved.

  6. TEXPLORE temporal difference reinforcement learning for robots and time-constrained domains

    CERN Document Server

    Hester, Todd

    2013-01-01

    This book presents and develops new reinforcement learning methods that enable fast and robust learning on robots in real-time. Robots have the potential to solve many problems in society, because of their ability to work in dangerous places doing necessary jobs that no one wants or is able to do. One barrier to their widespread deployment is that they are mainly limited to tasks where it is possible to hand-program behaviors for every situation that may be encountered. For robots to meet their potential, they need methods that enable them to learn and adapt to novel situations that they were not programmed for. Reinforcement learning (RL) is a paradigm for learning sequential decision making processes and could solve the problems of learning and adaptation on robots. This book identifies four key challenges that must be addressed for an RL algorithm to be practical for robotic control tasks. These RL for Robotics Challenges are: 1) it must learn in very few samples; 2) it must learn in domains with continuou...

  7. Time representation in reinforcement learning models of the basal ganglia

    Directory of Open Access Journals (Sweden)

    Samuel Joseph Gershman

    2014-01-01

    Reinforcement learning models have been influential in understanding many aspects of basal ganglia function, from reward prediction to action selection. Time plays an important role in these models, but there is still no theoretical consensus about what kind of time representation is used by the basal ganglia. We review several theoretical accounts and their supporting evidence. We then discuss the relationship between reinforcement learning models and the timing mechanisms that have been attributed to the basal ganglia. We hypothesize that a single computational system may underlie both reinforcement learning and interval timing, the perception of duration in the range of seconds to hours. This hypothesis, which extends earlier models by incorporating a time-sensitive action selection mechanism, may have important implications for understanding disorders like Parkinson's disease in which both decision making and timing are impaired.

  8. Reinforcement learning using a continuous time actor-critic framework with spiking neurons.

    Directory of Open Access Journals (Sweden)

    Nicolas Frémaux

    2013-04-01

    Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only been partially elucidated. On one hand, experimental evidence shows that the neuromodulator dopamine carries information about rewards and affects synaptic plasticity. On the other hand, the theory of reinforcement learning provides a framework for reward-based learning. Recent models of reward-modulated spike-timing-dependent plasticity have made first steps towards bridging the gap between the two approaches, but faced two problems. First, reinforcement learning is typically formulated in a discrete framework, ill-adapted to the description of natural situations. Second, biologically plausible models of reward-modulated spike-timing-dependent plasticity require precise calculation of the reward prediction error, yet it remains to be shown how this can be computed by neurons. Here we propose a solution to these problems by extending the continuous temporal difference (TD) learning of Doya (2000) to the case of spiking neurons in an actor-critic network operating in continuous time, and with continuous state and action representations. In our model, the critic learns to predict expected future rewards in real time. Its activity, together with actual rewards, conditions the delivery of a neuromodulatory TD signal to itself and to the actor, which is responsible for action choice. In simulations, we show that such an architecture can solve a Morris water-maze-like navigation task, in a number of trials consistent with reported animal performance. We also use our model to solve the acrobot and the cartpole problems, two complex motor control tasks. Our model provides a plausible way of computing reward prediction error in the brain. Moreover, the analytically derived learning rule is consistent with experimental evidence for dopamine-modulated spike-timing-dependent plasticity.

  9. Learning to trade via direct reinforcement.

    Science.gov (United States)

    Moody, J; Saffell, M

    2001-01-01

    We present methods for optimizing portfolios, asset allocations, and trading systems based on direct reinforcement (DR). In this approach, investment decision-making is viewed as a stochastic control problem, and strategies are discovered directly. We present an adaptive algorithm called recurrent reinforcement learning (RRL) for discovering investment policies. The need to build forecasting models is eliminated, and better trading performance is obtained. The direct reinforcement approach differs from dynamic programming and reinforcement algorithms such as TD-learning and Q-learning, which attempt to estimate a value function for the control problem. We find that the RRL direct reinforcement framework enables a simpler problem representation, avoids Bellman's curse of dimensionality and offers compelling advantages in efficiency. We demonstrate how direct reinforcement can be used to optimize risk-adjusted investment returns (including the differential Sharpe ratio), while accounting for the effects of transaction costs. In extensive simulation work using real financial data, we find that our approach based on RRL produces better trading strategies than systems utilizing Q-learning (a value function method). Real-world applications include an intra-daily currency trader and a monthly asset allocation system for the S&P 500 Stock Index and T-Bills.

  10. An algorithm for learning real-time automata

    NARCIS (Netherlands)

    Verwer, S.E.; De Weerdt, M.M.; Witteveen, C.

    2007-01-01

    We describe an algorithm for learning simple timed automata, known as real-time automata. The transitions of real-time automata can have a temporal constraint on the time of occurrence of the current symbol relative to the previous symbol. The learning algorithm is similar to the red-blue fringe

  11. Intrinsically motivated reinforcement learning for human-robot interaction in the real-world.

    Science.gov (United States)

    Qureshi, Ahmed Hussain; Nakamura, Yutaka; Yoshikawa, Yuichiro; Ishiguro, Hiroshi

    2018-03-26

    For a natural social human-robot interaction, it is essential for a robot to learn the human-like social skills. However, learning such skills is notoriously hard due to the limited availability of direct instructions from people to teach a robot. In this paper, we propose an intrinsically motivated reinforcement learning framework in which an agent gets the intrinsic motivation-based rewards through the action-conditional predictive model. By using the proposed method, the robot learned the social skills from the human-robot interaction experiences gathered in the real uncontrolled environments. The results indicate that the robot not only acquired human-like social skills but also took more human-like decisions, on a test dataset, than a robot which received direct rewards for the task achievement. Copyright © 2018 Elsevier Ltd. All rights reserved.

  12. Algorithms for Reinforcement Learning

    CERN Document Server

    Szepesvari, Csaba

    2010-01-01

    Reinforcement learning is a learning paradigm concerned with learning to control a system so as to maximize a numerical performance measure that expresses a long-term objective. What distinguishes reinforcement learning from supervised learning is that only partial feedback is given to the learner about the learner's predictions. Further, the predictions may have long term effects through influencing the future state of the controlled system. Thus, time plays a special role. The goal in reinforcement learning is to develop efficient learning algorithms, as well as to understand the algorithms'

  13. Social Cognition as Reinforcement Learning: Feedback Modulates Emotion Inference.

    Science.gov (United States)

    Zaki, Jamil; Kallman, Seth; Wimmer, G Elliott; Ochsner, Kevin; Shohamy, Daphna

    2016-09-01

    Neuroscientific studies of social cognition typically employ paradigms in which perceivers draw single-shot inferences about the internal states of strangers. Real-world social inference features much different parameters: People often encounter and learn about particular social targets (e.g., friends) over time and receive feedback about whether their inferences are correct or incorrect. Here, we examined this process and, more broadly, the intersection between social cognition and reinforcement learning. Perceivers were scanned using fMRI while repeatedly encountering three social targets who produced conflicting visual and verbal emotional cues. Perceivers guessed how targets felt and received feedback about whether they had guessed correctly. Visual cues reliably predicted one target's emotion, verbal cues predicted a second target's emotion, and neither reliably predicted the third target's emotion. Perceivers successfully used this information to update their judgments over time. Furthermore, trial-by-trial learning signals-estimated using two reinforcement learning models-tracked activity in ventral striatum and ventromedial pFC, structures associated with reinforcement learning, and regions associated with updating social impressions, including TPJ. These data suggest that learning about others' emotions, like other forms of feedback learning, relies on domain-general reinforcement mechanisms as well as domain-specific social information processing.

  14. Reinforcement learning in continuous state and action spaces

    NARCIS (Netherlands)

    H. P. van Hasselt (Hado); M.A. Wiering; M. van Otterlo

    2012-01-01

    Many traditional reinforcement-learning algorithms have been designed for problems with small finite state and action spaces. Learning in such discrete problems can be difficult, due to noise and delayed reinforcements. However, many real-world problems have continuous state or action

  15. Off-Policy Reinforcement Learning: Optimal Operational Control for Two-Time-Scale Industrial Processes.

    Science.gov (United States)

    Li, Jinna; Kiumarsi, Bahare; Chai, Tianyou; Lewis, Frank L; Fan, Jialu

    2017-12-01

    Industrial flow lines are composed of unit processes operating on a fast time scale and performance measurements known as operational indices measured at a slower time scale. This paper presents a model-free optimal solution to a class of two time-scale industrial processes using off-policy reinforcement learning (RL). First, the lower-layer unit process control loop with a fast sampling period and the upper-layer operational index dynamics at a slow time scale are modeled. Second, a general optimal operational control problem is formulated to optimally prescribe the set-points for the unit industrial process. Then, a zero-sum game off-policy RL algorithm is developed to find the optimal set-points by using data measured in real-time. Finally, a simulation experiment is employed for an industrial flotation process to show the effectiveness of the proposed method.

  16. Reinforcement learning improves behaviour from evaluative feedback

    Science.gov (United States)

    Littman, Michael L.

    2015-05-01

    Reinforcement learning is a branch of machine learning concerned with using experience gained through interacting with the world and evaluative feedback to improve a system's ability to make behavioural decisions. It has been called the artificial intelligence problem in a microcosm because learning algorithms must act autonomously to perform well and achieve their goals. Partly driven by the increasing availability of rich data, recent years have seen exciting advances in the theory and practice of reinforcement learning, including developments in fundamental technical areas such as generalization, planning, exploration and empirical methodology, leading to increasing applicability to real-life problems.

  17. Reinforcement Learning Based Artificial Immune Classifier

    Directory of Open Access Journals (Sweden)

    Mehmet Karakose

    2013-01-01

    Artificial immune systems are among the widely used methods for classification, which is a decision-making process. Artificial immune systems based on the natural immune system can be successfully applied to classification, optimization, recognition, and learning in real-world problems. In this study, a reinforcement learning based artificial immune classifier is proposed as a new approach. This approach uses reinforcement learning to find better antibodies with immune operators. Compared with other methods in the literature, the proposed approach offers several advantages, such as effectiveness, fewer memory cells, high accuracy, speed, and data adaptability. The performance of the proposed approach is demonstrated by simulation and experimental results using real data in Matlab and on an FPGA. Some benchmark data and remote image data are used for experimental results. Comparative results with a supervised/unsupervised artificial immune system, a negative selection classifier, and a resource limited artificial immune classifier are given to demonstrate the effectiveness of the proposed new method.

  18. Intrinsic interactive reinforcement learning - Using error-related potentials for real world human-robot interaction.

    Science.gov (United States)

    Kim, Su Kyoung; Kirchner, Elsa Andrea; Stefes, Arne; Kirchner, Frank

    2017-12-14

    Reinforcement learning (RL) enables robots to learn their optimal behavioral strategy in dynamic environments based on feedback. Explicit human feedback during robot RL is advantageous, since an explicit reward function can be easily adapted. However, it is very demanding and tiresome for a human to continuously and explicitly generate feedback. Therefore, the development of implicit approaches is of high relevance. In this paper, we used an error-related potential (ErrP), an event-related activity in the human electroencephalogram (EEG), as intrinsically generated implicit feedback (rewards) for RL. Initially we validated our approach with seven subjects in a simulated robot learning scenario. ErrPs were detected online in single trial with a balanced accuracy (bACC) of 91%, which was sufficient to learn to recognize gestures and the correct mapping between human gestures and robot actions in parallel. Finally, we validated our approach in a real robot scenario, in which seven subjects freely chose gestures and the real robot correctly learned the mapping between gestures and actions (ErrP detection (90% bACC)). In this paper, we demonstrated that intrinsically generated EEG-based human feedback in RL can successfully be used to implicitly improve gesture-based robot control during human-robot interaction. We call our approach intrinsic interactive RL.

  19. Tank War Using Online Reinforcement Learning

    DEFF Research Database (Denmark)

    Toftgaard Andersen, Kresten; Zeng, Yifeng; Dahl Christensen, Dennis

    2009-01-01

    Real-time strategy (RTS) games provide a challenging platform to implement online reinforcement learning (RL) techniques in a real application. The computer, as one player, monitors opponents' (human or other computers) strategies and then updates its own policy using RL methods. In this paper, we propose...... a multi-layer framework for implementing online RL in an RTS game. The framework significantly reduces the RL computational complexity by decomposing the state space in a hierarchical manner. We implement the RTS game Tank General and perform a thorough test of the proposed framework. The results...... show the effectiveness of our proposed framework and shed light on relevant issues in using RL in RTS games....

  20. GA-based fuzzy reinforcement learning for control of a magnetic bearing system.

    Science.gov (United States)

    Lin, C T; Jou, C P

    2000-01-01

    This paper proposes a TD (temporal difference) and GA (genetic algorithm)-based reinforcement (TDGAR) learning method and applies it to the control of a real magnetic bearing system. The TDGAR learning scheme is a new hybrid GA, which integrates the TD prediction method and the GA to perform the reinforcement learning task. The TDGAR learning system is composed of two integrated feedforward networks. One neural network acts as a critic network to guide the learning of the other network (the action network) which determines the outputs (actions) of the TDGAR learning system. The action network can be a normal neural network or a neural fuzzy network. Using the TD prediction method, the critic network can predict the external reinforcement signal and provide a more informative internal reinforcement signal to the action network. The action network uses the GA to adapt itself according to the internal reinforcement signal. The key concept of the TDGAR learning scheme is to formulate the internal reinforcement signal as the fitness function for the GA such that the GA can evaluate the candidate solutions (chromosomes) regularly, even during periods without external feedback from the environment. This enables the GA to proceed to new generations regularly without waiting for the arrival of the external reinforcement signal. This can usually accelerate the GA learning since a reinforcement signal may only be available at a time long after a sequence of actions has occurred in the reinforcement learning problem. The proposed TDGAR learning system has been used to control an active magnetic bearing (AMB) system in practice. A systematic design procedure is developed to achieve successful integration of all the subsystems including magnetic suspension, mechanical structure, and controller training. The results show that the TDGAR learning scheme can successfully find a neural controller or a neural fuzzy controller for a self-designed magnetic bearing system.

  1. A Lecture Supporting System Based on Real-Time Learning Analytics

    Science.gov (United States)

    Shimada, Atsushi; Konomi, Shin'ichi

    2017-01-01

    A new lecture supporting system based on real-time learning analytics is proposed. Our target is on-site classrooms where teachers give their lectures, and a lot of students listen to teachers' explanation, conduct exercises etc. We utilize not only an e-Learning system, but also an e-Book system to collect real-time learning activities during the…

  2. Nuclear power plant monitoring using real-time learning neural network

    International Nuclear Information System (INIS)

    Nabeshima, Kunihiko; Tuerkcan, E.; Ciftcioglu, O.

    1994-01-01

    In the present research, an artificial neural network (ANN) with real-time adaptive learning is developed for the plant-wide monitoring of the Borssele Nuclear Power Plant (NPP). Adaptive ANN learning capability is integrated into the monitoring system so that robust and sensitive on-line monitoring is achieved in a real-time environment. The major advantages provided by the ANN are that system modelling is formed by means of measurement information obtained from a multi-output process system, explicit modelling is not required, and the modelling is not restricted to linear systems. The ANN can also respond very fast to anomalous operational conditions. The real-time ANN learning methodology with adaptive real-time monitoring capability is described for the wide-range and plant-wide data from an operating nuclear power plant. The layered neural network, which learns with the error backpropagation algorithm, has three layers. The network type is auto-associative: inputs and outputs are exactly the same, using 12 plant signals. (author)

  3. Human-level control through deep reinforcement learning

    Science.gov (United States)

    Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A.; Veness, Joel; Bellemare, Marc G.; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K.; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis

    2015-02-01

    The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.

  5. Functional Contour-following via Haptic Perception and Reinforcement Learning.

    Science.gov (United States)

    Hellman, Randall B; Tekin, Cem; van der Schaar, Mihaela; Santos, Veronica J

    2018-01-01

    Many tasks involve the fine manipulation of objects despite limited visual feedback. In such scenarios, tactile and proprioceptive feedback can be leveraged for task completion. We present an approach for real-time haptic perception and decision-making for a haptics-driven, functional contour-following task: the closure of a ziplock bag. This task is challenging for robots because the bag is deformable, transparent, and visually occluded by artificial fingertip sensors that are also compliant. A deep neural net classifier was trained to estimate the state of a zipper within a robot's pinch grasp. A Contextual Multi-Armed Bandit (C-MAB) reinforcement learning algorithm was implemented to maximize cumulative rewards by balancing exploration versus exploitation of the state-action space. The C-MAB learner outperformed a benchmark Q-learner by more efficiently exploring the state-action space while learning a hard-to-code task. The learned C-MAB policy was tested with novel ziplock bag scenarios and contours (wire, rope). Importantly, this work contributes to the development of reinforcement learning approaches that account for limited resources such as hardware life and researcher time. As robots are used to perform complex, physically interactive tasks in unstructured or unmodeled environments, it becomes important to develop methods that enable efficient and effective learning with physical testbeds.
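
    The abstract above describes a contextual multi-armed bandit that balances exploration against exploitation. As a rough sketch only, the following shows a generic tabular epsilon-greedy contextual bandit; the abstract does not specify the exploration rule, so epsilon-greedy is a stand-in, and the class name `ContextualBandit`, its parameters, and the integer context and action encoding are all illustrative assumptions.

```python
import random


class ContextualBandit:
    """Tabular epsilon-greedy contextual bandit (generic illustration)."""

    def __init__(self, n_contexts, n_actions, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.n_actions = n_actions
        self.rng = random.Random(seed)
        # Running mean reward and pull count per (context, action) pair.
        self.value = [[0.0] * n_actions for _ in range(n_contexts)]
        self.count = [[0] * n_actions for _ in range(n_contexts)]

    def select(self, context):
        """Explore with probability epsilon, otherwise exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        row = self.value[context]
        # Ties resolve to the lowest action index.
        return max(range(self.n_actions), key=row.__getitem__)

    def update(self, context, action, reward):
        """Incremental mean of the rewards observed for this pair."""
        self.count[context][action] += 1
        n = self.count[context][action]
        self.value[context][action] += (reward - self.value[context][action]) / n
```

    Unlike a Q-learner, this update has no bootstrapped next-state term: each reward is credited only to the context and action that produced it.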

  6. Real-time individualized training vectors for experiential learning.

    Energy Technology Data Exchange (ETDEWEB)

    Willis, Matt; Tucker, Eilish Marie; Raybourn, Elaine Marie; Glickman, Matthew R.; Fabian, Nathan

    2011-01-01

    Military training that uses serious games or virtual worlds potentially generates data that can be mined to better understand how trainees learn in experiential exercises. Few data mining approaches for deployed military training games exist. Opportunities exist to collect and analyze these data, as well as to construct a full-history learner model. Outcomes discussed in the present document include results from a quasi-experimental research study on military game-based experiential learning, the deployment of an online game for training evidence collection, and results from a proof-of-concept pilot study on the development of individualized training vectors. This Lab Directed Research & Development (LDRD) project leveraged products within projects, such as Titan (Network Grand Challenge), Real-Time Feedback and Evaluation System (America's Army Adaptive Thinking and Leadership, DARWARS Ambush! NK), and Dynamic Bayesian Networks, to investigate whether machine learning capabilities could perform real-time, in-game similarity vectors of learner performance, toward adaptation of content delivery and quantitative measurement of experiential learning.

  7. Reinforcement learning for optimal control of low exergy buildings

    International Nuclear Information System (INIS)

    Yang, Lei; Nagy, Zoltan; Goffin, Philippe; Schlueter, Arno

    2015-01-01

    Highlights: • Implementation of reinforcement learning control for LowEx Building systems. • Learning allows adaptation to local environment without prior knowledge. • Presentation of reinforcement learning control for real-life applications. • Discussion of the applicability for real-life situations. - Abstract: Over a third of the anthropogenic greenhouse gas (GHG) emissions stem from cooling and heating buildings, due to their fossil fuel based operation. Low exergy building systems are a promising approach to reduce energy consumption as well as GHG emissions. They consist of renewable energy technologies, such as PV, PV/T and heat pumps. Since careful tuning of parameters is required, a manual setup may result in sub-optimal operation. A model predictive control approach is unnecessarily complex due to the required model identification. Therefore, in this work we present a reinforcement learning control (RLC) approach. The studied building consists of a PV/T array for solar heat and electricity generation, as well as geothermal heat pumps. We present RLC for the PV/T array, and the full building model. Two methods, Tabular Q-learning and Batch Q-learning with Memory Replay, are implemented with real building settings and actual weather conditions in a Matlab/Simulink framework. The performance is evaluated against standard rule-based control (RBC). We investigated different neural network structures and found that some outperformed RBC already during the learning phase. Overall, every RLC strategy for PV/T outperformed RBC by over 10% after the third year. Likewise, for the full building, RLC outperforms RBC in terms of meeting the heating demand, maintaining the optimal operation temperature and compensating more effectively for ground heat. This makes it possible to reduce the engineering costs associated with the setup of these systems, as well as to decrease the return-on-investment period, both of which are necessary to create a sustainable, zero-emission building.
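
    Tabular Q-learning, one of the two methods named above, can be sketched on a toy problem. The example below is not the building model from the paper (which was implemented in Matlab/Simulink). It is a hypothetical five-state chain in which the agent starts in state 0 and is rewarded for reaching the last state, and all parameter values are illustrative assumptions.

```python
import random


def q_learning_chain(n_states=5, episodes=2000, alpha=0.5, gamma=0.9,
                     epsilon=0.2, seed=0):
    """Tabular Q-learning on a toy chain; returns the learned Q-table."""
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]  # actions: 0 = left, 1 = right

    def choose(state):
        # Epsilon-greedy action selection with random tie-breaking.
        if rng.random() < epsilon or q[state][0] == q[state][1]:
            return rng.randrange(2)
        return 0 if q[state][0] > q[state][1] else 1

    for _ in range(episodes):
        state = 0
        for _step in range(200):  # cap on episode length
            action = choose(state)
            nxt = max(state - 1, 0) if action == 0 else state + 1
            reward = 1.0 if nxt == goal else 0.0
            # Q-learning target: reward plus discounted best value of the next state.
            target = reward if nxt == goal else gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if state == goal:
                break
    return q
```

    After training, the greedy policy (the action with the larger entry in each row of the table) moves right in every non-terminal state. A real controller would replace the chain with discretized building states and a reward based on energy or comfort.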

  8. Real-time Color Codes for Assessing Learning Process

    OpenAIRE

    Dzelzkalēja, L; Kapenieks, J

    2016-01-01

    Effective assessment is an important way for improving the learning process. There are existing guidelines for assessing the learning process, but they lack holistic digital knowledge society considerations. In this paper the authors propose a method for real-time evaluation of students’ learning process and, consequently, for quality evaluation of teaching materials both in the classroom and in the distance learning environment. The main idea of the proposed Color code method (CCM) is to use...

  9. "Notice of Violation of IEEE Publication Principles" Multiobjective Reinforcement Learning: A Comprehensive Overview.

    Science.gov (United States)

    Liu, Chunming; Xu, Xin; Hu, Dewen

    2013-04-29

    Reinforcement learning is a powerful mechanism for enabling agents to learn in an unknown environment, and most reinforcement learning algorithms aim to maximize some numerical value, which represents only one long-term objective. However, multiple long-term objectives are exhibited in many real-world decision and control problems; therefore, recently, there has been growing interest in solving multiobjective reinforcement learning (MORL) problems with multiple conflicting objectives. The aim of this paper is to present a comprehensive overview of MORL. In this paper, the basic architecture, research topics, and naive solutions of MORL are introduced at first. Then, several representative MORL approaches and some important directions of recent research are reviewed. The relationships between MORL and other related research are also discussed, which include multiobjective optimization, hierarchical reinforcement learning, and multi-agent reinforcement learning. Finally, research challenges and open problems of MORL techniques are highlighted.

  10. Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening

    OpenAIRE

    He, Frank S.; Liu, Yang; Schwing, Alexander G.; Peng, Jian

    2016-01-01

    We propose a novel training algorithm for reinforcement learning which combines the strength of deep Q-learning with a constrained optimization approach to tighten optimality and encourage faster reward propagation. Our novel technique makes deep reinforcement learning more practical by drastically reducing the training time. We evaluate the performance of our approach on the 49 games of the challenging Arcade Learning Environment, and report significant improvements in both training time and...

  11. Curiosity driven reinforcement learning for motion planning on humanoids

    Science.gov (United States)

    Frank, Mikhail; Leitner, Jürgen; Stollenga, Marijn; Förster, Alexander; Schmidhuber, Jürgen

    2014-01-01

    Most previous work on artificial curiosity (AC) and intrinsic motivation focuses on basic concepts and theory. Experimental results are generally limited to toy scenarios, such as navigation in a simulated maze, or control of a simple mechanical system with one or two degrees of freedom. To study AC in a more realistic setting, we embody a curious agent in the complex iCub humanoid robot. Our novel reinforcement learning (RL) framework consists of a state-of-the-art, low-level, reactive control layer, which controls the iCub while respecting constraints, and a high-level curious agent, which explores the iCub's state-action space through information gain maximization, learning a world model from experience, controlling the actual iCub hardware in real-time. To the best of our knowledge, this is the first ever embodied, curious agent for real-time motion planning on a humanoid. We demonstrate that it can learn compact Markov models to represent large regions of the iCub's configuration space, and that the iCub explores intelligently, showing interest in its physical constraints as well as in objects it finds in its environment. PMID:24432001

  12. The Reinforcement Learning Competition 2014

    OpenAIRE

    Dimitrakakis, Christos; Li, Guangliang; Tziortziotis, Nikolaos

    2014-01-01

    Reinforcement learning is one of the most general problems in artificial intelligence. It has been used to model problems in automated experiment design, control, economics, game playing, scheduling and telecommunications. The aim of the reinforcement learning competition is to encourage the development of very general learning agents for arbitrary reinforcement learning problems and to provide a test-bed for the unbiased evaluation of algorithms.

  13. Towards Real-Time Speech Emotion Recognition for Affective E-Learning

    Science.gov (United States)

    Bahreini, Kiavash; Nadolski, Rob; Westera, Wim

    2016-01-01

    This paper presents the voice emotion recognition part of the FILTWAM framework for real-time emotion recognition in affective e-learning settings. FILTWAM (Framework for Improving Learning Through Webcams And Microphones) intends to offer timely and appropriate online feedback based upon learner's vocal intonations and facial expressions in order…

  14. Multiobjective Reinforcement Learning for Traffic Signal Control Using Vehicular Ad Hoc Network

    Directory of Open Access Journals (Sweden)

    Houli Duan

    2010-01-01

    We propose a new multiobjective control algorithm based on reinforcement learning for urban traffic signal control, named multi-RL. A multiagent structure is used to describe the traffic system. A vehicular ad hoc network is used for the data exchange among agents. A reinforcement learning algorithm is applied to predict the overall value of the optimization objective given vehicles' states. The policy which minimizes the cumulative value of the optimization objective is regarded as the optimal one. In order to make the method adaptive to various traffic conditions, we also introduce a multiobjective control scheme in which the optimization objective is selected adaptively to real-time traffic states. The optimization objectives include the vehicle stops, the average waiting time, and the maximum queue length of the next intersection. In addition, we also accommodate a priority control to the buses and the emergency vehicles through our model. The simulation results indicated that our algorithm could perform more efficiently than traditional traffic light control methods.

  15. Online gaming for learning optimal team strategies in real time

    Science.gov (United States)

    Hudas, Gregory; Lewis, F. L.; Vamvoudakis, K. G.

    2010-04-01

    This paper first presents an overall view for dynamical decision-making in teams, both cooperative and competitive. Strategies for team decision problems, including optimal control, zero-sum 2-player games (H-infinity control) and so on are normally solved for off-line by solving associated matrix equations such as the Riccati equation. However, using that approach, players cannot change their objectives online in real time without calling for a completely new off-line solution for the new strategies. Therefore, in this paper we give a method for learning optimal team strategies online in real time as team dynamical play unfolds. In the linear quadratic regulator case, for instance, the method learns the Riccati equation solution online without ever solving the Riccati equation. This allows for truly dynamical team decisions where objective functions can change in real time and the system dynamics can be time-varying.
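
    For reference, the Riccati equation mentioned for the linear quadratic regulator case can be written out. The abstract does not give the formulation, so the standard continuous-time form is assumed here, with system \dot{x} = Ax + Bu and cost weights Q and R as assumed notation. The off-line approach solves the first equation for P. The online method described above would arrive at the same P from data measured along the system trajectories.

```latex
% Assumed continuous-time LQR setting: \dot{x} = A x + B u,
% cost J = \int_0^\infty \left( x^\top Q x + u^\top R u \right) dt
A^\top P + P A + Q - P B R^{-1} B^\top P = 0,
\qquad
u = -R^{-1} B^\top P \, x
```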

  16. Reinforcement function design and bias for efficient learning in mobile robots

    International Nuclear Information System (INIS)

    Touzet, C.; Santos, J.M.

    1998-01-01

    The main paradigm in sub-symbolic learning robot domain is the reinforcement learning method. Various techniques have been developed to deal with the memorization/generalization problem, demonstrating the superior ability of artificial neural network implementations. In this paper, the authors address the issue of designing the reinforcement so as to optimize the exploration part of the learning. They also present and summarize works relative to the use of bias intended to achieve the effective synthesis of the desired behavior. Demonstrative experiments involving a self-organizing map implementation of the Q-learning and real mobile robots (Nomad 200 and Khepera) in a task of obstacle avoidance behavior synthesis are described. 3 figs., 5 tabs

  17. Value learning through reinforcement : The basics of dopamine and reinforcement learning

    NARCIS (Netherlands)

    Daw, N.D.; Tobler, P.N.; Glimcher, P.W.; Fehr, E.

    2013-01-01

    This chapter provides an overview of reinforcement learning and temporal difference learning and relates these topics to the firing properties of midbrain dopamine neurons. First, we review the Rescorla-Wagner learning rule and basic learning phenomena, such as blocking, which the rule explains. Then
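
    The Rescorla-Wagner rule and the blocking effect it explains can be illustrated with a short simulation. This is a minimal sketch, not code from the chapter: the cue names, learning rate and trial counts are illustrative assumptions, and the two rate parameters of the original rule are folded into a single alpha.

```python
def rescorla_wagner(trials, alpha=0.3, lam=1.0):
    """Apply the Rescorla-Wagner update over a sequence of trials.

    trials: list of (cues_present, rewarded) pairs.
    Returns a dict mapping each cue to its associative strength V.
    """
    strength = {}
    for cues, rewarded in trials:
        # The prediction is the summed strength of all cues present on the trial.
        prediction = sum(strength.get(c, 0.0) for c in cues)
        # Prediction error: obtained outcome minus predicted outcome.
        error = (lam if rewarded else 0.0) - prediction
        # Every cue present on the trial shares the same error term.
        for c in cues:
            strength[c] = strength.get(c, 0.0) + alpha * error
    return strength


# Blocking group: cue A is trained alone first, then the compound AB.
blocking = rescorla_wagner([(("A",), True)] * 50 + [(("A", "B"), True)] * 50)

# Control group: only the compound AB is trained.
control = rescorla_wagner([(("A", "B"), True)] * 50)
```

    In the blocking group, A already predicts the reward when B is introduced, so the error is near zero and B acquires almost no strength. In the control group, A and B share the prediction and each ends near 0.5.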

  18. Reinforcement Learning State-of-the-Art

    CERN Document Server

    Wiering, Marco

    2012-01-01

    Reinforcement learning encompasses both a science of adaptive behavior of rational beings in uncertain environments and a computational methodology for finding optimal behaviors for challenging problems in control, optimization and adaptive behavior of intelligent agents. As a field, reinforcement learning has progressed tremendously in the past decade. The main goal of this book is to present an up-to-date series of survey articles on the main contemporary sub-fields of reinforcement learning. This includes surveys on partially observable environments, hierarchical task decompositions, relational knowledge representation and predictive state representations. Furthermore, topics such as transfer, evolutionary methods and continuous spaces in reinforcement learning are surveyed. In addition, several chapters review reinforcement learning methods in robotics, in games, and in computational neuroscience. In total seventeen different subfields are presented by mostly young experts in those areas, and together the...

  19. Deep Reinforcement Learning: An Overview

    OpenAIRE

    Li, Yuxi

    2017-01-01

    We give an overview of recent exciting achievements of deep reinforcement learning (RL). We discuss six core elements, six important mechanisms, and twelve applications. We start with background of machine learning, deep learning and reinforcement learning. Next we discuss core RL elements, including value function, in particular, Deep Q-Network (DQN), policy, reward, model, planning, and exploration. After that, we discuss important mechanisms for RL, including attention and memory, unsuperv...

  20. Effect of reinforcement learning on coordination of multiagent systems

    Science.gov (United States)

    Bukkapatnam, Satish T. S.; Gao, Greg

    2000-12-01

    For effective coordination of distributed environments involving multiagent systems, the learning ability of each agent in the environment plays a crucial role. In this paper, we develop a simple group learning method based on reinforcement, and study its effect on coordination through application to a supply chain procurement scenario involving a computer manufacturer. Here, all parties are represented by self-interested, autonomous agents, each capable of performing specific simple tasks. They negotiate with each other to perform complex tasks and thus coordinate supply chain procurement. Reinforcement learning is intended to enable each agent to reach the best negotiable price within the shortest possible time. Our simulations of the application scenario under different learning strategies reveal the positive effects of reinforcement learning on an agent's as well as the system's performance.

  1. Rational and Mechanistic Perspectives on Reinforcement Learning

    Science.gov (United States)

    Chater, Nick

    2009-01-01

    This special issue describes important recent developments in applying reinforcement learning models to capture neural and cognitive function. But reinforcement learning, as a theoretical framework, can apply at two very different levels of description: "mechanistic" and "rational." Reinforcement learning is often viewed in mechanistic terms--as…

  2. Project Management in Real Time: A Service-Learning Project

    Science.gov (United States)

    Larson, Erik; Drexler, John A., Jr.

    2010-01-01

    This article describes a service-learning assignment for a project management course. It is designed to facilitate hands-on student learning of both the technical and the interpersonal aspects of project management, and it involves student engagement with real customers and real stakeholders in the creation of real events with real outcomes. As…

  3. Real time eye tracking using Kalman extended spatio-temporal context learning

    Science.gov (United States)

    Munir, Farzeen; Minhas, Fayyaz ul Amir Asfar; Jalil, Abdul; Jeon, Moongu

    2017-06-01

    Real time eye tracking has numerous applications in human computer interaction such as a mouse cursor control in a computer system. It is useful for persons with muscular or motion impairments. However, tracking the movement of the eye is complicated by occlusion due to blinking, head movement, screen glare, rapid eye movements, etc. In this work, we present the algorithmic and construction details of a real time eye tracking system. Our proposed system is an extension of Spatio-Temporal context learning through Kalman Filtering. Spatio-Temporal Context Learning offers state of the art accuracy in general object tracking but its performance suffers due to object occlusion. Addition of the Kalman filter allows the proposed method to model the dynamics of the motion of the eye and provide robust eye tracking in cases of occlusion. We demonstrate the effectiveness of this tracking technique by controlling the computer cursor in real time by eye movements.
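
    The role of the Kalman filter during occlusion can be sketched in one dimension. The code below is not the authors' tracker: it is a minimal constant-velocity Kalman filter for a single coordinate, with noise parameters q and r and the initial covariance chosen as assumptions. When a measurement is missing (None, for example during a blink), only the prediction step runs, so the estimate keeps moving with the learned velocity.

```python
def kalman_track(measurements, dt=1.0, q=0.01, r=1.0):
    """Constant-velocity Kalman filter for one coordinate.

    measurements: list of floats, with None where the target is occluded.
    Returns the filtered position estimate at every time step.
    """
    p, v = 0.0, 0.0                              # state: position and velocity
    P00, P01, P10, P11 = 100.0, 0.0, 0.0, 100.0  # state covariance (2x2)
    estimates = []
    for z in measurements:
        # Predict: x <- F x, P <- F P F^T + Q, with F = [[1, dt], [0, 1]].
        p = p + dt * v
        P00, P01, P10, P11 = (
            P00 + dt * (P01 + P10) + dt * dt * P11 + q,
            P01 + dt * P11,
            P10 + dt * P11,
            P11 + q,
        )
        if z is not None:
            # Update with a position measurement, H = [1, 0].
            innovation = z - p
            S = P00 + r
            K0, K1 = P00 / S, P10 / S            # Kalman gain
            p += K0 * innovation
            v += K1 * innovation
            P00, P01, P10, P11 = (
                (1 - K0) * P00,
                (1 - K0) * P01,
                P10 - K1 * P00,
                P11 - K1 * P01,
            )
        estimates.append(p)
    return estimates


# Target moving at 2 units per step, visible for 10 steps and occluded for 5.
track = kalman_track([2.0 * t for t in range(10)] + [None] * 5)
```

    In this example the final estimate stays close to the true position of 28 even though the last five measurements are missing.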

  4. Reinforcement learning in computer vision

    Science.gov (United States)

    Bernstein, A. V.; Burnaev, E. V.

    2018-04-01

    Machine learning has become one of the basic technologies for solving computer vision tasks such as feature detection, image segmentation, object recognition and tracking. In many applications, complex systems such as robots are equipped with visual sensors, from which they learn the state of the surrounding environment by solving the corresponding computer vision tasks. The solutions of these tasks are then used to make decisions about possible future actions. It is therefore not surprising that, when solving computer vision tasks, we should take into account the special aspects of their subsequent application in model-based predictive control. Reinforcement learning is a modern machine learning technology in which learning is carried out through interaction with the environment. In recent years, reinforcement learning has been used both for applied tasks such as the processing and analysis of visual information, and for specific computer vision problems such as filtering, extracting image features, localizing objects in scenes, and many others. The paper briefly describes reinforcement learning and its use for solving computer vision problems.

  5. Reinforcement learning for microgrid energy management

    International Nuclear Information System (INIS)

    Kuznetsova, Elizaveta; Li, Yan-Fu; Ruiz, Carlos; Zio, Enrico; Ault, Graham; Bell, Keith

    2013-01-01

    We consider a microgrid for energy distribution, with a local consumer, a renewable generator (wind turbine) and a storage facility (battery), connected to the external grid via a transformer. We propose a 2 steps-ahead reinforcement learning algorithm to plan the battery scheduling, which plays a key role in the achievement of the consumer goals. The underlying framework is one of multi-criteria decision-making by an individual consumer who has the goals of increasing the utilization rate of the battery during high electricity demand (so as to decrease the electricity purchase from the external grid) and increasing the utilization rate of the wind turbine for local use (so as to increase the consumer independence from the external grid). Predictions of available wind power feed the reinforcement learning algorithm for selecting the optimal battery scheduling actions. The embedded learning mechanism enhances the consumer's knowledge about the optimal actions for battery scheduling under different time-dependent environmental conditions. The developed framework enables intelligent consumers to learn the stochastic environment and to use that experience to select optimal energy management actions. - Highlights: • A consumer exploits a 2 steps-ahead reinforcement learning for battery scheduling. • The Q-learning based mechanism is fed by the predictions of available wind power. • Wind speed state evolutions are modeled with a Markov chain model. • Optimal scheduling actions are learned through the occurrence of similar scenarios. • The consumer continuously enhances their knowledge about optimal actions.
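
    The Markov chain model of wind speed states mentioned in the highlights can be illustrated by estimating a transition matrix from an observed state sequence. This is a hedged sketch, not the authors' procedure: the three-level discretization (0 = low, 1 = medium, 2 = high), the example sequence and the uniform fallback for unseen states are all assumptions.

```python
def estimate_transition_matrix(states, n_states):
    """Maximum-likelihood transition probabilities from a sequence of state indices."""
    counts = [[0] * n_states for _ in range(n_states)]
    # Count every observed transition from one step to the next.
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # State never left in the data: fall back to a uniform row (assumption).
            matrix.append([1.0 / n_states] * n_states)
        else:
            matrix.append([c / total for c in row])
    return matrix


# Hypothetical discretized wind speed history.
wind_states = [0, 0, 1, 2, 1, 0, 0, 1, 1, 2]
P = estimate_transition_matrix(wind_states, n_states=3)
```

    Row i of P holds the estimated probabilities of moving from state i to each state at the next step. A learner can use these rows to weight the possible next wind states when evaluating a battery action.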

  6. Simulation-based optimization parametric optimization techniques and reinforcement learning

    CERN Document Server

    Gosavi, Abhijit

    2003-01-01

    Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning introduces the evolving area of simulation-based optimization. The book's objective is two-fold: (1) It examines the mathematical governing principles of simulation-based optimization, thereby providing the reader with the ability to model relevant real-life problems using these techniques. (2) It outlines the computational technology underlying these methods. Taken together these two aspects demonstrate that the mathematical and computational methods discussed in this book do work. Broadly speaking, the book has two parts: (1) parametric (static) optimization and (2) control (dynamic) optimization. Some of the book's special features are: *An accessible introduction to reinforcement learning and parametric-optimization techniques. *A step-by-step description of several algorithms of simulation-based optimization. *A clear and simple introduction to the methodology of neural networks. *A gentle introduction to converg...

  7. Flexible Heuristic Dynamic Programming for Reinforcement Learning in Quadrotors

    NARCIS (Netherlands)

    Helmer, Alexander; de Visser, C.C.; van Kampen, E.

    2018-01-01

    Reinforcement learning is a paradigm for learning decision-making tasks from interaction with the environment. Function approximators solve a part of the curse of dimensionality when learning in high-dimensional state and/or action spaces. It can be a time-consuming process to learn a good policy in

  8. How we learn to make decisions: rapid propagation of reinforcement learning prediction errors in humans.

    Science.gov (United States)

    Krigolson, Olav E; Hassall, Cameron D; Handy, Todd C

    2014-03-01

    Our ability to make decisions is predicated upon our knowledge of the outcomes of the actions available to us. Reinforcement learning theory posits that actions followed by a reward or punishment acquire value through the computation of prediction errors, that is, discrepancies between the predicted and the actual reward. A multitude of neuroimaging studies have demonstrated that rewards and punishments evoke neural responses that appear to reflect reinforcement learning prediction errors [e.g., Krigolson, O. E., Pierce, L. J., Holroyd, C. B., & Tanaka, J. W. Learning to become an expert: Reinforcement learning and the acquisition of perceptual expertise. Journal of Cognitive Neuroscience, 21, 1833-1840, 2009; Bayer, H. M., & Glimcher, P. W. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47, 129-141, 2005; O'Doherty, J. P. Reward representations and reward-related learning in the human brain: Insights from neuroimaging. Current Opinion in Neurobiology, 14, 769-776, 2004; Holroyd, C. B., & Coles, M. G. H. The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109, 679-709, 2002]. Here, we used the brain ERP technique to demonstrate that not only do rewards elicit a neural response akin to a prediction error but also that this signal rapidly diminished and propagated to the time of choice presentation with learning. Specifically, in a simple, learnable gambling task, we show that novel rewards elicited a feedback error-related negativity that rapidly decreased in amplitude with learning. Furthermore, we demonstrate the existence of a reward positivity at choice presentation, a previously unreported ERP component that has a timing and topography similar to those of the feedback error-related negativity and that increased in amplitude with learning. The pattern of results we observed mirrored the output of a computational model that we implemented to compute reward

  9. Problem based learning: the effect of real time data on the website to student independence

    Science.gov (United States)

    Setyowidodo, I.; Pramesti, Y. S.; Handayani, A. D.

    2018-05-01

    Science learning has developed as an integrative science rather than as disciplinary education, yet in practice the development of national character has not been able to form more creative and independent Indonesians. Problem Based Learning based on real time data on a website is a learning method that focuses on developing high-level thinking skills in problem-oriented situations by integrating technology into learning. The essence of this study is the presentation of authentic problems through real time data on the website. The purpose of this research is to develop student independence through Problem Based Learning based on real time data on a website. This is development research, implemented using a purposive sampling technique. The study found an increase in student self-reliance: 47% of students were in the very high category and 53% in the high category. This learning method can be said to be effective in improving students' learning independence in problem-oriented situations.

  10. The drift diffusion model as the choice rule in reinforcement learning.

    Science.gov (United States)

    Pedersen, Mads Lund; Frank, Michael J; Biele, Guido

    2017-08-01

    Current reinforcement-learning models often assume simplified decision processes that do not fully reflect the dynamic complexities of choice processes. Conversely, sequential-sampling models of decision making account for both choice accuracy and response time, but assume that decisions are based on static decision values. To combine these two computational models of decision making and learning, we implemented reinforcement-learning models in which the drift diffusion model describes the choice process, thereby capturing both within- and across-trial dynamics. To exemplify the utility of this approach, we quantitatively fit data from a common reinforcement-learning paradigm using hierarchical Bayesian parameter estimation, and compared model variants to determine whether they could capture the effects of stimulant medication in adult patients with attention-deficit hyperactivity disorder (ADHD). The model with the best relative fit provided a good description of the learning process, choices, and response times. A parameter recovery experiment showed that the hierarchical Bayesian modeling approach enabled accurate estimation of the model parameters. The model approach described here, using simultaneous estimation of reinforcement-learning and drift diffusion model parameters, shows promise for revealing new insights into the cognitive and neural mechanisms of learning and decision making, as well as the alteration of such processes in clinical groups.
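
    The idea of using the drift diffusion model as the choice rule can be sketched with a small simulation. This is not the authors' hierarchical Bayesian model: it is a minimal generative sketch for a two-armed bandit in which the drift rate is assumed to be the difference in learned values times a scaling parameter, and all parameter values (boundary, non-decision time, learning rate, reward probabilities) are illustrative assumptions.

```python
import math
import random


def ddm_trial(drift, boundary=2.0, dt=0.005, noise=1.0, non_decision=0.3,
              rng=random):
    """Simulate one drift diffusion trial; returns (choice, response_time).

    choice is 1 if the upper boundary is reached first, 0 for the lower one.
    """
    x = boundary / 2.0  # unbiased starting point midway between the boundaries
    t = 0.0
    while 0.0 < x < boundary:
        # Euler step of the diffusion: deterministic drift plus Gaussian noise.
        x += drift * dt + noise * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t += dt
    return (1 if x >= boundary else 0), t + non_decision


def simulate_rl_ddm(reward_probs=(0.2, 0.8), n_trials=300, alpha=0.1,
                    scaling=3.0, seed=0):
    """Delta-rule value learning with the DDM generating choices and response times."""
    rng = random.Random(seed)
    q = [0.0, 0.0]  # learned values of the two options
    choices, rts = [], []
    for _ in range(n_trials):
        # Within-trial dynamics: the drift rate depends on the current value difference.
        drift = scaling * (q[1] - q[0])
        choice, rt = ddm_trial(drift, rng=rng)
        reward = 1.0 if rng.random() < reward_probs[choice] else 0.0
        # Across-trial dynamics: update the chosen value by its prediction error.
        q[choice] += alpha * (reward - q[choice])
        choices.append(choice)
        rts.append(rt)
    return choices, rts, q
```

    As the value difference grows with learning, the drift rate increases, so choices of the better option become both more frequent and faster. A softmax choice rule alone would capture only the first of these two effects.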

  11. Incorporating Real-time Earthquake Information into Large Enrollment Natural Disaster Course Learning

    Science.gov (United States)

    Furlong, K. P.; Benz, H.; Hayes, G. P.; Villasenor, A.

    2010-12-01

    Although most would agree that the occurrence of natural disaster events such as earthquakes, volcanic eruptions, and floods can provide effective learning opportunities for natural hazards-based courses, implementing compelling materials into the large-enrollment classroom environment can be difficult. These natural hazard events derive much of their learning potential from their real-time nature, and in the modern 24/7 news-cycle where all but the most devastating events are quickly out of the public eye, the shelf life for an event is quite limited. To maximize the learning potential of these events requires that both authoritative information be available and course materials be generated as the event unfolds. Although many events such as hurricanes, flooding, and volcanic eruptions provide some precursory warnings, and thus one can prepare background materials to place the main event into context, earthquakes present a particularly confounding situation of providing no warning, but where context is critical to student learning. Attempting to implement real-time materials into large enrollment classes faces the additional hindrance of limited internet access (for students) in most lecture classrooms. In Earth 101 Natural Disasters: Hollywood vs Reality, taught as a large enrollment (150+ students) general education course at Penn State, we are collaborating with the USGS’s National Earthquake Information Center (NEIC) to develop efficient means to incorporate their real-time products into learning activities in the lecture hall environment. Over time (and numerous events) we have developed a template for presenting USGS-produced real-time information in lecture mode. The event-specific materials can be quickly incorporated and updated, along with key contextual materials, to provide students with up-to-the-minute current information. In addition, we have also developed in-class activities, such as student determination of population exposure to severe ground

  12. Online Pedagogical Tutorial Tactics Optimization Using Genetic-Based Reinforcement Learning.

    Science.gov (United States)

    Lin, Hsuan-Ta; Lee, Po-Ming; Hsiao, Tzu-Chien

    2015-01-01

    Tutorial tactics are policies for an Intelligent Tutoring System (ITS) to decide the next action when there are multiple actions available. Recent research has demonstrated that when the learning contents were controlled so as to be the same, different tutorial tactics made a difference in students' learning gains. However, the Reinforcement Learning (RL) techniques that were used in previous studies to induce tutorial tactics are insufficient when encountering large problems and hence were used in an offline manner. Therefore, we introduced a Genetic-Based Reinforcement Learning (GBML) approach to induce tutorial tactics in an online-learning manner without relying on any preexisting dataset. The introduced method can learn a set of rules from the environment in a manner similar to RL. It includes a genetic-based optimizer for the rule discovery task, which generates new rules from the old ones. This increases the scalability of an RL learner for larger problems. The results support our hypothesis about the capability of the GBML method to induce tutorial tactics. This suggests that the GBML method should be favorable in developing real-world ITS applications in the domain of tutorial tactics induction.

  13. Separation of time-based and trial-based accounts of the partial reinforcement extinction effect.

    Science.gov (United States)

    Bouton, Mark E; Woods, Amanda M; Todd, Travis P

    2014-01-01

    Two appetitive conditioning experiments with rats examined time-based and trial-based accounts of the partial reinforcement extinction effect (PREE). In the PREE, the loss of responding that occurs in extinction is slower when the conditioned stimulus (CS) has been paired with a reinforcer on some of its presentations (partially reinforced) instead of every presentation (continuously reinforced). According to a time-based or "time-accumulation" view (e.g., Gallistel and Gibbon, 2000), the PREE occurs because the organism has learned in partial reinforcement to expect the reinforcer after a larger amount of time has accumulated in the CS over trials. In contrast, according to a trial-based view (e.g., Capaldi, 1967), the PREE occurs because the organism has learned in partial reinforcement to expect the reinforcer after a larger number of CS presentations. Experiment 1 used a procedure that equated partially and continuously reinforced groups on their expected times to reinforcement during conditioning. A PREE was still observed. Experiment 2 then used an extinction procedure that allowed time in the CS and the number of trials to accumulate differentially through extinction. The PREE was still evident when responding was examined as a function of expected time units to the reinforcer, but was eliminated when responding was examined as a function of expected trial units to the reinforcer. There was no evidence that the animal responded according to the ratio of time accumulated during the CS in extinction over the time in the CS expected before the reinforcer. The results thus favor a trial-based account over a time-based account of extinction and the PREE. This article is part of a Special Issue entitled: Associative and Temporal Learning. Copyright © 2013 Elsevier B.V. All rights reserved.

  14. Distributed Economic Dispatch in Microgrids Based on Cooperative Reinforcement Learning.

    Science.gov (United States)

    Liu, Weirong; Zhuang, Peng; Liang, Hao; Peng, Jun; Huang, Zhiwu

    2018-06-01

    Microgrids incorporated with distributed generation (DG) units and energy storage (ES) devices are expected to play more and more important roles in the future power systems. Yet, achieving efficient distributed economic dispatch in microgrids is a challenging issue due to the randomness and nonlinear characteristics of DG units and loads. This paper proposes a cooperative reinforcement learning algorithm for distributed economic dispatch in microgrids. Utilizing the learning algorithm can avoid the difficulty of stochastic modeling and high computational complexity. In the cooperative reinforcement learning algorithm, the function approximation is leveraged to deal with the large and continuous state spaces. And a diffusion strategy is incorporated to coordinate the actions of DG units and ES devices. Based on the proposed algorithm, each node in microgrids only needs to communicate with its local neighbors, without relying on any centralized controllers. Algorithm convergence is analyzed, and simulations based on real-world meteorological and load data are conducted to validate the performance of the proposed algorithm.
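
    The abstract above says a diffusion strategy lets each node coordinate using only its local neighbors. A minimal sketch of one combine step is given below. The uniform averaging weights, the node names, and the function name `diffusion_combine` are assumptions made for illustration and are not taken from the paper.

```python
def diffusion_combine(local_estimates, neighbors):
    """Average each node's parameter vector with those of its direct neighbors.

    local_estimates: dict mapping node name -> list of floats (hypothetical
                     function-approximation weights after a local learning step)
    neighbors:       dict mapping node name -> iterable of neighbor node names
    Uniform combination weights are an assumption of this sketch.
    """
    combined = {}
    for node, params in local_estimates.items():
        group = [node] + list(neighbors[node])  # the node itself plus its neighbors
        combined[node] = [
            sum(local_estimates[n][i] for n in group) / len(group)
            for i in range(len(params))
        ]
    return combined
```

    No node reads values from outside its own neighborhood, so the step needs no centralized controller.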

  15. Lessons Learned from Real-Time, Event-Based Internet Science Communications

    Science.gov (United States)

    Phillips, T.; Myszka, E.; Gallagher, D. L.; Adams, M. L.; Koczor, R. J.; Whitaker, Ann F. (Technical Monitor)

    2001-01-01

    For the last several years the Science Directorate at Marshall Space Flight Center has carried out a diverse program of Internet-based science communication. The Directorate's Science Roundtable includes active researchers, NASA public relations, educators, and administrators. The Science@NASA award-winning family of Web sites features science, mathematics, and space news. The program includes extended stories about NASA science, a curriculum resource for teachers tied to national education standards, on-line activities for students, and webcasts of real-time events. The focus of sharing science activities in real-time has been to involve and excite students and the public about science. Events have involved meteor showers, solar eclipses, natural very low frequency radio emissions, and amateur balloon flights. In some cases, broadcasts accommodate active feedback and questions from Internet participants. Through these projects a pattern has emerged in the level of interest or popularity with the public. The pattern differentiates projects that include science from those that do not. All real-time, event-based Internet activities have captured public interest at a level not achieved through science stories or educator resource material exclusively. The worst event-based activity attracted more interest than the best written science story. One truly rewarding lesson learned through these projects is that the public recognizes the importance and excitement of being part of scientific discovery. Flying a camera to 100,000 feet altitude isn't as interesting to the public as searching for viable life-forms at these oxygen-poor altitudes. The details of these real-time, event-based projects and lessons learned will be discussed.

  16. Real-time modeling of primitive environments through wavelet sensors and Hebbian learning

    Science.gov (United States)

    Vaccaro, James M.; Yaworsky, Paul S.

    1999-06-01

    Modeling the world through sensory input necessarily provides a unique perspective for the observer. Given a limited perspective, objects and events cannot always be encoded precisely but must involve crude, quick approximations to deal with sensory information in a real-time manner. As an example, when avoiding an oncoming car, a pedestrian needs to identify the fact that a car is approaching before ascertaining the model or color of the vehicle. In our methodology, we use wavelet-based sensors with self-organized learning to encode basic sensory information in real-time. The wavelet-based sensors provide necessary transformations while a rank-based Hebbian learning scheme encodes a self-organized environment through translation, scale and orientation invariant sensors. Such a self-organized environment is made possible by combining wavelet sets which are orthonormal, log-scale with linear orientation and have automatically generated membership functions. In earlier work we used Gabor wavelet filters, rank-based Hebbian learning and an exponential modulation function to encode textural information from images. Many different types of modulation are possible, but based on biological findings the exponential modulation function provided a good approximation of first spike coding of 'integrate and fire' neurons. These types of Hebbian encoding schemes (e.g., exponential modulation, etc.) are useful for quick response and learning, provide several advantages over contemporary neural network learning approaches, and have been found to quantize data nonlinearly. By combining wavelets with Hebbian learning we can provide a real-time front-end for modeling an intelligent process, such as the autonomous control of agents in a simulated environment.

  17. In real time: exploring nursing students' learning during an international experience.

    Science.gov (United States)

    Afriyie Asenso, Barbara; Reimer-Kirkham, Sheryl; Astle, Barbara

    2013-10-11

    Nursing education has increasingly turned to international learning experiences to educate students who are globally minded and aware of social injustices in local and global communities. To date, research with international learning experiences has focused on the benefits for the students participating, after they have completed the international experience. The purpose of this qualitative study was to explore how nursing students learn during the international experience. The sample consisted of eight nursing students who enrolled in an international learning experience, and data were collected in "real time" in Zambia. The students were observed during learning activities and were interviewed three times. Three major themes emerged from the thematic analysis: expectations shaped students' learning, engagement facilitated learning, and critical reflection enhanced learning. Implications are discussed, related to disrupting media representations of Africa that shape students' expectations, and educational strategies for transformative learning and global citizenship.

  18. Overlay improvements using a real time machine learning algorithm

    Science.gov (United States)

    Schmitt-Weaver, Emil; Kubis, Michael; Henke, Wolfgang; Slotboom, Daan; Hoogenboom, Tom; Mulkens, Jan; Coogans, Martyn; ten Berge, Peter; Verkleij, Dick; van de Mast, Frank

    2014-04-01

    While semiconductor manufacturing is moving towards the 14nm node using immersion lithography, the overlay requirements are tightened to below 5nm. Next to improvements in the immersion scanner platform, enhancements in the overlay optimization and process control are needed to enable these low overlay numbers. Whereas conventional overlay control methods address wafer and lot variation autonomously with wafer pre exposure alignment metrology and post exposure overlay metrology, we see a need to reduce these variations by correlating more of the TWINSCAN system's sensor data directly to the post exposure YieldStar metrology in time. In this paper we will present the results of a study on applying a real time control algorithm based on machine learning technology. Machine learning methods use context and TWINSCAN system sensor data paired with post exposure YieldStar metrology to recognize generic behavior and train the control system to anticipate on this generic behavior. Specific for this study, the data concerns immersion scanner context, sensor data and on-wafer measured overlay data. By making the link between the scanner data and the wafer data we are able to establish a real time relationship. The result is an inline controller that accounts for small changes in scanner hardware performance in time while picking up subtle lot to lot and wafer to wafer deviations introduced by wafer processing.

  19. Flow Navigation by Smart Microswimmers via Reinforcement Learning

    Science.gov (United States)

    Colabrese, Simona; Biferale, Luca; Celani, Antonio; Gustavsson, Kristian

    2017-11-01

    We have numerically modeled active particles which are able to acquire some limited knowledge of the fluid environment from simple mechanical cues and exert a control on their preferred steering direction. We show that those swimmers can learn effective strategies just by experience, using a reinforcement learning algorithm. As an example, we focus on smart gravitactic swimmers. These are active particles whose task is to reach the highest altitude within some time horizon, exploiting the underlying flow whenever possible. The reinforcement learning algorithm allows particles to learn effective strategies even in difficult situations when, in the absence of control, they would end up being trapped by flow structures. These strategies are highly nontrivial and cannot be easily guessed in advance. This work paves the way towards the engineering of smart microswimmers that solve difficult navigation problems. ERC AdG NewTURB 339032.

  20. Feasibility of a real-time hand hygiene notification machine learning system in outpatient clinics.

    Science.gov (United States)

    Geilleit, R; Hen, Z Q; Chong, C Y; Loh, A P; Pang, N L; Peterson, G M; Ng, K C; Huis, A; de Korne, D F

    2018-04-09

    Various technologies have been developed to improve hand hygiene (HH) compliance in inpatient settings; however, little is known about the feasibility of machine learning technology for this purpose in outpatient clinics. To assess the effectiveness, user experiences, and costs of implementing a real-time HH notification machine learning system in outpatient clinics. In our mixed methods study, a multi-disciplinary team co-created an infrared guided sensor system to automatically notify clinicians to perform HH just before first patient contact. Notification technology effects were measured by comparing HH compliance at baseline (without notifications) with real-time auditory notifications that continued till HH was performed (intervention I) or notifications lasting 15 s (intervention II). User experiences were collected during daily briefings and semi-structured interviews. Costs of implementation of the system were calculated and compared to the current observational auditing programme. Average baseline HH performance before first patient contact was 53.8%. With real-time auditory notifications that continued till HH was performed, overall HH performance increased to 100% (P machine learning system were estimated to be 46% lower than the observational auditing programme. Machine learning technology that enables real-time HH notification provides a promising cost-effective approach to both improving and monitoring HH, and deserves further development in outpatient settings. Copyright © 2018 The Healthcare Infection Society. Published by Elsevier Ltd. All rights reserved.

  1. Reinforcement and Systemic Machine Learning for Decision Making

    CERN Document Server

    Kulkarni, Parag

    2012-01-01

    There are always difficulties in making machines that learn from experience. Complete information is not always available, or it becomes available in bits and pieces over a period of time. With respect to systemic learning, there is a need to understand the impact of decisions and actions on a system over that period of time. This book takes a holistic approach to addressing that need and presents a new paradigm, creating new learning applications and, ultimately, more intelligent machines. The first book of its kind in this new an

  2. Framework for robot skill learning using reinforcement learning

    Science.gov (United States)

    Wei, Yingzi; Zhao, Mingyang

    2003-09-01

    Robot acquiring skill is a process similar to human skill learning. Reinforcement learning (RL) is an on-line actor critic method for a robot to develop its skill. The reinforcement function has become the critical component for its effect of evaluating the action and guiding the learning process. We present an augmented reward function that provides a new way for RL controller to incorporate prior knowledge and experience into the RL controller. Also, the difference form of augmented reward function is considered carefully. The additional reward beyond conventional reward will provide more heuristic information for RL. In this paper, we present a strategy for the task of complex skill learning. Automatic robot shaping policy is to dissolve the complex skill into a hierarchical learning process. The new form of value function is introduced to attain smooth motion switching swiftly. We present a formal, but practical, framework for robot skill learning and also illustrate with an example the utility of method for learning skilled robot control on line.

  3. Optimizing Chemical Reactions with Deep Reinforcement Learning.

    Science.gov (United States)

    Zhou, Zhenpeng; Li, Xiaocheng; Zare, Richard N

    2017-12-27

    Deep reinforcement learning was employed to optimize chemical reactions. Our model iteratively records the results of a chemical reaction and chooses new experimental conditions to improve the reaction outcome. This model outperformed a state-of-the-art blackbox optimization algorithm by using 71% fewer steps on both simulations and real reactions. Furthermore, we introduced an efficient exploration strategy by drawing the reaction conditions from certain probability distributions, which resulted in an improvement on regret from 0.062 to 0.039 compared with a deterministic policy. Combining the efficient exploration policy with accelerated microdroplet reactions, optimal reaction conditions were determined in 30 min for the four reactions considered, and a better understanding of the factors that control microdroplet reactions was reached. Moreover, our model showed a better performance after training on reactions with similar or even dissimilar underlying mechanisms, which demonstrates its learning ability.

  4. SCAFFOLDING AND REINFORCEMENT: USING DIGITAL LOGBOOKS IN LEARNING VOCABULARY

    OpenAIRE

    Khalifa, Salma Hasan Almabrouk; Shabdin, Ahmad Affendi

    2016-01-01

    Reinforcement and scaffolding are tested approaches to enhance learning achievements. Keeping a record of the learning process as well as the new learned words functions as scaffolding to help learners build a comprehensive vocabulary. Similarly, repetitive learning of new words reinforces permanent learning for long-term memory. Paper-based logbooks may prove to be good records of the learning process, but if learners use digital logbooks, the results may be even better. Digital logbooks wit...

  5. Episodic reinforcement learning control approach for biped walking

    Directory of Open Access Journals (Sweden)

    Katić Duško

    2012-01-01

    This paper presents a hybrid dynamic control approach to the realization of humanoid biped robotic walk, focusing on the policy gradient episodic reinforcement learning with fuzzy evaluative feedback. The proposed structure of controller involves two feedback loops: a conventional computed torque controller and an episodic reinforcement learning controller. The reinforcement learning part includes fuzzy information about Zero-Moment-Point errors. Simulation tests using a medium-size 36-DOF humanoid robot MEXONE were performed to demonstrate the effectiveness of our method.

  6. Reinforcement learning in complementarity game and population dynamics.

    Science.gov (United States)

    Jost, Jürgen; Li, Wei

    2014-02-01

    We systematically test and compare different reinforcement learning schemes in a complementarity game [J. Jost and W. Li, Physica A 345, 245 (2005)] played between members of two populations. More precisely, we study the Roth-Erev, Bush-Mosteller, and SoftMax reinforcement learning schemes. A modified version of Roth-Erev with a power exponent of 1.5, as opposed to 1 in the standard version, performs best. We also compare these reinforcement learning strategies with evolutionary schemes. This gives insight into aspects like the issue of quick adaptation as opposed to systematic exploration or the role of learning rates.
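
    To make the comparison in the abstract above concrete, the sketch below shows a basic Roth-Erev learner with a tunable power exponent. How the exponent of 1.5 enters the scheme in the paper is not stated in the abstract. Here it is assumed to be applied to the propensities when they are turned into choice probabilities, and all function names are hypothetical.

```python
import random

def choice_probabilities(propensities, exponent=1.0):
    """Choice probabilities proportional to propensity ** exponent (assumed form)."""
    weights = [q ** exponent for q in propensities]
    total = sum(weights)
    return [w / total for w in weights]

def roth_erev_choice(propensities, exponent=1.0, rng=random):
    """Sample an action index according to choice_probabilities."""
    probs = choice_probabilities(propensities, exponent)
    r = rng.random()
    cumulative = 0.0
    for action, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return action
    return len(probs) - 1  # guard against floating-point round-off

def roth_erev_update(propensities, action, payoff):
    """Basic Roth-Erev reinforcement: add the payoff to the chosen action's propensity."""
    updated = list(propensities)
    updated[action] += payoff
    return updated
```

    With an exponent above 1, differences between propensities are amplified, so the learner concentrates on its better actions sooner than with the standard exponent of 1.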

  7. Large-scale machine learning and evaluation platform for real-time traffic surveillance

    Science.gov (United States)

    Eichel, Justin A.; Mishra, Akshaya; Miller, Nicholas; Jankovic, Nicholas; Thomas, Mohan A.; Abbott, Tyler; Swanson, Douglas; Keller, Joel

    2016-09-01

    In traffic engineering, vehicle detectors are trained on limited datasets, resulting in poor accuracy when deployed in real-world surveillance applications. Annotating large-scale high-quality datasets is challenging. Typically, these datasets have limited diversity; they do not reflect the real-world operating environment. There is a need for a large-scale, cloud-based positive and negative mining process and a large-scale learning and evaluation system for the application of automatic traffic measurements and classification. The proposed positive and negative mining process addresses the quality of crowd sourced ground truth data through machine learning review and human feedback mechanisms. The proposed learning and evaluation system uses a distributed cloud computing framework to handle data-scaling issues associated with large numbers of samples and a high-dimensional feature space. The system is trained using AdaBoost on 1,000,000 Haar-like features extracted from 70,000 annotated video frames. The trained real-time vehicle detector achieves an accuracy of at least 95% for 1/2 and about 78% for 19/20 of the time when tested on ~7,500,000 video frames. At the end of 2016, the dataset is expected to have over 1 billion annotated video frames.

  8. A Sarsa(λ)-based control model for real-time traffic light coordination.

    Science.gov (United States)

    Zhou, Xiaoke; Zhu, Fei; Liu, Quan; Fu, Yuchen; Huang, Wei

    2014-01-01

    Traffic problems often occur due to the traffic demands by the outnumbered vehicles on road. Maximizing traffic flow and minimizing the average waiting time are the goals of intelligent traffic control. Each junction wants to get larger traffic flow. During the course, junctions form a policy of coordination as well as constraints for adjacent junctions to maximize their own interests. A good traffic signal timing policy is helpful to solve the problem. However, as there are so many factors that can affect the traffic control model, it is difficult to find the optimal solution. The disability of traffic light controllers to learn from past experiences caused them to be unable to adaptively fit dynamic changes of traffic flow. Considering dynamic characteristics of the actual traffic environment, reinforcement learning algorithm based traffic control approach can be applied to get optimal scheduling policy. The proposed Sarsa(λ)-based real-time traffic control optimization model can maintain the traffic signal timing policy more effectively. The Sarsa(λ)-based model gains traffic cost of the vehicle, which considers delay time, the number of waiting vehicles, and the integrated saturation from its experiences to learn and determine the optimal actions. The experiment results show an inspiring improvement in traffic control, indicating the proposed model is capable of facilitating real-time dynamic traffic control.
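
    For readers unfamiliar with the algorithm named in the abstract above, the following is a minimal tabular Sarsa(λ) update with accumulating eligibility traces. It is a generic textbook-style sketch, not the authors' traffic model. The state and action labels and the use of a negative traffic cost as the reward are assumptions for illustration.

```python
from collections import defaultdict

def sarsa_lambda_step(q, traces, s, a, reward, s_next, a_next,
                      alpha=0.1, gamma=0.9, lam=0.8):
    """Apply one on-policy Sarsa(lambda) update in place and return the TD error.

    q, traces: defaultdict(float) keyed by (state, action)
    reward:    scalar feedback, e.g. a negative traffic cost (assumption)
    """
    # Temporal-difference error for the transition (s, a) -> (s_next, a_next).
    td_error = reward + gamma * q[(s_next, a_next)] - q[(s, a)]
    # Accumulating trace: mark the visited pair as eligible for credit.
    traces[(s, a)] += 1.0
    # Spread the TD error over all eligible pairs, then decay their traces.
    for key in list(traces):
        q[key] += alpha * td_error * traces[key]
        traces[key] *= gamma * lam
    return td_error

def new_tables():
    """Create empty action-value and eligibility-trace tables."""
    return defaultdict(float), defaultdict(float)
```

    A controller would call `sarsa_lambda_step` once per signal decision and reset the traces at the start of each episode.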

  9. A Sarsa(λ)-Based Control Model for Real-Time Traffic Light Coordination

    Directory of Open Access Journals (Sweden)

    Xiaoke Zhou

    2014-01-01

    Traffic problems often occur due to the traffic demands by the outnumbered vehicles on road. Maximizing traffic flow and minimizing the average waiting time are the goals of intelligent traffic control. Each junction wants to get larger traffic flow. During the course, junctions form a policy of coordination as well as constraints for adjacent junctions to maximize their own interests. A good traffic signal timing policy is helpful to solve the problem. However, as there are so many factors that can affect the traffic control model, it is difficult to find the optimal solution. The disability of traffic light controllers to learn from past experiences caused them to be unable to adaptively fit dynamic changes of traffic flow. Considering dynamic characteristics of the actual traffic environment, reinforcement learning algorithm based traffic control approach can be applied to get optimal scheduling policy. The proposed Sarsa(λ)-based real-time traffic control optimization model can maintain the traffic signal timing policy more effectively. The Sarsa(λ)-based model gains traffic cost of the vehicle, which considers delay time, the number of waiting vehicles, and the integrated saturation from its experiences to learn and determine the optimal actions. The experiment results show an inspiring improvement in traffic control, indicating the proposed model is capable of facilitating real-time dynamic traffic control.

  10. Enhancement of Online Robotics Learning Using Real-Time 3D Visualization Technology

    OpenAIRE

    Richard Chiou; Yongjin (James) Kwon; Tzu-Liang (Bill) Tseng; Robin Kizirian; Yueh-Ting Yang

    2010-01-01

    This paper discusses a real-time e-Lab Learning system based on the integration of 3D visualization technology with a remote robotic laboratory. With the emergence and development of the Internet field, online learning is proving to play a significant role in the upcoming era. In an effort to enhance Internet-based learning of robotics and keep up with the rapid progression of technology, a 3-Dimensional scheme of viewing the robotic laboratory has been introduced in addition to the remote c...

  11. Belief reward shaping in reinforcement learning

    CSIR Research Space (South Africa)

    Marom, O

    2018-02-01

    A key challenge in many reinforcement learning problems is delayed rewards, which can significantly slow down learning. Although reward shaping has previously been introduced to accelerate learning by bootstrapping an agent with additional...

  12. Reinforcement learning agents providing advice in complex video games

    Science.gov (United States)

    Taylor, Matthew E.; Carboni, Nicholas; Fachantidis, Anestis; Vlahavas, Ioannis; Torrey, Lisa

    2014-01-01

    This article introduces a teacher-student framework for reinforcement learning, synthesising and extending material that appeared in conference proceedings [Torrey, L., & Taylor, M. E. (2013). Teaching on a budget: Agents advising agents in reinforcement learning. Proceedings of the international conference on autonomous agents and multiagent systems] and in a non-archival workshop paper [Carboni, N., & Taylor, M. E. (2013, May). Preliminary results for 1 vs. 1 tactics in StarCraft. Proceedings of the adaptive and learning agents workshop (at AAMAS-13)]. In this framework, a teacher agent instructs a student agent by suggesting actions the student should take as it learns. However, the teacher may only give such advice a limited number of times. We present several novel algorithms that teachers can use to budget their advice effectively, and we evaluate them in two complex video games: StarCraft and Pac-Man. Our results show that the same amount of advice, given at different moments, can have different effects on student learning, and that teachers can significantly affect student learning even when students use different learning methods and state representations.
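
    The abstract above does not spell out the budgeting algorithms, so the sketch below illustrates only the general idea of spending a limited advice budget on the states where it matters most. The importance measure (the spread between the teacher's best and worst action values), the threshold rule, and the class name `BudgetedTeacher` are assumptions of this example.

```python
def importance(q_values):
    """Assumed importance measure: gap between the best and worst action value."""
    return max(q_values) - min(q_values)

class BudgetedTeacher:
    """Hypothetical teacher that advises only in important states until its budget runs out."""

    def __init__(self, q_table, budget, threshold):
        self.q_table = q_table      # teacher's action values: state -> list of floats
        self.budget = budget        # number of advice messages still available
        self.threshold = threshold  # minimum importance required to spend advice

    def advise(self, state):
        """Return the teacher's greedy action index, or None when no advice is given."""
        q_values = self.q_table[state]
        if self.budget <= 0 or importance(q_values) < self.threshold:
            return None
        self.budget -= 1
        return max(range(len(q_values)), key=q_values.__getitem__)
```

    The student would follow the returned action when one is given and fall back on its own exploration policy when `advise` returns `None`.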

  13. Adaptive representations for reinforcement learning

    NARCIS (Netherlands)

    Whiteson, S.

    2010-01-01

    This book presents new algorithms for reinforcement learning, a form of machine learning in which an autonomous agent seeks a control policy for a sequential decision task. Since current methods typically rely on manually designed solution representations, agents that automatically adapt their own

  14. Empirical Evidence of Priming, Transfer, Reinforcement, and Learning in the Real and Virtual Trillium Trails

    Science.gov (United States)

    Harrington, M. C. R.

    2011-01-01

    Over the past 20 years, there has been a debate on the effectiveness of virtual reality used for learning with young children, producing many ideas but little empirical proof. This empirical study compared learning activity in situ of a real environment (Real) and a desktop virtual reality (Virtual) environment, built with video game technology,…

  15. Adolescent-specific patterns of behavior and neural activity during social reinforcement learning.

    Science.gov (United States)

    Jones, Rebecca M; Somerville, Leah H; Li, Jian; Ruberry, Erika J; Powers, Alisa; Mehta, Natasha; Dyke, Jonathan; Casey, B J

    2014-06-01

    Humans are sophisticated social beings. Social cues from others are exceptionally salient, particularly during adolescence. Understanding how adolescents interpret and learn from variable social signals can provide insight into the observed shift in social sensitivity during this period. The present study tested 120 participants between the ages of 8 and 25 years on a social reinforcement learning task where the probability of receiving positive social feedback was parametrically manipulated. Seventy-eight of these participants completed the task during fMRI scanning. Modeling trial-by-trial learning, children and adults showed higher positive learning rates than did adolescents, suggesting that adolescents demonstrated less differentiation in their reaction times for peers who provided more positive feedback. Forming expectations about receiving positive social reinforcement correlated with neural activity within the medial prefrontal cortex and ventral striatum across age. Adolescents, unlike children and adults, showed greater insular activity during positive prediction error learning and increased activity in the supplementary motor cortex and the putamen when receiving positive social feedback regardless of the expected outcome, suggesting that peer approval may motivate adolescents toward action. While different amounts of positive social reinforcement enhanced learning in children and adults, all positive social reinforcement equally motivated adolescents. Together, these findings indicate that sensitivity to peer approval during adolescence goes beyond simple reinforcement theory accounts and suggest possible explanations for how peers may motivate adolescent behavior.

  16. Punishment Insensitivity and Impaired Reinforcement Learning in Preschoolers

    Science.gov (United States)

    Briggs-Gowan, Margaret J.; Nichols, Sara R.; Voss, Joel; Zobel, Elvira; Carter, Alice S.; McCarthy, Kimberly J.; Pine, Daniel S.; Blair, James; Wakschlag, Lauren S.

    2014-01-01

    Background: Youth and adults with psychopathic traits display disrupted reinforcement learning. Advances in measurement now enable examination of this association in preschoolers. The current study examines relations between reinforcement learning in preschoolers and parent ratings of reduced responsiveness to socialization, conceptualized as a…

  17. Neural Basis of Reinforcement Learning and Decision Making

    Science.gov (United States)

    Lee, Daeyeol; Seo, Hyojung; Jung, Min Whan

    2012-01-01

    Reinforcement learning is an adaptive process in which an animal utilizes its previous experience to improve the outcomes of future choices. Computational theories of reinforcement learning play a central role in the newly emerging areas of neuroeconomics and decision neuroscience. In this framework, actions are chosen according to their value functions, which describe how much future reward is expected from each action. Value functions can be adjusted not only through reward and penalty, but also by the animal’s knowledge of its current environment. Studies have revealed that a large proportion of the brain is involved in representing and updating value functions and using them to choose an action. However, how the nature of a behavioral task affects the neural mechanisms of reinforcement learning remains incompletely understood. Future studies should uncover the principles by which different computational elements of reinforcement learning are dynamically coordinated across the entire brain. PMID:22462543
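
    The statement above that actions are chosen according to their value functions can be illustrated with a small sketch: a softmax rule turns action values into choice probabilities, and a delta rule moves the chosen action's value toward the received reward. This is a generic model-free example under assumed parameter names, not a model taken from the reviewed studies.

```python
import math

def softmax_policy(values, temperature=1.0):
    """Convert action values into choice probabilities (higher value, higher probability)."""
    scaled = [v / temperature for v in values]
    top = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def update_value(values, action, reward, learning_rate=0.1):
    """Delta rule: shift the chosen action's value by a fraction of the prediction error."""
    new_values = list(values)
    new_values[action] += learning_rate * (reward - new_values[action])
    return new_values
```

    The term `reward - new_values[action]` is the reward prediction error, and the temperature controls how strongly choices favor the higher-valued action.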

  18. Reinforcement Learning in Repeated Portfolio Decisions

    OpenAIRE

    Diao, Linan; Rieskamp, Jörg

    2011-01-01

    How do people make investment decisions when they receive outcome feedback? We examined how well the standard mean-variance model and two reinforcement models predict people's portfolio decisions. The basic reinforcement model predicts a learning process that relies solely on the portfolio's overall return, whereas the proposed extended reinforcement model also takes the risk and covariance of the investments into account. The experimental results illustrate that people reacted sensitively to...

  19. Staged Inference using Conditional Deep Learning for energy efficient real-time smart diagnosis.

    Science.gov (United States)

    Parsa, Maryam; Panda, Priyadarshini; Sen, Shreyas; Roy, Kaushik

    2017-07-01

    Recent progress in biosensor technology and wearable devices has created a formidable opportunity for remote healthcare monitoring systems as well as real-time diagnosis and disease prevention. The use of data mining techniques is indispensable for analysis of the large pool of data generated by the wearable devices. Deep learning is among the promising methods for analyzing such data for healthcare applications and disease diagnosis. However, the conventional deep neural networks are computationally intensive and it is impractical to use them in real-time diagnosis with low-powered on-body devices. We propose Staged Inference using Conditional Deep Learning (SICDL), as an energy efficient approach for creating healthcare monitoring systems. For smart diagnostics, we observe that all diagnoses are not equally challenging. The proposed approach thus decomposes the diagnoses into preliminary analysis (such as healthy vs unhealthy) and detailed analysis (such as identifying the specific type of cardio disease). The preliminary diagnosis is conducted real-time with a low complexity neural network realized on the resource-constrained on-body device. The detailed diagnosis requires a larger network that is implemented remotely in cloud and is conditionally activated only for detailed diagnosis (unhealthy individuals). We evaluated the proposed approach using available physiological sensor data from Physionet databases, and achieved 38% energy reduction in comparison to the conventional deep learning approach.
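
    The two-stage decomposition described above can be sketched as a simple conditional pipeline. The two model callables, the probability threshold, and the returned dictionary are hypothetical placeholders. The abstract does not specify the activation rule, so the threshold test here is an assumption.

```python
def staged_diagnosis(sample, small_model, large_model, threshold=0.5):
    """Run the cheap on-device model first; invoke the large model only when flagged.

    small_model: callable returning an estimated probability of "unhealthy" (assumption)
    large_model: callable returning a detailed diagnosis label (assumption)
    """
    p_unhealthy = small_model(sample)
    if p_unhealthy < threshold:
        # Preliminary stage is sufficient; the remote network is never called.
        return {"label": "healthy", "stage": 1}
    # Conditional activation of the larger, remotely hosted network.
    return {"label": large_model(sample), "stage": 2}
```

    Energy is saved in this arrangement because most samples stop at stage 1 and never trigger the larger network.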

  20. Gaze-contingent reinforcement learning reveals incentive value of social signals in young children and adults.

    Science.gov (United States)

    Vernetti, Angélina; Smith, Tim J; Senju, Atsushi

    2017-03-15

    While numerous studies have demonstrated that infants and adults preferentially orient to social stimuli, it remains unclear as to what drives such preferential orienting. It has been suggested that the learned association between social cues and subsequent reward delivery might shape such social orienting. Using a novel, spontaneous indication of reinforcement learning (with the use of a gaze contingent reward-learning task), we investigated whether children and adults' orienting towards social and non-social visual cues can be elicited by the association between participants' visual attention and a rewarding outcome. Critically, we assessed whether the engaging nature of the social cues influences the process of reinforcement learning. Both children and adults learned to orient more often to the visual cues associated with reward delivery, demonstrating that cue-reward association reinforced visual orienting. More importantly, when the reward-predictive cue was social and engaging, both children and adults learned the cue-reward association faster and more efficiently than when the reward-predictive cue was social but non-engaging. These new findings indicate that social engaging cues have a positive incentive value. This could possibly be because they usually coincide with positive outcomes in real life, which could partly drive the development of social orienting. © 2017 The Authors.

  1. The Computational Development of Reinforcement Learning during Adolescence.

    Directory of Open Access Journals (Sweden)

    Stefano Palminteri

    2016-06-01

    Adolescence is a period of life characterised by changes in learning and decision-making. Learning and decision-making do not rely on a unitary system, but instead require the coordination of different cognitive processes that can be mathematically formalised as dissociable computational modules. Here, we aimed to trace the developmental time-course of the computational modules responsible for learning from reward or punishment, and learning from counterfactual feedback. Adolescents and adults carried out a novel reinforcement learning paradigm in which participants learned the association between cues and probabilistic outcomes, where the outcomes differed in valence (reward versus punishment) and feedback was either partial or complete (either the outcome of the chosen option only, or the outcomes of both the chosen and unchosen option, were displayed). Computational strategies changed during development: whereas adolescents' behaviour was better explained by a basic reinforcement learning algorithm, adults' behaviour integrated increasingly complex computational features, namely a counterfactual learning module (enabling enhanced performance in the presence of complete feedback) and a value contextualisation module (enabling symmetrical reward and punishment learning). Unlike adults, adolescent performance did not benefit from counterfactual (complete) feedback. In addition, while adults learned symmetrically from both reward and punishment, adolescents learned from reward but were less likely to learn from punishment. This tendency to rely on rewards and not to consider alternative consequences of actions might contribute to our understanding of decision-making in adolescence.

  2. Solution to reinforcement learning problems with artificial potential field

    Institute of Scientific and Technical Information of China (English)

    XIE Li-juan; XIE Guang-rong; CHEN Huan-wen; LI Xiao-li

    2008-01-01

    A novel method was designed to solve reinforcement learning problems with an artificial potential field (APF). First, a reinforcement learning problem was transformed into a path planning problem by using an APF, which is a very appropriate method for modelling a reinforcement learning problem. Second, a new APF algorithm based on a virtual water-flow concept was proposed to overcome the local minimum problem of potential field methods. The performance of the new method was tested on a gridworld problem known as the key and door maze. The experimental results show that within 45 trials, good and deterministic policies are found in almost all simulations. In comparison with Wiering's HQ-learning system, which needs 20 000 trials for a stable solution, the proposed method obtains an optimal and stable policy far more quickly. Therefore, the new method is a simple and effective way to obtain an optimal solution to the reinforcement learning problem.

  3. Tunnel Ventilation Control Using Reinforcement Learning Methodology

    Science.gov (United States)

    Chu, Baeksuk; Kim, Dongnam; Hong, Daehie; Park, Jooyoung; Chung, Jin Taek; Kim, Tae-Hyung

    The main purpose of a tunnel ventilation system is to keep the CO pollutant concentration and the VI (visibility index) at an adequate level to provide drivers with a comfortable and safe driving environment. Moreover, it is necessary to minimize the power consumed in operating the ventilation system. To achieve these objectives, the control algorithm used in this research is the reinforcement learning (RL) method. RL is goal-directed learning of a mapping from situations to actions without relying on exemplary supervision or complete models of the environment. The goal of RL is to maximize a reward, which is an evaluative feedback from the environment. The reward constructed for the tunnel ventilation system includes the two objectives listed above, that is, maintaining an adequate level of pollutants and minimizing power consumption. An RL algorithm based on an actor-critic architecture and a gradient-following algorithm is applied to the tunnel ventilation system. Simulation results obtained with real data collected from an existing tunnel ventilation system, together with real experimental verification, are provided in this paper. It is confirmed that with the suggested controller, the pollutant level inside the tunnel was well maintained under the allowable limit and energy consumption was improved compared to the conventional control scheme.
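
    A reward that folds both objectives into a single scalar could look roughly like the sketch below. It is not the reward used in the paper: the function name `ventilation_reward`, the CO limit, and the two weights are hypothetical values chosen only to show the trade-off, and a visibility term could be added in the same way.

```python
def ventilation_reward(co_ppm, power_kw, co_limit=25.0,
                       w_pollution=1.0, w_power=0.01):
    """Scalar reward combining pollutant control and energy use.

    All parameter values are illustrative assumptions:
      co_limit     allowable CO concentration (ppm)
      w_pollution  penalty per ppm above the limit
      w_power      penalty per kW of fan power
    """
    co_excess = max(0.0, co_ppm - co_limit)  # only penalize violations
    cost = w_pollution * co_excess + w_power * power_kw
    return -cost  # RL maximizes reward, so cost enters with a minus sign
```

    With this shape, running the fans harder lowers the first term but raises the second, so the relative size of the two weights decides how much extra power the controller is willing to spend to stay under the limit.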

  4. Real-time cerebellar neuroprosthetic system based on a spiking neural network model of motor learning.

    Science.gov (United States)

    Xu, Tao; Xiao, Na; Zhai, Xiaolong; Kwan Chan, Pak; Tin, Chung

    2018-02-01

    Damage to the brain, as a result of various medical conditions, impacts the everyday life of patients and there is still no complete cure to neurological disorders. Neuroprostheses that can functionally replace the damaged neural circuit have recently emerged as a possible solution to these problems. Here we describe the development of a real-time cerebellar neuroprosthetic system to substitute neural function in cerebellar circuitry for learning delay eyeblink conditioning (DEC). The system was empowered by a biologically realistic spiking neural network (SNN) model of the cerebellar neural circuit, which considers the neuronal population and anatomical connectivity of the network. The model simulated synaptic plasticity critical for learning DEC. This SNN model was carefully implemented on a field programmable gate array (FPGA) platform for real-time simulation. This hardware system was interfaced in in vivo experiments with anesthetized rats and it used neural spikes recorded online from the animal to learn and trigger conditioned eyeblink in the animal during training. This rat-FPGA hybrid system was able to process neuronal spikes in real-time with an embedded cerebellum model of ~10 000 neurons and reproduce learning of DEC with different inter-stimulus intervals. Our results validated that the system performance is physiologically relevant at both the neural (firing pattern) and behavioral (eyeblink pattern) levels. This integrated system provides the sufficient computation power for mimicking the cerebellar circuit in real-time. The system interacts with the biological system naturally at the spike level and can be generalized for including other neural components (neuron types and plasticity) and neural functions for potential neuroprosthetic applications.

  5. Reinforcement Learning in Autism Spectrum Disorder

    Directory of Open Access Journals (Sweden)

    Manuela Schuetze

    2017-11-01

    Early behavioral interventions are recognized as integral to standard care in autism spectrum disorder (ASD), and often focus on reinforcing desired behaviors (e.g., eye contact) and reducing the presence of atypical behaviors (e.g., echoing others' phrases). However, efficacy of these programs is mixed. Reinforcement learning relies on neurocircuitry that has been reported to be atypical in ASD: prefrontal-sub-cortical circuits, amygdala, brainstem, and cerebellum. Thus, early behavioral interventions rely on neurocircuitry that may function atypically in at least a subset of individuals with ASD. Recent work has investigated physiological, behavioral, and neural responses to reinforcers to uncover differences in motivation and learning in ASD. We will synthesize this work to identify promising avenues for future research that ultimately can be used to enhance the efficacy of early intervention.

  6. Storm real-time processing cookbook

    CERN Document Server

    Anderson, Quinton

    2013-01-01

    A cookbook with plenty of practical recipes for different uses of Storm. If you are a Java developer with basic knowledge of real-time processing and would like to learn Storm to process unbounded streams of data in real time, then this book is for you.

  7. Investigations of mode I crack propagation in fibre-reinforced plastics with real time X-ray tests and simultaneous sound emission analysis

    International Nuclear Information System (INIS)

    Brunner, A.; Nordstrom, R.; Flueeler, P.

    1992-01-01

    The described investigation of crack formation and crack propagation in mode I (tensile stress) in fibre-reinforced plastic samples, especially uni-directional carbon fibre reinforced polyether-ether ketone (PEEK), has several aims. On the one hand, the phenomena of crack formation and crack propagation in these materials are to be studied, and on the other hand, the draft standards for these tests are to be checked. It was found that the combination of real time X-ray tests and simultaneous sound emission analysis is excellently suited for the basic examination of crack formation and crack propagation in DCB samples. With the aid of picture processing and analysis of the video representation, consistent crack lengths and resulting G_IC values can be determined. (orig./RHM) [de

  8. Temporal Memory Reinforcement Learning for the Autonomous Micro-mobile Robot Based-behavior

    Institute of Scientific and Technical Information of China (English)

    Yang Yujun(杨玉君); Cheng Junshi; Chen Jiapin; Li Xiaohai

    2004-01-01

    This paper presents temporal memory reinforcement learning for the behavior-based autonomous micro-mobile robot. Human memory has a forgetting process: the earlier something is memorized, the earlier it is forgotten, and only repeated things are remembered firmly. Inspired by this, the robot need not memorize all past states, which also economizes the EMS memory space, which is scarce in the MPU of our AMRobot. The proposed algorithm is an extension of Q-learning, an incremental reinforcement learning method. Simulation results show that the algorithm is valid.

  9. Reinforcement Learning Explains Conditional Cooperation and Its Moody Cousin.

    Directory of Open Access Journals (Sweden)

    Takahiro Ezaki

    2016-07-01

    Direct reciprocity, or repeated interaction, is a main mechanism to sustain cooperation under social dilemmas involving two individuals. For larger groups and networks, which are probably more relevant to understanding and engineering our society, experiments employing repeated multiplayer social dilemma games have suggested that humans often show conditional cooperation behavior and its moody variant. Mechanisms underlying these behaviors largely remain unclear. Here we provide a proximate account for this behavior by showing that individuals adopting a type of reinforcement learning, called aspiration learning, phenomenologically behave as conditional cooperators. By definition, individuals are satisfied if and only if the obtained payoff is larger than a fixed aspiration level. They reinforce actions that have resulted in satisfactory outcomes and anti-reinforce those yielding unsatisfactory outcomes. The results obtained in the present study are general in that they explain extant experimental results obtained for both so-called moody and non-moody conditional cooperation, prisoner's dilemma and public goods games, and well-mixed groups and networks. Different from the previous theory, individuals are assumed to have no access to information about what other individuals are doing such that they cannot explicitly use conditional cooperation rules. In this sense, myopic aspiration learning in which the unconditional propensity of cooperation is modulated in every discrete time step explains conditional behavior of humans. Aspiration learners showing (moody) conditional cooperation obeyed a noisy GRIM-like strategy. This is different from Pavlov, a reinforcement learning strategy promoting mutual cooperation in two-player situations.
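
    The reinforce / anti-reinforce rule described above can be illustrated with a Bush-Mosteller-style update of the probability of cooperating. This is a sketch under stated assumptions rather than the exact model of the paper: the tanh stimulus, the sensitivity parameter `beta`, and the function name `aspiration_update` are choices made here for illustration.

```python
import math


def aspiration_update(p_coop, action, payoff, aspiration, beta=1.0):
    """Return the updated probability of cooperating.

    p_coop      current probability of choosing "C" (cooperate)
    action      the action just taken, "C" or "D"
    payoff      payoff obtained in this round
    aspiration  fixed aspiration level; payoff above it is 'satisfactory'
    beta        assumed sensitivity of the stimulus to the payoff gap
    """
    # Stimulus in (-1, 1): positive if satisfied, negative if not.
    stimulus = math.tanh(beta * (payoff - aspiration))
    if action == "C":
        if stimulus >= 0:
            # Satisfied after cooperating: move p_coop toward 1.
            return p_coop + (1.0 - p_coop) * stimulus
        # Dissatisfied after cooperating: move p_coop toward 0.
        return p_coop + p_coop * stimulus
    if stimulus >= 0:
        # Satisfied after defecting: reinforce defection.
        return p_coop - p_coop * stimulus
    # Dissatisfied after defecting: move p_coop toward 1.
    return p_coop - (1.0 - p_coop) * stimulus
```

    The update uses only the learner's own action and payoff, which matches the abstract's point that no information about other players' behavior is required.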

  10. Reusable Reinforcement Learning via Shallow Trails.

    Science.gov (United States)

    Yu, Yang; Chen, Shi-Yong; Da, Qing; Zhou, Zhi-Hua

    2018-06-01

    Reinforcement learning has shown great success in helping learning agents accomplish tasks autonomously from environment interactions. Meanwhile in many real-world applications, an agent needs to accomplish not only a fixed task but also a range of tasks. For this goal, an agent can learn a metapolicy over a set of training tasks that are drawn from an underlying distribution. By maximizing the total reward summed over all the training tasks, the metapolicy can then be reused in accomplishing test tasks from the same distribution. However, in practice, we face two major obstacles to train and reuse metapolicies well. First, how to identify tasks that are unrelated or even opposite with each other, in order to avoid their mutual interference in the training. Second, how to characterize task features, according to which a metapolicy can be reused. In this paper, we propose the MetA-Policy LEarning (MAPLE) approach that overcomes the two difficulties by introducing the shallow trail. It probes a task by running a roughly trained policy. Using the rewards of the shallow trail, MAPLE automatically groups similar tasks. Moreover, when the task parameters are unknown, the rewards of the shallow trail also serve as task features. Empirical studies on several controlling tasks verify that MAPLE can train metapolicies well and receives high reward on test tasks.

  11. Reinforcement and inference in cross-situational word learning.

    Science.gov (United States)

    Tilles, Paulo F C; Fontanari, José F

    2013-01-01

    Cross-situational word learning is based on the notion that a learner can determine the referent of a word by finding something in common across many observed uses of that word. Here we propose an adaptive learning algorithm that contains a parameter that controls the strength of the reinforcement applied to associations between concurrent words and referents, and a parameter that regulates inference, which includes built-in biases, such as mutual exclusivity, and information of past learning events. By adjusting these parameters so that the model predictions agree with data from representative experiments on cross-situational word learning, we were able to explain the learning strategies adopted by the participants of those experiments in terms of a trade-off between reinforcement and inference. These strategies can vary wildly depending on the conditions of the experiments. For instance, for fast mapping experiments (i.e., the correct referent could, in principle, be inferred in a single observation) inference is prevalent, whereas for segregated contextual diversity experiments (i.e., the referents are separated in groups and are exhibited with members of their groups only) reinforcement is predominant. Other experiments are explained with more balanced doses of reinforcement and inference.

  12. Self-Paced Prioritized Curriculum Learning With Coverage Penalty in Deep Reinforcement Learning.

    Science.gov (United States)

    Ren, Zhipeng; Dong, Daoyi; Li, Huaxiong; Chen, Chunlin

    2018-06-01

    In this paper, a new training paradigm is proposed for deep reinforcement learning using self-paced prioritized curriculum learning with coverage penalty. The proposed deep curriculum reinforcement learning (DCRL) takes the most advantage of experience replay by adaptively selecting appropriate transitions from replay memory based on the complexity of each transition. The criteria of complexity in DCRL consist of self-paced priority as well as coverage penalty. The self-paced priority reflects the relationship between the temporal-difference error and the difficulty of the current curriculum for sample efficiency. The coverage penalty is taken into account for sample diversity. With comparison to deep Q network (DQN) and prioritized experience replay (PER) methods, the DCRL algorithm is evaluated on Atari 2600 games, and the experimental results show that DCRL outperforms DQN and PER on most of these games. More results further show that the proposed curriculum training paradigm of DCRL is also applicable and effective for other memory-based deep reinforcement learning approaches, such as double DQN and dueling network. All the experimental results demonstrate that DCRL can achieve improved training efficiency and robustness for deep reinforcement learning.

  13. Manifold Regularized Reinforcement Learning.

    Science.gov (United States)

    Li, Hongliang; Liu, Derong; Wang, Ding

    2018-04-01

    This paper introduces a novel manifold regularized reinforcement learning scheme for continuous Markov decision processes. Smooth feature representations for value function approximation can be automatically learned using the unsupervised manifold regularization method. The learned features are data-driven, and can be adapted to the geometry of the state space. Furthermore, the scheme provides a direct basis representation extension for novel samples during policy learning and control. The performance of the proposed scheme is evaluated on two benchmark control tasks, i.e., the inverted pendulum and the energy storage problem. Simulation results illustrate the concepts of the proposed scheme and show that it can obtain excellent performance.

  14. Real-time cerebellar neuroprosthetic system based on a spiking neural network model of motor learning

    Science.gov (United States)

    Xu, Tao; Xiao, Na; Zhai, Xiaolong; Chan, Pak Kwan; Tin, Chung

    2018-02-01

    Objective. Damage to the brain, as a result of various medical conditions, impacts the everyday life of patients and there is still no complete cure to neurological disorders. Neuroprostheses that can functionally replace the damaged neural circuit have recently emerged as a possible solution to these problems. Here we describe the development of a real-time cerebellar neuroprosthetic system to substitute neural function in cerebellar circuitry for learning delay eyeblink conditioning (DEC). Approach. The system was empowered by a biologically realistic spiking neural network (SNN) model of the cerebellar neural circuit, which considers the neuronal population and anatomical connectivity of the network. The model simulated synaptic plasticity critical for learning DEC. This SNN model was carefully implemented on a field programmable gate array (FPGA) platform for real-time simulation. This hardware system was interfaced in in vivo experiments with anesthetized rats and it used neural spikes recorded online from the animal to learn and trigger conditioned eyeblink in the animal during training. Main results. This rat-FPGA hybrid system was able to process neuronal spikes in real-time with an embedded cerebellum model of ~10 000 neurons and reproduce learning of DEC with different inter-stimulus intervals. Our results validated that the system performance is physiologically relevant at both the neural (firing pattern) and behavioral (eyeblink pattern) levels. Significance. This integrated system provides the sufficient computation power for mimicking the cerebellar circuit in real-time. The system interacts with the biological system naturally at the spike level and can be generalized for including other neural components (neuron types and plasticity) and neural functions for potential neuroprosthetic applications.

  15. Energy Management Strategy for a Hybrid Electric Vehicle Based on Deep Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Yue Hu

    2018-01-01

    An energy management strategy (EMS) is important for hybrid electric vehicles (HEVs) since it plays a decisive role in the performance of the vehicle. However, the variation of future driving conditions deeply influences the effectiveness of the EMS. Most existing EMS methods simply follow predefined rules that are not adaptive to different driving conditions online. Therefore, it is useful that the EMS can learn from the environment or driving cycle. In this paper, a deep reinforcement learning (DRL)-based EMS is designed such that it can learn to select actions directly from the states without any prediction or predefined rules. Furthermore, a DRL-based online learning architecture is presented. It is significant for applying the DRL algorithm in HEV energy management under different driving conditions. Simulation experiments have been conducted using MATLAB and Advanced Vehicle Simulator (ADVISOR) co-simulation. Experimental results validate the effectiveness of the DRL-based EMS compared with the rule-based EMS in terms of fuel economy. The online learning architecture is also shown to be effective. The proposed method ensures the optimality, as well as real-time applicability, in HEVs.

  16. Optimal and Autonomous Control Using Reinforcement Learning: A Survey.

    Science.gov (United States)

    Kiumarsi, Bahare; Vamvoudakis, Kyriakos G; Modares, Hamidreza; Lewis, Frank L

    2018-06-01

    This paper reviews the current state of the art on reinforcement learning (RL)-based feedback control solutions to optimal regulation and tracking of single and multiagent systems. Existing RL solutions to both optimal H2 and H∞ control problems, as well as graphical games, will be reviewed. RL methods learn the solution to optimal control and game problems online, using measured data along the system trajectories. We discuss Q-learning and the integral RL algorithm as core algorithms for discrete-time (DT) and continuous-time (CT) systems, respectively. Moreover, we discuss a new direction of off-policy RL for both CT and DT systems. Finally, we review several applications.
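
    For readers unfamiliar with Q-learning, the textbook tabular update for a discrete-time problem is sketched below. It is a generic illustration, not the control-theoretic formulation surveyed in the paper; the dictionary-based table, the function name `q_update`, and the default step size and discount factor are assumptions made here.

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step in place and return the TD error.

    q is a nested dict: q[state][action] -> estimated action value.
    alpha is the step size and gamma the discount factor (assumed values).
    """
    # Off-policy target: value of the greedy action in the next state.
    best_next = max(q[next_state].values())
    td_error = reward + gamma * best_next - q[state][action]
    q[state][action] += alpha * td_error
    return td_error


# Usage: two states, two actions, one observed transition.
q_table = {0: {"a": 0.0, "b": 0.0}, 1: {"a": 1.0, "b": 0.0}}
q_update(q_table, state=0, action="a", reward=1.0, next_state=1)
```

    The update needs only the measured transition (state, action, reward, next state), which is the sense in which such methods learn online from data along the system trajectory.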

  17. Reinforcement learning or active inference?

    Science.gov (United States)

    Friston, Karl J; Daunizeau, Jean; Kiebel, Stefan J

    2009-07-29

    This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and sampling of the environment to minimize their free-energy. Such agents learn causal structure in the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming; namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain.

  18. Reinforcement learning or active inference?

    Directory of Open Access Journals (Sweden)

    Karl J Friston

    2009-07-01

    This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and sampling of the environment to minimize their free-energy. Such agents learn causal structure in the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming; namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain.

  19. Off-Policy Reinforcement Learning for Synchronization in Multiagent Graphical Games.

    Science.gov (United States)

    Li, Jinna; Modares, Hamidreza; Chai, Tianyou; Lewis, Frank L; Xie, Lihua

    2017-10-01

    This paper develops an off-policy reinforcement learning (RL) algorithm to solve optimal synchronization of multiagent systems. This is accomplished by using the framework of graphical games. In contrast to traditional control protocols, which require complete knowledge of agent dynamics, the proposed off-policy RL algorithm is a model-free approach, in that it solves the optimal synchronization problem without knowing any knowledge of the agent dynamics. A prescribed control policy, called behavior policy, is applied to each agent to generate and collect data for learning. An off-policy Bellman equation is derived for each agent to learn the value function for the policy under evaluation, called target policy, and find an improved policy, simultaneously. Actor and critic neural networks along with least-square approach are employed to approximate target control policies and value functions using the data generated by applying prescribed behavior policies. Finally, an off-policy RL algorithm is presented that is implemented in real time and gives the approximate optimal control policy for each agent using only measured data. It is shown that the optimal distributed policies found by the proposed algorithm satisfy the global Nash equilibrium and synchronize all agents to the leader. Simulation results illustrate the effectiveness of the proposed method.

  20. Decentralized Reinforcement Learning of robot behaviors

    NARCIS (Netherlands)

    Leottau, David L.; Ruiz-del-Solar, Javier; Babuska, R.

    2018-01-01

    A multi-agent methodology is proposed for Decentralized Reinforcement Learning (DRL) of individual behaviors in problems where multi-dimensional action spaces are involved. When using this methodology, sub-tasks are learned in parallel by individual agents working toward a common goal. In

  1. Continuous residual reinforcement learning for traffic signal control optimization

    NARCIS (Netherlands)

    Aslani, Mohammad; Seipel, Stefan; Wiering, Marco

    2018-01-01

    Traffic signal control can be naturally regarded as a reinforcement learning problem. Unfortunately, it is one of the most difficult classes of reinforcement learning problems owing to its large state space. A straightforward approach to address this challenge is to control traffic signals based on

  2. Online constrained model-based reinforcement learning

    CSIR Research Space (South Africa)

    Van Niekerk, B

    2017-08-01

    Constrained Model-based Reinforcement Learning Benjamin van Niekerk School of Computer Science University of the Witwatersrand South Africa Andreas Damianou∗ Amazon.com Cambridge, UK Benjamin Rosman Council for Scientific and Industrial Research, and School... MULTIPLE SHOOTING Using direct multiple shooting (Bock and Plitt, 1984), problem (1) can be transformed into a structured nonlinear program (NLP). First, the time horizon [t_0, t_0 + T] is partitioned into N equal subintervals [t_k, t_{k+1}] for k = 0...

  3. Reinforcement learning account of network reciprocity.

    Science.gov (United States)

    Ezaki, Takahiro; Masuda, Naoki

    2017-01-01

    Evolutionary game theory predicts that cooperation in social dilemma games is promoted when agents are connected as a network. However, when networks are fixed over time, humans do not necessarily show enhanced mutual cooperation. Here we show that reinforcement learning (specifically, the so-called Bush-Mosteller model) approximately explains the experimentally observed network reciprocity and the lack thereof in a parameter region spanned by the benefit-to-cost ratio and the node's degree. Thus, we significantly extend previously obtained numerical results.

  4. A Reinforcement-Based Learning Paradigm Increases Anatomical Learning and Retention-A Neuroeducation Study.

    Science.gov (United States)

    Anderson, Sarah J; Hecker, Kent G; Krigolson, Olave E; Jamniczky, Heather A

    2018-01-01

    In anatomy education, a key hurdle to engaging in higher-level discussion in the classroom is recognizing and understanding the extensive terminology used to identify and describe anatomical structures. Given the time-limited classroom environment, seeking methods to impart this foundational knowledge to students in an efficient manner is essential. Just-in-Time Teaching (JiTT) methods incorporate pre-class exercises (typically online) meant to establish foundational knowledge in novice learners so subsequent instructor-led sessions can focus on deeper, more complex concepts. Determining how best to design and assess pre-class exercises requires a detailed examination of learning and retention in an applied educational context. Here we used electroencephalography (EEG) as a quantitative dependent variable to track learning and examine the efficacy of JiTT activities to teach anatomy. Specifically, we examined changes in the amplitude of the N250 and reward positivity event-related brain potential (ERP) components alongside behavioral performance as novice students participated in a series of computerized reinforcement-based learning modules to teach neuroanatomical structures. We found that as students learned to identify anatomical structures, the amplitude of the N250 increased and reward positivity amplitude decreased in response to positive feedback. On both a retention and a transfer exercise, when learners successfully remembered and translated their knowledge to novel images, the amplitude of the reward positivity remained decreased compared to early learning. Our findings suggest ERPs can be used as a tool to track learning, retention, and transfer of knowledge and that employing the reinforcement learning paradigm is an effective educational approach for developing anatomical expertise.

  5. Learning motion concepts using real-time microcomputer-based laboratory tools

    Science.gov (United States)

    Thornton, Ronald K.; Sokoloff, David R.

    1990-09-01

    Microcomputer-based laboratory (MBL) tools have been developed which interface to Apple II and Macintosh computers. Students use these tools to collect physical data that are graphed in real time and then can be manipulated and analyzed. The MBL tools have made possible discovery-based laboratory curricula that embody results from educational research. These curricula allow students to take an active role in their learning and encourage them to construct physical knowledge from observation of the physical world. The curricula encourage collaborative learning by taking advantage of the fact that MBL tools present data in an immediately understandable graphical form. This article describes one of the tools—the motion detector (hardware and software)—and the kinematics curriculum. The effectiveness of this curriculum compared to traditional college and university methods for helping students learn basic kinematics concepts has been evaluated by pre- and post-testing and by observation. There is strong evidence for significantly improved learning and retention by students who used the MBL materials, compared to those taught in lecture.

  6. Optimal and Scalable Caching for 5G Using Reinforcement Learning of Space-Time Popularities

    Science.gov (United States)

    Sadeghi, Alireza; Sheikholeslami, Fatemeh; Giannakis, Georgios B.

    2018-02-01

    Small basestations (SBs) equipped with caching units have potential to handle the unprecedented demand growth in heterogeneous networks. Through low-rate, backhaul connections with the backbone, SBs can prefetch popular files during off-peak traffic hours, and service them to the edge at peak periods. To intelligently prefetch, each SB must learn what and when to cache, while taking into account SB memory limitations, the massive number of available contents, the unknown popularity profiles, as well as the space-time popularity dynamics of user file requests. In this work, local and global Markov processes model user requests, and a reinforcement learning (RL) framework is put forth for finding the optimal caching policy when the transition probabilities involved are unknown. Joint consideration of global and local popularity demands along with cache-refreshing costs allow for a simple, yet practical asynchronous caching approach. The novel RL-based caching relies on a Q-learning algorithm to implement the optimal policy in an online fashion, thus enabling the cache control unit at the SB to learn, track, and possibly adapt to the underlying dynamics. To endow the algorithm with scalability, a linear function approximation of the proposed Q-learning scheme is introduced, offering faster convergence as well as reduced complexity and memory requirements. Numerical tests corroborate the merits of the proposed approach in various realistic settings.
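
    The idea of replacing the Q-table with a linear function approximation can be sketched as follows. This is a generic reward-maximizing illustration, not the caching algorithm of the paper: the feature vectors, the function names `linear_q` and `linear_q_update`, and the default step size and discount factor are assumptions made here.

```python
def linear_q(weights, features):
    """Approximate Q(s, a) as a dot product of weights and features."""
    return sum(w * f for w, f in zip(weights, features))


def linear_q_update(weights, phi_sa, reward, next_phis,
                    alpha=0.1, gamma=0.9):
    """Return new weights after one Q-learning step.

    phi_sa     feature vector of the visited (state, action) pair
    next_phis  feature vectors of every action available in the next state
    """
    q_sa = linear_q(weights, phi_sa)
    best_next = max(linear_q(weights, phi) for phi in next_phis)
    td_error = reward + gamma * best_next - q_sa
    # Gradient of the linear Q with respect to the weights is phi_sa.
    return [w + alpha * td_error * f for w, f in zip(weights, phi_sa)]
```

    Memory then scales with the number of features rather than with the number of state-action pairs, which is one way to read the abstract's claim of reduced complexity and memory requirements.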

  7. Can model-free reinforcement learning explain deontological moral judgments?

    Science.gov (United States)

    Ayars, Alisabeth

    2016-05-01

    Dual-systems frameworks propose that moral judgments are derived from both an immediate emotional response and controlled/rational cognition. Recently Cushman (2013) proposed a new dual-system theory based on model-free and model-based reinforcement learning. Model-free learning attaches values to actions based on their history of reward and punishment, and explains some deontological, non-utilitarian judgments. Model-based learning involves the construction of a causal model of the world and allows for far-sighted planning; this form of learning fits well with utilitarian considerations that seek to maximize certain kinds of outcomes. I present three concerns regarding the use of model-free reinforcement learning to explain deontological moral judgment. First, many actions that humans find aversive from model-free learning are not judged to be morally wrong. Moral judgment must require something in addition to model-free learning. Second, there is a dearth of evidence for central predictions of the reinforcement account, e.g., that people with different reinforcement histories will, all else equal, make different moral judgments. Finally, to account for the effect of intention within the framework requires certain assumptions which lack support. These challenges are reasonable foci for future empirical/theoretical work on the model-free/model-based framework. Copyright © 2016 Elsevier B.V. All rights reserved.

  8. Real time reinforcement learning control of dynamic systems applied to an inverted pendulum

    NARCIS (Netherlands)

    van Luenen, W.T.C.; Stender, J.; Addis, T.

    1990-01-01

    Describes work started in order to investigate the use of neural networks for application in adaptive or learning control systems. Neural networks have learning capabilities and they can be used to realize non-linear mappings. These are attractive features which could make them useful building

  9. Human demonstrations for fast and safe exploration in reinforcement learning

    NARCIS (Netherlands)

    Schonebaum, G.K.; Junell, J.L.; van Kampen, E.

    2017-01-01

    Reinforcement learning is a promising framework for controlling complex vehicles with a high level of autonomy, since it does not need a dynamic model of the vehicle, and it is able to adapt to changing conditions. When learning from scratch, the performance of a reinforcement learning controller

  10. Reinforcement learning on slow features of high-dimensional input streams.

    Directory of Open Access Journals (Sweden)

    Robert Legenstein

    Humans and animals are able to learn complex behaviors based on a massive stream of sensory information from different modalities. Early animal studies have identified learning mechanisms that are based on reward and punishment such that animals tend to avoid actions that lead to punishment whereas rewarded actions are reinforced. However, most algorithms for reward-based learning are only applicable if the dimensionality of the state-space is sufficiently small or its structure is sufficiently simple. Therefore, the question arises how the problem of learning on high-dimensional data is solved in the brain. In this article, we propose a biologically plausible generic two-stage learning system that can directly be applied to raw high-dimensional input streams. The system is composed of a hierarchical slow feature analysis (SFA) network for preprocessing and a simple neural network on top that is trained based on rewards. We demonstrate by computer simulations that this generic architecture is able to learn quite demanding reinforcement learning tasks on high-dimensional visual input streams in a time that is comparable to the time needed when an explicit highly informative low-dimensional state-space representation is given instead of the high-dimensional visual input. The learning speed of the proposed architecture in a task similar to the Morris water maze task is comparable to that found in experimental studies with rats. This study thus supports the hypothesis that slowness learning is one important unsupervised learning principle utilized in the brain to form efficient state representations for behavioral learning.

  11. Real-time lexical comprehension in young children learning American Sign Language.

    Science.gov (United States)

    MacDonald, Kyle; LaMarr, Todd; Corina, David; Marchman, Virginia A; Fernald, Anne

    2018-04-16

    When children interpret spoken language in real time, linguistic information drives rapid shifts in visual attention to objects in the visual world. This language-vision interaction can provide insights into children's developing efficiency in language comprehension. But how does language influence visual attention when the linguistic signal and the visual world are both processed via the visual channel? Here, we measured eye movements during real-time comprehension of a visual-manual language, American Sign Language (ASL), by 29 native ASL-learning children (16-53 mos, 16 deaf, 13 hearing) and 16 fluent deaf adult signers. All signers showed evidence of rapid, incremental language comprehension, tending to initiate an eye movement before sign offset. Deaf and hearing ASL-learners showed similar gaze patterns, suggesting that the in-the-moment dynamics of eye movements during ASL processing are shaped by the constraints of processing a visual language in real time and not by differential access to auditory information in day-to-day life. Finally, variation in children's ASL processing was positively correlated with age and vocabulary size. Thus, despite competition for attention within a single modality, the timing and accuracy of visual fixations during ASL comprehension reflect information processing skills that are important for language acquisition regardless of language modality. © 2018 John Wiley & Sons Ltd.

  12. Reinforcement Learning in Continuous Action Spaces

    NARCIS (Netherlands)

    Hasselt, H. van; Wiering, M.A.

    2007-01-01

    Quite some research has been done on Reinforcement Learning in continuous environments, but the research on problems where the actions can also be chosen from a continuous space is much more limited. We present a new class of algorithms named Continuous Actor Critic Learning Automaton (CACLA)

  13. Memory controllers for real-time embedded systems predictable and composable real-time systems

    CERN Document Server

    Akesson, Benny

    2012-01-01

    Verification of real-time requirements in systems-on-chip becomes more complex as more applications are integrated. Predictable and composable systems can manage the increasing complexity using formal verification and simulation. This book explains the concepts of predictability and composability and shows how to apply them to the design and analysis of a memory controller, which is a key component in any real-time system. This book is generally intended for readers interested in Systems-on-Chips with real-time applications. It is especially well-suited for readers looking to use SDRAM memories in systems with hard or firm real-time requirements. There is a strong focus on real-time concepts, such as predictability and composability, as well as a brief discussion about memory controller architectures for high-performance computing. Readers will learn step-by-step how to go from an unpredictable SDRAM memory, offering highly variable bandwidth and latency, to a predictable and composable shared memory...

  14. Reinforcement learning account of network reciprocity.

    Directory of Open Access Journals (Sweden)

    Takahiro Ezaki

    Evolutionary game theory predicts that cooperation in social dilemma games is promoted when agents are connected as a network. However, when networks are fixed over time, humans do not necessarily show enhanced mutual cooperation. Here we show that reinforcement learning (specifically, the so-called Bush-Mosteller model) approximately explains the experimentally observed network reciprocity and the lack thereof in a parameter region spanned by the benefit-to-cost ratio and the node's degree. Thus, we significantly extend previously obtained numerical results.
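    The Bush-Mosteller model named above is an aspiration-based learning rule. The sketch below shows one commonly used formulation and is not necessarily the exact variant in the article: the `tanh` stimulus, the `aspiration` level, and the sensitivity `beta` are assumptions chosen for illustration.

```python
import math


def bm_update(p, cooperated, payoff, aspiration=1.0, beta=0.5):
    """Return the updated probability of cooperating in the next round.

    p          -- current probability of cooperating, in [0, 1]
    cooperated -- True if the agent cooperated this round
    payoff     -- payoff received this round
    """
    # Stimulus in (-1, 1): positive when the payoff exceeds the aspiration.
    s = math.tanh(beta * (payoff - aspiration))
    if cooperated:
        # Satisfied cooperators cooperate more; dissatisfied ones cooperate less.
        return p + (1.0 - p) * s if s >= 0 else p + p * s
    # Satisfied defectors cooperate less; dissatisfied ones cooperate more.
    return p - p * s if s >= 0 else p - (1.0 - p) * s
```

    Each branch scales the step by the remaining distance to 0 or 1, so the probability never leaves the unit interval.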

  15. Learning Constructive Primitives for Real-time Dynamic Difficulty Adjustment in Super Mario Bros

    OpenAIRE

    Shi, Peizhi; Chen, Ke

    2017-01-01

    Among the main challenges in procedural content generation (PCG), content quality assurance and real-time dynamic difficulty adjustment (DDA) of game content are two major issues in adaptive content generation. Motivated by the recent learning-based PCG framework, we propose a novel approach to seamlessly address these two issues in Super Mario Bros (SMB). To address the quality assurance issue, we exploit the synergy between rule-based and learning-based methods to produce quality gam...

  16. Human reinforcement learning subdivides structured action spaces by learning effector-specific values.

    Science.gov (United States)

    Gershman, Samuel J; Pesaran, Bijan; Daw, Nathaniel D

    2009-10-28

    Humans and animals are endowed with a large number of effectors. Although this enables great behavioral flexibility, it presents an equally formidable reinforcement learning problem of discovering which actions are most valuable because of the high dimensionality of the action space. An unresolved question is how neural systems for reinforcement learning, such as prediction error signals for action valuation associated with dopamine and the striatum, can cope with this "curse of dimensionality." We propose a reinforcement learning framework that allows for learned action valuations to be decomposed into effector-specific components when appropriate to a task, and test it by studying to what extent human behavior and blood oxygen level-dependent (BOLD) activity can exploit such a decomposition in a multieffector choice task. Subjects made simultaneous decisions with their left and right hands and received separate reward feedback for each hand movement. We found that choice behavior was better described by a learning model that decomposed the values of bimanual movements into separate values for each effector, rather than a traditional model that treated the bimanual actions as unitary with a single value. A decomposition of value into effector-specific components was also observed in value-related BOLD signaling, in the form of lateralized biases in striatal correlates of prediction error and anticipatory value correlates in the intraparietal sulcus. These results suggest that the human brain can use decomposed value representations to "divide and conquer" reinforcement learning over high-dimensional action spaces.
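    The contrast between the unitary and the decomposed value models described above can be sketched as follows. This is an illustrative delta-rule version under assumed names and a hypothetical learning rate `alpha`; it is not the fitted model from the study.

```python
def update_unitary(q, left, right, r_left, r_right, alpha=0.1):
    """Unitary model: one value per bimanual action, trained on the summed reward."""
    key = (left, right)
    old = q.get(key, 0.0)
    q[key] = old + alpha * ((r_left + r_right) - old)


def update_decomposed(q_left, q_right, left, right, r_left, r_right, alpha=0.1):
    """Decomposed model: each hand keeps its own values and its own prediction error."""
    old_l = q_left.get(left, 0.0)
    old_r = q_right.get(right, 0.0)
    q_left[left] = old_l + alpha * (r_left - old_l)
    q_right[right] = old_r + alpha * (r_right - old_r)


def decomposed_value(q_left, q_right, left, right):
    """Value of a bimanual action as the sum of its effector-specific values."""
    return q_left.get(left, 0.0) + q_right.get(right, 0.0)
```

    With n options per hand, the unitary table needs n × n entries while the decomposed model needs 2n, and what is learned about one hand's choice carries over to bimanual combinations that have never been tried.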

  17. Evolutionary computation for reinforcement learning

    NARCIS (Netherlands)

    Whiteson, S.; Wiering, M.; van Otterlo, M.

    2012-01-01

    Algorithms for evolutionary computation, which simulate the process of natural selection to solve optimization problems, are an effective tool for discovering high-performing reinforcement-learning policies. Because they can automatically find good representations, handle continuous action spaces,

  18. Reinforcement learning for dpm of embedded visual sensor nodes

    International Nuclear Information System (INIS)

    Khani, U.; Sadhayo, I. H.

    2014-01-01

    This paper proposes an RL (Reinforcement Learning) based DPM (Dynamic Power Management) technique to learn time out policies during the operation of a visual sensor node that has multiple power/performance states. As opposed to the widely used static time out policies, our proposed DPM policy, which is also referred to as OLTP (Online Learning of Time out Policies), learns to dynamically change the time out decisions in the different node states, including the non-operational states. The selection of time out values in different power/performance states of a visual sensing platform is based on the workload estimates derived from an ML-ANN (Multi-Layer Artificial Neural Network) and an objective function given by weighted performance and power parameters. The DPM approach is also able to dynamically adjust the power-performance weights online to satisfy a given constraint of either power consumption or performance. Results show that the proposed learning algorithm explores the power-performance tradeoff with non-stationary workload and outperforms other DPM policies. It also performs the online adjustment of the tradeoff parameters in order to meet a user-specified constraint. (author)

  19. A Reinforcement-Based Learning Paradigm Increases Anatomical Learning and Retention—A Neuroeducation Study

    Science.gov (United States)

    Anderson, Sarah J.; Hecker, Kent G.; Krigolson, Olave E.; Jamniczky, Heather A.

    2018-01-01

    In anatomy education, a key hurdle to engaging in higher-level discussion in the classroom is recognizing and understanding the extensive terminology used to identify and describe anatomical structures. Given the time-limited classroom environment, seeking methods to impart this foundational knowledge to students in an efficient manner is essential. Just-in-Time Teaching (JiTT) methods incorporate pre-class exercises (typically online) meant to establish foundational knowledge in novice learners so subsequent instructor-led sessions can focus on deeper, more complex concepts. Determining how best to design and assess pre-class exercises requires a detailed examination of learning and retention in an applied educational context. Here we used electroencephalography (EEG) as a quantitative dependent variable to track learning and examine the efficacy of JiTT activities to teach anatomy. Specifically, we examined changes in the amplitude of the N250 and reward positivity event-related brain potential (ERP) components alongside behavioral performance as novice students participated in a series of computerized reinforcement-based learning modules to teach neuroanatomical structures. We found that as students learned to identify anatomical structures, the amplitude of the N250 increased and reward positivity amplitude decreased in response to positive feedback. On both a retention and a transfer exercise, when learners successfully remembered and translated their knowledge to novel images, the amplitude of the reward positivity remained decreased compared to early learning. Our findings suggest ERPs can be used as a tool to track learning, retention, and transfer of knowledge and that employing the reinforcement learning paradigm is an effective educational approach for developing anatomical expertise. PMID:29467638

  20. A Reinforcement-Based Learning Paradigm Increases Anatomical Learning and Retention—A Neuroeducation Study

    Directory of Open Access Journals (Sweden)

    Sarah J. Anderson

    2018-02-01

    In anatomy education, a key hurdle to engaging in higher-level discussion in the classroom is recognizing and understanding the extensive terminology used to identify and describe anatomical structures. Given the time-limited classroom environment, seeking methods to impart this foundational knowledge to students in an efficient manner is essential. Just-in-Time Teaching (JiTT) methods incorporate pre-class exercises (typically online) meant to establish foundational knowledge in novice learners so subsequent instructor-led sessions can focus on deeper, more complex concepts. Determining how best to design and assess pre-class exercises requires a detailed examination of learning and retention in an applied educational context. Here we used electroencephalography (EEG) as a quantitative dependent variable to track learning and examine the efficacy of JiTT activities to teach anatomy. Specifically, we examined changes in the amplitude of the N250 and reward positivity event-related brain potential (ERP) components alongside behavioral performance as novice students participated in a series of computerized reinforcement-based learning modules to teach neuroanatomical structures. We found that as students learned to identify anatomical structures, the amplitude of the N250 increased and reward positivity amplitude decreased in response to positive feedback. On both a retention and a transfer exercise, when learners successfully remembered and translated their knowledge to novel images, the amplitude of the reward positivity remained decreased compared to early learning. Our findings suggest ERPs can be used as a tool to track learning, retention, and transfer of knowledge and that employing the reinforcement learning paradigm is an effective educational approach for developing anatomical expertise.

  1. Integral reinforcement learning for continuous-time input-affine nonlinear systems with simultaneous invariant explorations.

    Science.gov (United States)

    Lee, Jae Young; Park, Jin Bae; Choi, Yoon Ho

    2015-05-01

    This paper focuses on a class of reinforcement learning (RL) algorithms, named integral RL (I-RL), that solve continuous-time (CT) nonlinear optimal control problems with input-affine system dynamics. First, we extend the concepts of exploration, integral temporal difference, and invariant admissibility to the target CT nonlinear system that is governed by a control policy plus a probing signal called an exploration. Then, we show input-to-state stability (ISS) and invariant admissibility of the closed-loop systems with the policies generated by the integral policy iteration (I-PI) or invariantly admissible PI (IA-PI) method. Based on these, three online I-RL algorithms, named explorized I-PI and integral Q-learning I and II, are proposed, all of which generate the same convergent sequences as I-PI and IA-PI under the required excitation condition on the exploration. All the proposed methods are partially or completely model free, and can simultaneously explore the state space in a stable manner during the online learning processes. ISS, invariant admissibility, and convergence properties of the proposed methods are also investigated, and in relation to these, we show the design principles of the exploration for safe learning. Neural-network-based implementation methods for the proposed schemes are also presented in this paper. Finally, several numerical simulations are carried out to verify the effectiveness of the proposed methods.
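    For orientation, the integral temporal difference mentioned above is usually built on an interval form of the Bellman equation. The block below gives the standard exploration-free form found in the integral RL literature, not the extended version derived in the paper. All symbols are assumptions introduced here: state $x$, policy $\pi$, running cost $r$, value function $V^{\pi}$, and interval length $T > 0$.

```latex
% Interval (integral) Bellman equation for a fixed policy \pi
V^{\pi}\bigl(x(t)\bigr)
  = \int_{t}^{t+T} r\bigl(x(\tau), \pi(x(\tau))\bigr)\, d\tau
  + V^{\pi}\bigl(x(t+T)\bigr)

% Integral temporal-difference error for an approximation \hat{V}
\delta(t)
  = \int_{t}^{t+T} r\bigl(x(\tau), \pi(x(\tau))\bigr)\, d\tau
  + \hat{V}\bigl(x(t+T)\bigr) - \hat{V}\bigl(x(t)\bigr)
```

    Driving $\delta(t)$ to zero along measured trajectories evaluates the policy without using the drift dynamics, which is consistent with the "partially model free" property the abstract describes.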

  2. Fast Conflict Resolution Based on Reinforcement Learning in Multi-agent System

    Institute of Scientific and Technical Information of China (English)

    PIAO Songhao; HONG Bingrong; CHU Haitao

    2004-01-01

    In a multi-agent system where each agent has a different goal (even if the team of agents shares the same goal), agents must be able to resolve conflicts arising in the process of achieving their goals. Many researchers have presented methods for conflict resolution, e.g., reinforcement learning (RL), but conventional RL requires a large computation cost because every agent must learn, while the overlap of actions selected by each agent results in local conflict. Therefore, in this paper, we propose a novel method to solve these problems. In order to deal with conflict within the multi-agent system, the concept of a potential field function based Action selection priority level (ASPL) is brought forward. In this method, all kinds of environment factors that may influence the priority are effectively computed with the potential field function, so the priority to access the local resource can be decided rapidly. By avoiding the complex coordination mechanism used in general multi-agent systems, the conflict in the multi-agent system is settled more efficiently. Our system consists of an RL with ASPL module and a generalized rules module. Using ASPL, the RL module chooses a proper cooperative behavior, and the generalized rules module can accelerate the learning process. By applying the proposed method to Robot Soccer, the learning process can be accelerated. The results of simulation and real experiments indicate the effectiveness of the method.

  3. Investigations of timing during the schedule and reinforcement intervals with wheel-running reinforcement.

    Science.gov (United States)

    Belke, Terry W; Christie-Fougere, Melissa M

    2006-11-01

    Across two experiments, a peak procedure was used to assess the timing of the onset and offset of an opportunity to run as a reinforcer. The first experiment investigated the effect of reinforcer duration on temporal discrimination of the onset of the reinforcement interval. Three male Wistar rats were exposed to fixed-interval (FI) 30-s schedules of wheel-running reinforcement and the duration of the opportunity to run was varied across values of 15, 30, and 60 s. Each session consisted of 50 reinforcers and 10 probe trials. Results showed that as reinforcer duration increased, the percentage of postreinforcement pauses longer than the 30-s schedule interval increased. On probe trials, peak response rates occurred near the time of reinforcer delivery and peak times varied with reinforcer duration. In a second experiment, seven female Long-Evans rats were exposed to FI 30-s schedules leading to 30-s opportunities to run. Timing of the onset and offset of the reinforcement period was assessed by probe trials during the schedule interval and during the reinforcement interval in separate conditions. The results provided evidence of timing of the onset, but not the offset, of the wheel-running reinforcement period. Further research is required to assess if timing occurs during a wheel-running reinforcement period.

  4. Human reinforcement learning subdivides structured action spaces by learning effector-specific values

    OpenAIRE

    Gershman, Samuel J.; Pesaran, Bijan; Daw, Nathaniel D.

    2009-01-01

    Humans and animals are endowed with a large number of effectors. Although this enables great behavioral flexibility, it presents an equally formidable reinforcement learning problem of discovering which actions are most valuable, due to the high dimensionality of the action space. An unresolved question is how neural systems for reinforcement learning – such as prediction error signals for action valuation associated with dopamine and the striatum – can cope with this “curse of dimensionality...

  5. Off-policy integral reinforcement learning optimal tracking control for continuous-time chaotic systems

    International Nuclear Information System (INIS)

    Wei Qing-Lai; Song Rui-Zhuo; Xiao Wen-Dong; Sun Qiu-Ye

    2015-01-01

    This paper estimates an off-policy integral reinforcement learning (IRL) algorithm to obtain the optimal tracking control of unknown chaotic systems. Off-policy IRL can learn the solution of the HJB equation from the system data generated by an arbitrary control. Moreover, off-policy IRL can be regarded as a direct learning method, which avoids the identification of system dynamics. In this paper, the performance index function is first given based on the system tracking error and control error. For solving the Hamilton–Jacobi–Bellman (HJB) equation, an off-policy IRL algorithm is proposed. It is proven that the iterative control makes the tracking error system asymptotically stable, and the iterative performance index function is convergent. Simulation study demonstrates the effectiveness of the developed tracking control method. (paper)

  6. Reinforcement Learning Based Novel Adaptive Learning Framework for Smart Grid Prediction

    Directory of Open Access Journals (Sweden)

    Tian Li

    2017-01-01

    Smart grid is a potential infrastructure to supply electricity demand for end users in a safe and reliable manner. With the rapid increase of the share of renewable energy and controllable loads in smart grid, the operation uncertainty of smart grid has increased briskly during recent years. The forecast is responsible for the safety and economic operation of the smart grid. However, most existing forecast methods cannot account for the smart grid because they are unable to adapt to the varying operational conditions. In this paper, reinforcement learning is first exploited to develop an online learning framework for the smart grid. With its capability of multitime scale resolution, a wavelet neural network is adopted in the online learning framework to yield a reinforcement learning and wavelet neural network (RLWNN) based adaptive learning scheme. The simulations on two typical prediction problems in smart grid, wind power prediction and load forecast, validate the effectiveness and the scalability of the proposed RLWNN based learning framework and algorithm.

  7. Knowledge-Based Reinforcement Learning for Data Mining

    Science.gov (United States)

    Kudenko, Daniel; Grzes, Marek

    Data Mining is the process of extracting patterns from data. Two general avenues of research in the intersecting areas of agents and data mining can be distinguished. The first approach is concerned with mining an agent’s observation data in order to extract patterns, categorize environment states, and/or make predictions of future states. In this setting, data is normally available as a batch, and the agent’s actions and goals are often independent of the data mining task. The data collection is mainly considered as a side effect of the agent’s activities. Machine learning techniques applied in such situations fall into the class of supervised learning. In contrast, the second scenario occurs where an agent is actively performing the data mining, and is responsible for the data collection itself. For example, a mobile network agent is acquiring and processing data (where the acquisition may incur a certain cost), or a mobile sensor agent is moving in a (perhaps hostile) environment, collecting and processing sensor readings. In these settings, the tasks of the agent and the data mining are highly intertwined and interdependent (or even identical). Supervised learning is not a suitable technique for these cases. Reinforcement Learning (RL) enables an agent to learn from experience (in form of reward and punishment for explorative actions) and adapt to new situations, without a teacher. RL is an ideal learning technique for these data mining scenarios, because it fits the agent paradigm of continuous sensing and acting, and the RL agent is able to learn to make decisions on the sampling of the environment which provides the data. Nevertheless, RL still suffers from scalability problems, which have prevented its successful use in many complex real-world domains. The more complex the tasks, the longer it takes a reinforcement learning algorithm to converge to a good solution. For many real-world tasks, human expert knowledge is available. For example, human

  8. Pragmatically Framed Cross-Situational Noun Learning Using Computational Reinforcement Models.

    Science.gov (United States)

    Najnin, Shamima; Banerjee, Bonny

    2018-01-01

    Cross-situational learning and social pragmatic theories are prominent mechanisms for learning word meanings (i.e., word-object pairs). In this paper, the role of reinforcement is investigated for early word-learning by an artificial agent. When exposed to a group of speakers, the agent comes to understand an initial set of vocabulary items belonging to the language used by the group. Both cross-situational learning and social pragmatic theory are taken into account. As social cues, joint attention and prosodic cues in caregiver's speech are considered. During agent-caregiver interaction, the agent selects a word from the caregiver's utterance and learns the relations between that word and the objects in its visual environment. The "novel words to novel objects" language-specific constraint is assumed for computing rewards. The models are learned by maximizing the expected reward using reinforcement learning algorithms [i.e., table-based algorithms: Q-learning, SARSA, SARSA-λ, and neural network-based algorithms: Q-learning for neural network (Q-NN), neural-fitted Q-network (NFQ), and deep Q-network (DQN)]. Neural network-based reinforcement learning models are chosen over table-based models for better generalization and quicker convergence. Simulations are carried out using mother-infant interaction CHILDES dataset for learning word-object pairings. Reinforcement is modeled in two cross-situational learning cases: (1) with joint attention (Attentional models), and (2) with joint attention and prosodic cues (Attentional-prosodic models). Attentional-prosodic models manifest superior performance to Attentional ones for the task of word-learning. The Attentional-prosodic DQN outperforms existing word-learning models for the same task.
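    Of the table-based algorithms listed above, Q-learning and SARSA differ only in how they bootstrap from the next state. A minimal sketch of the two textbook updates follows; the dictionary-based table and the default `alpha` and `gamma` are assumptions made here, and the word-learning reward from the paper is not modeled.

```python
def q_learning_update(q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the best action available in s_next."""
    best_next = max(q.get((s_next, b), 0.0) for b in actions)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (r + gamma * best_next - old)


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually taken in s_next."""
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (r + gamma * q.get((s_next, a_next), 0.0) - old)
```

    The two rules give the same result whenever the next action happens to be the greedy one; they diverge on exploratory steps, where SARSA accounts for the exploratory choice and Q-learning ignores it.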

  9. Instructional control of reinforcement learning: a behavioral and neurocomputational investigation.

    Science.gov (United States)

    Doll, Bradley B; Jacobs, W Jake; Sanfey, Alan G; Frank, Michael J

    2009-11-24

    Humans learn how to behave directly through environmental experience and indirectly through rules and instructions. Behavior analytic research has shown that instructions can control behavior, even when such behavior leads to sub-optimal outcomes (Hayes, S. (Ed.). 1989. Rule-governed behavior: cognition, contingencies, and instructional control. Plenum Press.). Here we examine the control of behavior through instructions in a reinforcement learning task known to depend on striatal dopaminergic function. Participants selected between probabilistically reinforced stimuli, and were (incorrectly) told that a specific stimulus had the highest (or lowest) reinforcement probability. Despite experience to the contrary, instructions drove choice behavior. We present neural network simulations that capture the interactions between instruction-driven and reinforcement-driven behavior via two potential neural circuits: one in which the striatum is inaccurately trained by instruction representations coming from prefrontal cortex/hippocampus (PFC/HC), and another in which the striatum learns the environmentally based reinforcement contingencies, but is "overridden" at decision output. Both models capture the core behavioral phenomena but, because they differ fundamentally on what is learned, make distinct predictions for subsequent behavioral and neuroimaging experiments. Finally, we attempt to distinguish between the proposed computational mechanisms governing instructed behavior by fitting a series of abstract "Q-learning" and Bayesian models to subject data. The best-fitting model supports one of the neural models, suggesting the existence of a "confirmation bias" in which the PFC/HC system trains the reinforcement system by amplifying outcomes that are consistent with instructions while diminishing inconsistent outcomes.
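    The "confirmation bias" account above can be illustrated with a Q-learning-style delta rule whose step size depends on whether an outcome agrees with the instruction. This is a hedged sketch, not the fitted model from the study: the function name, the `amplify` and `diminish` factors, and the 0/1 outcome coding are all hypothetical.

```python
def biased_q_update(q, outcome, instructed_good, alpha=0.2, amplify=1.5, diminish=0.5):
    """Return the updated value of one stimulus.

    q               -- current value estimate of the stimulus
    outcome         -- 1.0 for a reinforced trial, 0.0 otherwise (assumed coding)
    instructed_good -- True if the participant was told this stimulus is best
    """
    delta = outcome - q  # prediction error
    if instructed_good:
        # Outcomes that confirm the instruction (positive errors) are amplified;
        # outcomes that contradict it (negative errors) are diminished.
        scale = amplify if delta > 0 else diminish
    else:
        scale = 1.0  # uninstructed stimuli learn at the plain rate
    return q + alpha * scale * delta
```

    Under such a rule the value of the instructed stimulus settles above its true reinforcement probability, which is one way instructions could keep driving choice despite contrary experience.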

  10. Explaining How to Play Real-Time Strategy Games

    Science.gov (United States)

    Metoyer, Ronald; Stumpf, Simone; Neumann, Christoph; Dodge, Jonathan; Cao, Jill; Schnabel, Aaron

    Real-time strategy games share many aspects with real situations in domains such as battle planning, air traffic control, and emergency response team management which makes them appealing test-beds for Artificial Intelligence (AI) and machine learning. End user annotations could help to provide supplemental information for learning algorithms, especially when training data is sparse. This paper presents a formative study to uncover how experienced users explain game play in real-time strategy games. We report the results of our analysis of explanations and discuss their characteristics that could support the design of systems for use by experienced real-time strategy game users in specifying or annotating strategy-oriented behavior.

  11. Study on state grouping and opportunity evaluation for reinforcement learning methods; Kyoka gakushuho no tame no jotai grouping to opportunity hyoka ni kansuru kenkyu

    Energy Technology Data Exchange (ETDEWEB)

    Yu, W.; Yokoi, H.; Kakazu, Y. [Hokkaido University, Sapporo (Japan)]

    1997-08-20

    In this paper, we propose a State Grouping scheme for coping with the problem of scaling up reinforcement learning algorithms to real, large-scale applications. The grouping scheme is based on geographical and trial-and-error information, and consists of state generating, state combining, state splitting, and state forgetting procedures, with a corresponding action selection module and learning module. We also discuss the Labeling Based Evaluation scheme, which evaluates the opportunity of a state-action pair and therefore uses better experience to guide exploration of the state space effectively. Incorporating Labeling Based Evaluation and State Grouping into the reinforcement learning algorithm yields an approach that can generate an organized state space for reinforcement learning and solve problems as well. We argue that this ability is necessary for an autonomous agent: such an agent cannot act depending on any pre-defined map; instead, it should search the environment and find the optimal problem solution autonomously and simultaneously. By solving the large state-size 3-DOF and 4-link manipulator problem, we show the efficiency of the proposed approach, i.e., the agent can achieve the optimal or sub-optimal path with less memory and less time. 14 refs., 11 figs., 3 tabs.

  12. Online reinforcement learning control for aerospace systems

    NARCIS (Netherlands)

    Zhou, Y.

    2018-01-01

    Reinforcement Learning (RL) methods are relatively new in the field of aerospace guidance, navigation, and control. This dissertation aims to exploit RL methods to improve the autonomy and online learning of aerospace systems with respect to the a priori unknown system and environment, dynamical

  13. Multi-agent machine learning a reinforcement approach

    CERN Document Server

    Schwartz, H M

    2014-01-01

    The book begins with a chapter on traditional methods of supervised learning, covering recursive least squares learning, mean square error methods, and stochastic approximation. Chapter 2 covers single agent reinforcement learning. Topics include learning value functions, Markov games, and TD learning with eligibility traces. Chapter 3 discusses two player games including two player matrix games with both pure and mixed strategies. Numerous algorithms and examples are presented. Chapter 4 covers learning in multi-player games, stochastic games, and Markov games, focusing on learning multi-pla

  14. Deep Learning for real-time gravitational wave detection and parameter estimation: Results with Advanced LIGO data

    Science.gov (United States)

    George, Daniel; Huerta, E. A.

    2018-03-01

    The recent Nobel-prize-winning detections of gravitational waves from merging black holes and the subsequent detection of the collision of two neutron stars in coincidence with electromagnetic observations have inaugurated a new era of multimessenger astrophysics. To enhance the scope of this emergent field of science, we pioneered the use of deep learning with convolutional neural networks, that take time-series inputs, for rapid detection and characterization of gravitational wave signals. This approach, Deep Filtering, was initially demonstrated using simulated LIGO noise. In this article, we present the extension of Deep Filtering using real data from LIGO, for both detection and parameter estimation of gravitational waves from binary black hole mergers using continuous data streams from multiple LIGO detectors. We demonstrate for the first time that machine learning can detect and estimate the true parameters of real events observed by LIGO. Our results show that Deep Filtering achieves similar sensitivities and lower errors compared to matched-filtering while being far more computationally efficient and more resilient to glitches, allowing real-time processing of weak time-series signals in non-stationary non-Gaussian noise with minimal resources, and also enables the detection of new classes of gravitational wave sources that may go unnoticed with existing detection algorithms. This unified framework for data analysis is ideally suited to enable coincident detection campaigns of gravitational waves and their multimessenger counterparts in real-time.

  15. Structure identification in fuzzy inference using reinforcement learning

    Science.gov (United States)

    Berenji, Hamid R.; Khedkar, Pratap

    1993-01-01

    In our previous work on the GARIC architecture, we have shown that the system can start with surface structure of the knowledge base (i.e., the linguistic expression of the rules) and learn the deep structure (i.e., the fuzzy membership functions of the labels used in the rules) by using reinforcement learning. Assuming the surface structure, GARIC refines the fuzzy membership functions used in the consequents of the rules using a gradient descent procedure. This hybrid fuzzy logic and reinforcement learning approach can learn to balance a cart-pole system and to back up a truck to its docking location after a few trials. In this paper, we discuss how to do structure identification using reinforcement learning in fuzzy inference systems. This involves identifying both surface as well as deep structure of the knowledge base. The term set of fuzzy linguistic labels used in describing the values of each control variable must be derived. In this process, splitting a label refers to creating new labels which are more granular than the original label and merging two labels creates a more general label. Splitting and merging of labels directly transform the structure of the action selection network used in GARIC by increasing or decreasing the number of hidden layer nodes.

  16. Reinforcement Learning Based on the Bayesian Theorem for Electricity Markets Decision Support

    DEFF Research Database (Denmark)

    Sousa, Tiago; Pinto, Tiago; Praca, Isabel

    2014-01-01

    This paper presents the applicability of a reinforcement learning algorithm based on the application of the Bayesian theorem of probability. The proposed reinforcement learning algorithm is an advantageous and indispensable tool for ALBidS (Adaptive Learning strategic Bidding System), a multi...

  17. Using a board game to reinforce learning.

    Science.gov (United States)

    Yoon, Bona; Rodriguez, Leslie; Faselis, Charles J; Liappis, Angelike P

    2014-03-01

    Experiential gaming strategies offer a variation on traditional learning. A board game was used to present synthesized content of fundamental catheter care concepts and reinforce evidence-based practices relevant to nursing. Board games are innovative educational tools that can enhance active learning. Copyright 2014, SLACK Incorporated.

  18. Exploiting Best-Match Equations for Efficient Reinforcement Learning

    NARCIS (Netherlands)

    van Seijen, Harm; Whiteson, Shimon; van Hasselt, Hado; Wiering, Marco

    This article presents and evaluates best-match learning, a new approach to reinforcement learning that trades off the sample efficiency of model-based methods with the space efficiency of model-free methods. Best-match learning works by approximating the solution to a set of best-match equations,

  19. Tackling Error Propagation through Reinforcement Learning: A Case of Greedy Dependency Parsing

    OpenAIRE

    Le, Minh; Fokkens, Antske

    2017-01-01

    Error propagation is a common problem in NLP. Reinforcement learning explores erroneous states during training and can therefore be more robust when mistakes are made early in a process. In this paper, we apply reinforcement learning to greedy dependency parsing which is known to suffer from error propagation. Reinforcement learning improves accuracy of both labeled and unlabeled dependencies of the Stanford Neural Dependency Parser, a high performance greedy parser, while maintaining its eff...

  20. Sparse Bayesian learning machine for real-time management of reservoir releases

    Science.gov (United States)

    Khalil, Abedalrazq; McKee, Mac; Kemblowski, Mariush; Asefa, Tirusew

    2005-11-01

    Water scarcity and uncertainties in forecasting future water availabilities present serious problems for basin-scale water management. These problems create a need for intelligent prediction models that learn and adapt to their environment in order to provide water managers with decision-relevant information related to the operation of river systems. This manuscript presents examples of state-of-the-art techniques for forecasting that combine excellent generalization properties and sparse representation within a Bayesian paradigm. The techniques are demonstrated as decision tools to enhance real-time water management. A relevance vector machine, which is a probabilistic model, has been used in an online fashion to provide confident forecasts given knowledge of some state and exogenous conditions. In practical applications, online algorithms should recognize changes in the input space and account for drift in system behavior. Support vector machines lend themselves particularly well to the detection of drift and hence to the initiation of adaptation in response to a recognized shift in system structure. The resulting model will normally have a structure and parameterization that suits the information content of the available data. The utility and practicality of this proposed approach have been demonstrated with an application in a real case study involving real-time operation of a reservoir in a river basin in southern Utah.

  1. Longitudinal investigation on learned helplessness tested under negative and positive reinforcement involving stimulus control.

    Science.gov (United States)

    Oliveira, Emileane C; Hunziker, Maria Helena

    2014-07-01

    In this study, we investigated whether (a) animals demonstrating the learned helplessness effect during an escape contingency also show learning deficits under positive reinforcement contingencies involving stimulus control and (b) the exposure to positive reinforcement contingencies eliminates the learned helplessness effect under an escape contingency. Rats were initially exposed to controllable (C), uncontrollable (U) or no (N) shocks. After 24 h, they were exposed to 60 escapable shocks delivered in a shuttlebox. In the following phase, we selected from each group the four subjects that presented the most typical group pattern: no escape learning (learned helplessness effect) in Group U and escape learning in Groups C and N. All subjects were then exposed to two phases: (1) positive reinforcement for lever pressing under a multiple FR/Extinction schedule and (2) a re-test under negative reinforcement (escape). A fourth group (n=4) was exposed only to the positive reinforcement sessions. All subjects showed discrimination learning under the multiple schedule. In the escape re-test, the learned helplessness effect was maintained for three of the animals in Group U. These results suggest that the learned helplessness effect did not extend to discriminative behavior that is positively reinforced and that the learned helplessness effect did not revert for most subjects after exposure to positive reinforcement. We discuss some theoretical implications as related to learned helplessness as an effect restricted to aversive contingencies and to the absence of reversion after positive reinforcement. Copyright © 2014. Published by Elsevier B.V.

  2. Adaptive Trajectory Tracking Control using Reinforcement Learning for Quadrotor

    Directory of Open Access Journals (Sweden)

    Wenjie Lou

    2016-02-01

    Inaccurate system parameters and unpredicted external disturbances affect the performance of non-linear controllers. In this paper, a new adaptive control algorithm under the reinforcement framework is proposed to stabilize a quadrotor helicopter. Based on a command-filtered non-linear control algorithm, adaptive elements are added and learned by policy-search methods. To predict the inaccurate system parameters, a new kernel-based regression learning method is provided. In addition, Policy learning by Weighting Exploration with the Returns (PoWER) and Return Weighted Regression (RWR) are utilized to learn the appropriate parameters for adaptive elements in order to cancel the effect of external disturbance. Furthermore, numerical simulations under several conditions are performed, and the ability of adaptive trajectory-tracking control with reinforcement learning is demonstrated.

  3. Enriching behavioral ecology with reinforcement learning methods.

    Science.gov (United States)

    Frankenhuis, Willem E; Panchanathan, Karthik; Barto, Andrew G

    2018-02-13

    This article focuses on the division of labor between evolution and development in solving sequential, state-dependent decision problems. Currently, behavioral ecologists tend to use dynamic programming methods to study such problems. These methods are successful at predicting animal behavior in a variety of contexts. However, they depend on a distinct set of assumptions. Here, we argue that behavioral ecology will benefit from drawing more than it currently does on a complementary collection of tools, called reinforcement learning methods. These methods allow for the study of behavior in highly complex environments, which conventional dynamic programming methods do not feasibly address. In addition, reinforcement learning methods are well-suited to studying how biological mechanisms solve developmental and learning problems. For instance, we can use them to study simple rules that perform well in complex environments. Or to investigate under what conditions natural selection favors fixed, non-plastic traits (which do not vary across individuals), cue-driven-switch plasticity (innate instructions for adaptive behavioral development based on experience), or developmental selection (the incremental acquisition of adaptive behavior based on experience). If natural selection favors developmental selection, which includes learning from environmental feedback, we can also make predictions about the design of reward systems. Our paper is written in an accessible manner and for a broad audience, though we believe some novel insights can be drawn from our discussion. We hope our paper will help advance the emerging bridge connecting the fields of behavioral ecology and reinforcement learning. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.

  4. Optimizing Earth Data Search Ranking using Deep Learning and Real-time User Behaviour

    Science.gov (United States)

    Jiang, Y.; Yang, C. P.; Armstrong, E. M.; Huang, T.; Moroni, D. F.; McGibbney, L. J.; Greguska, F. R., III

    2017-12-01

    Finding Earth science data has been a challenging problem given both the quantity of data available and the heterogeneity of the data across a wide variety of domains. Current search engines in most geospatial data portals tend to induce end users to focus on one single data characteristic dimension (e.g., term frequency-inverse document frequency (TF-IDF) score, popularity, release date, etc.). This approach largely fails to take account of users' multidimensional preferences for geospatial data, and hence may likely result in a less than optimal user experience in discovering the most applicable dataset out of a vast range of available datasets. With users interacting with search engines, sufficient information is already hidden in the log files. Compared with explicit feedback data, information that can be derived/extracted from log files is virtually free and substantially more timely. In this dissertation, I propose an online deep learning framework that can quickly update the learning function based on real-time user clickstream data. The contributions of this framework include 1) a log processor that can ingest, process and create training data from web logs in a real-time manner; 2) a query understanding module to better interpret users' search intent using web log processing results and metadata; 3) a feature extractor that identifies ranking features representing users' multidimensional interests of geospatial data; and 4) a deep learning based ranking algorithm that can be trained incrementally using user behavior data. The search ranking results will be evaluated using precision at K and normalized discounted cumulative gain (NDCG).
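
    The two evaluation measures named at the end of the abstract above, precision at K and normalized discounted cumulative gain (NDCG), can be sketched as follows. This is a generic illustration, not the authors' evaluation code: the function names are hypothetical, relevance grades are assumed to be non-negative numbers listed in ranked order, the linear-gain form of DCG is used, and the ideal ranking is assumed to be a re-sorting of the same judged list.

```python
import math


def precision_at_k(relevances, k):
    """Fraction of the top-k results with a relevance grade above zero."""
    return sum(1 for r in relevances[:k] if r > 0) / k


def dcg_at_k(relevances, k):
    """Discounted cumulative gain: grade at rank i is divided by log2(i + 1)."""
    # enumerate() is zero-based, so rank = index + 1 and the discount is
    # log2(index + 2).
    return sum(r / math.log2(i + 2) for i, r in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the best possible ordering of the list."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    ranked_grades = [3, 0, 2, 1]  # hypothetical grades for one query
    print(precision_at_k(ranked_grades, 2))
    print(ndcg_at_k(ranked_grades, 4))
```

    Other DCG variants use an exponential gain, 2^grade - 1, in place of the raw grade; which variant the dissertation uses is not stated in the abstract.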

  5. Universal effect of dynamical reinforcement learning mechanism in spatial evolutionary games

    International Nuclear Information System (INIS)

    Zhang, Hai-Feng; Wu, Zhi-Xi; Wang, Bing-Hong

    2012-01-01

    One of the prototypical mechanisms in understanding the ubiquitous cooperation in social dilemma situations is the win–stay, lose–shift rule. In this work, a generalized win–stay, lose–shift learning model—a reinforcement learning model with dynamic aspiration level—is proposed to describe how humans adapt their social behaviors based on their social experiences. In the model, the players incorporate the information of the outcomes in previous rounds with time-dependent aspiration payoffs to regulate the probability of choosing cooperation. By investigating such a reinforcement learning rule in the spatial prisoner's dilemma game and public goods game, a most noteworthy viewpoint is that moderate greediness (i.e. moderate aspiration level) favors best the development and organization of collective cooperation. The generality of this observation is tested against different regulation strengths and different types of network of interaction as well. We also make comparisons with two recently proposed models to highlight the importance of the mechanism of adaptive aspiration level in supporting cooperation in structured populations
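
    A generalized win-stay, lose-shift rule with a time-dependent aspiration level can be sketched as below. The abstract does not give the update equations, so this sketch assumes a Bush-Mosteller-style form: the `tanh` stimulus, the sensitivity `beta`, and the aspiration adaptation rate `h` are illustrative assumptions rather than the paper's definitions.

```python
import math


def update_cooperation_prob(p, cooperated, payoff, aspiration, beta=1.0):
    """Adjust the probability of cooperating after one round.

    A payoff above the aspiration level ("win") reinforces the action just
    taken; a payoff below it ("lose") shifts probability to the other action.
    """
    s = math.tanh(beta * (payoff - aspiration))  # stimulus in (-1, 1)
    if cooperated:
        # Win: move p toward 1. Lose: shrink p toward 0.
        return p + (1 - p) * s if s >= 0 else p + p * s
    # The player defected. Win: shrink p. Lose: move p toward 1.
    return p - p * s if s >= 0 else p - (1 - p) * s


def update_aspiration(aspiration, payoff, h=0.2):
    """Dynamic aspiration: exponential moving average of received payoffs."""
    return (1 - h) * aspiration + h * payoff


if __name__ == "__main__":
    p, aspiration = 0.5, 1.0
    # Hypothetical sequence of (cooperated, payoff) outcomes for one player.
    for cooperated, payoff in [(True, 2.0), (True, 0.0), (False, 3.0)]:
        p = update_cooperation_prob(p, cooperated, payoff, aspiration)
        aspiration = update_aspiration(aspiration, payoff)
        print(round(p, 3), round(aspiration, 3))
```

    Because the stimulus is bounded in magnitude by 1, the cooperation probability stays within [0, 1] without clipping.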

  6. Learning to reach by reinforcement learning using a receptive field based function approximation approach with continuous actions.

    Science.gov (United States)

    Tamosiunaite, Minija; Asfour, Tamim; Wörgötter, Florentin

    2009-03-01

    Reinforcement learning methods can be used in robotics applications especially for specific target-oriented problems, for example the reward-based recalibration of goal directed actions. To this end still relatively large and continuous state-action spaces need to be efficiently handled. The goal of this paper is, thus, to develop a novel, rather simple method which uses reinforcement learning with function approximation in conjunction with different reward-strategies for solving such problems. For the testing of our method, we use a four degree-of-freedom reaching problem in 3D-space simulated by a two-joint robot arm system with two DOF each. Function approximation is based on 4D, overlapping kernels (receptive fields) and the state-action space contains about 10,000 of these. Different types of reward structures are being compared, for example, reward-on-touching-only against reward-on-approach. Furthermore, forbidden joint configurations are punished. A continuous action space is used. In spite of a rather large number of states and the continuous action space these reward/punishment strategies allow the system to find a good solution usually within about 20 trials. The efficiency of our method demonstrated in this test scenario suggests that it might be possible to use it on a real robot for problems where mixed rewards can be defined in situations where other types of learning might be difficult.

  7. Real-time Stereoscopic 3D for E-Robotics Learning

    Directory of Open Access Journals (Sweden)

    Richard Y. Chiou

    2011-02-01

    Following the design and testing of a successful 3-Dimensional surveillance system, this 3D scheme has been implemented into online robotics learning at Drexel University. A real-time application, utilizing robot controllers, programmable logic controllers and sensors, has been developed in the “MET 205 Robotics and Mechatronics” class to provide the students with a better robotic education. The integration of the 3D system allows the students to precisely program the robot and execute functions remotely. Upon the students’ recommendation, polarization has been chosen to be the main platform behind the 3D robotic system. Stereoscopic calculations are carried out for calibration purposes to display the images with the highest possible comfort-level and 3D effect. The calculations are further validated by comparing the results with students’ evaluations. Due to the Internet-based feature, multiple clients have the opportunity to perform the online automation development. In the future, students in different universities will be able to cross-control robotic components of different types around the world. With the development of this 3D E-Robotics interface, automation resources and robotic learning can be shared and enriched regardless of location.

  8. Discrete-time online learning control for a class of unknown nonaffine nonlinear systems using reinforcement learning.

    Science.gov (United States)

    Yang, Xiong; Liu, Derong; Wang, Ding; Wei, Qinglai

    2014-07-01

    In this paper, a reinforcement-learning-based direct adaptive control is developed to deliver a desired tracking performance for a class of discrete-time (DT) nonlinear systems with unknown bounded disturbances. We investigate multi-input-multi-output unknown nonaffine nonlinear DT systems and employ two neural networks (NNs). By using Implicit Function Theorem, an action NN is used to generate the control signal and it is also designed to cancel the nonlinearity of unknown DT systems, for purpose of utilizing feedback linearization methods. On the other hand, a critic NN is applied to estimate the cost function, which satisfies the recursive equations derived from heuristic dynamic programming. The weights of both the action NN and the critic NN are directly updated online instead of offline training. By utilizing Lyapunov's direct method, the closed-loop tracking errors and the NN estimated weights are demonstrated to be uniformly ultimately bounded. Two numerical examples are provided to show the effectiveness of the present approach. Copyright © 2014 Elsevier Ltd. All rights reserved.

  9. Embedded Incremental Feature Selection for Reinforcement Learning

    Science.gov (United States)

    2012-05-01

    Prior to this work, feature selection for reinforcement learning has focused on linear value function approximation (Kolter and Ng, 2009; Parr et al...In Proceedings of the 23rd International Conference on Machine Learning, pages 449–456. Kolter, J. Z. and Ng, A. Y. (2009). Regularization and feature

  10. Working Memory and Reinforcement Schedule Jointly Determine Reinforcement Learning in Children: Potential Implications for Behavioral Parent Training

    Directory of Open Access Journals (Sweden)

    Elien Segers

    2018-03-01

    Introduction: Behavioral Parent Training (BPT) is often provided for childhood psychiatric disorders. These disorders have been shown to be associated with working memory impairments. BPT is based on operant learning principles, yet how operant principles shape behavior (through the partial reinforcement (PRF) extinction effect, i.e., greater resistance to extinction that is created when behavior is reinforced partially rather than continuously) and the potential role of working memory therein is scarcely studied in children. This study explored the PRF extinction effect and the role of working memory therein using experimental tasks in typically developing children. Methods: Ninety-seven children (age 6–10) completed a working memory task and an operant learning task, in which children acquired a response-sequence rule under either continuous or PRF (120 trials), followed by an extinction phase (80 trials). Data of 88 children were used for analysis. Results: The PRF extinction effect was confirmed: We observed slower acquisition and extinction in the PRF condition as compared to the continuous reinforcement (CRF) condition. Working memory was negatively related to acquisition but not extinction performance. Conclusion: Both reinforcement contingencies and working memory relate to acquisition performance. Potential implications for BPT are that decreasing working memory load may enhance the chance of optimally learning through reinforcement.

  11. Reinforcement learning: Solving two case studies

    Science.gov (United States)

    Duarte, Ana Filipa; Silva, Pedro; dos Santos, Cristina Peixoto

    2012-09-01

    Reinforcement Learning algorithms offer interesting features for the control of autonomous systems, such as the ability to learn from direct interaction with the environment, and the use of a simple reward signal as opposed to the input-output pairs used in classic supervised learning. The reward signal indicates the success or failure of the actions executed by the agent in the environment. In this work, RL algorithms applied to two case studies are described: the Crawler robot and the widely known inverted pendulum. We explore RL capabilities to autonomously learn a basic locomotion pattern in the Crawler, and approach the balancing problem of biped locomotion using the inverted pendulum.

  12. A real-time standard parts inspection based on deep learning

    Science.gov (United States)

    Xu, Kuan; Li, XuDong; Jiang, Hongzhi; Zhao, Huijie

    2017-10-01

    Standard parts are necessary components in mechanical structures such as bogies and connectors. These mechanical structures will shatter or loosen if standard parts are lost, so real-time standard parts inspection systems are essential to guarantee their safety. Researchers favor inspection systems based on deep learning because it works well on images with complex backgrounds, which are common in standard parts inspection. A typical inspection detection system contains two basic components: feature extractors and object classifiers. For the object classifier, the Region Proposal Network (RPN) is one of the most essential architectures in most state-of-the-art object detection systems. However, in the basic RPN architecture, the Region of Interest (ROI) proposals have fixed sizes (9 anchors for each pixel); they are effective, but they waste considerable computing resources and time. In standard parts detection, the parts have known sizes, so we can manually choose anchor sizes based on the ground truths through machine learning. The experiments show that 2 anchors achieve almost the same accuracy and recall rate. Our standard parts detection system reaches 15 fps on an NVIDIA GTX1080 GPU while achieving a detection accuracy of 90.01% mAP.

  13. Systems control with generalized probabilistic fuzzy-reinforcement learning

    NARCIS (Netherlands)

    Hinojosa, J.; Nefti, S.; Kaymak, U.

    2011-01-01

    Reinforcement learning (RL) is a valuable learning method when the systems require a selection of control actions whose consequences emerge over long periods for which input-output data are not available. In most combinations of fuzzy systems and RL, the environment is considered to be

  14. Manufacturing Scheduling Using Colored Petri Nets and Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Maria Drakaki

    2017-02-01

    Agent-based intelligent manufacturing control systems are capable of efficiently responding and adapting to environmental changes. Manufacturing system adaptation and evolution can be addressed with learning mechanisms that increase the intelligence of agents. In this paper a manufacturing scheduling method is presented based on Timed Colored Petri Nets (CTPNs) and reinforcement learning (RL). CTPNs model the manufacturing system and implement the scheduling. In the search for an optimal solution a scheduling agent uses RL and in particular the Q-learning algorithm. A warehouse order-picking scheduling is presented as a case study to illustrate the method. The proposed scheduling method is compared to existing methods. Simulation and state space results are used to evaluate performance and identify system properties.

  15. Efficient abstraction selection in reinforcement learning

    NARCIS (Netherlands)

    Seijen, H. van; Whiteson, S.; Kester, L.

    2013-01-01

    This paper introduces a novel approach for abstraction selection in reinforcement learning problems modelled as factored Markov decision processes (MDPs), for which a state is described via a set of state components. In abstraction selection, an agent must choose an abstraction from a set of

  16. Reduction Methods for Real-time Simulations in Hybrid Testing

    DEFF Research Database (Denmark)

    Andersen, Sebastian

    2016-01-01

    Hybrid testing constitutes a cost-effective experimental full scale testing method. The method was introduced in the 1960's by Japanese researchers, as an alternative to conventional full scale testing and small scale material testing, such as shake table tests. The principle of the method...... is performed on a glass fibre reinforced polymer composite box girder. The test serves as a pilot test for prospective real-time tests on a wind turbine blade. The Taylor basis is implemented in the test, used to perform the numerical simulations. Despite of a number of introduced errors in the real...... is to divide a structure into a physical substructure and a numerical substructure, and couple these in a test. If the test is conducted in real-time it is referred to as real time hybrid testing. The hybrid testing concept has developed significantly since its introduction in the 1960', both with respect...

  17. Neural Control of a Tracking Task via Attention-Gated Reinforcement Learning for Brain-Machine Interfaces.

    Science.gov (United States)

    Wang, Yiwen; Wang, Fang; Xu, Kai; Zhang, Qiaosheng; Zhang, Shaomin; Zheng, Xiaoxiang

    2015-05-01

    Reinforcement learning (RL)-based brain machine interfaces (BMIs) enable the user to learn from the environment through interactions to complete the task without desired signals, which is promising for clinical applications. Previous studies exploited Q-learning techniques to discriminate neural states into simple directional actions providing the trial initial timing. However, the movements in BMI applications can be quite complicated, and the action timing explicitly shows the intention when to move. The rich actions and the corresponding neural states form a large state-action space, imposing generalization difficulty on Q-learning. In this paper, we propose to adopt attention-gated reinforcement learning (AGREL) as a new learning scheme for BMIs to adaptively decode high-dimensional neural activities into seven distinct movements (directional moves, holdings and resting) due to the efficient weight-updating. We apply AGREL on neural data recorded from M1 of a monkey to directly predict a seven-action set in a time sequence to reconstruct the trajectory of a center-out task. Compared to Q-learning techniques, AGREL could improve the target acquisition rate to 90.16% on average with faster convergence and more stability to follow neural activity over multiple days, indicating the potential to achieve better online decoding performance for more complicated BMI tasks.

  18. Reinforcement Learning in the Game of Othello: Learning Against a Fixed Opponent and Learning from Self-Play

    NARCIS (Netherlands)

    van der Ree, Michiel; Wiering, Marco

    2013-01-01

    This paper compares three strategies in using reinforcement learning algorithms to let an artificial agent learn to play the game of Othello. The three strategies that are compared are: Learning by self-play, learning from playing against a fixed opponent, and learning from playing against a fixed

  19. Feedback as Real-Time Constructions

    Science.gov (United States)

    Keiding, Tina Bering; Qvortrup, Ane

    2014-01-01

    This article offers a re-description of feedback and the significance of time in feedback constructions based on systems theory. It describes feedback as internal, real-time constructions in a learning system. From this perspective, feedback is neither immediate nor delayed, but occurs in the very moment it takes place. This article argues for a…

  20. Application Of Reinforcement Learning In Heading Control Of A Fixed Wing UAV Using X-Plane Platform

    Directory of Open Access Journals (Sweden)

    Kimathi

    2017-02-01

    Full Text Available Heading control of an Unmanned Aerial Vehicle (UAV) is a vital operation of an autopilot system. It is executed by employing a design of control algorithms that control its direction and navigation. Most commonly available autopilots exploit Proportional-Integral-Derivative (PID) based heading controllers. In this paper we propose an online adaptive reinforcement learning heading controller. The autopilot heading controller will be designed in Matlab/Simulink for controlling a UAV in the X-Plane test platform. Through this platform the performance of the controller is shown using real time simulations. The performance of this controller is compared to that of a PID controller. The results show that the proposed method performs better than a well tuned PID controller.
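
    For orientation, the PID baseline mentioned in this abstract can be sketched as below. This is only an illustrative sketch, not the controller from the paper: the class name `PIDHeadingController`, the helper `wrap_heading_error`, the gains, and the use of degrees are all assumptions introduced here.

```python
from dataclasses import dataclass


def wrap_heading_error(target_deg: float, current_deg: float) -> float:
    """Smallest signed angle in degrees from current to target, in [-180, 180)."""
    return (target_deg - current_deg + 180.0) % 360.0 - 180.0


@dataclass
class PIDHeadingController:
    """Hypothetical textbook PID loop; gains are placeholders, not tuned values."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, target_deg: float, current_deg: float, dt: float) -> float:
        # Wrap the error so that 350 deg -> 10 deg is a 20 deg turn, not -340 deg.
        error = wrap_heading_error(target_deg, current_deg)
        self.integral += error * dt
        # The first call uses prev_error = 0, so the derivative term may spike once.
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


controller = PIDHeadingController(kp=1.0, ki=0.0, kd=0.0)
command = controller.step(target_deg=10.0, current_deg=350.0, dt=0.1)
```

    Wrapping the error before it enters the loop is the main design point: without it, a heading change across north would command a turn the long way round.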

  1. Safe Exploration of State and Action Spaces in Reinforcement Learning

    OpenAIRE

    Garcia, Javier; Fernandez, Fernando

    2014-01-01

    In this paper, we consider the important problem of safe exploration in reinforcement learning. While reinforcement learning is well-suited to domains with complex transition dynamics and high-dimensional state-action spaces, an additional challenge is posed by the need for safe and efficient exploration. Traditional exploration techniques are not particularly useful for solving dangerous tasks, where the trial and error process may lead to the selection of actions whose execution in some sta...

  2. Adversarial Reinforcement Learning in a Cyber Security Simulation

    OpenAIRE

    Elderman, Richard; Pater, Leon; Thie, Albert; Drugan, Madalina; Wiering, Marco

    2017-01-01

    This paper focuses on cyber-security simulations in networks modeled as a Markov game with incomplete information and stochastic elements. The resulting game is an adversarial sequential decision making problem played with two agents, the attacker and defender. The two agents pit one reinforcement learning technique, like neural networks, Monte Carlo learning and Q-learning, against each other and examine their effectiveness against learning opponents. The results showed that Monte Carlo lear...

  3. Tackling Error Propagation through Reinforcement Learning: A Case of Greedy Dependency Parsing

    NARCIS (Netherlands)

    Le, M.N.; Fokkens, A.S.

    Error propagation is a common problem in NLP. Reinforcement learning explores erroneous states during training and can therefore be more robust when mistakes are made early in a process. In this paper, we apply reinforcement learning to greedy dependency parsing which is known to suffer from error

  4. A multiplicative reinforcement learning model capturing learning dynamics and interindividual variability in mice

    OpenAIRE

    Bathellier, Brice; Tee, Sui Poh; Hrovat, Christina; Rumpel, Simon

    2013-01-01

    Learning speed can strongly differ across individuals. This is seen in humans and animals. Here, we measured learning speed in mice performing a discrimination task and developed a theoretical model based on the reinforcement learning framework to account for differences between individual mice. We found that, when using a multiplicative learning rule, the starting connectivity values of the model strongly determine the shape of learning curves. This is in contrast to current learning models ...

  5. Novel real-time tumor-contouring method using deep learning to prevent mistracking in X-ray fluoroscopy.

    Science.gov (United States)

    Terunuma, Toshiyuki; Tokui, Aoi; Sakae, Takeji

    2018-03-01

    Robustness to obstacles is the most important factor necessary to achieve accurate tumor tracking without fiducial markers. Some high-density structures, such as bone, are enhanced on X-ray fluoroscopic images, which cause tumor mistracking. Tumor tracking should be performed by controlling "importance recognition": the understanding that soft-tissue is an important tracking feature and bone structure is unimportant. We propose a new real-time tumor-contouring method that uses deep learning with importance recognition control. The novelty of the proposed method is the combination of the devised random overlay method and supervised deep learning to induce the recognition of structures in tumor contouring as important or unimportant. This method can be used for tumor contouring because it uses deep learning to perform image segmentation. Our results from a simulated fluoroscopy model showed accurate tracking of a low-visibility tumor with an error of approximately 1 mm, even if enhanced bone structure acted as an obstacle. A high similarity of approximately 0.95 on the Jaccard index was observed between the segmented and ground truth tumor regions. A short processing time of 25 ms was achieved. The results of this simulated fluoroscopy model support the feasibility of robust real-time tumor contouring with fluoroscopy. Further studies using clinical fluoroscopy are highly anticipated.
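
    The Jaccard index quoted in this abstract is the ratio of the overlap to the union of two regions. A minimal sketch of that computation follows, assuming the segmented and ground truth regions are given as equally sized binary masks; the function name `jaccard_index` and the toy masks are illustrative and do not come from the paper.

```python
def jaccard_index(mask_a, mask_b):
    """Jaccard index |A and B| / |A or B| for two binary masks (nested lists of 0/1)."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                intersection += 1
            if a or b:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


# Toy 2x3 masks: 3 pixels overlap, 4 pixels are covered by at least one mask.
segmented = [[0, 1, 1],
             [0, 1, 1]]
ground_truth = [[0, 1, 1],
                [0, 0, 1]]
score = jaccard_index(segmented, ground_truth)
```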

  6. A real-time articulatory visual feedback approach with target presentation for second language pronunciation learning.

    Science.gov (United States)

    Suemitsu, Atsuo; Dang, Jianwu; Ito, Takayuki; Tiede, Mark

    2015-10-01

    Articulatory information can support learning or remediating pronunciation of a second language (L2). This paper describes an electromagnetic articulometer-based visual-feedback approach using an articulatory target presented in real-time to facilitate L2 pronunciation learning. This approach trains learners to adjust articulatory positions to match targets for a L2 vowel estimated from productions of vowels that overlap in both L1 and L2. Training of Japanese learners for the American English vowel /æ/ that included visual training improved its pronunciation regardless of whether audio training was also included. Articulatory visual feedback is shown to be an effective method for facilitating L2 pronunciation learning.

  7. Enhancement of Online Robotics Learning Using Real-Time 3D Visualization Technology

    Directory of Open Access Journals (Sweden)

    Richard Chiou

    2010-06-01

    Full Text Available This paper discusses a real-time e-Lab Learning system based on the integration of 3D visualization technology with a remote robotic laboratory. With the emergence and development of the Internet field, online learning is proving to play a significant role in the upcoming era. In an effort to enhance Internet-based learning of robotics and keep up with the rapid progression of technology, a 3-dimensional scheme of viewing the robotic laboratory has been introduced in addition to the remote controlling of the robots. The uniqueness of the project lies in making this process Internet-based, and remote robot operated and visualized in 3D. This 3D system approach provides the students with a more realistic feel of the 3D robotic laboratory even though they are working remotely. As a result, the 3D visualization technology has been tested as part of a laboratory in the MET 205 Robotics and Mechatronics class and has received positive feedback from most of the students. This type of research has introduced a new level of realism and visual communications to online laboratory learning in a remote classroom.

  8. Reinforcement Learning for a New Piano Mover

    Directory of Open Access Journals (Sweden)

    Yuko Ishiwaka

    2005-08-01

    Full Text Available We attempt to achieve cooperative behavior of autonomous decentralized agents constructed via Q-Learning, which is a type of reinforcement learning. As such, in the present paper, we examine the piano mover's problem. We propose a multi-agent architecture that has a training agent, learning agents and an intermediate agent. Learning agents are heterogeneous and can communicate with each other. The movement of an object with three kinds of agent depends on the composition of the actions of the learning agents. By learning its own shape through the learning agents, avoidance of obstacles by the object is expected. We simulate the proposed method in a two-dimensional continuous world. Results obtained in the present investigation reveal the effectiveness of the proposed method.

  9. Place preference and vocal learning rely on distinct reinforcers in songbirds.

    Science.gov (United States)

    Murdoch, Don; Chen, Ruidong; Goldberg, Jesse H

    2018-04-30

    In reinforcement learning (RL) agents are typically tasked with maximizing a single objective function such as reward. But it remains poorly understood how agents might pursue distinct objectives at once. In machines, multiobjective RL can be achieved by dividing a single agent into multiple sub-agents, each of which is shaped by agent-specific reinforcement, but it remains unknown if animals adopt this strategy. Here we use songbirds to test if navigation and singing, two behaviors with distinct objectives, can be differentially reinforced. We demonstrate that strobe flashes aversively condition place preference but not song syllables. Brief noise bursts aversively condition song syllables but positively reinforce place preference. Thus distinct behavior-generating systems, or agencies, within a single animal can be shaped by correspondingly distinct reinforcement signals. Our findings suggest that spatially segregated vocal circuits can solve a credit assignment problem associated with multiobjective learning.

  10. Sex Differences in Reinforcement and Punishment on Prime-Time Television.

    Science.gov (United States)

    Downs, A. Chris; Gowan, Darryl C.

    1980-01-01

    Television programs were analyzed for frequencies of positive reinforcement and punishment exchanged among performers varying in age and sex. Females were found to more often exhibit and receive reinforcement, whereas males more often exhibited and received punishment. These findings have implications for children's learning of positive and…

  11. A reward optimization method based on action subrewards in hierarchical reinforcement learning.

    Science.gov (United States)

    Fu, Yuchen; Liu, Quan; Ling, Xionghong; Cui, Zhiming

    2014-01-01

    Reinforcement learning (RL) is a kind of interactive learning method. Its main characteristics are "trial and error" and "related reward." A hierarchical reinforcement learning method based on action subrewards is proposed to address the "curse of dimensionality," in which the state space grows exponentially with the number of features, and the resulting low convergence speed. The method can greatly reduce the state space and choose actions purposefully and efficiently, so as to optimize the reward function and enhance convergence speed. The method is applied to online learning in the game of Tetris, and the experimental results show that combining the hierarchical reinforcement learning algorithm with action subrewards markedly improves convergence speed. The "curse of dimensionality" problem is also solved to a certain extent with the hierarchical method. The performance with different parameters is compared and analyzed as well.

  12. Joint Extraction of Entities and Relations Using Reinforcement Learning and Deep Learning

    Directory of Open Access Journals (Sweden)

    Yuntian Feng

    2017-01-01

    Full Text Available We use both reinforcement learning and deep learning to simultaneously extract entities and relations from unstructured texts. For reinforcement learning, we model the task as a two-step decision process. Deep learning is used to automatically capture the most important information from unstructured texts, which represent the state in the decision process. By designing the reward function per step, our proposed method can pass the information of entity extraction to relation extraction and obtain feedback in order to extract entities and relations simultaneously. Firstly, we use bidirectional LSTM to model the context information, which realizes preliminary entity extraction. On the basis of the extraction results, attention based method can represent the sentences that include target entity pair to generate the initial state in the decision process. Then we use Tree-LSTM to represent relation mentions to generate the transition state in the decision process. Finally, we employ Q-Learning algorithm to get control policy π in the two-step decision process. Experiments on ACE2005 demonstrate that our method attains better performance than the state-of-the-art method and gets a 2.4% increase in recall-score.

  13. Joint Extraction of Entities and Relations Using Reinforcement Learning and Deep Learning.

    Science.gov (United States)

    Feng, Yuntian; Zhang, Hongjun; Hao, Wenning; Chen, Gang

    2017-01-01

    We use both reinforcement learning and deep learning to simultaneously extract entities and relations from unstructured texts. For reinforcement learning, we model the task as a two-step decision process. Deep learning is used to automatically capture the most important information from unstructured texts, which represent the state in the decision process. By designing the reward function per step, our proposed method can pass the information of entity extraction to relation extraction and obtain feedback in order to extract entities and relations simultaneously. Firstly, we use bidirectional LSTM to model the context information, which realizes preliminary entity extraction. On the basis of the extraction results, attention based method can represent the sentences that include target entity pair to generate the initial state in the decision process. Then we use Tree-LSTM to represent relation mentions to generate the transition state in the decision process. Finally, we employ Q-Learning algorithm to get control policy π in the two-step decision process. Experiments on ACE2005 demonstrate that our method attains better performance than the state-of-the-art method and gets a 2.4% increase in recall-score.

  14. An analysis of intergroup rivalry using Ising model and reinforcement learning

    Science.gov (United States)

    Zhao, Feng-Fei; Qin, Zheng; Shao, Zhuo

    2014-01-01

    Modeling of intergroup rivalry can help us better understand economic competitions, political elections and other similar activities. The result of intergroup rivalry depends on the co-evolution of individual behavior within one group and the impact from the rival group. In this paper, we model the rivalry behavior using the Ising model. Unlike other simulation studies using the Ising model, the evolution rules of each individual in our model are not static, but have the ability to learn from historical experience using a reinforcement learning technique, which brings the simulation closer to real human behavior. We studied the phase transition in intergroup rivalry and focused on the impact of the degree of social freedom, the personality of group members and the social experience of individuals. The results of computer simulation show that a society with a low degree of social freedom and highly educated, experienced individuals is more likely to be one-sided in intergroup rivalry.

  15. Pleasurable music affects reinforcement learning according to the listener

    Science.gov (United States)

    Gold, Benjamin P.; Frank, Michael J.; Bogert, Brigitte; Brattico, Elvira

    2013-01-01

    Mounting evidence links the enjoyment of music to brain areas implicated in emotion and the dopaminergic reward system. In particular, dopamine release in the ventral striatum seems to play a major role in the rewarding aspect of music listening. Striatal dopamine also influences reinforcement learning, such that subjects with greater dopamine efficacy learn better to approach rewards while those with lesser dopamine efficacy learn better to avoid punishments. In this study, we explored the practical implications of musical pleasure through its ability to facilitate reinforcement learning via non-pharmacological dopamine elicitation. Subjects from a wide variety of musical backgrounds chose a pleasurable and a neutral piece of music from an experimenter-compiled database, and then listened to one or both of these pieces (according to pseudo-random group assignment) as they performed a reinforcement learning task dependent on dopamine transmission. We assessed musical backgrounds as well as typical listening patterns with the new Helsinki Inventory of Music and Affective Behaviors (HIMAB), and separately investigated behavior for the training and test phases of the learning task. Subjects with more musical experience trained better with neutral music and tested better with pleasurable music, while those with less musical experience exhibited the opposite effect. HIMAB results regarding listening behaviors and subjective music ratings indicate that these effects arose from different listening styles: namely, more affective listening in non-musicians and more analytical listening in musicians. In conclusion, musical pleasure was able to influence task performance, and the shape of this effect depended on group and individual factors. These findings have implications in affective neuroscience, neuroaesthetics, learning, and music therapy. PMID:23970875

  16. Reinforcement learning for adaptive optimal control of unknown continuous-time nonlinear systems with input constraints

    Science.gov (United States)

    Yang, Xiong; Liu, Derong; Wang, Ding

    2014-03-01

    In this paper, an adaptive reinforcement learning-based solution is developed for the infinite-horizon optimal control problem of constrained-input continuous-time nonlinear systems in the presence of nonlinearities with unknown structures. Two different types of neural networks (NNs) are employed to approximate the Hamilton-Jacobi-Bellman equation. That is, a recurrent NN is constructed to identify the unknown dynamical system, and two feedforward NNs are used as the actor and the critic to approximate the optimal control and the optimal cost, respectively. Based on this framework, the action NN and the critic NN are tuned simultaneously, without the requirement for the knowledge of system drift dynamics. Moreover, by using Lyapunov's direct method, the weights of the action NN and the critic NN are guaranteed to be uniformly ultimately bounded, while keeping the closed-loop system stable. To demonstrate the effectiveness of the present approach, simulation results are illustrated.

  17. Amygdala and ventral striatum make distinct contributions to reinforcement learning

    Science.gov (United States)

    Costa, Vincent D.; Monte, Olga Dal; Lucas, Daniel R.; Murray, Elisabeth A.; Averbeck, Bruno B.

    2016-01-01

    Reinforcement learning (RL) theories posit that dopaminergic signals are integrated within the striatum to associate choices with outcomes. Often overlooked is that the amygdala also receives dopaminergic input and is involved in Pavlovian processes that influence choice behavior. To determine the relative contributions of the ventral striatum (VS) and amygdala to appetitive RL we tested rhesus macaques with VS or amygdala lesions on deterministic and stochastic versions of a two-arm bandit reversal learning task. When learning was characterized with an RL model relative to controls, amygdala lesions caused general decreases in learning from positive feedback and choice consistency. By comparison, VS lesions only affected learning in the stochastic task. Moreover, the VS lesions hastened the monkeys’ choice reaction times, which emphasized a speed-accuracy tradeoff that accounted for errors in deterministic learning. These results update standard accounts of RL by emphasizing distinct contributions of the amygdala and VS to RL. PMID:27720488
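
    The abstract does not give the form of the RL model. One common way to obtain separate measures of "learning from positive feedback" and "choice consistency" in a two-arm bandit is a delta-rule value update with feedback-specific learning rates plus a softmax choice rule. The sketch below assumes that form; the names `update_value`, `choice_probability`, `alpha_pos`, `alpha_neg` and `beta` are hypothetical, and the numbers are arbitrary.

```python
import math


def update_value(value, reward, alpha_pos, alpha_neg):
    """Delta-rule update with separate learning rates for rewarded and unrewarded trials."""
    prediction_error = reward - value
    alpha = alpha_pos if reward > 0 else alpha_neg
    return value + alpha * prediction_error


def choice_probability(value_a, value_b, beta):
    """Two-option softmax: probability of choosing A; beta acts as choice consistency."""
    return 1.0 / (1.0 + math.exp(-beta * (value_a - value_b)))


# One rewarded and one unrewarded trial, starting from a value estimate of 0.5.
after_reward = update_value(0.5, 1.0, alpha_pos=0.2, alpha_neg=0.1)
after_omission = update_value(0.5, 0.0, alpha_pos=0.2, alpha_neg=0.1)
p_choose_a = choice_probability(after_reward, after_omission, beta=3.0)
```

    Under this assumed form, a lower `alpha_pos` corresponds to weaker learning from positive feedback, and a lower `beta` corresponds to less consistent choices for the same value difference.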

  18. Deep reinforcement learning for automated radiation adaptation in lung cancer.

    Science.gov (United States)

    Tseng, Huan-Hsin; Luo, Yi; Cui, Sunan; Chien, Jen-Tzung; Ten Haken, Randall K; Naqa, Issam El

    2017-12-01

    To investigate deep reinforcement learning (DRL) based on historical treatment plans for developing automated radiation adaptation protocols for nonsmall cell lung cancer (NSCLC) patients that aim to maximize tumor local control at reduced rates of radiation pneumonitis grade 2 (RP2). In a retrospective population of 114 NSCLC patients who received radiotherapy, a three-component neural network framework was developed for deep reinforcement learning (DRL) of dose fractionation adaptation. Large-scale patient characteristics included clinical, genetic, and imaging radiomics features in addition to tumor and lung dosimetric variables. First, a generative adversarial network (GAN) was employed to learn patient population characteristics necessary for DRL training from a relatively limited sample size. Second, a radiotherapy artificial environment (RAE) was reconstructed by a deep neural network (DNN) utilizing both original and synthetic data (by GAN) to estimate the transition probabilities for adaptation of personalized radiotherapy patients' treatment courses. Third, a deep Q-network (DQN) was applied to the RAE for choosing the optimal dose in a response-adapted treatment setting. This multicomponent reinforcement learning approach was benchmarked against real clinical decisions that were applied in an adaptive dose escalation clinical protocol, in which 34 patients were treated based on avid PET signal in the tumor and constrained by a 17.2% normal tissue complication probability (NTCP) limit for RP2. The uncomplicated cure probability (P+) was used as a baseline reward function in the DRL. Taking our adaptive dose escalation protocol as a blueprint for the proposed DRL (GAN + RAE + DQN) architecture, we obtained an automated dose adaptation estimate for use at ∼2/3 of the way into the radiotherapy treatment course. By letting the DQN component freely control the estimated adaptive dose per fraction (ranging from 1-5 Gy), the DRL automatically favored dose

  19. Reinforcement learning solution for HJB equation arising in constrained optimal control problem.

    Science.gov (United States)

    Luo, Biao; Wu, Huai-Ning; Huang, Tingwen; Liu, Derong

    2015-11-01

    The constrained optimal control problem depends on the solution of the complicated Hamilton-Jacobi-Bellman equation (HJBE). In this paper, a data-based off-policy reinforcement learning (RL) method is proposed, which learns the solution of the HJBE and the optimal control policy from real system data. One important feature of the off-policy RL is that its policy evaluation can be realized with data generated by other behavior policies, not necessarily the target policy, which solves the insufficient exploration problem. The convergence of the off-policy RL is proved by demonstrating its equivalence to the successive approximation approach. Its implementation procedure is based on the actor-critic neural networks structure, where the function approximation is conducted with linearly independent basis functions. Subsequently, the convergence of the implementation procedure with function approximation is also proved. Finally, its effectiveness is verified through computer simulations. Copyright © 2015 Elsevier Ltd. All rights reserved.

  20. An Analysis of Stochastic Game Theory for Multiagent Reinforcement Learning

    National Research Council Canada - National Science Library

    Bowling, Michael

    2000-01-01

    .... In this paper we contribute a comprehensive presentation of the relevant techniques for solving stochastic games from both the game theory community and reinforcement learning communities. We examine the assumptions and limitations of these algorithms, and identify similarities between these algorithms, single agent reinforcement learners, and basic game theory techniques.

  1. Towards autonomous neuroprosthetic control using Hebbian reinforcement learning.

    Science.gov (United States)

    Mahmoudi, Babak; Pohlmeyer, Eric A; Prins, Noeline W; Geng, Shijia; Sanchez, Justin C

    2013-12-01

    Our goal was to design an adaptive neuroprosthetic controller that could learn the mapping from neural states to prosthetic actions and automatically adjust adaptation using only a binary evaluative feedback as a measure of desirability/undesirability of performance. Hebbian reinforcement learning (HRL) in a connectionist network was used for the design of the adaptive controller. The method combines the efficiency of supervised learning with the generality of reinforcement learning. The convergence properties of this approach were studied using both closed-loop control simulations and open-loop simulations that used primate neural data from robot-assisted reaching tasks. The HRL controller was able to perform classification and regression tasks using its episodic and sequential learning modes, respectively. In our experiments, the HRL controller quickly achieved convergence to an effective control policy, followed by robust performance. The controller also automatically stopped adapting the parameters after converging to a satisfactory control policy. Additionally, when the input neural vector was reorganized, the controller resumed adaptation to maintain performance. By estimating an evaluative feedback directly from the user, the HRL control algorithm may provide an efficient method for autonomous adaptation of neuroprosthetic systems. This method may enable the user to teach the controller the desired behavior using only a simple feedback signal.

  2. Neurofeedback in Learning Disabled Children: Visual versus Auditory Reinforcement.

    Science.gov (United States)

    Fernández, Thalía; Bosch-Bayard, Jorge; Harmony, Thalía; Caballero, María I; Díaz-Comas, Lourdes; Galán, Lídice; Ricardo-Garcell, Josefina; Aubert, Eduardo; Otero-Ojeda, Gloria

    2016-03-01

    Children with learning disabilities (LD) frequently have an EEG characterized by an excess of theta and a deficit of alpha activities. NFB using an auditory stimulus as reinforcer has proven to be a useful tool to treat LD children by positively reinforcing decreases of the theta/alpha ratio. The aim of the present study was to optimize the NFB procedure by comparing the efficacy of visual (with eyes open) versus auditory (with eyes closed) reinforcers. Twenty LD children with an abnormally high theta/alpha ratio were randomly assigned to the Auditory or the Visual group, where a 500 Hz tone or a visual stimulus (a white square), respectively, was used as a positive reinforcer when the value of the theta/alpha ratio was reduced. Both groups had signs consistent with EEG maturation, but only the Auditory Group showed behavioral/cognitive improvements. In conclusion, the auditory reinforcer was more efficacious in reducing the theta/alpha ratio, and it improved the cognitive abilities more than the visual reinforcer.

  3. Intranasal oxytocin enhances socially-reinforced learning in rhesus monkeys

    Directory of Open Access Journals (Sweden)

    Lisa A Parr

    2014-09-01

    Full Text Available There are currently no drugs approved for the treatment of social deficits associated with autism spectrum disorders (ASD). One hypothesis for these deficits is that individuals with ASD lack the motivation to attend to social cues because those cues are not implicitly rewarding. Therefore, any drug that could enhance the rewarding quality of social stimuli could have a profound impact on the treatment of ASD, and other social disorders. Oxytocin (OT) is a neuropeptide that has been effective in enhancing social cognition and social reward in humans. The present study examined the ability of OT to selectively enhance learning after social compared to nonsocial reward in rhesus monkeys, an important species for modeling the neurobiology of social behavior in humans. Monkeys were required to learn an implicit visual matching task after receiving either intranasal (IN) OT or placebo (saline). Correct trials were rewarded with the presentation of positive and negative social (play faces/threat faces) or nonsocial (banana/cage locks) stimuli, plus food. Incorrect trials were not rewarded. Results demonstrated a strong effect of socially-reinforced learning: monkeys performed significantly better when reinforced with social versus nonsocial stimuli. Additionally, socially-reinforced learning was significantly better and occurred faster after IN-OT compared to placebo treatment. Performance in the IN-OT, but not placebo, condition was also significantly better when the reinforcement stimuli were emotionally positive compared to negative facial expressions. These data support the hypothesis that OT may function to enhance prosocial behavior in primates by increasing the rewarding quality of emotionally positive, social compared to emotionally negative or nonsocial images. These data also support the use of the rhesus monkey as a model for exploring the neurobiological basis of social behavior and its impairment.

  4. Applying reinforcement learning to the weapon assignment problem

    African Journals Online (AJOL)

    ismith

    Carlo (MC) control algorithm with exploring starts (MCES), and an off-policy ..... closest to the threat should fire (that weapon also had the highest probability to ... Monte Carlo ..... “Reinforcement learning: Theory, methods and application to.

  5. Reinforcement Learning for Ramp Control: An Analysis of Learning Parameters

    Directory of Open Access Journals (Sweden)

    Chao Lu

    2016-08-01

    Full Text Available Reinforcement Learning (RL) has been proposed to deal with ramp control problems under dynamic traffic conditions; however, there is a lack of sufficient research on the behaviour and impacts of different learning parameters. This paper describes a ramp control agent based on the RL mechanism and thoroughly analyzes the influence of three learning parameters, namely the learning rate, discount rate and action selection parameter, on the algorithm performance. Two indices for the learning speed and convergence stability were used to measure the algorithm performance, based on which a series of simulation-based experiments were designed and conducted by using a macroscopic traffic flow model. Simulation results showed that, compared with the discount rate, the learning rate and action selection parameter had a more pronounced impact on the algorithm performance. Based on the analysis, some suggestions about how to select suitable parameter values that can achieve a superior performance were provided.
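
    To show where the three learning parameters named in this abstract enter an RL algorithm, here is a minimal tabular Q-learning sketch. It assumes an epsilon-greedy rule as the action selection scheme, which the abstract does not specify; the function names, the toy Q-table and the parameter values are illustrative and are not taken from the paper.

```python
import random


def q_update(q, state, action, reward, next_state, alpha, gamma):
    """One Q-learning step: alpha is the learning rate, gamma the discount rate."""
    best_next = max(q[next_state])
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])


def select_action(q, state, epsilon, rng):
    """Epsilon-greedy selection: epsilon is the (assumed) action selection parameter."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[state]))  # explore
    values = q[state]
    return max(range(len(values)), key=values.__getitem__)  # exploit


# Toy table with two states and two actions per state.
q_table = [[0.0, 0.0], [1.0, 2.0]]
q_update(q_table, state=0, action=1, reward=1.0, next_state=1, alpha=0.5, gamma=0.9)
greedy_action = select_action(q_table, state=1, epsilon=0.0, rng=random.Random(0))
```

    In this sketch the learning rate scales every update and epsilon governs how often the greedy choice is overridden, while the discount rate only weights the bootstrapped next-state term.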

  6. Applying reinforcement learning to the weapon assignment problem in air defence

    CSIR Research Space (South Africa)

    Mouton, H

    2011-12-01

    Full Text Available . The techniques investigated in this article were two methods from the machine-learning subfield of reinforcement learning (RL), namely a Monte Carlo (MC) control algorithm with exploring starts (MCES), and an off-policy temporal-difference (TD) learning...

  7. The combination of appetitive and aversive reinforcers and the nature of their interaction during auditory learning.

    Science.gov (United States)

    Ilango, A; Wetzel, W; Scheich, H; Ohl, F W

    2010-03-31

    Learned changes in behavior can be elicited by either appetitive or aversive reinforcers. It is, however, not clear whether the two types of motivation (approaching appetitive stimuli and avoiding aversive stimuli) drive learning in the same or different ways, nor is their interaction understood in situations where the two types are combined in a single experiment. To investigate this question we have developed a novel learning paradigm for Mongolian gerbils, which not only allows rewards and punishments to be presented in isolation or in combination with each other, but also can use these opposite reinforcers to drive the same learned behavior. Specifically, we studied learning of tone-conditioned hurdle crossing in a shuttle box driven by either an appetitive reinforcer (brain stimulation reward) or an aversive reinforcer (electrical footshock), or by a combination of both. Combination of the two reinforcers potentiated speed of acquisition, led to maximum possible performance, and delayed extinction as compared to either reinforcer alone. Additional experiments, using partial reinforcement protocols and experiments in which one of the reinforcers was omitted after the animals had been previously trained with the combination of both reinforcers, indicated that appetitive and aversive reinforcers operated together but acted in different ways: in this particular experimental context, punishment appeared to be more effective for initial acquisition and reward more effective to maintain a high level of conditioned responses (CRs). The results imply that learning mechanisms in problem solving were maximally effective when the initial punishment of mistakes was combined with the subsequent rewarding of correct performance. Copyright 2010 IBRO. Published by Elsevier Ltd. All rights reserved.

  8. Accelerating Multiagent Reinforcement Learning by Equilibrium Transfer.

    Science.gov (United States)

    Hu, Yujing; Gao, Yang; An, Bo

    2015-07-01

    An important approach in multiagent reinforcement learning (MARL) is equilibrium-based MARL, which adopts equilibrium solution concepts in game theory and requires agents to play equilibrium strategies at each state. However, most existing equilibrium-based MARL algorithms cannot scale due to a large number of computationally expensive equilibrium computations (e.g., computing Nash equilibria is PPAD-hard) during learning. For the first time, this paper finds that during the learning process of equilibrium-based MARL, the one-shot games corresponding to each state's successive visits often have the same or similar equilibria (for some states more than 90% of games corresponding to successive visits have similar equilibria). Inspired by this observation, this paper proposes to use equilibrium transfer to accelerate equilibrium-based MARL. The key idea of equilibrium transfer is to reuse previously computed equilibria when each agent has a small incentive to deviate. By introducing transfer loss and transfer condition, a novel framework called equilibrium transfer-based MARL is proposed. We prove that although equilibrium transfer brings transfer loss, equilibrium-based MARL algorithms can still converge to an equilibrium policy under certain assumptions. Experimental results in widely used benchmarks (e.g., grid world game, soccer game, and wall game) show that the proposed framework: 1) not only significantly accelerates equilibrium-based MARL (up to 96.7% reduction in learning time), but also achieves higher average rewards than algorithms without equilibrium transfer and 2) scales significantly better than algorithms without equilibrium transfer when the state/action space grows and the number of agents increases.
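
    The reuse rule described above (keep a previously computed equilibrium while each agent has only a small incentive to deviate) can be illustrated for a two-player matrix game. This is a minimal sketch under stated assumptions, not the paper's algorithm: the function names, the threshold value, and the use of the largest unilateral deviation gain as the test are all hypothetical choices made for illustration.

```python
def max_deviation_gain(payoff_a, payoff_b, strat_a, strat_b):
    """Largest payoff gain either player could get by deviating unilaterally.

    payoff_a[i][j] / payoff_b[i][j]: payoffs when A plays row i and B plays column j.
    strat_a / strat_b: mixed strategies (probabilities over rows / columns).
    """
    rows, cols = len(strat_a), len(strat_b)
    # Expected payoff of each pure action against the other player's stored strategy.
    row_values = [sum(payoff_a[i][j] * strat_b[j] for j in range(cols)) for i in range(rows)]
    col_values = [sum(payoff_b[i][j] * strat_a[i] for i in range(rows)) for j in range(cols)]
    # Expected payoff of the stored strategies themselves.
    current_a = sum(strat_a[i] * row_values[i] for i in range(rows))
    current_b = sum(strat_b[j] * col_values[j] for j in range(cols))
    return max(max(row_values) - current_a, max(col_values) - current_b)


def can_reuse(payoff_a, payoff_b, strat_a, strat_b, threshold):
    """Hypothetical transfer condition: reuse if no one gains more than `threshold`."""
    return max_deviation_gain(payoff_a, payoff_b, strat_a, strat_b) <= threshold


# Equilibrium (row 0, column 0) stored from an earlier visit to this state.
old_a, old_b = [1.0, 0.0], [1.0, 0.0]
# Slightly changed one-shot game at the next visit (illustrative numbers).
new_payoff_a = [[2.0, 0.0], [2.05, 1.0]]
new_payoff_b = [[2.0, 0.0], [0.0, 1.0]]
reuse_loose = can_reuse(new_payoff_a, new_payoff_b, old_a, old_b, threshold=0.1)
reuse_strict = can_reuse(new_payoff_a, new_payoff_b, old_a, old_b, threshold=0.01)
```

    With the looser threshold the stored equilibrium is kept and no new equilibrium computation is needed; with the stricter one it would be recomputed.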

  9. Real-time systems

    OpenAIRE

    Badr, Salah M.; Bruztman, Donald P.; Nelson, Michael L.; Byrnes, Ronald Benton

    1992-01-01

    This paper presents an introduction to the basic issues involved in real-time systems. Both real-time operating systems and real-time programming languages are explored. Concurrent programming and process synchronization and communication are also discussed. The real-time requirements of the Naval Postgraduate School Autonomous Underwater Vehicle (AUV) are then examined. Autonomous underwater vehicle (AUV), hard real-time system, real-time operating system, real-time programming language, real-time sy...

  10. Reinforcement active learning in the vibrissae system: optimal object localization.

    Science.gov (United States)

    Gordon, Goren; Dorfman, Nimrod; Ahissar, Ehud

    2013-01-01

    Rats move their whiskers to acquire information about their environment. It has been observed that they palpate novel objects and objects they are required to localize in space. We analyze whisker-based object localization using two complementary paradigms, namely, active learning and intrinsic-reward reinforcement learning. Active learning algorithms select the next training samples according to the hypothesized solution in order to better discriminate between correct and incorrect labels. Intrinsic-reward reinforcement learning uses prediction errors as the reward to an actor-critic design, such that behavior converges to the one that optimizes the learning process. We show that in the context of object localization, the two paradigms result in palpation whisking as their respective optimal solution. These results suggest that rats may employ principles of active learning and/or intrinsic reward in tactile exploration and can guide future research to seek the underlying neuronal mechanisms that implement them. Furthermore, these paradigms are easily transferable to biomimetic whisker-based artificial sensors and can improve the active exploration of their environment. Copyright © 2012 Elsevier Ltd. All rights reserved.

  11. Neural correlates of reinforcement learning and social preferences in competitive bidding.

    Science.gov (United States)

    van den Bos, Wouter; Talwar, Arjun; McClure, Samuel M

    2013-01-30

    In competitive social environments, people often deviate from what rational choice theory prescribes, resulting in losses or suboptimal monetary gains. We investigate how competition affects learning and decision-making in a common value auction task. During the experiment, groups of five human participants were simultaneously scanned using MRI while playing the auction task. We first demonstrate that bidding is well characterized by reinforcement learning with biased reward representations dependent on social preferences. Indicative of reinforcement learning, we found that estimated trial-by-trial prediction errors correlated with activity in the striatum and ventromedial prefrontal cortex. Additionally, we found that individual differences in social preferences were related to activity in the temporal-parietal junction and anterior insula. Connectivity analyses suggest that monetary and social value signals are integrated in the ventromedial prefrontal cortex and striatum. Based on these results, we argue for a novel mechanistic account for the integration of reinforcement history and social preferences in competitive decision-making.

  12. Road Artery Traffic Light Optimization with Use of the Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Rok Marsetič

    2014-04-01

    The basic principle of optimal traffic control is the appropriate real-time response to dynamic traffic flow changes. Signal plan efficiency depends on a large number of input parameters. An actuated signal system can adjust very well to traffic conditions, but cannot fully adjust to stochastic traffic volume oscillation. Due to the complexity of the problem, analytical methods are not applicable for use in real time; therefore, the purpose of this paper is to introduce a heuristic method suitable for traffic light optimization in real time. With the evolution of artificial intelligence, new possibilities for solving complex problems have been introduced. The goal of this paper is to demonstrate that the use of the Q learning algorithm for traffic light optimization is suitable. The Q learning algorithm was verified on a road artery with three intersections. To estimate the effectiveness and efficiency of the proposed algorithm, a comparison with an actuated signal plan was carried out. The results (average delay per vehicle and the number of vehicles that left the road network) show that the Q learning algorithm outperforms the actuated signal controllers. The proposed algorithm converges to the minimal delay per vehicle regardless of the stochastic nature of traffic. In this research, the impact of the model parameters (learning rate, exploration rate, influence of communication between agents and reward type) on algorithm effectiveness was analysed as well.
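
    For readers unfamiliar with the Q learning algorithm named in this record, a minimal tabular sketch follows. It shows only the generic update and an epsilon-greedy action choice; the state names, the two signal actions, the reward value, and the parameter values are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error  # alpha is the learning rate
    return td_error


def epsilon_greedy(q, state, actions, epsilon=0.1, rng=random):
    """Explore with probability epsilon (the exploration rate), otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


# Hypothetical signal actions and queue states, for illustration only.
ACTIONS = ("keep_phase", "switch_phase")
q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0

# Assumed reward: the negative of the delay observed after acting.
td = q_update(q_table, "long_queue", "switch_phase", -4.0, "short_queue", ACTIONS)
greedy_choice = epsilon_greedy(q_table, "long_queue", ACTIONS, epsilon=0.0)
```

    Using negative delay as the reward is one plausible choice among the reward types the record mentions; the update rule is the same whichever reward is used.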

  13. Vicarious reinforcement learning signals when instructing others.

    Science.gov (United States)

    Apps, Matthew A J; Lesage, Elise; Ramnani, Narender

    2015-02-18

    Reinforcement learning (RL) theory posits that learning is driven by discrepancies between the predicted and actual outcomes of actions (prediction errors [PEs]). In social environments, learning is often guided by similar RL mechanisms. For example, teachers monitor the actions of students and provide feedback to them. This feedback evokes PEs in students that guide their learning. We report the first study that investigates the neural mechanisms that underpin RL signals in the brain of a teacher. Neurons in the anterior cingulate cortex (ACC) signal PEs when learning from the outcomes of one's own actions but also signal information when outcomes are received by others. Does a teacher's ACC signal PEs when monitoring a student's learning? Using fMRI, we studied brain activity in human subjects (teachers) as they taught a confederate (student) action-outcome associations by providing positive or negative feedback. We examined activity time-locked to the students' responses, when teachers infer student predictions and know actual outcomes. We fitted a RL-based computational model to the behavior of the student to characterize their learning, and examined whether a teacher's ACC signals when a student's predictions are wrong. In line with our hypothesis, activity in the teacher's ACC covaried with the PE values in the model. Additionally, activity in the teacher's insula and ventromedial prefrontal cortex covaried with the predicted value according to the student. Our findings highlight that the ACC signals PEs vicariously for others' erroneous predictions, when monitoring and instructing their learning. These results suggest that RL mechanisms, processed vicariously, may underpin and facilitate teaching behaviors. Copyright © 2015 Apps et al.
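
    The prediction-error idea in this abstract can be made concrete with a simple delta-rule learner. This is a sketch under the assumption of a Rescorla-Wagner style update; the record does not specify the exact computational model fitted to the student's behavior, and the learning rate and outcomes below are illustrative.

```python
def prediction_errors(outcomes, learning_rate=0.3, initial_value=0.0):
    """Track a predicted value and the prediction error (PE) on each trial.

    PE = actual outcome - predicted value; the prediction then moves
    toward the outcome by a fraction `learning_rate` of the PE.
    """
    value = initial_value
    errors = []
    for outcome in outcomes:
        pe = outcome - value
        errors.append(pe)
        value += learning_rate * pe
    return errors, value


# Two rewarded trials followed by an unrewarded one (illustrative data).
errors, final_value = prediction_errors([1.0, 1.0, 0.0], learning_rate=0.5)
```

    The PE shrinks as the outcome becomes expected and turns negative when an expected reward is omitted, which is the kind of trial-by-trial signal that model-based fMRI analyses correlate with brain activity.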

  14. Bi-directional effect of increasing doses of baclofen on reinforcement learning

    Directory of Open Access Journals (Sweden)

    Jean Terrier

    2011-07-01

    In rodents as well as in humans, efficient reinforcement learning depends on dopamine (DA) released from ventral tegmental area (VTA) neurons. It has been shown that in brain slices of mice, GABAB-receptor agonists at low concentrations increase the firing frequency of VTA-DA neurons, while high concentrations reduce the firing frequency. It remains, however, elusive whether baclofen can modulate reinforcement learning. Here, in a double-blind study in 34 healthy human volunteers, we tested the effects of a low and a high concentration of oral baclofen in a gambling task associated with monetary reward. A low (20 mg) dose of baclofen increased the efficiency of reward-associated learning but had no effect on the avoidance of monetary loss. A high (50 mg) dose of baclofen, on the other hand, did not affect the learning curve. At the end of the task, subjects who received 20 mg baclofen p.o. were more accurate in choosing the symbol linked to the highest probability of earning money compared to the control group (89.55±1.39% vs 81.07±1.55%, p=0.002). Our results support a model where baclofen, at low concentrations, causes a disinhibition of DA neurons, increases DA levels and thus facilitates reinforcement learning.

  15. Traffic light control by multiagent reinforcement learning systems

    NARCIS (Netherlands)

    Bakker, B.; Whiteson, S.; Kester, L.; Groen, F.C.A.; Babuška, R.; Groen, F.C.A.

    2010-01-01

    Traffic light control is one of the main means of controlling road traffic. Improving traffic control is important because it can lead to higher traffic throughput and reduced traffic congestion. This chapter describes multiagent reinforcement learning techniques for automatic optimization of

  16. Traffic Light Control by Multiagent Reinforcement Learning Systems

    NARCIS (Netherlands)

    Bakker, B.; Whiteson, S.; Kester, L.J.H.M.; Groen, F.C.A.

    2010-01-01

    Traffic light control is one of the main means of controlling road traffic. Improving traffic control is important because it can lead to higher traffic throughput and reduced traffic congestion. This chapter describes multiagent reinforcement learning techniques for automatic optimization of

  17. Effects of partial reinforcement and time between reinforced trials on terminal response rate in pigeon autoshaping.

    Science.gov (United States)

    Gottlieb, Daniel A

    2006-03-01

    Partial reinforcement often leads to asymptotically higher rates of responding and number of trials with a response than does continuous reinforcement in pigeon autoshaping. However, comparisons typically involve a partial reinforcement schedule that differs from the continuous reinforcement schedule in both time between reinforced trials and probability of reinforcement. Two experiments examined the relative contributions of these two manipulations to asymptotic response rate. Results suggest that the greater responding previously seen with partial reinforcement is primarily due to differential probability of reinforcement and not differential time between reinforced trials. Further, once established, differences in responding are resistant to a change in stimulus and contingency. Secondary response theories of autoshaped responding (theories that posit additional response-augmenting or response-attenuating mechanisms specific to partial or continuous reinforcement) cannot fully accommodate the current body of data. It is suggested that researchers who study pigeon autoshaping train animals on a common task prior to training them under different conditions.

  18. A Robust Cooperated Control Method with Reinforcement Learning and Adaptive H∞ Control

    Science.gov (United States)

    Obayashi, Masanao; Uchiyama, Shogo; Kuremoto, Takashi; Kobayashi, Kunikazu

    This study proposes a robust cooperated control method combining reinforcement learning with robust control to control the system. A remarkable characteristic of reinforcement learning is that it does not require a model formula; however, it does not guarantee the stability of the system. On the other hand, a robust control system guarantees stability and robustness; however, it requires a model formula. We employ both the actor-critic method, which is a kind of reinforcement learning that controls continuous-valued actions with a minimal amount of computation, and traditional robust control, that is, H∞ control. The proposed method was compared with the conventional control method, that is, the actor-critic method used alone, through computer simulation of controlling the angle and the position of a crane system, and the simulation results showed the effectiveness of the proposed method.

  19. The multiple reals of workplace learning

    Directory of Open Access Journals (Sweden)

    Kerry Harman

    2014-04-01

    The multiple reals of workplace learning are explored in this paper. Drawing on a Foucauldian conceptualisation of power as distributed, relational and productive, networks that work to produce particular objects and subjects as seemingly natural and real are examined. This approach enables different reals of workplace learning to be traced. Data from a collaborative industry-university research project is used to illustrate the approach, with a focus on the intersecting practices of a group of professional developers and a group of workplace learning researchers. The notion of multiple reals holds promise for research on workplace learning as it moves beyond a view of reality as fixed and singular to a notion of reality as performed in and through a diversity of practices, including the practices of workplace learning researchers.

  20. Reinforcement learning of targeted movement in a spiking neuronal model of motor cortex.

    Directory of Open Access Journals (Sweden)

    George L Chadderdon

    Sensorimotor control has traditionally been considered from a control theory perspective, without relation to neurobiology. In contrast, here we utilized a spiking-neuron model of motor cortex and trained it to perform a simple movement task, which consisted of rotating a single-joint "forearm" to a target. Learning was based on a reinforcement mechanism analogous to that of the dopamine system. This provided a global reward or punishment signal in response to decreasing or increasing distance from hand to target, respectively. Output was partially driven by Poisson motor babbling, creating stochastic movements that could then be shaped by learning. The virtual forearm consisted of a single segment rotated around an elbow joint, controlled by flexor and extensor muscles. The model consisted of 144 excitatory and 64 inhibitory event-based neurons, each with AMPA, NMDA, and GABA synapses. Proprioceptive cell input to this model encoded the 2 muscle lengths. Plasticity was only enabled in feedforward connections between input and output excitatory units, using spike-timing-dependent eligibility traces for synaptic credit or blame assignment. Learning resulted from a global 3-valued signal: reward (+1), no learning (0), or punishment (-1), corresponding to phasic increases, lack of change, or phasic decreases of dopaminergic cell firing, respectively. Successful learning only occurred when both reward and punishment were enabled. In this case, 5 target angles were learned successfully within 180 s of simulation time, with a median error of 8 degrees. Motor babbling allowed exploratory learning, but decreased the stability of the learned behavior, since the hand continued moving after reaching the target. Our model demonstrated that a global reinforcement signal, coupled with eligibility traces for synaptic plasticity, can train a spiking sensorimotor network to perform goal-directed motor behavior.

  1. Reinforcement learning of targeted movement in a spiking neuronal model of motor cortex.

    Science.gov (United States)

    Chadderdon, George L; Neymotin, Samuel A; Kerr, Cliff C; Lytton, William W

    2012-01-01

    Sensorimotor control has traditionally been considered from a control theory perspective, without relation to neurobiology. In contrast, here we utilized a spiking-neuron model of motor cortex and trained it to perform a simple movement task, which consisted of rotating a single-joint "forearm" to a target. Learning was based on a reinforcement mechanism analogous to that of the dopamine system. This provided a global reward or punishment signal in response to decreasing or increasing distance from hand to target, respectively. Output was partially driven by Poisson motor babbling, creating stochastic movements that could then be shaped by learning. The virtual forearm consisted of a single segment rotated around an elbow joint, controlled by flexor and extensor muscles. The model consisted of 144 excitatory and 64 inhibitory event-based neurons, each with AMPA, NMDA, and GABA synapses. Proprioceptive cell input to this model encoded the 2 muscle lengths. Plasticity was only enabled in feedforward connections between input and output excitatory units, using spike-timing-dependent eligibility traces for synaptic credit or blame assignment. Learning resulted from a global 3-valued signal: reward (+1), no learning (0), or punishment (-1), corresponding to phasic increases, lack of change, or phasic decreases of dopaminergic cell firing, respectively. Successful learning only occurred when both reward and punishment were enabled. In this case, 5 target angles were learned successfully within 180 s of simulation time, with a median error of 8 degrees. Motor babbling allowed exploratory learning, but decreased the stability of the learned behavior, since the hand continued moving after reaching the target. Our model demonstrated that a global reinforcement signal, coupled with eligibility traces for synaptic plasticity, can train a spiking sensorimotor network to perform goal-directed motor behavior.
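
    The learning rule summarized above, a global 3-valued signal acting on synapses tagged by eligibility traces, can be sketched for a single synapse. This is a simplified illustration, not the authors' model: the exponential decay factor, the learning rate, and the use of a boolean "coincident spike" flag in place of real spike timing are assumptions.

```python
def plasticity_step(weight, trace, coincident_spike, reward, decay=0.9, lr=0.05):
    """Update one synapse for one time step.

    trace: eligibility trace, decays each step and is bumped when pre- and
           postsynaptic spikes coincide (a stand-in for spike-timing dependence).
    reward: global signal, +1 (reward), 0 (no learning) or -1 (punishment).
    """
    trace = decay * trace + (1.0 if coincident_spike else 0.0)
    weight += lr * reward * trace  # only eligible synapses get credit or blame
    return weight, trace


# A spike pairing tags the synapse; with reward = 0 the weight does not change.
w, e = plasticity_step(1.0, 0.0, coincident_spike=True, reward=0)
# One step later a reward arrives and strengthens the still-eligible synapse.
w_rewarded, e_after = plasticity_step(w, e, coincident_spike=False, reward=+1)
# The same delayed signal with the opposite sign would weaken it instead.
w_punished, _ = plasticity_step(w, e, coincident_spike=False, reward=-1)
```

    The trace is what lets a delayed global signal assign credit or blame to the synapses that were recently active.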

  2. Perceptual learning rules based on reinforcers and attention

    NARCIS (Netherlands)

    Roelfsema, Pieter R.; van Ooyen, Arjen; Watanabe, Takeo

    2010-01-01

    How does the brain learn those visual features that are relevant for behavior? In this article, we focus on two factors that guide plasticity of visual representations. First, reinforcers cause the global release of diffusive neuromodulatory signals that gate plasticity. Second, attentional feedback

  3. Optimizing microstimulation using a reinforcement learning framework.

    Science.gov (United States)

    Brockmeier, Austin J; Choi, John S; Distasio, Marcello M; Francis, Joseph T; Príncipe, José C

    2011-01-01

    The ability to provide sensory feedback is desired to enhance the functionality of neuroprosthetics. Somatosensory feedback provides closed-loop control to the motor system, which is lacking in feedforward neuroprosthetics. In the case of existing somatosensory function, a template of the natural response can be used as a template of desired response elicited by electrical microstimulation. In the case of no initial training data, microstimulation parameters that produce responses close to the template must be selected in an online manner. We propose using reinforcement learning as a framework to balance the exploration of the parameter space and the continued selection of promising parameters for further stimulation. This approach avoids an explicit model of the neural response from stimulation. We explore a preliminary architecture--treating the task as a k-armed bandit--using offline data recorded for natural touch and thalamic microstimulation, and we examine the method's efficiency in exploring the parameter space while concentrating on promising parameter forms. The best matching stimulation parameters, from k = 68 different forms, are selected by the reinforcement learning algorithm consistently after 334 realizations.
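
    The k-armed bandit framing in this record can be sketched as follows. The record does not say which bandit strategy was used, so the epsilon-greedy selection and sample-average value estimates here are assumptions, as are the class name and the reward values; only k = 68 comes from the abstract.

```python
import random


class EpsilonGreedyBandit:
    """k-armed bandit with sample-average value estimates (illustrative sketch)."""

    def __init__(self, k, epsilon=0.1, seed=0):
        self.counts = [0] * k      # how often each arm has been tried
        self.values = [0.0] * k    # running mean reward of each arm
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def select(self):
        """Explore a random arm with probability epsilon, else exploit the best estimate."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, arm, reward):
        """Incrementally update the running mean reward of the chosen arm."""
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


# Each arm stands for one stimulation parameter form; the reward is assumed to
# measure how closely the evoked response matches the natural-touch template.
bandit = EpsilonGreedyBandit(k=68, epsilon=0.0)
bandit.update(5, 1.0)
bandit.update(5, 0.0)
chosen = bandit.select()
```

    Setting epsilon above zero trades some stimulation trials for exploration of untested parameter forms, which is the balance the abstract describes.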

  4. Real Time Strategy Language

    OpenAIRE

    Hayes, Roy; Beling, Peter; Scherer, William

    2014-01-01

    Real Time Strategy (RTS) games provide a complex domain to test the latest artificial intelligence (AI) research. In much of the literature, AI systems have been limited to playing one game. Although this specialization has resulted in stronger AI gaming systems, it does not address the key concerns of AI researchers. AI researchers seek the development of AI agents that can autonomously interpret, learn, and apply new knowledge. To achieve human level performance, current AI systems rely on game...

  5. Multi-Objective Reinforcement Learning-Based Deep Neural Networks for Cognitive Space Communications

    Science.gov (United States)

    Ferreria, Paulo Victor R.; Paffenroth, Randy; Wyglinski, Alexander M.; Hackett, Timothy M.; Bilen, Sven G.; Reinhart, Richard C.; Mortensen, Dale J.

    2017-01-01

    Future communication subsystems of space exploration missions can potentially benefit from software-defined radios (SDRs) controlled by machine learning algorithms. In this paper, we propose a novel hybrid radio resource allocation management control algorithm that integrates multi-objective reinforcement learning and deep artificial neural networks. The objective is to efficiently manage communications system resources by monitoring performance functions with common dependent variables that result in conflicting goals. The uncertainty in the performance of thousands of different possible combinations of radio parameters makes the trade-off between exploration and exploitation in reinforcement learning (RL) much more challenging for future critical space-based missions. Thus, the system should spend as little time as possible on exploring actions, and whenever it explores an action, it should perform at acceptable levels most of the time. The proposed approach enables on-line learning by interactions with the environment and restricts poor resource allocation performance through virtual environment exploration. Improvements in the multiobjective performance can be achieved via transmitter parameter adaptation on a packet-basis, with poorly predicted performance promptly resulting in rejected decisions. Simulations presented in this work considered the DVB-S2 standard adaptive transmitter parameters and additional ones expected to be present in future adaptive radio systems. Performance results are provided by analysis of the proposed hybrid algorithm when operating across a satellite communication channel from Earth to GEO orbit during clear sky conditions. The proposed approach constitutes part of the core cognitive engine proof-of-concept to be delivered to the NASA Glenn Research Center SCaN Testbed located onboard the International Space Station.

  6. Narrow Artificial Intelligence with Machine Learning for Real-Time Estimation of a Mobile Agent’s Location Using Hidden Markov Models

    Directory of Open Access Journals (Sweden)

    Cédric Beaulac

    2017-01-01

    We propose to use a supervised machine learning technique to track the location of a mobile agent in real time. Hidden Markov Models are used to build artificial intelligence that estimates the unknown position of a mobile target moving in a defined environment. This narrow artificial intelligence performs two distinct tasks. First, it provides real-time estimation of the mobile agent’s position using the forward algorithm. Second, it uses the Baum–Welch algorithm as a statistical learning tool to gain knowledge of the mobile target. Finally, an experimental environment is proposed, namely, a video game that we use to test our artificial intelligence. We present statistical and graphical results to illustrate the efficiency of our method.
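
    The forward algorithm mentioned in this record can be sketched as a normalized filtering recursion over a small Hidden Markov Model. The two-state model and all probabilities below are made-up illustrative values, not the ones used in the article.

```python
def forward_filter(initial, transition, emission, observations):
    """Return P(state_t | observations up to t) for each time step t.

    initial[s]: prior probability of state s.
    transition[i][j]: probability of moving from state i to state j.
    emission[s][o]: probability of observing symbol o in state s.
    """
    n = len(initial)

    def normalize(weights):
        total = sum(weights)
        return [w / total for w in weights]

    belief = normalize([initial[s] * emission[s][observations[0]] for s in range(n)])
    history = [belief]
    for obs in observations[1:]:
        # Predict: push the current belief through the transition model.
        predicted = [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]
        # Correct: weight each state by how well it explains the new observation.
        belief = normalize([predicted[j] * emission[j][obs] for j in range(n)])
        history.append(belief)
    return history


# Two hypothetical locations and a noisy sensor that reports location 0 twice.
history = forward_filter(
    initial=[0.5, 0.5],
    transition=[[0.9, 0.1], [0.1, 0.9]],
    emission=[[0.8, 0.2], [0.2, 0.8]],
    observations=[0, 0],
)
```

    Because each step uses only the previous belief and the newest observation, the estimate can be refreshed in real time as observations arrive.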

  7. Learning to Run challenge solutions: Adapting reinforcement learning methods for neuromusculoskeletal environments

    OpenAIRE

    Kidziński, Łukasz; Mohanty, Sharada Prasanna; Ong, Carmichael; Huang, Zhewei; Zhou, Shuchang; Pechenko, Anton; Stelmaszczyk, Adam; Jarosik, Piotr; Pavlov, Mikhail; Kolesnikov, Sergey; Plis, Sergey; Chen, Zhibo; Zhang, Zhizheng; Chen, Jiale; Shi, Jun

    2018-01-01

    In the NIPS 2017 Learning to Run challenge, participants were tasked with building a controller for a musculoskeletal model to make it run as fast as possible through an obstacle course. Top participants were invited to describe their algorithms. In this work, we present eight solutions that used deep reinforcement learning approaches, based on algorithms such as Deep Deterministic Policy Gradient, Proximal Policy Optimization, and Trust Region Policy Optimization. Many solutions use similar ...

  8. Real-time embedded systems design principles and engineering practices

    CERN Document Server

    Fan, Xiaocong

    2015-01-01

    This book integrates new ideas and topics from real time systems, embedded systems, and software engineering to give a complete picture of the whole process of developing software for real-time embedded applications. You will not only gain a thorough understanding of concepts related to microprocessors, interrupts, and system boot process, appreciating the importance of real-time modeling and scheduling, but you will also learn software engineering practices such as model documentation, model analysis, design patterns, and standard conformance. This book is split into four parts to help you

  9. A machine learning approach for real-time modelling of tissue deformation in image-guided neurosurgery.

    Science.gov (United States)

    Tonutti, Michele; Gras, Gauthier; Yang, Guang-Zhong

    2017-07-01

    Accurate reconstruction and visualisation of soft tissue deformation in real time is crucial in image-guided surgery, particularly in augmented reality (AR) applications. Current deformation models are characterised by a trade-off between accuracy and computational speed. We propose an approach to derive a patient-specific deformation model for brain pathologies by combining the results of pre-computed finite element method (FEM) simulations with machine learning algorithms. The models can be computed instantaneously and offer an accuracy comparable to FEM models. A brain tumour is used as the subject of the deformation model. Load-driven FEM simulations are performed on a tetrahedral brain mesh afflicted by a tumour. Forces of varying magnitudes, positions, and inclination angles are applied onto the brain's surface. Two machine learning algorithms, artificial neural networks (ANNs) and support vector regression (SVR), are employed to derive a model that can predict the resulting deformation for each node in the tumour's mesh. The tumour deformation can be predicted in real time given relevant information about the geometry of the anatomy and the load, all of which can be measured instantly during a surgical operation. The models can predict the position of the nodes with errors below 0.3 mm, beyond the general threshold of surgical accuracy and suitable for high fidelity AR systems. The SVR models perform better than the ANNs, with positional errors for SVR models reaching under 0.2 mm. The results represent an improvement over existing deformation models for real time applications, providing smaller errors and high patient-specificity. The proposed approach addresses the current needs of image-guided surgical systems and has the potential to be employed to model the deformation of any type of soft tissue. Copyright © 2017 Elsevier B.V. All rights reserved.

  10. Memory Transformation Enhances Reinforcement Learning in Dynamic Environments.

    Science.gov (United States)

    Santoro, Adam; Frankland, Paul W; Richards, Blake A

    2016-11-30

    Over the course of systems consolidation, there is a switch from a reliance on detailed episodic memories to generalized schematic memories. This switch is sometimes referred to as "memory transformation." Here we demonstrate a previously unappreciated benefit of memory transformation, namely, its ability to enhance reinforcement learning in a dynamic environment. We developed a neural network that is trained to find rewards in a foraging task where reward locations are continuously changing. The network can use memories for specific locations (episodic memories) and statistical patterns of locations (schematic memories) to guide its search. We find that switching from an episodic to a schematic strategy over time leads to enhanced performance due to the tendency for the reward location to be highly correlated with itself in the short-term, but regress to a stable distribution in the long-term. We also show that the statistics of the environment determine the optimal utilization of both types of memory. Our work recasts the theoretical question of why memory transformation occurs, shifting the focus from the avoidance of memory interference toward the enhancement of reinforcement learning across multiple timescales. As time passes, memories transform from a highly detailed state to a more gist-like state, in a process called "memory transformation." Theories of memory transformation speak to its advantages in terms of reducing memory interference, increasing memory robustness, and building models of the environment. However, the role of memory transformation from the perspective of an agent that continuously acts and receives reward in its environment is not well explored. In this work, we demonstrate a view of memory transformation that defines it as a way of optimizing behavior across multiple timescales. Copyright © 2016 the authors.

  11. Deep neural networks to enable real-time multimessenger astrophysics

    Science.gov (United States)

    George, Daniel; Huerta, E. A.

    2018-02-01

    Gravitational wave astronomy has set in motion a scientific revolution. To further enhance the science reach of this emergent field of research, there is a pressing need to increase the depth and speed of the algorithms used to enable these ground-breaking discoveries. We introduce Deep Filtering, a new scalable machine learning method for end-to-end time-series signal processing. Deep Filtering is based on deep learning with two deep convolutional neural networks, which are designed for classification and regression, to detect gravitational wave signals in highly noisy time-series data streams and also estimate the parameters of their sources in real time. Acknowledging that some of the most sensitive algorithms for the detection of gravitational waves are based on implementations of matched filtering, and that a matched filter is the optimal linear filter in Gaussian noise, the application of Deep Filtering using whitened signals in Gaussian noise is investigated in this foundational article. The results indicate that Deep Filtering outperforms conventional machine learning techniques and achieves performance similar to matched filtering while being several orders of magnitude faster, allowing real-time signal processing with minimal resources. Furthermore, we demonstrate that Deep Filtering can detect and characterize waveform signals emitted from new classes of eccentric or spin-precessing binary black holes, even when trained with data sets of only quasicircular binary black hole waveforms. The results presented in this article, and the recent use of deep neural networks for the identification of optical transients in telescope data, suggest that deep learning can facilitate real-time searches of gravitational wave sources and their electromagnetic and astroparticle counterparts. In the subsequent article, the framework introduced herein is directly applied to identify and characterize gravitational wave events in real LIGO data.

  12. Challenges in the Verification of Reinforcement Learning Algorithms

    Science.gov (United States)

    Van Wesel, Perry; Goodloe, Alwyn E.

    2017-01-01

    Machine learning (ML) is increasingly being applied to a wide array of domains from search engines to autonomous vehicles. These algorithms, however, are notoriously complex and hard to verify. This work looks at the assumptions underlying machine learning algorithms as well as some of the challenges in trying to verify ML algorithms. Furthermore, we focus on the specific challenges of verifying reinforcement learning algorithms. These are highlighted using a specific example. Ultimately, we do not offer a solution to the complex problem of ML verification, but point out possible approaches for verification and interesting research opportunities.

  13. Real-Time Strategy Video Game Experience and Visual Perceptual Learning.

    Science.gov (United States)

    Kim, Yong-Hwan; Kang, Dong-Wha; Kim, Dongho; Kim, Hye-Jin; Sasaki, Yuka; Watanabe, Takeo

    2015-07-22

    Visual perceptual learning (VPL) is defined as long-term improvement in performance on a visual-perception task after visual experiences or training. Early studies have found that VPL is highly specific for the trained feature and location, suggesting that VPL is associated with changes in the early visual cortex. However, the generality of visual skills enhancement attributable to action video-game experience suggests that VPL can result from improvement in higher cognitive skills. If so, experience in real-time strategy (RTS) video-game play, which may heavily involve cognitive skills, may also facilitate VPL. To test this hypothesis, we compared VPL between RTS video-game players (VGPs) and non-VGPs (NVGPs) and elucidated underlying structural and functional neural mechanisms. Healthy young human subjects underwent six training sessions on a texture discrimination task. Diffusion-tensor and functional magnetic resonance imaging were performed before and after training. VGPs performed better than NVGPs in the early phase of training. White-matter connectivity between the right external capsule and visual cortex and neuronal activity in the right inferior frontal gyrus (IFG) and anterior cingulate cortex (ACC) were greater in VGPs than NVGPs and were significantly correlated with RTS video-game experience. In both VGPs and NVGPs, there was task-related neuronal activity in the right IFG, ACC, and striatum, which was strengthened after training. These results indicate that RTS video-game experience, associated with changes in higher-order cognitive functions and connectivity between visual and cognitive areas, facilitates VPL in early phases of training. The results support the hypothesis that VPL can occur without involvement of only visual areas. Significance statement: Although early studies found that visual perceptual learning (VPL) is associated with involvement of the visual cortex, generality of visual skills enhancement by action video-game experience

  14. What Can Reinforcement Learning Teach Us About Non-Equilibrium Quantum Dynamics

    Science.gov (United States)

    Bukov, Marin; Day, Alexandre; Sels, Dries; Weinberg, Phillip; Polkovnikov, Anatoli; Mehta, Pankaj

    Equilibrium thermodynamics and statistical physics are the building blocks of modern science and technology. Yet, our understanding of thermodynamic processes away from equilibrium is largely missing. In this talk, I will reveal the potential of what artificial intelligence can teach us about the complex behaviour of non-equilibrium systems. Specifically, I will discuss the problem of finding optimal drive protocols to prepare a desired target state in quantum mechanical systems by applying ideas from Reinforcement Learning (one can think of Reinforcement Learning as the study of how an agent, e.g. a robot, can learn and perfect a given policy through interactions with an environment). The driving protocols learnt by our agent suggest that the non-equilibrium world features possibilities easily defying intuition based on equilibrium physics.

  15. Social Learning, Reinforcement and Crime: Evidence from Three European Cities

    Science.gov (United States)

    Tittle, Charles R.; Antonaccio, Olena; Botchkovar, Ekaterina

    2012-01-01

    This study reports a cross-cultural test of Social Learning Theory using direct measures of social learning constructs and focusing on the causal structure implied by the theory. Overall, the results strongly confirm the main thrust of the theory. Prior criminal reinforcement and current crime-favorable definitions are highly related in all three…

  16. Modeling Avoidance in Mood and Anxiety Disorders Using Reinforcement Learning.

    Science.gov (United States)

    Mkrtchian, Anahit; Aylward, Jessica; Dayan, Peter; Roiser, Jonathan P; Robinson, Oliver J

    2017-10-01

    Serious and debilitating symptoms of anxiety are the most common mental health problem worldwide, accounting for around 5% of all adult years lived with disability in the developed world. Avoidance behavior (avoiding social situations for fear of embarrassment, for instance) is a core feature of such anxiety. However, as for many other psychiatric symptoms, the biological mechanisms underlying avoidance remain unclear. Reinforcement learning models provide formal and testable characterizations of the mechanisms of decision making; here, we examine avoidance in these terms. A total of 101 healthy participants and individuals with mood and anxiety disorders completed an approach-avoidance go/no-go task under stress induced by threat of unpredictable shock. We show an increased reliance in the mood and anxiety group on a parameter of our reinforcement learning model that characterizes a prepotent (pavlovian) bias to withhold responding in the face of negative outcomes. This was particularly the case when the mood and anxiety group was under stress. This formal description of avoidance within the reinforcement learning framework provides a new means of linking clinical symptoms with biophysically plausible models of neural circuitry and, as such, takes us closer to a mechanistic understanding of mood and anxiety disorders. Copyright © 2017 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.

  17. Reinforcement Learning for Online Control of Evolutionary Algorithms

    NARCIS (Netherlands)

    Eiben, A.; Horvath, Mark; Kowalczyk, Wojtek; Schut, Martijn

    2007-01-01

    The research reported in this paper is concerned with assessing the usefulness of reinforcement learning (RL) for on-line calibration of parameters in evolutionary algorithms (EA). We are running an RL procedure and the EA simultaneously and the RL is changing the EA parameters on-the-fly. We

  18. Video Demo: Deep Reinforcement Learning for Coordination in Traffic Light Control

    NARCIS (Netherlands)

    van der Pol, E.; Oliehoek, F.A.; Bosse, T.; Bredeweg, B.

    2016-01-01

    This video demonstration contrasts two approaches to coordination in traffic light control using reinforcement learning: earlier work, based on a deconstruction of the state space into a linear combination of vehicle states, and our own approach based on the Deep Q-learning algorithm.

  19. Fiber-reinforced concretes with a high fiber volume fraction — a look in future. Can a design determine the fiber amount in concrete in real time in every part of a structure in production?

    Science.gov (United States)

    Tepfers, R.

    2010-09-01

    In the near future, when the load-bearing capacity of fiber-only-reinforced concrete members can be safely guaranteed, it might be possible to omit the ordinary continuous steel reinforcing bars. For the time being, it is difficult to change the fiber amount during casting with today's techniques. Therefore, the fiber concentration has to be determined by the maximum tensile stress in concrete structural members, resulting in an unnecessary fiber addition in compressed zones. However, if the right amount of fibers could be regulated and added to concrete in real time at the pump outlet, a future vision could be to design and produce a structure by using FEM-controlled equipment. The signals from calculation results could be transmitted to a concrete casting system that adds the necessary amount of fibers to take care of the actual tensile stresses in the right position in the structure. The casting location could be determined by using a GPS to position the pump outlet horizontally and a laser to position it vertically. The addition of fibers to concrete at the outlet of a concrete pump, proportioned there according to the actual needs of the stress situation in a structure as given by a FEM analysis in real time, is a future challenge. The FEM analysis has to be based on material properties of fiber-only-reinforced concrete. This means that the resistance and stiffness of different-strength concrete members with a varying fiber content have to be determined in tests and conveyed to the FEM analysis. The FEM analysis has to be completed and checked before the casting. Then it can be used as the basis for adding a correct amount of fibers to concrete in every part of the structure. Thus, a system for introducing a correct amount of fibers into concrete has to be developed. The fibers have to be added at the outlet of the concrete pump. Maybe a system to shotcrete concrete with electronically controlled fiber

  20. Machine-learning-based Brokers for Real-time Classification of the LSST Alert Stream

    Science.gov (United States)

    Narayan, Gautham; Zaidi, Tayeb; Soraisam, Monika D.; Wang, Zhe; Lochner, Michelle; Matheson, Thomas; Saha, Abhijit; Yang, Shuo; Zhao, Zhenge; Kececioglu, John; Scheidegger, Carlos; Snodgrass, Richard T.; Axelrod, Tim; Jenness, Tim; Maier, Robert S.; Ridgway, Stephen T.; Seaman, Robert L.; Evans, Eric Michael; Singh, Navdeep; Taylor, Clark; Toeniskoetter, Jackson; Welch, Eric; Zhu, Songzhe; The ANTARES Collaboration

    2018-05-01

    The unprecedented volume and rate of transient events that will be discovered by the Large Synoptic Survey Telescope (LSST) demand that the astronomical community update its follow-up paradigm. Alert-brokers (automated software systems that sift through, characterize, annotate, and prioritize events for follow-up) will be critical tools for managing alert streams in the LSST era. The Arizona-NOAO Temporal Analysis and Response to Events System (ANTARES) is one such broker. In this work, we develop a machine learning pipeline to characterize and classify variable and transient sources using only the available multiband optical photometry. We describe three illustrative stages of the pipeline, serving the three goals of early, intermediate, and retrospective classification of alerts. The first takes the form of variable versus transient categorization, the second a multiclass typing of the combined variable and transient data set, and the third a purity-driven subtyping of a transient class. Although several similar algorithms have proven themselves in simulations, we validate their performance on real observations for the first time. We quantitatively evaluate our pipeline on sparse, unevenly sampled, heteroskedastic data from various existing observational campaigns, and demonstrate very competitive classification performance. We describe our progress toward adapting the pipeline developed in this work into a real-time broker working on live alert streams from time-domain surveys.

  1. Perception-based Co-evolutionary Reinforcement Learning for UAV Sensor Allocation

    National Research Council Canada - National Science Library

    Berenji, Hamid

    2003-01-01

    .... A Perception-based reasoning approach based on co-evolutionary reinforcement learning was developed for jointly addressing sensor allocation on each individual UAV and allocation of a team of UAVs...

  2. Stochastic abstract policies: generalizing knowledge to improve reinforcement learning.

    Science.gov (United States)

    Koga, Marcelo L; Freire, Valdinei; Costa, Anna H R

    2015-01-01

    Reinforcement learning (RL) enables an agent to learn behavior by acquiring experience through trial-and-error interactions with a dynamic environment. However, knowledge is usually built from scratch and learning to behave may take a long time. Here, we improve the learning performance by leveraging prior knowledge; that is, the learner shows proper behavior from the beginning of a target task, using the knowledge from a set of known, previously solved, source tasks. In this paper, we argue that building stochastic abstract policies that generalize over past experiences is an effective way to provide such improvement and that this generalization outperforms the current practice of using a library of policies. We achieve this by contributing a new algorithm, AbsProb-PI-multiple, and a framework for transferring knowledge represented as a stochastic abstract policy to new RL tasks. Stochastic abstract policies offer an effective way to encode knowledge because the abstraction they provide not only generalizes solutions but also facilitates extracting the similarities among tasks. We perform experiments in a robotic navigation environment, analyze the agent's behavior throughout the learning process, and assess the transfer ratio for different numbers of source tasks. We compare our method with the transfer of a library of policies, and experiments show that the use of a generalized policy produces better results by more effectively guiding the agent when learning a target task.

  3. Reinforcement learning design-based adaptive tracking control with less learning parameters for nonlinear discrete-time MIMO systems.

    Science.gov (United States)

    Liu, Yan-Jun; Tang, Li; Tong, Shaocheng; Chen, C L Philip; Li, Dong-Juan

    2015-01-01

    Based on the neural network (NN) approximator, an online reinforcement learning algorithm is proposed for a class of affine multiple input and multiple output (MIMO) nonlinear discrete-time systems with unknown functions and disturbances. In the design procedure, two networks are provided, where one is an action network to generate an optimal control signal and the other is a critic network to approximate the cost function. An optimal control signal and adaptation laws can be generated based on the two NNs. In the previous approaches, the weights of the critic and action networks are updated based on the gradient descent rule and the estimations of optimal weight vectors are directly adjusted in the design. Consequently, compared with the existing results, the main contributions of this paper are: 1) only two parameters need to be adjusted, and thus the number of adaptation laws is smaller than in the previous results and 2) the updating parameters do not depend on the number of the subsystems for MIMO systems and the tuning rules are replaced by adjusting the norms on optimal weight vectors in both action and critic networks. It is proven that the tracking errors, the adaptation laws, and the control inputs are uniformly bounded using the Lyapunov analysis method. Simulation examples are employed to illustrate the effectiveness of the proposed algorithm.

  4. Off-policy reinforcement learning for H∞ control design.

    Science.gov (United States)

    Luo, Biao; Wu, Huai-Ning; Huang, Tingwen

    2015-01-01

    The H∞ control design problem is considered for nonlinear systems with unknown internal system model. It is known that the nonlinear H∞ control problem can be transformed into solving the so-called Hamilton-Jacobi-Isaacs (HJI) equation, which is a nonlinear partial differential equation that is generally impossible to solve analytically. Even worse, model-based approaches cannot be used for approximately solving the HJI equation when the accurate system model is unavailable or costly to obtain in practice. To overcome these difficulties, an off-policy reinforcement learning (RL) method is introduced to learn the solution of the HJI equation from real system data instead of a mathematical system model, and its convergence is proved. In the off-policy RL method, the system data can be generated with arbitrary policies rather than the evaluating policy, which is extremely important and promising for practical systems. For implementation purposes, a neural network (NN)-based actor-critic structure is employed and a least-square NN weight update algorithm is derived based on the method of weighted residuals. Finally, the developed NN-based off-policy RL method is tested on a linear F16 aircraft plant, and further applied to a rotational/translational actuator system.

  5. A Neuro-Control Design Based on Fuzzy Reinforcement Learning

    DEFF Research Database (Denmark)

    Katebi, S.D.; Blanke, M.

    This paper describes a neuro-control fuzzy critic design procedure based on reinforcement learning. An important component of the proposed intelligent control configuration is the fuzzy credit assignment unit which acts as a critic, and through fuzzy implications provides adjustment mechanisms....... The fuzzy credit assignment unit comprises a fuzzy system with the appropriate fuzzification, knowledge base and defuzzification components. When an external reinforcement signal (a failure signal) is received, sequences of control actions are evaluated and modified by the action applier unit. The desirable...... ones instruct the neuro-control unit to adjust its weights and are simultaneously stored in the memory unit during the training phase. In response to the internal reinforcement signal (set point threshold deviation), the stored information is retrieved by the action applier unit and utilized for re...

  6. Emotion in reinforcement learning agents and robots : A survey

    NARCIS (Netherlands)

    Moerland, T.M.; Broekens, D.J.; Jonker, C.M.

    2018-01-01

    This article provides the first survey of computational models of emotion in reinforcement learning (RL) agents. The survey focuses on agent/robot emotions, and mostly ignores human user emotions. Emotions are recognized as functional in decision-making by influencing motivation and action

  7. Real-time regression analysis with deep convolutional neural networks

    OpenAIRE

    Huerta, E. A.; George, Daniel; Zhao, Zhizhen; Allen, Gabrielle

    2018-01-01

    We discuss the development of novel deep learning algorithms to enable real-time regression analysis for time series data. We showcase the application of this new method with a timely case study, and then discuss the applicability of this approach to tackle similar challenges across science domains.

  8. A Day-to-Day Route Choice Model Based on Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Fangfang Wei

    2014-01-01

    Day-to-day traffic dynamics are generated by individual travelers’ route choice and route adjustment behaviors, which are well suited to study with agent-based models and learning theory. In this paper, we propose a day-to-day route choice model based on reinforcement learning and multiagent simulation. Travelers’ memory, learning rate, and experience cognition are taken into account. Then the model is verified and analyzed. Results show that the network flow can converge to user equilibrium (UE) if travelers can remember all the travel times they have experienced, which is not necessarily the case under limited memory; learning rate can strengthen the flow fluctuation, whereas memory has the opposite effect; moreover, a high learning rate results in cyclical oscillation during the process of flow evolution. Finally, the scenarios of link capacity degradation and random link capacity are both used to illustrate the model’s applications. Analyses and applications of our model demonstrate that the model is reasonable and useful for studying day-to-day traffic dynamics.

  9. CAD2RL: Real Single-Image Flight without a Single Real Image

    OpenAIRE

    Sadeghi, Fereshteh; Levine, Sergey

    2016-01-01

    Deep reinforcement learning has emerged as a promising and powerful technique for automatically acquiring control policies that can process raw sensory inputs, such as images, and perform complex behaviors. However, extending deep RL to real-world robotic tasks has proven challenging, particularly in safety-critical domains such as autonomous flight, where a trial-and-error learning process is often impractical. In this paper, we explore the following question: can we train vision-based navig...

  10. Real Time Revisited

    Science.gov (United States)

    Allen, Phillip G.

    1985-12-01

    The call for abolishing photo reconnaissance in favor of real time is once more being heard. Ten years ago the same cries were being heard with the introduction of the Charge Coupled Device (CCD). The real time system problems that existed then and stopped real time proliferation have not been solved. The lack of an organized program by either DoD or industry has hampered any efforts to solve the problems, and as such, very little has happened in real time in the last ten years. Real time is not a replacement for photo, just as photo is not a replacement for infra-red or radar. Operational real time sensors can be designed only after their role has been defined and improvements made to the weak links in the system. Plodding ahead on a real time reconnaissance suite without benefit of evaluation of utility will allow this same paper to be used ten years from now.

  11. MECAR (Main Ring Excitation Controller and Regulator): A real time learning regulator for the Fermilab Main Ring or the Main Injector synchrotron

    International Nuclear Information System (INIS)

    Flora, R.; Martin, K.; Moibenko, A.; Pfeffer, H.; Wolff, D.; Prieto, P.; Hays, S.

    1995-04-01

    The real time computer for controlling and regulating the FNAL Main Ring power supplies has been upgraded with a new learning control system. The learning time of the system has been reduced by an order of magnitude, mostly through the implementation of a 95 tap FIR filter in the learning algorithm. The magnet system consists of three buses, which must track each other during a ramp from 100 to 1700 amps at a 2.4 second repetition rate. This paper will present the system configuration and the tools used during development and testing.

  12. Optimal Control via Reinforcement Learning with Symbolic Policy Approximation

    NARCIS (Netherlands)

    Kubalík, Jiří; Alibekov, Eduard; Babuska, R.; Dochain, Denis; Henrion, Didier; Peaucelle, Dimitri

    2017-01-01

    Model-based reinforcement learning (RL) algorithms can be used to derive optimal control laws for nonlinear dynamic systems. With continuous-valued state and input variables, RL algorithms have to rely on function approximators to represent the value function and policy mappings. This paper

  13. Learning Similar Actions by Reinforcement or Sensory-Prediction Errors Rely on Distinct Physiological Mechanisms.

    Science.gov (United States)

    Uehara, Shintaro; Mawase, Firas; Celnik, Pablo

    2017-09-14

    Humans can acquire knowledge of new motor behavior via different forms of learning. The two forms most commonly studied have been the development of internal models based on sensory-prediction errors (error-based learning) and success-based feedback (reinforcement learning). Human behavioral studies suggest these are distinct learning processes, though the neurophysiological mechanisms that are involved have not been characterized. Here, we evaluated physiological markers from the cerebellum and the primary motor cortex (M1) using noninvasive brain stimulations while healthy participants trained on finger-reaching tasks. We manipulated the extent to which subjects rely on error-based or reinforcement learning by providing either vector or binary feedback about task performance. Our results demonstrated a double dissociation where learning the task mainly via error-based mechanisms leads to cerebellar plasticity modifications but not long-term potentiation (LTP)-like plasticity changes in M1, while learning a similar action via reinforcement mechanisms elicited M1 LTP-like plasticity but not cerebellar plasticity changes. Our findings indicate that learning complex motor behavior is mediated by the interplay of different forms of learning, weighing distinct neural mechanisms in M1 and the cerebellum. Our study provides insights for designing effective interventions to enhance human motor learning. © The Author 2017. Published by Oxford University Press. All rights reserved.

  14. Optimal control in microgrid using multi-agent reinforcement learning.

    Science.gov (United States)

    Li, Fu-Dong; Wu, Min; He, Yong; Chen, Xin

    2012-11-01

    This paper presents an improved reinforcement learning method to minimize electricity costs on the premise of satisfying the power balance and generation limit of units in a microgrid in grid-connected mode. Firstly, the microgrid control requirements are analyzed and the objective function of optimal control for the microgrid is proposed. Then, a state variable, "Average Electricity Price Trend," which is used to express the most likely transitions of the system, is developed so as to reduce the complexity and randomness of the microgrid, and a multi-agent architecture including agents, state variables, action variables and reward function is formulated. Furthermore, dynamic hierarchical reinforcement learning, based on the change rate of the key state variable, is established to carry out optimal policy exploration. The analysis shows that the proposed method helps handle the "curse of dimensionality" and speeds up learning in an unknown large-scale world. Finally, the simulation results under JADE (Java Agent Development Framework) demonstrate the validity of the presented method in optimal control for a microgrid in grid-connected mode. Copyright © 2012 ISA. Published by Elsevier Ltd. All rights reserved.

  15. Cardiac Concomitants of Feedback and Prediction Error Processing in Reinforcement Learning

    Science.gov (United States)

    Kastner, Lucas; Kube, Jana; Villringer, Arno; Neumann, Jane

    2017-01-01

    Successful learning hinges on the evaluation of positive and negative feedback. We assessed differential learning from reward and punishment in a monetary reinforcement learning paradigm, together with cardiac concomitants of positive and negative feedback processing. On the behavioral level, learning from reward resulted in more advantageous behavior than learning from punishment, suggesting a differential impact of reward and punishment on successful feedback-based learning. On the autonomic level, learning and feedback processing were closely mirrored by phasic cardiac responses on a trial-by-trial basis: (1) Negative feedback was accompanied by faster and prolonged heart rate deceleration compared to positive feedback. (2) Cardiac responses shifted from feedback presentation at the beginning of learning to stimulus presentation later on. (3) Most importantly, the strength of phasic cardiac responses to the presentation of feedback correlated with the strength of prediction error signals that alert the learner to the necessity for behavioral adaptation. Considering participants' weight status and gender revealed obesity-related deficits in learning to avoid negative consequences and less consistent behavioral adaptation in women compared to men. In sum, our results provide strong new evidence for the notion that during learning phasic cardiac responses reflect an internal value and feedback monitoring system that is sensitive to the violation of performance-based expectations. Moreover, inter-individual differences in weight status and gender may affect both behavioral and autonomic responses in reinforcement-based learning. PMID:29163004

  16. Cardiac Concomitants of Feedback and Prediction Error Processing in Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Lucas Kastner

    2017-10-01

    Successful learning hinges on the evaluation of positive and negative feedback. We assessed differential learning from reward and punishment in a monetary reinforcement learning paradigm, together with cardiac concomitants of positive and negative feedback processing. On the behavioral level, learning from reward resulted in more advantageous behavior than learning from punishment, suggesting a differential impact of reward and punishment on successful feedback-based learning. On the autonomic level, learning and feedback processing were closely mirrored by phasic cardiac responses on a trial-by-trial basis: (1) Negative feedback was accompanied by faster and prolonged heart rate deceleration compared to positive feedback. (2) Cardiac responses shifted from feedback presentation at the beginning of learning to stimulus presentation later on. (3) Most importantly, the strength of phasic cardiac responses to the presentation of feedback correlated with the strength of prediction error signals that alert the learner to the necessity for behavioral adaptation. Considering participants' weight status and gender revealed obesity-related deficits in learning to avoid negative consequences and less consistent behavioral adaptation in women compared to men. In sum, our results provide strong new evidence for the notion that during learning phasic cardiac responses reflect an internal value and feedback monitoring system that is sensitive to the violation of performance-based expectations. Moreover, inter-individual differences in weight status and gender may affect both behavioral and autonomic responses in reinforcement-based learning.

  17. Reinforcement Learning Based Data Self-Destruction Scheme for Secured Data Management

    Directory of Open Access Journals (Sweden)

    Young Ki Kim

    2018-04-01

    Full Text Available As technologies and services that leverage cloud computing have evolved, the number of businesses and individuals who use them is increasing rapidly. In the course of using cloud services, as users store and use data that include personal information, research on privacy protection models to protect sensitive information in the cloud environment is becoming more important. As a solution to this problem, a self-destructing scheme has been proposed that prevents the decryption of encrypted user data after a certain period of time using a Distributed Hash Table (DHT) network. However, the existing self-destructing scheme does not mention how to set the number of key shares and the threshold value considering the environment of the dynamic DHT network. This paper proposes a method to set the parameters to generate the key shares needed for the self-destructing scheme considering the availability and security of data. The proposed method defines state, action, and reward of the reinforcement learning model based on the similarity of the graph, and applies the self-destructing scheme process by updating the parameter based on the reinforcement learning model. Through the proposed technique, key sharing parameters can be set in consideration of data availability and security in dynamic DHT network environments.

  18. Real-time, adaptive machine learning for non-stationary, near chaotic gasoline engine combustion time series.

    Science.gov (United States)

    Vaughan, Adam; Bohac, Stanislav V

    2015-10-01

    Fuel efficient Homogeneous Charge Compression Ignition (HCCI) engine combustion timing predictions must contend with non-linear chemistry, non-linear physics, period doubling bifurcation(s), turbulent mixing, model parameters that can drift day-to-day, and air-fuel mixture state information that cannot typically be resolved on a cycle-to-cycle basis, especially during transients. In previous work, an abstract cycle-to-cycle mapping function coupled with ϵ-Support Vector Regression was shown to predict experimentally observed cycle-to-cycle combustion timing over a wide range of engine conditions, despite some of the aforementioned difficulties. The main limitation of the previous approach was that a partially acausal, randomly sampled training dataset was used to train proof-of-concept offline predictions. The objective of this paper is to address this limitation by proposing a new online adaptive Extreme Learning Machine (ELM) extension named Weighted Ring-ELM. This extension enables fully causal combustion timing predictions at randomly chosen engine set points, and is shown to achieve results that are as good as or better than the previous offline method. The broader objective of this approach is to enable a new class of real-time model predictive control strategies for high variability HCCI and, ultimately, to bring HCCI's low engine-out NOx and reduced CO2 emissions to production engines. Copyright © 2015 Elsevier Ltd. All rights reserved.

  19. Ensemble Network Architecture for Deep Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Xi-liang Chen

    2018-01-01

    Full Text Available The popular deep Q-learning algorithm is known to be unstable because Q-values fluctuate and action values are overestimated under certain conditions. These issues tend to adversely affect its performance. In this paper, we develop an ensemble network architecture for deep reinforcement learning which is based on value function approximation. The temporal ensemble stabilizes the training process by reducing the variance of the target approximation error, and the ensemble of target values reduces overestimation and improves performance by estimating Q-values more accurately. Our results show that this architecture leads to statistically significantly better value evaluation and more stable and better performance on several classical control tasks in the OpenAI Gym environment.
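
    The effect of averaging target values before taking the maximum can be illustrated with a minimal sketch. It is not the architecture from the record above: the function name `ensemble_td_target`, the list-of-snapshots input, and the default discount factor are all assumptions made for illustration.

```python
def ensemble_td_target(reward, next_q_snapshots, gamma=0.99, done=False):
    """One-step TD target that averages Q(s', a) over K target snapshots.

    next_q_snapshots: list of K lists; each inner list holds Q(s', a) for
    every action a, as estimated by one (hypothetical) target-network snapshot.
    """
    if done:
        # Terminal transition: no bootstrapping from the next state
        return reward
    k = len(next_q_snapshots)
    n_actions = len(next_q_snapshots[0])
    # Average each action's value across snapshots first ...
    mean_q = [sum(snap[a] for snap in next_q_snapshots) / k
              for a in range(n_actions)]
    # ... then take the max, so one snapshot's spike is damped
    return reward + gamma * max(mean_q)


# Two snapshots that disagree on which action is best
snapshots = [[1.0, 3.0], [3.0, 1.0]]
averaged = ensemble_td_target(1.0, snapshots, gamma=0.5)
single = ensemble_td_target(1.0, snapshots[:1], gamma=0.5)
```

    In this toy case the averaged target is lower than the target built from a single snapshot, which is the kind of overestimation damping the abstract describes.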

  20. Real-Time Video Stylization Using Object Flows.

    Science.gov (United States)

    Lu, Cewu; Xiao, Yao; Tang, Chi-Keung

    2017-05-05

    We present a real-time video stylization system and demonstrate a variety of painterly styles rendered on real video inputs. The key technical contribution lies in the object flow, which is robust to inaccurate optical flow, unknown object transformation and partial occlusion. Since object flows relate regions of the same object across frames, the shower-door effect can be effectively reduced where painterly strokes and textures are rendered on video objects. The construction of object flows is performed in real time and automatically after applying metric learning. To reduce temporal flickering, we extend bilateral filtering into motion bilateral filtering. We propose quantitative metrics to measure the temporal coherence on structures and textures of our stylized videos, and perform extensive experiments to compare our stylized results with baseline systems and prior works specializing in watercolor and abstraction.

  1. Real-time learning of predictive recognition categories that chunk sequences of items stored in working memory

    Directory of Open Access Journals (Sweden)

    Stephen eGrossberg

    2014-10-01

    Full Text Available How are sequences of events that are temporarily stored in a cognitive working memory unitized, or chunked, through learning? Such sequential learning is needed by the brain in order to enable language, spatial understanding, and motor skills to develop. In particular, how does the brain learn categories, or list chunks, that become selectively tuned to different temporal sequences of items in lists of variable length as they are stored in working memory, and how does this learning process occur in real time? The present article introduces a neural model that simulates learning of such list chunks. In this model, sequences of items are temporarily stored in an Item-and-Order, or competitive queuing, working memory before learning categorizes them using a categorization network, called a Masking Field, which is a self-similar, multiple-scale, recurrent on-center off-surround network that can weigh the evidence for variable-length sequences of items as they are stored in the working memory through time. A Masking Field hereby activates the learned list chunks that represent the most predictive item groupings at any time, while suppressing less predictive chunks. In a network with a given number of input items, all possible ordered sets of these item sequences, up to a fixed length, can be learned with unsupervised or supervised learning. The self-similar multiple-scale properties of Masking Fields interacting with an Item-and-Order working memory provide a natural explanation of George Miller's Magical Number Seven and Nelson Cowan's Magical Number Four. The article explains why linguistic, spatial, and action event sequences may all be stored by Item-and-Order working memories that obey similar design principles, and thus how the current results may apply across modalities. 
Item-and-Order properties may readily be extended to Item-Order-Rank working memories in which the same item can be stored in multiple list positions, or ranks, as in the list

  2. Integrating distributed Bayesian inference and reinforcement learning for sensor management

    NARCIS (Netherlands)

    Grappiolo, C.; Whiteson, S.; Pavlin, G.; Bakker, B.

    2009-01-01

    This paper introduces a sensor management approach that integrates distributed Bayesian inference (DBI) and reinforcement learning (RL). DBI is implemented using distributed perception networks (DPNs), a multiagent approach to performing efficient inference, while RL is used to automatically

  3. Learning User Preferences in Ubiquitous Systems: A User Study and a Reinforcement Learning Approach

    OpenAIRE

    Zaidenberg , Sofia; Reignier , Patrick; Mandran , Nadine

    2010-01-01

    International audience; Our study concerns a virtual assistant, proposing services to the user based on the user's current perceived activity and situation (ambient intelligence). Instead of asking the user to define his preferences, we acquire them automatically using a reinforcement learning approach. Experiments showed that our system succeeded in learning user preferences. In order to validate the relevance and usability of such a system, we have first conducted a user study. 26 non-expert s...

  4. A Novel Real-Time Speech Summarizer System for the Learning of Sustainability

    Directory of Open Access Journals (Sweden)

    Hsiu-Wen Wang

    2015-04-01

    Full Text Available As the number of speech and video documents increases on the Internet and portable devices proliferate, speech summarization becomes increasingly essential. Relevant research in this domain has typically focused on broadcasts and news; however, the automatic summarization methods used in the past may not apply to other speech domains (e.g., speech in lectures). Therefore, this study explores the lecture speech domain. The features used in previous research were analyzed and suitable features were selected following experimentation; subsequently, a three-phase real-time speech summarizer for the learning of sustainability (RTSSLS) was proposed. Phase One involved selecting independent features (e.g., centrality, resemblance to the title, sentence length, term frequency, and thematic words) and calculating the independent feature scores; Phase Two involved calculating the dependent features, such as the position compared with the independent feature scores; and Phase Three involved comparing these feature scores to obtain weighted averages of the function-scores, determine the highest-scoring sentence, and provide a summary. In practical results, the accuracies of macro-average and micro-average for the RTSSLS were 70% and 73%, respectively. Therefore, using a RTSSLS can enable users to acquire key speech information for the learning of sustainability.

  5. Field Demonstration of Real-Time Wind Turbine Foundation Strain Monitoring.

    Science.gov (United States)

    Rubert, Tim; Perry, Marcus; Fusiek, Grzegorz; McAlorum, Jack; Niewczas, Pawel; Brotherston, Amanda; McCallum, David

    2017-12-31

    Onshore wind turbine foundations are generally over-engineered as their internal stress states are challenging to directly monitor during operation. While there are industry drivers to shift towards more economical foundation designs, making this transition safely will require new monitoring techniques, so that the uncertainties around structural health can be reduced. This paper presents the initial results of a real-time strain monitoring campaign for an operating wind turbine foundation. Selected reinforcement bars were instrumented with metal packaged optical fibre strain sensors prior to concrete casting. In this paper, we outline the sensors' design, characterisation and installation, and present 67 days of operational data. During this time, measured foundation strains did not exceed 95 με, and showed a strong correlation with both measured tower displacements and the results of a foundation finite element model. The work demonstrates that real-time foundation monitoring is not only achievable, but that it has the potential to help operators and policymakers quantify the conservatism of their existing design codes.

  6. Multiagent cooperation and competition with deep reinforcement learning.

    Directory of Open Access Journals (Sweden)

    Ardi Tampuu

    Full Text Available Evolution of cooperation and competition can appear when multiple adaptive agents share a biological, social, or technological niche. In the present work we study how cooperation and competition emerge between autonomous agents that learn by reinforcement while using only their raw visual input as the state representation. In particular, we extend the Deep Q-Learning framework to multiagent environments to investigate the interaction between two learning agents in the well-known video game Pong. By manipulating the classical rewarding scheme of Pong we show how competitive and collaborative behaviors emerge. We also describe the progression from competitive to collaborative behavior when the incentive to cooperate is increased. Finally we show how learning by playing against another adaptive agent, instead of against a hard-wired algorithm, results in more robust strategies. The present work shows that Deep Q-Networks can become a useful tool for studying decentralized learning of multiagent systems coping with high-dimensional environments.

  7. Multiagent cooperation and competition with deep reinforcement learning

    Science.gov (United States)

    Kodelja, Dorian; Kuzovkin, Ilya; Korjus, Kristjan; Aru, Juhan; Aru, Jaan; Vicente, Raul

    2017-01-01

    Evolution of cooperation and competition can appear when multiple adaptive agents share a biological, social, or technological niche. In the present work we study how cooperation and competition emerge between autonomous agents that learn by reinforcement while using only their raw visual input as the state representation. In particular, we extend the Deep Q-Learning framework to multiagent environments to investigate the interaction between two learning agents in the well-known video game Pong. By manipulating the classical rewarding scheme of Pong we show how competitive and collaborative behaviors emerge. We also describe the progression from competitive to collaborative behavior when the incentive to cooperate is increased. Finally we show how learning by playing against another adaptive agent, instead of against a hard-wired algorithm, results in more robust strategies. The present work shows that Deep Q-Networks can become a useful tool for studying decentralized learning of multiagent systems coping with high-dimensional environments. PMID:28380078

  8. Multiagent cooperation and competition with deep reinforcement learning.

    Science.gov (United States)

    Tampuu, Ardi; Matiisen, Tambet; Kodelja, Dorian; Kuzovkin, Ilya; Korjus, Kristjan; Aru, Juhan; Aru, Jaan; Vicente, Raul

    2017-01-01

    Evolution of cooperation and competition can appear when multiple adaptive agents share a biological, social, or technological niche. In the present work we study how cooperation and competition emerge between autonomous agents that learn by reinforcement while using only their raw visual input as the state representation. In particular, we extend the Deep Q-Learning framework to multiagent environments to investigate the interaction between two learning agents in the well-known video game Pong. By manipulating the classical rewarding scheme of Pong we show how competitive and collaborative behaviors emerge. We also describe the progression from competitive to collaborative behavior when the incentive to cooperate is increased. Finally we show how learning by playing against another adaptive agent, instead of against a hard-wired algorithm, results in more robust strategies. The present work shows that Deep Q-Networks can become a useful tool for studying decentralized learning of multiagent systems coping with high-dimensional environments.

  9. Neural prediction errors reveal a risk-sensitive reinforcement-learning process in the human brain.

    Science.gov (United States)

    Niv, Yael; Edlund, Jeffrey A; Dayan, Peter; O'Doherty, John P

    2012-01-11

    Humans and animals are exquisitely, though idiosyncratically, sensitive to risk or variance in the outcomes of their actions. Economic, psychological, and neural aspects of this are well studied when information about risk is provided explicitly. However, we must normally learn about outcomes from experience, through trial and error. Traditional models of such reinforcement learning focus on learning about the mean reward value of cues and ignore higher order moments such as variance. We used fMRI to test whether the neural correlates of human reinforcement learning are sensitive to experienced risk. Our analysis focused on anatomically delineated regions of a priori interest in the nucleus accumbens, where blood oxygenation level-dependent (BOLD) signals have been suggested as correlating with quantities derived from reinforcement learning. We first provide unbiased evidence that the raw BOLD signal in these regions corresponds closely to a reward prediction error. We then derive from this signal the learned values of cues that predict rewards of equal mean but different variance and show that these values are indeed modulated by experienced risk. Moreover, a close neurometric-psychometric coupling exists between the fluctuations of the experience-based evaluations of risky options that we measured neurally and the fluctuations in behavioral risk aversion. This suggests that risk sensitivity is integral to human learning, illuminating economic models of choice, neuroscientific models of affective learning, and the workings of the underlying neural mechanisms.
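
    One common way to make a prediction-error learner sensitive to outcome variance is to use different learning rates for positive and negative prediction errors. The sketch below illustrates that idea only; the abstract above does not specify the model, so the update rule, the names `risk_sensitive_update` and `learn`, and all parameter values are assumptions.

```python
def risk_sensitive_update(value, reward, eta_plus, eta_minus):
    """Update a cue value with asymmetric learning rates (illustrative)."""
    delta = reward - value                 # reward prediction error
    eta = eta_plus if delta > 0 else eta_minus
    return value + eta * delta


def learn(rewards, eta_plus=0.1, eta_minus=0.2, value=0.0):
    """Run the update over a reward sequence and return the final value."""
    for r in rewards:
        value = risk_sensitive_update(value, r, eta_plus, eta_minus)
    return value


# Two hypothetical cues with equal mean reward (20) but different variance
sure_value = learn([20.0] * 200)           # always pays 20
risky_value = learn([40.0, 0.0] * 100)     # alternates between 40 and 0
```

    Because negative errors are weighted more heavily here (`eta_minus > eta_plus`), the risky cue settles at a lower learned value than the sure cue, which would correspond to risk aversion; swapping the two rates would give the opposite bias.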

  10. Multiagent Reinforcement Learning with Regret Matching for Robot Soccer

    Directory of Open Access Journals (Sweden)

    Qiang Liu

    2013-01-01

    Full Text Available This paper proposes a novel multiagent reinforcement learning (MARL) algorithm, Nash-Q learning with regret matching, in which regret matching is used to speed up the well-known MARL algorithm Nash-Q learning. Choosing a suitable action-selection strategy that balances exploration and exploitation is critical to enhancing the online learning ability of Nash-Q learning. In a Markov game, the joint action of agents adopting the regret matching algorithm can converge to a set of no-regret points that can be viewed as a coarse correlated equilibrium, which in essence includes the Nash equilibrium. It can be inferred that regret matching can guide exploration of the state-action space so that the rate of convergence of the Nash-Q learning algorithm can be increased. Simulation results on robot soccer validate that, compared to the original Nash-Q learning algorithm, the use of regret matching during the learning phase of Nash-Q learning yields excellent online learning ability and significant performance gains in terms of scores, average reward and policy convergence.
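
    Regret matching itself is simple to state: each action is played with probability proportional to its positive cumulative regret. The sketch below shows that standard rule in isolation; it does not reproduce how the paper couples it to Nash-Q learning, and the function names are hypothetical.

```python
def regret_matching_strategy(cumulative_regret):
    """Mixed strategy proportional to positive cumulative regrets."""
    positive = [max(r, 0.0) for r in cumulative_regret]
    total = sum(positive)
    n = len(cumulative_regret)
    if total <= 0.0:
        # No action is regretted: fall back to uniform play
        return [1.0 / n] * n
    return [p / total for p in positive]


def update_regret(cumulative_regret, payoffs, chosen):
    """Add this round's regrets.

    payoffs[a] is the payoff action a would have earned against the other
    agents' actual play; chosen is the index of the action actually taken.
    """
    return [r + (payoffs[a] - payoffs[chosen])
            for a, r in enumerate(cumulative_regret)]


regrets = update_regret([0.0, 0.0], payoffs=[1.0, 3.0], chosen=0)
strategy = regret_matching_strategy(regrets)
```

    After one round in which the second action would have paid more, all probability mass shifts to it; over many rounds the regrets of the other actions accumulate as well and the strategy becomes mixed.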

  11. Enabling Real-Time Video Services over Ad-Hoc Networks Opens the Gates for E-learning in Areas Lacking Infrastructure

    Directory of Open Access Journals (Sweden)

    Johannes Karlsson

    2009-10-01

    Full Text Available In this paper we suggest a promising solution to overcome the problems of delivering e-learning to areas with missing or deficient infrastructure for Internet and mobile communication. We present a simple, reasonably priced and efficient communication platform for providing e-learning. This platform is based on wireless ad-hoc networks. We also present a preemptive routing protocol suitable for real-time video communication over wireless ad-hoc networks. Our results show that this routing protocol can significantly improve the quality of the received video. This makes our suggested system not only able to overcome the infrastructure barrier but also capable of delivering high-quality e-learning material.

  12. Multiagent Reinforcement Learning Dynamic Spectrum Access in Cognitive Radios

    Directory of Open Access Journals (Sweden)

    Wu Chun

    2014-02-01

    Full Text Available A multiuser independent Q-learning method that requires no information exchange is proposed for multiuser dynamic spectrum access in cognitive radios. The method adopts a self-learning paradigm in which each CR user performs reinforcement learning only by observing its individual performance reward, without spending communication resources on information exchange with others. The reward is defined to represent channel quality and channel conflict status. A learning strategy of sufficient exploration, preference for good channels, and punishment for channel conflict is designed to implement multiuser dynamic spectrum access. For the two-user, two-channel scenario, a fast learning algorithm is proposed and its convergence to the maximal total reward is proved. The simulation results show that, with the proposed method, the CR system converges to a Nash equilibrium with high probability and achieves a high total reward.
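
    The two-user, two-channel setting can be sketched with stateless independent Q-learning, where each user updates only its own table from its own reward. This is a toy illustration rather than the paper's fast learning algorithm: the reward values, the ε-greedy exploration, and every name below are assumptions.

```python
import random

CHANNEL_QUALITY = [1.0, 0.6]   # hypothetical reward for using a channel alone
CONFLICT_PENALTY = -1.0        # hypothetical punishment when both users collide


def reward(own_channel, other_channel):
    """Individual reward: penalty on conflict, channel quality otherwise."""
    if own_channel == other_channel:
        return CONFLICT_PENALTY
    return CHANNEL_QUALITY[own_channel]


def q_update(q, r, alpha):
    """Stateless Q-learning step: move the estimate toward the reward."""
    return q + alpha * (r - q)


def choose(q_values, epsilon, rng):
    """Epsilon-greedy channel choice from one user's own Q-values."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def simulate(steps=2000, alpha=0.1, epsilon=0.1, seed=0):
    """Both users learn independently; no information is exchanged."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]           # q[user][channel]
    for _ in range(steps):
        actions = [choose(q[u], epsilon, rng) for u in range(2)]
        for u in range(2):
            r = reward(actions[u], actions[1 - u])
            q[u][actions[u]] = q_update(q[u][actions[u]], r, alpha)
    return q


q_tables = simulate()
```

    Each user sees the other only through the conflict penalty in its own reward, which matches the abstract's point that no communication resources are spent on coordination.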

  13. Reinforcement-Learning-Based Robust Controller Design for Continuous-Time Uncertain Nonlinear Systems Subject to Input Constraints.

    Science.gov (United States)

    Liu, Derong; Yang, Xiong; Wang, Ding; Wei, Qinglai

    2015-07-01

    The design of a stabilizing controller for uncertain nonlinear systems with control constraints is a challenging problem. The input constraints, coupled with the inability to identify the uncertainties accurately, motivate the design of a stabilizing controller based on reinforcement-learning (RL) methods. In this paper, a novel RL-based robust adaptive control algorithm is developed for a class of continuous-time uncertain nonlinear systems subject to input constraints. The robust control problem is converted to a constrained optimal control problem by appropriately selecting value functions for the nominal system. Distinct from typical action-critic dual networks employed in RL, only one critic neural network (NN) is constructed to derive the approximate optimal control. Meanwhile, unlike the initial stabilizing control often indispensable in RL, there is no special requirement imposed on the initial control. By utilizing Lyapunov's direct method, the closed-loop optimal control system and the estimated weights of the critic NN are proved to be uniformly ultimately bounded. In addition, the derived approximate optimal control is verified to guarantee the uncertain nonlinear system to be stable in the sense of uniform ultimate boundedness. Two simulation examples are provided to illustrate the effectiveness and applicability of the present approach.

  14. Global reinforcement training of CrossNets

    Science.gov (United States)

    Ma, Xiaolong

    2007-10-01

    Hybrid "CMOL" integrated circuits, incorporating advanced CMOS devices for neural cell bodies, nanowires as axons and dendrites, and latching switches as synapses, may be used for the hardware implementation of extremely dense (10^7 cells and 10^12 synapses per cm^2) neuromorphic networks, operating up to 10^6 times faster than their biological prototypes. We are exploring several "CrossNet" architectures that accommodate the limitations imposed by CMOL hardware and should allow effective training of the networks without a direct external access to individual synapses. Our studies have shown that CrossNets based on simple (two-terminal) crosspoint devices can work well in at least two modes: as Hopfield networks for associative memory and multilayer perceptrons for classification tasks. For more intelligent tasks (such as robot motion control or complex games), which do not have "examples" for supervised learning, more advanced training methods such as the global reinforcement learning are necessary. For application of global reinforcement training algorithms to CrossNets, we have extended Williams's REINFORCE learning principle to a more general framework and derived several learning rules that are more suitable for CrossNet hardware implementation. The results of numerical experiments have shown that these new learning rules can work well for both classification tasks and reinforcement tasks such as the cartpole balancing control problem. Some limitations imposed by the CMOL hardware need to be carefully addressed for the successful application of in situ reinforcement training to CrossNets.

  15. DYNAMIC AND INCREMENTAL EXPLORATION STRATEGY IN FUSION ADAPTIVE RESONANCE THEORY FOR ONLINE REINFORCEMENT LEARNING

    Directory of Open Access Journals (Sweden)

    Budhitama Subagdja

    2016-06-01

    Full Text Available One of the fundamental challenges in reinforcement learning is to set up a proper balance between exploration and exploitation to obtain the maximum cumulative reward in the long run. Most protocols for exploration bound the overall values to a convergent level of performance. If new knowledge is inserted or the environment is suddenly changed, the issue becomes more intricate as the exploration must compromise the pre-existing knowledge. This paper presents a type of multi-channel adaptive resonance theory (ART) neural network model called fusion ART, which serves as a fuzzy approximator for reinforcement learning with inherent features that can regulate the exploration strategy. This intrinsic regulation is driven by the condition of the knowledge learnt so far by the agent. The model offers a stable but incremental reinforcement learning that can involve prior rules as bootstrap knowledge for guiding the agent to select the right action. Experiments in obstacle avoidance and navigation tasks demonstrate that in the configuration of learning wherein the agent learns from scratch, the inherent exploration model in the fusion ART model is comparable to the basic ε-greedy policy. On the other hand, the model is demonstrated to deal with prior knowledge and strike a balance between exploration and exploitation.
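
    For reference, the baseline ε-greedy policy mentioned in this abstract can be written in a few lines. This is a generic sketch of the standard rule, not code from the paper; the function name and the random tie-breaking are choices made here for illustration.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        # Exploration: any action, uniformly at random
        return rng.randrange(len(q_values))
    best = max(q_values)
    # Exploitation: pick among the highest-valued actions, ties broken randomly
    return rng.choice([a for a, q in enumerate(q_values) if q == best])


greedy_action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
```

    With a fixed ε the amount of exploration never adapts to what the agent already knows, which is the limitation the fusion ART model is said to address through its intrinsic regulation.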

  16. A Review of the Relationship between Novelty, Intrinsic Motivation and Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Siddique Nazmul

    2017-11-01

    Full Text Available This paper presents a review on the tri-partite relationship between novelty, intrinsic motivation and reinforcement learning. The paper first presents a literature survey on novelty and the different computational models of novelty detection, with a specific focus on the features of stimuli that trigger a Hedonic value for generating a novelty signal. It then presents an overview of intrinsic motivation and investigations into different models with the aim of exploring deeper co-relationships between specific features of a novelty signal and its effect on intrinsic motivation in producing a reward function. Finally, it presents survey results on reinforcement learning, different models and their functional relationship with intrinsic motivation.

  17. Evolutionary online behaviour learning and adaptation in real robots.

    Science.gov (United States)

    Silva, Fernando; Correia, Luís; Christensen, Anders Lyhne

    2017-07-01

    Online evolution of behavioural control on real robots is an open-ended approach to autonomous learning and adaptation: robots have the potential to automatically learn new tasks and to adapt to changes in environmental conditions, or to failures in sensors and/or actuators. However, studies have so far almost exclusively been carried out in simulation because evolution in real hardware has required several days or weeks to produce capable robots. In this article, we successfully evolve neural network-based controllers in real robotic hardware to solve two single-robot tasks and one collective robotics task. Controllers are evolved either from random solutions or from solutions pre-evolved in simulation. In all cases, capable solutions are found in a timely manner (1 h or less). Results show that more accurate simulations may lead to higher-performing controllers, and that completing the optimization process in real robots is meaningful, even if solutions found in simulation differ from solutions in reality. We furthermore demonstrate for the first time the adaptive capabilities of online evolution in real robotic hardware, including robots able to overcome faults injected in the motors of multiple units simultaneously, and to modify their behaviour in response to changes in the task requirements. We conclude by assessing the contribution of each algorithmic component on the performance of the underlying evolutionary algorithm.

  18. Real-Time Analytics for the Healthcare Industry: Arrhythmia Detection.

    Science.gov (United States)

    Agneeswaran, Vijay Srinivas; Mukherjee, Joydeb; Gupta, Ashutosh; Tonpay, Pranay; Tiwari, Jayati; Agarwal, Nitin

    2013-09-01

    It is time for the healthcare industry to move from the era of "analyzing our health history" to the age of "managing the future of our health." In this article, we illustrate the importance of real-time analytics across the healthcare industry by providing a generic mechanism to reengineer traditional analytics expressed in the R programming language into Storm-based real-time analytics code. This is a powerful abstraction, since most data scientists use R to write the analytics and are not clear on how to make the data work in real-time and on high-velocity data. Our paper focuses on the applications necessary to a healthcare analytics scenario, specifically focusing on the importance of electrocardiogram (ECG) monitoring. A physician can use our framework to compare ECG reports by categorization and consequently detect Arrhythmia. The framework can read the ECG signals and uses a machine learning-based categorizer that runs within a Storm environment to compare different ECG signals. The paper also presents some performance studies of the framework to illustrate the throughput and accuracy trade-off in real-time analytics.

  19. Real-time gaze estimation via pupil center tracking

    Directory of Open Access Journals (Sweden)

    Cazzato Dario

    2018-02-01

    Full Text Available Automatic gaze estimation not based on commercial and expensive eye tracking hardware solutions can enable several applications in the fields of human computer interaction (HCI) and human behavior analysis. It is therefore not surprising that several related techniques and methods have been investigated in recent years. However, very few camera-based systems proposed in the literature are both real-time and robust. In this work, we propose a real-time user-calibration-free gaze estimation system that does not need person-dependent calibration, can deal with illumination changes and head pose variations, and can work with a wide range of distances from the camera. Our solution is based on a 3-D appearance-based method that processes the images from a built-in laptop camera. Real-time performance is obtained by combining head pose information with geometrical eye features to train a machine learning algorithm. Our method has been validated on a data set of images of users in natural environments, and shows promising results. The possibility of a real-time implementation, combined with the good quality of gaze tracking, makes this system suitable for various HCI applications.

  20. Dopamine-Dependent Reinforcement of Motor Skill Learning: Evidence from Gilles de la Tourette Syndrome

    Science.gov (United States)

    Palminteri, Stefano; Lebreton, Mael; Worbe, Yulia; Hartmann, Andreas; Lehericy, Stephane; Vidailhet, Marie; Grabli, David; Pessiglione, Mathias

    2011-01-01

    Reinforcement learning theory has been extensively used to understand the neural underpinnings of instrumental behaviour. A central assumption surrounds dopamine signalling reward prediction errors, so as to update action values and ensure better choices in the future. However, educators may share the intuitive idea that reinforcements not only…

  1. Aircraft Control Using Engine Thrust: A History of Learning TOC Real-Time

    Science.gov (United States)

    Cole, Jennifer H.; Batteas, Frank; Fullerton, Gordon

    2006-01-01

    A history of learning the operation of Throttles Only Control (TOC) to control an aircraft in real time using engine thrust is shown. The topics include: 1) Past TOC Accidents/Incidents; 2) 1972: DC-10 American Airlines; 3) May 1974: USAF B-52H; 4) April 1975: USAF C-5A; 5) April 1975: USAF C-5A; 6) 1981: USAF B-52G; 7) August 1985: JAL 123 B-747; 8) JAL 123 Survivor Story; 9) JAL 123 Investigation Findings; 10) July 1989: UAL 232 DC-10; 11) UAL 232 DC-10; 12) Eastwind 517 B-737; 13) November 2003: DHL A-300; 14) Historically, TOC has saved lives; 15) Automated Throttles-Only Control; 16) PCA Project; 17) Propulsion-Controlled Aircraft; 18) MD-11 PCA System and Flight Test Envelope; 19) MD-11 Simulation, PCA ILS-Coupled Landing Dispersion; 20) Throttles-Only Pitch and Roll Control Power; 21) PCA in Commercial Fleet; 22) Fall 2005: PCAR Project; 23) PCAR Background - TOC; and 24) PCAR Background - TOC.

  2. Joy, Distress, Hope, and Fear in Reinforcement Learning (Extended Abstract)

    NARCIS (Netherlands)

    Jacobs, E.J.; Broekens, J.; Jonker, C.M.

    2014-01-01

    In this paper we present a mapping between joy, distress, hope and fear, and Reinforcement Learning primitives. Joy / distress is a signal that is derived from the RL update signal, while hope/fear is derived from the utility of the current state. Agent-based simulation experiments replicate

  3. Applications of Deep Learning and Reinforcement Learning to Biological Data.

    Science.gov (United States)

    Mahmud, Mufti; Kaiser, Mohammed Shamim; Hussain, Amir; Vassanelli, Stefano

    2018-06-01

    Rapid advances in hardware-based technologies during the past decades have opened up new possibilities for life scientists to gather multimodal data in various application domains, such as omics, bioimaging, medical imaging, and (brain/body)-machine interfaces. These have generated novel opportunities for development of dedicated data-intensive machine learning techniques. In particular, recent research in deep learning (DL), reinforcement learning (RL), and their combination (deep RL) promises to revolutionize the future of artificial intelligence. The growth in computational power, accompanied by faster and increased data storage and declining computing costs, has already allowed scientists in various fields to apply these techniques on data sets that were previously intractable owing to their size and complexity. This paper provides a comprehensive survey on the application of DL, RL, and deep RL techniques in mining biological data. In addition, we compare the performances of DL techniques when applied to different data sets across various application domains. Finally, we outline open issues in this challenging research area and discuss future development perspectives.

  4. Adaptive Load Balancing of Parallel Applications with Multi-Agent Reinforcement Learning on Heterogeneous Systems

    Directory of Open Access Journals (Sweden)

    Johan Parent

    2004-01-01

    We report on the improvements that can be achieved by applying machine learning techniques, in particular reinforcement learning, for the dynamic load balancing of parallel applications. The applications being considered in this paper are coarse grain data intensive applications. Such applications put high pressure on the interconnect of the hardware. Synchronization and load balancing in complex, heterogeneous networks need fast, flexible, adaptive load balancing algorithms. Viewing a parallel application as a one-state coordination game in the framework of multi-agent reinforcement learning, and by using a recently introduced multi-agent exploration technique, we are able to improve upon the classic job farming approach. The improvements are achieved with limited computation and communication overhead.

  5. Learning alternative movement coordination patterns using reinforcement feedback.

    Science.gov (United States)

    Lin, Tzu-Hsiang; Denomme, Amber; Ranganathan, Rajiv

    2018-05-01

    One of the characteristic features of the human motor system is redundancy-i.e., the ability to achieve a given task outcome using multiple coordination patterns. However, once participants settle on using a specific coordination pattern, the process of learning to use a new alternative coordination pattern to perform the same task is still poorly understood. Here, using two experiments, we examined this process of how participants shift from one coordination pattern to another using different reinforcement schedules. Participants performed a virtual reaching task, where they moved a cursor to different targets positioned on the screen. Our goal was to make participants use a coordination pattern with greater trunk motion, and to this end, we provided reinforcement by making the cursor disappear if the trunk motion during the reach did not cross a specified threshold value. In Experiment 1, we compared two reinforcement schedules in two groups of participants-an abrupt group, where the threshold was introduced immediately at the beginning of practice; and a gradual group, where the threshold was introduced gradually with practice. Results showed that both abrupt and gradual groups were effective in shifting their coordination patterns to involve greater trunk motion, but the abrupt group showed greater retention when the reinforcement was removed. In Experiment 2, we examined the basis of this advantage in the abrupt group using two additional control groups. Results showed that the advantage of the abrupt group was because of a greater number of practice trials with the desired coordination pattern. Overall, these results show that reinforcement can be successfully used to shift coordination patterns, which has potential in the rehabilitation of movement disorders.

  6. Learning from Dealing with Real World Problems

    Science.gov (United States)

    Akcay, Hakan

    2017-01-01

    The purpose of this article is to provide an example of using real world issues as tools for science teaching and learning. Using real world issues provides students with experiences in learning in problem-based environments and encourages them to apply their content knowledge to solving current and local problems.

  7. Analysis of time-dependent reliability of degenerated reinforced concrete structure

    Directory of Open Access Journals (Sweden)

    Zhang Hongping

    2016-07-01

    Durability deterioration of a structure is a highly random process. The maintenance of a degenerated structure involves calculating the time-dependent reliability of the structure. This study introduced a resistance decrease model for reinforced concrete structures and the related statistical parameters of uncertainty, analyzed the resistance decrease rules of a corroded bending element of a reinforced concrete structure, and finally calculated the time-dependent reliability of the corroded bending element, aiming to provide a specific theoretical basis for the application of time-dependent reliability theory.

  8. Nonparametric bayesian reward segmentation for skill discovery using inverse reinforcement learning

    CSIR Research Space (South Africa)

    Ranchod, P

    2015-10-01

    We present a method for segmenting a set of unstructured demonstration trajectories to discover reusable skills using inverse reinforcement learning (IRL). Each skill is characterised by a latent reward function which the demonstrator is assumed...

  9. Interactive Learning Modules: Enabling Near Real-Time Oceanographic Data Use In Undergraduate Education

    Science.gov (United States)

    Kilb, D. L.; Fundis, A. T.; Risien, C. M.

    2012-12-01

    The focus of the Education and Public Engagement (EPE) component of the NSF's Ocean Observatories Initiative (OOI) is to provide a new layer of cyber-interactivity for undergraduate educators to bring near real-time data from the global ocean into learning environments. To accomplish this, we are designing six online services including: 1) visualization tools, 2) a lesson builder, 3) a concept map builder, 4) educational web services (middleware), 5) collaboration tools and 6) an educational resource database. Here, we report on our Fall 2012 release that includes the first four of these services: 1) Interactive visualization tools allow users to interactively select data of interest, display the data in various views (e.g., maps, time-series and scatter plots) and obtain statistical measures such as mean, standard deviation and a regression line fit to select data. Specific visualization tools include a tool to compare different months of data, a time series explorer tool to investigate the temporal evolution of select data parameters (e.g., sea water temperature or salinity), a glider profile tool that displays ocean glider tracks and associated transects, and a data comparison tool that allows users to view the data either in scatter plot view comparing one parameter with another, or in time series view. 2) Our interactive lesson builder tool allows users to develop a library of online lesson units, which are collaboratively editable and sharable and provides starter templates designed from learning theory knowledge. 3) Our interactive concept map tool allows the user to build and use concept maps, a graphical interface to map the connection between concepts and ideas. This tool also provides semantic-based recommendations, and allows for embedding of associated resources such as movies, images and blogs. 4) Education web services (middleware) will provide an educational resource database API.

  10. Explicit and implicit reinforcement learning across the psychosis spectrum.

    Science.gov (United States)

    Barch, Deanna M; Carter, Cameron S; Gold, James M; Johnson, Sheri L; Kring, Ann M; MacDonald, Angus W; Pizzagalli, Diego A; Ragland, J Daniel; Silverstein, Steven M; Strauss, Milton E

    2017-07-01

    Motivational and hedonic impairments are core features of a variety of types of psychopathology. An important aspect of motivational function is reinforcement learning (RL), including implicit (i.e., outside of conscious awareness) and explicit (i.e., including explicit representations about potential reward associations) learning, as well as both positive reinforcement (learning about actions that lead to reward) and punishment (learning to avoid actions that lead to loss). Here we present data from paradigms designed to assess both positive and negative components of both implicit and explicit RL, examine performance on each of these tasks among individuals with schizophrenia, schizoaffective disorder, and bipolar disorder with psychosis, and examine their relative relationships to specific symptom domains transdiagnostically. None of the diagnostic groups differed significantly from controls on the implicit RL tasks in either bias toward a rewarded response or bias away from a punished response. However, on the explicit RL task, both the individuals with schizophrenia and schizoaffective disorder performed significantly worse than controls, but the individuals with bipolar did not. Worse performance on the explicit RL task, but not the implicit RL task, was related to worse motivation and pleasure symptoms across all diagnostic categories. Performance on explicit RL, but not implicit RL, was related to working memory, which accounted for some of the diagnostic group differences. However, working memory did not account for the relationship of explicit RL to motivation and pleasure symptoms. These findings suggest transdiagnostic relationships across the spectrum of psychotic disorders between motivation and pleasure impairments and explicit RL. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  11. An algorithm for learning real-time automata (extended abstract)

    NARCIS (Netherlands)

    Verwer, S.E.; De Weerdt, M.M.; Witteveen, C.

    2007-01-01

    A common model for discrete event systems is a deterministic finite automaton (DFA). An advantage of this model is that it can be interpreted by domain experts. When observing a real-world system, however, there often is more information than just the sequence of discrete events: the time at which

  12. An Improved Reinforcement Learning System Using Affective Factors

    Directory of Open Access Journals (Sweden)

    Takashi Kuremoto

    2013-07-01

    As a powerful and intelligent machine learning method, reinforcement learning (RL) has been widely used in many fields such as game theory, adaptive control, multi-agent systems, nonlinear forecasting, and so on. The main contribution of this technique is its exploration and exploitation approaches to find the optimal solution or semi-optimal solution of goal-directed problems. However, when RL is applied to multi-agent systems (MASs), problems such as the “curse of dimensionality”, the “perceptual aliasing problem”, and uncertainty of the environment constitute high hurdles to RL. Meanwhile, although RL is inspired by behavioral psychology and reward/punishment from the environment is used, higher mental factors such as affects, emotions, and motivations are rarely adopted in the learning procedure of RL. In this paper, to address agent learning in MASs, we propose a computational motivation function, which adopts two principal affective factors, “Arousal” and “Pleasure”, of Russell’s circumplex model of affects, to improve the learning performance of a conventional RL algorithm named Q-learning (QL). Computer simulations of pursuit problems with static and dynamic prey were carried out to compare the proposed method with conventional QL, and the results showed that the proposed method results in agents having a faster and more stable learning performance.

  13. Online human training of a myoelectric prosthesis controller via actor-critic reinforcement learning.

    Science.gov (United States)

    Pilarski, Patrick M; Dawson, Michael R; Degris, Thomas; Fahimi, Farbod; Carey, Jason P; Sutton, Richard S

    2011-01-01

    As a contribution toward the goal of adaptable, intelligent artificial limbs, this work introduces a continuous actor-critic reinforcement learning method for optimizing the control of multi-function myoelectric devices. Using a simulated upper-arm robotic prosthesis, we demonstrate how it is possible to derive successful limb controllers from myoelectric data using only a sparse human-delivered training signal, without requiring detailed knowledge about the task domain. This reinforcement-based machine learning framework is well suited for use by both patients and clinical staff, and may be easily adapted to different application domains and the needs of individual amputees. To our knowledge, this is the first myoelectric control approach that facilitates the online learning of new amputee-specific motions based only on a one-dimensional (scalar) feedback signal provided by the user of the prosthesis. © 2011 IEEE

  14. 'Proactive' use of cue-context congruence for building reinforcement learning's reward function.

    Science.gov (United States)

    Zsuga, Judit; Biro, Klara; Tajti, Gabor; Szilasi, Magdolna Emma; Papp, Csaba; Juhasz, Bela; Gesztelyi, Rudolf

    2016-10-28

    Reinforcement learning is a fundamental form of learning that may be formalized using the Bellman equation. Accordingly an agent determines the state value as the sum of immediate reward and of the discounted value of future states. Thus the value of state is determined by agent related attributes (action set, policy, discount factor) and the agent's knowledge of the environment embodied by the reward function and hidden environmental factors given by the transition probability. The central objective of reinforcement learning is to solve these two functions outside the agent's control either using, or not using a model. In the present paper, using the proactive model of reinforcement learning we offer insight on how the brain creates simplified representations of the environment, and how these representations are organized to support the identification of relevant stimuli and action. Furthermore, we identify neurobiological correlates of our model by suggesting that the reward and policy functions, attributes of the Bellman equation, are built by the orbitofrontal cortex (OFC) and the anterior cingulate cortex (ACC), respectively. Based on this we propose that the OFC assesses cue-context congruence to activate the most context frame. Furthermore given the bidirectional neuroanatomical link between the OFC and model-free structures, we suggest that model-based input is incorporated into the reward prediction error (RPE) signal, and conversely RPE signal may be used to update the reward-related information of context frames and the policy underlying action selection in the OFC and ACC, respectively. Furthermore clinical implications for cognitive behavioral interventions are discussed.
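
    The state-value relation described in this abstract (value as immediate reward plus the discounted value of future states) can be illustrated with a minimal sketch of iterative policy evaluation. The function name, the two-state chain, and all numbers below are hypothetical and are not taken from the paper; the sketch assumes a fixed policy whose transition probabilities and expected rewards are already known.

```python
def evaluate_policy(P, R, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Repeat the Bellman backup V(s) = R(s) + gamma * sum_t P(s, t) * V(t).

    P[s][t] is the probability of moving from state s to state t under a
    fixed policy; R[s] is the expected immediate reward in state s.
    """
    n = len(R)
    V = [0.0] * n
    for _ in range(max_iter):
        V_new = [R[s] + gamma * sum(P[s][t] * V[t] for t in range(n))
                 for s in range(n)]
        # Stop once no state value changes by more than the tolerance.
        if max(abs(a - b) for a, b in zip(V_new, V)) < tol:
            return V_new
        V = V_new
    return V


# Hypothetical two-state chain: state 0 always moves to state 1,
# and state 1 loops on itself while paying a reward of 1 per step.
P = [[0.0, 1.0],
     [0.0, 1.0]]
R = [0.0, 1.0]
V = evaluate_policy(P, R, gamma=0.9)
```

    In this toy chain the looping state is worth 1 / (1 - 0.9) = 10 and the state leading into it is worth 0.9 * 10 = 9, which is the "immediate reward plus discounted future value" decomposition in its simplest form.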

  15. The "proactive" model of learning: Integrative framework for model-free and model-based reinforcement learning utilizing the associative learning-based proactive brain concept.

    Science.gov (United States)

    Zsuga, Judit; Biro, Klara; Papp, Csaba; Tajti, Gabor; Gesztelyi, Rudolf

    2016-02-01

    Reinforcement learning (RL) is a powerful concept underlying forms of associative learning governed by the use of a scalar reward signal, with learning taking place if expectations are violated. RL may be assessed using model-based and model-free approaches. Model-based reinforcement learning involves the amygdala, the hippocampus, and the orbitofrontal cortex (OFC). The model-free system involves the pedunculopontine-tegmental nucleus (PPTgN), the ventral tegmental area (VTA) and the ventral striatum (VS). Based on the functional connectivity of VS, model-free and model based RL systems center on the VS that by integrating model-free signals (received as reward prediction error) and model-based reward related input computes value. Using the concept of reinforcement learning agent we propose that the VS serves as the value function component of the RL agent. Regarding the model utilized for model-based computations we turned to the proactive brain concept, which offers an ubiquitous function for the default network based on its great functional overlap with contextual associative areas. Hence, by means of the default network the brain continuously organizes its environment into context frames enabling the formulation of analogy-based association that are turned into predictions of what to expect. The OFC integrates reward-related information into context frames upon computing reward expectation by compiling stimulus-reward and context-reward information offered by the amygdala and hippocampus, respectively. Furthermore we suggest that the integration of model-based expectations regarding reward into the value signal is further supported by the efferent of the OFC that reach structures canonical for model-free learning (e.g., the PPTgN, VTA, and VS). (c) 2016 APA, all rights reserved).

  16. Arrow-bot: A Teaching Tool for Real-Time Embedded System Course

    Directory of Open Access Journals (Sweden)

    Zakaria Mohamad Fauzi

    2017-01-01

    This paper presents the design of a line-following Arduino-based mobile robot for the Real-Time Embedded System course at Universiti Tun Hussein Onn Malaysia. The real-time system (RTS) concept implemented is based on rate monotonic scheduling (RMS) on an ATmega328P microcontroller. Three infrared line sensors were used as inputs for controlling two direct current (DC) motors. The RTS software was programmed in the Arduino IDE and relied on the ChibiOS/RT real-time operating system (RTOS) library. Three independent software tasks were created to test the real-time scheduling capability, and the resulting temporal scope was collected. The microcontroller succeeded in handling multiple tasks without missing their deadlines. This implementation of an RTOS in an embedded system for mobile robotics is hoped to increase students' understanding and learning capability.
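
    As an illustration of the rate monotonic scheduling named in this abstract, the sketch below applies the classic Liu and Layland utilization bound, a sufficient (not necessary) schedulability test, and assigns priorities by period. The task names and timing values are hypothetical; the abstract does not report the actual execution times or periods used on the robot.

```python
def rms_utilization_bound(n):
    """Liu and Layland bound n * (2^(1/n) - 1) for n periodic tasks."""
    return n * (2 ** (1.0 / n) - 1)


def rms_schedulable(tasks):
    """Check the sufficient utilization test for rate monotonic scheduling.

    tasks is a list of (worst_case_execution_time, period) pairs in the
    same time unit. Returns (passes_test, total_utilization). A False
    result is inconclusive, because the bound is not a necessary condition.
    """
    utilization = sum(c / t for c, t in tasks)
    return utilization <= rms_utilization_bound(len(tasks)), utilization


def rms_priorities(tasks):
    """Task indices from highest to lowest priority (shortest period first)."""
    return sorted(range(len(tasks)), key=lambda i: tasks[i][1])


# Hypothetical task set in milliseconds: sensor read, motor control, logging.
tasks = [(1, 10), (2, 20), (5, 50)]
ok, u = rms_schedulable(tasks)
order = rms_priorities(tasks)
```

    With these assumed numbers the total utilization is 0.3, well under the three-task bound of roughly 0.78, so the set passes the test; a task set that exceeds the bound would need an exact response-time analysis instead.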

  17. Combining high-speed SVM learning with CNN feature encoding for real-time target recognition in high-definition video for ISR missions

    Science.gov (United States)

    Kroll, Christine; von der Werth, Monika; Leuck, Holger; Stahl, Christoph; Schertler, Klaus

    2017-05-01

    For Intelligence, Surveillance, Reconnaissance (ISR) missions of manned and unmanned air systems, typical electro-optical payloads provide high-definition video data which has to be exploited with respect to relevant ground targets in real time by automatic/assisted target recognition software. Airbus Defence and Space has been developing the required technologies for real-time sensor exploitation for years and has combined the latest advances of Deep Convolutional Neural Networks (CNN) with a proprietary high-speed Support Vector Machine (SVM) learning method into a powerful object recognition system with impressive results on relevant high-definition video scenes compared to conventional target recognition approaches. This paper describes the principal requirements for real-time target recognition in high-definition video for ISR missions and the Airbus approach of combining an invariant feature extraction using pre-trained CNNs and the high-speed training and classification ability of a novel frequency-domain SVM training method. The frequency-domain approach allows for a highly optimized implementation for General Purpose Computation on a Graphics Processing Unit (GPGPU) and also an efficient training of large training samples. The selected CNN which is pre-trained only once on domain-extrinsic data reveals a highly invariant feature extraction. This allows for a significantly reduced adaptation and training of the target recognition method for new target classes and mission scenarios. A comprehensive training and test dataset was defined and prepared using relevant high-definition airborne video sequences. The assessment concept is explained and performance results are given using the established precision-recall diagrams, average precision and runtime figures on representative test data. A comparison to legacy target recognition approaches shows the impressive performance increase by the proposed CNN+SVM machine-learning approach and the capability of real-time high

  18. Real-time shadows

    CERN Document Server

    Eisemann, Elmar; Assarsson, Ulf; Wimmer, Michael

    2011-01-01

    Important elements of games, movies, and other computer-generated content, shadows are crucial for enhancing realism and providing important visual cues. In recent years, there have been notable improvements in visual quality and speed, making high-quality realistic real-time shadows a reachable goal. Real-Time Shadows is a comprehensive guide to the theory and practice of real-time shadow techniques. It covers a large variety of different effects, including hard, soft, volumetric, and semi-transparent shadows.The book explains the basics as well as many advanced aspects related to the domain

  19. Dependable Real-Time Systems

    Science.gov (United States)

    1991-09-30

    0196 or 413 545-0720 PI E-mail Address: krithi@nirvan.cs.umass.edu, stankovic@cs.umass.edu Grant or Contract Title: Dependable Real-Time Systems Grant...Dependable Real-Time Systems Grant or Contract Number: N00014-85-k-0398 Reporting Period: 1 Oct 87 - 30 Sep 91 2. Summary of Accomplishments 2.1 Our...in developing a sound approach to scheduling tasks in complex real-time systems, (2) developed a real-time operating system kernel, a preliminary

  20. Memristive device based learning for navigation in robots.

    Science.gov (United States)

    Sarim, Mohammad; Kumar, Manish; Jha, Rashmi; Minai, Ali A

    2017-11-08

    Biomimetic robots have gained attention recently for various applications ranging from resource hunting to search and rescue operations during disasters. Biological species are known to intuitively learn from the environment, gather and process data, and make appropriate decisions. Such sophisticated computing capabilities in robots are difficult to achieve, especially if done in real-time with ultra-low energy consumption. Here, we present a novel memristive device based learning architecture for robots. Two terminal memristive devices with resistive switching of oxide layer are modeled in a crossbar array to develop a neuromorphic platform that can impart active real-time learning capabilities in a robot. This approach is validated by navigating a robot vehicle in an unknown environment with randomly placed obstacles. Further, the proposed scheme is compared with reinforcement learning based algorithms using local and global knowledge of the environment. The simulation as well as experimental results corroborate the validity and potential of the proposed learning scheme for robots. The results also show that our learning scheme approaches an optimal solution for some environment layouts in robot navigation.

  1. LabVIEW Real-Time

    CERN Multimedia

    CERN. Geneva; Flockhart, Ronald Bruce; Seppey, P

    2003-01-01

    With LabVIEW Real-Time, you can choose from a variety of RT Series hardware. Add a real-time data acquisition component into a larger measurement and automation system or create a single stand-alone real-time solution with data acquisition, signal conditioning, motion control, RS-232, GPIB instrumentation, and Ethernet connectivity. With the various hardware options, you can create a system to meet your precise needs today, while the modularity of the system means you can add to the solution as your system requirements grow. If you are interested in Reliable and Deterministic systems for Measurement and Automation, you will profit from this seminar. Agenda: Real-Time Overview LabVIEW RT Hardware Platforms - Linux on PXI Programming with LabVIEW RT Real-Time Operating Systems concepts Timing Applications Data Transfer

  2. A Model to Explain the Emergence of Reward Expectancy neurons using Reinforcement Learning and Neural Network

    OpenAIRE

    Shinya, Ishii; Munetaka, Shidara; Katsunari, Shibata

    2006-01-01

    In an experiment of multi-trial task to obtain a reward, reward expectancy neurons, which responded only in the non-reward trials that are necessary to advance toward the reward, have been observed in the anterior cingulate cortex of monkeys. In this paper, to explain the emergence of the reward expectancy neuron in terms of reinforcement learning theory, a model that consists of a recurrent neural network trained based on reinforcement learning is proposed. The analysis of the hi...

  3. FMRQ-A Multiagent Reinforcement Learning Algorithm for Fully Cooperative Tasks.

    Science.gov (United States)

    Zhang, Zhen; Zhao, Dongbin; Gao, Junwei; Wang, Dongqing; Dai, Yujie

    2017-06-01

    In this paper, we propose a multiagent reinforcement learning algorithm dealing with fully cooperative tasks. The algorithm is called frequency of the maximum reward Q-learning (FMRQ). FMRQ aims to achieve one of the optimal Nash equilibria so as to optimize the performance index in multiagent systems. The frequency of obtaining the highest global immediate reward instead of immediate reward is used as the reinforcement signal. With FMRQ each agent does not need the observation of the other agents' actions and only shares its state and reward at each step. We validate FMRQ through case studies of repeated games: four cases of two-player two-action and one case of three-player two-action. It is demonstrated that FMRQ can converge to one of the optimal Nash equilibria in these cases. Moreover, comparison experiments on tasks with multiple states and finite steps are conducted. One is box-pushing and the other one is distributed sensor network problem. Experimental results show that the proposed algorithm outperforms others with higher performance.

  4. A New Learning Control System for Basketball Free Throws Based on Real Time Video Image Processing and Biofeedback

    Directory of Open Access Journals (Sweden)

    R. Sarang

    2018-02-01

    Shooting free throws plays an important role in basketball. The major problem in performing a correct free throw seems to be inappropriate training. Training is performed offline and is often not very persistent. The aim of this paper is to consciously modify and control the free throw using biofeedback. Elbow and shoulder dynamics are calculated by an image processing technique equipped with a video image acquisition system. The proposed setup, named the learning control system, is able to quantify these parameters and provide feedback on them in real time as audio signals. It therefore led to correct learning and conscious control of shooting. Experimental results showed improvements in the free throw shooting style, including the shot pocket and the locked position. The mean values of the elbow and shoulder angles were controlled at approximately 89° and 26° for the shot pocket, and were tuned to approximately 180° and 47°, respectively, for the locked position (close to the desired pattern of the free throw based on valid FIBA references). Not only did the mean values improve, but the standard deviations of these angles also decreased meaningfully, which shows convergence and uniformity of the shooting style. Also, in training conditions, the average percentage of successful free throws increased from about 64% to as much as 87% after using this setup, and in competition conditions the average percentage of successful free throws improved by about 20%, although using the learning control system may not be the only reason for these outcomes. The proposed system is easy to use, inexpensive, portable and applicable in real time.
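
    The abstract does not say how the elbow and shoulder angles are derived from the video frames. One common way to obtain a joint angle, shown here only as a sketch under that assumption, is to take three 2-D keypoints (for the elbow: shoulder, elbow, wrist) and measure the angle between the two limb segments. The function name and the coordinates are hypothetical.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at point b between segments b->a and b->c.

    a, b and c are (x, y) pixel coordinates, for example the shoulder,
    elbow and wrist keypoints when measuring the elbow angle.
    """
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("keypoints must not coincide with the joint")
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


# Hypothetical keypoints: a right angle at the elbow and a fully extended arm.
bent = joint_angle((0, 1), (0, 0), (1, 0))
extended = joint_angle((-1, 0), (0, 0), (1, 0))
```

    Under this convention a fully extended arm gives an elbow angle near 180°, which matches the locked-position value reported in the abstract; a real-time system would recompute the angle for every frame and map the deviation from the target to an audio cue.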

  5. Dissociable neural representations of reinforcement and belief prediction errors underlie strategic learning.

    Science.gov (United States)

    Zhu, Lusha; Mathewson, Kyle E; Hsu, Ming

    2012-01-31

    Decision-making in the presence of other competitive intelligent agents is fundamental for social and economic behavior. Such decisions require agents to behave strategically, where in addition to learning about the rewards and punishments available in the environment, they also need to anticipate and respond to actions of others competing for the same rewards. However, whereas we know much about strategic learning at both theoretical and behavioral levels, we know relatively little about the underlying neural mechanisms. Here, we show using a multi-strategy competitive learning paradigm that strategic choices can be characterized by extending the reinforcement learning (RL) framework to incorporate agents' beliefs about the actions of their opponents. Furthermore, using this characterization to generate putative internal values, we used model-based functional magnetic resonance imaging to investigate neural computations underlying strategic learning. We found that the distinct notions of prediction errors derived from our computational model are processed in a partially overlapping but distinct set of brain regions. Specifically, we found that the RL prediction error was correlated with activity in the ventral striatum. In contrast, activity in the ventral striatum, as well as the rostral anterior cingulate (rACC), was correlated with a previously uncharacterized belief-based prediction error. Furthermore, activity in rACC reflected individual differences in degree of engagement in belief learning. These results suggest a model of strategic behavior where learning arises from interaction of dissociable reinforcement and belief-based inputs.

  6. Concepts of real time and semi-real time material control

    International Nuclear Information System (INIS)

    Lovett, J.E.

    1975-01-01

    After a brief consideration of the traditional material balance accounting on an MBA basis, this paper explores the basic concepts of real time and semi-real time material control, together with some of the major problems to be solved. Three types of short-term material control are discussed: storage, batch processing, and continuous processing. (DLC)

  7. Reinforcement Learning with Autonomous Small Unmanned Aerial Vehicles in Cluttered Environments

    Science.gov (United States)

    Tran, Loc; Cross, Charles; Montague, Gilbert; Motter, Mark; Neilan, James; Qualls, Garry; Rothhaar, Paul; Trujillo, Anna; Allen, B. Danette

    2015-01-01

    We present ongoing work in the Autonomy Incubator at NASA Langley Research Center (LaRC) exploring the efficacy of a data set aggregation approach to reinforcement learning for small unmanned aerial vehicle (sUAV) flight in dense and cluttered environments with reactive obstacle avoidance. The goal is to learn an autonomous flight model using training experiences from a human piloting a sUAV around static obstacles. The training approach uses video data from a forward-facing camera that records the human pilot's flight. Various computer vision based features are extracted from the video relating to edge and gradient information. The recorded human-controlled inputs are used to train an autonomous control model that correlates the extracted feature vector to a yaw command. As part of the reinforcement learning approach, the autonomous control model is iteratively updated with feedback from a human agent who corrects undesired model output. This data driven approach to autonomous obstacle avoidance is explored for simulated forest environments furthering autonomous flight under the tree canopy research. This enables flight in previously inaccessible environments which are of interest to NASA researchers in Earth and Atmospheric sciences.

  8. Real Time Systems

    DEFF Research Database (Denmark)

    Christensen, Knud Smed

    2000-01-01

    Describes fundamentals of parallel programming and a kernel for that. Describes methods for modelling and checking parallel problems. Real time problems.

  9. Real time expert systems

    International Nuclear Information System (INIS)

    Asami, Tohru; Hashimoto, Kazuo; Yamamoto, Seiichi

    1992-01-01

    Recently, research on and practical use of expert systems suited to real-time processing have become conspicuous, aimed at applications such as plant control for nuclear reactors and traffic and communication control. This report presents the functions required to control an object that changes dynamically within a limited time, and explains the technical differences between real-time expert systems developed to satisfy those requirements and conventional expert systems, using actual examples and theoretical considerations. Conventional expert systems have their technical basis in problem solvers originating in STRIPS. Real-time expert systems are applied to fields involving surveillance and control, where conventional expert systems are hard to apply. The report explains the requirements for real-time expert systems, examples of such systems, and, as techniques for realizing real-time processing, interrupt handling, distributed processing, and the mechanism for maintaining the consistency of knowledge. (K.I.)

  10. Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation.

    Science.gov (United States)

    Kato, Ayaka; Morita, Kenji

    2016-10-01

    It has been suggested that dopamine (DA) represents reward-prediction-error (RPE) defined in reinforcement learning and therefore DA responds to unpredicted but not predicted reward. However, recent studies have found DA response sustained towards predictable reward in tasks involving self-paced behavior, and suggested that this response represents a motivational signal. We have previously shown that RPE can sustain if there is decay/forgetting of learned-values, which can be implemented as decay of synaptic strengths storing learned-values. This account, however, did not explain the suggested link between tonic/sustained DA and motivation. In the present work, we explored the motivational effects of the value-decay in self-paced approach behavior, modeled as a series of 'Go' or 'No-Go' selections towards a goal. Through simulations, we found that the value-decay can enhance motivation, specifically, facilitate fast goal-reaching, albeit counterintuitively. Mathematical analyses revealed that underlying potential mechanisms are twofold: (1) decay-induced sustained RPE creates a gradient of 'Go' values towards a goal, and (2) value-contrasts between 'Go' and 'No-Go' are generated because while chosen values are continually updated, unchosen values simply decay. Our model provides potential explanations for the key experimental findings that suggest DA's roles in motivation: (i) slowdown of behavior by post-training blockade of DA signaling, (ii) observations that DA blockade severely impairs effortful actions to obtain rewards while largely sparing seeking of easily obtainable rewards, and (iii) relationships between the reward amount, the level of motivation reflected in the speed of behavior, and the average level of DA. These results indicate that reinforcement learning with value-decay, or forgetting, provides a parsimonious mechanistic account for the DA's roles in value-learning and motivation. 
Our results also suggest that when biological systems for value-learning
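
    The value-decay idea in this abstract can be illustrated with a minimal sketch. It is not the authors' model: the linear track, the sigmoid choice rule, the use of the maximum next-state value, and all parameter values (`alpha`, `gamma`, `beta`, `decay`) are assumptions chosen only for illustration. Each step applies a temporal-difference update to the chosen 'Go' or 'No-Go' value and then decays every stored value toward zero.

```python
import numpy as np

def td_update_with_decay(q, state, action, reward, next_value,
                         alpha=0.5, gamma=0.97, decay=0.01):
    """Apply one TD update to q[state, action], then decay all values.

    Returns the reward prediction error (RPE). Decaying every stored
    value on every step is an assumption of this sketch.
    """
    rpe = reward + gamma * next_value - q[state, action]
    q[state, action] += alpha * rpe
    q *= (1.0 - decay)  # forgetting: all learned values shrink toward zero
    return rpe

def simulate(n_states=5, n_trials=50, decay=0.01, alpha=0.5,
             gamma=0.97, beta=5.0, seed=0):
    """Self-paced approach on a hypothetical linear track toward a goal."""
    rng = np.random.default_rng(seed)
    q = np.zeros((n_states, 2))  # column 0 = 'No-Go' (stay), 1 = 'Go' (advance)
    steps_per_trial = []
    for _ in range(n_trials):
        state, steps = 0, 0
        while state < n_states:
            # Sigmoid choice between 'Go' and 'No-Go' at the current state.
            p_go = 1.0 / (1.0 + np.exp(-beta * (q[state, 1] - q[state, 0])))
            action = int(rng.random() < p_go)
            next_state = state + action
            terminal = next_state == n_states
            reward = 1.0 if terminal else 0.0
            next_value = 0.0 if terminal else q[next_state].max()
            td_update_with_decay(q, state, action, reward, next_value,
                                 alpha, gamma, decay)
            state = next_state
            steps += 1
        steps_per_trial.append(steps)
    return q, steps_per_trial
```

    Running `simulate(decay=0.0)` and `simulate(decay=0.01)` and comparing the returned step counts is one way to explore how forgetting affects the speed of goal-reaching in this toy setting; no particular outcome is claimed here.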

  11. Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation.

    Directory of Open Access Journals (Sweden)

    Ayaka Kato

    2016-10-01

    It has been suggested that dopamine (DA) represents reward-prediction-error (RPE) defined in reinforcement learning and therefore DA responds to unpredicted but not predicted reward. However, recent studies have found DA response sustained towards predictable reward in tasks involving self-paced behavior, and suggested that this response represents a motivational signal. We have previously shown that RPE can sustain if there is decay/forgetting of learned-values, which can be implemented as decay of synaptic strengths storing learned-values. This account, however, did not explain the suggested link between tonic/sustained DA and motivation. In the present work, we explored the motivational effects of the value-decay in self-paced approach behavior, modeled as a series of 'Go' or 'No-Go' selections towards a goal. Through simulations, we found that the value-decay can enhance motivation, specifically, facilitate fast goal-reaching, albeit counterintuitively. Mathematical analyses revealed that underlying potential mechanisms are twofold: (1) decay-induced sustained RPE creates a gradient of 'Go' values towards a goal, and (2) value-contrasts between 'Go' and 'No-Go' are generated because while chosen values are continually updated, unchosen values simply decay. Our model provides potential explanations for the key experimental findings that suggest DA's roles in motivation: (i) slowdown of behavior by post-training blockade of DA signaling, (ii) observations that DA blockade severely impairs effortful actions to obtain rewards while largely sparing seeking of easily obtainable rewards, and (iii) relationships between the reward amount, the level of motivation reflected in the speed of behavior, and the average level of DA. These results indicate that reinforcement learning with value-decay, or forgetting, provides a parsimonious mechanistic account for the DA's roles in value-learning and motivation. 
Our results also suggest that when biological systems

  12. Decision Making in Reinforcement Learning Using a Modified Learning Space Based on the Importance of Sensors

    Directory of Open Access Journals (Sweden)

    Yasutaka Kishima

    2013-01-01

    Many studies have been conducted on the application of reinforcement learning (RL) to robots. A general-purpose robot has redundant sensors or actuators because it is difficult to anticipate the environment that the robot will face and the task that it must execute. In this case, the Q-space in RL contains redundancy, so the robot takes a long time to learn a given task. In this study, we focus on the importance of sensors with regard to a robot's performance of a particular task. The sensors that are applicable to a task differ according to the task. By using the importance of the sensors, we try to adjust the number of states of each sensor and to reduce the size of the Q-space. In this paper, we define the importance of a sensor for a task as the correlation between the value of that sensor and the reward. A robot calculates the importance of its sensors and makes the Q-space smaller. We propose a method that reduces the learning space and construct a learning system by incorporating it into RL. We confirm the effectiveness of the proposed system with an experimental robot.
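
    The importance measure described in this abstract (correlation between each sensor's value and the reward) can be sketched as follows. This is an illustrative sketch only: the use of the absolute Pearson correlation, the function names `sensor_importance` and `bins_per_sensor`, and the rule that maps importance to a number of discretization bins are assumptions, not the authors' published procedure.

```python
import numpy as np

def sensor_importance(sensor_log, rewards):
    """Absolute Pearson correlation between each sensor column and reward.

    sensor_log: array of shape (n_samples, n_sensors).
    rewards:    array of shape (n_samples,).
    A sensor with constant readings gets importance 0.
    """
    sensor_log = np.asarray(sensor_log, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    r_centered = rewards - rewards.mean()
    importance = np.zeros(sensor_log.shape[1])
    for j in range(sensor_log.shape[1]):
        s_centered = sensor_log[:, j] - sensor_log[:, j].mean()
        denom = np.sqrt((s_centered ** 2).sum() * (r_centered ** 2).sum())
        if denom > 0:
            corr = (s_centered * r_centered).sum() / denom
            importance[j] = min(1.0, abs(corr))
    return importance

def bins_per_sensor(importance, max_bins=8):
    """Hypothetical rule: more important sensors get a finer discretization."""
    return np.maximum(1, np.ceil(importance * max_bins)).astype(int)
```

    Under this rule, a sensor whose readings never co-vary with the reward collapses to a single bin, which shrinks the product of per-sensor state counts and therefore the size of the learning space.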

  13. Real-time intelligent pattern recognition algorithm for surface EMG signals

    Directory of Open Access Journals (Sweden)

    Jahed Mehran

    2007-12-01

    Background: Electromyography (EMG) is the study of muscle function through the inquiry of electrical signals that the muscles emanate. EMG signals collected from the surface of the skin (Surface Electromyogram: sEMG) can be used in different applications such as recognizing musculoskeletal neural based patterns intercepted for hand prosthesis movements. Current systems designed for controlling prosthetic hands either have limited functions, can only be used to perform simple movements, or use an excessive number of electrodes in order to achieve acceptable results. In an attempt to overcome these problems we have proposed an intelligent system to recognize hand movements and have provided a user assessment routine to evaluate the correctness of executed movements. Methods: We propose to use an intelligent approach based on an adaptive neuro-fuzzy inference system (ANFIS) integrated with a real-time learning scheme to identify hand motion commands. For this purpose and to consider the effect of user evaluation on recognizing hand movements, vision feedback is applied to increase the capability of our system. By using this scheme the user may assess the correctness of the performed hand movement. In this work a hybrid method for training the fuzzy system, consisting of back-propagation (BP) and least mean square (LMS), is utilized. Also, in order to optimize the number of fuzzy rules, a subtractive clustering algorithm has been developed. To design an effective system, we consider a conventional scheme of EMG pattern recognition system. To design this system we propose to use two different sets of EMG features, namely time domain (TD) and time-frequency representation (TFR). Also, in order to decrease the undesirable effects of the dimension of these feature sets, principal component analysis (PCA) is utilized. Results: In this study, the myoelectric signals considered for classification consist of six unique hand movements. 
Features chosen for EMG signal

  14. Real-time well condition monitoring in extended reach wells

    Energy Technology Data Exchange (ETDEWEB)

    Kucs, R.; Spoerker, H.F. [OMV Austria Exploration and Production GmbH, Gaenserndorf (Austria); Thonhauser, G. [Montanuniversitaet Leoben (Austria)

    2008-10-23

    Ever rising daily operating cost for offshore operations make the risk of running into drilling problems due to torque and drag developments in extended reach applications a growing concern. One option to reduce cost related to torque and drag problems can be to monitor torque and drag trends in real time without additional workload on the platform drilling team. To evaluate observed torque or drag trends it is necessary to automatically recognize operations and to have a 'standard value' to compare the measurements to. The presented systematic approach features both options - fully automated operations recognition and real time analysis. Trends can be discussed between rig- and shore-based teams, and decisions can be based on up to date information. Since the system is focused on visualization of real-time torque and drag trends, instead of highly complex and repeated simulations, calculation time is reduced by comparing the real-time rig data against predictions imported from a commercial drilling engineering application. The system allows reacting to emerging stuck pipe situations or developing cuttings beds long before the situations become severe enough to result in substantial lost time. The ability to compare real-time data with historical data from the same or other wells makes the system a valuable tool in supporting a learning organization. The system has been developed in a joint research initiative for field application on the development of an offshore heavy oil field in New Zealand. (orig.)

  15. Reinforcement learning controller design for affine nonlinear discrete-time systems using online approximators.

    Science.gov (United States)

    Yang, Qinmin; Jagannathan, Sarangapani

    2012-04-01

    In this paper, reinforcement learning state- and output-feedback-based adaptive critic controller designs are proposed by using online approximators (OLAs) for general multi-input, multi-output affine unknown nonlinear discrete-time systems in the presence of bounded disturbances. The proposed controller design has two entities: an action network that is designed to produce the optimal signal and a critic network that evaluates the performance of the action network. The critic estimates the cost-to-go function, which is tuned online using recursive equations derived from heuristic dynamic programming. Here, neural networks (NNs) are used for both the action and critic networks, although any OLAs, such as radial basis functions, splines, fuzzy logic, etc., can be utilized. For the output-feedback counterpart, an additional NN is designated as the observer to estimate the unavailable system states, and thus the separation principle is not required. The NN weight tuning laws for the controller schemes are also derived while ensuring uniform ultimate boundedness of the closed-loop system using Lyapunov theory. Finally, the effectiveness of the two controllers is tested in simulation on a pendulum balancing system and a two-link robotic arm system.

  16. SignalR real-time application cookbook

    CERN Document Server

    Vespa, Roberto

    2014-01-01

    This book contains illustrated code examples to help you create real-time, asynchronous, and bi-directional client-server applications. Each recipe will concentrate on one specific aspect of application development with SignalR showing you how that aspect can be used proficiently. Different levels of developers will find this book useful. Beginners will be able to learn all the fundamental concepts of SignalR, quickly becoming productive in a difficult arena. Experienced programmers will find in this book a handy and useful collection of ready-made solutions to common use cases, which they wil

  17. Real-Time processing of Big Data with ScyllaDB

    CERN Multimedia

    CERN. Geneva; Martinez Pedreira, Miguel

    2018-01-01

    ScyllaDB: achieving 1 million operations/sec with stable and consistent real time latencies This talk will present ScyllaDB, a highly available Real-time Big Data Database that can achieve high throughput without compromising latencies or availability. ScyllaDB is API-compatible with Apache Cassandra but employs a different internal architecture to make sure that operational capacity is increased while the maintenance burden is reduced. It provides everything that a new-world database must provide: horizontal (infinite) scaling, no single point of failure, high availability and excellent performance, while keeping a sensible amount of operational efforts. Some of the key points that make ScyllaDB very efficient are its fully asynchronous operations and the smart integration with the kernel and hardware. You will learn about what makes ScyllaDB special in the crowded space of NoSQL solutions and how it can be used to power a wide variety of workloads: from real time bidding to the experiment data from the ALI...

  18. Active-learning strategies: the use of a game to reinforce learning in nursing education. A case study.

    Science.gov (United States)

    Boctor, Lisa

    2013-03-01

    The majority of nursing students are kinesthetic learners, preferring a hands-on, active approach to education. Research shows that active-learning strategies can increase student learning and satisfaction. This study looks at the use of one active-learning strategy, a Jeopardy-style game, 'Nursopardy', to reinforce Fundamentals of Nursing material, aiding in students' preparation for a standardized final exam. The game was created keeping students' varied learning styles and the NCLEX blueprint in mind. The blueprint was used to create 5 categories, with 26 total questions. Student survey results, using a five-point Likert scale, showed that they did find this learning method enjoyable and beneficial to learning. More research is recommended regarding learning outcomes when using active-learning strategies such as games. Copyright © 2012 Elsevier Ltd. All rights reserved.

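  19. Inovasi Pengembangan Metode Pembelajaran Dengan Menggunakan Real Avatar-Based Learning Dalam Pendidikan Keperawatan: A Bridge Connection Theory and Practice di STIKEP PPNI Jawa Barat [Innovation in the Development of Learning Methods Using Real Avatar-Based Learning in Nursing Education: A Bridge Connecting Theory and Practice at STIKEP PPNI Jawa Barat]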

    Directory of Open Access Journals (Sweden)

    Linlin Lindayani

    2017-11-01

    Virtual learning is one of the most effective and efficient learning methods, especially in improving skills, including soft skills. In Indonesia, the problem-based learning (PBL) methodology is the most widely applied but has weaknesses in helping students bridge theory and practice. The purpose of this research was to develop a learning method that uses avatar-based learning to foster self-directed learning, which is one of the main competencies of nursing education, namely lifelong learning. This study was a quasi-experiment with one intervention group. Respondents were nursing students in level four of the stratum 1 program. The Self-Directed Learning Instrument (SDLI) was used to measure the outcome. A paired t-test was used to evaluate the effectiveness of the method. A total of 40 students agreed to participate in the study. Before the intervention, the mean total self-directed learning score was 72.3 (SD = 8.97). The paired t-test showed that after real avatar-based learning was applied in the Medical Surgical Nursing III course, self-directed learning increased (difference = 4.56, p = 0.001). Real avatar-based learning was effective in improving students' self-directed learning, especially in the aspects of planning, implementation and self-monitoring. Further research with more rigorous designs and other outcomes is needed to confirm the effectiveness of this method.

  20. Safe robot execution in model-based reinforcement learning

    OpenAIRE

    Martínez Martínez, David; Alenyà Ribas, Guillem; Torras, Carme

    2015-01-01

    Task learning in robotics requires repeatedly executing the same actions in different states to learn the model of the task. However, in real-world domains, there are usually sequences of actions that, if executed, may produce unrecoverable errors (e.g. breaking an object). Robots should avoid repeating such errors when learning, and thus explore the state space in a more intelligent way. This requires identifying dangerous action effects to avoid including such actions in the generated plans...

  1. Novel Advancements in Internet-Based Real-Time Data Technologies

    Science.gov (United States)

    Myers, Gerry; Welch, Clara L. (Technical Monitor)

    2002-01-01

    AZ Technology has been working with NASA MSFC (Marshall Space Flight Center) to find ways to make it easier for remote experimenters (RPI's) to monitor their International Space Station (ISS) payloads in real-time from anywhere using standard/familiar devices. That effort resulted in a product called 'EZStream' which is in use on several ISS-related projects. Although the initial implementation is geared toward ISS, the architecture and lessons learned are applicable to other space-related programs. This paper begins with a brief history on why Internet-based real-time data is important and where EZStream or products like it fit in the flow of data from orbit to experimenter/researcher. A high-level architecture is then presented along with explanations of the components used. A combination of commercial-off-the-shelf (COTS), Open Source, and custom components are discussed. The use of standard protocols is shown along with some details on how data flows between server and client. Some examples are presented to illustrate how a system like EZStream can be used in real world applications and how care was taken to make the end-user experience as painless as possible. A system such as EZStream has potential in the commercial (non-ISS) arena and some possibilities are presented. During the development and fielding of EZStream, a lot was learned. Good and not so good decisions were made. Some of the major lessons learned will be shared. The development of EZStream is continuing and the future of EZStream will be discussed to shed some light over the technological horizon.

  2. Adolescent-specific patterns of behavior and neural activity during social reinforcement learning

    OpenAIRE

    Jones, Rebecca M.; Somerville, Leah H.; Li, Jian; Ruberry, Erika J.; Powers, Alisa; Mehta, Natasha; Dyke, Jonathan; Casey, BJ

    2014-01-01

    Humans are sophisticated social beings. Social cues from others are exceptionally salient, particularly during adolescence. Understanding how adolescents interpret and learn from variable social signals can provide insight into the observed shift in social sensitivity during this period. The current study tested 120 participants between the ages of 8 and 25 years on a social reinforcement learning task where the probability of receiving positive social feedback was parametrically manipulated....

  3. High and low temperatures have unequal reinforcing properties in Drosophila spatial learning.

    Science.gov (United States)

    Zars, Melissa; Zars, Troy

    2006-07-01

    Small insects regulate their body temperature solely through behavior. Thus, sensing environmental temperature and implementing an appropriate behavioral strategy can be critical for survival. The fly Drosophila melanogaster prefers 24 degrees C, avoiding higher and lower temperatures when tested on a temperature gradient. Furthermore, temperatures above 24 degrees C have negative reinforcing properties. In contrast, we found that flies have a preference in operant learning experiments for a low-temperature-associated position rather than the 24 degrees C alternative in the heat-box. Two additional differences between high- and low-temperature reinforcement, i.e., temperatures above and below 24 degrees C, were found. Temperatures equally above and below 24 degrees C did not reinforce equally and only high temperatures supported increased memory performance with reversal conditioning. Finally, low- and high-temperature reinforced memories are similarly sensitive to two genetic mutations. Together these results indicate the qualitative meaning of temperatures below 24 degrees C depends on the dynamics of the temperatures encountered and that the reinforcing effects of these temperatures depend on at least some common genetic components. Conceptualizing these results using the Wolf-Heisenberg model of operant conditioning, we propose the maximum difference in experienced temperatures determines the magnitude of the reinforcement input to a conditioning circuit.

  4. Spared internal but impaired external reward prediction error signals in major depressive disorder during reinforcement learning.

    Science.gov (United States)

    Bakic, Jasmina; Pourtois, Gilles; Jepma, Marieke; Duprat, Romain; De Raedt, Rudi; Baeken, Chris

    2017-01-01

    Major depressive disorder (MDD) creates debilitating effects on a wide range of cognitive functions, including reinforcement learning (RL). In this study, we sought to assess whether reward processing as such, or alternatively the complex interplay between motivation and reward might potentially account for the abnormal reward-based learning in MDD. A total of 35 treatment resistant MDD patients and 44 age matched healthy controls (HCs) performed a standard probabilistic learning task. RL was titrated using behavioral, computational modeling and event-related brain potentials (ERPs) data. MDD patients showed comparable learning rate compared to HCs. However, they showed decreased lose-shift responses as well as blunted subjective evaluations of the reinforcers used during the task, relative to HCs. Moreover, MDD patients showed normal internal (at the level of error-related negativity, ERN) but abnormal external (at the level of feedback-related negativity, FRN) reward prediction error (RPE) signals during RL, selectively when additional efforts had to be made to establish learning. Collectively, these results lend support to the assumption that MDD does not impair reward processing per se during RL. Instead, it seems to alter the processing of the emotional value of (external) reinforcers during RL, when additional intrinsic motivational processes have to be engaged. © 2016 Wiley Periodicals, Inc.

  5. Constructivist Learning Environment During Virtual and Real Laboratory Activities

    Directory of Open Access Journals (Sweden)

    Ari Widodo

    2017-04-01

    Laboratory activities and constructivism are two notions that have been playing significant roles in science education. Despite common beliefs about the importance of laboratory activities, reviews reported inconsistent results about the effectiveness of laboratory activities. Since laboratory activities can be expensive and take more time, there is an effort to introduce virtual laboratory activities. This study aims at exploring the learning environment created by a virtual laboratory and a real laboratory. A quasi-experimental study was conducted in two grade ten classes at a state high school in Bandung, Indonesia. Data were collected using a questionnaire called the Constructivist Learning Environment Survey (CLES) before and after the laboratory activities. The results show that both types of laboratories can create constructivist learning environments. Each type of laboratory activity, however, may be stronger in improving certain aspects compared to the other. While a virtual laboratory is stronger in improving critical voice and personal relevance, real laboratory activities promote aspects of personal relevance, uncertainty and student negotiation. This study suggests that instead of setting one type of laboratory against the other, lessons and follow up studies should focus on how to combine both types of laboratories to support better learning.

  6. Emotion in reinforcement learning agents and robots: A survey

    OpenAIRE

    Moerland, T.M.; Broekens, D.J.; Jonker, C.M.

    2018-01-01

    This article provides the first survey of computational models of emotion in reinforcement learning (RL) agents. The survey focuses on agent/robot emotions, and mostly ignores human user emotions. Emotions are recognized as functional in decision-making by influencing motivation and action selection. Therefore, computational emotion models are usually grounded in the agent's decision making architecture, of which RL is an important subclass. Studying emotions in RL-based agents is useful for ...

  7. Process algebra with timing : real time and discrete time

    NARCIS (Netherlands)

    Baeten, J.C.M.; Middelburg, C.A.; Bergstra, J.A.; Ponse, A.J.; Smolka, S.A.

    2001-01-01

    We present real time and discrete time versions of ACP with absolute timing and relative timing. The starting-point is a new real time version with absolute timing, called ACPsat, featuring urgent actions and a delay operator. The discrete time versions are conservative extensions of the discrete

  8. Process algebra with timing: Real time and discrete time

    NARCIS (Netherlands)

    Baeten, J.C.M.; Middelburg, C.A.

    1999-01-01

    We present real time and discrete time versions of ACP with absolute timing and relative timing. The starting point is a new real time version with absolute timing, called ACPsat, featuring urgent actions and a delay operator. The discrete time versions are conservative extensions of the discrete

  9. Brain Circuits of Methamphetamine Place Reinforcement Learning: The Role of the Hippocampus-VTA Loop.

    Science.gov (United States)

    Keleta, Yonas B; Martinez, Joe L

    2012-03-01

    The reinforcing effects of addictive drugs including methamphetamine (METH) involve the midbrain ventral tegmental area (VTA). The VTA is the primary source of dopamine (DA) to the nucleus accumbens (NAc) and the ventral hippocampus (VHC). These three brain regions are functionally connected through the hippocampal-VTA loop that includes two main neural pathways: the bottom-up pathway and the top-down pathway. In this paper, we take the view that addiction is a learning process. Therefore, we tested the involvement of the hippocampus in reinforcement learning by studying conditioned place preference (CPP) learning by sequentially conditioning each of the three nuclei in either the bottom-up order of conditioning (VTA, then VHC, finally NAc) or the top-down order (VHC, then VTA, finally NAc). Following habituation, the rats underwent experimental modules consisting of two conditioning trials each followed by immediate testing (test 1 and test 2) and two additional tests 24 h (test 3) and/or 1 week following conditioning (test 4). The module was repeated three times for each nucleus. The results showed that METH, but not Ringer's, produced positive CPP following conditioning of each brain area in the bottom-up order. In the top-down order, METH, but not Ringer's, produced either an aversive CPP or no learning effect following conditioning of each nucleus of interest. In addition, METH place aversion was antagonized by coadministration of the N-methyl-d-aspartate (NMDA) receptor antagonist MK801, suggesting that the aversion learning was an NMDA receptor activation-dependent process. We conclude that the hippocampus is a critical structure in the reward circuit and hence suggest that the development of target-specific therapeutics for the control of addiction emphasize the hippocampus-VTA top-down connection.

  10. Quality of E-Learners’ Time and Learning Performance Beyond Quantitative Time-on-Task

    Directory of Open Access Journals (Sweden)

    Margarida Romero

    2011-06-01

    Along with the amount of time spent learning (or time-on-task), the quality of learning time has a real influence on learning performance. Quality of time in online learning depends on students’ time availability and their willingness to devote quality cognitive time to learning activities. However, the quantity and quality of the time spent by adult e-learners on learning activities can be reduced by professional, family, and social commitments. Considering that the main time pattern followed by most adult e-learners is a professional one, it may be beneficial for online education programs to offer a certain degree of flexibility in instructional time that might allow adult learners to adjust their learning times to their professional constraints. However, using the time left over once professional and family requirements have been fulfilled could lead to a reduction in quality time for learning. This paper starts by introducing the concept of quality of learning time from an online student-centred perspective. The impact of students’ time-related variables (working hours, time-on-task engagement, time flexibility, time of day, day of week) is then analyzed according to individual and collaborative grades achieved during an online master’s degree program. The data show that both students’ time flexibility (r = .98) and especially their availability to learn in the morning are related to better grades in individual (r = .93) and collaborative activities (r = .46).

  11. Real-time radiography

    International Nuclear Information System (INIS)

    Bossi, R.H.; Oien, C.T.

    1981-01-01

    Real-time radiography is used for imaging both dynamic events and static objects. Fluorescent screens play an important role in converting radiation to light, which is then observed directly or intensified and detected. The radiographic parameters for real-time radiography are similar to conventional film radiography with special emphasis on statistics and magnification. Direct-viewing fluoroscopy uses the human eye as a detector of fluorescent screen light or the light from an intensifier. Remote-viewing systems replace the human observer with a television camera. The remote-viewing systems have many advantages over the direct-viewing conditions such as safety, image enhancement, and the capability to produce permanent records. This report reviews real-time imaging system parameters and components

  12. Real-time vision systems

    Energy Technology Data Exchange (ETDEWEB)

    Johnson, R.; Hernandez, J.E.; Lu, Shin-yee [Lawrence Livermore National Lab., CA (United States)

    1994-11-15

    Many industrial and defence applications require an ability to make instantaneous decisions based on sensor input of a time varying process. Such systems are referred to as 'real-time systems' because they process and act on data as it occurs in time. When a vision sensor is used in a real-time system, the processing demands can be quite substantial, with typical data rates of 10-20 million samples per second. A real-time Machine Vision Laboratory (MVL) was established in FY94 to extend our years of experience in developing computer vision algorithms to include the development and implementation of real-time vision systems. The laboratory is equipped with a variety of hardware components, including Datacube image acquisition and processing boards, a Sun workstation, and several different types of CCD cameras, including monochrome and color area cameras and analog and digital line-scan cameras. The equipment is reconfigurable for prototyping different applications. This facility has been used to support several programs at LLNL, including O Division's Peacemaker and Deadeye Projects as well as the CRADA with the U.S. Textile Industry, CAFE (Computer Aided Fabric Inspection). To date, we have successfully demonstrated several real-time applications: bullet tracking, stereo tracking and ranging, and web inspection. This work has been documented in the ongoing development of a real-time software library.

  13. Reinforcement learning signals in the human striatum distinguish learners from nonlearners during reward-based decision making.

    Science.gov (United States)

    Schönberg, Tom; Daw, Nathaniel D; Joel, Daphna; O'Doherty, John P

    2007-11-21

    The computational framework of reinforcement learning has been used to forward our understanding of the neural mechanisms underlying reward learning and decision-making behavior. It is known that humans vary widely in their performance in decision-making tasks. Here, we used a simple four-armed bandit task in which subjects are almost evenly split into two groups on the basis of their performance: those who do learn to favor choice of the optimal action and those who do not. Using models of reinforcement learning we sought to determine the neural basis of these intrinsic differences in performance by scanning both groups with functional magnetic resonance imaging. We scanned 29 subjects while they performed the reward-based decision-making task. Our results suggest that these two groups differ markedly in the degree to which reinforcement learning signals in the striatum are engaged during task performance. While the learners showed robust prediction error signals in both the ventral and dorsal striatum during learning, the nonlearner group showed a marked absence of such signals. Moreover, the magnitude of prediction error signals in a region of dorsal striatum correlated significantly with a measure of behavioral performance across all subjects. These findings support a crucial role of prediction error signals, likely originating from dopaminergic midbrain neurons, in enabling learning of action selection preferences on the basis of obtained rewards. Thus, spontaneously observed individual differences in decision making performance demonstrate the suggested dependence of this type of learning on the functional integrity of the dopaminergic striatal system in humans.
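
    The kind of model referred to above can be sketched as a four-armed bandit learner that updates the value of the chosen arm from a reward prediction error. This is only an illustration: the abstract does not state which model was fitted, so the delta rule, the softmax choice rule, the parameters `alpha` and `beta`, and the reward probabilities passed to `run_bandit` are all assumptions.

```python
import math
import random


def softmax_choice(values, beta, rng):
    """Pick an arm with probability proportional to exp(beta * value)."""
    weights = [math.exp(beta * v) for v in values]
    threshold = rng.random() * sum(weights)
    cumulative = 0.0
    for arm, weight in enumerate(weights):
        cumulative += weight
        if threshold < cumulative:
            return arm
    return len(values) - 1  # guard against floating-point rounding


def run_bandit(reward_probs, n_trials=200, alpha=0.2, beta=5.0, seed=0):
    """Simulate a delta-rule learner on a bandit with Bernoulli rewards.

    reward_probs, alpha (learning rate) and beta (inverse temperature)
    are hypothetical values chosen for illustration only.
    """
    rng = random.Random(seed)
    values = [0.0] * len(reward_probs)
    choices, prediction_errors = [], []
    for _ in range(n_trials):
        arm = softmax_choice(values, beta, rng)
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        delta = reward - values[arm]   # reward prediction error
        values[arm] += alpha * delta   # move the estimate toward the reward
        choices.append(arm)
        prediction_errors.append(delta)
    return values, choices, prediction_errors
```

    Calling `run_bandit([0.2, 0.4, 0.6, 0.8])` returns the learned values together with the trial-by-trial choices and prediction errors; a series like the latter is what imaging studies of this type typically regress against brain activity.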

  14. Toward a real-time system for temporal enhanced ultrasound-guided prostate biopsy.

    Science.gov (United States)

    Azizi, Shekoofeh; Van Woudenberg, Nathan; Sojoudi, Samira; Li, Ming; Xu, Sheng; Abu Anas, Emran M; Yan, Pingkun; Tahmasebi, Amir; Kwak, Jin Tae; Turkbey, Baris; Choyke, Peter; Pinto, Peter; Wood, Bradford; Mousavi, Parvin; Abolmaesumi, Purang

    2018-03-27

    We have previously proposed temporal enhanced ultrasound (TeUS) as a new paradigm for tissue characterization. TeUS is based on analyzing a sequence of ultrasound data with deep learning and has been demonstrated to be successful for detection of cancer in ultrasound-guided prostate biopsy. Our aim is to enable the dissemination of this technology to the community for large-scale clinical validation. In this paper, we present a unified software framework demonstrating near-real-time analysis of ultrasound data stream using a deep learning solution. The system integrates ultrasound imaging hardware, visualization and a deep learning back-end to build an accessible, flexible and robust platform. A client-server approach is used in order to run computationally expensive algorithms in parallel. We demonstrate the efficacy of the framework using two applications as case studies. First, we show that prostate cancer detection using near-real-time analysis of RF and B-mode TeUS data and deep learning is feasible. Second, we present real-time segmentation of ultrasound prostate data using an integrated deep learning solution. The system is evaluated for cancer detection accuracy on ultrasound data obtained from a large clinical study with 255 biopsy cores from 157 subjects. It is further assessed with an independent dataset with 21 biopsy targets from six subjects. In the first study, we achieve area under the curve, sensitivity, specificity and accuracy of 0.94, 0.77, 0.94 and 0.92, respectively, for the detection of prostate cancer. In the second study, we achieve an AUC of 0.85. Our results suggest that TeUS-guided biopsy can be potentially effective for the detection of prostate cancer.

  15. Important variables in explaining real-time peak price in the independent power market of Ontario

    International Nuclear Information System (INIS)

    Rueda, I.E.A.; Marathe, A.

    2005-01-01

    This paper uses a support vector machine (SVM) based learning algorithm to select important variables that help explain the real-time peak electricity price in the Ontario market. The Ontario market was opened to competition only in May 2002. Due to the limited number of observations available, finding a set of variables that can explain the independent power market of Ontario (IMO) real-time peak price is a significant challenge for traders and analysts. The kernel regressions of the explanatory variables on the IMO real-time average peak price show that non-linear dependencies exist between the explanatory variables and the IMO price. This non-linear relationship, combined with the low variable-observation ratio, rules out conventional statistical analysis. Hence, we use an alternative machine learning technique to find the important explanatory variables for the IMO real-time average peak price. Results based on SVM sensitivity analysis show that the IMO's predispatch average peak price, the actual import peak volume, the peak load of the Ontario market and the net available supply after accounting for load (energy excess) are some of the most important variables in explaining the real-time average peak price in the Ontario electricity market. (author)

  16. Real-time fMRI neurofeedback: Progress and challenges

    Science.gov (United States)

    Sulzer, J.; Haller, S.; Scharnowski, F.; Weiskopf, N.; Birbaumer, N.; Blefari, M.L.; Bruehl, A.B.; Cohen, L.G.; deCharms, R.C.; Gassert, R.; Goebel, R.; Herwig, U.; LaConte, S.; Linden, D.; Luft, A.; Seifritz, E.; Sitaram, R.

    2016-01-01

    In February of 2012, the first international conference on real time functional magnetic resonance imaging (rtfMRI) neurofeedback was held at the Swiss Federal Institute of Technology Zurich (ETHZ), Switzerland. This review summarizes progress in the field, introduces current debates, elucidates open questions, and offers viewpoints derived from the conference. The review offers perspectives on study design, scientific and clinical applications, rtfMRI learning mechanisms and future outlook. PMID:23541800

  17. Reinforcement Learning in Distributed Domains: Beyond Team Games

    Science.gov (United States)

    Wolpert, David H.; Sill, Joseph; Turner, Kagan

    2000-01-01

    Distributed search algorithms are crucial in dealing with large optimization problems, particularly when a centralized approach is not only impractical but infeasible. Many machine learning concepts have been applied to search algorithms in order to improve their effectiveness. In this article we present an algorithm that blends Reinforcement Learning (RL) and hill climbing directly, by using the RL signal to guide the exploration step of a hill climbing algorithm. We apply this algorithm to the domain of a constellation of communication satellites where the goal is to minimize the loss of importance-weighted data. We introduce the concept of 'ghost' traffic, where correctly setting this traffic induces the satellites to act to optimize the world utility. Our results indicate that the bi-utility search introduced in this paper outperforms both traditional hill climbing algorithms and distributed RL approaches such as team games.

  18. Dopaminergic control of motivation and reinforcement learning: a closed-circuit account for reward-oriented behavior.

    Science.gov (United States)

    Morita, Kenji; Morishima, Mieko; Sakai, Katsuyuki; Kawaguchi, Yasuo

    2013-05-15

    Humans and animals take actions quickly when they expect that the actions lead to reward, reflecting their motivation. Injection of dopamine receptor antagonists into the striatum has been shown to slow such reward-seeking behavior, suggesting that dopamine is involved in the control of motivational processes. Meanwhile, neurophysiological studies have revealed that phasic response of dopamine neurons appears to represent reward prediction error, indicating that dopamine plays central roles in reinforcement learning. However, previous attempts to elucidate the mechanisms of these dopaminergic controls have not fully explained how the motivational and learning aspects are related and whether they can be understood by the way the activity of dopamine neurons itself is controlled by their upstream circuitries. To address this issue, we constructed a closed-circuit model of the corticobasal ganglia system based on recent findings regarding intracortical and corticostriatal circuit architectures. Simulations show that the model could reproduce the observed distinct motivational effects of D1- and D2-type dopamine receptor antagonists. Simultaneously, our model successfully explains the dopaminergic representation of reward prediction error as observed in behaving animals during learning tasks and could also explain distinct choice biases induced by optogenetic stimulation of the D1 and D2 receptor-expressing striatal neurons. These results indicate that the suggested roles of dopamine in motivational control and reinforcement learning can be understood in a unified manner through a notion that the indirect pathway of the basal ganglia represents the value of states/actions at a previous time point, an empirically driven key assumption of our model.

  19. Real-time dosimetry system in catheterisation laboratory: utility as a learning tool in radiation protection

    International Nuclear Information System (INIS)

    Pinto Monedero, M.; Rodriguez Cobo, C.; Pifarre Martinez, X.; Ruiz Martin, J.; Barros Candelero, J.M.; Goicolea Ruigomez, J.; Diaz Blaires, G.; Garcia Lunar, I.

    2015-01-01

    Document available in abstract form only. Full text of publication follows: Workers at the catheter laboratory are among the most exposed to ionising radiation in hospitals. However, it is difficult to be certain of the radiation doses received by the staff, as personal dosemeters are often misused, and thus, the dose history is not reliable. Moreover, the information provided by personal dosemeters corresponds to the monthly accumulated dose, so corrective actions tend to be delayed. The purpose of this work is, on the one hand, to use a real-time dosimetry system to establish the occupational doses per procedure of workers at the catheter laboratories and, on the other hand, to evaluate its utility as a learning tool for radiation protection purposes with the simultaneous video recording of the interventions. (authors)

  20. Cerebellar and prefrontal cortex contributions to adaptation, strategies, and reinforcement learning.

    Science.gov (United States)

    Taylor, Jordan A; Ivry, Richard B

    2014-01-01

    Traditionally, motor learning has been studied as an implicit learning process, one in which movement errors are used to improve performance in a continuous, gradual manner. The cerebellum figures prominently in this literature given well-established ideas about the role of this system in error-based learning and the production of automatized skills. Recent developments have brought into focus the relevance of multiple learning mechanisms for sensorimotor learning. These include processes involving repetition, reinforcement learning, and strategy utilization. We examine these developments, considering their implications for understanding cerebellar function and how this structure interacts with other neural systems to support motor learning. Converging lines of evidence from behavioral, computational, and neuropsychological studies suggest a fundamental distinction between processes that use error information to improve action execution or action selection. While the cerebellum is clearly linked to the former, its role in the latter remains an open question. © 2014 Elsevier B.V. All rights reserved.

  1. A Reinforcement Learning Framework for Spiking Networks with Dynamic Synapses

    Directory of Open Access Journals (Sweden)

    Karim El-Laithy

    2011-01-01

    Full Text Available An integration of both the Hebbian-based and reinforcement learning (RL) rules is presented for dynamic synapses. The proposed framework permits the Hebbian rule to update the hidden synaptic model parameters regulating the synaptic response rather than the synaptic weights. This is performed using both the value and the sign of the temporal difference in the reward signal after each trial. Applying this framework, a spiking network with spike-timing-dependent synapses is tested to learn the exclusive-OR computation on a temporally coded basis. Reward values are calculated from the distance between the output spike train of the network and a reference target one. Results show that the network is able to capture the required dynamics and that the proposed framework can indeed reveal an integrated version of Hebbian and RL. The proposed framework is tractable and less computationally expensive. The framework is applicable to a wide class of synaptic models and is not restricted to the used neural representation. This generality, along with the reported results, supports adopting the introduced approach to benefit from the biologically plausible synaptic models in a wide range of intuitive signal processing.

  2. The role of multiple neuromodulators in reinforcement learning that is based on competition between eligibility traces

    Directory of Open Access Journals (Sweden)

    Marco A Huertas

    2016-12-01

    Full Text Available The ability to maximize reward and avoid punishment is essential for animal survival. Reinforcement learning (RL) refers to the algorithms used by biological or artificial systems to learn how to maximize reward or avoid negative outcomes based on past experiences. While RL is also important in machine learning, the types of mechanistic constraints encountered by biological machinery might be different than those for artificial systems. Two major problems encountered by RL are how to relate a stimulus with a reinforcing signal that is delayed in time (temporal credit assignment), and how to stop learning once the target behaviors are attained (stopping rule). To address the first problem, synaptic eligibility traces were introduced, bridging the temporal gap between a stimulus and its reward. Although these were mere theoretical constructs, recent experiments have provided evidence of their existence. These experiments also reveal that the presence of specific neuromodulators converts the traces into changes in synaptic efficacy. A mechanistic implementation of the stopping rule usually assumes the inhibition of the reward nucleus; however, recent experimental results have shown that learning terminates at the appropriate network state even in setups where the reward cannot be inhibited. In an effort to describe a learning rule that solves the temporal credit assignment problem and implements a biologically plausible stopping rule, we proposed a model based on two separate synaptic eligibility traces, one for long-term potentiation (LTP) and one for long-term depression (LTD), each obeying different dynamics and having different effective magnitudes. The model has been shown to successfully generate stable learning in recurrent networks. Although the model assumes the presence of a single neuromodulator, evidence indicates that there are different neuromodulators for expressing the different traces. What could be the role of different

  3. The Role of Multiple Neuromodulators in Reinforcement Learning That Is Based on Competition between Eligibility Traces.

    Science.gov (United States)

    Huertas, Marco A; Schwettmann, Sarah E; Shouval, Harel Z

    2016-01-01

    The ability to maximize reward and avoid punishment is essential for animal survival. Reinforcement learning (RL) refers to the algorithms used by biological or artificial systems to learn how to maximize reward or avoid negative outcomes based on past experiences. While RL is also important in machine learning, the types of mechanistic constraints encountered by biological machinery might be different than those for artificial systems. Two major problems encountered by RL are how to relate a stimulus with a reinforcing signal that is delayed in time (temporal credit assignment), and how to stop learning once the target behaviors are attained (stopping rule). To address the first problem, synaptic eligibility traces were introduced, bridging the temporal gap between a stimulus and its reward. Although these were mere theoretical constructs, recent experiments have provided evidence of their existence. These experiments also reveal that the presence of specific neuromodulators converts the traces into changes in synaptic efficacy. A mechanistic implementation of the stopping rule usually assumes the inhibition of the reward nucleus; however, recent experimental results have shown that learning terminates at the appropriate network state even in setups where the reward nucleus cannot be inhibited. In an effort to describe a learning rule that solves the temporal credit assignment problem and implements a biologically plausible stopping rule, we proposed a model based on two separate synaptic eligibility traces, one for long-term potentiation (LTP) and one for long-term depression (LTD), each obeying different dynamics and having different effective magnitudes. The model has been shown to successfully generate stable learning in recurrent networks. Although the model assumes the presence of a single neuromodulator, evidence indicates that there are different neuromodulators for expressing the different traces. What could be the role of different neuromodulators for

  4. Real-time beam monitoring in scanned proton therapy

    Science.gov (United States)

    Klimpki, G.; Eichin, M.; Bula, C.; Rechsteiner, U.; Psoroulas, S.; Weber, D. C.; Lomax, A.; Meer, D.

    2018-05-01

    When treating cancerous tissues with proton beams, many centers make use of a step-and-shoot irradiation technique, in which the beam is steered to discrete grid points in the tumor volume. For safety reasons, the irradiation is supervised by an independent monitoring system validating cyclically that the correct amount of protons has been delivered to the correct position in the patient. Whenever unacceptable inaccuracies are detected, the irradiation can be interrupted to reinforce a high degree of radiation protection. At the Paul Scherrer Institute, we plan to irradiate tumors continuously. By giving up the idea of discrete grid points, we aim to be faster and more flexible in the irradiation. But the increase in speed and dynamics necessitates a highly responsive monitoring system to guarantee the same level of patient safety as for conventional step-and-shoot irradiations. Hence, we developed and implemented real-time monitoring of the proton beam current and position. As such, we read out diagnostic devices at 100 kHz and compare their signals against safety tolerances in an FPGA. In this paper, we report on necessary software and firmware enhancements of our control system and test their functionality based on three exemplary error scenarios. We demonstrate successful implementation of real-time beam monitoring and, consequently, compliance with international patient safety regulations.
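
    The comparison of sampled signals against safety tolerances can be illustrated with a short sketch. It shows only the logic of a per-sample check (at 100 kHz, one sample every 10 µs); the system described above runs in an FPGA, and the names `Tolerance` and `first_violation`, as well as all numeric values, are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Tolerance:
    """Hypothetical tolerance band: expected value plus allowed deviation."""
    expected: float
    max_deviation: float

    def ok(self, measured: float) -> bool:
        return abs(measured - self.expected) <= self.max_deviation


def first_violation(
    samples: Iterable[Tuple[float, float]],
    current_tol: Tolerance,
    position_tol: Tolerance,
) -> Optional[Tuple[int, str]]:
    """Return (sample index, signal name) of the first out-of-tolerance
    reading, or None if every (current, position) sample is acceptable."""
    for index, (current, position) in enumerate(samples):
        if not current_tol.ok(current):
            return index, "current"
        if not position_tol.ok(position):
            return index, "position"
    return None
```

    In a monitoring loop, a non-`None` result would be the point at which the irradiation is interrupted.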

  5. Direct modulation of aberrant brain network connectivity through real-time NeuroFeedback.

    Science.gov (United States)

    Ramot, Michal; Kimmich, Sara; Gonzalez-Castillo, Javier; Roopchansingh, Vinai; Popal, Haroon; White, Emily; Gotts, Stephen J; Martin, Alex

    2017-09-16

    The existence of abnormal connectivity patterns between resting state networks in neuropsychiatric disorders, including Autism Spectrum Disorder (ASD), has been well established. Traditional treatment methods in ASD are limited, and do not address the aberrant network structure. Using real-time fMRI neurofeedback, we directly trained three brain nodes in participants with ASD, in which the aberrant connectivity has been shown to correlate with symptom severity. Desired network connectivity patterns were reinforced in real-time, without participants' awareness of the training taking place. This training regimen produced large, significant long-term changes in correlations at the network level, and whole brain analysis revealed that the greatest changes were focused on the areas being trained. These changes were not found in the control group. Moreover, changes in ASD resting state connectivity following the training were correlated to changes in behavior, suggesting that neurofeedback can be used to directly alter complex, clinically relevant network connectivity patterns.

  6. Direct modulation of aberrant brain network connectivity through real-time NeuroFeedback

    Science.gov (United States)

    Kimmich, Sara; Gonzalez-Castillo, Javier; Roopchansingh, Vinai; Popal, Haroon; White, Emily; Gotts, Stephen J; Martin, Alex

    2017-01-01

    The existence of abnormal connectivity patterns between resting state networks in neuropsychiatric disorders, including Autism Spectrum Disorder (ASD), has been well established. Traditional treatment methods in ASD are limited, and do not address the aberrant network structure. Using real-time fMRI neurofeedback, we directly trained three brain nodes in participants with ASD, in which the aberrant connectivity has been shown to correlate with symptom severity. Desired network connectivity patterns were reinforced in real-time, without participants’ awareness of the training taking place. This training regimen produced large, significant long-term changes in correlations at the network level, and whole brain analysis revealed that the greatest changes were focused on the areas being trained. These changes were not found in the control group. Moreover, changes in ASD resting state connectivity following the training were correlated to changes in behavior, suggesting that neurofeedback can be used to directly alter complex, clinically relevant network connectivity patterns. PMID:28917059

  7. Neural mechanisms of reinforcement learning in unmedicated patients with major depressive disorder.

    Science.gov (United States)

    Rothkirch, Marcus; Tonn, Jonas; Köhler, Stephan; Sterzer, Philipp

    2017-04-01

    According to current concepts, major depressive disorder is strongly related to dysfunctional neural processing of motivational information, entailing impairments in reinforcement learning. While computational modelling can reveal the precise nature of neural learning signals, it has not been used to study learning-related neural dysfunctions in unmedicated patients with major depressive disorder so far. We thus aimed at comparing the neural coding of reward and punishment prediction errors, representing indicators of neural learning-related processes, between unmedicated patients with major depressive disorder and healthy participants. To this end, a group of unmedicated patients with major depressive disorder (n = 28) and a group of age- and sex-matched healthy control participants (n = 30) completed an instrumental learning task involving monetary gains and losses during functional magnetic resonance imaging. The two groups did not differ in their learning performance. Patients and control participants showed the same level of prediction error-related activity in the ventral striatum and the anterior insula. In contrast, neural coding of reward prediction errors in the medial orbitofrontal cortex was reduced in patients. Moreover, neural reward prediction error signals in the medial orbitofrontal cortex and ventral striatum showed negative correlations with anhedonia severity. Using a standard instrumental learning paradigm we found no evidence for an overall impairment of reinforcement learning in medication-free patients with major depressive disorder. Importantly, however, the attenuated neural coding of reward in the medial orbitofrontal cortex and the relation between anhedonia and reduced reward prediction error-signalling in the medial orbitofrontal cortex and ventral striatum likely reflect an impairment in experiencing pleasure from rewarding events as a key mechanism of anhedonia in major depressive disorder. © The Author (2017). Published by Oxford

  8. Machine Learning-based Transient Brokers for Real-time Classification of the LSST Alert Stream

    Science.gov (United States)

    Narayan, Gautham; Zaidi, Tayeb; Soraisam, Monika; ANTARES Collaboration

    2018-01-01

    The number of transient events discovered by wide-field time-domain surveys already far outstrips the combined followup resources of the astronomical community. This number will only increase as we progress towards the commissioning of the Large Synoptic Survey Telescope (LSST), breaking the community's current followup paradigm. Transient brokers - software to sift through, characterize, annotate and prioritize events for followup - will be a critical tool for managing alert streams in the LSST era. Developing the algorithms that underlie the brokers, and obtaining simulated LSST-like datasets prior to LSST commissioning, to train and test these algorithms are formidable, though not insurmountable challenges. The Arizona-NOAO Temporal Analysis and Response to Events System (ANTARES) is a joint project of the National Optical Astronomy Observatory and the Department of Computer Science at the University of Arizona. We have been developing completely automated methods to characterize and classify variable and transient events from their multiband optical photometry. We describe the hierarchical ensemble machine learning algorithm we are developing, and test its performance on sparse, unevenly sampled, heteroskedastic data from various existing observational campaigns, as well as our progress towards incorporating these into a real-time event broker working on live alert streams from time-domain surveys.

  9. Learning-based traffic signal control algorithms with neighborhood information sharing: An application for sustainable mobility

    Energy Technology Data Exchange (ETDEWEB)

    Aziz, H. M. Abdul [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Zhu, Feng [Purdue University, West Lafayette, IN (United States). Lyles School of Civil Engineering; Ukkusuri, Satish V. [Purdue University, West Lafayette, IN (United States). Lyles School of Civil Engineering

    2017-10-04

    This research applies an R-Markov Average Reward Technique based reinforcement learning (RL) algorithm, RMART, to the vehicular signal control problem, leveraging information sharing among signal controllers in a connected vehicle environment. We implemented the algorithm in a network of 18 signalized intersections and compared the performance of RMART with fixed, adaptive, and variants of the RL schemes. Results show significant improvement in system performance for the RMART algorithm with information sharing over both traditional fixed signal timing plans and real-time adaptive control schemes. Additionally, the comparison with reinforcement learning algorithms including Q-learning and SARSA indicates that RMART performs better at higher congestion levels. Further, a multi-reward structure is proposed that dynamically adjusts the reward function with varying congestion states at the intersection. Finally, the results from test networks show significant reduction in emissions (CO, CO2, NOx, VOC, PM10) when RL algorithms are implemented compared to fixed signal timings and adaptive schemes.
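
    For orientation, the two baseline algorithms named above differ only in how they bootstrap from the next state. The sketch below shows the standard tabular update rules; it does not show RMART, whose update is not given in the abstract. The state and action labels used with it, and the values of `alpha` and `gamma`, are placeholders.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the best action in next_state."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return td_error


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually taken next."""
    td_error = reward + gamma * q[(next_state, next_action)] - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return td_error


def new_q_table():
    """Q-table keyed by (state, action) pairs, defaulting to 0.0."""
    return defaultdict(float)
```

    With a signal controller, a state might encode queue lengths and an action might be a phase choice; those encodings are design decisions not specified here.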

  10. Essays in real-time forecasting

    OpenAIRE

    Liebermann, Joelle

    2012-01-01

    This thesis contains three essays in the field of real-time econometrics, and more particularly forecasting. The issue of using data as available in real-time to forecasters, policymakers or financial markets is an important one which has only recently been taken on board in the empirical literature. Data available and used in real-time are preliminary and differ from ex-post revised data, and given that data revisions may be quite substantial, the use of latest available instead of real-time can s...

  11. The Study of Reinforcement Learning for Traffic Self-Adaptive Control under Multiagent Markov Game Environment

    Directory of Open Access Journals (Sweden)

    Lun-Hui Xu

    2013-01-01

    Full Text Available The urban traffic self-adaptive control problem is dynamic and uncertain, so the states of the traffic environment are hard to observe. An efficient agent that controls a single intersection can be discovered automatically via multiagent reinforcement learning. However, in the majority of the previous works on this approach, each agent needed perfectly observed information when interacting with the environment and learned individually with less efficient coordination. This study casts traffic self-adaptive control as a multiagent Markov game problem. The design employs a traffic signal control agent (TSCA) for each signalized intersection that coordinates with neighboring TSCAs. A mathematical model for the TSCAs’ interaction is built based on a nonzero-sum Markov game, which is applied to let TSCAs learn how to cooperate. A multiagent Markov game reinforcement learning approach is constructed on the basis of single-agent Q-learning. This method lets each TSCA learn to update its Q-values under the joint actions and imperfect information. The convergence of the proposed algorithm is analyzed theoretically. The simulation results show that the proposed method is convergent and effective in a realistic traffic self-adaptive control setting.

  12. TensorFlow Agents: Efficient Batched Reinforcement Learning in TensorFlow

    OpenAIRE

    Hafner, Danijar; Davidson, James; Vanhoucke, Vincent

    2017-01-01

    We introduce TensorFlow Agents, an efficient infrastructure paradigm for building parallel reinforcement learning algorithms in TensorFlow. We simulate multiple environments in parallel, and group them to perform the neural network computation on a batch rather than individual observations. This allows the TensorFlow execution engine to parallelize computation, without the need for manual synchronization. Environments are stepped in separate Python processes to progress them in parallel witho...
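
    The batching idea described above can be pictured with a small, library-free sketch: several environments are grouped behind one interface so that the policy is evaluated once per step on the whole batch. This is not the TensorFlow Agents API; `CounterEnv`, `BatchEnv` and `batch_policy` are hypothetical names, and the environments here are stepped sequentially in one process rather than in separate Python processes.

```python
class CounterEnv:
    """Toy environment: the state counts up until it reaches a target."""

    def __init__(self, target):
        self.target = target
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state += action
        done = self.state >= self.target
        reward = 1.0 if done else 0.0
        return self.state, reward, done


class BatchEnv:
    """Group several environments so they are stepped with one call."""

    def __init__(self, envs):
        self.envs = envs

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        results = [env.step(a) for env, a in zip(self.envs, actions)]
        observations, rewards, dones = (list(column) for column in zip(*results))
        return observations, rewards, dones


def batch_policy(observations):
    """Stand-in for a single batched network call: one action per observation."""
    return [1 for _ in observations]
```

    A training loop would alternate `batch_policy(observations)` and `batch.step(actions)`, so the cost of the policy evaluation is paid once per step for all environments.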

  13. Measuring reinforcement learning and motivation constructs in experimental animals: relevance to the negative symptoms of schizophrenia

    Science.gov (United States)

    Markou, Athina; Salamone, John D.; Bussey, Timothy; Mar, Adam; Brunner, Daniela; Gilmour, Gary; Balsam, Peter

    2013-01-01

    The present review article summarizes and expands upon the discussions that were initiated during a meeting of the Cognitive Neuroscience Treatment Research to Improve Cognition in Schizophrenia (CNTRICS; http://cntrics.ucdavis.edu). A major goal of the CNTRICS meeting was to identify experimental procedures and measures that can be used in laboratory animals to assess psychological constructs that are related to the psychopathology of schizophrenia. The issues discussed in this review reflect the deliberations of the Motivation Working Group of the CNTRICS meeting, which included most of the authors of this article as well as additional participants. After receiving task nominations from the general research community, this working group was asked to identify experimental procedures in laboratory animals that can assess aspects of reinforcement learning and motivation that may be relevant for research on the negative symptoms of schizophrenia, as well as other disorders characterized by deficits in reinforcement learning and motivation. The tasks described here that assess reinforcement learning are the Autoshaping Task, Probabilistic Reward Learning Tasks, and the Response Bias Probabilistic Reward Task. The tasks described here that assess motivation are Outcome Devaluation and Contingency Degradation Tasks and Effort-Based Tasks. In addition to describing such methods and procedures, the present article provides a working vocabulary for research and theory in this field, as well as an industry perspective about how such tasks may be used in drug discovery. It is hoped that this review can aid investigators who are conducting research in this complex area, promote translational studies by highlighting shared research goals and fostering a common vocabulary across basic and clinical fields, and facilitate the development of medications for the treatment of symptoms mediated by reinforcement learning and motivational deficits. PMID:23994273

  14. Measuring reinforcement learning and motivation constructs in experimental animals: relevance to the negative symptoms of schizophrenia.

    Science.gov (United States)

    Markou, Athina; Salamone, John D; Bussey, Timothy J; Mar, Adam C; Brunner, Daniela; Gilmour, Gary; Balsam, Peter

    2013-11-01

    The present review article summarizes and expands upon the discussions that were initiated during a meeting of the Cognitive Neuroscience Treatment Research to Improve Cognition in Schizophrenia (CNTRICS; http://cntrics.ucdavis.edu). A major goal of the CNTRICS meeting was to identify experimental procedures and measures that can be used in laboratory animals to assess psychological constructs that are related to the psychopathology of schizophrenia. The issues discussed in this review reflect the deliberations of the Motivation Working Group of the CNTRICS meeting, which included most of the authors of this article as well as additional participants. After receiving task nominations from the general research community, this working group was asked to identify experimental procedures in laboratory animals that can assess aspects of reinforcement learning and motivation that may be relevant for research on the negative symptoms of schizophrenia, as well as other disorders characterized by deficits in reinforcement learning and motivation. The tasks described here that assess reinforcement learning are the Autoshaping Task, Probabilistic Reward Learning Tasks, and the Response Bias Probabilistic Reward Task. The tasks described here that assess motivation are Outcome Devaluation and Contingency Degradation Tasks and Effort-Based Tasks. In addition to describing such methods and procedures, the present article provides a working vocabulary for research and theory in this field, as well as an industry perspective about how such tasks may be used in drug discovery. It is hoped that this review can aid investigators who are conducting research in this complex area, promote translational studies by highlighting shared research goals and fostering a common vocabulary across basic and clinical fields, and facilitate the development of medications for the treatment of symptoms mediated by reinforcement learning and motivational deficits. Copyright © 2013 Elsevier

  15. Genetic algorithms for adaptive real-time control in space systems

    Science.gov (United States)

    Vanderzijp, J.; Choudry, A.

    1988-01-01

    Genetic algorithms used for learning, as one way to control the combinatorial explosion associated with the generation of new rules, are discussed. The genetic algorithm approach tends to work best when it can be applied to a domain-independent knowledge representation. Applications to real-time control in space systems are discussed.

  16. Robust Real-Time Music Transcription with a Compositional Hierarchical Model.

    Science.gov (United States)

    Pesek, Matevž; Leonardis, Aleš; Marolt, Matija

    2017-01-01

    The paper presents a new compositional hierarchical model for robust music transcription. Its main features are unsupervised learning of a hierarchical representation of input data, transparency, which enables insights into the learned representation, as well as robustness and speed which make it suitable for real-world and real-time use. The model consists of multiple layers, each composed of a number of parts. The hierarchical nature of the model corresponds well to hierarchical structures in music. The parts in lower layers correspond to low-level concepts (e.g. tone partials), while the parts in higher layers combine lower-level representations into more complex concepts (tones, chords). The layers are learned in an unsupervised manner from music signals. Parts in each layer are compositions of parts from previous layers based on statistical co-occurrences as the driving force of the learning process. In the paper, we present the model's structure and compare it to other hierarchical approaches in the field of music information retrieval. We evaluate the model's performance for the multiple fundamental frequency estimation. Finally, we elaborate on extensions of the model towards other music information retrieval tasks.

  17. Teaching with Real-time Earthquake Data in jAmaSeis

    Science.gov (United States)

    Bravo, T. K.; Coleman, B.; Taber, J.

    2011-12-01

    Earthquakes can capture the attention of students and inspire them to explore the Earth. The Incorporated Research Institutions in Seismology (IRIS) and Moravian College are collaborating to develop cross-platform software (jAmaSeis) that enables students to access real-time earthquake waveform data. Users can record their own data from several different types of educational seismometers, and they can obtain data in real time from other jAmaSeis users nationwide. Additionally, the ability to stream data from the IRIS Data Management Center (DMC) is under development. Once real-time data is obtained, users of jAmaSeis can study seismological concepts in the classroom. The user interface of the software is carefully designed to lead students through the steps to interrogate seismic data following a large earthquake. Users can process data to determine characteristics of seismograms such as time of occurrence, distance from the epicenter to the station, magnitude, and location (via triangulation). Along the way, the software provides graphical clues to assist student interpretations. In addition to the inherent pedagogical features of the software, IRIS provides pre-packaged data and instructional activities to help students learn the analysis steps. After using these activities, students can apply their skills to interpret seismic waves from their own real-time data.

  18. Real-time probabilistic covariance tracking with efficient model update.

    Science.gov (United States)

    Wu, Yi; Cheng, Jian; Wang, Jinqiao; Lu, Hanqing; Wang, Jun; Ling, Haibin; Blasch, Erik; Bai, Li

    2012-05-01

    The recently proposed covariance region descriptor has been proven robust and versatile for a modest computational cost. The covariance matrix enables efficient fusion of different types of features, where the spatial and statistical properties, as well as their correlation, are characterized. The similarity between two covariance descriptors is measured on Riemannian manifolds. Based on the same metric but with a probabilistic framework, we propose a novel tracking approach on Riemannian manifolds with a novel incremental covariance tensor learning (ICTL). To address the appearance variations, ICTL incrementally learns a low-dimensional covariance tensor representation and efficiently adapts online to appearance changes of the target with only O(1) computational complexity, resulting in a real-time performance. The covariance-based representation and the ICTL are then combined with the particle filter framework to allow better handling of background clutter, as well as the temporary occlusions. We test the proposed probabilistic ICTL tracker on numerous benchmark sequences involving different types of challenges including occlusions and variations in illumination, scale, and pose. The proposed approach demonstrates excellent real-time performance, both qualitatively and quantitatively, in comparison with several previously proposed trackers.
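
The idea of adapting a covariance representation online at constant cost per new sample can be illustrated with a generic running-covariance update. This is a sketch of a standard Welford-style recurrence, not the ICTL method from the abstract; the class name and the plain-list feature vectors are assumptions. Each update costs a fixed amount of work for a given feature dimension, regardless of how many samples have been seen.

```python
class RunningCovariance:
    """Incrementally maintained sample covariance of fixed-length feature vectors."""

    def __init__(self, dim):
        self.n = 0
        self.mean = [0.0] * dim
        # Accumulated sum of outer products of deviations (co-moment matrix).
        self.m2 = [[0.0] * dim for _ in range(dim)]

    def update(self, x):
        """Fold one new feature vector into the running statistics."""
        self.n += 1
        # Deviation of the new sample from the mean before the update.
        delta_old = [xi - mi for xi, mi in zip(x, self.mean)]
        self.mean = [mi + di / self.n for mi, di in zip(self.mean, delta_old)]
        # Deviation of the new sample from the mean after the update.
        delta_new = [xi - mi for xi, mi in zip(x, self.mean)]
        for i, di in enumerate(delta_old):
            for j, dj in enumerate(delta_new):
                self.m2[i][j] += di * dj

    def covariance(self):
        """Return the unbiased sample covariance matrix."""
        if self.n < 2:
            raise ValueError("at least two samples are required")
        return [[v / (self.n - 1) for v in row] for row in self.m2]
```

No past samples are stored, which is what makes this kind of update attractive for per-frame use in a tracker.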

  19. Switching Reinforcement Learning for Continuous Action Space

    Science.gov (United States)

    Nagayoshi, Masato; Murao, Hajime; Tamaki, Hisashi

    Reinforcement Learning (RL) attracts much attention as a technique for realizing computational intelligence such as adaptive and autonomous decentralized systems. In general, however, it is not easy to put RL into practical use. One difficulty is the problem of designing a suitable action space for an agent, i.e., satisfying two requirements that trade off against each other: (i) to keep the characteristics (or structure) of the original search space as much as possible in order to seek strategies that lie close to the optimal, and (ii) to reduce the search space as much as possible in order to expedite the learning process. In order to design a suitable action space adaptively, we propose a switching RL model that mimics the process of an infant's motor development, in which gross motor skills develop before fine motor skills. Then, a method for switching controllers is constructed by introducing and referring to the “entropy”. Further, computational experiments using robot navigation problems with one- and two-dimensional continuous action spaces confirm the validity of the proposed method.

  20. Ovation Prime Real-Time

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The Ovation Prime Real-Time (OPRT) product is a real-time forecast and nowcast model of auroral power and is an operational implementation of the work by Newell et...

  1. Emergent Auditory Feature Tuning in a Real-Time Neuromorphic VLSI System.

    Science.gov (United States)

    Sheik, Sadique; Coath, Martin; Indiveri, Giacomo; Denham, Susan L; Wennekers, Thomas; Chicca, Elisabetta

    2012-01-01

    Many sounds of ecological importance, such as communication calls, are characterized by time-varying spectra. However, most neuromorphic auditory models to date have focused on distinguishing mainly static patterns, under the assumption that dynamic patterns can be learned as sequences of static ones. In contrast, the emergence of dynamic feature sensitivity through exposure to formative stimuli has been recently modeled in a network of spiking neurons based on the thalamo-cortical architecture. The proposed network models the effect of lateral and recurrent connections between cortical layers, distance-dependent axonal transmission delays, and learning in the form of Spike Timing Dependent Plasticity (STDP), which effects stimulus-driven changes in the pattern of network connectivity. In this paper we demonstrate how these principles can be efficiently implemented in neuromorphic hardware. In doing so we address two principal problems in the design of neuromorphic systems: real-time event-based asynchronous communication in multi-chip systems, and the realization in hybrid analog/digital VLSI technology of neural computational principles that we propose underlie plasticity in neural processing of dynamic stimuli. The result is a hardware neural network that learns in real-time and shows preferential responses, after exposure, to stimuli exhibiting particular spectro-temporal patterns. The availability of hardware on which the model can be implemented makes this a significant step toward the development of adaptive, neurobiologically plausible, spike-based, artificial sensory systems.

  2. Emergent auditory feature tuning in a real-time neuromorphic VLSI system

    Directory of Open Access Journals (Sweden)

    Sadique Sheik

    2012-02-01

    Many sounds of ecological importance, such as communication calls, are characterised by time-varying spectra. However, most neuromorphic auditory models to date have focused on distinguishing mainly static patterns, under the assumption that dynamic patterns can be learned as sequences of static ones. In contrast, the emergence of dynamic feature sensitivity through exposure to formative stimuli has been recently modeled in a network of spiking neurons based on the thalamocortical architecture. The proposed network models the effect of lateral and recurrent connections between cortical layers, distance-dependent axonal transmission delays, and learning in the form of Spike Timing Dependent Plasticity (STDP), which effects stimulus-driven changes in the pattern of network connectivity. In this paper we demonstrate how these principles can be efficiently implemented in neuromorphic hardware. In doing so we address two principal problems in the design of neuromorphic systems: real-time event-based asynchronous communication in multi-chip systems, and the realization in hybrid analog/digital VLSI technology of neural computational principles that we propose underlie plasticity in neural processing of dynamic stimuli. The result is a hardware neural network that learns in real-time and shows preferential responses, after exposure, to stimuli exhibiting particular spectrotemporal patterns. The availability of hardware on which the model can be implemented makes this a significant step towards the development of adaptive, neurobiologically plausible, spike-based, artificial sensory systems.

  3. Numerical Analysis of Slopes Stability and Shallow Foundations Behavior at Crest under Real Seismic Loading - Reinforcement Effect

    International Nuclear Information System (INIS)

    Mekdash, H.; Hage Chehade, F.; Sadek, M.; Abdel Massih, D.; El Hachem, E.; Youssef, E.

    2011-01-01

    The aim of this paper is to analyze slope stability under seismic loading using a global numerical dynamic approach. This approach accounts for important parameters that are generally ignored by traditional engineering methods, such as soil deformability, dynamic amplification, nonlinear soil behavior, the spatial and temporal variability of the seismic loading, and the reinforcement elements. The study uses measurements recorded during real earthquakes (Turkey, 1999, and Lebanon, 2008). Elastoplastic soil behavior analysis makes it possible to monitor the evolution of the slope state after an earthquake and to identify the most probable failure circles. A parametric study of the reinforcement length, position, inclination, and number of elements is carried out to define the optimal reinforcement scheme for slopes under seismic loading. The study also includes the stability analysis of an existing foundation near the slope's crest. It focuses on the reinforcement in order to recommend the most appropriate scheme for minimizing the settlement of the foundation due to earthquake effects. (author)

  4. A finite element-based machine learning approach for modeling the mechanical behavior of the breast tissues under compression in real-time.

    Science.gov (United States)

    Martínez-Martínez, F; Rupérez-Moreno, M J; Martínez-Sober, M; Solves-Llorens, J A; Lorente, D; Serrano-López, A J; Martínez-Sanchis, S; Monserrat, C; Martín-Guerrero, J D

    2017-11-01

    This work presents a data-driven method to simulate, in real-time, the biomechanical behavior of the breast tissues in some image-guided interventions such as biopsies or radiotherapy dose delivery as well as to speed up multimodal registration algorithms. Ten real breasts were used for this work. Their deformation due to the displacement of two compression plates was simulated off-line using the finite element (FE) method. Three machine learning models were trained with the data from those simulations. Then, they were used to predict in real-time the deformation of the breast tissues during the compression. The models were a decision tree and two tree-based ensemble methods (extremely randomized trees and random forest). Two different experimental setups were designed to validate and study the performance of these models under different conditions. The mean 3D Euclidean distance between nodes predicted by the models and those extracted from the FE simulations was calculated to assess the performance of the models in the validation set. The experiments proved that extremely randomized trees performed better than the other two models. The mean error committed by the three models in the prediction of the nodal displacements was under 2 mm, a threshold usually set for clinical applications. The time needed for breast compression prediction is sufficiently short to allow its use in real-time (<0.2 s). Copyright © 2017 Elsevier Ltd. All rights reserved.
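
The validation metric named in this abstract, the mean 3D Euclidean distance between predicted and FE-simulated node positions, can be computed as in the sketch below. The function name and the tuple-based node format are assumptions; the abstract does not describe the authors' data layout.

```python
import math


def mean_euclidean_distance(predicted, reference):
    """Average 3D distance between corresponding predicted and reference nodes.

    Both arguments are equal-length sequences of (x, y, z) coordinates,
    in the same units (for example millimetres).
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("node lists must be non-empty and of equal length")
    total = sum(math.dist(p, r) for p, r in zip(predicted, reference))
    return total / len(predicted)
```

With coordinates in millimetres, the returned value can be compared directly against a tolerance such as the 2 mm threshold quoted in the abstract.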

  5. MO-FG-BRD-00: Real-Time Imaging and Tracking Techniques for Intrafractional Motion Management

    Energy Technology Data Exchange (ETDEWEB)

    2015-06-15

    Intrafraction target motion is a prominent complicating factor in the accurate targeting of radiation within the body. Methods compensating for target motion during treatment, such as gating and dynamic tumor tracking, depend on the delineation of target location as a function of time during delivery. A variety of techniques for target localization have been explored and are under active development; these include beam-level imaging of radio-opaque fiducials, fiducial-less tracking of anatomical landmarks, tracking of electromagnetic transponders, optical imaging of correlated surrogates, and volumetric imaging within treatment delivery. The Joint Imaging and Therapy Symposium will provide an overview of the techniques for real-time imaging and tracking, with special focus on emerging modes of implementation across different modalities. In particular, the symposium will explore developments in 1) Beam-level kilovoltage X-ray imaging techniques, 2) EPID-based megavoltage X-ray tracking, 3) Dynamic tracking using electromagnetic transponders, and 4) MRI-based soft-tissue tracking during radiation delivery. Learning Objectives: 1) Understand the fundamentals of real-time imaging and tracking techniques; 2) Learn about emerging techniques in the field of real-time tracking; 3) Distinguish between the advantages and disadvantages of different tracking modalities; 4) Understand the role of real-time tracking techniques within the clinical delivery workflow.

  6. MO-FG-BRD-00: Real-Time Imaging and Tracking Techniques for Intrafractional Motion Management

    International Nuclear Information System (INIS)

    2015-01-01

    Intrafraction target motion is a prominent complicating factor in the accurate targeting of radiation within the body. Methods compensating for target motion during treatment, such as gating and dynamic tumor tracking, depend on the delineation of target location as a function of time during delivery. A variety of techniques for target localization have been explored and are under active development; these include beam-level imaging of radio-opaque fiducials, fiducial-less tracking of anatomical landmarks, tracking of electromagnetic transponders, optical imaging of correlated surrogates, and volumetric imaging within treatment delivery. The Joint Imaging and Therapy Symposium will provide an overview of the techniques for real-time imaging and tracking, with special focus on emerging modes of implementation across different modalities. In particular, the symposium will explore developments in 1) Beam-level kilovoltage X-ray imaging techniques, 2) EPID-based megavoltage X-ray tracking, 3) Dynamic tracking using electromagnetic transponders, and 4) MRI-based soft-tissue tracking during radiation delivery. Learning Objectives: 1) Understand the fundamentals of real-time imaging and tracking techniques; 2) Learn about emerging techniques in the field of real-time tracking; 3) Distinguish between the advantages and disadvantages of different tracking modalities; 4) Understand the role of real-time tracking techniques within the clinical delivery workflow.

  7. Confirmation bias in human reinforcement learning: Evidence from counterfactual feedback processing

    Science.gov (United States)

    Lefebvre, Germain; Blakemore, Sarah-Jayne

    2017-01-01

    Previous studies suggest that factual learning, that is, learning from obtained outcomes, is biased, such that participants preferentially take into account positive, as compared to negative, prediction errors. However, whether or not the prediction error valence also affects counterfactual learning, that is, learning from forgone outcomes, is unknown. To address this question, we analysed the performance of two groups of participants on reinforcement learning tasks using a computational model that was adapted to test if prediction error valence influences learning. We carried out two experiments: in the factual learning experiment, participants learned from partial feedback (i.e., the outcome of the chosen option only); in the counterfactual learning experiment, participants learned from complete feedback information (i.e., the outcomes of both the chosen and unchosen option were displayed). In the factual learning experiment, we replicated previous findings of a valence-induced bias, whereby participants learned preferentially from positive, relative to negative, prediction errors. In contrast, for counterfactual learning, we found the opposite valence-induced bias: negative prediction errors were preferentially taken into account, relative to positive ones. When considering valence-induced bias in the context of both factual and counterfactual learning, it appears that people tend to preferentially take into account information that confirms their current choice. PMID:28800597

  8. Confirmation bias in human reinforcement learning: Evidence from counterfactual feedback processing.

    Science.gov (United States)

    Palminteri, Stefano; Lefebvre, Germain; Kilford, Emma J; Blakemore, Sarah-Jayne

    2017-08-01

    Previous studies suggest that factual learning, that is, learning from obtained outcomes, is biased, such that participants preferentially take into account positive, as compared to negative, prediction errors. However, whether or not the prediction error valence also affects counterfactual learning, that is, learning from forgone outcomes, is unknown. To address this question, we analysed the performance of two groups of participants on reinforcement learning tasks using a computational model that was adapted to test if prediction error valence influences learning. We carried out two experiments: in the factual learning experiment, participants learned from partial feedback (i.e., the outcome of the chosen option only); in the counterfactual learning experiment, participants learned from complete feedback information (i.e., the outcomes of both the chosen and unchosen option were displayed). In the factual learning experiment, we replicated previous findings of a valence-induced bias, whereby participants learned preferentially from positive, relative to negative, prediction errors. In contrast, for counterfactual learning, we found the opposite valence-induced bias: negative prediction errors were preferentially taken into account, relative to positive ones. When considering valence-induced bias in the context of both factual and counterfactual learning, it appears that people tend to preferentially take into account information that confirms their current choice.
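
The valence-dependent learning described in this abstract can be illustrated with a value update that uses different learning rates for confirmatory and disconfirmatory prediction errors. This is a minimal sketch, not the authors' fitted computational model: the function name, the two-rate parameterization (`alpha_conf`, `alpha_disc`), and the example values are assumptions.

```python
def update_values(q_chosen, q_unchosen, r_chosen, r_unchosen,
                  alpha_conf, alpha_disc):
    """Update chosen and unchosen option values with confirmation-biased rates.

    A prediction error counts as confirmatory when it supports the current
    choice: positive for the chosen option, negative for the unchosen one.
    """
    delta_chosen = r_chosen - q_chosen        # factual prediction error
    delta_unchosen = r_unchosen - q_unchosen  # counterfactual prediction error

    rate_chosen = alpha_conf if delta_chosen > 0 else alpha_disc
    rate_unchosen = alpha_conf if delta_unchosen < 0 else alpha_disc

    return (q_chosen + rate_chosen * delta_chosen,
            q_unchosen + rate_unchosen * delta_unchosen)
```

Setting `alpha_conf` larger than `alpha_disc` reproduces the qualitative pattern in the abstract: good news about the chosen option and bad news about the forgone option both move the values more than the opposite outcomes do.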

  9. Real-time use of the iPad by third-year medical students for clinical decision support and learning: a mixed methods study

    Science.gov (United States)

    Nuss, Michelle A.; Hill, Janette R.; Cervero, Ronald M.; Gaines, Julie K.; Middendorf, Bruce F.

    2014-01-01

    Purpose: Despite widespread use of mobile technology in medical education, medical students’ use of mobile technology for clinical decision support and learning is not well understood. Three key questions were explored in this extensive mixed methods study: 1) how medical students used mobile technology in the care of patients, 2) the mobile applications (apps) used and 3) how expertise and time spent changed over time. Methods: This year-long (July 2012–June 2013) mixed methods study explored the use of the iPad, using four data collection instruments: 1) beginning and end-of-year questionnaires, 2) iPad usage logs, 3) weekly rounding observations, and 4) weekly medical student interviews. Descriptive statistics were generated for the questionnaires and apps reported in the usage logs. The iPad usage logs, observation logs, and weekly interviews were analyzed via inductive thematic analysis. Results: Students predominantly used mobile technology to obtain real-time patient data via the electronic health record (EHR), to access medical knowledge resources for learning, and to inform patient care. The top four apps used were Epocrates®, PDF Expert®, VisualDx®, and Micromedex®. The majority of students indicated that their use (71%) and expertise (75%) using mobile technology grew over time. Conclusions: This mixed methods study provides substantial evidence that medical students used mobile technology for clinical decision support and learning. Integrating its use into the medical student's daily workflow was essential for achieving these outcomes. Developing expertise in using mobile technology and various apps was critical for effective and efficient support of real-time clinical decisions. PMID:25317266

  10. Real-time use of the iPad by third-year medical students for clinical decision support and learning: a mixed methods study.

    Science.gov (United States)

    Nuss, Michelle A; Hill, Janette R; Cervero, Ronald M; Gaines, Julie K; Middendorf, Bruce F

    2014-01-01

    Despite widespread use of mobile technology in medical education, medical students' use of mobile technology for clinical decision support and learning is not well understood. Three key questions were explored in this extensive mixed methods study: 1) how medical students used mobile technology in the care of patients, 2) the mobile applications (apps) used and 3) how expertise and time spent changed over time. This year-long (July 2012-June 2013) mixed methods study explored the use of the iPad, using four data collection instruments: 1) beginning and end-of-year questionnaires, 2) iPad usage logs, 3) weekly rounding observations, and 4) weekly medical student interviews. Descriptive statistics were generated for the questionnaires and apps reported in the usage logs. The iPad usage logs, observation logs, and weekly interviews were analyzed via inductive thematic analysis. Students predominantly used mobile technology to obtain real-time patient data via the electronic health record (EHR), to access medical knowledge resources for learning, and to inform patient care. The top four apps used were Epocrates®, PDF Expert®, VisualDx®, and Micromedex®. The majority of students indicated that their use (71%) and expertise (75%) using mobile technology grew over time. This mixed methods study provides substantial evidence that medical students used mobile technology for clinical decision support and learning. Integrating its use into the medical student's daily workflow was essential for achieving these outcomes. Developing expertise in using mobile technology and various apps was critical for effective and efficient support of real-time clinical decisions.

  11. Within- and across-trial dynamics of human EEG reveal cooperative interplay between reinforcement learning and working memory.

    Science.gov (United States)

    Collins, Anne G E; Frank, Michael J

    2018-03-06

    Learning from rewards and punishments is essential to survival and facilitates flexible human behavior. It is widely appreciated that multiple cognitive and reinforcement learning systems contribute to decision-making, but the nature of their interactions is elusive. Here, we leverage methods for extracting trial-by-trial indices of reinforcement learning (RL) and working memory (WM) in human electro-encephalography to reveal single-trial computations beyond that afforded by behavior alone. Neural dynamics confirmed that increases in neural expectation were predictive of reduced neural surprise in the following feedback period, supporting central tenets of RL models. Within- and cross-trial dynamics revealed a cooperative interplay between systems for learning, in which WM contributes expectations to guide RL, despite competition between systems during choice. Together, these results provide a deeper understanding of how multiple neural systems interact for learning and decision-making and facilitate analysis of their disruption in clinical populations.

  12. VERSE - Virtual Equivalent Real-time Simulation

    Science.gov (United States)

    Zheng, Yang; Martin, Bryan J.; Villaume, Nathaniel

    2005-01-01

    Distributed real-time simulations provide important timing validation and hardware-in-the-loop results for the spacecraft flight software development cycle. Occasionally, the need for higher fidelity modeling and more comprehensive debugging capabilities, combined with a limited amount of computational resources, calls for a non-real-time simulation environment that mimics the real-time environment. By creating a non-real-time environment that accommodates simulations and flight software designed for a multi-CPU real-time system, we can save development time, cut mission costs, and reduce the likelihood of errors. This paper presents such a solution: Virtual Equivalent Real-time Simulation Environment (VERSE). VERSE turns the real-time operating system RTAI (Real-time Application Interface) into an event-driven simulator that runs in virtual real time. Designed to keep the original RTAI architecture as intact as possible, and therefore inheriting RTAI's many capabilities, VERSE was implemented with remarkably little change to the RTAI source code. This small footprint together with use of the same API allows users to easily run the same application in both real-time and virtual time environments. VERSE has been used to build a workstation testbed for NASA's Space Interferometry Mission (SIM PlanetQuest) instrument flight software. With its flexible simulation controls and inexpensive setup and replication costs, VERSE will become an invaluable tool in future mission development.
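
The notion of an event-driven simulator that runs in virtual time can be illustrated with a small discrete-event loop. This sketch does not reproduce VERSE or any RTAI interface; the class and method names are hypothetical. The key point is that the clock jumps straight to the timestamp of the next scheduled event instead of waiting for wall-clock time to pass.

```python
import heapq


class VirtualTimeSimulator:
    """Minimal discrete-event loop with a virtual clock."""

    def __init__(self):
        self.now = 0.0      # current virtual time
        self._queue = []    # heap of (time, sequence, callback)
        self._sequence = 0  # tie-breaker so callbacks are never compared

    def schedule(self, delay, callback):
        """Run callback(simulator) after `delay` units of virtual time."""
        if delay < 0:
            raise ValueError("delay must be non-negative")
        heapq.heappush(self._queue, (self.now + delay, self._sequence, callback))
        self._sequence += 1

    def run(self):
        """Process events in timestamp order until none remain."""
        while self._queue:
            time, _, callback = heapq.heappop(self._queue)
            self.now = time  # advance the clock directly to the event time
            callback(self)
```

Because callbacks receive the simulator, a task can reschedule itself to model periodic behavior, and the run completes as fast as the host can process events.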

  13. Performance Comparison of Two Reinforcement Learning Algorithms for Small Mobile Robots

    Czech Academy of Sciences Publication Activity Database

    Neruda, Roman; Slušný, Stanislav

    2009-01-01

    Roč. 2, č. 1 (2009), s. 59-68 ISSN 2005-4297 R&D Projects: GA MŠk(CZ) 1M0567 Grant - others:GA UK(CZ) 7637/2007 Institutional research plan: CEZ:AV0Z10300504 Keywords : reinforcement learning * mobile robots * intelligent agents Subject RIV: IN - Informatics, Computer Science http://www.sersc.org/journals/IJCA/vol2_no1/7.pdf

  14. IMPLEMENTATION OF MULTIAGENT REINFORCEMENT LEARNING MECHANISM FOR OPTIMAL ISLANDING OPERATION OF DISTRIBUTION NETWORK

    DEFF Research Database (Denmark)

    Saleem, Arshad; Lind, Morten

    2008-01-01

    among electric power utilities to utilize modern information and communication technologies (ICT) in order to improve the automation of the distribution system. In this paper we present our work for the implementation of a dynamic multi-agent based distributed reinforcement learning mechanism...

  15. Reinforcement-learning-based output-feedback control of nonstrict nonlinear discrete-time systems with application to engine emission control.

    Science.gov (United States)

    Shih, Peter; Kaul, Brian C; Jagannathan, Sarangapani; Drallmeier, James A

    2009-10-01

    A novel reinforcement-learning-based output adaptive neural network (NN) controller, which is also referred to as the adaptive-critic NN controller, is developed to deliver the desired tracking performance for a class of nonlinear discrete-time systems expressed in nonstrict feedback form in the presence of bounded and unknown disturbances. The adaptive-critic NN controller consists of an observer, a critic, and two action NNs. The observer estimates the states and output, and the two action NNs provide virtual and actual control inputs to the nonlinear discrete-time system. The critic approximates a certain strategic utility function, and the action NNs minimize the strategic utility function and control inputs. All NN weights adapt online toward minimization of a performance index, utilizing the gradient-descent-based rule, in contrast with iteration-based adaptive-critic schemes. Lyapunov functions are used to show the stability of the closed-loop tracking error, weights, and observer estimates. Separation and certainty equivalence principles, persistency of excitation condition, and linearity in the unknown parameter assumption are not needed. Experimental results on a spark ignition (SI) engine operating lean at an equivalence ratio of 0.75 show a significant (25%) reduction in cyclic dispersion in heat release with control, while the average fuel input changes by less than 1% compared with the uncontrolled case. Consequently, oxides of nitrogen (NO(x)) drop by 30%, and unburned hydrocarbons drop by 16% with control. Overall, NO(x)'s are reduced by over 80% compared with stoichiometric levels.

  16. Learning motor skills from algorithms to robot experiments

    CERN Document Server

    Kober, Jens

    2014-01-01

    This book presents the state of the art in reinforcement learning applied to robotics both in terms of novel algorithms and applications. It discusses recent approaches that allow robots to learn motor skills and presents tasks that need to take into account the dynamic behavior of the robot and its environment, where a kinematic movement plan is not sufficient. The book illustrates a method that learns to generalize parameterized motor plans, which are obtained by imitation or reinforcement learning, by adapting a small set of global parameters, and appropriate kernel-based reinforcement learning algorithms. The presented applications explore highly dynamic tasks and exhibit a very efficient learning process. All proposed approaches have been extensively validated with benchmark tasks, in simulation, and on real robots. These tasks correspond to sports and games but the presented techniques are also applicable to more mundane household tasks. The book is based on the first author’s doctoral thesis, which wo...

  17. Medial prefrontal cortex and the adaptive regulation of reinforcement learning parameters.

    Science.gov (United States)

    Khamassi, Mehdi; Enel, Pierre; Dominey, Peter Ford; Procyk, Emmanuel

    2013-01-01

    Converging evidence suggests that the medial prefrontal cortex (MPFC) is involved in feedback categorization, performance monitoring, and task monitoring, and may contribute to the online regulation of reinforcement learning (RL) parameters that would affect decision-making processes in the lateral prefrontal cortex (LPFC). Previous neurophysiological experiments have shown MPFC activities encoding error likelihood, uncertainty, reward volatility, as well as neural responses categorizing different types of feedback, for instance, distinguishing between choice errors and execution errors. Rushworth and colleagues have proposed that the involvement of MPFC in tracking the volatility of the task could contribute to the regulation of one of the RL parameters called the learning rate. We extend this hypothesis by proposing that MPFC could contribute to the regulation of other RL parameters such as the exploration rate and default action values in case of task shifts. Here, we analyze the sensitivity to RL parameters of behavioral performance in two monkey decision-making tasks, one with a deterministic reward schedule and the other with a stochastic one. We show that there exist optimal parameter values specific to each of these tasks, that need to be found for optimal performance and that are usually hand-tuned in computational models. In contrast, automatic online regulation of these parameters using some heuristics can help produce a good, although non-optimal, behavioral performance in each task. We finally describe our computational model of MPFC-LPFC interaction used for online regulation of the exploration rate and its application to a human-robot interaction scenario. There, unexpected uncertainties are produced by the human introducing cued task changes or by cheating. The model enables the robot to autonomously learn to reset exploration in response to such uncertain cues and events. The combined results provide concrete evidence specifying how prefrontal
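
    To make the two RL parameters named in this abstract concrete, the following sketch shows where a learning rate and an exploration parameter enter a simple value-learning agent. It is not the authors' MPFC-LPFC model: the softmax choice rule, the inverse temperature `beta` standing in for the exploration rate, the two-armed bandit task, and all function names are assumptions made for illustration.

```python
import math
import random


def softmax_choice(q_values, beta, rng):
    """Sample an action index; a larger beta means less exploration."""
    m = max(q_values)                       # subtract max for numerical stability
    weights = [math.exp(beta * (q - m)) for q in q_values]
    r = rng.random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r < acc:
            return i
    return len(weights) - 1


def run_bandit(reward_probs, alpha, beta, n_trials, seed=0):
    """Return (final action values, mean reward) for a hypothetical bandit task."""
    rng = random.Random(seed)
    q = [0.0] * len(reward_probs)           # default action values (assumed zero)
    total = 0.0
    for _ in range(n_trials):
        a = softmax_choice(q, beta, rng)
        r = 1.0 if rng.random() < reward_probs[a] else 0.0
        q[a] += alpha * (r - q[a])          # alpha is the learning rate
        total += r
    return q, total / n_trials
```

    Calling `run_bandit` with different `alpha` and `beta` values on a deterministic schedule such as `[1.0, 0.0]` and on a stochastic one such as `[0.7, 0.3]` gives a rough feel for why the best parameter setting can differ between tasks.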

  18. ISTTOK real-time architecture

    Energy Technology Data Exchange (ETDEWEB)

    Carvalho, Ivo S., E-mail: ivoc@ipfn.ist.utl.pt; Duarte, Paulo; Fernandes, Horácio; Valcárcel, Daniel F.; Carvalho, Pedro J.; Silva, Carlos; Duarte, André S.; Neto, André; Sousa, Jorge; Batista, António J.N.; Hekkert, Tiago; Carvalho, Bernardo B.

    2014-03-15

    Highlights: • All real-time diagnostics and actuators were integrated in the same control platform. • A 100 μs control cycle was achieved under the MARTe framework. • Time-windows based control with several event-driven control strategies implemented. • AC discharges with exception handling on iron core flux saturation. • An HTML discharge configuration was developed for configuring the MARTe system. - Abstract: The ISTTOK tokamak was upgraded with a plasma control system based on the Advanced Telecommunications Computing Architecture (ATCA) standard. This control system was designed to improve the discharge stability and to extend the operational space to the alternate plasma current (AC) discharges as part of the ISTTOK scientific program. In order to accomplish these objectives all ISTTOK diagnostics and actuators relevant for real-time operation were integrated in the control system. The control system was programmed in C++ over the Multi-threaded Application Real-Time executor (MARTe) which provides, among other features, a real-time scheduler, an interrupt handler, an intercommunications interface between code blocks and a clearly bounded interface with the external devices. As a complement to the MARTe framework, the BaseLib2 library provides the foundations for the data, code introspection and also a Hypertext Transfer Protocol (HTTP) server service. Taking advantage of the modular nature of MARTe, the algorithms of each diagnostic data processing, discharge timing, context switch, control and actuators output reference generation, run on well-defined blocks of code named Generic Application Module (GAM). This approach allows reusability of the code, simplified simulation, replacement or editing without changing the remaining GAMs. The ISTTOK control system GAMs run sequentially each 100 μs cycle on an Intel® Q8200 4-core processor running at 2.33 GHz located in the ATCA crate. Two boards (inside the ATCA crate) with 32 analog

  19. ISTTOK real-time architecture

    International Nuclear Information System (INIS)

    Carvalho, Ivo S.; Duarte, Paulo; Fernandes, Horácio; Valcárcel, Daniel F.; Carvalho, Pedro J.; Silva, Carlos; Duarte, André S.; Neto, André; Sousa, Jorge; Batista, António J.N.; Hekkert, Tiago; Carvalho, Bernardo B.

    2014-01-01

    Highlights: • All real-time diagnostics and actuators were integrated in the same control platform. • A 100 μs control cycle was achieved under the MARTe framework. • Time-windows based control with several event-driven control strategies implemented. • AC discharges with exception handling on iron core flux saturation. • An HTML discharge configuration was developed for configuring the MARTe system. - Abstract: The ISTTOK tokamak was upgraded with a plasma control system based on the Advanced Telecommunications Computing Architecture (ATCA) standard. This control system was designed to improve the discharge stability and to extend the operational space to the alternate plasma current (AC) discharges as part of the ISTTOK scientific program. In order to accomplish these objectives all ISTTOK diagnostics and actuators relevant for real-time operation were integrated in the control system. The control system was programmed in C++ over the Multi-threaded Application Real-Time executor (MARTe) which provides, among other features, a real-time scheduler, an interrupt handler, an intercommunications interface between code blocks and a clearly bounded interface with the external devices. As a complement to the MARTe framework, the BaseLib2 library provides the foundations for the data, code introspection and also a Hypertext Transfer Protocol (HTTP) server service. Taking advantage of the modular nature of MARTe, the algorithms of each diagnostic data processing, discharge timing, context switch, control and actuators output reference generation, run on well-defined blocks of code named Generic Application Module (GAM). This approach allows reusability of the code, simplified simulation, replacement or editing without changing the remaining GAMs. The ISTTOK control system GAMs run sequentially each 100 μs cycle on an Intel® Q8200 4-core processor running at 2.33 GHz located in the ATCA crate. Two boards (inside the ATCA crate) with 32 analog

  20. Unsupervised deep learning for real-time assessment of video streaming services

    NARCIS (Netherlands)

    Torres Vega, M.; Mocanu, D.C.; Liotta, A.

    2017-01-01

    Evaluating quality of experience in video streaming services requires a quality metric that works in real time and for a broad range of video types and network conditions. This means that subjective video quality assessment studies, or complex objective video quality assessment metrics, which would

  1. A Real-Time Plagiarism Detection Tool for Computer-Based Assessments

    Science.gov (United States)

    Jeske, Heimo J.; Lall, Manoj; Kogeda, Okuthe P.

    2018-01-01

    Aim/Purpose: The aim of this article is to develop a tool to detect plagiarism in real time amongst students being evaluated for learning in a computer-based assessment setting. Background: Cheating or copying all or part of source code of a program is a serious concern to academic institutions. Many academic institutions apply a combination of…

  2. Projective Simulation compared to reinforcement learning

    OpenAIRE

    Bjerland, Øystein Førsund

    2015-01-01

    This thesis explores the model of projective simulation (PS), a novel approach for an artificial intelligence (AI) agent. The model of PS learns by interacting with the environment it is situated in, and allows for simulating actions before real action is taken. The action selection is based on a random walk through the episodic & compositional memory (ECM), which is a network of clips that represent previously experienced percepts. The network takes percepts as inpu...

  3. Integrating Real-time Earthquakes into Natural Hazard Courses

    Science.gov (United States)

    Furlong, K. P.; Benz, H. M.; Whitlock, J. S.; Bittenbinder, A. N.; Bogaert, B. B.

    2001-12-01

    Natural hazard courses are playing an increasingly important role in college and university earth science curricula. Students' intrinsic curiosity about the subject and the potential to make the course relevant to the interests of both science and non-science students make natural hazards courses popular additions to a department's offerings. However, one vital aspect of "real-life" natural hazard management that has not translated well into the classroom is the real-time nature of both events and response. The lack of a way to entrain students into the event/response mode has made implementing such real-time activities into classroom activities problematic. Although a variety of web sites provide near real-time postings of natural hazards, students essentially learn of the event after the fact. This is particularly true for earthquakes and other events with few precursors. As a result, the "time factor" and personal responsibility associated with natural hazard response is lost to the students. We have integrated the real-time aspects of earthquake response into two natural hazard courses at Penn State (a 'general education' course for non-science majors, and an upper-level course for science majors) by implementing a modification of the USGS Earthworm system. The Earthworm Database Management System (E-DBMS) catalogs current global seismic activity. It provides earthquake professionals with real-time email/cell phone alerts of global seismic activity and access to the data for review/revision purposes. We have modified this system so that real-time response can be used to address specific scientific, policy, and social questions in our classes. As a prototype of using the E-DBMS in courses, we have established an Earthworm server at Penn State. This server receives national and global seismic network data and, in turn, transmits the tailored alerts to "on-duty" students (e-mail, pager/cell phone notification). These students are responsible to react to the alarm

  4. A Real-time Evaluation Technique of Fatigue Damage in Adhesively Bonded Composite-Metal Joints

    International Nuclear Information System (INIS)

    Kwon, Oh Yang; Kim, Tae Hyun

    1999-01-01

    One of the problems for practical use of fiber-reinforced plastics is the performance degradation by fatigue damage in the joints. The study is to develop a nondestructive technique for real-time evaluation of adhesively bonded composite-metal joints. From the prior study we confirmed that the bonding strength can be estimated from the correlation between the quality of bonded parts and AUP's. We obtained a curve showing the correlation between the degree of fatigue damage and AUP's calculated from signals acquired during fatigue loading of single-lap and double-lap joints of CFRP and Al6061. The curve is an analogy to the one showing stiffness reduction (E/Eo) of polymer matrix composites by fatigue damage. From those facts, it is plausible to predict the degree of fatigue damage in real-time. Amplitude and AUP2 appeared to be optimal parameters to provide more reliable results for single-lap joints whereas Amplitude and AUP1 did for double-lap joints. It is recommended to select optimal parameters for different geometries in the application for real structures.

  5. A real time evaluation technique of fatigue damage in adhesively bonded composite metal joints

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Tae Hyun; Kwon Oh Yang [Dept. of Mechanical Engineering, Inje University, Kimhae (Korea, Republic of)

    1999-05-15

    One of the problems for practical use of fiber-reinforced composite material is performance degradation by fatigue damage in the joints. The study is to develop a nondestructive technique for real-time evaluation of adhesively bonded composite-metal joints. From the prior study we confirmed that the bonding strength can be estimated from the correlation between quality of bonded parts and AUP's. We obtained a curve showing the correlation between AUP's calculated from signals obtained from single-lap and double-lap joints and the degree of fatigue damage at bonding interface during fatigue test. The curve is an analogy to the one showing stiffness reduction (E/E0) of polymer matrix composites by fatigue damage. From those facts, it is possible to predict degree of damage in real-time. Amplitude and AUP2 appeared to be optimal parameters to provide more reliable results for single-lap joint whereas amplitude and AUP1 did for double-lap joints. It is recommended to select optimal parameters for different geometries in the real structure.

  6. A Real-time Evaluation Technique of Fatigue Damage in Adhesively Bonded Composite-Metal Joints

    Energy Technology Data Exchange (ETDEWEB)

    Kwon, Oh Yang; Kim, Tae Hyun [Inha University, Incheon (Korea, Republic of)

    1999-12-15

    One of the problems for practical use of fiber-reinforced plastics is the performance degradation by fatigue damage in the joints. The study is to develop a nondestructive technique for real-time evaluation of adhesively bonded composite-metal joints. From the prior study we confirmed that the bonding strength can be estimated from the correlation between the quality of bonded parts and AUP's. We obtained a curve showing the correlation between the degree of fatigue damage and AUP's calculated from signals acquired during fatigue loading of single-lap and double-lap joints of CFRP and Al6061. The curve is an analogy to the one showing stiffness reduction (E/Eo) of polymer matrix composites by fatigue damage. From those facts, it is plausible to predict the degree of fatigue damage in real-time. Amplitude and AUP2 appeared to be optimal parameters to provide more reliable results for single-lap joints whereas Amplitude and AUP1 did for double-lap joints. It is recommended to select optimal parameters for different geometries in the application for real structures.

  7. A real time evaluation technique of fatigue damage in adhesively bonded composite metal joints

    International Nuclear Information System (INIS)

    Kim, Tae Hyun; Kwon Oh Yang

    1999-01-01

    One of the problems for practical use of fiber-reinforced composite material is performance degradation by fatigue damage in the joints. The study is to develop a nondestructive technique for real-time evaluation of adhesively bonded composite-metal joints. From the prior study we confirmed that the bonding strength can be estimated from the correlation between quality of bonded parts and AUP's. We obtained a curve showing the correlation between AUP's calculated from signals obtained from single-lap and double-lap joints and the degree of fatigue damage at bonding interface during fatigue test. The curve is an analogy to the one showing stiffness reduction (E/E0) of polymer matrix composites by fatigue damage. From those facts, it is possible to predict degree of damage in real-time. Amplitude and AUP2 appeared to be optimal parameters to provide more reliable results for single-lap joint whereas amplitude and AUP1 did for double-lap joints. It is recommended to select optimal parameters for different geometries in the real structure.

  8. Energy Management Strategy for a Hybrid Electric Vehicle Based on Deep Reinforcement Learning

    OpenAIRE

    Yue Hu; Weimin Li; Kun Xu; Taimoor Zahid; Feiyan Qin; Chenming Li

    2018-01-01

    An energy management strategy (EMS) is important for hybrid electric vehicles (HEVs) since it plays a decisive role on the performance of the vehicle. However, the variation of future driving conditions deeply influences the effectiveness of the EMS. Most existing EMS methods simply follow predefined rules that are not adaptive to different driving conditions online. Therefore, it is useful that the EMS can learn from the environment or driving cycle. In this paper, a deep reinforcement learn...

  9. Reinforcement learning produces dominant strategies for the Iterated Prisoner's Dilemma.

    Science.gov (United States)

    Harper, Marc; Knight, Vincent; Jones, Martin; Koutsovoulos, Georgios; Glynatsi, Nikoleta E; Campbell, Owen

    2017-01-01

    We present tournament results and several powerful strategies for the Iterated Prisoner's Dilemma created using reinforcement learning techniques (evolutionary and particle swarm algorithms). These strategies are trained to perform well against a corpus of over 170 distinct opponents, including many well-known and classic strategies. All the trained strategies win standard tournaments against the total collection of other opponents. The trained strategies and one particular human-designed strategy are also the top performers in noisy tournaments.
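
    For readers unfamiliar with how such tournaments are scored, a minimal round-robin sketch follows. It does not reproduce the paper's trained strategies or its opponent corpus; the three classic strategies, the conventional payoff values (3, 0, 5, 1), the 10-turn match length, and all function names are assumptions chosen for illustration.

```python
from itertools import combinations

# Conventional payoffs (assumed): (row player's score, column player's score)
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}


def tit_for_tat(my_history, opp_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opp_history[-1] if opp_history else "C"


def always_defect(my_history, opp_history):
    return "D"


def always_cooperate(my_history, opp_history):
    return "C"


def play_match(strategy_a, strategy_b, turns=10):
    """Play one noiseless match and return both cumulative scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(turns):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


def round_robin(players, turns=10):
    """Total score of every player against every other player (no self-play)."""
    totals = {name: 0 for name in players}
    for (name_a, fn_a), (name_b, fn_b) in combinations(players.items(), 2):
        score_a, score_b = play_match(fn_a, fn_b, turns)
        totals[name_a] += score_a
        totals[name_b] += score_b
    return totals


totals = round_robin({"TitForTat": tit_for_tat,
                      "Defector": always_defect,
                      "Cooperator": always_cooperate})
```

    In this tiny field the unconditional defector scores highest because it exploits the unconditional cooperator; rankings depend strongly on the opponent pool, which is why the abstract stresses training against a large and varied corpus.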

  10. Extinction of Pavlovian conditioning: The influence of trial number and reinforcement history.

    Science.gov (United States)

    Chan, C K J; Harris, Justin A

    2017-08-01

    Pavlovian conditioning is sensitive to the temporal relationship between the conditioned stimulus (CS) and the unconditioned stimulus (US). This has motivated models that describe learning as a process that continuously updates associative strength during the trial or specifically encodes the CS-US interval. These models predict that extinction of responding is also continuous, such that response loss is proportional to the cumulative duration of exposure to the CS without the US. We review evidence showing that this prediction is incorrect, and that extinction is trial-based rather than time-based. We also present two experiments that test the importance of trials versus time on the Partial Reinforcement Extinction Effect (PREE), in which responding extinguishes more slowly for a CS that was inconsistently reinforced with the US than for a consistently reinforced one. We show that increasing the number of extinction trials of the partially reinforced CS, relative to the consistently reinforced CS, overcomes the PREE. However, increasing the duration of extinction trials by the same amount does not overcome the PREE. We conclude that animals learn about the likelihood of the US per trial during conditioning, and learn trial-by-trial about the absence of the US during extinction. Moreover, what they learn about the likelihood of the US during conditioning affects how sensitive they are to the absence of the US during extinction. Copyright © 2017 Elsevier B.V. All rights reserved.

  11. Study and Application of Reinforcement Learning in Cooperative Strategy of the Robot Soccer Based on BDI Model

    Directory of Open Access Journals (Sweden)

    Wu Bo-ying

    2009-11-01

    A dynamic multi-agent cooperation model is formed by combining reinforcement learning with the BDI model. In this model, the concept of individual optimization loses its meaning, because the payoff of each agent depends not only on its own choice but also on the choices of the other agents. All agents can pursue a common optimum solution and try to realize their joint intention as a whole to the greatest possible extent. Each robot moves toward its goal depending on the present positions of the robots cooperating with it and the present position of the ball. One of the cooperating robots is controlled by a human with a joystick. In this way, an agent can visit each state-action pair as frequently as possible when choosing movements, which shortens the search of the action space and improves the convergence speed of reinforcement learning. The validity of the proposed cooperative strategy for robot soccer is shown by combining theoretical analysis with a simulated robot soccer match (11 vs 11).

  12. A real-time architecture for time-aware agents.

    Science.gov (United States)

    Prouskas, Konstantinos-Vassileios; Pitt, Jeremy V

    2004-06-01

    This paper describes the specification and implementation of a new three-layer time-aware agent architecture. This architecture is designed for applications and environments where societies of humans and agents play equally active roles, but interact and operate in completely different time frames. The architecture consists of three layers: the April real-time run-time (ART) layer, the time aware layer (TAL), and the application agents layer (AAL). The ART layer forms the underlying real-time agent platform. An original online, real-time, dynamic priority-based scheduling algorithm is described for scheduling the computation time of agent processes, and it is shown that the algorithm's O(n) complexity and scalable performance are sufficient for application in real-time domains. The TAL layer forms an abstraction layer through which human and agent interactions are temporally unified, that is, handled in a common way irrespective of their temporal representation and scale. A novel O(n^2) interaction scheduling algorithm is described for predicting and guaranteeing interactions' initiation and completion times. The time-aware predicting component of a workflow management system is also presented as an instance of the AAL layer. The described time-aware architecture addresses two key challenges in enabling agents to be effectively configured and applied in environments where humans and agents play equally active roles. It provides flexibility and adaptability in its real-time mechanisms while placing them under direct agent control, and it temporally unifies human and agent interactions.

  13. Towards Real-Time Argumentation

    Directory of Open Access Journals (Sweden)

    Vicente JULIÁN

    2016-07-01

    In this paper, we deal with the problem of real-time coordination with the more general approach of reaching real-time agreements in MAS. Concretely, this work proposes a real-time argumentation framework in an attempt to provide agents with the ability to engage in argumentative dialogues and come up with a solution for their underlying agreement process within a bounded period of time. The framework has been implemented and evaluated in the domain of a customer support application. Concretely, we consider a society of agents that act on behalf of a group of technicians that must solve problems in a Technology Management Centre (TMC) within a bounded time. This centre controls every process implicated in the provision of technological and customer support services to private or public organisations by means of a call centre. The contract signed between the TMC and the customer establishes penalties if the specified time is exceeded.

  14. Proposing Community-Based Learning in the Marketing Curriculum

    Science.gov (United States)

    Cadwallader, Susan; Atwong, Catherine; Lebard, Aubrey

    2013-01-01

    Community service and service learning (CS&SL) exposes students to the business practice of giving back to society while reinforcing classroom learning in an applied real-world setting. However, does the CS&SL format provide a better means of instilling the benefits of community service among marketing students than community-based…

  15. Time-Dependent Behavior of Reinforced Polymer Concrete Columns under Eccentric Axial Loading

    Directory of Open Access Journals (Sweden)

    Valentino Paolo Berardi

    2012-11-01

    Polymer concretes (PCs) represent a promising alternative to traditional cementitious materials in the field of new construction. In fact, PCs exhibit high compressive strength and ultimate compressive strain values, as well as good chemical resistance. Within the context of these benefits, this paper presents a study on the time-dependent behavior of polymer concrete columns reinforced with different bar types using a mechanical model recently developed by the authors. Balanced internal reinforcements are considered (i.e., two bars at both the top and bottom of the cross-section). The investigation highlights relevant stress and strain variations over time and, consequently, the emergence of a significant decrease in concrete’s stiffness and strength over time. Therefore, the results indicate that deferred effects due to viscous flow may significantly affect the reliability of reinforced polymer concrete elements over time.

  16. Real-time traffic sign recognition based on a general purpose GPU and deep-learning.

    Science.gov (United States)

    Lim, Kwangyong; Hong, Yongwon; Choi, Yeongwoo; Byun, Hyeran

    2017-01-01

    We present a General Purpose Graphics Processing Unit (GPGPU) based real-time traffic sign detection and recognition method that is robust against illumination changes. There have been many approaches to traffic sign recognition in various research fields; however, previous approaches faced several limitations when under low illumination or wide variance of light conditions. To overcome these drawbacks and improve processing speeds, we propose a method that 1) is robust against illumination changes, 2) uses GPGPU-based real-time traffic sign detection, and 3) performs region detecting and recognition using a hierarchical model. This method produces stable results in low illumination environments. Both detection and hierarchical recognition are performed in real-time, and the proposed method achieves 0.97 F1-score on our collective dataset, which uses the Vienna convention traffic rules (Germany and South Korea).
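
    The F1-score quoted above is the harmonic mean of precision and recall. As a reminder of how such a figure is computed from detection counts, here is a minimal sketch; the function name and the example counts are hypothetical and are not taken from the paper's dataset.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall, computed from raw counts."""
    if true_pos == 0:
        return 0.0                      # no correct detections at all
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 97 signs found correctly, 3 false alarms, 3 missed signs
example = f1_score(true_pos=97, false_pos=3, false_neg=3)
```

    With these made-up counts, precision and recall are both 0.97, so the F1-score is also 0.97; unequal precision and recall pull the score toward the lower of the two.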

  17. Reinforcement learning produces dominant strategies for the Iterated Prisoner's Dilemma.

    Directory of Open Access Journals (Sweden)

    Marc Harper

    We present tournament results and several powerful strategies for the Iterated Prisoner's Dilemma created using reinforcement learning techniques (evolutionary and particle swarm algorithms). These strategies are trained to perform well against a corpus of over 170 distinct opponents, including many well-known and classic strategies. All the trained strategies win standard tournaments against the total collection of other opponents. The trained strategies and one particular human-designed strategy are also the top performers in noisy tournaments.

  18. Real-time PCR in virology.

    Science.gov (United States)

    Mackay, Ian M; Arden, Katherine E; Nitsche, Andreas

    2002-03-15

    The use of the polymerase chain reaction (PCR) in molecular diagnostics has increased to the point where it is now accepted as the gold standard for detecting nucleic acids from a number of origins and it has become an essential tool in the research laboratory. Real-time PCR has engendered wider acceptance of the PCR due to its improved rapidity, sensitivity, reproducibility and the reduced risk of carry-over contamination. There are currently five main chemistries used for the detection of PCR product during real-time PCR. These are the DNA binding fluorophores, the 5' endonuclease, adjacent linear and hairpin oligoprobes and the self-fluorescing amplicons, which are described in detail. We also discuss factors that have restricted the development of multiplex real-time PCR as well as the role of real-time PCR in quantitating nucleic acids. Both amplification hardware and the fluorogenic detection chemistries have evolved rapidly as the understanding of real-time PCR has developed and this review aims to update the scientist on the current state of the art. We describe the background, advantages and limitations of real-time PCR and we review the literature as it applies to virus detection in the routine and research laboratory in order to focus on one of the many areas in which the application of real-time PCR has provided significant methodological benefits and improved patient outcomes. However, the technology discussed has been applied to other areas of microbiology as well as studies of gene expression and genetic disease.

  19. Near Real-Time Dust Aerosol Detection with Support Vector Machines for Regression

    Science.gov (United States)

    Rivas-Perea, P.; Rivas-Perea, P. E.; Cota-Ruiz, J.; Aragon Franco, R. A.

    2015-12-01

    Remote sensing instruments operating in the near-infrared spectrum usually provide the necessary information for further dust aerosol spectral analysis using statistical or machine learning algorithms. Such algorithms have proven to be effective in analyzing very specific case studies or dust events. However, very few make the analysis open to the public on a regular basis, fewer are designed specifically to operate in near real-time at higher resolutions, and almost none give global daily coverage. In this research we investigated a large-scale approach to a machine learning algorithm called "support vector regression". The algorithm uses four near-infrared spectral bands from NASA's MODIS instrument: B20 (3.66-3.84μm), B29 (8.40-8.70μm), B31 (10.78-11.28μm), and B32 (11.77-12.27μm). The algorithm is presented with ground truth from more than 30 distinct reported dust events, from different geographical regions, at different seasons, both over land and sea cover, in the presence of clouds and clear sky, and in the presence of fires. The purpose of our algorithm is to learn to distinguish the dust aerosol spectral signature from other spectral signatures, providing as output an estimate of the probability of a data point being consistent with dust aerosol signatures. During modeling with ground truth, our algorithm achieved more than 90% accuracy, and the current live performance of the algorithm is remarkable. Moreover, our algorithm is currently operating in near real-time using NASA's Land, Atmosphere Near real-time Capability for EOS (LANCE) servers, providing a high resolution global overview including 64, 32, 16, 8, 4, 2, and 1km. The near real-time analysis of our algorithm is now available to the general public at http://dust.reev.us and archives of the results starting from 2012 are available upon request.

  20. Real time programming environment for Windows

    Energy Technology Data Exchange (ETDEWEB)

    LaBelle, D.R. [LaBelle (Dennis R.), Clifton Park, NY (United States)]

    1998-04-01

    This document provides a description of the Real Time Programming Environment (RTProE). RTProE tools allow a programmer to create soft real time projects under general, multi-purpose operating systems. The basic features necessary for real time applications are provided by RTProE, leaving the programmer free to concentrate efforts on his specific project. The current version supports Microsoft Windows™ 95 and NT. The tasks of real time synchronization and communication with other programs are handled by RTProE. RTProE includes a generic method for connecting a graphical user interface (GUI) to allow real time control and interaction with the programmer's product. Topics covered in this paper include real time performance issues, portability, details of shared memory management, code scheduling, application control, Operating System specific concerns and the use of Computer Aided Software Engineering (CASE) tools. The development of RTProE is an important step in the expansion of the real time programming community. The financial costs associated with using the system are minimal. All source code for RTProE has been made publicly available. Any person with access to a personal computer, Windows 95 or NT, and C or FORTRAN compilers can quickly enter the world of real time modeling and simulation.

  1. An In-Home Digital Network Architecture for Real-Time and Non-Real-Time Communication

    NARCIS (Netherlands)

    Scholten, Johan; Jansen, P.G.; Hanssen, F.T.Y.; Hattink, Tjalling

    2002-01-01

    This paper describes an in-home digital network architecture that supports both real-time and non-real-time communication. The architecture deploys a distributed token mechanism to schedule communication streams and to offer guaranteed quality-of-service. Essentially, the token mechanism prevents

  2. Gaussian Processes for Data-Efficient Learning in Robotics and Control.

    Science.gov (United States)

    Deisenroth, Marc Peter; Fox, Dieter; Rasmussen, Carl Edward

    2015-02-01

    Autonomous learning has been a promising direction in control and robotics for more than a decade, since data-driven learning reduces the amount of engineering knowledge that is otherwise required. However, autonomous reinforcement learning (RL) approaches typically require many interactions with the system to learn controllers, which is a practical limitation in real systems, such as robots, where many interactions can be impractical and time consuming. To address this problem, current learning approaches typically require task-specific knowledge in the form of expert demonstrations, realistic simulators, pre-shaped policies, or specific knowledge about the underlying dynamics. In this paper, we follow a different approach and speed up learning by extracting more information from data. In particular, we learn a probabilistic, non-parametric Gaussian process transition model of the system. By explicitly incorporating model uncertainty into long-term planning and controller learning, our approach reduces the effects of model errors, a key problem in model-based learning. Compared to state-of-the-art RL, our model-based policy search method achieves an unprecedented speed of learning. We demonstrate its applicability to autonomous learning in real robot and control tasks.

  3. MARTe: A Multiplatform Real-Time Framework

    Science.gov (United States)

    Neto, André C.; Sartori, Filippo; Piccolo, Fabio; Vitelli, Riccardo; De Tommasi, Gianmaria; Zabeo, Luca; Barbalace, Antonio; Fernandes, Horacio; Valcarcel, Daniel F.; Batista, Antonio J. N.

    2010-04-01

    Development of real-time applications is usually associated with nonportable code targeted at specific real-time operating systems. The boundary between hardware drivers, system services, and user code is commonly not well defined, making the development in the target host significantly difficult. The Multithreaded Application Real-Time executor (MARTe) is a framework built over a multiplatform library that allows the execution of the same code in different operating systems. The framework provides the high-level interfaces with hardware, external configuration programs, and user interfaces, assuring at the same time hard real-time performances. End-users of the framework are required to define and implement algorithms inside a well-defined block of software, named Generic Application Module (GAM), that is executed by the real-time scheduler. Each GAM is reconfigurable with a set of predefined configuration meta-parameters and interchanges information using a set of data pipes that are provided as inputs and required as output. Using these connections, different GAMs can be chained either in series or parallel. GAMs can be developed and debugged in a non-real-time system and, only once the robustness of the code and correctness of the algorithm are verified, deployed to the real-time system. The software also supplies a large set of utilities that greatly ease the interaction and debugging of a running system. Among the most useful are a highly efficient real-time logger, HTTP introspection of real-time objects, and HTTP remote configuration. MARTe is currently being used to successfully drive the plasma vertical stabilization controller on the largest magnetic confinement fusion device in the world, with a control loop cycle of 50 μs and a jitter under 1 μs. In this particular project, MARTe is used with the Real-Time Application Interface (RTAI)/Linux operating system exploiting the new x86 multicore processors technology.
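
    The idea of chaining configurable blocks over shared data pipes can be illustrated with a toy sketch. This is not MARTe's API (MARTe is a C++ framework); the class names `Gain` and `Offset`, the `execute` method, and the `run_cycle` helper are all hypothetical, and a plain dictionary stands in for the data pipes.

```python
class Gain:
    """Hypothetical block: multiplies one input signal by a constant factor."""

    def __init__(self, source, target, factor):
        self.source = source
        self.target = target
        self.factor = factor

    def execute(self, signals):
        signals[self.target] = signals[self.source] * self.factor


class Offset:
    """Hypothetical block: adds a constant offset to one input signal."""

    def __init__(self, source, target, offset):
        self.source = source
        self.target = target
        self.offset = offset

    def execute(self, signals):
        signals[self.target] = signals[self.source] + self.offset


def run_cycle(blocks, signals):
    """Run every block once, in order, over a shared signal table."""
    for block in blocks:
        block.execute(signals)
    return signals


# Two blocks chained in series: the output of the first feeds the second.
chain = [Gain("adc", "scaled", 2.0), Offset("scaled", "command", -1.0)]
result = run_cycle(chain, {"adc": 3.0})
```

    Because each block only reads and writes named signals, the same chain can be exercised on a desktop machine before a real-time scheduler calls `run_cycle` once per control period, which mirrors the develop-then-deploy workflow the abstract describes.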

  4. Adaptive pattern recognition in real-time video-based soccer analysis

    DEFF Research Database (Denmark)

    Schlipsing, Marc; Salmen, Jan; Tschentscher, Marc

    2017-01-01

    are taken into account. Our contribution is twofold: (1) the deliberate use of machine learning and pattern recognition techniques allows us to achieve high classification accuracy in varying environments. We systematically evaluate combinations of image features and learning machines in the given online......Computer-aided sports analysis is demanded by coaches and the media. Image processing and machine learning techniques that allow for "live" recognition and tracking of players exist. But these methods are far from collecting and analyzing event data fully autonomously. To generate accurate results......, human interaction is required at different stages including system setup, calibration, supervision of classifier training, and resolution of tracking conflicts. Furthermore, the real-time constraints are challenging: in contrast to other object recognition and tracking applications, we cannot treat data...

  5. Real-Time and Real-Fast Performance of General-Purpose and Real-Time Operating Systems in Multithreaded Physical Simulation of Complex Mechanical Systems

    Directory of Open Access Journals (Sweden)

    Carlos Garre

    2014-01-01

    Physical simulation is a valuable tool in many fields of engineering for the tasks of design, prototyping, and testing. General-purpose operating systems (GPOS) are designed for real-fast tasks, such as offline simulation of complex physical models that should finish as soon as possible. Interfacing hardware at a given rate (as in a hardware-in-the-loop test) requires instead maximizing time determinism, for which real-time operating systems (RTOS) are designed. In this paper, real-fast and real-time performance of RTOS and GPOS are compared when simulating models of high complexity with large time steps. This type of application is common in the automotive industry and requires a good trade-off between real-fast and real-time performance. The performance of an RTOS and a GPOS is compared by running a tire model scalable on the number of degrees-of-freedom and parallel threads. The benchmark shows that the GPOS presents better performance in real-fast runs but worse in real-time due to nonexplicit task switches and to the latency associated with interprocess communication (IPC) and task switch.
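
    The distinction between real-fast and real-time performance can be made concrete with two simple metrics computed from the wake-up timestamps of a periodic task. The sketch below is an illustration only: `timing_report` and its field names are hypothetical and are not taken from the paper's benchmark.

```python
import statistics


def timing_report(wakeups, period):
    """Summarise a periodic task from its wake-up timestamps.

    `wakeups` and `period` must use the same time unit (for example
    microseconds). The mean interval reflects throughput (real-fast),
    while the maximum deviation from the nominal period reflects
    determinism (real-time).
    """
    intervals = [later - earlier for earlier, later in zip(wakeups, wakeups[1:])]
    deviations = [abs(interval - period) for interval in intervals]
    return {
        "mean_interval": statistics.fmean(intervals),
        "max_jitter": max(deviations),
    }


# Synthetic timestamps in microseconds for a task with a 1000 us period.
report = timing_report([0, 1000, 2100, 3000], period=1000)
```

    In this synthetic trace the mean interval equals the nominal period, yet individual cycles are up to 100 μs early or late. A system can therefore look fine on average throughput while still missing a determinism requirement.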

  6. GNSS global real-time augmentation positioning: Real-time precise satellite clock estimation, prototype system construction and performance analysis

    Science.gov (United States)

    Chen, Liang; Zhao, Qile; Hu, Zhigang; Jiang, Xinyuan; Geng, Changjiang; Ge, Maorong; Shi, Chuang

    2018-01-01

    The large number of ambiguities in the un-differenced (UD) model lowers computational efficiency, which makes the model unsuitable for high-frequency real-time GNSS clock estimation, such as at 1 Hz. A mixed differenced model fusing UD pseudo-range and epoch-differenced (ED) phase observations has been introduced into real-time clock estimation. In this contribution, we extend the mixed differenced model to realize multi-GNSS real-time high-frequency clock updating, and a rigorous comparison and analysis under the same conditions is performed to achieve the best real-time clock estimation performance, taking efficiency, accuracy, consistency and reliability into consideration. Based on the multi-GNSS real-time data streams provided by the multi-GNSS Experiment (MGEX) and Wuhan University, a GPS + BeiDou + Galileo global real-time augmentation positioning prototype system is designed and constructed, including real-time precise orbit determination, real-time precise clock estimation, real-time Precise Point Positioning (RT-PPP) and real-time Standard Point Positioning (RT-SPP). The statistical analysis of the 6 h-predicted real-time orbits shows that the root mean square (RMS) in the radial direction is about 1-5 cm for GPS, BeiDou MEO and Galileo satellites and about 10 cm for BeiDou GEO and IGSO satellites. Using the mixed differenced estimation model, the prototype system can realize highly efficient real-time satellite absolute clock estimation with no constant clock bias and can be used for high-frequency augmentation message updating (such as 1 Hz). The real-time augmentation message signal-in-space ranging error (SISRE), a comprehensive measure of orbit and clock accuracy that affects users' actual positioning performance, is introduced to evaluate and analyze the performance of the GPS + BeiDou + Galileo global real-time augmentation positioning system. The statistical analysis shows that the real-time augmentation message SISRE is about 4-7 cm for GPS, while 10 cm for BeiDou IGSO/MEO, Galileo and about 30 cm

  7. Bio-robots automatic navigation with graded electric reward stimulation based on Reinforcement Learning.

    Science.gov (United States)

    Zhang, Chen; Sun, Chao; Gao, Liqiang; Zheng, Nenggan; Chen, Weidong; Zheng, Xiaoxiang

    2013-01-01

    Bio-robots based on brain-computer interfaces (BCI) often fail to take the characteristics of the animal into account during navigation. This paper proposes a new method for automatic bio-robot navigation that combines a reward-generating algorithm based on Reinforcement Learning (RL) with the learning intelligence of animals. Given the graded electrical reward, the animal, e.g. a rat, tends to seek the maximum reward while exploring an unknown environment. Since the rat has excellent spatial recognition, the rat-robot and the RL algorithm can converge to an optimal route by co-learning. This work offers significant inspiration for the practical development of bio-robot navigation with hybrid intelligence.

  8. Scalable Real-Time Negotiation Toolkit

    National Research Council Canada - National Science Library

    Lesser, Victor

    2004-01-01

    ... to implement an adaptive distributed sensor network. These activities involved the development of a distributed soft, real-time heuristic resource allocation protocol, the development of a domain-independent soft, real time agent architecture...

  9. Long term effects of aversive reinforcement on colour discrimination learning in free-flying bumblebees.

    Directory of Open Access Journals (Sweden)

    Miguel A Rodríguez-Gironés

    The results of behavioural experiments provide important information about the structure and information-processing abilities of the visual system. Nevertheless, if we want to infer from behavioural data how the visual system operates, it is important to know how different learning protocols affect performance and to devise protocols that minimise noise in the response of experimental subjects. The purpose of this work was to investigate how reinforcement schedule and individual variability affect the learning process in a colour discrimination task. Free-flying bumblebees were trained to discriminate between two perceptually similar colours. The target colour was associated with sucrose solution, and the distractor could be associated with water or quinine solution throughout the experiment, or with one substance during the first half of the experiment and the other during the second half. Both acquisition and final performance of the discrimination task (measured as proportion of correct choices) were determined by the choice of reinforcer during the first half of the experiment: regardless of whether bees were trained with water or quinine during the second half of the experiment, bees trained with quinine during the first half learned the task faster and performed better during the whole experiment. Our results confirm that the choice of stimuli used during training affects the rate at which colour discrimination tasks are acquired and show that early contact with a strongly aversive stimulus can be sufficient to maintain high levels of attention during several hours. On the other hand, bees which took more time to decide on which flower to alight were more likely to make correct choices than bees which made fast decisions. This result supports the existence of a trade-off between foraging speed and accuracy, and highlights the importance of measuring choice latencies during behavioural experiments focusing on cognitive abilities.

  10. Model Checking Real-Time Systems

    DEFF Research Database (Denmark)

    Bouyer, Patricia; Fahrenberg, Uli; Larsen, Kim Guldstrand

    2018-01-01

    This chapter surveys timed automata as a formalism for model checking real-time systems. We begin with introducing the model, as an extension of finite-state automata with real-valued variables for measuring time. We then present the main model-checking results in this framework, and give a hint...

  11. Modular specification of real-time systems

    DEFF Research Database (Denmark)

    Inal, Recep

    1994-01-01

    Duration Calculus, a real-time interval logic, has been embedded in the Z specification language to provide a notation for real-time systems that combines the modularisation and abstraction facilities of Z with a logic suitable for reasoning about real-time properties. In this article the notation...

  12. Hard Real-Time Networking on Firewire

    NARCIS (Netherlands)

    Zhang, Yuchen; Orlic, Bojan; Visser, Peter; Broenink, Jan

    2005-01-01

    This paper investigates the possibility of using standard, low-cost, widely used FireWire as a new generation fieldbus medium for real-time distributed control applications. A real-time software subsys- tem, RT-FireWire was designed that can, in combination with Linux-based real-time operating

  13. Real-time individualization of the unified model of performance.

    Science.gov (United States)

    Liu, Jianbo; Ramakrishnan, Sridhar; Laxminarayan, Srinivas; Balkin, Thomas J; Reifman, Jaques

    2017-12-01

    Existing mathematical models for predicting neurobehavioural performance are not suited for mobile computing platforms because they cannot adapt model parameters automatically in real time to reflect individual differences in the effects of sleep loss. We used an extended Kalman filter to develop a computationally efficient algorithm that continually adapts the parameters of the recently developed Unified Model of Performance (UMP) to an individual. The algorithm accomplishes this in real time as new performance data for the individual become available. We assessed the algorithm's performance by simulating real-time model individualization for 18 subjects subjected to 64 h of total sleep deprivation (TSD) and 7 days of chronic sleep restriction (CSR) with 3 h of time in bed per night, using psychomotor vigilance task (PVT) data collected every 2 h during wakefulness. This UMP individualization process produced parameter estimates that progressively approached the solution produced by a post-hoc fitting of model parameters using all data. The minimum number of PVT measurements needed to individualize the model parameters depended upon the type of sleep-loss challenge, with ~30 required for TSD and ~70 for CSR. However, model individualization depended upon the overall duration of data collection, yielding increasingly accurate model parameters with greater number of days. Interestingly, reducing the PVT sampling frequency by a factor of two did not notably hamper model individualization. The proposed algorithm facilitates real-time learning of an individual's trait-like responses to sleep loss and enables the development of individualized performance prediction models for use in a mobile computing platform. © 2017 European Sleep Research Society.
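
    The recursive nature of this kind of individualization can be illustrated with a much simpler filter than the one used in the study. The sketch below is a linear, scalar Kalman filter tracking a single slowly varying parameter; it is a stand-in chosen for brevity, not the extended Kalman filter or the UMP equations from the paper, and `kalman_update` and its argument names are hypothetical.

```python
def kalman_update(estimate, variance, measurement, meas_variance,
                  process_variance=0.0):
    """One predict/update step for a scalar random-walk parameter.

    Returns the new (estimate, variance) after folding in one measurement.
    """
    # Predict: the parameter may drift, so uncertainty grows.
    variance = variance + process_variance
    # Update: weight the new measurement by the Kalman gain.
    gain = variance / (variance + meas_variance)
    estimate = estimate + gain * (measurement - estimate)
    variance = (1.0 - gain) * variance
    return estimate, variance


# Start from a population-average guess and refine it measurement by measurement.
estimate, variance = 0.0, 1.0
history = []
for measurement in [2.0, 2.0, 2.0]:
    estimate, variance = kalman_update(estimate, variance, measurement,
                                       meas_variance=1.0)
    history.append((estimate, variance))
```

    Each new measurement moves the estimate toward the individual's value and shrinks the variance, so the parameter can be refined on a mobile device as data arrive, without refitting on the full data set.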

  14. Multiprocessor scheduling for real-time systems

    CERN Document Server

    Baruah, Sanjoy; Buttazzo, Giorgio

    2015-01-01

    This book provides a comprehensive overview of both theoretical and pragmatic aspects of resource-allocation and scheduling in multiprocessor and multicore hard-real-time systems.  The authors derive new, abstract models of real-time tasks that capture accurately the salient features of real application systems that are to be implemented on multiprocessor platforms, and identify rules for mapping application systems onto the most appropriate models.  New run-time multiprocessor scheduling algorithms are presented, which are demonstrably better than those currently used, both in terms of run-time efficiency and tractability of off-line analysis.  Readers will benefit from a new design and analysis framework for multiprocessor real-time systems, which will translate into a significantly enhanced ability to provide formally verified, safety-critical real-time systems at a significantly lower cost.

  15. Reinforcement Learning for Predictive Analytics in Smart Cities

    Directory of Open Access Journals (Sweden)

    Kostas Kolomvatsos

    2017-06-01

    The digitization of our lives causes a shift in data production as well as in the required data management. Numerous nodes are capable of producing huge volumes of data in our everyday activities. Sensors, personal smart devices, and the Internet of Things (IoT) paradigm lead to a vast infrastructure that covers all aspects of activity in modern societies. In most cases, the critical issue for public authorities (usually local ones, such as municipalities) is the efficient management of data to support novel services. The reason is that analytics provided on top of the collected data could help in the delivery of new applications that will facilitate citizens’ lives. However, the provision of analytics demands intelligent techniques for the underlying data management. The best-known technique is the separation of huge volumes of data into a number of parts and their parallel management to limit the time required for the delivery of analytics. Afterwards, analytics requests in the form of queries could be realized and derive the necessary knowledge for supporting intelligent applications. In this paper, we define the concept of a Query Controller (QC) that receives queries for analytics and assigns each of them to a processor placed in front of each data partition. We discuss an intelligent process for query assignments that adopts Machine Learning (ML). We adopt two learning schemes, i.e., Reinforcement Learning (RL) and clustering. We report on the comparison of the two schemes and elaborate on their combination. Our aim is to provide an efficient framework to support the decision making of the QC that should swiftly select the appropriate processor for each query. We provide mathematical formulations for the discussed problem and present simulation results. Through a comprehensive experimental evaluation, we reveal the advantages of the proposed models and describe the outcomes while comparing them with a

  16. Real-time fMRI using brain-state classification.

    Science.gov (United States)

    LaConte, Stephen M; Peltier, Scott J; Hu, Xiaoping P

    2007-10-01

    We have implemented a real-time functional magnetic resonance imaging system based on multivariate classification. This approach is distinctly different from spatially localized real-time implementations, since it does not require prior assumptions about functional localization and individual performance strategies, and has the ability to provide feedback based on intuitive translations of brain state rather than localized fluctuations. Thus this approach provides the capability for a new class of experimental designs in which real-time feedback control of the stimulus is possible-rather than using a fixed paradigm, experiments can adaptively evolve as subjects receive brain-state feedback. In this report, we describe our implementation and characterize its performance capabilities. We observed approximately 80% classification accuracy using whole brain, block-design, motor data. Within both left and right motor task conditions, important differences exist between the initial transient period produced by task switching (changing between rapid left or right index finger button presses) and the subsequent stable period during sustained activity. Further analysis revealed that very high accuracy is achievable during stable task periods, and that the responsiveness of the classifier to changes in task condition can be much faster than signal time-to-peak rates. Finally, we demonstrate the versatility of this implementation with respect to behavioral task, suggesting that our results are applicable across a spectrum of cognitive domains. Beyond basic research, this technology can complement electroencephalography-based brain computer interface research, and has potential applications in the areas of biofeedback rehabilitation, lie detection, learning studies, virtual reality-based training, and enhanced conscious awareness. Wiley-Liss, Inc.

  17. Prototyping real-time systems

    OpenAIRE

    Clynch, Gary

    1994-01-01

    The traditional software development paradigm, the waterfall life cycle model, is defective when used for developing real-time systems. This thesis puts forward an executable prototyping approach for the development of real-time systems. A prototyping system is proposed which uses ESML (Extended Systems Modelling Language) as a prototype specification language. The prototyping system advocates the translation of non-executable ESML specifications into executable LOOPN (Language of Object ...

  18. Software Design Methods for Real-Time Systems

    Science.gov (United States)

    1989-12-01

    This module describes the concepts and methods used in the software design of real-time systems. It outlines the characteristics of real-time systems, describes...the role of software design in real-time system development, surveys and compares some software design methods for real-time systems, and

  19. Learning dictionaries of sparse codes of 3D movements of body joints for real-time human activity understanding.

    Science.gov (United States)

    Qi, Jin; Yang, Zhiyong

    2014-01-01

    Real-time human activity recognition is essential for human-robot interactions for assisted healthy independent living. Most previous work in this area is performed on traditional two-dimensional (2D) videos and both global and local methods have been used. Since 2D videos are sensitive to changes of lighting condition, view angle, and scale, researchers began to explore applications of 3D information in human activity understanding in recent years. Unfortunately, features that work well on 2D videos usually don't perform well on 3D videos and there is no consensus on what 3D features should be used. Here we propose a model of human activity recognition based on 3D movements of body joints. Our method has three steps: learning dictionaries of sparse codes of 3D movements of joints, sparse coding, and classification. In the first step, space-time volumes of 3D movements of body joints are obtained via dense sampling and independent component analysis is then performed to construct a dictionary of sparse codes for each activity. In the second step, the space-time volumes are projected to the dictionaries and a set of sparse histograms of the projection coefficients are constructed as feature representations of the activities. Finally, the sparse histograms are used as inputs to a support vector machine to recognize human activities. We tested this model on three databases of human activities and found that it outperforms the state-of-the-art algorithms. Thus, this model can be used for real-time human activity recognition in many applications.

  20. Learning dictionaries of sparse codes of 3D movements of body joints for real-time human activity understanding.

    Directory of Open Access Journals (Sweden)

    Jin Qi

    Real-time human activity recognition is essential for human-robot interactions for assisted healthy independent living. Most previous work in this area is performed on traditional two-dimensional (2D) videos and both global and local methods have been used. Since 2D videos are sensitive to changes of lighting condition, view angle, and scale, researchers began to explore applications of 3D information in human activity understanding in recent years. Unfortunately, features that work well on 2D videos usually don't perform well on 3D videos and there is no consensus on what 3D features should be used. Here we propose a model of human activity recognition based on 3D movements of body joints. Our method has three steps: learning dictionaries of sparse codes of 3D movements of joints, sparse coding, and classification. In the first step, space-time volumes of 3D movements of body joints are obtained via dense sampling and independent component analysis is then performed to construct a dictionary of sparse codes for each activity. In the second step, the space-time volumes are projected to the dictionaries and a set of sparse histograms of the projection coefficients are constructed as feature representations of the activities. Finally, the sparse histograms are used as inputs to a support vector machine to recognize human activities. We tested this model on three databases of human activities and found that it outperforms the state-of-the-art algorithms. Thus, this model can be used for real-time human activity recognition in many applications.

  1. Real-time Pricing in Power Markets

    DEFF Research Database (Denmark)

    Boom, Anette; Schwenen, Sebastian

    We examine welfare e ects of real-time pricing in electricity markets. Before stochastic energy demand is known, competitive retailers contract with nal consumers who exogenously do not have real-time meters. After demand is realized, two electricity generators compete in a uniform price auction...... to satisfy demand from retailers acting on behalf of subscribed customers and from consumers with real-time meters. Increasing the number of consumers on real-time pricing does not always increase welfare since risk-averse consumers dislike uncertain and high prices arising through market power...

  2. Real-time Pricing in Power Markets

    DEFF Research Database (Denmark)

    Boom, Anette; Schwenen, Sebastian

    We examine welfare effects of real-time pricing in electricity markets. Before stochastic energy demand is known, competitive retailers contract with final consumers who exogenously do not have real-time meters. After demand is realized, two electricity generators compete in a uniform price auction...

  3. Distributed, Embedded and Real-time Java Systems

    CERN Document Server

    Wellings, Andy

    2012-01-01

    Research on real-time Java technology has been prolific over the past decade, leading to a large number of corresponding hardware and software solutions, and frameworks for distributed and embedded real-time Java systems.  This book is aimed primarily at researchers in real-time embedded systems, particularly those who wish to understand the current state of the art in using Java in this domain.  Much of the work in real-time distributed, embedded and real-time Java has focused on the Real-time Specification for Java (RTSJ) as the underlying base technology, and consequently many of the Chapters in this book address issues with, or solve problems using, this framework. Describes innovative techniques in: scheduling, memory management, quality of service and communication systems supporting real-time Java applications; Includes coverage of multiprocessor embedded systems and parallel programming; Discusses state-of-the-art resource management for embedded systems, including Java’s real-time garbage collect...

  4. External Prior Guided Internal Prior Learning for Real-World Noisy Image Denoising

    Science.gov (United States)

    Xu, Jun; Zhang, Lei; Zhang, David

    2018-06-01

    Most of existing image denoising methods learn image priors from either external data or the noisy image itself to remove noise. However, priors learned from external data may not be adaptive to the image to be denoised, while priors learned from the given noisy image may not be accurate due to the interference of corrupted noise. Meanwhile, the noise in real-world noisy images is very complex, which is hard to be described by simple distributions such as Gaussian distribution, making real noisy image denoising a very challenging problem. We propose to exploit the information in both external data and the given noisy image, and develop an external prior guided internal prior learning method for real noisy image denoising. We first learn external priors from an independent set of clean natural images. With the aid of learned external priors, we then learn internal priors from the given noisy image to refine the prior model. The external and internal priors are formulated as a set of orthogonal dictionaries to efficiently reconstruct the desired image. Extensive experiments are performed on several real noisy image datasets. The proposed method demonstrates highly competitive denoising performance, outperforming state-of-the-art denoising methods including those designed for real noisy images.

  5. MO-FG-BRD-02: Real-Time Imaging and Tracking Techniques for Intrafractional Motion Management: MV Tracking

    Energy Technology Data Exchange (ETDEWEB)

    Berbeco, R. [Brigham and Women’s Hospital and Dana-Farber Cancer Institute (United States)]

    2015-06-15

    Intrafraction target motion is a prominent complicating factor in the accurate targeting of radiation within the body. Methods compensating for target motion during treatment, such as gating and dynamic tumor tracking, depend on the delineation of target location as a function of time during delivery. A variety of techniques for target localization have been explored and are under active development; these include beam-level imaging of radio-opaque fiducials, fiducial-less tracking of anatomical landmarks, tracking of electromagnetic transponders, optical imaging of correlated surrogates, and volumetric imaging within treatment delivery. The Joint Imaging and Therapy Symposium will provide an overview of the techniques for real-time imaging and tracking, with special focus on emerging modes of implementation across different modalities. In particular, the symposium will explore developments in 1) Beam-level kilovoltage X-ray imaging techniques, 2) EPID-based megavoltage X-ray tracking, 3) Dynamic tracking using electromagnetic transponders, and 4) MRI-based soft-tissue tracking during radiation delivery. Learning Objectives: Understand the fundamentals of real-time imaging and tracking techniques; learn about emerging techniques in the field of real-time tracking; distinguish between the advantages and disadvantages of different tracking modalities; understand the role of real-time tracking techniques within the clinical delivery work-flow.

  6. MO-FG-BRD-04: Real-Time Imaging and Tracking Techniques for Intrafractional Motion Management: MR Tracking

    Energy Technology Data Exchange (ETDEWEB)

    Low, D. [University of California Los Angeles (United States)]

    2015-06-15

    Intrafraction target motion is a prominent complicating factor in the accurate targeting of radiation within the body. Methods compensating for target motion during treatment, such as gating and dynamic tumor tracking, depend on the delineation of target location as a function of time during delivery. A variety of techniques for target localization have been explored and are under active development; these include beam-level imaging of radio-opaque fiducials, fiducial-less tracking of anatomical landmarks, tracking of electromagnetic transponders, optical imaging of correlated surrogates, and volumetric imaging within treatment delivery. The Joint Imaging and Therapy Symposium will provide an overview of the techniques for real-time imaging and tracking, with special focus on emerging modes of implementation across different modalities. In particular, the symposium will explore developments in 1) Beam-level kilovoltage X-ray imaging techniques, 2) EPID-based megavoltage X-ray tracking, 3) Dynamic tracking using electromagnetic transponders, and 4) MRI-based soft-tissue tracking during radiation delivery. Learning Objectives: Understand the fundamentals of real-time imaging and tracking techniques; learn about emerging techniques in the field of real-time tracking; distinguish between the advantages and disadvantages of different tracking modalities; understand the role of real-time tracking techniques within the clinical delivery work-flow.

  7. MO-FG-BRD-03: Real-Time Imaging and Tracking Techniques for Intrafractional Motion Management: EM Tracking

    Energy Technology Data Exchange (ETDEWEB)

    Keall, P. [University of Sydney (Australia)]

    2015-06-15

    Intrafraction target motion is a prominent complicating factor in the accurate targeting of radiation within the body. Methods compensating for target motion during treatment, such as gating and dynamic tumor tracking, depend on the delineation of target location as a function of time during delivery. A variety of techniques for target localization have been explored and are under active development; these include beam-level imaging of radio-opaque fiducials, fiducial-less tracking of anatomical landmarks, tracking of electromagnetic transponders, optical imaging of correlated surrogates, and volumetric imaging within treatment delivery. The Joint Imaging and Therapy Symposium will provide an overview of the techniques for real-time imaging and tracking, with special focus on emerging modes of implementation across different modalities. In particular, the symposium will explore developments in 1) Beam-level kilovoltage X-ray imaging techniques, 2) EPID-based megavoltage X-ray tracking, 3) Dynamic tracking using electromagnetic transponders, and 4) MRI-based soft-tissue tracking during radiation delivery. Learning Objectives: Understand the fundamentals of real-time imaging and tracking techniques; learn about emerging techniques in the field of real-time tracking; distinguish between the advantages and disadvantages of different tracking modalities; understand the role of real-time tracking techniques within the clinical delivery work-flow.

  8. MO-FG-BRD-04: Real-Time Imaging and Tracking Techniques for Intrafractional Motion Management: MR Tracking

    International Nuclear Information System (INIS)

    Low, D.

    2015-01-01

    Intrafraction target motion is a prominent complicating factor in the accurate targeting of radiation within the body. Methods compensating for target motion during treatment, such as gating and dynamic tumor tracking, depend on the delineation of target location as a function of time during delivery. A variety of techniques for target localization have been explored and are under active development; these include beam-level imaging of radio-opaque fiducials, fiducial-less tracking of anatomical landmarks, tracking of electromagnetic transponders, optical imaging of correlated surrogates, and volumetric imaging within treatment delivery. The Joint Imaging and Therapy Symposium will provide an overview of the techniques for real-time imaging and tracking, with special focus on emerging modes of implementation across different modalities. In particular, the symposium will explore developments in 1) Beam-level kilovoltage X-ray imaging techniques, 2) EPID-based megavoltage X-ray tracking, 3) Dynamic tracking using electromagnetic transponders, and 4) MRI-based soft-tissue tracking during radiation delivery. Learning Objectives: Understand the fundamentals of real-time imaging and tracking techniques; learn about emerging techniques in the field of real-time tracking; distinguish between the advantages and disadvantages of different tracking modalities; understand the role of real-time tracking techniques within the clinical delivery work-flow.

  9. MO-FG-BRD-03: Real-Time Imaging and Tracking Techniques for Intrafractional Motion Management: EM Tracking

    International Nuclear Information System (INIS)

    Keall, P.

    2015-01-01

    Intrafraction target motion is a prominent complicating factor in the accurate targeting of radiation within the body. Methods compensating for target motion during treatment, such as gating and dynamic tumor tracking, depend on the delineation of target location as a function of time during delivery. A variety of techniques for target localization have been explored and are under active development; these include beam-level imaging of radio-opaque fiducials, fiducial-less tracking of anatomical landmarks, tracking of electromagnetic transponders, optical imaging of correlated surrogates, and volumetric imaging within treatment delivery. The Joint Imaging and Therapy Symposium will provide an overview of the techniques for real-time imaging and tracking, with special focus on emerging modes of implementation across different modalities. In particular, the symposium will explore developments in 1) Beam-level kilovoltage X-ray imaging techniques, 2) EPID-based megavoltage X-ray tracking, 3) Dynamic tracking using electromagnetic transponders, and 4) MRI-based soft-tissue tracking during radiation delivery. Learning Objectives: Understand the fundamentals of real-time imaging and tracking techniques; learn about emerging techniques in the field of real-time tracking; distinguish between the advantages and disadvantages of different tracking modalities; understand the role of real-time tracking techniques within the clinical delivery work-flow.

  10. MO-FG-BRD-02: Real-Time Imaging and Tracking Techniques for Intrafractional Motion Management: MV Tracking

    International Nuclear Information System (INIS)

    Berbeco, R.

    2015-01-01

    Intrafraction target motion is a prominent complicating factor in the accurate targeting of radiation within the body. Methods compensating for target motion during treatment, such as gating and dynamic tumor tracking, depend on the delineation of target location as a function of time during delivery. A variety of techniques for target localization have been explored and are under active development; these include beam-level imaging of radio-opaque fiducials, fiducial-less tracking of anatomical landmarks, tracking of electromagnetic transponders, optical imaging of correlated surrogates, and volumetric imaging within treatment delivery. The Joint Imaging and Therapy Symposium will provide an overview of the techniques for real-time imaging and tracking, with special focus on emerging modes of implementation across different modalities. In particular, the symposium will explore developments in 1) Beam-level kilovoltage X-ray imaging techniques, 2) EPID-based megavoltage X-ray tracking, 3) Dynamic tracking using electromagnetic transponders, and 4) MRI-based soft-tissue tracking during radiation delivery. Learning Objectives: Understand the fundamentals of real-time imaging and tracking techniques; learn about emerging techniques in the field of real-time tracking; distinguish between the advantages and disadvantages of different tracking modalities; understand the role of real-time tracking techniques within the clinical delivery work-flow.

  11. Research of real-time communication software

    Science.gov (United States)

    Li, Maotang; Guo, Jingbo; Liu, Yuzhong; Li, Jiahong

    2003-11-01

    Real-time communication plays an increasingly important role in work, daily life and ocean monitoring. With the rapid progress of computer and communication technology and the miniaturization of communication systems, adaptable and reliable real-time communication software is needed for ocean monitoring systems. This paper presents research on real-time communication software based on a point-to-point satellite intercommunication system. An object-oriented design method is adopted, and the software can transmit and receive video, audio and engineering data over the satellite channel. Several software modules are developed to realize point-to-point satellite intercommunication in the ocean monitoring system. The real-time communication software has three advantages. First, it increases the operational reliability of the point-to-point satellite intercommunication system. Second, some optional parameters are provided, which greatly increases the flexibility of the system. Third, some hardware is replaced by the software, which not only decreases the cost of the system and promotes the miniaturization of the communication system, but also improves the agility of the system.

  12. Real-time simulation: first-hand experience of the challenges of community nursing for students.

    Science.gov (United States)

    Reynolds, Stephanie; Cooper-Stanton, Garry; Potter, Andrew

    2018-04-02

    The Community Challenge is a simulated community event for pre-registration nursing students across all four fields. Through the provision of real-time simulation, the Community Challenge has combined a deeper learning for both nursing students and the drama students who were involved in making the scenarios real and interactive. The event was run over 5 days, with positive evaluations from students and staff. Furthermore, Community Challenge has been found to be successful in expanding opportunities for students that align with national drivers, curriculum planning and interprofessional learning. The event has allowed students to engage in learning with other fields, enhancing their own practice. The Community Challenge has been found to enhance the link between theory and practice within primary care, promoting the relevance and importance of community care within nursing.

  13. Effects of Real and Recalled Success on Learned Helplessness and Depression

    Science.gov (United States)

    Teasdale, John D.

    1978-01-01

    The effects of recalling past successes on the deficits in learned helplessness and depression were examined and, for learned helplessness, compared with those of real success. Results suggest real success does not have its therapeutic effects by modifying attributions for failure toward external factors. (Editor)

  14. Rats bred for helplessness exhibit positive reinforcement learning deficits which are not alleviated by an antidepressant dose of the MAO-B inhibitor deprenyl.

    Science.gov (United States)

    Schulz, Daniela; Henn, Fritz A; Petri, David; Huston, Joseph P

    2016-08-04

    Principles of negative reinforcement learning may play a critical role in the etiology and treatment of depression. We examined the integrity of positive reinforcement learning in congenitally helpless (cH) rats, an animal model of depression, using a random ratio schedule and a devaluation-extinction procedure. Furthermore, we tested whether an antidepressant dose of the monoamine oxidase (MAO)-B inhibitor deprenyl would reverse any deficits in positive reinforcement learning. We found that cH rats (n=9) were impaired in the acquisition of even simple operant contingencies, such as a fixed interval (FI) 20 schedule. cH rats exhibited no apparent deficits in appetite or reward sensitivity. They reacted to the devaluation of food in a manner consistent with a dose-response relationship. Reinforcer motivation as assessed by lever pressing across sessions with progressively decreasing reward probabilities was highest in congenitally non-helpless (cNH, n=10) rats as long as the reward probabilities remained relatively high. cNH compared to wild-type (n=10) rats were also more resistant to extinction across sessions. Compared to saline (n=5), deprenyl (n=5) reduced the duration of immobility of cH rats in the forced swimming test, indicative of antidepressant effects, but did not restore any deficits in the acquisition of a FI 20 schedule. We conclude that positive reinforcement learning was impaired in rats bred for helplessness, possibly due to motivational impairments but not deficits in reward sensitivity, and that deprenyl exerted antidepressant effects but did not reverse the deficits in positive reinforcement learning. Copyright © 2016 IBRO. Published by Elsevier Ltd. All rights reserved.

  15. Vision-based Navigation and Reinforcement Learning Path Finding for Social Robots

    OpenAIRE

    Pérez Sala, Xavier

    2010-01-01

    We propose a robust system for automatic Robot Navigation in uncontrolled environments. The system is composed of three main modules: the Artificial Vision module, the Reinforcement Learning module, and the behavior control module. The aim of the system is to allow a robot to automatically find a path that arrives at a prefixed goal. Turn and straight movements in uncontrolled environments are automatically estimated and controlled using the proposed modules. The Artificial Vi...

  16. Integration of MDSplus in real-time systems

    International Nuclear Information System (INIS)

    Luchetta, A.; Manduchi, G.; Taliercio, C.

    2006-01-01

    RFX-mod makes extensive usage of real-time systems for feedback control and uses MDSplus to interface them to the main Data Acquisition system. For this purpose, the core of MDSplus has been ported to VxWorks, the operating system used for real-time control in RFX. Using this approach, it is possible to integrate real-time systems, but MDSplus is used only for non-real-time tasks, i.e. those tasks which are executed before and after the pulse and whose performance does not affect the system time constraints. More extensive use of MDSplus in real-time systems is foreseen, and a real-time layer for MDSplus is under development, which will provide access to memory-mapped pulse files, shared by the tasks running on the same CPU. Real-time communication will also be integrated in the MDSplus core to provide support for distributed memory-mapped pulse files

  17. Exploring Non-Traditional Learning Methods in Virtual and Real-World Environments

    Science.gov (United States)

    Lukman, Rebeka; Krajnc, Majda

    2012-01-01

    This paper identifies the commonalities and differences within non-traditional learning methods regarding virtual and real-world environments. The non-traditional learning methods in real-world have been introduced within the following courses: Process Balances, Process Calculation, and Process Synthesis, and within the virtual environment through…

  18. Reinforcement Learning Based Web Service Compositions for Mobile Business

    Science.gov (United States)

    Zhou, Juan; Chen, Shouming

    In this paper, we propose a new solution to Reactive Web Service Composition by modeling it with Reinforcement Learning and introducing modified (alterable) QoS variables into the model as elements of the Markov Decision Process tuple. Moreover, we give an example of Reactive-WSC-based mobile banking to demonstrate the intrinsic capability of the proposed solution to obtain the optimized service composition, characterized by (alterable) target QoS variable sets with optimized values. Consequently, we conclude that the solution has good potential for improving customer experience and quality of service in Web Services, and in applications across the whole electronic commerce and business sector.

  19. Experimental and Empirical Time to Corrosion of Reinforced Concrete Structures under Different Curing Conditions

    Directory of Open Access Journals (Sweden)

    Ahmed A. Abouhussien

    2014-01-01

    Reinforced concrete structures, especially those in marine environments, are commonly subjected to high concentrations of chlorides, which eventually leads to corrosion of the embedded reinforcing steel. The total time to corrosion of such structures may be divided into three stages: corrosion initiation, cracking, and damage periods. This paper evaluates, both empirically and experimentally, the expected time to corrosion of reinforced concrete structures. The tested reinforced concrete samples were subjected to ten alternative curing techniques, including hot, cold, and normal temperatures, prior to testing. The corrosion initiation, cracking, and damage periods in this investigation were experimentally monitored by an accelerated corrosion test performed on reinforced concrete samples. Alternatively, the corrosion initiation time for counterpart samples was empirically predicted using Fick’s second law of diffusion for comparison. The results showed that the corrosion initiation periods obtained experimentally were comparable to those obtained empirically. The corrosion initiation was found to occur at the first jump of the current measurement in the accelerated corrosion test which matched the half-cell potential reading of around −350 mV.
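
    As a rough illustration of how a corrosion initiation time can be predicted from Fick's second law, the sketch below uses the standard error-function solution for a semi-infinite solid with constant surface concentration, C(x, t) = Cs * (1 - erf(x / (2 * sqrt(D * t)))), and solves for the time at which the chloride level at the rebar depth reaches a critical threshold. The abstract does not state which solution or parameter values the authors used, so the function names and all numbers here are hypothetical.

```python
import math


def chloride_concentration(depth_m, time_s, surface_conc, diffusion_coeff):
    """Chloride concentration at a depth and time (error-function solution).

    Assumes a semi-infinite solid, zero initial chloride content and a
    constant surface concentration; these are modelling assumptions.
    """
    if time_s <= 0:
        return 0.0
    z = depth_m / (2.0 * math.sqrt(diffusion_coeff * time_s))
    return surface_conc * (1.0 - math.erf(z))


def initiation_time(cover_m, surface_conc, critical_conc, diffusion_coeff,
                    t_max_s=1e12):
    """Time (s) at which the concentration at the rebar reaches critical_conc.

    The concentration at a fixed depth increases monotonically with time,
    so plain bisection on [0, t_max_s] is sufficient.
    """
    if not 0.0 < critical_conc < surface_conc:
        raise ValueError("critical_conc must lie between 0 and surface_conc")
    if chloride_concentration(cover_m, t_max_s, surface_conc,
                              diffusion_coeff) < critical_conc:
        raise ValueError("threshold not reached within t_max_s")
    lo, hi = 0.0, t_max_s
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chloride_concentration(cover_m, mid, surface_conc,
                                  diffusion_coeff) < critical_conc:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    # Hypothetical values: 50 mm cover, D = 1e-12 m^2/s,
    # surface 0.6 and threshold 0.05 (same concentration units).
    t = initiation_time(0.05, 0.6, 0.05, 1e-12)
    print(f"initiation time: {t / (365.25 * 24 * 3600):.1f} years")
```

    Because the solution depends on depth only through x / sqrt(D * t), the predicted initiation time scales with the square of the cover depth under these assumptions.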

  20. Dense time discretization technique for verification of real time systems

    International Nuclear Information System (INIS)

    Makackas, Dalius; Miseviciene, Regina

    2016-01-01

    When verifying a real-time system, there are two different models for handling time: discrete and dense time-based models. This paper presents a novel verification technique, which calculates discrete time intervals from dense time in order to create all the system states that can be reached from the initial system state. The technique is designed for real-time systems specified by a piece-linear aggregate approach. Key words: real-time system, dense time, verification, model checking, piece-linear aggregate

  1. Grounding the meanings in sensorimotor behavior using reinforcement learning

    Directory of Open Access Journals (Sweden)

    Igor Farkaš

    2012-02-01

    The recent outburst of interest in cognitive developmental robotics is fueled by the ambition to propose ecologically plausible mechanisms of how, among other things, a learning agent/robot could ground linguistic meanings in its sensorimotor behaviour. Along this stream, we propose a model that allows the simulated iCub robot to learn the meanings of actions (point, touch and push) oriented towards objects in the robot's peripersonal space. In our experiments, the iCub learns to execute motor actions and comment on them. Architecturally, the model is composed of three neural-network-based modules that are trained in different ways. The first module, a two-layer perceptron, is trained by back-propagation to attend to the target position in the visual scene, given the low-level visual information and the feature-based target information. The second module, having the form of an actor-critic architecture, is the most distinguishing part of our model, and is trained by a continuous version of reinforcement learning to execute actions as sequences, based on a linguistic command. The third module, an echo-state network, is trained to provide the linguistic description of the executed actions. The trained model generalises well in case of novel action-target combinations with randomised initial arm positions. It can also promptly adapt its behavior if the action/target suddenly changes during motor execution.

  2. Continuous theta-burst stimulation (cTBS) over the lateral prefrontal cortex alters reinforcement learning bias.

    Science.gov (United States)

    Ott, Derek V M; Ullsperger, Markus; Jocham, Gerhard; Neumann, Jane; Klein, Tilmann A

    2011-07-15

    The prefrontal cortex is known to play a key role in higher-order cognitive functions. Recently, we showed that this brain region is active in reinforcement learning, during which subjects constantly have to integrate trial outcomes in order to optimize performance. To further elucidate the role of the dorsolateral prefrontal cortex (DLPFC) in reinforcement learning, we applied continuous theta-burst stimulation (cTBS) either to the left or right DLPFC, or to the vertex as a control region, respectively, prior to the performance of a probabilistic learning task in an fMRI environment. While there was no influence of cTBS on learning performance per se, we observed a stimulation-dependent modulation of reward vs. punishment sensitivity: Left-hemispherical DLPFC stimulation led to a more reward-guided performance, while right-hemispherical cTBS induced a more avoidance-guided behavior. FMRI results showed enhanced prediction error coding in the ventral striatum in subjects stimulated over the left as compared to the right DLPFC. Both behavioral and imaging results are in line with recent findings that left, but not right-hemispherical stimulation can trigger a release of dopamine in the ventral striatum, which has been suggested to increase the relative impact of rewards rather than punishment on behavior. Copyright © 2011 Elsevier Inc. All rights reserved.

  3. Mixed-mode Operating System for Real-time Performance

    Directory of Open Access Journals (Sweden)

    Hasan M. M.

    2017-11-01

    The purpose of the mixed-mode system research is to handle devices with the accuracy of real-time systems while at the same time having all the benefits and facilities of a mature Graphical User Interface (GUI) operating system, which is typically non-real-time. This mixed-mode operating system, comprising a real-time portion and a non-real-time portion, was studied and implemented to identify its feasibility and performance in practical applications (in the context of scheduled real-time events). In this research, i8751 microcontroller-based hardware was used to measure the performance of the system in real-time-only as well as non-real-time-only configurations. The real-time portion is a 486DX-40 IBM PC system running a DOS-based real-time kernel, and the non-real-time portion is a Pentium III-based system running Windows NT. It was found that mixed-mode systems performed as well as a typical real-time system and, in fact, gave many additional benefits such as simplified/modular programming and load tolerance.

  4. Critical factors in the empirical performance of temporal difference and evolutionary methods for reinforcement learning

    NARCIS (Netherlands)

    Whiteson, S.; Taylor, M.E.; Stone, P.

    2010-01-01

    Temporal difference and evolutionary methods are two of the most common approaches to solving reinforcement learning problems. However, there is little consensus on their relative merits and there have been few empirical studies that directly compare their performance. This article aims to address

  5. REINFORCEMENT OF DRINKING BY RUNNING: EFFECT OF FIXED RATIO AND REINFORCEMENT TIME.

    Science.gov (United States)

    PREMACK, D; SCHAEFFER, R W; HUNDT, A

    1964-01-01

    Rats were required to complete varying numbers of licks (FR), ranging from 10 to 300, in order to free an activity wheel for predetermined times (CT) ranging from 2 to 20 sec. The reinforcement of drinking by running was shown both by an increased frequency of licking, and by changes in length of the burst of licking relative to operant-level burst length. In log-log coordinates, instrumental licking tended to be a linear increasing function of FR for the range tested, a linear decreasing function of CT for the range tested. Pause time was implicated in both of the above relations, being a generally increasing function of both FR and CT.

  6. Real-Time Facial Segmentation and Performance Capture from RGB Input

    OpenAIRE

    Saito, Shunsuke; Li, Tianye; Li, Hao

    2016-01-01

    We introduce the concept of unconstrained real-time 3D facial performance capture through explicit semantic segmentation in the RGB input. To ensure robustness, cutting edge supervised learning approaches rely on large training datasets of face images captured in the wild. While impressive tracking quality has been demonstrated for faces that are largely visible, any occlusion due to hair, accessories, or hand-to-face gestures would result in significant visual artifacts and loss of tracking ...

  7. Multiagent-Based Simulation of Temporal-Spatial Characteristics of Activity-Travel Patterns Using Interactive Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Min Yang

    2014-01-01

    We propose a multiagent-based reinforcement learning algorithm, in which the interactions between travelers and the environment are considered to simulate temporal-spatial characteristics of activity-travel patterns in a city. Road congestion degree is added to the reinforcement learning algorithm as a medium that passes the influence of one traveler’s decision to others. Meanwhile, the agents used in the algorithm are initialized from typical activity patterns extracted from the travel survey diary data of Shangyu city in China. In the simulation, both macroscopic activity-travel characteristics such as traffic flow spatial-temporal distribution and microscopic characteristics such as activity-travel schedules of each agent are obtained. Comparing the simulation results with the survey data, we find that deviation of the peak-hour traffic flow is less than 5%, while the correlation of the simulated versus survey location choice distribution is over 0.9.
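
    The idea of congestion acting as the medium through which one traveler's decision affects the others can be illustrated with a toy example. The sketch below is not the authors' algorithm: it swaps in plain stateless Q-learning with epsilon-greedy exploration, and the route count, reward definition and all parameter values are assumptions made only for illustration. Each agent picks a route, and its reward is the negative share of agents that chose the same route, so every agent's learning signal depends on everyone else's choices.

```python
import random


def simulate(num_agents=20, num_routes=2, episodes=2000,
             alpha=0.1, epsilon=0.1, seed=0):
    """Toy multi-agent route choice with congestion-coupled rewards.

    Returns the per-agent Q-tables and the route loads of the last episode.
    """
    rng = random.Random(seed)
    # One stateless Q-table per agent: estimated reward of each route.
    q = [[0.0] * num_routes for _ in range(num_agents)]
    load = [0] * num_routes
    for _ in range(episodes):
        choices = []
        for agent_q in q:
            if rng.random() < epsilon:
                # Explore: pick a route uniformly at random.
                choices.append(rng.randrange(num_routes))
            else:
                # Exploit: pick a best route, breaking ties at random.
                best = max(agent_q)
                ties = [r for r, v in enumerate(agent_q) if v == best]
                choices.append(rng.choice(ties))
        # Congestion is the shared medium: it aggregates all decisions.
        load = [choices.count(r) for r in range(num_routes)]
        for agent_q, route in zip(q, choices):
            reward = -load[route] / num_agents
            agent_q[route] += alpha * (reward - agent_q[route])
    return q, load


if __name__ == "__main__":
    _, final_load = simulate()
    print("agents per route in the last episode:", final_load)
```

    A fuller model in the spirit of the abstract would add states such as time of day and activity type, and would initialize the agents from survey data rather than from zeros.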

  8. Excellence in Physics Education Award Talk: Curriculum Development for Active Learning using Real Time Graphing and Data Collection Tools

    Science.gov (United States)

    Laws, Priscilla

    2010-02-01

    In June 1986 Ronald Thornton (at the Tufts University Center for Science and Mathematics Teaching) and Priscilla Laws (at Dickinson College) applied independently for grants to develop curricular materials based on both the outcomes of Physics Education Research and the use of Microcomputer Based Laboratory Tools (MBL) developed by Robert Tinker, Ron Thornton and others at Technical Education Research Centers (TERC). Thornton proposed to develop a series of Tools for Scientific Thinking (TST) laboratory exercises to address known learning difficulties using carefully sequenced MBL observations. These TST laboratories were to be beta tested at several types of institutions. Laws proposed to develop a Workshop Physics Activity Guide for a 2 semester calculus-based introductory course sequence centering on MBL-based guided inquiry. Workshop Physics was to be designed to replace traditional lectures and separate labs in relatively small classes and was to be tested at Dickinson College. In September 1986 a project officer at the Fund for Post-Secondary Education (FIPSE) awarded grants to Laws and Thornton provided that they would collaborate. David Sokoloff (at the University of Oregon) joined Thornton to develop and test the TST laboratories. This talk will describe the 23 year collaboration between Thornton, Laws, and Sokoloff that led to the development of a suite of Activity Based Physics curricular materials, new apparatus and enhanced computer tools for real time graphing, data collection and mathematical modeling. The Suite includes TST Labs, the Workshop Physics Activity Guide, RealTime Physics Laboratory Modules, and a series of Interactive Lecture Demonstrations. A textbook and a guide to using the Suite were also developed. The vital importance of obtaining continued grant support, doing continuous research on student learning, collaborating with instructors at other institutions, and forging relationships with vendors and publishers will be described.

  9. Research in Distributed Real-Time Systems

    Science.gov (United States)

    Mukkamala, R.

    1997-01-01

    This document summarizes the progress we have made on our study of issues concerning the schedulability of real-time systems. Our study has produced several results in the scalability issues of distributed real-time systems. In particular, we have used our techniques to resolve schedulability issues in distributed systems with end-to-end requirements. During the next year (1997-98), we propose to extend the current work to address the modeling and workload characterization issues in distributed real-time systems. In particular, we propose to investigate the effect of different workload models and component models on the design and the subsequent performance of distributed real-time systems.

  10. FRB microstructure revealed by the real-time detection of FRB170827

    OpenAIRE

    Farah, W.; Flynn, C.; Bailes, M.; Jameson, A.; Bannister, K. W.; Barr, E. D.; Bateman, T.; Bhandari, S.; Caleb, M.; Campbell-Wilson, D.; Chang, S. -W.; Deller, A.; Green, A. J.; Hunstead, R.; Jankowski, F.

    2018-01-01

    We report a new Fast Radio Burst (FRB) discovered in real-time as part of the UTMOST project at the Molonglo Observatory Synthesis Radio Telescope (MOST). FRB170827 is the first detected with our low-latency ($< 24$ s), machine-learning-based FRB detection system. The FRB discovery was accompanied by the capture of voltage data at the native time and frequency resolution of the observing system, enabling coherent dedispersion and detailed off-line analysis, which have unveiled fine temporal a...

  11. Real-time data access layer for MDSplus

    International Nuclear Information System (INIS)

    Manduchi, G.; Luchetta, A.; Taliercio, C.; Fredian, T.; Stillerman, J.

    2008-01-01

    Recent extensions to MDSplus allow data handling in long discharges and provide a real-time data access and communication layer. The real-time data access layer is an additional component of MDSplus: it is possible to use the traditional MDSplus API during normal operation, and to select a subset of data items to be used in real time. Real-time notification is provided by a communication layer using a publish-subscribe pattern. The notification covers processes sharing the same data items even running on different machines, thus allowing the implementation of distributed control systems. The real-time data access layer has been developed for Windows, Linux, and VxWorks; it is currently being ported to Linux RTAI. In order to quantify the fingerprint of the presented system, the performance of the real-time access layer approach is compared with that of an ad hoc, manually optimized program in a sample real-time application
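    The publish-subscribe notification described in this abstract can be illustrated with a minimal in-process sketch. This is not the MDSplus API: the class `DataItemBus`, its methods, and the item name `plasma_current` are hypothetical names chosen for illustration, and the sketch ignores networking and real-time scheduling entirely.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Callback = Callable[[str, Any], None]


class DataItemBus:
    """Hypothetical in-process bus: subscribers register interest in named data items."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, item: str, callback: Callback) -> None:
        # Register a callback to be notified whenever `item` is published.
        self._subscribers[item].append(callback)

    def publish(self, item: str, value: Any) -> int:
        # Notify every subscriber of `item`; return how many were notified.
        callbacks = self._subscribers.get(item, [])
        for callback in callbacks:
            callback(item, value)
        return len(callbacks)


bus = DataItemBus()
received = []
bus.subscribe("plasma_current", lambda name, value: received.append((name, value)))
delivered = bus.publish("plasma_current", 1.2e6)
```

    Only the selected subset of data items needs subscribers; items that nobody subscribes to are published at no notification cost.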

  12. A Real-Time Systems Symposium Preprint.

    Science.gov (United States)

    1983-09-01

    Real-Time Systems Symposium Preprint Interim Tech...estimate of the occurrence of the error. ...A REAL-TIME SYSTEMS SYMPOSIUM...ABSTRACT This technical report contains a preprint of a paper accepted for presentation at the REAL-TIME SYSTEMS SYMPOSIUM, Arlington,

  13. Benefits of real-time gas management

    International Nuclear Information System (INIS)

    Nolty, R.; Dolezalek, D. Jr.

    1994-01-01

    In today's competitive gas gathering, processing, storage and transportation business environment, the requirements to do business are continually changing. These changes arise from government regulations such as the amendments to the Clean Air Act concerning the environment and FERC Order 636 concerning business practices. Other changes are due to advances in technology such as electronic flow measurement (EFM) and real-time communications capabilities within the gas industry. Gas gathering, processing, storage and transportation companies must be flexible in adapting to these changes to remain competitive. These dynamic requirements can be met with an open, real-time gas management computer information system. Such a system provides flexible services with a variety of software applications. Allocations, nominations management and gas dispatching are examples of applications that are provided on a real-time basis. By providing real-time services, the gas management system enables operations personnel to make timely adjustments within the current accounting period. Benefits realized from implementing a real-time gas management system include reduced unaccountable gas, reduced imbalance penalties, reduced regulatory violations, improved facility operations and better service to customers. These benefits give a company the competitive edge. This article discusses the applications provided, the benefits from implementing a real-time gas management system, and the definition of such a system

  14. Simulation-based learning: Just like the real thing.

    Science.gov (United States)

    Lateef, Fatimah

    2010-10-01

    Simulation is a technique for practice and learning that can be applied to many different disciplines and trainees. It is a technique (not a technology) to replace and amplify real experiences with guided ones, often "immersive" in nature, that evoke or replicate substantial aspects of the real world in a fully interactive fashion. Simulation-based learning can be the way to develop health professionals' knowledge, skills, and attitudes, whilst protecting patients from unnecessary risks. Simulation-based medical education can be a platform which provides a valuable tool in learning to mitigate ethical tensions and resolve practical dilemmas. Simulation-based training techniques, tools, and strategies can be applied in designing structured learning experiences, as well as be used as a measurement tool linked to targeted teamwork competencies and learning objectives. It has been widely applied in fields such as aviation and the military. In medicine, simulation offers good scope for training of interdisciplinary medical teams. The realistic scenarios and equipment allow for retraining and practice until one can master the procedure or skill. An increasing number of health care institutions and medical schools are now turning to simulation-based learning. Teamwork training conducted in the simulated environment may offer an additive benefit to the traditional didactic instruction, enhance performance, and possibly also help reduce errors.

  15. Making real-time reactive systems reliable

    Science.gov (United States)

    Marzullo, Keith; Wood, Mark

    1990-01-01

    A reactive system is characterized by a control program that interacts with an environment (or controlled program). The control program monitors the environment and reacts to significant events by sending commands to the environment. This structure is quite general. Not only are most embedded real-time systems reactive systems, but so are monitoring and debugging systems and distributed application management systems. Since reactive systems are usually long running and may control physical equipment, fault tolerance is vital. The research tries to understand the principal issues of fault tolerance in real-time reactive systems and to build tools that allow a programmer to design reliable, real-time reactive systems. In order to make real-time reactive systems reliable, several issues must be addressed: (1) How can a control program be built to tolerate failures of sensors and actuators? To achieve this, a methodology was developed for transforming a control program that references physical values into one that tolerates sensors that can fail and can return inaccurate values. (2) How can the real-time reactive system be built to tolerate failures of the control program? Towards this goal, whether the techniques presented can be extended to real-time reactive systems is investigated. (3) How can the environment be specified in a way that is useful for writing a control program? Towards this goal, whether a system with real-time constraints can be expressed as an equivalent system without such constraints is also investigated.
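    One common way to tolerate sensors that fail or return inaccurate values is to treat each reading as an interval and look for the region on which enough sensors agree. The sketch below illustrates that general interval-intersection idea; the abstract does not describe the methodology in detail, so this should not be read as the authors' method. The function name `fuse_intervals` and the sample readings are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]


def fuse_intervals(readings: List[Interval], max_faulty: int) -> Optional[Interval]:
    """Return (lo, hi), the smallest and largest points covered by at least
    len(readings) - max_faulty closed intervals, or None if no point qualifies."""
    needed = len(readings) - max_faulty
    events = []
    for lo, hi in readings:
        events.append((lo, 0))  # 0 = interval starts (sorts before ends at equal values)
        events.append((hi, 1))  # 1 = interval ends
    events.sort()

    active = 0
    fused_lo = None
    fused_hi = None
    for value, kind in events:
        if kind == 0:
            active += 1
            if active >= needed and fused_lo is None:
                fused_lo = value
        else:
            if active >= needed:
                fused_hi = value  # keep the last end point that still had enough overlap
            active -= 1
    if fused_lo is None:
        return None
    return (fused_lo, fused_hi)


# Four hypothetical temperature sensors; the last one is clearly off.
readings = [(8.0, 12.0), (11.0, 13.0), (10.0, 12.0), (20.0, 21.0)]
fused = fuse_intervals(readings, max_faulty=1)
```

    With one tolerated fault, the outlier interval is ignored and the control program can act on the agreed range instead of on any single raw reading.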

  16. Teleducation : Linking Continents Across Time and Space Through Live, Real-Time Interactive Classes

    Science.gov (United States)

    Macko, S. A.; Szuba, T.; Swap, R.; Annegarn, H.; Marjanovic, B.; Vieira, F.; Brito, R.

    2005-12-01

    International education is a natural extension of global economies, global environmental concerns, and global science. While faculty and student exchanges between geographic areas permit for educational experiences and cultural exchanges for the privileged few, distance learning offers opportunities for educational exchanges under any circumstance where time, expense, or location otherwise inhibit offering or taking a particular course of study. However, there are severe pedagogical limitations to traditional Web-based courses that suffer from a lack of personalized, spontaneous exchange between instructor and student. The technology to establish a real time, interactive teleducation program exists, but to our knowledge is relatively untested in a science classroom situation, especially internationally over great distances. In a project to evaluate this type of linkage, we offered a real-time, interactive class at three separate universities, which communicated instantaneously across an ocean at a distance of greater than 8,000 miles and seven time zones. The course, 'Seminar on the Ecology of African Savannas', consisted of a series of 11 lectures originating in either Mozambique (University of Eduardo Mondlane), South Africa (University of the Witwatersrand) or the United States (University of Virginia). We combined ISDN, internet and satellite linkages to facilitate the lectures and real time discussions between instructors and approximately 200 university students in the three countries. Although numerous technical, logistical, and pedagogical issues - both expected and unexpected - arose throughout the pilot year, the project can be viewed as overwhelmingly successful and certainly serves as proof-of-concept for future initiatives, both internationally and locally. This review of our experience will help to prepare other students, faculty, and institutions interested in establishing or developing international education initiatives

  17. Space Weather and Real-Time Monitoring

    Directory of Open Access Journals (Sweden)

    S Watari

    2009-04-01

    Recent advances in information and communications technology make it possible to collect large amounts of ground-based and space-based observation data in real time. These real-time data enable nowcasting of space weather. This paper reports a history of space weather by the International Space Environment Service (ISES) in association with the International Geophysical Year (IGY) and the importance of real-time monitoring in space weather.

  18. Research Directions in Real-Time Systems.

    Science.gov (United States)

    1996-09-01

    This report summarizes a survey of published research in real-time systems. Material is presented that provides an overview of the topic, focusing on...communications protocols and scheduling techniques. It is noted that real-time systems deserve special attention separate from other areas because of...formal tools for design and analysis of real-time systems. The early work on applications as well as notable theoretical advances are summarized

  19. From Creatures of Habit to Goal-Directed Learners: Tracking the Developmental Emergence of Model-Based Reinforcement Learning.

    Science.gov (United States)

    Decker, Johannes H; Otto, A Ross; Daw, Nathaniel D; Hartley, Catherine A

    2016-06-01

    Theoretical models distinguish two decision-making strategies that have been formalized in reinforcement-learning theory. A model-based strategy leverages a cognitive model of potential actions and their consequences to make goal-directed choices, whereas a model-free strategy evaluates actions based solely on their reward history. Research in adults has begun to elucidate the psychological mechanisms and neural substrates underlying these learning processes and factors that influence their relative recruitment. However, the developmental trajectory of these evaluative strategies has not been well characterized. In this study, children, adolescents, and adults performed a sequential reinforcement-learning task that enabled estimation of model-based and model-free contributions to choice. Whereas a model-free strategy was apparent in choice behavior across all age groups, a model-based strategy was absent in children, became evident in adolescents, and strengthened in adults. These results suggest that recruitment of model-based valuation systems represents a critical cognitive component underlying the gradual maturation of goal-directed behavior. © The Author(s) 2016.

  20. Trip Travel Time Forecasting Based on Selective Forgetting Extreme Learning Machine

    Directory of Open Access Journals (Sweden)

    Zhiming Gui

    2014-01-01

    Travel time estimation on road networks is a valuable traffic metric. In this paper, we propose a machine learning based method for trip travel time estimation in road networks. The method uses historical trip information extracted from taxi trace data as the training data. An optimized online sequential extreme learning machine, the selective forgetting extreme learning machine, is adopted to make the prediction. Its selective forgetting learning ability enables the prediction algorithm to adapt well to changes in trip conditions. Experimental results using real-life taxi trace data show that the forecasting model provides an effective and practical way to forecast travel time.
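    The benefit of forgetting old samples in an online predictor can be sketched with a much simpler relative of the method named above: recursive least squares with an exponential forgetting factor on a one-feature linear model. This is not an extreme learning machine and not the paper's selective forgetting scheme, which the abstract does not specify; the class name, the forgetting factor of 0.9, and the synthetic minutes-per-kilometre data are all assumptions.

```python
class ForgettingEstimator:
    """Scalar recursive least squares for travel_time ~ theta * distance,
    with exponential forgetting so that recent trips dominate the estimate."""

    def __init__(self, forgetting: float = 0.9, initial_variance: float = 1000.0) -> None:
        self.forgetting = forgetting
        self.theta = 0.0           # current estimate (minutes per km)
        self.p = initial_variance  # uncertainty of the estimate

    def update(self, distance: float, travel_time: float) -> float:
        gain = self.p * distance / (self.forgetting + distance * self.p * distance)
        self.theta += gain * (travel_time - self.theta * distance)
        self.p = (self.p - gain * distance * self.p) / self.forgetting
        return self.theta

    def predict(self, distance: float) -> float:
        return self.theta * distance


estimator = ForgettingEstimator()

# Phase 1: free-flowing traffic, 2 minutes per km (synthetic, noise-free).
for i in range(50):
    distance = 1.0 + (i % 5)
    estimator.update(distance, 2.0 * distance)
theta_before = estimator.theta

# Phase 2: conditions change to 3 minutes per km; old samples fade out.
for i in range(50):
    distance = 1.0 + (i % 5)
    estimator.update(distance, 3.0 * distance)
theta_after = estimator.theta
```

    A forgetting factor closer to 1 remembers more history and adapts more slowly; a smaller value tracks changes faster but is more sensitive to noise.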

  1. Reinforcement Learning for Routing in Cognitive Radio Ad Hoc Networks

    Directory of Open Access Journals (Sweden)

    Hasan A. A. Al-Rawi

    2014-01-01

    Cognitive radio (CR) enables unlicensed users (or secondary users, SUs) to sense for and exploit underutilized licensed spectrum owned by the licensed users (or primary users, PUs). Reinforcement learning (RL) is an artificial intelligence approach that enables a node to observe, learn, and make appropriate decisions on action selection in order to maximize network performance. Routing enables a source node to search for a least-cost route to its destination node. While there have been increasing efforts to enhance the traditional RL approach for routing in wireless networks, this research area remains largely unexplored in the domain of routing in CR networks. This paper applies RL in routing and investigates the effects of various features of RL (i.e., reward function, exploitation, and exploration, as well as learning rate) through simulation. New approaches and recommendations are proposed to enhance the features in order to improve the network performance brought about by RL to routing. Simulation results show that the RL parameters of the reward function, exploitation, and exploration, as well as learning rate, must be well regulated, and the new approaches proposed in this paper improve SUs' network performance without significantly jeopardizing PUs' network performance, specifically SUs' interference to PUs.
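    The interplay of reward (here, link cost), exploration, exploitation, and learning rate in RL-based routing can be sketched with generic tabular Q-learning for next-hop selection. The abstract does not give the paper's reward function or update rule, so everything below is an assumption: the four-node topology, the link costs, and the parameter values are hypothetical, and the sketch does not model primary-user activity or interference.

```python
import random
from typing import Dict, List

Links = Dict[str, Dict[str, float]]


def train_q_routing(links: Links, destination: str, episodes: int = 500,
                    alpha: float = 0.5, epsilon: float = 0.2, seed: int = 0) -> Links:
    """Learn q[node][next_hop], an estimate of the cost to reach `destination`."""
    rng = random.Random(seed)
    q = {node: {nbr: 0.0 for nbr in nbrs} for node, nbrs in links.items()}
    sources = [node for node in links if node != destination]
    for _ in range(episodes):
        node = rng.choice(sources)
        while node != destination:
            neighbours = list(q[node])
            if rng.random() < epsilon:
                next_hop = rng.choice(neighbours)           # exploration
            else:
                next_hop = min(neighbours, key=q[node].get)  # exploitation
            remaining = 0.0 if next_hop == destination else min(q[next_hop].values())
            target = links[node][next_hop] + remaining
            q[node][next_hop] += alpha * (target - q[node][next_hop])  # learning rate alpha
            node = next_hop
    return q


def greedy_route(q: Links, source: str, destination: str) -> List[str]:
    """Follow the lowest estimated cost next hop until the destination is reached."""
    route = [source]
    while route[-1] != destination:
        node = route[-1]
        route.append(min(q[node], key=q[node].get))
    return route


# Hypothetical acyclic topology: S -> A -> B -> D costs 3, S -> B -> D costs 5, S -> A -> D costs 6.
links = {
    "S": {"A": 1.0, "B": 4.0},
    "A": {"D": 5.0, "B": 1.0},
    "B": {"D": 1.0},
    "D": {},
}
q_table = train_q_routing(links, destination="D")
route = greedy_route(q_table, "S", "D")
```

    Setting `epsilon` to zero removes exploration and can leave some links untried, while a very small `alpha` slows convergence; this is the kind of parameter regulation the abstract refers to.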

  2. Space Objects Maneuvering Detection and Prediction via Inverse Reinforcement Learning

    Science.gov (United States)

    Linares, R.; Furfaro, R.

    This paper determines the behavior of Space Objects (SOs) using inverse Reinforcement Learning (RL) to estimate the reward function that each SO is using for control. The approach discussed in this work can be used to analyze maneuvering of SOs from observational data. The inverse RL problem is solved using the Feature Matching approach. This approach determines the optimal reward function that an SO is using while maneuvering by assuming that the observed trajectories are optimal with respect to the SO's own reward function. This paper uses estimated orbital elements data to determine the behavior of SOs in a data-driven fashion.
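    In feature-matching inverse RL, the observed behavior is typically summarized by its discounted feature expectations, which a candidate reward (assumed linear in the features) must then reproduce. The sketch below computes only that empirical quantity. The scalar states and the two-component feature map are hypothetical placeholders, not the orbital-element features used in the paper.

```python
from typing import Callable, List, Sequence


def feature_expectations(trajectories: Sequence[Sequence[float]],
                         features: Callable[[float], List[float]],
                         discount: float = 0.9) -> List[float]:
    """Average over trajectories of sum_t discount**t * features(state_t)."""
    dims = len(features(trajectories[0][0]))
    totals = [0.0] * dims
    for trajectory in trajectories:
        for t, state in enumerate(trajectory):
            weight = discount ** t
            for i, value in enumerate(features(state)):
                totals[i] += weight * value
    return [total / len(trajectories) for total in totals]


def example_features(state: float) -> List[float]:
    # Hypothetical feature map: the state itself and a constant bias term.
    return [state, 1.0]


observed = [[0.0, 1.0], [2.0]]
mu = feature_expectations(observed, example_features, discount=0.5)
```

    With a reward of the form w · features(state), the discounted return of the observed behavior is simply w · mu, which is why matching `mu` is enough to match performance under any such reward.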

  3. Real-Time MENTAT programming language and architecture

    Science.gov (United States)

    Grimshaw, Andrew S.; Silberman, Ami; Liu, Jane W. S.

    1989-01-01

    Real-time MENTAT, a programming environment designed to simplify the task of programming real-time applications in distributed and parallel environments, is described. It is based on the same data-driven computation model and object-oriented programming paradigm as MENTAT. It provides an easy-to-use mechanism to exploit parallelism, language constructs for the expression and enforcement of timing constraints, and run-time support for scheduling and executing real-time programs. The real-time MENTAT programming language is an extended C++. The extensions are added to facilitate automatic detection of data flow and generation of data flow graphs, to express the timing constraints of individual granules of computation, and to provide scheduling directives for the runtime system. A high-level view of the real-time MENTAT system architecture and programming language constructs is provided.

  4. Real Time Conference 2016 Overview

    Science.gov (United States)

    Luchetta, Adriano

    2017-06-01

    This is a special issue of the IEEE Transactions on Nuclear Science containing papers from the invited, oral, and poster presentation of the 20th Real Time Conference (RT2016). The conference was held June 6-10, 2016, at Centro Congressi Padova “A. Luciani,” Padova, Italy, and was organized by Consorzio RFX (CNR, ENEA, INFN, Università di Padova, Acciaierie Venete SpA) and the Istituto Nazionale di Fisica Nucleare. The Real Time Conference is multidisciplinary and focuses on the latest developments in real-time techniques in high-energy physics, nuclear physics, astrophysics and astroparticle physics, nuclear fusion, medical physics, space instrumentation, nuclear power instrumentation, general radiation instrumentation, and real-time security and safety. Taking place every second year, it is sponsored by the Computer Application in Nuclear and Plasma Sciences technical committee of the IEEE Nuclear and Plasma Sciences Society. RT2016 attracted more than 240 registrants, with a large proportion of young researchers and engineers. It had an attendance of 67 students from many countries.

  5. Run-time middleware to support real-time system scenarios

    NARCIS (Netherlands)

    Goossens, K.; Koedam, M.; Sinha, S.; Nelson, A.; Geilen, M.

    2015-01-01

    Systems on Chip (SOC) are powerful multiprocessor systems capable of running multiple independent applications, often with both real-time and non-real-time requirements. Scenarios exist at two levels: first, combinations of independent applications, and second, different states of a single

  6. Advanced real-time manipulation of video streams

    CERN Document Server

    Herling, Jan

    2014-01-01

    Diminished Reality is a new fascinating technology that removes real-world content from live video streams. This sensational live video manipulation actually removes real objects and generates a coherent video stream in real-time. Viewers cannot detect modified content. Existing approaches are restricted to moving objects and static or almost static cameras and do not allow real-time manipulation of video content. Jan Herling presents a new and innovative approach for real-time object removal with arbitrary camera movements.

  7. Real Science, Real Learning: Bridging the Gap Between Scientists, Educators and Students

    Science.gov (United States)

    Lewis, Y.

    2006-05-01

    Today as never before, America needs its citizens to be literate in science and technology. Not only must we inspire a new generation of scientists, engineers, and technologists, we must foster a society capable of meeting complex, 21st-century challenges. Unfortunately, the need for creative, flexible thinkers is growing at a time when our young students are lagging in science interest and performance. Over the past 17 years, the JASON Project has worked to link real science and scientists to the classroom. This link provides a viable pipeline for creating the next generation of scientists and researchers. Ultimately, JASON's mission is to improve the way science is taught by enabling students to learn directly from leading scientists. Through partnerships with agencies such as NOAA and NASA, JASON creates multimedia classroom products based on current scientific research. Broadcasts of science expeditions, hosted by leading researchers, are coupled with classroom materials that include interactive computer-based simulations, video-on-demand, inquiry-based experiments and activities, and print materials for students and teachers. A "gated" Web site hosts online resources and provides a secure platform to network with scientists and other classrooms in a nationwide community of learners. Each curriculum is organized around a specific theme for a comprehensive learning experience. It may be taught as a complete package, or individual components can be selected to teach specific, standards-based concepts. Such thematic units include: Disappearing Wetlands, Mysteries of Earth and Mars, and Monster Storms. All JASON curriculum units are grounded in "inquiry-based learning." The highly interactive curriculum will enable students to access current, real-world scientific research and employ the scientific method through reflection, investigation, identification of problems, sharing of data, and forming and testing hypotheses. JASON specializes in effectively applying

  8. Towards the Future "Earthquake" School in the Cloud: Near-real Time Earthquake Games Competition in Taiwan

    Science.gov (United States)

    Chen, K. H.; Liang, W. T.; Wu, Y. F.; Yen, E.

    2014-12-01

    To prevent the future threats of natural disaster, it is important to understand how the disaster happened, why lives were lost, and what lessons have been learned. In this way, the attitude of society toward natural disaster can be transformed from training to learning. The citizen-seismologists-in-Taiwan project is designed to elevate the quality of earthquake science education by incorporating earthquake/tsunami stories and near-real time earthquake games competition into the traditional curricula in schools. Through pilot courses and professional development workshops, we have worked closely with teachers from elementary, junior high, and senior high schools to design workable teaching plans through practical operation of seismic monitoring at home or school. We will introduce how 9-year-olds do P- and S-wave picking and measure seismic intensity through an interactive learning platform, how scientists and school teachers work together, and how we create an environment that facilitates continuous learning (i.e., near-real time earthquake games competition) to make earthquake science fun.

  9. Continuous theta-burst stimulation (cTBS) over the lateral prefrontal cortex alters reinforcement learning bias

    NARCIS (Netherlands)

    Ott, D.V.M.; Ullsperger, M.; Jocham, G.; Neumann, J.; Klein, T.A.

    2011-01-01

    The prefrontal cortex is known to play a key role in higher-order cognitive functions. Recently, we showed that this brain region is active in reinforcement learning, during which subjects constantly have to integrate trial outcomes in order to optimize performance. To further elucidate the role of

  10. How clerkship students learn from real patients in practice settings

    NARCIS (Netherlands)

    Steven, Kathryn; Wenger, Etienne; Boshuizen, Els; Scherpbier, Albert; Dornan, Tim

    2018-01-01

    Purpose To explore how undergraduate medical students learn from real patients in practice settings, the factors that affect their learning, and how clerkship learning might be enhanced. Method In 2009, 22 medical students in the three clerkship years of an undergraduate medical program in the

  11. Architecture of distributed real-time systems

    OpenAIRE

    Wing Leung, Cheuk

    2013-01-01

    The CRAFTERS (Constraint and Application Driven Framework for Tailoring Embedded Real-time System) project aims to address the problem of uncertainty and heterogeneity in a distributed system by providing seamless, portable connectivity and middleware. This thesis contributes to the project by investigating the techniques that can be used in a distributed real-time embedded system. The conclusion is that there is a list of specifications to be met in order to provide a transparent and real-time...

  12. Hand rim wheelchair propulsion training using biomechanical real-time visual feedback based on motor learning theory principles.

    Science.gov (United States)

    Rice, Ian; Gagnon, Dany; Gallagher, Jere; Boninger, Michael

    2010-01-01

    As considerable progress has been made in laboratory-based assessment of manual wheelchair propulsion biomechanics, the necessity to translate this knowledge into new clinical tools and treatment programs becomes imperative. The objective of this study was to describe the development of a manual wheelchair propulsion training program aimed to promote the development of an efficient propulsion technique among long-term manual wheelchair users. Motor learning theory principles were applied to the design of biomechanical feedback-based learning software, which allows for random discontinuous real-time visual presentation of key spatiotemporal and kinetic parameters. This software was used to train a long-term wheelchair user on a dynamometer during 3 low-intensity wheelchair propulsion training sessions over a 3-week period. Biomechanical measures were recorded with a SmartWheel during over ground propulsion on a 50-m level tile surface at baseline and 3 months after baseline. Training software was refined and administered to a participant who was able to improve his propulsion technique by increasing contact angle while simultaneously reducing stroke cadence, mean resultant force, peak and mean moment out of plane, and peak rate of rise of force applied to the pushrim after training. The proposed propulsion training protocol may lead to favorable changes in manual wheelchair propulsion technique. These changes could limit or prevent upper limb injuries among manual wheelchair users. In addition, many of the motor learning theory-based techniques examined in this study could be applied to training individuals in various stages of rehabilitation to optimize propulsion early on.

  13. Simulation-based learning: Just like the real thing

    Directory of Open Access Journals (Sweden)

    Lateef Fatimah

    2010-01-01

    Simulation is a technique for practice and learning that can be applied to many different disciplines and trainees. It is a technique (not a technology) to replace and amplify real experiences with guided ones, often "immersive" in nature, that evoke or replicate substantial aspects of the real world in a fully interactive fashion. Simulation-based learning can be the way to develop health professionals' knowledge, skills, and attitudes, whilst protecting patients from unnecessary risks. Simulation-based medical education can be a platform which provides a valuable tool in learning to mitigate ethical tensions and resolve practical dilemmas. Simulation-based training techniques, tools, and strategies can be applied in designing structured learning experiences, as well as be used as a measurement tool linked to targeted teamwork competencies and learning objectives. It has been widely applied in fields such as aviation and the military. In medicine, simulation offers good scope for training of interdisciplinary medical teams. The realistic scenarios and equipment allow for retraining and practice until one can master the procedure or skill. An increasing number of health care institutions and medical schools are now turning to simulation-based learning. Teamwork training conducted in the simulated environment may offer an additive benefit to the traditional didactic instruction, enhance performance, and possibly also help reduce errors.

  14. The real-time price elasticity of electricity

    NARCIS (Netherlands)

    Lijesen, M.G.

    2007-01-01

    The real-time price elasticity of electricity contains important information on the demand response of consumers to the volatility of peak prices. Despite the importance, empirical estimates of the real-time elasticity are hardly available. This paper provides a quantification of the real-time

  15. A Kinect-Based Framework For Better User Experience in Real-Time Audiovisual Content Manipulation

    DEFF Research Database (Denmark)

    Potetsianakis, Emmanouil; Ksylakis, Emmanouil; Triantafyllidis, Georgios

    2014-01-01

    Applications for real-time multimedia content production, because of their delay-sensitive nature, require fast and precise control by the user. This is commonly achieved by specialized physical controllers that are application-specific with steep learning curves. In our work, we propose using the...

  16. Implementing Run-Time Evaluation of Distributed Timing Constraints in a Real-Time Environment

    DEFF Research Database (Denmark)

    Kristensen, C. H.; Drejer, N.

    1994-01-01

    In this paper we describe a solution to the problem of implementing run-time evaluation of timing constraints in distributed real-time environments...

  17. Borehole images while drilling : real-time dip picking in the foothills

    Energy Technology Data Exchange (ETDEWEB)

    Dexter, D. [Schlumberger Canada Ltd., Calgary, AB (Canada); Brezsnyak, F. [Talisman Energy Inc., Calgary, AB (Canada); Roth, J. [Talisman Energy Inc., Calgary, AB (Canada)

    2008-07-01

    The Alberta Foothills drilling environment is a structurally complex thrust belt with slow, costly drilling and frequent plan changes after logging. The cross sections are not always accurate due to poor resolution. Therefore, the placement of the wellbore is crucial to success. This presentation showed borehole images from drilling in the Foothills. Topics that were addressed included the Foothills drilling environment; target selection; current well placement methods; and current well performance. Borehole images included resistivity images and density images. The presentation addressed why real-time images should be run. The reasons include the ability to pick dips in real time; better well placement from structural information in real time; greater ease in finding and staying in producing areas; reduced non-productive time and probability of sidetracks; and elimination of pipe-conveyed logs. Applications in the Alberta Foothills such as the commercial run for GVR4 were also offered. Among the operational issues and lessons learned, it was determined that the reservoir thickness to measurement point distance ratio is too great to avoid exiting the sweet spot and that survey calculation errors cause image offset. It was concluded that GVR is a driller's tool for well placement.

  18. Towards understanding and managing the learning process in mail sorting.

    Science.gov (United States)

    Berglund, M; Karltun, A

    2012-01-01

    This paper was based on case study research at the Swedish Mail Service Division and it addresses learning time to sort mail at new districts and means to support the learning process on an individual as well as organizational level. The study population consisted of 46 postmen and one team leader in the Swedish Mail Service Division. Data were collected through measurements of time for mail sorting, interviews and a focus group. The study showed that learning to sort mail was a much more complex process and took more time than expected by management. Means to support the learning process included clarification of the relationship between sorting and the topology of the district, a good work environment, increased support from colleagues and management, and a thorough introduction for new postmen. The identified means to support the learning process require an integration of human, technological and organizational aspects. The study further showed that increased operations flexibility cannot be reinforced without a systems perspective and thorough knowledge about real work activities and that ergonomists can aid businesses to acquire this knowledge.

  19. REAL TIME SYSTEM OPERATIONS 2006-2007

    Energy Technology Data Exchange (ETDEWEB)

    Eto, Joseph H.; Parashar, Manu; Lewis, Nancy Jo

    2008-08-15

    The Real Time System Operations (RTSO) 2006-2007 project focused on two parallel technical tasks: (1) Real-Time Applications of Phasors for Monitoring, Alarming and Control; and (2) Real-Time Voltage Security Assessment (RTVSA) Prototype Tool. The overall goal of the phasor applications project was to accelerate adoption and foster greater use of new, more accurate, time-synchronized phasor measurements by conducting research and prototyping applications on California ISO's phasor platform - Real-Time Dynamics Monitoring System (RTDMS) - that provide previously unavailable information on the dynamic stability of the grid. Feasibility assessment studies were conducted on potential application of this technology for small-signal stability monitoring, validating/improving existing stability nomograms, conducting frequency response analysis, and obtaining real-time sensitivity information on key metrics to assess grid stress. Based on study findings, prototype applications for real-time visualization and alarming, small-signal stability monitoring, measurement based sensitivity analysis and frequency response assessment were developed, factory- and field-tested at the California ISO and at BPA. The goal of the RTVSA project was to provide California ISO with a prototype voltage security assessment tool that runs in real time within California ISO's new reliability and congestion management system. CERTS conducted a technical assessment of appropriate algorithms, developed a prototype incorporating state-of-the-art algorithms (such as the continuation power flow, direct method, boundary orbiting method, and hyperplanes) into a framework most suitable for an operations environment. Based on study findings, a functional specification was prepared, which the California ISO has since used to procure a production-quality tool that is now a part of a suite of advanced computational tools that is used by California ISO for reliability and congestion management.

  20. A study of real-time content marketing : formulating real-time content marketing based on content, search and social media

    OpenAIRE

    Nguyen, Thi Kim Duyen

    2015-01-01

    The primary objective of this research is to understand profoundly the new concept of content marketing – real-time content marketing on the aspect of the digital marketing experts. Particularly, the research will focus on the real-time content marketing theories and how to build real-time content marketing strategy based on content, search and social media. It also finds out how marketers measure and keep track of conversion rates of their real-time content marketing plan. Practically, th...

  1. Application of XML in real-time data warehouse

    Science.gov (United States)

    Zhao, Yanhong; Wang, Beizhan; Liu, Lizhao; Ye, Su

    2009-07-01

    At present, XML is one of the most widely used technologies for describing and exchanging data, and the need for real-time data has made the real-time data warehouse a popular area of data warehouse research. What can be gained by applying XML technology to real-time data warehouse research? XML technology solves many technical problems that cannot be addressed in a traditional real-time data warehouse, and it integrates the OLAP (online analytical processing) and OLTP (online transaction processing) environments. The real-time data warehouse can then truly be called "real time".

  2. Optimal medication dosing from suboptimal clinical examples: a deep reinforcement learning approach.

    Science.gov (United States)

    Nemati, Shamim; Ghassemi, Mohammad M; Clifford, Gari D

    2016-08-01

    Misdosing medications with sensitive therapeutic windows, such as heparin, can place patients at unnecessary risk, increase length of hospital stay, and lead to wasted hospital resources. In this work, we present a clinician-in-the-loop sequential decision making framework, which provides an individualized dosing policy adapted to each patient's evolving clinical phenotype. We employed retrospective data from the publicly available MIMIC II intensive care unit database, and developed a deep reinforcement learning algorithm that learns an optimal heparin dosing policy from sample dosing trials and their associated outcomes in large electronic medical records. Using separate training and testing datasets, our model was observed to be effective in proposing heparin doses that resulted in better expected outcomes than the clinical guidelines. Our results demonstrate that a sequential modeling approach, learned from retrospective data, could potentially be used at the bedside to derive individualized patient dosing policies.

  3. Mixed - mode Operating System for Real - time Performance

    OpenAIRE

    Hasan M. M.; Sultana S.; Foo C.K.

    2017-01-01

    The purpose of the mixed-mode system research is to handle devices with the accuracy of real-time systems and at the same time, having all the benefits and facilities of a matured Graphic User Interface (GUI) operating system which is typically non-real-time. This mixed-mode operating system comprising of a real-time portion and a non-real-time portion was studied and implemented to identify the feasibilities and performances in practical applications (in the context of scheduled the real-time e...

  4. Comparing L2 Word Learning through a Tablet or Real Objects: What Benefits Learning Most?

    NARCIS (Netherlands)

    Vlaar, M.A.J.; Verhagen, J.; Oudgenoeg-Paz, O.; Leseman, P.P.M.

    2017-01-01

    In child-robot interactions focused on language learning, tablets are often used to structure the interaction between the robot and the child. However, it is not clear how tablets affect children’s learning gains. Real-life objects are thought to benefit children’s word learning, but it is not clear

  5. Continuous-time on-policy neural reinforcement learning of working memory tasks

    NARCIS (Netherlands)

    D. Zambrano (Davide); P.R. Roelfsema; S.M. Bohte (Sander)

    2015-01-01

    As living organisms, one of our primary characteristics is the ability to rapidly process and react to unknown and unexpected events. To this end, we are able to recognize an event or a sequence of events and learn to respond properly. Despite advances in machine learning, current

  6. Mixed-mode Operating System for Real-time Performance

    Directory of Open Access Journals (Sweden)

    M.M. Hasan

    2017-11-01

    The purpose of the mixed-mode system research is to handle devices with the accuracy of real-time systems while retaining all the benefits and facilities of a mature Graphic User Interface (GUI) operating system, which is typically non-real-time. This mixed-mode operating system, comprising a real-time portion and a non-real-time portion, was studied and implemented to identify its feasibility and performance in practical applications (in the context of scheduling the real-time events). In this research, i8751 microcontroller-based hardware was used to measure the performance of the system in real-time-only as well as non-real-time-only configurations. The real-time portion is a 486DX-40 IBM PC system running under a DOS-based real-time kernel and the non-real-time portion is a Pentium III based system running under Windows NT. It was found that mixed-mode systems performed as well as a typical real-time system and, in fact, gave many additional benefits such as simplified/modular programming and load tolerance.

  7. Probability Learning: Changes in Behavior across Time and Development

    Science.gov (United States)

    Plate, Rista C.; Fulvio, Jacqueline M.; Shutts, Kristin; Green, C. Shawn; Pollak, Seth D.

    2018-01-01

    Individuals track probabilities, such as associations between events in their environments, but less is known about the degree to which experience--within a learning session and over development--influences people's use of incoming probabilistic information to guide behavior in real time. In two experiments, children (4-11 years) and adults…

  8. Continuous carbon nanotube reinforced composites.

    Science.gov (United States)

    Ci, L; Suhr, J; Pushparaj, V; Zhang, X; Ajayan, P M

    2008-09-01

    Carbon nanotubes are considered short fibers, and polymer composites with nanotube fillers are always analogues of random, short fiber composites. The real structural carbon fiber composites, on the other hand, always contain carbon fiber reinforcements where fibers run continuously through the composite matrix. With the recent optimization in aligned nanotube growth, samples of nanotubes in macroscopic lengths have become available, and this allows the creation of composites that are similar to the continuous fiber composites with individual nanotubes running continuously through the composite body. This allows the proper utilization of the extreme high modulus and strength predicted for nanotubes in structural composites. Here, we fabricate such continuous nanotube polymer composites with continuous nanotube reinforcements and report that under compressive loadings, the nanotube composites can generate more than an order of magnitude improvement in the longitudinal modulus (up to 3,300%) as well as damping capability (up to 2,100%). It is also observed that composites with a random distribution of nanotubes of same length and similar filler fraction provide three times less effective reinforcement in composites.

  9. A reinforcement learning model of joy, distress, hope and fear

    Science.gov (United States)

    Broekens, Joost; Jacobs, Elmer; Jonker, Catholijn M.

    2015-07-01

    In this paper we computationally study the relation between adaptive behaviour and emotion. Using the reinforcement learning framework, we propose that learned state utility, ?, models fear (negative) and hope (positive) based on the fact that both signals are about anticipation of loss or gain. Further, we propose that joy/distress is a signal similar to the error signal. We present agent-based simulation experiments that show that this model replicates psychological and behavioural dynamics of emotion. This work distinguishes itself by assessing the dynamics of emotion in an adaptive agent framework - coupling it to the literature on habituation, development, extinction and hope theory. Our results support the idea that the function of emotion is to provide a complex feedback signal for an organism to adapt its behaviour. Our work is relevant for understanding the relation between emotion and adaptation in animals, as well as for human-robot interaction, in particular how emotional signals can be used to communicate between adaptive agents and humans.
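
    The mapping described in the abstract above can be sketched with tabular TD(0) learning. This is a minimal illustration, not the authors' model: the random-walk chain, the reward values, and the names `td0_chain` and `label` are assumptions made for this example, and reading the sign of the learned state value as hope/fear and the sign of the temporal-difference error as joy/distress is a deliberate simplification of the proposal.

```python
import random


def td0_chain(n_states=5, episodes=200, alpha=0.1, gamma=0.9, seed=0):
    """Tabular TD(0) on a hypothetical random-walk chain.

    Stepping off the right end gives reward +1, off the left end -1.
    Returns the learned state values and a trace of
    (state, td_error, updated_value) tuples.
    """
    rng = random.Random(seed)
    values = [0.0] * n_states
    trace = []
    for _ in range(episodes):
        s = n_states // 2  # every episode starts in the middle state
        while True:
            s_next = s + rng.choice((-1, 1))
            if s_next < 0:
                reward, v_next, done = -1.0, 0.0, True
            elif s_next >= n_states:
                reward, v_next, done = 1.0, 0.0, True
            else:
                reward, v_next, done = 0.0, values[s_next], False
            # Temporal-difference error: the "error signal" of the abstract.
            td_error = reward + gamma * v_next - values[s]
            values[s] += alpha * td_error
            trace.append((s, td_error, values[s]))
            if done:
                break
            s = s_next
    return values, trace


def label(value, td_error):
    """Assumed sign-based reading: value -> hope/fear, TD error -> joy/distress."""
    anticipation = "hope" if value > 0 else "fear" if value < 0 else "neutral"
    outcome = "joy" if td_error > 0 else "distress" if td_error < 0 else "neutral"
    return anticipation, outcome


values, trace = td0_chain()
```

    In this toy setting, states near the rewarding end acquire positive values and states near the punishing end acquire negative ones, so `label(values[s], td_error)` gives a rough per-step emotional reading of the agent.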

  10. Reinforcement of drinking by running: effect of fixed ratio and reinforcement time

    Science.gov (United States)

    Premack, David; Schaeffer, Robert W.; Hundt, Alan

    1964-01-01

    Rats were required to complete varying numbers of licks (FR), ranging from 10 to 300, in order to free an activity wheel for predetermined times (CT) ranging from 2 to 20 sec. The reinforcement of drinking by running was shown both by an increased frequency of licking, and by changes in length of the burst of licking relative to operant-level burst length. In log-log coordinates, instrumental licking tended to be a linear increasing function of FR for the range tested, a linear decreasing function of CT for the range tested. Pause time was implicated in both of the above relations, being a generally increasing function of both FR and CT. PMID:14120150
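
    A straight line in log-log coordinates corresponds to a power law, y = c * x^b, where the slope of the line is the exponent b. The sketch below fits such a line by ordinary least squares on the logarithms. The function name `fit_power_law` and the numbers fed to it are made up for illustration; only the FR range (10 to 300) comes from the abstract, and none of the values are data from the study.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares line in log-log coordinates: log y = log c + b * log x.

    Returns (c, b) for the equivalent power law y = c * x**b.
    """
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx              # slope of the log-log line
    c = 10 ** (my - b * mx)    # intercept mapped back to linear units
    return c, b


# Synthetic values generated from a known power law (illustration only).
fr_values = [10, 30, 100, 300]
responses = [4.0 * fr ** 0.5 for fr in fr_values]
c, b = fit_power_law(fr_values, responses)
```

    A positive fitted slope corresponds to an increasing function (as reported for FR) and a negative slope to a decreasing one (as reported for CT).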

  11. Testing of real-time-software

    International Nuclear Information System (INIS)

    Friesland, G.; Ovenhausen, H.

    1975-05-01

    The situation in the area of testing real-time software is unsatisfactory. During the first phase of the project PROMOTE (prozessorientiertes Modul- und Gesamttestsystem), an analysis of the current situation took place, the results of which are summarized in the following study of user interviews and an analysis of the relevant literature. 22 users (industry, software houses, hardware manufacturers, and institutes) were interviewed. Discussions were held about the reliability of real-time software, with special interest in error avoidance, testing, and debugging. The main aims of the literature analysis were the elaboration of standard terms, a comparison of existing test methods and test systems, and the definition of boundaries to related areas. During the further steps of this project, means and techniques will be worked out to systematically test real-time software. (orig.) [de

  12. Validation and Assessment of Multi-GNSS Real-Time Precise Point Positioning in Simulated Kinematic Mode Using IGS Real-Time Service

    Directory of Open Access Journals (Sweden)

    Liang Wang

    2018-02-01

    Precise Point Positioning (PPP) is a popular technology for precise applications based on the Global Navigation Satellite System (GNSS). Multi-GNSS combined PPP has become a hot topic in recent years with the development of multiple GNSSs. Meanwhile, with the operation of the real-time service (RTS) of the International GNSS Service (IGS) agency that provides satellite orbit and clock corrections to broadcast ephemeris, it is possible to obtain the real-time precise products of satellite orbits and clocks and to conduct real-time PPP. In this contribution, the real-time multi-GNSS orbit and clock corrections of the CLK93 product are applied for real-time multi-GNSS PPP processing, and its orbit and clock qualities are investigated, first with a seven-day experiment by comparing them with the final multi-GNSS precise product ‘GBM’ from GFZ. Then, an experiment involving real-time PPP processing for three stations in the Multi-GNSS Experiment (MGEX) network with a testing period of two weeks is conducted in order to evaluate the convergence performance of real-time PPP in a simulated kinematic mode. The experimental result shows that real-time PPP can achieve a convergence performance of less than 15 min for an accuracy level of 20 cm. Finally, the real-time data streams from 12 globally distributed IGS/MGEX stations for one month are used to assess and validate the positioning accuracy of real-time multi-GNSS PPP. The results show that the simulated kinematic positioning accuracy achieved by real-time PPP on different stations is about 3.0 to 4.0 cm for the horizontal direction and 5.0 to 7.0 cm for the three-dimensional (3D) direction.

  13. Real-time tumor motion estimation using respiratory surrogate via memory-based learning

    International Nuclear Information System (INIS)

    Li Ruijiang; Xing Lei; Lewis, John H; Berbeco, Ross I

    2012-01-01

    Respiratory tumor motion is a major challenge in radiation therapy for thoracic and abdominal cancers. Effective motion management requires an accurate knowledge of the real-time tumor motion. External respiration monitoring devices (optical, etc) provide a noninvasive, non-ionizing, low-cost and practical approach to obtain the respiratory signal. Due to the highly complex and nonlinear relations between tumor and surrogate motion, its ultimate success hinges on the ability to accurately infer the tumor motion from respiratory surrogates. Given their widespread use in the clinic, such a method is critically needed. We propose to use a powerful memory-based learning method to find the complex relations between tumor motion and respiratory surrogates. The method first stores the training data in memory and then finds relevant data to answer a particular query. Nearby data points are assigned high relevance (or weights) and conversely distant data are assigned low relevance. By fitting relatively simple models to local patches instead of fitting one single global model, it is able to capture highly nonlinear and complex relations between the internal tumor motion and external surrogates accurately. Due to the local nature of weighting functions, the method is inherently robust to outliers in the training data. Moreover, both training and adapting to new data are performed almost instantaneously with memory-based learning, making it suitable for dynamically following variable internal/external relations. We evaluated the method using respiratory motion data from 11 patients. The data set consists of simultaneous measurement of 3D tumor motion and 1D abdominal surface (used as the surrogate signal in this study). There are a total of 171 respiratory traces, with an average peak-to-peak amplitude of ∼15 mm and average duration of ∼115 s per trace. Given only 5 s (roughly one breath) pretreatment training data, the method achieved an average 3D error of 1.5 mm and 95

  14. Real-time tumor motion estimation using respiratory surrogate via memory-based learning

    Science.gov (United States)

    Li, Ruijiang; Lewis, John H.; Berbeco, Ross I.; Xing, Lei

    2012-08-01

    Respiratory tumor motion is a major challenge in radiation therapy for thoracic and abdominal cancers. Effective motion management requires an accurate knowledge of the real-time tumor motion. External respiration monitoring devices (optical, etc) provide a noninvasive, non-ionizing, low-cost and practical approach to obtain the respiratory signal. Due to the highly complex and nonlinear relations between tumor and surrogate motion, its ultimate success hinges on the ability to accurately infer the tumor motion from respiratory surrogates. Given their widespread use in the clinic, such a method is critically needed. We propose to use a powerful memory-based learning method to find the complex relations between tumor motion and respiratory surrogates. The method first stores the training data in memory and then finds relevant data to answer a particular query. Nearby data points are assigned high relevance (or weights) and conversely distant data are assigned low relevance. By fitting relatively simple models to local patches instead of fitting one single global model, it is able to capture highly nonlinear and complex relations between the internal tumor motion and external surrogates accurately. Due to the local nature of weighting functions, the method is inherently robust to outliers in the training data. Moreover, both training and adapting to new data are performed almost instantaneously with memory-based learning, making it suitable for dynamically following variable internal/external relations. We evaluated the method using respiratory motion data from 11 patients. The data set consists of simultaneous measurement of 3D tumor motion and 1D abdominal surface (used as the surrogate signal in this study). There are a total of 171 respiratory traces, with an average peak-to-peak amplitude of ∼15 mm and average duration of ∼115 s per trace. Given only 5 s (roughly one breath) pretreatment training data, the method achieved an average 3D error of 1.5 mm and 95
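
    The memory-based idea described in this abstract (store the training pairs, weight them by proximity to the query, and fit a simple local model) can be sketched as locally weighted linear regression. This is a minimal illustration under stated assumptions, not the authors' implementation: the Gaussian kernel, the `bandwidth` parameter, the function name `lwr_predict`, and the synthetic sine data are all choices made for this example, and a single output coordinate is predicted from a 1D surrogate for brevity.

```python
import math


def lwr_predict(query, xs, ys, bandwidth=0.3):
    """Predict y at `query` from stored (xs, ys) pairs.

    Each stored point gets a Gaussian weight that decays with its
    distance from the query; a weighted straight line is then fitted
    and evaluated at the query.
    """
    weights = [math.exp(-0.5 * ((x - query) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("query is too far from all stored data")
    # Weighted means of input and output.
    mx = sum(w * x for w, x in zip(weights, xs)) / total
    my = sum(w * y for w, y in zip(weights, ys)) / total
    # Weighted least-squares slope of the local line.
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx if sxx > 1e-12 else 0.0
    return my + slope * (query - mx)


# "Training" is just storing the data; adapting is appending to the lists.
surrogate = [i / 10 for i in range(31)]          # synthetic 1D surrogate signal
position = [math.sin(x) for x in surrogate]      # synthetic nonlinear response
estimate = lwr_predict(1.55, surrogate, position)
```

    Because there is no global model to refit, adding a new observation only requires appending it to the stored lists, which is consistent with the abstract's point that training and adaptation are nearly instantaneous.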

  15. MO-FG-BRD-01: Real-Time Imaging and Tracking Techniques for Intrafractional Motion Management: Introduction and KV Tracking

    International Nuclear Information System (INIS)

    Fahimian, B.

    2015-01-01

    Intrafraction target motion is a prominent complicating factor in the accurate targeting of radiation within the body. Methods compensating for target motion during treatment, such as gating and dynamic tumor tracking, depend on the delineation of target location as a function of time during delivery. A variety of techniques for target localization have been explored and are under active development; these include beam-level imaging of radio-opaque fiducials, fiducial-less tracking of anatomical landmarks, tracking of electromagnetic transponders, optical imaging of correlated surrogates, and volumetric imaging within treatment delivery. The Joint Imaging and Therapy Symposium will provide an overview of the techniques for real-time imaging and tracking, with special focus on emerging modes of implementation across different modalities. In particular, the symposium will explore developments in 1) Beam-level kilovoltage X-ray imaging techniques, 2) EPID-based megavoltage X-ray tracking, 3) Dynamic tracking using electromagnetic transponders, and 4) MRI-based soft-tissue tracking during radiation delivery. Learning Objectives: 1) Understand the fundamentals of real-time imaging and tracking techniques; 2) Learn about emerging techniques in the field of real-time tracking; 3) Distinguish between the advantages and disadvantages of different tracking modalities; 4) Understand the role of real-time tracking techniques within the clinical delivery work-flow.

  16. MO-FG-BRD-01: Real-Time Imaging and Tracking Techniques for Intrafractional Motion Management: Introduction and KV Tracking

    Energy Technology Data Exchange (ETDEWEB)

    Fahimian, B. [Stanford University (United States)]

    2015-06-15

    Intrafraction target motion is a prominent complicating factor in the accurate targeting of radiation within the body. Methods compensating for target motion during treatment, such as gating and dynamic tumor tracking, depend on the delineation of target location as a function of time during delivery. A variety of techniques for target localization have been explored and are under active development; these include beam-level imaging of radio-opaque fiducials, fiducial-less tracking of anatomical landmarks, tracking of electromagnetic transponders, optical imaging of correlated surrogates, and volumetric imaging within treatment delivery. The Joint Imaging and Therapy Symposium will provide an overview of the techniques for real-time imaging and tracking, with special focus on emerging modes of implementation across different modalities. In particular, the symposium will explore developments in 1) Beam-level kilovoltage X-ray imaging techniques, 2) EPID-based megavoltage X-ray tracking, 3) Dynamic tracking using electromagnetic transponders, and 4) MRI-based soft-tissue tracking during radiation delivery. Learning Objectives: 1) Understand the fundamentals of real-time imaging and tracking techniques; 2) Learn about emerging techniques in the field of real-time tracking; 3) Distinguish between the advantages and disadvantages of different tracking modalities; 4) Understand the role of real-time tracking techniques within the clinical delivery work-flow.

  17. The FERMI-Elettra distributed real-time framework

    International Nuclear Information System (INIS)

    Pivetta, L.; Gaio, G.; Passuello, R.; Scalamera, G.

    2012-01-01

    FERMI-Elettra is a Free Electron Laser (FEL) based on a 1.5 GeV linac. The pulsed operation of the accelerator and the necessity to characterize and control each electron bunch requires synchronous acquisition of the beam diagnostics together with the ability to drive actuators in real-time at the linac repetition rate. The Adeos/Xenomai real-time extensions have been adopted in order to add real-time capabilities to the Linux based control system computers running the Tango software. A software communication protocol based on Gigabit Ethernet and known as Network Reflective Memory (NRM) has been developed to implement a shared memory across the whole control system, allowing computers to communicate in real-time. The NRM architecture, the real-time performance and the integration in the control system are described. (authors)

  18. Real-time video quality monitoring

    Science.gov (United States)

    Liu, Tao; Narvekar, Niranjan; Wang, Beibei; Ding, Ran; Zou, Dekun; Cash, Glenn; Bhagavathy, Sitaram; Bloom, Jeffrey

    2011-12-01

    The ITU-T Recommendation G.1070 is a standardized opinion model for video telephony applications that uses video bitrate, frame rate, and packet-loss rate to measure the video quality. However, this model was originally designed as an offline quality planning tool. It cannot be directly used for quality monitoring since the above three input parameters are not readily available within a network or at the decoder. There is also considerable room for improving the performance of this quality metric. In this article, we present a real-time video quality monitoring solution based on this Recommendation. We first propose a scheme to efficiently estimate the three parameters from video bitstreams, so that it can be used as a real-time video quality monitoring tool. Furthermore, an enhanced algorithm based on the G.1070 model that provides more accurate quality prediction is proposed. Finally, to use this metric in real-world applications, we present an example emerging application of real-time quality measurement to the management of transmitted videos, especially those delivered to mobile devices.

  19. Depression, Activity, and Evaluation of Reinforcement

    Science.gov (United States)

    Hammen, Constance L.; Glass, David R., Jr.

    1975-01-01

    This research attempted to find the causal relation between mood and level of reinforcement. An effort was made to learn what mood change might occur if depressed subjects increased their levels of participation in reinforcing activities. (Author/RK)

  20. Heterogeneous Embedded Real-Time Systems Environment

    Science.gov (United States)

    2003-12-01

    AFRL-IF-RS-TR-2003-290 Final Technical Report December 2003 HETEROGENEOUS EMBEDDED REAL-TIME SYSTEMS ENVIRONMENT Integrated...HETEROGENEOUS EMBEDDED REAL-TIME SYSTEMS ENVIRONMENT 6. AUTHOR(S) Cosmo Castellano and James Graham 5. FUNDING NUMBERS C - F30602-97-C-0259

  1. Real Time with the Librarian: Using Web Conferencing Software to Connect to Distance Students

    Science.gov (United States)

    Riedel, Tom; Betty, Paul

    2013-01-01

    A pilot program to provide real-time library webcasts to Regis University distance students using Adobe Connect software was initiated in fall of 2011. Previously, most interaction between librarians and online students had been accomplished by asynchronous discussion threads in the Learning Management System. Library webcasts were offered in…

  2. MO-E-BRB-04: Real-Time Exit-Fluence Delivery Validation

    Energy Technology Data Exchange (ETDEWEB)

    Siebers, J. [University of Virginia Health System (United States)]

    2015-06-15

    Recent high-profile reports of technical failures and human errors causing severe radiation-induced injuries and deaths come in support of the sustained efforts to ensure patient safety in the delivery of radiation treatments. In addition, highly conformal radiation therapies and escalated fraction doses mandate increased and sustained accuracy of the entire radiotherapy process. Consequently, and as a result of AAPM- and ASTRO-led efforts, patient-specific quality assurance for specialized radiation treatments such as IMRT, SRS/SBRT and Arc Therapy has become a three-tier process: pre-treatment, during-treatment, and post-treatment patient-specific QA. Traditional patient QA consists of pre-treatment data transfer integrity dosimetric verifications and during-treatment geometric verifications. However, as treatment adaptation becomes closer to deployment in the clinics, during-treatment validation via exit detectors has become a realistic QA option, permitting plan assessment in near real time. Post-treatment, machine logs allow comparisons of a range of mechanical parameters. A combination of these techniques could be used in evaluating inter-fraction and intra-fraction delivery over a long time period, such as a year, to evaluate the significant errors per site and per treatment technique. This type of data mining over longer periods of time provides the potential to recognize suboptimal radiation treatments, while allowing systematic, possibly significant errors to be identified. This would allow creation of a database of realized errors, small and large in dosimetry, that could be used for process or equipment improvement. This educational symposium will describe and review patient QA techniques, results, and strategies for patient-specific quality assurance.
    Learning Objectives: review the goals of pre-treatment QA for various specialized procedures; review methods and means for pre-treatment QA, limitations and tolerances; review the scenarios where Varian/Tomo Log files

  3. MO-E-BRB-04: Real-Time Exit-Fluence Delivery Validation

    International Nuclear Information System (INIS)

    Siebers, J.

    2015-01-01

    Recent high-profile reports of technical failures and human errors causing severe radiation-induced injuries and deaths come in support of the sustained efforts to ensure patient safety in the delivery of radiation treatments. In addition, highly conformal radiation therapies and escalated fraction doses mandate increased and sustained accuracy of the entire radiotherapy process. Consequently, and as a result of AAPM- and ASTRO-led efforts, patient-specific quality assurance for specialized radiation treatments such as IMRT, SRS/SBRT and Arc Therapy has become a three-tier process: pre-treatment, during-treatment, and post-treatment patient-specific QA. Traditional patient QA consists of pre-treatment data transfer integrity dosimetric verifications and during-treatment geometric verifications. However, as treatment adaptation becomes closer to deployment in the clinics, during-treatment validation via exit detectors has become a realistic QA option, permitting plan assessment in near real time. Post-treatment, machine logs allow comparisons of a range of mechanical parameters. A combination of these techniques could be used in evaluating inter-fraction and intra-fraction delivery over a long time period, such as a year, to evaluate the significant errors per site and per treatment technique. This type of data mining over longer periods of time provides the potential to recognize suboptimal radiation treatments, while allowing systematic, possibly significant errors to be identified. This would allow creation of a database of realized errors, small and large in dosimetry, that could be used for process or equipment improvement. This educational symposium will describe and review patient QA techniques, results, and strategies for patient-specific quality assurance.
    Learning Objectives: review the goals of pre-treatment QA for various specialized procedures; review methods and means for pre-treatment QA, limitations and tolerances; review the scenarios where Varian/Tomo Log files

  4. Temporal Proof Methodologies for Real-Time Systems,

    Science.gov (United States)

    1990-09-01

    real time systems that communicate either through shared variables or by message passing and real time issues such as time-outs, process priorities (interrupts) and process scheduling. The authors exhibit two styles for the specification of real-time systems. While the first approach uses bounded versions of temporal operators, the second approach allows explicit references to time through a special clock variable. Corresponding to two styles of specification the authors present and compare two fundamentally different proof

  5. Real Time Physiological Status Monitoring (RT-PSM): Accomplishments, Requirements, and Research Roadmap

    Science.gov (United States)

    2016-03-01

    actionable information. With many lessons learned, the first implementation of real time physiological monitoring (RT-PSM) uses thermal-work strain... Bidirectional Inductive On-Body Network (BIONET) for WPSM Develop sensor links and processing nodes on-Soldier and non-RF links off-Soldier Elintrix...recent sleep watches (e.g., BASIS Peak, Intel Corp.) are attempting to parse sleep quality beyond duration and interruptions into deep and REM sleep

  6. Real-time communication protocols: an overview

    NARCIS (Netherlands)

    Hanssen, F.T.Y.; Jansen, P.G.

    2003-01-01

    This paper describes several existing data link layer protocols that provide real-time capabilities on wired networks, focusing on token-ring and Carrier Sense Multiple Access based networks. Existing modifications to provide better real-time capabilities and performance are also described. Finally

  7. Self-Organization in Embedded Real-Time Systems

    CERN Document Server

    Brinkschulte, Uwe; Rettberg, Achim

    2013-01-01

    This book describes the emerging field of self-organizing, multicore, distributed and real-time embedded systems. Self-organization of both hardware and software can be a key technique to handle the growing complexity of modern computing systems. Distributed systems running hundreds of tasks on dozens of processors, each equipped with multiple cores, require self-organization principles to ensure efficient and reliable operation. This book addresses various so-called Self-X features such as self-configuration, self-optimization, self-adaptation, self-healing and self-protection. Presents open components for embedded real-time adaptive and self-organizing applications; Describes innovative techniques in: scheduling, memory management, quality of service, communications supporting organic real-time applications; Covers multi-/many-core embedded systems supporting real-time adaptive systems and power-aware, adaptive hardware and software systems; Includes case studies of open embedded real-time self-organizi...

  8. Real-time systems scheduling fundamentals

    CERN Document Server

    Chetto, Maryline

    2014-01-01

    Real-time systems are used in a wide range of applications, including control, sensing, multimedia, etc. Scheduling is a central problem for these computing/communication systems since it is responsible for software execution in a timely manner. This book provides the state of knowledge in this domain with special emphasis on the key results obtained within the last decade. This book addresses foundations as well as the latest advances and findings in Real-Time Scheduling, giving all references to important papers. Nevertheless, the chapters are short and not overloaded with confusing details.

  9. A Plant Control Technology Using Reinforcement Learning Method with Automatic Reward Adjustment

    Science.gov (United States)

    Eguchi, Toru; Sekiai, Takaaki; Yamada, Akihiro; Shimizu, Satoru; Fukai, Masayuki

    A control technology using Reinforcement Learning (RL) and a Radial Basis Function (RBF) Network has been developed to reduce environmental load substances emitted from power and industrial plants. This technology consists of a statistical model using the RBF Network, which estimates characteristics of plants with respect to environmental load substances, and an RL agent, which learns the control logic for the plants using the statistical model. In this technology, it is necessary to design an appropriate reward function, given to the agent immediately according to operation conditions and control goals, to control plants flexibly. Therefore, we propose an automatic reward adjusting method of RL for plant control. This method adjusts the reward function automatically using information of the statistical model obtained in its learning process. In the simulations, it is confirmed that the proposed method can adjust the reward function adaptively for several test functions, and executes robust control of the thermal power plant considering the change of operation conditions and control goals.

  10. Real-time specifications

    DEFF Research Database (Denmark)

    David, A.; Larsen, K.G.; Legay, A.

    2015-01-01

    A specification theory combines notions of specifications and implementations with a satisfaction relation, a refinement relation, and a set of operators supporting stepwise design. We develop a specification framework for real-time systems using Timed I/O Automata as the specification formalism......, with the semantics expressed in terms of Timed I/O Transition Systems. We provide constructs for refinement, consistency checking, logical and structural composition, and quotient of specifications, all indispensable ingredients of a compositional design methodology. The theory is implemented in the new tool Ecdar...

  11. Reinforcement learning of self-regulated β-oscillations for motor restoration in chronic stroke

    Directory of Open Access Journals (Sweden)

    Georgios Naros

    2015-07-01

    Neurofeedback training of motor imagery-related brain-states with brain-machine interfaces (BMI) is currently being explored prior to standard physiotherapy to improve the motor outcome of stroke rehabilitation. Pilot studies suggest that such a priming intervention before physiotherapy might increase the responsiveness of the brain to the subsequent physiotherapy, thereby improving the clinical outcome. However, there is little evidence up to now that these BMI-based interventions have achieved operant conditioning of specific brain states that facilitate task-specific functional gains beyond the practice of primed physiotherapy. In this context, we argue that BMI technology needs to aim at physiological features relevant for the targeted behavioral gain. Moreover, this therapeutic intervention has to be informed by concepts of reinforcement learning to develop its full potential. Such a refined neurofeedback approach would need to address the following issues: (1) Defining a physiological feedback target specific to the intended behavioral gain, e.g. β-band oscillations for cortico-muscular communication. This targeted brain state could well be different from the brain state optimal for the neurofeedback task. (2) Selecting a BMI classification and thresholding approach on the basis of learning principles, i.e. balancing challenge and reward of the neurofeedback task instead of maximizing the classification accuracy of the feedback device. (3) Adjusting the feedback in the course of the training period to account for the cognitive load and the learning experience of the participant.
The proposed neurofeedback strategy provides evidence for the feasibility of the suggested approach by demonstrating that dynamic threshold adaptation based on reinforcement learning may lead to frequency-specific operant conditioning of β-band oscillations paralleled by task-specific motor improvement; a proposal that requires investigation in a larger cohort of stroke

  12. TractorEYE: Vision-based Real-time Detection for Autonomous Vehicles in Agriculture

    DEFF Research Database (Denmark)

    Christiansen, Peter

    ) using a smaller memory footprint and 7.3-times faster processing. Low memory footprint and fast processing make DeepAnomaly suitable for real-time applications running on an embedded GPU. FieldSAFE is a multi-modal dataset for detection of static and moving obstacles in agriculture. The dataset...... (four for RGB camera, one for thermal camera and one for a multi-beam lidar) and fuse detection information in a common format using either 3D positions or Inverse Sensor Models. A GPU-powered computational platform is able to run detection algorithms online. For the RGB camera, a deep learning...... algorithm, DeepAnomaly, is proposed to perform real-time anomaly detection of distant, heavily occluded and unknown obstacles in agriculture. Compared to a state-of-the-art object detector, Faster R-CNN, DeepAnomaly is able, for an agricultural use-case, to detect humans better and at longer ranges (45-90 m...

  13. Integrating Real-Time Antecedent Rubrics via Blackboard™ into a Community College General Psychology Class

    Science.gov (United States)

    Goomas, David

    2015-01-01

    Numerous studies have reported on the innovative and effective delivery of online course content by community colleges, but not much has been done on how learning management systems (LMS) can deliver real-time (immediate data delivery) antecedents that inform students of performance requirements. This pilot study used Blackboard's™ interactive…

  14. Flexural strength using Steel Plate, Carbon Fiber Reinforced Polymer (CFRP) and Glass Fiber Reinforced Polymer (GFRP) on reinforced concrete beam in building technology

    Science.gov (United States)

    Tarigan, Johannes; Patra, Fadel Muhammad; Sitorus, Torang

    2018-03-01

    Reinforced concrete structures are very commonly used in buildings because they are cheaper than steel structures. In reality, however, many concrete structures are damaged, and there are several ways to overcome this problem, including reinforcement with Fiber Reinforced Polymer (FRP) and reinforcement with steel plates. Each type of reinforcement has its advantages and disadvantages. In this study, the researchers compare the flexural strength of reinforced concrete beams strengthened with steel plates and with Fiber Reinforced Polymer (FRP). In this case, the researchers use Carbon Fiber Reinforced Polymer (CFRP) and Glass Fiber Reinforced Polymer (GFRP) as external reinforcements. The cross-section of the beams is 15 x 25 cm, with a length of 320 cm. Based on the analytical results, the strength of the beam with CFRP is 1.991 times its initial strength, with GFRP 1.877 times, and with the steel plate 1.646 times. Based on test results, the strength of the beam with CFRP is 1.444 times its initial strength, with GFRP 1.333 times, and with the steel plate 1.167 times. Based on these test results, the authors conclude that a beam with CFRP is a better choice for external reinforcement in building technology than the others.
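The strength multipliers quoted in the abstract above can be tabulated to check the ranking and to see how far the test results fall below the analytical predictions. The following is only an illustrative sketch: the six numbers are copied from the abstract, while the dictionary and function names are hypothetical and not taken from the paper.

```python
# Strength gain relative to the unreinforced beam, copied from the abstract.
ANALYTICAL = {"CFRP": 1.991, "GFRP": 1.877, "steel plate": 1.646}
TESTED = {"CFRP": 1.444, "GFRP": 1.333, "steel plate": 1.167}


def measured_over_predicted(method):
    """Ratio of the tested gain to the analytical gain (hypothetical helper)."""
    return TESTED[method] / ANALYTICAL[method]


def rank_by_tested_gain():
    """Reinforcement methods ordered from largest to smallest tested gain."""
    return sorted(TESTED, key=TESTED.get, reverse=True)


if __name__ == "__main__":
    for method in rank_by_tested_gain():
        print(
            f"{method}: tested {TESTED[method]:.3f}x, "
            f"analytical {ANALYTICAL[method]:.3f}x, "
            f"tested/analytical {measured_over_predicted(method):.2f}"
        )
```

In both the analytical and the tested figures the ordering is CFRP, then GFRP, then steel plate, and each tested gain is smaller than its analytical counterpart.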

  15. On Real-Time Systems Using Local Area Networks.

    Science.gov (United States)

    1987-07-01

    87-35 July, 1987 CS-TR-1892 On Real-Time Systems Using Local Area Networks. Shem-Tov Levi, Department of Computer Science; Satish K. Tripathi, Department of Computer Science...constraints and the clock systems that feed the time to real-time systems. A model for real-time system based on LAN communication is presented in

  16. Approaching near real-time biosensing: microfluidic microsphere based biosensor for real-time analyte detection.

    Science.gov (United States)

    Cohen, Noa; Sabhachandani, Pooja; Golberg, Alexander; Konry, Tania

    2015-04-15

    In this study we describe a simple lab-on-a-chip (LOC) biosensor approach utilizing a well-mixed microfluidic device and a microsphere-based assay capable of performing near real-time diagnostics of clinically relevant analytes such as cytokines and antibodies. We were able to overcome the adsorption kinetics reaction rate-limiting mechanism, which is diffusion-controlled in standard immunoassays, by introducing the microsphere-based assay into a well-mixed yet simple microfluidic device with turbulent flow profiles in the reaction regions. The integrated microsphere-based LOC device performs dynamic detection of the analyte in a minimal amount of biological specimen by continuously sampling micro-liter volumes of sample per minute to detect dynamic changes in target analyte concentration. Furthermore, we developed a mathematical model for the well-mixed reaction to describe the near real-time detection mechanism observed in the developed LOC method. To demonstrate the specificity and sensitivity of the developed real-time monitoring LOC approach, we applied the device to clinically relevant analytes: Tumor Necrosis Factor (TNF)-α cytokine and its clinically used inhibitor, anti-TNF-α antibody. Based on the results reported herein, the developed LOC device provides a continuous, sensitive and specific near real-time monitoring method for analytes such as cytokines and antibodies, reduces reagent volumes by nearly three orders of magnitude, and eliminates the washing steps required by standard immunoassays.

  17. Mixed-mode Operating System for Real-time Performance

    OpenAIRE

    M.M. Hasan; S. Sultana; C.K. Foo

    2017-01-01

    The purpose of the mixed-mode system research is to handle devices with the accuracy of real-time systems while, at the same time, having all the benefits and facilities of a mature Graphical User Interface (GUI) operating system, which is typically non-real-time. This mixed-mode operating system, comprising a real-time portion and a non-real-time portion, was studied and implemented to identify the feasibilities and performances in practical applications (in the context of scheduled the real-time...

  18. Linux real-time framework for fusion devices

    Energy Technology Data Exchange (ETDEWEB)

    Neto, Andre [Associacao Euratom-IST, Instituto de Plasmas e Fusao Nuclear, Av. Rovisco Pais, 1049-001 Lisboa (Portugal)], E-mail: andre.neto@cfn.ist.utl.pt; Sartori, Filippo; Piccolo, Fabio [Euratom-UKAEA, Culham Science Centre, Abingdon, Oxon OX14 3DB (United Kingdom); Barbalace, Antonio [Euratom-ENEA Association, Consorzio RFX, 35127 Padova (Italy); Vitelli, Riccardo [Dipartimento di Informatica, Sistemi e Produzione, Universita di Roma, Tor Vergata, Via del Politecnico 1-00133, Roma (Italy); Fernandes, Horacio [Associacao Euratom-IST, Instituto de Plasmas e Fusao Nuclear, Av. Rovisco Pais, 1049-001 Lisboa (Portugal)

    2009-06-15

    A new framework for the development and execution of real-time codes is currently being developed and commissioned at JET. The foundations of the system are Linux, the Real Time Application Interface (RTAI) and a wise exploitation of the new i386 multi-core processors technology. The driving motivation was the need to find a real-time operating system for the i386 platform able to satisfy JET Vertical Stabilisation Enhancement project requirements: 50 μs cycle time. Even if the initial choice was the VxWorks operating system, it was decided to explore an open source alternative, mostly because of the costs involved in the commercial product. The work started with the definition of a precise set of requirements and milestones to achieve: Linux distribution and kernel versions to be used for the real-time operating system; complete characterization of the Linux/RTAI real-time capabilities; exploitation of the multi-core technology; implementation of all the required and missing features; commissioning of the system. Latency and jitter measurements were compared for Linux and RTAI in both user and kernel-space. The best results were attained using the RTAI kernel solution where the time to reschedule a real-time task after an external interrupt is of 2.35 ± 0.35 μs. In order to run the real-time codes in the kernel-space, a solution to provide user-space functionalities to the kernel modules had to be designed. This novel work provided the most common functions from the standard C library and transparent interaction with files and sockets to the kernel real-time modules. Kernel C++ support was also tested, further developed and integrated in the framework. The work has produced very convincing results so far: complete isolation of the processors assigned to real-time from the Linux non real-time activities, high level of stability over several days of benchmarking operations and values well below 3 μs for task rescheduling after external interrupt. From

  19. Linux real-time framework for fusion devices

    International Nuclear Information System (INIS)

    Neto, Andre; Sartori, Filippo; Piccolo, Fabio; Barbalace, Antonio; Vitelli, Riccardo; Fernandes, Horacio

    2009-01-01

    A new framework for the development and execution of real-time codes is currently being developed and commissioned at JET. The foundations of the system are Linux, the Real Time Application Interface (RTAI) and a wise exploitation of the new i386 multi-core processors technology. The driving motivation was the need to find a real-time operating system for the i386 platform able to satisfy JET Vertical Stabilisation Enhancement project requirements: 50 μs cycle time. Even if the initial choice was the VxWorks operating system, it was decided to explore an open source alternative, mostly because of the costs involved in the commercial product. The work started with the definition of a precise set of requirements and milestones to achieve: Linux distribution and kernel versions to be used for the real-time operating system; complete characterization of the Linux/RTAI real-time capabilities; exploitation of the multi-core technology; implementation of all the required and missing features; commissioning of the system. Latency and jitter measurements were compared for Linux and RTAI in both user and kernel-space. The best results were attained using the RTAI kernel solution where the time to reschedule a real-time task after an external interrupt is of 2.35 ± 0.35 μs. In order to run the real-time codes in the kernel-space, a solution to provide user-space functionalities to the kernel modules had to be designed. This novel work provided the most common functions from the standard C library and transparent interaction with files and sockets to the kernel real-time modules. Kernel C++ support was also tested, further developed and integrated in the framework. The work has produced very convincing results so far: complete isolation of the processors assigned to real-time from the Linux non real-time activities, high level of stability over several days of benchmarking operations and values well below 3 μs for task rescheduling after external interrupt. From being the
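A figure such as "2.35 ± 0.35 μs" summarizes many individual interrupt-to-reschedule latencies. The sketch below shows one way such a summary could be computed from paired timestamps. It assumes the ± value is a sample standard deviation, which the abstract does not state, and the function name and nanosecond timestamp format are hypothetical rather than part of the JET framework or RTAI.

```python
from statistics import mean, stdev


def latency_stats(interrupt_ns, resched_ns):
    """Return (mean, sample standard deviation) of latencies in microseconds.

    interrupt_ns: times at which each external interrupt arrived (ns).
    resched_ns:   times at which the real-time task was rescheduled (ns).
    Both lists are assumed to be paired one-to-one and in the same order.
    """
    if len(interrupt_ns) != len(resched_ns):
        raise ValueError("timestamp lists must have equal length")
    if len(interrupt_ns) < 2:
        raise ValueError("at least two samples are needed for a spread")
    # Convert each nanosecond difference to microseconds.
    latencies_us = [(r - i) / 1000.0 for i, r in zip(interrupt_ns, resched_ns)]
    return mean(latencies_us), stdev(latencies_us)


if __name__ == "__main__":
    # Made-up timestamps for illustration only; not measurements from the paper.
    interrupts = [0, 100_000, 200_000]
    reschedules = [2_000, 102_350, 202_700]
    avg, spread = latency_stats(interrupts, reschedules)
    print(f"latency: {avg:.2f} +/- {spread:.2f} us")
```

The mean captures the typical latency and the spread captures the jitter; a hard real-time evaluation would normally also report the worst observed case.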

  20. Static Schedulers for Embedded Real-Time Systems

    Science.gov (United States)

    1989-12-01

    Because of the need for having efficient scheduling algorithms in large scale real time systems, software engineers put a lot of effort on developing...provide static schedulers for the Embedded Real Time Systems with single processor using Ada programming language. The independent nonpreemptable...support the Computer Aided Rapid Prototyping for Embedded Real Time Systems so that we determine whether the system, as designed, meets the required