WorldWideScience

Sample records for adaptive control evaluation function policy search approach reinforcement learning value function.

  1. Evaluation-Function-based Model-free Adaptive Fuzzy Control

    Directory of Open Access Journals (Sweden)

    Agus Naba

    2016-12-01

    Full Text Available Designs of adaptive fuzzy controllers (AFC) are commonly based on the Lyapunov approach, which requires a known model of the controlled plant. They need to consider a Lyapunov function candidate as an evaluation function to be minimized. In this study these drawbacks were handled by designing a model-free adaptive fuzzy controller (MFAFC) using an approximate evaluation function defined in terms of the current state, the next state, and the control action. MFAFC considers the approximate evaluation function as an evaluative control performance measure similar to the state-action value function in reinforcement learning. The simulation results of applying MFAFC to the inverted pendulum benchmark verified the proposed scheme’s efficacy.

  2. Adaptive Trajectory Tracking Control using Reinforcement Learning for Quadrotor

    Directory of Open Access Journals (Sweden)

    Wenjie Lou

    2016-02-01

    Full Text Available Inaccurate system parameters and unpredicted external disturbances affect the performance of non-linear controllers. In this paper, a new adaptive control algorithm under the reinforcement learning framework is proposed to stabilize a quadrotor helicopter. Based on a command-filtered non-linear control algorithm, adaptive elements are added and learned by policy-search methods. To predict the inaccurate system parameters, a new kernel-based regression learning method is provided. In addition, Policy learning by Weighting Exploration with the Returns (PoWER) and Return Weighted Regression (RWR) are utilized to learn the appropriate parameters for the adaptive elements in order to cancel the effect of external disturbance. Furthermore, numerical simulations under several conditions are performed, and the ability of adaptive trajectory-tracking control with reinforcement learning is demonstrated.

  3. Optimal Control via Reinforcement Learning with Symbolic Policy Approximation

    NARCIS (Netherlands)

    Kubalík, Jiří; Alibekov, Eduard; Babuska, R.; Dochain, Denis; Henrion, Didier; Peaucelle, Dimitri

    2017-01-01

    Model-based reinforcement learning (RL) algorithms can be used to derive optimal control laws for nonlinear dynamic systems. With continuous-valued state and input variables, RL algorithms have to rely on function approximators to represent the value function and policy mappings. This paper

  4. Off-Policy Reinforcement Learning for Synchronization in Multiagent Graphical Games.

    Science.gov (United States)

    Li, Jinna; Modares, Hamidreza; Chai, Tianyou; Lewis, Frank L; Xie, Lihua

    2017-10-01

    This paper develops an off-policy reinforcement learning (RL) algorithm to solve optimal synchronization of multiagent systems. This is accomplished by using the framework of graphical games. In contrast to traditional control protocols, which require complete knowledge of agent dynamics, the proposed off-policy RL algorithm is a model-free approach, in that it solves the optimal synchronization problem without requiring any knowledge of the agent dynamics. A prescribed control policy, called the behavior policy, is applied to each agent to generate and collect data for learning. An off-policy Bellman equation is derived for each agent to learn the value function for the policy under evaluation, called the target policy, and to find an improved policy simultaneously. Actor and critic neural networks along with a least-squares approach are employed to approximate target control policies and value functions using the data generated by applying the prescribed behavior policies. Finally, an off-policy RL algorithm is presented that is implemented in real time and gives the approximate optimal control policy for each agent using only measured data. It is shown that the optimal distributed policies found by the proposed algorithm satisfy the global Nash equilibrium and synchronize all agents to the leader. Simulation results illustrate the effectiveness of the proposed method.
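
    The policy-evaluation step described above can be sketched with linear function approximation: a least-squares (LSTD-Q style) estimate of the target policy's value from data generated by a behavior policy. This is an illustrative stand-in, not the paper's graphical-game formulation; the feature map `phi` and `target_policy` are hypothetical.

```python
import numpy as np

def lstd_q(samples, phi, target_policy, gamma=0.95):
    """Least-squares evaluation of a target policy from off-policy data.

    samples:       list of (s, a, r, s_next) generated by a behavior policy
    phi:           feature map phi(s, a) -> np.ndarray of shape (k,)
    target_policy: maps a state to the action of the policy under evaluation
    """
    k = phi(samples[0][0], samples[0][1]).shape[0]
    A, b = np.zeros((k, k)), np.zeros(k)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, target_policy(s_next))  # target action, not the behavior one
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    w = np.linalg.solve(A + 1e-8 * np.eye(k), b)     # small ridge term for stability
    return w                                         # Q(s, a) ~= phi(s, a) @ w
```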

  5. Learning to trade via direct reinforcement.

    Science.gov (United States)

    Moody, J; Saffell, M

    2001-01-01

    We present methods for optimizing portfolios, asset allocations, and trading systems based on direct reinforcement (DR). In this approach, investment decision-making is viewed as a stochastic control problem, and strategies are discovered directly. We present an adaptive algorithm called recurrent reinforcement learning (RRL) for discovering investment policies. The need to build forecasting models is eliminated, and better trading performance is obtained. The direct reinforcement approach differs from dynamic programming and reinforcement algorithms such as TD-learning and Q-learning, which attempt to estimate a value function for the control problem. We find that the RRL direct reinforcement framework enables a simpler problem representation, avoids Bellman's curse of dimensionality and offers compelling advantages in efficiency. We demonstrate how direct reinforcement can be used to optimize risk-adjusted investment returns (including the differential Sharpe ratio), while accounting for the effects of transaction costs. In extensive simulation work using real financial data, we find that our approach based on RRL produces better trading strategies than systems utilizing Q-learning (a value function method). Real-world applications include an intra-daily currency trader and a monthly asset allocation system for the S&P 500 Stock Index and T-Bills.
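
    A minimal sketch of the recurrent-reinforcement idea: positions come from a tanh unit over recent returns and the previous position, trading profit is penalized by transaction costs, and the weights are hill-climbed on the Sharpe ratio of realized returns. The toy features and the numerical gradient (standing in for the paper's analytic recurrent gradient and differential Sharpe ratio) are assumptions.

```python
import numpy as np

def trading_returns(w, prices, delta=0.001):
    """Positions F_t = tanh(w . x_t); profit is penalized by transaction costs."""
    r = np.diff(prices)
    F_prev, R = 0.0, []
    for t in range(len(r) - 1):
        x = np.array([r[t], F_prev, 1.0])                      # toy feature vector
        F = np.tanh(w @ x)                                     # position in [-1, 1]
        R.append(F_prev * r[t + 1] - delta * abs(F - F_prev))  # profit minus cost
        F_prev = F
    return np.array(R)

def sharpe(R):
    return R.mean() / (R.std() + 1e-9)

# Hill-climb w on the Sharpe ratio of realized returns; a numerical gradient
# stands in for the paper's analytic recurrent gradient.
rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(0, 1, 500))
w = np.zeros(3)
for _ in range(200):
    g = np.array([(sharpe(trading_returns(w + 1e-4 * e, prices))
                   - sharpe(trading_returns(w - 1e-4 * e, prices))) / 2e-4
                  for e in np.eye(3)])
    w += 0.1 * g
```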

  6. Episodic reinforcement learning control approach for biped walking

    Directory of Open Access Journals (Sweden)

    Katić Duško

    2012-01-01

    Full Text Available This paper presents a hybrid dynamic control approach to the realization of humanoid biped robotic walk, focusing on policy-gradient episodic reinforcement learning with fuzzy evaluative feedback. The proposed controller structure involves two feedback loops: a conventional computed-torque controller and an episodic reinforcement learning controller. The reinforcement learning part includes fuzzy information about Zero-Moment-Point errors. Simulation tests using a medium-size 36-DOF humanoid robot MEXONE were performed to demonstrate the effectiveness of our method.

  7. Towards autonomous neuroprosthetic control using Hebbian reinforcement learning.

    Science.gov (United States)

    Mahmoudi, Babak; Pohlmeyer, Eric A; Prins, Noeline W; Geng, Shijia; Sanchez, Justin C

    2013-12-01

    Our goal was to design an adaptive neuroprosthetic controller that could learn the mapping from neural states to prosthetic actions and automatically adjust adaptation using only a binary evaluative feedback as a measure of desirability/undesirability of performance. Hebbian reinforcement learning (HRL) in a connectionist network was used for the design of the adaptive controller. The method combines the efficiency of supervised learning with the generality of reinforcement learning. The convergence properties of this approach were studied using both closed-loop control simulations and open-loop simulations that used primate neural data from robot-assisted reaching tasks. The HRL controller was able to perform classification and regression tasks using its episodic and sequential learning modes, respectively. In our experiments, the HRL controller quickly achieved convergence to an effective control policy, followed by robust performance. The controller also automatically stopped adapting the parameters after converging to a satisfactory control policy. Additionally, when the input neural vector was reorganized, the controller resumed adaptation to maintain performance. By estimating an evaluative feedback directly from the user, the HRL control algorithm may provide an efficient method for autonomous adaptation of neuroprosthetic systems. This method may enable the user to teach the controller the desired behavior using only a simple feedback signal.

  8. Manifold Regularized Reinforcement Learning.

    Science.gov (United States)

    Li, Hongliang; Liu, Derong; Wang, Ding

    2018-04-01

    This paper introduces a novel manifold regularized reinforcement learning scheme for continuous Markov decision processes. Smooth feature representations for value function approximation can be automatically learned using the unsupervised manifold regularization method. The learned features are data-driven, and can be adapted to the geometry of the state space. Furthermore, the scheme provides a direct basis representation extension for novel samples during policy learning and control. The performance of the proposed scheme is evaluated on two benchmark control tasks, i.e., the inverted pendulum and the energy storage problem. Simulation results illustrate the concepts of the proposed scheme and show that it can obtain excellent performance.

  9. 'Proactive' use of cue-context congruence for building reinforcement learning's reward function.

    Science.gov (United States)

    Zsuga, Judit; Biro, Klara; Tajti, Gabor; Szilasi, Magdolna Emma; Papp, Csaba; Juhasz, Bela; Gesztelyi, Rudolf

    2016-10-28

    Reinforcement learning is a fundamental form of learning that may be formalized using the Bellman equation. Accordingly, an agent determines the state value as the sum of the immediate reward and of the discounted value of future states. Thus the value of a state is determined by agent-related attributes (action set, policy, discount factor) and by the agent's knowledge of the environment, embodied by the reward function and by hidden environmental factors given by the transition probability. The central objective of reinforcement learning is to solve these two functions outside the agent's control, either using or not using a model. In the present paper, using the proactive model of reinforcement learning, we offer insight on how the brain creates simplified representations of the environment, and how these representations are organized to support the identification of relevant stimuli and actions. Furthermore, we identify neurobiological correlates of our model by suggesting that the reward and policy functions, attributes of the Bellman equation, are built by the orbitofrontal cortex (OFC) and the anterior cingulate cortex (ACC), respectively. Based on this, we propose that the OFC assesses cue-context congruence to activate the most relevant context frame. Furthermore, given the bidirectional neuroanatomical link between the OFC and model-free structures, we suggest that model-based input is incorporated into the reward prediction error (RPE) signal, and conversely that the RPE signal may be used to update the reward-related information of context frames and the policy underlying action selection in the OFC and ACC, respectively. Finally, clinical implications for cognitive behavioral interventions are discussed.
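
    The Bellman equation referred to here can be written in its standard form, with policy π, transition probability P, reward R, and discount factor γ:

```latex
V^{\pi}(s) \;=\; \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)
\left[ R(s, a, s') + \gamma \, V^{\pi}(s') \right]
```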

  10. Adaptive representations for reinforcement learning

    NARCIS (Netherlands)

    Whiteson, S.

    2010-01-01

    This book presents new algorithms for reinforcement learning, a form of machine learning in which an autonomous agent seeks a control policy for a sequential decision task. Since current methods typically rely on manually designed solution representations, agents that automatically adapt their own

  11. Off-policy integral reinforcement learning optimal tracking control for continuous-time chaotic systems

    International Nuclear Information System (INIS)

    Wei Qing-Lai; Song Rui-Zhuo; Xiao Wen-Dong; Sun Qiu-Ye

    2015-01-01

    This paper presents an off-policy integral reinforcement learning (IRL) algorithm to obtain the optimal tracking control of unknown chaotic systems. Off-policy IRL can learn the solution of the Hamilton–Jacobi–Bellman (HJB) equation from the system data generated by an arbitrary control. Moreover, off-policy IRL can be regarded as a direct learning method, which avoids the identification of system dynamics. In this paper, the performance index function is first given based on the system tracking error and control error. For solving the HJB equation, an off-policy IRL algorithm is proposed. It is proven that the iterative control makes the tracking error system asymptotically stable, and that the iterative performance index function is convergent. A simulation study demonstrates the effectiveness of the developed tracking control method. (paper)

  12. Framework for robot skill learning using reinforcement learning

    Science.gov (United States)

    Wei, Yingzi; Zhao, Mingyang

    2003-09-01

    Robot skill acquisition is a process similar to human skill learning. Reinforcement learning (RL) is an on-line actor-critic method by which a robot can develop its skills. The reinforcement function is the critical component, as it evaluates actions and guides the learning process. We present an augmented reward function that provides a new way for the RL controller to incorporate prior knowledge and experience. The difference form of the augmented reward function is also considered carefully. The additional reward beyond the conventional reward provides more heuristic information for RL. In this paper, we present a strategy for the task of complex skill learning: an automatic robot-shaping policy decomposes the complex skill into a hierarchical learning process. A new form of value function is introduced to attain smooth motion switching swiftly. We present a formal, but practical, framework for robot skill learning and illustrate with an example the utility of the method for learning skilled robot control on line.
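
    The abstract does not give the exact form of its augmented reward; one standard way to fold prior knowledge into the reward, shown below as a hedged sketch, is potential-based shaping, which leaves the optimal policy unchanged. The `potential` heuristic is user-supplied and hypothetical.

```python
def augmented_reward(r_env, s, s_next, potential, gamma=0.99):
    """Environment reward plus a heuristic shaping term (potential-based).

    potential: user-supplied estimate of how promising a state is,
    e.g. lambda s: -distance_to_goal(s). The gamma*Phi(s') - Phi(s)
    form preserves the optimal policy (Ng et al., 1999).
    """
    return r_env + gamma * potential(s_next) - potential(s)
```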

  13. Reinforcement learning solution for HJB equation arising in constrained optimal control problem.

    Science.gov (United States)

    Luo, Biao; Wu, Huai-Ning; Huang, Tingwen; Liu, Derong

    2015-11-01

    The constrained optimal control problem depends on the solution of the complicated Hamilton-Jacobi-Bellman equation (HJBE). In this paper, a data-based off-policy reinforcement learning (RL) method is proposed, which learns the solution of the HJBE and the optimal control policy from real system data. One important feature of the off-policy RL is that its policy evaluation can be realized with data generated by other behavior policies, not necessarily the target policy, which solves the insufficient exploration problem. The convergence of the off-policy RL is proved by demonstrating its equivalence to the successive approximation approach. Its implementation procedure is based on the actor-critic neural networks structure, where the function approximation is conducted with linearly independent basis functions. Subsequently, the convergence of the implementation procedure with function approximation is also proved. Finally, its effectiveness is verified through computer simulations. Copyright © 2015 Elsevier Ltd. All rights reserved.

  14. The Study of Reinforcement Learning for Traffic Self-Adaptive Control under Multiagent Markov Game Environment

    Directory of Open Access Journals (Sweden)

    Lun-Hui Xu

    2013-01-01

    Full Text Available The urban traffic self-adaptive control problem is dynamic and uncertain, so the states of the traffic environment are hard to observe. An efficient agent that controls a single intersection can be discovered automatically via multiagent reinforcement learning. However, in the majority of previous works on this approach, each agent needed perfectly observed information when interacting with the environment and learned individually, with less efficient coordination. This study casts traffic self-adaptive control as a multiagent Markov game problem. The design employs a traffic signal control agent (TSCA) for each signalized intersection that coordinates with neighboring TSCAs. A mathematical model for the TSCAs’ interaction is built based on a nonzero-sum Markov game, which is applied to let the TSCAs learn how to cooperate. A multiagent Markov game reinforcement learning approach is constructed on the basis of single-agent Q-learning. This method lets each TSCA learn to update its Q-values under joint actions and imperfect information. The convergence of the proposed algorithm is analyzed theoretically. The simulation results show that the proposed method is convergent and effective in a realistic traffic self-adaptive control setting.
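
    A minimal sketch of the per-intersection learner: tabular Q-learning indexed by local state and the joint action of the agent and its neighbors. The class name and fields are illustrative; the paper's nonzero-sum Markov game update is richer than this plain max backup.

```python
from collections import defaultdict

class TSCA:
    """Traffic signal control agent: Q-learning over joint actions (sketch)."""

    def __init__(self, n_joint_actions, alpha=0.1, gamma=0.9):
        # Q-values indexed by local state; one entry per joint action of
        # this agent and its neighbors.
        self.Q = defaultdict(lambda: [0.0] * n_joint_actions)
        self.alpha, self.gamma = alpha, gamma

    def update(self, s, joint_a, r, s_next):
        target = r + self.gamma * max(self.Q[s_next])
        self.Q[s][joint_a] += self.alpha * (target - self.Q[s][joint_a])
```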

  15. Deep reinforcement learning for automated radiation adaptation in lung cancer.

    Science.gov (United States)

    Tseng, Huan-Hsin; Luo, Yi; Cui, Sunan; Chien, Jen-Tzung; Ten Haken, Randall K; Naqa, Issam El

    2017-12-01

    To investigate deep reinforcement learning (DRL) based on historical treatment plans for developing automated radiation adaptation protocols for non-small cell lung cancer (NSCLC) patients that aim to maximize tumor local control at reduced rates of radiation pneumonitis grade 2 (RP2). In a retrospective population of 114 NSCLC patients who received radiotherapy, a three-component neural networks framework was developed for DRL of dose fractionation adaptation. Large-scale patient characteristics included clinical, genetic, and imaging radiomics features in addition to tumor and lung dosimetric variables. First, a generative adversarial network (GAN) was employed to learn patient population characteristics necessary for DRL training from a relatively limited sample size. Second, a radiotherapy artificial environment (RAE) was reconstructed by a deep neural network (DNN) utilizing both original and synthetic data (by GAN) to estimate the transition probabilities for adaptation of personalized radiotherapy patients' treatment courses. Third, a deep Q-network (DQN) was applied to the RAE for choosing the optimal dose in a response-adapted treatment setting. This multicomponent reinforcement learning approach was benchmarked against real clinical decisions that were applied in an adaptive dose escalation clinical protocol, in which 34 patients were treated based on avid PET signal in the tumor and constrained by a 17.2% normal tissue complication probability (NTCP) limit for RP2. The uncomplicated cure probability (P+) was used as a baseline reward function in the DRL. Taking our adaptive dose escalation protocol as a blueprint for the proposed DRL (GAN + RAE + DQN) architecture, we obtained an automated dose adaptation estimate for use at ∼2/3 of the way into the radiotherapy treatment course. By letting the DQN component freely control the estimated adaptive dose per fraction (ranging from 1 to 5 Gy), the DRL automatically favored dose

  16. Functional Contour-following via Haptic Perception and Reinforcement Learning.

    Science.gov (United States)

    Hellman, Randall B; Tekin, Cem; van der Schaar, Mihaela; Santos, Veronica J

    2018-01-01

    Many tasks involve the fine manipulation of objects despite limited visual feedback. In such scenarios, tactile and proprioceptive feedback can be leveraged for task completion. We present an approach for real-time haptic perception and decision-making for a haptics-driven, functional contour-following task: the closure of a ziplock bag. This task is challenging for robots because the bag is deformable, transparent, and visually occluded by artificial fingertip sensors that are also compliant. A deep neural net classifier was trained to estimate the state of a zipper within a robot's pinch grasp. A Contextual Multi-Armed Bandit (C-MAB) reinforcement learning algorithm was implemented to maximize cumulative rewards by balancing exploration versus exploitation of the state-action space. The C-MAB learner outperformed a benchmark Q-learner by more efficiently exploring the state-action space while learning a hard-to-code task. The learned C-MAB policy was tested with novel ziplock bag scenarios and contours (wire, rope). Importantly, this work contributes to the development of reinforcement learning approaches that account for limited resources such as hardware life and researcher time. As robots are used to perform complex, physically interactive tasks in unstructured or unmodeled environments, it becomes important to develop methods that enable efficient and effective learning with physical testbeds.
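
    As a rough illustration of the contextual-bandit learner, the sketch below keeps a running mean reward per (context, action) pair with epsilon-greedy exploration; the paper's C-MAB algorithm and its exploration/exploitation balancing rule may differ.

```python
import random
from collections import defaultdict

class CMAB:
    """Epsilon-greedy contextual bandit over a discrete state-action space."""

    def __init__(self, actions, eps=0.1):
        self.actions = actions
        self.eps = eps
        self.value = defaultdict(float)   # (context, action) -> mean reward
        self.count = defaultdict(int)

    def act(self, context):
        # context must be hashable, e.g. a discretized zipper-state tuple
        if random.random() < self.eps:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.value[(context, a)])

    def learn(self, context, action, reward):
        key = (context, action)
        self.count[key] += 1
        self.value[key] += (reward - self.value[key]) / self.count[key]
```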

  17. Off-policy reinforcement learning for H∞ control design.

    Science.gov (United States)

    Luo, Biao; Wu, Huai-Ning; Huang, Tingwen

    2015-01-01

    The H∞ control design problem is considered for nonlinear systems with an unknown internal system model. It is known that the nonlinear H∞ control problem can be transformed into solving the so-called Hamilton-Jacobi-Isaacs (HJI) equation, a nonlinear partial differential equation that is generally impossible to solve analytically. Even worse, model-based approaches cannot be used for approximately solving the HJI equation when an accurate system model is unavailable or costly to obtain in practice. To overcome these difficulties, an off-policy reinforcement learning (RL) method is introduced to learn the solution of the HJI equation from real system data instead of a mathematical system model, and its convergence is proved. In the off-policy RL method, the system data can be generated with arbitrary policies rather than the evaluating policy, which is extremely important and promising for practical systems. For implementation purposes, a neural network (NN)-based actor-critic structure is employed and a least-squares NN weight update algorithm is derived based on the method of weighted residuals. Finally, the developed NN-based off-policy RL method is tested on a linear F16 aircraft plant, and further applied to a rotational/translational actuator system.

  18. Online human training of a myoelectric prosthesis controller via actor-critic reinforcement learning.

    Science.gov (United States)

    Pilarski, Patrick M; Dawson, Michael R; Degris, Thomas; Fahimi, Farbod; Carey, Jason P; Sutton, Richard S

    2011-01-01

    As a contribution toward the goal of adaptable, intelligent artificial limbs, this work introduces a continuous actor-critic reinforcement learning method for optimizing the control of multi-function myoelectric devices. Using a simulated upper-arm robotic prosthesis, we demonstrate how it is possible to derive successful limb controllers from myoelectric data using only a sparse human-delivered training signal, without requiring detailed knowledge about the task domain. This reinforcement-based machine learning framework is well suited for use by both patients and clinical staff, and may be easily adapted to different application domains and the needs of individual amputees. To our knowledge, this is the first myoelectric control approach that facilitates the online learning of new amputee-specific motions based only on a one-dimensional (scalar) feedback signal provided by the user of the prosthesis. © 2011 IEEE

  19. Reinforcement learning design-based adaptive tracking control with less learning parameters for nonlinear discrete-time MIMO systems.

    Science.gov (United States)

    Liu, Yan-Jun; Tang, Li; Tong, Shaocheng; Chen, C L Philip; Li, Dong-Juan

    2015-01-01

    Based on the neural network (NN) approximator, an online reinforcement learning algorithm is proposed for a class of affine multiple input and multiple output (MIMO) nonlinear discrete-time systems with unknown functions and disturbances. In the design procedure, two networks are provided: one is an action network that generates an optimal control signal, and the other is a critic network that approximates the cost function. An optimal control signal and adaptation laws can be generated based on the two NNs. In previous approaches, the weights of the critic and action networks are updated based on the gradient descent rule, and the estimates of the optimal weight vectors are directly adjusted in the design. Consequently, compared with existing results, the main contributions of this paper are: 1) only two parameters need to be adjusted, so the number of adaptation laws is smaller than in previous results; and 2) the updating parameters do not depend on the number of subsystems for MIMO systems, and the tuning rules are replaced by adjusting the norms of the optimal weight vectors in both the action and critic networks. It is proven that the tracking errors, the adaptation laws, and the control inputs are uniformly bounded using the Lyapunov analysis method. Simulation examples are employed to illustrate the effectiveness of the proposed algorithm.

  20. From free energy to expected energy: Improving energy-based value function approximation in reinforcement learning.

    Science.gov (United States)

    Elfwing, Stefan; Uchibe, Eiji; Doya, Kenji

    2016-12-01

    Free-energy based reinforcement learning (FERL) was proposed for learning in high-dimensional state and action spaces. However, the FERL method only really works well with binary, or close to binary, state input, where the number of active states is fewer than the number of non-active states. In the FERL method, the value function is approximated by the negative free energy of a restricted Boltzmann machine (RBM). In our earlier study, we demonstrated that the performance and robustness of the FERL method can be improved by scaling the free energy by a constant related to the size of the network. In this study, we propose that RBM function approximation can be further improved by approximating the value function by the negative expected energy (EERL) instead of the negative free energy, which also makes it possible to handle continuous state input. We validate the proposed method by demonstrating that EERL: (1) outperforms FERL, as well as standard neural network and linear function approximation, on three versions of a gridworld task with high-dimensional image state input; (2) achieves new state-of-the-art results in stochastic SZ-Tetris in both model-free and model-based learning settings; and (3) significantly outperforms FERL and standard neural network function approximation on a robot navigation task with raw and noisy RGB images as state input and a large number of actions. Copyright © 2016 The Author(s). Published by Elsevier Ltd. All rights reserved.
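
    The two value-function approximations being compared can be computed directly from the RBM parameters. Below is a small sketch, assuming an RBM with visible biases `b`, hidden biases `c`, and weights `W`, where the visible vector encodes the state(-action) input:

```python
import numpy as np

def rbm_value_estimates(v, W, b, c):
    """Two value estimates from an RBM with visible vector v.

    W: (n_visible, n_hidden) weights; b: visible biases; c: hidden biases.
    Negative free energy (FERL):     b.v + sum_j softplus(c_j + v.W[:, j])
    Negative expected energy (EERL): b.v + sum_j pre_j * sigmoid(pre_j),
    i.e. the energy with each hidden unit replaced by its conditional mean.
    """
    pre = c + v @ W                                            # hidden pre-activations
    neg_free_energy = b @ v + np.sum(np.logaddexp(0.0, pre))   # softplus sum
    h_mean = 1.0 / (1.0 + np.exp(-pre))                        # sigmoid
    neg_expected_energy = b @ v + pre @ h_mean
    return neg_free_energy, neg_expected_energy
```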

  1. Multiobjective Reinforcement Learning for Traffic Signal Control Using Vehicular Ad Hoc Network

    Directory of Open Access Journals (Sweden)

    Houli Duan

    2010-01-01

    Full Text Available We propose a new multiobjective control algorithm based on reinforcement learning for urban traffic signal control, named multi-RL. A multiagent structure is used to describe the traffic system. A vehicular ad hoc network is used for the data exchange among agents. A reinforcement learning algorithm is applied to predict the overall value of the optimization objective given vehicles' states. The policy that minimizes the cumulative value of the optimization objective is regarded as the optimal one. In order to make the method adaptive to various traffic conditions, we also introduce a multiobjective control scheme in which the optimization objective is selected adaptively according to real-time traffic states. The optimization objectives include the number of vehicle stops, the average waiting time, and the maximum queue length of the next intersection. In addition, our model also provides priority control for buses and emergency vehicles. The simulation results indicate that our algorithm performs more efficiently than traditional traffic light control methods.

  2. Deep Reinforcement Learning: An Overview

    OpenAIRE

    Li, Yuxi

    2017-01-01

    We give an overview of recent exciting achievements of deep reinforcement learning (RL). We discuss six core elements, six important mechanisms, and twelve applications. We start with background of machine learning, deep learning and reinforcement learning. Next we discuss core RL elements, including value function, in particular, Deep Q-Network (DQN), policy, reward, model, planning, and exploration. After that, we discuss important mechanisms for RL, including attention and memory, unsuperv...

  3. Learning to Run challenge solutions: Adapting reinforcement learning methods for neuromusculoskeletal environments

    OpenAIRE

    Kidziński, Łukasz; Mohanty, Sharada Prasanna; Ong, Carmichael; Huang, Zhewei; Zhou, Shuchang; Pechenko, Anton; Stelmaszczyk, Adam; Jarosik, Piotr; Pavlov, Mikhail; Kolesnikov, Sergey; Plis, Sergey; Chen, Zhibo; Zhang, Zhizheng; Chen, Jiale; Shi, Jun

    2018-01-01

    In the NIPS 2017 Learning to Run challenge, participants were tasked with building a controller for a musculoskeletal model to make it run as fast as possible through an obstacle course. Top participants were invited to describe their algorithms. In this work, we present eight solutions that used deep reinforcement learning approaches, based on algorithms such as Deep Deterministic Policy Gradient, Proximal Policy Optimization, and Trust Region Policy Optimization. Many solutions use similar ...

  4. A Robust Cooperated Control Method with Reinforcement Learning and Adaptive H∞ Control

    Science.gov (United States)

    Obayashi, Masanao; Uchiyama, Shogo; Kuremoto, Takashi; Kobayashi, Kunikazu

    This study proposes a robust cooperated control method combining reinforcement learning with robust control. A remarkable characteristic of reinforcement learning is that it does not require a model formula; however, it does not guarantee the stability of the system. On the other hand, a robust control system guarantees stability and robustness, but it requires a model formula. We employ both the actor-critic method, a kind of reinforcement learning that controls continuous-valued actions with a minimal amount of computation, and traditional robust control, that is, H∞ control. The proposed method was compared with the conventional control method (the actor-critic method alone) through computer simulation of controlling the angle and position of a crane system, and the simulation results showed the effectiveness of the proposed method.

  5. Neural Basis of Reinforcement Learning and Decision Making

    Science.gov (United States)

    Lee, Daeyeol; Seo, Hyojung; Jung, Min Whan

    2012-01-01

    Reinforcement learning is an adaptive process in which an animal utilizes its previous experience to improve the outcomes of future choices. Computational theories of reinforcement learning play a central role in the newly emerging areas of neuroeconomics and decision neuroscience. In this framework, actions are chosen according to their value functions, which describe how much future reward is expected from each action. Value functions can be adjusted not only through reward and penalty, but also by the animal’s knowledge of its current environment. Studies have revealed that a large proportion of the brain is involved in representing and updating value functions and using them to choose an action. However, how the nature of a behavioral task affects the neural mechanisms of reinforcement learning remains incompletely understood. Future studies should uncover the principles by which different computational elements of reinforcement learning are dynamically coordinated across the entire brain. PMID:22462543

  6. Instance-based Policy Learning by Real-coded Genetic Algorithms and Its Application to Control of Nonholonomic Systems

    Science.gov (United States)

    Miyamae, Atsushi; Sakuma, Jun; Ono, Isao; Kobayashi, Shigenobu

    The stabilization control of nonholonomic systems has been extensively studied because it is essential for nonholonomic robot control problems. The difficulty in this problem is that the theoretical derivation of a control policy is not guaranteed to be achievable. In this paper, we present a reinforcement learning (RL) method with instance-based policy (IBP) representation, in which control policies for this class are optimized with respect to user-defined cost functions. Direct policy search (DPS) is an approach to RL; the policy is represented by parametric models and the model parameters are directly searched by optimization techniques, including genetic algorithms (GAs). In IBP representation, an instance consists of a state-action pair, and a policy consists of a set of instances. Several DPSs with IBP have been proposed previously; these methods sometimes fail to obtain optimal control policies when the state-action variables are continuous. In this paper, we present a real-coded GA for DPSs with IBP. Our method is specifically designed for continuous domains. Optimization of IBP poses three difficulties: high dimensionality, epistasis, and multi-modality. Our solution is designed to overcome these difficulties. Policy search with IBP representation appears to be a high-dimensional optimization; however, the instances that can improve the fitness are often limited to active instances (instances used for the evaluation), and in fact the number of active instances is small. Therefore, we treat the search problem as a low-dimensional problem by restricting the search variables to active instances only. It is commonly known that functions with epistasis can be efficiently optimized with crossovers that satisfy the inheritance of statistics. For efficient search of IBPs, we propose an extended crossover-like mutation (extended XLM) which generates a new instance around an existing instance while satisfying the inheritance of statistics. For overcoming multi-modality, we

  7. A fuzzy controller with a robust learning function

    International Nuclear Information System (INIS)

    Tanji, Jun-ichi; Kinoshita, Mitsuo

    1987-01-01

    A self-organizing fuzzy controller is able to use linguistic decision rules of control strategy and has a strong adaptive property by virtue of its rule-learning function. While the simple linguistic description of the learning algorithm first introduced by Procyk et al. has much flexibility for applications to a wide range of different processes, its detailed formulation, in particular with regard to control stability and learning process convergence, is not clear. In this paper, we describe the formulation of an analytical basis for a self-organizing fuzzy controller by using a method of model reference adaptive control systems (MRACS) for which stability in the adaptive loop is theoretically proven. A detailed formulation is described regarding performance evaluation and rule modification in the rule-learning process of the controller. Furthermore, an improved learning algorithm using an adaptive rule is proposed. An adaptive rule gives a modification coefficient for a rule change, estimating the effect of disturbance occurrence in performance evaluation. The effect of introducing an adaptive rule to improve the learning convergence is described by using a simple iterative formulation. Simulation tests are presented for an application of the proposed self-organizing fuzzy controller to the pressure control system in a Boiling Water Reactor (BWR) plant. Results of the tests confirm that the improved learning algorithm has strong convergence properties, even in a very disturbed environment. (author)

  8. GA-based fuzzy reinforcement learning for control of a magnetic bearing system.

    Science.gov (United States)

    Lin, C T; Jou, C P

    2000-01-01

    This paper proposes a TD (temporal difference) and GA (genetic algorithm)-based reinforcement (TDGAR) learning method and applies it to the control of a real magnetic bearing system. The TDGAR learning scheme is a new hybrid GA, which integrates the TD prediction method and the GA to perform the reinforcement learning task. The TDGAR learning system is composed of two integrated feedforward networks. One neural network acts as a critic network to guide the learning of the other network (the action network) which determines the outputs (actions) of the TDGAR learning system. The action network can be a normal neural network or a neural fuzzy network. Using the TD prediction method, the critic network can predict the external reinforcement signal and provide a more informative internal reinforcement signal to the action network. The action network uses the GA to adapt itself according to the internal reinforcement signal. The key concept of the TDGAR learning scheme is to formulate the internal reinforcement signal as the fitness function for the GA such that the GA can evaluate the candidate solutions (chromosomes) regularly, even during periods without external feedback from the environment. This enables the GA to proceed to new generations regularly without waiting for the arrival of the external reinforcement signal. This can usually accelerate the GA learning since a reinforcement signal may only be available at a time long after a sequence of actions has occurred in the reinforcement learning problem. The proposed TDGAR learning system has been used to control an active magnetic bearing (AMB) system in practice. A systematic design procedure is developed to achieve successful integration of all the subsystems including magnetic suspension, mechanical structure, and controller training. The results show that the TDGAR learning scheme can successfully find a neural controller or a neural fuzzy controller for a self-designed magnetic bearing system.
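
    The key mechanism, formulating the critic's prediction as the GA's fitness, can be sketched as a temporal-difference signal: the critic's prediction error serves as internal reinforcement even on steps without external feedback. The function below is an illustrative reading of that idea, not the paper's exact formulation.

```python
def internal_reinforcement(critic, s, s_next, r_ext, gamma=0.9):
    """TD prediction error used as a fitness signal for the GA (sketch).

    The critic predicts the external reinforcement, so its temporal-difference
    error gives the action network an informative signal even on steps where
    r_ext is zero, letting the GA advance between external rewards.
    """
    return r_ext + gamma * critic(s_next) - critic(s)
```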

  9. Learning to reach by reinforcement learning using a receptive field based function approximation approach with continuous actions.

    Science.gov (United States)

    Tamosiunaite, Minija; Asfour, Tamim; Wörgötter, Florentin

    2009-03-01

    Reinforcement learning methods can be used in robotics applications, especially for specific target-oriented problems, for example the reward-based recalibration of goal-directed actions. To this end, still relatively large and continuous state-action spaces need to be handled efficiently. The goal of this paper is, thus, to develop a novel, rather simple method which uses reinforcement learning with function approximation in conjunction with different reward strategies for solving such problems. For the testing of our method, we use a four degree-of-freedom reaching problem in 3D space simulated by a two-joint robot arm system with two DOF each. Function approximation is based on 4D, overlapping kernels (receptive fields), and the state-action space contains about 10,000 of these. Different types of reward structures are compared, for example, reward-on-touching-only against reward-on-approach. Furthermore, forbidden joint configurations are punished. A continuous action space is used. In spite of the rather large number of states and the continuous action space, these reward/punishment strategies allow the system to find a good solution usually within about 20 trials. The efficiency of our method demonstrated in this test scenario suggests that it might be possible to use it on a real robot for problems where mixed rewards can be defined in situations where other types of learning might be difficult.
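
    A sketch of the kind of receptive-field approximation described: overlapping Gaussian kernels over the joint state-action space, with a SARSA-style TD update on the kernel weights. The normalization, update rule, and parameter values are assumptions, not the paper's exact design.

```python
import numpy as np

def gaussian_features(x, centers, width):
    """Overlapping Gaussian receptive fields over the joint state-action space."""
    d2 = np.sum((centers - x) ** 2, axis=1)
    act = np.exp(-d2 / (2.0 * width ** 2))
    return act / (act.sum() + 1e-9)        # normalized activations

def q_value(s, a, w, centers, width):
    x = np.concatenate([s, a])             # joint state-action input
    return gaussian_features(x, centers, width) @ w

def td_update(w, s, a, r, s2, a2, centers, width, alpha=0.2, gamma=0.98):
    """SARSA-style TD step on the kernel weights."""
    phi = gaussian_features(np.concatenate([s, a]), centers, width)
    target = r + gamma * q_value(s2, a2, w, centers, width)
    return w + alpha * (target - phi @ w) * phi
```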

  10. A Plant Control Technology Using Reinforcement Learning Method with Automatic Reward Adjustment

    Science.gov (United States)

    Eguchi, Toru; Sekiai, Takaaki; Yamada, Akihiro; Shimizu, Satoru; Fukai, Masayuki

    A control technology using Reinforcement Learning (RL) and a Radial Basis Function (RBF) Network has been developed to reduce the environmental load substances exhausted from power and industrial plants. This technology consists of a statistical model using an RBF Network, which estimates the characteristics of plants with respect to environmental load substances, and an RL agent, which learns the control logic for the plants using the statistical model. In this technology, it is necessary to design an appropriate reward function, given to the agent according to operating conditions and control goals, in order to control plants flexibly. Therefore, we propose an automatic reward adjusting method of RL for plant control. This method adjusts the reward function automatically using information from the statistical model obtained in its learning process. In the simulations, it is confirmed that the proposed method can adjust the reward function adaptively for several test functions, and executes robust control of a thermal power plant considering changes in operating conditions and control goals.

  11. Optimal control in microgrid using multi-agent reinforcement learning.

    Science.gov (United States)

    Li, Fu-Dong; Wu, Min; He, Yong; Chen, Xin

    2012-11-01

    This paper presents an improved reinforcement learning method to minimize electricity costs on the premise of satisfying the power balance and generation limits of units in a microgrid with grid-connected mode. Firstly, the microgrid control requirements are analyzed and the objective function of optimal control for the microgrid is proposed. Then, a state variable "Average Electricity Price Trend", which expresses the most probable transitions of the system, is developed so as to reduce the complexity and randomness of the microgrid, and a multi-agent architecture including agents, state variables, action variables and a reward function is formulated. Furthermore, dynamic hierarchical reinforcement learning, based on the change rate of a key state variable, is established to carry out optimal policy exploration. The analysis shows that the proposed method is beneficial for handling the problem of the "curse of dimensionality" and speeds up learning in an unknown large-scale world. Finally, the simulation results under JADE (Java Agent Development Framework) demonstrate the validity of the presented method in optimal control for a microgrid with grid-connected mode. Copyright © 2012 ISA. Published by Elsevier Ltd. All rights reserved.

  12. Q-learning-based adjustable fixed-phase quantum Grover search algorithm

    International Nuclear Information System (INIS)

    Guo Ying; Shi Wensha; Wang Yijun; Hu, Jiankun

    2017-01-01

    We demonstrate that the rotation phase can be suitably chosen to increase the efficiency of the phase-based quantum search algorithm, leading to a dynamic balance between iterations and success probabilities of the fixed-phase quantum Grover search algorithm with Q-learning for a given number of solutions. In this search algorithm, the proposed Q-learning algorithm, which is in essence a model-free reinforcement learning strategy, is used to perform a matching algorithm based on the fraction of marked items λ and the rotation phase α. After establishing the policy function α = π(λ), we complete the fixed-phase Grover algorithm, where the phase parameter is selected via the learned policy. Simulation results show that the Q-learning-based Grover search algorithm (QLGA) requires fewer iterations and yields higher success probabilities. Compared with conventional Grover algorithms, it avoids locally optimal situations, thereby enabling success probabilities to approach one. (author)

  13. Use of frontal lobe hemodynamics as reinforcement signals to an adaptive controller.

    Directory of Open Access Journals (Sweden)

    Marcello M DiStasio

    Full Text Available Decision-making ability in the frontal lobe (among other brain structures) relies on the assignment of value to states of the animal and its environment. Then higher valued states can be pursued and lower (or negative) valued states avoided. The same principle forms the basis for computational reinforcement learning controllers, which have been fruitfully applied both as models of value estimation in the brain, and as artificial controllers in their own right. This work shows how state desirability signals decoded from frontal lobe hemodynamics, as measured with near-infrared spectroscopy (NIRS), can be applied as reinforcers to an adaptable artificial learning agent in order to guide its acquisition of skills. A set of experiments carried out on an alert macaque demonstrate that both oxy- and deoxyhemoglobin concentrations in the frontal lobe show differences in response to both primarily and secondarily desirable (versus undesirable) stimuli. This difference allows a NIRS signal classifier to serve successfully as a reinforcer for an adaptive controller performing a virtual tool-retrieval task. The agent's adaptability allows its performance to exceed the limits of the NIRS classifier decoding accuracy. We also show that decoding state desirabilities is more accurate when using relative concentrations of both oxyhemoglobin and deoxyhemoglobin, rather than either species alone.

  14. Reinforcement learning for optimal control of low exergy buildings

    International Nuclear Information System (INIS)

    Yang, Lei; Nagy, Zoltan; Goffin, Philippe; Schlueter, Arno

    2015-01-01

    Highlights: • Implementation of reinforcement learning control for LowEx Building systems. • Learning allows adaptation to local environment without prior knowledge. • Presentation of reinforcement learning control for real-life applications. • Discussion of the applicability for real-life situations. - Abstract: Over a third of the anthropogenic greenhouse gas (GHG) emissions stem from cooling and heating buildings, due to their fossil fuel based operation. Low exergy building systems are a promising approach to reduce energy consumption as well as GHG emissions. They consist of renewable energy technologies, such as PV, PV/T and heat pumps. Since careful tuning of parameters is required, a manual setup may result in sub-optimal operation. A model predictive control approach is unnecessarily complex due to the required model identification. Therefore, in this work we present a reinforcement learning control (RLC) approach. The studied building consists of a PV/T array for solar heat and electricity generation, as well as geothermal heat pumps. We present RLC for the PV/T array, and for the full building model. Two methods, Tabular Q-learning and Batch Q-learning with Memory Replay, are implemented with real building settings and actual weather conditions in a Matlab/Simulink framework. The performance is evaluated against standard rule-based control (RBC). We investigated different neural network structures and found that some outperformed RBC already during the learning phase. Overall, every RLC strategy for PV/T outperformed RBC by over 10% after the third year. Likewise, for the full building, RLC outperforms RBC in terms of meeting the heating demand, maintaining the optimal operation temperature and compensating more effectively for ground heat. This allows engineering costs associated with the setup of these systems to be reduced, and the return-on-investment period to be decreased, both of which are necessary to create a sustainable, zero-emission building

  15. Two-Phase Iteration for Value Function Approximation and Hyperparameter Optimization in Gaussian-Kernel-Based Adaptive Critic Design

    Directory of Open Access Journals (Sweden)

    Xin Chen

    2015-01-01

    Full Text Available Adaptive Dynamic Programming (ADP) with critic-actor architecture is an effective way to perform online learning control. To avoid the subjectivity in the design of a neural network that serves as a critic network, kernel-based adaptive critic design (ACD) was developed recently. There are two essential issues for a static kernel-based model: how to determine proper hyperparameters in advance and how to select the right samples to describe the value function. Both rely on the assessment of sample values. Based on theoretical analysis, this paper presents a two-phase simultaneous learning method for a Gaussian-kernel-based critic network. It is able to estimate the values of samples without infinitely revisiting them, and the hyperparameters of the kernel model are optimized simultaneously. Based on the estimated sample values, the sample set can be refined by adding alternatives or deleting redundancies. Combining this critic design with an actor network, we present a Gaussian-kernel-based Adaptive Dynamic Programming (GK-ADP) approach. Simulations are used to verify its feasibility, particularly the necessity of two-phase learning, the convergence characteristics, and the improvement of system performance by using a varying sample set.

  16. Barrier Function-Based Neural Adaptive Control With Locally Weighted Learning and Finite Neuron Self-Growing Strategy.

    Science.gov (United States)

    Jia, Zi-Jun; Song, Yong-Duan

    2017-06-01

    This paper presents a new approach to constructing neural adaptive control for uncertain nonaffine systems. By integrating locally weighted learning with a barrier Lyapunov function (BLF), a novel control design method is presented that systematically addresses two critical issues in the neural network (NN) control field: how to fulfill the compact-set precondition for NN approximation, and how to use a varying rather than a fixed NN structure to improve the functionality of NN control. A BLF is exploited to ensure that the NN inputs remain bounded during the entire system operation. To account for system nonlinearities, a neuron self-growing strategy is proposed to guide the process of adding new neurons to the system, resulting in a self-adjustable NN structure for better learning capabilities. It is shown that the number of neurons needed to accomplish the control task is finite, and that better performance can be obtained with fewer neurons as compared with traditional methods. The salient feature of the proposed method also lies in the continuity of the control action everywhere. Furthermore, the resulting control action is smooth almost everywhere except for a few time instants at which new neurons are added. A numerical example illustrates the effectiveness of the proposed approach.

  17. Reinforcement learning or active inference?

    Science.gov (United States)

    Friston, Karl J; Daunizeau, Jean; Kiebel, Stefan J

    2009-07-29

    This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and sampling of the environment to minimize their free-energy. Such agents learn causal structure in the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming; namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain.

  18. Reinforcement learning or active inference?

    Directory of Open Access Journals (Sweden)

    Karl J Friston

    2009-07-01

    Full Text Available This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and sampling of the environment to minimize their free-energy. Such agents learn causal structure in the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming; namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain.

  19. Dynamic Sensor Tasking for Space Situational Awareness via Reinforcement Learning

    Science.gov (United States)

    Linares, R.; Furfaro, R.

    2016-09-01

    This paper studies the Sensor Management (SM) problem for optical Space Object (SO) tracking. The tasking problem is formulated as a Markov Decision Process (MDP) and solved using Reinforcement Learning (RL). The RL problem is solved using the actor-critic policy gradient approach. The actor provides a policy which is random over actions and given by a parametric probability density function (pdf). The critic evaluates the policy by calculating the estimated total reward or the value function for the problem. The parameters of the policy action pdf are optimized using gradients with respect to the reward function. Both the critic and the actor are modeled using deep neural networks (multi-layer neural networks). The policy neural network takes the current state as input and outputs probabilities for each possible action. This policy is random, and can be evaluated by sampling random actions using the probabilities determined by the policy network's outputs. The critic approximates the total reward using a neural network. The estimated total reward is used to approximate the gradient of the policy network with respect to the network parameters. This approach is used to find the non-myopic optimal policy for tasking optical sensors to estimate SO orbits. The reward function is based on reducing the uncertainty for the overall catalog to below a user-specified uncertainty threshold. This work uses a 30 km total position error for the uncertainty threshold, and provides the RL method with a negative reward as long as any SO has a total position error above that threshold. This penalizes policies that take longer to achieve the desired accuracy. A positive reward is provided when all SOs are below the catalog uncertainty threshold. An optimal policy is sought that takes actions to achieve the desired catalog uncertainty in minimum time. This work trains the policy in simulation by letting it task a single sensor to "learn" from its performance
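
    The actor-critic policy-gradient update described above can be sketched as follows, with a linear-softmax actor and a linear critic baseline standing in for the paper's deep networks; feature vectors and learning rates are illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def actor_critic_update(theta, w, episode, gamma=0.99, lr=0.01):
    """One policy-gradient update from a finished episode.

    episode: list of (phi_s, action, reward); phi_s is a state feature vector.
    theta:   actor parameters, shape (n_features, n_actions)
    w:       critic parameters, shape (n_features,)
    """
    G = 0.0
    for phi_s, a, r in reversed(episode):
        G = r + gamma * G                      # return-to-go
        probs = softmax(phi_s @ theta)
        advantage = G - phi_s @ w              # critic value as a baseline
        grad_log = -np.outer(phi_s, probs)     # d log pi(a|s) / d theta
        grad_log[:, a] += phi_s
        theta = theta + lr * advantage * grad_log
        w = w + lr * advantage * phi_s         # move the critic toward G
    return theta, w
```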

  20. Multi-agent machine learning a reinforcement approach

    CERN Document Server

    Schwartz, H M

    2014-01-01

    The book begins with a chapter on traditional methods of supervised learning, covering recursive least squares learning, mean square error methods, and stochastic approximation. Chapter 2 covers single agent reinforcement learning. Topics include learning value functions, Markov games, and TD learning with eligibility traces. Chapter 3 discusses two player games including two player matrix games with both pure and mixed strategies. Numerous algorithms and examples are presented. Chapter 4 covers learning in multi-player games, stochastic games, and Markov games, focusing on learning multi-pla

  1. A Reinforcement Learning Approach to Call Admission Control in HAPS Communication System

    Directory of Open Access Journals (Sweden)

    Ni Shu Yan

    2017-01-01

    Full Text Available The large changes in link capacity and number of users caused by the movement of both platform and users in a communication system based on a high altitude platform station (HAPS) result in a high handover dropping rate and reduced resource utilization. In order to solve these problems, this paper proposes an adaptive call admission control strategy based on a reinforcement learning approach. The goal of this strategy is to maximize the long-term gains of the system, with the introduction of cross-layer interaction and service downgrading. In order to admit different traffic types adaptively, the access utility of handover traffic and new call traffic is designed for different states of the communication system. Numerical simulation results show that the proposed call admission control strategy can enhance bandwidth resource utilization and the performance of handover traffic.
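
    As a hedged illustration of the learning loop, the sketch below applies tabular Q-learning to admission decisions; the action set, reward function, and state transition are hypothetical placeholders for the paper's access-utility design.

```python
import random
from collections import defaultdict

ACTIONS = ("accept", "downgrade", "reject")     # hypothetical action set

def admission_step(Q, state, reward_fn, next_state_fn,
                   alpha=0.1, gamma=0.9, eps=0.1):
    """One Q-learning step for call admission control.

    Q: defaultdict(float) keyed by (state, action); reward_fn and
    next_state_fn are placeholders for the system's utility and dynamics.
    """
    if random.random() < eps:
        a = random.choice(ACTIONS)              # explore
    else:
        a = max(ACTIONS, key=lambda x: Q[(state, x)])
    r = reward_fn(state, a)                     # proxy for long-term gain
    s2 = next_state_fn(state, a)
    best = max(Q[(s2, x)] for x in ACTIONS)
    Q[(state, a)] += alpha * (r + gamma * best - Q[(state, a)])
    return s2
```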

  2. Adaptive Constrained Optimal Control Design for Data-Based Nonlinear Discrete-Time Systems With Critic-Only Structure.

    Science.gov (United States)

    Luo, Biao; Liu, Derong; Wu, Huai-Ning

    2018-06-01

    Reinforcement learning has proved to be a powerful tool for solving optimal control problems over the past few years. However, the data-based constrained optimal control problem for nonaffine nonlinear discrete-time systems has rarely been studied. To solve this problem, an adaptive optimal control approach is developed by using value iteration-based Q-learning (VIQL) with a critic-only structure. Most existing constrained control methods require the use of a certain performance index and suit only linear or affine nonlinear systems, which is unreasonable in practice. To overcome this problem, a system transformation is first introduced with a general performance index, and the constrained optimal control problem is converted to an unconstrained optimal control problem. By introducing the action-state value function, i.e., the Q-function, the VIQL algorithm is proposed to learn the optimal Q-function of the data-based unconstrained optimal control problem. The convergence results of the VIQL algorithm are established with an easy-to-realize initial condition. To implement the VIQL algorithm, the critic-only structure is developed, where only one neural network is required to approximate the Q-function. The converged Q-function obtained from the critic-only VIQL method is employed to design the adaptive constrained optimal controller based on the gradient descent scheme. Finally, the effectiveness of the developed adaptive control method is tested on three examples via computer simulation.
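
    A minimal tabular stand-in for the value-iteration-based Q-learning (VIQL) idea: starting from Q = 0 (an easy-to-realize initial condition), sweep repeatedly over a fixed batch of system data and apply full backups. The paper uses a neural critic rather than a table, and this sketch assumes the batch represents deterministic transitions.

```python
from collections import defaultdict

def vi_q_learning(samples, actions, gamma=0.95, sweeps=50):
    """Value-iteration-based Q-learning on a fixed batch of system data.

    samples: list of (s, a, r, s_next) tuples collected from the plant.
    Assumes each (s, a) appears with a deterministic outcome; duplicate
    pairs simply overwrite each other in this toy version.
    """
    Q = defaultdict(float)                          # Q_0 = 0 initial condition
    for _ in range(sweeps):
        Q_new = defaultdict(float)
        for s, a, r, s_next in samples:
            best_next = max(Q[(s_next, a2)] for a2 in actions)
            Q_new[(s, a)] = r + gamma * best_next   # full backup
        Q = Q_new
    return Q
```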

  3. Reinforcement Learning Based Novel Adaptive Learning Framework for Smart Grid Prediction

    Directory of Open Access Journals (Sweden)

    Tian Li

    2017-01-01

    Full Text Available The smart grid is a promising infrastructure for supplying electricity to end users in a safe and reliable manner. With the rapid increase of the share of renewable energy and controllable loads in the smart grid, its operational uncertainty has grown considerably in recent years. Accurate forecasting underpins the safe and economic operation of the smart grid, yet most existing forecasting methods are ill-suited to it because they cannot adapt to varying operational conditions. In this paper, reinforcement learning is first exploited to develop an online learning framework for the smart grid. With its capability of multi-time-scale resolution, a wavelet neural network is adopted within the online learning framework to yield a reinforcement learning and wavelet neural network (RLWNN) based adaptive learning scheme. Simulations on two typical prediction problems in the smart grid, wind power prediction and load forecasting, validate the effectiveness and scalability of the proposed RLWNN-based learning framework and algorithm.

  4. Reinforcement learning for dpm of embedded visual sensor nodes

    International Nuclear Information System (INIS)

    Khani, U.; Sadhayo, I. H.

    2014-01-01

    This paper proposes an RL (Reinforcement Learning)-based DPM (Dynamic Power Management) technique to learn timeout policies during the operation of a visual sensor node that has multiple power/performance states. As opposed to the widely used static timeout policies, our proposed DPM policy, also referred to as OLTP (Online Learning of Timeout Policies), learns to dynamically change the timeout decisions in the different node states, including the non-operational states. The selection of timeout values in the different power/performance states of the visual sensing platform is based on workload estimates derived from an ML-ANN (Multi-Layer Artificial Neural Network) and an objective function given by weighted performance and power parameters. The DPM approach is also able to dynamically adjust the power-performance weights online to satisfy a given constraint on either power consumption or performance. Results show that the proposed learning algorithm explores the power-performance tradeoff under non-stationary workload and outperforms other DPM policies. It also performs online adjustment of the tradeoff parameters in order to meet a user-specified constraint. (author)

  5. Probabilistic dual heuristic programming-based adaptive critic

    Science.gov (United States)

    Herzallah, Randa

    2010-02-01

    Adaptive critic (AC) methods have common roots as generalisations of dynamic programming for neural reinforcement learning approaches. Since they approximate the dynamic programming solutions, they are potentially suitable for learning in noisy, non-linear and non-stationary environments. In this study, a novel probabilistic dual heuristic programming (DHP)-based AC controller is proposed. Distinct from current approaches, the proposed probabilistic DHP AC method takes uncertainties of the forward model and inverse controller into consideration. Therefore, it is suitable for deterministic and stochastic control problems characterised by functional uncertainty. Theoretical development of the proposed method is validated by analytically evaluating the correct value of the cost function which satisfies the Bellman equation in a linear quadratic control problem. The target value of the probabilistic critic network is then calculated and shown to be equal to the analytically derived correct value. Full derivation of the Riccati solution for this non-standard stochastic linear quadratic control problem is also provided. Moreover, the performance of the proposed probabilistic controller is demonstrated on linear and non-linear control examples.

  6. Visual reinforcement shapes eye movements in visual search.

    Science.gov (United States)

    Paeye, Céline; Schütz, Alexander C; Gegenfurtner, Karl R

    2016-08-01

    We use eye movements to gain information about our visual environment; this information can indirectly be used to affect the environment. Whereas eye movements are affected by explicit rewards such as points or money, it is not clear whether the information gained by finding a hidden target has a similar reward value. Here we tested whether finding a visual target can reinforce eye movements in visual search performed in a noise background, which conforms to natural scene statistics and contains a large number of possible target locations. First we tested whether presenting the target more often in one specific quadrant would modify eye movement search behavior. Surprisingly, participants did not learn to search for the target more often in high probability areas. Presumably, participants could not learn the reward structure of the environment. In two subsequent experiments we used a gaze-contingent display to gain full control over the reinforcement schedule. The target was presented more often after saccades into a specific quadrant or a specific direction. The proportions of saccades meeting the reinforcement criteria increased considerably, and participants matched their search behavior to the relative reinforcement rates of targets. Reinforcement learning seems to serve as the mechanism to optimize search behavior with respect to the statistics of the task.

  7. Ensemble Network Architecture for Deep Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Xi-liang Chen

    2018-01-01

    Full Text Available The popular deep Q-learning algorithm is known to be unstable because of oscillation and overestimation of action values under certain conditions, and these issues tend to adversely affect performance. In this paper, we develop an ensemble network architecture for deep reinforcement learning which is based on value function approximation. The temporal ensemble stabilizes the training process by reducing the variance of the target approximation error, and the ensemble of target values reduces overestimation and yields better performance by estimating more accurate Q-values. Our results show that this architecture leads to statistically significantly better value evaluation and more stable and better performance on several classical control tasks in the OpenAI Gym environment.
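
    The overestimation problem that the ensemble of target values mitigates can be demonstrated numerically: taking the max over noisy Q-estimates is biased upward, and averaging several independent estimates before the max shrinks that bias. A toy sketch with synthetic numbers, not the paper's network architecture:

```python
import numpy as np

rng = np.random.default_rng(1)
true_q = np.zeros(10)          # ten actions, all truly worth zero
K, trials = 5, 10000           # K estimators in the ensemble

single, ensemble = [], []
for _ in range(trials):
    noisy = true_q + rng.normal(0.0, 1.0, size=(K, 10))  # K noisy estimators
    single.append(noisy[0].max())                # one net: max picks up noise
    ensemble.append(noisy.mean(axis=0).max())    # averaged targets: less bias

print(f"single-estimate target bias: {np.mean(single):+.3f}")
print(f"ensemble target bias:        {np.mean(ensemble):+.3f}")
```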

  8. Human-level control through deep reinforcement learning

    Science.gov (United States)

    Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A.; Veness, Joel; Bellemare, Marc G.; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K.; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis

    2015-02-01

    The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
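
    The stabilizing trick at the core of this approach, computing bootstrapped targets from replayed experience against a separate, periodically updated target network, can be sketched as follows. This is a schematic under assumed interfaces, not DeepMind's code; q_target_net is a placeholder callable:

```python
import numpy as np

def dqn_targets(batch, q_target_net, gamma=0.99):
    """Compute y_i = r + gamma * max_a' Q_target(s', a') for a replay batch.

    batch: iterable of (state, action, reward, next_state, done) tuples.
    q_target_net: callable mapping a state to a vector of action values.
    """
    targets = []
    for s, a, r, s2, done in batch:
        bootstrap = 0.0 if done else gamma * float(np.max(q_target_net(s2)))
        targets.append(r + bootstrap)
    return np.array(targets)

# Tiny demonstration with a stand-in target network (all-zero Q-values).
# The online network would then be regressed toward these targets on the
# stored (s, a) pairs, with the target network refreshed periodically.
fake_target_net = lambda s: np.zeros(4)
replay = [(None, 0, 1.0, None, False), (None, 1, 0.5, None, True)]
print(dqn_targets(replay, fake_target_net))   # -> [1.  0.5]
```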

  10. Optimal medication dosing from suboptimal clinical examples: a deep reinforcement learning approach.

    Science.gov (United States)

    Nemati, Shamim; Ghassemi, Mohammad M; Clifford, Gari D

    2016-08-01

    Misdosing medications with sensitive therapeutic windows, such as heparin, can place patients at unnecessary risk, increase length of hospital stay, and lead to wasted hospital resources. In this work, we present a clinician-in-the-loop sequential decision making framework, which provides an individualized dosing policy adapted to each patient's evolving clinical phenotype. We employed retrospective data from the publicly available MIMIC II intensive care unit database, and developed a deep reinforcement learning algorithm that learns an optimal heparin dosing policy from sample dosing trials and their associated outcomes in large electronic medical records. Using separate training and testing datasets, our model was observed to be effective in proposing heparin doses that resulted in better expected outcomes than the clinical guidelines. Our results demonstrate that a sequential modeling approach, learned from retrospective data, could potentially be used at the bedside to derive individualized patient dosing policies.

  11. Discrete-time online learning control for a class of unknown nonaffine nonlinear systems using reinforcement learning.

    Science.gov (United States)

    Yang, Xiong; Liu, Derong; Wang, Ding; Wei, Qinglai

    2014-07-01

    In this paper, a reinforcement-learning-based direct adaptive control is developed to deliver a desired tracking performance for a class of discrete-time (DT) nonlinear systems with unknown bounded disturbances. We investigate multi-input-multi-output unknown nonaffine nonlinear DT systems and employ two neural networks (NNs). By using the implicit function theorem, an action NN is used to generate the control signal; it is also designed to cancel the nonlinearity of the unknown DT systems, for the purpose of utilizing feedback linearization methods. On the other hand, a critic NN is applied to estimate the cost function, which satisfies the recursive equations derived from heuristic dynamic programming. The weights of both the action NN and the critic NN are directly updated online instead of by offline training. By utilizing Lyapunov's direct method, the closed-loop tracking errors and the NN estimated weights are demonstrated to be uniformly ultimately bounded. Two numerical examples are provided to show the effectiveness of the present approach.

  12. Structure identification in fuzzy inference using reinforcement learning

    Science.gov (United States)

    Berenji, Hamid R.; Khedkar, Pratap

    1993-01-01

    In our previous work on the GARIC architecture, we have shown that the system can start with surface structure of the knowledge base (i.e., the linguistic expression of the rules) and learn the deep structure (i.e., the fuzzy membership functions of the labels used in the rules) by using reinforcement learning. Assuming the surface structure, GARIC refines the fuzzy membership functions used in the consequents of the rules using a gradient descent procedure. This hybrid fuzzy logic and reinforcement learning approach can learn to balance a cart-pole system and to backup a truck to its docking location after a few trials. In this paper, we discuss how to do structure identification using reinforcement learning in fuzzy inference systems. This involves identifying both surface as well as deep structure of the knowledge base. The term set of fuzzy linguistic labels used in describing the values of each control variable must be derived. In this process, splitting a label refers to creating new labels which are more granular than the original label and merging two labels creates a more general label. Splitting and merging of labels directly transform the structure of the action selection network used in GARIC by increasing or decreasing the number of hidden layer nodes.

  13. Two-Phase Iteration for Value Function Approximation and Hyperparameter Optimization in Gaussian-Kernel-Based Adaptive Critic Design

    OpenAIRE

    Chen, Xin; Xie, Penghuan; Xiong, Yonghua; He, Yong; Wu, Min

    2015-01-01

    Adaptive Dynamic Programming (ADP) with critic-actor architecture is an effective way to perform online learning control. To avoid the subjectivity in the design of a neural network that serves as a critic network, kernel-based adaptive critic design (ACD) was developed recently. There are two essential issues for a static kernel-based model: how to determine proper hyperparameters in advance and how to select right samples to describe the value function. They all rely on the assessment of sa...

  14. Learning-Based Adaptive Optimal Tracking Control of Strict-Feedback Nonlinear Systems.

    Science.gov (United States)

    Gao, Weinan; Jiang, Zhong-Ping

    2018-06-01

    This paper proposes a novel data-driven control approach to address the problem of adaptive optimal tracking for a class of nonlinear systems taking the strict-feedback form. Adaptive dynamic programming (ADP) and nonlinear output regulation theories are integrated for the first time to compute an adaptive near-optimal tracker without any a priori knowledge of the system dynamics. Fundamentally different from adaptive optimal stabilization problems, the solution to a Hamilton-Jacobi-Bellman (HJB) equation, not necessarily a positive definite function, cannot be approximated through the existing iterative methods. This paper proposes a novel policy iteration technique for solving positive semidefinite HJB equations with rigorous convergence analysis. A two-phase data-driven learning method is developed and implemented online by ADP. The efficacy of the proposed adaptive optimal tracking control methodology is demonstrated via a Van der Pol oscillator with time-varying exogenous signals.

  15. A novel approach to locomotion learning: Actor-Critic architecture using central pattern generators and dynamic motor primitives.

    Science.gov (United States)

    Li, Cai; Lowe, Robert; Ziemke, Tom

    2014-01-01

    In this article, we propose an architecture of a bio-inspired controller that addresses the problem of learning different locomotion gaits for different robot morphologies. The modeling objective is split into two: baseline motion modeling and dynamics adaptation. Baseline motion modeling aims to achieve fundamental functions of a certain type of locomotion, and dynamics adaptation provides a "reshaping" function for adapting the baseline motion to the desired motion. Based on this assumption, a three-layer architecture is developed using central pattern generators (CPGs, a bio-inspired locomotor center for the baseline motion) and dynamic motor primitives (DMPs, a model with universal "reshaping" functions). In this article, we use this architecture with the actor-critic algorithms for finding a good "reshaping" function. In order to demonstrate the learning power of the actor-critic based architecture, we tested it on two experiments: (1) learning to crawl on a humanoid and (2) learning to gallop on a puppy robot. Two types of actor-critic algorithms (policy search and policy gradient) are compared in order to evaluate the advantages and disadvantages of different actor-critic based learning algorithms for different morphologies. Finally, based on the analysis of the experimental results, a generic view/architecture for locomotion learning is discussed in the conclusion.

  16. A Novel Approach to Locomotion Learning: Actor-Critic Architecture using Central Pattern Generators and Dynamic Motor Primitives

    Directory of Open Access Journals (Sweden)

    Cai eLi

    2014-10-01

    Full Text Available In this article, we propose an architecture of a bio-inspired controller that addresses the problem of learning different locomotion gaits for different robot morphologies. The modelling objective is split into two: baseline motion modelling and dynamics adaptation. Baseline motion modelling aims to achieve fundamental functions of a certain type of locomotion, and dynamics adaptation provides a "reshaping" function for adapting the baseline motion to the desired motion. Based on this assumption, a three-layer architecture is developed using central pattern generators (CPGs, a bio-inspired locomotor center for the baseline motion) and dynamic motor primitives (DMPs, a model with universal "reshaping" functions). In this article, we use this architecture with the actor-critic algorithms for finding a good "reshaping" function. In order to demonstrate the learning power of the actor-critic based architecture, we tested it on two experiments: (1) learning to crawl on a humanoid and (2) learning to gallop on a puppy robot. Two types of actor-critic algorithms (policy search and policy gradient) are compared in order to evaluate the advantages and disadvantages of different actor-critic based learning algorithms for different morphologies. Finally, based on the analysis of the experimental results, a generic view/architecture for locomotion learning is discussed in the conclusion.

  17. Policy Gradient Adaptive Dynamic Programming for Data-Based Optimal Control.

    Science.gov (United States)

    Luo, Biao; Liu, Derong; Wu, Huai-Ning; Wang, Ding; Lewis, Frank L

    2017-10-01

    The model-free optimal control problem of general discrete-time nonlinear systems is considered in this paper, and a data-based policy gradient adaptive dynamic programming (PGADP) algorithm is developed to design an adaptive optimal controller. By using offline and online data rather than a mathematical system model, the PGADP algorithm improves the control policy with a gradient descent scheme. The convergence of the PGADP algorithm is proved by demonstrating that the constructed Q-function sequence converges to the optimal Q-function. Based on the PGADP algorithm, the adaptive control method is developed with an actor-critic structure and the method of weighted residuals. Its convergence properties are analyzed, where the approximate Q-function converges to its optimum. Computer simulation results demonstrate the effectiveness of the PGADP-based adaptive control method.
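
    The gradient-descent policy improvement at the heart of PGADP can be illustrated with a softmax policy over a learned Q-function: the policy parameters move along the exact gradient of the expected Q-value. A minimal single-state sketch with made-up numbers; the paper's actor-critic structure and weighted-residuals machinery are omitted:

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

theta = np.zeros(3)                    # policy logits for one state
q_hat = np.array([0.2, 1.0, -0.5])     # hypothetical learned Q-values
lr = 0.1

for _ in range(50):
    pi = softmax(theta)
    # Exact gradient of E_{a~pi}[Q(s,a)] with respect to the logits:
    grad = pi * (q_hat - pi @ q_hat)
    theta += lr * grad                 # policy improvement by gradient ascent

print(softmax(theta))                  # probability mass shifts to action 1
```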

  18. Reinforcement-Learning-Based Robust Controller Design for Continuous-Time Uncertain Nonlinear Systems Subject to Input Constraints.

    Science.gov (United States)

    Liu, Derong; Yang, Xiong; Wang, Ding; Wei, Qinglai

    2015-07-01

    The design of a stabilizing controller for uncertain nonlinear systems with control constraints is a challenging problem. The constrained input, coupled with the inability to accurately identify the uncertainties, motivates the design of stabilizing controllers based on reinforcement-learning (RL) methods. In this paper, a novel RL-based robust adaptive control algorithm is developed for a class of continuous-time uncertain nonlinear systems subject to input constraints. The robust control problem is converted to a constrained optimal control problem by appropriately selecting value functions for the nominal system. Distinct from the typical actor-critic dual networks employed in RL, only one critic neural network (NN) is constructed to derive the approximate optimal control. Meanwhile, unlike the initial stabilizing control that is often indispensable in RL, no special requirement is imposed on the initial control. By utilizing Lyapunov's direct method, the closed-loop optimal control system and the estimated weights of the critic NN are proved to be uniformly ultimately bounded. In addition, the derived approximate optimal control is verified to guarantee that the uncertain nonlinear system is stable in the sense of uniform ultimate boundedness. Two simulation examples are provided to illustrate the effectiveness and applicability of the present approach.

  19. DYNAMIC AND INCREMENTAL EXPLORATION STRATEGY IN FUSION ADAPTIVE RESONANCE THEORY FOR ONLINE REINFORCEMENT LEARNING

    Directory of Open Access Journals (Sweden)

    Budhitama Subagdja

    2016-06-01

    Full Text Available One of the fundamental challenges in reinforcement learning is to set up a proper balance between exploration and exploitation to obtain the maximum cumulative reward in the long run. Most protocols for exploration bound the overall values to a convergent level of performance. If new knowledge is inserted or the environment suddenly changes, the issue becomes more intricate, as the exploration must be reconciled with the pre-existing knowledge. This paper presents a type of multi-channel adaptive resonance theory (ART) neural network model called fusion ART, which serves as a fuzzy approximator for reinforcement learning with inherent features that can regulate the exploration strategy. This intrinsic regulation is driven by the condition of the knowledge learnt so far by the agent. The model offers stable but incremental reinforcement learning that can incorporate prior rules as bootstrap knowledge for guiding the agent to select the right action. Experiments on obstacle avoidance and navigation tasks demonstrate that, in the configuration where the agent learns from scratch, the inherent exploration model in fusion ART is comparable to the basic ε-greedy policy. On the other hand, the model is demonstrated to deal with prior knowledge and strike a balance between exploration and exploitation.
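
    For reference, the baseline against which the fusion ART exploration model is compared is the familiar ε-greedy rule, sketched below with illustrative values (the ART model itself is not reproduced):

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """Explore uniformly with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))   # random exploratory action
    return int(np.argmax(q_values))               # current best action

rng = np.random.default_rng(0)
q_values = np.array([0.1, 0.5, 0.2])              # illustrative action values
action = epsilon_greedy(q_values, epsilon=0.1, rng=rng)
```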

  20. Continuous residual reinforcement learning for traffic signal control optimization

    NARCIS (Netherlands)

    Aslani, Mohammad; Seipel, Stefan; Wiering, Marco

    2018-01-01

    Traffic signal control can be naturally regarded as a reinforcement learning problem. Unfortunately, it is one of the most difficult classes of reinforcement learning problems owing to its large state space. A straightforward approach to address this challenge is to control traffic signals based on …

  1. Manufacturing Scheduling Using Colored Petri Nets and Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Maria Drakaki

    2017-02-01

    Full Text Available Agent-based intelligent manufacturing control systems are capable of responding and adapting efficiently to environmental changes. Manufacturing system adaptation and evolution can be addressed with learning mechanisms that increase the intelligence of agents. In this paper a manufacturing scheduling method is presented based on Timed Colored Petri Nets (CTPNs) and reinforcement learning (RL). CTPNs model the manufacturing system and implement the scheduling. In the search for an optimal solution, a scheduling agent uses RL, in particular the Q-learning algorithm. A warehouse order-picking scheduling problem is presented as a case study to illustrate the method. The proposed scheduling method is compared to existing methods. Simulation and state space results are used to evaluate performance and identify system properties.
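
    The Q-learning rule such a scheduling agent applies is the standard one-step temporal-difference update; a minimal sketch with hypothetical state and action sizes, omitting the CTPN simulation that would supply transitions and rewards:

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One-step Q-learning: move Q(s, a) toward the bootstrapped TD target."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

# Hypothetical sizes: states could encode marking/queue status, actions the
# dispatching choices; both are placeholders, not taken from the paper.
Q = np.zeros((10, 4))
Q = q_update(Q, s=0, a=2, r=1.0, s_next=3)
```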

  2. Discrete-Time Nonzero-Sum Games for Multiplayer Using Policy-Iteration-Based Adaptive Dynamic Programming Algorithms.

    Science.gov (United States)

    Zhang, Huaguang; Jiang, He; Luo, Chaomin; Xiao, Geyang

    2017-10-01

    In this paper, we investigate nonzero-sum games for a class of discrete-time (DT) nonlinear systems by using a novel policy iteration (PI) adaptive dynamic programming (ADP) method. The main idea of our proposed PI scheme is to utilize the iterative ADP algorithm to obtain iterative control policies that not only ensure that the system achieves stability but also minimize the performance index function for each player. This paper integrates game theory, optimal control theory, and reinforcement learning techniques to formulate and handle DT nonzero-sum games for multiple players. First, we design three actor-critic algorithms, an offline one and two online ones, for the PI scheme. Subsequently, neural networks are employed to implement these algorithms, and the corresponding stability analysis is provided via Lyapunov theory. Finally, a numerical simulation example is presented to demonstrate the effectiveness of our proposed approach.

  3. Reinforcement learning controller design for affine nonlinear discrete-time systems using online approximators.

    Science.gov (United States)

    Yang, Qinmin; Jagannathan, Sarangapani

    2012-04-01

    In this paper, reinforcement learning state- and output-feedback-based adaptive critic controller designs are proposed by using online approximators (OLAs) for general multi-input multi-output affine unknown nonlinear discrete-time systems in the presence of bounded disturbances. The proposed controller design has two entities: an action network that is designed to produce the optimal signal, and a critic network that evaluates the performance of the action network. The critic estimates the cost-to-go function, which is tuned online using recursive equations derived from heuristic dynamic programming. Here, neural networks (NNs) are used for both the action and the critic, whereas any OLAs, such as radial basis functions, splines, fuzzy logic, etc., can be utilized. For the output-feedback counterpart, an additional NN is designated as the observer to estimate the unavailable system states, and thus the separation principle is not required. The NN weight tuning laws for the controller schemes are also derived while ensuring uniform ultimate boundedness of the closed-loop system using Lyapunov theory. Finally, the effectiveness of the two controllers is tested in simulation on a pendulum balancing system and a two-link robotic arm system.

  4. Using Direct Policy Search to Identify Robust Strategies in Adapting to Uncertain Sea Level Rise and Storm Surge

    Science.gov (United States)

    Garner, G. G.; Keller, K.

    2017-12-01

    Sea-level rise poses considerable risks to coastal communities, ecosystems, and infrastructure. Decision makers are faced with deeply uncertain sea-level projections when designing a strategy for coastal adaptation. The traditional methods have provided tremendous insight into this decision problem, but are often silent on tradeoffs as well as the effects of tail-area events and of potential future learning. Here we reformulate a simple sea-level rise adaptation model to address these concerns. We show that Direct Policy Search yields improved solution quality, with respect to Pareto-dominance in the objectives, over the traditional approach under uncertain sea-level rise projections and storm surge. Additionally, the new formulation produces high quality solutions with less computational demands than the traditional approach. Our results illustrate the utility of multi-objective adaptive formulations for the example of coastal adaptation, the value of information provided by observations, and point to wider-ranging application in climate change adaptation decision problems.

  5. Reinforcement-learning-based output-feedback control of nonstrict nonlinear discrete-time systems with application to engine emission control.

    Science.gov (United States)

    Shih, Peter; Kaul, Brian C; Jagannathan, Sarangapani; Drallmeier, James A

    2009-10-01

    A novel reinforcement-learning-based output adaptive neural network (NN) controller, which is also referred to as the adaptive-critic NN controller, is developed to deliver the desired tracking performance for a class of nonlinear discrete-time systems expressed in nonstrict feedback form in the presence of bounded and unknown disturbances. The adaptive-critic NN controller consists of an observer, a critic, and two action NNs. The observer estimates the states and output, and the two action NNs provide virtual and actual control inputs to the nonlinear discrete-time system. The critic approximates a certain strategic utility function, and the action NNs minimize the strategic utility function and control inputs. All NN weights adapt online toward minimization of a performance index, utilizing the gradient-descent-based rule, in contrast with iteration-based adaptive-critic schemes. Lyapunov functions are used to show the stability of the closed-loop tracking error, weights, and observer estimates. Separation and certainty equivalence principles, the persistency of excitation condition, and the linearity-in-the-unknown-parameters assumption are not needed. Experimental results on a spark ignition (SI) engine operating lean at an equivalence ratio of 0.75 show a significant (25%) reduction in cyclic dispersion in heat release with control, while the average fuel input changes by less than 1% compared with the uncontrolled case. Consequently, oxides of nitrogen (NO(x)) drop by 30%, and unburned hydrocarbons drop by 16% with control. Overall, NO(x) emissions are reduced by over 80% compared with stoichiometric levels.

  6. Concurrent Learning of Control in Multi agent Sequential Decision Tasks

    Science.gov (United States)

    2018-04-17

    The overall objective of this project was to develop multi-agent reinforcement learning (MARL) approaches for intelligent agents to autonomously learn distributed control policies in decentralized partially observable settings. The work addressed concurrent learning of policies in Dec-POMDPs, established performance bounds, and evaluated these algorithms both theoretically and empirically.

  7. Optimal and Scalable Caching for 5G Using Reinforcement Learning of Space-Time Popularities

    Science.gov (United States)

    Sadeghi, Alireza; Sheikholeslami, Fatemeh; Giannakis, Georgios B.

    2018-02-01

    Small basestations (SBs) equipped with caching units have the potential to handle the unprecedented demand growth in heterogeneous networks. Through low-rate backhaul connections with the backbone, SBs can prefetch popular files during off-peak traffic hours and serve them to the edge at peak periods. To prefetch intelligently, each SB must learn what and when to cache, while taking into account SB memory limitations, the massive number of available contents, the unknown popularity profiles, as well as the space-time popularity dynamics of user file requests. In this work, local and global Markov processes model user requests, and a reinforcement learning (RL) framework is put forth for finding the optimal caching policy when the transition probabilities involved are unknown. Joint consideration of global and local popularity demands along with cache-refreshing costs allows for a simple yet practical asynchronous caching approach. The novel RL-based caching relies on a Q-learning algorithm to implement the optimal policy in an online fashion, thus enabling the cache control unit at the SB to learn, track, and possibly adapt to the underlying dynamics. To endow the algorithm with scalability, a linear function approximation of the proposed Q-learning scheme is introduced, offering faster convergence as well as reduced complexity and memory requirements. Numerical tests corroborate the merits of the proposed approach in various realistic settings.
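
    The scalability step replaces the Q-table with a linear function approximator, Q(s, a) = w_a^T phi(s), updated by semi-gradient Q-learning. A sketch with placeholder features and sizes; the caching-specific state encoding is assumed, not taken from the paper:

```python
import numpy as np

n_actions, n_features = 4, 16
alpha, gamma = 0.05, 0.9
w = np.zeros((n_actions, n_features))          # per-action weight vectors

def phi(s):
    """Hypothetical feature map; a real system would encode popularity state."""
    return np.tanh(np.asarray(s, dtype=float))

def q(s, a):
    return w[a] @ phi(s)                       # Q(s, a) = w_a^T phi(s)

def td_update(s, a, r, s_next):
    target = r + gamma * max(q(s_next, b) for b in range(n_actions))
    w[a] += alpha * (target - q(s, a)) * phi(s)   # semi-gradient Q-learning

rng = np.random.default_rng(0)
s, s_next = rng.standard_normal(n_features), rng.standard_normal(n_features)
td_update(s, a=1, r=0.3, s_next=s_next)
```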

  8. Adaptive Functional-Based Neuro-Fuzzy-PID Incremental Controller Structure

    Directory of Open Access Journals (Sweden)

    Ashraf Ahmed Fahmy

    2014-03-01

    Full Text Available This paper presents an adaptive functional-based Neuro-fuzzy-PID incremental (NFPID) controller structure that can be tuned either offline or online according to the required controller performance. First, differential membership functions are used to represent the fuzzy membership functions of the input-output space of the three-term controller. Second, controller rules are generated based on the discrete proportional, derivative, and integral function for the fuzzy space. Finally, a fully differentiable fuzzy neural network is constructed to represent the developed controller for either offline or online controller parameter adaptation. Two different adaptation methods are used for controller tuning: an offline method based on controller transient performance cost function optimization using the Bees Algorithm, and an online method based on tracking error minimization using back-propagation with momentum. The proposed control system was tested to show the validity of the controller structure over fixed PID controller gains in controlling a SCARA-type robot arm.

  9. Analysis of the decision-support function of policy assessment in real-world policy making in the field of poverty and social inequalities. Case study on migrant integration policies in the Brussels-Capital Region

    International Nuclear Information System (INIS)

    Feyaerts, Gille; Deguerry, Murielle; Deboosere, Patrick; De Spiegelaere, Myriam

    2017-01-01

    Despite its high potential to support decision-making, the role of policy assessment in real-world policy making in the field of poverty and social inequalities remains largely questioned. In this study, we analyse policy assessment's role in a context of real-world policymaking, by means of a case study on a legislative proposal on integration policy for immigrant newcomers in the Brussels-Capital Region, for which we evaluate the potential effects on poverty and social inequalities. We first analyse the policy process surrounding the policy proposal – a process that is often treated as a black box within policy assessment research. Understanding the factors that influence and determine the decision-making process, enables us to gain insight into the potential decision-support function(s). Second, we develop an approach to policy assessment that aims to fully exploit its potential to contribute to the functions of both instrumental and conceptual learning. For this purpose, we propose to introduce the approach of realist evaluation and to focus on evaluating the underlying policy intervention theory from the perspective of poverty and social inequalities. Finally, we illustrate this new approach and its added value by applying it to the legislative proposal on integration policy and analyse its contribution to policy-oriented learning. - Highlights: •The field of policy assessment should draw on insights from policy studies. •We unpacked the policymaking black-box to identify the mechanisms of policy change. •The policy process is driven by an interaction of ideas, interests and institutions. •Policy assessment's potential lies in both instrumental and conceptual learning. •We propose to integrate realist evaluation's logic of inquiry within policy assessment.

  10. In search of an integrative measure of functioning.

    Science.gov (United States)

    Madden, Rosamond H; Glozier, Nick; Fortune, Nicola; Dyson, Maree; Gilroy, John; Bundy, Anita; Llewellyn, Gwynnyth; Salvador-Carulla, Luis; Lukersmith, Sue; Mpofu, Elias; Madden, Richard

    2015-05-26

    International trends towards people-centred, integrative care and support require any measurement of functioning and disability to meet multiple aims. The information requirements of two major Australian programs for disability and rehabilitation are outlined, and the findings of two searches for suitable measures of functioning and disability are analysed. Over 30 current measures of functioning were evaluated in each search. Neither search found a generic measure of functioning suitable for these multibillion dollar programs, relevant to a wide range of people with a variety of health conditions and functioning experiences, and capable of indicating support needs, associated costs, progress and outcomes. This unsuccessful outcome has implications internationally for policy-relevant information for disability, rehabilitation and related programs. The paper outlines the features of an Integrative Measure of Functioning (IMF) based on the concepts of functioning and environmental factors in the International Classification of Functioning, Disability and Health (ICF). An IMF would be applicable across a variety of health conditions, settings and purposes, ranging from individual assessment to public health. An IMF could deliver person-centred, policy-relevant information for a range of programs, promoting harmonised language and measurement and supporting international trends in human services and public health.

  11. Combining Correlation-Based and Reward-Based Learning in Neural Control for Policy Improvement

    DEFF Research Database (Denmark)

    Manoonpong, Poramate; Kolodziejski, Christoph; Wörgötter, Florentin

    2013-01-01

    Classical conditioning (conventionally modeled as correlation-based learning) and operant conditioning (conventionally modeled as reinforcement learning or reward-based learning) have been found in biological systems. Evidence shows that these two mechanisms strongly involve learning about associations. Based on these biological findings, we propose a new learning model to achieve successful control policies for artificial systems. This model combines correlation-based learning using input correlation learning (ICO learning) and reward-based learning using continuous actor–critic reinforcement learning (RL), thereby working as a dual learner system. The model performance is evaluated by simulations of a cart-pole system as a dynamic motion control problem and a mobile robot system as a goal-directed behavior control problem. Results show that the model can strongly improve pole balancing control …

  12. Instructional control of reinforcement learning: a behavioral and neurocomputational investigation.

    Science.gov (United States)

    Doll, Bradley B; Jacobs, W Jake; Sanfey, Alan G; Frank, Michael J

    2009-11-24

    Humans learn how to behave directly through environmental experience and indirectly through rules and instructions. Behavior analytic research has shown that instructions can control behavior, even when such behavior leads to sub-optimal outcomes (Hayes, S. (Ed.). 1989. Rule-governed behavior: cognition, contingencies, and instructional control. Plenum Press.). Here we examine the control of behavior through instructions in a reinforcement learning task known to depend on striatal dopaminergic function. Participants selected between probabilistically reinforced stimuli, and were (incorrectly) told that a specific stimulus had the highest (or lowest) reinforcement probability. Despite experience to the contrary, instructions drove choice behavior. We present neural network simulations that capture the interactions between instruction-driven and reinforcement-driven behavior via two potential neural circuits: one in which the striatum is inaccurately trained by instruction representations coming from prefrontal cortex/hippocampus (PFC/HC), and another in which the striatum learns the environmentally based reinforcement contingencies, but is "overridden" at decision output. Both models capture the core behavioral phenomena but, because they differ fundamentally on what is learned, make distinct predictions for subsequent behavioral and neuroimaging experiments. Finally, we attempt to distinguish between the proposed computational mechanisms governing instructed behavior by fitting a series of abstract "Q-learning" and Bayesian models to subject data. The best-fitting model supports one of the neural models, suggesting the existence of a "confirmation bias" in which the PFC/HC system trains the reinforcement system by amplifying outcomes that are consistent with instructions while diminishing inconsistent outcomes.
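
    The "confirmation bias" account favored by the model fits can be caricatured in a few lines: prediction errors consistent with the (possibly false) instruction are amplified, and inconsistent ones are diminished. All parameter values below are illustrative, not fitted to the study's data:

```python
def biased_update(q, reward, instructed_good, alpha=0.2, boost=2.0):
    """Q-update whose learning rate depends on instruction consistency.

    instructed_good: whether the subject was told this stimulus is best.
    A positive prediction error is "consistent" with a good instruction,
    a negative one with a bad instruction (illustrative assumption).
    """
    delta = reward - q                        # prediction error
    consistent = (delta > 0) == instructed_good
    rate = alpha * (boost if consistent else 1.0 / boost)
    return q + rate * delta

q = 0.5
# A bad outcome on an instructed-good stimulus is down-weighted:
q = biased_update(q, reward=0.0, instructed_good=True)   # q barely decreases
```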

  13. Mild-moderate TBI: clinical recommendations to optimize neurobehavioral functioning, learning, and adaptation.

    Science.gov (United States)

    Chen, Anthony J-W; Loya, Fred

    2014-11-01

    Traumatic brain injury (TBI) can result in functional deficits that persist long after the acute injury. The authors present a case study of an individual who experienced some of the most common debilitating problems that characterize the chronic phase of mild-to-moderate TBI: difficulties with neurobehavioral functions that manifest via complaints of distractibility, poor memory, disorganization, poor frustration tolerance, and feeling easily overwhelmed. They present a rational strategy for management that addresses important domain-general targets likely to have far-ranging benefits. This integrated, longitudinal, and multifaceted approach first addresses approachable targets and provides an important foundation to enhance the success of other, more specific interventions requiring specialty intervention. The overall approach places an emphasis on accomplishing two major categories of clinical objectives: optimizing current functioning, and enhancing learning and adaptation to support long-term improvement of functioning for individuals living with brain injury.

  14. Reinforcement learning for adaptive optimal control of unknown continuous-time nonlinear systems with input constraints

    Science.gov (United States)

    Yang, Xiong; Liu, Derong; Wang, Ding

    2014-03-01

    In this paper, an adaptive reinforcement-learning-based solution is developed for the infinite-horizon optimal control problem of constrained-input continuous-time nonlinear systems in the presence of nonlinearities with unknown structures. Two different types of neural networks (NNs) are employed to approximate the Hamilton-Jacobi-Bellman equation. That is, a recurrent NN is constructed to identify the unknown dynamical system, and two feedforward NNs are used as the actor and the critic to approximate the optimal control and the optimal cost, respectively. Based on this framework, the action NN and the critic NN are tuned simultaneously, without requiring knowledge of the system drift dynamics. Moreover, by using Lyapunov's direct method, the weights of the action NN and the critic NN are guaranteed to be uniformly ultimately bounded, while keeping the closed-loop system stable. To demonstrate the effectiveness of the present approach, simulation results are illustrated.

  15. Off-Policy Reinforcement Learning: Optimal Operational Control for Two-Time-Scale Industrial Processes.

    Science.gov (United States)

    Li, Jinna; Kiumarsi, Bahare; Chai, Tianyou; Lewis, Frank L; Fan, Jialu

    2017-12-01

    Industrial flow lines are composed of unit processes operating on a fast time scale and performance measurements known as operational indices measured at a slower time scale. This paper presents a model-free optimal solution to a class of two time-scale industrial processes using off-policy reinforcement learning (RL). First, the lower-layer unit process control loop with a fast sampling period and the upper-layer operational index dynamics at a slow time scale are modeled. Second, a general optimal operational control problem is formulated to optimally prescribe the set-points for the unit industrial process. Then, a zero-sum game off-policy RL algorithm is developed to find the optimal set-points by using data measured in real-time. Finally, a simulation experiment is employed for an industrial flotation process to show the effectiveness of the proposed method.

  16. In Search of an Integrative Measure of Functioning

    Directory of Open Access Journals (Sweden)

    Rosamond H. Madden

    2015-05-01

    Full Text Available International trends towards people-centred, integrative care and support require any measurement of functioning and disability to meet multiple aims. The information requirements of two major Australian programs for disability and rehabilitation are outlined, and the findings of two searches for suitable measures of functioning and disability are analysed. Over 30 current measures of functioning were evaluated in each search. Neither search found a generic measure of functioning suitable for these multibillion dollar programs, relevant to a wide range of people with a variety of health conditions and functioning experiences, and capable of indicating support needs, associated costs, progress and outcomes. This unsuccessful outcome has implications internationally for policy-relevant information for disability, rehabilitation and related programs. The paper outlines the features of an Integrative Measure of Functioning (IMF) based on the concepts of functioning and environmental factors in the International Classification of Functioning, Disability and Health (ICF). An IMF would be applicable across a variety of health conditions, settings and purposes, ranging from individual assessment to public health. An IMF could deliver person-centred, policy-relevant information for a range of programs, promoting harmonised language and measurement and supporting international trends in human services and public health.

  17. Discrete-Time Stable Generalized Self-Learning Optimal Control With Approximation Errors.

    Science.gov (United States)

    Wei, Qinglai; Li, Benkai; Song, Ruizhuo

    2018-04-01

    In this paper, a generalized policy iteration (GPI) algorithm with approximation errors is developed for solving infinite-horizon optimal control problems for nonlinear systems. The developed stable GPI algorithm provides a general structure for discrete-time iterative adaptive dynamic programming algorithms, by which most discrete-time reinforcement learning algorithms can be described using the GPI structure. This is the first time that approximation errors have been explicitly considered in a GPI algorithm. The properties of the stable GPI algorithm with approximation errors are analyzed. The admissibility of the approximate iterative control law can be guaranteed if the approximation errors satisfy the admissibility criteria. The convergence of the developed algorithm is established, which shows that the iterative value function converges to a finite neighborhood of the optimal performance index function if the approximation errors satisfy the convergence criterion. Finally, numerical examples and comparisons are presented.

  18. Flexible Heuristic Dynamic Programming for Reinforcement Learning in Quadrotors

    NARCIS (Netherlands)

    Helmer, Alexander; de Visser, C.C.; van Kampen, E.

    2018-01-01

    Reinforcement learning is a paradigm for learning decision-making tasks from interaction with the environment. Function approximators solve a part of the curse of dimensionality when learning in high-dimensional state and/or action spaces. It can be a time-consuming process to learn a good policy in …

  19. Multiple determinants of transfer of evaluative function after conditioning with free-operant schedules of reinforcement.

    Science.gov (United States)

    Dack, Charlotte; Reed, Phil; McHugh, Louise

    2010-11-01

    The aim of the four present experiments was to explore how different schedules of reinforcement influence schedule-induced behavior, their impact on evaluative ratings given to conditioned stimuli associated with each schedule through evaluative conditioning, and the transfer of these evaluations through derived stimulus networks. Experiment 1 compared two contrasting response reinforcement rules (variable ratio [VR], variable interval [VI]). Experiment 2 varied the response to reinforcement rule between two schedules but equated the outcome to response rate (differential reinforcement of high rate [DRH] vs. VR). Experiment 3 compared molar and molecular aspects of contingencies of reinforcement (tandem VIVR vs. tandem VRVI). Finally, Experiment 4 employed schedules that induced low rates of responding to determine whether, under these circumstances, responses were more sensitive to the molecular aspects of a schedule (differential reinforcement of low rate [DRL] vs. VI). The findings suggest that the transfer of evaluative functions is determined mainly by differences in response rate between the schedules and the molar aspects of the schedules. However, when neither schedule was based on a strong response reinforcement rule, the transfer of evaluative judgments came under the control of the molecular aspects of the schedule.

  20. Multi-Objective Reinforcement Learning-Based Deep Neural Networks for Cognitive Space Communications

    Science.gov (United States)

    Ferreria, Paulo Victor R.; Paffenroth, Randy; Wyglinski, Alexander M.; Hackett, Timothy M.; Bilen, Sven G.; Reinhart, Richard C.; Mortensen, Dale J.

    2017-01-01

    Future communication subsystems of space exploration missions can potentially benefit from software-defined radios (SDRs) controlled by machine learning algorithms. In this paper, we propose a novel hybrid radio resource allocation management control algorithm that integrates multi-objective reinforcement learning and deep artificial neural networks. The objective is to efficiently manage communications system resources by monitoring performance functions with common dependent variables that result in conflicting goals. The uncertainty in the performance of thousands of different possible combinations of radio parameters makes the trade-off between exploration and exploitation in reinforcement learning (RL) much more challenging for future critical space-based missions. Thus, the system should spend as little time as possible on exploring actions, and whenever it explores an action, it should perform at acceptable levels most of the time. The proposed approach enables on-line learning by interactions with the environment and restricts poor resource allocation performance through virtual environment exploration. Improvements in the multiobjective performance can be achieved via transmitter parameter adaptation on a packet-basis, with poorly predicted performance promptly resulting in rejected decisions. Simulations presented in this work considered the DVB-S2 standard adaptive transmitter parameters and additional ones expected to be present in future adaptive radio systems. Performance results are provided by analysis of the proposed hybrid algorithm when operating across a satellite communication channel from Earth to GEO orbit during clear sky conditions. The proposed approach constitutes part of the core cognitive engine proof-of-concept to be delivered to the NASA Glenn Research Center SCaN Testbed located onboard the International Space Station.

  1. Generalized projective synchronization of chaotic systems via adaptive learning control

    International Nuclear Information System (INIS)

    Yun-Ping, Sun; Jun-Min, Li; Hui-Lin, Wang; Jiang-An, Wang

    2010-01-01

    In this paper, a learning control approach is applied to the generalized projective synchronisation (GPS) of different chaotic systems with unknown periodically time-varying parameters. Using the Lyapunov–Krasovskii functional stability theory, a differential-difference mixed parametric learning law and an adaptive learning control law are constructed to make the states of two different chaotic systems asymptotically synchronised. The scheme is successfully applied to the generalized projective synchronisation between the Lorenz system and the Chen system. Moreover, numerical simulation results are used to verify the effectiveness of the proposed scheme. (general)

  2. Functional Based Adaptive and Fuzzy Sliding Controller for Non-Autonomous Active Suspension System

    Science.gov (United States)

    Huang, Shiuh-Jer; Chen, Hung-Yi

    In this paper, an adaptive sliding controller is developed for controlling a vehicle active suspension system. The functional approximation technique is employed to represent the unknown non-autonomous functions of the suspension system and relax the model-based requirement of the sliding-mode control algorithm. In order to improve the control performance and reduce implementation problems, a fuzzy strategy with online learning ability is added to compensate for the functional approximation error. The update laws of the functional approximation coefficients and the fuzzy tuning parameters are derived from the Lyapunov theorem to guarantee system stability. The proposed controller is implemented on a quarter-car hydraulic actuating active suspension system test rig. The experimental results show that the proposed controller effectively suppresses the oscillation amplitude of the suspension system.

  3. Punishment and psychopathy: a case-control functional MRI investigation of reinforcement learning in violent antisocial personality disordered men.

    Science.gov (United States)

    Gregory, Sarah; Blair, R James; Ffytche, Dominic; Simmons, Andrew; Kumari, Veena; Hodgins, Sheilagh; Blackwood, Nigel

    2015-02-01

    Men with antisocial personality disorder show lifelong abnormalities in adaptive decision making guided by the weighing up of reward and punishment information. Among men with antisocial personality disorder, modification of the behaviour of those with additional diagnoses of psychopathy seems particularly resistant to punishment. We did a case-control functional MRI (fMRI) study in 50 men, of whom 12 were violent offenders with antisocial personality disorder and psychopathy, 20 were violent offenders with antisocial personality disorder but not psychopathy, and 18 were healthy non-offenders. We used fMRI to measure brain activation associated with the representation of punishment or reward information during an event-related probabilistic response-reversal task, assessed with standard general linear-model-based analysis. Offenders with antisocial personality disorder and psychopathy displayed discrete regions of increased activation in the posterior cingulate cortex and anterior insula in response to punished errors during the task reversal phase, and decreased activation to all correct rewarded responses in the superior temporal cortex. This finding was in contrast to results for offenders without psychopathy and healthy non-offenders. Punishment prediction error signalling in offenders with antisocial personality disorder and psychopathy was highly atypical. This finding challenges the widely held view that such men are simply characterised by diminished neural sensitivity to punishment. Instead, this finding indicates altered organisation of the information-processing system responsible for reinforcement learning and appropriate decision making. This difference between violent offenders with antisocial personality disorder with and without psychopathy has implications for the causes of these disorders and for treatment approaches. National Forensic Mental Health Research and Development Programme, UK Ministry of Justice, Psychiatry Research Trust, NIHR

  4. Functional Dual Adaptive Control with Recursive Gaussian Process Model

    International Nuclear Information System (INIS)

    Prüher, Jakub; Král, Ladislav

    2015-01-01

    The paper deals with the dual adaptive control problem, where the functional uncertainties in the system description are modelled by a non-parametric Gaussian process regression model. Current approaches to adaptive control based on Gaussian process models are severely limited in their practical applicability, because the model is re-adjusted using all the currently available data, which keeps growing with every time step. We propose the use of a recursive Gaussian process regression algorithm to significantly reduce the computational requirements, thus bringing Gaussian process-based adaptive controllers closer to practical applicability. In this work, we design a bi-criterial dual controller based on a recursive Gaussian process model for discrete-time stochastic dynamic systems given in an affine-in-control form. Using Monte Carlo simulations, we show that the proposed controller achieves performance comparable with the full Gaussian process-based controller in terms of control quality while keeping the computational demands bounded. (paper)

  5. Human reinforcement learning subdivides structured action spaces by learning effector-specific values.

    Science.gov (United States)

    Gershman, Samuel J; Pesaran, Bijan; Daw, Nathaniel D

    2009-10-28

    Humans and animals are endowed with a large number of effectors. Although this enables great behavioral flexibility, it presents an equally formidable reinforcement learning problem of discovering which actions are most valuable because of the high dimensionality of the action space. An unresolved question is how neural systems for reinforcement learning-such as prediction error signals for action valuation associated with dopamine and the striatum-can cope with this "curse of dimensionality." We propose a reinforcement learning framework that allows for learned action valuations to be decomposed into effector-specific components when appropriate to a task, and test it by studying to what extent human behavior and blood oxygen level-dependent (BOLD) activity can exploit such a decomposition in a multieffector choice task. Subjects made simultaneous decisions with their left and right hands and received separate reward feedback for each hand movement. We found that choice behavior was better described by a learning model that decomposed the values of bimanual movements into separate values for each effector, rather than a traditional model that treated the bimanual actions as unitary with a single value. A decomposition of value into effector-specific components was also observed in value-related BOLD signaling, in the form of lateralized biases in striatal correlates of prediction error and anticipatory value correlates in the intraparietal sulcus. These results suggest that the human brain can use decomposed value representations to "divide and conquer" reinforcement learning over high-dimensional action spaces.
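
    A minimal sketch of the modeling contrast the study draws: a learner that decomposes bimanual values into per-hand components against one that values each joint action as a unit. The bandit reward probabilities, softmax temperature, and learning rate below are assumptions for illustration.

```python
import numpy as np
rng = np.random.default_rng(0)

# Two-hand bandit: each hand has its own reward probability per action.
n_act = 4
p_left, p_right = rng.random(n_act), rng.random(n_act)
alpha, beta, trials = 0.2, 3.0, 2000

Q_joint = np.zeros((n_act, n_act))            # unitary: one value per joint action
Q_l, Q_r = np.zeros(n_act), np.zeros(n_act)   # decomposed: per-effector values

def softmax_choice(q):
    p = np.exp(beta * (q - q.max())); p /= p.sum()
    return rng.choice(len(q), p=p)

joint_reward = dec_reward = 0.0
for _ in range(trials):
    # decomposed learner: each hand learns from its own reward feedback
    al, ar = softmax_choice(Q_l), softmax_choice(Q_r)
    rl, rr = rng.random() < p_left[al], rng.random() < p_right[ar]
    Q_l[al] += alpha * (rl - Q_l[al])
    Q_r[ar] += alpha * (rr - Q_r[ar])
    dec_reward += rl + rr
    # unitary learner: a single value for the whole bimanual action pair
    aj = softmax_choice(Q_joint.ravel())
    al2, ar2 = divmod(aj, n_act)
    r = (rng.random() < p_left[al2]) + (rng.random() < p_right[ar2])
    Q_joint[al2, ar2] += alpha * (r - Q_joint[al2, ar2])
    joint_reward += r

print(f"decomposed avg reward: {dec_reward/trials:.2f}  "
      f"unitary avg reward: {joint_reward/trials:.2f}")
```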

  6. Intelligent Control of a Sensor-Actuator System via Kernelized Least-Squares Policy Iteration

    Directory of Open Access Journals (Sweden)

    Bo Liu

    2012-02-01

    In this paper a new framework, called Compressive Kernelized Reinforcement Learning (CKRL), for computing near-optimal policies in sequential decision making with uncertainty is proposed via incorporating non-adaptive, data-independent random projections and nonparametric Kernelized Least-Squares Policy Iteration (KLSPI). Random projection is a fast, non-adaptive dimensionality reduction framework in which high-dimensional data are projected onto a random lower-dimensional subspace via spherically random rotation and coordinate sampling. KLSPI introduces the kernel trick into the LSPI framework for reinforcement learning, often achieving faster convergence and providing automatic feature selection via various kernel sparsification approaches. In this approach, policies are computed in a low-dimensional subspace generated by projecting the high-dimensional features onto a set of random bases. We first show how random projections constitute an efficient sparsification technique and how our method often converges faster than regular LSPI, at lower computational cost. The theoretical foundation underlying this approach is a fast approximation of the singular value decomposition (SVD). Finally, simulation results are exhibited on benchmark MDP domains, which confirm gains both in computation time and in performance in large feature spaces.
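
    The core pipeline can be sketched in a few lines: compress state-action features with a Gaussian random projection, then run least-squares policy evaluation (LSTD-Q) in the compressed space. The dimensions, synthetic samples, and ridge term below are assumptions; a full LSPI loop would re-solve this system after each policy improvement step.

```python
import numpy as np
rng = np.random.default_rng(1)

# Dimensions and data are placeholders for real (s, a) feature vectors.
D, d, n, gamma = 500, 20, 2000, 0.95
P = rng.normal(0, 1 / np.sqrt(d), (d, D))      # random projection matrix

phi = rng.normal(size=(n, D))                  # features of sampled (s, a)
phi_next = rng.normal(size=(n, D))             # features of (s', pi(s'))
rewards = rng.normal(size=n)

z, z_next = phi @ P.T, phi_next @ P.T          # project to d dimensions
A = z.T @ (z - gamma * z_next) / n + 1e-6 * np.eye(d)   # LSTD-Q system + ridge
b = z.T @ rewards / n
w = np.linalg.solve(A, b)                      # Q-function weights, compressed space
print("Q(s, a) estimate for first sample:", z[0] @ w)
```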

  7. Rollout sampling approximate policy iteration

    NARCIS (Netherlands)

    Dimitrakakis, C.; Lagoudakis, M.G.

    2008-01-01

    Several researchers have recently investigated the connection between reinforcement learning and classification. We are motivated by proposals of approximate policy iteration schemes without value functions, which focus on policy representation using classifiers and address policy learning as a

  8. SCAFFOLDING AND REINFORCEMENT: USING DIGITAL LOGBOOKS IN LEARNING VOCABULARY

    OpenAIRE

    Khalifa, Salma Hasan Almabrouk; Shabdin, Ahmad Affendi

    2016-01-01

    Reinforcement and scaffolding are tested approaches to enhancing learning achievements. Keeping a record of the learning process, as well as of the newly learned words, functions as scaffolding that helps learners build a comprehensive vocabulary. Similarly, repetitive learning of new words reinforces permanent learning for long-term memory. Paper-based logbooks may prove to be good records of the learning process, but if learners use digital logbooks, the results may be even better. Digital logbooks wit...

  9. SU-D-BRB-05: Quantum Learning for Knowledge-Based Response-Adaptive Radiotherapy

    Energy Technology Data Exchange (ETDEWEB)

    El Naqa, I; Ten Haken, R [University of Michigan, Ann Arbor, MI (United States)]

    2016-06-15

    Purpose: There is tremendous excitement in radiotherapy about applying data-driven methods to develop personalized clinical decisions for real-time response-based adaptation. However, classical statistical learning methods lack efficiency and the ability to predict outcomes under conditions of uncertainty and incomplete information. Therefore, we are investigating physics-inspired machine learning approaches that utilize quantum principles to develop a robust framework for dynamically adapting treatments to individual patients' characteristics and optimizing outcomes. Methods: We studied 88 liver SBRT patients, 35 on non-adaptive and 53 on adaptive protocols. Adaptation was based on liver function using a split-course of 3+2 fractions with a month break. The radiotherapy environment was modeled as a Markov decision process (MDP) of baseline and one-month-into-treatment states. The patient environment was modeled by a 5-variable state comprising the patient's clinical and dosimetric covariates. For comparison of classical and quantum learning methods, the decision whether to adapt at one month was considered. The MDP objective was defined by the complication-free tumor control, P+ = TCP × (1 − NTCP). A simple regression model represented the state-action mapping. A single bit in the classical MDP and a qubit of two superimposed states in the quantum MDP represented the decision actions. Classical decision selection was done using reinforcement Q-learning, and quantum searching was performed using Grover's algorithm, which applies a uniform superposition over possible states and yields a quadratic speed-up. Results: Classical/quantum MDPs suggested adaptation (probability amplitude ≥0.5) 79% of the time for split-courses and 100% for continuous-courses. However, the classical MDP had an average adaptation probability of 0.5±0.22 while the quantum algorithm reached 0.76±0.28. In cases where adaptation failed, classical MDP yielded 0.31±0.26 average amplitude while the

  10. Space Objects Maneuvering Detection and Prediction via Inverse Reinforcement Learning

    Science.gov (United States)

    Linares, R.; Furfaro, R.

    This paper determines the behavior of Space Objects (SOs) using inverse Reinforcement Learning (RL) to estimate the reward function that each SO is using for control. The approach discussed in this work can be used to analyze maneuvering of SOs from observational data. The inverse RL problem is solved using the Feature Matching approach. This approach determines the optimal reward function that a SO is using while maneuvering by assuming that the observed trajectories are optimal with respect to the SO's own reward function. This paper uses estimated orbital elements data to determine the behavior of SOs in a data-driven fashion.

  11. Reinforcement Learning State-of-the-Art

    CERN Document Server

    Wiering, Marco

    2012-01-01

    Reinforcement learning encompasses both a science of adaptive behavior of rational beings in uncertain environments and a computational methodology for finding optimal behaviors for challenging problems in control, optimization and adaptive behavior of intelligent agents. As a field, reinforcement learning has progressed tremendously in the past decade. The main goal of this book is to present an up-to-date series of survey articles on the main contemporary sub-fields of reinforcement learning. This includes surveys on partially observable environments, hierarchical task decompositions, relational knowledge representation and predictive state representations. Furthermore, topics such as transfer, evolutionary methods and continuous spaces in reinforcement learning are surveyed. In addition, several chapters review reinforcement learning methods in robotics, in games, and in computational neuroscience. In total seventeen different subfields are presented by mostly young experts in those areas, and together the...

  12. Using Spatial Reinforcement Learning to Build Forest Wildfire Dynamics Models From Satellite Images

    Directory of Open Access Journals (Sweden)

    Sriram Ganapathi Subramanian

    2018-04-01

    Machine learning algorithms have increased tremendously in power in recent years but have yet to be fully utilized in many ecology and sustainable resource management domains such as wildlife reserve design, forest fire management, and invasive species spread. One thing these domains have in common is that they contain dynamics that can be characterized as a spatially spreading process (SSP), which requires many parameters to be set precisely to model the dynamics, spread rates, and directional biases of the elements which are spreading. We present related work in artificial intelligence and machine learning for SSP sustainability domains, including forest wildfire prediction. We then introduce a novel approach for learning in SSP domains using reinforcement learning (RL), where fire is the agent at any cell in the landscape and the set of actions the fire can take from a location at any point in time includes spreading north, south, east, or west, or not spreading. This approach inverts the usual RL setup, since the dynamics of the corresponding Markov decision process (MDP) is a known function for immediate wildfire spread. Meanwhile, we learn an agent policy for a predictive model of the dynamics of a complex spatial process. Rewards are provided for correctly classifying which cells are on fire or not, compared with satellite and other related data. We examine the behavior of five RL algorithms on this problem: value iteration, policy iteration, Q-learning, Monte Carlo tree search, and Asynchronous Advantage Actor-Critic (A3C). We compare to a Gaussian process-based supervised learning approach and also discuss the relation of our approach to manually constructed, state-of-the-art methods from forest wildfire modeling. We validate our approach with satellite image data of two massive wildfire events in Northern Alberta, Canada: the Fort McMurray fire of 2016 and the Richardson fire of 2011. The results show that we can learn predictive, agent
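
    A toy sketch of this inverted setup, assuming a small synthetic grid and binary burn labels in place of real satellite data: the fire acts as a tabular Q-learning agent whose reward is agreement with the observed burn map.

```python
import numpy as np
rng = np.random.default_rng(2)

# Synthetic stand-in for satellite burn labels; grid size is an assumption.
H, W = 8, 8
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]   # up, down, left, right, stay
observed = np.zeros((H, W), bool)
observed[2:6, 3:7] = True                              # "observed" burned region

Q = np.zeros((H, W, len(ACTIONS)))
alpha, gamma, eps = 0.1, 0.9, 0.1
for episode in range(500):
    r0, c0 = 3, 4                                      # ignition cell
    for step in range(20):
        a = rng.integers(len(ACTIONS)) if rng.random() < eps \
            else int(Q[r0, c0].argmax())
        dr, dc = ACTIONS[a]
        r1 = min(max(r0 + dr, 0), H - 1)
        c1 = min(max(c0 + dc, 0), W - 1)
        reward = 1.0 if observed[r1, c1] else -1.0     # classification reward
        Q[r0, c0, a] += alpha * (reward + gamma * Q[r1, c1].max()
                                 - Q[r0, c0, a])
        r0, c0 = r1, c1

policy_burn = Q.max(axis=2) > 0                        # cells the policy spreads into
print("agreement with observed map:", (policy_burn == observed).mean())
```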

  13. Functional Credentials

    Directory of Open Access Journals (Sweden)

    Deuber Dominic

    2018-04-01

    A functional credential allows a user to anonymously prove possession of a set of attributes that fulfills a certain policy. The policies are arbitrary polynomially computable predicates that are evaluated over arbitrary attributes. The key feature of this primitive is the delegation of verification to third parties, called designated verifiers. The delegation protects the privacy of the policy: a designated verifier can verify that a user satisfies a certain policy without learning anything about the policy itself. We illustrate the usefulness of this property in different applications, including outsourced databases with access control. We present a new framework to construct functional credentials that does not require (non-interactive) zero-knowledge proofs. This is important in settings where the statements are complex and the resulting zero-knowledge proofs would not be efficient. Our construction is based on any predicate encryption scheme, and its security relies on standard assumptions. A complexity analysis and an experimental evaluation confirm the practicality of our approach.

  14. Adaptive Dynamic Programming for Control Algorithms and Stability

    CERN Document Server

    Zhang, Huaguang; Luo, Yanhong; Wang, Ding

    2013-01-01

    There are many methods of stable controller design for nonlinear systems. In seeking to go beyond the minimum requirement of stability, Adaptive Dynamic Programming for Control approaches the challenging topic of optimal control for nonlinear systems using the tools of  adaptive dynamic programming (ADP). The range of systems treated is extensive; affine, switched, singularly perturbed and time-delay nonlinear systems are discussed as are the uses of neural networks and techniques of value and policy iteration. The text features three main aspects of ADP in which the methods proposed for stabilization and for tracking and games benefit from the incorporation of optimal control methods: • infinite-horizon control for which the difficulty of solving partial differential Hamilton–Jacobi–Bellman equations directly is overcome, and  proof provided that the iterative value function updating sequence converges to the infimum of all the value functions obtained by admissible control law sequences; • finite-...

  15. Gaussian Processes for Data-Efficient Learning in Robotics and Control.

    Science.gov (United States)

    Deisenroth, Marc Peter; Fox, Dieter; Rasmussen, Carl Edward

    2015-02-01

    Autonomous learning has been a promising direction in control and robotics for more than a decade, since data-driven learning makes it possible to reduce the amount of engineering knowledge that is otherwise required. However, autonomous reinforcement learning (RL) approaches typically require many interactions with the system to learn controllers, which is a practical limitation in real systems, such as robots, where many interactions can be impractical and time consuming. To address this problem, current learning approaches typically require task-specific knowledge in the form of expert demonstrations, realistic simulators, pre-shaped policies, or specific knowledge about the underlying dynamics. In this paper, we follow a different approach and speed up learning by extracting more information from data. In particular, we learn a probabilistic, non-parametric Gaussian process transition model of the system. By explicitly incorporating model uncertainty into long-term planning and controller learning, our approach reduces the effects of model errors, a key problem in model-based learning. Compared to state-of-the-art RL, our model-based policy search method achieves an unprecedented speed of learning. We demonstrate its applicability to autonomous learning in real robot and control tasks.

  16. Optimal and Autonomous Control Using Reinforcement Learning: A Survey.

    Science.gov (United States)

    Kiumarsi, Bahare; Vamvoudakis, Kyriakos G; Modares, Hamidreza; Lewis, Frank L

    2018-06-01

    This paper reviews the current state of the art on reinforcement learning (RL)-based feedback control solutions to optimal regulation and tracking of single and multiagent systems. Existing RL solutions to both optimal H2 and H∞ control problems, as well as graphical games, will be reviewed. RL methods learn the solution to optimal control and game problems online, using measured data along the system trajectories. We discuss Q-learning and the integral RL algorithm as core algorithms for discrete-time (DT) and continuous-time (CT) systems, respectively. Moreover, we discuss a new direction of off-policy RL for both CT and DT systems. Finally, we review several applications.

  17. Adaptive Load Balancing of Parallel Applications with Multi-Agent Reinforcement Learning on Heterogeneous Systems

    Directory of Open Access Journals (Sweden)

    Johan Parent

    2004-01-01

    We report on the improvements that can be achieved by applying machine learning techniques, in particular reinforcement learning, to the dynamic load balancing of parallel applications. The applications considered in this paper are coarse-grained, data-intensive applications. Such applications put high pressure on the interconnect of the hardware. Synchronization and load balancing in complex, heterogeneous networks need fast, flexible, adaptive load balancing algorithms. Viewing a parallel application as a one-state coordination game in the framework of multi-agent reinforcement learning, and by using a recently introduced multi-agent exploration technique, we are able to improve upon the classic job farming approach. The improvements are achieved with limited computation and communication overhead.

  18. Knowledge-Based Reinforcement Learning for Data Mining

    Science.gov (United States)

    Kudenko, Daniel; Grzes, Marek

    Experts have developed heuristics that help them in planning and scheduling resources in their workplace. However, this domain knowledge is often rough and incomplete. When the domain knowledge is used directly by an automated expert system, the solutions are often sub-optimal, due to the incompleteness of the knowledge, the uncertainty of environments, and the possibility of encountering unexpected situations. RL, on the other hand, can overcome the weaknesses of the heuristic domain knowledge and produce optimal solutions. In the talk we propose two techniques, which represent first steps in the area of knowledge-based RL (KBRL). The first technique [1] uses high-level STRIPS operator knowledge in reward shaping to focus the search for the optimal policy. Empirical results show that the plan-based reward shaping approach outperforms other RL techniques, including alternative manual and MDP-based reward shaping, when used in its basic form. We showed that MDP-based reward shaping may fail, and successful experiments with STRIPS-based shaping suggest modifications which can overcome the encountered problems. The STRIPS-based method we propose allows the same domain knowledge to be expressed in a different way, and the domain expert can choose whether to define an MDP or a STRIPS planning task. We also evaluated the robustness of the proposed STRIPS-based technique to errors in the plan knowledge. In case STRIPS knowledge is not available, we propose a second technique [2] that shapes the reward with hierarchical tile coding: where the Q-function is represented with low-level tile coding, a V-function with coarser tile coding can be learned in parallel and used to approximate the potential for ground states. In the context of data mining, our KBRL approaches can also be used for any data collection task where the acquisition of data may incur considerable cost. In addition, observing the data collection agent in specific scenarios may lead to new insights into optimal data
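
    The mechanism underneath plan-based shaping is potential-based reward shaping, F(s, s') = γΦ(s') − Φ(s), which provably leaves the optimal policy unchanged. A minimal sketch on a toy corridor task, with a hand-written potential standing in for the plan-derived one (states, rewards, and gains are assumptions):

```python
import numpy as np
rng = np.random.default_rng(3)

n_states, goal, gamma, alpha, eps = 10, 9, 0.95, 0.2, 0.1
# A high-level "plan" assigns increasing potential to states closer to the goal.
potential = np.linspace(0.0, 1.0, n_states)

Q = np.zeros((n_states, 2))                      # actions: 0 = left, 1 = right
for episode in range(300):
    s = 0
    while s != goal:
        a = rng.integers(2) if rng.random() < eps else int(Q[s].argmax())
        s2 = max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)
        r = 1.0 if s2 == goal else 0.0
        r += gamma * potential[s2] - potential[s]   # shaping term F(s, s')
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2
print("greedy action per state:", Q.argmax(axis=1))
```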

  19. Adaptive Neural Control of Nonaffine Nonlinear Systems without Differential Condition for Nonaffine Function

    Directory of Open Access Journals (Sweden)

    Chaojiao Sun

    2016-01-01

    An adaptive neural control scheme is proposed for a nonaffine nonlinear system without using the implicit function theorem or the mean value theorem. The differentiability conditions on the nonaffine nonlinear functions are removed: the control-gain function is modeled even when the nonaffine function may be nondifferentiable. Furthermore, only a semibounded condition on the nonaffine nonlinear function is required in the proposed method, and the basic idea of invariant set theory is constructively introduced to cope with the difficulty in control design for nonaffine nonlinear systems. It is rigorously proved that all the closed-loop signals are bounded and the tracking error converges to a small residual set asymptotically. Finally, simulation examples are provided to demonstrate the effectiveness of the designed method.

  20. The "proactive" model of learning: Integrative framework for model-free and model-based reinforcement learning utilizing the associative learning-based proactive brain concept.

    Science.gov (United States)

    Zsuga, Judit; Biro, Klara; Papp, Csaba; Tajti, Gabor; Gesztelyi, Rudolf

    2016-02-01

    Reinforcement learning (RL) is a powerful concept underlying forms of associative learning governed by the use of a scalar reward signal, with learning taking place if expectations are violated. RL may be assessed using model-based and model-free approaches. Model-based reinforcement learning involves the amygdala, the hippocampus, and the orbitofrontal cortex (OFC). The model-free system involves the pedunculopontine-tegmental nucleus (PPTgN), the ventral tegmental area (VTA) and the ventral striatum (VS). Based on the functional connectivity of the VS, model-free and model-based RL systems center on the VS, which computes value by integrating model-free signals (received as reward prediction error) and model-based reward-related input. Using the concept of a reinforcement learning agent, we propose that the VS serves as the value function component of the RL agent. Regarding the model utilized for model-based computations, we turn to the proactive brain concept, which offers a ubiquitous function for the default network based on its great functional overlap with contextual associative areas. Hence, by means of the default network the brain continuously organizes its environment into context frames, enabling the formulation of analogy-based associations that are turned into predictions of what to expect. The OFC integrates reward-related information into context frames upon computing reward expectation, by compiling stimulus-reward and context-reward information offered by the amygdala and hippocampus, respectively. Furthermore, we suggest that the integration of model-based expectations regarding reward into the value signal is further supported by the efferents of the OFC that reach structures canonical for model-free learning (e.g., the PPTgN, VTA, and VS). (c) 2016 APA, all rights reserved.

  1. Covercrete with hybrid functions - A novel approach to durable reinforced concrete structures

    Energy Technology Data Exchange (ETDEWEB)

    Tang, L.; Zhang, E.Q. [Chalmers University of Technology, SE-412 96 Gothenburg (Sweden); Fu, Y. [KTH Royal Institute of Technology, SE-106 91 Stockholm (Sweden); Schouenborg, B.; Lindqvist, J.E. [CBI Swedish Cement and Concrete Research Institute, c/o SP, Box 857, SE-501 15 Boraas (Sweden)

    2012-12-15

    Due to the corrosion of steel in reinforced concrete structures, concrete with a low water-cement ratio (w/c), high cement content, and large cover thickness is conventionally used to prolong the passivation period of the steel. Obviously, this conventional approach to durable concrete structures comes at the cost of more CO₂ emission and natural resources, through consuming a higher amount of cement and more constituent materials, which works against sustainability. By placing an economically affordable conductive mesh made of carbon fiber or conductive polymer fiber in the near-surface zone of the concrete, acting as anode, we can build up a cathodic prevention system with intermittent low current density supplied by, e.g., solar cells. In such a way, aggressive negative ions such as Cl⁻, CO₃²⁻, and SO₄²⁻ can be stopped near the cathodic (steel) zone. Thus the reinforcement steel is prevented from corrosion even in concrete with relatively high w/c and small cover thickness. This conductive mesh functions not only as an electrode, but also as surface reinforcement preventing the concrete surface from cracking. Therefore, this new type of covercrete has hybrid functions. This paper presents a theoretical analysis of the feasibility of this approach and discusses the potential durability problems and possible solutions to them. (Copyright 2012 WILEY-VCH Verlag GmbH and Co. KGaA, Weinheim)

  2. Video Demo: Deep Reinforcement Learning for Coordination in Traffic Light Control

    NARCIS (Netherlands)

    van der Pol, E.; Oliehoek, F.A.; Bosse, T.; Bredeweg, B.

    2016-01-01

    This video demonstration contrasts two approaches to coordination in traffic light control using reinforcement learning: earlier work, based on a deconstruction of the state space into a linear combination of vehicle states, and our own approach based on the Deep Q-learning algorithm.

  3. Online Adaptation and Over-Trial Learning in Macaque Visuomotor Control

    Science.gov (United States)

    Braun, Daniel A.; Aertsen, Ad; Paz, Rony; Vaadia, Eilon; Rotter, Stefan; Mehring, Carsten

    2011-01-01

    When faced with unpredictable environments, the human motor system has been shown to develop optimized adaptation strategies that allow for online adaptation during the control process. Such online adaptation is to be contrasted to slower over-trial learning that corresponds to a trial-by-trial update of the movement plan. Here we investigate the interplay of both processes, i.e., online adaptation and over-trial learning, in a visuomotor experiment performed by macaques. We show that simple non-adaptive control schemes fail to perform in this task, but that a previously suggested adaptive optimal feedback control model can explain the observed behavior. We also show that over-trial learning as seen in learning and aftereffect curves can be explained by learning in a radial basis function network. Our results suggest that both the process of over-trial learning and the process of online adaptation are crucial to understand visuomotor learning. PMID:21720526

  4. Developing rapid methods for analyzing upland riparian functions and values.

    Science.gov (United States)

    Hruby, Thomas

    2009-06-01

    Regulators protecting riparian areas need to understand the integrity, health, beneficial uses, functions, and values of this resource. Up to now most methods providing information about riparian areas are based on analyzing condition or integrity. These methods, however, provide little information about functions and values. Different methods are needed that specifically address this aspect of riparian areas. In addition to information on functions and values, regulators have very specific needs that include: an analysis at the site scale, low cost, usability, and inclusion of policy interpretations. To meet these needs a rapid method has been developed that uses a multi-criteria decision matrix to categorize riparian areas in Washington State, USA. Indicators are used to identify the potential of the site to provide a function, the potential of the landscape to support the function, and the value the function provides to society. To meet legal needs fixed boundaries for assessment units are established based on geomorphology, the distance from "Ordinary High Water Mark" and different categories of land uses. Assessment units are first classified based on ecoregions, geomorphic characteristics, and land uses. This simplifies the data that need to be collected at a site, but it requires developing and calibrating a separate model for each "class." The approach to developing methods is adaptable to other locations as its basic structure is not dependent on local conditions.

  5. Reinforcement Learning for Routing in Cognitive Radio Ad Hoc Networks

    Directory of Open Access Journals (Sweden)

    Hasan A. A. Al-Rawi

    2014-01-01

    Cognitive radio (CR) enables unlicensed users (secondary users, SUs) to sense for and exploit underutilized licensed spectrum owned by the licensed users (primary users, PUs). Reinforcement learning (RL) is an artificial intelligence approach that enables a node to observe, learn, and make appropriate decisions on action selection in order to maximize network performance. Routing enables a source node to search for a least-cost route to its destination node. While there have been increasing efforts to enhance the traditional RL approach for routing in wireless networks, this research area remains largely unexplored in the domain of routing in CR networks. This paper applies RL to routing and investigates the effects of various features of RL (i.e., the reward function, exploitation, exploration, and the learning rate) through simulation. New approaches and recommendations are proposed to enhance these features in order to improve the network performance brought about by RL in routing. Simulation results show that the RL parameters of the reward function, exploitation, exploration, and learning rate must be well regulated, and the new approaches proposed in this paper improve SUs' network performance without significantly jeopardizing PUs' network performance, specifically SUs' interference to PUs.
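
    A common baseline for the reward-function design discussed above is Q-routing, where each node maintains Q[node][neighbor] = estimated delivery cost to a fixed destination via that neighbor and updates it from the neighbor's own best estimate. The topology, link costs, and parameters below are illustrative assumptions, not the paper's CR network model.

```python
import numpy as np
rng = np.random.default_rng(4)

# Simple 4-node mesh; link costs are random placeholders for channel quality.
links = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
cost = {(a, b): 1.0 + rng.random() for a in links for b in links[a]}
dest, alpha, eps = 3, 0.3, 0.1

Q = {n: {m: 10.0 for m in links[n]} for n in links}    # pessimistic initial costs
for packet in range(2000):
    node = 0
    while node != dest:
        nbrs = links[node]
        nxt = rng.choice(nbrs) if rng.random() < eps \
            else min(nbrs, key=lambda m: Q[node][m])
        remaining = 0.0 if nxt == dest else min(Q[nxt].values())
        # learning signal: hop cost plus the neighbor's best remaining estimate
        Q[node][nxt] += alpha * (cost[(node, nxt)] + remaining - Q[node][nxt])
        node = nxt
print("estimated least cost from node 0:", min(Q[0].values()))
```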

  6. Learning in robotic manipulation: The role of dimensionality reduction in policy search methods. Comment on "Hand synergies: Integration of robotics and neuroscience for understanding the control of biological and artificial hands" by Marco Santello et al.

    Science.gov (United States)

    Ficuciello, Fanny; Siciliano, Bruno

    2016-07-01

    A question that often arises, among researchers working on artificial hands and robotic manipulation, concerns the real meaning of synergies. Namely, are they a realistic representation of the central nervous system control of manipulation activities at different levels and of the sensory-motor manipulation apparatus of the human being, or do they constitute just a theoretical framework exploiting analytical methods to simplify the representation of grasping and manipulation activities? Apparently, this is not a simple question to answer and, in this regard, many minds from the field of neuroscience and robotics are addressing the issue [1]. The interest of robotics is definitely oriented towards the adoption of synergies to tackle the control problem of devices with high number of degrees of freedom (DoFs) which are required to achieve motor and learning skills comparable to those of humans. The synergy concept is useful for innovative underactuated design of anthropomorphic hands [2], while the resulting dimensionality reduction simplifies the control of biomedical devices such as myoelectric hand prostheses [3]. Synergies might also be useful in conjunction with the learning process [4]. This aspect is less explored since few works on synergy-based learning have been realized in robotics. In learning new tasks through trial-and-error, physical interaction is important. On the other hand, advanced mechanical designs such as tendon-driven actuation, underactuated compliant mechanisms and hyper-redundant/continuum robots might exhibit enhanced capabilities of adapting to changing environments and learning from exploration. In particular, high DoFs and compliance increase the complexity of modelling and control of these devices. An analytical approach to manipulation planning requires a precise model of the object, an accurate description of the task, and an evaluation of the object affordance, which all make the process rather time consuming. The integration of

  7. Optimizing microstimulation using a reinforcement learning framework.

    Science.gov (United States)

    Brockmeier, Austin J; Choi, John S; Distasio, Marcello M; Francis, Joseph T; Príncipe, José C

    2011-01-01

    The ability to provide sensory feedback is desired to enhance the functionality of neuroprosthetics. Somatosensory feedback provides closed-loop control to the motor system, which is lacking in feedforward neuroprosthetics. In the case of existing somatosensory function, the natural response can be used as a template of the desired response elicited by electrical microstimulation. In the case of no initial training data, microstimulation parameters that produce responses close to the template must be selected in an online manner. We propose using reinforcement learning as a framework to balance the exploration of the parameter space and the continued selection of promising parameters for further stimulation. This approach avoids an explicit model of the neural response to stimulation. We explore a preliminary architecture, treating the task as a k-armed bandit, using offline data recorded for natural touch and thalamic microstimulation, and we examine the method's efficiency in exploring the parameter space while concentrating on promising parameter forms. The best-matching stimulation parameters, from k = 68 different forms, are selected by the reinforcement learning algorithm consistently after 334 realizations.
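
    A minimal sketch of the k-armed bandit framing: each arm is a candidate stimulation parameter set, and the reward is the similarity between the evoked response and a natural-touch template. The simulated similarity scores and epsilon-greedy rule below are assumptions; the paper's exact bandit algorithm is not specified here.

```python
import numpy as np
rng = np.random.default_rng(5)

k, eps, trials = 68, 0.1, 1000          # k = 68 parameter forms, as in the study
true_similarity = rng.random(k)          # unknown template-match quality per arm

estimates = np.zeros(k)                  # running mean reward per arm
counts = np.zeros(k)
for t in range(trials):
    arm = rng.integers(k) if rng.random() < eps else int(estimates.argmax())
    reward = true_similarity[arm] + 0.1 * rng.normal()   # noisy match score
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]

print("selected parameter form:", int(estimates.argmax()),
      "| best true form:", int(true_similarity.argmax()))
```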

  8. Embedded Incremental Feature Selection for Reinforcement Learning

    Science.gov (United States)

    2012-05-01

    Prior to this work, feature selection for reinforcement learning has focused on linear value function approximation (Kolter and Ng, 2009; Parr et al.).

  9. Image Captioning with Word Gate and Adaptive Self-Critical Learning

    Directory of Open Access Journals (Sweden)

    Xinxin Zhu

    2018-06-01

    Although policy-gradient methods for reinforcement learning have shown significant improvement in image captioning, achieving high performance during the reinforcement optimization process is still not a simple task. There are at least two difficulties: (1) the large vocabulary leads to a large action space, which makes it difficult for the model to accurately predict the current word; (2) the large variance of gradient estimation in reinforcement learning usually causes severe instabilities in the training process. In this paper, we propose two innovations to boost the performance of self-critical sequence training (SCST). First, we modify the standard long short-term memory (LSTM)-based decoder by introducing a gate function to reduce the search scope of the vocabulary for any given image, which is termed the word gate decoder. Second, instead of only considering the current maximum actions greedily, we propose a stabilized gradient estimation method whose gradient variance is controlled by the difference between the sampling reward from the current model and the expectation of the historical reward. We conducted extensive experiments, and the results showed that our method can accelerate the training process and increase prediction accuracy. Our method was validated on the MS COCO dataset and yielded state-of-the-art performance.
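
    The variance-control idea can be sketched on a toy softmax policy: a REINFORCE-style update whose baseline blends the reward of the current greedy action (the "self-critical" term) with a running historical average. The toy reward model, mixing weight, and learning rate are assumptions, not the paper's captioning setup.

```python
import numpy as np
rng = np.random.default_rng(6)

n_actions, lr, mix = 5, 0.1, 0.5
true_reward = np.array([0.1, 0.3, 0.9, 0.2, 0.5])   # placeholder reward model
theta = np.zeros(n_actions)                          # policy logits
hist_baseline = 0.0                                  # historical reward tracker

for step in range(2000):
    p = np.exp(theta - theta.max()); p /= p.sum()
    a = rng.choice(n_actions, p=p)                   # sampled action
    r = true_reward[a] + 0.05 * rng.normal()         # sampled (noisy) reward
    r_greedy = true_reward[int(p.argmax())]          # reward of greedy action
    baseline = mix * r_greedy + (1 - mix) * hist_baseline   # blended baseline
    grad = -p.copy(); grad[a] += 1.0                 # d log p(a) / d theta
    theta += lr * (r - baseline) * grad              # advantage-weighted update
    hist_baseline += 0.05 * (r - hist_baseline)      # update historical average

print("learned preferred action:", int(theta.argmax()))
```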

  10. Neural network-based model reference adaptive control system.

    Science.gov (United States)

    Patino, H D; Liu, D

    2000-01-01

    In this paper, an approach to model reference adaptive control based on neural networks is proposed and analyzed for a class of first-order continuous-time nonlinear dynamical systems. The controller structure can employ either a radial basis function network or a feedforward neural network to adaptively compensate for the nonlinearities in the plant. A stable controller-parameter adjustment mechanism, determined using the Lyapunov theory, is constructed using a sigma-modification-type updating law. The control error is evaluated in terms of the neural network learning error; that is, the control error converges asymptotically to a neighborhood of zero, whose size depends on the approximation error of the neural network. In the design and analysis of neural network-based control systems, it is important to take into account the neural network learning error and its influence on the control error of the plant. Simulation results showing the feasibility and performance of the proposed approach are given.
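
    A rough sketch of this style of controller for a first-order plant x' = f(x) + u: an RBF network adaptively cancels the unknown f(x), and a sigma-modification term keeps the weights bounded. The plant nonlinearity, reference model, and gains below are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

dt, T = 0.001, 10.0
a_m, k_e, gamma, sigma = 2.0, 3.0, 30.0, 0.01     # model pole, error gain, rates
centers = np.linspace(-2.0, 2.0, 11)

def rbf(x):
    return np.exp(-(x - centers) ** 2 / 0.5)      # Gaussian basis functions

w = np.zeros_like(centers)                        # network weights
x, x_m = 0.5, 0.0                                 # plant and reference-model states
for step in range(int(T / dt)):
    t = step * dt
    r = np.sin(t)                                 # reference command
    f = x ** 2 * np.tanh(x)                       # unknown nonlinearity (sim only)
    e = x - x_m                                   # model-following error
    u = -w @ rbf(x) - a_m * x + a_m * r - k_e * e # cancel f, match reference model
    w += dt * (gamma * e * rbf(x) - sigma * gamma * w)  # sigma-modification law
    x += dt * (f + u)                             # plant integration
    x_m += dt * (-a_m * x_m + a_m * r)            # reference model integration
print("final model-following error:", abs(x - x_m))
```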

  11. An Improved Reinforcement Learning System Using Affective Factors

    Directory of Open Access Journals (Sweden)

    Takashi Kuremoto

    2013-07-01

    As a powerful and intelligent machine learning method, reinforcement learning (RL) has been widely used in many fields such as game theory, adaptive control, multi-agent systems, nonlinear forecasting, and so on. The main contribution of this technique is its exploration and exploitation approach to finding the optimal or a semi-optimal solution of goal-directed problems. However, when RL is applied to multi-agent systems (MASs), problems such as the "curse of dimensionality", the "perceptual aliasing problem", and uncertainty of the environment constitute high hurdles for RL. Meanwhile, although RL is inspired by behavioral psychology and uses reward/punishment from the environment, higher mental factors such as affects, emotions, and motivations are rarely adopted in the learning procedure of RL. In this paper, to address the challenges agents face when learning in MASs, we propose a computational motivation function, which adopts the two principal affective factors "arousal" and "pleasure" of Russell's circumplex model of affect, to improve the learning performance of a conventional RL algorithm named Q-learning (QL). Compared with conventional QL, computer simulations of pursuit problems with static and dynamic prey were carried out, and the results showed that the proposed method gives agents a faster and more stable learning performance.
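
    A toy sketch of the general idea: modulate Q-learning's effective reward with a motivation signal built from two affective factors, here "arousal" taken as state novelty and "pleasure" as a recent-reward trend. This specific functional form is an assumption for illustration; the paper's own motivation function is not reproduced here.

```python
import numpy as np
rng = np.random.default_rng(7)

n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))
visits = np.zeros(n_states)
alpha, gamma, eps, pleasure = 0.2, 0.9, 0.1, 0.0

def motivation(s, r):
    arousal = 1.0 / np.sqrt(1.0 + visits[s])   # high for rarely visited states
    return r + 0.1 * arousal + 0.1 * pleasure  # affect-modulated reward (assumed form)

for episode in range(200):
    s = 0
    for step in range(30):
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s2 = (s + a + 1) % n_states            # toy deterministic transitions
        r = 1.0 if s2 == n_states - 1 else 0.0
        visits[s2] += 1
        pleasure += 0.05 * (r - pleasure)      # slow reward-trend tracker
        Q[s, a] += alpha * (motivation(s2, r) + gamma * Q[s2].max() - Q[s, a])
        s = s2
print("value of start state:", Q[0].max())
```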

  12. The social value of mortality risk reduction: VSL versus the social welfare function approach.

    Science.gov (United States)

    Adler, Matthew D; Hammitt, James K; Treich, Nicolas

    2014-05-01

    We examine how different welfarist frameworks evaluate the social value of mortality risk reduction. These frameworks include classical, distributively unweighted cost-benefit analysis, i.e., the "value per statistical life" (VSL) approach, and various social welfare functions (SWFs). The SWFs are either utilitarian or prioritarian, applied to policy choice under risk in either an "ex post" or "ex ante" manner. We examine the conditions on individual utility and on the SWF under which these frameworks display sensitivity to wealth and to baseline risk. Moreover, we discuss whether these frameworks satisfy related properties that have received some attention in the literature, namely equal value of risk reduction, preference for risk equity, and catastrophe aversion. We show that the particular manner in which VSL ranks risk-reduction measures is not necessarily shared by other welfarist frameworks. Copyright © 2014 Elsevier B.V. All rights reserved.

  13. Human demonstrations for fast and safe exploration in reinforcement learning

    NARCIS (Netherlands)

    Schonebaum, G.K.; Junell, J.L.; van Kampen, E.

    2017-01-01

    Reinforcement learning is a promising framework for controlling complex vehicles with a high level of autonomy, since it does not need a dynamic model of the vehicle, and it is able to adapt to changing conditions. When learning from scratch, the performance of a reinforcement learning controller

  14. Connection of functional quality of partial removable dentures and the degree of patients' phonetic adaptation.

    Science.gov (United States)

    Artjomenko, Victoria; Vidzis, Aldis; Zigurs, Guntis

    2015-01-01

    Phonetic adaptation is a complex biological phenomenon with a highly individual course, depending on the patient's motivation to use the prosthesis and on the functional quality of the removable dentures. The aim of the study was to estimate phonetic adaptation in patients with partial dentures, connecting it to alteration in speech quality and the dentures' functional value. We examined some peculiarities of phonetic adaptation in 50 patients with removable dentures (50 patients with natural dentition were invited for the control group). Standardized evaluation protocols (12 speech-quality-determining parameters) were developed separately for Latvian and Russian native speakers. 500 speech video samples were recorded and analysed according to pre-established guidelines. The connection between speech quality and the functional quality of the dentures was assessed. Statistical analysis was performed using SPSS 20.0. P values equal to or less than 0.05 were considered statistically significant. In patients with appropriate functional quality of removable dentures, distorted speech production was detected in 25% (pk=0.008) of cases, and in patients with inappropriate functional quality of the prosthesis, in 40% of cases. Patients with appropriate dentures' functional value were satisfied with their speech performance in 96% (pk=0.674) of cases; in the group with inappropriate dentures' functional value, only 59% were. Adaptation to removable dentures depends on the patient's individual adaptation capacity, prosthetic design and functional value. Thus a statistically significant correlation between removable partial dentures' functional value, duration of usage and the degree of patients' phonetic adaptation (p<0.001) may be considered confirmed.

  15. Switching Reinforcement Learning for Continuous Action Space

    Science.gov (United States)

    Nagayoshi, Masato; Murao, Hajime; Tamaki, Hisashi

    Reinforcement learning (RL) attracts much attention as a technique for realizing computational intelligence, such as adaptive and autonomous decentralized systems. In general, however, it is not easy to put RL into practical use. One difficulty is the problem of designing a suitable action space for an agent, i.e., satisfying two requirements in trade-off: (i) keeping the characteristics (or structure) of the original search space as much as possible in order to seek strategies that lie close to the optimal, and (ii) reducing the search space as much as possible in order to expedite the learning process. In order to design a suitable action space adaptively, we propose a switching RL model that mimics the process of an infant's motor development, in which gross motor skills develop before fine motor skills. A method for switching controllers is then constructed by introducing and referring to the "entropy". Further, through computational experiments using robot navigation problems with one- and two-dimensional continuous action spaces, the validity of the proposed method has been confirmed.
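
    A rough sketch of the switching idea: start with a coarse action set and switch to a finer one once the normalized entropy of the softmax over action values drops below a threshold, mimicking gross-before-fine motor development. The one-state bandit task, temperature, and threshold are assumptions, not the paper's experimental setup.

```python
import numpy as np
rng = np.random.default_rng(8)

coarse = np.array([-1.0, 1.0])                 # gross actions
fine = np.linspace(-1.0, 1.0, 9)               # fine actions
Q = {"coarse": np.zeros(len(coarse)), "fine": np.zeros(len(fine))}
mode, alpha, tau, threshold = "coarse", 0.2, 0.1, 0.6

def entropy(q):
    """Normalized entropy of the softmax over action values."""
    p = np.exp(q / tau - (q / tau).max()); p /= p.sum()
    return -(p * np.log(p + 1e-12)).sum() / np.log(len(p))

for trial in range(500):
    acts = coarse if mode == "coarse" else fine
    q = Q[mode]
    i = rng.integers(len(acts)) if rng.random() < 0.1 else int(q.argmax())
    reward = -abs(acts[i] - 0.25)              # toy target action at 0.25
    q[i] += alpha * (reward - q[i])
    if mode == "coarse" and entropy(q) < threshold:
        mode = "fine"                          # coarse values settled; refine
print("final mode:", mode,
      "| best fine action:", fine[int(Q["fine"].argmax())])
```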

  16. The Reinforcement Learning Competition 2014

    OpenAIRE

    Dimitrakakis, Christos; Li, Guangliang; Tziortziotis, Nikoalos

    2014-01-01

    Reinforcement learning is one of the most general problems in artificial intelligence. It has been used to model problems in automated experiment design, control, economics, game playing, scheduling and telecommunications. The aim of the reinforcement learning competition is to encourage the development of very general learning agents for arbitrary reinforcement learning problems and to provide a test-bed for the unbiased evaluation of algorithms.

  17. Balancing Exploration, Uncertainty Representation and Computational Time in Many-Objective Reservoir Policy Optimization

    Science.gov (United States)

    Zatarain-Salazar, J.; Reed, P. M.; Quinn, J.; Giuliani, M.; Castelletti, A.

    2016-12-01

    As we confront the challenges of managing river basin systems with a large number of reservoirs and increasingly uncertain tradeoffs impacting their operations (due to, e.g. climate change, changing energy markets, population pressures, ecosystem services, etc.), evolutionary many-objective direct policy search (EMODPS) solution strategies will need to address the computational demands associated with simulating more uncertainties and therefore optimizing over increasingly noisy objective evaluations. Diagnostic assessments of state-of-the-art many-objective evolutionary algorithms (MOEAs) to support EMODPS have highlighted that search time (or number of function evaluations) and auto-adaptive search are key features for successful optimization. Furthermore, auto-adaptive MOEA search operators are themselves sensitive to having a sufficient number of function evaluations to learn successful strategies for exploring complex spaces and for escaping from local optima when stagnation is detected. Fortunately, recent parallel developments allow coordinated runs that enhance auto-adaptive algorithmic learning and can handle scalable and reliable search with limited wall-clock time, but at the expense of the total number of function evaluations. In this study, we analyze this tradeoff between parallel coordination and depth of search using different parallelization schemes of the Multi-Master Borg on a many-objective stochastic control problem. We also consider the tradeoff between better representing uncertainty in the stochastic optimization, and simplifying this representation to shorten the function evaluation time and allow for greater search. Our analysis focuses on the Lower Susquehanna River Basin (LSRB) system where multiple competing objectives for hydropower production, urban water supply, recreation and environmental flows need to be balanced. Our results provide guidance for balancing exploration, uncertainty, and computational demands when using the EMODPS

  18. Joint Extraction of Entities and Relations Using Reinforcement Learning and Deep Learning

    Directory of Open Access Journals (Sweden)

    Yuntian Feng

    2017-01-01

    We use both reinforcement learning and deep learning to simultaneously extract entities and relations from unstructured texts. For reinforcement learning, we model the task as a two-step decision process. Deep learning is used to automatically capture the most important information from unstructured texts, which represents the state in the decision process. By designing the reward function per step, our proposed method can pass the information of entity extraction to relation extraction and obtain feedback in order to extract entities and relations simultaneously. Firstly, we use a bidirectional LSTM to model the context information, which realizes preliminary entity extraction. On the basis of the extraction results, an attention-based method represents the sentences that include the target entity pair to generate the initial state in the decision process. Then we use a Tree-LSTM to represent relation mentions to generate the transition state in the decision process. Finally, we employ the Q-learning algorithm to obtain the control policy π in the two-step decision process. Experiments on ACE2005 demonstrate that our method attains better performance than the state-of-the-art method and achieves a 2.4% increase in recall score.
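
    The two-step decision structure can be illustrated with a stripped-down tabular stand-in: step 1 picks an entity-pair hypothesis, step 2 picks a relation label, and full reward is granted only when both are correct, so credit flows from relation extraction back to entity extraction. The synthetic labels and per-step rewards below are assumptions in place of the paper's neural state representations.

```python
import numpy as np
rng = np.random.default_rng(9)

n_pairs, n_rels = 5, 4
true_pair, true_rel = 2, 1                 # synthetic ground truth
Q1 = np.zeros(n_pairs)                     # step-1 values (entity decision)
Q2 = np.zeros((n_pairs, n_rels))           # step-2 values (relation decision)
alpha, gamma, eps = 0.2, 0.9, 0.2

for episode in range(1000):
    p = rng.integers(n_pairs) if rng.random() < eps else int(Q1.argmax())
    r1 = 0.5 if p == true_pair else 0.0     # partial per-step reward for entities
    rel = rng.integers(n_rels) if rng.random() < eps else int(Q2[p].argmax())
    r2 = 1.0 if (p == true_pair and rel == true_rel) else 0.0
    Q2[p, rel] += alpha * (r2 - Q2[p, rel])
    # step-1 value bootstraps from the best step-2 value: feedback flows back
    Q1[p] += alpha * (r1 + gamma * Q2[p].max() - Q1[p])

best_pair = int(Q1.argmax())
print("chosen pair:", best_pair, "| chosen relation:", int(Q2[best_pair].argmax()))
```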

  19. Joint Extraction of Entities and Relations Using Reinforcement Learning and Deep Learning.

    Science.gov (United States)

    Feng, Yuntian; Zhang, Hongjun; Hao, Wenning; Chen, Gang

    2017-01-01

    We use both reinforcement learning and deep learning to simultaneously extract entities and relations from unstructured texts. For reinforcement learning, we model the task as a two-step decision process. Deep learning is used to automatically capture the most important information from unstructured texts, which represent the state in the decision process. By designing the reward function per step, our proposed method can pass the information of entity extraction to relation extraction and obtain feedback in order to extract entities and relations simultaneously. Firstly, we use bidirectional LSTM to model the context information, which realizes preliminary entity extraction. On the basis of the extraction results, attention based method can represent the sentences that include target entity pair to generate the initial state in the decision process. Then we use Tree-LSTM to represent relation mentions to generate the transition state in the decision process. Finally, we employ Q-Learning algorithm to get control policy π in the two-step decision process. Experiments on ACE2005 demonstrate that our method attains better performance than the state-of-the-art method and gets a 2.4% increase in recall score.

  20. Examining the reinforcing value of stimuli within social and non-social contexts in children with and without high-functioning autism.

    Science.gov (United States)

    Goldberg, Melissa C; Allman, Melissa J; Hagopian, Louis P; Triggs, Mandy M; Frank-Crawford, Michelle A; Mostofsky, Stewart H; Denckla, Martha B; DeLeon, Iser G

    2017-10-01

    One of the key diagnostic criteria for autism spectrum disorder includes impairments in social interactions. This study compared the extent to which boys with high-functioning autism and typically developing boys "value" engaging in activities with a parent or alone. Two different assessments that can empirically determine the relative reinforcing value of social and non-social stimuli were employed: paired-choice preference assessments and progressive-ratio schedules. There were no significant differences between boys with high-functioning autism and typically developing boys on either measure. Moreover, there was a strong correspondence in performance across these two measures for participants in each group. These results suggest that the relative reinforcing value of engaging in activities with a primary caregiver is not diminished for children with autism spectrum disorder.

  1. Mastering the game of Go with deep neural networks and tree search

    Science.gov (United States)

    Silver, David; Huang, Aja; Maddison, Chris J.; Guez, Arthur; Sifre, Laurent; van den Driessche, George; Schrittwieser, Julian; Antonoglou, Ioannis; Panneershelvam, Veda; Lanctot, Marc; Dieleman, Sander; Grewe, Dominik; Nham, John; Kalchbrenner, Nal; Sutskever, Ilya; Lillicrap, Timothy; Leach, Madeleine; Kavukcuoglu, Koray; Graepel, Thore; Hassabis, Demis

    2016-01-01

    The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses ‘value networks’ to evaluate board positions and ‘policy networks’ to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.
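
    One concrete piece of the combined search alluded to above is the leaf-evaluation rule: a leaf's value mixes the value network's estimate with a fast rollout outcome, V(s) = (1 − λ)·v(s) + λ·z. The sketch below uses stub functions in place of the trained networks and rollout policy; λ = 0.5 is the mixing choice reported for AlphaGo, and everything else is a placeholder.

```python
import random

LAM = 0.5                      # mixing parameter between network and rollout

def value_network(state):
    """Stub for v(s): a trained position evaluator returning a value in [-1, 1]."""
    return 0.2

def fast_rollout(state):
    """Stub for z: play the position to the end with a fast policy, return outcome."""
    return random.choice([-1.0, 1.0])

def evaluate_leaf(state):
    """Blend the slow-but-informed network estimate with a quick rollout sample."""
    return (1.0 - LAM) * value_network(state) + LAM * fast_rollout(state)

print(evaluate_leaf("example-position"))
```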

  2. Finding function: evaluation methods for functional genomic data

    Directory of Open Access Journals (Sweden)

    Barrett Daniel R

    2006-07-01

    Background: Accurate evaluation of the quality of genomic or proteomic data and computational methods is vital to our ability to use them for formulating novel biological hypotheses and directing further experiments. There is currently no standard approach to evaluation in functional genomics. Our analysis of existing approaches shows that they are inconsistent and contain substantial functional biases that render the resulting evaluations misleading both quantitatively and qualitatively. These problems make it essentially impossible to compare computational methods or large-scale experimental datasets and also result in conclusions that generalize poorly in most biological applications. Results: We reveal issues with current evaluation methods here and suggest new approaches to evaluation that facilitate accurate and representative characterization of genomic methods and data. Specifically, we describe a functional genomics gold standard based on curation by expert biologists and demonstrate its use as an effective means of evaluation of genomic approaches. Our evaluation framework and gold standard are freely available to the community through our website. Conclusion: Proper methods for evaluating genomic data and computational approaches will determine how much we, as a community, are able to learn from the wealth of available data. We propose one possible solution to this problem here but emphasize that this topic warrants broader community discussion.

  3. Scheduled power tracking control of the wind-storage hybrid system based on the reinforcement learning theory

    Science.gov (United States)

    Li, Ze

    2017-09-01

    To cope with the intermittency and uncertainty of wind power, an energy storage system and a wind generator are combined into a hybrid system to improve the controllability of the output power. A scheduled power tracking control method is proposed based on reinforcement learning theory and the Q-learning algorithm. In this method, the state space of the environment is formed from two key factors, i.e., the state of charge of the energy storage and the difference between the actual wind power and the scheduled power; the feasible action is the output power of the energy storage; and the corresponding immediate reward function is designed to reflect the rationality of the control action. By interacting with the environment and learning from the immediate reward, the optimal control strategy is gradually formed. After that, it can be applied to the scheduled power tracking control of the hybrid system. Finally, the rationality and validity of the method are verified through simulation examples.
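
    A minimal sketch of the described scheme: discretize the state (state of charge, wind-minus-schedule difference), let the action be the storage output power, and reward the negative tracking error with penalties for hitting charge limits. All plant numbers, bin counts, and the i.i.d. wind-deviation model are illustrative assumptions.

```python
import numpy as np
rng = np.random.default_rng(10)

soc_bins, diff_bins = 5, 5
actions = np.linspace(-1.0, 1.0, 5)            # storage power, per-unit (discharge > 0)
Q = np.zeros((soc_bins, diff_bins, len(actions)))
alpha, gamma, eps = 0.1, 0.9, 0.1

def bucket(x, lo, hi, n):
    return int(np.clip((x - lo) / (hi - lo) * n, 0, n - 1))

soc = 0.5
for step in range(20000):
    diff = rng.normal(0.0, 0.3)                # wind power minus scheduled power
    s = (bucket(soc, 0, 1, soc_bins), bucket(diff, -1, 1, diff_bins))
    a = rng.integers(len(actions)) if rng.random() < eps else int(Q[s].argmax())
    p_store = actions[a]
    track_err = abs(diff + p_store)            # residual deviation from schedule
    soc_next = np.clip(soc - 0.05 * p_store, 0.0, 1.0)
    penalty = 1.0 if soc_next in (0.0, 1.0) else 0.0   # charge-limit violation
    r = -track_err - penalty                   # immediate reward
    # next wind deviation assumed i.i.d.; reuse the current bucket as a simplification
    s2 = (bucket(soc_next, 0, 1, soc_bins), bucket(diff, -1, 1, diff_bins))
    Q[s][a] += alpha * (r + gamma * Q[s2].max() - Q[s][a])
    soc = soc_next
print("learned action at mid-SOC under a schedule shortfall:",
      actions[int(Q[2, 1].argmax())])
```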

  4. Adaptive Functioning in Williams Syndrome: A Systematic Review

    Science.gov (United States)

    Brawn, Gabrielle; Porter, Melanie

    2018-01-01

    Literature on the level of adaptive functioning, and on relative strengths and weaknesses in functioning, of individuals with Williams syndrome (WS) was reviewed. The electronic databases PsycINFO, PubMed, Expanded Academic, Web of Science, Scopus and ProQuest were searched for relevant articles and dissertations using the search terms…

  5. Beyond adaptive-critic creative learning for intelligent mobile robots

    Science.gov (United States)

    Liao, Xiaoqun; Cao, Ming; Hall, Ernest L.

    2001-10-01

    Intelligent industrial and mobile robots may be considered proven technology in structured environments. Teach programming and supervised learning methods permit solutions to a variety of applications. However, we believe that extending the operation of these machines to more unstructured environments requires a new learning method. Both unsupervised learning and reinforcement learning are potential candidates for these new tasks. The adaptive critic method has been shown to provide useful approximations or even optimal control policies for non-linear systems. The purpose of this paper is to explore the use of new learning methods that go beyond the adaptive critic method for unstructured environments. The adaptive critic is a form of reinforcement learning: a critic element provides only high-level grading corrections to a cognition module that controls the action module. In the proposed system the critic's grades are modeled and forecasted, so that an anticipated set of sub-grades is available to the cognition module. The forecasted grades are interpolated and are available on the time scale needed by the action model. The success of the system is highly dependent on the accuracy of the forecasted grades and the adaptability of the action module. Examples from the guidance of a mobile robot are provided to illustrate the method for simple line following and for the more complex navigation and control in an unstructured environment. The theory presented here, which goes beyond the adaptive critic, may be called creative theory. Creative theory is a form of learning that models the highest level of human learning: imagination. The application of creative theory appears to extend not only to mobile robots but also to many other forms of human endeavor, such as educational learning and business forecasting. Reinforcement learning such as the adaptive critic may be applied to known problems to aid in the discovery of their solutions. The significance of creative theory is that it

  6. Rational and Mechanistic Perspectives on Reinforcement Learning

    Science.gov (United States)

    Chater, Nick

    2009-01-01

    This special issue describes important recent developments in applying reinforcement learning models to capture neural and cognitive function. But reinforcement learning, as a theoretical framework, can apply at two very different levels of description: "mechanistic" and "rational." Reinforcement learning is often viewed in mechanistic terms--as…

  7. [Adaptive behaviour and learning in children with neurodevelopmental disorders (autism spectrum disorders and attention deficit hyperactivity disorder). Effects of executive functioning].

    Science.gov (United States)

    Rosello-Miranda, B; Berenguer-Forner, C; Miranda-Casas, A

    2018-03-01

    Autism spectrum disorder (ASD) and attention deficit hyperactivity disorder (ADHD) present difficulties in adaptive functioning and learning, possibly associated with the failures in executive functioning characteristic of both disorders. The aim was to analyze the impact of executive functioning on the adaptive behaviors of socialization and daily living and on learning behaviors in children with ASD and children with ADHD. The participants were 124 children matched in age and intellectual quotient: 37 children with typical development, 52 children with ASD and 35 children with ADHD. Parents reported on their children's adaptive behaviors, while teachers provided information on learning behaviors and executive functioning in daily life. Both the ASD and ADHD groups differed significantly from the typical development group in all domains evaluated. In addition, the group with ASD had worse socialization skills, while persistence in learning was more affected in children with ADHD. Finally, the metacognitive index of executive functioning predicted the socialization and persistence of children with ASD, whereas the behavioral regulation index and the educational level of the parents predicted the socialization skills of children with ADHD. The results highlight the need to include differentiated executive strategies in interventions for children with ASD and children with ADHD.

  8. Policy improvement by a model-free Dyna architecture.

    Science.gov (United States)

    Hwang, Kao-Shing; Lo, Chia-Yue

    2013-05-01

    The objective of this paper is to accelerate the process of policy improvement in reinforcement learning. The proposed Dyna-style system combines two learning schemes, one of which utilizes a temporal difference method for direct learning; the other uses relative values for indirect learning in planning between two successive direct learning cycles. Instead of establishing a complicated world model, the approach introduces a simple predictor of average rewards to actor-critic architecture in the simulation (planning) mode. The relative value of a state, defined as the accumulated differences between immediate reward and average reward, is used to steer the improvement process in the right direction. The proposed learning scheme is applied to control a pendulum system for tracking a desired trajectory to demonstrate its adaptability and robustness. Through reinforcement signals from the environment, the system takes the appropriate action to drive an unknown dynamic to track desired outputs in few learning cycles. Comparisons are made between the proposed model-free method, a connectionist adaptive heuristic critic, and an advanced method of Dyna-Q learning in the experiments of labyrinth exploration. The proposed method outperforms its counterparts in terms of elapsed time and convergence rate.
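
    A minimal sketch of the relative-value bookkeeping described above: a running average-reward estimate replaces a complicated world model, and states are scored by the accumulated difference between immediate and average reward to steer planning. Names and update constants are illustrative assumptions, not the paper's system.

```python
class RelativeValuePredictor:
    def __init__(self, n_states, beta=0.01):
        self.avg_r = 0.0                 # running average reward
        self.rel_v = [0.0] * n_states    # relative value per state
        self.beta = beta

    def update(self, state, r):
        # Relative value: accumulated (immediate reward - average reward).
        self.rel_v[state] += r - self.avg_r
        self.avg_r += self.beta * (r - self.avg_r)

    def better(self, s_a, s_b):
        # Steer the planning (simulation) mode toward states whose
        # relative value points in the right direction.
        return s_a if self.rel_v[s_a] >= self.rel_v[s_b] else s_b
```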

  9. A Q-Learning Approach to Flocking With UAVs in a Stochastic Environment.

    Science.gov (United States)

    Hung, Shao-Ming; Givigi, Sidney N

    2017-01-01

    In the past two decades, unmanned aerial vehicles (UAVs) have demonstrated their efficacy in supporting both military and civilian applications, where tasks can be dull, dirty, dangerous, or simply too costly with conventional methods. Many of the applications contain tasks that can be executed in parallel, hence the natural progression is to deploy multiple UAVs working together as a force multiplier. However, to do so requires autonomous coordination among the UAVs, similar to swarming behaviors seen in animals and insects. This paper looks at flocking with small fixed-wing UAVs in the context of a model-free reinforcement learning problem. In particular, Peng's Q(λ) with a variable learning rate is employed by the followers to learn a control policy that facilitates flocking in a leader-follower topology. The problem is structured as a Markov decision process, where the agents are modeled as small fixed-wing UAVs that experience stochasticity due to disturbances such as winds and control noises, as well as weight and balance issues. Learned policies are compared to ones solved using stochastic optimal control (i.e., dynamic programming) by evaluating the average cost incurred during flight according to a cost function. Simulation results demonstrate the feasibility of the proposed learning approach at enabling agents to learn how to flock in a leader-follower topology, while operating in a nonstationary stochastic environment.
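
    A sketch of tabular Q(λ) with eligibility traces, for orientation. The paper employs Peng's Q(λ) with a variable learning rate; the simpler Watkins-style variant shown here (traces cut after exploratory actions) is a stand-in, and the env.reset()/env.step() interface is an assumption.

```python
import numpy as np

def q_lambda(env, n_s, n_a, episodes=500, alpha=0.1, gamma=0.95,
             lam=0.9, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_s, n_a))
    for _ in range(episodes):
        E = np.zeros_like(Q)                 # eligibility traces
        s = env.reset()
        done = False
        while not done:
            greedy = int(np.argmax(Q[s]))
            a = rng.integers(n_a) if rng.random() < eps else greedy
            s2, r, done = env.step(a)
            delta = r + gamma * np.max(Q[s2]) * (not done) - Q[s, a]
            E[s, a] += 1.0                   # accumulating trace
            Q += alpha * delta * E
            # Watkins: zero the traces after an exploratory action.
            E *= gamma * lam if a == greedy else 0.0
            s = s2
    return Q
```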

  10. Stochastic abstract policies: generalizing knowledge to improve reinforcement learning.

    Science.gov (United States)

    Koga, Marcelo L; Freire, Valdinei; Costa, Anna H R

    2015-01-01

    Reinforcement learning (RL) enables an agent to learn behavior by acquiring experience through trial-and-error interactions with a dynamic environment. However, knowledge is usually built from scratch, and learning to behave may take a long time. Here, we improve learning performance by leveraging prior knowledge; that is, the learner shows proper behavior from the beginning of a target task, using the knowledge from a set of known, previously solved, source tasks. In this paper, we argue that building stochastic abstract policies that generalize over past experiences is an effective way to provide such improvement, and that this generalization outperforms the current practice of using a library of policies. We achieve this by contributing a new algorithm, AbsProb-PI-multiple, and a framework for transferring knowledge represented as a stochastic abstract policy to new RL tasks. Stochastic abstract policies offer an effective way to encode knowledge because the abstraction they provide not only generalizes solutions but also facilitates extracting the similarities among tasks. We perform experiments in a robotic navigation environment, analyze the agent's behavior throughout the learning process, and assess the transfer ratio for different numbers of source tasks. We compare our method with the transfer of a library of policies, and experiments show that the use of a generalized policy produces better results by more effectively guiding the agent when learning a target task.

  11. Evaluation and Policy Learning

    DEFF Research Database (Denmark)

    Borrás, Susana; Højlund, Steven

    2015-01-01

    This article examines how evaluation induces policy learning – a question largely neglected by the scholarly literature on evaluation and policy learning. Following a learner's perspective, the article attempts to ascertain who the learners are, and what, and how, learners actually learn from evaluations. In so doing, it focuses on what different types of learners actually learn within the context of the evaluation framework (the set of administrative structures defining the evaluation goals and process). Taking the empirical case of three EU programme evaluations, the patterns of policy learning emanating from them are examined. The findings are that only two types of actors involved in the evaluation are actually learning (programme units and external evaluators), that learners learn different things (programme overview, small-scale programme adjustments, policy change and evaluation methods…

  12. Distance learning education for mitigation/adaptation policy: a case study

    Science.gov (United States)

    Slini, T.; Giama, E.; Papadopoulou, Ch.-O.

    2016-02-01

    The efficient training of young environmental scientists has proven to be a challenging goal in recent years, while several dynamic initiatives have been developed with the aim of providing complete and consistent education. A successful example is the e-learning course 'Development of mitigation/adaptation policy portfolios', intended mainly for participants from emerging-economy countries and organised in the framework of the project Promitheas4: Knowledge transfer and research needs for preparing mitigation/adaptation policy portfolios. The course aimed to provide knowledge transfer and to build new skills and competencies using modern didactic approaches and learning technologies. The present paper reports the experience and results of these actions, which seem promising and encouraging and were broadly welcomed by the participants.

  13. Reinforcement learning for partially observable dynamic processes: adaptive dynamic programming using measured output data.

    Science.gov (United States)

    Lewis, F L; Vamvoudakis, Kyriakos G

    2011-02-01

    Approximate dynamic programming (ADP) is a class of reinforcement learning methods that have shown their importance in a variety of applications, including feedback control of dynamical systems. ADP generally requires full information about the system internal states, which is usually not available in practical situations. In this paper, we show how to implement ADP methods using only measured input/output data from the system. Linear dynamical systems with deterministic behavior are considered herein, which are systems of great interest in the control system community. In control system theory, these types of methods are referred to as output feedback (OPFB). The stochastic equivalent of the systems dealt with in this paper is a class of partially observable Markov decision processes. We develop both policy iteration and value iteration algorithms that converge to an optimal controller that requires only OPFB. It is shown that, similar to Q -learning, the new methods have the important advantage that knowledge of the system dynamics is not needed for the implementation of these learning algorithms or for the OPFB control. Only the order of the system, as well as an upper bound on its "observability index," must be known. The learned OPFB controller is in the form of a polynomial autoregressive moving-average controller that has equivalent performance with the optimal state variable feedback gain.

  14. Adaptive Control and Function Projective Synchronization in 2D Discrete-Time Chaotic Systems

    International Nuclear Information System (INIS)

    Li Yin; Chen Yong; Li Biao

    2009-01-01

    This study addresses the adaptive control and function projective synchronization problems between 2D Rulkov discrete-time system and Network discrete-time system. Based on backstepping design with three controllers, a systematic, concrete and automatic scheme is developed to investigate the function projective synchronization of discrete-time chaotic systems. In addition, the adaptive control function is applied to achieve the state synchronization of two discrete-time systems. Numerical results demonstrate the effectiveness of the proposed control scheme.

  15. Self-Paced Prioritized Curriculum Learning With Coverage Penalty in Deep Reinforcement Learning.

    Science.gov (United States)

    Ren, Zhipeng; Dong, Daoyi; Li, Huaxiong; Chen, Chunlin

    2018-06-01

    In this paper, a new training paradigm is proposed for deep reinforcement learning using self-paced prioritized curriculum learning with coverage penalty. The proposed deep curriculum reinforcement learning (DCRL) takes full advantage of experience replay by adaptively selecting appropriate transitions from replay memory based on the complexity of each transition. The criteria of complexity in DCRL consist of self-paced priority as well as coverage penalty. The self-paced priority reflects the relationship between the temporal-difference error and the difficulty of the current curriculum, for sample efficiency. The coverage penalty is taken into account for sample diversity. In comparison with the deep Q network (DQN) and prioritized experience replay (PER) methods, the DCRL algorithm is evaluated on Atari 2600 games, and the experimental results show that DCRL outperforms DQN and PER on most of these games. Further results show that the proposed curriculum training paradigm of DCRL is also applicable and effective for other memory-based deep reinforcement learning approaches, such as double DQN and dueling networks. All the experimental results demonstrate that DCRL can achieve improved training efficiency and robustness for deep reinforcement learning.
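
    A sketch of the transition-selection idea, under assumed functional forms: a self-paced priority derived from the TD error and the current curriculum difficulty is combined with a coverage penalty that discourages re-sampling the same transitions. The exact formulas in DCRL differ; this only illustrates the mechanism.

```python
import numpy as np

def sample_batch(td_errors, replay_counts, curriculum_temp,
                 batch_size, rng):
    # Self-paced priority: favor TD errors near the current curriculum
    # "difficulty" rather than always the largest ones.
    priority = np.exp(-np.abs(np.abs(td_errors) - curriculum_temp))
    coverage = 1.0 / (1.0 + replay_counts)   # penalize over-sampled items
    score = priority * coverage
    p = score / score.sum()
    idx = rng.choice(len(td_errors), size=batch_size, replace=False, p=p)
    replay_counts[idx] += 1                  # update coverage statistics
    return idx

# Usage sketch: draw 32 transitions from a 1000-item replay memory.
rng = np.random.default_rng(0)
td = rng.normal(size=1000)
counts = np.zeros(1000)
batch = sample_batch(td, counts, curriculum_temp=0.5,
                     batch_size=32, rng=rng)
```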

  16. Reinforcement Learning for Ramp Control: An Analysis of Learning Parameters

    Directory of Open Access Journals (Sweden)

    Chao Lu

    2016-08-01

    Full Text Available Reinforcement Learning (RL) has been proposed to deal with ramp control problems under dynamic traffic conditions; however, there is a lack of sufficient research on the behaviour and impacts of different learning parameters. This paper describes a ramp control agent based on the RL mechanism and thoroughly analyzes the influence of three learning parameters, namely the learning rate, discount rate and action selection parameter, on algorithm performance. Two indices for learning speed and convergence stability were used to measure performance, based on which a series of simulation-based experiments were designed and conducted using a macroscopic traffic flow model. Simulation results showed that, compared with the discount rate, the learning rate and action selection parameter had more remarkable impacts on the algorithm performance. Based on the analysis, some suggestions about how to select suitable parameter values that can achieve superior performance are provided.

  17. Value learning through reinforcement : The basics of dopamine and reinforcement learning

    NARCIS (Netherlands)

    Daw, N.D.; Tobler, P.N.; Glimcher, P.W.; Fehr, E.

    2013-01-01

    This chapter provides an overview of reinforcement learning and temporal difference learning and relates these topics to the firing properties of midbrain dopamine neurons. First, we review the RescorlaWagner learning rule and basic learning phenomena, such as blocking, which the rule explains. Then
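
    A short sketch of the Rescorla-Wagner rule mentioned above, reproducing the blocking phenomenon: once cue A alone predicts the reward, a compound A+B phase leaves the novel cue B with little associative strength, because A already explains the outcome. Parameter values are arbitrary.

```python
import numpy as np

alpha, lam = 0.3, 1.0            # learning rate, reward asymptote
V = np.zeros(2)                  # associative strengths of cues A, B

for _ in range(50):              # phase 1: A alone -> reward
    present = np.array([1.0, 0.0])
    V += alpha * present * (lam - V @ present)

for _ in range(50):              # phase 2: A+B compound -> reward
    present = np.array([1.0, 1.0])
    V += alpha * present * (lam - V @ present)

print(V)   # V[0] near 1.0, V[1] near 0.0: cue B is "blocked"
```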

  18. Autonomous reinforcement learning with experience replay.

    Science.gov (United States)

    Wawrzyński, Paweł; Tanwani, Ajay Kumar

    2013-05-01

    This paper considers the issues of efficiency and autonomy that are required to make reinforcement learning suitable for real-life control tasks. A real-time reinforcement learning algorithm is presented that repeatedly adjusts the control policy with the use of previously collected samples, and autonomously estimates the appropriate step-sizes for the learning updates. The algorithm is based on the actor-critic with experience replay, whose step-sizes are determined on-line by an enhanced fixed-point algorithm for on-line neural network training. An experimental study with a simulated octopus arm and half-cheetah demonstrates the feasibility of the proposed algorithm to solve difficult learning control problems in an autonomous way within a reasonably short time. Copyright © 2012 Elsevier Ltd. All rights reserved.
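
    A minimal replay-buffer skeleton of the experience-replay idea above: past transitions are stored and repeatedly re-sampled to adjust the actor and critic between real interactions. The update callbacks are placeholders, and the paper's automatic step-size estimation is omitted.

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.buf = deque(maxlen=capacity)

    def add(self, s, a, r, s2, done):
        self.buf.append((s, a, r, s2, done))

    def sample(self, batch_size):
        return random.sample(self.buf, min(batch_size, len(self.buf)))

def replay_updates(buffer, critic_update, actor_update,
                   n_updates=100, batch_size=32):
    # Re-use previously collected samples several times per real step.
    for _ in range(n_updates):
        for (s, a, r, s2, done) in buffer.sample(batch_size):
            td = critic_update(s, a, r, s2, done)   # returns TD error
            actor_update(s, a, td)
```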

  19. Feed-in Tariff Pricing and Social Burden in Japan: Evaluating International Learning through a Policy Transfer Approach

    Directory of Open Access Journals (Sweden)

    Yugo Tanaka

    2017-10-01

    Full Text Available Feed-in tariff (FiT) policy approaches for renewable energy (RE) deployment are employed in many nations around the world. Although FiTs are considered effective in boosting RE deployment, increasing energy bills and social burden are an often-reported negative impact of their use. The FiT has been employed in Japan since 2012, following many developed countries, and, as in other nations, it led to a social burden significantly higher than initial government estimates. Although policy decision making does not necessarily reflect international policy experience, it is still prudent to ask how international experiences of social burden increase were considered within the Japanese approach. In this research, we analyzed the transfer process by adapting a conventional model to develop more objective observations than was previously possible, setting a benchmark for evaluation based on prior international experiences. We identified two streams of policy transfer, each led by different actors: the government and representatives of the National Diet of Japan (Diet). Both actors were exposed to the same experiences; however, the interpretation, application to policy development and priority settings employed were vastly different. Although the framework can only assess policy learning processes, we found that the government undertook a reasonable and rational learning process, while the modified bill developed by the Diet members did not derive learnings as thoroughly, due to cognitive and political reasons; specifically, the issue of limiting social burden was not addressed.

  20. A Neuro-Control Design Based on Fuzzy Reinforcement Learning

    DEFF Research Database (Denmark)

    Katebi, S.D.; Blanke, M.

    This paper describes a neuro-control fuzzy critic design procedure based on reinforcement learning. An important component of the proposed intelligent control configuration is the fuzzy credit assignment unit which acts as a critic, and through fuzzy implications provides adjustment mechanisms… The fuzzy credit assignment unit comprises a fuzzy system with the appropriate fuzzification, knowledge base and defuzzification components. When an external reinforcement signal (a failure signal) is received, sequences of control actions are evaluated and modified by the action applier unit. The desirable… ones instruct the neuro-control unit to adjust its weights and are simultaneously stored in the memory unit during the training phase. In response to the internal reinforcement signal (set point threshold deviation), the stored information is retrieved by the action applier unit and utilized for re…

  1. Deep Learning Policy Quantization

    NARCIS (Netherlands)

    van de Wolfshaar, Jos; Wiering, Marco; Schomaker, Lambertus

    2018-01-01

    We introduce a novel type of actor-critic approach for deep reinforcement learning which is based on learning vector quantization. We replace the softmax operator of the policy with a more general and more flexible operator that is similar to the robust soft learning vector quantization algorithm.

  2. New approach to equipment quality evaluation method with distinct functions

    Directory of Open Access Journals (Sweden)

    Milisavljević Vladimir M.

    2016-01-01

    Full Text Available The paper presents a new approach to improving a method for the quality evaluation and selection of equipment (devices and machinery) by applying distinct functions. Quality evaluation and selection of devices and machinery is a multi-criteria problem which involves the consideration of numerous parameters of various origins. The original selection method with distinct functions is based on technical parameters with arbitrary evaluation of each parameter's importance (weighting). The improvement of this method, presented in this paper, addresses the weighting of parameters by using the Delphi method. Finally, two case studies are provided, covering the quality evaluation of standard boilers for heating and of load-haul-dump (LHD) machines, to demonstrate the applicability of this approach. The Analytic Hierarchy Process (AHP) is used as a control method.

  3. Study on state grouping and opportunity evaluation for reinforcement learning methods; Kyoka gakushuho no tame no jotai grouping to opportunity hyoka ni kansuru kenkyu

    Energy Technology Data Exchange (ETDEWEB)

    Yu, W.; Yokoi, H.; Kakazu, Y. [Hokkaido University, Sapporo (Japan)

    1997-08-20

    In this paper, we propose the State Grouping scheme for coping with the problem of scaling up the Reinforcement Learning Algorithm to real, large-size applications. The grouping scheme is based on geographical and trial-error information, and is made up of state generating, state combining, state splitting and state forgetting procedures, with corresponding action-selection and learning modules. We also discuss the Labeling Based Evaluation scheme, which can evaluate the opportunity of a state-action pair and therefore use better experience to guide the exploration of the state space effectively. Incorporating the Labeling Based Evaluation and State Grouping schemes into the Reinforcement Learning Algorithm, we obtain an approach that can generate an organized state space for Reinforcement Learning and solve problems as well. We argue that this kind of ability is necessary for an autonomous agent: it cannot act depending on any pre-defined map, but should instead search the environment and find the optimal problem solution autonomously and simultaneously. By solving the large state-size 3-DOF and 4-link manipulator problem, we show the efficiency of the proposed approach, i.e., the agent can achieve the optimal or sub-optimal path with less memory and less time. 14 refs., 11 figs., 3 tabs.

  4. Intrinsic interactive reinforcement learning - Using error-related potentials for real world human-robot interaction.

    Science.gov (United States)

    Kim, Su Kyoung; Kirchner, Elsa Andrea; Stefes, Arne; Kirchner, Frank

    2017-12-14

    Reinforcement learning (RL) enables robots to learn their optimal behavioral strategy in dynamic environments based on feedback. Explicit human feedback during robot RL is advantageous, since an explicit reward function can be easily adapted. However, it is very demanding and tiresome for a human to continuously and explicitly generate feedback. Therefore, the development of implicit approaches is of high relevance. In this paper, we used an error-related potential (ErrP), an event-related activity in the human electroencephalogram (EEG), as intrinsically generated implicit feedback (reward) for RL. Initially we validated our approach with seven subjects in a simulated robot learning scenario. ErrPs were detected online in single trials with a balanced accuracy (bACC) of 91%, which was sufficient to learn to recognize gestures and the correct mapping between human gestures and robot actions in parallel. Finally, we validated our approach in a real robot scenario, in which seven subjects freely chose gestures and the real robot correctly learned the mapping between gestures and actions (ErrP detection: 90% bACC). In this paper, we demonstrated that intrinsically generated EEG-based human feedback in RL can successfully be used to implicitly improve gesture-based robot control during human-robot interaction. We call our approach intrinsic interactive RL.

  5. Effort-Based Reinforcement Processing and Functional Connectivity Underlying Amotivation in Medicated Patients with Depression and Schizophrenia.

    Science.gov (United States)

    Park, Il Ho; Lee, Boung Chul; Kim, Jae-Jin; Kim, Joong Il; Koo, Min-Seung

    2017-04-19

    Amotivation is a common phenotype of major depressive disorder and schizophrenia, which are clinically distinct disorders. Effective treatment targets and strategies can be discovered by examining the dopaminergic reward network function underlying amotivation between these disorders. We conducted an fMRI study in healthy human participants and medicated patients with depression and schizophrenia using an effort-based reinforcement task. We examined regional activations related to reward type (positive and negative reinforcement), effort level, and their composite value, as well as resting-state functional connectivities within the meso-striatal-prefrontal pathway. We found that integrated reward and effort values of low effort-positive reinforcement and high effort-negative reinforcement were behaviorally anticipated and represented in the putamen and medial orbitofrontal cortex activities. Patients with schizophrenia and depression did not show anticipation-related and work-related reaction time reductions, respectively. Greater amotivation severity correlated with smaller work-related putamen activity changes according to reward type in schizophrenia and effort level in depression. Patients with schizophrenia showed feedback-related putamen hyperactivity of low effort compared with healthy controls and depressed patients. The strength of medial orbitofrontal-striatal functional connectivity predicted work-related reaction time reduction of high effort negative reinforcement in healthy controls and amotivation severity in both patients with schizophrenia and those with depression. Patients with depression showed deficient medial orbitofrontal-striatal functional connectivity compared with healthy controls and patients with schizophrenia. These results indicate that amotivation in depression and schizophrenia involves different pathophysiology in the prefrontal-striatal circuitry. SIGNIFICANCE STATEMENT Amotivation is present in both depression and schizophrenia

  6. Reinforcement learning improves behaviour from evaluative feedback

    Science.gov (United States)

    Littman, Michael L.

    2015-05-01

    Reinforcement learning is a branch of machine learning concerned with using experience gained through interacting with the world and evaluative feedback to improve a system's ability to make behavioural decisions. It has been called the artificial intelligence problem in a microcosm because learning algorithms must act autonomously to perform well and achieve their goals. Partly driven by the increasing availability of rich data, recent years have seen exciting advances in the theory and practice of reinforcement learning, including developments in fundamental technical areas such as generalization, planning, exploration and empirical methodology, leading to increasing applicability to real-life problems.

  7. Event-Triggered Distributed Control of Nonlinear Interconnected Systems Using Online Reinforcement Learning With Exploration.

    Science.gov (United States)

    Narayanan, Vignesh; Jagannathan, Sarangapani

    2017-09-07

    In this paper, a distributed control scheme for an interconnected system composed of uncertain input-affine nonlinear subsystems with event-triggered state feedback is presented, using a novel hybrid learning scheme based on approximate dynamic programming with online exploration. First, an approximate solution to the Hamilton-Jacobi-Bellman equation is generated with event-sampled neural network (NN) approximation and, subsequently, a near-optimal control policy for each subsystem is derived. Artificial NNs are utilized as function approximators to develop a suite of identifiers and learn the dynamics of each subsystem. The NN weight tuning rules for the identifier and the event-triggering condition are derived using Lyapunov stability theory. Taking into account the effects of NN approximation of the system dynamics and bootstrapping, a novel NN weight update is presented to approximate the optimal value function. Finally, a novel strategy to incorporate exploration into the online control framework, using identifiers, is introduced to reduce the overall cost at the expense of additional computations during the initial online learning phase. System states and the NN weight estimation errors are regulated, and locally uniformly ultimately bounded results are achieved. The analytical results are substantiated using simulation studies.

  8. Relevance of normative values for functional capacity evaluation

    NARCIS (Netherlands)

    Soer, R.; Van Der Schans, C.; Geertzen, J.; Groothoff, J.; Brouwer, Sandra; Dijkstra, P.; Reneman, M.

    2009-01-01

    Background: Functional Capacity Evaluations (FCEs) are evaluations designed to measure capacity to perform activities and are used to make recommendations for participation in work. Normative values of healthy working subjects' performances are unavailable, thus patients' performances cannot be

  9. Normative values for a functional capacity evaluation.

    Science.gov (United States)

    Soer, Remko; van der Schans, Cees P; Geertzen, Jan H; Groothoff, Johan W; Brouwer, Sandra; Dijkstra, Pieter U; Reneman, Michiel F

    2009-10-01

    To establish normative values for a functional capacity evaluation (FCE) of healthy working subjects. Descriptive. Rehabilitation center. Healthy working subjects (N=701; 448 men, 253 women) between 20 and 60 years of age, working in more than 180 occupations. Subjects performed a 2-hour FCE consisting of 12 work-related tests. Subjects were classified into categories based on physical demands according to the Dictionary of Occupational Titles. Means, ranges, SDs, and percentiles were provided for normative values of FCE, and a regression analysis for the outcome of the 12 tests was performed. Normative FCE values were established for 4 physical demand categories. The normative values enable comparison of patients' performances to these values. If a patient's performance exceeds the lowest scores in his/her corresponding demand category, then the patient's capacity is very likely to be sufficient to meet the workload. Further, clinicians can make more precise return-to-work recommendations and set goals for rehabilitation programs. A comparison of the normative values can be useful to the fields of rehabilitation, occupational, and insurance medicine. Further research is needed to test the validity of the normative values with respect to workplace assessments and return-to-work recommendations.

  10. Normative Values for a Functional Capacity Evaluation

    NARCIS (Netherlands)

    Soer, Remko; van der Schans, Cees P.; Geertzen, Jan H.; Groothoff, Johan W.; Brouwer, Sandra; Dijkstra, Pieter U.; Reneman, Michiel F.

    2009-01-01

    Objective: To establish normative values for a functional capacity evaluation (FCE) of healthy working subjects. Design: Descriptive. Setting: Rehabilitation center. Participants: Healthy working subjects (N=701; 448 men, 253 women) between 20 and 60 years of age, working in more than 180

  11. Algebraic and adaptive learning in neural control systems

    Science.gov (United States)

    Ferrari, Silvia

    A systematic approach is developed for designing adaptive and reconfigurable nonlinear control systems that are applicable to plants modeled by ordinary differential equations. The nonlinear controller comprising a network of neural networks is taught using a two-phase learning procedure realized through novel techniques for initialization, on-line training, and adaptive critic design. A critical observation is that the gradients of the functions defined by the neural networks must equal corresponding linear gain matrices at chosen operating points. On-line training is based on a dual heuristic adaptive critic architecture that improves control for large, coupled motions by accounting for actual plant dynamics and nonlinear effects. An action network computes the optimal control law; a critic network predicts the derivative of the cost-to-go with respect to the state. Both networks are algebraically initialized based on prior knowledge of satisfactory pointwise linear controllers and continue to adapt on line during full-scale simulations of the plant. On-line training takes place sequentially over discrete periods of time and involves several numerical procedures. A backpropagating algorithm called Resilient Backpropagation is modified and successfully implemented to meet these objectives, without excessive computational expense. This adaptive controller is as conservative as the linear designs and as effective as a global nonlinear controller. The method is successfully implemented for the full-envelope control of a six-degree-of-freedom aircraft simulation. The results show that the on-line adaptation brings about improved performance with respect to the initialization phase during aircraft maneuvers that involve large-angle and coupled dynamics, and parameter variations.
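
    A sketch of the dual-heuristic-critic training target described above, with linear maps standing in for the neural networks: the critic outputs an estimate of the derivative of the cost-to-go with respect to the state, and its target is built from the stage-cost gradient and the model Jacobians, chained through the actor. Shapes, the quadratic cost, and the update rule are illustrative assumptions.

```python
import numpy as np

def dhp_critic_target(x, u, Wc, A, B, dudx, Qx, gamma=0.99):
    # Toy system: x' = A x + B u; stage cost: l = x^T Qx x.
    lam_next = Wc @ (A @ x + B @ u)          # critic at the next state
    dldx = 2.0 * Qx @ x                      # gradient of the stage cost
    # Chain rule through the dynamics and the actor u(x):
    dxdx = A + B @ dudx
    return dldx + gamma * dxdx.T @ lam_next

def dhp_critic_update(x, u, Wc, A, B, dudx, Qx, lr=1e-3):
    target = dhp_critic_target(x, u, Wc, A, B, dudx, Qx)
    err = Wc @ x - target                    # critic prediction error
    return Wc - lr * np.outer(err, x)        # gradient-style update
```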

  12. Functional neuroimaging of emotional learning and autonomic reactions.

    Science.gov (United States)

    Peper, Martin; Herpers, Martin; Spreer, Joachim; Hennig, Jürgen; Zentner, Josef

    2006-06-01

    This article provides a selective overview of the functional neuroimaging literature with an emphasis on emotional activation processes. Emotions are fast and flexible response systems that provide basic tendencies for adaptive action. From the range of involved component functions, we first discuss selected automatic mechanisms that control basic adaptational changes. Second, we illustrate how neuroimaging work has contributed to the mapping of the network components associated with basic emotion families (fear, anger, disgust, happiness), and secondary dimensional concepts that organise the meaning space for subjective experience and verbal labels (emotional valence, activity/intensity, approach/withdrawal, etc.). Third, results and methodological difficulties are discussed in view of our own neuroimaging experiments, which investigated the component functions involved in emotional learning. The amygdala, prefrontal cortex, and striatum form a network of reciprocal connections that show topographically distinct patterns of activity as a correlate of up- and down-regulation processes during an emotional episode. Emotional modulation of other brain systems has attracted recent research interest. Emotional neuroimaging calls for more representative designs that highlight the modulatory influences of regulation strategies and socio-cultural factors responsible for inhibitory control and extinction. We conclude by emphasising the relevance of the temporal process dynamics of emotional activations, which may provide improved prediction of individual differences in emotionality.

  13. A Novel adaptative Discrete Cuckoo Search Algorithm for parameter optimization in computer vision

    Directory of Open Access Journals (Sweden)

    Loubna Benchikhi

    2017-10-01

    Full Text Available Computer vision applications require choosing operators and their parameters in order to provide the best outcomes. Often, users draw on expert knowledge and must experiment with many combinations to find the best one manually. As performance, time and accuracy are important, it is necessary to automate parameter optimization, at least for crucial operators. In this paper, a novel approach based on an adaptive discrete cuckoo search algorithm (ADCS) is proposed. It automates the process of algorithm setting and provides optimal parameters for vision applications. This work reconsiders a discretization problem to adapt the cuckoo search algorithm and presents the procedure of parameter optimization. Experiments on real examples and comparisons to other metaheuristic-based approaches, namely particle swarm optimization (PSO), reinforcement learning (RL) and ant colony optimization (ACO), show the efficiency of this novel method.
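
    A compact cuckoo-search sketch for parameter tuning, showing only the core loop (new candidates via heavy-tailed perturbations, a fraction of the worst nests abandoned). The paper's adaptive discretization is not reproduced here, and the objective f and the integer parameter grid are assumptions.

```python
import numpy as np

def cuckoo_search(f, low, high, n_nests=15, pa=0.25, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    dim = len(low)
    nests = rng.integers(low, high, size=(n_nests, dim), endpoint=True)
    fit = np.array([f(n) for n in nests], dtype=float)
    for _ in range(iters):
        # Generate a cuckoo by a heavy-tailed integer step from a nest.
        i = rng.integers(n_nests)
        step = np.round(rng.standard_cauchy(dim)).astype(int)
        cand = np.clip(nests[i] + step, low, high)
        j = rng.integers(n_nests)
        fc = f(cand)
        if fc < fit[j]:                      # replace a random worse nest
            nests[j], fit[j] = cand, fc
        # Abandon a fraction pa of the worst nests.
        n_bad = int(pa * n_nests)
        worst = np.argsort(fit)[-n_bad:]
        nests[worst] = rng.integers(low, high, size=(n_bad, dim),
                                    endpoint=True)
        fit[worst] = [f(n) for n in nests[worst]]
    best = int(np.argmin(fit))
    return nests[best], fit[best]
```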

  14. Adaptive filters and internal models: multilevel description of cerebellar function.

    Science.gov (United States)

    Porrill, John; Dean, Paul; Anderson, Sean R

    2013-11-01

    Cerebellar function is increasingly discussed in terms of engineering schemes for motor control and signal processing that involve internal models. To address the relation between the cerebellum and internal models, we adopt the chip metaphor that has been used to represent the combination of a homogeneous cerebellar cortical microcircuit with individual microzones having unique external connections. This metaphor indicates that identifying the function of a particular cerebellar chip requires knowledge of both the general microcircuit algorithm and the chip's individual connections. Here we use a popular candidate algorithm as embodied in the adaptive filter, which learns to decorrelate its inputs from a reference ('teaching', 'error') signal. This algorithm is computationally powerful enough to be used in a very wide variety of engineering applications. However, the crucial issue is whether the external connectivity required by such applications can be implemented biologically. We argue that some applications appear to be in principle biologically implausible: these include the Smith predictor and Kalman filter (for state estimation), and the feedback-error-learning scheme for adaptive inverse control. However, even for plausible schemes, such as forward models for noise cancellation and novelty-detection, and the recurrent architecture for adaptive inverse control, there is unlikely to be a simple mapping between microzone function and internal model structure. This initial analysis suggests that cerebellar involvement in particular behaviours is therefore unlikely to have a neat classification into categories such as 'forward model'. It is more likely that cerebellar microzones learn a task-specific adaptive-filter operation which combines a number of signal-processing roles. Copyright © 2012 Elsevier Ltd. All rights reserved.
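
    A minimal least-mean-squares (LMS) adaptive filter, the standard embodiment of the decorrelation algorithm described above: the weights adapt so that the filter output cancels the component of the reference ('teaching'/'error') signal that is predictable from the input. Tap count and step size are arbitrary.

```python
import numpy as np

def lms(x, ref, n_taps=8, mu=0.01):
    w = np.zeros(n_taps)
    out, err = np.zeros(len(x)), np.zeros(len(x))
    for n in range(n_taps, len(x)):
        tap = x[n - n_taps:n][::-1]      # most recent samples first
        out[n] = w @ tap                 # filter prediction
        err[n] = ref[n] - out[n]         # residual correlation
        w += 2 * mu * err[n] * tap       # LMS weight update
    return w, out, err
```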

  15. An adaptive-learning approach to affect regulation: strategic influences on evaluative priming.

    Science.gov (United States)

    Freytag, Peter; Bluemke, Matthias; Fiedler, Klaus

    2011-04-01

    An adaptive cognition approach to evaluative priming is not compatible with the view that the entire process is automatically determined by prime stimulus valence alone. In addition to the evaluative congruity of individual prime-target pairs, an adaptive regulation function should be sensitive to the base rates of positive and negative stimuli as well as to the perceived contingency between prime and target valence. The present study was particularly concerned with pseudocontingent inferences that offer a proxy for the assessment of contingencies from degraded or incomplete stimulus input. As expected, response latencies were shorter for the more prevalent target valence and for evaluatively congruent trials. However, crucially, the congruity effect was eliminated and overridden by pseudocontingencies inferred from the stimulus environment. These strategic inferences were further enhanced when the task called for the evaluation of both prime stimuli and target stimuli. © 2011 Psychology Press, an imprint of the Taylor & Francis Group, an Informa business

  16. Switching Adaptability in Human-Inspired Sidesteps: A Minimal Model.

    Science.gov (United States)

    Fujii, Keisuke; Yoshihara, Yuki; Tanabe, Hiroko; Yamamoto, Yuji

    2017-01-01

    Humans can adapt to abruptly changing situations by coordinating redundant components, even in bipedality. Conventional adaptability has been reproduced by various computational approaches, such as optimal control, neural oscillator, and reinforcement learning; however, the adaptability in bipedal locomotion necessary for biological and social activities, such as unpredicted direction change in chase-and-escape, is unknown due to the dynamically unstable multi-link closed-loop system. Here we propose a switching adaptation model for performing bipedal locomotion by improving autonomous distributed control, where autonomous actuators interact without central control and switch the roles for propulsion, balancing, and leg swing. Our switching mobility model achieved direction change at any time using only three actuators, although it showed higher motor costs than comparable models without direction change. Our method of evaluating such adaptation at any time should be utilized as a prerequisite for understanding universal motor control. The proposed algorithm may simply explain and predict the adaptation mechanism in human bipedality to coordinate the actuator functions within and between limbs.

  17. An Innovative Approach to Control Steel Reinforcement Corrosion by Self-Healing

    Directory of Open Access Journals (Sweden)

    Dessi A. Koleva

    2018-02-01

    Full Text Available The corrosion of reinforced steel, and subsequent reinforced concrete degradation, is a major concern for infrastructure durability. New materials with specific, tailor-made properties or the establishment of optimum construction regimes are among the many approaches to improving civil structure performance. Ideally, novel materials would carry self-repairing or self-healing capacities, triggered in the event of detrimental influence and/or damage. Controlling or altering a material’s behavior at the nano-level would result in traditional materials with radically enhanced properties. Nevertheless, nanotechnology applications are still rare in construction, and would break new ground in engineering practice. An approach to controlling the corrosion-related degradation of reinforced concrete was designed as a synergetic action of electrochemistry, cement chemistry and nanotechnology. This contribution presents the concept of the approach, namely to simultaneously achieve steel corrosion resistance and improved bulk matrix properties. The technical background and challenges for the application of polymeric nanomaterials in the field are briefly outlined in view of this concept, which has the added value of self-healing. The credibility of the approach is discussed with reference to previously reported outcomes, and is illustrated via the results of the steel electrochemical responses and microscopic evaluations of the discussed materials.

  18. An Innovative Approach to Control Steel Reinforcement Corrosion by Self-Healing

    Science.gov (United States)

    2018-01-01

    The corrosion of reinforced steel, and subsequent reinforced concrete degradation, is a major concern for infrastructure durability. New materials with specific, tailor-made properties or the establishment of optimum construction regimes are among the many approaches to improving civil structure performance. Ideally, novel materials would carry self-repairing or self-healing capacities, triggered in the event of detrimental influence and/or damage. Controlling or altering a material’s behavior at the nano-level would result in traditional materials with radically enhanced properties. Nevertheless, nanotechnology applications are still rare in construction, and would break new ground in engineering practice. An approach to controlling the corrosion-related degradation of reinforced concrete was designed as a synergetic action of electrochemistry, cement chemistry and nanotechnology. This contribution presents the concept of the approach, namely to simultaneously achieve steel corrosion resistance and improved bulk matrix properties. The technical background and challenges for the application of polymeric nanomaterials in the field are briefly outlined in view of this concept, which has the added value of self-healing. The credibility of the approach is discussed with reference to previously reported outcomes, and is illustrated via the results of the steel electrochemical responses and microscopic evaluations of the discussed materials. PMID:29461495

  19. Flight Test Approach to Adaptive Control Research

    Science.gov (United States)

    Pavlock, Kate Maureen; Less, James L.; Larson, David Nils

    2011-01-01

    The National Aeronautics and Space Administration's Dryden Flight Research Center completed flight testing of adaptive controls research on a full-scale F-18 testbed. The validation of adaptive controls has the potential to enhance safety in the presence of adverse conditions such as structural damage or control surface failures. This paper describes the research interface architecture, risk mitigations, flight test approach and lessons learned of adaptive controls research.

  20. Value Iteration Adaptive Dynamic Programming for Optimal Control of Discrete-Time Nonlinear Systems.

    Science.gov (United States)

    Wei, Qinglai; Liu, Derong; Lin, Hanquan

    2016-03-01

    In this paper, a value iteration adaptive dynamic programming (ADP) algorithm is developed to solve infinite horizon undiscounted optimal control problems for discrete-time nonlinear systems. The present value iteration ADP algorithm permits an arbitrary positive semi-definite function to initialize the algorithm. A novel convergence analysis is developed to guarantee that the iterative value function converges to the optimal performance index function. Initialized by different initial functions, it is proven that the iterative value function will be monotonically nonincreasing, monotonically nondecreasing, or nonmonotonic and will converge to the optimum. In this paper, for the first time, the admissibility properties of the iterative control laws are developed for value iteration algorithms. It is emphasized that new termination criteria are established to guarantee the effectiveness of the iterative control laws. Neural networks are used to approximate the iterative value function and compute the iterative control law, respectively, for facilitating the implementation of the iterative ADP algorithm. Finally, two simulation examples are given to illustrate the performance of the present method.
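
    A tabular value-iteration sketch of the general scheme, under assumptions: a toy finite MDP with known transition matrices stands in for the paper's nonlinear continuous-state setting, a discount factor is added so the toy iteration converges, and the initial value function may be any nonnegative vector.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, v0=None, tol=1e-8):
    # P: list of (n_s, n_s) transition matrices, one per action.
    # R: list of (n_s,) expected stage costs, one per action.
    n_a, n_s = len(P), P[0].shape[0]
    v = np.zeros(n_s) if v0 is None else v0.copy()   # arbitrary init
    while True:
        q = np.array([R[a] + gamma * P[a] @ v for a in range(n_a)])
        v_new = q.min(axis=0)            # Bellman backup (cost-to-go)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, q.argmin(axis=0)   # values and greedy policy
        v = v_new
```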

  1. Reinforcement Learning Based Artificial Immune Classifier

    Directory of Open Access Journals (Sweden)

    Mehmet Karakose

    2013-01-01

    Full Text Available Artificial immune systems are among the widely used methods for classification, a decision-making process. Based on the natural immune system, they can be successfully applied to classification, optimization, recognition, and learning in real-world problems. In this study, a reinforcement learning based artificial immune classifier is proposed as a new approach. This approach uses reinforcement learning to find better antibodies with immune operators. Compared with other methods in the literature, the proposed approach offers several advantages, such as effectiveness, fewer memory cells, high accuracy, speed, and data adaptability. The performance of the proposed approach is demonstrated by simulation and experimental results using real data in Matlab and on an FPGA. Benchmark data and remote image data are used for the experimental results. Comparative results with supervised/unsupervised artificial immune systems, a negative selection classifier, and a resource-limited artificial immune classifier are given to demonstrate the effectiveness of the proposed method.

  2. Quantum-enhanced reinforcement learning for finite-episode games with discrete state spaces

    Science.gov (United States)

    Neukart, Florian; Von Dollen, David; Seidel, Christian; Compostella, Gabriele

    2017-12-01

    Quantum annealing algorithms belong to the class of metaheuristic tools, applicable to solving binary optimization problems. Hardware implementations of quantum annealing, such as the quantum annealing machines produced by D-Wave Systems, have been subject to multiple analyses in research, with the aim of characterizing the technology's usefulness for optimization and sampling tasks. Here, we present a way to partially embed both Monte Carlo policy iteration for finding an optimal policy on random observations, and n sub-optimal state-value functions for approximating an improved state-value function given a policy, for finite-horizon games with discrete state spaces on a D-Wave 2000Q quantum processing unit (QPU). We explain how both problems can be expressed as a quadratic unconstrained binary optimization (QUBO) problem, and show that quantum-enhanced Monte Carlo policy evaluation allows for finding equivalent or better state-value functions for a given policy with the same number of episodes compared to a purely classical Monte Carlo algorithm. Additionally, we describe a quantum-classical policy learning algorithm. Our first and foremost aim is to explain how to represent and solve parts of these problems with the help of the QPU, not to prove supremacy over every existing classical policy evaluation algorithm.
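
    For reference, the classical first-visit Monte Carlo policy evaluation that the quantum-enhanced variant is compared against, sketched under an assumed episode-generator interface: state values are estimated by averaging the returns observed after each first visit to a state.

```python
import numpy as np

def mc_policy_evaluation(gen_episode, n_states, n_episodes=1000,
                         gamma=1.0):
    returns_sum = np.zeros(n_states)
    returns_cnt = np.zeros(n_states)
    for _ in range(n_episodes):
        episode = gen_episode()          # list of (state, reward) pairs
        rewards = [r for (_, r) in episode]
        first_visit = {}
        for t, (s, _) in enumerate(episode):
            first_visit.setdefault(s, t)     # record first visits only
        for s, t in first_visit.items():
            # Return observed from the first visit onward.
            G = sum(gamma ** (k - t) * rewards[k]
                    for k in range(t, len(rewards)))
            returns_sum[s] += G
            returns_cnt[s] += 1
    return returns_sum / np.maximum(returns_cnt, 1)
```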

  3. Adaptive dynamic programming with applications in optimal control

    CERN Document Server

    Liu, Derong; Wang, Ding; Yang, Xiong; Li, Hongliang

    2017-01-01

    This book covers the most recent developments in adaptive dynamic programming (ADP). The text begins with a thorough background review of ADP, making sure that readers are sufficiently familiar with the fundamentals. In the core of the book, the authors address first discrete- and then continuous-time systems. Coverage of discrete-time systems starts with a more general form of value iteration to demonstrate its convergence, optimality, and stability, with complete and thorough theoretical analysis. A more realistic form of value iteration is studied, where value function approximations are assumed to have finite errors. Adaptive Dynamic Programming also details another avenue of the ADP approach: policy iteration. Both basic and generalized forms of policy-iteration-based ADP are studied with complete and thorough theoretical analysis in terms of convergence, optimality, stability, and error bounds. Among continuous-time systems, the control of affine and nonaffine nonlinear systems is studied using the ADP approach.

  4. Universal approximators for multi-objective direct policy search in water reservoir management problems: a comparative analysis

    Science.gov (United States)

    Giuliani, Matteo; Mason, Emanuele; Castelletti, Andrea; Pianosi, Francesca

    2014-05-01

    The optimal operation of water resources systems is a wide and challenging problem due to non-linearities in the model and the objectives, a high-dimensional state-control space, and strong uncertainties in the hydroclimatic regimes. The application of classical optimization techniques (e.g., SDP, Q-learning, gradient descent-based algorithms) is strongly limited by the dimensionality of the system and by the presence of multiple, conflicting objectives. This study presents a novel approach which combines Direct Policy Search (DPS) and Multi-Objective Evolutionary Algorithms (MOEAs) to solve high-dimensional state and control space problems involving multiple objectives. DPS, also known as parameterization-simulation-optimization in the water resources literature, is a simulation-based approach where the reservoir operating policy is first parameterized within a given family of functions and, then, the parameters are optimized with respect to the objectives of the management problem. The selection of a suitable class of functions for the operating policy is a key step, as it might restrict the search for the optimal policy to a subspace of the decision space that does not include the optimal solution. In the water reservoir literature, a number of classes have been proposed. However, many of these rules are based largely on empirical or experimental successes, and they were designed mostly via simulation and for single-purpose reservoirs. In a multi-objective context similar rules cannot easily be inferred from experience, and the use of universal function approximators is generally preferred. In this work, we comparatively analyze two of the most common universal approximators, artificial neural networks (ANN) and radial basis functions (RBF), under different problem settings to estimate their scalability and flexibility in dealing with increasingly complex problems. The multi-purpose HoaBinh water reservoir in Vietnam, accounting for hydropower…
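
    A sketch of an RBF operating policy of the kind compared above, with illustrative dimensions: the release decision is a weighted sum of Gaussian bases over the normalized system state, and direct policy search treats the centers, widths, and weights as the decision vector to be optimized (e.g., by an MOEA) through simulation of the reservoir model.

```python
import numpy as np

def rbf_policy(state, centers, widths, weights):
    # state: (d,), centers: (k, d), widths: (k, d), weights: (k,)
    z = (state[None, :] - centers) / widths
    phi = np.exp(-np.sum(z ** 2, axis=1))      # Gaussian bases
    u = weights @ phi
    return float(np.clip(u, 0.0, 1.0))         # normalized release

# Usage sketch: a 2-basis policy over (storage, inflow), both in [0, 1].
centers = np.array([[0.2, 0.3], [0.8, 0.7]])
widths = np.full((2, 2), 0.5)
weights = np.array([0.1, 0.9])
print(rbf_policy(np.array([0.6, 0.5]), centers, widths, weights))
```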

  5. Reinforcement function design and bias for efficient learning in mobile robots

    International Nuclear Information System (INIS)

    Touzet, C.; Santos, J.M.

    1998-01-01

    The main paradigm in the sub-symbolic learning robot domain is the reinforcement learning method. Various techniques have been developed to deal with the memorization/generalization problem, demonstrating the superior ability of artificial neural network implementations. In this paper, the authors address the issue of designing the reinforcement so as to optimize the exploration part of the learning. They also present and summarize work on the use of bias intended to achieve the effective synthesis of the desired behavior. Demonstrative experiments involving a self-organizing map implementation of Q-learning and real mobile robots (Nomad 200 and Khepera) in a task of obstacle avoidance behavior synthesis are described. 3 figs., 5 tabs

  6. Evolutionary and adaptive learning in complex markets: a brief summary

    Science.gov (United States)

    Hommes, Cars H.

    2007-06-01

    We briefly review some work on expectations and learning in complex markets, using the familiar demand-supply cobweb model. We discuss and combine two different approaches to learning. According to the adaptive learning approach, agents behave as econometricians, using time series observations to form expectations and updating the parameters as more observations become available. This approach has become popular in macroeconomics. The second approach has an evolutionary flavor and is sometimes referred to as reinforcement learning. Agents employ different forecasting strategies and evaluate these strategies based upon a fitness measure, e.g. past realized profits. In this framework, boundedly rational agents switch between different, but fixed, behavioral rules. This approach has become popular in finance. We combine evolutionary and adaptive learning to model complex markets and discuss whether this theory can match empirical facts and forecasting behavior in laboratory experiments with human subjects.
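
    A sketch of the evolutionary-switching ingredient in a cobweb setting, under assumed parameter values: agents choose among fixed forecasting rules with probabilities given by a discrete-choice (logit) function of past realized profits, so better-performing rules attract larger population shares.

```python
import numpy as np

def switching_fractions(profits, beta=2.0):
    # profits: recent fitness of each rule; higher profit -> larger share.
    # beta is the intensity of choice; max-shift keeps exp() stable.
    e = np.exp(beta * (profits - profits.max()))
    return e / e.sum()

# Example: rule 0 (naive forecast) earned 0.8, rule 1 (a costlier but
# better forecast) earned 1.0 net of cost; most agents pick rule 1.
print(switching_fractions(np.array([0.8, 1.0])))
```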

  7. [Ecological executive function characteristics and effects of executive function on social adaptive function in school-aged children with epilepsy].

    Science.gov (United States)

    Xu, X J; Wang, L L; Zhou, N

    2016-02-23

    To explore the characteristics of ecological executive function in school-aged children with idiopathic or probably symptomatic epilepsy and to examine the effects of executive function on social adaptive function. A total of 51 school-aged children with idiopathic or probably symptomatic epilepsy aged 5-12 years at our hospital and 37 normal children of the same gender, age and educational level were included. The differences in ecological executive function and social adaptive function were compared between the two groups with the Behavior Rating Inventory of Executive Function (BRIEF) and the Child Adaptive Behavior Scale; the Pearson correlation test and multiple stepwise linear regression were used to explore the impact of executive function on social adaptive function. The scores of children with epilepsy on the global executive composite (GEC), behavioral regulation index (BRI) and metacognition index (MI) of the BRIEF ((62±12), (58±13) and (63±12), respectively) were significantly higher than those of the control group ((47±7), (44±6) and (48±8), respectively). The scores of children with epilepsy in adaptive behavior quotient (ADQ), independence, cognition and self-control ((86±22), (32±17), (49±14) and (41±16), respectively) were significantly lower than those of the control group ((120±12), (59±14), (59±7) and (68±10), respectively). Executive function scores were significantly correlated with social adaptive function in children with idiopathic or probably symptomatic epilepsy. School-aged children with idiopathic or probably symptomatic epilepsy may have significant ecological executive function impairment and reduced social adaptive function. The BRI, inhibition and working memory aspects of ecological executive function are significantly related to social adaptive function in school-aged children with epilepsy.

  8. Analysing innovation policy indicators through a functional approach: the aeronautic industry case

    Energy Technology Data Exchange (ETDEWEB)

    Haddad, C.R.; Uriona Maldonado, M.

    2016-07-01

    Developing countries face different problems than developed countries, and using the same indicators to evaluate and compare both regions can lead to misleading conclusions. Traditional indicators, such as R&D and patents, may not capture the whole dynamic of a system, as they are used to compare systems by focusing on their current structure. Many authors have discussed the processes underlying industry transformation, innovation, and economic growth used to assess a system's performance, i.e. the functions of innovation systems. Therefore, the purpose of this paper is to analyze these functions as indicators for measuring the performance of the system in order to identify policy issues. To do so, we analyze the case of the aeronautic sectoral system of innovation of a region in Brazil. The functional approach helped us better capture the dynamics of the system by not restricting our analysis to the system's structure. (Author)

  9. 21 CFR 1401.2 - The Office of National Drug Control Policy-organization and functions.

    Science.gov (United States)

    2010-04-01

    21 CFR 1401.2, Office of National Drug Control Policy, Public Availability of Information: § 1401.2 The Office of National Drug Control Policy—organization and functions. (a) The Office of National Drug...

  10. [Social learning as an uncertainty-reduction strategy: an adaptationist approach].

    Science.gov (United States)

    Nakanishi, Daisuke; Kameda, Tatsuya; Shinada, Mizuho

    2003-04-01

    Social learning is an effective mechanism to reduce uncertainty about environmental knowledge, helping individuals adopt adaptive behavior in the environment at small cost. Although this is evident for learning about temporally stable targets (e.g., acquiring avoidance of toxic foods culturally), the functional value of social learning in a temporally unstable environment is less clear; knowledge acquired by social learning may be outdated. This paper addressed the adaptive value of social learning in a non-stationary environment empirically. When individual learning about the non-stationary environment is costly, a hawk-dove-game-like equilibrium is expected to emerge in the population, where members who engage in costly individual learning and members who skip the information search and free-ride on other members' search efforts coexist at a stable ratio. Such a "producer-scrounger" structure should severely qualify the effectiveness of social/cultural learning, especially the "conformity bias" in using social information (Boyd & Richerson, 1985). We tested these predictions in an experiment implementing a non-stationary uncertain environment in a laboratory. The results supported our thesis. Implications of these findings and some future directions were discussed.

  11. Feeding, evaluating, and controlling rumen function.

    Science.gov (United States)

    Lean, Ian J; Golder, Helen M; Hall, Mary Beth

    2014-11-01

    Achieving optimal rumen function requires an understanding of feeds and systems of nutritional evaluation. Key influences on optimal function include achieving good dry matter intake. The function of feeds in the rumen depends on other factors including chemical composition, rate of passage, degradation rate of the feed, availability of other substrates and cofactors, and individual animal variation. This article discusses carbohydrate, protein, and fat metabolism in the rumen, and provides practical means of evaluation of rations in the field. Conditions under which rumen function is suboptimal (ie, acidosis and bloat) are discussed, and methods for control examined. Copyright © 2014 Elsevier Inc. All rights reserved.

  12. Finding intrinsic rewards by embodied evolution and constrained reinforcement learning.

    Science.gov (United States)

    Uchibe, Eiji; Doya, Kenji

    2008-12-01

    Understanding the design principle of reward functions is a substantial challenge both in artificial intelligence and neuroscience. Successful acquisition of a task usually requires not only rewards for goals, but also for intermediate states to promote effective exploration. This paper proposes a method for designing 'intrinsic' rewards of autonomous agents by combining constrained policy gradient reinforcement learning and embodied evolution. To validate the method, we use Cyber Rodent robots, in which collision avoidance, recharging from battery packs, and 'mating' by software reproduction are three major 'extrinsic' rewards. We show in hardware experiments that the robots can find appropriate 'intrinsic' rewards for the vision of battery packs and other robots to promote approach behaviors.

  13. Knowing How Good Our Searches Are: An Approach Derived from Search Filter Development Methodology

    Directory of Open Access Journals (Sweden)

    Sarah Hayman

    2015-12-01

    Full Text Available Objective – Effective literature searching is of paramount importance in supporting evidence based practice, research, and policy. Missed references can have adverse effects on outcomes. This paper reports on the development and evaluation of an online learning resource, designed for librarians and other interested searchers, presenting an evidence based approach to enhancing and testing literature searches. Methods – We developed and evaluated the set of free online learning modules for librarians called Smart Searching, suggesting the use of techniques derived from search filter development undertaken by the CareSearch Palliative Care Knowledge Network and its associated project Flinders Filters. The searching module content has been informed by the processes and principles used in search filter development. The self-paced modules are intended to help librarians and other interested searchers test the effectiveness of their literature searches, provide evidence of search performance that can be used to improve searches, as well as to evaluate and promote searching expertise. Each module covers one of four techniques, or core principles, employed in search filter development: (1) collaboration with subject experts; (2) use of a reference sample set; (3) term identification through frequency analysis; and (4) iterative testing. Evaluation of the resource comprised ongoing monitoring of web analytics to determine factors such as numbers of users and geographic origin; a user survey conducted online elicited qualitative information about the usefulness of the resource. Results – The resource was launched in May 2014. Web analytics show over 6,000 unique users from 101 countries (at 9 August 2015). Responses to the survey (n=50) indicated that 80% would recommend the resource to a colleague. Conclusions – An evidence based approach to searching, derived from search filter development methodology, has been shown to have value as an online learning resource.

  14. An efficient scenario-based and fuzzy self-adaptive learning particle swarm optimization approach for dynamic economic emission dispatch considering load and wind power uncertainties

    International Nuclear Information System (INIS)

    Bahmani-Firouzi, Bahman; Farjah, Ebrahim; Azizipanah-Abarghooee, Rasoul

    2013-01-01

    Renewable energy resources such as wind power plants are playing an ever-increasing role in power generation. This paper extends the dynamic economic emission dispatch problem by incorporating a wind power plant. This problem is a multi-objective optimization approach in which total electrical power generation costs and combustion emissions are simultaneously minimized over a short-term time span. A stochastic approach based on scenarios is suggested to model the uncertainty associated with hourly load and wind power forecasts. A roulette wheel technique on the basis of the probability distribution functions of load and wind power is implemented to generate scenarios. As a result, the stochastic nature of the suggested problem is handled by decomposing it into a set of equivalent deterministic problems. An improved multi-objective particle swarm optimization algorithm is applied to obtain the best expected solutions for the proposed stochastic programming framework. To enhance the overall performance and effectiveness of the particle swarm optimization, a fuzzy adaptive technique, θ-search and a self-adaptive learning strategy for velocity updating are used to tune the inertia weight factor and to escape from local optima, respectively. The suggested algorithm goes through the search space in polar coordinates instead of Cartesian ones, whereby the feasible space is more compact. In order to evaluate the efficiency and feasibility of the suggested framework, it is applied to two test systems with small- and large-scale characteristics. - Highlights: ► Formulates the multi-objective DEED problem under a stochastic programming framework. ► Considers uncertainties related to forecasted values of load demand and wind power. ► Proposes an interactive fuzzy satisfying method based on the novel FSALPSO. ► Presents a new self-adaptive learning strategy to improve the original PSO algorithm.
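
    As a hedged illustration of the roulette-wheel scenario-generation step described above: forecast errors for load and wind are binned with probabilities from an assumed distribution, and a scenario is drawn by spinning the wheel once per hour. All bin values, probabilities, and forecast figures are illustrative assumptions.

        import numpy as np

        rng = np.random.default_rng(0)
        deviations = np.array([-0.10, -0.05, 0.0, 0.05, 0.10])  # fraction around forecast
        probs = np.array([0.1, 0.2, 0.4, 0.2, 0.1])             # assumed error PDF bins

        def draw_scenario(forecast, hours=24):
            """One scenario: the forecast perturbed hour by hour via the wheel."""
            picks = rng.choice(deviations, size=hours, p=probs)
            return forecast * (1.0 + picks)

        wind_forecast = np.full(24, 50.0)   # MW, illustrative
        print(draw_scenario(wind_forecast)[:4])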

  15. Information search with situation-specific reward functions

    Directory of Open Access Journals (Sweden)

    Bjorn Meder

    2012-03-01

    Full Text Available Searching for the information that maximizes classification accuracy can strongly conflict with the goal of obtaining information for improving payoffs. Two environments with such a conflict were identified through computer optimization. Three subsequent experiments investigated people's search behavior in these environments. Experiments 1 and 2 used a multiple-cue probabilistic category-learning task to convey environmental probabilities. In a subsequent search task subjects could query only a single feature before making a classification decision. The crucial manipulation concerned the search-task reward structure. The payoffs corresponded either to accuracy, with equal rewards associated with the two categories, or to an asymmetric payoff function, with different rewards associated with each category. In Experiment 1, in which learning-task feedback corresponded to the true category, people later preferentially searched the accuracy-maximizing feature, whether or not this would improve monetary rewards. In Experiment 2, an asymmetric reward structure was used during learning. Subjects searched the reward-maximizing feature when asymmetric payoffs were preserved in the search task. However, if search-task payoffs corresponded to accuracy, subjects preferentially searched a feature that was suboptimal for reward and accuracy alike. Importantly, this feature would have been most useful under the learning-task payoff structure. Experiment 3 found that, if words and numbers are used to convey environmental probabilities, neither reward nor accuracy consistently predicts search. These findings emphasize the necessity of taking into account people's goals and search-and-decision processes during learning, thereby challenging current models of information search.

  16. Cognitive functions in drivers with brain injury: Anticipation and adaptation

    OpenAIRE

    Lundqvist, Anna

    2001-01-01

    The purpose of this thesis was to improve the understanding of what cognitive functions are important for driving performance, investigate the impact of impaired cognitive functions on drivers with brain injury, and study adaptation strategies relevant for driving performance after brain injury. Finally, the predictive value of a neuropsychological test battery was evaluated for driving performance. Main results can be summarized in the following conclusions: (a) Cognitive functions in terms ...

  17. Online reinforcement learning control for aerospace systems

    NARCIS (Netherlands)

    Zhou, Y.

    2018-01-01

    Reinforcement Learning (RL) methods are relatively new in the field of aerospace guidance, navigation, and control. This dissertation aims to exploit RL methods to improve the autonomy and online learning of aerospace systems with respect to the a priori unknown system and environment, dynamical ...

  18. Spared internal but impaired external reward prediction error signals in major depressive disorder during reinforcement learning.

    Science.gov (United States)

    Bakic, Jasmina; Pourtois, Gilles; Jepma, Marieke; Duprat, Romain; De Raedt, Rudi; Baeken, Chris

    2017-01-01

    Major depressive disorder (MDD) creates debilitating effects on a wide range of cognitive functions, including reinforcement learning (RL). In this study, we sought to assess whether reward processing as such, or alternatively the complex interplay between motivation and reward, might potentially account for the abnormal reward-based learning in MDD. A total of 35 treatment-resistant MDD patients and 44 age-matched healthy controls (HCs) performed a standard probabilistic learning task. RL was titrated using behavioral, computational modeling and event-related brain potential (ERP) data. MDD patients showed learning rates comparable to those of HCs. However, they showed decreased lose-shift responses as well as blunted subjective evaluations of the reinforcers used during the task, relative to HCs. Moreover, MDD patients showed normal internal (at the level of error-related negativity, ERN) but abnormal external (at the level of feedback-related negativity, FRN) reward prediction error (RPE) signals during RL, selectively when additional efforts had to be made to establish learning. Collectively, these results lend support to the assumption that MDD does not impair reward processing per se during RL. Instead, it seems to alter the processing of the emotional value of (external) reinforcers during RL, when additional intrinsic motivational processes have to be engaged. © 2016 Wiley Periodicals, Inc.

  19. A policy iteration approach to online optimal control of continuous-time constrained-input systems.

    Science.gov (United States)

    Modares, Hamidreza; Naghibi Sistani, Mohammad-Bagher; Lewis, Frank L

    2013-09-01

    This paper is an effort towards developing an online learning algorithm to find the optimal control solution for continuous-time (CT) systems subject to input constraints. The proposed method is based on the policy iteration (PI) technique which has recently evolved as a major technique for solving optimal control problems. Although a number of online PI algorithms have been developed for CT systems, none of them take into account the input constraints caused by actuator saturation. In practice, however, ignoring these constraints leads to performance degradation or even system instability. In this paper, to deal with the input constraints, a suitable nonquadratic functional is employed to encode the constraints into the optimization formulation. Then, the proposed PI algorithm is implemented on an actor-critic structure to solve the Hamilton-Jacobi-Bellman (HJB) equation associated with this nonquadratic cost functional in an online fashion. That is, two coupled neural network (NN) approximators, namely an actor and a critic are tuned online and simultaneously for approximating the associated HJB solution and computing the optimal control policy. The critic is used to evaluate the cost associated with the current policy, while the actor is used to find an improved policy based on information provided by the critic. Convergence to a close approximation of the HJB solution as well as stability of the proposed feedback control law are shown. Simulation results of the proposed method on a nonlinear CT system illustrate the effectiveness of the proposed approach. Copyright © 2013 ISA. All rights reserved.
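
    The evaluate/improve cycle at the heart of policy iteration can be seen in a much simpler setting. The sketch below is a tabular, discrete-time analogue under assumed toy dynamics, not the paper's continuous-time actor-critic implementation; the transition matrices, rewards, and discount factor are all illustrative.

        import numpy as np

        def policy_iteration(P, R, gamma=0.9, iters=50):
            """Tabular PI: P has shape (A, S, S), R has shape (A, S)."""
            n_actions, n_states = R.shape
            policy = np.zeros(n_states, dtype=int)
            for _ in range(iters):
                # policy evaluation: solve (I - gamma * P_pi) V = R_pi
                P_pi = np.array([P[policy[s], s] for s in range(n_states)])
                R_pi = np.array([R[policy[s], s] for s in range(n_states)])
                V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
                # policy improvement: act greedily w.r.t. the evaluated values
                new_policy = (R + gamma * P @ V).argmax(axis=0)
                if np.array_equal(new_policy, policy):
                    break
                policy = new_policy
            return policy, V

        P = np.array([[[0.9, 0.1], [0.2, 0.8]],    # transitions under action 0
                      [[0.1, 0.9], [0.8, 0.2]]])   # transitions under action 1
        R = np.array([[1.0, 0.0],
                      [0.0, 2.0]])                 # illustrative rewards
        print(policy_iteration(P, R))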

  20. Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening

    OpenAIRE

    He, Frank S.; Liu, Yang; Schwing, Alexander G.; Peng, Jian

    2016-01-01

    We propose a novel training algorithm for reinforcement learning which combines the strength of deep Q-learning with a constrained optimization approach to tighten optimality and encourage faster reward propagation. Our novel technique makes deep reinforcement learning more practical by drastically reducing the training time. We evaluate the performance of our approach on the 49 games of the challenging Arcade Learning Environment, and report significant improvements in both training time and...

  1. Value function in economic growth model

    Science.gov (United States)

    Bagno, Alexander; Tarasyev, Alexandr A.; Tarasyev, Alexander M.

    2017-11-01

    Properties of the value function are examined in an infinite-horizon optimal control problem with an unlimited integrand index appearing in the quality functional with a discount factor. Optimal control problems of this type describe solutions in models of economic growth. Necessary and sufficient conditions are derived to ensure that the value function satisfies the infinitesimal stability properties. It is proved that the value function coincides with the minimax solution of the Hamilton-Jacobi equation. A description of the asymptotic growth behavior of the value function is provided for the logarithmic, power and exponential quality functionals, and an example is given to illustrate construction of the value function in economic growth models.

  2. Normative Functional Performance Values in High School Athletes: The Functional Pre-Participation Evaluation Project.

    Science.gov (United States)

    Onate, James A; Starkel, Cambrie; Clifton, Daniel R; Best, Thomas M; Borchers, James; Chaudhari, Ajit; Comstock, R Dawn; Cortes, Nelson; Grooms, Dustin R; Hertel, Jay; Hewett, Timothy E; Miller, Meghan Maume; Pan, Xueliang; Schussler, Eric; Van Lunen, Bonnie L

    2018-01-01

      The fourth edition of the Preparticipation Physical Evaluation recommends functional testing for the musculoskeletal portion of the examination; however, normative data across sex and grade level are limited. Establishing normative data can provide clinicians reference points with which to compare their patients, potentially aiding in the development of future injury-risk assessments and injury-mitigation programs.   To establish normative functional performance and limb-symmetry data for high school-aged male and female athletes in the United States.   Cross-sectional study.   Athletic training facilities and gymnasiums across the United States.   A total of 3951 male and female athletes who participated on high school-sponsored basketball, football, lacrosse, or soccer teams enrolled in this nationwide study.   Functional performance testing consisted of 3 evaluations. Ankle-joint range of motion, balance, and lower extremity muscular power and landing control were assessed via the weight-bearing ankle-dorsiflexion-lunge, single-legged anterior-reach, and anterior single-legged hop-for-distance (SLHOP) tests, respectively. We used 2-way analyses of variance and χ2 analyses to examine the effects of sex and grade level on ankle-dorsiflexion-lunge, single-legged anterior-reach, and SLHOP test performance and symmetry.   SLHOP performance differed between sexes (males = 187.8% ± 33.1% of limb length, females = 157.5% ± 27.8% of limb length; t = 30.3, P < .001). We observed differences for SLHOP and ankle-dorsiflexion-lunge performance among grade levels, but these differences were not clinically meaningful.   We demonstrated differences in normative data for lower extremity functional performance during preparticipation physical evaluations across sex and grade levels. The results of this study will allow clinicians to compare sex- and grade-specific functional performances and implement approaches for preventing musculoskeletal injuries.

  3. Facilitating tolerance of delayed reinforcement during functional communication training.

    Science.gov (United States)

    Fisher, W W; Thompson, R H; Hagopian, L P; Bowman, L G; Krug, A

    2000-01-01

    Few clinical investigations have addressed the problem of delayed reinforcement. In this investigation, three individuals whose destructive behavior was maintained by positive reinforcement were treated using functional communication training (FCT) with extinction (EXT). Next, procedures used in the basic literature on delayed reinforcement and self-control (reinforcer delay fading, punishment of impulsive responding, and provision of an alternative activity during reinforcer delay) were used to teach participants to tolerate delayed reinforcement. With the first case, reinforcer delay fading alone was effective at maintaining low rates of destructive behavior while introducing delayed reinforcement. In the second case, the addition of a punishment component reduced destructive behavior to near-zero levels and facilitated reinforcer delay fading. With the third case, reinforcer delay fading was associated with increases in masturbation and head rolling, but prompting and praising the individual for completing work during the delay interval reduced all problem behaviors and facilitated reinforcer delay fading.

  4. Learning-based traffic signal control algorithms with neighborhood information sharing: An application for sustainable mobility

    Energy Technology Data Exchange (ETDEWEB)

    Aziz, H. M. Abdul [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Zhu, Feng [Purdue University, West Lafayette, IN (United States). Lyles School of Civil Engineering; Ukkusuri, Satish V. [Purdue University, West Lafayette, IN (United States). Lyles School of Civil Engineering

    2017-10-04

    Here, this research applies an R-Markov Average Reward Technique-based reinforcement learning (RL) algorithm, namely RMART, to the vehicular signal control problem, leveraging information sharing among signal controllers in a connected vehicle environment. We implemented the algorithm in a network of 18 signalized intersections and compared the performance of RMART with fixed, adaptive, and variants of the RL schemes. Results show significant improvement in system performance for the RMART algorithm with information sharing over both traditional fixed signal timing plans and real-time adaptive control schemes. Additionally, the comparison with reinforcement learning algorithms including Q-learning and SARSA indicates that RMART performs better at higher congestion levels. Further, a multi-reward structure is proposed that dynamically adjusts the reward function with varying congestion states at the intersection. Finally, the results from test networks show significant reductions in emissions (CO, CO2, NOx, VOC, PM10) when RL algorithms are implemented compared to fixed signal timings and adaptive schemes.
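
    For readers unfamiliar with average-reward RL, the sketch below shows the classic tabular R-learning update that R-Markov Average Reward techniques build on; the traffic-signal specifics (state encoding, neighborhood information sharing) are omitted, and the action set, step sizes, and usage values are illustrative assumptions.

        ACTIONS = (0, 1)  # e.g., extend vs. switch the signal phase (illustrative)

        def r_learning_update(Q, rho, s, a, r, s_next, alpha=0.1, beta=0.01):
            """One tabular R-learning step. Q maps (state, action) to relative
            value; rho is the running estimate of average reward per step."""
            best_next = max(Q.get((s_next, b), 0.0) for b in ACTIONS)
            best_here = max(Q.get((s, b), 0.0) for b in ACTIONS)
            q = Q.get((s, a), 0.0)
            Q[(s, a)] = q + alpha * (r - rho + best_next - q)
            if q == best_here:               # greedy step: also update rho
                rho += beta * (r - rho + best_next - best_here)
            return rho

        Q, rho = {}, 0.0
        rho = r_learning_update(Q, rho, s=0, a=1, r=5.0, s_next=1)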

  5. Australopithecus anamensis: a finite-element approach to studying the functional adaptations of extinct hominins.

    Science.gov (United States)

    Macho, Gabriele A; Shimizu, Daisuke; Jiang, Yong; Spears, Iain R

    2005-04-01

    Australopithecus anamensis is the stem species of all later hominins and exhibits the suite of characters traditionally associated with hominins, i.e., bipedal locomotion when on the ground, canine reduction, and thick-enameled teeth. The functional consequences of its thick enamel are, however, unclear. Without appropriate structural reinforcement, these thick-enameled teeth may be prone to failure. This article investigates the mechanical behavior of A. anamensis enamel and represents the first in a series that will attempt to determine the functional adaptations of hominin teeth. First, the microstructural arrangement of enamel prisms in A. anamensis teeth was reconstructed using recently developed software and was compared with that of extant hominoids. Second, a finite-element model of a block of enamel containing one cycle of prism deviation was reconstructed for Homo, Pan, Gorilla, and A. anamensis, and the behavior of these tissues under compressive stress was determined. Despite similarities in enamel microstructure between A. anamensis and the African great apes, the structural arrangement of prismatic enamel in A. anamensis appears to be more effective in load dissipation under these compressive loads. The findings may imply that this hominin species was well adapted to puncture crushing and are in some respects contrary to expectations based on the macromorphology of teeth. Taken together, information obtained from both finite-element analyses and dental macroanatomy leads us to suggest that A. anamensis was probably adapted for habitually consuming a hard-tough diet. However, additional tests are needed to fully understand the functional adaptations of A. anamensis teeth.

  6. Adaptive terminal sliding mode control for hypersonic flight vehicles with strictly lower convex function based nonlinear disturbance observer.

    Science.gov (United States)

    Wu, Yun-Jie; Zuo, Jing-Xing; Sun, Liang-Hua

    2017-11-01

    In this paper, the altitude and velocity tracking control of a generic hypersonic flight vehicle (HFV) is considered. A novel adaptive terminal sliding mode controller (ATSMC) with a strictly-lower-convex-function-based nonlinear disturbance observer (SDOB) is proposed for the longitudinal dynamics of the HFV in the presence of both parametric uncertainties and external disturbances. First, for the sake of enhancing the anti-interference capability, the SDOB is presented to estimate and compensate the equivalent disturbances by introducing a strictly lower convex function. Next, the SDOB-based ATSMC (SDOB-ATSMC) is proposed to guarantee that the system outputs track the reference trajectory. Then, the stability of the proposed control scheme is analyzed by the Lyapunov function method. Compared with other HFV control approaches, the key novelties of SDOB-ATSMC are that a novel SDOB is proposed and drawn into the (virtual) control laws to compensate the disturbances and that several adaptive laws are used to deal with the differential explosion problem. Finally, the simulation results illustrate that the new method exhibits excellent robustness and better disturbance rejection performance than the conventional approach. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.

  7. An adaptive deep Q-learning strategy for handwritten digit recognition.

    Science.gov (United States)

    Qiao, Junfei; Wang, Gongming; Li, Wenjing; Chen, Min

    2018-02-22

    Handwritten digit recognition has been a challenging problem in recent years. Although many deep learning-based classification algorithms have been studied for handwritten digit recognition, the recognition accuracy and running time still need to be further improved. In this paper, an adaptive deep Q-learning strategy is proposed to improve accuracy and shorten running time for handwritten digit recognition. The adaptive deep Q-learning strategy combines the feature-extracting capability of deep learning and the decision-making of reinforcement learning to form an adaptive Q-learning deep belief network (Q-ADBN). First, Q-ADBN extracts the features of original images using an adaptive deep auto-encoder (ADAE), and the extracted features are considered as the current states of the Q-learning algorithm. Second, Q-ADBN receives the Q-function (reward signal) during recognition of the current states, and the final handwritten digit recognition is implemented by maximizing the Q-function using the Q-learning algorithm. Finally, experimental results from the well-known MNIST dataset show that the proposed Q-ADBN is superior to other similar methods in terms of accuracy and running time. Copyright © 2018 Elsevier Ltd. All rights reserved.
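
    As a hedged sketch of the decision-making half only: Q-ADBN maximizes a Q-function over extracted features. Below is the generic tabular Q-learning update; treating classification as a one-step episode with reward 1 for a correct label, and the state/action sizes, are illustrative assumptions rather than the paper's architecture.

        import numpy as np

        def q_update(Q, state, action, reward, next_max=0.0, alpha=0.5, gamma=0.9):
            """Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a));
            for a one-step episode the bootstrap term next_max is 0."""
            Q[state, action] += alpha * (reward + gamma * next_max - Q[state, action])

        Q = np.zeros((16, 10))   # 16 quantized feature states x 10 digit labels
        q_update(Q, state=3, action=7, reward=1.0)   # correct guess reinforced
        print(Q[3, 7])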

  8. Learning Multirobot Hose Transportation and Deployment by Distributed Round-Robin Q-Learning.

    Directory of Open Access Journals (Sweden)

    Borja Fernandez-Gauna

    Full Text Available Multi-Agent Reinforcement Learning (MARL) algorithms face two main difficulties: the curse of dimensionality, and environment non-stationarity due to the independent learning processes carried out by the agents concurrently. In this paper we formalize and prove the convergence of a Distributed Round-Robin Q-learning (D-RR-QL) algorithm for cooperative systems. The computational complexity of this algorithm increases linearly with the number of agents. Moreover, it eliminates environment non-stationarity by carrying out round-robin scheduling of the action selection and execution. This learning scheme allows the implementation of Modular State-Action Vetoes (MSAV) in cooperative multi-agent systems, which speeds up learning convergence in over-constrained systems by vetoing state-action pairs that lead to undesired termination states (UTS) in the relevant state-action subspace. Each agent's local state-action value function learning is an independent process, including the MSAV policies. Coordination of locally optimal policies to obtain the global optimal joint policy is achieved by a greedy selection procedure using message passing. We show that D-RR-QL improves over state-of-the-art approaches, such as Distributed Q-Learning, Team Q-Learning and Coordinated Reinforcement Learning, in a paradigmatic Linked Multi-Component Robotic System (L-MCRS) control problem: the hose transportation task. L-MCRS are over-constrained systems with many UTS induced by the interaction of the passive linking element and the active mobile robots.
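
    A minimal, runnable sketch of the round-robin discipline itself: agents take turns acting, so each learner faces a stationary environment during its turn. The two-agent toy task, reward rule, and all names below are illustrative assumptions, not the hose-transportation benchmark.

        from itertools import cycle

        class Agent:
            def __init__(self, name, n_actions=2):
                self.name, self.q, self.n_actions = name, {}, n_actions
            def act(self, s):   # greedy action from the local Q-table
                return max(range(self.n_actions),
                           key=lambda a: self.q.get((s, a), 0.0))
            def learn(self, s, a, r, alpha=0.2):   # one-step value update
                old = self.q.get((s, a), 0.0)
                self.q[(s, a)] = old + alpha * (r - old)

        agents, state = cycle([Agent("A"), Agent("B")]), 0
        for _ in range(100):
            agent = next(agents)                     # one agent acts per step
            a = agent.act(state)
            reward = 1.0 if a == state % 2 else 0.0  # toy reward, illustrative
            agent.learn(state, a, reward)
            state = (state + 1) % 4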

  9. "Notice of Violation of IEEE Publication Principles" Multiobjective Reinforcement Learning: A Comprehensive Overview.

    Science.gov (United States)

    Liu, Chunming; Xu, Xin; Hu, Dewen

    2013-04-29

    Reinforcement learning is a powerful mechanism for enabling agents to learn in an unknown environment, and most reinforcement learning algorithms aim to maximize some numerical value, which represents only one long-term objective. However, multiple long-term objectives are exhibited in many real-world decision and control problems; therefore, recently, there has been growing interest in solving multiobjective reinforcement learning (MORL) problems with multiple conflicting objectives. The aim of this paper is to present a comprehensive overview of MORL. In this paper, the basic architecture, research topics, and naive solutions of MORL are introduced first. Then, several representative MORL approaches and some important directions of recent research are reviewed. The relationships between MORL and other related research are also discussed, including multiobjective optimization, hierarchical reinforcement learning, and multi-agent reinforcement learning. Finally, research challenges and open problems of MORL techniques are highlighted.

  10. A Review of the Relationship between Novelty, Intrinsic Motivation and Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Siddique Nazmul

    2017-11-01

    Full Text Available This paper presents a review on the tri-partite relationship between novelty, intrinsic motivation and reinforcement learning. The paper first presents a literature survey on novelty and the different computational models of novelty detection, with a specific focus on the features of stimuli that trigger a hedonic value for generating a novelty signal. It then presents an overview of intrinsic motivation and investigations into different models, with the aim of exploring deeper correlations between specific features of a novelty signal and its effect on intrinsic motivation in producing a reward function. Finally, it presents survey results on reinforcement learning, different models and their functional relationship with intrinsic motivation.

  11. Off-Policy Actor-Critic Structure for Optimal Control of Unknown Systems With Disturbances.

    Science.gov (United States)

    Song, Ruizhuo; Lewis, Frank L; Wei, Qinglai; Zhang, Huaguang

    2016-05-01

    An optimal control method is developed for unknown continuous-time systems with unknown disturbances in this paper. The integral reinforcement learning (IRL) algorithm is presented to obtain the iterative control. Off-policy learning is used to allow the dynamics to be completely unknown. Neural networks are used to construct critic and action networks. It is shown that if there are unknown disturbances, off-policy IRL may not converge or may be biased. To reduce the influence of unknown disturbances, a disturbance compensation controller is added. It is proven that the weight errors are uniformly ultimately bounded based on Lyapunov techniques. Convergence of the Hamiltonian function is also proven. The simulation study demonstrates the effectiveness of the proposed optimal control method for unknown systems with disturbances.

  12. Cerebellar and prefrontal cortex contributions to adaptation, strategies, and reinforcement learning.

    Science.gov (United States)

    Taylor, Jordan A; Ivry, Richard B

    2014-01-01

    Traditionally, motor learning has been studied as an implicit learning process, one in which movement errors are used to improve performance in a continuous, gradual manner. The cerebellum figures prominently in this literature given well-established ideas about the role of this system in error-based learning and the production of automatized skills. Recent developments have brought into focus the relevance of multiple learning mechanisms for sensorimotor learning. These include processes involving repetition, reinforcement learning, and strategy utilization. We examine these developments, considering their implications for understanding cerebellar function and how this structure interacts with other neural systems to support motor learning. Converging lines of evidence from behavioral, computational, and neuropsychological studies suggest a fundamental distinction between processes that use error information to improve action execution or action selection. While the cerebellum is clearly linked to the former, its role in the latter remains an open question. © 2014 Elsevier B.V. All rights reserved.

  13. Adaptive optimal control of unknown constrained-input systems using policy iteration and neural networks.

    Science.gov (United States)

    Modares, Hamidreza; Lewis, Frank L; Naghibi-Sistani, Mohammad-Bagher

    2013-10-01

    This paper presents an online policy iteration (PI) algorithm to learn the continuous-time optimal control solution for unknown constrained-input systems. The proposed PI algorithm is implemented on an actor-critic structure where two neural networks (NNs) are tuned online and simultaneously to generate the optimal bounded control policy. The requirement of complete knowledge of the system dynamics is obviated by employing a novel NN identifier in conjunction with the actor and critic NNs. It is shown how the identifier weights estimation error affects the convergence of the critic NN. A novel learning rule is developed to guarantee that the identifier weights converge to small neighborhoods of their ideal values exponentially fast. To provide an easy-to-check persistence of excitation condition, the experience replay technique is used. That is, recorded past experiences are used simultaneously with current data for the adaptation of the identifier weights. Stability of the whole system consisting of the actor, critic, system state, and system identifier is guaranteed while all three networks undergo adaptation. Convergence to a near-optimal control law is also shown. The effectiveness of the proposed method is illustrated with a simulation example.
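
    A hedged sketch of the experience-replay idea used above to ease the persistence-of-excitation requirement: past transitions are stored and replayed together with current data when adapting the identifier weights. The buffer capacity, batch size, and transition format are illustrative assumptions.

        import random
        from collections import deque

        class ReplayBuffer:
            def __init__(self, capacity=1000):
                self.buffer = deque(maxlen=capacity)  # oldest entries drop out
            def store(self, transition):              # e.g., (x, u, x_next)
                self.buffer.append(transition)
            def sample(self, batch_size=32):
                """Recorded past experiences, reused alongside new data."""
                k = min(batch_size, len(self.buffer))
                return random.sample(list(self.buffer), k)

        buf = ReplayBuffer()
        for t in range(100):
            buf.store((t, 0.1 * t, t + 1))            # placeholder transitions
        batch = buf.sample()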

  14. Adaptive critic learning techniques for engine torque and air-fuel ratio control.

    Science.gov (United States)

    Liu, Derong; Javaherian, Hossein; Kovalenko, Olesia; Huang, Ting

    2008-08-01

    A new approach for engine calibration and control is proposed. In this paper, we present our research results on the implementation of adaptive critic designs for self-learning control of automotive engines. A class of adaptive critic designs that can be classified as (model-free) action-dependent heuristic dynamic programming is used in this research project. The goals of the present learning control design for automotive engines include improved performance, reduced emissions, and maintained optimum performance under various operating conditions. Using the data from a test vehicle with a V8 engine, we developed a neural network model of the engine and neural network controllers based on the idea of approximate dynamic programming to achieve optimal control. We have developed and simulated self-learning neural network controllers for both engine torque (TRQ) and exhaust air-fuel ratio (AFR) control. The goal of TRQ control and AFR control is to track the commanded values. For both control problems, excellent neural network controller transient performance has been achieved.

  15. Learning related modulation of functional retrieval networks in man.

    Science.gov (United States)

    Petersson, K M; Sandblom, J; Gisselgård, J; Ingvar, M

    2001-07-01

    The medial temporal lobe has been implicated in studies of episodic memory tasks involving spatio-temporal context and object-location conjunctions. We have previously demonstrated that an increased level of practice in a free-recall task parallels a decrease in the functional activity of several brain regions, including the medial temporal lobe, the prefrontal, the anterior cingulate, the anterior insular, and the posterior parietal cortices, that in concert demonstrate a move from elaborate controlled processing towards a higher degree of automaticity. Here we report data from two experiments that extend these initial observations. We used a similar experimental approach but probed for effects of retrieval paradigms and stimulus material. In the first experiment we investigated practice related changes during recognition of object-location conjunctions and in the second during free-recall of pseudo-words. Learning in a neural network is a dynamic consequence of information processing and network plasticity. The present and previous PET results indicate that practice can induce a learning related functional restructuring of information processing. Different adaptive processes likely subserve the functional re-organisation observed. These may in part be related to different demands for attentional and working memory processing. It appears that the role(s) of the prefrontal cortex and the medial temporal lobe in memory retrieval are complex, perhaps reflecting several different interacting processes or cognitive components. We suggest that an integrative interactive perspective on the role of the prefrontal and medial temporal lobe is necessary for an understanding of the processing significance of these regions in learning and memory. It appears necessary to develop elaborated and explicit computational models for prefrontal and medial temporal functions in order to derive detailed empirical predictions, and in combination with an efficient use and development of

  16. Algorithms for Reinforcement Learning

    CERN Document Server

    Szepesvari, Csaba

    2010-01-01

    Reinforcement learning is a learning paradigm concerned with learning to control a system so as to maximize a numerical performance measure that expresses a long-term objective. What distinguishes reinforcement learning from supervised learning is that only partial feedback is given to the learner about the learner's predictions. Further, the predictions may have long term effects through influencing the future state of the controlled system. Thus, time plays a special role. The goal in reinforcement learning is to develop efficient learning algorithms, as well as to understand the algorithms'

  17. Altered cingulo-striatal function underlies reward drive deficits in schizophrenia.

    Science.gov (United States)

    Park, Il Ho; Chun, Ji Won; Park, Hae-Jeong; Koo, Min-Seong; Park, Sunyoung; Kim, Seok-Hyeong; Kim, Jae-Jin

    2015-02-01

    Amotivation in schizophrenia is assumed to involve dysfunctional dopaminergic signaling of reward prediction or anticipation. It is unclear, however, whether the translation of neural representation of reward value to behavioral drive is affected in schizophrenia. In order to examine how abnormal neural processing of response valuation and initiation affects incentive motivation in schizophrenia, we conducted functional MRI using a deterministic reinforcement learning task with variable intervals of contingency reversals in 20 clinically stable patients with schizophrenia and 20 healthy controls. Behaviorally, the advantage of positive over negative reinforcer in reinforcement-related responsiveness was not observed in patients. Patients showed altered response valuation and initiation-related striatal activity and deficient rostro-ventral anterior cingulate cortex activation during reward approach initiation. Among these neural abnormalities, rostro-ventral anterior cingulate cortex activation was correlated with positive reinforcement-related responsiveness in controls and social anhedonia and social amotivation subdomain scores in patients. Our findings indicate that the central role of the anterior cingulate cortex is in translating action value into driving force of action, and underscore the role of the cingulo-striatal network in amotivation in schizophrenia. Copyright © 2014 Elsevier B.V. All rights reserved.

  18. MyHealthAtVanderbilt: policies and procedures governing patient portal functionality

    Science.gov (United States)

    Rosenbloom, S Trent; Stenner, Shane P; Anders, Shilo; Muse, Sue; Johnson, Kevin B; Jirjis, Jim; Jackson, Gretchen Purcell

    2011-01-01

    Explicit guidelines are needed to develop safe and effective patient portals. This paper proposes general principles, policies, and procedures for patient portal functionality based on MyHealthAtVanderbilt (MHAV), a robust portal for Vanderbilt University Medical Center. We describe policies and procedures designed to govern popular portal functions, address common user concerns, and support adoption. We present the results of our approach as overall and function-specific usage data. Five years after implementation, MHAV has over 129 800 users; 45% have used bi-directional messaging; 52% have viewed test results and 45% have viewed other medical record data; 30% have accessed health education materials; 39% have scheduled appointments; and 29% have managed a medical bill. Our policies and procedures have supported widespread adoption and use of MHAV. We believe other healthcare organizations could employ our general guidelines and lessons learned to facilitate portal implementation and usage. PMID:21807648

  19. Reinforcement learning in supply chains.

    Science.gov (United States)

    Valluri, Annapurna; North, Michael J; Macal, Charles M

    2009-10-01

    Effective management of supply chains creates value and can strategically position companies. In practice, human beings have been found to be both surprisingly successful and disappointingly inept at managing supply chains. The related fields of cognitive psychology and artificial intelligence have postulated a variety of potential mechanisms to explain this behavior. One of the leading candidates is reinforcement learning. This paper applies agent-based modeling to investigate the comparative behavioral consequences of three simple reinforcement learning algorithms in a multi-stage supply chain. For the first time, our findings show that the specific algorithm that is employed can have dramatic effects on the results obtained. Reinforcement learning is found to be valuable in multi-stage supply chains with several learning agents, as independent agents can learn to coordinate their behavior. However, learning in multi-stage supply chains using these postulated approaches from cognitive psychology and artificial intelligence takes extremely long time periods to achieve stability, which raises questions about their ability to explain behavior in real supply chains. The fact that it takes thousands of periods for agents to learn in this simple multi-agent setting provides new evidence that real-world decision makers are unlikely to be using strict reinforcement learning in practice.

  20. Implementation of real-time energy management strategy based on reinforcement learning for hybrid electric vehicles and simulation validation.

    Science.gov (United States)

    Kong, Zehui; Zou, Yuan; Liu, Teng

    2017-01-01

    To further improve the fuel economy of series hybrid electric tracked vehicles, a reinforcement learning (RL)-based real-time energy management strategy is developed in this paper. In order to utilize the statistical characteristics of the online driving schedule effectively, a recursive algorithm for the transition probability matrix (TPM) of power requests is derived. Reinforcement learning is applied to calculate and update the control policy at regular intervals, adapting to the varying driving conditions. A forward-facing powertrain model is built in detail, including the engine-generator model, battery model and vehicle dynamics model. The robustness and adaptability of the real-time energy management strategy are validated through comparison with a stationary control strategy based on an initial TPM generated from a long naturalistic driving cycle in simulation. Results indicate that the proposed method has better fuel economy than the stationary one and is more effective in real-time control.
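
    A hedged sketch of the recursive TPM step described above: discretized power requests are counted as transitions and the matrix is renormalized whenever a new request arrives, so the policy can adapt online. The bin count and the smoothing prior are illustrative assumptions.

        import numpy as np

        N_BINS = 8                             # discretized power-request levels
        counts = np.ones((N_BINS, N_BINS))     # additive smoothing avoids zero rows

        def update_tpm(counts, prev_bin, cur_bin):
            """Fold the newest observed transition into the row-stochastic TPM."""
            counts[prev_bin, cur_bin] += 1.0
            return counts / counts.sum(axis=1, keepdims=True)

        tpm = update_tpm(counts, prev_bin=2, cur_bin=3)
        print(tpm[2])                          # updated distribution from bin 2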

  1. Implementation of real-time energy management strategy based on reinforcement learning for hybrid electric vehicles and simulation validation.

    Directory of Open Access Journals (Sweden)

    Zehui Kong

    Full Text Available To further improve the fuel economy of series hybrid electric tracked vehicles, a reinforcement learning (RL)-based real-time energy management strategy is developed in this paper. In order to utilize the statistical characteristics of the online driving schedule effectively, a recursive algorithm for the transition probability matrix (TPM) of power requests is derived. Reinforcement learning is applied to calculate and update the control policy at regular intervals, adapting to the varying driving conditions. A forward-facing powertrain model is built in detail, including the engine-generator model, battery model and vehicle dynamics model. The robustness and adaptability of the real-time energy management strategy are validated through comparison with a stationary control strategy based on an initial TPM generated from a long naturalistic driving cycle in simulation. Results indicate that the proposed method has better fuel economy than the stationary one and is more effective in real-time control.

  2. A Factor-Analytic Study of Adaptive Behavior and Intellectual Functioning in Learning Disabled Children.

    Science.gov (United States)

    Yeargan, Dollye R.

    The factorial structure of intellectual functioning and adaptive behavior was examined in 160 learning disabled students (6 to 16 years old). Ss were administered the Wechsler Intelligence Scale for Children-Revised (WISC-R) and the Coping Inventory (CI). Factor analysis of WISC-R scores revealed three factors: verbal comprehension, perceptual…

  3. Rational metareasoning and the plasticity of cognitive control

    Science.gov (United States)

    Shenhav, Amitai; Musslick, Sebastian; Griffiths, Thomas L.

    2018-01-01

    The human brain has the impressive capacity to adapt how it processes information to high-level goals. While it is known that these cognitive control skills are malleable and can be improved through training, the underlying plasticity mechanisms are not well understood. Here, we develop and evaluate a model of how people learn when to exert cognitive control, which controlled process to use, and how much effort to exert. We derive this model from a general theory according to which the function of cognitive control is to select and configure neural pathways so as to make optimal use of finite time and limited computational resources. The central idea of our Learned Value of Control model is that people use reinforcement learning to predict the value of candidate control signals of different types and intensities based on stimulus features. This model correctly predicts the learning and transfer effects underlying the adaptive control-demanding behavior observed in an experiment on visual attention and four experiments on interference control in Stroop and Flanker paradigms. Moreover, our model explained these findings significantly better than an associative learning model and a Win-Stay Lose-Shift model. Our findings elucidate how learning and experience might shape people’s ability and propensity to adaptively control their minds and behavior. We conclude by predicting under which circumstances these learning mechanisms might lead to self-control failure. PMID:29694347

  4. Rational metareasoning and the plasticity of cognitive control.

    Science.gov (United States)

    Lieder, Falk; Shenhav, Amitai; Musslick, Sebastian; Griffiths, Thomas L

    2018-04-01

    The human brain has the impressive capacity to adapt how it processes information to high-level goals. While it is known that these cognitive control skills are malleable and can be improved through training, the underlying plasticity mechanisms are not well understood. Here, we develop and evaluate a model of how people learn when to exert cognitive control, which controlled process to use, and how much effort to exert. We derive this model from a general theory according to which the function of cognitive control is to select and configure neural pathways so as to make optimal use of finite time and limited computational resources. The central idea of our Learned Value of Control model is that people use reinforcement learning to predict the value of candidate control signals of different types and intensities based on stimulus features. This model correctly predicts the learning and transfer effects underlying the adaptive control-demanding behavior observed in an experiment on visual attention and four experiments on interference control in Stroop and Flanker paradigms. Moreover, our model explained these findings significantly better than an associative learning model and a Win-Stay Lose-Shift model. Our findings elucidate how learning and experience might shape people's ability and propensity to adaptively control their minds and behavior. We conclude by predicting under which circumstances these learning mechanisms might lead to self-control failure.

  5. Rational metareasoning and the plasticity of cognitive control.

    Directory of Open Access Journals (Sweden)

    Falk Lieder

    2018-04-01

    Full Text Available The human brain has the impressive capacity to adapt how it processes information to high-level goals. While it is known that these cognitive control skills are malleable and can be improved through training, the underlying plasticity mechanisms are not well understood. Here, we develop and evaluate a model of how people learn when to exert cognitive control, which controlled process to use, and how much effort to exert. We derive this model from a general theory according to which the function of cognitive control is to select and configure neural pathways so as to make optimal use of finite time and limited computational resources. The central idea of our Learned Value of Control model is that people use reinforcement learning to predict the value of candidate control signals of different types and intensities based on stimulus features. This model correctly predicts the learning and transfer effects underlying the adaptive control-demanding behavior observed in an experiment on visual attention and four experiments on interference control in Stroop and Flanker paradigms. Moreover, our model explained these findings significantly better than an associative learning model and a Win-Stay Lose-Shift model. Our findings elucidate how learning and experience might shape people's ability and propensity to adaptively control their minds and behavior. We conclude by predicting under which circumstances these learning mechanisms might lead to self-control failure.
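
    A hedged sketch of the Learned Value of Control idea shared by the three records above: predict, from stimulus features, the expected value of each candidate control signal (here, its intensity) and select the best. A linear value model with a delta-rule update stands in for the full learning mechanism; every name and number is an illustrative assumption.

        import numpy as np

        rng = np.random.default_rng(1)
        n_features = 4
        intensities = np.linspace(0.0, 1.0, 5)       # candidate control signals
        w = np.zeros((len(intensities), n_features)) # one weight row per signal

        def choose_signal(features):
            values = w @ features                    # predicted value per signal
            return int(values.argmax())

        def learn(features, signal, observed_value, lr=0.1):
            prediction = w[signal] @ features
            w[signal] += lr * (observed_value - prediction) * features  # delta rule

        features = rng.random(n_features)            # stand-in stimulus features
        sig = choose_signal(features)
        learn(features, sig, observed_value=0.7)     # e.g., reward minus effort cost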

  6. Internet Searches and Their Relationship to Cognitive Function in Older Adults: Cross-Sectional Analysis.

    Science.gov (United States)

    Austin, Johanna; Hollingshead, Kristy; Kaye, Jeffrey

    2017-09-06

    Alzheimer disease (AD) is a very challenging experience for all those affected. Unfortunately, detection of Alzheimer disease in its early stages, when clinical treatments may be most effective, is challenging, as the clinical evaluations are time-consuming and costly. Recent studies have demonstrated a close relationship between cognitive function and everyday behavior, an avenue of research that holds great promise for the early detection of cognitive decline. One area of behavior that changes with cognitive decline is language use. Multiple groups have demonstrated a close relationship between cognitive function and vocabulary size, verbal fluency, and semantic ability, using conventional in-person cognitive testing. An alternative approach, which is inherently ecologically valid, may be to take advantage of automated computer monitoring software to continually capture and analyze language use while on the computer. The aim of this study was to understand the relationship between Internet searches as a measure of language and cognitive function in older adults. We hypothesize that individuals with poorer cognitive function will search using fewer unique terms, employ shorter words, and use less obscure words in their searches. Computer monitoring software (WorkTime, Nestersoft Inc) was used to continuously track the terms people entered while conducting searches in Google, Yahoo, Bing, and Ask.com. For all searches, punctuation, accents, and non-ASCII characters were removed, and the resulting search terms were spell-checked before any analysis. Cognitive function was evaluated as a z-normalized summary score capturing five unique cognitive domains. Linear regression was used to determine the relationship between cognitive function and Internet searches while controlling for variables such as age, sex, and education. Over a 6-month monitoring period, 42 participants (mean age 81 years [SD 10.5], 83% [35/42] female) conducted 2915 searches using these top search engines.
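
    A hedged sketch of the kind of search-term features the study relates to cognition: unique-term count, mean word length, and a stand-in obscurity score based on an assumed term-frequency table. The sample queries and frequency values are purely illustrative.

        searches = ["weather portland", "weather", "bus schedule portland"]
        terms = [w for query in searches for w in query.split()]

        unique_terms = len(set(terms))
        mean_word_length = sum(len(w) for w in terms) / len(terms)

        # assumed commonness scores in [0, 1]; unseen words count as maximally obscure
        common = {"weather": 0.9, "bus": 0.8, "portland": 0.5, "schedule": 0.4}
        obscurity = sum(1.0 - common.get(w, 0.0) for w in terms) / len(terms)

        print(unique_terms, round(mean_word_length, 2), round(obscurity, 2))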

  7. Fracture Behavior and Properties of Functionally Graded Fiber-Reinforced Concrete

    International Nuclear Information System (INIS)

    Roesler, Jeffery; Bordelon, Amanda; Gaedicke, Cristian; Park, Kyoungsoo; Paulino, Glaucio

    2008-01-01

    In concrete pavements, a single concrete mixture design is selected to resist mechanical loading while attempting not to adversely affect the concrete pavement's shrinkage, ride quality, or noise attenuation. An alternative approach is to design distinct layers within the concrete pavement surface that have specific functions, thus achieving higher performance at a lower cost. The objective of this research was to address the structural benefits of functionally graded concrete materials (FGCM) for rigid pavements by testing and modeling the fracture behavior of different combinations of layered plain and synthetic fiber-reinforced concrete materials. Fracture parameters and the post-peak softening behavior were obtained for each FGCM beam configuration by the three-point bending beam test. Similar peak loads and initial fracture energies among the plain, fiber-reinforced, and FGCM beams signified similar crack initiation. The total fracture energy indicated improvements in the fracture behavior of FGCM relative to full-depth plain concrete. The fracture behavior of FGCM depended on the position of the fiber-reinforced layer relative to the starter notch. The fracture parameters of both fiber-reinforced and plain concrete were embedded into a finite element-based cohesive zone model. The model successfully captured the experimental behavior of the FGCMs and predicted the fracture behavior of proposed FGCM configurations and structures. This integrated approach (testing and modeling) demonstrates the viability of FGCM for designing layered concrete pavement systems.

  8. Children's Learning in Scientific Thinking: Instructional Approaches and Roles of Variable Identification and Executive Function

    Science.gov (United States)

    Blums, Angela

    The present study examines instructional approaches and cognitive factors involved in elementary school children's thinking and learning of the Control of Variables Strategy (CVS), a critical aspect of scientific reasoning. Previous research has identified several features related to effective instruction of CVS, including using a guided learning approach, the use of self-reflective questions, and learning in individual and group contexts. The current study examined the roles of procedural and conceptual instruction in learning CVS and investigated the role of executive function in the learning process. Additionally, this study examined how learning to identify variables is a part of the CVS process. In two studies (individual and classroom experiments), 139 third-, fourth-, and fifth-grade students participated in hands-on and paper-and-pencil CVS learning activities and, in each study, were assigned to either a procedural instruction, conceptual instruction, or control (no instruction) group. Participants also completed a series of executive function tasks. The study was carried out in two parts: Study 1 used an individual context and Study 2 was carried out in a group setting. Results indicated that procedural and conceptual instruction were more effective than no instruction, and the ability to identify variables was identified as a key part of the CVS process. Executive function predicted the ability to identify variables and predicted success on CVS tasks. Developmental differences were present, in that older children outperformed younger children on CVS tasks, and conceptual instruction was slightly more effective for older children. Some differences between individual and group instruction were found, with those in the individual context showing some advantage over those in the group setting in learning CVS concepts. Conceptual implications about scientific thinking and practical implications in science education are discussed.

  9. Closed-loop adaptation of neurofeedback based on mental effort facilitates reinforcement learning of brain self-regulation.

    Science.gov (United States)

    Bauer, Robert; Fels, Meike; Royter, Vladislav; Raco, Valerio; Gharabaghi, Alireza

    2016-09-01

    Considering self-rated mental effort during neurofeedback may improve training of brain self-regulation. Twenty-one healthy, right-handed subjects performed kinesthetic motor imagery of opening their left hand, while threshold-based classification of beta-band desynchronization resulted in proprioceptive robotic feedback. The experiment consisted of two blocks in a cross-over design. The participants rated their perceived mental effort nine times per block. In the adaptive block, the threshold was adjusted on the basis of these ratings whereas adjustments were carried out at random in the other block. Electroencephalography was used to examine the cortical activation patterns during the training sessions. The perceived mental effort was correlated with the difficulty threshold of neurofeedback training. Adaptive threshold-setting reduced mental effort and increased the classification accuracy and positive predictive value. This was paralleled by an inter-hemispheric cortical activation pattern in low frequency bands connecting the right frontal and left parietal areas. Optimal balance of mental effort was achieved at thresholds significantly higher than maximum classification accuracy. Rating of mental effort is a feasible approach for effective threshold-adaptation during neurofeedback training. Closed-loop adaptation of the neurofeedback difficulty level facilitates reinforcement learning of brain self-regulation.

  10. Adapting SMME business functions during economic turmoil

    Directory of Open Access Journals (Sweden)

    Chantal Rootman

    2010-12-01

    Full Text Available Purpose: The purpose of this study is to investigate how SMMEs should adapt their business functions to improve business performance during times of economic turmoil. Problem investigated: SMMEs are important contributors to the economy as these firms provide employment opportunities and create economic wealth. However, many SMMEs fail due to reasons such as the influence of economic factors (low sales and growth prospects) as well as the lack of finance, managerial skills and expertise. SMMEs could possibly increase their chances of success if they adjust aspects in their firms which the owners and managers of the SMMEs can control. Business functions are regarded as internal forces influencing a firm and SMME owners and managers can control these functions. These business functions include general and strategic management, purchasing management, production management, marketing management, financial management, human resources management, business communication management and information management. It is important to investigate how SMMEs can adapt their business functions, during difficult economic times, to improve their business performance. Methodology: A self-developed, self-administered and structured questionnaire was distributed to 300 SMMEs in the Eastern Cape and the Garden Route area. A total of 250 usable questionnaires were received, therefore a response rate of 83% was obtained. Findings and implications: The findings of this study revealed that all eight of the business functions require adjustments during difficult economic times to improve the business performance of SMMEs. Respondents regarded the financial management function as the area in SMMEs that needs the most focus and adjustments, during challenging economic times to improve business performance. Following financial management is the purchasing- and information management business functions. Originality and value of the research: This study specifically focussed on

  11. Applying reinforcement learning to the weapon assignment problem

    African Journals Online (AJOL)

    ismith

    Carlo (MC) control algorithm with exploring starts (MCES), and an off-policy ..... closest to the threat should fire (that weapon also had the highest probability to ... Monte Carlo ..... “Reinforcement learning: Theory, methods and application to.

  12. ADAPTIVE E-LEARNING AND ITS EVALUATION

    Directory of Open Access Journals (Sweden)

    KOSTOLÁNYOVÁ, Katerina

    2012-12-01

    Full Text Available This paper introduces a complex plan for a complete system of individualized electronic instruction. The core of the system is a computer program that controls teaching, the so-called “virtual teacher”. The virtual teacher automatically adapts to the individual student’s characteristics and learning style. It adapts to static as well as dynamic characteristics of the student. To manage all this it needs a database of various styles and forms of teaching as well as a sufficient amount of information about the learning style, type of memory and other characteristics of the student. The information about these characteristics, the structure of data storage and its use by the virtual teacher are also part of this paper. We also outline a methodology for adaptive study materials. We define basic rules and forms for creating adaptive study materials. This adaptive e-learning system was pilot tested with more than 50 students. These students filled in a learning style questionnaire at the beginning of the study and had the option to fill in an adaptive evaluation questionnaire at the end of the study. The results of these questionnaires were analyzed, and several conclusions were drawn from this analysis to refine the methodology of adaptive study materials.

  13. Neuromorphic function learning with carbon nanotube based synapses

    International Nuclear Information System (INIS)

    Gacem, Karim; Filoramo, Arianna; Derycke, Vincent; Retrouvey, Jean-Marie; Chabi, Djaafar; Zhao, Weisheng; Klein, Jacques-Olivier

    2013-01-01

    The principle of using nanoscale memory devices as artificial synapses in neuromorphic circuits is recognized as a promising way to build ground-breaking circuit architectures tolerant to defects and variability. Yet, actual experimental demonstrations of the neural network type of circuits based on non-conventional/non-CMOS memory devices and displaying function learning capabilities remain very scarce. We show here that carbon-nanotube-based memory elements can be used as artificial synapses, combined with conventional neurons and trained to perform functions through the application of a supervised learning algorithm. The same ensemble of eight devices can notably be trained multiple times to code successively any three-input linearly separable Boolean logic function despite device-to-device variability. This work thus represents one of the very few demonstrations of actual function learning with synapses based on nanoscale building blocks. The potential of such an approach for the parallel learning of multiple and more complex functions is also evaluated. (paper)
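
    In software terms, the supervised training procedure described above is closely analogous to single-layer perceptron learning, with each nanotube memory element acting as one adjustable weight. The sketch below is a minimal software analogue of learning three-input linearly separable Boolean functions, not the authors' hardware protocol; the learning rate, encoding and update rule are illustrative assumptions.

```python
import itertools

def train_perceptron(truth_table, lr=0.1, epochs=200):
    """Learn a linearly separable Boolean function with the perceptron rule.

    Each weight plays the role of one adjustable synaptic conductance.
    truth_table maps 3-bit input tuples to target outputs in {0, 1}.
    """
    w, b = [0.0, 0.0, 0.0], 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in truth_table.items():
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = target - y
            if err:
                errors += 1
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
        if errors == 0:          # truth table reproduced exactly
            break
    return w, b

# The same "synapses" can be retrained for successive functions,
# here 3-input AND and then 3-input OR.
inputs = list(itertools.product((0, 1), repeat=3))
for name, f in (("AND", all), ("OR", any)):
    w, b = train_perceptron({x: int(f(x)) for x in inputs})
    print(name, [round(wi, 2) for wi in w], round(b, 2))
```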

  14. A Novel Quantum-Behaved Lightning Search Algorithm Approach to Improve the Fuzzy Logic Speed Controller for an Induction Motor Drive

    Directory of Open Access Journals (Sweden)

    Jamal Abd Ali

    2015-11-01

    Full Text Available This paper presents a novel lightning search algorithm (LSA) using quantum mechanics theories to generate a quantum-inspired LSA (QLSA). The QLSA improves the searching of each step leader to obtain the best position for a projectile. To evaluate the reliability and efficiency of the proposed algorithm, the QLSA is tested using eighteen benchmark functions with various characteristics. The QLSA is applied to improve the design of the fuzzy logic controller (FLC) for controlling the speed response of the induction motor drive. The proposed algorithm avoids the exhaustive conventional trial-and-error procedure for obtaining membership functions (MFs). The generated adaptive input and output MFs are implemented in the fuzzy speed controller design to formulate the objective functions. The mean absolute error (MAE) of the rotor speed is the objective function of the optimization controller. An optimal QLSA-based FLC (QLSAF) optimization controller is employed to tune and minimize the MAE, thereby improving the performance of the induction motor under changes in speed and mechanical load. To validate the performance of the developed controller, the results obtained with the QLSAF are compared to the results obtained with the LSA, the backtracking search algorithm (BSA), the gravitational search algorithm (GSA), particle swarm optimization (PSO) and proportional integral derivative (PID) controllers, respectively. Results show that the QLSAF outperforms the other control methods in all of the tested cases in terms of damping capability and transient response under different mechanical loads and speeds.
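
    As a point of reference for the optimization loop described above, the sketch below shows how an MAE objective over a simulated rotor-speed trace might be scored for one candidate set of membership-function parameters; `simulate_speed_response` and the dummy simulator are hypothetical stand-ins for the motor-drive model used in the paper.

```python
import numpy as np

def mae_objective(mf_params, reference_speed, simulate_speed_response):
    """Score one candidate set of membership-function parameters by the
    mean absolute error of the resulting rotor-speed trace."""
    actual = simulate_speed_response(mf_params)   # hypothetical plant + FLC sim
    return float(np.mean(np.abs(reference_speed - actual)))

# Dummy stand-in simulator, purely to make the sketch executable.
ref = np.ones(100)
dummy_sim = lambda p: ref + np.random.default_rng(0).normal(0.0, p[0], 100)
print(mae_objective([0.05], ref, dummy_sim))
```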

  15. A structure-based approach to evaluating product adaptability in adaptable design

    International Nuclear Information System (INIS)

    Cheng, Qiang; Liu, Zhifeng; Cai, Ligang; Zhang, Guojun; Gu, Peihua

    2011-01-01

    Adaptable design, as a new design paradigm, involves creating designs and products that can be easily changed to satisfy different requirements. In this paper, two types of product adaptability are proposed, essential adaptability and behavioral adaptability, and a model for product adaptability evaluation is developed by measuring each of them. The essential adaptability evaluation proceeds by first analyzing the independencies of function requirements and function modules based on axiomatic design, and then measuring the adaptability of interfaces with three indices. The behavioral adaptability, reflected by the performance of adaptable requirements after adaptation, is measured on the basis of the Kano model. Finally, the effectiveness of the proposed method is demonstrated by an illustrative example of the motherboard of a personal computer. The results show that the method can evaluate and reveal the adaptability of a product in essence, and is of directive significance for improving design and innovative design.

  16. Traffic light control by multiagent reinforcement learning systems

    NARCIS (Netherlands)

    Bakker, B.; Whiteson, S.; Kester, L.; Groen, F.C.A.; Babuška, R.; Groen, F.C.A.

    2010-01-01

    Traffic light control is one of the main means of controlling road traffic. Improving traffic control is important because it can lead to higher traffic throughput and reduced traffic congestion. This chapter describes multiagent reinforcement learning techniques for automatic optimization of

  18. Applying reinforcement learning to the weapon assignment problem in air defence

    CSIR Research Space (South Africa)

    Mouton, H

    2011-12-01

    Full Text Available The techniques investigated in this article were two methods from the machine-learning subfield of reinforcement learning (RL), namely a Monte Carlo (MC) control algorithm with exploring starts (MCES), and an off-policy temporal-difference (TD) learning...
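
    For readers unfamiliar with MCES, the sketch below shows the textbook Monte Carlo control algorithm with exploring starts on a toy one-step assignment problem; the threat types, weapons and kill probabilities are invented for illustration, and the paper's actual air-defence formulation is considerably richer.

```python
import random
from collections import defaultdict

# Toy problem (invented numbers): P(neutralize) for each (threat, weapon).
KILL_PROB = {("fast", 0): 0.2, ("fast", 1): 0.7,
             ("slow", 0): 0.8, ("slow", 1): 0.4}
THREATS, WEAPONS = ("fast", "slow"), (0, 1)

def mc_es(episodes=20000, seed=0):
    """Monte Carlo control with exploring starts on one-step episodes."""
    rng = random.Random(seed)
    Q, N = defaultdict(float), defaultdict(int)
    policy = {}
    for _ in range(episodes):
        # Exploring start: every state-action pair keeps being sampled.
        s, a = rng.choice(THREATS), rng.choice(WEAPONS)
        reward = 1.0 if rng.random() < KILL_PROB[(s, a)] else 0.0
        N[(s, a)] += 1
        Q[(s, a)] += (reward - Q[(s, a)]) / N[(s, a)]      # incremental mean
        policy[s] = max(WEAPONS, key=lambda w: Q[(s, w)])  # greedy improvement
    return policy, Q

policy, _ = mc_es()
print(policy)   # expected: {'fast': 1, 'slow': 0}
```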

  19. Systems control with generalized probabilistic fuzzy-reinforcement learning

    NARCIS (Netherlands)

    Hinojosa, J.; Nefti, S.; Kaymak, U.

    2011-01-01

    Reinforcement learning (RL) is a valuable learning method when the systems require a selection of control actions whose consequences emerge over long periods for which input-output data are not available. In most combinations of fuzzy systems and RL, the environment is considered to be

  20. A reinforcement learning model of joy, distress, hope and fear

    Science.gov (United States)

    Broekens, Joost; Jacobs, Elmer; Jonker, Catholijn M.

    2015-07-01

    In this paper we computationally study the relation between adaptive behaviour and emotion. Using the reinforcement learning framework, we propose that learned state utility, V, models fear (negative) and hope (positive) based on the fact that both signals are about anticipation of loss or gain. Further, we propose that joy/distress is a signal similar to the error signal. We present agent-based simulation experiments that show that this model replicates psychological and behavioural dynamics of emotion. This work distinguishes itself by assessing the dynamics of emotion in an adaptive agent framework - coupling it to the literature on habituation, development, extinction and hope theory. Our results support the idea that the function of emotion is to provide a complex feedback signal for an organism to adapt its behaviour. Our work is relevant for understanding the relation between emotion and adaptation in animals, as well as for human-robot interaction, in particular how emotional signals can be used to communicate between adaptive agents and humans.
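
    Read in standard reinforcement-learning notation, the proposed mapping is compact: hope/fear track the learned state value, and joy/distress track the update (error) signal. Below is a minimal TD(0) sketch of that reading, with a toy episode invented for illustration; it is an interpretation of the abstract, not the authors' simulation code.

```python
def td_emotions(trajectory, V, alpha=0.1, gamma=0.95):
    """Tag each step of an episode with the proposed emotion readout.

    joy/distress ~ the TD error; hope/fear ~ the learned state value V(s).
    trajectory is a list of (state, reward, next_state) tuples; V is a
    dict of state values updated in place by TD(0).
    """
    log = []
    for s, r, s_next in trajectory:
        delta = r + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)  # TD error
        V[s] = V.get(s, 0.0) + alpha * delta
        log.append({"state": s,
                    "joy_distress": delta,   # positive: joy, negative: distress
                    "hope_fear": V[s]})      # positive: hope, negative: fear
    return log

V = {}
episode = [("start", 0.0, "mid"), ("mid", 1.0, "goal")]   # toy episode
for tag in td_emotions(episode, V):
    print(tag)
```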

  1. Inhibitory control as a factor of adaptive functioning of children with mild intellectual disability

    Directory of Open Access Journals (Sweden)

    Gligorović Milica

    2012-01-01

    Full Text Available Bearing in mind that adaptive behaviour is one of the defining parameters of intellectual disability, determining the influence of inhibitory control on adaptive functioning in children with mild intellectual disability was defined as the basic aim of this research. The sample comprised 95 children with mild intellectual disability (MID), of both genders, from 10 to 14 years of age. Data on the age and intellectual abilities of participants were collected by analysing the records of the schools' pedagogical-psychological departments. Inhibitory control was estimated with a Go/No-Go task consisting of Conflict Response and Response Delay sets. Adaptive skills data were obtained from a standardized interview with special education teachers, applying the AAMR Scale of adaptive functioning. On the basis of factor analysis, Scale scores were grouped into five factors: Personal Independence, Social Independence, Personal and Social Responsibility, Social Adaptability and Personal Adaptability. The significance of relations among the observed variables was established by Pearson's correlation coefficient, the partial correlation coefficient and multifactorial variance analysis. Analysis of the results established a statistically significant relationship between errors in the execution of tasks belonging to the set of conflict motor responses and adaptive functioning (p≤0.000). The relationship between errors belonging to the response delay set and adaptive functioning was not statistically significant (p=0.324). Inhibition of the interference response is a significant factor in practical (partial η2=0.227), conceptual (partial η2=0.341) and social (partial η2=0.131) adaptive skills, while response delay is significantly associated with conceptual skills only (p=0.029). Inhibitory control did not prove to be a significant factor in behaviour problems of the externalizing and internalizing type.

  2. Dynamic Learning from Adaptive Neural Control of Uncertain Robots with Guaranteed Full-State Tracking Precision

    Directory of Open Access Journals (Sweden)

    Min Wang

    2017-01-01

    Full Text Available A dynamic learning method is developed for an uncertain n-link robot with unknown system dynamics, achieving predefined performance attributes on the link angular position and velocity tracking errors. For a known nonsingular initial robotic condition, performance functions and unconstrained transformation errors are employed to prevent the violation of the full-state tracking error constraints. By combining two independent Lyapunov functions and a radial basis function (RBF) neural network (NN) approximator, a novel and simple adaptive neural control scheme is proposed for the dynamics of the unconstrained transformation errors, which guarantees uniform ultimate boundedness of all the signals in the closed-loop system. In the steady-state control process, the RBF NNs are verified to satisfy the partial persistent excitation (PE) condition. Subsequently, an appropriate state transformation is adopted to achieve the accurate convergence of neural weight estimates. The corresponding experienced knowledge on unknown robotic dynamics is stored in NNs with constant neural weight values. Using the stored knowledge, a static neural learning controller is developed to improve the full-state tracking performance. A comparative simulation study on a 2-link robot illustrates the effectiveness of the proposed scheme.
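
    The scheme above hinges on RBF network approximation of the unknown dynamics. As a minimal point of reference, the sketch below implements a plain Gaussian-RBF regressor; the centers, width and gradient update are illustrative assumptions, whereas the paper's adaptive laws are derived from Lyapunov analysis rather than gradient descent.

```python
import numpy as np

class RBFApproximator:
    """f_hat(x) = W^T phi(x) with fixed Gaussian basis functions."""

    def __init__(self, centers, width=0.5):
        self.centers = np.asarray(centers)    # shape (n_basis, n_inputs)
        self.width = width
        self.W = np.zeros(len(self.centers))  # adjustable weights

    def phi(self, x):
        d2 = np.sum((self.centers - x) ** 2, axis=1)
        return np.exp(-d2 / (2.0 * self.width ** 2))

    def predict(self, x):
        return self.W @ self.phi(x)

    def update(self, x, target, lr=0.1):
        # Plain gradient step; stands in for the Lyapunov-derived law.
        p = self.phi(x)
        self.W += lr * (target - self.W @ p) * p

# Fit a one-dimensional toy nonlinearity.
net = RBFApproximator(centers=np.linspace(-2, 2, 15)[:, None])
for _ in range(200):
    for x in np.linspace(-2, 2, 41):
        net.update(np.array([x]), np.sin(2 * x))
print(net.predict(np.array([1.0])), np.sin(2.0))
```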

  3. Hidden policy ciphertext-policy attribute-based encryption with keyword search against keyword guessing attack

    Institute of Scientific and Technical Information of China (English)

    Shuo QIU; Jiqiang LIU; Yanfeng SHI; Rui ZHANG

    2017-01-01

    Attribute-based encryption with keyword search (ABKS) enables data owners to grant their search capabilities to other users by enforcing an access control policy over the outsourced encrypted data. However, existing ABKS schemes cannot guarantee the privacy of the access structures, which may contain some sensitive private information. Furthermore, resulting from the exposure of the access structures, ABKS schemes are susceptible to an off-line keyword guessing attack if the keyword space has a polynomial size. To solve these problems, we propose a novel primitive named hidden policy ciphertext-policy attribute-based encryption with keyword search (HP-CPABKS). With our primitive, the data user is unable to search on encrypted data and learn any information about the access structure if his/her attribute credentials cannot satisfy the access control policy specified by the data owner. We present a rigorous selective security analysis of the proposed HP-CPABKS scheme, which simultaneously keeps the indistinguishability of the keywords and the access structures. Finally, the performance evaluation verifies that our proposed scheme is efficient and practical.

  4. Oxytocin attenuates trust as a subset of more general reinforcement learning, with altered reward circuit functional connectivity in males.

    Science.gov (United States)

    Ide, Jaime S; Nedic, Sanja; Wong, Kin F; Strey, Shmuel L; Lawson, Elizabeth A; Dickerson, Bradford C; Wald, Lawrence L; La Camera, Giancarlo; Mujica-Parodi, Lilianne R

    2018-07-01

    Oxytocin (OT) is an endogenous neuropeptide that, while originally thought to promote trust, has more recently been found to be context-dependent. Here we extend experimental paradigms previously restricted to de novo decision-to-trust, to a more realistic environment in which social relationships evolve in response to iterative feedback over twenty interactions. In a randomized, double blind, placebo-controlled within-subject/crossover experiment of human adult males, we investigated the effects of a single dose of intranasal OT (40 IU) on Bayesian expectation updating and reinforcement learning within a social context, with associated brain circuit dynamics. Subjects participated in a neuroeconomic task (Iterative Trust Game) designed to probe iterative social learning while their brains were scanned using ultra-high field (7T) fMRI. We modeled each subject's behavior using Bayesian updating of belief-states ("willingness to trust") as well as canonical measures of reinforcement learning (learning rate, inverse temperature). Behavioral trajectories were then used as regressors within fMRI activation and connectivity analyses to identify corresponding brain network functionality affected by OT. Behaviorally, OT reduced feedback learning, without bias with respect to positive versus negative reward. Neurobiologically, reduced learning under OT was associated with muted communication between three key nodes within the reward circuit: the orbitofrontal cortex, amygdala, and lateral (limbic) habenula. Our data suggest that OT, rather than inspiring feelings of generosity, instead attenuates the brain's encoding of prediction error and therefore its ability to modulate pre-existing beliefs. This effect may underlie OT's putative role in promoting what has typically been reported as 'unjustified trust' in the face of information that suggests likely betrayal, while also resolving apparent contradictions with regard to OT's context-dependent behavioral effects.
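
    The canonical reinforcement-learning measures mentioned (learning rate, inverse temperature) belong to the standard delta-rule model with a softmax choice rule. Below is a minimal sketch of that model on an invented trust-game trial structure, purely to make the two parameters concrete; nothing here reproduces the study's actual task or fitting procedure.

```python
import math
import random

def simulate_trust_game(alpha=0.3, beta=2.0, n_trials=20, seed=1):
    """Delta-rule value update with a softmax choice rule.

    alpha is the learning rate; beta the inverse temperature. The payoff
    structure is invented: a partner reciprocates investment 60% of the time.
    Lowering alpha mimics the reduced feedback learning reported under OT.
    """
    rng = random.Random(seed)
    Q = {"invest": 0.0, "keep": 0.0}
    for _ in range(n_trials):
        p_invest = 1.0 / (1.0 + math.exp(-beta * (Q["invest"] - Q["keep"])))
        action = "invest" if rng.random() < p_invest else "keep"
        reward = (1.0 if rng.random() < 0.6 else -1.0) if action == "invest" else 0.2
        delta = reward - Q[action]        # prediction error
        Q[action] += alpha * delta        # learning-rate-scaled update
    return Q

print(simulate_trust_game())
```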

  5. Approximation Of Multi-Valued Inverse Functions Using Clustering And Sugeno Fuzzy Inference

    Science.gov (United States)

    Walden, Maria A.; Bikdash, Marwan; Homaifar, Abdollah

    1998-01-01

    Finding the inverse of a continuous function can be challenging and computationally expensive when the inverse function is multi-valued. Difficulties may be compounded when the function itself is difficult to evaluate. We show that we can use fuzzy-logic approximators such as Sugeno inference systems to compute the inverse on-line. To do so, a fuzzy clustering algorithm can be used in conjunction with a discriminating function to split the function data into branches for the different values of the forward function. These data sets are then fed into a recursive least-squares learning algorithm that finds the proper coefficients of the Sugeno approximators; each Sugeno approximator finds one value of the inverse function. Discussions about the accuracy of the approximation will be included.
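
    The pipeline described (split the forward-function data into branches, then fit one approximator per branch) can be illustrated without the Sugeno machinery. In the sketch below the inverse of y = x² is recovered: the sign of x serves as the discriminating function that separates the two branches, and a per-branch least-squares fit stands in for the recursive-least-squares Sugeno approximators of the paper.

```python
import numpy as np

# Forward-function data: y = x**2 has a two-valued inverse.
x = np.linspace(-2.0, 2.0, 401)
y = x ** 2

# Discriminating function: the sign of x labels the two inverse branches.
branches = {"negative": x < 0, "positive": x >= 0}

# One least-squares approximator per branch (a linear fit in sqrt-space here;
# the paper trains a Sugeno fuzzy system per branch by recursive least squares).
models = {name: np.polyfit(np.sqrt(y[mask]), x[mask], deg=1)
          for name, mask in branches.items()}

for name, coeffs in models.items():
    inverse_value = np.polyval(coeffs, np.sqrt(2.25))  # query the inverse at y = 2.25
    print(name, round(float(inverse_value), 3))        # ~ -1.5 and +1.5
```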

  6. Reinforcement learning for a biped robot based on a CPG-actor-critic method.

    Science.gov (United States)

    Nakamura, Yutaka; Mori, Takeshi; Sato, Masa-aki; Ishii, Shin

    2007-08-01

    Animals' rhythmic movements, such as locomotion, are considered to be controlled by neural circuits called central pattern generators (CPGs), which generate oscillatory signals. Motivated by this biological mechanism, studies have been conducted on the rhythmic movements controlled by CPG. As an autonomous learning framework for a CPG controller, we propose in this article a reinforcement learning method we call the "CPG-actor-critic" method. This method introduces a new architecture to the actor, and its training is roughly based on a stochastic policy gradient algorithm presented recently. We apply this method to an automatic acquisition problem of control for a biped robot. Computer simulations show that training of the CPG can be successfully performed by our method, thus allowing the biped robot to not only walk stably but also adapt to environmental changes.

  7. Evolutionary Policy Transfer and Search Methods for Boosting Behavior Quality: RoboCup Keep-Away Case Study

    Directory of Open Access Journals (Sweden)

    Geoff Nitschke

    2017-11-01

    Full Text Available This study evaluates various evolutionary search methods to direct neural controller evolution in company with policy (behavior) transfer across increasingly complex collective robotic (RoboCup keep-away) tasks. Robot behaviors are first evolved in a source task and then transferred for further evolution to more complex target tasks. Evolutionary search methods tested include objective-based search (fitness function), behavioral and genotypic diversity maintenance, and hybrids of such diversity maintenance and objective-based search. Evolved behavior quality is evaluated according to effectiveness and efficiency. Effectiveness is the average task performance of transferred and evolved behaviors, where task performance is the average time the ball is controlled by the keeper team. Efficiency is the average number of generations taken for the fittest evolved behaviors to reach a minimum task performance threshold given policy transfer. Results indicate that policy transfer coupled with hybridized evolution (behavioral diversity maintenance and objective-based search) addresses the bootstrapping problem for increasingly complex keep-away tasks. That is, this hybrid method (coupled with policy transfer) evolves behaviors that could not otherwise be evolved. Also, this hybrid evolutionary search consistently evolved topologically simple neural controllers that elicited high-quality behaviors.

  8. The role of function analysis in the ACR control centre design

    International Nuclear Information System (INIS)

    Leger, R.P.; Davey, E.C.

    2006-01-01

    An essential aspect of control centre design is the need to characterize plant functions and their inter-relationships in support of operational goals, as well as the roles of humans and automation in sharing and exchanging the execution of functions across all operational phases. Function analysis is a design activity that has been internationally accepted as an approach to satisfy this need. It is recognized as a fundamental and necessary component in the systematic approach to control centre design and is carried out early in the design process. A function analysis can provide a clear basis for: the control centre design, for the purposes of design team communication and customer or regulatory review; the control centre display and control systems; the staffing and layout requirements of the control centre; assessing the completeness of control centre displays and controls, prior and supplementary to mock-up walkthroughs or simulator evaluations; and the design of operating procedures and training programs. This paper explores the role of function analysis in supporting the design of the control centre. The development of the ACR control room is used as an illustrative context for the discussion. The paper also discusses the merits of using function analysis in a goal- or function-based approach, resulting in a more robust, operationally compatible, and cost-effective design over the life of the plant. Two former papers outlined the evolution of AECL's application approach and lessons learned in applying function analysis in support of control room design. This paper provides the most recent update to this progression in application refinement. (author)

  9. Bio-inspired adaptive feedback error learning architecture for motor control.

    Science.gov (United States)

    Tolu, Silvia; Vanegas, Mauricio; Luque, Niceto R; Garrido, Jesús A; Ros, Eduardo

    2012-10-01

    This study proposes an adaptive control architecture based on an accurate regression method called Locally Weighted Projection Regression (LWPR) and on a bio-inspired module, namely a cerebellar-like engine. This hybrid architecture takes full advantage of the machine learning module (the LWPR kernel) to abstract an optimized representation of the sensorimotor space, while the cerebellar component integrates this to generate corrective terms in the framework of a control task. Furthermore, we illustrate how the use of a simple adaptive error feedback term allows the proposed architecture to be used even in the absence of an accurate analytic reference model. The presented approach achieves accurate control with low-gain corrective terms (for compliant control schemes). We evaluate the contribution of the different components of the proposed scheme by comparing the obtained performance with alternative approaches. Then, we show that the presented architecture can be used for accurate manipulation of different objects when their physical properties are not directly known by the controller. We evaluate how the scheme scales for simulated plants with a high number of degrees of freedom (7 DOFs).

  10. Symmetry-Adapted Ro-vibrational Basis Functions for Variational Nuclear Motion Calculations: TROVE Approach.

    Science.gov (United States)

    Yurchenko, Sergei N; Yachmenev, Andrey; Ovsyannikov, Roman I

    2017-09-12

    We present a general, numerically motivated approach to the construction of symmetry-adapted basis functions for solving ro-vibrational Schrödinger equations. The approach is based on the property of the Hamiltonian operator to commute with the complete set of symmetry operators and, hence, to reflect the symmetry of the system. The symmetry-adapted ro-vibrational basis set is constructed numerically by solving a set of reduced vibrational eigenvalue problems. In order to assign the irreducible representations associated with these eigenfunctions, their symmetry properties are probed on a grid of molecular geometries with the corresponding symmetry operations. The transformation matrices are reconstructed by solving overdetermined systems of linear equations related to the transformation properties of the corresponding wave functions on the grid. Our method is implemented in the variational approach TROVE and has been successfully applied to many problems covering the most important molecular symmetry groups. Several examples are used to illustrate the procedure, which can be easily applied to different types of coordinates, basis sets, and molecular systems.

  11. Identifying ecological "sweet spots" underlying cyanobacteria functional group dynamics from long-term observations using a statistical machine learning approach

    Science.gov (United States)

    Nelson, N.; Munoz-Carpena, R.; Phlips, E. J.

    2017-12-01

    Diversity in the eco-physiological adaptations of cyanobacteria genera creates challenges for water managers who are tasked with developing appropriate actions for controlling not only the intensity and frequency of cyanobacteria blooms, but also reducing the potential for blooms of harmful taxa (e.g., toxin producers, N2 fixers). Compounding these challenges, the efficacy of nutrient management strategies (phosphorus-only versus nitrogen-and-phosphorus) for cyanobacteria bloom abatement is the subject of an ongoing debate, which increases uncertainty associated with bloom mitigation decision-making. In this work, we analyze a unique long-term (17-year) dataset composed of monthly observations of cyanobacteria genera abundances, zooplankton abundances, water quality, and flow from Lake George, a bloom-impacted flow-through lake of the St. Johns River (FL, USA). Using the Random Forests machine learning algorithm, an assumption-free ensemble modeling approach, the dataset was evaluated to quantify and characterize relationships between environmental conditions and seven cyanobacteria groupings: five genera (Anabaena, Cylindrospermopsis, Lyngbya, Microcystis, and Oscillatoria) and two functional groups (N2 fixers and non-fixers). Results highlight the selectivity of nitrogen in describing genera and functional group dynamics, and potential for physical effects to limit the efficacy of nutrient management as a mechanism for cyanobacteria bloom mitigation.
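
    As a sketch of the modeling step, the snippet below sets up a Random Forests regression with scikit-learn and reads off variable importances; the predictor columns and the response are invented placeholders for the water-quality, zooplankton and flow covariates of the Lake George dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Invented stand-in for the 17-year monthly monitoring table (204 months).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "total_nitrogen": rng.gamma(2.0, 0.5, 204),
    "total_phosphorus": rng.gamma(2.0, 0.05, 204),
    "temperature": rng.normal(25.0, 4.0, 204),
    "flow": rng.gamma(3.0, 10.0, 204),
})
# Fake response: N2-fixer abundance favored by low nitrogen and warm water.
df["n2_fixers"] = (df.temperature / (1.0 + df.total_nitrogen)
                   + rng.normal(0.0, 1.0, 204))

X, y = df.drop(columns="n2_fixers"), df["n2_fixers"]
rf = RandomForestRegressor(n_estimators=500, oob_score=True, random_state=0)
rf.fit(X, y)
# Variable importances characterize which conditions drive the group.
print(dict(zip(X.columns, rf.feature_importances_.round(3))))
print("OOB R^2:", round(rf.oob_score_, 3))
```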

  12. Disrupted avoidance learning in functional neurological disorder: Implications for harm avoidance theories

    Directory of Open Access Journals (Sweden)

    Laurel S. Morris

    Full Text Available Background: Functional neurological disorder (FND) is an elusive disorder characterized by unexplained neurological symptoms alongside aberrant cognitive processing and negative affect, often associated with amygdala reactivity. Methods: We examined the effect of negative conditioning on cognitive function and amygdala reactivity in 25 FND patients and 20 healthy volunteers (HV). Participants were first conditioned to stimuli paired with negative affective or neutral (CS+/CS−) information. During functional MRI, subjects then performed an instrumental associative learning task to avoid monetary losses in the context of the previously conditioned stimuli. We expected that FND patients would be better at learning to avoid losses when faced with negatively conditioned stimuli (increased harm avoidance). Multi-echo resting state fMRI was also collected from the same subjects and a robust denoising method was employed, important for removing motion and physiological artifacts. Results: FND subjects were more sensitive to the negative CS+ compared to HV, as demonstrated by a reinforcement learning model. Contrary to expectation, FND patients were generally more impaired at learning to avoid losses under both contexts (CS+/CS−), persisting in choosing the option that resulted in a negative outcome, as demonstrated by both behavioural and computational analyses. FND patients showed enhanced amygdala but reduced dorsolateral prefrontal cortex responses when they received negative feedback. Patients also had increased resting state functional connectivity between these two regions. Conclusions: FND patients had impaired instrumental avoidance learning, findings that parallel previous observations of impaired action-outcome binding. FND patients further show enhanced behavioural and neural sensitivity to negative information. However, this did not translate to improved avoidance learning. Put together, our findings do not support the theory of harm avoidance in FND.

  13. Reinforcement Learning in Distributed Domains: Beyond Team Games

    Science.gov (United States)

    Wolpert, David H.; Sill, Joseph; Tumer, Kagan

    2000-01-01

    Distributed search algorithms are crucial in dealing with large optimization problems, particularly when a centralized approach is not only impractical but infeasible. Many machine learning concepts have been applied to search algorithms in order to improve their effectiveness. In this article we present an algorithm that blends Reinforcement Learning (RL) and hill climbing directly, by using the RL signal to guide the exploration step of a hill climbing algorithm. We apply this algorithm to the domain of a constellation of communication satellites, where the goal is to minimize the loss of importance-weighted data. We introduce the concept of 'ghost' traffic, where correctly setting this traffic induces the satellites to act so as to optimize the world utility. Our results indicate that the bi-utility search introduced in this paper outperforms both traditional hill climbing algorithms and distributed RL approaches such as team games.

  14. Lessons from a Pluralist Approach to a Wicked Policy Issue

    Directory of Open Access Journals (Sweden)

    Jake Chapman

    2010-03-01

    Full Text Available The most difficult policy issues are those where there are profound disagreements about what is wrong, what should be done, and how things work. This paper describes a pluralist approach, based on the soft systems methodology, to youth nuisance on deprived estates in Manchester, UK, where there were profound disagreements between the agencies involved. When there are disagreements about the nature of the problem, its causes, or how the system of interest actually functions, a pluralist approach is required, and this is provided by Checkland’s soft systems approach. When the disagreements involve conflicts of value, it is necessary to adopt an adaptive approach that fosters change in the values, beliefs or behaviour of those involved. Across the spectrum of public sector agencies involved, five different agency perspectives were identified, and their descriptions indicate the need for the pluralist approach taken. The project was an experiment in using systemic approaches in public policy, and the paper describes the learning associated with impacting outcomes. Processes used in the project included a “soft systems” workshop, which is described along with some of its effects on both the project participants and the overall outcomes. The overall aim is to share the experience of this project so that it may inform those working with systemic approaches and other pluralist methods on wicked problems in the public sector.

  15. Quantization-Based Adaptive Actor-Critic Tracking Control With Tracking Error Constraints.

    Science.gov (United States)

    Fan, Quan-Yong; Yang, Guang-Hong; Ye, Dan

    2018-04-01

    In this paper, the problem of adaptive actor-critic (AC) tracking control is investigated for a class of continuous-time nonlinear systems with unknown nonlinearities and quantized inputs. Different from the existing results based on reinforcement learning, the tracking error constraints are considered and new critic functions are constructed to improve the performance further. To ensure that the tracking errors keep within the predefined time-varying boundaries, a tracking error transformation technique is used to constitute an augmented error system. Specific critic functions, rather than the long-term cost function, are introduced to supervise the tracking performance and tune the weights of the AC neural networks (NNs). A novel adaptive controller with a special structure is designed to reduce the effect of the NN reconstruction errors, input quantization, and disturbances. Based on the Lyapunov stability theory, the boundedness of the closed-loop signals and the desired tracking performance can be guaranteed. Finally, simulations on two connected inverted pendulums are given to illustrate the effectiveness of the proposed method.

  16. Cardiac Concomitants of Feedback and Prediction Error Processing in Reinforcement Learning

    Science.gov (United States)

    Kastner, Lucas; Kube, Jana; Villringer, Arno; Neumann, Jane

    2017-01-01

    Successful learning hinges on the evaluation of positive and negative feedback. We assessed differential learning from reward and punishment in a monetary reinforcement learning paradigm, together with cardiac concomitants of positive and negative feedback processing. On the behavioral level, learning from reward resulted in more advantageous behavior than learning from punishment, suggesting a differential impact of reward and punishment on successful feedback-based learning. On the autonomic level, learning and feedback processing were closely mirrored by phasic cardiac responses on a trial-by-trial basis: (1) Negative feedback was accompanied by faster and prolonged heart rate deceleration compared to positive feedback. (2) Cardiac responses shifted from feedback presentation at the beginning of learning to stimulus presentation later on. (3) Most importantly, the strength of phasic cardiac responses to the presentation of feedback correlated with the strength of prediction error signals that alert the learner to the necessity for behavioral adaptation. Considering participants' weight status and gender revealed obesity-related deficits in learning to avoid negative consequences and less consistent behavioral adaptation in women compared to men. In sum, our results provide strong new evidence for the notion that during learning phasic cardiac responses reflect an internal value and feedback monitoring system that is sensitive to the violation of performance-based expectations. Moreover, inter-individual differences in weight status and gender may affect both behavioral and autonomic responses in reinforcement-based learning. PMID:29163004

  18. An artificial functional family filter in homolog searching in next-generation sequencing metagenomics.

    Directory of Open Access Journals (Sweden)

    Ruofei Du

    Full Text Available In functional metagenomics, BLAST homology search is a common method to classify metagenomic reads into protein/domain sequence families such as Clusters of Orthologous Groups of proteins (COGs), in order to quantify the abundance of each COG in the community. The resulting functional profile of the community is then used in downstream analysis to correlate changes in abundance to environmental perturbation, clinical variation, and so on. However, the short read length associated with next-generation sequencing technologies poses a barrier to this approach, essentially because similarity significance cannot be discerned by searching with short reads. Consequently, artificial functional families are produced; those with a large number of assigned reads dramatically decrease the accuracy of the functional profile. No method was previously available to address this problem; we fill this gap in this paper. We reveal that BLAST similarity scores of homologues for short reads derived from the coding sequences of COG protein members are distributed differently from the scores of reads derived elsewhere. We show that, by choosing an appropriate score cutoff, we are able to filter out most artificial families while preserving sufficient information to build the functional profile. We also show that, by the combined application of BLAST and RPS-BLAST, some artificial families with large read counts can be further identified after the score cutoff filtration. Evaluated on three experimental metagenomic datasets with different coverages, we found that the proposed method is robust against read coverage and consistently outperforms the other E-value cutoff methods currently used in the literature.
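
    Operationally, the proposed filter amounts to dropping read-to-family assignments whose similarity scores fall below a chosen cutoff before counting COG abundances. Below is a minimal sketch over BLAST tabular (-outfmt 6) output; the cutoff value and the subject-to-COG mapping are placeholders for the ones derived in the paper.

```python
import csv
from collections import Counter

def cog_profile(blast_tsv, subject_to_cog, score_cutoff=60.0):
    """Build a COG abundance profile, filtering weak short-read hits.

    blast_tsv: BLAST tabular (-outfmt 6) file; the bit score is column 12.
    subject_to_cog: dict mapping subject sequence ids to COG families
    (a placeholder for the paper's COG membership mapping).
    score_cutoff: illustrative value; the paper derives it from the score
    distributions of reads drawn from COG coding sequences versus elsewhere.
    """
    best_hit = {}
    with open(blast_tsv) as fh:
        for row in csv.reader(fh, delimiter="\t"):
            query, subject, bitscore = row[0], row[1], float(row[11])
            if bitscore < score_cutoff:
                continue                         # drop likely artificial hits
            if bitscore > best_hit.get(query, ("", -1.0))[1]:
                best_hit[query] = (subject, bitscore)
    return Counter(subject_to_cog[s] for s, _ in best_hit.values()
                   if s in subject_to_cog)
```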

  19. Reinforcement learning techniques for controlling resources in power networks

    Science.gov (United States)

    Kowli, Anupama Sunil

    As power grids transition towards increased reliance on renewable generation, energy storage and demand response resources, an effective control architecture is required to harness the full functionalities of these resources. There is a critical need for control techniques that recognize the unique characteristics of the different resources and exploit the flexibility afforded by them to provide ancillary services to the grid. The work presented in this dissertation addresses these needs. Specifically, new algorithms are proposed, which allow control synthesis in settings wherein the precise distribution of the uncertainty and its temporal statistics are not known. These algorithms are based on recent developments in Markov decision theory, approximate dynamic programming and reinforcement learning. They impose minimal assumptions on the system model and allow the control to be "learned" based on the actual dynamics of the system. Furthermore, they can accommodate complex constraints such as capacity and ramping limits on generation resources, state-of-charge constraints on storage resources, comfort-related limitations on demand response resources and power flow limits on transmission lines. Numerical studies demonstrating applications of these algorithms to practical control problems in power systems are discussed. Results demonstrate how the proposed control algorithms can be used to improve the performance and reduce the computational complexity of the economic dispatch mechanism in a power network. We argue that the proposed algorithms are eminently suitable to develop operational decision-making tools for large power grids with many resources and many sources of uncertainty.

  20. The learning evaluations of the concept function in the mathematical subject I

    Directory of Open Access Journals (Sweden)

    Wilmer Valle Castañeda

    2018-03-01

    Full Text Available Evaluation must be one of the most complex tasks that teachers face today, both because of the process itself and because they must issue an assessment of the achievements and deficiencies of their students. To this end, techniques and instruments were developed that allow the evaluation of the function concept in the Mathematics I subject. Theoretical- and empirical-level methods, such as historical-logical analysis, surveys and documentary analysis, as well as procedures such as analysis-synthesis, were used in the research to investigate the theoretical and practical foundations of learning evaluation. The evaluation instruments presented allowed the students in Mathematics I to be evaluated without neglecting one of the most important functions of evaluation: the formative or educational function. They constitute a reference for the continuous improvement of student learning.

  1. Selectionist and evolutionary approaches to brain function: a critical appraisal

    Directory of Open Access Journals (Sweden)

    Chrisantha Thomas Fernando

    2012-04-01

    Full Text Available We consider approaches to brain dynamics and function that have been claimed to be Darwinian. These include Edelman’s theory of neuronal group selection, Changeux’s theory of synaptic selection and selective stabilization of pre-representations, Seung’s Darwinian synapse, Loewenstein’s synaptic melioration, Adam’s selfish synapse and Calvin’s replicating activity patterns. Except for the last two, the proposed mechanisms are selectionist but not truly Darwinian, because no replicators with information transfer to copies and hereditary variation can be identified in them. All of them fit, however, a generalized selectionist framework conforming to the picture of Price’s covariance formulation, which deliberately was not specific even to selection in biology, and therefore does not imply an algorithmic picture of biological evolution. Bayesian models and reinforcement learning are formally in agreement with selection dynamics. A classification of search algorithms is shown to include Darwinian replicators (evolutionary units with multiplication, heredity and variability) as the most powerful mechanism in a sparsely occupied search space. Examples of why parallel competitive search with information transfer among the units is efficient are given. Finally, we review our recent attempts to construct and analyze simple models of true Darwinian evolutionary units in the brain in terms of connectivity and activity copying of neuronal groups. Although none of the proposed neuronal replicators include miraculous mechanisms, their identification remains a challenge but also a great promise.

  2. Design issues of a reinforcement-based self-learning fuzzy controller for petrochemical process control

    Science.gov (United States)

    Yen, John; Wang, Haojin; Daugherity, Walter C.

    1992-01-01

    Fuzzy logic controllers have some often-cited advantages over conventional techniques such as PID control, including easier implementation, accommodation to natural language, and the ability to cover a wider range of operating conditions. One major obstacle that hinders the broader application of fuzzy logic controllers is the lack of a systematic way to develop and modify their rules; as a result the creation and modification of fuzzy rules often depends on trial and error or pure experimentation. One of the proposed approaches to address this issue is a self-learning fuzzy logic controller (SFLC) that uses reinforcement learning techniques to learn the desirability of states and to adjust the consequent part of its fuzzy control rules accordingly. Due to the different dynamics of the controlled processes, the performance of a self-learning fuzzy controller is highly contingent on its design. The design issue has not received sufficient attention. The issues related to the design of a SFLC for application to a petrochemical process are discussed, and its performance is compared with that of a PID and a self-tuning fuzzy logic controller.

  3. Seven-day human biological rhythms: An expedition in search of their origin, synchronization, functional advantage, adaptive value and clinical relevance.

    Science.gov (United States)

    Reinberg, Alain E; Dejardin, Laurence; Smolensky, Michael H; Touitou, Yvan

    2017-01-01

    This fact-finding expedition explores the perspectives and knowledge of the origin and functional relevance of the 7 d domain of the biological time structure, with special reference to human beings. These biological rhythms are displayed at various levels of organization in diverse species - from the unicellular sea algae of Acetabularia and Goniaulax to plants, insects, fish, birds and mammals, including man - under natural as well as artificial, i.e. constant, environmental conditions. Nonetheless, very little is known about their derivation, functional advantage, adaptive value, synchronization and potential clinical relevance. About 7 d cosmic cycles are seemingly too weak, and the 6 d work/1 d rest week commanded from G-d through the Laws of Moses to the Hebrews is too recent an event to be the origin in humans. Moreover, human and insect studies conducted under controlled constant conditions devoid of environmental, social and other time cues report the persistence of 7 d rhythms, but with a slightly different (free-running) period (τ), indicating their source is endogenous. Yet, a series of human and laboratory rodent studies reveal certain mainly non-cyclic exogenous events can trigger 7 d rhythm-like phenomena. However, it is unknown whether such triggers unmask, amplify and/or synchronize previous non-overtly expressed oscillations. Circadian (~24 h), circa-monthly (~30 d) and circannual (~1 y) rhythms are viewed as genetically based features of life forms that during evolution conferred significant functional advantage to individual organisms and survival value to species. No such advantages are apparent for endogenous 7 d rhythms, raising several questions: What is the significance of the 7 d activity/rest cycle, i.e. week, storied in the Book of Genesis and adopted by the Hebrews and thereafter the residents of nearby Mediterranean countries and ultimately the world? Why do humans require 1 d off per 7 d span? Do 7 d rhythms bestow functional

  4. The Dynamic Interplay among EFL Learners' Ambiguity Tolerance, Adaptability, Cultural Intelligence, Learning Approach, and Language Achievement

    Science.gov (United States)

    Alahdadi, Shadi; Ghanizadeh, Afsaneh

    2017-01-01

    A key objective of education is to prepare individuals to be fully-functioning learners. This entails developing the cognitive, metacognitive, motivational, cultural, and emotional competencies. The present study aimed to examine the interrelationships among adaptability, tolerance of ambiguity, cultural intelligence, learning approach, and…

  5. Efficient model learning methods for actor-critic control.

    Science.gov (United States)

    Grondman, Ivo; Vaandrager, Maarten; Buşoniu, Lucian; Babuska, Robert; Schuitema, Erik

    2012-06-01

    We propose two new actor-critic algorithms for reinforcement learning. Both algorithms use local linear regression (LLR) to learn approximations of the functions involved. A crucial feature of the algorithms is that they also learn a process model, and this, in combination with LLR, provides an efficient policy update for faster learning. The first algorithm uses a novel model-based update rule for the actor parameters. The second algorithm does not use an explicit actor but learns a reference model which represents a desired behavior, from which desired control actions can be calculated using the inverse of the learned process model. The two novel methods and a standard actor-critic algorithm are applied to the pendulum swing-up problem, in which the novel methods achieve faster learning than the standard algorithm.
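
    The distinguishing ingredient of both algorithms is the learned process model. The sketch below shows a minimal k-nearest-neighbour local linear regression predictor of next states from a memory of transitions; the neighbourhood size and the toy system are illustrative assumptions, and the paper couples such a model with actor-critic updates rather than using it stand-alone.

```python
import numpy as np

class LocalLinearModel:
    """Predict the next state from (state, action) with k-nearest-neighbour
    local linear regression over a memory of observed transitions."""

    def __init__(self, k=10):
        self.k = k
        self.inputs, self.targets = [], []   # (s, a) samples and s' targets

    def add(self, s, a, s_next):
        self.inputs.append(np.concatenate([s, a]))
        self.targets.append(np.asarray(s_next))

    def predict(self, s, a):
        X = np.asarray(self.inputs)
        q = np.concatenate([s, a])
        idx = np.argsort(np.sum((X - q) ** 2, axis=1))[: self.k]
        # Fit an affine model on the neighbours: s' ~ [x, 1] @ coeffs.
        Xl = np.hstack([X[idx], np.ones((len(idx), 1))])
        coeffs, *_ = np.linalg.lstsq(Xl, np.asarray(self.targets)[idx], rcond=None)
        return np.append(q, 1.0) @ coeffs

# Toy 1-D system s' = s + 0.1 * a, sampled at a fixed action.
m = LocalLinearModel()
for s in np.linspace(-1.0, 1.0, 50):
    m.add(np.array([s]), np.array([0.5]), np.array([s + 0.05]))
print(m.predict(np.array([0.2]), np.array([0.5])))   # ~ [0.25]
```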

  6. Multiple kernel learning using single stage function approximation for binary classification problems

    Science.gov (United States)

    Shiju, S.; Sumitra, S.

    2017-12-01

    In this paper, multiple kernel learning (MKL) is formulated as a supervised classification problem. We deal with binary classification data, and hence the data modelling problem involves the computation of two decision boundaries, one related to kernel learning and the other to the input data. In our approach, they are found with the aid of a single cost function, by constructing a global reproducing kernel Hilbert space (RKHS) as the direct sum of the RKHSs corresponding to the decision boundaries of kernel learning and input data, and searching that global RKHS for the function which can be represented as the direct sum of the decision boundaries under consideration. In our experimental analysis, the proposed model showed superior performance in comparison with the existing two-stage function approximation formulation of MKL, where the decision functions of kernel learning and input data are found separately using two different cost functions. This is because the single-stage representation enables knowledge transfer between the computation procedures for finding the decision boundaries of kernel learning and input data, which in turn boosts the generalisation capacity of the model.
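
    For contrast with the single-stage formulation proposed here, the conventional ingredients of MKL are easy to state in code: a combined kernel is a weighted sum of base Gram matrices, and a standard kernel method is trained on the combination. Below is a sketch with fixed (untrained) kernel weights and kernel ridge regression as a stand-in learner; the two-stage methods the paper compares against would additionally optimize the weights.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.sign(X[:, 0] * X[:, 1])     # toy binary labels in {-1, +1}

# Base kernels and their (here fixed) combination weights mu_m >= 0.
kernels = (lambda A, B: rbf_kernel(A, B, gamma=1.0),
           lambda A, B: polynomial_kernel(A, B, degree=2))
mu = np.array([0.7, 0.3])

def combined_kernel(A, B):
    return sum(m * k(A, B) for m, k in zip(mu, kernels))

clf = KernelRidge(kernel="precomputed", alpha=0.1)
clf.fit(combined_kernel(X, X), y)
preds = np.sign(clf.predict(combined_kernel(X, X)))
print("training accuracy:", (preds == y).mean())
```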

  7. Optimal Wonderful Life Utility Functions in Multi-Agent Systems

    Science.gov (United States)

    Wolpert, David H.; Tumer, Kagan; Swanson, Keith (Technical Monitor)

    2000-01-01

    The mathematics of Collective Intelligence (COINs) is concerned with the design of multi-agent systems so as to optimize an overall global utility function when those systems lack centralized communication and control. Typically in COINs each agent runs a distinct Reinforcement Learning (RL) algorithm, so that much of the design problem reduces to how best to initialize/update each agent's private utility function, as far as the ensuing value of the global utility is concerned. Traditional team game solutions to this problem assign to each agent the global utility as its private utility function. In previous work we used the COIN framework to derive the alternative Wonderful Life Utility (WLU), and experimentally established that having the agents use it induces global utility performance up to orders of magnitude superior to that induced by use of the team game utility. The WLU has a free parameter (the clamping parameter) which we simply set to zero in that previous work. Here we derive the optimal value of the clamping parameter, and demonstrate experimentally that using that optimal value can result in significantly improved performance over that of clamping to zero, over and above the improvement beyond traditional approaches.
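
    The Wonderful Life Utility of agent i is the global utility minus the global utility recomputed with agent i's action replaced by the clamping parameter; clamping to zero means substituting a null action. Below is a minimal sketch with an invented global utility G, purely to make the definition concrete.

```python
def wlu(global_utility, joint_action, agent, clamp=0):
    """WLU_i(z) = G(z) - G(z with agent i's action clamped)."""
    clamped = list(joint_action)
    clamped[agent] = clamp              # the clamping parameter (0 here)
    return global_utility(joint_action) - global_utility(clamped)

# Invented global utility: reward distinct nonzero actions, penalize repeats.
def G(z):
    chosen = [a for a in z if a != 0]
    return len(set(chosen)) - 0.1 * (len(chosen) - len(set(chosen)))

z = (1, 2, 2)                           # joint action of three agents
print([round(wlu(G, z, i), 2) for i in range(3)])
```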

  8. Two spatiotemporally distinct value systems shape reward-based learning in the human brain.

    Science.gov (United States)

    Fouragnan, Elsa; Retzler, Chris; Mullinger, Karen; Philiastides, Marios G

    2015-09-08

    Avoiding repeated mistakes and learning to reinforce rewarding decisions is critical for human survival and adaptive actions. Yet, the neural underpinnings of the value systems that encode different decision-outcomes remain elusive. Here, coupling single-trial electroencephalography with simultaneously acquired functional magnetic resonance imaging, we uncover the spatiotemporal dynamics of two separate but interacting value systems encoding decision-outcomes. Consistent with a role in regulating alertness and switching behaviours, an early system is activated only by negative outcomes and engages arousal-related and motor-preparatory brain structures. Consistent with a role in reward-based learning, a later system differentially suppresses or activates regions of the human reward network in response to negative and positive outcomes, respectively. Following negative outcomes, the early system interacts with and downregulates the late system through a thalamic interaction with the ventral striatum. Critically, the strength of this coupling predicts participants' switching behaviour and avoidance learning, directly implicating the thalamostriatal pathway in reward-based learning.

  9. Online Pedagogical Tutorial Tactics Optimization Using Genetic-Based Reinforcement Learning.

    Science.gov (United States)

    Lin, Hsuan-Ta; Lee, Po-Ming; Hsiao, Tzu-Chien

    2015-01-01

    Tutorial tactics are policies for an Intelligent Tutoring System (ITS) to decide the next action when there are multiple actions available. Recent research has demonstrated that when the learning contents are controlled so as to be the same, different tutorial tactics make a difference in students' learning gains. However, the Reinforcement Learning (RL) techniques used in previous studies to induce tutorial tactics do not scale to large problems and hence were used in an offline manner. We therefore introduce a Genetic-Based Reinforcement Learning (GBML) approach to induce tutorial tactics in an online-learning manner, without relying on any preexisting dataset. The introduced method learns a set of rules from the environment in a manner similar to RL, and includes a genetic-based optimizer for the rule discovery task that generates new rules from old ones. This increases the scalability of an RL learner for larger problems. The results support our hypothesis about the capability of the GBML method to induce tutorial tactics, suggesting that the GBML method should be favorable for developing real-world ITS applications in the domain of tutorial tactics induction.

  10. Development of fuzzy algorithm with learning function for nuclear steam generator level control

    International Nuclear Information System (INIS)

    Park, Gee Yong; Seong, Poong Hyun

    1993-01-01

    A fuzzy algorithm with a learning function is applied to the steam generator level control of a nuclear power plant. The algorithm can tune its rule base and membership functions for steam generator level control using data obtained from the control actions of a skilled operator or of other controllers (e.g., a PID controller). The rule base of the fuzzy controller with learning function is divided into two parts: one part is devoted to level control of the steam generator at low power (0 % - 30 % of full power), and the other to level control at high power (30 % - 100 % of full power). The response time of steam generator level control in the low power range with this rule base is shown to be shorter than that of a fuzzy controller with direct inference. (Author)

  11. Cross-organism learning method to discover new gene functionalities.

    Science.gov (United States)

    Domeniconi, Giacomo; Masseroli, Marco; Moro, Gianluca; Pinoli, Pietro

    2016-04-01

    Knowledge of gene and protein functions is paramount for the understanding of physiological and pathological biological processes, as well as for the development of new drugs and therapies. Analyses for biomedical knowledge discovery greatly benefit from the availability of gene and protein functional feature descriptions expressed through controlled terminologies and ontologies, i.e., of gene and protein biomedical controlled annotations. In recent years, several databases of such annotations have become available; yet, these valuable annotations are incomplete, include errors, and only some of them represent highly reliable, human-curated information. Computational techniques able to reliably predict new gene or protein annotations with an associated likelihood value are thus paramount. Here, we propose a novel cross-organism learning approach to reliably predict new functionalities for the genes of an organism based on the known controlled annotations of the genes of another, evolutionarily related and better studied, organism. We leverage a new representation of the annotation discovery problem and a random perturbation of the available controlled annotations to allow the application of supervised algorithms to predict unknown gene annotations with good accuracy. Taking advantage of the numerous gene annotations available for a well-studied organism, our cross-organism learning method creates and trains better prediction models, which can then be applied to predict new gene annotations of a target organism. We tested and compared our method with the equivalent single-organism approach on different gene annotation datasets of five evolutionarily related organisms (Homo sapiens, Mus musculus, Bos taurus, Gallus gallus and Dictyostelium discoideum). Results show both the usefulness of the perturbation method of available annotations for better prediction model training and a great improvement of the cross-organism models with respect to the single-organism ones.

  12. Exploiting Best-Match Equations for Efficient Reinforcement Learning

    NARCIS (Netherlands)

    van Seijen, Harm; Whiteson, Shimon; van Hasselt, Hado; Wiering, Marco

    This article presents and evaluates best-match learning, a new approach to reinforcement learning that trades off the sample efficiency of model-based methods with the space efficiency of model-free methods. Best-match learning works by approximating the solution to a set of best-match equations.

  13. Iterative learning-based decentralized adaptive tracker for large-scale systems: a digital redesign approach.

    Science.gov (United States)

    Tsai, Jason Sheng-Hong; Du, Yan-Yi; Huang, Pei-Hsiang; Guo, Shu-Mei; Shieh, Leang-San; Chen, Yuhua

    2011-07-01

    In this paper, a digital redesign methodology for an iterative learning-based decentralized adaptive tracker is proposed to improve the dynamic performance of sampled-data linear large-scale control systems consisting of N interconnected multi-input multi-output subsystems, so that the system output can follow an arbitrary trajectory, even one not initially represented by the analytic reference model. To overcome the interference between subsystems and simplify the controller design, the proposed model reference decentralized adaptive control scheme first constructs a decoupled, well-designed reference model. Then, according to this model, the paper develops a digital decentralized adaptive tracker based on optimal analog control and a prediction-based digital redesign technique for the sampled-data large-scale coupled system. To enhance the tracking performance of the digital tracker at specified sampling instants, we apply iterative learning control (ILC) to train the control input via continual learning. As a result, the proposed iterative learning-based decentralized adaptive tracker not only has a robust closed-loop decoupling property but also achieves good tracking performance in both transient and steady state. In addition, evolutionary programming is applied to search for a good learning gain to speed up the learning process of ILC. Copyright © 2011 ISA. Published by Elsevier Ltd. All rights reserved.
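
    The core ILC idea referenced here, refining the control input over repeated trials using the recorded tracking error, can be sketched in a few lines. The first-order plant, the P-type update law, and the fixed gain are illustrative assumptions; the paper combines ILC with a digitally redesigned decentralized adaptive tracker and tunes the learning gain by evolutionary programming.

    import numpy as np

    def plant(u, a=0.9, b=0.5):
        # Hypothetical first-order sampled-data plant with one-step delay
        y = np.zeros_like(u)
        for k in range(1, len(u)):
            y[k] = a * y[k - 1] + b * u[k - 1]
        return y

    def ilc_trial(u, ref, gain=0.8):
        # P-type ILC: correct each input sample with the next-step error
        y = plant(u)
        e = ref - y
        u_next = u.copy()
        u_next[:-1] += gain * e[1:]
        return u_next, np.max(np.abs(e))

    ref = np.sin(np.linspace(0, np.pi, 50))
    u = np.zeros(50)
    for trial in range(30):                 # continual learning over trials
        u, err = ilc_trial(u, ref)
    print("max tracking error after 30 trials:", round(float(err), 6))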

  14. Clarifying Inconclusive Functional Analysis Results: Assessment and Treatment of Automatically Reinforced Aggression

    Science.gov (United States)

    Saini, Valdeep; Greer, Brian D.; Fisher, Wayne W.

    2016-01-01

    We conducted a series of studies in which multiple strategies were used to clarify the inconclusive results of one boy’s functional analysis of aggression. Specifically, we (a) evaluated individual response topographies to determine the composition of aggregated response rates, (b) conducted a separate functional analysis of aggression after high rates of disruption masked the consequences maintaining aggression during the initial functional analysis, (c) modified the experimental design used during the functional analysis of aggression to improve discrimination and decrease interaction effects between conditions, and (d) evaluated a treatment matched to the reinforcer hypothesized to maintain aggression. An effective yet practical intervention for aggression was developed based on the results of these analyses and from data collected during the matched-treatment evaluation. PMID:25891269

  15. Adaptive iterated function systems filter for images highly corrupted with fixed-value impulse noise

    Science.gov (United States)

    Shanmugavadivu, P.; Eliahim Jeevaraj, P. S.

    2014-06-01

    The Adaptive Iterated Function Systems (AIFS) filter presented in this paper has outstanding potential to attenuate fixed-value impulse noise in images. The filter has two distinct phases, noise detection and noise correction, which use measures of statistics and Iterated Function Systems (IFS) respectively. The performance of the AIFS filter is assessed by three metrics: Peak Signal-to-Noise Ratio (PSNR), Mean Structural Similarity Index (MSSIM) and Human Visual Perception (HVP). The quantitative measures PSNR and MSSIM endorse the merit of this filter in terms of degree of noise suppression and detail/edge preservation respectively, in comparison with the high-performing filters reported in the recent literature. The qualitative measure HVP confirms the noise suppression ability of the devised filter. This computationally simple noise filter broadly finds application wherever images are highly degraded by fixed-value impulse noise.
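
    The two-phase structure (detection, then correction) is easy to illustrate. The sketch below flags pixels carrying the fixed impulse values and replaces each with the median of its uncorrupted neighbours; the actual AIFS filter replaces this simple correction stage with an IFS-based estimate, so treat this as a structural sketch only.

    import numpy as np

    def detect_and_correct(img, noise_values=(0, 255), win=1):
        # Phase 1: flag pixels carrying the fixed impulse values
        noisy = np.isin(img, noise_values)
        out = img.astype(float)
        H, W = img.shape
        # Phase 2: replace each flagged pixel with the median of its clean
        # neighbours (the AIFS filter uses an IFS-based estimate instead)
        for i, j in zip(*np.nonzero(noisy)):
            i0, i1 = max(i - win, 0), min(i + win + 1, H)
            j0, j1 = max(j - win, 0), min(j + win + 1, W)
            clean = out[i0:i1, j0:j1][~noisy[i0:i1, j0:j1]]
            if clean.size:
                out[i, j] = np.median(clean)
        return out.astype(img.dtype)

    rng = np.random.default_rng(4)
    img = rng.integers(20, 235, size=(64, 64)).astype(np.uint8)
    noisy_img = img.copy()
    mask = rng.random(img.shape) < 0.3          # 30% fixed-value impulses
    noisy_img[mask] = rng.choice([0, 255], size=int(mask.sum())).astype(np.uint8)
    restored = detect_and_correct(noisy_img)
    print("mean abs error after filtering:",
          float(np.abs(restored.astype(int) - img.astype(int)).mean()))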

  16. Teaching Functional Patterns through Robotic Applications

    Directory of Open Access Journals (Sweden)

    J. Boender

    2016-11-01

    We present our approach to teaching functional programming to First Year Computer Science students at Middlesex University through projects in robotics. A holistic approach is taken to the curriculum, emphasising the connections between different subject areas. A key part of the students' learning is through practical projects that draw upon and integrate the taught material. To support these, we developed the Middlesex Robotic plaTfOrm (MIRTO), an open-source platform built using Raspberry Pi, Arduino, HUB-ee wheels and running Racket (a LISP dialect). In this paper we present the motivations for our choices and explain how a number of concepts of functional programming may be employed when programming robotic applications. We present some students' work with robotics projects: we consider the use of robotics projects to have been a success, both for their value in reinforcing students' understanding of programming concepts and for their value in motivating the students.

  17. Imbalanced Learning for Functional State Assessment

    Science.gov (United States)

    Li, Feng; McKenzie, Frederick; Li, Jiang; Zhang, Guangfan; Xu, Roger; Richey, Carl; Schnell, Tom

    2011-01-01

    This paper presents the results of several imbalanced learning techniques applied to operator functional state assessment, where the data are highly imbalanced, i.e., some functional states (majority classes) have many more training samples than other states (minority classes). Conventional machine learning techniques usually tend to classify all data samples into majority classes and perform poorly on minority classes. In this study, we implemented five imbalanced learning techniques, including random under-sampling, random over-sampling, synthetic minority over-sampling technique (SMOTE), borderline-SMOTE and adaptive synthetic sampling (ADASYN), to address this problem. Experimental results on a benchmark driving test dataset show that accuracies for minority classes could be improved dramatically at the cost of slight performance degradations for majority classes.
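
    Of the five techniques listed, SMOTE is the most widely reused; a minimal sketch with the community imbalanced-learn package follows. The synthetic two-class dataset stands in for the operator-state data, which is an illustrative assumption.

    import numpy as np
    from collections import Counter
    from imblearn.over_sampling import SMOTE

    # Hypothetical imbalanced data: 300 majority vs 30 minority samples
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (300, 5)), rng.normal(2, 1, (30, 5))])
    y = np.array([0] * 300 + [1] * 30)

    # SMOTE synthesizes minority samples by interpolating between neighbours
    X_res, y_res = SMOTE(random_state=1).fit_resample(X, y)
    print(Counter(y), "->", Counter(y_res))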

  18. Reinforcement and inference in cross-situational word learning.

    Science.gov (United States)

    Tilles, Paulo F C; Fontanari, José F

    2013-01-01

    Cross-situational word learning is based on the notion that a learner can determine the referent of a word by finding something in common across many observed uses of that word. Here we propose an adaptive learning algorithm that contains a parameter controlling the strength of the reinforcement applied to associations between concurrent words and referents, and a parameter regulating inference, which includes built-in biases, such as mutual exclusivity, and information from past learning events. By adjusting these parameters so that the model predictions agree with data from representative experiments on cross-situational word learning, we were able to explain the learning strategies adopted by the participants of those experiments in terms of a trade-off between reinforcement and inference. These strategies can vary wildly depending on the conditions of the experiments. For instance, in fast-mapping experiments (i.e., where the correct referent can, in principle, be inferred in a single observation), inference is prevalent, whereas in segregated contextual diversity experiments (i.e., where the referents are separated in groups and are exhibited with members of their groups only), reinforcement is predominant. Other experiments are explained with more balanced doses of reinforcement and inference.

  19. Parameters identification of photovoltaic models using self-adaptive teaching-learning-based optimization

    International Nuclear Information System (INIS)

    Yu, Kunjie; Chen, Xu; Wang, Xin; Wang, Zhenlei

    2017-01-01

    Highlights: • SATLBO is proposed to identify the PV model parameters efficiently. • In SATLBO, the learners self-adaptively select different learning phases. • An elite learning is developed in the teacher phase to perform local searching. • A diversity learning is proposed in the learner phase to maintain population diversity. • SATLBO ranks first in overall performance among nine algorithms. - Abstract: Identification of photovoltaic (PV) model parameters from measured current-voltage characteristic curves plays an important role in the simulation and evaluation of PV systems. To accurately and reliably identify the PV model parameters, a self-adaptive teaching-learning-based optimization (SATLBO) is proposed in this paper. In SATLBO, the learners self-adaptively select different learning phases based on their knowledge level. The better learners are more likely to choose the learner phase to improve population diversity, while the worse learners tend to choose the teacher phase to enhance the convergence rate. Thus, learners at different levels focus on different searching abilities to efficiently enhance the performance of the algorithm. In addition, to improve the searching ability of the different learning phases, an elite learning strategy and a diversity learning method are introduced into the teacher phase and learner phase, respectively. The performance of SATLBO is first evaluated on 34 benchmark functions, and experimental results show that SATLBO ranks first in overall performance among nine algorithms. SATLBO is then employed to identify parameters of different PV models, i.e., single diode, double diode, and PV module. Experimental results indicate that SATLBO exhibits high accuracy and reliability compared with other parameter extraction methods.
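
    To make the underlying TLBO mechanics concrete, the sketch below implements the standard teacher phase: move every learner toward the teacher (best solution) and away from the class mean, keeping improvements. The self-adaptive phase selection, elite learning, and diversity learning that distinguish SATLBO are omitted, and the sphere objective is an illustrative stand-in for the PV-model fitting cost.

    import numpy as np

    def teacher_phase(pop, fitness, objective):
        # Pull every learner toward the best solution (the teacher) and
        # away from the class mean; keep a move only if it improves.
        teacher = pop[np.argmin(fitness)]
        mean = pop.mean(axis=0)
        tf = np.random.randint(1, 3)          # teaching factor, 1 or 2
        new_pop = pop + np.random.rand(*pop.shape) * (teacher - tf * mean)
        new_fit = np.apply_along_axis(objective, 1, new_pop)
        improved = new_fit < fitness          # greedy selection
        pop[improved], fitness[improved] = new_pop[improved], new_fit[improved]
        return pop, fitness

    sphere = lambda x: float(np.sum(x ** 2))  # illustrative objective
    pop = np.random.uniform(-5, 5, (20, 4))
    fit = np.apply_along_axis(sphere, 1, pop)
    for _ in range(100):
        pop, fit = teacher_phase(pop, fit, sphere)
    print("best fitness:", fit.min())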

  20. Representations of Multiple-Valued Logic Functions

    CERN Document Server

    Stankovic, Radomir S

    2012-01-01

    Compared to binary switching functions, multiple-valued functions offer more compact representations of the information content of signals modeled by logic functions and, therefore, their use fits very well in the general settings of data compression attempts and approaches. The first task in dealing with such signals is to provide mathematical methods for their representation in a way that will make their application in practice feasible. Representations of Multiple-Valued Logic Functions is aimed at providing an accessible introduction to these mathematical techniques that are necessary for applications.

  1. Executive Function and Adaptive Behavior in Muenke Syndrome.

    Science.gov (United States)

    Yarnell, Colin M P; Addissie, Yonit A; Hadley, Donald W; Guillen Sacoto, Maria J; Agochukwu, Nneamaka B; Hart, Rachel A; Wiggs, Edythe A; Platte, Petra; Paelecke, Yvonne; Collmann, Hartmut; Schweitzer, Tilmann; Kruszka, Paul; Muenke, Maximilian

    2015-08-01

    To investigate executive function and adaptive behavior in individuals with Muenke syndrome using validated instruments with a normative population and unaffected siblings as controls. Participants in this cross-sectional study included individuals with Muenke syndrome (P250R mutation in FGFR3) and their mutation-negative siblings. Participants completed validated assessments of executive functioning (Behavior Rating Inventory of Executive Function [BRIEF]) and adaptive behavior skills (Adaptive Behavior Assessment System, Second Edition [ABAS-II]). Forty-four individuals with a positive FGFR3 mutation (median age, 9 years; range, 7 months to 52 years) were enrolled. In addition, 10 unaffected siblings served as controls (5 males, 5 females; median age, 13 years; range, 3-18 years). For the General Executive Composite scale of the BRIEF, 32.1% of the cohort had scores greater than +1.5 SD, signifying potential clinical significance. For the General Adaptive Composite of the ABAS-II, 28.2% of affected individuals scored in the 3rd-8th percentile of the normative population, and 56.4% were below the average category (General Executive Composite and ABAS-II General Adaptive Composite). Individuals with Muenke syndrome are at an increased risk for developing adaptive and executive function behavioral changes compared with a normative population and unaffected siblings. Published by Elsevier Inc.

  2. Cohesive fracture model for functionally graded fiber reinforced concrete

    International Nuclear Information System (INIS)

    Park, Kyoungsoo; Paulino, Glaucio H.; Roesler, Jeffery

    2010-01-01

    A simple, effective, and practical constitutive model for cohesive fracture of fiber reinforced concrete is proposed by differentiating the aggregate bridging zone and the fiber bridging zone. The aggregate bridging zone is related to the total fracture energy of plain concrete, while the fiber bridging zone is associated with the difference between the total fracture energy of fiber reinforced concrete and that of plain concrete. The cohesive fracture model is defined by experimental fracture parameters, which are obtained through three-point bending and split tensile tests. As expected, the model describes the fracture behavior of plain concrete beams. In addition, it predicts the fracture behavior of either fiber reinforced concrete beams or a combination of plain and fiber reinforced concrete functionally layered in a single beam specimen. The validated model is also applied to investigate continuously, functionally graded fiber reinforced concrete composites.

  3. Translation and adaptation of functional auditory performance indicators (FAPI)

    Directory of Open Access Journals (Sweden)

    Karina Ferreira

    2011-12-01

    Work with deaf children has gained new attention since the expectation and goal of therapy have expanded to language development and subsequent language learning. Many clinical tests were developed for the evaluation of speech sound perception in young children, in response to the need for accurate assessment of the hearing skills developed through the use of individual hearing aids or cochlear implants. These tests also allow the evaluation of the rehabilitation program. However, few of these tests are available in Portuguese. Evaluation with the Functional Auditory Performance Indicators (FAPI) generates a child's functional auditory skills profile, which lists auditory skills in an integrated and hierarchical order. It has seven hierarchical categories, including sound awareness, meaningful sound, auditory feedback, sound source localization, auditory discrimination, short-term auditory memory, and linguistic auditory processing. FAPI evaluation allows the therapist to map the child's hearing profile performance, determine the targets for increasing hearing abilities, and develop an effective therapeutic plan. Objective: Since the FAPI is an American test, the inventory was adapted for application in the Brazilian population. Material and Methods: The translation was done following the steps of translation and back translation, and reproducibility was evaluated. Four translated versions (two original and two back-translated) were compared, and revisions were made to ensure language adaptation and grammatical and idiomatic equivalence. Results: The inventory was duly translated and adapted. Conclusion: Further studies on the application of the translated FAPI are necessary to make the test practicable in Brazilian clinical use.

  4. Reduced density matrix functional theory via a wave function based approach

    Energy Technology Data Exchange (ETDEWEB)

    Schade, Robert; Bloechl, Peter [Institute for Theoretical Physics, Clausthal University of Technology, Clausthal (Germany)]; Pruschke, Thomas [Institute for Theoretical Physics, University of Goettingen, Goettingen (Germany)]

    2016-07-01

    We propose a new method for the calculation of the electronic and atomic structure of correlated electron systems based on reduced density matrix functional theory (rDMFT). The density-matrix functional is evaluated on the fly using Levy's constrained search formalism. The present implementation rests on a local approximation of the interaction reminiscent of that of dynamical mean field theory (DMFT). We focus here on additional approximations to the exact density-matrix functional in the local approximation and evaluate their performance.

  5. Reusable Reinforcement Learning via Shallow Trails.

    Science.gov (United States)

    Yu, Yang; Chen, Shi-Yong; Da, Qing; Zhou, Zhi-Hua

    2018-06-01

    Reinforcement learning has shown great success in helping learning agents accomplish tasks autonomously from environment interactions. Meanwhile, in many real-world applications, an agent needs to accomplish not only a fixed task but a range of tasks. For this goal, an agent can learn a metapolicy over a set of training tasks drawn from an underlying distribution; by maximizing the total reward summed over all the training tasks, the metapolicy can then be reused on test tasks from the same distribution. In practice, however, we face two major obstacles to training and reusing metapolicies well. First, how to identify tasks that are unrelated or even opposed to each other, in order to avoid their mutual interference during training. Second, how to characterize task features, according to which a metapolicy can be reused. In this paper, we propose the MetA-Policy LEarning (MAPLE) approach, which overcomes the two difficulties by introducing the shallow trail: it probes a task by running a roughly trained policy. Using the rewards of the shallow trail, MAPLE automatically groups similar tasks; moreover, when the task parameters are unknown, the rewards of the shallow trail also serve as task features. Empirical studies on several control tasks verify that MAPLE trains metapolicies well and receives high reward on test tasks.
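
    The shallow-trail idea, probe each task with a roughly trained policy and use the resulting rewards to group tasks, can be sketched as follows. The reward simulation and the k-means grouping are illustrative assumptions on my part; MAPLE's actual grouping and metapolicy training are more involved.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(2)

    def shallow_trail(task_level, episodes=5):
        # Probe a task with a roughly trained policy; each episode's reward
        # is simulated here as noise around a task-specific level.
        return task_level + rng.normal(0.0, 0.1, episodes)

    # Two hidden families of tasks; the shallow-trail rewards expose them.
    task_levels = [1.0, 1.1, 0.9, -1.0, -0.9, -1.1]
    rewards = np.array([shallow_trail(m) for m in task_levels])

    # Group tasks whose shallow-trail reward vectors look alike, so that
    # unrelated or conflicting tasks do not share a metapolicy in training.
    groups = KMeans(n_clusters=2, n_init=10).fit_predict(rewards)
    print("task groups:", groups)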

  6. [Probiotics as functional food products: manufacture and approaches to evaluating their effectiveness].

    Science.gov (United States)

    Markova, Iu M; Sheveleva, S A

    2014-01-01

    This review concerns the issues of food fortification and the creation of functional foods (FF) and food supplements based on probiotics, and covers approaches to the regulation of probiotic food products in various countries. The status of functional foods optimizing GIT functions, as a separate category of FF, is emphasized. Considering the strain-specificity of probiotic effects, the minimum criteria for probiotics in food products are: 1) the probiotic must be identified at the genus, species, and strain levels, using high-resolution techniques; 2) the probiotic must be viable and present in a sufficient amount in the product at the end of its shelf life; 3) the functional characteristics inherent to the probiotic strains must be proven in controlled experiments. The three-stage procedure recommended by FAO/WHO for evaluating the functional efficacy of FF includes: Phase I--safety assessment in in vitro and in vivo experiments; Phase II--evaluation in a Double-Blind, Randomized, Placebo-Controlled trial (DBRPC); and Phase III--post-approval monitoring. It is noted that, along with the ability to obtain statistically significant results, there are practical difficulties in conducting DBRPC (duration, costs, difficulties in the selection of target biomarkers and populations). A promising approach for assessing the functional efficacy of FF is the concept of nutrigenomics, which examines the link between the human diet and the characteristics of the genome to determine the influence of food on the expression of genes and, ultimately, on human health. Nutrigenomic approaches are promising for assessing the impact of probiotics on healthy people. Focusing on the nutrigenomic response of the intestinal microbial community and its individual populations (in this regard, the lactobacilli can be very informative) was proposed.

  7. Evaluating a multispecies adaptive management framework: Must uncertainty impede effective decision-making?

    Science.gov (United States)

    Smith, David R.; McGowan, Conor P.; Daily, Jonathan P.; Nichols, James D.; Sweka, John A.; Lyons, James E.

    2013-01-01

    Application of adaptive management to complex natural resource systems requires careful evaluation to ensure that the process leads to improved decision-making. As part of that evaluation, adaptive policies can be compared with alternative nonadaptive management scenarios, and the value of reducing structural (ecological) uncertainty for achieving management objectives can be quantified. A multispecies adaptive management framework was recently adopted by the Atlantic States Marine Fisheries Commission for sustainable harvest of Delaware Bay horseshoe crabs Limulus polyphemus, while maintaining adequate stopover habitat for migrating red knots Calidris canutus rufa, the focal shorebird species. The predictive model set encompassed the structural uncertainty in the relationships between horseshoe crab spawning, red knot weight gain and red knot vital rates. Stochastic dynamic programming was used to generate a state-dependent strategy for harvest decisions given that uncertainty. In this paper, we employed a management strategy evaluation approach to evaluate the performance of this adaptive management framework. Active adaptive management was used by including model weights as state variables in the optimization and reducing structural uncertainty through model weight updating. We found that the value of information for reducing structural uncertainty is expected to be low, because the uncertainty does not appear to impede effective management. Harvest policy responded to abundance levels of both species regardless of uncertainty in the specific relationship that generated those abundances. Thus, the expected horseshoe crab harvest and red knot abundance were similar whether the population-generating model was uncertain or known, and harvest policy was robust to structural uncertainty as specified. Synthesis and applications: the combination of management strategy evaluation with state-dependent strategies from stochastic dynamic programming was an informative approach to evaluating this adaptive management framework.

  8. Re-examination of sea lamprey control policies for the St. Marys River: Completion of an adaptive management cycle

    Science.gov (United States)

    Jones, Michael L.; Brenden, Travis O.; Irwin, Brian J.

    2015-01-01

    The St. Marys River (SMR) historically has been a major producer of sea lampreys (Petromyzon marinus) in the Laurentian Great Lakes. In the early 2000s, a decision analysis (DA) project was conducted to evaluate sea lamprey control policies for the SMR; this project suggested that an integrated policy of trapping, sterile male releases, and Bayluscide treatment was the most cost-effective policy. Further, it concluded that formal assessment of larval sea lamprey abundance and distribution in the SMR would be valuable for future evaluation of control strategies. We updated this earlier analysis, adding information from annual larval assessments conducted since 1999 and evaluating additional control policies. Bayluscide treatments continued to be critical for sea lamprey control, but high recruitment compensation minimized the effectiveness of trapping and sterile male release under current feasible ranges. Because Bayluscide control is costly, development of strategies to enhance trapping success remains a priority. This study illustrates benefits of an adaptive management cycle, wherein models inform decisions, are updated based on learning achieved from those decisions, and ultimately inform future decisions.

  9. Kernel Temporal Differences for Neural Decoding

    Science.gov (United States)

    Bae, Jihye; Sanchez Giraldo, Luis G.; Pohlmeyer, Eric A.; Francis, Joseph T.; Sanchez, Justin C.; Príncipe, José C.

    2015-01-01

    We study the feasibility and capability of the kernel temporal difference (KTD)(λ) algorithm for neural decoding. KTD(λ) is an online, kernel-based learning algorithm, which has been introduced to estimate value functions in reinforcement learning. The algorithm combines kernel-based representations with the temporal difference approach to learning. One of our key observations is that by using strictly positive definite kernels, the algorithm's convergence can be guaranteed for policy evaluation. The algorithm's nonlinear function approximation capabilities are shown in both simulations of policy evaluation and neural decoding problems (policy improvement). KTD can handle high-dimensional neural states containing spatial-temporal information at a reasonable computational complexity, allowing real-time applications. When the algorithm seeks a proper mapping between a monkey's neural states and desired positions of a computer cursor or a robot arm, in both open-loop and closed-loop experiments, it can effectively learn the neural-state-to-action mapping. Finally, a visualization of the coadaptation process between the decoder and the subject shows the algorithm's capabilities in reinforcement learning brain-machine interfaces. PMID:25866504
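
    A stripped-down sketch of kernel TD value estimation may help (the λ=0 case, with no dictionary sparsification): each visited state becomes a kernel centre whose coefficient is set by the TD error, so the value function is built nonparametrically in the RKHS. Kernel width, step size, and discount below are illustrative choices.

    import numpy as np

    def k(x, y, sigma=0.5):
        return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

    class KernelTD:
        def __init__(self, eta=0.3, gamma=0.9):
            self.centers, self.coeffs = [], []   # kernel expansion of V
            self.eta, self.gamma = eta, gamma

        def value(self, x):
            return sum(a * k(c, x) for c, a in zip(self.centers, self.coeffs))

        def update(self, x, reward, x_next):
            # The TD error sets the coefficient of a new centre at x
            delta = reward + self.gamma * self.value(x_next) - self.value(x)
            self.centers.append(np.asarray(x, float))
            self.coeffs.append(self.eta * delta)

    vtd = KernelTD()
    vtd.update(np.array([0.0]), 1.0, np.array([0.1]))
    print(round(vtd.value(np.array([0.0])), 3))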

  10. Report on Adaptive Force, a specific neuromuscular function

    Directory of Open Access Journals (Sweden)

    Marko Hoff

    2015-08-01

    In real-life motions, as well as in sports, the adaptation of the neuromuscular system to externally applied forces plays an important role. The term Adaptive Force (AF) characterizes the ability of the nerve-muscle system to adapt to impacting external forces during isometric and eccentric muscle action. The focus of this paper is on the concept of this neuromuscular action, which has not yet been described in this way. A measuring system was constructed and evaluated for this specific neuromuscular function; only the main information from the evaluation of the measuring system and the preliminary reference values are mentioned here, while an article with a detailed description will be published separately. This paper concentrates on the following three points: (1) What is the peculiarity of this neuromuscular function, introduced as AF? (2) Is the measuring system able to capture its specific characteristics, and which phases of measurement occur? (3) It seems reasonable to discuss whether AF can be distinguished and classified among the known force concepts. The article describes the measuring system and how it is able to capture special features of real-life motions, such as submaximal intensities and the subjects' option to react adequately to externally varying forces. Furthermore, within one measurement the system records three different force qualities: the isometric submaximal Adaptive Force (AFiso), the maximal isometric Adaptive Force (AFisomax) and the maximal eccentric Adaptive Force (AFeccmax). Each of these phases provides different and unique information on the nerve-muscle system, discussed in detail. Important, in terms of the Adaptive Force, seems to be the combination of conditional and coordinative abilities.

  11. Evaluation of linearly solvable Markov decision process with dynamic model learning in a mobile robot navigation task.

    Science.gov (United States)

    Kinjo, Ken; Uchibe, Eiji; Doya, Kenji

    2013-01-01

    The linearly solvable Markov Decision Process (LMDP) is a class of optimal control problems in which the Bellman equation can be converted into a linear equation by an exponential transformation of the state value function (Todorov, 2009b). In an LMDP, the optimal value function and the corresponding control policy are obtained by solving an eigenvalue problem in a discrete state space, or an eigenfunction problem in a continuous state space, using knowledge of the system dynamics and the action, state, and terminal cost functions. In this study, we evaluate the effectiveness of the LMDP framework in real robot control, in which the dynamics of the body and the environment have to be learned from experience. We first perform a simulation study of a pole swing-up task to evaluate the effect of the accuracy of the learned dynamics model on the derived action policy. The result shows that a crude linear approximation of the non-linear dynamics can still allow solution of the task, albeit with a higher total cost. We then perform real robot experiments on a battery-catching task using our Spring Dog mobile robot platform. The state is given by the position and size of a battery in the robot's camera view and two neck joint angles. The action is the velocities of the two wheels, while the neck joints are controlled by a visual servo controller. We test linear and bilinear dynamic models in tasks with quadratic and Gaussian state cost functions. In the quadratic cost task, the LMDP controller derived from a learned linear dynamics model performed on par with the optimal linear quadratic regulator (LQR). In the non-quadratic task, the LMDP controller with a linear dynamics model showed the best performance. The results demonstrate the usefulness of the LMDP framework in real robot control even when simple linear models are used for dynamics learning.
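
    The linear-solvability property is compact enough to sketch. For a discrete average-cost LMDP with state costs q and passive dynamics P, the desirability z = exp(-v) satisfies the eigenvalue problem λz = diag(exp(-q)) P z, so power iteration recovers the value function and the optimal controlled transitions u*(j|i) ∝ p(j|i) z(j). The three-state example is illustrative, not from the paper.

    import numpy as np

    def solve_lmdp(q, P, iters=200):
        # Power iteration on G = diag(exp(-q)) P yields the desirability z
        G = np.diag(np.exp(-q)) @ P
        z = np.ones(len(q))
        for _ in range(iters):
            z = G @ z
            z /= np.linalg.norm(z)
        v = -np.log(z / z.max())          # value, up to an additive constant
        u = P * z                         # optimal transitions: p(j|i) z(j)
        u /= u.sum(axis=1, keepdims=True)
        return v, u

    # Toy 3-state passive dynamics and state costs (illustrative)
    P = np.array([[0.8, 0.2, 0.0],
                  [0.1, 0.8, 0.1],
                  [0.0, 0.2, 0.8]])
    q = np.array([0.0, 1.0, 2.0])
    v, u = solve_lmdp(q, P)
    print("value function:", np.round(v, 3))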

  12. On valuing information in adaptive-management models.

    Science.gov (United States)

    Moore, Alana L; McCarthy, Michael A

    2010-08-01

    Active adaptive management looks at the benefit of using strategies that may be suboptimal in the near term but may provide additional information that will facilitate better management in the future. In many adaptive-management problems that have been studied, the optimal active and passive policies (accounting for learning when designing policies and designing policy on the basis of current best information, respectively) are very similar. This seems paradoxical; when faced with uncertainty about the best course of action, managers should spend very little effort on actively designing programs to learn about the system they are managing. We considered two possible reasons why active and passive adaptive solutions are often similar. First, the benefits of learning are often confined to the particular case study in the modeled scenario, whereas in reality information gained from local studies is often applied more broadly. Second, management objectives that incorporate the variance of an estimate may place greater emphasis on learning than more commonly used objectives that aim to maximize an expected value. We explored these issues in a case study of Merri Creek, Melbourne, Australia, in which the aim was to choose between two options for revegetation. We explicitly incorporated monitoring costs in the model. The value of the terminal rewards and the choice of objective both influenced the difference between active and passive adaptive solutions. Explicitly considering the cost of monitoring provided a different perspective on how the terminal reward and management objective affected learning. The states for which it was optimal to monitor did not always coincide with the states in which active and passive adaptive management differed. Our results emphasize that spending resources on monitoring is only optimal when the expected benefits of the options being considered are similar and when the pay-off for learning about their benefits is large.

  13. Longitudinal investigation on learned helplessness tested under negative and positive reinforcement involving stimulus control.

    Science.gov (United States)

    Oliveira, Emileane C; Hunziker, Maria Helena

    2014-07-01

    In this study, we investigated (a) whether animals demonstrating the learned helplessness effect during an escape contingency also show learning deficits under positive reinforcement contingencies involving stimulus control, and (b) whether exposure to positive reinforcement contingencies eliminates the learned helplessness effect under an escape contingency. Rats were initially exposed to controllable (C), uncontrollable (U) or no (N) shocks. After 24 h, they were exposed to 60 escapable shocks delivered in a shuttlebox. In the following phase, we selected from each group the four subjects that presented the most typical group pattern: no escape learning (learned helplessness effect) in Group U and escape learning in Groups C and N. All subjects were then exposed to two phases: (1) positive reinforcement for lever pressing under a multiple FR/extinction schedule, and (2) a re-test under negative reinforcement (escape). A fourth group (n=4) was exposed only to the positive reinforcement sessions. All subjects showed discrimination learning under the multiple schedule. In the escape re-test, the learned helplessness effect was maintained for three of the animals in Group U. These results suggest that the learned helplessness effect did not extend to discriminative behavior that is positively reinforced, and that the learned helplessness effect did not revert for most subjects after exposure to positive reinforcement. We discuss the theoretical implications for learned helplessness as an effect restricted to aversive contingencies and for the absence of reversal after positive reinforcement. Copyright © 2014. Published by Elsevier B.V.

  14. Adaptive Analysis of Functional MRI Data

    International Nuclear Information System (INIS)

    Friman, Ola

    2003-01-01

    Functional Magnetic Resonance Imaging (fMRI) is a recently developed neuro-imaging technique with the capacity to map neural activity with high spatial precision. To locate active brain areas, the method utilizes local blood oxygenation changes, which are reflected as small intensity changes in a special type of MR images. The ability to non-invasively map brain functions provides new opportunities to unravel the mysteries and advance the understanding of the human brain, as well as to perform pre-surgical examinations in order to optimize surgical interventions. This dissertation introduces new approaches for the analysis of fMRI data. The detection of active brain areas is a challenging problem due to the high noise levels and artifacts present in the data. A fundamental tool in the developed methods is Canonical Correlation Analysis (CCA), which is used in two novel ways. First, as a method able to fully exploit the spatio-temporal nature of fMRI data for detecting active brain areas. Established analysis approaches mainly focus on the temporal dimension of the data and are for this reason commonly referred to as mass-univariate. The new CCA detection method encompasses and generalizes the traditional mass-univariate methods and can in this terminology be viewed as a mass-multivariate approach. The concept of spatial basis functions is introduced as a spatial counterpart of the temporal basis functions already in use in fMRI analysis. The spatial basis functions implicitly perform an adaptive spatial filtering of the fMRI images, which significantly improves detection performance. It is also shown how prior information can be incorporated into the analysis by imposing constraints on the temporal and spatial models, and a constrained version of CCA is devised to this end. A general Principal Component Analysis technique for generating and constraining temporal and spatial subspace models is proposed to be used in combination with the constrained CCA.
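
    Since CCA is the workhorse here, a small sketch may help: given a spatial data matrix (e.g., responses of spatial basis functions in a pixel neighbourhood) and a temporal design matrix, CCA finds the weightings whose projections are maximally correlated, which serves as the detection statistic. The synthetic data and dimensions are illustrative assumptions.

    import numpy as np
    from sklearn.cross_decomposition import CCA

    # Hypothetical data: 9 spatial-basis responses X and 3 temporal-basis
    # regressors Y sharing one latent activation signal.
    rng = np.random.default_rng(3)
    latent = rng.normal(size=(200, 1))
    X = latent @ rng.normal(size=(1, 9)) + 0.5 * rng.normal(size=(200, 9))
    Y = latent @ rng.normal(size=(1, 3)) + 0.5 * rng.normal(size=(200, 3))

    # CCA finds spatial and temporal weightings with maximally correlated
    # projections; the canonical correlation is the detection statistic.
    cca = CCA(n_components=1).fit(X, Y)
    u, v = cca.transform(X, Y)
    print("canonical correlation:", np.corrcoef(u[:, 0], v[:, 0])[0, 1])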

  15. Challenges in the Verification of Reinforcement Learning Algorithms

    Science.gov (United States)

    Van Wesel, Perry; Goodloe, Alwyn E.

    2017-01-01

    Machine learning (ML) is increasingly being applied to a wide array of domains from search engines to autonomous vehicles. These algorithms, however, are notoriously complex and hard to verify. This work looks at the assumptions underlying machine learning algorithms as well as some of the challenges in trying to verify ML algorithms. Furthermore, we focus on the specific challenges of verifying reinforcement learning algorithms. These are highlighted using a specific example. Ultimately, we do not offer a solution to the complex problem of ML verification, but point out possible approaches for verification and interesting research opportunities.

  16. Supervised learning of tools for content-based search of image databases

    Science.gov (United States)

    Delanoy, Richard L.

    1996-03-01

    A computer environment, called the Toolkit for Image Mining (TIM), is being developed with the goal of enabling users with diverse interests and varied computer skills to create search tools for content-based image retrieval and other pattern matching tasks. Search tools are generated using a simple paradigm of supervised learning that is based on the user pointing at mistakes of classification made by the current search tool. As mistakes are identified, a learning algorithm uses them to build up a model of the user's intentions, construct a new search tool, apply the search tool to a test image, display the match results as feedback to the user, and accept new inputs from the user. Search tools are constructed in the form of functional templates, which are generalized matched filters capable of knowledge-based image processing. The ability of this system to learn the user's intentions from experience contrasts with other existing approaches to content-based image retrieval that base searches on the characteristics of a single input example or on a predefined and semantically constrained textual query. Currently, TIM is capable of learning spectral and textural patterns, but should be adaptable to the learning of shapes as well. Possible applications of TIM include not only content-based image retrieval, but also quantitative image analysis, the generation of metadata for annotating images, data prioritization or data reduction in bandwidth-limited situations, and the construction of components for larger, more complex computer vision algorithms.

  17. Solution to reinforcement learning problems with artificial potential field

    Institute of Scientific and Technical Information of China (English)

    XIE Li-juan; XIE Guang-rong; CHEN Huan-wen; LI Xiao-li

    2008-01-01

    A novel method was designed to solve reinforcement learning problems with an artificial potential field. First, a reinforcement learning problem was transformed into a path-planning problem by using an artificial potential field (APF), which is a very appropriate way to model a reinforcement learning problem. Second, a new APF algorithm with a virtual water-flow concept was proposed to overcome the local minimum problem of potential field methods. The performance of the new method was tested on a gridworld problem known as the key-and-door maze. The experimental results show that within 45 trials, good and deterministic policies are found in almost all simulations. In comparison with Wiering's HQ-learning system, which needs 20 000 trials for a stable solution, the proposed method obtains an optimal and stable policy far more quickly. The new method is therefore simple and effective for finding an optimal solution to a reinforcement learning problem.
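
    A minimal sketch of the underlying potential-field computation: an attractive quadratic well at the goal plus repulsive terms inside an influence radius around obstacles, followed by gradient descent on the field. The virtual water-flow mechanism that the paper adds to escape local minima is not included, and all gains are illustrative.

    import numpy as np

    def field_gradient(pos, goal, obstacles, k_att=1.0, k_rep=5.0, d0=2.0):
        grad = k_att * (pos - goal)          # attractive quadratic well
        for obs in obstacles:
            diff = pos - obs
            d = np.linalg.norm(diff)
            if 1e-9 < d < d0:                # repulsion only inside radius d0
                grad += k_rep * (1.0 / d0 - 1.0 / d) / d ** 3 * diff
        return grad

    pos, goal = np.array([0.0, 0.0]), np.array([10.0, 10.0])
    obstacles = [np.array([4.0, 5.0])]
    for _ in range(500):                     # gradient descent on the field
        pos -= 0.05 * field_gradient(pos, goal, obstacles)
    print("final position:", np.round(pos, 2))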

  18. A Functional Approach to Reducing Runaway Behavior and Stabilizing Placements for Adolescents in Foster Care

    Science.gov (United States)

    Clark, Hewitt B.; Crosland, Kimberly A.; Geller, David; Cripe, Michael; Kenney, Terresa; Neff, Bryon; Dunlap, Glen

    2008-01-01

    Teenagers' running from foster placement is a significant problem in the field of child protection. This article describes a functional, behavior analytic approach to reducing running away through assessing the motivations for running, involving the youth in the assessment process, and implementing interventions to enhance the reinforcing value of…

  19. Learning-based adaptive prescribed performance control of postcapture space robot-target combination without inertia identifications

    Science.gov (United States)

    Wei, Caisheng; Luo, Jianjun; Dai, Honghua; Bian, Zilin; Yuan, Jianping

    2018-05-01

    In this paper, a novel learning-based adaptive attitude takeover control method is investigated for the postcapture space robot-target combination with guaranteed prescribed performance in the presence of unknown inertial properties and external disturbance. First, a new static prescribed performance controller is developed to guarantee that all the involved attitude tracking errors are uniformly ultimately bounded, by quantitatively characterizing the transient and steady-state performance of the combination. Then, a learning-based supplementary adaptive strategy based on adaptive dynamic programming is introduced to improve the tracking performance of the static controller in terms of robustness and adaptiveness, utilizing only the input/output data of the combination. Compared with existing works, the prominent advantage is that the unknown inertial properties do not need to be identified in the development of the learning-based adaptive control law, which dramatically decreases the complexity and difficulty of the controller design. Moreover, the transient and steady-state performance is guaranteed a priori by designer-specified performance functions, without resorting to repeated tuning of the controller parameters. Finally, three groups of illustrative examples are employed to verify the effectiveness of the proposed control method.
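
    The prescribed-performance mechanism can be illustrated compactly: an exponentially shrinking envelope ρ(t) bounds the tracking error, and a logarithmic transform maps the constrained error to an unconstrained variable that the controller regulates. The constants and function forms below are illustrative, not the paper's designer-specified performance functions.

    import numpy as np

    def performance_envelope(t, rho0=1.0, rho_inf=0.05, decay=1.5):
        # Funnel rho(t): exponential decay from rho0 to rho_inf bounds the
        # allowed tracking error at every instant (illustrative constants).
        return (rho0 - rho_inf) * np.exp(-decay * t) + rho_inf

    def transformed_error(e, rho):
        # Map the constrained error e in (-rho, rho) to an unconstrained
        # variable; keeping this finite keeps e inside the funnel.
        ratio = np.clip(e / rho, -0.999, 0.999)
        return np.log((1.0 + ratio) / (1.0 - ratio))

    t = np.linspace(0.0, 5.0, 6)
    print("envelope:", np.round(performance_envelope(t), 3))
    print("transformed error:",
          round(float(transformed_error(0.2, performance_envelope(0.0))), 3))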

  20. Assessment of Postflight Locomotor Performance Utilizing a Test of Functional Mobility: Strategic and Adaptive Responses

    Science.gov (United States)

    Warren, L. E.; Mulavara, A. P.; Peters, B. T.; Cohen, H. S.; Richards, J. T.; Miller, C. A.; Brady, R.; Ruttley, T. M.; Bloomberg, J. J.

    2006-01-01

    Space flight induces adaptive modification in sensorimotor function, allowing crewmembers to operate in the unique microgravity environment. This adaptive state, however, is inappropriate for a terrestrial environment. During a re-adaptation period upon their return to Earth, crewmembers experience alterations in sensorimotor function, causing various disturbances in perception, spatial orientation, posture, gait, and eye-head coordination. Following long-duration space flight, sensorimotor dysfunction could prevent an emergency egress from the vehicle, or extend the time required to make one, compromising crew safety and mission objectives. We are investigating two types of motor learning that may interact with each other and influence a crewmember's ability to re-adapt to Earth's gravity environment. In strategic learning, crewmembers make rapid modifications in their motor control strategy emphasizing error reduction. This type of learning may be critical during the first minutes and hours after landing. In adaptive learning, long-term plastic transformations occur, involving morphological changes and synaptic modification. In recent literature these two behavioral components have been associated with separate brain structures that control the execution of motor strategies: the strategic component was linked to the posterior parietal cortex and the adaptive component was linked to the cerebellum (Pisella et al., 2004). The goal of this paper was to demonstrate the relative contributions of the strategic and adaptive components to the re-adaptation process in locomotor control after long-duration space flight missions on the International Space Station (ISS). The Functional Mobility Test (FMT) was developed to assess crewmembers' ability to ambulate postflight from an operational and functional perspective. Sixteen crewmembers were tested preflight (3 sessions) and postflight (days 1, 2, 4, 7, 25) following a long-duration space flight (approx 6 months) on the ISS.

  1. Evaluating spatial memory function in mice: a within-subjects comparison between the water maze test and its adaptation to dry land.

    Science.gov (United States)

    Llano Lopez, L; Hauser, J; Feldon, J; Gargiulo, P A; Yee, B K

    2010-05-01

    The Morris water maze (WM) is a common spatial memory test in rats that has been adapted for evaluating genetic manipulations in mice. One major acknowledged problem of this cross-species translation is floating. We investigated here in mice the feasibility and practicality of an alternative paradigm, the cheeseboard (CB), a dry-land version of the WM, in a within-subject design allowing direct comparison with the conventional WM. Under identical task demands (reference or working memory), mice learned in the CB as efficiently as in the WM. Furthermore, individual differences in learning rate correlated between the two reference memory tests conducted separately in the two mazes. However, no such correlation was found with respect to reference memory retention or working memory performance. This study demonstrated that the CB is an effective alternative to the WM as a spatial cognition test. Additional tests in the CB confirmed that the mice relied on extra-maze cues in their spatial search. We would recommend the CB as a valuable addition to, rather than a replacement of, the WM in phenotyping transgenic mice, because the two apparatuses might diverge in their ability to detect individual differences in various domains of mnemonic function.

  2. Humanoids Learning to Walk: A Natural CPG-Actor-Critic Architecture.

    Science.gov (United States)

    Li, Cai; Lowe, Robert; Ziemke, Tom

    2013-01-01

    The identification of learning mechanisms for locomotion has been the subject of much research for some time, but many challenges remain. Dynamic systems theory (DST) offers a novel approach to humanoid learning through environmental interaction. Reinforcement learning (RL) has offered a promising method to adaptively link the dynamic system to the environment it interacts with via a reward-based value system. In this paper, we propose a model that integrates the above perspectives and applies it to the case of a humanoid (NAO) robot learning to walk, an ability that emerges from its value-based interaction with the environment. In the model, a simplified central pattern generator (CPG) architecture inspired by neuroscientific research and DST is integrated with an actor-critic approach to RL (cpg-actor-critic). In the cpg-actor-critic architecture, least-squares temporal-difference-based learning converges to the optimal solution quickly by using natural gradient learning and balancing exploration and exploitation. Furthermore, rather than using a traditional (designer-specified) reward, it uses a dynamic value function as a stability indicator that adapts to the environment. The results obtained are analyzed using a novel DST-based embodied cognition approach. Learning to walk, from this perspective, is a process of integrating levels of sensorimotor activity and value.

  3. Tunnel Ventilation Control Using Reinforcement Learning Methodology

    Science.gov (United States)

    Chu, Baeksuk; Kim, Dongnam; Hong, Daehie; Park, Jooyoung; Chung, Jin Taek; Kim, Tae-Hyung

    The main purpose of a tunnel ventilation system is to maintain the CO pollutant concentration and VI (visibility index) at adequate levels to provide drivers with a comfortable and safe driving environment, while minimizing the power consumed to operate the system. To achieve these objectives, the control algorithm used in this research is the reinforcement learning (RL) method. RL is a goal-directed learning of a mapping from situations to actions, without relying on exemplary supervision or complete models of the environment; its goal is to maximize a reward, which is an evaluative feedback from the environment. In constructing the reward for the tunnel ventilation system, the two objectives listed above are included, that is, maintaining an adequate level of pollutants and minimizing power consumption. An RL algorithm based on an actor-critic architecture and a gradient-following algorithm is adopted for the tunnel ventilation system. Simulation results obtained with real data collected from an existing tunnel ventilation system, together with real experimental verification, are provided in this paper. It is confirmed that with the suggested controller, the pollutant level inside the tunnel was well maintained under the allowable limit, and energy consumption improved compared to the conventional control scheme.

  4. A candidate multimodal functional genetic network for thermal adaptation

    Directory of Open Access Journals (Sweden)

    Katharina C. Wollenberg Valero

    2014-09-01

    Vertebrate ectotherms such as reptiles provide ideal organisms for the study of adaptation to environmental thermal change. Comparative genomic and exomic studies can recover markers that diverge between warm- and cold-adapted lineages, but the genes that are functionally related to thermal adaptation may be difficult to identify. We here used a bioinformatics genome-mining approach to predict and identify functions for suitable candidate markers for thermal adaptation in the chicken. We first established a framework of candidate functions for such markers, and then compiled the literature on genes known to adapt to the thermal environment in different lineages of vertebrates. We then identified them in the genomes of human, chicken, and the lizard Anolis carolinensis, and established a functional genetic interaction network in the chicken. Surprisingly, markers initially identified from diverse lineages of vertebrates such as human and fish were all in close functional relationship with each other and more associated than expected by chance. This indicates that the general genetic functional network for thermoregulation and/or thermal adaptation to the environment might be regulated via similar evolutionarily conserved pathways in different vertebrate lineages. We were able to identify seven functions that were statistically overrepresented in this network, corresponding to four of our originally predicted functions plus three unpredicted functions. We describe this network as multimodal: central regulator genes with the function of relaying thermal signal (1) affect genes with different cellular functions, namely (2) lipoprotein metabolism, (3) membrane channels, (4) stress response, (5) response to oxidative stress, (6) muscle contraction and relaxation, and (7) vasodilation, vasoconstriction and regulation of blood pressure. This network constitutes a novel resource for the study of thermal adaptation in the closely related nonavian reptiles.

  5. Adaptive Critic Nonlinear Robust Control: A Survey.

    Science.gov (United States)

    Wang, Ding; He, Haibo; Liu, Derong

    2017-10-01

    Adaptive dynamic programming (ADP) and reinforcement learning are closely related when performing intelligent optimization. They are both regarded as promising methods involving the important components of evaluation and improvement, against the background of information technology such as artificial intelligence, big data, and deep learning. Although great progress has been achieved and surveyed in addressing nonlinear optimal control problems, the research on the robustness of ADP-based control strategies under uncertain environments has not been fully summarized. Hence, this survey reviews the recent main results of adaptive-critic-based robust control design for continuous-time nonlinear systems. The ADP-based nonlinear optimal regulation is reviewed, followed by robust stabilization of nonlinear systems with matched uncertainties, guaranteed cost control design of unmatched plants, and decentralized stabilization of interconnected systems. Additionally, further comprehensive discussions are presented, including event-based robust control design, improvement of the critic learning rule, nonlinear H∞ control design, and several notes on future perspectives. By applying the ADP-based optimal and robust control methods to a practical power system and an overhead crane plant, two typical examples are provided to verify the effectiveness of the theoretical results. Overall, this survey should help promote the development of adaptive critic control methods with robustness guarantees and the construction of higher-level intelligent systems.

  6. Adaptive, Distributed Control of Constrained Multi-Agent Systems

    Science.gov (United States)

    Bieniawski, Stefan; Wolpert, David H.

    2004-01-01

    Product Distribution (PD) theory was recently developed as a broad framework for analyzing and optimizing distributed systems. Here we demonstrate its use for adaptive distributed control of Multi-Agent Systems (MASs), i.e., for distributed stochastic optimization using MASs. First we review one motivation of PD theory, as the information-theoretic extension of conventional full-rationality game theory to the case of bounded rational agents. In this extension the equilibrium of the game is the optimizer of a Lagrangian of the probability distribution on the joint state of the agents. When the game in question is a team game with constraints, that equilibrium optimizes the expected value of the team game utility, subject to those constraints. One common way to find that equilibrium is to have each agent run a Reinforcement Learning (RL) algorithm. PD theory reveals this to be a particular type of search algorithm for minimizing the Lagrangian. Typically that algorithm is quite inefficient. A more principled alternative is to use a variant of Newton's method to minimize the Lagrangian. Here we compare this alternative to RL-based search in three sets of computer experiments. These are the N Queens problem and the bin-packing problem from the optimization literature, and the Bar problem from the distributed RL literature. Our results confirm that the PD-theory-based approach outperforms the RL-based scheme in all three domains.

  7. Reinforcement Learning in Autism Spectrum Disorder

    Directory of Open Access Journals (Sweden)

    Manuela Schuetze

    2017-11-01

    Full Text Available Early behavioral interventions are recognized as integral to standard care in autism spectrum disorder (ASD), and often focus on reinforcing desired behaviors (e.g., eye contact) and reducing the presence of atypical behaviors (e.g., echoing others' phrases). However, the efficacy of these programs is mixed. Reinforcement learning relies on neurocircuitry that has been reported to be atypical in ASD: prefrontal-sub-cortical circuits, amygdala, brainstem, and cerebellum. Thus, early behavioral interventions rely on neurocircuitry that may function atypically in at least a subset of individuals with ASD. Recent work has investigated physiological, behavioral, and neural responses to reinforcers to uncover differences in motivation and learning in ASD. We will synthesize this work to identify promising avenues for future research that ultimately can be used to enhance the efficacy of early intervention.

  8. Humanoids Learning to Walk: a Natural CPG-Actor-Critic Architecture

    Directory of Open Access Journals (Sweden)

    CAI LI

    2013-04-01

    Full Text Available The identification of learning mechanisms for locomotion has been the subject of much research for some time, but many challenges remain. Dynamic systems theory (DST) offers a novel approach to humanoid learning through environmental interaction. Reinforcement learning (RL) has offered a promising method to adaptively link the dynamic system to the environment it interacts with via a reward-based value system. In this paper, we propose a model that integrates the above perspectives and applies it to the case of a humanoid (NAO) robot learning to walk, an ability that emerges from its value-based interaction with the environment. In the model, a simplified central pattern generator (CPG) architecture inspired by neuroscientific research and DST is integrated with an actor-critic approach to RL (CPG-actor-critic). In the CPG-actor-critic architecture, least-squares temporal difference (LSTD) based learning converges to the optimal solution quickly by using the natural gradient and by balancing exploration and exploitation. Furthermore, rather than using a traditional (designer-specified) reward, it uses a dynamic value function as a stability indicator (SI) that adapts to the environment. The results obtained are analyzed and explained by using a novel DST-based embodied cognition approach. Learning to walk, from this perspective, is a process of integrating sensorimotor levels and value.

  9. Modulation transfer function estimation of optical lens system by adaptive neuro-fuzzy methodology

    Science.gov (United States)

    Petković, Dalibor; Shamshirband, Shahaboddin; Pavlović, Nenad T.; Anuar, Nor Badrul; Kiah, Miss Laiha Mat

    2014-07-01

    The quantitative assessment of image quality is an important consideration in any type of imaging system. The modulation transfer function (MTF) is a graphical description of the sharpness and contrast of an imaging system or of its individual components; it is also known as the spatial frequency response. The MTF curve has different meanings according to the corresponding frequency. The MTF of an optical system specifies the contrast transmitted by the system as a function of image size, and is determined by the inherent optical properties of the system. In this study, an adaptive neuro-fuzzy inference system (ANFIS) estimator is designed and adapted to estimate the MTF value of the actual optical system. The neural network in ANFIS adjusts the parameters of the membership functions in the fuzzy logic of the fuzzy inference system. The backpropagation learning algorithm is used for training this network. This intelligent estimator is implemented using Matlab/Simulink and its performance is investigated. The simulation results presented in this paper show the effectiveness of the developed method.

  10. Differences in Brain Adaptive Functional Reorganization in Right and Left Total Brachial Plexus Injury Patients.

    Science.gov (United States)

    Feng, Jun-Tao; Liu, Han-Qiu; Xu, Jian-Guang; Gu, Yu-Dong; Shen, Yun-Dong

    2015-09-01

    Total brachial plexus avulsion injury (BPAI) results in the total functional loss of the affected limb and induces extensive brain functional reorganization. However, because the dominant hand is responsible for more cognitive-related tasks, injuries on this side induce more adaptive changes in brain function. In this article, we explored the differences in brain functional reorganization after injuries in unilateral BPAI patients. We applied resting-state functional magnetic resonance imaging scanning to 10 left and 10 right BPAI patients and 20 healthy control subjects. The amplitude of low-frequency fluctuation (ALFF), a resting-state index, was calculated for all patients as an indication of the functional activity level of the brain. Two-sample t-tests were performed between left BPAI patients and controls, right BPAI patients and controls, and between left and right BPAI patients. Two-sample t-tests of the ALFF values revealed that right BPAIs induced larger-scale brain reorganization than did left BPAIs. Both left and right BPAIs elicited a decreased ALFF value in the right precuneus, and right BPAI patients exhibited increased ALFF values in a greater number of brain regions than left BPAI patients, including the inferior temporal gyrus, lingual gyrus, calcarine sulcus, and fusiform gyrus. Our results revealed that right BPAIs induced greater extents of brain functional reorganization than left BPAIs, which reflects the relatively more extensive adaptive process that follows injuries of the dominant hand. Copyright © 2015 Elsevier Inc. All rights reserved.

  11. Human reinforcement learning subdivides structured action spaces by learning effector-specific values

    OpenAIRE

    Gershman, Samuel J.; Pesaran, Bijan; Daw, Nathaniel D.

    2009-01-01

    Humans and animals are endowed with a large number of effectors. Although this enables great behavioral flexibility, it presents an equally formidable reinforcement learning problem of discovering which actions are most valuable, due to the high dimensionality of the action space. An unresolved question is how neural systems for reinforcement learning – such as prediction error signals for action valuation associated with dopamine and the striatum – can cope with this “curse of dimensionality...

  12. Multi Car Elevator Control by using Learning Automaton

    Science.gov (United States)

    Shiraishi, Kazuaki; Hamagami, Tomoki; Hirata, Hironori

    We study an adaptive control technique for multi car elevators (MCEs) by adopting learning automata (LAs). The MCE is a high-performance, near-future elevator system with multiple shafts and multiple cars. A strong point of the system is that it realizes a large carrying capacity in a small shaft area. However, since the operation is complicated, realizing efficient MCE control is difficult for top-down approaches. For example, "bunching up together" is a typical phenomenon in a simple traffic environment like the MCE. Furthermore, adapting to a varying environment and changing configuration requirements is a serious issue in real elevator service. To resolve these issues, the control system of each car in the MCE system must behave autonomously, and the learning automaton, as a solution to this requirement, is well suited to such simple traffic control. First, we assign a stochastic automaton (SA) to each car control system. Each SA then varies its stochastic behavior distribution to adapt to the environment, in which its policy is evaluated with passenger waiting times. That is, an LA learns the environment autonomously. Using the LA-based control technique, MCE operation efficiency is evaluated through simulation experiments. The results show that the technique reduces waiting times efficiently, and we confirm that the system can adapt to a dynamic environment.
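
    The record does not give the update rule, but a common choice for such automata is the linear reward-inaction scheme. The sketch below shows that scheme on a stand-in feedback signal rather than the MCE simulator; the action set, probabilities, and reward rates are illustrative.

```python
# Hedged sketch of a linear reward-inaction (L_R-I) learning automaton.
# feedback() is a toy stand-in for the waiting-time-based evaluation
# described in the record, not the MCE model itself.
import numpy as np

rng = np.random.default_rng(0)
n_actions = 3                          # e.g., which car serves a hall call
p = np.ones(n_actions) / n_actions     # action-probability vector
ALPHA = 0.05                           # learning rate

def feedback(action):
    # toy environment: action 1 yields reward most often
    return rng.random() < (0.8 if action == 1 else 0.3)

for _ in range(2000):
    a = rng.choice(n_actions, p=p)
    if feedback(a):                    # on reward, shift mass toward a
        p = (1 - ALPHA) * p
        p[a] += ALPHA
    # on penalty: inaction, probabilities unchanged

print(p.round(2))                      # mass concentrates on action 1
```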

  13. The drift diffusion model as the choice rule in reinforcement learning.

    Science.gov (United States)

    Pedersen, Mads Lund; Frank, Michael J; Biele, Guido

    2017-08-01

    Current reinforcement-learning models often assume simplified decision processes that do not fully reflect the dynamic complexities of choice processes. Conversely, sequential-sampling models of decision making account for both choice accuracy and response time, but assume that decisions are based on static decision values. To combine these two computational models of decision making and learning, we implemented reinforcement-learning models in which the drift diffusion model describes the choice process, thereby capturing both within- and across-trial dynamics. To exemplify the utility of this approach, we quantitatively fit data from a common reinforcement-learning paradigm using hierarchical Bayesian parameter estimation, and compared model variants to determine whether they could capture the effects of stimulant medication in adult patients with attention-deficit hyperactivity disorder (ADHD). The model with the best relative fit provided a good description of the learning process, choices, and response times. A parameter recovery experiment showed that the hierarchical Bayesian modeling approach enabled accurate estimation of the model parameters. The model approach described here, using simultaneous estimation of reinforcement-learning and drift diffusion model parameters, shows promise for revealing new insights into the cognitive and neural mechanisms of learning and decision making, as well as the alteration of such processes in clinical groups.
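
    As a rough, non-hierarchical sketch of the combination described above (the paper itself uses hierarchical Bayesian estimation), the snippet below couples a delta-rule value update with a drift diffusion choice process whose drift is the scaled Q-value difference, so one parameter set produces both choices and response times; all constants and the bandit are illustrative.

```python
# Sketch: Q-learning with a drift-diffusion choice rule on a toy bandit.
# ALPHA, SCALE, BOUND, DT and the reward probabilities are illustrative.
import numpy as np

rng = np.random.default_rng(1)
ALPHA, SCALE, BOUND, DT = 0.1, 2.0, 1.0, 0.001
Q = np.zeros(2)
p_reward = np.array([0.8, 0.2])

def ddm_choice(drift):
    # Euler-Maruyama simulation of a diffusion to one of two bounds
    x, t = 0.0, 0.0
    while abs(x) < BOUND:
        x += drift * DT + rng.normal(0.0, np.sqrt(DT))
        t += DT
    return (0 if x > 0 else 1), t      # upper bound -> option 0

for _ in range(500):
    choice, rt = ddm_choice(SCALE * (Q[0] - Q[1]))  # rt: simulated RT
    r = float(rng.random() < p_reward[choice])
    Q[choice] += ALPHA * (r - Q[choice])            # delta-rule update

print(Q.round(2))
```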

  14. Learning from neural control.

    Science.gov (United States)

    Wang, Cong; Hill, David J

    2006-01-01

    One of the amazing successes of biological systems is their ability to "learn by doing" and so adapt to their environment. In this paper, first, a deterministic learning mechanism is presented, by which an appropriately designed adaptive neural controller is capable of learning closed-loop system dynamics during tracking control to a periodic reference orbit. Among various neural network (NN) architectures, the localized radial basis function (RBF) network is employed. A property of persistence of excitation (PE) for RBF networks is established, and a partial PE condition of closed-loop signals, i.e., the PE condition of a regression subvector constructed out of the RBFs along a periodic state trajectory, is proven to be satisfied. Accurate NN approximation for closed-loop system dynamics is achieved in a local region along the periodic state trajectory, and a learning ability is implemented during a closed-loop feedback control process. Second, based on the deterministic learning mechanism, a neural learning control scheme is proposed which can effectively recall and reuse the learned knowledge to achieve closed-loop stability and improved control performance. The significance of this paper is that the presented deterministic learning mechanism and the neural learning control scheme provide elementary components toward the development of a biologically-plausible learning and control methodology. Simulation studies are included to demonstrate the effectiveness of the approach.
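
    The key ingredient above is a localized RBF network whose weights converge along a recurrent trajectory. The sketch below shows that flavor only: Gaussian RBFs updated by a gradient-type rule while a periodic orbit is repeatedly revisited. The target function, widths, and learning rate are illustrative placeholders, not the paper's closed-loop dynamics.

```python
# Sketch: localized RBF approximation along a periodic trajectory.
# f, WIDTH, ETA and the center layout are illustrative placeholders.
import numpy as np

centers = np.linspace(0, 2 * np.pi, 20)   # RBF centers along the orbit
WIDTH, ETA = 0.4, 0.2
w = np.zeros_like(centers)                # adjustable output weights

def phi(x):
    # localized Gaussian regressors (only nearby RBFs respond)
    return np.exp(-((x - centers) ** 2) / (2 * WIDTH ** 2))

f = np.sin                                # stand-in for unknown dynamics

for _ in range(200):                      # repeatedly revisit the orbit
    for x in np.linspace(0, 2 * np.pi, 50):
        e = f(x) - w @ phi(x)             # local approximation error
        w += ETA * e * phi(x)             # gradient-type weight update

x_test = 1.3
print(w @ phi(x_test), np.sin(x_test))    # accurate along the orbit
```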

  15. Learning to push and learning to move: The adaptive control of contact forces

    Directory of Open Access Journals (Sweden)

    Maura eCasadio

    2015-11-01

    Full Text Available To be successful at manipulating objects one needs to apply simultaneously well controlled movements and contact forces. We present a computational theory of how the brain may successfully generate a vast spectrum of interactive behaviors by combining two independent processes. One process is competent to control movements in free space and the other is competent to control contact forces against rigid constraints. Free space and rigid constraints are singularities at the boundaries of a continuum of mechanical impedance. Within this continuum, forces and motions occur in compatible pairs connected by the equations of Newtonian dynamics. The force applied to an object determines its motion. Conversely, inverse dynamics determine a unique force trajectory from a movement trajectory. In this perspective, we describe motor learning as a process leading to the discovery of compatible force/motion pairs. The learned compatible pairs constitute a local representation of the environment's mechanics. Experiments on force field adaptation have already provided evidence that the brain is able to predict and compensate for the forces encountered when one is attempting to generate a motion. Here, we tested the theory in the dual case, i.e., when one attempts to apply a desired contact force against a simulated rigid surface. If the surface becomes unexpectedly compliant, the contact point moves as a function of the applied force and this causes the applied force to deviate from its desired value. We found that, through repeated attempts at generating the desired contact force, subjects discovered the unique compatible hand motion. When, after learning, the rigid contact was unexpectedly restored, subjects displayed aftereffects of learning, consistent with the concurrent operation of a motion control system and a force control system. Together, theory and experiment support a new and broader view of modularity in the coordinated control of forces and

  16. Reinforcement learning in complementarity game and population dynamics.

    Science.gov (United States)

    Jost, Jürgen; Li, Wei

    2014-02-01

    We systematically test and compare different reinforcement learning schemes in a complementarity game [J. Jost and W. Li, Physica A 345, 245 (2005)] played between members of two populations. More precisely, we study the Roth-Erev, Bush-Mosteller, and SoftMax reinforcement learning schemes. A modified version of Roth-Erev with a power exponent of 1.5, as opposed to 1 in the standard version, performs best. We also compare these reinforcement learning strategies with evolutionary schemes. This gives insight into aspects like the issue of quick adaptation as opposed to systematic exploration or the role of learning rates.
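
    One plausible reading of the modified Roth-Erev scheme (the exact placement of the 1.5 exponent in the paper may differ) is that action propensities accumulate payoffs with gradual forgetting, and choice probabilities are propensities raised to the power 1.5. The payoffs below are toy stand-ins for the complementarity game.

```python
# Hedged sketch of a Roth-Erev-style scheme with power exponent 1.5.
# Payoffs, forgetting rate and noise level are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(2)
LAM, PHI = 1.5, 0.05                  # power exponent, forgetting rate
q = np.ones(3)                        # action propensities
payoff = np.array([0.2, 0.5, 0.9])    # toy mean payoffs

for _ in range(3000):
    p = q ** LAM
    p /= p.sum()                      # power-law choice probabilities
    a = rng.choice(3, p=p)
    r = payoff[a] + 0.1 * rng.standard_normal()
    q *= (1 - PHI)                    # forgetting
    q[a] += max(r, 0.0)               # reinforce the chosen action

print((q ** LAM / (q ** LAM).sum()).round(2))
```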

  17. GrDHP: a general utility function representation for dual heuristic dynamic programming.

    Science.gov (United States)

    Ni, Zhen; He, Haibo; Zhao, Dongbin; Xu, Xin; Prokhorov, Danil V

    2015-03-01

    A general utility function representation is proposed to provide the required derivable and adjustable utility function for dual heuristic dynamic programming (DHP) design. Goal representation DHP (GrDHP) is presented with a goal network on top of the traditional DHP design. This goal network provides a general mapping between the system states and the derivatives of the utility function. With this proposed architecture, we can obtain the required derivatives of the utility function directly from the goal network. In addition, instead of a fixed predefined utility function as in the literature, we conduct an online learning process for the goal network so that the derivatives of the utility function can be adaptively tuned over time. We provide the control performance of both the proposed GrDHP and the traditional DHP approaches under the same environment and parameter settings. The statistical simulation results and a snapshot of the system variables are presented to demonstrate the improved learning and control performance. We also apply both approaches to a power system example to further demonstrate the control capabilities of the GrDHP approach.

  18. TEXPLORE temporal difference reinforcement learning for robots and time-constrained domains

    CERN Document Server

    Hester, Todd

    2013-01-01

    This book presents and develops new reinforcement learning methods that enable fast and robust learning on robots in real-time. Robots have the potential to solve many problems in society, because of their ability to work in dangerous places doing necessary jobs that no one wants or is able to do. One barrier to their widespread deployment is that they are mainly limited to tasks where it is possible to hand-program behaviors for every situation that may be encountered. For robots to meet their potential, they need methods that enable them to learn and adapt to novel situations that they were not programmed for. Reinforcement learning (RL) is a paradigm for learning sequential decision making processes and could solve the problems of learning and adaptation on robots. This book identifies four key challenges that must be addressed for an RL algorithm to be practical for robotic control tasks. These RL for Robotics Challenges are: 1) it must learn in very few samples; 2) it must learn in domains with continuou...

  19. Advances in the indirect, descriptive, and experimental approaches to the functional analysis of problem behavior.

    Science.gov (United States)

    Wightman, Jade; Julio, Flávia; Virués-Ortega, Javier

    2014-05-01

    Experimental functional analysis is an assessment methodology to identify the environmental factors that maintain problem behavior in individuals with developmental disabilities and in other populations. Functional analysis provides the basis for the development of reinforcement-based approaches to treatment. This article reviews the procedures, validity, and clinical implementation of the methodological variations of functional analysis and function-based interventions. We present six variations of functional analysis methodology in addition to the typical functional analysis: brief functional analysis, single-function tests, latency-based functional analysis, functional analysis of precursors, and trial-based functional analysis. We also present the three general categories of function-based interventions: extinction, antecedent manipulation, and differential reinforcement. Functional analysis methodology is a valid and efficient approach to the assessment of problem behavior and the selection of treatment strategies.

  20. On Approximation of Hyper-geometric Function Values of a Special Class

    Directory of Open Access Journals (Sweden)

    P. L. Ivankov

    2017-01-01

    Full Text Available Investigations of the arithmetic properties of hyper-geometric function values make it possible to single out two trends, namely Siegel's method and methods based on the effective construction of a linear approximating form; there are also methods combining both approaches. Siegel's method allows obtaining the most general results concerning the abovementioned problems, and in many cases it was used to establish the algebraic independence of the values of the corresponding functions. Although the effective methods do not allow obtaining propositions of such generality, they nevertheless have some advantages, among which one can distinguish at least two: a higher precision of the quantitative results obtained by effective methods, and the possibility to study hyper-geometric functions with irrational parameters. In this paper we apply the effective construction to estimate a measure of the linear independence of hyper-geometric function values over an imaginary quadratic field. The functions themselves were chosen in a special way so that it would be possible to demonstrate a new approach to the effective construction of a linear approximating form. This approach also makes it possible to extend the well-known effective construction methods of linear approximating forms for polylogarithms to functions of more general type. To obtain the arithmetic result we had to establish the linear independence of the functions under consideration over the field of rational functions. It is apparently impossible to apply directly the known theorems containing sufficient (and in some cases necessary and sufficient) conditions for the systems of functions appearing in the theorems mentioned. For this reason, a special technique has been developed to solve this problem. The paper presents the obtained arithmetic results concerning the values of integral functions, but, with appropriate alterations, the theorems proved can be adapted to

  1. Reinforcement learning method via combining demonstration data and evolutionary optimization

    Institute of Scientific and Technical Information of China (English)

    宋拴; 俞扬

    2014-01-01

    Reinforcement learning aims at learning an optimal policy that maximizes the long-term reward from interactions with the environment. Since the environment feedback is commonly delayed after a sequence of actions, reinforcement learning has to tackle the problem of searching a huge policy space, and thus an effective search is the key to a successful approach. Previous studies have explored various ways to achieve an effective search: one effective way is employing an evolutionary algorithm as the search method, and another direction is introducing user demonstration data to guide the search. However, the combination of these two effective methods has rarely been studied. This work investigates the combination of user demonstrations and evolutionary optimization, and proposes the iNEAT+Q approach, which trains a neural network using the demonstration data as well as integrating the demonstration data into the fitness function that guides the evolutionary optimization. A preliminary empirical study shows that iNEAT+Q clearly improves upon NEAT+Q, a classical evolutionary reinforcement learning approach that uses no demonstration data.
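
    A minimal sketch of the two roles demonstration data plays in such an approach: it can seed (pre-train) the policy parameters, and it can add an agreement term to the fitness that the evolutionary search maximizes. Everything below (the linear policy, toy return, and (1+1)-style search) is a placeholder, not the NEAT machinery of iNEAT+Q.

```python
# Sketch: demonstration-guided fitness for an evolutionary RL loop.
# act(), rollout_return(), demos and LAMBDA are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(3)
demos = [(np.array([0.1, 0.2]), 1),   # (state, demonstrated action)
         (np.array([0.9, 0.4]), 0)]
LAMBDA = 0.3                          # weight of the demonstration term

def act(w, s):                        # toy linear policy
    return int(w @ s > 0)

def rollout_return(w):                # stand-in for an episode's return
    return -abs(w @ np.array([1.0, -1.0]) - 0.5)

def fitness(w):
    agree = np.mean([act(w, s) == a for s, a in demos])
    return rollout_return(w) + LAMBDA * agree

w = rng.standard_normal(2)            # pre-training could seed this from demos
for _ in range(500):                  # minimal (1+1)-style evolutionary loop
    cand = w + 0.1 * rng.standard_normal(2)
    if fitness(cand) >= fitness(w):
        w = cand
print(w.round(2), round(float(fitness(w)), 2))
```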

  2. OPUS One: An Intelligent Adaptive Learning Environment Using Artificial Intelligence Support

    Science.gov (United States)

    Pedrazzoli, Attilio

    2010-06-01

    AI-based tutoring and learning path adaptation are well known concepts in e-learning scenarios today and are increasingly applied in modern learning environments. In order to gain more flexibility and to enhance existing e-learning platforms, the OPUS One LMS Extension package enables a generic Intelligent Tutored Adaptive Learning Environment, based on a holistic Multidimensional Instructional Design Model (PENTHA ID Model), adding AI-based tutoring and adaptation functionality to existing Web-based e-learning systems. Relying on "real time" adapted profiles, it allows content and course authors to apply a dynamic course design, supporting tutored, collaborative sessions and activities, as suggested by modern pedagogy. The concept presented combines a personalized level of surveillance with learning-activity and learning-path adaptation suggestions to ensure the student's learning motivation and learning success. The OPUS One concept allows the implementation of an advanced tutoring approach combining "expert based" e-tutoring with the more "personal" human tutoring function. It supplies the "Human Tutor" with precise, extended course activity data and "adaptation" suggestions based on predefined subject matter rules. The concept architecture is modular, allowing a personalized platform configuration.

  3. Scenario-based fitted Q-iteration for adaptive control of water reservoir systems under uncertainty

    Science.gov (United States)

    Bertoni, Federica; Giuliani, Matteo; Castelletti, Andrea

    2017-04-01

    Over recent years, mathematical models have largely been used to support the planning and management of water resources systems. Yet the increasing uncertainties in their inputs - due to increased variability in the hydrological regimes - are a major challenge to the optimal operation of these systems. Such uncertainty, boosted by the projected changing climate, violates the stationarity principle generally used for describing hydro-meteorological processes, which assumes time-persisting statistical characteristics of a given variable as inferred from historical data. As this principle is unlikely to be valid in the future, the probability density function used for modeling stochastic disturbances (e.g., inflows) becomes an additional uncertain parameter of the problem, which can be described in a deterministic, set-membership fashion. This study contributes a novel method for designing optimal, adaptive policies for controlling water reservoir systems under climate-related uncertainty. The proposed method, called scenario-based Fitted Q-Iteration (sFQI), extends the original Fitted Q-Iteration algorithm by enlarging the state space to include the space of the uncertain system parameters (i.e., the uncertain climate scenarios). As a result, sFQI embeds the set-membership uncertainty of the future inflow scenarios in the action-value function and is able to approximate, with a single learning process, the optimal control policy associated with any scenario included in the uncertainty set. The method is demonstrated on a synthetic water system, consisting of a regulated lake operated to ensure reliable water supply to downstream users. Numerical results show that the sFQI algorithm successfully identifies adaptive solutions to operate the system under different inflow scenarios, which outperform the control policy designed under historical conditions. Moreover, the sFQI policy generalizes over inflow scenarios not directly experienced during the policy design
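
    The core trick is easy to show in miniature: append the uncertain scenario parameter to the state and fit one action-value function on transitions pooled over all scenarios. The sketch below does this with a toy storage problem and scikit-learn's extremely randomized trees (a standard regressor for fitted Q-iteration); the dynamics, costs, and scenarios are illustrative placeholders.

```python
# Sketch: scenario-augmented fitted Q-iteration on a toy storage problem.
# step(), thetas, ACTIONS and GAMMA are illustrative placeholders.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(4)
GAMMA, ACTIONS = 0.95, np.array([-1.0, 0.0, 1.0])
thetas = [0.8, 1.0, 1.2]                   # candidate inflow scenarios

def step(s, a, theta):                     # toy storage dynamics and cost
    s2 = np.clip(s + 0.5 * theta - a, 0.0, 10.0)
    return s2, (s2 - 5.0) ** 2             # cost: deviation from a target

D = []                                     # transitions pooled over scenarios
for theta in thetas:
    for _ in range(500):
        s, a = rng.uniform(0, 10), rng.choice(ACTIONS)
        s2, c = step(s, a, theta)
        D.append((s, theta, a, c, s2))
D = np.array(D)
X = D[:, :3]                               # (state, scenario, action)

Q = None
for _ in range(20):                        # fitted Q-iteration sweeps
    if Q is None:
        y = D[:, 3]
    else:
        nxt = np.stack([Q.predict(np.column_stack(
            [D[:, 4], D[:, 1], np.full(len(D), a)])) for a in ACTIONS])
        y = D[:, 3] + GAMMA * nxt.min(axis=0)   # cost-minimizing backup
    Q = ExtraTreesRegressor(n_estimators=50, random_state=0).fit(X, y)

# greedy costs at storage 5.0 under scenario theta = 1.0
print(Q.predict([[5.0, 1.0, a] for a in ACTIONS]).round(2))
```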

  4. Review Essay: The Multiple Roles and Functions of Evaluation in the Context of E-Learning Programs

    Directory of Open Access Journals (Sweden)

    Thomas Link

    2005-01-01

    Full Text Available The German initiative "New Media in Education—the Higher Education Sector" is well documented. The present volume describes the project's evaluation concepts and preliminary results. In four chapters about goals, methodology, and possible future directions of evaluation research as well as some presentations of e-learning projects, this book offers a rich overview of appropriate evaluation models from fields such as psychology, the social sciences and quality management. This compilation encompasses theoretical works on the concepts of evaluation as well as presentations of actual evaluation studies. The reader thus gains insight into the extent of the requests and expectations an evaluation team has to satisfy as well as the process of implementing e-learning in a university context. The articles in this book contain thought-provoking ideas like Sigmar-Olaf TERGAN's assertion that there is no automatic relationship between the quality of an e-learning program and students' learning outcomes. This could lead us to conclude that we have to put more emphasis on situational parameters and that we have to use methods that are capable of capturing the different perspectives of those involved. While many authors accentuate the need to triangulate data sources, methods, theories and observers, the empirical method used most often in the context of e-learning is surveys, if possible online. This difference leads to questions about the function evaluation studies fulfill for e-learning programs. Karin HAUBRICH's demand that e-learning programs must be allowed to fail seems especially important here in order to make evaluation appear less as a control instrument and more as a way to get reliable feedback and to provide a catalyst for new developments. After reading this book, one might have the impression—and one might ask why this is the case—that e-learning requires evaluation in greater depth than "traditional" forms of teaching. An argument put

  5. Populists as Chameleons? An Adaptive Learning Approach to the Rise of Populist Politicians

    Directory of Open Access Journals (Sweden)

    Jasper Muis

    2015-04-01

    Full Text Available This paper envisions populism as a vote- and attention-maximizing strategy. It applies an adaptive learning approach to understand successes of populist party leaders. I assume that populists are ideologically flexible and continually search for a more beneficial policy position, in terms of both electoral support and media attention, by retaining political claims that yield positive feedback and discard those that encounter negative feedback. This idea is empirically tested by analyzing the Dutch populist leader Pim Fortuyn and the development of his stance about immigration and integration issues. In contrast to the conventional wisdom, the results do not show any empirical support for the claim that Fortuyn was ideologically driven by the opinion polls or by media publicity during the 2002 Dutch parliamentary election campaign. The findings thus suggest that populist parties are perhaps less distinctive in their strategies from mainstream parties than often claimed.

  6. Self-Play and Using an Expert to Learn to Play Backgammon with Temporal Difference Learning

    NARCIS (Netherlands)

    Wiering, Marco A.

    2010-01-01

    A promising approach to learn to play board games is to use reinforcement learning algorithms that can learn a game position evaluation function. In this paper we examine and compare three different methods for generating training games: 1) Learning by self-play, 2) Learning by playing against an
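
    A minimal sketch of the underlying mechanism, TD(0) learning of a position-evaluation function: each position's predicted winning probability is nudged toward the evaluation of the next position, and the final position toward the game outcome. The game is abstracted to random feature vectors here; nothing backgammon-specific is modeled.

```python
# Sketch: TD(0) updates of a logistic position evaluator during self-play.
# Feature vectors and outcomes are random stand-ins for real game positions.
import numpy as np

rng = np.random.default_rng(5)
ALPHA, N_FEATURES = 0.01, 8
w = np.zeros(N_FEATURES)

def evaluate(x):
    # predicted probability that the side to move wins
    return 1.0 / (1.0 + np.exp(-w @ x))

def td_step(x, target):
    v = evaluate(x)
    return ALPHA * (target - v) * v * (1.0 - v) * x   # logistic gradient

for _ in range(1000):                  # one simulated "game" per iteration
    positions = [rng.standard_normal(N_FEATURES) for _ in range(30)]
    outcome = float(rng.random() < 0.5)
    for x, x_next in zip(positions, positions[1:]):
        w += td_step(x, evaluate(x_next))   # bootstrap from next position
    w += td_step(positions[-1], outcome)    # last position targets the result
```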

  7. Neuron-Adaptive PID Based Speed Control of SCSG Wind Turbine System

    Directory of Open Access Journals (Sweden)

    Shan Zuo

    2014-01-01

    Full Text Available In searching for methods to increase the power capacity of wind power generation systems, the superconducting synchronous generator (SCSG) has appeared to be an attractive candidate for developing large-scale wind turbines, due to its high energy density and unprecedented advantages in weight and size. In this paper, a large-scale wind turbine based on high-temperature superconducting technology is considered, and its physical structure and characteristics are analyzed. A simple yet effective single-neuron adaptive PID control scheme with a delta learning mechanism is proposed for the speed control of the SCSG-based wind power system, in which an RBF neural network (NN) is employed to estimate the uncertain but continuous functions. Compared with the conventional PID control method, the simulation results of the proposed approach show a better performance in tracking the wind speed and maintaining a stable tip-speed ratio, thereby achieving maximum wind energy utilization.
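
    A single-neuron adaptive PID of this general flavor (without the RBF compensation term) can be sketched in a few lines: three PID-style regressors feed one neuron whose weights adapt by a supervised delta/Hebbian rule. The plant below is a toy first-order system, not the SCSG model, and all gains are illustrative.

```python
# Sketch: single-neuron incremental PID with a delta-type weight update,
# applied to a toy first-order plant (not the SCSG model).
import numpy as np

K, ETA = 0.5, 0.02                     # neuron gain, learning rate
w = np.array([0.3, 0.3, 0.3])          # adaptive P/I/D-like weights
e1 = e2 = 0.0                          # errors at k-1 and k-2
y, u, ref = 0.0, 0.0, 1.0              # plant output, control, setpoint

for _ in range(300):
    e = ref - y
    x = np.array([e - e1, e, e - 2 * e1 + e2])   # PID-style regressors
    wn = w / np.abs(w).sum()                     # normalized weights
    u = u + K * (wn @ x)                         # incremental control law
    w = w + ETA * e * u * x                      # delta (Hebbian) update
    e2, e1 = e1, e
    y = 0.9 * y + 0.1 * u                        # toy first-order plant

print(round(y, 3))                               # approaches ref = 1.0
```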

  8. Evolutionary computation for reinforcement learning

    NARCIS (Netherlands)

    Whiteson, S.; Wiering, M.; van Otterlo, M.

    2012-01-01

    Algorithms for evolutionary computation, which simulate the process of natural selection to solve optimization problems, are an effective tool for discovering high-performing reinforcement-learning policies. Because they can automatically find good representations, handle continuous action spaces,

  9. Policy learning in the Eurozone crisis: modes, power and functionality.

    Science.gov (United States)

    Dunlop, Claire A; Radaelli, Claudio M

    In response to the attacks on the sovereign debt of some Eurozone countries, European Union (EU) leaders have created a set of preventive and corrective policy instruments to coordinate macro-economic policies and reforms. In this article, we deal with the European Semester, a cycle of information exchange, monitoring and surveillance. Countries that deviate from the targets are subjected to increasing monitoring and more severe 'corrective' interventions, in a pyramid of responsive exchanges between governments and EU institutions. This is supposed to generate coordination and convergence towards balanced economies via mechanisms of learning. But who is learning what? Can the EU learn in the 'wrong' mode? We contribute to the literature on theories of the policy process by showing how modes of learning can be operationalized and used in empirical analysis. We use policy learning as theoretical framework to establish empirically the prevalent mode of learning and its implications for both the power of the Commission and the normative question of whether the EU is learning in the 'correct' mode.

  10. Locally optimal control under unknown dynamics with learnt cost function: application to industrial robot positioning

    Science.gov (United States)

    Guérin, Joris; Gibaru, Olivier; Thiery, Stéphane; Nyiri, Eric

    2017-01-01

    Recent methods of reinforcement learning have made it possible to solve difficult, high-dimensional robotic tasks under unknown dynamics using iterative Linear Quadratic Gaussian (iLQG) control theory. These algorithms are based on building a local time-varying linear model of the dynamics from data gathered through interaction with the environment. In such tasks, the cost function is often expressed directly in terms of the state and control variables so that it can be locally quadratized to run the algorithm. If the cost is expressed in terms of other variables, a model is required to compute the cost function from the variables manipulated. We propose a method to learn the cost function directly from the data, in the same way as for the dynamics. This way, the cost function can be defined in terms of any measurable quantity and thus can be chosen more appropriately for the task to be carried out; with our method, any sensor information can be used to design the cost function. We demonstrate the efficiency of this method by simulating, with the V-REP software, the learning of a Cartesian positioning task on several industrial robots with different characteristics. The robots are controlled in joint space and no model is provided a priori. Our results are compared with another model-free technique, which consists in expressing the cost function as a state variable.
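
    The idea of learning the cost in the same way as the dynamics can be illustrated by fitting a local quadratic model of a cost that is measured in other variables (here, a nonlinear "sensor" of the state); the quadratic fit is what an iLQG-style backward pass could then consume. The cost definition and sampling ranges below are illustrative placeholders, not the paper's setup.

```python
# Sketch: fitting a local quadratic model of a cost measured in other
# variables, as a stand-in for learning the cost alongside the dynamics.
import numpy as np

rng = np.random.default_rng(6)

def true_cost(x, u):
    # cost defined through a measured, nonlinear quantity of the state
    tip = np.sin(x[0]) + np.cos(x[1])      # e.g., an end-effector coordinate
    return (tip - 1.0) ** 2 + 0.01 * (u ** 2).sum()

X, y = [], []
for _ in range(400):                       # sample around a nominal point
    x = 0.1 * rng.standard_normal(2)
    u = 0.1 * rng.standard_normal(2)
    zvec = np.concatenate([x, u])
    feats = np.concatenate([[1.0], zvec,
                            np.outer(zvec, zvec)[np.triu_indices(4)]])
    X.append(feats)
    y.append(true_cost(x, u))

theta, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
# theta now defines the locally quadratic cost an iLQG pass could consume
print(theta.round(3))
```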

  11. Leading research on brain functional information processing; No kino joho shori no sendo kenkyu

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1997-03-01

    This research aims at exploring the concept of an information processing device with a fully different architecture from previous ones, based on the study of human brain function, sense, and perception; developing the basic fabrication technology for such a system; and realizing human-like information processing mechanisms of memorization, learning, association, perception, intuition, and value judgement. As an approach deriving biological and technological models from experimental brain studies, models were derived from brain functional information processing based on brain development/differentiation mechanisms, the control mechanisms and materials of brain activity, and the knowledge obtained from brain measurement studies. In addition, for understanding brain oscillation phenomena through computational neuroscience, a cerebral cortex neural network model composed of realistic neuron models was proposed. Evaluation of the previous large-scale neural network chip system showed its capability for learning and fast processing; however, the next-generation brain computer requires further R and D on novel architectures, devices, and systems. 184 refs., 41 figs., 2 tabs.

  12. Case-based approaches for knowledge application and organisational learning

    DEFF Research Database (Denmark)

    Wang, Chengbo; Johansen, John; Luxhøj, James T.

    2005-01-01

    In dealing with the strategic issues within a manufacturing system, it is necessary to facilitate formulating the composing elements of a set of strategic manufacturing practices and activity patterns that will support an enterprise to reinforce and increase its competitive advantage. These practices and activity patterns are based on learning and applying the knowledge internal and external to an organisation. To ensure their smooth formulation process, two important techniques are designed – an expert adaptation approach and an expert evaluation approach. These two approaches provide

  13. Machine Learning for the Knowledge Plane

    Science.gov (United States)

    2006-06-01

    ...classifiers for which action to select or regression functions over actions or states. However, it can also be cast as larger-scale structures... Research in the reinforcement learning framework falls into two main paradigms. One casts control policies in terms of functions that map state

  14. Reinforcement learning of self-regulated β-oscillations for motor restoration in chronic stroke

    Directory of Open Access Journals (Sweden)

    Georgios Naros

    2015-07-01

    Full Text Available Neurofeedback training of motor imagery-related brain states with brain-machine interfaces (BMI) is currently being explored prior to standard physiotherapy to improve the motor outcome of stroke rehabilitation. Pilot studies suggest that such a priming intervention before physiotherapy might increase the responsiveness of the brain to the subsequent physiotherapy, thereby improving the clinical outcome. However, there is little evidence up to now that these BMI-based interventions have achieved operant conditioning of specific brain states that facilitate task-specific functional gains beyond the practice of primed physiotherapy. In this context, we argue that BMI technology needs to aim at physiological features relevant for the targeted behavioral gain. Moreover, this therapeutic intervention has to be informed by concepts of reinforcement learning to develop its full potential. Such a refined neurofeedback approach would need to address the following issues: (1) defining a physiological feedback target specific to the intended behavioral gain, e.g., β-band oscillations for cortico-muscular communication; this targeted brain state could well be different from the brain state optimal for the neurofeedback task; (2) selecting a BMI classification and thresholding approach on the basis of learning principles, i.e., balancing the challenge and reward of the neurofeedback task instead of maximizing the classification accuracy of the feedback device; and (3) adjusting the feedback in the course of the training period to account for the cognitive load and the learning experience of the participant. The proposed neurofeedback strategy provides evidence for the feasibility of the suggested approach by demonstrating that dynamic threshold adaptation based on reinforcement learning may lead to frequency-specific operant conditioning of β-band oscillations paralleled by task-specific motor improvement; a proposal that requires investigation in a larger cohort of stroke

  15. Climate change adaptation: policy and practice

    International Nuclear Information System (INIS)

    Lynch, Amanda H.; Brunner, Ronald D.

    2007-01-01

    Full text: Worldwide, the threefold increase in the incidence of extreme weather events since 1960 has been accompanied by a ninefold increase in damages, reaching a peak of US$219 billion in 2005 due to the impacts of Hurricane Katrina. There is strong evidence that the increases in extremes, particularly heatwave and flood, are related to climate change. Adaptive governance presents an opportunity to factor the global problem into many simpler local problems to be addressed in parallel. We propose opening up the established frame, based on insights from field testing the principles of adaptive governance and independently corroborated by other research. First, in terms of science, we propose more intensive research centred on case studies of local communities and extreme events, each of which is unique under a comprehensive description. Differences among them must be taken into account to understand past damages or reduce vulnerability. Second, in terms of policy, we support a procedurally-rational approach, one that accommodates inevitable uncertainties, integrates scientific and local knowledge into policies to advance the community's common interest, and relies on learning from experience. Importantly, the approach is constructed to give something back of value to the participating communities - usually information and insight on their own circumstances - in return for their time, expertise, and good will. Third, in terms of decision-making, we suggest structural changes that begin with harvesting experience from the bottom up, to make policies that have worked anywhere on the ground available for voluntary adaptation by similar communities elsewhere, and to inform higher-level officials about local resource needs. This approach produces lessons that can be re-contextualised to inform both scientific understanding and policy action in similar contexts directly, without going through generalisations. The common interest lies in reducing the

  16. Bayesian uncertainty analysis for complex systems biology models: emulation, global parameter searches and evaluation of gene functions.

    Science.gov (United States)

    Vernon, Ian; Liu, Junli; Goldstein, Michael; Rowe, James; Topping, Jen; Lindsey, Keith

    2018-01-02

    Many mathematical models have now been employed across every area of systems biology. These models increasingly involve large numbers of unknown parameters, have complex structure which can result in substantial evaluation time relative to the needs of the analysis, and need to be compared to observed data of various forms. The correct analysis of such models usually requires a global parameter search, over a high-dimensional parameter space, that incorporates and respects the most important sources of uncertainty. This can be an extremely difficult task, but it is essential for any meaningful inference or prediction to be made about any biological system. It hence represents a fundamental challenge for the whole of systems biology. Bayesian statistical methodology for the uncertainty analysis of complex models is introduced, designed to address this high-dimensional global parameter search problem. Bayesian emulators that mimic the systems biology model, but which are extremely fast to evaluate, are embedded within an iterative history match: an efficient method to search high-dimensional spaces within a more formal statistical setting, while incorporating major sources of uncertainty. The approach is demonstrated via application to a model of hormonal crosstalk in Arabidopsis root development, which has 32 rate parameters, for which we identify the sets of rate parameter values that lead to acceptable matches between model output and observed trend data. The multiple insights into the model's structure that this analysis provides are discussed. The methodology is applied to a second related model, and the biological consequences of the resulting comparison, including the evaluation of gene functions, are described. Bayesian uncertainty analysis for complex models using both emulators and history matching is shown to be a powerful technique that can greatly aid the study of a large class of systems biology models. It both provides insight into model behaviour
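
    The heart of an iterative history match is the implausibility measure: a candidate input x is retained only if the emulator's prediction is within a few standard deviations of the observation, counting emulator, observational, and model-discrepancy variances. The sketch below uses a toy closed-form "emulator" rather than a trained Gaussian-process emulator; the observation, variances, and cutoff of 3 (a conventional choice) are illustrative.

```python
# Sketch: the implausibility measure at the core of history matching.
# emulator() is a toy closed-form stand-in, not a trained GP emulator;
# z and the variance terms are illustrative.
import numpy as np

z, var_obs, var_disc = 2.5, 0.1, 0.05      # observation and uncertainties

def emulator(x):
    # returns a fast prediction (mean, variance) of the model output
    return x[0] ** 2 + x[1], 0.02

def implausibility(x):
    m, v = emulator(x)
    return abs(z - m) / np.sqrt(v + var_obs + var_disc)

rng = np.random.default_rng(7)
xs = rng.uniform(-2, 2, size=(10000, 2))   # wave-1 space-filling sample
keep = xs[np.array([implausibility(x) for x in xs]) < 3.0]
print(len(keep), "of 10000 inputs remain non-implausible")
```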

  17. Evaluation of linearly solvable Markov decision process with dynamic model learning in a mobile robot navigation task

    Directory of Open Access Journals (Sweden)

    Ken Kinjo

    2013-04-01

    Full Text Available Linearly solvable Markov Decision Process (LMDP) is a class of optimal control problem in which the Bellman equation can be converted into a linear equation by an exponential transformation of the state value function (Todorov, 2009). In an LMDP, the optimal value function and the corresponding control policy are obtained by solving an eigenvalue problem in a discrete state space, or an eigenfunction problem in a continuous state space, using knowledge of the system dynamics and the action, state, and terminal cost functions. In this study, we evaluate the effectiveness of the LMDP framework in real robot control, in which the dynamics of the body and the environment have to be learned from experience. We first perform a simulation study of a pole swing-up task to evaluate the effect of the accuracy of the learned dynamics model on the derived action policy. The result shows that a crude linear approximation of the nonlinear dynamics can still allow solution of the task, albeit with a higher total cost. We then perform real robot experiments of a battery-catching task using our Spring Dog mobile robot platform. The state is given by the position and the size of a battery in its camera view and two neck joint angles. The action is the velocities of two wheels, while the neck joints are controlled by a visual servo controller. We test linear and bilinear dynamic models in tasks with quadratic and Gaussian state cost functions. In the quadratic cost task, the LMDP controller derived from a learned linear dynamics model performed equivalently to the optimal linear quadratic regulator (LQR). In the non-quadratic task, the LMDP controller with a linear dynamics model showed the best performance. The results demonstrate the usefulness of the LMDP framework in real robot control even when simple linear models are used for dynamics learning.
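
    For the discrete case, the exponential transformation is compact enough to sketch: with desirability z(s) = exp(-v(s)), the Bellman equation becomes linear, z(s) = exp(-q(s)) times (P z)(s), so the value function follows from the principal eigenvector of diag(exp(-q)) P. The costs and passive dynamics below are toy values, not the robot tasks above.

```python
# Sketch: solving a small discrete LMDP by power iteration on the linear
# Bellman operator. q (state costs) and P (passive dynamics) are toy values.
import numpy as np

q = np.array([1.0, 0.5, 0.0])              # state costs
P = np.array([[0.6, 0.3, 0.1],             # passive transition matrix
              [0.2, 0.5, 0.3],
              [0.1, 0.2, 0.7]])

z = np.ones(3)                             # desirability z = exp(-v)
for _ in range(200):
    z_new = np.exp(-q) * (P @ z)           # linear Bellman backup
    z = z_new / z_new.max()                # normalization fixes the scale
v = -np.log(z)                             # value function up to a constant

# optimal controlled dynamics: u*(s'|s) proportional to P[s, s'] * z[s']
u_star = P * z / (P @ z)[:, None]
print(v.round(2))
print(u_star.round(2))
```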

  18. Second Language Word Learning through Repetition and Imitation: Functional Networks as a Function of Learning Phase and Language Distance.

    Science.gov (United States)

    Ghazi-Saidi, Ladan; Ansaldo, Ana Ines

    2017-01-01

    Introduction and Aim: Repetition and imitation are among the oldest second language (L2) teaching approaches and are frequently used in the context of L2 learning and language therapy, despite some heavy criticism. Current neuroimaging techniques allow the neural mechanisms underlying repetition and imitation to be examined. This fMRI study examines the influence of verbal repetition and imitation on network configuration. Integration changes within and between the cognitive control and language networks were studied in a pair of linguistically close languages (Spanish and French), and compared to our previous work on a distant language pair (Ghazi-Saidi et al., 2013). Methods: Twelve healthy native Spanish-speaking (L1) adults and 12 healthy native Persian-speaking adults learned 130 new French (L2) words through a computerized audiovisual repetition and imitation program. The program presented colored photos of objects. Participants were instructed to look at each photo and pronounce its name as closely as possible to the native template (imitate). Repetition was encouraged as many times as necessary to learn the object's name; phonological cues were provided if necessary. Participants practiced for 15 min, over 30 days, and were tested while naming the same items during fMRI scanning, at week 1 (shallow learning phase) and week 4 (consolidation phase) of training. To compare this set of data with our previous work on Persian speakers, a similar data analysis plan was used, including accuracy rates (AR), response times (RT), and functional integration values for the language and cognitive control networks at each measure point, with further L1-L2 direct comparisons across the two populations. Results and Discussion: The evidence shows that learning L2 words through repetition induces neuroplasticity at the network level. Specifically, L2 word learners showed increased network integration after 3 weeks of training, with both close and distant language

  19. Learning User Preferences in Ubiquitous Systems: A User Study and a Reinforcement Learning Approach

    OpenAIRE

    Zaidenberg , Sofia; Reignier , Patrick; Mandran , Nadine

    2010-01-01

    Our study concerns a virtual assistant that proposes services to the user based on the currently perceived activity and situation (ambient intelligence). Instead of asking the user to define his preferences, we acquire them automatically using a reinforcement learning approach. Experiments showed that our system succeeded in learning user preferences. In order to validate the relevance and usability of such a system, we first conducted a user study. 26 non-expert s...

  20. Event-Triggered Distributed Approximate Optimal State and Output Control of Affine Nonlinear Interconnected Systems.

    Science.gov (United States)

    Narayanan, Vignesh; Jagannathan, Sarangapani

    2017-06-08

    This paper presents an approximate optimal distributed control scheme for a known interconnected system composed of input-affine nonlinear subsystems, using event-triggered state and output feedback via a novel hybrid learning scheme. First, the cost function for the overall system is redefined as the sum of the cost functions of the individual subsystems. A distributed optimal control policy for the interconnected system is developed using the optimal value function of each subsystem. To generate the optimal control policy, neural networks are employed to reconstruct the unknown optimal value function at each subsystem online, forward in time. In order to retain the advantages of event-triggered feedback for an adaptive optimal controller, a novel hybrid learning scheme is proposed to reduce the convergence time of the learning algorithm. The development is based on the observation that, in event-triggered feedback, the sampling instants are dynamic and result in variable interevent times. To relax the requirement of entire state measurements, an extended nonlinear observer is designed at each subsystem to recover the system's internal states from the measurable feedback. Using a Lyapunov-based analysis, it is demonstrated that the system states and the observer errors remain locally uniformly ultimately bounded and that the control policy converges to a neighborhood of the optimal policy. Simulation results are presented to demonstrate the performance of the developed controller.
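
    The event-triggered ingredient is simple to isolate: the control input is held between events and recomputed only when the gap between the current state and the last sampled state violates a state-dependent threshold. The scalar plant, gain, and threshold below are illustrative, not the interconnected system or the neural-network controller of the paper.

```python
# Sketch: event-triggered state feedback on a toy scalar plant.
# A, B, K and SIGMA are illustrative placeholders.
import numpy as np

A, B, K = 0.95, 0.1, 2.0             # toy plant: x+ = A x + B u
SIGMA = 0.1                          # triggering threshold coefficient
x, x_hat, events = 1.0, 1.0, 0       # state, last sampled state, counter

for _ in range(200):
    if abs(x - x_hat) > SIGMA * abs(x):   # event condition violated?
        x_hat = x                         # sample the state at this event
        events += 1
    u = -K * x_hat                        # control held between events
    x = A * x + B * u

print(events, "updates instead of 200; final state", round(x, 4))
```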

  1. Adaptive oriented PDEs filtering methods based on new controlling speed function for discontinuous optical fringe patterns

    Science.gov (United States)

    Zhou, Qiuling; Tang, Chen; Li, Biyuan; Wang, Linlin; Lei, Zhenkun; Tang, Shuwei

    2018-01-01

    The filtering of discontinuous optical fringe patterns is a challenging problem in this area. This paper is concerned with oriented partial differential equation (OPDE)-based image filtering methods for discontinuous optical fringe patterns. We define a new controlling speed function that depends on the orientation coherence. The orientation coherence can be used to distinguish continuous regions from discontinuous regions, and can be calculated by utilizing the fringe orientation. We introduce the new controlling speed function into the previous OPDEs and propose adaptive OPDE filtering models. Under our proposed adaptive OPDE filtering models, filtering in the continuous and discontinuous regions can be carried out selectively. We demonstrate the performance of the proposed adaptive OPDEs via application to simulated and experimental fringe patterns, and compare our methods with the previous OPDEs.

  2. Biclustering of gene expression data using reactive greedy randomized adaptive search procedure.

    Science.gov (United States)

    Dharan, Smitha; Nair, Achuthsankar S

    2009-01-30

    Biclustering algorithms belong to a distinct class of clustering algorithms that perform simultaneous clustering of both rows and columns of the gene expression matrix, and they can be a very useful analysis tool when some genes have multiple functions and experimental conditions are diverse. Cheng and Church introduced a measure called the mean squared residue score to evaluate the quality of a bicluster, and it has become one of the most popular measures used to search for biclusters. In this paper, we review the basic concepts of the metaheuristic Greedy Randomized Adaptive Search Procedure (GRASP), its construction and local search phases, and propose a new method, a variant of GRASP called Reactive Greedy Randomized Adaptive Search Procedure (Reactive GRASP), to detect significant biclusters from large microarray datasets. The method has two major steps. First, high-quality bicluster seeds are generated by means of k-means clustering. In the second step, these seeds are grown using Reactive GRASP, in which the basic parameter that defines the restrictiveness of the candidate list is self-adjusted, depending on the quality of the solutions found previously. We performed statistical and biological validations of the biclusters obtained and evaluated the method against the results of basic GRASP as well as the classic work of Cheng and Church. The experimental results indicate that the Reactive GRASP approach outperforms the basic GRASP algorithm and the Cheng and Church approach. The Reactive GRASP approach for the detection of significant biclusters is robust and does not require calibration efforts.
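
    The objective being minimized is worth making concrete: Cheng and Church's mean squared residue of a submatrix, i.e., the mean squared deviation that remains after removing row, column, and overall means. The matrix and the candidate bicluster below are random placeholders.

```python
# Sketch: Cheng and Church's mean squared residue (MSR) of a candidate
# bicluster; M, rows and cols are random placeholders.
import numpy as np

rng = np.random.default_rng(8)
M = rng.random((50, 30))                   # genes x conditions

def msr(M, rows, cols):
    sub = M[np.ix_(rows, cols)]
    row_mean = sub.mean(axis=1, keepdims=True)
    col_mean = sub.mean(axis=0, keepdims=True)
    residue = sub - row_mean - col_mean + sub.mean()
    return (residue ** 2).mean()

rows, cols = [1, 4, 7, 10], [0, 3, 5]
print(round(msr(M, rows, cols), 4))        # lower = more coherent bicluster
```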

  3. Developing Guided Worksheet for Cognitive Apprenticeship Approach in teaching Formal Definition of The Limit of A Function

    Science.gov (United States)

    Oktaviyanthi, R.; Dahlan, J. A.

    2018-04-01

    This study aims to develop student worksheets that correspond to the Cognitive Apprenticeship learning approach. The main subject of the student worksheet is Functions and Limits, with the branch subject Continuity and Limits of Functions. Two indicators of learning achievement are intended to be developed in the worksheet: (1) the student can explain the concept of a limit by using the formal definition of the limit, and (2) the student can evaluate the value of the limit of a function using epsilon and delta. The type of research used is development research following Plomp's product development model. The research flow starts from literature review, observation, and interviews, followed by worksheet design, an expert validity test, and a limited trial on first-year students in the academic year 2016-2017 at Universitas Serang Raya, STKIP Pelita Pratama Al-Azhar Serang, and Universitas Mathla'ul Anwar Pandeglang. Based on the product development results, the student worksheets corresponding to the Cognitive Apprenticeship learning approach are valid and reliable.

  4. A collaborative filtering approach for protein-protein docking scoring functions.

    Science.gov (United States)

    Bourquard, Thomas; Bernauer, Julie; Azé, Jérôme; Poupon, Anne

    2011-04-22

    A protein-protein docking procedure traditionally consists of two successive tasks: a search algorithm generates a large number of candidate conformations mimicking the complex existing in vivo between two proteins, and a scoring function is used to rank them in order to extract a native-like one. We have already shown that, using Voronoi constructions and a well-chosen set of parameters, an accurate scoring function can be designed and optimized. However, to be able to perform large-scale in silico exploration of the interactome, a near-native solution has to be found among the ten best-ranked solutions. This cannot yet be guaranteed by any of the existing scoring functions. In this work, we introduce a new procedure for conformation ranking. We previously developed a set of scoring functions where learning was performed using a genetic algorithm. These functions were used to assign a rank to each possible conformation. We now refine this ranking using different classifiers (decision trees, rules, and support vector machines) in a collaborative filtering scheme. The new scoring function obtained is evaluated using 10-fold cross-validation and compared to the functions obtained using either genetic algorithms or collaborative filtering taken separately. This new approach was successfully applied to the CAPRI scoring ensembles. We show that for 10 targets out of 12, we are able to find a near-native conformation among the 10 best-ranked solutions. Moreover, for 6 of them, the near-native conformation selected is of high accuracy. Finally, we show that this function dramatically enriches the 100 best-ranked conformations in near-native structures.
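
    One simple way to picture combining several rankers (a sketch of the spirit, not the authors' scheme) is rank averaging: each classifier scores every candidate conformation, and the consensus rank is the mean of the per-classifier ranks. The scores below are random stand-ins for decision-tree, rule, and SVM outputs; lower scores are taken as better.

```python
# Sketch: consensus ranking of docking conformations by rank averaging.
# Scores are random placeholders; lower score = better (e.g., an energy).
import numpy as np

rng = np.random.default_rng(9)
n_conf = 100
scores = rng.random((3, n_conf))                # 3 rankers x candidates

ranks = scores.argsort(axis=1).argsort(axis=1)  # per-ranker ranks (0 = best)
consensus = ranks.mean(axis=0)                  # collaborative consensus
top10 = np.argsort(consensus)[:10]              # shortlist for inspection
print(top10)
```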

  5. Quantum Numbers and the Eigenfunction Approach to Obtain Symmetry Adapted Functions for Discrete Symmetries

    Directory of Open Access Journals (Sweden)

    Renato Lemus

    2012-11-01

    Full Text Available The eigenfunction approach used for discrete symmetries is deduced from the concept of quantum numbers. We show that the irreducible representations (irreps) associated with the eigenfunctions are indeed a shorthand notation for the set of eigenvalues of the class operators (character table). The need for a canonical chain of groups to establish a complete set of commuting operators is emphasized. This analysis allows us to establish in a natural way the connection between the quantum numbers and the eigenfunction method proposed by J.Q. Chen to obtain symmetry adapted functions. We then proceed to present a friendly version of the eigenfunction method to project functions.
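
    For context, the standard character projection operator that underlies such symmetry adaptation can be written (textbook notation, an assumption rather than the paper's own formula) as:

        \hat{P}^{(\mu)} = \frac{n_\mu}{|G|} \sum_{g \in G} \chi^{(\mu)}(g)^{*} \, \hat{O}_g

    where $n_\mu$ is the dimension of the irrep $\mu$, $\chi^{(\mu)}$ its character, $|G|$ the group order, and $\hat{O}_g$ the operator representing the group element $g$; applying $\hat{P}^{(\mu)}$ to an arbitrary function projects out the component transforming as $\mu$.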

  6. Beyond Emotion Regulation: Emotion Utilization and Adaptive Functioning

    OpenAIRE

    Izard, Carroll; Stark, Kevin; Trentacosta, Christopher; Schultz, David

    2008-01-01

    Recent research indicates that emotionality, emotion information processing, emotion knowledge, and discrete emotion experiences may influence and interact with emotion utilization, that is, the effective use of the inherently adaptive and motivational functions of emotions. Strategies individuals learn for emotion modulation and emotion utilization become stabilized in emerging affective-cognitive structures, or emotion schemas. In these emotion schemas, the feeling/motivational component of...

  7. Learning to Control Advanced Life Support Systems

    Science.gov (United States)

    Subramanian, Devika

    2004-01-01

    Advanced life support systems have many interacting processes and limited resources. Controlling and optimizing advanced life support systems presents unique challenges. In particular, advanced life support systems are nonlinear coupled dynamical systems and it is difficult for humans to take all interactions into account to design an effective control strategy. In this project, we developed several reinforcement learning controllers that actively explore the space of possible control strategies, guided by rewards from a user-specified long-term objective function. We evaluated these controllers using a discrete event simulation of an advanced life support system. This simulation, called BioSim, designed by NASA scientists David Kortenkamp and Scott Bell, has multiple interacting life support modules including crew, food production, air revitalization, water recovery, solid waste incineration and power. They are implemented in a consumer/producer relationship in which certain modules produce resources that are consumed by other modules. Stores hold resources between modules. Control of this simulation is via adjusting flows of resources between modules and into/out of stores. We developed adaptive algorithms that control the flow of resources in BioSim. Our learning algorithms discovered several ingenious strategies for maximizing mission length by controlling the air and water recycling systems as well as crop planting schedules. By exploiting non-linearities in the overall system dynamics, the learned controllers easily outperformed controllers written by human experts. In sum, we accomplished three goals. We (1) developed foundations for learning models of coupled dynamical systems by active exploration of the state space, (2) developed and tested algorithms that learn to efficiently control air and water recycling processes as well as crop scheduling in BioSim, and (3) developed an understanding of the role of machine learning in designing control systems for
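
    A minimal tabular sketch of the kind of exploration-driven controller described above, assuming a hypothetical BioSim-like interface with reset/step methods and a discretized set of flow settings (all names here are illustrative, not the project's actual code):

        import random
        from collections import defaultdict

        ACTIONS = [0.0, 0.25, 0.5, 0.75, 1.0]   # hypothetical normalized flow levels
        Q = defaultdict(float)                  # state-action value estimates
        alpha, gamma, epsilon = 0.1, 0.99, 0.1

        def choose(state):
            if random.random() < epsilon:                     # explore
                return random.choice(ACTIONS)
            return max(ACTIONS, key=lambda a: Q[(state, a)])  # exploit

        def train(env, episodes):
            for _ in range(episodes):
                state, done = env.reset(), False
                while not done:
                    action = choose(state)
                    next_state, reward, done = env.step(action)  # reward: e.g., +1 per day survived
                    best_next = max(Q[(next_state, a)] for a in ACTIONS)
                    Q[(state, action)] += alpha * (reward + gamma * best_next
                                                   - Q[(state, action)])
                    state = next_state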

  8. An Adaptive Speed Control Approach for DC Shunt Motors

    Directory of Open Access Journals (Sweden)

    Ruben Tapia-Olvera

    2016-11-01

    Full Text Available A B-spline neural network-based adaptive control technique for angular speed reference trajectory tracking tasks with highly efficient performance for direct current shunt motors is proposed. A methodology for adaptive control and its proper training procedure are introduced. This algorithm sets the control signal without using a detailed mathematical model or exact values of the parameters of the nonlinear dynamic system. The proposed robust adaptive tracking control scheme only requires measurements of the velocity output signal. Thus, real-time measurements or estimations of acceleration, current and disturbance signals are avoided. Experimental results confirm the efficient and robust performance of the proposed control approach for highly demanding motor operation conditions exposed to variable-speed reference trajectories and completely unknown load torque. Hence, laboratory experimental tests on a direct current shunt motor prove the viability of the proposed adaptive output feedback trajectory tracking control approach.

  9. Lessons learned in applying function analysis

    International Nuclear Information System (INIS)

    Mitchel, G.R.; Davey, E.; Basso, R.

    2001-01-01

    This paper summarizes the lessons learned in undertaking and applying function analysis based on the recent experience of utility, AECL and international design and assessment projects. Function analysis is an analytical technique that can be used to characterize and assess the functions of a system and is widely recognized as an essential component of a 'systematic' approach to design, one that integrates operational and user requirements into the standard design process. (author)

  10. Support patient search on pathology reports with interactive online learning based data extraction.

    Science.gov (United States)

    Zheng, Shuai; Lu, James J; Appin, Christina; Brat, Daniel; Wang, Fusheng

    2015-01-01

    Structural reporting enables semantic understanding and prompt retrieval of clinical findings about patients. While synoptic pathology reporting provides templates for data entries, information in pathology reports remains primarily in narrative free text form. Extracting data of interest from narrative pathology reports could significantly improve the representation of the information and enable complex structured queries. However, manual extraction is tedious and error-prone, and automated tools are often constructed with a fixed training dataset and are not easily adaptable. Our goal is to extract data from pathology reports to support advanced patient search with a highly adaptable semi-automated data extraction system, which can adjust and self-improve by learning from a user's interaction with minimal human effort. We have developed an online machine learning based information extraction system called IDEAL-X. With its graphical user interface, the system's data extraction engine automatically annotates values for users to review upon loading each report text. The system analyzes users' corrections regarding these annotations with online machine learning, and incrementally enhances and refines the learning model as reports are processed. The system also takes advantage of customized controlled vocabularies, which can be adaptively refined during the online learning process to further assist the data extraction. As the accuracy of automatic annotation improves over time, the effort of human annotation is gradually reduced. After all reports are processed, a built-in query engine can be applied to conveniently define queries based on extracted structured data. We have evaluated the system with a dataset of anatomic pathology reports from 50 patients. Extracted data elements include demographic data, diagnosis, genetic marker, and procedure. The system achieves F-1 scores of around 95% for the majority of tests. Extracting data from pathology reports could enable

  11. Reinforcement Learning Based on the Bayesian Theorem for Electricity Markets Decision Support

    DEFF Research Database (Denmark)

    Sousa, Tiago; Pinto, Tiago; Praca, Isabel

    2014-01-01

    This paper presents the applicability of a reinforcement learning algorithm based on the application of the Bayesian theorem of probability. The proposed reinforcement learning algorithm is an advantageous and indispensable tool for ALBidS (Adaptive Learning strategic Bidding System), a multi...

  12. ADAPTIVE REUSE FOR NEW SOCIAL AND MUNICIPAL FUNCTIONS AS AN ACCEPTABLE APPROACH FOR CONSERVATION OF INDUSTRIAL HERITAGE ARCHITECTURE IN THE CZECH REPUBLIC

    Directory of Open Access Journals (Sweden)

    Oleg Fetisov

    2016-04-01

    Full Text Available The present paper deals with the problem of conservation and adaptive reuse of industrial heritage architecture. The relevance and topicality of the problem of adaptive reuse of industrial heritage architecture for new social and municipal functions as the conservation concept are defined. New insights on the typology of industrial architecture are reviewed (e.g. global changes in all European industry, new concepts and technologies in manufacturing, new features of industrial architecture and their construction and typology, first results of industrialization and changes in the typology of industrial architecture in the post-industrial period). General goals and tasks of conservation in the context of adaptive reuse of industrial heritage architecture are defined (e.g. historical, architectural and artistic, technical). Adaptive reuse as an acceptable approach for conservation and new use is proposed and reviewed. Moreover, the logical model of adaptive reuse of industrial heritage architecture as an acceptable approach for new use has been developed. Consequently, three general methods for the conservation of industrial heritage architecture by the adaptive reuse approach are developed: historical, architectural and artistic, and technical. Relevant functional methods' concepts (social concepts) are defined and classified. The general beneficial effect of the adaptive reuse approach is given. On the basis of analysis results of experience in adaptive reuse of industrial architecture with new social functions, general conclusions are developed.

  13. Evaluating the Appropriateness of a New Computer-Administered Measure of Adaptive Function for Children and Youth with Autism Spectrum Disorders

    Science.gov (United States)

    Coster, Wendy J.; Kramer, Jessica M.; Tian, Feng; Dooley, Meghan; Liljenquist, Kendra; Kao, Ying-Chia; Ni, Pengsheng

    2016-01-01

    The Pediatric Evaluation of Disability Inventory-Computer Adaptive Test is an alternative method for describing the adaptive function of children and youth with disabilities using a computer-administered assessment. This study evaluated the performance of the Pediatric Evaluation of Disability Inventory-Computer Adaptive Test with a national…

  14. Reinforcement learning agents providing advice in complex video games

    Science.gov (United States)

    Taylor, Matthew E.; Carboni, Nicholas; Fachantidis, Anestis; Vlahavas, Ioannis; Torrey, Lisa

    2014-01-01

    This article introduces a teacher-student framework for reinforcement learning, synthesising and extending material that appeared in conference proceedings [Torrey, L., & Taylor, M. E. (2013). Teaching on a budget: Agents advising agents in reinforcement learning. Proceedings of the international conference on autonomous agents and multiagent systems] and in a non-archival workshop paper [Carboni, N., & Taylor, M. E. (2013, May). Preliminary results for 1 vs. 1 tactics in StarCraft. Proceedings of the adaptive and learning agents workshop (at AAMAS-13)]. In this framework, a teacher agent instructs a student agent by suggesting actions the student should take as it learns. However, the teacher may only give such advice a limited number of times. We present several novel algorithms that teachers can use to budget their advice effectively, and we evaluate them in two complex video games: StarCraft and Pac-Man. Our results show that the same amount of advice, given at different moments, can have different effects on student learning, and that teachers can significantly affect student learning even when students use different learning methods and state representations.
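
    A hedged sketch of one budgeting heuristic of the kind this line of work studies (state "importance" measured by the teacher's Q-value gap; the names and exact rule are illustrative, not necessarily the paper's algorithms):

        class BudgetedTeacher:
            def __init__(self, teacher_q, budget, threshold):
                self.q = teacher_q          # mapping: (state, action) -> value
                self.budget = budget        # advice units remaining
                self.threshold = threshold  # minimum importance to give advice

            def advise(self, state, student_action, actions):
                if self.budget <= 0:
                    return None                         # budget exhausted
                values = [self.q[(state, a)] for a in actions]
                importance = max(values) - min(values)  # how much the choice matters
                best = actions[values.index(max(values))]
                if importance > self.threshold and student_action != best:
                    self.budget -= 1                    # spend one unit of advice
                    return best
                return None                             # otherwise stay silent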

  15. The triad value function

    DEFF Research Database (Denmark)

    Vedel, Mette

    2016-01-01

    the triad value function. Next, the applicability and validity of the concept are examined in a case study of four closed vertical supply chain triads. Findings - The case study demonstrates that the triad value function facilitates the analysis and understanding of an apparent paradox: that distributors are not dis-intermediated in spite of their limited contribution to activities in the triads. The results indicate practical adequacy of the triad value function. Research limitations/implications - The triad value function is difficult to apply in the study of expanded networks, as the number of connections expands exponentially with the number of ties in the network. Moreover, it must be applied in the study of service triads and open vertical supply chain triads to further verify the practical adequacy of the concept. Practical implications - The triad value function cannot be used normatively...

  16. Amygdala subsystems and control of feeding behavior by learned cues.

    Science.gov (United States)

    Petrovich, Gorica D; Gallagher, Michela

    2003-04-01

    A combination of behavioral studies and a neural systems analysis approach has proven fruitful in defining the role of the amygdala complex and associated circuits in fear conditioning. The evidence presented in this chapter suggests that this approach is also informative in the study of other adaptive functions that involve the amygdala. In this chapter we present a novel model to study learning in an appetitive context. Furthermore, we demonstrate that long-recognized connections between the amygdala and the hypothalamus play a crucial role in allowing learning to modulate feeding behavior. In the first part we describe a behavioral model for motivational learning. In this model a cue that acquires motivational properties through pairings with food delivery when an animal is hungry can override satiety and promote eating in sated rats. Next, we present evidence that a specific amygdala subsystem (basolateral area) is responsible for allowing such learned cues to control eating (override satiety and promote eating in sated rats). We also show that basolateral amygdala mediates these actions via connectivity with the lateral hypothalamus. Lastly, we present evidence that the amygdalohypothalamic system is specific for the control of eating by learned motivational cues, as it does not mediate another function that depends on intact basolateral amygdala, namely, the ability of a conditioned cue to support new learning based on its acquired value. Knowledge about neural systems through which food-associated cues specifically control feeding behavior provides a defined model for the study of learning. In addition, this model may be informative for understanding mechanisms of maladaptive aspects of learned control of eating that contribute to eating disorders and more moderate forms of overeating.

  17. Evaluating the E-Learning Platform from the Perspective of Knowledge Management: The AHP Approach

    Directory of Open Access Journals (Sweden)

    I-Chin Wu

    2013-06-01

    Full Text Available A growing number of higher education institutions have adopted asynchronous and synchronous Web-based learning platforms to improve students’ learning efficiency and increase learning satisfaction in the past decade. Unlike traditional face-to-face learning methods, e-learning platforms allow teachers to communicate with students and discuss course content anytime or anywhere. In addition, the teaching material can be reused via the e-learning platforms. To understand how students use e-learning platforms and what the implications are, we conducted an empirical study of the iCAN e-learning platform, which has been widely used in Fu-Jen Catholic University since 2005. We use the Analytic Hierarchy Process (AHP), a well-known multi-criteria evaluation approach, to compare five practices, i.e. the functions of the iCAN platform. We adopted a brainstorming approach to design a questionnaire to measure learners’ perception of the e-learning platform based on the theory of the knowledge transforming process in knowledge management. Accordingly, the model considers functioning and objectivity in terms of the following three attributes of learning effectiveness: individual learning, group sharing and learning performance. Twelve criteria with twelve evaluation items were used to investigate the effectiveness of the five practices. We also evaluated the strengths and weaknesses of the functions based on the types of courses in the iCAN platform. We expect that the empirical evaluation results will provide teachers with suggestions and guidelines for using the e-learning platform effectively to facilitate their teaching activities and promote students’ learning efficiency and satisfaction.
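
    The core AHP computation is small enough to sketch: criteria weights are the normalized principal eigenvector of a pairwise comparison matrix, accepted when the consistency ratio is below 0.1. The comparison values below are invented for illustration, not taken from the study.

        import numpy as np

        A = np.array([[1.0, 3.0, 5.0],     # pairwise comparisons of 3 criteria
                      [1/3, 1.0, 2.0],
                      [1/5, 1/2, 1.0]])

        eigvals, eigvecs = np.linalg.eig(A)
        k = np.argmax(eigvals.real)
        weights = np.abs(eigvecs[:, k].real)
        weights /= weights.sum()               # normalized priority weights

        n = A.shape[0]
        ci = (eigvals.real[k] - n) / (n - 1)   # consistency index
        ri = {3: 0.58, 4: 0.90, 5: 1.12}[n]    # random index (standard table)
        print(weights, ci / ri)                # accept if CR = CI/RI < 0.1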

  18. Distributed Economic Dispatch in Microgrids Based on Cooperative Reinforcement Learning.

    Science.gov (United States)

    Liu, Weirong; Zhuang, Peng; Liang, Hao; Peng, Jun; Huang, Zhiwu

    2018-06-01

    Microgrids incorporated with distributed generation (DG) units and energy storage (ES) devices are expected to play increasingly important roles in future power systems. Yet, achieving efficient distributed economic dispatch in microgrids is a challenging issue due to the randomness and nonlinear characteristics of DG units and loads. This paper proposes a cooperative reinforcement learning algorithm for distributed economic dispatch in microgrids. Utilizing the learning algorithm can avoid the difficulty of stochastic modeling and high computational complexity. In the cooperative reinforcement learning algorithm, function approximation is leveraged to deal with the large and continuous state spaces, and a diffusion strategy is incorporated to coordinate the actions of DG units and ES devices. Based on the proposed algorithm, each node in the microgrid only needs to communicate with its local neighbors, without relying on any centralized controllers. Algorithm convergence is analyzed, and simulations based on real-world meteorological and load data are conducted to validate the performance of the proposed algorithm.
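
    A generic sketch of the diffusion step such strategies use (the combination matrix and the linear-approximator framing are assumptions for illustration, not the paper's exact algorithm): after each node adapts its local parameter vector, it replaces it with a convex combination of its neighbors' vectors.

        import numpy as np

        def diffusion_combine(local_params, C):
            """local_params: (n_nodes, n_features); C[i, j]: weight node i gives node j."""
            return C @ local_params            # each row becomes a convex combination

        C = np.array([[0.6, 0.4, 0.0],         # node 0 exchanges with node 1
                      [0.2, 0.6, 0.2],         # node 1 exchanges with nodes 0 and 2
                      [0.0, 0.4, 0.6]])        # node 2 exchanges with node 1
        params = np.random.randn(3, 5)         # e.g., value-function weights per node
        params = diffusion_combine(params, C)  # one cooperative update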

  19. Gaze-contingent reinforcement learning reveals incentive value of social signals in young children and adults.

    Science.gov (United States)

    Vernetti, Angélina; Smith, Tim J; Senju, Atsushi

    2017-03-15

    While numerous studies have demonstrated that infants and adults preferentially orient to social stimuli, it remains unclear as to what drives such preferential orienting. It has been suggested that the learned association between social cues and subsequent reward delivery might shape such social orienting. Using a novel, spontaneous indication of reinforcement learning (with the use of a gaze contingent reward-learning task), we investigated whether children and adults' orienting towards social and non-social visual cues can be elicited by the association between participants' visual attention and a rewarding outcome. Critically, we assessed whether the engaging nature of the social cues influences the process of reinforcement learning. Both children and adults learned to orient more often to the visual cues associated with reward delivery, demonstrating that cue-reward association reinforced visual orienting. More importantly, when the reward-predictive cue was social and engaging, both children and adults learned the cue-reward association faster and more efficiently than when the reward-predictive cue was social but non-engaging. These new findings indicate that social engaging cues have a positive incentive value. This could possibly be because they usually coincide with positive outcomes in real life, which could partly drive the development of social orienting. © 2017 The Authors.

  20. Online Learning of Genetic Network Programming and its Application to Prisoner’s Dilemma Game

    Science.gov (United States)

    Mabu, Shingo; Hirasawa, Kotaro; Hu, Jinglu; Murata, Junichi

    A new evolutionary model with a network structure named Genetic Network Programming (GNP) has been proposed recently. GNP, an expansion of GA and GP, represents solutions as a network structure and evolves it by using “offline learning (selection, mutation, crossover)”. GNP can memorize past action sequences in the network flow, so it can deal with Partially Observable Markov Decision Processes (POMDPs) well. In this paper, in order to improve the ability of GNP, Q-learning, an off-policy TD control algorithm that is one of the best-known online methods, is introduced for online learning of GNP. Q-learning is suitable for GNP because (1) in reinforcement learning, the rewards an agent will get in the future can be estimated, (2) TD control doesn’t need much memory and can learn quickly, and (3) an off-policy method is suitable in order to search for an optimal solution independently of the policy. Finally, in the simulations, online learning of GNP is applied to a player for the “Prisoner’s dilemma game” and its ability for online adaptation is confirmed.
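
    The off-policy TD update referred to here is the standard Q-learning rule, with learning rate $\alpha$ and discount factor $\gamma$:

        Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]

    The max over $a'$ is what makes the rule off-policy: the target is computed from the greedy action regardless of which action the exploration policy actually takes next.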

  1. Application Of Reinforcement Learning In Heading Control Of A Fixed Wing UAV Using X-Plane Platform

    Directory of Open Access Journals (Sweden)

    Kimathi

    2017-02-01

    Full Text Available Heading control of an Unmanned Aerial Vehicle (UAV) is a vital operation of an autopilot system. It is executed by employing control algorithms that govern the UAV's direction and navigation. Most commonly available autopilots exploit Proportional-Integral-Derivative (PID)-based heading controllers. In this paper we propose an online adaptive reinforcement learning heading controller. The autopilot heading controller is designed in Matlab/Simulink for controlling a UAV in the X-Plane test platform. Through this platform the performance of the controller is shown using real-time simulations. The performance of this controller is compared to that of a PID controller. The results show that the proposed method performs better than a well-tuned PID controller.

  2. The Use of Modeling Approach for Teaching Exponential Functions

    Science.gov (United States)

    Nunes, L. F.; Prates, D. B.; da Silva, J. M.

    2017-12-01

    This work presents a discussion related to the teaching and learning of mathematical content connected to the study of exponential functions in a group of freshman students enrolled in the first semester of the Science and Technology Bachelor’s program (STB) of the Federal University of Jequitinhonha and Mucuri Valleys (UFVJM). The modelling approach, strongly advocated in the literature as a contextualization tool, was used to contextualize the teaching-learning process of exponential functions for these students. To this end, some simple models built with the GeoGebra software were used and, for a qualitative evaluation of the investigation and its results, Didactic Engineering was adopted as the research methodology. As a consequence of this detailed research, some interesting details about the teaching and learning process were observed, discussed and described.

  3. Language and functionality of post-stroke adults: evaluation based on International Classification of Functioning, Disability and Health (ICF).

    Science.gov (United States)

    Santana, Maria Tereza Maynard; Chun, Regina Yu Shon

    2017-03-09

    Cerebrovascular accident is an important Public Health problem because of the high rates of mortality and sequelae such as language disorders. The conceptual health changes have led to the incorporation of functional and social aspects in the assessments as proposed by the World Health Organization in the International Classification of Functioning, Disability and Health. The purpose was to evaluate and classify language aspects, functionality and participation of post-stroke individuals based on the concepts of the International Classification of Functioning, Disability and Health and characterize the sociodemographic profile of participants. Data collection was carried out through the application of a clinical instrument to evaluate language, participation and functionality in fifty individuals based on the International Classification of Functioning, Disability and Health. The age of the participants varied between 32 and 88 years, and the majority were elderly men. Among body functions, the participants reported more difficulties in "memory functions". As for activity and participation, more difficulties were reported in "recreation and leisure". As for environmental factors, the component "healthcare professionals" was indicated as a facilitator by the majority of participants. The results show the impact of language difficulties in the lives of post-stroke adults and reinforce the applicability of the International Classification of Functioning, Disability and Health as an important complementary tool for assessing language, functionality and participation in a comprehensive and humane approach, towards the improvement of health assistance in ambulatory care.

  4. Multi-functional smart aggregate-based structural health monitoring of circular reinforced concrete columns subjected to seismic excitations

    International Nuclear Information System (INIS)

    Gu, Haichang; Song, Gangbing; Moslehy, Yashar; Mo, Y L; Sanders, David

    2010-01-01

    In this paper, a recently developed multi-functional piezoceramic-based device, named the smart aggregate, is used for the health monitoring of concrete columns subjected to shake table excitations. Two circular reinforced concrete columns instrumented with smart aggregates were fabricated and tested with a recorded seismic excitation at the structural laboratory at the University of Nevada, Reno. In the tests, the smart aggregates were used to perform multiple monitoring functions that included dynamic seismic response detection, structural health monitoring and white noise response detection. In the proposed health monitoring approach, a damage index was developed on the basis of the comparison of the transfer function with the baseline function obtained in the healthy state. A sensor-history damage index matrix is developed to monitor the damage evolution process. Experimental results showed that the acceleration level can be evaluated from the amplitude of the dynamic seismic response; the damage statuses at different locations were evaluated using a damage index matrix; and the first modal frequency obtained from the white noise response decreased with increasing damage severity. The proposed multi-functional smart aggregates have great potential for use in the structural health monitoring of large-scale concrete structures.
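
    A hedged sketch of one common transfer-function-based damage index of the kind described (the correlation-based definition is an assumption, not necessarily the paper's exact formula): the index is 0 when the current transfer function matches the healthy baseline over the monitored band and approaches 1 as the two decorrelate.

        import numpy as np

        def damage_index(H, H0):
            """H, H0: complex transfer functions sampled over the same frequency band."""
            rho = np.corrcoef(np.abs(H), np.abs(H0))[0, 1]   # band-wise correlation
            return 1.0 - max(rho, 0.0)                       # 0 = healthy, toward 1 = damaged

        f = np.linspace(1.0, 500.0, 256)        # frequency band in Hz (toy values)
        H0 = 1.0 / (1.0 + 1j * f / 120.0)       # healthy-state baseline
        H = 1.0 / (1.0 + 1j * f / 80.0)         # current response, shifted by damage
        print(damage_index(H, H0))

    Stacking one such index per sensor path (rows) against successive excitations (columns) yields the sensor-history damage index matrix used to track damage evolution.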

  5. A Goal-Function Approach to Analysis of Control Situations

    DEFF Research Database (Denmark)

    Lind, Morten

    2010-01-01

    The concept of situations plays a central role in all theories of meaning and context, and serves to frame or group events and other occurrences into coherent meaningful wholes. Situations are typed, and may be interconnected and organized into higher level structures. In the operation of industrial processes, situations should identify operational aspects relevant for a control agent's decision making in plant supervision and control. Control situations can be understood as recurrent and interconnected patterns of control with important implications for control and HMI design. Goal-Function approaches...

  6. Reaching control of a full-torso, modelled musculoskeletal robot using muscle synergies emergent under reinforcement learning

    International Nuclear Information System (INIS)

    Diamond, A; Holland, O E

    2014-01-01

    ‘Anthropomimetic’ robots mimic both human morphology and internal structure—skeleton, muscles, compliance and high redundancy—thus presenting a formidable challenge to conventional control. Here we derive a novel controller for this class of robot which learns effective reaching actions through the sustained activation of weighted muscle synergies, an approach which draws upon compelling, recent evidence from animal and human studies, but is almost unexplored to date in the musculoskeletal robot literature. Since the effective synergy patterns for a given robot will be unknown, we derive a reinforcement-learning approach intended to allow their emergence, in particular those patterns aiding linearization of control. Using an extensive physics-based model of the anthropomimetic ECCERobot, we find that effective reaching actions can be learned comprising only two sequential motor co-activation patterns, each controlled by just a single common driving signal. Factor analysis shows the emergent muscle co-activations can be largely reconstructed using weighted combinations of only 13 common fragments. Testing these ‘candidate’ synergies as drivable units, the same controller now learns the reaching task both faster and better. (paper)

  7. State feedback integral control for a rotary direct drive servo valve using a Lyapunov function approach.

    Science.gov (United States)

    Yu, Jue; Zhuang, Jian; Yu, Dehong

    2015-01-01

    This paper concerns a state feedback integral control using a Lyapunov function approach for a rotary direct drive servo valve (RDDV) while considering parameter uncertainties. Modeling of this RDDV servovalve reveals that its mechanical performance is deeply influenced by friction torques and flow torques; however, these torques are uncertain and mutable due to the nature of fluid flow. To eliminate load resistance and to achieve satisfactory position responses, this paper develops a state feedback control that integrates an integral action and a Lyapunov function. The integral action is introduced to address the nonzero steady-state error; in particular, the Lyapunov function is employed to improve control robustness by adjusting the varying parameters within their value ranges. This new controller also has the advantages of simple structure and ease of implementation. Simulation and experimental results demonstrate that the proposed controller can achieve higher control accuracy and stronger robustness. Copyright © 2014 ISA. Published by Elsevier Ltd. All rights reserved.

  8. ARBR: Adaptive reinforcement-based routing for DTN

    KAUST Repository

    Elwhishi, Ahmed

    2010-10-01

    This paper introduces a novel routing protocol in Delay Tolerant Networks (DTNs), aiming to solve the online distributed routing problem. By manipulating a collaborative reinforcement learning technique, a group of nodes can cooperate with each other and make a forwarding decision for the stored messages based on a cost function at each contact with another node. The proposed protocol is characterized by not only considering the contact time statistics under a novel contact model, but also by looking into the feedback on user behavior and network conditions, such as congestion and buffer occupancy, sampled during each previous contact with any other node. Therefore, the proposed protocol can achieve high efficiency via an adaptive and intelligent routing mechanism according to network conditions. Extensive simulation is conducted to verify the proposed protocol, where a comparison is made with a number of existing encounter-based routing protocols in terms of the number of transmissions of each message, message delivery delay, and delivery ratio. The results of the simulation demonstrate the effectiveness of the proposed technique.

  9. Reasoning, learning, and creativity: frontal lobe function and human decision-making.

    Directory of Open Access Journals (Sweden)

    Anne Collins

    Full Text Available The frontal lobes subserve decision-making and executive control--that is, the selection and coordination of goal-directed behaviors. Current models of frontal executive function, however, do not explain human decision-making in everyday environments featuring uncertain, changing, and especially open-ended situations. Here, we propose a computational model of human executive function that clarifies this issue. Using behavioral experiments, we show that unlike others, the proposed model predicts human decisions and their variations across individuals in naturalistic situations. The model reveals that for driving action, the human frontal function monitors up to three/four concurrent behavioral strategies and infers online their ability to predict action outcomes: whenever one appears more reliable than unreliable, this strategy is chosen to guide the selection and learning of actions that maximize rewards. Otherwise, a new behavioral strategy is tentatively formed, partly from those stored in long-term memory, then probed, and if competitive confirmed to subsequently drive action. Thus, the human executive function has a monitoring capacity limited to three or four behavioral strategies. This limitation is compensated by the binary structure of executive control that in ambiguous and unknown situations promotes the exploration and creation of new behavioral strategies. The results support a model of human frontal function that integrates reasoning, learning, and creative abilities in the service of decision-making and adaptive behavior.

  10. Reasoning, learning, and creativity: frontal lobe function and human decision-making.

    Science.gov (United States)

    Collins, Anne; Koechlin, Etienne

    2012-01-01

    The frontal lobes subserve decision-making and executive control--that is, the selection and coordination of goal-directed behaviors. Current models of frontal executive function, however, do not explain human decision-making in everyday environments featuring uncertain, changing, and especially open-ended situations. Here, we propose a computational model of human executive function that clarifies this issue. Using behavioral experiments, we show that unlike others, the proposed model predicts human decisions and their variations across individuals in naturalistic situations. The model reveals that for driving action, the human frontal function monitors up to three/four concurrent behavioral strategies and infers online their ability to predict action outcomes: whenever one appears more reliable than unreliable, this strategy is chosen to guide the selection and learning of actions that maximize rewards. Otherwise, a new behavioral strategy is tentatively formed, partly from those stored in long-term memory, then probed, and if competitive confirmed to subsequently drive action. Thus, the human executive function has a monitoring capacity limited to three or four behavioral strategies. This limitation is compensated by the binary structure of executive control that in ambiguous and unknown situations promotes the exploration and creation of new behavioral strategies. The results support a model of human frontal function that integrates reasoning, learning, and creative abilities in the service of decision-making and adaptive behavior.

  11. Adaptive learning fuzzy control of a mobile robot

    International Nuclear Information System (INIS)

    Tsukada, Akira; Suzuki, Katsuo; Fujii, Yoshio; Shinohara, Yoshikuni

    1989-11-01

    In this report, the problem of constructing a fuzzy controller that enables a mobile robot to move autonomously along a given reference direction curve is studied; the control rules are generated and acquired through an adaptive learning process. An adaptive learning fuzzy controller has been developed for a mobile robot. Good properties of the controller are shown through travelling experiments with the mobile robot. (author)

  12. Cerebellar-inspired adaptive control of a robot eye actuated by pneumatic artificial muscles.

    Science.gov (United States)

    Lenz, Alexander; Anderson, Sean R; Pipe, A G; Melhuish, Chris; Dean, Paul; Porrill, John

    2009-12-01

    In this paper, a model of cerebellar function is implemented and evaluated in the control of a robot eye actuated by pneumatic artificial muscles. The investigated control problem is stabilization of the visual image in response to disturbances. This is analogous to the vestibuloocular reflex (VOR) in humans. The cerebellar model is structurally based on the adaptive filter, and the learning rule is computationally analogous to least-mean squares, where parameter adaptation at the parallel fiber/Purkinje cell synapse is driven by the correlation of the sensory error signal (carried by the climbing fiber) and the motor command signal. Convergence of the algorithm is first analyzed in simulation on a model of the robot and then tested online in both one and two degrees of freedom. The results show that this model of neural function successfully works on a real-world problem, providing empirical evidence for validating: 1) the generic cerebellar learning algorithm; 2) the function of the cerebellum in the VOR; and 3) the signal transmission between functional neural components of the VOR.
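
    A minimal numerical sketch of the adaptive-filter picture described above, assuming a toy linear plant: the parallel-fiber weights w are adapted by correlating the sensory error e (the climbing fiber signal) with the input basis x (parallel-fiber activity), which is exactly the least-mean-squares rule. All names and sizes are illustrative.

        import numpy as np

        def lms_step(w, x, error, eta=0.01):
            # Covariance learning at the PF/PC synapse: w <- w + eta * e * x
            return w + eta * error * x

        rng = np.random.default_rng(0)
        w_true = rng.normal(size=16)       # unknown mapping the filter must learn
        w = np.zeros(16)                   # initial synaptic weights
        for _ in range(2000):
            x = rng.normal(size=16)        # parallel-fiber activity pattern
            e = (w_true - w) @ x           # residual sensory error (image slip)
            w = lms_step(w, x, e)
        print(np.linalg.norm(w_true - w))  # near zero after convergence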

  13. Optimizing Chemical Reactions with Deep Reinforcement Learning.

    Science.gov (United States)

    Zhou, Zhenpeng; Li, Xiaocheng; Zare, Richard N

    2017-12-27

    Deep reinforcement learning was employed to optimize chemical reactions. Our model iteratively records the results of a chemical reaction and chooses new experimental conditions to improve the reaction outcome. This model outperformed a state-of-the-art blackbox optimization algorithm by using 71% fewer steps on both simulations and real reactions. Furthermore, we introduced an efficient exploration strategy by drawing the reaction conditions from certain probability distributions, which resulted in an improvement on regret from 0.062 to 0.039 compared with a deterministic policy. Combining the efficient exploration policy with accelerated microdroplet reactions, optimal reaction conditions were determined in 30 min for the four reactions considered, and a better understanding of the factors that control microdroplet reactions was reached. Moreover, our model showed a better performance after training on reactions with similar or even dissimilar underlying mechanisms, which demonstrates its learning ability.

  14. Biomechanical Reconstruction Using the Tacit Learning System: Intuitive Control of Prosthetic Hand Rotation.

    Science.gov (United States)

    Oyama, Shintaro; Shimoda, Shingo; Alnajjar, Fady S K; Iwatsuki, Katsuyuki; Hoshiyama, Minoru; Tanaka, Hirotaka; Hirata, Hitoshi

    2016-01-01

    Background: For mechanically reconstructing human biomechanical function, intuitive proportional control and robustness to unexpected situations are required. In particular, creating a functional hand prosthesis is a typical challenge in the reconstruction of lost biomechanical function. Nevertheless, currently available control algorithms are in the development phase. The most advanced algorithms for controlling multifunctional prostheses are machine learning and pattern recognition of myoelectric signals. Despite the increase in computational speed, these methods cannot avoid the requirement of user consciousness and classification errors. "Tacit Learning System" is a simple but novel adaptive control strategy that can self-adapt its posture to environment changes. We introduced the strategy in prosthesis rotation control to achieve compensatory reduction, and evaluated the system and its effects on the user. Methods: We conducted a non-randomized study involving eight prosthesis users who performed a bar relocation task with/without Tacit Learning System support. Hand piece and body motions were recorded continuously with goniometers, videos, and a motion-capture system. Findings: Reduction in the participants' upper extremity rotatory compensation motion was observed during the relocation task in all participants. The estimated profile of total body energy consumption improved in five out of six participants. Interpretation: Our system rapidly accomplished nearly natural motion without unexpected errors. The Tacit Learning System not only adapts to human motions but also enhances the human ability to adapt to the system quickly, while the system amplifies compensation generated by the residual limb. The concept can be extended to various situations for reconstructing lost functions that can be compensated.

  15. APPLICATION OF COOPERATIVE LEARNING MODEL INDEX CARD MATCH TYPE IN IMPROVING STUDENT LEARNING RESULTS ON COMPOSITION AND COMPOSITION FUNCTIONS OF FUNCTIONS INVERS IN MAN 1 MATARAM

    Directory of Open Access Journals (Sweden)

    Syahrir Syahrir

    2017-12-01

    Full Text Available Lack of student response in learning mathematics is caused by student passivity during the learning process, which leads students to regard mathematics as a difficult subject to understand. This research is a Classroom Action Research (PTK) using two cycles, and its purpose is to examine how the implementation of cooperative learning of the index card match type improves student learning outcomes on the subject matter of composition functions and inverse functions in MAN 1 Mataram. The analysis showed that cycle I yielded a classical completeness of 78.79%, an average student learning outcome score of 69.78, and an average student learning response rated as Sufficient, while cycle II showed a classical completeness of 87.89%, an average student learning outcome score of 78.94, and an average student learning response rated as Good. It can therefore be concluded that the implementation of the cooperative learning model of the index card match type can improve student learning outcomes on the subject matter of composition functions and inverse functions.

  16. Evaluating the B-cell density with various activation functions using White Noise Path Integral Approach

    Science.gov (United States)

    Aban, C. J. G.; Bacolod, R. O.; Confesor, M. N. P.

    2015-06-01

    The White Noise Path Integral Approach is used in evaluating the B-cell density, i.e., the number of B-cells per unit volume, for a basic type of immune system response based on the modeling done by Perelson and Wiegel. From the scaling principles of Perelson [1], the B-cell density is obtained where antigens and antibodies mutate, and an activation function f(|S - S_A|) is defined describing the interaction between a specific antigen and a B-cell. If the activation function f(|S - S_A|) is held constant, the major form of the B-cell density evaluated using white noise analysis is similar to the form of the B-cell density obtained by Perelson and Wiegel using a differential approach. A piecewise linear function is also used to describe the activation f(|S - S_A|). If f(|S - S_A|) is zero, the density decreases exponentially. If f(|S - S_A|) = S - S_A - S_B, the B-cell density increases exponentially until it reaches a certain maximum value. For f(|S - S_A|) = 2S_A - S_B - S, the behavior of the B-cell density is oscillatory and remains at small values.
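
    Collecting the three regimes quoted above into a single piecewise form (a reconstruction from the abstract's flattened notation, with $S$ the shape variable and $S_A$, $S_B$ antigen-specific parameters):

        f(|S - S_A|) =
        \begin{cases}
        0, & \text{B-cell density decays exponentially,} \\
        S - S_A - S_B, & \text{density grows exponentially to a maximum,} \\
        2S_A - S_B - S, & \text{density oscillates and remains small.}
        \end{cases}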

  17. Gain-Scheduled Model Predictive Control of Wind Turbines using Laguerre Functions

    DEFF Research Database (Denmark)

    Adegas, Fabiano Daher; Wisniewski, Rafal; Larsen, Lars Finn Sloth

    2014-01-01

    This paper presents a systematic approach to design gain-scheduled predictive controllers for wind turbines. The predictive control law is based on Laguerre functions to parameterize control signals and a parameter-dependent cost function that is analytically determined from turbine data. These properties facilitate the design of speed controllers by placement of the closed-loop poles (when constraints are not active) and systematic adaptation towards changes in the operating point. Vibration control of undamped modes is achieved by imposing a certain degree of stability on the closed-loop system. The approach can be utilized in the design of new controllers and to represent existing gain-scheduled controllers as predictive controllers. The numerical example and simulations illustrate the design of a speed controller augmented with active damping of the tower fore-aft displacement.
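
    For orientation, discrete Laguerre functions can be generated by a simple state-space recursion, and the control move is then a short weighted sum of these functions; the pole and sizes below are illustrative choices, not values from the paper.

        import numpy as np

        def laguerre_basis(a, N, horizon):
            """Discrete Laguerre functions: columns of the returned (horizon, N) array."""
            beta = 1.0 - a**2
            A = np.zeros((N, N))                 # recursion matrix: L(k+1) = A @ L(k)
            for i in range(N):
                A[i, i] = a
                for j in range(i):
                    A[i, j] = ((-a) ** (i - j - 1)) * beta
            L = np.sqrt(beta) * np.array([(-a) ** i for i in range(N)])
            basis = np.zeros((horizon, N))
            for k in range(horizon):
                basis[k] = L
                L = A @ L
            return basis

        B = laguerre_basis(a=0.6, N=4, horizon=30)
        # The predicted control signal is u(k) = B[k] @ eta, so the MPC optimizes
        # only the N coefficients eta instead of every future control move.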

  18. Evaluating Experiential Learning in the Business Context: Contributions to Group-Based and Cross-Functional Working

    Science.gov (United States)

    Piercy, Niall

    2013-01-01

    The use of experiential learning techniques has become popular in business education. Experiential learning approaches offer major benefits for teaching contemporary management practices such as cross-functional and team-based working. However, there remains relatively little empirical data on the success of experiential pedagogies in supporting…

  19. Assessing Adaptive Functioning in Death Penalty Cases after Hall and DSM-5.

    Science.gov (United States)

    Hagan, Leigh D; Drogin, Eric Y; Guilmette, Thomas J

    2016-03-01

    DSM-5 and Hall v. Florida (2014) have dramatically refocused attention on the assessment of adaptive functioning in death penalty cases. In this article, we address strategies for assessing the adaptive functioning of defendants who seek exemption from capital punishment pursuant to Atkins v. Virginia (2002). In particular, we assert that evaluations of adaptive functioning should address assets as well as deficits; seek to identify credible and reliable evidence concerning the developmental period and across the lifespan; distinguish incapacity from the mere absence of adaptive behavior; adhere faithfully to test manual instructions for using standardized measures of adaptive functioning; and account for potential bias on the part of informants. We conclude with brief caveats regarding the standard error of measurement (SEM) in light of Hall, with reference to examples of ordinary life activities that directly illuminate adaptive functioning relevant to capital cases. © 2016 American Academy of Psychiatry and the Law.

  20. Frequency adaptation in controlled stochastic resonance utilizing delayed feedback method: two-pole approximation for response function.

    Science.gov (United States)

    Tutu, Hiroki

    2011-06-01

    Stochastic resonance (SR) enhanced by time-delayed feedback control is studied. The system in the absence of control is described by a Langevin equation for a bistable system, and possesses a usual SR response. The control with the feedback loop, the delay time of which equals one-half of the period (2π/Ω) of the input signal, gives rise to a noise-induced oscillatory switching cycle between two states in the output time series, while its average frequency is just below Ω in the small-noise regime. As the noise intensity D approaches an appropriate level, the noise constructively works to adapt the frequency of the switching cycle to Ω, and this changes the dynamics into a state wherein the phase of the output signal is entrained to that of the input signal from its phase-slipped state. The behavior is characterized by the power loss of the external signal, or response function. This paper deals with the response function based on a dichotomic model. A method of delay-coordinate series expansion, which reduces a non-Markovian transition probability flux to a series of memory fluxes on a discrete delay-coordinate system, is proposed. Its primitive implementation suggests that the method can be a potential tool for a systematic analysis of the SR phenomenon with a delayed feedback loop. We show that the D-dependent behavior of the poles of a finite Laplace transform of the response function qualitatively characterizes the structure of the power loss, and we also show analytical results for the correlation function and the power spectral density.

  1. Alertness function of thalamus in conflict adaptation.

    Science.gov (United States)

    Wang, Xiangpeng; Zhao, Xiaoyue; Xue, Gui; Chen, Antao

    2016-05-15

    Conflict adaptation reflects the ability to improve current conflict resolution based on previously experienced conflict, which is crucial for our goal-directed behaviors. In recent years, the role of alertness has attracted increasing attention in discussions of the generation of conflict adaptation. However, due to the difficulty of manipulating alertness, very limited progress has been made in this line. Inspired by the finding that color may affect alertness, we manipulated the background color of the experimental task and found that conflict adaptation was significantly present in gray and red backgrounds but not in a blue background. Furthermore, behavioral and functional magnetic resonance imaging results revealed that the modulation of color on conflict adaptation was implemented through changing alertness level. In particular, the blue background eliminated conflict adaptation by damping the alertness regulating function of the thalamus and the functional connectivity between the thalamus and inferior frontal gyrus (IFG). In contrast, in gray and red backgrounds where alertness levels are typically high, the thalamus and the right IFG functioned normally and conflict adaptations were significant. Therefore, the alertness function of the thalamus is determinant to conflict adaptation, and the thalamus and right IFG are crucial nodes of the neural circuit subserving this ability. The present findings provide new insights into the neural mechanisms of conflict adaptation. Copyright © 2016 Elsevier Inc. All rights reserved.

  2. Can model-free reinforcement learning explain deontological moral judgments?

    Science.gov (United States)

    Ayars, Alisabeth

    2016-05-01

    Dual-systems frameworks propose that moral judgments are derived from both an immediate emotional response and controlled/rational cognition. Recently Cushman (2013) proposed a new dual-system theory based on model-free and model-based reinforcement learning. Model-free learning attaches values to actions based on their history of reward and punishment, and explains some deontological, non-utilitarian judgments. Model-based learning involves the construction of a causal model of the world and allows for far-sighted planning; this form of learning fits well with utilitarian considerations that seek to maximize certain kinds of outcomes. I present three concerns regarding the use of model-free reinforcement learning to explain deontological moral judgment. First, many actions that humans find aversive from model-free learning are not judged to be morally wrong. Moral judgment must require something in addition to model-free learning. Second, there is a dearth of evidence for central predictions of the reinforcement account, e.g., that people with different reinforcement histories will, all else equal, make different moral judgments. Finally, to account for the effect of intention within the framework requires certain assumptions which lack support. These challenges are reasonable foci for future empirical/theoretical work on the model-free/model-based framework. Copyright © 2016 Elsevier B.V. All rights reserved.

  3. Machine learning methods enable predictive modeling of antibody feature:function relationships in RV144 vaccinees.

    Science.gov (United States)

    Choi, Ickwon; Chung, Amy W; Suscovich, Todd J; Rerks-Ngarm, Supachai; Pitisuttithum, Punnee; Nitayaphan, Sorachai; Kaewkungwal, Jaranit; O'Connell, Robert J; Francis, Donald; Robb, Merlin L; Michael, Nelson L; Kim, Jerome H; Alter, Galit; Ackerman, Margaret E; Bailey-Kellogg, Chris

    2015-04-01

    The adaptive immune response to vaccination or infection can lead to the production of specific antibodies to neutralize the pathogen or recruit innate immune effector cells for help. The non-neutralizing role of antibodies in stimulating effector cell responses may have been a key mechanism of the protection observed in the RV144 HIV vaccine trial. In an extensive investigation of a rich set of data collected from RV144 vaccine recipients, we here employ machine learning methods to identify and model associations between antibody features (IgG subclass and antigen specificity) and effector function activities (antibody dependent cellular phagocytosis, cellular cytotoxicity, and cytokine release). We demonstrate via cross-validation that classification and regression approaches can effectively use the antibody features to robustly predict qualitative and quantitative functional outcomes. This integration of antibody feature and function data within a machine learning framework provides a new, objective approach to discovering and assessing multivariate immune correlates.
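
    A generic sketch of the evaluation pattern the paper describes, predicting a functional readout from antibody features and scoring the model by cross-validation; the data are random placeholders and the model choice is an assumption, not the paper's exact pipeline.

        import numpy as np
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        X = rng.normal(size=(100, 20))    # 100 vaccinees x 20 antibody features
        y = X[:, :3].sum(axis=1) + rng.normal(scale=0.5, size=100)  # toy activity

        model = RandomForestRegressor(n_estimators=200, random_state=0)
        scores = cross_val_score(model, X, y, cv=5, scoring="r2")
        print(scores.mean())              # cross-validated predictive accuracy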

  4. Machine learning methods enable predictive modeling of antibody feature:function relationships in RV144 vaccinees.

    Directory of Open Access Journals (Sweden)

    Ickwon Choi

    2015-04-01

    Full Text Available The adaptive immune response to vaccination or infection can lead to the production of specific antibodies to neutralize the pathogen or recruit innate immune effector cells for help. The non-neutralizing role of antibodies in stimulating effector cell responses may have been a key mechanism of the protection observed in the RV144 HIV vaccine trial. In an extensive investigation of a rich set of data collected from RV144 vaccine recipients, we here employ machine learning methods to identify and model associations between antibody features (IgG subclass and antigen specificity) and effector function activities (antibody dependent cellular phagocytosis, cellular cytotoxicity, and cytokine release). We demonstrate via cross-validation that classification and regression approaches can effectively use the antibody features to robustly predict qualitative and quantitative functional outcomes. This integration of antibody feature and function data within a machine learning framework provides a new, objective approach to discovering and assessing multivariate immune correlates.

  5. In Situ Preparation of Polyether Amine Functionalized MWCNT Nanofiller as Reinforcing Agents

    Directory of Open Access Journals (Sweden)

    Ayber Yıldrım

    2014-01-01

    Full Text Available In situ preparation of polyether amine functionalized cross-linked multiwalled carbon nanotube (MWCNT) nanofillers may improve the thermal and mechanical properties of the composites in which they are used as reinforcing agents. The reduction and functionalization of MWCNTs using ethylenediamine in the presence of polyether amine produced stitched MWCNTs due to the presence of two amine (–NH2) functionalities on both sides of the polymer. Polyether amine was chosen to polymerize the carboxylated MWCNTs due to its potential to form bonds with the amino groups and carboxyl groups of the MWCNTs, which produces a resin used as a polymeric matrix for nanocomposite materials. The attachment of the polyether amine (Jeffamine) groups was verified by TGA, FT-IR, XRD, SEM, and Raman spectroscopy. The temperature at which the curing enthalpy is maximum, observed by DSC, was shifted to higher values by adding functionalized MWCNTs. SEM images show the polymer formation between MWCNT sheets.

  6. A Conceptual Foundation for Measures of Physical Function and Behavioral Health Function for Social Security Work Disability Evaluation

    Science.gov (United States)

    Marfeo, Elizabeth E.; Haley, Stephen M.; Jette, Alan M.; Eisen, Susan V.; Ni, Pengsheng; Bogusz, Kara; Meterko, Mark; McDonough, Christine M.; Chan, Leighton; Brandt, Diane E.; Rasch, Elizabeth K.

    2014-01-01

    Physical and mental impairments represent the two largest health condition categories for which workers receive Social Security disability benefits. Comprehensive assessment of physical and mental impairments should include aspects beyond medical conditions, such as a person’s underlying capabilities as well as activity demands relevant to the context of work. The objective of this paper is to describe the initial conceptual stages of developing new measurement instruments of behavioral health and physical functioning relevant for Social Security work disability evaluation purposes. To outline a clear conceptualization of the constructs to be measured, two content models were developed using structured and informal qualitative approaches. We performed a structured literature review focusing on work disability and incorporating aspects of the International Classification of Functioning, Disability, and Health (ICF) as a unifying taxonomy for framework development. Expert interviews provided advice and consultation to enhance face validity of the resulting content models. The content model for work-related behavioral health function identifies five major domains: (1) Behavior Control, (2) Basic Interactions, (3) Temperament and Personality, (4) Adaptability, and (5) Workplace Behaviors. The content model describing physical functioning includes three domains: (1) Changing and Maintaining Body Position, (2) Whole Body Mobility, and (3) Carrying, Moving and Handling Objects. These content models informed subsequent measurement properties including item development and measurement scale construction, and provided conceptual coherence guiding future empirical inquiry. The proposed measurement approaches show promise to comprehensively and systematically assess physical and behavioral health functioning relevant to work. PMID:23548543

  7. The dynamic interplay among EFL learners’ ambiguity tolerance, adaptability, cultural intelligence, learning approach, and language achievement

    Directory of Open Access Journals (Sweden)

    Shadi Alahdadi

    2017-01-01

    Full Text Available A key objective of education is to prepare individuals to be fully-functioning learners. This entails developing cognitive, metacognitive, motivational, cultural, and emotional competencies. The present study aimed to examine the interrelationships among adaptability, tolerance of ambiguity, cultural intelligence, learning approach, and language achievement as manifestations of the above competencies within a single model. The participants comprised one hundred eighty BA and MA Iranian university students studying English language teaching and translation. The instruments used in this study consisted of the translated versions of four questionnaires: the second language tolerance of ambiguity scale, the adaptability scale taken from an emotional intelligence inventory, a cultural intelligence (CQ) inventory, and the revised study process questionnaire measuring surface and deep learning. The results, estimated via structural equation modeling (SEM), revealed that the proposed model containing the variables under study had a good fit with the data. It was found that all the variables except adaptability directly influenced language achievement, with the deep approach having the highest impact and ambiguity tolerance the lowest. In addition, ambiguity tolerance was a positive and significant predictor of the deep approach. CQ was found to be under the influence of both ambiguity tolerance and adaptability. The findings were discussed in light of these results.

  8. Adaptive and Energy Efficient Walking in a Hexapod Robot under Neuromechanical Control and Sensorimotor Learning

    DEFF Research Database (Denmark)

    Xiong, Xiaofeng; Wörgötter, Florentin; Manoonpong, Poramate

    2016-01-01

    The control of multilegged animal walking is a neuromechanical process, and achieving it in an adaptive and energy-efficient way is a difficult and challenging problem. This is due to the fact that this process needs, in real time: 1) to coordinate the many degrees of freedom of jointed legs; 2) to generate the proper leg stiffness (i.e., compliance); and 3) to determine joint angles that give rise to particular positions at the endpoints of the legs. To tackle this problem for a robotic application, here we present a neuromechanical controller coupled with sensorimotor learning. The controller … energy-efficient walking, compared to other small legged robots. In addition, this paper also shows that the tight combination of neural control with tunable muscle-like functions, guided by sensory feedback and coupled with sensorimotor learning, is a way forward to better understand and solve adaptive…

  9. A learning-based semi-autonomous controller for robotic exploration of unknown disaster scenes while searching for victims.

    Science.gov (United States)

    Doroodgar, Barzin; Liu, Yugang; Nejat, Goldie

    2014-12-01

    Semi-autonomous control schemes can address the limitations of both teleoperation and fully autonomous robotic control of rescue robots in disaster environments by allowing a human operator to cooperate with a rescue robot and share tasks such as navigation, exploration, and victim identification. In this paper, we present a unique hierarchical reinforcement learning (HRL)-based semi-autonomous control architecture for rescue robots operating in cluttered and unknown urban search and rescue (USAR) environments. The aim of the controller is to enable a rescue robot to continuously learn from its own experiences in an environment in order to improve its overall performance in exploration of unknown disaster scenes. A direction-based exploration technique is integrated in the controller to expand the search area of the robot via the classification of regions and the rubble piles within these regions. Both simulations and physical experiments in USAR-like environments verify the robustness of the proposed HRL-based semi-autonomous controller to unknown cluttered scenes with different sizes and varying types of configurations.

  10. Approaches to Learning to Control Dynamic Uncertainty

    Directory of Open Access Journals (Sweden)

    Magda Osman

    2015-10-01

    Full Text Available In dynamic environments, when faced with a choice of which learning strategy to adopt, do people choose mostly to explore (maximizing their long-term gains) or to exploit (maximizing their short-term gains)? More to the point, how does this choice of learning strategy influence one's later ability to control the environment? In the present study, we explore whether people's self-reported learning strategies and levels of arousal (i.e., surprise, stress) correspond to performance measures of controlling a Highly Uncertain or Moderately Uncertain dynamic environment. Generally, self-reports suggest a preference for exploring the environment to begin with, after which those in the Highly Uncertain environment generally indicated that they exploited more than those in the Moderately Uncertain environment; this difference did not affect performance on later tests of people's ability to control the dynamic environment. Levels of arousal were also differentially associated with the uncertainty of the environment. Going beyond the behavioral data, our model of dynamic decision-making revealed that, in actual fact, there was no difference in exploitation levels between those in the Highly Uncertain and Moderately Uncertain environments, but there were differences based on sensitivity to negative reinforcement. We consider the implications of our findings with respect to learning and strategic approaches to controlling dynamic uncertainty.
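
    The explore/exploit trade-off at the heart of this record can be illustrated with an epsilon-greedy learner whose exploration rate decays over time. The two-option payoff environment below is an invented stand-in for the study's dynamic control task.

```python
# Minimal explore/exploit illustration (not the authors' task): an
# epsilon-greedy agent learning a noisy two-option environment.
import numpy as np

rng = np.random.default_rng(1)
true_payoff = np.array([0.3, 0.7])   # hidden payoff probabilities
q = np.zeros(2)                      # running value estimates
counts = np.zeros(2)

def choose(epsilon):
    # explore with probability epsilon, otherwise exploit current estimates
    if rng.random() < epsilon:
        return int(rng.integers(2))
    return int(np.argmax(q))

for t in range(1000):
    epsilon = max(0.05, 1.0 - t / 500)        # explore early, exploit later
    a = choose(epsilon)
    r = float(rng.random() < true_payoff[a])  # stochastic reward
    counts[a] += 1
    q[a] += (r - q[a]) / counts[a]            # incremental mean update

print("value estimates:", q.round(2))
```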

  11. Adaptive modified function projective synchronization of multiple time-delayed chaotic Rossler system

    International Nuclear Information System (INIS)

    Sudheer, K. Sebastian; Sabir, M.

    2011-01-01

    In this Letter we consider modified function projective synchronization of unidirectionally coupled multiple time-delayed Rossler chaotic systems using adaptive control. Recently, delay differential equations have attracted much attention in the field of nonlinear dynamics. The high complexity of multiple time-delayed systems can provide a new architecture for enhancing message security in chaos-based encryption systems. Adaptive control can be used for synchronization when the parameters of the system are unknown. Based on Lyapunov stability theory, the adaptive control law and the parameter update law are derived so that the states of the two chaotic systems become modified function projective synchronized. Numerical simulations are presented to demonstrate the effectiveness of the proposed adaptive controllers.

  12. Chaos Synchronization Using Adaptive Dynamic Neural Network Controller with Variable Learning Rates

    Directory of Open Access Journals (Sweden)

    Chih-Hong Kao

    2011-01-01

    Full Text Available This paper addresses the synchronization of chaotic gyros with unknown parameters and external disturbance via an adaptive dynamic neural network control (ADNNC) system. The proposed ADNNC system is composed of a neural controller and a smooth compensator. The neural controller uses a dynamic RBF (DRBF) network to approximate an ideal controller online. The DRBF network can create new hidden neurons online if the input data falls outside the hidden layer, and prune insignificant hidden neurons online if they become inappropriate. The smooth compensator is designed to compensate for the approximation error between the neural controller and the ideal controller. Moreover, the variable learning rates of the parameter adaptation laws are derived based on a discrete-type Lyapunov function to speed up the convergence rate of the tracking error. Finally, simulation results verify that two identical nonlinear chaotic gyros can be synchronized using the proposed ADNNC scheme.

  13. Arctigenin from Fructus Arctii (Seed of Burdock) Reinforces Intestinal Barrier Function in Caco-2 Cell Monolayers

    Science.gov (United States)

    Shin, Hee Soon; Jung, Sun Young; Back, Su Yeon; Do, Jeong-Ryong; Shon, Dong-Hwa

    2015-01-01

    Fructus Arctii is used as a traditional herbal medicine to treat inflammatory diseases in oriental countries. This study aimed to investigate the effect of F. Arctii extract on intestinal barrier function in human intestinal epithelial Caco-2 cells and to reveal the active component of F. Arctii. We measured the transepithelial electrical resistance (TEER) value (as an index of barrier function) and ovalbumin (OVA) permeation (as an index of permeability) to observe changes in intestinal barrier function. Treatment with F. Arctii increased the TEER value and decreased OVA influx across Caco-2 cell monolayers. Furthermore, we found that arctigenin, an active component of F. Arctii, increased the TEER value and reduced OVA permeability from the apical to the basolateral side, whereas arctiin did not. In the present study, we revealed that F. Arctii could enhance intestinal barrier function and that arctigenin was the active component responsible for this functionality. We expect that arctigenin from F. Arctii could contribute to the prevention of inflammatory, allergic, and infectious diseases by reinforcing intestinal barrier function. PMID:26550018

  14. Arctigenin from Fructus Arctii (Seed of Burdock Reinforces Intestinal Barrier Function in Caco-2 Cell Monolayers

    Directory of Open Access Journals (Sweden)

    Hee Soon Shin

    2015-01-01

    Full Text Available Fructus Arctii is used as a traditional herbal medicine to treat inflammatory diseases in oriental countries. This study aimed to investigate the effect of F. Arctii extract on intestinal barrier function in human intestinal epithelial Caco-2 cells and to reveal the active component of F. Arctii. We measured the transepithelial electrical resistance (TEER) value (as an index of barrier function) and ovalbumin (OVA) permeation (as an index of permeability) to observe changes in intestinal barrier function. Treatment with F. Arctii increased the TEER value and decreased OVA influx across Caco-2 cell monolayers. Furthermore, we found that arctigenin, an active component of F. Arctii, increased the TEER value and reduced OVA permeability from the apical to the basolateral side, whereas arctiin did not. In the present study, we revealed that F. Arctii could enhance intestinal barrier function and that arctigenin was the active component responsible for this functionality. We expect that arctigenin from F. Arctii could contribute to the prevention of inflammatory, allergic, and infectious diseases by reinforcing intestinal barrier function.

  15. RLAM: A Dynamic and Efficient Reinforcement Learning-Based Adaptive Mapping Scheme in Mobile WiMAX Networks

    Directory of Open Access Journals (Sweden)

    M. Louta

    2014-01-01

    Full Text Available WiMAX (Worldwide Interoperability for Microwave Access) constitutes a candidate networking technology towards the 4G vision realization. By adopting the Orthogonal Frequency Division Multiple Access (OFDMA) technique, the latest IEEE 802.16x amendments manage to provide QoS-aware access services with full mobility support. A number of interesting scheduling and mapping schemes have been proposed in the research literature. However, they neglect a considerable asset of OFDMA-based wireless systems: the dynamic adjustment of the downlink-to-uplink width ratio. In order to fully exploit the supported mobile WiMAX features, we design, develop, and evaluate a rigorous adaptive model, which inherits its main aspects from the reinforcement learning field. The proposed model endeavours to efficiently determine the downlink-to-uplink width ratio, on a frame-by-frame basis, taking into account both the downlink and uplink traffic in the Base Station (BS). Extensive evaluation results indicate that the proposed model succeeds in providing quite accurate estimations, keeping the average error rate below 15% with respect to the optimal sub-frame configurations. Additionally, it presents improved performance compared to other learning methods (e.g., learning automata) and notable improvements compared to static schemes that maintain a fixed predefined ratio, in terms of service ratio and resource utilization.
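
    A hedged sketch of the core idea, treating the per-frame sub-frame split as an action scored by a learned value, is given below. The traffic model, reward, and candidate ratios are invented stand-ins rather than the RLAM specification.

```python
# Toy reinforcement learning rule for choosing a downlink-to-uplink
# sub-frame split each frame. The traffic model and reward are stand-ins.
import numpy as np

rng = np.random.default_rng(2)
ratios = np.linspace(0.3, 0.7, 9)    # candidate downlink shares of the frame
q = np.zeros(len(ratios))            # learned value of each candidate split
alpha, epsilon = 0.1, 0.1

for frame in range(5000):
    dl_demand = rng.uniform(0.4, 0.6)   # stand-in downlink traffic share at the BS
    a = int(rng.integers(len(ratios))) if rng.random() < epsilon else int(np.argmax(q))
    # reward: how well the chosen split matches the actual demand (toy metric)
    r = 1.0 - abs(ratios[a] - dl_demand)
    q[a] += alpha * (r - q[a])          # incremental value update

print("preferred downlink share:", ratios[int(np.argmax(q))])
```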

  16. Conceptual foundation for measures of physical function and behavioral health function for Social Security work disability evaluation.

    Science.gov (United States)

    Marfeo, Elizabeth E; Haley, Stephen M; Jette, Alan M; Eisen, Susan V; Ni, Pengsheng; Bogusz, Kara; Meterko, Mark; McDonough, Christine M; Chan, Leighton; Brandt, Diane E; Rasch, Elizabeth K

    2013-09-01

    Physical and mental impairments represent the 2 largest health condition categories for which workers receive Social Security disability benefits. Comprehensive assessment of physical and mental impairments should include aspects beyond medical conditions such as a person's underlying capabilities as well as activity demands relevant to the context of work. The objective of this article is to describe the initial conceptual stages of developing new measurement instruments of behavioral health and physical functioning relevant for Social Security work disability evaluation purposes. To outline a clear conceptualization of the constructs to be measured, 2 content models were developed using structured and informal qualitative approaches. We performed a structured literature review focusing on work disability and incorporating aspects of the International Classification of Functioning, Disability and Health as a unifying taxonomy for framework development. Expert interviews provided advice and consultation to enhance face validity of the resulting content models. The content model for work-related behavioral health function identifies 5 major domains: (1) behavior control, (2) basic interactions, (3) temperament and personality, (4) adaptability, and (5) workplace behaviors. The content model describing physical functioning includes 3 domains: (1) changing and maintaining body position, (2) whole-body mobility, and (3) carrying, moving, and handling objects. These content models informed subsequent measurement properties including item development and measurement scale construction, and provided conceptual coherence guiding future empirical inquiry. The proposed measurement approaches show promise to comprehensively and systematically assess physical and behavioral health functioning relevant to work. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.

  17. Production functions for climate policy modeling. An empirical analysis

    International Nuclear Information System (INIS)

    Van der Werf, Edwin

    2008-01-01

    Quantitative models for climate policy modeling differ in the production structure used and in the sizes of the elasticities of substitution. The empirical foundation for both is generally lacking. This paper estimates the parameters of 2-level CES production functions with capital, labour and energy as inputs, and is the first to systematically compare all nesting structures. Using industry-level data from 12 OECD countries, we find that the nesting structure where capital and labour are combined first, fits the data best, but for most countries and industries we cannot reject that all three inputs can be put into one single nest. These two nesting structures are used by most climate models. However, while several climate policy models use a Cobb-Douglas function for (part of the) production function, we reject elasticities equal to one, in favour of considerably smaller values. Finally we find evidence for factor-specific technological change. With lower elasticities and with factor-specific technological change, some climate policy models may find a bigger effect of endogenous technological change on mitigating the costs of climate policy. (author)
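
    For concreteness, one of the nesting structures the paper compares, the (KL)E structure in which capital and labour are combined first and then nested with energy, can be written as follows. The notation is a common CES convention rather than the paper's exact symbols; the Cobb-Douglas case the authors reject corresponds to substitution elasticities equal to one.

```latex
% 2-level CES production function, (KL)E nesting (assumed notation):
% sigma_i are elasticities of substitution; Cobb-Douglas is the limit sigma_i -> 1.
\[
  Y \;=\; \Bigl[\, a\,\bigl(\alpha K^{\rho_1} + (1-\alpha)\,L^{\rho_1}\bigr)^{\rho_2/\rho_1}
        \;+\; (1-a)\,E^{\rho_2} \Bigr]^{1/\rho_2},
  \qquad
  \rho_i \;=\; \frac{\sigma_i - 1}{\sigma_i}.
\]
```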

  18. Emotion in reinforcement learning agents and robots : A survey

    NARCIS (Netherlands)

    Moerland, T.M.; Broekens, D.J.; Jonker, C.M.

    2018-01-01

    This article provides the first survey of computational models of emotion in reinforcement learning (RL) agents. The survey focuses on agent/robot emotions, and mostly ignores human user emotions. Emotions are recognized as functional in decision-making by influencing motivation and action

  19. Solution Approach to Automatic Generation Control Problem Using Hybridized Gravitational Search Algorithm Optimized PID and FOPID Controllers

    Directory of Open Access Journals (Sweden)

    DAHIYA, P.

    2015-05-01

    Full Text Available This paper presents the application of a hybrid opposition-based disruption operator in the gravitational search algorithm (DOGSA) to solve the automatic generation control (AGC) problem of a four-area hydro-thermal-gas interconnected power system. The proposed DOGSA approach combines the advantages of opposition-based learning, which enhances the speed of convergence, and of the disruption operator, which has the ability to further explore and exploit the search space of the standard gravitational search algorithm (GSA). The addition of these two concepts to GSA increases its flexibility for solving complex optimization problems. This paper addresses the design and performance analysis of DOGSA-based proportional integral derivative (PID) and fractional order proportional integral derivative (FOPID) controllers for the automatic generation control problem. The proposed approaches are demonstrated by comparing the results with the standard GSA, opposition learning based GSA (OGSA) and disruption based GSA (DGSA). A sensitivity analysis is also carried out to study the robustness of the DOGSA-tuned controllers in accommodating variations in operating load conditions, tie-line synchronizing coefficient, and time constants of the governor and turbine. Further, the approaches are extended to a more realistic power system model by considering physical constraints such as the thermal turbine generation rate constraint, speed governor dead band, and time delay.
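
    The core GSA mechanics behind the tuned controllers can be sketched compactly: fitness-derived masses attract agents toward good regions of the gain space. The sketch below is plain GSA on a stand-in objective; the paper's opposition-based learning and disruption operator are omitted, the target gains are invented, and a real application would replace the objective with a time-domain AGC cost.

```python
# Compact gravitational search algorithm (GSA) sketch minimizing a stand-in
# objective; in the paper the decision variables would be PID/FOPID gains
# and the objective a time-domain cost of the AGC loop. Hypothetical setup.
import numpy as np

rng = np.random.default_rng(3)

def objective(x):
    # stand-in for e.g. an integral error criterion of the AGC response
    return np.sum((x - np.array([1.5, 0.8, 0.2]))**2, axis=-1)

n_agents, dim, iters = 20, 3, 100     # dim = (Kp, Ki, Kd), for instance
X = rng.uniform(0, 3, size=(n_agents, dim))
V = np.zeros_like(X)

for t in range(iters):
    f = objective(X)
    best, worst = f.min(), f.max()
    m = (f - worst) / (best - worst + 1e-12)      # better fitness -> larger mass
    M = m / (m.sum() + 1e-12)
    G = 100 * np.exp(-20 * t / iters)             # decaying gravitational constant
    acc = np.zeros_like(X)
    for i in range(n_agents):
        diff = X - X[i]
        dist = np.linalg.norm(diff, axis=1) + 1e-12
        # randomly weighted sum of attractions toward every other agent
        acc[i] = (G * M[:, None] * diff / dist[:, None]
                  * rng.random((n_agents, 1))).sum(axis=0)
    V = rng.random(X.shape) * V + acc             # stochastic inertia + acceleration
    X = X + V

print("best gains found:", X[np.argmin(objective(X))].round(3))
```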

  20. Air-Breathing Hypersonic Vehicle Tracking Control Based on Adaptive Dynamic Programming.

    Science.gov (United States)

    Mu, Chaoxu; Ni, Zhen; Sun, Changyin; He, Haibo

    2017-03-01

    In this paper, we propose a data-driven supplementary control approach with adaptive learning capability for air-breathing hypersonic vehicle tracking control based on action-dependent heuristic dynamic programming (ADHDP). The control action is generated by the combination of sliding mode control (SMC) and the ADHDP controller to track the desired velocity and the desired altitude. In particular, the ADHDP controller observes the differences between the actual velocity/altitude and the desired velocity/altitude, and then provides a supplementary control action accordingly. The ADHDP controller does not rely on an accurate mathematical model and is data-driven. Meanwhile, it is capable of adjusting its parameters online over time under various working conditions, which makes it very suitable for a hypersonic vehicle system with parameter uncertainties and disturbances. We verify the adaptive supplementary control approach against the traditional SMC in cruising flight, and provide three simulation studies to illustrate the improved performance of the proposed approach.
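
    A minimal flavour of ADHDP can be shown on a toy scalar tracking loop: a critic learns an action-dependent cost-to-go from data, and the actor's gain is nudged against the critic's gradient. The plant, gains, and cost weights below are invented; the paper's vehicle dynamics and network structures are far richer.

```python
# Minimal action-dependent HDP (ADHDP) flavour on a toy scalar tracking task:
# a linear critic estimates the cost-to-go Q(e, u) from data, and the actor
# adjusts its gain using the critic's gradient. All numbers are invented.
import numpy as np

rng = np.random.default_rng(4)
k_actor = 0.0                 # supplementary feedback gain being learned
w = np.zeros(3)               # critic weights on features [e^2, e*u, u^2]
alpha_c, alpha_a, gamma = 0.05, 0.01, 0.9

def features(e, u):
    return np.array([e * e, e * u, u * u])

e = 1.0                                        # initial tracking error
for step in range(3000):
    u = -k_actor * e + 0.05 * rng.normal()     # action with exploration noise
    e_next = 0.9 * e + 0.5 * u                 # plant, unknown to the learner
    cost = e * e + 0.1 * u * u                 # instantaneous tracking cost
    u_next = -k_actor * e_next
    # critic: temporal-difference update toward cost + gamma * Q(next)
    td = cost + gamma * w @ features(e_next, u_next) - w @ features(e, u)
    w += alpha_c * td * features(e, u)
    # actor: descend dQ/du = w1*e + 2*w2*u; since du/dk = -e, dQ/dk = -dq_du*e
    dq_du = w[1] * e + 2 * w[2] * u
    k_actor = float(np.clip(k_actor + alpha_a * dq_du * e, 0.0, 2.0))  # keep toy gain bounded
    e = e_next if abs(e_next) > 1e-3 else 1.0  # restart episodes near zero

print("learned supplementary gain:", round(k_actor, 3))
```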

  1. Adaptive radial basis function mesh deformation using data reduction

    Science.gov (United States)

    Gillebaart, T.; Blom, D. S.; van Zuijlen, A. H.; Bijl, H.

    2016-09-01

    Radial Basis Function (RBF) mesh deformation is one of the most robust mesh deformation methods available. Using the greedy (data reduction) method in combination with an explicit boundary correction results in an efficient method, as shown in the literature. However, to ensure the method remains robust, two issues are addressed: 1) how to ensure that the set of control points remains an accurate representation of the geometry in time, and 2) how to use and automate the explicit boundary correction while ensuring a high mesh quality. In this paper, we propose an adaptive RBF mesh deformation method which ensures that the set of control points always represents the geometry/displacement up to a certain (user-specified) criterion, by keeping track of the boundary error throughout the simulation and re-selecting control points when needed. As opposed to the unit displacement and prescribed displacement selection methods, the adaptive method is more robust, user-independent and efficient for the cases considered. Secondly, the analysis of a single high-aspect-ratio cell is used to formulate an equation for the correction radius needed, depending on the characteristics of the correction function used, maximum aspect ratio, minimum first cell height and boundary error. Based on this analysis, two new radial basis correction functions are derived and proposed. The proposed automated procedure is verified while varying the correction function, Reynolds number (and thus first cell height and aspect ratio) and boundary error. Finally, the parallel efficiency is studied for the two adaptive methods, unit displacement and prescribed displacement, for both the CPU and the memory formulation, with a 2D oscillating and translating airfoil with oscillating flap, a 3D flexible locally deforming tube and a deforming wind turbine blade. Generally, the memory formulation requires less work (due to the large amount of work required for evaluating RBFs), but the parallel efficiency reduces due to the limited
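
    The greedy data-reduction step is the easiest part to illustrate: keep adding the worst-fitted boundary node to the control-point set until the interpolated displacement matches the prescribed one within a tolerance. The toy 2-D boundary and SciPy's RBFInterpolator below stand in for the paper's CFD meshes and kernel choices.

```python
# Sketch of the greedy data-reduction idea behind RBF mesh deformation:
# grow the control-point set until the boundary displacement is reproduced
# within a tolerance. The geometry is a toy 2-D boundary, not a CFD mesh.
import numpy as np
from scipy.interpolate import RBFInterpolator

theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
boundary = np.c_[np.cos(theta), np.sin(theta)]                # boundary nodes
disp = np.c_[0.1 * np.sin(3 * theta), np.zeros_like(theta)]   # prescribed displacement

selected = [0, 50, 100, 150]          # small initial control-point set
tol = 1e-3
while True:
    rbf = RBFInterpolator(boundary[selected], disp[selected],
                          kernel="thin_plate_spline")
    err = np.linalg.norm(rbf(boundary) - disp, axis=1)
    worst = int(np.argmax(err))
    if err[worst] < tol:
        break
    selected.append(worst)            # greedy step: add the worst-fitted node

print(f"{len(selected)} of {len(boundary)} boundary nodes kept as control points")
```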

  2. Teachers' Understanding of the Role of Executive Functions in Mathematics Learning.

    Science.gov (United States)

    Gilmore, Camilla; Cragg, Lucy

    2014-09-01

    Cognitive psychology research has suggested an important role for executive functions, the set of skills that monitor and control thought and action, in learning mathematics. However, there is currently little evidence about whether teachers are aware of the importance of these skills and, if so, how they come by this information. We conducted an online survey of teachers' views on the importance of a range of skills for mathematics learning. Teachers rated executive function skills, and in particular inhibition and shifting, to be important for mathematics. The value placed on executive function skills increased with increasing teaching experience. Most teachers reported that they were aware of these skills, although few knew the term "executive functions." This awareness had come about through their teaching experience rather than from formal instruction. Researchers and teacher educators could do more to highlight the importance of these skills to trainee or new teachers.

  3. Policy implications for familial searching.

    Science.gov (United States)

    Kim, Joyce; Mammo, Danny; Siegel, Marni B; Katsanis, Sara H

    2011-11-01

    In the United States, several states have made policy decisions regarding whether and how to use familial searching of the Combined DNA Index System (CODIS) database in criminal investigations. Familial searching pushes DNA typing beyond merely identifying individuals to detecting genetic relatedness, an application previously reserved for missing persons identifications and custody battles. The intentional search of CODIS for partial matches to an item of evidence offers law enforcement agencies a powerful tool for developing investigative leads, apprehending criminals, revitalizing cold cases and exonerating wrongfully convicted individuals. As familial searching involves a range of logistical, social, ethical and legal considerations, states are now grappling with policy options for implementing familial searching to balance crime fighting with its potential impact on society. When developing policies for familial searching, legislators should take into account the impact of familial searching on select populations and the need to minimize personal intrusion on relatives of individuals in the DNA database. This review describes the approaches used to narrow a suspect pool from a partial match search of CODIS and summarizes the economic, ethical, logistical and political challenges of implementing familial searching. We examine particular US state policies and the policy options adopted to address these issues. The aim of this review is to provide objective background information on the controversial approach of familial searching to inform policy decisions in this area. Herein we highlight key policy options and recommendations regarding effective utilization of familial searching that minimize harm to and afford maximum protection of US citizens.

  4. An adaptive bin framework search method for a beta-sheet protein homopolymer model

    Directory of Open Access Journals (Sweden)

    Hoos Holger H

    2007-04-01

    Full Text Available Abstract. Background: The problem of protein structure prediction consists of predicting the functional or native structure of a protein given its linear sequence of amino acids. This problem has played a prominent role in the fields of biomolecular physics and algorithm design for over 50 years. Additionally, its importance increases continually as a result of an exponential growth over time in the number of known protein sequences, in contrast to a linear increase in the number of determined structures. Our work focuses on the problem of searching an exponentially large space of possible conformations as efficiently as possible, with the goal of finding a global optimum with respect to a given energy function. This problem plays an important role in the analysis of systems with complex search landscapes, and particularly in the context of ab initio protein structure prediction. Results: In this work, we introduce a novel approach for solving this conformation search problem based on the use of a bin framework for adaptively storing and retrieving promising locally optimal solutions. Our approach provides a rich and general framework within which a broad range of adaptive or reactive search strategies can be realized. Here, we introduce adaptive mechanisms for choosing which conformations should be stored, based on the set of conformations already stored in memory, and for biasing choices when retrieving conformations from memory in order to overcome search stagnation. Conclusion: We show that our bin framework combined with a widely used optimization method, Monte Carlo search, achieves significantly better performance than state-of-the-art generalized ensemble methods for a well-known protein-like homopolymer model on the face-centered cubic lattice.
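
    The bin-framework idea, storing promising local optima during a Metropolis-style search and restarting from them on stagnation, is sketched below on a rugged 1-D stand-in for a folding energy; the real method's bin structure and lattice model are considerably more elaborate.

```python
# Toy rendition of the bin-framework idea: a Monte Carlo search over a rugged
# 1-D landscape that stores promising solutions in a bounded memory and
# restarts from it when progress stalls. The landscape is a stand-in energy.
import numpy as np

rng = np.random.default_rng(6)

def energy(x):
    # rugged stand-in for a protein-like energy function
    return 0.1 * x * x + np.sin(5 * x)

bins = []                       # memory of promising conformations
x = rng.uniform(-5, 5)
best, stall = x, 0
for step in range(20000):
    x_new = x + 0.3 * rng.normal()                   # Monte Carlo move
    dE = energy(x_new) - energy(x)
    if dE < 0 or rng.random() < np.exp(-dE / 0.5):   # Metropolis acceptance
        x, stall = x_new, 0
    else:
        stall += 1
    if energy(x) < energy(best):
        best = x
        bins.append(x)                   # store promising solution
        bins = bins[-20:]                # bounded memory
    if stall > 200 and bins:             # stagnation: retrieve from memory
        x = float(rng.choice(bins)) + 0.5 * rng.normal()
        stall = 0

print(f"best x = {best:.3f}, energy = {energy(best):.3f}")
```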

  5. Appraising Adaptive Management

    Directory of Open Access Journals (Sweden)

    Kai N. Lee

    1999-12-01

    Full Text Available Adaptive management is appraised as a policy implementation approach by examining its conceptual, technical, equity, and practical strengths and limitations. Three conclusions are drawn: (1) Adaptive management has been more influential, so far, as an idea than as a practical means of gaining insight into the behavior of ecosystems utilized and inhabited by humans. (2) Adaptive management should be used only after disputing parties have agreed to an agenda of questions to be answered using the adaptive approach; this is not how the approach has been used. (3) Efficient, effective social learning, of the kind facilitated by adaptive management, is likely to be of strategic importance in governing ecosystems as humanity searches for a sustainable economy.

  6. Robust Adaptive Sliding Mode Control for Generalized Function Projective Synchronization of Different Chaotic Systems with Unknown Parameters

    Directory of Open Access Journals (Sweden)

    Xiuchun Li

    2013-01-01

    Full Text Available When the parameters of both the drive and response systems are all unknown, an adaptive sliding mode controller, strongly robust to exotic perturbations, is designed for realizing generalized function projective synchronization. A sliding mode surface is given, and the controlled system is asymptotically stable on this surface with the passage of time. Based on the adaptation laws and Lyapunov stability theory, an adaptive sliding controller is designed to ensure the occurrence of the sliding motion. Finally, numerical simulations are presented to verify the effectiveness and robustness of the proposed method even when both drive and response systems are perturbed with external disturbances.

  7. Patients with Parkinson's disease learn to control complex systems-an indication for intact implicit cognitive skill learning.

    Science.gov (United States)

    Witt, Karsten; Daniels, Christine; Daniel, Victoria; Schmitt-Eliassen, Julia; Volkmann, Jens; Deuschl, Günther

    2006-01-01

    Implicit memory and learning mechanisms are composed of multiple processes and systems. Previous studies demonstrated a basal ganglia involvement in purely cognitive tasks that form stimulus response habits by reinforcement learning such as implicit classification learning. We will test the basal ganglia influence on two cognitive implicit tasks previously described by Berry and Broadbent, the sugar production task and the personal interaction task. Furthermore, we will investigate the relationship between certain aspects of an executive dysfunction and implicit learning. To this end, we have tested 22 Parkinsonian patients and 22 age-matched controls on two implicit cognitive tasks, in which participants learned to control a complex system. They interacted with the system by choosing an input value and obtaining an output that was related in a complex manner to the input. The objective was to reach and maintain a specific target value across trials (dynamic system learning). The two tasks followed the same underlying complex rule but had different surface appearances. Subsequently, participants performed an executive test battery including the Stroop test, verbal fluency and the Wisconsin card sorting test (WCST). The results demonstrate intact implicit learning in patients, despite an executive dysfunction in the Parkinsonian group. They lead to the conclusion that the basal ganglia system affected in Parkinson's disease does not contribute to the implicit acquisition of a new cognitive skill. Furthermore, the Parkinsonian patients were able to reach a specific goal in an implicit learning context despite impaired goal directed behaviour in the WCST, a classic test of executive functions. These results demonstrate a functional independence of implicit cognitive skill learning and certain aspects of executive functions.

  8. Support patient search on pathology reports with interactive online learning based data extraction

    Directory of Open Access Journals (Sweden)

    Shuai Zheng

    2015-01-01

    Full Text Available Background: Structured reporting enables semantic understanding and prompt retrieval of clinical findings about patients. While synoptic pathology reporting provides templates for data entries, information in pathology reports remains primarily in narrative free-text form. Extracting data of interest from narrative pathology reports could significantly improve the representation of the information and enable complex structured queries. However, manual extraction is tedious and error-prone, and automated tools are often constructed with a fixed training dataset and are not easily adaptable. Our goal is to extract data from pathology reports to support advanced patient search with a highly adaptable semi-automated data extraction system, which can adjust and self-improve by learning from a user's interactions with minimal human effort. Methods: We have developed an online machine learning based information extraction system called IDEAL-X. With its graphical user interface, the system's data extraction engine automatically annotates values for users to review upon loading each report text. The system analyzes users' corrections to these annotations with online machine learning, and incrementally enhances and refines the learning model as reports are processed. The system also takes advantage of customized controlled vocabularies, which can be adaptively refined during the online learning process to further assist the data extraction. As the accuracy of automatic annotation improves over time, the effort of human annotation is gradually reduced. After all reports are processed, a built-in query engine can be applied to conveniently define queries based on the extracted structured data. Results: We have evaluated the system with a dataset of anatomic pathology reports from 50 patients. Extracted data elements include demographical data, diagnosis, genetic marker, and procedure. The system achieves F-1 scores of around 95% for the majority of
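
    The review-then-update loop IDEAL-X describes maps naturally onto incremental learning. The sketch below uses scikit-learn's partial_fit as a stand-in for the system's online learner; the reports, labels, and features are invented.

```python
# Sketch of an online-learning extraction loop in the spirit of IDEAL-X:
# propose an annotation, accept the user's correction, update incrementally.
# The reports, labels, and hashing features are invented stand-ins.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

vec = HashingVectorizer(n_features=2**12)
clf = SGDClassifier(loss="log_loss")
classes = ["diagnosis", "procedure", "other"]

reports = [("adenocarcinoma of the colon", "diagnosis"),
           ("laparoscopic resection performed", "procedure"),
           ("patient tolerated the procedure well", "other"),
           ("invasive ductal carcinoma identified", "diagnosis")]

for i, (text, true_label) in enumerate(reports):   # stream of report snippets
    X = vec.transform([text])
    if i > 0:                                      # system proposes, user corrects
        print("proposed:", clf.predict(X)[0], "| corrected to:", true_label)
    clf.partial_fit(X, [true_label], classes=classes)  # learn from the review
```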

  9. A novel model of motor learning capable of developing an optimal movement control law online from scratch.

    Science.gov (United States)

    Shimansky, Yury P; Kang, Tao; He, Jiping

    2004-02-01

    A computational model of a learning system (LS) is described that acquires knowledge and skill necessary for optimal control of a multisegmental limb dynamics (controlled object or CO), starting from "knowing" only the dimensionality of the object's state space. It is based on an optimal control problem setup different from that of reinforcement learning. The LS solves the optimal control problem online while practicing the manipulation of CO. The system's functional architecture comprises several adaptive components, each of which incorporates a number of mapping functions approximated based on artificial neural nets. Besides the internal model of the CO's dynamics and adaptive controller that computes the control law, the LS includes a new type of internal model, the minimal cost (IM(mc)) of moving the controlled object between a pair of states. That internal model appears critical for the LS's capacity to develop an optimal movement trajectory. The IM(mc) interacts with the adaptive controller in a cooperative manner. The controller provides an initial approximation of an optimal control action, which is further optimized in real time based on the IM(mc). The IM(mc) in turn provides information for updating the controller. The LS's performance was tested on the task of center-out reaching to eight randomly selected targets with a 2DOF limb model. The LS reached an optimal level of performance in a few tens of trials. It also quickly adapted to movement perturbations produced by two different types of external force field. The results suggest that the proposed design of a self-optimized control system can serve as a basis for the modeling of motor learning that includes the formation and adaptive modification of the plan of a goal-directed movement.

  10. Reinforcement Learning with Autonomous Small Unmanned Aerial Vehicles in Cluttered Environments

    Science.gov (United States)

    Tran, Loc; Cross, Charles; Montague, Gilbert; Motter, Mark; Neilan, James; Qualls, Garry; Rothhaar, Paul; Trujillo, Anna; Allen, B. Danette

    2015-01-01

    We present ongoing work in the Autonomy Incubator at NASA Langley Research Center (LaRC) exploring the efficacy of a data set aggregation approach to reinforcement learning for small unmanned aerial vehicle (sUAV) flight in dense and cluttered environments with reactive obstacle avoidance. The goal is to learn an autonomous flight model using training experiences from a human piloting a sUAV around static obstacles. The training approach uses video data from a forward-facing camera that records the human pilot's flight. Various computer vision based features are extracted from the video relating to edge and gradient information. The recorded human-controlled inputs are used to train an autonomous control model that correlates the extracted feature vector to a yaw command. As part of the reinforcement learning approach, the autonomous control model is iteratively updated with feedback from a human agent who corrects undesired model output. This data driven approach to autonomous obstacle avoidance is explored for simulated forest environments furthering autonomous flight under the tree canopy research. This enables flight in previously inaccessible environments which are of interest to NASA researchers in Earth and Atmospheric sciences.
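
    The data-set aggregation loop described here, in the spirit of DAgger, alternates between rolling out the learned policy and folding human-corrected labels back into the training set. Everything below (the features, the expert yaw function, the ridge policy) is an invented stand-in for the camera features and pilot corrections in the study.

```python
# Sketch of a dataset-aggregation ("DAgger"-style) loop: roll out the current
# policy, let a human correct the yaw command on the visited states, aggregate,
# and retrain. Vision features and the expert are invented stand-ins.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(7)

def expert_yaw(features):
    # stand-in for the human pilot's corrective yaw command
    return features @ np.array([0.8, -0.5, 0.2])

X_all = rng.normal(size=(50, 3))            # initial human-piloted demonstrations
y_all = expert_yaw(X_all)
policy = Ridge(alpha=1.0).fit(X_all, y_all)

for iteration in range(5):
    X_new = rng.normal(size=(30, 3))        # states visited by the learned policy
    y_new = expert_yaw(X_new)               # human relabels/corrects these states
    X_all = np.vstack([X_all, X_new])       # aggregate all data so far
    y_all = np.concatenate([y_all, y_new])
    policy = Ridge(alpha=1.0).fit(X_all, y_all)

print("policy weights:", policy.coef_.round(2))
```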

  11. Predictive Feature Selection for Genetic Policy Search

    Science.gov (United States)

    2014-05-22

    … limited manual intervention are becoming increasingly desirable as more complex tasks in dynamic and high-tempo environments are explored. Reinforcement … states in many domains causes features relevant to the reward variations to be overlooked, which hinders the policy search. … the current feature subset. This local minimum may be "deceptive," meaning that it does not clearly lead to the global optimal policy (Goldberg and …

  12. Feedback error learning controller for functional electrical stimulation assistance in a hybrid robotic system for reaching rehabilitation

    Directory of Open Access Journals (Sweden)

    Francisco Resquín

    2016-07-01

    Full Text Available Hybrid robotic systems represent a novel research field in which functional electrical stimulation (FES) is combined with a robotic device for rehabilitation of motor impairment. Under this approach, the design of robust FES controllers still remains an open challenge. In this work, we aimed at developing a learning FES controller to assist in the performance of reaching movements in a simple hybrid robotic system setting. We implemented a Feedback Error Learning (FEL) control strategy consisting of a feedback PID controller and a feedforward controller based on a neural network. A passive exoskeleton complemented the FES controller by compensating for the effects of gravity. We carried out experiments with healthy subjects to validate the performance of the system. Results show that the FEL control strategy is able to adjust the FES intensity to track the desired trajectory accurately without the need for a previous mathematical model.
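
    The FEL structure is simple enough to sketch end to end: the feedback controller's own output serves as the training error for an adaptive feedforward term, which gradually absorbs the inverse dynamics. The first-order plant, PI gains, and linear feedforward below are invented simplifications of the paper's PID-plus-neural-network design.

```python
# Minimal feedback error learning (FEL) sketch: a fixed PI feedback loop plus
# an adaptive feedforward term trained on the feedback signal itself, so the
# feedforward gradually takes over the inverse dynamics. Numbers are invented.
import numpy as np

w = np.zeros(2)                # feedforward weights on [target, d_target]
kp, ki = 8.0, 1.0              # feedback PI gains
alpha = 0.5                    # feedforward learning rate
y, integ = 0.0, 0.0
t = np.linspace(0, 20, 2000)
dt = t[1] - t[0]

for k in range(1, len(t)):
    target, d_target = np.sin(t[k]), np.cos(t[k])
    phi = np.array([target, d_target])
    err = target - y
    integ += err * dt
    u_fb = kp * err + ki * integ          # feedback controller
    u_ff = w @ phi                        # learned feedforward
    u = u_fb + u_ff
    y += dt * (-2.0 * y + u)              # first-order plant (stand-in)
    w += alpha * u_fb * phi * dt          # FEL rule: feedback output as error signal

print("feedforward weights:", w.round(2), "| final tracking error:", round(float(err), 3))
```

    For this stand-in plant, perfect tracking requires u = d_target + 2*target, so the learned weights should drift toward [2, 1] as the feedback effort fades.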

  13. Controlled growth of silica-titania hybrid functional nanoparticles through a multistep microfluidic approach.

    Science.gov (United States)

    Shiba, K; Sugiyama, T; Takei, T; Yoshikawa, G

    2015-11-11

    Silica/titania-based functional nanoparticles were prepared through controlled nucleation of titania and subsequent encapsulation by silica through a multistep microfluidic approach, which was successfully applied to obtaining aminopropyl-functionalized silica/titania nanoparticles for a highly sensitive humidity sensor.

  14. When structure affects function--the need for partial volume effect correction in functional and resting state magnetic resonance imaging studies.

    Science.gov (United States)

    Dukart, Juergen; Bertolino, Alessandro

    2014-01-01

    Both functional and, more recently, resting state magnetic resonance imaging have become established tools to investigate functional brain networks. Most studies use these tools to compare different populations without controlling for potential differences in underlying brain structure, which might affect the functional measurements of interest. Here, we adapt a simulation approach combined with evaluation of real resting state magnetic resonance imaging data to investigate the potential impact of partial volume effects on established functional and resting state magnetic resonance imaging analyses. We demonstrate that differences in the underlying structure lead to a significant increase in detected functional differences in both types of analyses. The largest increases in functional differences are observed for the highest signal-to-noise ratios and when signal with the lowest amount of partial volume effects is compared to any other partial volume effect constellation. In real data, structural information explains about 25% of the within-subject variance observed in degree centrality, an established resting state connectivity measurement. Controlling this measurement for structural information can substantially alter correlational maps obtained in group analyses. Our results question current approaches of evaluating these measurements in diseased populations with known structural changes without controlling for potential differences in these measurements.

  15. Model-free adaptive control optimization using a chaotic particle swarm approach

    Energy Technology Data Exchange (ETDEWEB)

    Santos Coelho, Leandro dos [Industrial and Systems Engineering Graduate Program, LAS/PPGEPS, Pontifical Catholic University of Parana, PUCPR, Imaculada Conceicao, 1155, 80215-901 Curitiba, Parana (Brazil)], E-mail: leandro.coelho@pucpr.br; Rodrigues Coelho, Antonio Augusto [Department of Automation and Systems, Federal University of Santa Catarina, Box 476, 88040-900 Florianopolis, Santa Catarina (Brazil)], E-mail: aarc@das.ufsc.br

    2009-08-30

    It is well known that conventional control theories are widely suited for applications where the processes can be reasonably described in advance. However, when the plant's dynamics are hard to characterize precisely or are subject to environmental uncertainties, one may encounter difficulties in applying the conventional controller design methodologies. Despite the difficulty in achieving high control performance, the fine tuning of controller parameters is a tedious task that always requires experts with knowledge in both control theory and process information. Nowadays, more and more studies have focused on the development of adaptive control algorithms that can be directly applied to complex processes whose dynamics are poorly modeled and/or have severe nonlinearities. In this context, this paper presents the design of a Model-Free Learning Adaptive Control (MFLAC) scheme based on pseudo-gradient concepts, with its optimization procedure carried out by a Particle Swarm Optimization (PSO) approach using a constriction coefficient and Henon chaotic sequences (CPSOH). PSO is a stochastic global optimization technique inspired by the social behavior of bird flocking. PSO models the exploration of a problem space by a population of particles, each with a randomized velocity that moves it through the space of the problem. Since chaotic mapping enjoys certainty, ergodicity and the stochastic property, the proposed CPSOH introduces chaos mapping, which adds some flexibility to particle movements in each iteration. The chaotic sequences also allow exploration at early stages and exploitation at later stages of the CPSOH search procedure. The motivation for applying the CPSOH approach is to overcome a limitation of the conventional MFLAC design, which cannot guarantee satisfactory control performance when the plant has different gains over the operational range and is tuned by trial and error by the user. Numerical results of the MFLAC with

  16. Model-free adaptive control optimization using a chaotic particle swarm approach

    International Nuclear Information System (INIS)

    Santos Coelho, Leandro dos; Rodrigues Coelho, Antonio Augusto

    2009-01-01

    It is well known that conventional control theories are widely suited for applications where the processes can be reasonably described in advance. However, when the plant's dynamics are hard to characterize precisely or are subject to environmental uncertainties, one may encounter difficulties in applying the conventional controller design methodologies. Despite the difficulty in achieving high control performance, the fine tuning of controller parameters is a tedious task that always requires experts with knowledge in both control theory and process information. Nowadays, more and more studies have focused on the development of adaptive control algorithms that can be directly applied to complex processes whose dynamics are poorly modeled and/or have severe nonlinearities. In this context, this paper presents the design of a Model-Free Learning Adaptive Control (MFLAC) scheme based on pseudo-gradient concepts, with its optimization procedure carried out by a Particle Swarm Optimization (PSO) approach using a constriction coefficient and Henon chaotic sequences (CPSOH). PSO is a stochastic global optimization technique inspired by the social behavior of bird flocking. PSO models the exploration of a problem space by a population of particles, each with a randomized velocity that moves it through the space of the problem. Since chaotic mapping enjoys certainty, ergodicity and the stochastic property, the proposed CPSOH introduces chaos mapping, which adds some flexibility to particle movements in each iteration. The chaotic sequences also allow exploration at early stages and exploitation at later stages of the CPSOH search procedure. The motivation for applying the CPSOH approach is to overcome a limitation of the conventional MFLAC design, which cannot guarantee satisfactory control performance when the plant has different gains over the operational range and is tuned by trial and error by the user. Numerical results of the MFLAC with CPSOH
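
    The ingredient that distinguishes CPSOH from plain PSO is small: chaotic sequences, here from the Henon map, replace the uniform random draws in the constriction-coefficient velocity update. The sketch below shows exactly that substitution on a stand-in cost surface; it is not the MFLAC tuning loop itself.

```python
# PSO with Henon chaotic sequences in place of uniform draws, under the
# constriction-coefficient velocity update. The objective is a stand-in,
# not the model-free control cost used in the paper.
import numpy as np

def henon(n, x0=0.1, y0=0.3, a=1.4, b=0.3):
    # classic Henon map; values rescaled to [0, 1] for use as "random" numbers
    xs = np.empty(n)
    x, y = x0, y0
    for i in range(n):
        x, y = 1 - a * x * x + y, b * x
        xs[i] = x
    return (xs - xs.min()) / (xs.max() - xs.min() + 1e-12)

def objective(p):
    return np.sum((p - 0.7) ** 2, axis=-1)   # stand-in cost surface

n_particles, dim, iters = 15, 2, 60
chaos = iter(henon(2 * n_particles * iters))           # exactly enough draws
X = np.random.default_rng(8).uniform(0, 1, (n_particles, dim))
V = np.zeros_like(X)
pbest, pbest_f = X.copy(), objective(X)
chi, c1, c2 = 0.729, 2.05, 2.05                        # constriction form

for t in range(iters):
    gbest = pbest[np.argmin(pbest_f)]
    for i in range(n_particles):
        r1, r2 = next(chaos), next(chaos)              # chaotic draws replace U(0,1)
        V[i] = chi * (V[i] + c1 * r1 * (pbest[i] - X[i]) + c2 * r2 * (gbest - X[i]))
        X[i] = X[i] + V[i]
    f = objective(X)
    improved = f < pbest_f
    pbest[improved], pbest_f[improved] = X[improved], f[improved]

print("best parameters:", pbest[np.argmin(pbest_f)].round(3))
```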

  17. Oxytocin selectively facilitates learning with social feedback and increases activity and functional connectivity in emotional memory and reward processing regions.

    Science.gov (United States)

    Hu, Jiehui; Qi, Song; Becker, Benjamin; Luo, Lizhu; Gao, Shan; Gong, Qiyong; Hurlemann, René; Kendrick, Keith M

    2015-06-01

    In male Caucasian subjects, learning is facilitated by receipt of social compared with non-social feedback, and the neuropeptide oxytocin (OXT) facilitates this effect. In this study, we have first shown a cultural difference in that male Chinese subjects actually perform significantly worse in the same reinforcement associated learning task with social (emotional faces) compared with non-social feedback. Nevertheless, in two independent double-blind placebo (PLC) controlled between-subject design experiments we found OXT still selectively facilitated learning with social feedback. Similar to Caucasian subjects this OXT effect was strongest with feedback using female rather than male faces. One experiment performed in conjunction with functional magnetic resonance imaging showed that during the response, but not feedback phase of the task, OXT selectively increased activity in the amygdala, hippocampus, parahippocampal gyrus and putamen during the social feedback condition, and functional connectivity between the amygdala and insula and caudate. Therefore, OXT may be increasing the salience and reward value of anticipated social feedback. In the PLC group, response times and state anxiety scores during social feedback were associated with signal changes in these same regions but not in the OXT group. OXT may therefore have also facilitated learning by reducing anxiety in the social feedback condition. Overall our results provide the first evidence for cultural differences in social facilitation of learning per se, but a similar selective enhancement of learning with social feedback under OXT. This effect of OXT may be associated with enhanced responses and functional connectivity in emotional memory and reward processing regions. © 2015 Wiley Periodicals, Inc.

  18. Simultaneous search for symmetry-related molecules in cross-rotation functions

    International Nuclear Information System (INIS)

    Yeates, T.O.

    1989-01-01

    In a typical cross-rotation function, the Patterson function of a single search molecule is compared with an observed Patterson function, which contains a set of symmetry-related intramolecular vector sets. In principle, it is better to search for the symmetry-related molecules simultaneously, and Nordman has reported success with an algorithm of this type. In this paper, the differences between the ordinary search and a simultaneous search are investigated, and it is shown that the combined presence of crystallographic symmetry and approximate symmetry of a search model may lead to significant bias in conventional rotation functions. The nature and magnitude of this symmetry bias are discussed. An efficient algorithm is derived for generating a modified unbiased cross-rotation function map from conventional rotation functions. Two examples are described that demonstrate improvement in the quality of the rotation function maps and the ability to obtain physically meaningful correlation coefficients. (orig.)

  19. The added value of a gaming context and intelligent adaptation for a mobile application for vocabulary learning

    NARCIS (Netherlands)

    Sandberg, J.; Maris, M.; Hoogendoorn, P.

    2014-01-01

    Two groups participated in a study on the added value of a gaming context and intelligent adaptation for a mobile learning application. The control group worked at home for a fortnight with the original Mobile English Learning application (MEL-original) developed in a previous project. The

  20. Integrated Locomotor Function Tests for Countermeasure Evaluation

    Science.gov (United States)

    Bloomberg, J. J.; Mulavara, A. P.; Peters, B. T.; Cohen, H. S.; Landsness, E. C.; Black, F. O.

    2005-01-01

    Following spaceflight, crewmembers experience locomotor dysfunction due to inflight adaptive alterations in sensorimotor function. Countermeasures designed to mitigate these postflight gait alterations need to be assessed with a new generation of tests that evaluate the interaction of various sensorimotor sub-systems central to locomotor control. The goal of the present study was to develop new functional tests of locomotor control that could be used to test the efficacy of countermeasures. These tests were designed to simultaneously examine the function of multiple sensorimotor systems underlying the control of locomotion and be operationally relevant to the astronaut population. Traditionally, gaze stabilization has been studied almost exclusively in seated subjects performing target acquisition tasks requiring only the involvement of coordinated eye-head movements. However, activities like walking involve full-body movement and require coordination between lower limbs and the eye-head-trunk complex to achieve stabilized gaze during locomotion. Therefore the first goal of this study was to determine how the multiple, interdependent, full-body sensorimotor gaze stabilization subsystems are functionally coordinated during locomotion. In an earlier study we investigated how alteration in gaze tasking changes full-body locomotor control strategies. Subjects walked on a treadmill and either focused on a central point target or read numeral characters. We measured: temporal parameters of gait, full body sagittal plane segmental kinematics of the head, trunk, thigh, shank and foot, accelerations along the vertical axis at the head and the shank, and the vertical forces acting on the support surface. In comparison to the point target fixation condition, the results of the number reading task showed that compensatory head pitch movements increased, peak head acceleration was reduced and knee flexion at heel-strike was increased. In a more recent study we investigated the

  1. Closing the achievement gap through modification of neurocognitive and neuroendocrine function: results from a cluster randomized controlled trial of an innovative approach to the education of children in kindergarten.

    Directory of Open Access Journals (Sweden)

    Clancy Blair

    Full Text Available Effective early education is essential for academic achievement and positive life outcomes, particularly for children in poverty. Advances in neuroscience suggest that a focus on self-regulation in education can enhance children's engagement in learning and establish beneficial academic trajectories in the early elementary grades. Here, we experimentally evaluate an innovative approach to the education of children in kindergarten that embeds support for self-regulation, particularly executive functions, into literacy, mathematics, and science learning activities. Results from a cluster randomized controlled trial involving 29 schools, 79 classrooms, and 759 children indicated positive effects on executive functions, reasoning ability, the control of attention, and levels of salivary cortisol and alpha amylase. Results also demonstrated improvements in reading, vocabulary, and mathematics at the end of kindergarten that increased into the first grade. A number of effects were specific to high-poverty schools, suggesting that a focus on executive functions and associated aspects of self-regulation in early elementary education holds promise for closing the achievement gap.

  2. Closing the achievement gap through modification of neurocognitive and neuroendocrine function: results from a cluster randomized controlled trial of an innovative approach to the education of children in kindergarten.

    Science.gov (United States)

    Blair, Clancy; Raver, C Cybele

    2014-01-01

    Effective early education is essential for academic achievement and positive life outcomes, particularly for children in poverty. Advances in neuroscience suggest that a focus on self-regulation in education can enhance children's engagement in learning and establish beneficial academic trajectories in the early elementary grades. Here, we experimentally evaluate an innovative approach to the education of children in kindergarten that embeds support for self-regulation, particularly executive functions, into literacy, mathematics, and science learning activities. Results from a cluster randomized controlled trial involving 29 schools, 79 classrooms, and 759 children indicated positive effects on executive functions, reasoning ability, the control of attention, and levels of salivary cortisol and alpha amylase. Results also demonstrated improvements in reading, vocabulary, and mathematics at the end of kindergarten that increased into the first grade. A number of effects were specific to high-poverty schools, suggesting that a focus on executive functions and associated aspects of self-regulation in early elementary education holds promise for closing the achievement gap.

  3. Auditory and Visual Working Memory Functioning in College Students with Attention-Deficit/Hyperactivity Disorder and/or Learning Disabilities.

    Science.gov (United States)

    Liebel, Spencer W; Nelson, Jason M

    2017-12-01

    We investigated auditory and visual working memory functioning in college students with attention-deficit/hyperactivity disorder, learning disabilities, and clinical controls. We examined the role attention-deficit/hyperactivity disorder subtype status played in working memory functioning. The unique influence that both domains of working memory have on reading and math abilities was investigated. A sample of 268 individuals seeking postsecondary education comprised four groups in the present study: 110 had an attention-deficit/hyperactivity disorder diagnosis only, 72 had a learning disability diagnosis only, 35 had comorbid attention-deficit/hyperactivity disorder and learning disability diagnoses, and 60 individuals without either of these disorders comprised a clinical control group. Participants underwent a comprehensive neuropsychological evaluation, and licensed psychologists employed a multi-informant, multi-method approach in obtaining diagnoses. In the attention-deficit/hyperactivity disorder only group, there was no difference between auditory and visual working memory functioning, t(100) = -1.57, p = .12. In the learning disability group, however, auditory working memory functioning was significantly weaker compared with visual working memory, t(71) = -6.19, p < .001. Within the attention-deficit/hyperactivity disorder only group, there were no auditory or visual working memory functioning differences between participants with either a predominantly inattentive type or a combined type diagnosis. Visual working memory did not incrementally contribute to the prediction of academic achievement skills. Individuals with attention-deficit/hyperactivity disorder did not demonstrate significant working memory differences compared with clinical controls. Individuals with a learning disability demonstrated weaker auditory working memory than individuals in either the attention-deficit/hyperactivity or clinical control groups. © The Author 2017. Published by Oxford University

  4. Functional connectivity changes in second language vocabulary learning.

    Science.gov (United States)

    Ghazi Saidi, Ladan; Perlbarg, Vincent; Marrelec, Guillaume; Pélégrini-Issac, Mélanie; Benali, Habib; Ansaldo, Ana-Inés

    2013-01-01

    Functional connectivity changes in the language network (Price, 2010) and in a control network involved in second language (L2) processing (Abutalebi & Green, 2007) were examined in a group of Persian (L1) speakers learning French (L2) words. Measures of network integration that characterize the global integrative state of a network (Marrelec, Bellec et al., 2008) were gathered in the shallow and consolidation phases of L2 vocabulary learning. Functional connectivity remained unchanged across learning phases for L1, whereas total, between- and within-network integration levels decreased as proficiency for L2 increased. The results of this study provide the first functional connectivity evidence regarding the dynamic role of the language processing and cognitive control networks in L2 learning (Abutalebi, Cappa, & Perani, 2005; Altarriba & Heredia, 2008; Leonard et al., 2011; Parker-Jones et al., 2011). Thus, increased proficiency results in a higher degree of automaticity and lower cognitive effort (Segalowitz & Hulstijn, 2005). Copyright © 2012 Elsevier Inc. All rights reserved.

  5. Dopaminergic control of motivation and reinforcement learning: a closed-circuit account for reward-oriented behavior.

    Science.gov (United States)

    Morita, Kenji; Morishima, Mieko; Sakai, Katsuyuki; Kawaguchi, Yasuo

    2013-05-15

    Humans and animals take actions quickly when they expect that the actions lead to reward, reflecting their motivation. Injection of dopamine receptor antagonists into the striatum has been shown to slow such reward-seeking behavior, suggesting that dopamine is involved in the control of motivational processes. Meanwhile, neurophysiological studies have revealed that phasic response of dopamine neurons appears to represent reward prediction error, indicating that dopamine plays central roles in reinforcement learning. However, previous attempts to elucidate the mechanisms of these dopaminergic controls have not fully explained how the motivational and learning aspects are related and whether they can be understood by the way the activity of dopamine neurons itself is controlled by their upstream circuitries. To address this issue, we constructed a closed-circuit model of the corticobasal ganglia system based on recent findings regarding intracortical and corticostriatal circuit architectures. Simulations show that the model could reproduce the observed distinct motivational effects of D1- and D2-type dopamine receptor antagonists. Simultaneously, our model successfully explains the dopaminergic representation of reward prediction error as observed in behaving animals during learning tasks and could also explain distinct choice biases induced by optogenetic stimulation of the D1 and D2 receptor-expressing striatal neurons. These results indicate that the suggested roles of dopamine in motivational control and reinforcement learning can be understood in a unified manner through a notion that the indirect pathway of the basal ganglia represents the value of states/actions at a previous time point, an empirically driven key assumption of our model.
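
    To make the learning component above concrete, the following is a minimal sketch of the temporal-difference reward prediction error that phasic dopamine responses are thought to encode; it is not the authors' corticobasal ganglia circuit model, and all names and parameter values are illustrative assumptions.

```python
# Minimal TD(0) sketch: `delta` plays the role of the dopaminergic reward
# prediction error (RPE). States, rewards and parameters are illustrative.
import numpy as np

gamma, alpha = 0.9, 0.1      # discount factor, learning rate (assumed)
n_states = 5
V = np.zeros(n_states)       # state values

def td_update(s, r, s_next):
    delta = r + gamma * V[s_next] - V[s]   # RPE: received vs. predicted value
    V[s] += alpha * delta
    return delta

# Traversing states 0..4 with reward at the end: over episodes the prediction
# error at the rewarded transition shrinks as V comes to predict the reward.
for episode in range(100):
    for s in range(n_states - 1):
        r = 1.0 if s == n_states - 2 else 0.0
        td_update(s, r, s + 1)
```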

  6. The Role of Executive Functions for Dyadic Literacy Learning in Kindergarten

    Science.gov (United States)

    van de Sande, Eva; Segers, Eliane; Verhoeven, Ludo

    2018-01-01

    The current study used a dyadic and coconstructive approach to examine how to embed exercises that support executive functioning into early literacy instruction to empower its effects. Using a randomized controlled trial design with 100 children, we examined the effects of dyadic activities in which children scaffolded each other's learning and…

  7. Disrupted expected value signaling in youth with disruptive behavior disorders to environmental reinforcers.

    Science.gov (United States)

    White, Stuart F; Fowler, Katherine A; Sinclair, Stephen; Schechter, Julia C; Majestic, Catherine M; Pine, Daniel S; Blair, R James

    2014-05-01

    Youth with disruptive behavior disorders (DBD), including conduct disorder (CD) and oppositional defiant disorder (ODD), have difficulties in reinforcement-based decision making, the neural basis of which is poorly understood. Studies examining decision making in youth with DBD have revealed reduced reward responses within the ventromedial prefrontal cortex/orbitofrontal cortex (vmPFC/OFC), increased responses to unexpected punishment within the vmPFC and striatum, and reduced use of expected value information in the anterior insula cortex and dorsal anterior cingulate cortex during the avoidance of suboptimal choices. Previous work has used only monetary reinforcement. The current study examined whether dysfunction in youth with DBD during decision making extended to environmental reinforcers. A total of 30 youth (15 healthy youth and 15 youth with DBD) completed a novel reinforcement-learning paradigm using environmental reinforcers (physical threat images, e.g., striking snake image; contamination threat images, e.g., rotting food; appetitive images, e.g., puppies) while undergoing functional magnetic resonance imaging (fMRI). Behaviorally, healthy youth were significantly more likely to avoid physical threat, but not contamination threat, stimuli than youth with DBD. Imaging results revealed that youth with DBD showed significantly reduced use of expected value information in the bilateral caudate, thalamus, and posterior cingulate cortex during the avoidance of suboptimal responses. The current data suggest that youth with DBD show deficits to environmental reinforcers similar to the deficits seen to monetary reinforcers. Importantly, this deficit was unrelated to callous-unemotional (CU) traits, suggesting that caudate impairment may be a common deficit across youth with DBD. Published by Elsevier Inc.

  8. Systemic-Functional Approach to Utilities Supply

    Directory of Open Access Journals (Sweden)

    Nikolay I. Komkov

    2017-01-01

    Full Text Available Purpose: the purpose of the article is to present a management approach to the development of utilities supply processes based on the search for resolutions of conflict situations, which emerged during the transition from planned-directive management to market development. Methods: the research methodology is based on system analysis of the functioning of full life cycle processes, forecasting of complex systems development, and mathematical modeling of service supply processes and of innovative and investment projects for developing them. Results: the results of the work are concentrated in the presentation of a systemic-functional approach to managing the development of municipal services processes that is able to resolve conflict situations in this sphere. Conclusions and Relevance: the traditional management approach based on eliminating "bottlenecks" and preventing emergencies, which prevailed in the planned-directive system, led to an accumulation of conflict situations and unsolvable problems when transferred to market conditions. The proposed systemic-functional approach, based on forecasting the full life cycle of the modernized processes and service-providing systems, makes it possible to take into account the costs of modernization as well as the prime cost and quality of the rendered services.

  9. Teachers' Understanding of the Role of Executive Functions in Mathematics Learning

    Science.gov (United States)

    Gilmore, Camilla; Cragg, Lucy

    2014-01-01

    Cognitive psychology research has suggested an important role for executive functions, the set of skills that monitor and control thought and action, in learning mathematics. However, there is currently little evidence about whether teachers are aware of the importance of these skills and, if so, how they come by this information. We conducted an online survey of teachers' views on the importance of a range of skills for mathematics learning. Teachers rated executive function skills, and in particular inhibition and shifting, to be important for mathematics. The value placed on executive function skills increased with increasing teaching experience. Most teachers reported that they were aware of these skills, although few knew the term “executive functions.” This awareness had come about through their teaching experience rather than from formal instruction. Researchers and teacher educators could do more to highlight the importance of these skills to trainee or new teachers. PMID:25674156

  10. Episodic memories predict adaptive value-based decision-making

    Science.gov (United States)

    Murty, Vishnu; FeldmanHall, Oriel; Hunter, Lindsay E.; Phelps, Elizabeth A; Davachi, Lila

    2016-01-01

    Prior research illustrates that memory can guide value-based decision-making. For example, previous work has implicated both working memory and procedural memory (i.e., reinforcement learning) in guiding choice. However, other types of memories, such as episodic memory, may also influence decision-making. Here we test the role of episodic memory—specifically item versus associative memory—in supporting value-based choice. Participants completed a task where they first learned the value associated with trial-unique lotteries. After a short delay, they completed a decision-making task where they could choose to re-engage with previously encountered lotteries, or new, never-before-seen lotteries. Finally, participants completed a surprise memory test for the lotteries and their associated values. Results indicate that participants chose to re-engage more often with lotteries that resulted in high versus low rewards. Critically, participants not only formed detailed, associative memories for the reward values coupled with individual lotteries, but also exhibited adaptive decision-making only when they had intact associative memory. We further found that the relationship between adaptive choice and associative memory generalized to more complex, ecologically valid choice behavior, such as social decision-making. However, individuals more strongly encoded experiences of social violations—such as being treated unfairly—suggesting a bias in how individuals form associative memories within social contexts. Together, these findings provide an important integration of the episodic memory and decision-making literatures to better understand key mechanisms supporting adaptive behavior. PMID:26999046

  11. Reinforcement learning in computer vision

    Science.gov (United States)

    Bernstein, A. V.; Burnaev, E. V.

    2018-04-01

    Nowadays, machine learning has become one of the basic technologies used in solving various computer vision tasks such as feature detection, image segmentation, object recognition and tracking. In many applications, various complex systems such as robots are equipped with visual sensors from which they learn the state of the surrounding environment by solving corresponding computer vision tasks. Solutions of these tasks are used for making decisions about possible future actions. It is not surprising that when solving computer vision tasks we should take into account special aspects of their subsequent application in model-based predictive control. Reinforcement learning is one of the modern machine learning technologies in which learning is carried out through interaction with the environment. In recent years, reinforcement learning has been used both for solving applied tasks such as the processing and analysis of visual information, and for solving specific computer vision problems such as filtering, extracting image features, localizing objects in scenes, and many others. The paper briefly describes reinforcement learning technology and its use for solving computer vision problems.
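
    As a minimal illustration of learning through interaction with the environment, here is a sketch of tabular Q-learning; the `env` object and its `reset`/`step` interface are placeholder assumptions standing in for an environment whose state would be estimated by solving a computer vision task.

```python
# Tabular Q-learning sketch of learning through interaction; `env` is a
# placeholder whose state would come from solving a vision task (assumption).
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, eps=0.1, seed=0):
    Q = np.zeros((n_states, n_actions))
    rng = np.random.default_rng(seed)
    for _ in range(episodes):
        s, done = env.reset(), False      # assumed interface: reset() -> state
        while not done:
            # epsilon-greedy exploration
            a = int(rng.integers(n_actions)) if rng.random() < eps \
                else int(Q[s].argmax())
            s_next, r, done = env.step(a)  # assumed: step(a) -> (s', r, done)
            target = r + (0.0 if done else gamma * Q[s_next].max())
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q
```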

  12. A New Approach to Teaching Biomechanics Through Active, Adaptive, and Experiential Learning.

    Science.gov (United States)

    Singh, Anita

    2017-07-01

    Demand for biomedical engineers continues to rise to meet the needs of the healthcare industry. Current training of bioengineers follows the traditional and dominant model of theory-focused curricula. However, the unmet needs of the healthcare industry warrant newer skill sets in these engineers. Translational training strategies such as solving real-world problems through active, adaptive, and experiential learning hold promise. In this paper, we report our findings on adding a real-world, 4-week, problem-based learning unit to a biomechanics capstone course for engineering students. Surveys assessed student perceptions of the activity and learning experience. While students, across three cohorts, felt challenged to solve a real-world problem identified during the simulation lab visit, they felt more confident in utilizing knowledge learned in the biomechanics course and in self-directed research. Instructor evaluations indicated that the active and experiential learning approach fostered their technical knowledge and lifelong learning skills while exposing them to the components of adaptive learning and innovation.

  13. Hamiltonian-Driven Adaptive Dynamic Programming for Continuous Nonlinear Dynamical Systems.

    Science.gov (United States)

    Yang, Yongliang; Wunsch, Donald; Yin, Yixin

    2017-08-01

    This paper presents a Hamiltonian-driven framework of adaptive dynamic programming (ADP) for continuous-time nonlinear systems, which consists of the evaluation of an admissible control, the comparison between two different admissible policies with respect to the corresponding performance function, and the performance improvement of an admissible control. It is shown that the Hamiltonian can serve as the temporal difference for continuous-time systems. In the Hamiltonian-driven ADP, the critic network is trained to output the value gradient. Then, the inner product between the critic and the system dynamics produces the value derivative. Under some conditions, the minimization of the Hamiltonian functional is equivalent to the value function approximation. An iterative algorithm starting from an arbitrary admissible control is presented for the optimal control approximation, together with its convergence proof. The implementation is accomplished by a neural network approximation. Two simulation studies demonstrate the effectiveness of Hamiltonian-driven ADP.
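
    For reference, the standard continuous-time formulation behind this kind of framework can be written as follows; the notation is an assumption chosen to match common ADP usage, not taken from the paper. Along a trajectory, dV/dt equals the inner product of the value gradient with the dynamics, so for an admissible policy the residual of the Hamiltonian plays the role that the temporal-difference error plays in discrete time:

```latex
% Standard continuous-time optimal control setting (assumed notation).
\begin{align*}
\dot{x} &= f(x) + g(x)\,u, \qquad
V^{u}(x_0) = \int_{0}^{\infty} r\big(x(t), u(t)\big)\,\mathrm{d}t,\\
H\big(x, u, \nabla V\big) &= \nabla V(x)^{\top}\big(f(x) + g(x)\,u\big) + r(x,u),\\
0 &= \min_{u}\, H\big(x, u, \nabla V^{*}(x)\big)
\qquad \text{(Hamilton--Jacobi--Bellman equation).}
\end{align*}
```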

  14. Time representation in reinforcement learning models of the basal ganglia

    Directory of Open Access Journals (Sweden)

    Samuel Joseph Gershman

    2014-01-01

    Full Text Available Reinforcement learning models have been influential in understanding many aspects of basal ganglia function, from reward prediction to action selection. Time plays an important role in these models, but there is still no theoretical consensus about what kind of time representation is used by the basal ganglia. We review several theoretical accounts and their supporting evidence. We then discuss the relationship between reinforcement learning models and the timing mechanisms that have been attributed to the basal ganglia. We hypothesize that a single computational system may underlie both reinforcement learning and interval timing—the perception of duration in the range of seconds to hours. This hypothesis, which extends earlier models by incorporating a time-sensitive action selection mechanism, may have important implications for understanding disorders like Parkinson's disease in which both decision making and timing are impaired.
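
    One concrete time representation discussed in this literature is the tapped delay line (the "complete serial compound"), in which every post-cue time step gets its own feature and TD learning assigns each a weight. The sketch below is a generic illustration under assumed parameters, not a reproduction of any specific model in the review.

```python
# TD learning with a tapped-delay-line ("complete serial compound") time
# representation: one feature per post-cue time step. Values are assumptions.
import numpy as np

T, reward_time = 20, 15          # trial length; reward 15 steps after the cue
gamma, alpha = 0.98, 0.1
w = np.zeros(T)                  # one weight per delay-line feature

def run_trial(w):
    deltas = np.zeros(T)
    eye = np.eye(T)              # eye[t] is the one-hot delay-line feature
    for t in range(T - 1):
        r = 1.0 if t + 1 == reward_time else 0.0
        delta = r + gamma * (w @ eye[t + 1]) - (w @ eye[t])   # TD error
        w += alpha * delta * eye[t]
        deltas[t] = delta
    return deltas

for _ in range(200):
    deltas = run_trial(w)
# After training, within-trial TD errors shrink towards zero: the delay-line
# weights form a value ramp that predicts the upcoming reward.
```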

  15. Robust model reference adaptive output feedback tracking for uncertain linear systems with actuator fault based on reinforced dead-zone modification.

    Science.gov (United States)

    Bagherpoor, H M; Salmasi, Farzad R

    2015-07-01

    In this paper, robust model reference adaptive tracking controllers are considered for Single-Input Single-Output (SISO) and Multi-Input Multi-Output (MIMO) linear systems containing modeling uncertainties, unknown additive disturbances and actuator faults. Two new lemmas are proposed, for the SISO and MIMO cases respectively, under which the dead-zone modification rule is improved such that the tracking error for any reference signal tends to zero in such systems. In the conventional approach, adaptation of the controller parameters ceases inside the dead-zone region, which results in a tracking error while preserving system stability. In the proposed scheme, the control signal is reinforced with an additive term based on the tracking error inside the dead-zone, which results in full reference tracking. In addition, no Fault Detection and Diagnosis (FDD) unit is needed in the proposed approach. Closed-loop system stability and zero tracking error are proved by considering a suitable Lyapunov function candidate. It is shown that the proposed control approach can assure that all the signals of the closed-loop system are bounded in faulty conditions. Finally, the validity and performance of the new schemes have been illustrated through numerical simulations of SISO and MIMO systems in the presence of actuator faults, modeling uncertainty and output disturbance. Copyright © 2015 ISA. Published by Elsevier Ltd. All rights reserved.

  16. Automated cross-modal mapping in robotic eye/hand systems using plastic radial basis function networks

    Science.gov (United States)

    Meng, Qinggang; Lee, M. H.

    2007-03-01

    Advanced autonomous artificial systems will need incremental learning and adaptive abilities similar to those seen in humans. Knowledge from biology, psychology and neuroscience is now inspiring new approaches for systems that have sensory-motor capabilities and operate in complex environments. Eye/hand coordination is an important cross-modal cognitive function, and is also typical of many of the other coordinations that must be involved in the control and operation of embodied intelligent systems. This paper examines a biologically inspired approach for incrementally constructing compact mapping networks for eye/hand coordination. We present a simplified node-decoupled extended Kalman filter for radial basis function networks, and compare this with other learning algorithms. An experimental system consisting of a robot arm and a pan-and-tilt head with a colour camera is used to produce results and test the algorithms in this paper. We also present three approaches for adapting to structural changes during eye/hand coordination tasks, and the robustness of the algorithms under noise is investigated. The learning and adaptation approaches in this paper have similarities with current ideas about neural growth in the brains of humans and animals during tool-use, and in infants during early cognitive development.

  17. Functionalized graphene oxide-reinforced electrospun carbon nanofibers as ultrathin supercapacitor electrode

    Institute of Scientific and Technical Information of China (English)

    W.K.Chee; H.N.Lim; Y.Andou; Z.Zainal; A.A.B.Hamra; I.Harrison; M.Altarawneh; Z.T.Jiang; N.M.Huang

    2017-01-01

    Graphene oxide has been used widely as a starting precursor for applications that cater to the needs of tunable graphene. However, its hydrophilic character limits its application, especially in hydrophobic conditions. Herein, a novel non-covalent surface modification of graphene oxide was conducted via a UV-induced photo-polymerization technique that involves two major routes: a UV-sensitive initiator embedded via π–π interactions on the graphene planar rings, and the polymerization of hydrophobic polymeric chains along the surface. The functionalized graphene oxide achieved the desired hydrophobicity, as it was readily dissolved in organic solvent. When it was added to a polymeric solution that was then electrospun, non-woven random nanofibers embedded with graphene oxide sheets were obtained. The prepared polymeric nanofibers were subjected to two-step thermal treatments that eventually converted the polymeric chains into a carbon-rich conductive structure. A unique morphology was observed upon the addition of the functionalized graphene oxide, whereby the sheets were embedded and intercalated within the carbon nanofibers and formed a continuous structure. This reinforcement effectively enhanced the electrochemical performance of the carbon nanofibers, which recorded a specific capacitance of up to 140.10 F/g at a current density of 1 A/g, approximately threefold more than that of the pristine nanofibers. They also retained up to 96.2% of the capacitance after 1000 vigorous charge/discharge cycles. This functionalization technique opens up a new pathway for tuning the solubility of graphene oxide towards the synthesis of graphene oxide-reinforced polymeric structures.

  18. Optimized Aircraft Electric Control System Based on Adaptive Tabu Search Algorithm and Fuzzy Logic Control

    Directory of Open Access Journals (Sweden)

    Saifullah Khalid

    2016-09-01

    Full Text Available Three conventional control techniques for extracting reference currents for shunt active power filters (constant instantaneous power control, sinusoidal current control, and synchronous reference frame) have been optimized using fuzzy logic control and an adaptive tabu search algorithm, and their performances have been compared. A critical analysis of the compensation ability of the different control strategies, based on THD and speed, is carried out, and suggestions are given for selecting the technique to be used. The simulated results using a MATLAB model are presented, and they clearly prove the value of the proposed control method for the aircraft shunt APF. The waveforms observed after the application of the filter have harmonics within the limits, and the power quality is improved.

  19. Dynamic functional brain networks involved in simple visual discrimination learning.

    Science.gov (United States)

    Fidalgo, Camino; Conejo, Nélida María; González-Pardo, Héctor; Arias, Jorge Luis

    2014-10-01

    Visual discrimination tasks have been widely used to evaluate many types of learning and memory processes. However, little is known about the brain regions involved at different stages of visual discrimination learning. We used cytochrome c oxidase histochemistry to evaluate changes in regional brain oxidative metabolism during visual discrimination learning in a water-T maze at different time points during training. As compared with control groups, the results of the present study reveal the gradual activation of cortical (prefrontal and temporal cortices) and subcortical brain regions (including the striatum and the hippocampus) associated with the mastery of a simple visual discrimination task. On the other hand, the brain regions involved and their functional interactions changed progressively over days of training. Regions associated with novelty, emotion, visuo-spatial orientation and motor aspects of the behavioral task seem to be relevant during the earlier phase of training, whereas a brain network comprising the prefrontal cortex was found along the whole learning process. This study highlights the relevance of functional interactions among brain regions for investigating learning and memory processes. Copyright © 2014 Elsevier Inc. All rights reserved.

  20. Adaptive function projective synchronization of two-cell Quantum-CNN chaotic oscillators with uncertain parameters

    International Nuclear Information System (INIS)

    Sudheer, K. Sebastian; Sabir, M.

    2009-01-01

    This work investigates function projective synchronization of two-cell Quantum-CNN chaotic oscillators using an adaptive method. Quantum-CNN oscillators produce nanoscale chaotic oscillations under certain conditions. By Lyapunov stability theory, the adaptive control law and the parameter update law are derived to make the states of the two chaotic systems function projective synchronized. Numerical simulations are presented to demonstrate the effectiveness of the proposed adaptive controllers.

  1. Machine Learning Approaches for Clinical Psychology and Psychiatry.

    Science.gov (United States)

    Dwyer, Dominic B; Falkai, Peter; Koutsouleris, Nikolaos

    2018-05-07

    Machine learning approaches for clinical psychology and psychiatry explicitly focus on learning statistical functions from multidimensional data sets to make generalizable predictions about individuals. The goal of this review is to provide an accessible understanding of why this approach is important for future practice given its potential to augment decisions associated with the diagnosis, prognosis, and treatment of people suffering from mental illness using clinical and biological data. To this end, the limitations of current statistical paradigms in mental health research are critiqued, and an introduction is provided to critical machine learning methods used in clinical studies. A selective literature review is then presented aiming to reinforce the usefulness of machine learning methods and provide evidence of their potential. In the context of promising initial results, the current limitations of machine learning approaches are addressed, and considerations for future clinical translation are outlined.

  2. Parallel Alterations of Functional Connectivity during Execution and Imagination after Motor Imagery Learning

    Science.gov (United States)

    Zhang, Rushao; Hui, Mingqi; Long, Zhiying; Zhao, Xiaojie; Yao, Li

    2012-01-01

    Background Neural substrates underlying motor learning have been widely investigated with neuroimaging technologies. Investigations have illustrated the critical regions of motor learning and further revealed parallel alterations of functional activation during imagination and execution after learning. However, little is known about the functional connectivity associated with motor learning, especially motor imagery learning, although benefits from functional connectivity analysis attract more attention to the related explorations. We explored whether motor imagery (MI) and motor execution (ME) shared parallel alterations of functional connectivity after MI learning. Methodology/Principal Findings Graph theory analysis, which is widely used in functional connectivity exploration, was performed on the functional magnetic resonance imaging (fMRI) data of MI and ME tasks before and after 14 days of consecutive MI learning. The control group had no learning. Two measures, connectivity degree and interregional connectivity, were calculated and further assessed at a statistical level. Two interesting results were obtained: (1) The connectivity degree of the right posterior parietal lobe decreased in both MI and ME tasks after MI learning in the experimental group; (2) The parallel alterations of interregional connectivity related to the right posterior parietal lobe occurred in the supplementary motor area for both tasks. Conclusions/Significance These computational results may provide the following insights: (1) The establishment of motor schema through MI learning may induce the significant decrease of connectivity degree in the posterior parietal lobe; (2) The decreased interregional connectivity between the supplementary motor area and the right posterior parietal lobe in post-test implicates the dissociation between motor learning and task performing. These findings and explanations further revealed the neural substrates underpinning MI learning and supported that

  3. Enriching behavioral ecology with reinforcement learning methods.

    Science.gov (United States)

    Frankenhuis, Willem E; Panchanathan, Karthik; Barto, Andrew G

    2018-02-13

    This article focuses on the division of labor between evolution and development in solving sequential, state-dependent decision problems. Currently, behavioral ecologists tend to use dynamic programming methods to study such problems. These methods are successful at predicting animal behavior in a variety of contexts. However, they depend on a distinct set of assumptions. Here, we argue that behavioral ecology will benefit from drawing more than it currently does on a complementary collection of tools, called reinforcement learning methods. These methods allow for the study of behavior in highly complex environments, which conventional dynamic programming methods do not feasibly address. In addition, reinforcement learning methods are well-suited to studying how biological mechanisms solve developmental and learning problems. For instance, we can use them to study simple rules that perform well in complex environments. Or to investigate under what conditions natural selection favors fixed, non-plastic traits (which do not vary across individuals), cue-driven-switch plasticity (innate instructions for adaptive behavioral development based on experience), or developmental selection (the incremental acquisition of adaptive behavior based on experience). If natural selection favors developmental selection, which includes learning from environmental feedback, we can also make predictions about the design of reward systems. Our paper is written in an accessible manner and for a broad audience, though we believe some novel insights can be drawn from our discussion. We hope our paper will help advance the emerging bridge connecting the fields of behavioral ecology and reinforcement learning. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.

  4. Cognitive Functions, Personality Traits, and Social Values in Heavy Marihuana Smokers and Nonsmoker Controls

    Science.gov (United States)

    Weckowicz, Thaddeus E.; Janssen, Doug V.

    1973-01-01

    To determine the effect of chronic marihuana smoking on cognitive functions, personality traits, and social values, a group of heavy marihuana smokers was compared with a matched control group. (Author)

  5. Machine Learning Estimation of Atom Condensed Fukui Functions.

    Science.gov (United States)

    Zhang, Qingyou; Zheng, Fangfang; Zhao, Tanfeng; Qu, Xiaohui; Aires-de-Sousa, João

    2016-02-01

    To enable the fast estimation of atom condensed Fukui functions, machine learning algorithms were trained with databases of DFT pre-calculated values for ca. 23,000 atoms in organic molecules. The problem was approached as the ranking of atom types with the Bradley-Terry (BT) model, and as the regression of the Fukui function. Random Forests (RF) were trained to predict the condensed Fukui function, to rank atoms in a molecule, and to classify atoms as high/low Fukui function. Atomic descriptors were based on counts of atom types in spheres around the kernel atom. The BT coefficients assigned to atom types enabled the identification (93-94 % accuracy) of the atom with the highest Fukui function in pairs of atoms in the same molecule with differences ≥0.1. In whole molecules, the atom with the top Fukui function could be recognized in ca. 50 % of the cases and, on the average, about 3 of the top 4 atoms could be recognized in a shortlist of 4. Regression RF yielded predictions for test sets with R(2) =0.68-0.69, improving the ability of BT coefficients to rank atoms in a molecule. Atom classification (as high/low Fukui function) was obtained with RF with sensitivity of 55-61 % and specificity of 94-95 %. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
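
    A minimal sketch of the regression setup described above, assuming hypothetical `descriptors.npy` and `fukui.npy` files holding the atom-count descriptors and DFT-computed condensed Fukui values; the paper's actual descriptor construction and training protocol are not reproduced.

```python
# Random-forest regression of condensed Fukui functions from atom-count
# descriptors; file names and the 80/20 split are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X = np.load("descriptors.npy")   # hypothetical: counts of atom types in
y = np.load("fukui.npy")         # spheres around each atom; DFT Fukui values

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
rf = RandomForestRegressor(n_estimators=500, n_jobs=-1, random_state=0)
rf.fit(X_tr, y_tr)
print("test R^2:", r2_score(y_te, rf.predict(X_te)))  # paper reports ~0.68-0.69
```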

  6. Actuarial values calculated using the incomplete Gamma function

    Directory of Open Access Journals (Sweden)

    Giovanni Mingari Scarpello

    2013-03-01

    Full Text Available The complete expectation-of-life for a person and the actuarial present values of continuous life annuities are defined by integrals. In all of them, at least one of the factors is a ratio of survival function values. If de Moivre's law of mortality is chosen, such integrals can easily be evaluated; but if the Makeham survival function is adopted, they usually have to be calculated numerically. For the above actuarial figures, closed-form integrations are hereafter provided by means of the incomplete Gamma function.
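
    As an illustration of the kind of closed form involved, under a Makeham force of mortality mu(x) = A + B*c^x the complete expectation of life reduces to an upper incomplete Gamma function. The sketch below uses the standard identity e_x = e^m * m^s * Gamma(-s, m) / ln(c), with s = A/ln(c) and m = B*c^x/ln(c), and checks it against direct quadrature; the parameter values are assumptions, and this is the textbook identity rather than the paper's own derivation.

```python
# Complete expectation of life under Makeham mortality, mu(x) = A + B*c**x,
# via the upper incomplete Gamma function (mpmath accepts a negative first
# argument), checked against direct quadrature. Parameter values are assumed.
import mpmath as mp

A, B, c = 7e-4, 5e-5, 1.10
x = 40

def e_ring_closed(x):
    # e_x = e^m * m^s * Gamma(-s, m) / ln(c),  s = A/ln(c),  m = B*c^x/ln(c)
    s = A / mp.log(c)
    m = B * mp.power(c, x) / mp.log(c)
    return mp.e**m * mp.power(m, s) * mp.gammainc(-s, a=m) / mp.log(c)

def e_ring_quad(x):
    m = B * mp.power(c, x) / mp.log(c)
    survival_ratio = lambda t: mp.exp(-A * t - m * (mp.power(c, t) - 1))
    return mp.quad(survival_ratio, [0, mp.inf])

print(e_ring_closed(x))   # the two results should agree
print(e_ring_quad(x))
```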

  7. Defining mental disorder. Exploring the 'natural function' approach.

    Science.gov (United States)

    Varga, Somogy

    2011-01-21

    Due to several socio-political factors, to many psychiatrists only a strictly objective definition of mental disorder, free of value components, seems really acceptable. In this paper, I will explore a variant of such an objectivist approach to defining mental disorder, natural function objectivism. Proponents of this approach make recourse to the notion of natural function in order to reach a value-free definition of mental disorder. The exploration of Christopher Boorse's 'biostatistical' account of natural function (1) will be followed by an investigation of the 'hybrid naturalism' approach to natural functions by Jerome Wakefield (2). In the third part, I will explore two proposals that call into question the whole attempt to define mental disorder (3). I will conclude that while 'natural function objectivism' accounts fail to provide the backdrop for a reliable definition of mental disorder, there is no compelling reason to conclude that a definition cannot be achieved.

  8. Design of Adaptive Policy Pathways under Deep Uncertainties

    Science.gov (United States)

    Babovic, Vladan

    2013-04-01

    The design of large-scale engineering and infrastructural systems today is growing in complexity. Designers need to consider sociotechnical uncertainties, intricacies, and processes in the long-term strategic deployment and operation of these systems. In this context, water and spatial management is increasingly challenged not only by climate-associated changes such as sea level rise and increased spatio-temporal variability of precipitation, but also by pressures due to population growth and a particularly accelerating rate of urbanisation. Furthermore, high investment costs and the long-term nature of water-related infrastructure projects require a long-term planning perspective, sometimes extending over many decades. Adaptation to such changes is determined not only by what is known or anticipated at present, but also by what will be experienced and learned as the future unfolds, as well as by policy responses to social and water events. As a result, a pathway emerges. Instead of responding to 'surprises' and making decisions on an ad hoc basis, exploring adaptation pathways into the future provides indispensable support in water management decision-making. In this contribution, a structured approach for designing a dynamic adaptive policy based on the concepts of adaptive policy making and adaptation pathways is introduced. Such an approach provides flexibility which allows change over time in response to how the future unfolds, what is learned about the system, and changes in societal preferences. The introduced flexibility provides means for dealing with the complexities of adaptation under deep uncertainties. It enables engineering systems to change in the face of uncertainty to reduce impacts from downside scenarios while capitalizing on upside opportunities. This contribution presents a comprehensive framework for the development and deployment of the adaptive policy pathway framework, and demonstrates its performance under deep uncertainties on a case study related to urban…

  9. Optimization and control of a continuous stirred tank fermenter using learning system

    Energy Technology Data Exchange (ETDEWEB)

    Thibault, J [Dept. of Chemical Engineering, Laval Univ., Quebec City, PQ (Canada); Najim, K [CNRS, URA 192, GRECO SARTA, Ecole Nationale Superieure d' Ingenieurs de Genie Chimique, 31 - Toulouse (France)

    1993-05-01

    A variable structure learning automaton is used for the optimization and control of a continuous stirred tank fermenter. The algorithm requires no modelling of the process. The use of appropriate learning rules makes it possible to locate the optimum dilution rate in order to maximize an objective cost function. It is shown that a hierarchical structure of automata can adapt to environmental changes and can also efficiently modify the domain of variation of the control variable in order to encompass the optimum value. (orig.)
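
    A minimal sketch of a variable-structure learning automaton with the classic linear reward-inaction (L_RI) update, selecting among candidate dilution rates; the binary reward feedback here is an assumed stand-in for the fermenter's measured objective, not the authors' process.

```python
# Variable-structure learning automaton with a linear reward-inaction (L_RI)
# update over candidate dilution rates; the reward feedback is an assumed
# stand-in for the fermenter's response.
import numpy as np

actions = np.linspace(0.05, 0.50, 10)          # candidate dilution rates (1/h)
p = np.full(len(actions), 1.0 / len(actions))  # action probabilities
step = 0.05                                    # reward step size
rng = np.random.default_rng(0)

def reward(d):
    # placeholder environment: reward probability peaks at the (assumed)
    # optimal dilution rate d = 0.3
    return rng.random() < np.exp(-50.0 * (d - 0.3) ** 2)

for _ in range(5000):
    i = rng.choice(len(actions), p=p)
    if reward(actions[i]):
        p *= (1.0 - step)        # L_RI: shift probability mass towards the
        p[i] += step             # rewarded action; do nothing on penalty

print("estimated optimal dilution rate:", actions[p.argmax()])
```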

  10. Social learning pathways in the relation between parental chronic pain and daily pain severity and functional impairment in adolescents with functional abdominal pain.

    Science.gov (United States)

    Stone, Amanda L; Bruehl, Stephen; Smith, Craig A; Garber, Judy; Walker, Lynn S

    2017-10-06

    Having a parent with chronic pain (CP) may confer greater risk for persistence of CP from childhood into young adulthood. Social learning, such as parental modeling and reinforcement, represents one plausible mechanism for the transmission of risk for CP from parents to offspring. Based on a 7-day pain diary in 154 pediatric patients with functional abdominal CP, we tested a model in which parental CP predicted adolescents' daily average CP severity and functional impairment (distal outcomes) via parental modeling of pain behaviors and parental reinforcement of adolescent's pain behaviors (mediators) and adolescents' cognitive appraisals of pain threat (proximal outcome representing adolescents' encoding of parents' behaviors). Results indicated significant indirect pathways from parental CP status to adolescent average daily pain severity (b = 0.18, SE = 0.08, 95% CI: 0.04, 0.31, p = 0.03) and functional impairment (b = 0.08, SE = 0.04, 95% CI: 0.02, 0.15, p = 0.03) over the 7-day diary period via adolescents' observations of parent pain behaviors and adolescent pain threat appraisal. The indirect pathway through parental reinforcing responses to adolescents' pain did not reach significance for either adolescent pain severity or functional impairment. Identifying mechanisms of increased risk for pain and functional impairment in children of parents with CP ultimately could lead to targeted interventions aimed at improving functioning and quality of life in families with chronic pain. Parental modeling of pain behaviors represents a potentially promising target for family based interventions to ameliorate pediatric chronic pain.

  11. A novel multi-agent decentralized win or learn fast policy hill-climbing with eligibility trace algorithm for smart generation control of interconnected complex power grids

    International Nuclear Information System (INIS)

    Xi, Lei; Yu, Tao; Yang, Bo; Zhang, Xiaoshun

    2015-01-01

    Highlights: • Proposing a decentralized smart generation control scheme for automatic generation control coordination. • A novel multi-agent learning algorithm is developed to resolve stochastic control problems in power systems. • A variable learning rate is introduced based on the framework of stochastic games. • A simulation platform is developed to test the performance of different algorithms. - Abstract: This paper proposes a multi-agent smart generation control scheme for automatic generation control coordination in interconnected complex power systems. A novel multi-agent decentralized win or learn fast policy hill-climbing with eligibility trace algorithm is developed, which can effectively identify the optimal average policies via a variable learning rate under various operation conditions. Based on control performance standards, the proposed approach is implemented in a flexible multi-agent stochastic dynamic game-based smart generation control simulation platform. Based on the mixed strategy and average policy, it is highly adaptive in stochastic non-Markov environments and large time-delay systems, and can fulfill automatic generation control coordination in interconnected complex power systems in the presence of increasing penetration of decentralized renewable energy. Two case studies, on a two-area load–frequency control power system and on the China Southern Power Grid model, have been carried out. Simulation results verify that the multi-agent smart generation control scheme based on the proposed approach can obtain optimal average policies, thus improving closed-loop system performance, and can achieve a fast convergence rate with significant robustness compared with other methods.
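
    For reference, the core of the underlying win or learn fast policy hill-climbing (WoLF-PHC) rule is sketched below for a single agent; the eligibility-trace extension and the power grid environment of the paper are omitted, and all sizes and rates are assumptions.

```python
# Core WoLF-PHC update for one agent: Q-learning plus policy hill-climbing
# with a variable learning rate (learn fast when losing). Sizes and rates
# are assumptions; the eligibility-trace extension is omitted.
import numpy as np

n_s, n_a = 10, 4
alpha, gamma = 0.1, 0.9
d_win, d_lose = 0.01, 0.04                # delta_lose > delta_win
Q = np.zeros((n_s, n_a))
pi = np.full((n_s, n_a), 1.0 / n_a)       # current mixed policy
pi_avg = np.full((n_s, n_a), 1.0 / n_a)   # running average policy
visits = np.zeros(n_s)

def wolf_phc_step(s, a, r, s_next):
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    visits[s] += 1
    pi_avg[s] += (pi[s] - pi_avg[s]) / visits[s]
    # "winning" if the current policy outperforms the average policy
    delta = d_win if pi[s] @ Q[s] > pi_avg[s] @ Q[s] else d_lose
    best = Q[s].argmax()
    for b in range(n_a):                  # hill-climb towards the greedy action
        pi[s, b] += delta if b == best else -delta / (n_a - 1)
    pi[s] = np.clip(pi[s], 0.0, None)
    pi[s] /= pi[s].sum()                  # renormalise the mixed strategy
```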

  12. Outcome indicators for the evaluation of energy policy instruments and technical change

    International Nuclear Information System (INIS)

    Neij, Lena; Astrand, Kerstin

    2006-01-01

    The aim of this paper is to propose a framework for the evaluation of policy instruments designed to affect the development and dissemination of new energy technologies. The evaluation approach is based on the analysis of selected outcome indicators describing the process of technical change, i.e. the development and dissemination of new energy technologies, on the basis of a socio-technical systems approach. The outcome indicators are used to analyse the effect, in terms of outcome, and the outcome scope of the policy instruments, as well as the extent to which the policy instruments support diversity, learning and institutional change. The analysis of two evaluation cases, of energy efficiency policy and of wind energy policy in Sweden, shows that the approach has several advantages, allowing continuous evaluation and providing important information for the redesign of policy instruments. There are also disadvantages associated with the approach, such as complexity, possibly high cost and the requirement of qualified evaluators. Nevertheless, it is concluded that the information on the continuous performance of different policy instruments and their effects on the introduction and dissemination of new energy technologies, provided by this evaluation approach, is essential for an improved adaptation and implementation of energy and climate policy.

  13. Adaptive nodes enrich nonlinear cooperative learning beyond traditional adaptation by links.

    Science.gov (United States)

    Sardi, Shira; Vardi, Roni; Goldental, Amir; Sheinin, Anton; Uzan, Herut; Kanter, Ido

    2018-03-23

    Physical models typically assume time-independent interactions, whereas neural networks and machine learning incorporate interactions that function as adjustable parameters. Here we demonstrate a new type of abundant cooperative nonlinear dynamics where learning is attributed solely to the nodes, instead of the network links, whose number is significantly larger. The nodal, neuronal, fast adaptation follows its relative anisotropic (dendritic) input timings, as indicated experimentally, similarly to the slow learning mechanism currently attributed to the links, the synapses. It represents a non-local learning rule, where effectively many incoming links to a node concurrently undergo the same adaptation. The network dynamics is now, counterintuitively, governed by the weak links, which were previously assumed to be insignificant. This cooperative nonlinear dynamic adaptation presents a self-controlled mechanism to prevent divergence or vanishing of the learning parameters, as opposed to learning by links, and also supports self-oscillations of the effective learning parameters. It hints at a hierarchical computational complexity of nodes, following their number of anisotropic inputs, and opens new horizons for advanced deep learning algorithms and artificial-intelligence-based applications, as well as a new mechanism for enhanced and fast learning by neural networks.

  14. Medical interpreters as tools: dangers and challenges in the utilitarian approach to interpreters' roles and functions.

    Science.gov (United States)

    Hsieh, Elaine; Kramer, Eric Mark

    2012-10-01

    This study explores the tensions, challenges, and dangers that arise when a utilitarian view of the interpreter is constructed, imposed, and/or reinforced in health care settings. We conducted in-depth interviews and focus groups with 26 medical interpreters from 17 different languages and cultures and 39 providers of five specialties. Grounded theory was used for data analysis. The utilitarian view of interpreters' roles and functions influences providers in the following areas: (a) hierarchical structure and unidirectional communication, (b) the interpreter seen as information gatekeeper, (c) the interpreter seen as provider proxy, and (d) the interpreter's emotional support perceived as a tool. When interpreters are viewed as passive instruments, a utilitarian approach may compromise the quality of care by silencing patients' and interpreters' voices, objectifying interpreters' emotional work, and exploiting patients' needs. Providers need to recognize that a utilitarian approach to the interpreter's role and functions may create interpersonal and ethical dilemmas that compromise the quality of care. By viewing interpreters as smart technology (rather than passive instruments), both providers and interpreters can learn from and co-evolve with each other, allowing them to maintain control over their expertise and to work as collaborators in providing quality care. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  15. Functional vs. Traditional Analysis in Biomechanical Gait Data: An Alternative Statistical Approach.

    Science.gov (United States)

    Park, Jihong; Seeley, Matthew K; Francom, Devin; Reese, C Shane; Hopkins, J Ty

    2017-12-01

    In human motion studies, discrete points such as peak or average kinematic values are commonly selected to test hypotheses. The purpose of this study was to describe a functional data analysis and describe the advantages of using functional data analyses when compared with a traditional analysis of variance (ANOVA) approach. Nineteen healthy participants (age: 22 ± 2 yrs, body height: 1.7 ± 0.1 m, body mass: 73 ± 16 kg) walked under two different conditions: control and pain+effusion. Pain+effusion was induced by injection of sterile saline into the joint capsule and hypertonic saline into the infrapatellar fat pad. Sagittal-plane ankle, knee, and hip joint kinematics were recorded and compared following injections using 2×2 mixed model ANOVAs and FANOVAs. The results of ANOVAs detected a condition × time interaction for the peak ankle (F1,18 = 8.56, p = 0.01) and hip joint angle (F1,18 = 5.77, p = 0.03), but did not for the knee joint angle (F1,18 = 0.36, p = 0.56). The functional data analysis, however, found several differences at initial contact (ankle and knee joint), in the mid-stance (each joint) and at toe off (ankle). Although a traditional ANOVA is often appropriate for discrete or summary data, in biomechanical applications, the functional data analysis could be a beneficial alternative. When using the functional data analysis approach, a researcher can (1) evaluate the entire data as a function, and (2) detect the location and magnitude of differences within the evaluated function.
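
    The following sketch illustrates the basic idea of treating the whole curve as the unit of analysis, using a simplified pointwise paired t-test across a time-normalised gait cycle in place of a full basis-expansion FANOVA; the data files are hypothetical.

```python
# Simplified functional comparison: paired t-tests at each point of the
# time-normalised gait cycle instead of one summary value per trial. A full
# FANOVA would use basis expansions; the data files here are hypothetical.
import numpy as np
from scipy import stats

control = np.load("control_curves.npy")         # participants x 101 points
effusion = np.load("pain_effusion_curves.npy")  # participants x 101 points

t_vals, p_vals = stats.ttest_rel(control, effusion, axis=0)
sig = p_vals < 0.05    # pointwise differences (uncorrected, for illustration)
print("conditions differ at % of gait cycle:", np.where(sig)[0])
```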

  16. Short-term perceptual learning in visual conjunction search.

    Science.gov (United States)

    Su, Yuling; Lai, Yunpeng; Huang, Wanyi; Tan, Wei; Qu, Zhe; Ding, Yulong

    2014-08-01

    Although some studies showed that training can improve the ability of cross-dimension conjunction search, less is known about the underlying mechanism. Specifically, it remains unclear whether training of visual conjunction search can successfully bind different features of separated dimensions into a new function unit at early stages of visual processing. In the present study, we utilized stimulus specificity and generalization to provide a new approach to investigate the mechanisms underlying perceptual learning (PL) in visual conjunction search. Five experiments consistently showed that after 40 to 50 min of training of color-shape/orientation conjunction search, the ability to search for a certain conjunction target improved significantly and the learning effects did not transfer to a new target that differed from the trained target in both color and shape/orientation features. However, the learning effects were not strictly specific. In color-shape conjunction search, although the learning effect could not transfer to a same-shape different-color target, it almost completely transferred to a same-color different-shape target. In color-orientation conjunction search, the learning effect partly transferred to a new target that shared same color or same orientation with the trained target. Moreover, the sum of transfer effects for the same color target and the same orientation target in color-orientation conjunction search was algebraically equivalent to the learning effect for trained target, showing an additive transfer effect. The different transfer patterns in color-shape and color-orientation conjunction search learning might reflect the different complexity and discriminability between feature dimensions. These results suggested a feature-based attention enhancement mechanism rather than a unitization mechanism underlying the short-term PL of color-shape/orientation conjunction search.

  17. Self-learning fuzzy logic controllers based on reinforcement

    International Nuclear Information System (INIS)

    Wang, Z.; Shao, S.; Ding, J.

    1996-01-01

    This paper proposes a new method for learning and tuning Fuzzy Logic Controllers. The self-learning scheme in this paper combines the bucket-brigade algorithm and a genetic algorithm. The proposed method is tested on the cart-pole system. Simulation results show that our approach has good learning and control performance.

  18. Construction of multi-agent mobile robots control system in the problem of persecution with using a modified reinforcement learning method based on neural networks

    Science.gov (United States)

    Patkin, M. L.; Rogachev, G. N.

    2018-02-01

    A method for constructing a multi-agent control system for mobile robots, based on reinforcement learning with deep neural networks, is considered. Synthesis of the control system is carried out with reinforcement learning and a modified Actor-Critic method, in which the Actor module is divided into an Action Actor and a Communication Actor in order to simultaneously control the mobile robots and communicate with partners. Communication is carried out by sending partners, at each step, a vector of real numbers that is added to the observation vector and affects behaviour. The functions of the Actors and the Critic are approximated by deep neural networks. The Critic's value function is trained using the TD-error method, and the Actor's function using DDPG. The Communication Actor's neural network is trained through gradients received from partner agents. An environment featuring cooperative multi-agent interaction was developed, and a computer simulation of the application of this method to the control problem of two robots pursuing two goals was carried out.
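
    A minimal PyTorch sketch of the actor split described above: an Action Actor producing the robot command and a Communication Actor producing the real-valued message appended to a partner's observation. Sizes, names and architecture are assumptions, and training (TD error for the Critic, DDPG for the Actors) is omitted.

```python
# Sketch: each agent has an Action Actor (robot command) and a Communication
# Actor (message for the partner). Sizes are assumptions; training is omitted.
import torch
import torch.nn as nn

OBS, MSG, ACT = 16, 4, 2     # observation, message and action sizes (assumed)

class Agent(nn.Module):
    def __init__(self):
        super().__init__()
        inp = OBS + MSG      # own observation plus the received message
        self.action_actor = nn.Sequential(
            nn.Linear(inp, 64), nn.ReLU(), nn.Linear(64, ACT), nn.Tanh())
        self.comm_actor = nn.Sequential(
            nn.Linear(inp, 64), nn.ReLU(), nn.Linear(64, MSG), nn.Tanh())

    def forward(self, obs, msg_in):
        x = torch.cat([obs, msg_in], dim=-1)
        return self.action_actor(x), self.comm_actor(x)

agent1, agent2 = Agent(), Agent()
obs1, obs2 = torch.randn(1, OBS), torch.randn(1, OBS)
msg = torch.zeros(1, MSG)
for _ in range(3):                     # messages influence the partner's
    act1, msg1 = agent1(obs1, msg)     # behaviour on the next step
    act2, msg2 = agent2(obs2, msg1)
    msg = msg2
```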

  19. An Evaluation Framework for Obesity Prevention Policy Interventions

    Science.gov (United States)

    Sommers, Janice; Vu, Maihan; Jernigan, Jan; Payne, Gayle; Thompson, Diane; Heiser, Claire; Farris, Rosanne; Ammerman, Alice

    2012-01-01

    As the emphasis on preventing obesity has grown, so have calls for interventions that extend beyond individual behaviors and address changes in environments and policies. Despite the need for policy action, little is known about policy approaches that are most effective at preventing obesity. The Centers for Disease Control and Prevention (CDC) and others are funding the implementation and evaluation of new obesity prevention policies, presenting a distinct opportunity to learn from these practice-based initiatives and build the body of evidence-based approaches. However, contributions from this policy activity are limited by the incomplete and inconsistent evaluation data collected on policy processes and outcomes. We present a framework developed by the CDC-funded Center of Excellence for Training and Research Translation that public health practitioners can use to evaluate policy interventions and identify the practice-based evidence needed to fill the gaps in effective policy approaches to obesity prevention. PMID:22742594

  1. Adaptive order search and tangent-weighted trade-off for motion estimation in H.264

    Directory of Open Access Journals (Sweden)

    Srinivas Bachu

    2018-04-01

    Full Text Available Motion estimation and compensation play a major role in video compression by reducing the temporal redundancies of the input videos. A variety of block search patterns have been developed for matching blocks with reduced computational complexity, without affecting the visual quality. In this paper, block motion estimation is achieved by integrating the square and hexagonal search patterns with adaptive order. The proposed algorithm, called AOSH (Adaptive Order Square Hexagonal) Search, finds the best matching block with a reduced number of search points. The searching function is formulated as a trade-off criterion here; hence, a tangent-weighted function is newly developed to evaluate the matching point. The proposed AOSH search algorithm and the tangent-weighted trade-off criterion are effectively applied to the block estimation process to enhance the visual quality and the compression performance. The proposed method is validated using three videos, namely football, garden and tennis. The quantitative performance of the proposed method and the existing methods is analysed using the Structural Similarity Index (SSIM) and the Peak Signal to Noise Ratio (PSNR). The results prove that the proposed method offers better visual quality than the existing methods. Keywords: Block motion estimation, Square search, Hexagon search, H.264, Video coding
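
    As a reference point for the search structure, here is a sketch of plain hexagon-pattern block matching with a small-pattern refinement, one of the building blocks AOSH combines with the square pattern; a plain SAD cost stands in for the paper's tangent-weighted trade-off criterion.

```python
# Hexagon-pattern block matching with small-pattern refinement; a plain SAD
# cost replaces the paper's tangent-weighted criterion (assumption).
import numpy as np

HEX = [(2, 0), (-2, 0), (1, 2), (-1, 2), (1, -2), (-1, -2)]
SMALL = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def sad(cur, ref, bx, by, dx, dy, N=16):
    y, x = by + dy, bx + dx
    if y < 0 or x < 0 or y + N > ref.shape[0] or x + N > ref.shape[1]:
        return np.inf                         # candidate outside the frame
    block = cur[by:by + N, bx:bx + N].astype(int)
    return int(np.abs(block - ref[y:y + N, x:x + N].astype(int)).sum())

def hex_search(cur, ref, bx, by, max_iter=32):
    mx = my = 0
    best = sad(cur, ref, bx, by, 0, 0)
    for _ in range(max_iter):                 # large hexagon until centred
        cand = [(sad(cur, ref, bx, by, mx + dx, my + dy), dx, dy)
                for dx, dy in HEX]
        c, dx, dy = min(cand)
        if c >= best:                         # centre is already the best point
            break
        best, mx, my = c, mx + dx, my + dy
    for dx, dy in SMALL:                      # final small-pattern refinement
        c = sad(cur, ref, bx, by, mx + dx, my + dy)
        if c < best:
            best, mx, my = c, mx + dx, my + dy
    return mx, my                             # motion vector for the block
```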

  2. Generalised Adaptive Harmony Search: A Comparative Analysis of Modern Harmony Search

    Directory of Open Access Journals (Sweden)

    Jaco Fourie

    2013-01-01

    Full Text Available Harmony search (HS) was introduced in 2001 as a heuristic population-based optimisation algorithm. Since then HS has become a popular alternative to other heuristic algorithms like simulated annealing and particle swarm optimisation. However, some flaws, like the need for parameter tuning, were identified and have been a topic of study for much research over the last 10 years. Many variants of HS were developed to address some of these flaws, and most of them have made substantial improvements. In this paper we compare the performance of three recent HS variants: exploratory harmony search, self-adaptive harmony search, and dynamic local-best harmony search. We compare the accuracy of these algorithms, using a set of well-known optimisation benchmark functions that include both unimodal and multimodal problems. Observations from this comparison led us to design a novel hybrid that combines the best attributes of these modern variants into a single optimiser called generalised adaptive harmony search.
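
    For orientation, the basic harmony search loop that all the compared variants refine is sketched below; HMS, HMCR and PAR are the standard harmony-memory size, memory-consideration and pitch-adjustment parameters, and the sphere function is an assumed stand-in for the benchmark set.

```python
# Basic harmony search on the (assumed) sphere benchmark; HMS, HMCR and PAR
# are the standard harmony-memory size, memory-consideration and
# pitch-adjustment parameters.
import numpy as np

def harmony_search(f, dim=10, lo=-5.0, hi=5.0, HMS=20, HMCR=0.9,
                   PAR=0.3, bw=0.1, iters=20000, seed=0):
    rng = np.random.default_rng(seed)
    HM = rng.uniform(lo, hi, (HMS, dim))        # harmony memory
    fit = np.array([f(h) for h in HM])
    for _ in range(iters):
        new = np.empty(dim)
        for j in range(dim):
            if rng.random() < HMCR:             # memory consideration
                new[j] = HM[rng.integers(HMS), j]
                if rng.random() < PAR:          # pitch adjustment
                    new[j] += bw * rng.uniform(-1.0, 1.0)
            else:                               # random re-initialisation
                new[j] = rng.uniform(lo, hi)
        new = np.clip(new, lo, hi)
        f_new, worst = f(new), fit.argmax()
        if f_new < fit[worst]:                  # replace the worst harmony
            HM[worst], fit[worst] = new, f_new
    return HM[fit.argmin()], fit.min()

sphere = lambda x: float(np.sum(x ** 2))        # unimodal benchmark
best_x, best_f = harmony_search(sphere)
```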

  3. Adaptive Policies for Reducing Inequalities in the Social Determinants of Health

    Directory of Open Access Journals (Sweden)

    Gemma Carey

    2015-11-01

    Full Text Available Inequalities in the social determinants of health (SDH), which drive avoidable health disparities between different individuals or groups, are a major concern for a number of international organisations, including the World Health Organization (WHO). Despite this, the pathways to changing inequalities in the SDH remain elusive. The methodologies and concepts within system science are now viewed as important domains of knowledge, ideas and skills for tackling issues of inequality, which are increasingly understood as emergent properties of complex systems. In this paper, we introduce and expand the concept of adaptive policies to reduce inequalities in the distribution of the SDH. The concept of adaptive policy for health equity was developed through reviewing the literature on learning and adaptive policies. Using a series of illustrative examples from education and poverty alleviation, which have their basis in real world policies, we demonstrate how an adaptive policy approach is more suited to the management of the emergent properties of inequalities in the SDH than traditional policy approaches. This is because adaptive policies are better placed to handle future uncertainties. Our intention is that these examples are illustrative, rather than prescriptive, and serve to create a conversation regarding appropriate adaptive policies for progressing policy action on the SDH.

  4. Mechanical Behavior of Nanostructured Hybrids Based on Poly(Vinyl Alcohol/Bioactive Glass Reinforced with Functionalized Carbon Nanotubes

    Directory of Open Access Journals (Sweden)

    H. S. Mansur

    2012-01-01

    Full Text Available This study reports the synthesis and characterization of novel tridimensional porous hybrids based on PVA combined with bioactive glass and reinforced by chemically functionalized carbon nanotubes (CNT) for potential use in bone tissue engineering. The functionalization of CNT was performed by introducing carboxylic groups in multiwall nanotubes. This process aimed at enhancing the affinity of the CNTs with the water-soluble PVA polymer through the hydrogen bonds formed between alcohol (PVA) and carboxylic (CNT–COOH) groups. Subsequently, the CNT–COOH (0.25 wt%) were used as the nanostructure modifier for the hybrid system based on PVA associated with the bioactive glass (BaG). The mechanical properties of the nanostructured hybrids reinforced with CNT–COOH were evaluated by axial compression tests, and they were compared to a reference hybrid. The averaged yield stresses of the macroporous hybrids were (2.3 ± 0.9) MPa and (4.4 ± 1.0) MPa for the reference and the CNT-reinforced materials, respectively. Moreover, yield strain and Young's modulus were significantly enhanced by about 30% for the CNT–COOH hybrids. Hence, as far as the mechanical properties are concerned, the results clearly show the feasibility of utilizing these new hybrids reinforced with functionalized CNT in repairing cancellous bone tissues.

  5. Defining mental disorder. Exploring the 'natural function' approach

    Directory of Open Access Journals (Sweden)

    Varga Somogy

    2011-01-01

    Full Text Available Abstract Due to several socio-political factors, to many psychiatrists only a strictly objective definition of mental disorder, free of value components, seems really acceptable. In this paper, I will explore a variant of such an objectivist approach to defining mental disorder, natural function objectivism. Proponents of this approach make recourse to the notion of natural function in order to reach a value-free definition of mental disorder. The exploration of Christopher Boorse's 'biostatistical' account of natural function (1) will be followed by an investigation of the 'hybrid naturalism' approach to natural functions by Jerome Wakefield (2). In the third part, I will explore two proposals that call into question the whole attempt to define mental disorder (3). I will conclude that while 'natural function objectivism' accounts fail to provide the backdrop for a reliable definition of mental disorder, there is no compelling reason to conclude that a definition cannot be achieved.

  6. Nonparametric bayesian reward segmentation for skill discovery using inverse reinforcement learning

    CSIR Research Space (South Africa)

    Ranchod, P

    2015-10-01

    Full Text Available We present a method for segmenting a set of unstructured demonstration trajectories to discover reusable skills using inverse reinforcement learning (IRL). Each skill is characterised by a latent reward function which the demonstrator is assumed...

  7. On the Communication Complexity of Secure Function Evaluation with Long Output

    DEFF Research Database (Denmark)

    Hubacek, Pavel; Wichs, Daniel

    2015-01-01

    We study the communication complexity of secure function evaluation (SFE). Consider a setting where Alice has a short input χA, Bob has an input χB and we want Bob to learn some function y = f(χA, χB) with large output size. For example, Alice has a small secret decryption key, Bob has a large...... value. Moreover, we show that even in an offline/online protocol, the communication of the online phase must have output-size dependence. This negative result uses an incompressibility argument and it generalizes several recent lower bounds for functional encryption and (reusable) garbled circuits...

  8. Quality of life, psychological adjustment, and adaptive functioning of patients with intoxication-type inborn errors of metabolism - a systematic review.

    Science.gov (United States)

    Zeltner, Nina A; Huemer, Martina; Baumgartner, Matthias R; Landolt, Markus A

    2014-10-25

    their methodological approaches, assessment instruments and norm populations. A disease-specific standard assessment procedure for HrQoL is not available. Psychosocial risk factors for HrQoL, psychological adjustment, or adaptive functioning have not been investigated. Considering psychosocial variables and their corresponding risk factors for IT-IEM would allow evaluation of outcomes and treatments as well as the planning of effective social and psychological interventions to enhance the patients' HrQoL.

  9. Effect of Play-based Therapy on Meta-cognitive and Behavioral Aspects of Executive Function: A Randomized, Controlled, Clinical Trial on the Students With Learning Disabilities.

    Science.gov (United States)

    Karamali Esmaili, Samaneh; Shafaroodi, Narges; Hassani Mehraban, Afsoon; Parand, Akram; Zarei, Masoume; Akbari-Zardkhaneh, Saeed

    2017-01-01

    Although the effect of educational methods on executive function (EF) is well known, training this function by a playful method is debatable. The current study aimed at investigating whether a play-based intervention is effective in improving metacognitive and behavioral skills of EF in students with specific learning disabilities. In the current randomized, clinical trial, 49 subjects within the age range of 7 to 11 years with specific learning disabilities were randomly assigned into the intervention (25 subjects; mean age 8.5±1.33 years) and control (24 subjects; mean age 8.7±1.03 years) groups. Subjects in the intervention group received EF group training based on playing activities; subjects in the control group received no intervention. The behavior rating inventory of executive function (BRIEF) was administered to evaluate the behavioral and cognitive aspects of EF. The duration of the intervention was 6 hours per week for 9 weeks. Multivariate analysis of covariance was used to compare mean changes (before and after) in the BRIEF scores between the groups. The assumptions of multivariate analysis of covariance were examined. After controlling for pre-test conditions, the intervention and control groups scored significantly differently on both the metacognition (P=0.002; effect size=0.20) and behavior regulation indices (P=0.01; effect size=0.12) of BRIEF. Play-based therapy is effective in improving the metacognitive and behavioral aspects of EF in students with specific learning disabilities. Professionals can use play-based therapy rather than educational approaches in clinical practice to enhance EF skills.

  10. Preference learning with evolutionary Multivariate Adaptive Regression Spline model

    DEFF Research Database (Denmark)

    Abou-Zleikha, Mohamed; Shaker, Noor; Christensen, Mads Græsbøll

    2015-01-01

    This paper introduces a novel approach for pairwise preference learning through combining an evolutionary method with Multivariate Adaptive Regression Spline (MARS). Collecting users' feedback through pairwise preferences is recommended over other ranking approaches as this method is more appealing...... for function approximation as well as being relatively easy to interpret. MARS models are evolved based on their efficiency in learning pairwise data. The method is tested on two datasets that collectively provide pairwise preference data of five cognitive states expressed by users. The method is analysed...
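
    The evolutionary fitness at the heart of this approach is, in essence, how well a candidate model orders the preference pairs. A minimal Python sketch of that criterion, under the assumption that each evolved individual exposes a prediction function (here called score, standing in for a MARS model's output):

        import numpy as np

        def pairwise_fitness(score, pairs):
            # score: an evolved model's prediction function (e.g. a MARS model);
            # pairs: iterable of (preferred, other) feature vectors.
            # Returns the fraction of pairs the model ranks correctly.
            return float(np.mean([score(a) > score(b) for a, b in pairs]))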

  11. Functional MRI mapping of visual function and selective attention for performance assessment and presurgical planning using conjunctive visual search.

    Science.gov (United States)

    Parker, Jason G; Zalusky, Eric J; Kirbas, Cemil

    2014-03-01

    Accurate mapping of visual function and selective attention using fMRI is important in the study of human performance as well as in presurgical treatment planning of lesions in or near visual centers of the brain. Conjunctive visual search (CVS) is a useful tool for mapping visual function during fMRI because of its greater activation extent compared with high-capacity parallel search processes. The purpose of this work was to develop and evaluate a CVS that was capable of generating consistent activation in the basic and higher level visual areas of the brain by using a high number of distractors as well as an optimized contrast condition. Images from 10 healthy volunteers were analyzed and brain regions of greatest activation and deactivation were determined using a nonbiased decomposition of the results at the hemisphere, lobe, and gyrus levels. The results were quantified in terms of activation and deactivation extent and mean z-statistic. The proposed CVS was found to generate robust activation of the occipital lobe, as well as regions in the middle frontal gyrus associated with coordinating eye movements and in regions of the insula associated with task-level control and focal attention. As expected, the task demonstrated deactivation patterns commonly implicated in the default-mode network. Further deactivation was noted in the posterior region of the cerebellum, most likely associated with the formation of optimal search strategy. We believe the task will be useful in studies of visual and selective attention in the neuroscience community as well as in mapping visual function in clinical fMRI.

  12. Medial prefrontal cortex and the adaptive regulation of reinforcement learning parameters.

    Science.gov (United States)

    Khamassi, Mehdi; Enel, Pierre; Dominey, Peter Ford; Procyk, Emmanuel

    2013-01-01

    Converging evidence suggests that the medial prefrontal cortex (MPFC) is involved in feedback categorization, performance monitoring, and task monitoring, and may contribute to the online regulation of reinforcement learning (RL) parameters that would affect decision-making processes in the lateral prefrontal cortex (LPFC). Previous neurophysiological experiments have shown MPFC activities encoding error likelihood, uncertainty, reward volatility, as well as neural responses categorizing different types of feedback, for instance, distinguishing between choice errors and execution errors. Rushworth and colleagues have proposed that the involvement of MPFC in tracking the volatility of the task could contribute to the regulation of one of the RL parameters, the learning rate. We extend this hypothesis by proposing that MPFC could contribute to the regulation of other RL parameters such as the exploration rate and default action values in case of task shifts. Here, we analyze the sensitivity to RL parameters of behavioral performance in two monkey decision-making tasks, one with a deterministic reward schedule and the other with a stochastic one. We show that there exist optimal parameter values specific to each of these tasks, which need to be found for optimal performance and which are usually hand-tuned in computational models. In contrast, automatic online regulation of these parameters using some heuristics can help produce good, although non-optimal, behavioral performance in each task. We finally describe our computational model of MPFC-LPFC interaction used for online regulation of the exploration rate and its application to a human-robot interaction scenario. There, unexpected uncertainties are produced by the human introducing cued task changes or by cheating. The model enables the robot to autonomously learn to reset exploration in response to such uncertain cues and events. The combined results provide concrete evidence specifying how prefrontal
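
    The abstract does not spell out the regulation rule, but a minimal sketch of the kind of mechanism described, softmax (Boltzmann) action selection whose inverse temperature is lowered when recent prediction errors signal volatility, might look as follows in Python; the clipping bounds and gain are illustrative assumptions.

        import numpy as np

        def softmax_policy(q, beta):
            # Boltzmann action selection; beta is the inverse temperature
            # (high beta -> exploitation, low beta -> exploration).
            p = np.exp(beta * (q - q.max()))
            return p / p.sum()

        def update_beta(recent_errors, lo=0.5, hi=10.0, gain=2.0):
            # Heuristic meta-regulation: large recent prediction errors suggest
            # the task may have shifted, so exploration is reset (beta lowered).
            volatility = float(np.mean(np.abs(recent_errors)))
            return float(np.clip(hi - gain * volatility, lo, hi))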

  13. Place preference and vocal learning rely on distinct reinforcers in songbirds.

    Science.gov (United States)

    Murdoch, Don; Chen, Ruidong; Goldberg, Jesse H

    2018-04-30

    In reinforcement learning (RL), agents are typically tasked with maximizing a single objective function such as reward. But it remains poorly understood how agents might pursue distinct objectives at once. In machines, multiobjective RL can be achieved by dividing a single agent into multiple sub-agents, each of which is shaped by agent-specific reinforcement, but it remains unknown whether animals adopt this strategy. Here we use songbirds to test if navigation and singing, two behaviors with distinct objectives, can be differentially reinforced. We demonstrate that strobe flashes aversively condition place preference but not song syllables. Brief noise bursts aversively condition song syllables but positively reinforce place preference. Thus distinct behavior-generating systems, or agencies, within a single animal can be shaped by correspondingly distinct reinforcement signals. Our findings suggest that spatially segregated vocal circuits can solve a credit assignment problem associated with multiobjective learning.

  14. Age-related normal structural and functional ventricular values in cardiac function assessed by magnetic resonance

    International Nuclear Information System (INIS)

    Fiechter, Michael; Gaemperli, Oliver; Kaufmann, Philipp A; Fuchs, Tobias A; Gebhard, Catherine; Stehli, Julia; Klaeser, Bernd; Stähli, Barbara E; Manka, Robert; Manes, Costantina; Tanner, Felix C

    2013-01-01

    The heart is subject to structural and functional changes with advancing age. However, the magnitude of cardiac age-dependent transformation has not been conclusively elucidated. This retrospective cardiac magnetic resonance (CMR) study included 183 subjects with normal structural and functional ventricular values. End systolic volume (ESV), end diastolic volume (EDV), and ejection fraction (EF) were obtained from the left and the right ventricle in breath-hold cine CMR. Patients were classified into four age groups (20–29, 30–49, 50–69, and ≥70 years) and cardiac measurements were compared using Pearson’s rank correlation over the four different groups. With advanced age a slight but significant decrease in ESV (r=−0.41 for both ventricles, P<0.001) and EDV (r=−0.39 for left ventricle, r=−0.35 for right ventricle, P<0.001) were observed associated with a significant increase in left (r=0.28, P<0.001) and right (r=0.27, P<0.01) ventricular EF reaching a maximal increase in EF of +8.4% (P<0.001) for the left and +6.1% (P<0.01) for the right ventricle in the oldest compared to the youngest patient group. Left ventricular myocardial mass significantly decreased over the four different age groups (P<0.05). The aging process is associated with significant changes in left and right ventricular EF, ESV and EDV in subjects with no cardiac functional and structural abnormalities. These findings underline the importance of using age adapted values as standard of reference when evaluating CMR studies

  15. E-learning: controlling costs and increasing value.

    Science.gov (United States)

    Walsh, Kieran

    2015-04-01

    E-learning now accounts for a substantial proportion of medical education provision. This progress has required significant investment and this investment has in turn come under increasing scrutiny so that the costs of e-learning may be controlled and its returns maximised. There are multiple methods by which the costs of e-learning can be controlled and its returns maximised. This short paper reviews some of those methods that are likely to be most effective and that are likely to save costs without compromising quality. Methods might include accessing free or low-cost resources from elsewhere; creating short learning resources that will work on multiple devices; using open source platforms to host content; using in-house faculty to create content; sharing resources between institutions; and promoting resources to ensure high usage. Whatever methods are used to control costs or increase value, it is most important to evaluate the impact of these methods.

  16. Functional neuroanatomy of Drosophila olfactory memory formation

    OpenAIRE

    Guven-Ozkan, Tugba; Davis, Ronald L.

    2014-01-01

    New approaches, techniques and tools invented over the last decade and a half have revolutionized the functional dissection of neural circuitry underlying Drosophila learning. The new methodologies have been used aggressively by researchers attempting to answer three critical questions about olfactory memories formed with appetitive and aversive reinforcers: (1) Which neurons within the olfactory nervous system mediate the acquisition of memory? (2) What is the complete neural circuitry exten...

  17. A neural learning classifier system with self-adaptive constructivism for mobile robot control.

    Science.gov (United States)

    Hurst, Jacob; Bull, Larry

    2006-01-01

    For artificial entities to achieve true autonomy and display complex lifelike behavior, they will need to exploit appropriate adaptable learning algorithms. In this context adaptability implies flexibility guided by the environment at any given time and an open-ended ability to learn appropriate behaviors. This article examines the use of constructivism-inspired mechanisms within a neural learning classifier system architecture that exploits parameter self-adaptation as an approach to realize such behavior. The system uses a rule structure in which each rule is represented by an artificial neural network. It is shown that appropriate internal rule complexity emerges during learning at a rate controlled by the learner and that the structure indicates underlying features of the task. Results are presented in simulated mazes before moving to a mobile robot platform.

  18. Functional Correspondence between Evaluators and Abstract Machines

    DEFF Research Database (Denmark)

    Ager, Mads Stig; Biernacki, Dariusz; Danvy, Olivier

    2003-01-01

    We bridge the gap between functional evaluators and abstract machines for the λ-calculus, using closure conversion, transformation into continuation-passing style, and defunctionalization. We illustrate this approach by deriving Krivine's abstract machine from an ordinary call-by-name evaluator and by deriving an ordinary call-by-value evaluator from Felleisen et al.'s CEK machine. The first derivation is strikingly simpler than what can be found in the literature. The second one is new. Together, they show that Krivine's abstract machine and the CEK machine correspond to the call-by-name and call...
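
    As a miniature illustration of the third ingredient, defunctionalization, the following Python sketch replaces the function space of a higher-order call-by-value evaluator with a first-order closure datatype plus an apply dispatcher; the paper performs the same move (together with CPS transformation) on λ-calculus evaluators to derive abstract machines. The arithmetic-free toy syntax here is an assumption for brevity.

        def eval_hof(expr, env):
            # Higher-order evaluator: a lambda is represented by a Python closure.
            tag = expr[0]
            if tag == "var":
                return env[expr[1]]
            if tag == "lam":
                return lambda v: eval_hof(expr[2], {**env, expr[1]: v})
            if tag == "app":
                return eval_hof(expr[1], env)(eval_hof(expr[2], env))

        def eval_fo(expr, env):
            # First-order evaluator: closures become tuples, applied by apply_clo.
            tag = expr[0]
            if tag == "var":
                return env[expr[1]]
            if tag == "lam":
                return ("closure", expr[1], expr[2], env)
            if tag == "app":
                return apply_clo(eval_fo(expr[1], env), eval_fo(expr[2], env))

        def apply_clo(clo, v):
            # The dispatcher that defunctionalization introduces.
            _, param, body, env = clo
            return eval_fo(body, {**env, param: v})

        # Identity applied to itself: the first-order evaluator yields a closure.
        ID = ("lam", "x", ("var", "x"))
        assert eval_fo(("app", ID, ID), {}) == ("closure", "x", ("var", "x"), {})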

  19. Learning a decision maker's utility function from (possibly) inconsistent behavior

    DEFF Research Database (Denmark)

    Nielsen, Thomas Dyhre; Jensen, Finn Verner

    2004-01-01

    developed for learning the probabilities from a database. However, methods for learning the utilities have only received limited attention in the computer science community. A promising approach for learning a decision maker's utility function is to take as a starting point the decision maker's observed behavioral patterns, and then find a utility function which (together with a domain model) can explain this behavior. That is, it is assumed that the decision maker's preferences are reflected in the behavior. Standard learning algorithms also assume that the decision maker is behaviorally consistent, i.e., given a model of the decision problem, there exists a utility function which can account for all the observed behavior. Unfortunately, this assumption is rarely valid in real-world decision problems, and in these situations existing learning methods may only identify a trivial utility function. In this paper we relax...

  20. Towards Static Analysis of Policy-Based Self-adaptive Computing Systems

    DEFF Research Database (Denmark)

    Margheri, Andrea; Nielson, Hanne Riis; Nielson, Flemming

    2016-01-01

    For supporting the design of self-adaptive computing systems, the PSCEL language offers a principled approach that relies on declarative definitions of adaptation and authorisation policies enforced at runtime. Policies permit managing system components by regulating their interactions and by dynamically introducing new actions to accomplish task-oriented goals. However, the runtime evaluation of policies and their effects on system components make the prediction of system behaviour challenging. In this paper, we introduce the construction of a flow graph that statically points out the policy evaluations that can take place at runtime and exploit it to analyse the effects of policy evaluations on the progress of system components....

  1. Intelligent Broadcasting in Mobile Ad Hoc Networks: Three Classes of Adaptive Protocols

    Directory of Open Access Journals (Sweden)

    Michael D. Colagrosso

    2006-11-01

    Full Text Available Because adaptability greatly improves the performance of a broadcast protocol, we identify three ways in which machine learning can be applied to broadcasting in a mobile ad hoc network (MANET). We chose broadcasting because it functions as a foundation of MANET communication. Unicast, multicast, and geocast protocols utilize broadcasting as a building block, providing important control and route establishment functionality. Therefore, any improvements to the process of broadcasting can be immediately realized by higher-level MANET functionality and applications. While efficient broadcast protocols have been proposed, no single broadcasting protocol works well in all possible MANET conditions. Furthermore, protocols tend to fail catastrophically in severe network environments. Our three classes of adaptive protocols are pure machine learning, intra-protocol learning, and inter-protocol learning. In the pure machine learning approach, we exhibit a new approach to the design of a broadcast protocol: the decision of whether to rebroadcast a packet is cast as a classification problem. Each mobile node (MN) builds a classifier and trains it on data collected from the network environment. Using intra-protocol learning, each MN consults a simple machine model for the optimal value of one of its free parameters. Lastly, in inter-protocol learning, MNs learn to switch between different broadcasting protocols based on network conditions. For each class of learning method, we create a prototypical protocol and examine its performance in simulation.

  3. The Quantitative Evaluation of Functional Neuroimaging Experiments: Mutual Information Learning Curves

    DEFF Research Database (Denmark)

    Kjems, Ulrik; Hansen, Lars Kai; Anderson, Jon

    2002-01-01

    Learning curves are presented as an unbiased means for evaluating the performance of models for neuroimaging data analysis. The learning curve measures the predictive performance in terms of the generalization or prediction error as a function of the number of independent examples (e.g., subjects) used to determine the parameters in the model. Cross-validation resampling is used to obtain unbiased estimates of a generic multivariate Gaussian classifier, for training set sizes from 2 to 16 subjects. We apply the framework to four different activation experiments, in this case [15O]water data sets, although the framework is equally valid for multisubject fMRI studies. We demonstrate how the prediction error can be expressed as the mutual information between the scan and the scan label, measured in units of bits. The mutual information learning curve can be used...

  4. A reward optimization method based on action subrewards in hierarchical reinforcement learning.

    Science.gov (United States)

    Fu, Yuchen; Liu, Quan; Ling, Xionghong; Cui, Zhiming

    2014-01-01

    Reinforcement learning (RL) is a kind of interactive learning method. Its main characteristics are "trial and error" and "related reward." A hierarchical reinforcement learning method based on action subrewards is proposed to address the "curse of dimensionality" problem, in which the state space grows exponentially with the number of features, leading to low convergence speed. The method can reduce state spaces greatly and choose actions purposefully and efficiently so as to optimize the reward function and enhance convergence speed. Applied to online learning in the game of Tetris, the experimental results show that the convergence speed of the algorithm is markedly enhanced by the new method, which combines a hierarchical reinforcement learning algorithm with action subrewards. The "curse of dimensionality" problem is also solved to a certain extent by the hierarchical method. Performance under different parameter settings is compared and analyzed as well.
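
    The central idea can be sketched in Python as an ordinary tabular Q-learning update whose scalar reward is assembled from weighted action subrewards (in Tetris, quantities such as cleared lines, height change or holes); the weight vector below is an illustrative design choice, not a value from the paper.

        import numpy as np

        def q_update(Q, s, a, s2, subrewards, weights, alpha=0.1, gamma=0.95):
            # One tabular Q-learning step: the reward is a weighted sum of
            # action subrewards rather than a single monolithic signal.
            r = float(np.dot(weights, subrewards))
            Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
            return Q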

  5. Wireless Adaptive Therapeutic TeleGaming in a Pervasive Computing Environment

    Science.gov (United States)

    Peters, James F.; Szturm, Tony; Borkowski, Maciej; Lockery, Dan; Ramanna, Sheela; Shay, Barbara

    This chapter introduces a wireless, pervasive computing approach to adaptive therapeutic telegaming considered in the context of near set theory. Near set theory provides a formal basis for observation, comparison and classification of perceptual granules. A perceptual granule is defined by a collection of objects that are graspable by the senses or by the mind. In the proposed pervasive computing approach to telegaming, a handicapped person (e.g., stroke patient with limited hand, finger, arm function) plays a video game by interacting with familiar instrumented objects such as cups, cutlery, soccer balls, nozzles, screw top-lids, spoons, so that the technology that makes therapeutic exercise game-playing possible is largely invisible (Archives of Physical Medicine and Rehabilitation 89:2213-2217, 2008). The basic approach to adaptive learning (AL) in the proposed telegaming environment is ethology-inspired and is quite different from the traditional approach to reinforcement learning. In biologically-inspired learning, organisms learn to achieve some goal by durable modification of behaviours in response to signals from the environment resulting from specific experiences (Animal Behavior, 1995). The term adaptive is used here in an ethological sense, where learning by an organism results from modifying behaviour in response to perceived changes in the environment. To instill adaptivity in a video game, it is assumed that learning by a video game is episodic. During an episode, the behaviour of a player is measured indirectly by tracking the occurrence of gaming events such as a hit or a miss of a target (e.g., hitting a moving ball with a game paddle). An ethogram provides a record of behaviour feature values that provides a basis for a functional registry of handicapped players for gaming adaptivity. An important practical application of adaptive gaming is therapeutic rehabilitation exercise carried out in parallel with playing action video games. Enjoyable and

  6. Output Tracking Control of Switched Hybrid Systems: A Fliess Functional Expansion Approach

    Directory of Open Access Journals (Sweden)

    Fenghua He

    2013-01-01

    Full Text Available The output tracking problem is investigated for a nonlinear affine system with multiple modes of continuous control inputs. We convert the family of nonlinear affine systems under consideration into a switched hybrid system by introducing a multiple-valued logic variable. The Fliess functional expansion is adopted to express the input-output relationship of the switched hybrid system. The optimal switching control is determined for a multiple-step output tracking performance index. The proposed approach is applied to a multitarget tracking problem for a flight vehicle aiming at one real target with several decoys flying around it during the terminal guidance course. These decoys appear as apparent targets and have to be distinguished as the flight vehicle approaches. The guidance problem of one flight vehicle versus multiple apparent targets must be considered so that no large miss distance is caused by the limited maneuverability of the flight vehicle. The target orientation at each time interval is determined. Simulation results show the effectiveness of the proposed method.
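
    For reference, the Fliess functional expansion of an affine system dx/dt = f(x) + Σ g_i(x)u_i with output y = h(x) takes the standard form (writing g_0 := f, ξ_0(t) := t and ξ_i(t) := ∫_0^t u_i(τ)dτ, and with L_g h the Lie derivative of h along g):

        y(t) = h(x_0) + \sum_{k \ge 0} \sum_{i_0,\dots,i_k=0}^{m}
               L_{g_{i_0}} \cdots L_{g_{i_k}} h(x_0)
               \int_0^t d\xi_{i_k} \cdots d\xi_{i_0}

    A truncation of this series is what makes the input-output relationship of the switched system explicit enough to evaluate a multiple-step tracking index.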

  7. Adaptation in the fuzzy self-organising controller

    DEFF Research Database (Denmark)

    Jantzen, Jan; Poulsen, Niels Kjølstad

    2003-01-01

    This simulation study provides an analysis of the adaptation mechanism in the self-organising fuzzy controller, SOC. The approach is to apply a traditional adaptive control viewpoint. A simplified performance measure in the SOC controller is used in a loss function, and thus the MIT rule implies an update mechanism similar to the SOC update mechanism. Two simulations of proportionally controlled systems show the behaviour of the proportional gain as it adapts to a specified behaviour....
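
    The MIT rule referred to here is the classical gradient update of adaptive control: for an adjustable parameter θ and error e(θ), the loss J(θ) is decreased along its gradient with adaptation gain γ,

        J(\theta) = \tfrac{1}{2} e^2(\theta), \qquad
        \frac{d\theta}{dt} = -\gamma \frac{\partial J}{\partial \theta}
                           = -\gamma \, e \, \frac{\partial e}{\partial \theta}

    and the SOC's performance-measure-driven rule can be read as an update of this form.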

  8. Computer Simulation Tests of Feedback Error Learning Controller with IDM and ISM for Functional Electrical Stimulation in Wrist Joint Control

    OpenAIRE

    Watanabe, Takashi; Sugi, Yoshihiro

    2010-01-01

    A feedforward controller would be useful for a hybrid Functional Electrical Stimulation (FES) system using powered orthotic devices. In this paper, a Feedback Error Learning (FEL) controller for FES (FEL-FES controller) was examined using an inverse statics model (ISM) with an inverse dynamics model (IDM) to realize a feedforward FES controller. For FES application, the ISM was tested in off-line learning using training data obtained by PID control of very slow movements. Computer simulation tests ...
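
    In feedback error learning, the total command is the sum of the learned feedforward term and a conventional feedback term, and the feedback output itself serves as the error signal that trains the inverse model; schematically, with learning rate η,

        u = u_{ff}(x_d;\theta) + u_{fb}, \qquad
        \dot{\theta} = \eta \, \frac{\partial u_{ff}}{\partial \theta} \, u_{fb}

    so that as the inverse model (here, the ISM/IDM pair) improves, the feedback contribution u_fb is driven toward zero.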

  9. In vitro reinforcement of hippocampal bursting: a search for Skinner's atoms of behavior.

    OpenAIRE

    Stein, L; Xue, B G; Belluzzi, J D

    1994-01-01

    A novel "in vitro reinforcement" paradigm was used to investigate Skinner's (1953) hypotheses (a) that operant behavior is made up of infinitesimal "response elements" or "behavioral atoms" and (b) that these very small units, and not whole responses, are the functional units of reinforcement. Our tests are based on the assumption that behavioral atoms may plausibly be represented at the neural level by individual cellular responses. As a first approach, we attempted to reinforce the bursting...

  10. Measuring the CO2 shadow price for wastewater treatment: A directional distance function approach

    International Nuclear Information System (INIS)

    Molinos-Senante, María; Hanley, Nick; Sala-Garrido, Ramón

    2015-01-01

    Highlights: • The shadow price of CO2 informs about the marginal abatement cost of this pollutant. • The shadow price of CO2 is estimated for wastewater treatment plants. • The shadow prices depend on the setting of the directional vectors of the distance function. • Sewage sludge treatment technology affects the CO2 shadow price. - Abstract: The estimation of the value of carbon emissions has become a major research and policy topic since the establishment of the Kyoto Protocol. The shadow price of CO2 provides information about the marginal abatement cost of this pollutant. It is an essential element in guiding environmental policy issues, since the CO2 shadow price can be used when fixing carbon tax rates, in environmental cost-benefit analysis and in ascertaining an initial market price for a trading system. The water industry could play an important role in the reduction of greenhouse gas (GHG) emissions. This paper estimates the shadow price of CO2 for a sample of wastewater treatment plants (WWTPs), using a parametric quadratic directional distance function. Following this, in a sensitivity analysis, the paper evaluates the impact of different settings of directional vectors on the shadow prices. Applying the Mann–Whitney and Kruskal–Wallis non-parametric tests, factors affecting CO2 prices are investigated. The variation of CO2 shadow prices across the WWTPs evaluated argues in favour of a market-based approach to CO2 mitigation as opposed to command-and-control regulation. The paper argues that the estimation of the shadow price of CO2 for non-power enterprises can provide incentives for reducing GHG emissions
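
    The duality result behind such estimates relates the shadow price of the bad output b (here CO2) to the price p_y of the good output through the gradients of the directional distance function D; up to sign conventions,

        q_b = p_y \cdot \frac{\partial D / \partial b}{\partial D / \partial y}

    which is why the choice of directional vector, entering D itself, moves the estimated shadow prices, as the paper's sensitivity analysis shows.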

  11. A quantum speedup in machine learning: finding an N-bit Boolean function for a classification

    International Nuclear Information System (INIS)

    Yoo, Seokwon; Lee, Jinhyoung; Bang, Jeongho; Lee, Changhyoup

    2014-01-01

    We compare quantum and classical machines designed for learning an N-bit Boolean function in order to address how a quantum system improves the machine learning behavior. The machines of the two types consist of the same number of operations and control parameters, but only the quantum machines utilize the quantum coherence naturally induced by unitary operators. We show that quantum superposition enables quantum learning that is faster than classical learning by expanding the approximate solution regions, i.e., the acceptable regions. This is also demonstrated by means of numerical simulations with a standard feedback model, namely random search, and a practical model, namely differential evolution. (paper)

  12. Functional rehabilitation of upper limb apraxia in poststroke patients: study protocol for a randomized controlled trial.

    Science.gov (United States)

    Pérez-Mármol, Jose Manuel; García-Ríos, M Carmen; Barrero-Hernandez, Francisco J; Molina-Torres, Guadalupe; Brown, Ted; Aguilar-Ferrándiz, María Encarnación

    2015-11-05

    Upper limb apraxia is a common disorder associated with stroke that can reduce patients' independence levels in activities of daily living and increase levels of disability. Traditional rehabilitation programs designed to promote the recovery of upper limb function have mainly focused on restorative or compensatory approaches. However, no previous studies have been completed that evaluate a combined intervention method approach, where patients concurrently receive cognitive training and learn compensatory strategies for enhancing daily living activities. This study will use a two-arm, assessor-blinded, parallel, randomized controlled trial design, involving 40 patients who present a left- or right-sided unilateral vascular lesion poststroke and a clinical diagnosis of upper limb apraxia. Participants will be randomized to either a combined functional rehabilitation or a traditional health education group. The experimental group will receive an 8-week combined functional program at home, including physical and occupational therapy focused on restorative and compensatory techniques for upper limb apraxia, 3 days per week in 30-min intervention periods. The control group will receive a conventional health education program once a month over 8 weeks, based on improving awareness of physical and functional limitations and facilitating the adaptation of patients to the home. Study outcomes will be assessed immediately postintervention and at the 2-month follow-up. The primary outcome measure will be basic activities of daily living skills as assessed with the Barthel Index. Secondary outcome measures will include the following: 1) the Lawton and Brody Instrumental Activities of Daily Living Scale, 2) the Observation and Scoring of ADL-Activities, 3) the De Renzi Test for Ideational Apraxia, 4) the De Renzi Test for Ideomotor Apraxia, 5) Recognition of Gestures, 6) the Test of Upper Limb Apraxia (TULIA), and 7) the Quality of Life Scale For Stroke (ECVI-38). This trial is

  13. An adaptive random search for short term generation scheduling with network constraints.

    Directory of Open Access Journals (Sweden)

    J A Marmolejo

    Full Text Available This paper presents an adaptive random search approach to address a short-term generation scheduling problem with network constraints, which determines the startup and shutdown schedules of thermal units over a given planning horizon. In this model, we consider the transmission network through capacity limits and line losses. The mathematical model is stated in the form of a Mixed Integer Nonlinear Problem with binary variables. The proposed heuristic is a population-based method that generates a set of new potential solutions via a random search strategy. The random search is based on the Markov Chain Monte Carlo method. The key feature of the proposed method is that the noise level of the random search is adaptively controlled in order to explore and exploit the entire search space. In order to improve the solutions, we couple a local search into the random search process. Several test systems are presented to evaluate the performance of the proposed heuristic. We use a commercial optimizer to compare the quality of the solutions provided by the proposed method. The solution of the proposed algorithm showed a significant reduction in computational effort with respect to the full-scale outer approximation commercial solver. Numerical results show the potential and robustness of our approach.
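
    A minimal Python sketch of the core mechanism, greedy Gaussian random search whose noise level is adapted from the recent acceptance rate; the schedule constants are illustrative, and the paper's version is Markov Chain Monte Carlo based and adds a coupled local search.

        import numpy as np

        def adaptive_random_search(f, x0, iters=2000, sigma=1.0, target=0.3, seed=0):
            rng = np.random.default_rng(seed)
            x = np.asarray(x0, dtype=float)
            fx, accepted = f(x), 0
            for t in range(1, iters + 1):
                cand = x + rng.normal(0.0, sigma, x.shape)   # random proposal
                fc = f(cand)
                if fc < fx:                                   # greedy acceptance
                    x, fx, accepted = cand, fc, accepted + 1
                if t % 50 == 0:                               # adapt the noise level
                    rate = accepted / 50
                    sigma *= 1.2 if rate > target else 0.8    # widen or narrow search
                    accepted = 0
            return x, fx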

  14. Modelling energy technology dynamics: methodology for adaptive expectations models with learning by doing and learning by searching

    International Nuclear Information System (INIS)

    Kouvaritakis, N.; Soria, A.; Isoard, S.

    2000-01-01

    This paper presents a module endogenising technical change which is capable of being attached to large-scale energy models that follow an adaptive-expectations approach. The formulation includes, apart from the more classical learning-by-doing effects, quantitative relationships between technology performance and R&D expenditure. It even attempts to go further by partially endogenising the latter, incorporating an optimisation module describing private equipment manufacturers' R&D budget allocation in a context of risk and expectation. Having presented this module in abstract terms, the paper proceeds to describe how an operational version of it has been constructed and implemented inside a large-scale partial equilibrium world energy model (the POLES model). Concerning the learning functions, problems associated with the data are discussed, and the hybrid econometric methods used to estimate them are presented, as well as the adjustments which had to be made to ensure a smooth incorporation into the large model. The final sections explain the use of the model itself to generate partial foresight parameters for the determination of return expectations, particularly in view of CO2 constraints and associated carbon values. (orig.)
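
    The standard two-factor learning curve underlying such formulations writes the unit cost of a technology as a function of cumulative capacity CC (learning by doing) and the R&D knowledge stock KS (learning by searching),

        C_t = C_0 \, CC_t^{-\alpha} \, KS_t^{-\beta}

    with learning-by-doing rate 1 − 2^{−α} and learning-by-searching rate 1 − 2^{−β}, i.e. the cost reductions obtained per doubling of cumulative capacity and of the knowledge stock, respectively.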

  15. Learning stage-dependent effect of M1 disruption on value-based motor decisions.

    Science.gov (United States)

    Derosiere, Gerard; Vassiliadis, Pierre; Demaret, Sophie; Zénon, Alexandre; Duque, Julie

    2017-11-15

    The present study aimed at characterizing the impact of M1 disruption on the implementation of implicit value information in motor decisions, at both early stages (during reinforcement learning) and late stages (after consolidation) of action value encoding. Fifty subjects performed, over three consecutive days, a task that required them to select between two finger responses according to the color (instruction) and to the shape (implicit, undisclosed rule) of an imperative signal: considering the implicit rule in addition to the instruction allowed subjects to earn more money. We investigated the functional contribution of M1 to the implementation of the implicit rule in subjects' motor decisions. Continuous theta burst stimulation (cTBS) was applied over M1 either on Day 1 or on Day 3, producing a temporary lesion either during reinforcement learning (cTBS Learning group) or after consolidation of the implicit rule, during decision-making (cTBS Decision group), respectively. Interestingly, disrupting M1 activity on Day 1 improved the reliance on the implicit rule, plausibly because M1 cTBS increased dopamine release in the putamen in an indirect way. This finding corroborates the view that cTBS may affect activity in unstimulated areas, such as the basal ganglia. Notably, this effect was short-lasting; it did not persist overnight, suggesting that the functional integrity of M1 during learning is a prerequisite for the consolidation of implicit value information to occur. Besides, cTBS over M1 did not impact the use of the implicit rule when applied on Day 3, although it did so when applied on Day 2 in a recent study where the reliance on the implicit rule declined following cTBS (Derosiere et al., 2017). Overall, these findings indicate that the human M1 is functionally involved in the consolidation and implementation of implicit value information underlying motor decisions. However, M1 contribution seems to vanish as subjects become more experienced in using

  16. Neuropsychological evaluation of deficits in executive functioning for ADHD children with or without learning disabilities.

    Science.gov (United States)

    Wu, Kitty K; Anderson, Vicki; Castiello, Umberto

    2002-01-01

    This study investigates multiple aspects of executive functioning in children with attention deficit/hyperactivity disorder (ADHD). These areas include attentional components, impulsiveness, planning, and problem solving. The rationale of the study is based on neurophysiological studies that suggest frontal lobe dysfunction in ADHD. As frontal lobe functioning is related to abilities in executive control, ADHD is hypothesised to be associated with deficits in various areas of executive functioning. The specific effect of comorbidity of learning disability (LD) was also investigated. Eighty-three children with ADHD and 29 age-matched controls (age 7-13) participated in the study. A battery of neuropsychological tests was utilized to evaluate specific deficits in speed of processing, selective attention, switching attention, sustained attention, attentional capacity, impulsiveness, planning and problem solving. Findings indicated that children with ADHD have slower verbal responses and sustained attention deficit. Deficits in selective attention and attentional capacity observed were largely related to the presence of LD. No specific deficit associated with ADHD or the comorbidity of LD was identified in switching attention, impulsiveness, planning, and problem solving. These results revealed that ADHD is not associated with a general deficit in executive functioning. Instead, ADHD is related to a specific deficit in regulation for attentional resources. The importance of isolating the deficit related to LDs for examining the specific deficit associated with ADHD is highlighted. Results also emphasised the importance of isolating the effect of lower level of abilities (e.g., speed of processing) and the utilization of specific definition for the examination of executive functions.

  17. Output Feedback-Based Boundary Control of Uncertain Coupled Semilinear Parabolic PDE Using Neurodynamic Programming.

    Science.gov (United States)

    Talaei, Behzad; Jagannathan, Sarangapani; Singler, John

    2018-04-01

    In this paper, neurodynamic programming-based output feedback boundary control of distributed parameter systems governed by uncertain coupled semilinear parabolic partial differential equations (PDEs) under Neumann or Dirichlet boundary control conditions is introduced. First, the Hamilton-Jacobi-Bellman (HJB) equation is formulated in the original PDE domain and the optimal control policy is derived using the value functional as the solution of the HJB equation. Subsequently, a novel observer is developed to estimate the system states given the uncertain nonlinearity in the PDE dynamics and the measured outputs. Consequently, the suboptimal boundary control policy is obtained by forward-in-time estimation of the value functional using a neural network (NN)-based online approximator and the estimated state vector obtained from the NN observer. Novel adaptive tuning laws in continuous time are proposed for learning the value functional online to satisfy the HJB equation along system trajectories while ensuring closed-loop stability. Local uniform ultimate boundedness of the closed-loop system is verified using Lyapunov theory. The performance of the proposed controller is verified via simulation on an unstable coupled diffusion-reaction process.
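
    In its generic finite-dimensional form (the paper works with a value functional on the PDE state), the HJB equation solved by the value function V for dynamics dx/dt = f(x) + g(x)u and cost rate Q(x) + uᵀRu reads

        0 = \min_u \left[ Q(x) + u^\top R u + (\nabla V(x))^\top (f(x) + g(x)u) \right],
        \qquad u^*(x) = -\tfrac{1}{2} R^{-1} g(x)^\top \nabla V(x)

    and the NN approximator estimates V forward in time so that this relation holds approximately along the measured trajectories.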

  18. Understanding tobacco control policy at the national level: bridging the gap between public policy and tobacco control advocacy

    Directory of Open Access Journals (Sweden)

    Marc C. Willemsen

    2018-03-01

    Full Text Available Background While some countries have advanced tobacco control policies, other countries struggle to adopt and implement the FCTC's measures. This presentation uncovers the main factors that explain such variations, taking insights from public policy and political science as a starting point for a case study. Methods A case study of tobacco control policy making in the Netherlands, covering the period from the 1960s until the present. The study consisted of a systematic search and analysis of documents and proceedings of parliamentary debates on tobacco policy, supplemented with 22 interviews with key informants from the government, health organisations, politicians, and the tobacco industry. In addition, documents from the Truth Tobacco Industry Documents database, pertaining to the influence of the tobacco industry on Dutch policy making, were analysed. Results The Dutch government started relatively late to regulate tobacco. The choices in tobacco control policy making at the national level, and the tempo in which they are made, are explained by the interaction of five main elements of the policy making process: (1) relatively stable context factors (constitutional structures, 'rules of the policy making game', national cultural values); (2) relatively dynamic context factors (regime changes, EU regulation and FCTC guidelines, changing social norms, public support); (3) transfer of ideas (availability and interpretation of scientific evidence); (4) pro- and anti-tobacco control networks and coalitions (their organisational and lobby strength); and (5) agenda-setting (changes in problem definition, issue framing, media advocacy). Conclusions Despite worldwide convergence of tobacco control policies, accelerated by the ratification of the FCTC treaty by most nations, governments develop approaches to tobacco control in line with cultural values, ideological preferences and specific national institutional arrangements. There is no one-size-fits-all approach. The

  19. Implementing European climate adaptation policy. How local policymakers react to European policy

    Directory of Open Access Journals (Sweden)

    Thomas Hartmann

    2015-04-01

    Full Text Available EU policy and projects have an increasing influence on policymaking for climate adaptation. This is especially evident in the development of new climate adaptation policies in transnational city networks. Until now, climate adaptation literature has paid little attention to the influence that these EU networks have on the adaptive capacity in cities. This paper uses two Dutch cities as an empirical base to evaluate the influence of two EU climate adaptation projects on both the experience of local public officials and the adaptive capacity in the respective cities. The main conclusion is that EU climate adaptation projects do not automatically lead to an increased adaptive capacity in the cities involved. This is due to the politically opportunistic use of EU funding, which hampers the implementation of climate adaptation policies. Furthermore, these EU projects draw attention away from local network building focused on the development and implementation of climate adaptation policies. These factors have a negative cumulative impact on the performance of these transnational policy networks at the adaptive capacity level in the cities involved. Therefore, in order to strengthen the adaptive capacity in today’s European cities, a context-specific, integrative approach in urban planning is needed at all spatial levels. Hence, policy entrepreneurs should aim to create linkage between the issues in the transnational city network and the concerns in local politics and local networks.

  1. An Adaptive Image Enhancement Technique by Combining Cuckoo Search and Particle Swarm Optimization Algorithm

    Directory of Open Access Journals (Sweden)

    Zhiwei Ye

    2015-01-01

    Full Text Available Image enhancement is an important procedure of image processing and analysis. This paper presents a new technique using a modified measure and blending of cuckoo search and particle swarm optimization (CS-PSO) for low contrast images to enhance images adaptively. In this way, contrast enhancement is obtained by global transformation of the input intensities; it employs the incomplete Beta function as the transformation function and a novel criterion for measuring image quality considering three factors which are threshold, entropy value, and gray-level probability density of the image. The enhancement process is a nonlinear optimization problem with several constraints. CS-PSO is utilized to maximize the objective fitness criterion in order to enhance the contrast and detail in an image by adapting the parameters of a novel extension to a local enhancement technique. The performance of the proposed method has been compared with other existing techniques such as linear contrast stretching, histogram equalization, and evolutionary computing based image enhancement methods like backtracking search algorithm, differential search algorithm, genetic algorithm, and particle swarm optimization in terms of processing time and image quality. Experimental results demonstrate that the proposed method is robust and adaptive and exhibits better performance than the other methods considered in the paper.
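
    A minimal Python sketch of the transformation and a simplified fitness (entropy only; the paper's criterion also uses a threshold term and the gray-level probability density), with SciPy's regularised incomplete beta function standing in for the transform whose parameters (a, b) CS-PSO would tune:

        import numpy as np
        from scipy.special import betainc

        def beta_transform(img, a, b):
            # Global gray-level transform through the regularised incomplete
            # beta function, after normalising intensities to [0, 1].
            img = np.asarray(img, dtype=float)
            x = (img - img.min()) / max(img.max() - img.min(), 1e-12)
            return betainc(a, b, x)

        def entropy_fitness(img, a, b):
            # Simplified stand-in for the three-factor criterion: maximise
            # the entropy of the enhanced image.
            out = beta_transform(img, a, b)
            hist, _ = np.histogram(out, bins=256, range=(0.0, 1.0))
            p = hist[hist > 0] / hist.sum()
            return float(-(p * np.log2(p)).sum())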

  2. A Learning Progression for Elementary Students' Functional Thinking

    Science.gov (United States)

    Stephens, Ana C.; Fonger, Nicole; Strachota, Susanne; Isler, Isil; Blanton, Maria; Knuth, Eric; Murphy Gardiner, Angela

    2017-01-01

    In this article we advance characterizations of and supports for elementary students' progress in generalizing and representing functional relationships as part of a comprehensive approach to early algebra. Our learning progressions approach to early algebra research involves the coordination of a curricular framework and progression, an…

  3. Explicit and implicit reinforcement learning across the psychosis spectrum.

    Science.gov (United States)

    Barch, Deanna M; Carter, Cameron S; Gold, James M; Johnson, Sheri L; Kring, Ann M; MacDonald, Angus W; Pizzagalli, Diego A; Ragland, J Daniel; Silverstein, Steven M; Strauss, Milton E

    2017-07-01

    Motivational and hedonic impairments are core features of a variety of types of psychopathology. An important aspect of motivational function is reinforcement learning (RL), including implicit (i.e., outside of conscious awareness) and explicit (i.e., including explicit representations about potential reward associations) learning, as well as both positive reinforcement (learning about actions that lead to reward) and punishment (learning to avoid actions that lead to loss). Here we present data from paradigms designed to assess both positive and negative components of both implicit and explicit RL, examine performance on each of these tasks among individuals with schizophrenia, schizoaffective disorder, and bipolar disorder with psychosis, and examine their relative relationships to specific symptom domains transdiagnostically. None of the diagnostic groups differed significantly from controls on the implicit RL tasks in either bias toward a rewarded response or bias away from a punished response. However, on the explicit RL task, both the individuals with schizophrenia and schizoaffective disorder performed significantly worse than controls, but the individuals with bipolar did not. Worse performance on the explicit RL task, but not the implicit RL task, was related to worse motivation and pleasure symptoms across all diagnostic categories. Performance on explicit RL, but not implicit RL, was related to working memory, which accounted for some of the diagnostic group differences. However, working memory did not account for the relationship of explicit RL to motivation and pleasure symptoms. These findings suggest transdiagnostic relationships across the spectrum of psychotic disorders between motivation and pleasure impairments and explicit RL. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  4. Motor sequence learning-induced neural efficiency in functional brain connectivity.

    Science.gov (United States)

    Karim, Helmet T; Huppert, Theodore J; Erickson, Kirk I; Wollam, Mariegold E; Sparto, Patrick J; Sejdić, Ervin; VanSwearingen, Jessie M

    2017-02-15

    Previous studies have shown the functional neural circuitry differences before and after an explicitly learned motor sequence task, but have not assessed these changes during the process of motor skill learning. Functional magnetic resonance imaging activity was measured while participants (n=13) were asked to tap their fingers to visually presented sequences in blocks that were either the same sequence repeated (learning block) or random sequences (control block). Motor learning was associated with a decrease in brain activity during learning compared to control. Lower brain activation was noted in the posterior parietal association area and bilateral thalamus during the later periods of learning (not during the control). Compared to the control condition, we found the task-related motor learning was associated with decreased connectivity between the putamen and left inferior frontal gyrus and left middle cingulate brain regions. Motor learning was associated with changes in network activity, spatial extent, and connectivity. Copyright © 2016 Elsevier B.V. All rights reserved.

  5. Efficient Pseudorecursive Evaluation Schemes for Non-adaptive Sparse Grids

    KAUST Repository

    Buse, Gerrit; Pflüger, Dirk; Jacob, Riko

    2014-01-01

    In this work we propose novel algorithms for storing and evaluating sparse grid functions, operating on regular (not spatially adaptive), yet potentially dimensionally adaptive grid types. Besides regular sparse grids our approach includes truncated

  6. Reinforced dynamics for enhanced sampling in large atomic and molecular systems

    Science.gov (United States)

    Zhang, Linfeng; Wang, Han; E, Weinan

    2018-03-01

    A new approach for efficiently exploring the configuration space and computing the free energy of large atomic and molecular systems is proposed, motivated by an analogy with reinforcement learning. There are two major components in this new approach. Like metadynamics, it allows for an efficient exploration of the configuration space by adding an adaptively computed biasing potential to the original dynamics. Like deep reinforcement learning, this biasing potential is trained on the fly using deep neural networks, with data collected judiciously from the exploration and an uncertainty indicator from the neural network model playing the role of the reward function. Parameterization using neural networks makes it feasible to handle cases with a large set of collective variables. This has the potential advantage that selecting precisely the right set of collective variables has now become less critical for capturing the structural transformations of the system. The method is illustrated by studying the full-atom explicit solvent models of alanine dipeptide and tripeptide, as well as the system of a polyalanine-10 molecule with 20 collective variables.
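
    The role of the uncertainty indicator can be sketched with a toy ensemble: several models are fitted to the free-energy samples gathered so far, their mean serves as the biasing potential, and their disagreement flags regions where more exploration is still needed. Everything below, from the bootstrap ensemble to the polynomial stand-in for the deep networks, is an illustrative assumption rather than the authors' code.

        import numpy as np

        rng = np.random.default_rng(1)

        def fit_ensemble(s, f, n_models=4, deg=5):
            # Bootstrap ensemble over visited collective-variable samples s
            # with noisy free-energy estimates f; low-order polynomials stand
            # in for the deep networks used in the paper.
            models = []
            for _ in range(n_models):
                idx = rng.integers(0, len(s), len(s))
                models.append(np.polyfit(s[idx], f[idx], deg))
            return models

        def bias_and_uncertainty(models, s):
            preds = np.array([np.polyval(m, s) for m in models])
            return preds.mean(axis=0), preds.std(axis=0)

        s = rng.uniform(-1, 1, 200)                      # visited configurations
        f = s**4 - s**2 + 0.05 * rng.normal(size=s.size) # toy free-energy data
        grid = np.linspace(-1, 1, 50)
        bias, sigma = bias_and_uncertainty(fit_ensemble(s, f), grid)
        trusted = sigma < 0.05   # apply the bias only where the ensemble agrees;
                                 # elsewhere the walker keeps exploring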

  7. Carbon capture and sequestration (CCS) technological innovation system in China: Structure, function evaluation and policy implication

    International Nuclear Information System (INIS)

    Lai Xianjin; Ye Zhonghua; Xu Zhengzhong; Husar Holmes, Maja; Henry Lambright, W.

    2012-01-01

    Carbon capture and sequestration (CCS) can be an important technology option for China in addressing global climate change and developing clean energy technologies. Promoted by international climate conventions and supported by government research and development programs, an increasing number of CCS pilot and demonstration projects have been launched in China. In this study, we analyze the structure of China’s CCS effort from a technological innovation system (TIS) perspective. Within this system, key socio-political components, including institutions, actor-networks, and technology development, are examined to evaluate the state of the innovation system. The study assessed the perceived capacity of seven functional areas of the CCS innovation system through a survey of key CCS actors and stakeholders. The findings suggest that China’s CCS innovation system has a strong functional capacity for knowledge and technology development. It is significantly weaker in the innovative functions of knowledge diffusion, market formation, and facilitating entrepreneurs and new entrants into the CCS market. Based on this evaluation, the article articulates specific public policies to build a more robust innovation system, traverse the “valley of death” between research and development and commercial deployment, and accelerate energy innovation in China. - Highlights: ► We analyze and evaluate China’s CCS innovation system from a TIS perspective. ► A strong and systematic CCS innovation system structure has come into being in China. ► The system has achieved a high level of knowledge development and accumulation. ► Weak innovation functions are identified: market creation, guidance, etc. ► Public policies are needed to improve the innovation system performance.

  8. Fragment approach to constrained density functional theory calculations using Daubechies wavelets

    International Nuclear Information System (INIS)

    Ratcliff, Laura E.; Genovese, Luigi; Mohr, Stephan; Deutsch, Thierry

    2015-01-01

    In a recent paper, we presented a linear scaling Kohn-Sham density functional theory (DFT) code based on Daubechies wavelets, where a minimal set of localized support functions are optimized in situ and therefore adapted to the chemical properties of the molecular system. Thanks to the systematically controllable accuracy of the underlying basis set, this approach is able to provide an optimal contracted basis for a given system: accuracies for ground state energies and atomic forces are of the same quality as an uncontracted, cubic scaling approach. This basis set offers, by construction, a natural subset where the density matrix of the system can be projected. In this paper, we demonstrate the flexibility of this minimal basis formalism in providing a basis set that can be reused as-is, i.e., without reoptimization, for charge-constrained DFT calculations within a fragment approach. Support functions, represented in the underlying wavelet grid, of the template fragments are roto-translated with high numerical precision to the required positions and used as projectors for the charge weight function. We demonstrate the interest of this approach to express highly precise and efficient calculations for preparing diabatic states and for the computational setup of systems in complex environments

  9. Fragment approach to constrained density functional theory calculations using Daubechies wavelets

    Energy Technology Data Exchange (ETDEWEB)

    Ratcliff, Laura E., E-mail: lratcliff@anl.gov [Argonne Leadership Computing Facility, Argonne National Laboratory, Lemont, Illinois 60439 (United States); Université de Grenoble Alpes, CEA, INAC-SP2M, L-Sim, F-38000 Grenoble (France); Genovese, Luigi; Mohr, Stephan; Deutsch, Thierry [Université de Grenoble Alpes, CEA, INAC-SP2M, L-Sim, F-38000 Grenoble (France)

    2015-06-21

    In a recent paper, we presented a linear scaling Kohn-Sham density functional theory (DFT) code based on Daubechies wavelets, where a minimal set of localized support functions are optimized in situ and therefore adapted to the chemical properties of the molecular system. Thanks to the systematically controllable accuracy of the underlying basis set, this approach is able to provide an optimal contracted basis for a given system: accuracies for ground state energies and atomic forces are of the same quality as an uncontracted, cubic scaling approach. This basis set offers, by construction, a natural subset where the density matrix of the system can be projected. In this paper, we demonstrate the flexibility of this minimal basis formalism in providing a basis set that can be reused as-is, i.e., without reoptimization, for charge-constrained DFT calculations within a fragment approach. Support functions, represented in the underlying wavelet grid, of the template fragments are roto-translated with high numerical precision to the required positions and used as projectors for the charge weight function. We demonstrate the interest of this approach to express highly precise and efficient calculations for preparing diabatic states and for the computational setup of systems in complex environments.

  10. Adaptive management and the value of information: learning via intervention in epidemiology

    Science.gov (United States)

    Shea, Katriona; Tildesley, Michael J.; Runge, Michael C.; Fonnesbeck, Christopher J.; Ferrari, Matthew J.

    2014-01-01

    Optimal intervention for disease outbreaks is often impeded by severe scientific uncertainty. Adaptive management (AM), long-used in natural resource management, is a structured decision-making approach to solving dynamic problems that accounts for the value of resolving uncertainty via real-time evaluation of alternative models. We propose an AM approach to design and evaluate intervention strategies in epidemiology, using real-time surveillance to resolve model uncertainty as management proceeds, with foot-and-mouth disease (FMD) culling and measles vaccination as case studies. We use simulations of alternative intervention strategies under competing models to quantify the effect of model uncertainty on decision making, in terms of the value of information, and quantify the benefit of adaptive versus static intervention strategies. Culling decisions during the 2001 UK FMD outbreak were contentious due to uncertainty about the spatial scale of transmission. The expected benefit of resolving this uncertainty prior to a new outbreak on a UK-like landscape would be £45–£60 million relative to the strategy that minimizes livestock losses averaged over alternate transmission models. AM during the outbreak would be expected to recover up to £20.1 million of this expected benefit. AM would also recommend a more conservative initial approach (culling of infected premises and dangerous contact farms) than would a fixed strategy (which would additionally require culling of contiguous premises). For optimal targeting of measles vaccination, based on an outbreak in Malawi in 2010, AM allows better distribution of resources across the affected region; its utility depends on uncertainty about both the at-risk population and logistical capacity. When daily vaccination rates are highly constrained, the optimal initial strategy is to conduct a small, quick campaign; a reduction in expected burden of approximately 10,000 cases could result if campaign targets can be updated on
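
    The value-of-information bookkeeping behind such comparisons reduces to contrasting expected losses before and after model uncertainty is resolved. The toy numbers below are invented purely for illustration of the calculation, not taken from the study.

        import numpy as np

        # Hypothetical expected losses of two intervention strategies under two
        # competing transmission models, each held with prior weight 0.5.
        loss = np.array([[10.0, 60.0],    # strategy A under models 1 and 2
                         [30.0, 35.0]])   # strategy B under models 1 and 2
        prior = np.array([0.5, 0.5])

        act_now = (loss @ prior).min()                   # commit before learning
        act_informed = (loss.min(axis=0) * prior).sum()  # learn which model holds, then act
        evpi = act_now - act_informed                    # expected value of perfect information
        print(f"EVPI = {evpi:.1f}")                      # prints 10.0 for these toy numbers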

  11. Functioning strategy study on control systems of large physical installations used with a digital computer

    International Nuclear Information System (INIS)

    Bel'man, L.B.; Lavrikov, S.A.; Lenskij, O.D.

    1975-01-01

    A criterion is proposed to evaluate the functioning efficiency of computer-based control systems for large physical installations. The criteria are the object utilization factor and the computer load factor. Different functioning strategies of the control system are described and comparatively analyzed. Values are chosen for such important parameters as the sampling time and the parameter correction time. A single factor for evaluating system functioning efficiency is introduced, and its dependence on the sampling interval is given. Using the attached diagrams, it is easy to find the optimum value of the sampling interval and the corresponding maximum value of the proposed single efficiency factor.

  12. Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation.

    Science.gov (United States)

    Kato, Ayaka; Morita, Kenji

    2016-10-01

    It has been suggested that dopamine (DA) represents reward-prediction-error (RPE) defined in reinforcement learning and therefore DA responds to unpredicted but not predicted reward. However, recent studies have found DA response sustained towards predictable reward in tasks involving self-paced behavior, and suggested that this response represents a motivational signal. We have previously shown that RPE can sustain if there is decay/forgetting of learned-values, which can be implemented as decay of synaptic strengths storing learned-values. This account, however, did not explain the suggested link between tonic/sustained DA and motivation. In the present work, we explored the motivational effects of the value-decay in self-paced approach behavior, modeled as a series of 'Go' or 'No-Go' selections towards a goal. Through simulations, we found that the value-decay can enhance motivation, specifically, facilitate fast goal-reaching, albeit counterintuitively. Mathematical analyses revealed that underlying potential mechanisms are twofold: (1) decay-induced sustained RPE creates a gradient of 'Go' values towards a goal, and (2) value-contrasts between 'Go' and 'No-Go' are generated because while chosen values are continually updated, unchosen values simply decay. Our model provides potential explanations for the key experimental findings that suggest DA's roles in motivation: (i) slowdown of behavior by post-training blockade of DA signaling, (ii) observations that DA blockade severely impairs effortful actions to obtain rewards while largely sparing seeking of easily obtainable rewards, and (iii) relationships between the reward amount, the level of motivation reflected in the speed of behavior, and the average level of DA. These results indicate that reinforcement learning with value-decay, or forgetting, provides a parsimonious mechanistic account for the DA's roles in value-learning and motivation. Our results also suggest that when biological systems for value-learning
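
    A minimal reproduction of the value-decay idea, assuming a linear chain of 'Go'/'No-Go' choices and standard Q-learning with softmax action selection; the decay rate phi and all other constants are illustrative choices, not the paper's settings.

        import numpy as np

        rng = np.random.default_rng(0)
        N, alpha, gamma, beta, phi = 7, 0.5, 0.97, 5.0, 0.01  # phi: forgetting rate

        Q = np.zeros((N, 2))              # action 0 = No-Go (stay), 1 = Go (advance)
        for episode in range(500):
            s = 0
            while s < N - 1:
                p_go = 1.0 / (1.0 + np.exp(-beta * (Q[s, 1] - Q[s, 0])))
                a = int(rng.random() < p_go)
                s2 = s + a
                r = 1.0 if s2 == N - 1 else 0.0           # reward only at the goal
                target = r + (gamma * Q[s2].max() if s2 < N - 1 else 0.0)
                Q[s, a] += alpha * (target - Q[s, a])
                Q *= 1.0 - phi            # all stored values decay a little each step
                s = s2

    Because chosen 'Go' values keep being refreshed while unchosen 'No-Go' values only decay, a persistent value contrast and a non-vanishing prediction error at the predictable goal emerge, which is the mechanism the paper links to sustained dopamine and motivation.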

  13. Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation.

    Directory of Open Access Journals (Sweden)

    Ayaka Kato

    2016-10-01

    It has been suggested that dopamine (DA) represents reward-prediction-error (RPE) defined in reinforcement learning and therefore DA responds to unpredicted but not predicted reward. However, recent studies have found DA response sustained towards predictable reward in tasks involving self-paced behavior, and suggested that this response represents a motivational signal. We have previously shown that RPE can sustain if there is decay/forgetting of learned-values, which can be implemented as decay of synaptic strengths storing learned-values. This account, however, did not explain the suggested link between tonic/sustained DA and motivation. In the present work, we explored the motivational effects of the value-decay in self-paced approach behavior, modeled as a series of 'Go' or 'No-Go' selections towards a goal. Through simulations, we found that the value-decay can enhance motivation, specifically, facilitate fast goal-reaching, albeit counterintuitively. Mathematical analyses revealed that underlying potential mechanisms are twofold: (1) decay-induced sustained RPE creates a gradient of 'Go' values towards a goal, and (2) value-contrasts between 'Go' and 'No-Go' are generated because while chosen values are continually updated, unchosen values simply decay. Our model provides potential explanations for the key experimental findings that suggest DA's roles in motivation: (i) slowdown of behavior by post-training blockade of DA signaling, (ii) observations that DA blockade severely impairs effortful actions to obtain rewards while largely sparing seeking of easily obtainable rewards, and (iii) relationships between the reward amount, the level of motivation reflected in the speed of behavior, and the average level of DA. These results indicate that reinforcement learning with value-decay, or forgetting, provides a parsimonious mechanistic account for the DA's roles in value-learning and motivation. Our results also suggest that when biological systems

  14. Biomimetic approach to tacit learning based on compound control.

    Science.gov (United States)

    Shimoda, Shingo; Kimura, Hidenori

    2010-02-01

    The remarkable capability of living organisms to adapt to unknown environments is due to learning mechanisms that are totally different from the current artificial machine-learning paradigm. Computational media composed of identical elements with simple activity rules play a major role in biological control, such as the activities of neurons in brains and the molecular interactions in intracellular control. As a result of integrating the individual activities of the computational media, new behavioral patterns emerge to adapt to changing environments. We previously implemented this feature of biological control in a form of machine learning and succeeded in realizing bipedal walking without a robot model or trajectory planning. Despite this success, it remained a puzzle why the individual activities of the computational media could achieve the global behavior. In this paper, we answer this question by taking a statistical approach that connects the individual activities of computational media to global network behaviors. We show that the individual activities can generate optimized behaviors from a particular global viewpoint, i.e., autonomous rhythm generation and learning of balanced postures, without using global performance indices.

  15. The Two-Dimensional Gabor Function Adapted to Natural Image Statistics: A Model of Simple-Cell Receptive Fields and Sparse Structure in Images.

    Science.gov (United States)

    Loxley, P N

    2017-10-01

    The two-dimensional Gabor function is adapted to natural image statistics, leading to a tractable probabilistic generative model that can be used to model simple cell receptive field profiles, or generate basis functions for sparse coding applications. Learning is found to be most pronounced in three Gabor function parameters representing the size and spatial frequency of the two-dimensional Gabor function and characterized by a nonuniform probability distribution with heavy tails. All three parameters are found to be strongly correlated, resulting in a basis of multiscale Gabor functions with similar aspect ratios and size-dependent spatial frequencies. A key finding is that the distribution of receptive-field sizes is scale invariant over a wide range of values, so there is no characteristic receptive field size selected by natural image statistics. The Gabor function aspect ratio is found to be approximately conserved by the learning rules and is therefore not well determined by natural image statistics. This allows for three distinct solutions: a basis of Gabor functions with sharp orientation resolution at the expense of spatial-frequency resolution, a basis of Gabor functions with sharp spatial-frequency resolution at the expense of orientation resolution, or a basis with unit aspect ratio. Arbitrary mixtures of all three cases are also possible. Two parameters controlling the shape of the marginal distributions in a probabilistic generative model fully account for all three solutions. The best-performing probabilistic generative model for sparse coding applications is found to be a gaussian copula with Pareto marginal probability density functions.
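
    For concreteness, a two-dimensional Gabor function of the kind being adapted here is a sinusoidal carrier under a rotated Gaussian envelope. The sketch below samples one on a square grid; the parameter names are chosen for illustration, with the aspect ratio and the size-frequency coupling being the quantities the learning rules act on.

        import numpy as np

        def gabor_2d(size, wavelength, theta, aspect, sigma, phase=0.0):
            # Sinusoidal carrier modulated by a rotated Gaussian envelope.
            half = size // 2
            y, x = np.mgrid[-half:half + 1, -half:half + 1]
            xr = x * np.cos(theta) + y * np.sin(theta)     # rotated coordinates
            yr = -x * np.sin(theta) + y * np.cos(theta)
            envelope = np.exp(-(xr**2 + (aspect * yr)**2) / (2.0 * sigma**2))
            carrier = np.cos(2.0 * np.pi * xr / wavelength + phase)
            return envelope * carrier

        # A 17x17 receptive-field profile at 45 degrees with unit aspect ratio.
        patch = gabor_2d(size=17, wavelength=6.0, theta=np.pi / 4, aspect=1.0, sigma=4.0)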

  16. An evolutionary computation approach to examine functional brain plasticity

    Directory of Open Access Journals (Sweden)

    Arnab eRoy

    2016-04-01

    One common research goal in systems neuroscience is to understand how the functional relationship between a pair of regions of interest (ROIs) evolves over time. Examining neural connectivity in this way is well suited to the study of developmental processes, learning, and even recovery or treatment designs in response to injury. For most fMRI-based studies, the strength of the functional relationship between two ROIs is defined as the correlation between the average signals representing each region. The drawback to this approach is that much information is lost by averaging heterogeneous voxels, and therefore functional relationships between a ROI-pair that evolve at a spatial scale much finer than the ROIs remain undetected. To address this shortcoming, we introduce a novel evolutionary computation (EC) based voxel-level procedure to examine functional plasticity between an investigator-defined ROI-pair, simultaneously using subject-specific BOLD-fMRI data collected from two sessions separated by a finite duration of time. This data-driven procedure detects a sub-region composed of spatially connected voxels from each ROI (a so-called sub-regional-pair) such that the pair shows a significant gain/loss of functional relationship strength across the two time points. The procedure is recursive and iteratively finds all statistically significant sub-regional-pairs within the ROIs. Using this approach, we examine functional plasticity between the default mode network (DMN) and the executive control network (ECN) during recovery from traumatic brain injury (TBI); the study includes 14 TBI and 12 healthy control subjects. We demonstrate that the EC-based procedure is able to detect functional plasticity where a traditional averaging-based approach fails. The subject-specific plasticity estimates obtained using the EC procedure are highly consistent across multiple runs. Group-level analyses using these plasticity estimates showed an increase in

  17. Neural Control of a Tracking Task via Attention-Gated Reinforcement Learning for Brain-Machine Interfaces.

    Science.gov (United States)

    Wang, Yiwen; Wang, Fang; Xu, Kai; Zhang, Qiaosheng; Zhang, Shaomin; Zheng, Xiaoxiang

    2015-05-01

    Reinforcement learning (RL)-based brain-machine interfaces (BMIs) enable the user to learn from the environment through interactions and complete a task without desired signals, which is promising for clinical applications. Previous studies exploited Q-learning techniques to discriminate neural states into simple directional actions, with the trial's initial timing provided. However, the movements in BMI applications can be quite complicated, and the action timing explicitly shows the intention of when to move. The rich actions and the corresponding neural states form a large state-action space, imposing generalization difficulty on Q-learning. In this paper, we propose adopting attention-gated reinforcement learning (AGREL) as a new learning scheme for BMIs to adaptively decode high-dimensional neural activities into seven distinct movements (directional moves, holdings, and resting), owing to its efficient weight updating. We apply AGREL to neural data recorded from M1 of a monkey to directly predict a seven-action set in a time sequence and reconstruct the trajectory of a center-out task. Compared to Q-learning techniques, AGREL improved the target acquisition rate to 90.16% on average, with faster convergence and more stability in following neural activity over multiple days, indicating the potential to achieve better online decoding performance for more complicated BMI tasks.
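
    The attention-gating at the heart of AGREL can be sketched as a two-layer network in which only the selected action unit receives an error signal, and hidden weights are updated in proportion to the feedback from that winning unit. The network sizes, learning rate, and update details below are simplifying assumptions for illustration, not the authors' decoder.

        import numpy as np

        rng = np.random.default_rng(2)
        n_in, n_hid, n_act = 32, 16, 7    # neural features -> 7 movement classes
        W1 = rng.normal(0.0, 0.1, (n_hid, n_in))
        W2 = rng.normal(0.0, 0.1, (n_act, n_hid))

        def agrel_step(x, reward_fn, lr=0.05):
            h = np.tanh(W1 @ x)
            q = W2 @ h
            p = np.exp(q - q.max())
            p /= p.sum()                     # softmax action selection
            a = rng.choice(n_act, p=p)
            r = reward_fn(a)
            delta = r - q[a]                 # prediction error of the chosen action
            W2[a] += lr * delta * h          # only the winning output unit learns
            fb = W2[a]                       # attentional feedback from the winner
            W1[:] += lr * delta * np.outer(fb * (1.0 - h**2), x)
            return a, r

        # Toy usage: reward whenever the decoder picks a (hypothetical) target action.
        for _ in range(200):
            agrel_step(rng.normal(size=n_in), lambda a: float(a == 3))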

  18. Density functionals from deep learning

    OpenAIRE

    McMahon, Jeffrey M.

    2016-01-01

    Density-functional theory is a formally exact description of a many-body quantum system in terms of its density; in practice, however, approximations to the universal density functional are required. In this work, a model based on deep learning is developed to approximate this functional. Deep learning allows computational models that are capable of naturally discovering intricate structure in large and/or high-dimensional data sets, with multiple levels of abstraction. As no assumptions are ...

  19. Persistent Functional Languages: Toward Functional Relational Databases

    NARCIS (Netherlands)

    Wevers, L.

    2014-01-01

    Functional languages provide new approaches to concurrency control, based on techniques such as lazy evaluation and memoization. We have designed and implemented a persistent functional language based on these ideas, which we plan to use for the implementation of a relational database system. With

  20. Financial and testamentary capacity evaluations: procedures and assessment instruments underneath a functional approach.

    Science.gov (United States)

    Sousa, Liliana B; Simões, Mário R; Firmino, Horácio; Peisah, Carmelle

    2014-02-01

    Mental health professionals are frequently involved in mental capacity determinations. However, there is a lack of specific measures and well-defined procedures for these evaluations. The main purpose of this paper is to review procedures for evaluating financial and testamentary capacity, covering not only traditional neuropsychological and functional assessment but also the more recently developed forensic assessment instruments (FAIs), which were developed to provide a specialized answer to legal systems regarding civil competencies. The main guidelines, papers, and other references are reviewed in order to achieve a complete and comprehensive selection of instruments used in the assessment of financial and testamentary capacity. Although some specific measures for financial abilities have been developed recently, the same is not true for testamentary capacity. Several instruments and methodologies for assessing financial and testamentary capacity are presented, including neuropsychological assessment, functional assessment scales, performance-based functional assessment instruments, and specific FAIs. FAIs are the only instruments intended to provide a specific and direct answer to the assessment of financial capacity grounded in legal systems. Considering the need to move from a diagnostic to a functional approach in financial and testamentary capacity evaluations, it is essential to consider general functional examination as well as cognitive functioning.