WorldWideScience

Sample records for fuzzy reinforcement learning

  1. Structure identification in fuzzy inference using reinforcement learning

    Science.gov (United States)

    Berenji, Hamid R.; Khedkar, Pratap

    1993-01-01

    In our previous work on the GARIC architecture, we have shown that the system can start with the surface structure of the knowledge base (i.e., the linguistic expression of the rules) and learn the deep structure (i.e., the fuzzy membership functions of the labels used in the rules) by using reinforcement learning. Assuming the surface structure, GARIC refines the fuzzy membership functions used in the consequents of the rules using a gradient descent procedure. This hybrid fuzzy logic and reinforcement learning approach can learn to balance a cart-pole system and to back up a truck to its docking location after a few trials. In this paper, we discuss how to do structure identification using reinforcement learning in fuzzy inference systems. This involves identifying both the surface and the deep structure of the knowledge base. The term set of fuzzy linguistic labels used in describing the values of each control variable must be derived. In this process, splitting a label refers to creating new labels which are more granular than the original label, and merging two labels creates a more general label. Splitting and merging of labels directly transform the structure of the action selection network used in GARIC by increasing or decreasing the number of hidden-layer nodes.
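
    For readers unfamiliar with these operations, the sketch below (hypothetical Python, not GARIC's actual network transformation) illustrates splitting and merging on triangular labels encoded as (left, center, right) triples.

        # Illustrative only: a triangular fuzzy label as (left, center, right).

        def split_label(label):
            """Split one label into two narrower, more granular labels."""
            left, center, right = label
            return [(left, (left + center) / 2, center),
                    (center, (center + right) / 2, right)]

        def merge_labels(a, b):
            """Merge two adjacent labels into one more general label."""
            left, right = min(a[0], b[0]), max(a[2], b[2])
            return (left, (left + right) / 2, right)

        coarse = (-1.0, 0.0, 1.0)            # one coarse label, e.g. "Zero"
        finer = split_label(coarse)          # two more granular labels
        print(finer)                         # [(-1.0, -0.5, 0.0), (0.0, 0.5, 1.0)]
        print(merge_labels(*finer))          # (-1.0, 0.0, 1.0)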

  2. A Neuro-Control Design Based on Fuzzy Reinforcement Learning

    DEFF Research Database (Denmark)

    Katebi, S.D.; Blanke, M.

    This paper describes a neuro-control fuzzy critic design procedure based on reinforcement learning. An important component of the proposed intelligent control configuration is the fuzzy credit assignment unit, which acts as a critic and through fuzzy implications provides adjustment mechanisms… The fuzzy credit assignment unit comprises a fuzzy system with the appropriate fuzzification, knowledge base and defuzzification components. When an external reinforcement signal (a failure signal) is received, sequences of control actions are evaluated and modified by the action applier unit. The desirable… ones instruct the neuro-control unit to adjust its weights and are simultaneously stored in the memory unit during the training phase. In response to the internal reinforcement signal (set point threshold deviation), the stored information is retrieved by the action applier unit and utilized for re…

  3. Systems control with generalized probabilistic fuzzy-reinforcement learning

    NARCIS (Netherlands)

    Hinojosa, J.; Nefti, S.; Kaymak, U.

    2011-01-01

    Reinforcement learning (RL) is a valuable learning method when the systems require a selection of control actions whose consequences emerge over long periods for which input-output data are not available. In most combinations of fuzzy systems and RL, the environment is considered to be

  4. GA-based fuzzy reinforcement learning for control of a magnetic bearing system.

    Science.gov (United States)

    Lin, C T; Jou, C P

    2000-01-01

    This paper proposes a TD (temporal difference) and GA (genetic algorithm)-based reinforcement (TDGAR) learning method and applies it to the control of a real magnetic bearing system. The TDGAR learning scheme is a new hybrid GA, which integrates the TD prediction method and the GA to perform the reinforcement learning task. The TDGAR learning system is composed of two integrated feedforward networks. One neural network acts as a critic network to guide the learning of the other network (the action network) which determines the outputs (actions) of the TDGAR learning system. The action network can be a normal neural network or a neural fuzzy network. Using the TD prediction method, the critic network can predict the external reinforcement signal and provide a more informative internal reinforcement signal to the action network. The action network uses the GA to adapt itself according to the internal reinforcement signal. The key concept of the TDGAR learning scheme is to formulate the internal reinforcement signal as the fitness function for the GA such that the GA can evaluate the candidate solutions (chromosomes) regularly, even during periods without external feedback from the environment. This enables the GA to proceed to new generations regularly without waiting for the arrival of the external reinforcement signal. This can usually accelerate the GA learning since a reinforcement signal may only be available at a time long after a sequence of actions has occurred in the reinforcement learning problem. The proposed TDGAR learning system has been used to control an active magnetic bearing (AMB) system in practice. A systematic design procedure is developed to achieve successful integration of all the subsystems including magnetic suspension, mechanical structure, and controller training. The results show that the TDGAR learning scheme can successfully find a neural controller or a neural fuzzy controller for a self-designed magnetic bearing system.
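
    The key idea, internal reinforcement serving as a per-step GA fitness, can be sketched as follows. This is a minimal illustration under simplifying assumptions (a linear TD(0) critic over state features; all names are hypothetical), not the paper's implementation.

        import numpy as np

        rng = np.random.default_rng(0)

        w = np.zeros(4)              # linear critic weights over state features
        gamma, alpha = 0.95, 0.1

        def internal_reinforcement(s, r, s_next):
            """TD error: available every step, unlike the external reward."""
            return r + gamma * (w @ s_next) - w @ s

        def critic_update(s, r, s_next):
            """TD(0) prediction update for the critic."""
            global w
            delta = internal_reinforcement(s, r, s_next)
            w += alpha * delta * s
            return delta

        def fitness(trajectory):
            """Accumulated internal reinforcement scores a candidate's episode."""
            return sum(internal_reinforcement(s, r, sn) for s, r, sn in trajectory)

        trajectory = [(rng.normal(size=4), 0.0, rng.normal(size=4)) for _ in range(5)]
        for s, r, sn in trajectory:  # train the critic online
            critic_update(s, r, sn)
        print(fitness(trajectory))   # fitness available without external reward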

  5. Design issues of a reinforcement-based self-learning fuzzy controller for petrochemical process control

    Science.gov (United States)

    Yen, John; Wang, Haojin; Daugherity, Walter C.

    1992-01-01

    Fuzzy logic controllers have some often-cited advantages over conventional techniques such as PID control, including easier implementation, accommodation to natural language, and the ability to cover a wider range of operating conditions. One major obstacle that hinders the broader application of fuzzy logic controllers is the lack of a systematic way to develop and modify their rules; as a result, the creation and modification of fuzzy rules often depend on trial and error or pure experimentation. One of the approaches proposed to address this issue is a self-learning fuzzy logic controller (SFLC) that uses reinforcement learning techniques to learn the desirability of states and to adjust the consequent part of its fuzzy control rules accordingly. Due to the different dynamics of the controlled processes, the performance of a self-learning fuzzy controller is highly contingent on its design, yet the design issue has not received sufficient attention. The issues related to the design of an SFLC for application to a petrochemical process are discussed, and its performance is compared with that of a PID controller and a self-tuning fuzzy logic controller.

  6. Fuzzy OLAP association rules mining-based modular reinforcement learning approach for multiagent systems.

    Science.gov (United States)

    Kaya, Mehmet; Alhajj, Reda

    2005-04-01

    Multiagent systems and data mining have recently attracted considerable attention in the field of computing. Reinforcement learning is the most commonly used learning process for multiagent systems. However, it still has some drawbacks, including modeling other learning agents present in the domain as part of the state of the environment, and the fact that some states are experienced much less than others, or some state-action pairs are never visited during the learning phase. Further, before completing the learning process, an agent cannot exhibit a certain behavior even in states that have been experienced sufficiently. In this study, we propose a novel multiagent learning approach to handle these problems. Our approach is based on utilizing the mining process for modular cooperative learning systems. It incorporates fuzziness and online analytical processing (OLAP) based mining to effectively process the information reported by agents. First, we describe a fuzzy data cube OLAP architecture which facilitates effective storage and processing of the state information reported by agents. In this way, the action of another agent, even one outside the visual environment of the agent under consideration, can simply be predicted by extracting online association rules, a well-known data mining technique, from the constructed data cube. Second, we present a new action selection model, which is also based on association rules mining. Third, we generalize insufficiently experienced states by mining multilevel association rules from the proposed fuzzy data cube. Experimental results obtained on two different versions of a well-known pursuit domain show the robustness and effectiveness of the proposed fuzzy OLAP mining based modular learning approach. Finally, we tested the scalability of the approach presented in this paper and compared it with our previous work on modular-fuzzy Q-learning and with ordinary Q-learning.

  7. Episodic reinforcement learning control approach for biped walking

    Directory of Open Access Journals (Sweden)

    Katić Duško

    2012-01-01

    This paper presents a hybrid dynamic control approach to the realization of humanoid biped robotic walk, focusing on policy gradient episodic reinforcement learning with fuzzy evaluative feedback. The proposed controller structure involves two feedback loops: a conventional computed torque controller and an episodic reinforcement learning controller. The reinforcement learning part includes fuzzy information about Zero-Moment-Point errors. Simulation tests using a medium-size 36-DOF humanoid robot MEXONE were performed to demonstrate the effectiveness of our method.

  8. Self-learning fuzzy logic controllers based on reinforcement

    International Nuclear Information System (INIS)

    Wang, Z.; Shao, S.; Ding, J.

    1996-01-01

    This paper proposes a new method for learning and tuning fuzzy logic controllers. The self-learning scheme combines a bucket-brigade algorithm with a genetic algorithm. The proposed method is tested on the cart-pole system. Simulation results show that the approach gives good learning and control performance.

  9. Fuzzy self-learning control for magnetic servo system

    Science.gov (United States)

    Tarn, J. H.; Kuo, L. T.; Juang, K. Y.; Lin, C. E.

    1994-01-01

    It is known that an effective control system is the key condition for successful implementation of high-performance magnetic servo systems. The major issues in designing such control systems are nonlinearity; unmodeled dynamics, such as secondary effects of copper resistance, stray fields, and saturation; and disturbance rejection, since the load effect acts directly on the servo system without transmission elements. One typical approach to designing control systems under these conditions is a special type of nonlinear feedback called gain scheduling. It accommodates linear regulators whose parameters are changed as a function of operating conditions in a preprogrammed way. In this paper, an on-line learning fuzzy control strategy is proposed. To inherit the wealth of linear control design, the relations between linear feedback and fuzzy logic controllers have been established. The exercise of the engineering axioms of linear control design is thus transformed into the tuning of appropriate fuzzy parameters. Furthermore, fuzzy logic control brings the domain of candidate control laws from linear into nonlinear, and brings new prospects into the design of the local controllers. On the other hand, a self-learning scheme is utilized to automatically tune the fuzzy rule base. It is based on a network learning infrastructure; statistical approximation to assign credit; an animal-learning method to update the reinforcement map with a fast learning rate; and a temporal-difference predictive scheme to optimize the control laws. Different from supervised and statistical unsupervised learning schemes, the proposed method learns on-line from past experience and information from the process, and forms the rule base of an FLC system from randomly assigned initial control rules.

  10. Prediction of Elastic Constants of the Fuzzy Fibre Reinforced Polymer Using Computational Micromechanics

    Science.gov (United States)

    Pawlik, Marzena; Lu, Yiling

    2018-05-01

    Computational micromechanics is a useful tool to predict the properties of carbon fibre reinforced polymers. In this paper, a representative volume element (RVE) is used to investigate a fuzzy fibre reinforced polymer. The fuzzy fibre results from the introduction of nanofillers on the fibre surface. The composite being studied contains three phases, namely the T650 carbon fibre, the carbon nanotube (CNT) reinforced interphase and the epoxy resin EPIKOTE 862. CNTs are radially grown on the surface of the carbon fibre, and the resultant interphase, composed of nanotubes and matrix, is transversely isotropic. The transversely isotropic properties of the interphase are numerically implemented in the ANSYS FEM software using the element orientation command. The numerical predictions obtained are compared with the available analytical models. It is found that the CNT interphase significantly increases the transverse mechanical properties of the fuzzy fibre reinforced polymer, and the extent of this enhancement changes monotonically with the carbon fibre volume fraction. The RVE model also enables investigation of different CNT orientations in the fuzzy fibre model.

  11. A neural fuzzy controller learning by fuzzy error propagation

    Science.gov (United States)

    Nauck, Detlef; Kruse, Rudolf

    1992-01-01

    In this paper, we describe a procedure to integrate techniques for the adaptation of membership functions in a linguistic-variable-based fuzzy control environment by using neural network learning principles, extending our earlier work. We solve this problem by defining a fuzzy error that is propagated back through the architecture of the fuzzy controller. According to this fuzzy error and the strength of its antecedent, each fuzzy rule determines its share of the error. Depending on the current state of the controlled system and the control action derived from the conclusion, each rule tunes the membership functions of its antecedent and its conclusion. In this way we obtain an unsupervised learning technique that enables a fuzzy controller to adapt to a control task knowing only the global state and the fuzzy error.
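
    The mechanism can be sketched roughly as follows (invented names and update rule; the paper tunes both antecedent and conclusion membership functions).

        # Illustrative sketch: a rule's share of the fuzzy error, scaled by its
        # firing strength, shifts its triangular membership function.

        def tri(x, l, c, r):
            """Triangular membership function over (l, c, r)."""
            if x <= l or x >= r:
                return 0.0
            return (x - l) / (c - l) if x <= c else (r - x) / (r - c)

        def tune_rule(mf, x, fuzzy_error, eta=0.05):
            """Shift one rule's antecedent MF in proportion to its blame."""
            l, c, r = mf
            strength = tri(x, l, c, r)            # firing strength
            shift = eta * fuzzy_error * strength  # stronger rules move more
            return (l + shift, c + shift, r + shift)

        print(tune_rule((-1.0, 0.0, 1.0), x=0.4, fuzzy_error=0.3))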

  12. Ellipsoidal fuzzy learning for smart car platoons

    Science.gov (United States)

    Dickerson, Julie A.; Kosko, Bart

    1993-12-01

    A neural-fuzzy system combined supervised and unsupervised learning to find and tune the fuzzy rules. An additive fuzzy system approximates a function by covering its graph with fuzzy rules. A fuzzy rule patch can take the form of an ellipsoid in the input-output space. Unsupervised competitive learning found the statistics of data clusters. The covariance matrix of each synaptic quantization vector defined an ellipsoid centered at the centroid of the data cluster. Tightly clustered data gave smaller ellipsoids, or more certain rules. Sparse data gave larger ellipsoids, or less certain rules. Supervised learning tuned the ellipsoids to improve the approximation; the supervised neural system used gradient descent to find the ellipsoidal fuzzy patches, locally minimizing the mean-squared error of the fuzzy approximation. Hybrid ellipsoidal learning estimated the control surface for a smart car controller.
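
    The covariance-to-ellipsoid construction can be sketched in a few lines of Python (illustrative data and threshold; not the authors' code).

        import numpy as np

        rng = np.random.default_rng(1)

        # A rule patch as an ellipsoid {x : (x - m)^T C^{-1} (x - m) <= t},
        # centered at the cluster centroid m, shaped by the covariance C.
        cluster = rng.normal(loc=[2.0, -1.0], scale=[0.3, 1.2], size=(200, 2))

        m = cluster.mean(axis=0)               # centroid: patch center
        C = np.cov(cluster, rowvar=False)      # covariance: ellipsoid shape

        def in_patch(x, t=5.99):               # ~95% coverage for a 2-D Gaussian
            d = x - m
            return float(d @ np.linalg.solve(C, d)) <= t

        # Tight clusters give small ellipsoids (more certain rules);
        # sparse clusters give large ellipsoids (less certain rules).
        print(m, np.linalg.det(C), in_patch(np.array([2.0, -1.0])))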

  13. Fuzzy-logic based learning style prediction in e-learning using web ...

    Indian Academy of Sciences (India)

    …tion, especially in web environments, and proposes to use fuzzy rules to handle the uncertainty in … learning in a safe and supportive environment … working of the proposed fuzzy-logic based learning style prediction in e-learning. Section 4 …

  14. A Fuzzy Approach to Classify Learning Disability

    OpenAIRE

    Pooja Manghirmalani; Darshana More; Kavita Jain

    2012-01-01

    The endeavor of this work is to support the special education community in its quest to be with the mainstream. The initial segment of the paper gives an exhaustive study of the different mechanisms of diagnosing learning disability. After diagnosis of learning disability, the further classification of learning disability, that is, dyslexia, dysgraphia or dyscalculia, is fuzzy. Hence the paper proposes a model based on a Fuzzy Expert System which enables the classification of learning disability...

  15. Fuzzy Sarsa with Focussed Replacing Eligibility Traces for Robust and Accurate Control

    Science.gov (United States)

    Kamdem, Sylvain; Ohki, Hidehiro; Sueda, Naomichi

    Several methods of reinforcement learning in continuous state and action spaces that utilize fuzzy logic have been proposed in recent years. This paper introduces Fuzzy Sarsa(λ), an on-policy algorithm for fuzzy learning that relies on a novel way of computing replacing eligibility traces to accelerate the policy evaluation. It is tested against several temporal difference learning algorithms: Sarsa(λ), Fuzzy Q(λ), an earlier fuzzy version of Sarsa and an actor-critic algorithm. We perform detailed evaluations on two benchmark problems: a maze domain and the cart pole. Results of various tests highlight the strengths and weaknesses of these algorithms and show that Fuzzy Sarsa(λ) outperforms all other algorithms tested for a larger granularity of design and under noisy conditions. It is a highly competitive method of learning in realistic noisy domains where a denser fuzzy design over the state space is needed for a more precise control.
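
    For orientation, the classical tabular Sarsa(λ) update with replacing eligibility traces, on which the fuzzy variant builds, looks roughly like this (a sketch with made-up sizes, not the paper's focussed fuzzy version).

        import numpy as np

        n_states, n_actions = 16, 4
        Q = np.zeros((n_states, n_actions))
        E = np.zeros_like(Q)                  # eligibility traces
        alpha, gamma, lam = 0.1, 0.99, 0.9

        def sarsa_lambda_step(s, a, r, s_next, a_next):
            global E
            delta = r + gamma * Q[s_next, a_next] - Q[s, a]
            E *= gamma * lam                  # decay all traces
            E[s, :] = 0.0                     # replacing traces: clear state s...
            E[s, a] = 1.0                     # ...and reset the visited pair to 1
            Q[...] += alpha * delta * E

        sarsa_lambda_step(0, 1, 1.0, 2, 3)
        print(Q[0, 1])                        # 0.1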

  16. Algorithms for Reinforcement Learning

    CERN Document Server

    Szepesvari, Csaba

    2010-01-01

    Reinforcement learning is a learning paradigm concerned with learning to control a system so as to maximize a numerical performance measure that expresses a long-term objective. What distinguishes reinforcement learning from supervised learning is that only partial feedback is given to the learner about the learner's predictions. Further, the predictions may have long term effects through influencing the future state of the controlled system. Thus, time plays a special role. The goal in reinforcement learning is to develop efficient learning algorithms, as well as to understand the algorithms'

  17. DYNAMIC AND INCREMENTAL EXPLORATION STRATEGY IN FUSION ADAPTIVE RESONANCE THEORY FOR ONLINE REINFORCEMENT LEARNING

    Directory of Open Access Journals (Sweden)

    Budhitama Subagdja

    2016-06-01

    One of the fundamental challenges in reinforcement learning is to set up a proper balance between exploration and exploitation to obtain the maximum cumulative reward in the long run. Most protocols for exploration bound the overall values to a convergent level of performance. If new knowledge is inserted or the environment is suddenly changed, the issue becomes more intricate, as the exploration must compromise the pre-existing knowledge. This paper presents a type of multi-channel adaptive resonance theory (ART) neural network model called fusion ART which serves as a fuzzy approximator for reinforcement learning with inherent features that can regulate the exploration strategy. This intrinsic regulation is driven by the condition of the knowledge learnt so far by the agent. The model offers stable but incremental reinforcement learning that can involve prior rules as bootstrap knowledge for guiding the agent to select the right action. Experiments in obstacle avoidance and navigation tasks demonstrate that, in the configuration where the agent learns from scratch, the inherent exploration model in fusion ART is comparable to the basic ε-greedy policy. On the other hand, the model is demonstrated to deal with prior knowledge and strike a balance between exploration and exploitation.

  18. The Reinforcement Learning Competition 2014

    OpenAIRE

    Dimitrakakis, Christos; Li, Guangliang; Tziortziotis, Nikolaos

    2014-01-01

    Reinforcement learning is one of the most general problems in artificial intelligence. It has been used to model problems in automated experiment design, control, economics, game playing, scheduling and telecommunications. The aim of the reinforcement learning competition is to encourage the development of very general learning agents for arbitrary reinforcement learning problems and to provide a test-bed for the unbiased evaluation of algorithms.

  19. Design and implementation of an adaptive critic-based neuro-fuzzy controller on an unmanned bicycle

    OpenAIRE

    Shafiekhani, Ali; Mahjoob, Mohammad J.; Akraminia, Mehdi

    2017-01-01

    Fuzzy critic-based learning forms a reinforcement learning method based on dynamic programming. In this paper, an adaptive critic-based neuro-fuzzy system is presented for an unmanned bicycle. The only information available for the critic agent is the system feedback which is interpreted as the last action performed by the controller in the previous state. The signal produced by the critic agent is used along with the error back propagation to tune (online) conclusion parts of the fuzzy infer...

  20. Adaptive learning fuzzy control of a mobile robot

    International Nuclear Information System (INIS)

    Tsukada, Akira; Suzuki, Katsuo; Fujii, Yoshio; Shinohara, Yoshikuni

    1989-11-01

    In this report, the problem of constructing a fuzzy controller for a mobile robot that moves autonomously along a given reference direction curve is studied; the control rules are generated and acquired through an adaptive learning process. An adaptive learning fuzzy controller has been developed for a mobile robot, and its good properties are shown through travelling experiments with the robot. (author)

  1. Design of fuzzy learning control systems for steam generator water level control

    International Nuclear Information System (INIS)

    Park, Gee Yong

    1996-02-01

    A fuzzy learning algorithm is developed in order to construct useful control rules and tune the membership functions in the fuzzy logic controller used for water level control of a nuclear steam generator. Fuzzy logic controllers have been shown to perform better than conventional controllers for ill-defined or complex processes such as the nuclear steam generator. Whereas the fuzzy logic controller does not need a detailed mathematical model of the plant to be controlled, its structure must be built from the operator's linguistic information, gained from plant operations. This is not an easy task, and there is no systematic way to translate the operator's linguistic information into quantitative information. When the linguistic information of operators is incomplete, the parameters of the fuzzy controller must be tuned for better control performance. Tuning the structure of a fuzzy logic controller for optimal performance is a time- and effort-consuming procedure, and if the number of control inputs is large and the rule base is constructed in a multidimensional space, it is very difficult for a controller designer to tune the fuzzy controller structure. Hence, the difficulty of putting experiential knowledge into quantitative (or numerical) data and the difficulty of tuning the rules are the major problems in designing a fuzzy logic controller. In order to overcome these problems, a learning algorithm by gradient descent is included in the fuzzy control system, such that the membership functions are tuned and the necessary rules are created automatically for good control performance. For stable learning by gradient descent, the range of the learning coefficient that avoids trapping while not making learning too slow is investigated. Within this range, an optimal value of the learning coefficient is suggested, and with this value, the gradient…

  2. Value learning through reinforcement : The basics of dopamine and reinforcement learning

    NARCIS (Netherlands)

    Daw, N.D.; Tobler, P.N.; Glimcher, P.W.; Fehr, E.

    2013-01-01

    This chapter provides an overview of reinforcement learning and temporal difference learning and relates these topics to the firing properties of midbrain dopamine neurons. First, we review the Rescorla-Wagner learning rule and basic learning phenomena, such as blocking, which the rule explains. Then…
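
    For reference, the Rescorla-Wagner rule updates the associative strength V_i of each cue i present on a trial according to

        \Delta V_i = \alpha_i \beta \left( \lambda - \sum_j V_j \right)

    where \alpha_i and \beta are salience (learning-rate) parameters, \lambda is the outcome magnitude, and the sum runs over all cues present. Blocking follows directly: when a pretrained cue already predicts the outcome, the prediction error in parentheses is near zero and a newly added cue gains almost no strength.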

  3. Fuzzy comprehensive evaluation model of interuniversity collaborative learning based on network

    Science.gov (United States)

    Wenhui, Ma; Yu, Wang

    2017-06-01

    Learning evaluation is an effective method which plays an important role in the network education evaluation system. However, most current network learning evaluation methods still use the traditional university education evaluation system, which does not take into account the characteristics of web-based learning and is ill-suited to the rapid development of network-based interuniversity collaborative learning. The fuzzy comprehensive evaluation method is used to evaluate interuniversity collaborative learning, based on a combination of fuzzy theory and the analytic hierarchy process. The analytic hierarchy process is used to determine the weight of the evaluation factors of each layer and to carry out the consistency check. With the fuzzy comprehensive evaluation method, we establish a mathematical model for evaluating interuniversity collaborative learning. The proposed scheme provides a new approach to network-based interuniversity collaborative learning evaluation.
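
    The evaluation step described here reduces to a small matrix computation; the sketch below uses invented numbers (a real application would derive W from AHP pairwise comparisons and check their consistency).

        import numpy as np

        W = np.array([0.5, 0.3, 0.2])   # AHP weights for three factors

        R = np.array([                  # rows: factors; columns: grades
            [0.6, 0.3, 0.1],            # memberships over good/fair/poor
            [0.4, 0.4, 0.2],
            [0.2, 0.5, 0.3],
        ])

        B = W @ R                       # weighted-average fuzzy operator
        print(B, B.argmax())            # grade with highest membership wins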

  4. An Efficient Inductive Genetic Learning Algorithm for Fuzzy Relational Rules

    Directory of Open Access Journals (Sweden)

    Antonio

    2012-04-01

    Fuzzy modelling research has traditionally focused on certain types of fuzzy rules. However, the use of alternative rule models could improve the ability of fuzzy systems to represent a specific problem. In this proposal, an extended fuzzy rule model, which can include relations between variables in the antecedents of rules, is presented. Furthermore, a learning algorithm based on the iterative genetic approach which is able to represent the knowledge using this model is proposed as well. On the other hand, potential relations among initial variables imply an exponential growth in the feasible rule search space. Consequently, two filters for detecting relevant potential relations are added to the learning algorithm. These filters reduce the search-space complexity and increase the algorithm's efficiency. Finally, we present an experimental study to demonstrate the benefits of using fuzzy relational rules.

  5. Reinforcement Learning State-of-the-Art

    CERN Document Server

    Wiering, Marco

    2012-01-01

    Reinforcement learning encompasses both a science of adaptive behavior of rational beings in uncertain environments and a computational methodology for finding optimal behaviors for challenging problems in control, optimization and adaptive behavior of intelligent agents. As a field, reinforcement learning has progressed tremendously in the past decade. The main goal of this book is to present an up-to-date series of survey articles on the main contemporary sub-fields of reinforcement learning. This includes surveys on partially observable environments, hierarchical task decompositions, relational knowledge representation and predictive state representations. Furthermore, topics such as transfer, evolutionary methods and continuous spaces in reinforcement learning are surveyed. In addition, several chapters review reinforcement learning methods in robotics, in games, and in computational neuroscience. In total seventeen different subfields are presented by mostly young experts in those areas, and together the...

  6. Sensitivity-based self-learning fuzzy logic control for a servo system

    NARCIS (Netherlands)

    Balenovic, M.

    1998-01-01

    Describes an experimental verification of a self-learning fuzzy logic controller (SLFLC). The SLFLC contains a learning algorithm that utilizes a second-order reference model and a sensitivity model related to the fuzzy controller parameters. The effectiveness of the proposed controller has been

  7. A fuzzy controller with a robust learning function

    International Nuclear Information System (INIS)

    Tanji, Jun-ichi; Kinoshita, Mitsuo

    1987-01-01

    A self-organizing fuzzy controller is able to use linguistic decision rules of control strategy and has a strong adaptive property by virtue of its rule learning function. While the simple linguistic description of the learning algorithm first introduced by Procyk et al. has much flexibility for applications to a wide range of different processes, its detailed formulation, in particular with regard to control stability and learning-process convergence, is not clear. In this paper, we describe the formulation of an analytical basis for a self-organizing fuzzy controller by using the method of model reference adaptive control systems (MRACS), for which stability in the adaptive loop is theoretically proven. A detailed formulation is given for performance evaluation and rule modification in the rule learning process of the controller. Furthermore, an improved learning algorithm using an adaptive rule is proposed. An adaptive rule gives a modification coefficient for a rule change, estimating the effect of disturbance occurrence on the performance evaluation. The effect of introducing an adaptive rule to improve learning convergence is described using a simple iterative formulation. Simulation tests are presented for an application of the proposed self-organizing fuzzy controller to the pressure control system in a Boiling Water Reactor (BWR) plant. Results of the tests confirm that the improved learning algorithm has strong convergence properties, even in a very disturbed environment. (author)

  8. Fuzzy comprehensive evaluation model of interuniversity collaborative learning based on network

    Directory of Open Access Journals (Sweden)

    Wenhui Ma

    2017-06-01

    Learning evaluation is an effective method which plays an important role in the network education evaluation system. However, most current network learning evaluation methods still use the traditional university education evaluation system, which does not take into account the characteristics of web-based learning and is ill-suited to the rapid development of network-based interuniversity collaborative learning. The fuzzy comprehensive evaluation method is used to evaluate interuniversity collaborative learning, based on a combination of fuzzy theory and the analytic hierarchy process. The analytic hierarchy process is used to determine the weight of the evaluation factors of each layer and to carry out the consistency check. With the fuzzy comprehensive evaluation method, we establish a mathematical model for evaluating interuniversity collaborative learning. The proposed scheme provides a new approach to network-based interuniversity collaborative learning evaluation.

  9. Airline Passenger Profiling Based on Fuzzy Deep Machine Learning.

    Science.gov (United States)

    Zheng, Yu-Jun; Sheng, Wei-Guo; Sun, Xing-Ming; Chen, Sheng-Yong

    2017-12-01

    Passenger profiling plays a vital part in commercial aviation security, but classical methods become very inefficient in handling the rapidly increasing amounts of electronic records. This paper proposes a deep learning approach to passenger profiling. At the center of our approach is a Pythagorean fuzzy deep Boltzmann machine (PFDBM), whose parameters are expressed by Pythagorean fuzzy numbers such that each neuron can learn how a feature affects the production of the correct output from both the positive and negative sides. We propose a hybrid algorithm combining a gradient-based method and an evolutionary algorithm for training the PFDBM. Based on the novel learning model, we develop a deep neural network (DNN) for classifying normal passengers and potential attackers, and further develop an integrated DNN for identifying group attackers whose individual features are insufficient to reveal the abnormality. Experiments on data sets from Air China show that our approach provides much higher learning ability and classification accuracy than existing profilers. It is expected that the fuzzy deep learning approach can be adapted for a variety of complex pattern analysis tasks.

  10. Deep Reinforcement Learning: An Overview

    OpenAIRE

    Li, Yuxi

    2017-01-01

    We give an overview of recent exciting achievements of deep reinforcement learning (RL). We discuss six core elements, six important mechanisms, and twelve applications. We start with background of machine learning, deep learning and reinforcement learning. Next we discuss core RL elements, including value function, in particular, Deep Q-Network (DQN), policy, reward, model, planning, and exploration. After that, we discuss important mechanisms for RL, including attention and memory, unsuperv...

  11. Reinforcement learning in supply chains.

    Science.gov (United States)

    Valluri, Annapurna; North, Michael J; Macal, Charles M

    2009-10-01

    Effective management of supply chains creates value and can strategically position companies. In practice, human beings have been found to be both surprisingly successful and disappointingly inept at managing supply chains. The related fields of cognitive psychology and artificial intelligence have postulated a variety of potential mechanisms to explain this behavior. One of the leading candidates is reinforcement learning. This paper applies agent-based modeling to investigate the comparative behavioral consequences of three simple reinforcement learning algorithms in a multi-stage supply chain. For the first time, our findings show that the specific algorithm that is employed can have dramatic effects on the results obtained. Reinforcement learning is found to be valuable in multi-stage supply chains with several learning agents, as independent agents can learn to coordinate their behavior. However, learning in multi-stage supply chains using these postulated approaches from cognitive psychology and artificial intelligence takes extremely long time periods to achieve stability, which raises questions about their ability to explain behavior in real supply chains. The fact that it takes thousands of periods for agents to learn in this simple multi-agent setting provides new evidence that real-world decision makers are unlikely to be using strict reinforcement learning in practice.

  12. Rational and Mechanistic Perspectives on Reinforcement Learning

    Science.gov (United States)

    Chater, Nick

    2009-01-01

    This special issue describes important recent developments in applying reinforcement learning models to capture neural and cognitive function. But reinforcement learning, as a theoretical framework, can apply at two very different levels of description: "mechanistic" and "rational." Reinforcement learning is often viewed in mechanistic terms--as…

  13. Reinforcement learning in computer vision

    Science.gov (United States)

    Bernstein, A. V.; Burnaev, E. V.

    2018-04-01

    Nowadays, machine learning has become one of the basic technologies used in solving various computer vision tasks such as feature detection, image segmentation, object recognition and tracking. In many applications, various complex systems such as robots are equipped with visual sensors from which they learn the state of the surrounding environment by solving corresponding computer vision tasks. Solutions of these tasks are used for making decisions about possible future actions. It is not surprising that when solving computer vision tasks we should take into account the special aspects of their subsequent application in model-based predictive control. Reinforcement learning is one of the modern machine learning technologies in which learning is carried out through interaction with the environment. In recent years, reinforcement learning has been used both for solving applied tasks such as processing and analysis of visual information, and for solving specific computer vision problems such as filtering, extracting image features, localizing objects in scenes, and many others. The paper briefly describes reinforcement learning technology and its use for solving computer vision problems.

  14. Learning to trade via direct reinforcement.

    Science.gov (United States)

    Moody, J; Saffell, M

    2001-01-01

    We present methods for optimizing portfolios, asset allocations, and trading systems based on direct reinforcement (DR). In this approach, investment decision-making is viewed as a stochastic control problem, and strategies are discovered directly. We present an adaptive algorithm called recurrent reinforcement learning (RRL) for discovering investment policies. The need to build forecasting models is eliminated, and better trading performance is obtained. The direct reinforcement approach differs from dynamic programming and reinforcement algorithms such as TD-learning and Q-learning, which attempt to estimate a value function for the control problem. We find that the RRL direct reinforcement framework enables a simpler problem representation, avoids Bellman's curse of dimensionality and offers compelling advantages in efficiency. We demonstrate how direct reinforcement can be used to optimize risk-adjusted investment returns (including the differential Sharpe ratio), while accounting for the effects of transaction costs. In extensive simulation work using real financial data, we find that our approach based on RRL produces better trading strategies than systems utilizing Q-learning (a value function method). Real-world applications include an intra-daily currency trader and a monthly asset allocation system for the S&P 500 Stock Index and T-Bills.
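
    A sketch of the differential Sharpe ratio that serves as the online performance measure in this framework (simplified, with our own variable names): exponential moving estimates A and B of the first and second moments of the returns are updated at rate eta, and each step returns the marginal effect of the latest return on the Sharpe ratio.

        class DifferentialSharpe:
            def __init__(self, eta=0.01):
                self.eta, self.A, self.B = eta, 0.0, 0.0

            def step(self, R):
                dA, dB = R - self.A, R * R - self.B
                denom = (self.B - self.A ** 2) ** 1.5
                D = (self.B * dA - 0.5 * self.A * dB) / denom if denom > 0 else 0.0
                self.A += self.eta * dA           # update moving moments
                self.B += self.eta * dB
                return D                          # online reward signal

        dsr = DifferentialSharpe()
        print([round(dsr.step(R), 4) for R in [0.01, -0.005, 0.02, 0.003]])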

  15. Self-learning fuzzy controllers based on temporal back propagation

    Science.gov (United States)

    Jang, Jyh-Shing R.

    1992-01-01

    This paper presents a generalized control strategy that enhances fuzzy controllers with self-learning capability for achieving prescribed control objectives in a near-optimal manner. The methodology, termed temporal back propagation, is model-insensitive in the sense that it can deal with plants that can be represented in a piecewise-differentiable format, such as difference equations, neural networks, GMDH structures, and fuzzy models. Regardless of the number of inputs and outputs of the plants under consideration, the proposed approach can either refine the fuzzy if-then rules obtained from human experts or automatically derive the fuzzy if-then rules if human experts are not available. The inverted pendulum system is employed as a test-bed to demonstrate the effectiveness of the proposed control scheme and the robustness of the acquired fuzzy controller.

  16. Genetic Learning of Fuzzy Parameters in Predictive and Decision Support Modelling

    Directory of Open Access Journals (Sweden)

    Nebot

    2012-04-01

    In this research, a genetic fuzzy system (GFS) is proposed that performs discretization parameter learning in the context of the Fuzzy Inductive Reasoning (FIR) methodology and the Linguistic Rule FIR (LR-FIR) algorithm. The main goal of the GFS is to take advantage of the potential of GAs to learn the fuzzification parameters of the FIR and LR-FIR approaches, in order to obtain reliable and useful predictive (FIR) models and decision support (LR-FIR) models. The GFS is evaluated in an e-learning context.

  17. A BCM theory of meta-plasticity for online self-reorganizing fuzzy-associative learning.

    Science.gov (United States)

    Tan, Javan; Quek, Chai

    2010-06-01

    Self-organizing neurofuzzy approaches have matured in their online learning of fuzzy-associative structures under time-invariant conditions. To maximize their operative value for online reasoning, these self-sustaining mechanisms must also be able to reorganize fuzzy-associative knowledge in real-time dynamic environments. Hence, it is critical to recognize that they require self-reorganizational skills to rebuild fluid associative structures when their existing organizations fail to respond well to changing circumstances. In this light, while Hebbian theory (Hebb, 1949) is the basic computational framework for associative learning, it is less attractive for time-variant online learning because it suffers from stability limitations that impede unlearning. Instead, this paper adopts the Bienenstock-Cooper-Munro (BCM) theory of neurological learning via meta-plasticity principles (Bienenstock et al., 1982), which provides for both online associative and dissociative learning. For almost three decades, BCM theory has been shown to effectively brace physiological evidence of synaptic potentiation (association) and depression (dissociation) into a sound mathematical framework for computational learning. This paper proposes an interpretation of the BCM theory of meta-plasticity for an online self-reorganizing fuzzy-associative learning system to realize online-reasoning capabilities. Experimental findings are twofold: 1) the analysis using the S&P-500 stock index illustrated that the self-reorganizing approach could follow the trajectory shifts in the time-variant S&P-500 index over about 60 years, and 2) the benchmark profiles showed that the fuzzy-associative approach yielded comparable results with other fuzzy-precision models with similar online objectives.

  18. Human-level control through deep reinforcement learning

    Science.gov (United States)

    Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A.; Veness, Joel; Bellemare, Marc G.; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K.; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis

    2015-02-01

    The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.

  19. Human-level control through deep reinforcement learning.

    Science.gov (United States)

    Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A; Veness, Joel; Bellemare, Marc G; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis

    2015-02-26

    The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
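
    The core of the deep Q-network update is a TD target computed from a periodically frozen copy of the network. The sketch below substitutes tiny linear models for the paper's convolutional networks (all sizes and names are illustrative).

        import numpy as np

        rng = np.random.default_rng(0)

        n_features, n_actions, gamma = 8, 4, 0.99
        W_online = rng.normal(scale=0.1, size=(n_actions, n_features))
        W_target = W_online.copy()       # frozen snapshot, refreshed periodically

        def td_targets(r, s_next, done):
            """y = r + gamma * max_a' Q_target(s', a') for non-terminal steps."""
            q_next = s_next @ W_target.T              # (batch, n_actions)
            return r + gamma * (1.0 - done) * q_next.max(axis=1)

        batch = 32                       # a replay-memory minibatch
        s_next = rng.normal(size=(batch, n_features))
        r = rng.normal(size=batch)
        done = rng.integers(0, 2, size=batch).astype(float)
        print(td_targets(r, s_next, done)[:4])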

  20. Evaluation-Function-based Model-free Adaptive Fuzzy Control

    Directory of Open Access Journals (Sweden)

    Agus Naba

    2016-12-01

    Designs of adaptive fuzzy controllers (AFCs) are commonly based on the Lyapunov approach, which requires a known model of the controlled plant, and they need to consider a Lyapunov function candidate as an evaluation function to be minimized. In this study these drawbacks were handled by designing a model-free adaptive fuzzy controller (MFAFC) using an approximate evaluation function defined in terms of the current state, the next state, and the control action. The MFAFC treats the approximate evaluation function as an evaluative control performance measure, similar to the state-action value function in reinforcement learning. The simulation results of applying the MFAFC to the inverted pendulum benchmark verified the proposed scheme's efficacy.

  1. Framework for robot skill learning using reinforcement learning

    Science.gov (United States)

    Wei, Yingzi; Zhao, Mingyang

    2003-09-01

    Robot skill acquisition is a process similar to human skill learning. Reinforcement learning (RL) is an on-line actor-critic method by which a robot can develop its skill. The reinforcement function is the critical component, since it evaluates actions and guides the learning process. We present an augmented reward function that provides a new way for an RL controller to incorporate prior knowledge and experience. The difference form of the augmented reward function is also considered carefully. The additional reward, beyond the conventional reward, provides more heuristic information for RL. In this paper, we present a strategy for the task of complex skill learning: an automatic robot shaping policy decomposes the complex skill into a hierarchical learning process. A new form of value function is introduced to attain smooth motion switching swiftly. We present a formal, but practical, framework for robot skill learning and illustrate with an example the utility of the method for learning skilled robot control on line.

  2. Application of a fuzzy control algorithm with improved learning speed to nuclear steam generator level control

    International Nuclear Information System (INIS)

    Park, Gee Yong; Seong, Poong Hyun

    1994-01-01

    In order to reduce the burden of trial-and-error tuning needed to obtain the best control performance from a conventional fuzzy control algorithm, a fuzzy control algorithm with a learning function is investigated in this work. This fuzzy control algorithm can build its rule base and tune its membership functions automatically by means of a learning function that uses data from the control actions of the plant operator or other controllers. The learning process finds the optimal values of the parameters, which comprise the membership functions and the rule base, by the gradient descent method. The learning speed of gradient descent is significantly improved in this work by the addition of a modified momentum term. The control algorithm is applied to steam generator level control in computer simulations. The simulation results confirm the good level-control performance of the algorithm and show that the fuzzy learning algorithm generalizes the relation between inputs and outputs and has excellent disturbance-rejection capability.
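
    The momentum modification credited with the speed-up can be sketched generically (hypothetical names; the algorithm applies it to membership-function parameters rather than this toy quadratic).

        import numpy as np

        def train(grad_fn, theta, eta=0.05, mu=0.9, steps=200):
            v = np.zeros_like(theta)
            for _ in range(steps):
                v = mu * v - eta * grad_fn(theta)   # momentum smooths updates
                theta = theta + v
            return theta

        grad = lambda th: 2.0 * th                  # gradient of |theta|^2
        print(train(grad, np.array([3.0, -2.0])))   # converges near the minimum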

  3. A new learning algorithm for a fully connected neuro-fuzzy inference system.

    Science.gov (United States)

    Chen, C L Philip; Wang, Jing; Wang, Chi-Hsu; Chen, Long

    2014-10-01

    A traditional neuro-fuzzy system is transformed into an equivalent fully connected three-layer neural network (NN), namely, the fully connected neuro-fuzzy inference system (F-CONFIS). The F-CONFIS differs from traditional NNs in its dependent and repeated weights between the input and hidden layers and can be considered a variation of a multilayer NN. Therefore, an efficient learning algorithm for the F-CONFIS to cope with these repeated weights is derived. Furthermore, a dynamic learning rate is proposed for neuro-fuzzy systems via F-CONFIS, where both the premise (hidden) and consequent portions are considered. Several simulation results indicate that the proposed approach achieves much better accuracy and fast convergence.

  4. SCAFFOLDING AND REINFORCEMENT: USING DIGITAL LOGBOOKS IN LEARNING VOCABULARY

    OpenAIRE

    Khalifa, Salma Hasan Almabrouk; Shabdin, Ahmad Affendi

    2016-01-01

    Reinforcement and scaffolding are tested approaches to enhance learning achievements. Keeping a record of the learning process as well as the new learned words functions as scaffolding to help learners build a comprehensive vocabulary. Similarly, repetitive learning of new words reinforces permanent learning for long-term memory. Paper-based logbooks may prove to be good records of the learning process, but if learners use digital logbooks, the results may be even better. Digital logbooks wit...

  5. Reinforcement learning in complementarity game and population dynamics.

    Science.gov (United States)

    Jost, Jürgen; Li, Wei

    2014-02-01

    We systematically test and compare different reinforcement learning schemes in a complementarity game [J. Jost and W. Li, Physica A 345, 245 (2005)] played between members of two populations. More precisely, we study the Roth-Erev, Bush-Mosteller, and SoftMax reinforcement learning schemes. A modified version of Roth-Erev with a power exponent of 1.5, as opposed to 1 in the standard version, performs best. We also compare these reinforcement learning strategies with evolutionary schemes. This gives insight into aspects like the issue of quick adaptation as opposed to systematic exploration or the role of learning rates.
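
    A minimal sketch of Roth-Erev reinforcement with a power exponent kappa (illustrative payoffs; kappa = 1 recovers the standard scheme, while the modified version studied here uses kappa = 1.5).

        import numpy as np

        rng = np.random.default_rng(0)

        n_actions, kappa = 3, 1.5
        q = np.ones(n_actions)                   # initial propensities

        def choose():
            p = q ** kappa                       # power-law choice rule
            return rng.choice(n_actions, p=p / p.sum())

        def update(action, payoff):
            q[action] += payoff                  # reinforce the chosen action

        for _ in range(100):                     # dummy game: action 2 pays most
            a = choose()
            update(a, 1.0 if a == 2 else 0.2)
        print(q / q.sum())                       # probability mass shifts to 2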

  6. Belief reward shaping in reinforcement learning

    CSIR Research Space (South Africa)

    Marom, O

    2018-02-01

    A key challenge in many reinforcement learning problems is delayed rewards, which can significantly slow down learning. Although reward shaping has previously been introduced to accelerate learning by bootstrapping an agent with additional...
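
    For context, the classical potential-based shaping rule that belief reward shaping generalizes is easy to state; the potential function below is a made-up heuristic.

        # Potential-based reward shaping (Ng et al., 1999): learning from
        # r' = r + gamma * phi(s') - phi(s) preserves the optimal policy
        # while densifying delayed rewards.
        gamma = 0.99

        def phi(state):
            """Hypothetical potential: negative distance to a goal at x = 10."""
            return -abs(10 - state)

        def shaped_reward(r, s, s_next):
            return r + gamma * phi(s_next) - phi(s)

        print(shaped_reward(0.0, s=3, s_next=4))  # progress toward goal: > 0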

  7. Evaluation of E-Learning Web Sites Using Fuzzy Axiomatic Design Based Approach

    Directory of Open Access Journals (Sweden)

    2010-04-01

    A high-quality web site has been generally recognized as a critical enabler for conducting online business. Numerous studies exist in the literature that measure business performance in relation to web site quality. In this paper, an axiomatic design based approach for fuzzy group decision making is adopted to evaluate the quality of e-learning web sites. Another multi-criteria decision making technique, namely fuzzy TOPSIS, is applied in order to validate the outcome. The methodology proposed in this paper has the advantage of incorporating requirements and enabling reductions in the problem size, as compared to fuzzy TOPSIS. A case study focusing on Turkish e-learning websites is presented, and based on the empirical findings, managerial implications and recommendations for future research are offered.

  8. Seizure Classification From EEG Signals Using Transfer Learning, Semi-Supervised Learning and TSK Fuzzy System.

    Science.gov (United States)

    Jiang, Yizhang; Wu, Dongrui; Deng, Zhaohong; Qian, Pengjiang; Wang, Jun; Wang, Guanjin; Chung, Fu-Lai; Choi, Kup-Sze; Wang, Shitong

    2017-12-01

    Recognition of epileptic seizures from offline EEG signals is very important in the clinical diagnosis of epilepsy. Compared with manual labeling of EEG signals by doctors, machine learning approaches can be faster and more consistent. However, the classification accuracy is usually not satisfactory, for two main reasons: the distributions of the data used for training and testing may be different, and the amount of training data may not be enough. In addition, most machine learning approaches generate black-box models that are difficult to interpret. In this paper, we integrate transductive transfer learning, semi-supervised learning and a TSK fuzzy system to tackle these three problems. More specifically, we use transfer learning to reduce the discrepancy in data distribution between the training and testing data, employ semi-supervised learning to use the unlabeled testing data to remedy the shortage of training data, and adopt a TSK fuzzy system to increase model interpretability. Two learning algorithms are proposed to train the system. Our experimental results show that the proposed approaches can achieve better performance than many state-of-the-art seizure classification algorithms.

  9. Adaptive representations for reinforcement learning

    NARCIS (Netherlands)

    Whiteson, S.

    2010-01-01

    This book presents new algorithms for reinforcement learning, a form of machine learning in which an autonomous agent seeks a control policy for a sequential decision task. Since current methods typically rely on manually designed solution representations, agents that automatically adapt their own

  10. Fuzzy gain scheduling of velocity PI controller with intelligent learning algorithm for reactor control

    International Nuclear Information System (INIS)

    Dong Yun Kim; Poong Hyun Seong

    1997-01-01

    In this research, we propose a fuzzy gain scheduler (FGS) with an intelligent learning algorithm for reactor control. In the proposed algorithm, the gradient descent method is used to generate the rule bases of the fuzzy algorithm by learning. These rule bases are obtained by minimizing an objective function called a performance cost function. The objective of the FGS with an intelligent learning algorithm is to generate gains that minimize the system error. The proposed algorithm can reduce the time and effort required to obtain the fuzzy rules through its intelligent learning function. It is applied to reactor control of a nuclear power plant (NPP), and the results are compared with those of a conventional PI controller with fixed gains. The comparison shows that the proposed algorithm is superior to the conventional PI controller. (author)

  11. Punishment Insensitivity and Impaired Reinforcement Learning in Preschoolers

    Science.gov (United States)

    Briggs-Gowan, Margaret J.; Nichols, Sara R.; Voss, Joel; Zobel, Elvira; Carter, Alice S.; McCarthy, Kimberly J.; Pine, Daniel S.; Blair, James; Wakschlag, Lauren S.

    2014-01-01

    Background: Youth and adults with psychopathic traits display disrupted reinforcement learning. Advances in measurement now enable examination of this association in preschoolers. The current study examines relations between reinforcement learning in preschoolers and parent ratings of reduced responsiveness to socialization, conceptualized as a…

  12. Reinforcement learning in continuous state and action spaces

    NARCIS (Netherlands)

    H. P. van Hasselt (Hado); M.A. Wiering; M. van Otterlo

    2012-01-01

    Many traditional reinforcement-learning algorithms have been designed for problems with small finite state and action spaces. Learning in such discrete problems can be difficult, due to noise and delayed reinforcements. However, many real-world problems have continuous state or action

  13. Neural Basis of Reinforcement Learning and Decision Making

    Science.gov (United States)

    Lee, Daeyeol; Seo, Hyojung; Jung, Min Whan

    2012-01-01

    Reinforcement learning is an adaptive process in which an animal utilizes its previous experience to improve the outcomes of future choices. Computational theories of reinforcement learning play a central role in the newly emerging areas of neuroeconomics and decision neuroscience. In this framework, actions are chosen according to their value functions, which describe how much future reward is expected from each action. Value functions can be adjusted not only through reward and penalty, but also by the animal’s knowledge of its current environment. Studies have revealed that a large proportion of the brain is involved in representing and updating value functions and using them to choose an action. However, how the nature of a behavioral task affects the neural mechanisms of reinforcement learning remains incompletely understood. Future studies should uncover the principles by which different computational elements of reinforcement learning are dynamically coordinated across the entire brain. PMID:22462543

  14. Analysis of Learning Development With Sugeno Fuzzy Logic And Clustering

    Directory of Open Access Journals (Sweden)

    Maulana Erwin Saputra

    2017-06-01

    This paper analyzes the factors that affect student achievement, which naturally vary from school to school; student success is one of the central goals of any educational organization, and students' emotions and behaviors influence their learning performance. Fuzzy logic and clustering are applied here to the analysis of learning development, with the process performed on student data based on observed indicators. Fuzzy logic handles uncertainty and supports linguistic reasoning, so its design does not require complicated mathematical equations. The clustering method used is K-means, in which the data are partitioned into k groups (k = 1, 2, 3, ...) in order to determine the optimal number of performance groups. In the experiments, questionnaire responses entered into Matlab produce values from which meaningful graphs are generated, making it easier for the school to assess student performance in the learning process against defined criteria and to support the decision making the school requires.
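
    The K-means grouping step the abstract refers to is standard; a generic numpy sketch on synthetic two-dimensional "performance" data follows (the paper's questionnaire data are not reproduced here):

```python
import numpy as np

def kmeans(data, k, iters=100, seed=0):
    """Plain K-means: assign points to the nearest centroid, then recompute."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), k, replace=False)]
    for _ in range(iters):
        # Distance of each point to each centroid.
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = np.array([data[labels == j].mean(axis=0)
                                  if np.any(labels == j) else centroids[j]
                                  for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Synthetic "performance" scores (two questionnaire dimensions per student).
scores = np.vstack([np.random.default_rng(1).normal(m, 0.3, (30, 2))
                    for m in (1.0, 2.5, 4.0)])
labels, centroids = kmeans(scores, k=3)
print(centroids)
```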

  15. Autonomous reinforcement learning with experience replay.

    Science.gov (United States)

    Wawrzyński, Paweł; Tanwani, Ajay Kumar

    2013-05-01

    This paper considers the issues of efficiency and autonomy that are required to make reinforcement learning suitable for real-life control tasks. A real-time reinforcement learning algorithm is presented that repeatedly adjusts the control policy with the use of previously collected samples and autonomously estimates the appropriate step-sizes for the learning updates. The algorithm is based on the actor-critic with experience replay, whose step-sizes are determined on-line by an enhanced fixed-point algorithm for on-line neural network training. An experimental study with a simulated octopus arm and half-cheetah demonstrates the feasibility of the proposed algorithm to solve difficult learning control problems in an autonomous way within a reasonably short time. Copyright © 2012 Elsevier Ltd. All rights reserved.
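
    The experience-replay component is straightforward to sketch: a fixed-capacity buffer of transitions from which uniform mini-batches are drawn for repeated policy updates. The paper's on-line step-size estimation is a separate mechanism and is not shown here:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state) tuples."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling breaks temporal correlation between samples,
        # which stabilizes actor-critic updates.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

buf = ReplayBuffer()
for t in range(100):                      # fake interaction loop
    buf.add(state=t, action=t % 4, reward=1.0, next_state=t + 1)
batch = buf.sample(16)
print(len(batch), batch[0])
```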

  16. Reinforcement Learning in Repeated Portfolio Decisions

    OpenAIRE

    Diao, Linan; Rieskamp, Jörg

    2011-01-01

    How do people make investment decisions when they receive outcome feedback? We examined how well the standard mean-variance model and two reinforcement models predict people's portfolio decisions. The basic reinforcement model predicts a learning process that relies solely on the portfolio's overall return, whereas the proposed extended reinforcement model also takes the risk and covariance of the investments into account. The experimental results illustrate that people reacted sensitively to...

  17. Reinforcement learning improves behaviour from evaluative feedback

    Science.gov (United States)

    Littman, Michael L.

    2015-05-01

    Reinforcement learning is a branch of machine learning concerned with using experience gained through interacting with the world and evaluative feedback to improve a system's ability to make behavioural decisions. It has been called the artificial intelligence problem in a microcosm because learning algorithms must act autonomously to perform well and achieve their goals. Partly driven by the increasing availability of rich data, recent years have seen exciting advances in the theory and practice of reinforcement learning, including developments in fundamental technical areas such as generalization, planning, exploration and empirical methodology, leading to increasing applicability to real-life problems.

  18. Solution to reinforcement learning problems with artificial potential field

    Institute of Scientific and Technical Information of China (English)

    XIE Li-juan; XIE Guang-rong; CHEN Huan-wen; LI Xiao-li

    2008-01-01

    A novel method was designed to solve reinforcement learning problems with an artificial potential field. First, a reinforcement learning problem was transformed into a path planning problem by using an artificial potential field (APF), which is a very appropriate way to model a reinforcement learning problem. Second, a new APF algorithm with a virtual water-flow concept was proposed to overcome the local-minimum problem of potential field methods. The performance of the new method was tested on a gridworld problem named key and door maze. The experimental results show that within 45 trials, good and deterministic policies are found in almost all simulations. In comparison with WIERING's HQ-learning system, which needs 20 000 trials for a stable solution, the proposed method obtains an optimal and stable policy far more quickly. Therefore, the new method is a simple and effective way to give an optimal solution to the reinforcement learning problem.
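
    The core APF construction sums an attractive potential centered at the goal with short-range repulsive potentials around obstacles, and the agent descends the combined field. A minimal sketch with assumed potential forms follows; note that plain greedy descent can stall in local minima, which is exactly the failure mode the paper's virtual water-flow concept addresses:

```python
import numpy as np

def potential(pos, goal, obstacles, k_att=1.0, k_rep=5.0, radius=2.0):
    """Attractive quadratic well at the goal plus short-range repulsion."""
    u = 0.5 * k_att * np.sum((pos - goal) ** 2)
    for obs in obstacles:
        d = np.linalg.norm(pos - obs)
        if d < radius:
            u += 0.5 * k_rep * (1.0 / max(d, 1e-6) - 1.0 / radius) ** 2
    return u

goal = np.array([9.0, 9.0])
obstacles = [np.array([5.0, 5.0])]
pos = np.array([0.0, 0.0])
moves = [np.array(m) for m in ((1, 0), (-1, 0), (0, 1), (0, -1))]
for _ in range(60):
    # Greedy descent on the field (can stall in local minima; the paper's
    # water-flow concept is precisely a remedy for that).
    pos = min((pos + m for m in moves),
              key=lambda p: potential(p, goal, obstacles))
    if np.array_equal(pos, goal):
        break
print(pos)
```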

  19. Why fuzzy controllers should be fuzzy

    International Nuclear Information System (INIS)

    Nowe, A.

    1996-01-01

    Fuzzy controllers are usually looked at as crisp-valued mappings, especially when artificial intelligence learning techniques are used to build up the controller. By doing so, the semantics of a fuzzy conclusion being a fuzzy restriction on the viable control actions is lost. In this paper the authors argue, from an approximation point of view, that using a fuzzy controller to express a crisp mapping does not seem the right way to go. Second, it is illustrated that interesting information is contained in a fuzzy conclusion when this conclusion is indeed considered as a fuzzy restriction. This information turns out to be very valuable when viability problems are concerned, i.e. problems where the objective is to keep a system within predefined boundaries.

  20. SVC control enhancement applying self-learning fuzzy algorithm for islanded microgrid

    Directory of Open Access Journals (Sweden)

    Hossam Gabbar

    2016-03-01

    Maintaining voltage stability within acceptable levels for islanded microgrids (MGs) is a challenge due to the limited exchange power between generation and loads. This paper proposes an algorithm to enhance the dynamic performance of islanded MGs in the presence of load disturbance using a Static VAR Compensator (SVC) with a Fuzzy Model Reference Learning Controller (FMRLC). The proposed algorithm compensates MG nonlinearity via fuzzy membership functions and an inference mechanism embedded in both the controller and the inverse model. Hence, the MG keeps the desired performance as required at any operating condition. Furthermore, the self-learning capability of the proposed control algorithm compensates for grid parameter variations even with inadequate information about load dynamics. A reference model was designed to reject bus voltage disturbance with performance achievable by the proposed fuzzy controller. Three simulation scenarios are presented to investigate the effectiveness of the proposed control algorithm in improving the steady-state and transient performance of islanded MGs: the first is conducted without SVC, the second with SVC using a PID controller, and the third using the FMRLC algorithm. A comparison of the results shows the ability of the proposed control algorithm to enhance disturbance rejection due to the learning process.

  1. A Fuzzy Logic Framework for Integrating Multiple Learned Models

    Energy Technology Data Exchange (ETDEWEB)

    Hartog, Bobi Kai Den [Univ. of Nebraska, Lincoln, NE (United States)

    1999-03-01

    The Artificial Intelligence field of Integrating Multiple Learned Models (IMLM) explores ways to combine results from sets of trained programs. Aroclor Interpretation is an ill-conditioned problem in which trained programs must operate in scenarios outside their training ranges because it is intractable to train them completely. Consequently, they fail in ways related to the scenarios. We developed a general-purpose IMLM solution, the Combiner, and applied it to Aroclor Interpretation. The Combiner's first step, Scenario Identification (SI), learns rules from very sparse, synthetic training data consisting of results from a suite of trained programs called Methods. SI produces fuzzy belief weights for each scenario by approximately matching the rules. The Combiner's second step, Aroclor Presence Detection (AP), classifies each of three Aroclors as present or absent in a sample. The third step, Aroclor Quantification (AQ), produces quantitative values for the concentration of each Aroclor in a sample. Through fuzzy logic, AP and AQ combine the scenario weights, empirical biases automatically learned for each Method in each scenario, and the Methods' results to determine results for a sample.

  2. Reinforcement Learning in Autism Spectrum Disorder

    Directory of Open Access Journals (Sweden)

    Manuela Schuetze

    2017-11-01

    Early behavioral interventions are recognized as integral to standard care in autism spectrum disorder (ASD), and often focus on reinforcing desired behaviors (e.g., eye contact) and reducing the presence of atypical behaviors (e.g., echoing others' phrases). However, the efficacy of these programs is mixed. Reinforcement learning relies on neurocircuitry that has been reported to be atypical in ASD: prefrontal-subcortical circuits, amygdala, brainstem, and cerebellum. Thus, early behavioral interventions rely on neurocircuitry that may function atypically in at least a subset of individuals with ASD. Recent work has investigated physiological, behavioral, and neural responses to reinforcers to uncover differences in motivation and learning in ASD. We synthesize this work to identify promising avenues for future research that ultimately can be used to enhance the efficacy of early intervention.

  3. Fuzzy gain scheduling of velocity PI controller with intelligent learning algorithm for reactor control

    International Nuclear Information System (INIS)

    Kim, Dong Yun

    1997-02-01

    In this research, we propose a fuzzy gain scheduler (FGS) with an intelligent learning algorithm for reactor control. In the proposed algorithm, the gradient descent method is used to generate the rule bases of a fuzzy algorithm by learning. These rule bases are obtained by minimizing an objective function, called a performance cost function. The objective of the FGS with an intelligent learning algorithm is to generate adequate gains that minimize the system error. The proposed algorithm can reduce the time and effort required for obtaining the fuzzy rules through the intelligent learning function. A modified evolutionary programming algorithm is adopted to find the optimal gains, which are used as the initial gains of the FGS with learning function. The scheme is applied to reactor control of a nuclear power plant (NPP), and the results are compared with those of a conventional PI controller with fixed gains. As a result, it is shown that the proposed algorithm is superior to the conventional PI controller

  4. Reinforcement and inference in cross-situational word learning.

    Science.gov (United States)

    Tilles, Paulo F C; Fontanari, José F

    2013-01-01

    Cross-situational word learning is based on the notion that a learner can determine the referent of a word by finding something in common across many observed uses of that word. Here we propose an adaptive learning algorithm that contains a parameter that controls the strength of the reinforcement applied to associations between concurrent words and referents, and a parameter that regulates inference, which includes built-in biases, such as mutual exclusivity, and information of past learning events. By adjusting these parameters so that the model predictions agree with data from representative experiments on cross-situational word learning, we were able to explain the learning strategies adopted by the participants of those experiments in terms of a trade-off between reinforcement and inference. These strategies can vary wildly depending on the conditions of the experiments. For instance, for fast mapping experiments (i.e., the correct referent could, in principle, be inferred in a single observation) inference is prevalent, whereas for segregated contextual diversity experiments (i.e., the referents are separated in groups and are exhibited with members of their groups only) reinforcement is predominant. Other experiments are explained with more balanced doses of reinforcement and inference.

  5. Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening

    OpenAIRE

    He, Frank S.; Liu, Yang; Schwing, Alexander G.; Peng, Jian

    2016-01-01

    We propose a novel training algorithm for reinforcement learning which combines the strength of deep Q-learning with a constrained optimization approach to tighten optimality and encourage faster reward propagation. Our novel technique makes deep reinforcement learning more practical by drastically reducing the training time. We evaluate the performance of our approach on the 49 games of the challenging Arcade Learning Environment, and report significant improvements in both training time and...

  6. Self-Paced Prioritized Curriculum Learning With Coverage Penalty in Deep Reinforcement Learning.

    Science.gov (United States)

    Ren, Zhipeng; Dong, Daoyi; Li, Huaxiong; Chen, Chunlin

    2018-06-01

    In this paper, a new training paradigm is proposed for deep reinforcement learning using self-paced prioritized curriculum learning with coverage penalty. The proposed deep curriculum reinforcement learning (DCRL) takes full advantage of experience replay by adaptively selecting appropriate transitions from the replay memory based on the complexity of each transition. The criteria of complexity in DCRL consist of self-paced priority as well as coverage penalty. The self-paced priority reflects the relationship between the temporal-difference error and the difficulty of the current curriculum, for sample efficiency. The coverage penalty is taken into account for sample diversity. In comparison with the deep Q-network (DQN) and prioritized experience replay (PER) methods, the DCRL algorithm is evaluated on Atari 2600 games, and the experimental results show that DCRL outperforms DQN and PER on most of these games. Further results show that the proposed curriculum training paradigm of DCRL is also applicable and effective for other memory-based deep reinforcement learning approaches, such as double DQN and dueling network. All the experimental results demonstrate that DCRL can achieve improved training efficiency and robustness for deep reinforcement learning.
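
    The selection criterion can be caricatured as a priority that grows with the magnitude of the temporal-difference error but is discounted by a coverage penalty on transitions that have already been replayed often. The functional forms below are assumptions for illustration, not the exact DCRL criteria:

```python
import numpy as np

def dcrl_sample(td_errors, replay_counts, batch, alpha=0.6, beta=0.1, seed=0):
    """Sample indices with priority ~ |TD error|^alpha, penalized by how
    often each transition has already been replayed (coverage penalty)."""
    rng = np.random.default_rng(seed)
    priority = np.abs(td_errors) ** alpha / (1.0 + beta * replay_counts)
    probs = priority / priority.sum()
    return rng.choice(len(td_errors), size=batch, replace=False, p=probs)

td_errors = np.random.default_rng(1).normal(0, 1, 1000)
replay_counts = np.zeros(1000)
idx = dcrl_sample(td_errors, replay_counts, batch=32)
replay_counts[idx] += 1          # replayed transitions accrue the penalty
print(idx[:8])
```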

  7. Smart damping of laminated fuzzy fiber reinforced composite shells using 1–3 piezoelectric composites

    International Nuclear Information System (INIS)

    Kundalwal, S I; Suresh Kumar, R; Ray, M C

    2013-01-01

    This paper deals with the investigation of active constrained layer damping (ACLD) of smart laminated continuous fuzzy fiber reinforced composite (FFRC) shells. The distinct constructional feature of a novel FFRC is that the uniformly spaced short carbon nanotubes (CNTs) are radially grown on the circumferential surfaces of the continuous carbon fiber reinforcements. The constraining layer of the ACLD treatment is considered to be made of vertically/obliquely reinforced 1–3 piezoelectric composite materials. A finite element (FE) model is developed for the laminated FFRC shells integrated with the two patches of the ACLD treatment to investigate the damping characteristics of the laminated FFRC shells. The effect of variation of the orientation angle of the piezoelectric fibers on the damping characteristics of the laminated FFRC shells has been studied when the piezoelectric fibers are coplanar with either of the two mutually orthogonal vertical planes of the piezoelectric composite layer. It is revealed that radial growth of CNTs on the circumferential surfaces of the carbon fibers enhances the attenuation of the amplitude of vibrations and the natural frequencies of the laminated FFRC shells over those of laminated base composite shells without CNTs. (paper)

  8. Manifold Regularized Reinforcement Learning.

    Science.gov (United States)

    Li, Hongliang; Liu, Derong; Wang, Ding

    2018-04-01

    This paper introduces a novel manifold regularized reinforcement learning scheme for continuous Markov decision processes. Smooth feature representations for value function approximation can be automatically learned using the unsupervised manifold regularization method. The learned features are data-driven, and can be adapted to the geometry of the state space. Furthermore, the scheme provides a direct basis representation extension for novel samples during policy learning and control. The performance of the proposed scheme is evaluated on two benchmark control tasks, i.e., the inverted pendulum and the energy storage problem. Simulation results illustrate the concepts of the proposed scheme and show that it can obtain excellent performance.

  9. Reinforcement learning for microgrid energy management

    International Nuclear Information System (INIS)

    Kuznetsova, Elizaveta; Li, Yan-Fu; Ruiz, Carlos; Zio, Enrico; Ault, Graham; Bell, Keith

    2013-01-01

    We consider a microgrid for energy distribution, with a local consumer, a renewable generator (wind turbine) and a storage facility (battery), connected to the external grid via a transformer. We propose a 2 steps-ahead reinforcement learning algorithm to plan the battery scheduling, which plays a key role in the achievement of the consumer goals. The underlying framework is one of multi-criteria decision-making by an individual consumer who has the goals of increasing the utilization rate of the battery during high electricity demand (so as to decrease the electricity purchase from the external grid) and increasing the utilization rate of the wind turbine for local use (so as to increase the consumer's independence from the external grid). Predictions of available wind power feed the reinforcement learning algorithm for selecting the optimal battery scheduling actions. The embedded learning mechanism enhances the consumer's knowledge of the optimal battery-scheduling actions under different time-dependent environmental conditions. The developed framework gives intelligent consumers the capability to learn the stochastic environment and use that experience to select optimal energy management actions. Highlights: • A consumer exploits a 2 steps-ahead reinforcement learning algorithm for battery scheduling. • The Q-learning based mechanism is fed by predictions of available wind power. • Wind speed state evolutions are modeled with a Markov chain model. • Optimal scheduling actions are learned through the occurrence of similar scenarios. • The consumer continuously enhances his knowledge about optimal actions
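
    A tabular Q-learning core for such a scheduler is easy to sketch. Below, the state is a (demand flag, battery level) pair, the actions are discharge/idle/charge, and the reward is a made-up proxy for the consumer goals; the paper's 2 steps-ahead lookahead, wind power predictions and Markov chain wind model are not reproduced:

```python
import numpy as np

rng = np.random.default_rng(0)
n_levels, actions = 5, (-1, 0, +1)             # discharge / idle / charge
Q = np.zeros((2, n_levels, len(actions)))      # state = (demand flag, level)
alpha, gamma, epsilon = 0.1, 0.95, 0.1

def reward(level, action, demand_high):
    # Toy objective: discharge when demand is high, charge when it is low.
    if demand_high:
        return 1.0 if action == -1 and level > 0 else -0.1
    return 1.0 if action == +1 and level < n_levels - 1 else -0.1

demand, level = 0, 2
for step in range(20000):
    a = (rng.integers(len(actions)) if rng.random() < epsilon
         else int(Q[demand, level].argmax()))
    r = reward(level, actions[a], bool(demand))
    nxt_level = int(np.clip(level + actions[a], 0, n_levels - 1))
    nxt_demand = int(rng.random() < 0.5)       # stand-in for a demand forecast
    Q[demand, level, a] += alpha * (r + gamma * Q[nxt_demand, nxt_level].max()
                                    - Q[demand, level, a])
    demand, level = nxt_demand, nxt_level
print(Q.argmax(axis=2))    # greedy action per (demand, level) state
```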

  10. Reinforcement learning or active inference?

    Science.gov (United States)

    Friston, Karl J; Daunizeau, Jean; Kiebel, Stefan J

    2009-07-29

    This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and sampling of the environment to minimize their free-energy. Such agents learn causal structure in the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming; namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain.

  11. Reinforcement learning or active inference?

    Directory of Open Access Journals (Sweden)

    Karl J Friston

    2009-07-01

    This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and sampling of the environment to minimize their free-energy. Such agents learn causal structure in the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming; namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain.

  12. Decentralized Reinforcement Learning of robot behaviors

    NARCIS (Netherlands)

    Leottau, David L.; Ruiz-del-Solar, Javier; Babuska, R.

    2018-01-01

    A multi-agent methodology is proposed for Decentralized Reinforcement Learning (DRL) of individual behaviors in problems where multi-dimensional action spaces are involved. When using this methodology, sub-tasks are learned in parallel by individual agents working toward a common goal. In

  13. Continuous residual reinforcement learning for traffic signal control optimization

    NARCIS (Netherlands)

    Aslani, Mohammad; Seipel, Stefan; Wiering, Marco

    2018-01-01

    Traffic signal control can be naturally regarded as a reinforcement learning problem. Unfortunately, it is one of the most difficult classes of reinforcement learning problems owing to its large state space. A straightforward approach to address this challenge is to control traffic signals based on

  14. Effect of reinforcement learning on coordination of multiagent systems

    Science.gov (United States)

    Bukkapatnam, Satish T. S.; Gao, Greg

    2000-12-01

    For effective coordination of distributed environments involving multiagent systems, the learning ability of each agent in the environment plays a crucial role. In this paper, we develop a simple group learning method based on reinforcement and study its effect on coordination through application to a supply chain procurement scenario involving a computer manufacturer. Here, all parties are represented by self-interested, autonomous agents, each capable of performing specific simple tasks. They negotiate with each other to perform complex tasks and thus coordinate supply chain procurement. Reinforcement learning is intended to enable each agent to reach the best negotiable price within the shortest possible time. Our simulations of the application scenario under different learning strategies reveal the positive effects of reinforcement learning on an agent's as well as the system's performance.

  15. A Simple and Effective Remedial Learning System with a Fuzzy Expert System

    Science.gov (United States)

    Lin, C.-C.; Guo, K.-H.; Lin, Y.-C.

    2016-01-01

    This study aims at implementing a simple and effective remedial learning system. Based on fuzzy inference, a remedial learning material selection system is proposed for a digital logic course. Two learning concepts of the course have been used in the proposed system: number systems and combinational logic. We conducted an experiment to validate…

  16. Fuzzy gain scheduling of velocity PI controller with intelligent learning algorithm for reactor control

    International Nuclear Information System (INIS)

    Kim, Dong Yun; Seong, Poong Hyun

    1996-01-01

    In this study, we propose a fuzzy gain scheduler with an intelligent learning algorithm for reactor control. In the proposed algorithm, we use the gradient descent method to learn the rule bases of a fuzzy algorithm. These rule bases are learned by minimizing an objective function, called a performance cost function. The objective of the fuzzy gain scheduler with an intelligent learning algorithm is the generation of adequate gains that minimize the system error. The condition of a plant generally changes over time; that is, the initial gains obtained through analysis of the system are no longer suitable for the changed plant, and new gains are needed to minimize the error stemming from the changed plant conditions. In this paper, we applied this strategy to reactor control of a nuclear power plant (NPP), and the results were compared with those of a simple PI controller with fixed gains. As a result, it was shown that the proposed algorithm was superior to the simple PI controller

  17. Can model-free reinforcement learning explain deontological moral judgments?

    Science.gov (United States)

    Ayars, Alisabeth

    2016-05-01

    Dual-systems frameworks propose that moral judgments are derived from both an immediate emotional response, and controlled/rational cognition. Recently Cushman (2013) proposed a new dual-system theory based on model-free and model-based reinforcement learning. Model-free learning attaches values to actions based on their history of reward and punishment, and explains some deontological, non-utilitarian judgments. Model-based learning involves the construction of a causal model of the world and allows for far-sighted planning; this form of learning fits well with utilitarian considerations that seek to maximize certain kinds of outcomes. I present three concerns regarding the use of model-free reinforcement learning to explain deontological moral judgment. First, many actions that humans find aversive from model-free learning are not judged to be morally wrong. Moral judgment must require something in addition to model-free learning. Second, there is a dearth of evidence for central predictions of the reinforcement account-e.g., that people with different reinforcement histories will, all else equal, make different moral judgments. Finally, to account for the effect of intention within the framework requires certain assumptions which lack support. These challenges are reasonable foci for future empirical/theoretical work on the model-free/model-based framework. Copyright © 2016 Elsevier B.V. All rights reserved.

  18. Social Cognition as Reinforcement Learning: Feedback Modulates Emotion Inference.

    Science.gov (United States)

    Zaki, Jamil; Kallman, Seth; Wimmer, G Elliott; Ochsner, Kevin; Shohamy, Daphna

    2016-09-01

    Neuroscientific studies of social cognition typically employ paradigms in which perceivers draw single-shot inferences about the internal states of strangers. Real-world social inference features much different parameters: People often encounter and learn about particular social targets (e.g., friends) over time and receive feedback about whether their inferences are correct or incorrect. Here, we examined this process and, more broadly, the intersection between social cognition and reinforcement learning. Perceivers were scanned using fMRI while repeatedly encountering three social targets who produced conflicting visual and verbal emotional cues. Perceivers guessed how targets felt and received feedback about whether they had guessed correctly. Visual cues reliably predicted one target's emotion, verbal cues predicted a second target's emotion, and neither reliably predicted the third target's emotion. Perceivers successfully used this information to update their judgments over time. Furthermore, trial-by-trial learning signals, estimated using two reinforcement learning models, tracked activity in ventral striatum and ventromedial pFC, structures associated with reinforcement learning, and regions associated with updating social impressions, including TPJ. These data suggest that learning about others' emotions, like other forms of feedback learning, relies on domain-general reinforcement mechanisms as well as domain-specific social information processing.

  19. Development of fuzzy algorithm with learning function for nuclear steam generator level control

    International Nuclear Information System (INIS)

    Park, Gee Yong; Seong, Poong Hyun

    1993-01-01

    A fuzzy algorithm with a learning function is applied to the steam generator level control of a nuclear power plant. This algorithm can make its rule base and membership functions suited for steam generator level control by using data obtained from the control actions of a skilled operator or of other controllers (e.g., a PID controller). The rule base of the fuzzy controller with learning function is divided into two parts: one part is provided for level control of the steam generator at low power (0 % - 30 % of full power) and the other for level control at high power (30 % - 100 % of full power). The response time of steam generator level control at low power with this rule base is shown to be shorter than that of a fuzzy controller with direct inference. (Author)

  20. Human demonstrations for fast and safe exploration in reinforcement learning

    NARCIS (Netherlands)

    Schonebaum, G.K.; Junell, J.L.; van Kampen, E.

    2017-01-01

    Reinforcement learning is a promising framework for controlling complex vehicles with a high level of autonomy, since it does not need a dynamic model of the vehicle, and it is able to adapt to changing conditions. When learning from scratch, the performance of a reinforcement learning controller

  1. Reinforcement Learning in Continuous Action Spaces

    NARCIS (Netherlands)

    Hasselt, H. van; Wiering, M.A.

    2007-01-01

    Quite some research has been done on Reinforcement Learning in continuous environments, but the research on problems where the actions can also be chosen from a continuous space is much more limited. We present a new class of algorithms named Continuous Actor Critic Learning Automaton (CACLA)

  2. Rule-bases construction through self-learning for a table-based Sugeno-Takagi fuzzy logic control system

    Directory of Open Access Journals (Sweden)

    C. Boldisor

    2009-12-01

    A self-learning-based methodology for building the rule base of a fuzzy logic controller (FLC) is presented and verified, aiming to add intelligent characteristics to fuzzy logic control systems. The methodology is a simplified version of those presented in today's literature: some aspects are intentionally ignored, since they rarely appear in control-system engineering, and a SISO process is considered here. The fuzzy inference system obtained is of the table-based Sugeno-Takagi type. The system's desired performance is defined by a reference model, and rules are extracted from recorded data after the correct control actions are learned. The presented algorithm is tested by constructing the rule base of a fuzzy controller for a DC drive application. The system's performance and the method's viability are analyzed.
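
    One simple reading of the extraction step: partition the input universe into regions, and set each region's singleton (zero-order Sugeno) consequent to the average of the recorded correct control actions that fall in that region. This is an assumed simplification of the paper's procedure, sketched on fake recorded data:

```python
import numpy as np

def learn_rule_table(errors, actions, bins):
    """Zero-order Sugeno rule table: one singleton consequent per error bin,
    learned as the average of the recorded 'correct' control actions."""
    idx = np.digitize(errors, bins)
    return np.array([actions[idx == j].mean() if np.any(idx == j) else 0.0
                     for j in range(len(bins) + 1)])

# Fake recorded data from a competent controller: u roughly 2*e plus noise.
rng = np.random.default_rng(0)
errors = rng.uniform(-1, 1, 500)
actions = 2.0 * errors + rng.normal(0, 0.05, 500)
bins = np.linspace(-1, 1, 7)               # partitions of the error universe
table = learn_rule_table(errors, actions, bins)
print(table.round(2))
```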

  3. Magnetic induction of hyperthermia by a modified self-learning fuzzy temperature controller

    Science.gov (United States)

    Wang, Wei-Cheng; Tai, Cheng-Chi

    2017-07-01

    The aim of this study was to develop a temperature controller for magnetic induction hyperthermia (MIH). A closed-loop controller was applied to track a reference model to guarantee a desired temperature response. The MIH system generated an alternating magnetic field to heat a high-magnetic-permeability material. This wireless induction heating had few side effects when extensively applied to cancer treatment. The effects of hyperthermia strongly depend on the precise control of temperature. However, during the treatment process, the control performance is degraded due to severe perturbations and parameter variations. In this study, a modified self-learning fuzzy logic controller (SLFLC) with a gain tuning mechanism was implemented to obtain high control performance in a wide range of treatment situations. This was done by appropriately altering the output scaling factor of a fuzzy inverse model to adjust the control rules. The proposed SLFLC was compared to the classical self-tuning fuzzy logic controller and fuzzy model reference learning control, and was further verified by conducting in vitro experiments with porcine liver. The experimental results indicated that the proposed controller showed greater robustness and excellent adaptability with respect to the temperature control of the MIH system.

  4. A comparative analysis of three metaheuristic methods applied to fuzzy cognitive maps learning

    Directory of Open Access Journals (Sweden)

    Bruno A. Angélico

    2013-12-01

    This work analyses the performance of three different population-based metaheuristic approaches applied to fuzzy cognitive map (FCM) learning in qualitative control of processes. Fuzzy cognitive maps make it possible to include prior specialist knowledge in the control rule. In particular, Particle Swarm Optimization (PSO), Genetic Algorithm (GA) and Ant Colony Optimization (ACO) are considered for obtaining appropriate weight matrices when learning the FCM. A statistical convergence analysis over 10000 simulations of each algorithm is presented. In order to validate the proposed approach, two industrial control process problems previously described in the literature are considered in this work.
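
    Whatever the metaheuristic, the object being optimized is the FCM weight matrix, scored by simulating the map. A generic sketch of the state update and a fitness function follows, with plain random search standing in for PSO/GA/ACO (all values invented):

```python
import numpy as np

def fcm_step(concepts, W):
    """One FCM update: each concept becomes the squashed weighted sum of the
    others (sigmoid squashing, a common choice)."""
    return 1.0 / (1.0 + np.exp(-(W @ concepts)))

def fitness(W, initial, target, steps=10):
    """Candidate weight matrices (the PSO/GA/ACO individuals) are scored by
    the distance between the final simulated state and the desired state."""
    c = initial.copy()
    for _ in range(steps):
        c = fcm_step(c, W)
    return -np.linalg.norm(c - target)      # higher is better

rng = np.random.default_rng(0)
initial = np.array([0.4, 0.7, 0.2])
target = np.array([0.8, 0.6, 0.5])
candidates = [rng.uniform(-1, 1, (3, 3)) for _ in range(200)]  # random search
best = max(candidates, key=lambda W: fitness(W, initial, target))
print(fitness(best, initial, target))
```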

  5. Human reinforcement learning subdivides structured action spaces by learning effector-specific values.

    Science.gov (United States)

    Gershman, Samuel J; Pesaran, Bijan; Daw, Nathaniel D

    2009-10-28

    Humans and animals are endowed with a large number of effectors. Although this enables great behavioral flexibility, it presents an equally formidable reinforcement learning problem of discovering which actions are most valuable because of the high dimensionality of the action space. An unresolved question is how neural systems for reinforcement learning-such as prediction error signals for action valuation associated with dopamine and the striatum-can cope with this "curse of dimensionality." We propose a reinforcement learning framework that allows for learned action valuations to be decomposed into effector-specific components when appropriate to a task, and test it by studying to what extent human behavior and blood oxygen level-dependent (BOLD) activity can exploit such a decomposition in a multieffector choice task. Subjects made simultaneous decisions with their left and right hands and received separate reward feedback for each hand movement. We found that choice behavior was better described by a learning model that decomposed the values of bimanual movements into separate values for each effector, rather than a traditional model that treated the bimanual actions as unitary with a single value. A decomposition of value into effector-specific components was also observed in value-related BOLD signaling, in the form of lateralized biases in striatal correlates of prediction error and anticipatory value correlates in the intraparietal sulcus. These results suggest that the human brain can use decomposed value representations to "divide and conquer" reinforcement learning over high-dimensional action spaces.

  6. Evolutionary computation for reinforcement learning

    NARCIS (Netherlands)

    Whiteson, S.; Wiering, M.; van Otterlo, M.

    2012-01-01

    Algorithms for evolutionary computation, which simulate the process of natural selection to solve optimization problems, are an effective tool for discovering high-performing reinforcement-learning policies. Because they can automatically find good representations, handle continuous action spaces,

  7. Surface blemish detection from passive imagery using learned fuzzy set concepts

    International Nuclear Information System (INIS)

    Gurbuz, S.; Carver, A.; Schalkoff, R.

    1997-12-01

    An image analysis method for real-time surface blemish detection using passive imagery and fuzzy set concepts is described. The method develops an internal knowledge representation for surface blemish characteristics on the basis of experience, thus facilitating autonomous learning based upon positive and negative exemplars. The method incorporates fuzzy set concepts in the learning subsystem and image segmentation algorithms, thereby mimicking human visual perception. This enables a generic solution for color image segmentation. This method has been applied in the development of ARIES (Autonomous Robotic Inspection Experimental System), designed to inspect DOE warehouse waste storage drums for rust. In this project, the ARIES vision system is used to acquire drum surface images under controlled conditions and subsequently perform visual inspection leading to the classification of the drum as acceptable or suspect

  8. Introduction to Fuzzy Set Theory

    Science.gov (United States)

    Kosko, Bart

    1990-01-01

    An introduction to fuzzy set theory is described. Topics covered include: neural networks and fuzzy systems; the dynamical systems approach to machine intelligence; intelligent behavior as adaptive model-free estimation; fuzziness versus probability; fuzzy sets; the entropy-subsethood theorem; adaptive fuzzy systems for backing up a truck-and-trailer; product-space clustering with differential competitive learning; and adaptive fuzzy system for target tracking.
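
    The basic machinery is compact: a fuzzy set is a membership function into [0, 1], and the standard connectives act pointwise via max, min and complement. A textbook-style sketch with invented temperature sets:

```python
import numpy as np

x = np.linspace(0, 40, 9)                        # temperatures in °C

def warm(t):   # triangular membership: fully 'warm' at 25 °C
    return np.clip(1 - np.abs(t - 25) / 10, 0, 1)

def hot(t):    # ramp membership: 'hot' from 30 °C upward
    return np.clip((t - 30) / 10, 0, 1)

union        = np.maximum(warm(x), hot(x))       # fuzzy OR  (max)
intersection = np.minimum(warm(x), hot(x))       # fuzzy AND (min)
complement   = 1 - warm(x)                       # fuzzy NOT
print(np.column_stack([x, warm(x), hot(x), union, intersection]).round(2))
```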

  9. Human reinforcement learning subdivides structured action spaces by learning effector-specific values

    OpenAIRE

    Gershman, Samuel J.; Pesaran, Bijan; Daw, Nathaniel D.

    2009-01-01

    Humans and animals are endowed with a large number of effectors. Although this enables great behavioral flexibility, it presents an equally formidable reinforcement learning problem of discovering which actions are most valuable, due to the high dimensionality of the action space. An unresolved question is how neural systems for reinforcement learning – such as prediction error signals for action valuation associated with dopamine and the striatum – can cope with this “curse of dimensionality...

  10. Reinforcement Learning Based Novel Adaptive Learning Framework for Smart Grid Prediction

    Directory of Open Access Journals (Sweden)

    Tian Li

    2017-01-01

    The smart grid is a promising infrastructure for supplying electricity to end users in a safe and reliable manner. With the rapid increase of the share of renewable energy and controllable loads in the smart grid, its operational uncertainty has increased rapidly in recent years. Accurate forecasting is essential to the safe and economical operation of the smart grid, yet most existing forecast methods cannot cope with it because they are unable to adapt to its varying operational conditions. In this paper, reinforcement learning is first exploited to develop an online learning framework for the smart grid. A wavelet neural network, with its capability of multi-time-scale resolution, is adopted in the online learning framework to yield a reinforcement learning and wavelet neural network (RLWNN) based adaptive learning scheme. Simulations on two typical smart grid prediction problems, wind power prediction and load forecasting, validate the effectiveness and scalability of the proposed RLWNN-based learning framework and algorithm.

  11. Compositions of fuzzy relations applied to verification of learning outcomes on the example of the major “Geodesy and Cartography”

    Directory of Open Access Journals (Sweden)

    A. Mreła

    2015-05-01

    The paper discusses the use of mathematical functions to help academic teachers verify students' acquirement of learning outcomes, using the major “geodesy and cartography” as an example. It is relatively easy to build a fuzzy relation describing the levels at which learning outcomes are realized and validated during subject examinations, and the fuzzy relation containing students' grades is already built by teachers; the problem is to combine these two relations into one that describes the level of acquirement of learning outcomes by students. There are two main requirements facing such combinations, and the paper shows that the best combination according to these requirements is the algebraic composition. Keywords: learning outcome, fuzzy relation, algebraic composition.
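
    The composition in question combines, say, a students-to-tasks relation with a tasks-to-outcomes relation into a students-to-outcomes relation. The sketch below shows the classical sup-T composition with both the min t-norm (max-min) and the algebraic product t-norm, on invented matrices:

```python
import numpy as np

def compose(R, S, tnorm):
    """sup-T composition: (R o S)[i, k] = max_j T(R[i, j], S[j, k])."""
    return np.array([[max(tnorm(R[i, j], S[j, k]) for j in range(R.shape[1]))
                      for k in range(S.shape[1])]
                     for i in range(R.shape[0])])

# R: students x exam tasks (grades rescaled to [0, 1]);
# S: exam tasks x learning outcomes (how strongly a task validates an outcome).
R = np.array([[0.9, 0.5], [0.3, 0.8]])
S = np.array([[1.0, 0.2], [0.4, 0.9]])
max_min  = compose(R, S, min)                     # classical max-min
max_prod = compose(R, S, lambda a, b: a * b)      # algebraic (product) t-norm
print(max_min, max_prod, sep="\n")
```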

  12. Reference Function Based Spatiotemporal Fuzzy Logic Control Design Using Support Vector Regression Learning

    Directory of Open Access Journals (Sweden)

    Xian-Xia Zhang

    2013-01-01

    This paper presents a reference function based 3D FLC design methodology using support vector regression (SVR) learning. The concept of a reference function is introduced to the 3D FLC for the generation of 3D membership functions (MFs), which enhances the capability of the 3D FLC to cope with more kinds of MFs. The nonlinear mathematical expression of the reference function based 3D FLC is derived, and spatial fuzzy basis functions are defined. By relating the spatial fuzzy basis functions of a 3D FLC to the kernel functions of an SVR, an equivalence relationship between a 3D FLC and an SVR is established. Therefore, a 3D FLC can be constructed using the learned results of an SVR. Furthermore, the universal approximation capability of the proposed 3D fuzzy system is proven in terms of the finite covering theorem. Finally, the proposed method is applied to a catalytic packed-bed reactor, and simulation results have verified its effectiveness.

  13. Pragmatically Framed Cross-Situational Noun Learning Using Computational Reinforcement Models.

    Science.gov (United States)

    Najnin, Shamima; Banerjee, Bonny

    2018-01-01

    Cross-situational learning and social pragmatic theories are prominent mechanisms for learning word meanings (i.e., word-object pairs). In this paper, the role of reinforcement is investigated for early word-learning by an artificial agent. When exposed to a group of speakers, the agent comes to understand an initial set of vocabulary items belonging to the language used by the group. Both cross-situational learning and social pragmatic theory are taken into account. As social cues, joint attention and prosodic cues in caregiver's speech are considered. During agent-caregiver interaction, the agent selects a word from the caregiver's utterance and learns the relations between that word and the objects in its visual environment. The "novel words to novel objects" language-specific constraint is assumed for computing rewards. The models are learned by maximizing the expected reward using reinforcement learning algorithms [i.e., table-based algorithms: Q-learning, SARSA, SARSA-λ, and neural network-based algorithms: Q-learning for neural network (Q-NN), neural-fitted Q-network (NFQ), and deep Q-network (DQN)]. Neural network-based reinforcement learning models are chosen over table-based models for better generalization and quicker convergence. Simulations are carried out using mother-infant interaction CHILDES dataset for learning word-object pairings. Reinforcement is modeled in two cross-situational learning cases: (1) with joint attention (Attentional models), and (2) with joint attention and prosodic cues (Attentional-prosodic models). Attentional-prosodic models manifest superior performance to Attentional ones for the task of word-learning. The Attentional-prosodic DQN outperforms existing word-learning models for the same task.

  14. Instructional control of reinforcement learning: a behavioral and neurocomputational investigation.

    Science.gov (United States)

    Doll, Bradley B; Jacobs, W Jake; Sanfey, Alan G; Frank, Michael J

    2009-11-24

    Humans learn how to behave directly through environmental experience and indirectly through rules and instructions. Behavior analytic research has shown that instructions can control behavior, even when such behavior leads to sub-optimal outcomes (Hayes, S. (Ed.). 1989. Rule-governed behavior: cognition, contingencies, and instructional control. Plenum Press.). Here we examine the control of behavior through instructions in a reinforcement learning task known to depend on striatal dopaminergic function. Participants selected between probabilistically reinforced stimuli, and were (incorrectly) told that a specific stimulus had the highest (or lowest) reinforcement probability. Despite experience to the contrary, instructions drove choice behavior. We present neural network simulations that capture the interactions between instruction-driven and reinforcement-driven behavior via two potential neural circuits: one in which the striatum is inaccurately trained by instruction representations coming from prefrontal cortex/hippocampus (PFC/HC), and another in which the striatum learns the environmentally based reinforcement contingencies, but is "overridden" at decision output. Both models capture the core behavioral phenomena but, because they differ fundamentally on what is learned, make distinct predictions for subsequent behavioral and neuroimaging experiments. Finally, we attempt to distinguish between the proposed computational mechanisms governing instructed behavior by fitting a series of abstract "Q-learning" and Bayesian models to subject data. The best-fitting model supports one of the neural models, suggesting the existence of a "confirmation bias" in which the PFC/HC system trains the reinforcement system by amplifying outcomes that are consistent with instructions while diminishing inconsistent outcomes.

  15. Reinforcement Learning Based Artificial Immune Classifier

    Directory of Open Access Journals (Sweden)

    Mehmet Karakose

    2013-01-01

    Artificial immune systems are among the widely used methods for classification, which is a decision-making process. Artificial immune systems, based on the natural immune system, can be successfully applied to classification, optimization, recognition, and learning in real-world problems. In this study, a reinforcement learning based artificial immune classifier is proposed as a new approach, which uses reinforcement learning to find better antibodies with immune operators. The proposed approach offers several advantages over other methods in the literature, such as effectiveness, fewer memory cells, high accuracy, speed, and data adaptability. Its performance is demonstrated by simulation and experimental results using real data in Matlab and on an FPGA. Benchmark data and remote image data are used for the experimental results. Comparative results with supervised/unsupervised based artificial immune systems, a negative selection classifier, and a resource limited artificial immune classifier are given to demonstrate the effectiveness of the proposed new method.

  16. Online reinforcement learning control for aerospace systems

    NARCIS (Netherlands)

    Zhou, Y.

    2018-01-01

    Reinforcement Learning (RL) methods are relatively new in the field of aerospace guidance, navigation, and control. This dissertation aims to exploit RL methods to improve the autonomy and online learning of aerospace systems with respect to the a priori unknown system and environment, dynamical

  17. Multi-agent machine learning a reinforcement approach

    CERN Document Server

    Schwartz, H M

    2014-01-01

    The book begins with a chapter on traditional methods of supervised learning, covering recursive least squares learning, mean square error methods, and stochastic approximation. Chapter 2 covers single agent reinforcement learning. Topics include learning value functions, Markov games, and TD learning with eligibility traces. Chapter 3 discusses two player games including two player matrix games with both pure and mixed strategies. Numerous algorithms and examples are presented. Chapter 4 covers learning in multi-player games, stochastic games, and Markov games, focusing on learning multi-player games.

  18. Reinforcement Learning Based on the Bayesian Theorem for Electricity Markets Decision Support

    DEFF Research Database (Denmark)

    Sousa, Tiago; Pinto, Tiago; Praca, Isabel

    2014-01-01

    This paper presents the applicability of a reinforcement learning algorithm based on the application of the Bayesian theorem of probability. The proposed reinforcement learning algorithm is an advantageous and indispensable tool for ALBidS (Adaptive Learning strategic Bidding System), a multi...

  19. Using a board game to reinforce learning.

    Science.gov (United States)

    Yoon, Bona; Rodriguez, Leslie; Faselis, Charles J; Liappis, Angelike P

    2014-03-01

    Experiential gaming strategies offer a variation on traditional learning. A board game was used to present synthesized content of fundamental catheter care concepts and reinforce evidence-based practices relevant to nursing. Board games are innovative educational tools that can enhance active learning. Copyright 2014, SLACK Incorporated.

  20. "Notice of Violation of IEEE Publication Principles" Multiobjective Reinforcement Learning: A Comprehensive Overview.

    Science.gov (United States)

    Liu, Chunming; Xu, Xin; Hu, Dewen

    2013-04-29

    Reinforcement learning is a powerful mechanism for enabling agents to learn in an unknown environment, and most reinforcement learning algorithms aim to maximize some numerical value, which represents only one long-term objective. However, multiple long-term objectives are exhibited in many real-world decision and control problems; therefore, there has recently been growing interest in solving multiobjective reinforcement learning (MORL) problems with multiple conflicting objectives. The aim of this paper is to present a comprehensive overview of MORL. In this paper, the basic architecture, research topics, and naive solutions of MORL are introduced first. Then, several representative MORL approaches and some important directions of recent research are reviewed. The relationships between MORL and other related research are also discussed, including multiobjective optimization, hierarchical reinforcement learning, and multi-agent reinforcement learning. Finally, research challenges and open problems of MORL techniques are highlighted.
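
    The most naive MORL solution mentioned in such overviews is linear scalarization: maintain a vector-valued Q-function, one component per objective, and act greedily on a fixed weighted sum. A toy tabular sketch with random transitions and invented vector rewards (not a method from the paper):

```python
import numpy as np

n_states, n_actions, n_objectives = 4, 2, 2
Q = np.zeros((n_states, n_actions, n_objectives))  # one value per objective
weights = np.array([0.7, 0.3])                     # trade-off between objectives
alpha, gamma = 0.1, 0.9
rng = np.random.default_rng(0)

state = 0
for _ in range(2000):
    scalar_q = Q[state] @ weights                  # collapse vector to scalar
    action = (int(scalar_q.argmax()) if rng.random() > 0.1
              else int(rng.integers(n_actions)))
    # Vector-valued reward: the two actions favor different objectives.
    r = rng.normal([1.0, -0.5] if action == 0 else [0.2, 0.8], 0.1)
    nxt = int(rng.integers(n_states))              # toy random transitions
    best_next = int((Q[nxt] @ weights).argmax())
    Q[state, action] += alpha * (r + gamma * Q[nxt, best_next] - Q[state, action])
    state = nxt
print((Q @ weights).argmax(axis=1))                # greedy policy under weights
```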

  1. Exploiting Best-Match Equations for Efficient Reinforcement Learning

    NARCIS (Netherlands)

    van Seijen, Harm; Whiteson, Shimon; van Hasselt, Hado; Wiering, Marco

    This article presents and evaluates best-match learning, a new approach to reinforcement learning that trades off the sample efficiency of model-based methods with the space efficiency of model-free methods. Best-match learning works by approximating the solution to a set of best-match equations,

  2. Tackling Error Propagation through Reinforcement Learning: A Case of Greedy Dependency Parsing

    OpenAIRE

    Le, Minh; Fokkens, Antske

    2017-01-01

    Error propagation is a common problem in NLP. Reinforcement learning explores erroneous states during training and can therefore be more robust when mistakes are made early in a process. In this paper, we apply reinforcement learning to greedy dependency parsing which is known to suffer from error propagation. Reinforcement learning improves accuracy of both labeled and unlabeled dependencies of the Stanford Neural Dependency Parser, a high performance greedy parser, while maintaining its eff...

  3. Longitudinal investigation on learned helplessness tested under negative and positive reinforcement involving stimulus control.

    Science.gov (United States)

    Oliveira, Emileane C; Hunziker, Maria Helena

    2014-07-01

    In this study, we investigated whether (a) animals demonstrating the learned helplessness effect during an escape contingency also show learning deficits under positive reinforcement contingencies involving stimulus control and (b) exposure to positive reinforcement contingencies eliminates the learned helplessness effect under an escape contingency. Rats were initially exposed to controllable (C), uncontrollable (U) or no (N) shocks. After 24 h, they were exposed to 60 escapable shocks delivered in a shuttlebox. In the following phase, we selected from each group the four subjects that presented the most typical group pattern: no escape learning (learned helplessness effect) in Group U and escape learning in Groups C and N. All subjects were then exposed to two phases: (1) positive reinforcement for lever pressing under a multiple FR/Extinction schedule and (2) a re-test under negative reinforcement (escape). A fourth group (n=4) was exposed only to the positive reinforcement sessions. All subjects showed discrimination learning under the multiple schedule. In the escape re-test, the learned helplessness effect was maintained for three of the animals in Group U. These results suggest that the learned helplessness effect did not extend to discriminative behavior that is positively reinforced and did not revert for most subjects after exposure to positive reinforcement. We discuss some theoretical implications related to learned helplessness as an effect restricted to aversive contingencies and to the absence of reversion after positive reinforcement. Copyright © 2014. Published by Elsevier B.V.

  4. Immune Genetic Learning of Fuzzy Cognitive Map

    Institute of Scientific and Technical Information of China (English)

    LIN Chun-mei; HE Yue; TANG Bing-yong

    2006-01-01

    This paper presents a hybrid methodology of automatically constructing fuzzy cognitive map (FCM). The method uses immune genetic algorithm to learn the connection matrix of FCM. In the algorithm, the DNA coding method is used and an immune operator based on immune mechanism is constructed. The characteristics of the system and the experts' knowledge are abstracted as vaccine for restraining the degenerative phenomena during evolution so as to improve the algorithmic efficiency. Finally, an illustrative example is provided, and its results suggest that the method is capable of automatically generating FCM model.

  5. Adaptive Trajectory Tracking Control using Reinforcement Learning for Quadrotor

    Directory of Open Access Journals (Sweden)

    Wenjie Lou

    2016-02-01

    Full Text Available Inaccurate system parameters and unpredicted external disturbances affect the performance of non-linear controllers. In this paper, a new adaptive control algorithm under the reinforcement learning framework is proposed to stabilize a quadrotor helicopter. Based on a command-filtered non-linear control algorithm, adaptive elements are added and learned by policy-search methods. To predict the inaccurate system parameters, a new kernel-based regression learning method is provided. In addition, Policy learning by Weighting Exploration with the Returns (PoWER) and Return Weighted Regression (RWR) are utilized to learn the appropriate parameters for adaptive elements in order to cancel the effect of external disturbance. Furthermore, numerical simulations under several conditions are performed, and the ability of adaptive trajectory-tracking control with reinforcement learning is demonstrated.

  6. Enriching behavioral ecology with reinforcement learning methods.

    Science.gov (United States)

    Frankenhuis, Willem E; Panchanathan, Karthik; Barto, Andrew G

    2018-02-13

    This article focuses on the division of labor between evolution and development in solving sequential, state-dependent decision problems. Currently, behavioral ecologists tend to use dynamic programming methods to study such problems. These methods are successful at predicting animal behavior in a variety of contexts. However, they depend on a distinct set of assumptions. Here, we argue that behavioral ecology will benefit from drawing more than it currently does on a complementary collection of tools, called reinforcement learning methods. These methods allow for the study of behavior in highly complex environments, which conventional dynamic programming methods do not feasibly address. In addition, reinforcement learning methods are well-suited to studying how biological mechanisms solve developmental and learning problems. For instance, we can use them to study simple rules that perform well in complex environments, or to investigate under what conditions natural selection favors fixed, non-plastic traits (which do not vary across individuals), cue-driven-switch plasticity (innate instructions for adaptive behavioral development based on experience), or developmental selection (the incremental acquisition of adaptive behavior based on experience). If natural selection favors developmental selection, which includes learning from environmental feedback, we can also make predictions about the design of reward systems. Our paper is written in an accessible manner and for a broad audience, though we believe some novel insights can be drawn from our discussion. We hope our paper will help advance the emerging bridge connecting the fields of behavioral ecology and reinforcement learning. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.

  7. How we learn to make decisions: rapid propagation of reinforcement learning prediction errors in humans.

    Science.gov (United States)

    Krigolson, Olav E; Hassall, Cameron D; Handy, Todd C

    2014-03-01

    Our ability to make decisions is predicated upon our knowledge of the outcomes of the actions available to us. Reinforcement learning theory posits that actions followed by a reward or punishment acquire value through the computation of prediction errors-discrepancies between the predicted and the actual reward. A multitude of neuroimaging studies have demonstrated that rewards and punishments evoke neural responses that appear to reflect reinforcement learning prediction errors [e.g., Krigolson, O. E., Pierce, L. J., Holroyd, C. B., & Tanaka, J. W. Learning to become an expert: Reinforcement learning and the acquisition of perceptual expertise. Journal of Cognitive Neuroscience, 21, 1833-1840, 2009; Bayer, H. M., & Glimcher, P. W. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47, 129-141, 2005; O'Doherty, J. P. Reward representations and reward-related learning in the human brain: Insights from neuroimaging. Current Opinion in Neurobiology, 14, 769-776, 2004; Holroyd, C. B., & Coles, M. G. H. The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109, 679-709, 2002]. Here, we used the brain ERP technique to demonstrate that not only do rewards elicit a neural response akin to a prediction error but also that this signal rapidly diminished and propagated to the time of choice presentation with learning. Specifically, in a simple, learnable gambling task, we show that novel rewards elicited a feedback error-related negativity that rapidly decreased in amplitude with learning. Furthermore, we demonstrate the existence of a reward positivity at choice presentation, a previously unreported ERP component that has a similar timing and topography as the feedback error-related negativity that increased in amplitude with learning. The pattern of results we observed mirrored the output of a computational model that we implemented to compute reward
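
    As a concrete illustration of the prediction-error computation described above, the following sketch (Python; the learning rate and outcome sequence are invented) applies a Rescorla-Wagner/delta-rule update in a toy gambling task, showing how the error at feedback shrinks with learning, mirroring the diminishing feedback-related signal reported in the article.

```python
# Hypothetical illustration of the prediction-error account described above:
# a Rescorla-Wagner/TD-style update in a simple gambling task. As learning
# proceeds, the error at feedback shrinks.
alpha = 0.2          # learning rate (assumed)
value = 0.0          # predicted reward for the chosen option
for trial, reward in enumerate([1, 1, 0, 1, 1, 1, 0, 1], start=1):
    delta = reward - value          # prediction error at feedback
    value += alpha * delta
    print(f"trial {trial}: prediction={value:.3f}, error={delta:+.3f}")
```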

  8. Flow Navigation by Smart Microswimmers via Reinforcement Learning

    Science.gov (United States)

    Colabrese, Simona; Biferale, Luca; Celani, Antonio; Gustavsson, Kristian

    2017-11-01

    We have numerically modeled active particles which are able to acquire some limited knowledge of the fluid environment from simple mechanical cues and exert a control on their preferred steering direction. We show that those swimmers can learn effective strategies just by experience, using a reinforcement learning algorithm. As an example, we focus on smart gravitactic swimmers. These are active particles whose task is to reach the highest altitude within some time horizon, exploiting the underlying flow whenever possible. The reinforcement learning algorithm allows particles to learn effective strategies even in difficult situations when, in the absence of control, they would end up being trapped by flow structures. These strategies are highly nontrivial and cannot be easily guessed in advance. This work paves the way towards the engineering of smart microswimmers that solve difficult navigation problems. ERC AdG NewTURB 339032.

  9. The drift diffusion model as the choice rule in reinforcement learning.

    Science.gov (United States)

    Pedersen, Mads Lund; Frank, Michael J; Biele, Guido

    2017-08-01

    Current reinforcement-learning models often assume simplified decision processes that do not fully reflect the dynamic complexities of choice processes. Conversely, sequential-sampling models of decision making account for both choice accuracy and response time, but assume that decisions are based on static decision values. To combine these two computational models of decision making and learning, we implemented reinforcement-learning models in which the drift diffusion model describes the choice process, thereby capturing both within- and across-trial dynamics. To exemplify the utility of this approach, we quantitatively fit data from a common reinforcement-learning paradigm using hierarchical Bayesian parameter estimation, and compared model variants to determine whether they could capture the effects of stimulant medication in adult patients with attention-deficit hyperactivity disorder (ADHD). The model with the best relative fit provided a good description of the learning process, choices, and response times. A parameter recovery experiment showed that the hierarchical Bayesian modeling approach enabled accurate estimation of the model parameters. The model approach described here, using simultaneous estimation of reinforcement-learning and drift diffusion model parameters, shows promise for revealing new insights into the cognitive and neural mechanisms of learning and decision making, as well as the alteration of such processes in clinical groups.
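
    The combination described above can be sketched compactly: a delta-rule learner whose value difference sets the drift rate of a diffusion process that yields both the choice and the response time. This is a minimal illustration, not the authors' hierarchical Bayesian implementation; all parameter values and the task's reward probabilities are assumed.

```python
import numpy as np

# Sketch of the combined model: a delta-rule learner whose value difference
# sets the drift rate of a drift diffusion process that produces both the
# choice and the response time. Parameter values are illustrative only.
rng = np.random.default_rng(1)
alpha, scale, threshold, dt, noise = 0.1, 2.0, 1.0, 0.001, 1.0
q = np.array([0.0, 0.0])            # values of the two options
p_reward = np.array([0.8, 0.2])     # true reward probabilities (assumed task)

for trial in range(100):
    drift = scale * (q[0] - q[1])   # evidence for option 0 over option 1
    x, t = 0.0, 0.0
    while abs(x) < threshold:       # simulate the diffusion to a bound
        x += drift * dt + noise * np.sqrt(dt) * rng.normal()
        t += dt
    choice = 0 if x >= threshold else 1
    reward = float(rng.random() < p_reward[choice])
    q[choice] += alpha * (reward - q[choice])   # reinforcement-learning update

print("final values:", q.round(2), "last RT (s):", round(t, 3))
```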

  10. Embedded Incremental Feature Selection for Reinforcement Learning

    Science.gov (United States)

    2012-05-01

    Prior to this work, feature selection for reinforcement learning has focused on linear value function approximation (Kolter and Ng, 2009; Parr et al. ... In Proceedings of the 23rd International Conference on Machine Learning, pages 449–456. Kolter, J. Z. and Ng, A. Y. (2009). Regularization and feature ...

  11. Flexible Heuristic Dynamic Programming for Reinforcement Learning in Quadrotors

    NARCIS (Netherlands)

    Helmer, Alexander; de Visser, C.C.; van Kampen, E.

    2018-01-01

    Reinforcement learning is a paradigm for learning decision-making tasks from interaction with the environment. Function approximators solve a part of the curse of dimensionality when learning in high-dimensional state and/or action spaces. It can be a time-consuming process to learn a good policy in

  12. Working Memory and Reinforcement Schedule Jointly Determine Reinforcement Learning in Children: Potential Implications for Behavioral Parent Training

    Directory of Open Access Journals (Sweden)

    Elien Segers

    2018-03-01

    Full Text Available Introduction: Behavioral Parent Training (BPT) is often provided for childhood psychiatric disorders. These disorders have been shown to be associated with working memory impairments. BPT is based on operant learning principles, yet how operant principles shape behavior (through the partial reinforcement (PRF) extinction effect, i.e., greater resistance to extinction that is created when behavior is reinforced partially rather than continuously) and the potential role of working memory therein are scarcely studied in children. This study explored the PRF extinction effect and the role of working memory therein using experimental tasks in typically developing children. Methods: Ninety-seven children (age 6–10) completed a working memory task and an operant learning task, in which children acquired a response-sequence rule under either continuous or partial reinforcement (120 trials), followed by an extinction phase (80 trials). Data of 88 children were used for analysis. Results: The PRF extinction effect was confirmed: we observed slower acquisition and extinction in the PRF condition as compared to the continuous reinforcement (CRF) condition. Working memory was negatively related to acquisition but not extinction performance. Conclusion: Both reinforcement contingencies and working memory relate to acquisition performance. Potential implications for BPT are that decreasing working memory load may enhance the chance of optimally learning through reinforcement.

  13. Reinforcement learning: Solving two case studies

    Science.gov (United States)

    Duarte, Ana Filipa; Silva, Pedro; dos Santos, Cristina Peixoto

    2012-09-01

    Reinforcement Learning algorithms offer interesting features for the control of autonomous systems, such as the ability to learn from direct interaction with the environment, and the use of a simple reward signal, as opposed to the input-output pairs used in classic supervised learning. The reward signal indicates the success or failure of the actions executed by the agent in the environment. In this work, RL algorithms applied to two case studies are described: the Crawler robot and the widely known inverted pendulum. We explore RL capabilities to autonomously learn a basic locomotion pattern in the Crawler, and approach the balancing problem of biped locomotion using the inverted pendulum.

  14. Evaluation of students' perceptions on game based learning program using fuzzy set conjoint analysis

    Science.gov (United States)

    Sofian, Siti Siryani; Rambely, Azmin Sham

    2017-04-01

    The effectiveness of game-based learning (GBL) can be determined through an application of fuzzy set conjoint analysis. The analysis was used due to the fuzziness in determining individual perceptions. This study involved a survey collected from 36 students aged 16 from SMK Mersing, Johor, who participated in a Mathematics Discovery Camp organized by the UKM research group PRISMatik. The aim of this research was to determine the effectiveness of the module delivered to cultivate interest in the mathematics subject in the form of game-based learning through different values. There were 11 games conducted for the participants, and students' perceptions based on the evaluation of six criteria were measured. A seven-point Likert scale was used to collect students' preferences and perceptions. This scale represented seven linguistic terms to indicate their perceptions of each module of GBL. Scores of perceptions were transformed into degrees of similarity using fuzzy set conjoint analysis. It was found that the Geometric Analysis Recreation (GEAR) module was able to increase participant preference corresponding to the six attributes generated. The computations were also made for the other 10 games conducted during the camp. Results found that interest, passion and team work were the strongest values obtained from the GBL activities in this camp, as participants stated they very strongly agreed that these attributes fulfilled their preferences in every module. This was an indicator of efficiency for the program. The evaluation using fuzzy conjoint analysis demonstrated the success of a fuzzy approach in evaluating students' perceptions toward GBL.
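
    The transformation from Likert scores to degrees of similarity can be sketched as follows, assuming triangular membership functions for the seven linguistic terms and invented response counts; the study's actual membership definitions may differ.

```python
import numpy as np

# Minimal sketch of fuzzy set conjoint analysis for Likert data, assuming
# triangular memberships for the seven linguistic terms; counts are invented.
levels = np.arange(1, 8)                      # 7-point Likert scale
counts = np.array([0, 1, 2, 5, 10, 12, 6])    # hypothetical responses, N = 36
weights = counts / counts.sum()

def tri(x, a, b, c):
    """Triangular membership function on the Likert axis."""
    return np.maximum(np.minimum((x - a) / (b - a + 1e-9),
                                 (c - x) / (c - b + 1e-9)), 0.0)

# Fuzzy set for each linguistic term, centred on its scale point.
terms = {k: tri(levels, k - 1, k, k + 1) for k in levels}

# Aggregate response fuzzy set: response-weighted combination of term sets.
aggregate = sum(w * terms[k] for w, k in zip(weights, levels))

# Degree of similarity between the aggregate set and each linguistic term.
sims = {k: 1 / (1 + np.linalg.norm(aggregate - mu)) for k, mu in terms.items()}
best = max(sims, key=sims.get)
print("most similar linguistic term:", best, "similarity:", round(sims[best], 3))
```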

  15. Efficient abstraction selection in reinforcement learning

    NARCIS (Netherlands)

    Seijen, H. van; Whiteson, S.; Kester, L.

    2013-01-01

    This paper introduces a novel approach for abstraction selection in reinforcement learning problems modelled as factored Markov decision processes (MDPs), for which a state is described via a set of state components. In abstraction selection, an agent must choose an abstraction from a set of

  16. Fuzzy Control Tutorial

    DEFF Research Database (Denmark)

    Dotoli, M.; Jantzen, Jan

    1999-01-01

    The tutorial concerns automatic control of an inverted pendulum, especially rule-based control by means of fuzzy logic. A ball balancer, implemented in a software simulator in Matlab, is used as a practical case study. The objectives of the tutorial are to teach the basics of fuzzy control, and to show how to apply fuzzy logic in automatic control. The tutorial is distance learning, where students interact one-to-one with the teacher using e-mail.

  17. Reinforcement Learning in the Game of Othello: Learning Against a Fixed Opponent and Learning from Self-Play

    NARCIS (Netherlands)

    van der Ree, Michiel; Wiering, Marco

    2013-01-01

    This paper compares three strategies in using reinforcement learning algorithms to let an artificial agent learn to play the game of Othello. The three strategies compared are: learning by self-play, learning from playing against a fixed opponent, and learning from playing against a fixed

  18. Determining e-Portfolio Elements in Learning Process Using Fuzzy Delphi Analysis

    Science.gov (United States)

    Mohamad, Syamsul Nor Azlan; Embi, Mohamad Amin; Nordin, Norazah

    2015-01-01

    The present article introduces the Fuzzy Delphi method results obtained in a study determining e-Portfolio elements in the learning process for the art and design context. This method relies on qualified experts who assure the validity of the collected information. In particular, the confirmation of elements is based on experts' opinion and…

  19. Adolescent-specific patterns of behavior and neural activity during social reinforcement learning.

    Science.gov (United States)

    Jones, Rebecca M; Somerville, Leah H; Li, Jian; Ruberry, Erika J; Powers, Alisa; Mehta, Natasha; Dyke, Jonathan; Casey, B J

    2014-06-01

    Humans are sophisticated social beings. Social cues from others are exceptionally salient, particularly during adolescence. Understanding how adolescents interpret and learn from variable social signals can provide insight into the observed shift in social sensitivity during this period. The present study tested 120 participants between the ages of 8 and 25 years on a social reinforcement learning task where the probability of receiving positive social feedback was parametrically manipulated. Seventy-eight of these participants completed the task during fMRI scanning. Modeling trial-by-trial learning, children and adults showed higher positive learning rates than did adolescents, suggesting that adolescents demonstrated less differentiation in their reaction times for peers who provided more positive feedback. Forming expectations about receiving positive social reinforcement correlated with neural activity within the medial prefrontal cortex and ventral striatum across age. Adolescents, unlike children and adults, showed greater insular activity during positive prediction error learning and increased activity in the supplementary motor cortex and the putamen when receiving positive social feedback regardless of the expected outcome, suggesting that peer approval may motivate adolescents toward action. While different amounts of positive social reinforcement enhanced learning in children and adults, all positive social reinforcement equally motivated adolescents. Together, these findings indicate that sensitivity to peer approval during adolescence goes beyond simple reinforcement theory accounts and suggest possible explanations for how peers may motivate adolescent behavior.

  20. Time representation in reinforcement learning models of the basal ganglia

    Directory of Open Access Journals (Sweden)

    Samuel Joseph Gershman

    2014-01-01

    Full Text Available Reinforcement learning models have been influential in understanding many aspects of basal ganglia function, from reward prediction to action selection. Time plays an important role in these models, but there is still no theoretical consensus about what kind of time representation is used by the basal ganglia. We review several theoretical accounts and their supporting evidence. We then discuss the relationship between reinforcement learning models and the timing mechanisms that have been attributed to the basal ganglia. We hypothesize that a single computational system may underlie both reinforcement learning and interval timing—the perception of duration in the range of seconds to hours. This hypothesis, which extends earlier models by incorporating a time-sensitive action selection mechanism, may have important implications for understanding disorders like Parkinson's disease in which both decision making and timing are impaired.
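
    One of the candidate representations discussed above, the complete serial compound (one feature per time step since trial onset), can be illustrated with a tabular TD(0) learner; all parameters below are assumed, and the setup is a toy stand-in for the models reviewed.

```python
import numpy as np

# Illustrative TD(0) learner with a "complete serial compound" time
# representation: one feature per time step since trial onset, one of the
# candidate representations discussed above. All parameters are assumed.
T, reward_t = 20, 12
alpha, gamma = 0.1, 0.98
w = np.zeros(T)                        # one weight per tap in the delay line

for episode in range(500):
    for t in range(T - 1):
        x_now, x_next = np.eye(T)[t], np.eye(T)[t + 1]
        r = 1.0 if t + 1 == reward_t else 0.0
        delta = r + gamma * (w @ x_next) - w @ x_now   # TD error
        w += alpha * delta * x_now

print("value peaks at step", int(np.argmax(w)), "- reward arrives at step", reward_t)
```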

  1. Safe Exploration of State and Action Spaces in Reinforcement Learning

    OpenAIRE

    Garcia, Javier; Fernandez, Fernando

    2014-01-01

    In this paper, we consider the important problem of safe exploration in reinforcement learning. While reinforcement learning is well-suited to domains with complex transition dynamics and high-dimensional state-action spaces, an additional challenge is posed by the need for safe and efficient exploration. Traditional exploration techniques are not particularly useful for solving dangerous tasks, where the trial and error process may lead to the selection of actions whose execution in some sta...

  2. Adversarial Reinforcement Learning in a Cyber Security Simulation

    OpenAIRE

    Elderman, Richard; Pater, Leon; Thie, Albert; Drugan, Madalina; Wiering, Marco

    2017-01-01

    This paper focuses on cyber-security simulations in networks modeled as a Markov game with incomplete information and stochastic elements. The resulting game is an adversarial sequential decision-making problem played by two agents, the attacker and defender. The two agents pit reinforcement learning techniques, such as neural networks, Monte Carlo learning and Q-learning, against each other and examine their effectiveness against learning opponents. The results showed that Monte Carlo lear...

  3. Logical Characterisation of Ontology Construction using Fuzzy Description Logics

    DEFF Research Database (Denmark)

    Badie, Farshad; Götzsche, Hans

    …the extension of ontologies with Fuzzy Logic capabilities, which aims to provide a proper background for ontology-driven reasoning and argumentation on vague and imprecise domains. This presentation conceptualises learning from fuzzy classes using the Inductive Logic Programming framework. It then employs Description Logics in characterising and analysing fuzzy statements and, finally, provides a conceptual framework describing fuzzy concept learning in ontologies using Inductive Logic Programming.

  4. Tackling Error Propagation through Reinforcement Learning: A Case of Greedy Dependency Parsing

    NARCIS (Netherlands)

    Le, M.N.; Fokkens, A.S.

    Error propagation is a common problem in NLP. Reinforcement learning explores erroneous states during training and can therefore be more robust when mistakes are made early in a process. In this paper, we apply reinforcement learning to greedy dependency parsing which is known to suffer from error

  5. The Computational Development of Reinforcement Learning during Adolescence.

    Directory of Open Access Journals (Sweden)

    Stefano Palminteri

    2016-06-01

    Full Text Available Adolescence is a period of life characterised by changes in learning and decision-making. Learning and decision-making do not rely on a unitary system, but instead require the coordination of different cognitive processes that can be mathematically formalised as dissociable computational modules. Here, we aimed to trace the developmental time-course of the computational modules responsible for learning from reward or punishment, and learning from counterfactual feedback. Adolescents and adults carried out a novel reinforcement learning paradigm in which participants learned the association between cues and probabilistic outcomes, where the outcomes differed in valence (reward versus punishment) and feedback was either partial or complete (either the outcome of the chosen option only, or the outcomes of both the chosen and unchosen options, were displayed). Computational strategies changed during development: whereas adolescents' behaviour was better explained by a basic reinforcement learning algorithm, adults' behaviour integrated increasingly complex computational features, namely a counterfactual learning module (enabling enhanced performance in the presence of complete feedback) and a value contextualisation module (enabling symmetrical reward and punishment learning). Unlike adults, adolescent performance did not benefit from counterfactual (complete) feedback. In addition, while adults learned symmetrically from both reward and punishment, adolescents learned from reward but were less likely to learn from punishment. This tendency to rely on rewards and not to consider alternative consequences of actions might contribute to our understanding of decision-making in adolescence.
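
    The contrast drawn above between a basic learner and one with a counterfactual module can be sketched as two Q-learning variants, where the counterfactual variant also updates the unchosen option from complete feedback; the task probabilities and noise level below are invented.

```python
import numpy as np

# Sketch of the two learner variants contrasted above: a basic learner that
# updates only the chosen option, and a counterfactual learner that also
# updates the unchosen option from complete feedback. Task values are invented.
rng = np.random.default_rng(2)
alpha, p = 0.3, np.array([0.75, 0.25])

def run(counterfactual: bool, trials: int = 200) -> float:
    q, correct = np.zeros(2), 0
    for _ in range(trials):
        choice = int(np.argmax(q + 0.1 * rng.normal(size=2)))  # noisy greedy
        outcomes = (rng.random(2) < p).astype(float)           # both outcomes
        q[choice] += alpha * (outcomes[choice] - q[choice])
        if counterfactual:  # learn from the forgone outcome as well
            other = 1 - choice
            q[other] += alpha * (outcomes[other] - q[other])
        correct += choice == 0
    return correct / trials

print("basic:", run(False), "counterfactual:", run(True))
```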

  6. Fuzzy logic of Aristotelian forms

    Energy Technology Data Exchange (ETDEWEB)

    Perlovsky, L.I. [Nichols Research Corp., Lexington, MA (United States)

    1996-12-31

    Model-based approaches to pattern recognition and machine vision have been proposed to overcome the exorbitant training requirements of earlier computational paradigms. However, uncertainties in data were found to lead to a combinatorial explosion of the computational complexity. This issue is related here to the roles of a priori knowledge vs. adaptive learning. What is the a-priori knowledge representation that supports learning? I introduce Modeling Field Theory (MFT), a model-based neural network whose adaptive learning is based on a priori models. These models combine deterministic, fuzzy, and statistical aspects to account for a priori knowledge, its fuzzy nature, and data uncertainties. In the process of learning, a priori fuzzy concepts converge to crisp or probabilistic concepts. The MFT is a convergent dynamical system of only linear computational complexity. Fuzzy logic turns out to be essential for reducing the combinatorial complexity to linear one. I will discuss the relationship of the new computational paradigm to two theories due to Aristotle: theory of Forms and logic. While theory of Forms argued that the mind cannot be based on ready-made a priori concepts, Aristotelian logic operated with just such concepts. I discuss an interpretation of MFT suggesting that its fuzzy logic, combining a-priority and adaptivity, implements Aristotelian theory of Forms (theory of mind). Thus, 2300 years after Aristotle, a logic is developed suitable for his theory of mind.

  7. Reinforcement learning agents providing advice in complex video games

    Science.gov (United States)

    Taylor, Matthew E.; Carboni, Nicholas; Fachantidis, Anestis; Vlahavas, Ioannis; Torrey, Lisa

    2014-01-01

    This article introduces a teacher-student framework for reinforcement learning, synthesising and extending material that appeared in conference proceedings [Torrey, L., & Taylor, M. E. (2013). Teaching on a budget: Agents advising agents in reinforcement learning. Proceedings of the International Conference on Autonomous Agents and Multiagent Systems] and in a non-archival workshop paper [Carboni, N., & Taylor, M. E. (2013, May). Preliminary results for 1 vs. 1 tactics in StarCraft. Proceedings of the Adaptive and Learning Agents Workshop (at AAMAS-13)]. In this framework, a teacher agent instructs a student agent by suggesting actions the student should take as it learns. However, the teacher may only give such advice a limited number of times. We present several novel algorithms that teachers can use to budget their advice effectively, and we evaluate them in two complex video games: StarCraft and Pac-Man. Our results show that the same amount of advice, given at different moments, can have different effects on student learning, and that teachers can significantly affect student learning even when students use different learning methods and state representations.

  8. A multiplicative reinforcement learning model capturing learning dynamics and interindividual variability in mice

    OpenAIRE

    Bathellier, Brice; Tee, Sui Poh; Hrovat, Christina; Rumpel, Simon

    2013-01-01

    Learning speed can strongly differ across individuals. This is seen in humans and animals. Here, we measured learning speed in mice performing a discrimination task and developed a theoretical model based on the reinforcement learning framework to account for differences between individual mice. We found that, when using a multiplicative learning rule, the starting connectivity values of the model strongly determine the shape of learning curves. This is in contrast to current learning models ...

  9. Reinforcement Learning Explains Conditional Cooperation and Its Moody Cousin.

    Directory of Open Access Journals (Sweden)

    Takahiro Ezaki

    2016-07-01

    Full Text Available Direct reciprocity, or repeated interaction, is a main mechanism to sustain cooperation under social dilemmas involving two individuals. For larger groups and networks, which are probably more relevant to understanding and engineering our society, experiments employing repeated multiplayer social dilemma games have suggested that humans often show conditional cooperation behavior and its moody variant. Mechanisms underlying these behaviors largely remain unclear. Here we provide a proximate account for this behavior by showing that individuals adopting a type of reinforcement learning, called aspiration learning, phenomenologically behave as conditional cooperators. By definition, individuals are satisfied if and only if the obtained payoff is larger than a fixed aspiration level. They reinforce actions that have resulted in satisfactory outcomes and anti-reinforce those yielding unsatisfactory outcomes. The results obtained in the present study are general in that they explain extant experimental results obtained for both so-called moody and non-moody conditional cooperation, prisoner's dilemma and public goods games, and well-mixed groups and networks. Different from previous theory, individuals are assumed to have no access to information about what other individuals are doing, such that they cannot explicitly use conditional cooperation rules. In this sense, myopic aspiration learning, in which the unconditional propensity of cooperation is modulated in every discrete time step, explains conditional behavior of humans. Aspiration learners showing (moody) conditional cooperation obeyed a noisy GRIM-like strategy. This is different from Pavlov, a reinforcement learning strategy promoting mutual cooperation in two-player situations.
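
    A minimal sketch of the aspiration-learning rule described above, for a repeated prisoner's dilemma against a random partner: satisfaction is payoff above a fixed aspiration level, and the unconditional propensity to cooperate is reinforced or anti-reinforced accordingly. The aspiration level, step size, and partner behavior are assumed.

```python
import numpy as np

# Minimal sketch of the aspiration-learning rule described above for a
# repeated prisoner's dilemma: satisfaction is payoff above a fixed
# aspiration level, and the propensity to cooperate is (anti-)reinforced.
rng = np.random.default_rng(3)
R, S, T, P = 3.0, 0.0, 5.0, 1.0      # standard PD payoffs
aspiration, beta = 2.0, 0.2          # assumed aspiration level and step size
p_coop = 0.5                         # unconditional propensity to cooperate

for step in range(50):
    my_move = rng.random() < p_coop                  # True = cooperate
    partner = rng.random() < 0.5                     # random partner behavior
    payoff = (R if partner else S) if my_move else (T if partner else P)
    satisfied = payoff >= aspiration
    # Reinforce the last move if satisfied, anti-reinforce it otherwise.
    target = 1.0 if (my_move == satisfied) else 0.0
    p_coop += beta * (target - p_coop)

print("final cooperation propensity:", round(p_coop, 3))
```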

  10. Reinforcement Learning for a New Piano Mover

    Directory of Open Access Journals (Sweden)

    Yuko Ishiwaka

    2005-08-01

    Full Text Available We attempt to achieve cooperative behavior of autonomous decentralized agents constructed via Q-Learning, which is a type of reinforcement learning. As such, in the present paper, we examine the piano mover's problem. We propose a multi-agent architecture that has a training agent, learning agents and an intermediate agent. Learning agents are heterogeneous and can communicate with each other. The movement of an object with three kinds of agents depends on the composition of the actions of the learning agents. By learning its own shape through the learning agents, avoidance of obstacles by the object is expected. We simulate the proposed method in a two-dimensional continuous world. Results obtained in the present investigation reveal the effectiveness of the proposed method.

  11. Place preference and vocal learning rely on distinct reinforcers in songbirds.

    Science.gov (United States)

    Murdoch, Don; Chen, Ruidong; Goldberg, Jesse H

    2018-04-30

    In reinforcement learning (RL) agents are typically tasked with maximizing a single objective function such as reward. But it remains poorly understood how agents might pursue distinct objectives at once. In machines, multiobjective RL can be achieved by dividing a single agent into multiple sub-agents, each of which is shaped by agent-specific reinforcement, but it remains unknown if animals adopt this strategy. Here we use songbirds to test if navigation and singing, two behaviors with distinct objectives, can be differentially reinforced. We demonstrate that strobe flashes aversively condition place preference but not song syllables. Brief noise bursts aversively condition song syllables but positively reinforce place preference. Thus distinct behavior-generating systems, or agencies, within a single animal can be shaped by correspondingly distinct reinforcement signals. Our findings suggest that spatially segregated vocal circuits can solve a credit assignment problem associated with multiobjective learning.

  12. eFSM--a novel online neural-fuzzy semantic memory model.

    Science.gov (United States)

    Tung, Whye Loon; Quek, Chai

    2010-01-01

    Fuzzy rule-based systems (FRBSs) have been successfully applied to many areas. However, traditional fuzzy systems are often manually crafted, and their rule bases that represent the acquired knowledge are static and cannot be trained to improve the modeling performance. This subsequently leads to intensive research on the autonomous construction and tuning of a fuzzy system directly from the observed training data to address the knowledge acquisition bottleneck, resulting in well-established hybrids such as neural-fuzzy systems (NFSs) and genetic fuzzy systems (GFSs). However, the complex and dynamic nature of real-world problems demands that fuzzy rule-based systems and models be able to adapt their parameters and ultimately evolve their rule bases to address the nonstationary (time-varying) characteristics of their operating environments. Recently, considerable research efforts have been directed to the study of evolving Takagi-Sugeno (T-S)-type NFSs based on the concept of incremental learning. In contrast, there are very few incremental learning Mamdani-type NFSs reported in the literature. Hence, this paper presents the evolving neural-fuzzy semantic memory (eFSM) model, a neural-fuzzy Mamdani architecture with a data-driven progressively adaptive structure (i.e., rule base) based on incremental learning. Issues related to the incremental learning of the eFSM rule base are carefully investigated, and a novel parameter learning approach is proposed for the tuning of the fuzzy set parameters in eFSM. The proposed eFSM model elicits highly interpretable semantic knowledge in the form of Mamdani-type if-then fuzzy rules from low-level numeric training data. These Mamdani fuzzy rules define the computing structure of eFSM and are incrementally learned with the arrival of each training data sample. New rules are constructed from the emergence of novel training data and obsolete fuzzy rules that no longer describe the recently observed data trends are pruned. This

  13. A reward optimization method based on action subrewards in hierarchical reinforcement learning.

    Science.gov (United States)

    Fu, Yuchen; Liu, Quan; Ling, Xionghong; Cui, Zhiming

    2014-01-01

    Reinforcement learning (RL) is one kind of interactive learning method. Its main characteristics are "trial and error" and "related reward." A hierarchical reinforcement learning method based on action subrewards is proposed to address the "curse of dimensionality" (the state space grows exponentially with the number of features) and low convergence speed. The method can reduce state spaces greatly and choose actions with favorable purpose and efficiency so as to optimize the reward function and enhance convergence speed. Applied to online learning in the game of Tetris, the experimental results show that the convergence speed of the algorithm is evidently enhanced by the new method, which combines a hierarchical reinforcement learning algorithm with action subrewards. The "curse of dimensionality" problem is also solved to a certain extent by the hierarchical method. The performance with different parameters is compared and analyzed as well.
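
    The subreward idea can be sketched as follows: each subtask learns from its own per-action subreward over a small state space, instead of one learner facing the full joint space. The subtasks, actions, and subreward values here are toy stand-ins, not the paper's Tetris setup.

```python
import random

# Sketch of the action-subreward idea described above: the global reward is
# decomposed into per-action subrewards inside each subtask, so each
# subtask's smaller space is learned with its own signal (all values invented).
alpha, epsilon = 0.5, 0.1
subtasks = {"position": ["left", "right"], "rotate": ["cw", "ccw"]}
Q = {task: {a: 0.0 for a in acts} for task, acts in subtasks.items()}

def subreward(task: str, action: str) -> float:
    # Hypothetical per-action subreward; in practice derived from the domain.
    return 1.0 if (task, action) in {("position", "left"), ("rotate", "cw")} else 0.0

for _ in range(500):
    for task, acts in subtasks.items():   # each subtask learns independently
        a = random.choice(acts) if random.random() < epsilon \
            else max(Q[task], key=Q[task].get)
        r = subreward(task, a)
        Q[task][a] += alpha * (r - Q[task][a])   # one-step bandit-style update

print(Q)
```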

  14. Joint Extraction of Entities and Relations Using Reinforcement Learning and Deep Learning

    Directory of Open Access Journals (Sweden)

    Yuntian Feng

    2017-01-01

    Full Text Available We use both reinforcement learning and deep learning to simultaneously extract entities and relations from unstructured texts. For reinforcement learning, we model the task as a two-step decision process. Deep learning is used to automatically capture the most important information from unstructured texts, which represent the state in the decision process. By designing the reward function per step, our proposed method can pass the information of entity extraction to relation extraction and obtain feedback in order to extract entities and relations simultaneously. Firstly, we use bidirectional LSTM to model the context information, which realizes preliminary entity extraction. On the basis of the extraction results, attention based method can represent the sentences that include target entity pair to generate the initial state in the decision process. Then we use Tree-LSTM to represent relation mentions to generate the transition state in the decision process. Finally, we employ Q-Learning algorithm to get control policy π in the two-step decision process. Experiments on ACE2005 demonstrate that our method attains better performance than the state-of-the-art method and gets a 2.4% increase in recall-score.

  15. Joint Extraction of Entities and Relations Using Reinforcement Learning and Deep Learning.

    Science.gov (United States)

    Feng, Yuntian; Zhang, Hongjun; Hao, Wenning; Chen, Gang

    2017-01-01

    We use both reinforcement learning and deep learning to simultaneously extract entities and relations from unstructured texts. For reinforcement learning, we model the task as a two-step decision process. Deep learning is used to automatically capture the most important information from unstructured texts, which represent the state in the decision process. By designing the reward function per step, our proposed method can pass the information of entity extraction to relation extraction and obtain feedback in order to extract entities and relations simultaneously. Firstly, we use bidirectional LSTM to model the context information, which realizes preliminary entity extraction. On the basis of the extraction results, attention based method can represent the sentences that include target entity pair to generate the initial state in the decision process. Then we use Tree-LSTM to represent relation mentions to generate the transition state in the decision process. Finally, we employ Q-Learning algorithm to get control policy π in the two-step decision process. Experiments on ACE2005 demonstrate that our method attains better performance than the state-of-the-art method and gets a 2.4% increase in recall-score.

  16. Pleasurable music affects reinforcement learning according to the listener

    Science.gov (United States)

    Gold, Benjamin P.; Frank, Michael J.; Bogert, Brigitte; Brattico, Elvira

    2013-01-01

    Mounting evidence links the enjoyment of music to brain areas implicated in emotion and the dopaminergic reward system. In particular, dopamine release in the ventral striatum seems to play a major role in the rewarding aspect of music listening. Striatal dopamine also influences reinforcement learning, such that subjects with greater dopamine efficacy learn better to approach rewards while those with lesser dopamine efficacy learn better to avoid punishments. In this study, we explored the practical implications of musical pleasure through its ability to facilitate reinforcement learning via non-pharmacological dopamine elicitation. Subjects from a wide variety of musical backgrounds chose a pleasurable and a neutral piece of music from an experimenter-compiled database, and then listened to one or both of these pieces (according to pseudo-random group assignment) as they performed a reinforcement learning task dependent on dopamine transmission. We assessed musical backgrounds as well as typical listening patterns with the new Helsinki Inventory of Music and Affective Behaviors (HIMAB), and separately investigated behavior for the training and test phases of the learning task. Subjects with more musical experience trained better with neutral music and tested better with pleasurable music, while those with less musical experience exhibited the opposite effect. HIMAB results regarding listening behaviors and subjective music ratings indicate that these effects arose from different listening styles: namely, more affective listening in non-musicians and more analytical listening in musicians. In conclusion, musical pleasure was able to influence task performance, and the shape of this effect depended on group and individual factors. These findings have implications in affective neuroscience, neuroaesthetics, learning, and music therapy. PMID:23970875

  17. Reinforcement learning controller design for affine nonlinear discrete-time systems using online approximators.

    Science.gov (United States)

    Yang, Qinmin; Jagannathan, Sarangapani

    2012-04-01

    In this paper, reinforcement learning state- and output-feedback-based adaptive critic controller designs are proposed by using online approximators (OLAs) for general multi-input multi-output affine unknown nonlinear discrete-time systems in the presence of bounded disturbances. The proposed controller design has two entities, an action network that is designed to produce an optimal signal and a critic network that evaluates the performance of the action network. The critic estimates the cost-to-go function, which is tuned online using recursive equations derived from heuristic dynamic programming. Here, neural networks (NNs) are used for both the action and critic networks, whereas any OLAs, such as radial basis functions, splines, fuzzy logic, etc., can be utilized. For the output-feedback counterpart, an additional NN is designated as the observer to estimate the unavailable system states, and thus a separation principle is not required. The NN weight tuning laws for the controller schemes are also derived while ensuring uniform ultimate boundedness of the closed-loop system using Lyapunov theory. Finally, the effectiveness of the two controllers is tested in simulation on a pendulum balancing system and a two-link robotic arm system.
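
    The action/critic split described above can be illustrated with a generic discrete actor-critic, in which the critic's TD error tunes the actor. This is a sketch under assumed toy chain dynamics, not the paper's NN-based online-approximator design.

```python
import numpy as np

# Compact sketch of the action/critic split described above: the critic
# learns a value estimate and its TD error tunes the actor. This is a
# generic discrete actor-critic, not the paper's NN-based OLA design.
rng = np.random.default_rng(4)
n_states, n_actions = 5, 2
V = np.zeros(n_states)                      # critic: state-value estimate
prefs = np.zeros((n_states, n_actions))     # actor: action preferences
alpha_c, alpha_a, gamma = 0.1, 0.05, 0.95

def step(s, a):                  # toy chain dynamics (assumed, illustrative)
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s2, 1.0 if s2 == n_states - 1 else 0.0

s = 0
for t in range(5000):
    probs = np.exp(prefs[s]) / np.exp(prefs[s]).sum()   # softmax policy
    a = rng.choice(n_actions, p=probs)
    s2, r = step(s, a)
    delta = r + gamma * V[s2] - V[s]        # critic evaluates the actor
    V[s] += alpha_c * delta
    prefs[s, a] += alpha_a * delta          # TD error adjusts the action side
    s = s2 if s2 != n_states - 1 else 0     # restart after reaching the goal

print("state values:", V.round(2))
```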

  18. Simple Neuron-Fuzzy Tool for Small Control Devices

    DEFF Research Database (Denmark)

    Madsen, Per Printz

    2008-01-01

    Small control computers, running a kind of Fuzzy controller, are more and more used in many systems, from household machines to large industrial systems. The purpose of this paper is firstly to describe a tool that is easy to use for implementing self-learning Fuzzy systems that can be executed… can be described by four different kinds of membership functions. The output fuzzification is based on singletons. The rule base can be written in a natural language. The result of the learning is a new version of the Fuzzy system described in the FuNNy language. A simple shower control example is shown. This example shows that FuNNy is able to control the shower and that the learning is able to optimize the Fuzzy system.

  19. An Analysis of Stochastic Game Theory for Multiagent Reinforcement Learning

    National Research Council Canada - National Science Library

    Bowling, Michael

    2000-01-01

    In this paper we contribute a comprehensive presentation of the relevant techniques for solving stochastic games from both the game theory and reinforcement learning communities. We examine the assumptions and limitations of these algorithms, and identify similarities between these algorithms, single-agent reinforcement learners, and basic game theory techniques.

  20. Reinforcement function design and bias for efficient learning in mobile robots

    International Nuclear Information System (INIS)

    Touzet, C.; Santos, J.M.

    1998-01-01

    The main paradigm in the sub-symbolic learning robot domain is the reinforcement learning method. Various techniques have been developed to deal with the memorization/generalization problem, demonstrating the superior ability of artificial neural network implementations. In this paper, the authors address the issue of designing the reinforcement so as to optimize the exploration part of the learning. They also present and summarize work related to the use of bias intended to achieve the effective synthesis of the desired behavior. Demonstrative experiments involving a self-organizing map implementation of the Q-learning and real mobile robots (Nomad 200 and Khepera) in a task of obstacle avoidance behavior synthesis are described. 3 figs., 5 tabs

  1. Towards autonomous neuroprosthetic control using Hebbian reinforcement learning.

    Science.gov (United States)

    Mahmoudi, Babak; Pohlmeyer, Eric A; Prins, Noeline W; Geng, Shijia; Sanchez, Justin C

    2013-12-01

    Our goal was to design an adaptive neuroprosthetic controller that could learn the mapping from neural states to prosthetic actions and automatically adjust adaptation using only a binary evaluative feedback as a measure of desirability/undesirability of performance. Hebbian reinforcement learning (HRL) in a connectionist network was used for the design of the adaptive controller. The method combines the efficiency of supervised learning with the generality of reinforcement learning. The convergence properties of this approach were studied using both closed-loop control simulations and open-loop simulations that used primate neural data from robot-assisted reaching tasks. The HRL controller was able to perform classification and regression tasks using its episodic and sequential learning modes, respectively. In our experiments, the HRL controller quickly achieved convergence to an effective control policy, followed by robust performance. The controller also automatically stopped adapting the parameters after converging to a satisfactory control policy. Additionally, when the input neural vector was reorganized, the controller resumed adaptation to maintain performance. By estimating an evaluative feedback directly from the user, the HRL control algorithm may provide an efficient method for autonomous adaptation of neuroprosthetic systems. This method may enable the user to teach the controller the desired behavior using only a simple feedback signal.
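
    A minimal sketch of a Hebbian reinforcement rule of the kind described above: weight changes are proportional to the product of pre- and postsynaptic activity, gated by a binary evaluative signal. The dimensions, input statistics, and hidden rule below are all invented, and this is not the authors' connectionist architecture.

```python
import numpy as np

# Minimal sketch of a Hebbian reinforcement rule: weights change in
# proportion to the pre/post activity product, gated by a binary evaluative
# signal (+1 desirable, -1 undesirable). Dimensions are invented.
rng = np.random.default_rng(5)
w = rng.normal(0, 0.1, (2, 8))         # maps 8 "neural" inputs to 2 actions
eta = 0.05

for trial in range(1000):
    x = rng.random(8)                  # neural state vector
    y = w @ x                          # action-unit activations
    action = int(np.argmax(y))
    desired = int(x[:4].sum() > x[4:].sum())        # hidden rule to be learned
    feedback = 1.0 if action == desired else -1.0   # binary evaluative signal
    post = np.zeros(2)
    post[action] = 1.0
    w += eta * feedback * np.outer(post, x)   # Hebbian update gated by feedback

correct = 0
for _ in range(200):                   # quick held-out check
    x = rng.random(8)
    correct += int(np.argmax(w @ x)) == int(x[:4].sum() > x[4:].sum())
print("held-out accuracy:", correct / 200)
```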

  2. Neurofeedback in Learning Disabled Children: Visual versus Auditory Reinforcement.

    Science.gov (United States)

    Fernández, Thalía; Bosch-Bayard, Jorge; Harmony, Thalía; Caballero, María I; Díaz-Comas, Lourdes; Galán, Lídice; Ricardo-Garcell, Josefina; Aubert, Eduardo; Otero-Ojeda, Gloria

    2016-03-01

    Children with learning disabilities (LD) frequently have an EEG characterized by an excess of theta and a deficit of alpha activities. Neurofeedback (NFB) using an auditory stimulus as reinforcer has proven to be a useful tool to treat LD children by positively reinforcing decreases of the theta/alpha ratio. The aim of the present study was to optimize the NFB procedure by comparing the efficacy of visual (with eyes open) versus auditory (with eyes closed) reinforcers. Twenty LD children with an abnormally high theta/alpha ratio were randomly assigned to the Auditory or the Visual group, where a 500 Hz tone or a visual stimulus (a white square), respectively, was used as a positive reinforcer when the value of the theta/alpha ratio was reduced. Both groups had signs consistent with EEG maturation, but only the Auditory Group showed behavioral/cognitive improvements. In conclusion, the auditory reinforcer was more efficacious in reducing the theta/alpha ratio, and it improved the cognitive abilities more than the visual reinforcer.
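
    The quantity being trained in this protocol, the theta/alpha power ratio, can be computed from an EEG epoch with a plain periodogram, as in the sketch below (synthetic signal; the conventional band edges of 4-8 Hz for theta and 8-13 Hz for alpha are assumed, and the study's exact spectral method may differ).

```python
import numpy as np

# Sketch of the quantity trained in the NFB protocol above: the theta/alpha
# power ratio of an EEG segment, computed here from synthetic data with a
# plain FFT periodogram (band edges follow the usual conventions).
fs = 256                                  # sampling rate in Hz (assumed)
t = np.arange(0, 4, 1 / fs)               # 4-second epoch
eeg = (np.sin(2 * np.pi * 6 * t)          # strong theta component (6 Hz)
       + 0.5 * np.sin(2 * np.pi * 10 * t) # weaker alpha component (10 Hz)
       + 0.3 * np.random.default_rng(6).normal(size=t.size))

freqs = np.fft.rfftfreq(eeg.size, 1 / fs)
power = np.abs(np.fft.rfft(eeg)) ** 2

def band_power(lo, hi):
    return power[(freqs >= lo) & (freqs < hi)].sum()

ratio = band_power(4, 8) / band_power(8, 13)   # theta (4-8 Hz) / alpha (8-13 Hz)
print("theta/alpha ratio:", round(ratio, 2))   # reinforcer fires when this drops
```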

  3. Reinforcement learning for optimal control of low exergy buildings

    International Nuclear Information System (INIS)

    Yang, Lei; Nagy, Zoltan; Goffin, Philippe; Schlueter, Arno

    2015-01-01

    Highlights:
    • Implementation of reinforcement learning control for LowEx Building systems.
    • Learning allows adaptation to the local environment without prior knowledge.
    • Presentation of reinforcement learning control for real-life applications.
    • Discussion of the applicability for real-life situations.
    Abstract: Over a third of the anthropogenic greenhouse gas (GHG) emissions stem from cooling and heating buildings, due to their fossil fuel based operation. Low exergy building systems are a promising approach to reduce energy consumption as well as GHG emissions. They consist of renewable energy technologies, such as PV, PV/T and heat pumps. Since careful tuning of parameters is required, a manual setup may result in sub-optimal operation. A model predictive control approach is unnecessarily complex due to the required model identification. Therefore, in this work we present a reinforcement learning control (RLC) approach. The studied building consists of a PV/T array for solar heat and electricity generation, as well as geothermal heat pumps. We present RLC for the PV/T array and the full building model. Two methods, Tabular Q-learning and Batch Q-learning with Memory Replay, are implemented with real building settings and actual weather conditions in a Matlab/Simulink framework. The performance is evaluated against standard rule-based control (RBC). We investigated different neural network structures and found that some outperformed RBC already during the learning phase. Overall, every RLC strategy for PV/T outperformed RBC by over 10% after the third year. Likewise, for the full building, RLC outperforms RBC in terms of meeting the heating demand, maintaining the optimal operation temperature and compensating more effectively for ground heat. This makes it possible to reduce the engineering costs associated with the setup of these systems, as well as to decrease the return-on-investment period, both of which are necessary to create a sustainable, zero-emission building
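
    The two Q-learning variants compared above can be sketched on a toy thermostat-like task: the tabular learner updates online, and the replay variant additionally re-fits from a memory of stored transitions. The states, actions, dynamics, and rewards below are invented stand-ins, not the paper's building model.

```python
import random
from collections import deque

# Sketch of tabular Q-learning with a small memory-replay batch, on a toy
# thermostat task. States, actions and rewards are invented.
alpha, gamma, epsilon = 0.1, 0.9, 0.1
states, actions = range(5), [0, 1]            # 5 temperature bins; pump off/on
Q = {(s, a): 0.0 for s in states for a in actions}
memory = deque(maxlen=1000)

def env(s, a):        # toy dynamics: heating raises the bin, idling lowers it
    s2 = min(s + 1, 4) if a == 1 else max(s - 1, 0)
    return s2, 1.0 if s2 == 2 else -abs(s2 - 2) * 0.1   # comfort around bin 2

s = 0
for step in range(3000):
    a = random.choice(actions) if random.random() < epsilon \
        else max(actions, key=lambda a_: Q[(s, a_)])
    s2, r = env(s, a)
    memory.append((s, a, r, s2))
    # Online tabular update on the newest transition, plus a replayed batch.
    batch = [memory[-1]] + random.sample(list(memory), min(8, len(memory)))
    for (ms, ma, mr, ms2) in batch:
        target = mr + gamma * max(Q[(ms2, b)] for b in actions)
        Q[(ms, ma)] += alpha * (target - Q[(ms, ma)])
    s = s2

print("greedy action per bin:",
      {s_: max(actions, key=lambda a_: Q[(s_, a_)]) for s_ in states})
```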

  4. Intranasal oxytocin enhances socially-reinforced learning in rhesus monkeys

    Directory of Open Access Journals (Sweden)

    Lisa A Parr

    2014-09-01

    Full Text Available There are currently no drugs approved for the treatment of social deficits associated with autism spectrum disorders (ASD). One hypothesis for these deficits is that individuals with ASD lack the motivation to attend to social cues because those cues are not implicitly rewarding. Therefore, any drug that could enhance the rewarding quality of social stimuli could have a profound impact on the treatment of ASD and other social disorders. Oxytocin (OT) is a neuropeptide that has been effective in enhancing social cognition and social reward in humans. The present study examined the ability of OT to selectively enhance learning after social compared to nonsocial reward in rhesus monkeys, an important species for modeling the neurobiology of social behavior in humans. Monkeys were required to learn an implicit visual matching task after receiving either intranasal (IN) OT or placebo (saline). Correct trials were rewarded with the presentation of positive and negative social (play faces/threat faces) or nonsocial (banana/cage locks) stimuli, plus food. Incorrect trials were not rewarded. Results demonstrated a strong effect of socially-reinforced learning: monkeys performed significantly better when reinforced with social versus nonsocial stimuli. Additionally, socially-reinforced learning was significantly better and occurred faster after IN-OT compared to placebo treatment. Performance in the IN-OT, but not placebo, condition was also significantly better when the reinforcement stimuli were emotionally positive compared to negative facial expressions. These data support the hypothesis that OT may function to enhance prosocial behavior in primates by increasing the rewarding quality of emotionally positive, social compared to emotionally negative or nonsocial images. These data also support the use of the rhesus monkey as a model for exploring the neurobiological basis of social behavior and its impairment.

  5. applying reinforcement learning to the weapon assignment problem

    African Journals Online (AJOL)

    ismith

    Carlo (MC) control algorithm with exploring starts (MCES), and an off-policy … closest to the threat should fire (that weapon also had the highest probability to … Monte Carlo … “Reinforcement learning: Theory, methods and application to…”

  6. A Fuzzy Knowledge Representation Model for Student Performance Assessment

    DEFF Research Database (Denmark)

    Badie, Farshad

    Knowledge representation models based on Fuzzy Description Logics (DLs) can provide a foundation for reasoning in intelligent learning environments. While basic DLs are suitable for expressing crisp concepts and binary relationships, Fuzzy DLs are capable of processing degrees of truth/completeness about vague or imprecise information. This paper tackles the issue of representing fuzzy classes using OWL2 in a dataset describing Performance Assessment Results of Students (PARS).

  7. Dynamic Fuzzy Logic-Based Quality of Interaction within Blended-Learning: The Rare and Contemporary Dance Cases

    Science.gov (United States)

    Dias, Sofia B.; Diniz, José A.; Hadjileontiadis, Leontios J.

    2014-01-01

    The combination of the process of pedagogical planning within the Blended (b-) learning environment with the users' quality of interaction ("QoI") with the Learning Management System (LMS) is explored here. The required "QoI" (both for professors and students) is estimated by adopting a fuzzy logic-based modeling approach,…

  8. Reinforcement Learning for Ramp Control: An Analysis of Learning Parameters

    Directory of Open Access Journals (Sweden)

    Chao Lu

    2016-08-01

    Full Text Available Reinforcement Learning (RL) has been proposed to deal with ramp control problems under dynamic traffic conditions; however, there is a lack of sufficient research on the behaviour and impacts of different learning parameters. This paper describes a ramp control agent based on the RL mechanism and thoroughly analyzes the influence of three learning parameters, namely the learning rate, discount rate and action selection parameter, on algorithm performance. Two indices for learning speed and convergence stability were used to measure algorithm performance, based on which a series of simulation-based experiments were designed and conducted using a macroscopic traffic flow model. Simulation results showed that, compared with the discount rate, the learning rate and action selection parameter had more remarkable impacts on algorithm performance. Based on the analysis, some suggestions about how to select suitable parameter values that can achieve superior performance are provided.
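
    The role of the three parameters can be made concrete with a bare-bones Q-learning loop that exposes the learning rate, discount rate, and the epsilon of epsilon-greedy action selection; the two-state environment below is an invented stand-in, not the paper's macroscopic traffic model.

```python
import random

# Bare-bones Q-learning exposing the three parameters analyzed above
# (learning rate alpha, discount rate gamma, action-selection epsilon)
# on a toy two-state environment (invented for illustration).
def train(alpha: float, gamma: float, epsilon: float, steps: int = 2000) -> float:
    Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
    s, total = 0, 0.0
    for _ in range(steps):
        a = random.choice((0, 1)) if random.random() < epsilon \
            else max((0, 1), key=lambda a_: Q[(s, a_)])
        s2 = a                      # action directly selects the next state
        r = 1.0 if (s, a) == (0, 1) else 0.1
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in (0, 1)) - Q[(s, a)])
        s, total = s2, total + r
    return total / steps

for alpha in (0.01, 0.1, 0.5):      # learning rate dominates convergence speed
    print(f"alpha={alpha}: mean reward {train(alpha, 0.9, 0.1):.3f}")
```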

  9. Applying reinforcement learning to the weapon assignment problem in air defence

    CSIR Research Space (South Africa)

    Mouton, H

    2011-12-01

    Full Text Available The techniques investigated in this article were two methods from the machine-learning subfield of reinforcement learning (RL), namely a Monte Carlo (MC) control algorithm with exploring starts (MCES), and an off-policy temporal-difference (TD) learning...

  10. The combination of appetitive and aversive reinforcers and the nature of their interaction during auditory learning.

    Science.gov (United States)

    Ilango, A; Wetzel, W; Scheich, H; Ohl, F W

    2010-03-31

    Learned changes in behavior can be elicited by either appetitive or aversive reinforcers. It is, however, not clear whether the two types of motivation (approaching appetitive stimuli and avoiding aversive stimuli) drive learning in the same or different ways, nor is their interaction understood in situations where the two types are combined in a single experiment. To investigate this question we have developed a novel learning paradigm for Mongolian gerbils, which not only allows rewards and punishments to be presented in isolation or in combination with each other, but also can use these opposite reinforcers to drive the same learned behavior. Specifically, we studied learning of tone-conditioned hurdle crossing in a shuttle box driven by either an appetitive reinforcer (brain stimulation reward) or an aversive reinforcer (electrical footshock), or by a combination of both. Combination of the two reinforcers potentiated speed of acquisition, led to maximum possible performance, and delayed extinction as compared to either reinforcer alone. Additional experiments, using partial reinforcement protocols and experiments in which one of the reinforcers was omitted after the animals had been previously trained with the combination of both reinforcers, indicated that appetitive and aversive reinforcers operated together but acted in different ways: in this particular experimental context, punishment appeared to be more effective for initial acquisition and reward more effective to maintain a high level of conditioned responses (CRs). The results imply that learning mechanisms in problem solving were maximally effective when the initial punishment of mistakes was combined with the subsequent rewarding of correct performance. Copyright 2010 IBRO. Published by Elsevier Ltd. All rights reserved.

  11. Reinforcement and Systemic Machine Learning for Decision Making

    CERN Document Server

    Kulkarni, Parag

    2012-01-01

    Reinforcement and Systemic Machine Learning for Decision Making There are always difficulties in making machines that learn from experience. Complete information is not always available, or it becomes available in bits and pieces over a period of time. With respect to systemic learning, there is a need to understand the impact of decisions and actions on a system over that period of time. This book takes a holistic approach to addressing that need and presents a new paradigm, creating new learning applications and, ultimately, more intelligent machines. The first book of its kind in this new an

  12. Reinforcement active learning in the vibrissae system: optimal object localization.

    Science.gov (United States)

    Gordon, Goren; Dorfman, Nimrod; Ahissar, Ehud

    2013-01-01

    Rats move their whiskers to acquire information about their environment. It has been observed that they palpate novel objects and objects they are required to localize in space. We analyze whisker-based object localization using two complementary paradigms, namely, active learning and intrinsic-reward reinforcement learning. Active learning algorithms select the next training samples according to the hypothesized solution in order to better discriminate between correct and incorrect labels. Intrinsic-reward reinforcement learning uses prediction errors as the reward to an actor-critic design, such that behavior converges to the one that optimizes the learning process. We show that in the context of object localization, the two paradigms result in palpation whisking as their respective optimal solution. These results suggest that rats may employ principles of active learning and/or intrinsic reward in tactile exploration and can guide future research to seek the underlying neuronal mechanisms that implement them. Furthermore, these paradigms are easily transferable to biomimetic whisker-based artificial sensors and can improve the active exploration of their environment. Copyright © 2012 Elsevier Ltd. All rights reserved.

  13. Neural correlates of reinforcement learning and social preferences in competitive bidding.

    Science.gov (United States)

    van den Bos, Wouter; Talwar, Arjun; McClure, Samuel M

    2013-01-30

    In competitive social environments, people often deviate from what rational choice theory prescribes, resulting in losses or suboptimal monetary gains. We investigate how competition affects learning and decision-making in a common value auction task. During the experiment, groups of five human participants were simultaneously scanned using MRI while playing the auction task. We first demonstrate that bidding is well characterized by reinforcement learning with biased reward representations dependent on social preferences. Indicative of reinforcement learning, we found that estimated trial-by-trial prediction errors correlated with activity in the striatum and ventromedial prefrontal cortex. Additionally, we found that individual differences in social preferences were related to activity in the temporal-parietal junction and anterior insula. Connectivity analyses suggest that monetary and social value signals are integrated in the ventromedial prefrontal cortex and striatum. Based on these results, we argue for a novel mechanistic account for the integration of reinforcement history and social preferences in competitive decision-making.

  14. Switched Two-Level H∞ and Robust Fuzzy Learning Control of an Overhead Crane

    Directory of Open Access Journals (Sweden)

    Kao-Ting Hung

    2013-01-01

    Full Text Available Overhead cranes are typical dynamic systems that can be modeled as a combination of a nominal linear part and a highly nonlinear part. For such systems, we propose a control scheme that deals with each part separately, yet ensures global Lyapunov stability. The former part is readily controllable by H∞ PDC techniques, and the latter part is compensated by a fuzzy mixture of affine constants, leaving the remaining unmodeled dynamics or modeling error under robust learning control using the Nelder-Mead simplex algorithm. Comparison with the adaptive fuzzy control method is given via simulation studies, and the validity of the proposed control scheme is demonstrated by experiments on a prototype crane system.
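
    The Nelder-Mead simplex step lends itself to a short illustration. The sketch below tunes hypothetical controller parameters by derivative-free minimization of a simulated tracking cost via SciPy; simulate_crane and the cost definition are assumptions, not the paper's model.

```python
import numpy as np
from scipy.optimize import minimize

def tracking_cost(params):
    """Run a (hypothetical) crane simulation and return accumulated error."""
    errors = simulate_crane(params)           # hypothetical simulator
    return float(np.sum(np.square(errors)))   # quadratic tracking cost

# Derivative-free simplex search over four controller parameters.
result = minimize(tracking_cost, x0=np.zeros(4), method="Nelder-Mead",
                  options={"xatol": 1e-4, "fatol": 1e-4, "maxiter": 500})
best_params = result.x
```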

  15. Design of fuzzy systems using neurofuzzy networks.

    Science.gov (United States)

    Figueiredo, M; Gomide, F

    1999-01-01

    This paper introduces a systematic approach for fuzzy system design based on a class of neural fuzzy networks built upon a general neuron model. The network structure is such that it encodes the knowledge learned in the form of if-then fuzzy rules and processes data following fuzzy reasoning principles. The technique provides a mechanism to obtain rules covering the whole input/output space as well as the membership functions (including their shapes) for each input variable. Such characteristics are of utmost importance in fuzzy systems design and application. In addition, after learning, it is very simple to extract fuzzy rules in the linguistic form. The network has universal approximation capability, a property very useful in, e.g., modeling and control applications. Here we focus on function approximation problems as a vehicle to illustrate its usefulness and to evaluate its performance. Comparisons with alternative approaches are also included. Both noise-free and noisy data have been studied and considered in the computational experiments. The neural fuzzy network developed here, and consequently the underlying approach, has been shown to provide good results from the accuracy, complexity, and system design points of view.

  16. Influence of the Migration Process on the Learning Performances of Fuzzy Knowledge Bases

    DEFF Research Database (Denmark)

    Akrout, Khaled; Baron, Luc; Balazinski, Marek

    2007-01-01

    This paper presents the influence of the process of migration between populations in GENO-FLOU, an environment for learning fuzzy knowledge bases by genetic algorithms. Initially the algorithm did not use migration. For learning, the algorithm uses a hybrid coding, binary for the rule base and real-valued for the data base. This hybrid coding, used with a set of specialized reproduction operators, has proven to be an effective learning environment. Simulations were made in this environment by adding a process of migration. While varying the number of populations...

  17. Bi-directional effect of increasing doses of baclofen on reinforcement learning

    Directory of Open Access Journals (Sweden)

    Jean eTerrier

    2011-07-01

    Full Text Available In rodents as well as in humans, efficient reinforcement learning depends on dopamine (DA) released from ventral tegmental area (VTA) neurons. It has been shown that in brain slices of mice, GABAB-receptor agonists at low concentrations increase the firing frequency of VTA-DA neurons, while high concentrations reduce the firing frequency. It remains elusive, however, whether baclofen can modulate reinforcement learning. Here, in a double-blind study in 34 healthy human volunteers, we tested the effects of a low and a high concentration of oral baclofen in a gambling task associated with monetary reward. A low (20 mg) dose of baclofen increased the efficiency of reward-associated learning but had no effect on the avoidance of monetary loss. A high (50 mg) dose of baclofen, on the other hand, did not affect the learning curve. At the end of the task, subjects who received 20 mg baclofen p.o. were more accurate in choosing the symbol linked to the highest probability of earning money compared to the control group (89.55±1.39% vs 81.07±1.55%, p=0.002). Our results support a model where baclofen, at low concentrations, causes a disinhibition of DA neurons, increases DA levels and thus facilitates reinforcement learning.

  18. Traffic light control by multiagent reinforcement learning systems

    NARCIS (Netherlands)

    Bakker, B.; Whiteson, S.; Kester, L.; Groen, F.C.A.; Babuška, R.; Groen, F.C.A.

    2010-01-01

    Traffic light control is one of the main means of controlling road traffic. Improving traffic control is important because it can lead to higher traffic throughput and reduced traffic congestion. This chapter describes multiagent reinforcement learning techniques for automatic optimization of

  20. A Robust Cooperated Control Method with Reinforcement Learning and Adaptive H∞ Control

    Science.gov (United States)

    Obayashi, Masanao; Uchiyama, Shogo; Kuremoto, Takashi; Kobayashi, Kunikazu

    This study proposes a robust cooperated control method combining reinforcement learning with robust control. A remarkable characteristic of reinforcement learning is that it does not require a model formula; however, it does not guarantee the stability of the system. On the other hand, a robust control system guarantees stability and robustness but requires a model formula. We employ both the actor-critic method, a kind of reinforcement learning that controls continuous-valued actions with a minimal amount of computation, and traditional robust control, that is, H∞ control. The proposed method was compared with the conventional control method, that is, the actor-critic method alone, through computer simulation of controlling the angle and the position of a crane system, and the simulation results showed the effectiveness of the proposed method.
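
    A minimal sketch of the actor-critic half of such a scheme (the H∞ part is omitted) might look as follows, assuming a linear critic, a Gaussian policy over a continuous action, and hypothetical feature map and environment interfaces; it is not the paper's implementation.

```python
import numpy as np

def actor_critic_step(w, theta, s, env, phi, sigma=0.1,
                      alpha_w=0.05, alpha_theta=0.01, gamma=0.95):
    x = phi(s)                                       # hypothetical feature vector
    action = theta @ x + sigma * np.random.randn()   # Gaussian exploration
    s_next, reward = env.step(s, action)             # hypothetical environment
    # Critic: one-step TD error from linear value estimates.
    delta = reward + gamma * (w @ phi(s_next)) - (w @ x)
    w += alpha_w * delta * x                         # critic update
    # Actor: policy-gradient step; the score of a Gaussian policy is
    # (action - mean) / sigma^2 times the feature vector.
    theta += alpha_theta * delta * (action - theta @ x) / sigma**2 * x
    return s_next
```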

  1. TEXPLORE temporal difference reinforcement learning for robots and time-constrained domains

    CERN Document Server

    Hester, Todd

    2013-01-01

    This book presents and develops new reinforcement learning methods that enable fast and robust learning on robots in real-time. Robots have the potential to solve many problems in society, because of their ability to work in dangerous places doing necessary jobs that no one wants or is able to do. One barrier to their widespread deployment is that they are mainly limited to tasks where it is possible to hand-program behaviors for every situation that may be encountered. For robots to meet their potential, they need methods that enable them to learn and adapt to novel situations that they were not programmed for. Reinforcement learning (RL) is a paradigm for learning sequential decision making processes and could solve the problems of learning and adaptation on robots. This book identifies four key challenges that must be addressed for an RL algorithm to be practical for robotic control tasks. These RL for Robotics Challenges are: 1) it must learn in very few samples; 2) it must learn in domains with continuou...

  2. A New Fuzzy Cognitive Map Learning Algorithm for Speech Emotion Recognition

    OpenAIRE

    Zhang, Wei; Zhang, Xueying; Sun, Ying

    2017-01-01

    Selecting an appropriate recognition method is crucial in speech emotion recognition applications. However, the current methods do not consider the relationship between emotions. Thus, in this study, a speech emotion recognition system based on the fuzzy cognitive map (FCM) approach is constructed. Moreover, a new FCM learning algorithm for speech emotion recognition is proposed. This algorithm includes the use of the pleasure-arousal-dominance emotion scale to calculate the weights between e...

  3. Perceptual learning rules based on reinforcers and attention

    NARCIS (Netherlands)

    Roelfsema, Pieter R.; van Ooyen, Arjen; Watanabe, Takeo

    2010-01-01

    How does the brain learn those visual features that are relevant for behavior? In this article, we focus on two factors that guide plasticity of visual representations. First, reinforcers cause the global release of diffusive neuromodulatory signals that gate plasticity. Second, attentional feedback

  4. Optimizing microstimulation using a reinforcement learning framework.

    Science.gov (United States)

    Brockmeier, Austin J; Choi, John S; Distasio, Marcello M; Francis, Joseph T; Príncipe, José C

    2011-01-01

    The ability to provide sensory feedback is desired to enhance the functionality of neuroprosthetics. Somatosensory feedback provides closed-loop control to the motor system, which is lacking in feedforward neuroprosthetics. In the case of existing somatosensory function, a template of the natural response can be used as a template of the desired response elicited by electrical microstimulation. In the case of no initial training data, microstimulation parameters that produce responses close to the template must be selected in an online manner. We propose using reinforcement learning as a framework to balance the exploration of the parameter space and the continued selection of promising parameters for further stimulation. This approach avoids an explicit model of the neural response to stimulation. We explore a preliminary architecture, treating the task as a k-armed bandit, using offline data recorded for natural touch and thalamic microstimulation, and we examine the method's efficiency in exploring the parameter space while concentrating on promising parameter forms. The best matching stimulation parameters, from k = 68 different forms, are selected by the reinforcement learning algorithm consistently after 334 realizations.
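
    Treating parameter selection as a k-armed bandit can be sketched briefly. Below, each arm is one candidate stimulation parameter set and the reward is a hypothetical similarity score against the natural-touch template; the epsilon-greedy rule stands in for the paper's exact exploration strategy, and the defaults merely echo the k = 68 arms and 334 realizations mentioned above.

```python
import numpy as np

def bandit_select(k=68, n_trials=334, epsilon=0.1):
    estimates = np.zeros(k)   # running mean reward per arm
    counts = np.zeros(k)
    for _ in range(n_trials):
        if np.random.rand() < epsilon:
            arm = np.random.randint(k)            # explore a random arm
        else:
            arm = int(np.argmax(estimates))       # exploit the current best
        reward = match_to_template(arm)           # hypothetical similarity score
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return int(np.argmax(estimates))
```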

  5. Experiments with Online Reinforcement Learning in Real-Time Strategy Games

    DEFF Research Database (Denmark)

    Toftgaard Andersen, Kresten; Zeng, Yifeng; Dahl Christensen, Dennis

    2009-01-01

    Real-time strategy (RTS) games provide a challenging platform to implement online reinforcement learning (RL) techniques in a real application. The computer, as one game player, monitors its opponents' (human or other computer) strategies and then updates its own policy using RL methods. In this article, we first examine the suitability of applying online RL in various computer games. Reinforcement learning application depends on both RL complexity and the game features. We then propose a multi-layer framework for implementing online RL in an RTS game. The framework significantly reduces RL... the effectiveness of our proposed framework and sheds light on relevant issues in using online RL in RTS games...

  6. Fuzzy neural network theory and application

    CERN Document Server

    Liu, Puyin

    2004-01-01

    This book systematically synthesizes research achievements in the field of fuzzy neural networks in recent years. It also provides a comprehensive presentation of the developments in fuzzy neural networks, with regard to theory as well as their application to system modeling and image restoration. Special emphasis is placed on the fundamental concepts and architecture analysis of fuzzy neural networks. The book is unique in treating all kinds of fuzzy neural networks and their learning algorithms and universal approximations, and employing simulation examples which are carefully designed to he

  7. Temporal Memory Reinforcement Learning for the Autonomous Micro-mobile Robot Based-behavior

    Institute of Scientific and Technical Information of China (English)

    Yang Yujun(杨玉君); Cheng Junshi; Chen Jiapin; Li Xiaohai

    2004-01-01

    This paper presents temporal memory reinforcement learning for the behavior-based autonomous micro-mobile robot. Human beings have a memory oblivion process: what is memorized earlier is forgotten earlier, and only repeated things are remembered firmly. Inspired by this, the robot need not memorize all past states, which at the same time economizes the EMS memory space that is scarce in the MPU of our AMRobot. The proposed algorithm is an extension of Q-learning, which is an incremental reinforcement learning method. Simulation results have shown that the algorithm is valid.

  8. Learning to Run challenge solutions: Adapting reinforcement learning methods for neuromusculoskeletal environments

    OpenAIRE

    Kidziński, Łukasz; Mohanty, Sharada Prasanna; Ong, Carmichael; Huang, Zhewei; Zhou, Shuchang; Pechenko, Anton; Stelmaszczyk, Adam; Jarosik, Piotr; Pavlov, Mikhail; Kolesnikov, Sergey; Plis, Sergey; Chen, Zhibo; Zhang, Zhizheng; Chen, Jiale; Shi, Jun

    2018-01-01

    In the NIPS 2017 Learning to Run challenge, participants were tasked with building a controller for a musculoskeletal model to make it run as fast as possible through an obstacle course. Top participants were invited to describe their algorithms. In this work, we present eight solutions that used deep reinforcement learning approaches, based on algorithms such as Deep Deterministic Policy Gradient, Proximal Policy Optimization, and Trust Region Policy Optimization. Many solutions use similar ...

  9. FPGA implementation of neuro-fuzzy system with improved PSO learning.

    Science.gov (United States)

    Karakuzu, Cihan; Karakaya, Fuat; Çavuşlu, Mehmet Ali

    2016-07-01

    This paper presents the first hardware implementation of a neuro-fuzzy system (NFS) with metaheuristic learning ability on a field programmable gate array (FPGA). Metaheuristic learning of all NFS parameters is accomplished by using improved particle swarm optimization (iPSO). As a second novelty, a new functional approach, which does not require any memory or multiplier usage, is proposed for the Gaussian membership functions of the NFS. The NFS and its learning using iPSO are implemented on a Xilinx Virtex5 xc5vlx110-3ff1153, and the efficiency of the proposed implementation is tested on two dynamic system identification problems and on a licence plate detection problem as a practical application. Results indicate that the proposed NFS implementation and membership function approximation are as effective as the other approaches available in the literature but require less hardware resources. Copyright © 2016 Elsevier Ltd. All rights reserved.
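
    The paper's multiplier-free, memoryless approximation is not reproduced here; for orientation, this is only the reference Gaussian membership function that such an FPGA-friendly approximation must match.

```python
import numpy as np

def gaussian_mf(x, center, sigma):
    """Standard Gaussian membership degree in [0, 1]."""
    return np.exp(-0.5 * ((x - center) / sigma) ** 2)

# Example: membership of crisp input 0.3 in a fuzzy set centered at 0.5.
mu = gaussian_mf(0.3, center=0.5, sigma=0.2)
```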

  10. Online Pedagogical Tutorial Tactics Optimization Using Genetic-Based Reinforcement Learning.

    Science.gov (United States)

    Lin, Hsuan-Ta; Lee, Po-Ming; Hsiao, Tzu-Chien

    2015-01-01

    Tutorial tactics are policies for an Intelligent Tutoring System (ITS) to decide the next action when there are multiple actions available. Recent research has demonstrated that when the learning contents were controlled so as to be the same, different tutorial tactics would make a difference in students' learning gains. However, the Reinforcement Learning (RL) techniques that were used in previous studies to induce tutorial tactics are insufficient when encountering large problems and hence were used in an offline manner. Therefore, we introduced a Genetic-Based Reinforcement Learning (GBML) approach to induce tutorial tactics in an online-learning manner without relying on any preexisting dataset. The introduced method can learn a set of rules from the environment in a manner similar to RL. It includes a genetic-based optimizer for the rule discovery task, generating new rules from the old ones. This increases the scalability of an RL learner for larger problems. The results support our hypothesis about the capability of the GBML method to induce tutorial tactics. This suggests that the GBML method should be favorable in developing real-world ITS applications in the domain of tutorial tactics induction.

  11. Functional Contour-following via Haptic Perception and Reinforcement Learning.

    Science.gov (United States)

    Hellman, Randall B; Tekin, Cem; van der Schaar, Mihaela; Santos, Veronica J

    2018-01-01

    Many tasks involve the fine manipulation of objects despite limited visual feedback. In such scenarios, tactile and proprioceptive feedback can be leveraged for task completion. We present an approach for real-time haptic perception and decision-making for a haptics-driven, functional contour-following task: the closure of a ziplock bag. This task is challenging for robots because the bag is deformable, transparent, and visually occluded by artificial fingertip sensors that are also compliant. A deep neural net classifier was trained to estimate the state of a zipper within a robot's pinch grasp. A Contextual Multi-Armed Bandit (C-MAB) reinforcement learning algorithm was implemented to maximize cumulative rewards by balancing exploration versus exploitation of the state-action space. The C-MAB learner outperformed a benchmark Q-learner by more efficiently exploring the state-action space while learning a hard-to-code task. The learned C-MAB policy was tested with novel ziplock bag scenarios and contours (wire, rope). Importantly, this work contributes to the development of reinforcement learning approaches that account for limited resources such as hardware life and researcher time. As robots are used to perform complex, physically interactive tasks in unstructured or unmodeled environments, it becomes important to develop methods that enable efficient and effective learning with physical testbeds.

  12. Challenges in the Verification of Reinforcement Learning Algorithms

    Science.gov (United States)

    Van Wesel, Perry; Goodloe, Alwyn E.

    2017-01-01

    Machine learning (ML) is increasingly being applied to a wide array of domains from search engines to autonomous vehicles. These algorithms, however, are notoriously complex and hard to verify. This work looks at the assumptions underlying machine learning algorithms as well as some of the challenges in trying to verify ML algorithms. Furthermore, we focus on the specific challenges of verifying reinforcement learning algorithms. These are highlighted using a specific example. Ultimately, we do not offer a solution to the complex problem of ML verification, but point out possible approaches for verification and interesting research opportunities.

  13. 5th International Conference on Fuzzy and Neuro Computing

    CERN Document Server

    Panigrahi, Bijaya; Das, Swagatam; Suganthan, Ponnuthurai

    2015-01-01

    These proceedings bring together contributions from researchers from academia and industry reporting the latest cutting-edge research in the areas of Fuzzy Computing, Neuro Computing and hybrid Neuro-Fuzzy Computing in the paradigm of Soft Computing. The FANCCO 2015 conference explored new application areas and novel hybrid algorithms for solving different real-world application problems. After a rigorous review of the 68 submissions from all over the world, the referees panel selected 27 papers to be presented at the conference. The accepted papers have a good, balanced mix of theory and applications. The techniques ranged from fuzzy neural networks, decision trees, spiking neural networks, self organizing feature map, support vector regression, adaptive neuro fuzzy inference system, extreme learning machine, fuzzy multi criteria decision making, machine learning, web usage mining, Takagi-Sugeno Inference system, extended Kalman filter, Goedel type logic, fuzzy formal concept analysis, biclustering e...

  14. What Can Reinforcement Learning Teach Us About Non-Equilibrium Quantum Dynamics

    Science.gov (United States)

    Bukov, Marin; Day, Alexandre; Sels, Dries; Weinberg, Phillip; Polkovnikov, Anatoli; Mehta, Pankaj

    Equilibrium thermodynamics and statistical physics are the building blocks of modern science and technology. Yet, our understanding of thermodynamic processes away from equilibrium is largely missing. In this talk, I will reveal the potential of what artificial intelligence can teach us about the complex behaviour of non-equilibrium systems. Specifically, I will discuss the problem of finding optimal drive protocols to prepare a desired target state in quantum mechanical systems by applying ideas from Reinforcement Learning [one can think of Reinforcement Learning as the study of how an agent (e.g. a robot) can learn and perfect a given policy through interactions with an environment.]. The driving protocols learnt by our agent suggest that the non-equilibrium world features possibilities easily defying intuition based on equilibrium physics.

  15. Social Learning, Reinforcement and Crime: Evidence from Three European Cities

    Science.gov (United States)

    Tittle, Charles R.; Antonaccio, Olena; Botchkovar, Ekaterina

    2012-01-01

    This study reports a cross-cultural test of Social Learning Theory using direct measures of social learning constructs and focusing on the causal structure implied by the theory. Overall, the results strongly confirm the main thrust of the theory. Prior criminal reinforcement and current crime-favorable definitions are highly related in all three…

  16. Modeling Avoidance in Mood and Anxiety Disorders Using Reinforcement Learning.

    Science.gov (United States)

    Mkrtchian, Anahit; Aylward, Jessica; Dayan, Peter; Roiser, Jonathan P; Robinson, Oliver J

    2017-10-01

    Serious and debilitating symptoms of anxiety are the most common mental health problem worldwide, accounting for around 5% of all adult years lived with disability in the developed world. Avoidance behavior (avoiding social situations for fear of embarrassment, for instance) is a core feature of such anxiety. However, as for many other psychiatric symptoms, the biological mechanisms underlying avoidance remain unclear. Reinforcement learning models provide formal and testable characterizations of the mechanisms of decision making; here, we examine avoidance in these terms. A total of 101 healthy participants and individuals with mood and anxiety disorders completed an approach-avoidance go/no-go task under stress induced by threat of unpredictable shock. We show an increased reliance in the mood and anxiety group on a parameter of our reinforcement learning model that characterizes a prepotent (Pavlovian) bias to withhold responding in the face of negative outcomes. This was particularly the case when the mood and anxiety group was under stress. This formal description of avoidance within the reinforcement learning framework provides a new means of linking clinical symptoms with biophysically plausible models of neural circuitry and, as such, takes us closer to a mechanistic understanding of mood and anxiety disorders. Copyright © 2017 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.

  17. Reinforcement Learning for Online Control of Evolutionary Algorithms

    NARCIS (Netherlands)

    Eiben, A.; Horvath, Mark; Kowalczyk, Wojtek; Schut, Martijn

    2007-01-01

    The research reported in this paper is concerned with assessing the usefulness of reinforcement learning (RL) for on-line calibration of parameters in evolutionary algorithms (EA). We run an RL procedure and the EA simultaneously, and the RL changes the EA parameters on-the-fly. We

  18. Video Demo: Deep Reinforcement Learning for Coordination in Traffic Light Control

    NARCIS (Netherlands)

    van der Pol, E.; Oliehoek, F.A.; Bosse, T.; Bredeweg, B.

    2016-01-01

    This video demonstration contrasts two approaches to coordination in traffic light control using reinforcement learning: earlier work, based on a deconstruction of the state space into a linear combination of vehicle states, and our own approach based on the Deep Q-learning algorithm.

  19. Determining e-learning success factor in higher education based on user perspective using Fuzzy AHP

    Directory of Open Access Journals (Sweden)

    Anggrainingsih Rini

    2018-01-01

    Full Text Available Recently almost all universities in the world have implemented e-learning to support their academic systems. Previous studies have been conducted to determine critical success factors (CSFs) using the Analytic Hierarchy Process (AHP) method. However, the AHP method cannot handle the uncertainty and vagueness of human opinion, which can lead to less appropriate decisions. Some researchers have proposed using fuzzy set theory with AHP to increase the ability of AHP to deal with problems of uncertainty/fuzziness. This study aims to determine the priority ranking of the multiple factors that influence e-learning success using the FAHP method. The respondents consist of ten e-learning experts, 305 lecturers, and 4195 students at Sebelas Maret University. The result describes a similar success-factor ranking between experienced and non-experienced users (lecturers and students). The result then shows that there are five most influential success factors of e-learning at Sebelas Maret University from the lecturers' perspective: Financial Policy, Regulatory Policy, Course Quality, Relevant Content, and Technical Support. On the other hand, according to the students' point of view, the five most critical e-learning success factors are Quality of Course, Relevance of Content, Completeness of Content, Attitudes toward Student, and Flexibility in taking Course. This finding can therefore be used by the e-learning management of Sebelas Maret University to determine a strategy for achieving successful implementation of e-learning, with these factors in mind.
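
    One common way to compute fuzzy AHP weights is Buckley's geometric-mean method over triangular fuzzy comparison judgments. The sketch below shows that generic computation; it is not necessarily the study's exact FAHP variant, and the comparison matrix is the caller's assumption.

```python
import numpy as np

def fuzzy_ahp_weights(matrix):
    """matrix[i][j] is a triangular fuzzy number (l, m, u); returns crisp weights."""
    tfn = np.asarray(matrix, dtype=float)          # shape (n, n, 3)
    # Fuzzy geometric mean of each row, component-wise.
    g = np.prod(tfn, axis=1) ** (1.0 / tfn.shape[0])
    # Normalize: (l, m, u) is divided by (sum_u, sum_m, sum_l) respectively,
    # i.e. multiplied by the inverse of the fuzzy sum.
    total = g.sum(axis=0)                          # (sum_l, sum_m, sum_u)
    w = g / total[::-1]
    # Centroid defuzzification of each triangular weight, then renormalize.
    crisp = w.mean(axis=1)
    return crisp / crisp.sum()
```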

  20. Simulation-based optimization parametric optimization techniques and reinforcement learning

    CERN Document Server

    Gosavi, Abhijit

    2003-01-01

    Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning introduces the evolving area of simulation-based optimization. The book's objective is two-fold: (1) It examines the mathematical governing principles of simulation-based optimization, thereby providing the reader with the ability to model relevant real-life problems using these techniques. (2) It outlines the computational technology underlying these methods. Taken together these two aspects demonstrate that the mathematical and computational methods discussed in this book do work. Broadly speaking, the book has two parts: (1) parametric (static) optimization and (2) control (dynamic) optimization. Some of the book's special features are: *An accessible introduction to reinforcement learning and parametric-optimization techniques. *A step-by-step description of several algorithms of simulation-based optimization. *A clear and simple introduction to the methodology of neural networks. *A gentle introduction to converg...

  1. Perception-based Co-evolutionary Reinforcement Learning for UAV Sensor Allocation

    National Research Council Canada - National Science Library

    Berenji, Hamid

    2003-01-01

    .... A Perception-based reasoning approach based on co-evolutionary reinforcement learning was developed for jointly addressing sensor allocation on each individual UAV and allocation of a team of UAVs...

  2. A Reinforcement-Based Learning Paradigm Increases Anatomical Learning and Retention-A Neuroeducation Study.

    Science.gov (United States)

    Anderson, Sarah J; Hecker, Kent G; Krigolson, Olave E; Jamniczky, Heather A

    2018-01-01

    In anatomy education, a key hurdle to engaging in higher-level discussion in the classroom is recognizing and understanding the extensive terminology used to identify and describe anatomical structures. Given the time-limited classroom environment, seeking methods to impart this foundational knowledge to students in an efficient manner is essential. Just-in-Time Teaching (JiTT) methods incorporate pre-class exercises (typically online) meant to establish foundational knowledge in novice learners so subsequent instructor-led sessions can focus on deeper, more complex concepts. Determining how best to design and assess pre-class exercises requires a detailed examination of learning and retention in an applied educational context. Here we used electroencephalography (EEG) as a quantitative dependent variable to track learning and examine the efficacy of JiTT activities to teach anatomy. Specifically, we examined changes in the amplitude of the N250 and reward positivity event-related brain potential (ERP) components alongside behavioral performance as novice students participated in a series of computerized reinforcement-based learning modules to teach neuroanatomical structures. We found that as students learned to identify anatomical structures, the amplitude of the N250 increased and reward positivity amplitude decreased in response to positive feedback. On both retention and transfer exercises, when learners successfully remembered and translated their knowledge to novel images, the amplitude of the reward positivity remained decreased compared to early learning. Our findings suggest ERPs can be used as a tool to track learning, retention, and transfer of knowledge and that employing the reinforcement learning paradigm is an effective educational approach for developing anatomical expertise.

  3. A self-learning rule base for command following in dynamical systems

    Science.gov (United States)

    Tsai, Wei K.; Lee, Hon-Mun; Parlos, Alexander

    1992-01-01

    In this paper, a self-learning Rule Base for command following in dynamical systems is presented. The learning is accomplished through reinforcement learning using an associative memory called SAM. The main advantage of SAM is that it is a function approximator with explicit storage of training samples. A learning algorithm patterned after dynamic programming is proposed. Two artificially created, unstable dynamical systems are used for testing, and the Rule Base was used to generate a feedback control to improve the command following ability of the otherwise uncontrolled systems. The numerical results are very encouraging. The controlled systems exhibit a more stable behavior and a better capability to follow reference commands. The rules resulting from the reinforcement learning are explicitly stored and can be modified or augmented by human experts. Due to the overlapping storage scheme of SAM, the stored rules are similar to fuzzy rules.

  4. Fuzzy cognitive maps for applied sciences and engineering from fundamentals to extensions and learning algorithms

    CERN Document Server

    2014-01-01

    Fuzzy Cognitive Maps (FCM) constitute cognitive models in the form of fuzzy directed graphs consisting of two basic elements: the nodes, which basically correspond to “concepts” bearing different states of activation depending on the knowledge they represent, and the “edges” denoting the causal effects that each source node exercises on the receiving concept expressed through weights. Weights take values in the interval [-1,1], which denotes the positive, negative or neutral causal relationship between two concepts. An FCM can typically be obtained through linguistic terms, inherent to fuzzy systems, but with a structure similar to that of neural networks, which facilitates data processing, and has capabilities for training and adaptation. During the last 10 years, the number of published papers on FCMs has grown exponentially, showing great impact potential. Different FCM structures and learning schemes have been developed, while numerous studies report their use in many contexts with highly successful m...

  5. Fuzzy Adaptation Algorithms’ Control for Robot Manipulators with Uncertainty Modelling Errors

    Directory of Open Access Journals (Sweden)

    Yongqing Fan

    2018-01-01

    Full Text Available A novel fuzzy control scheme with adaptation algorithms is developed for a robot manipulator system. First, one adjustable parameter is introduced into the fuzzy logic system, with the robot manipulator system with uncertain nonlinear terms acting as the master device and a reference-model dynamic system as the slave robot system. To overcome limitations of conventional fuzzy logic systems, such as the online learning computation burden and the fixed logic structure, a parameter is used in the fuzzy logic system, which composes a fuzzy logic system with updated parameter laws and forms a new style of adaptation-algorithm controller. The error closed-loop dynamical system can be stabilized based on Lyapunov analysis, the online learning computation burden can be reduced greatly, and fuzzy logic systems with or without fuzzy rules are both suited. Finally, the effectiveness of the proposed approach is shown in a simulation example.

  6. Cloud E-Learning Service Strategies for Improving E-Learning Innovation Performance in a Fuzzy Environment by Using a New Hybrid Fuzzy Multiple Attribute Decision-Making Model

    Science.gov (United States)

    Su, Chiu Hung; Tzeng, Gwo-Hshiung; Hu, Shu-Kung

    2016-01-01

    The purpose of this study was to address this problem by applying a new hybrid fuzzy multiple criteria decision-making model including (a) using the fuzzy decision-making trial and evaluation laboratory (DEMATEL) technique to construct the fuzzy scope influential network relationship map (FSINRM) and determine the fuzzy influential weights of the…

  7. Reinforcement learning on slow features of high-dimensional input streams.

    Directory of Open Access Journals (Sweden)

    Robert Legenstein

    Full Text Available Humans and animals are able to learn complex behaviors based on a massive stream of sensory information from different modalities. Early animal studies have identified learning mechanisms that are based on reward and punishment such that animals tend to avoid actions that lead to punishment whereas rewarded actions are reinforced. However, most algorithms for reward-based learning are only applicable if the dimensionality of the state-space is sufficiently small or its structure is sufficiently simple. Therefore, the question arises how the problem of learning on high-dimensional data is solved in the brain. In this article, we propose a biologically plausible generic two-stage learning system that can directly be applied to raw high-dimensional input streams. The system is composed of a hierarchical slow feature analysis (SFA) network for preprocessing and a simple neural network on top that is trained based on rewards. We demonstrate by computer simulations that this generic architecture is able to learn quite demanding reinforcement learning tasks on high-dimensional visual input streams in a time that is comparable to the time needed when an explicit highly informative low-dimensional state-space representation is given instead of the high-dimensional visual input. The learning speed of the proposed architecture in a task similar to the Morris water maze task is comparable to that found in experimental studies with rats. This study thus supports the hypothesis that slowness learning is one important unsupervised learning principle utilized in the brain to form efficient state representations for behavioral learning.
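
    The core of the preprocessing stage can be illustrated with linear slow feature analysis: after whitening, the slowest features are the directions whose temporal derivative has minimal variance. The paper uses a hierarchical nonlinear SFA network; this sketch shows only the basic linear step and assumes a nondegenerate input covariance.

```python
import numpy as np

def linear_sfa(X, n_features=2):
    """X: (time, dims) signal; returns the n_features slowest output signals."""
    X = X - X.mean(axis=0)
    # Whiten the input signal (assumes nondegenerate covariance).
    d, E = np.linalg.eigh(np.cov(X, rowvar=False))
    Z = X @ (E / np.sqrt(d))            # columns of E scaled by 1/sqrt(eigenvalue)
    # Eigenvectors of the derivative covariance with the smallest eigenvalues
    # give the slowest features; eigh sorts eigenvalues in ascending order.
    dZ = np.diff(Z, axis=0)
    _, V = np.linalg.eigh(np.cov(dZ, rowvar=False))
    return Z @ V[:, :n_features]
```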

  8. Online constrained model-based reinforcement learning

    CSIR Research Space (South Africa)

    Van Niekerk, B

    2017-08-01

    Full Text Available. Using direct multiple shooting (Bock and Plitt, 1984), problem (1) can be transformed into a structured nonlinear program (NLP). First, the time horizon [t0, t0 + T] is partitioned into N equal subintervals [tk, tk+1] for k = 0...

  9. Reinforcement learning account of network reciprocity.

    Science.gov (United States)

    Ezaki, Takahiro; Masuda, Naoki

    2017-01-01

    Evolutionary game theory predicts that cooperation in social dilemma games is promoted when agents are connected as a network. However, when networks are fixed over time, humans do not necessarily show enhanced mutual cooperation. Here we show that reinforcement learning (specifically, the so-called Bush-Mosteller model) approximately explains the experimentally observed network reciprocity and the lack thereof in a parameter region spanned by the benefit-to-cost ratio and the node's degree. Thus, we significantly extend previously obtained numerical results.
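
    The Bush-Mosteller rule referenced above is simple enough to state in a few lines. In the sketch below, an agent compares its payoff with an aspiration level and shifts the probability of the action it just took; the parameter values are illustrative, and the stimulus is reduced to its sign rather than scaled by the payoff range as in some variants.

```python
def bush_mosteller_update(p_cooperate, cooperated, payoff,
                          aspiration=1.0, beta=0.4):
    """Return the updated probability of cooperating."""
    # Stimulus: +1 if the payoff beat the aspiration level, -1 otherwise
    # (a common variant scales this by the maximum payoff difference).
    diff = payoff - aspiration
    s = diff / max(abs(diff), 1e-9)
    p_action = p_cooperate if cooperated else 1.0 - p_cooperate
    if s >= 0:
        p_action += beta * s * (1.0 - p_action)    # reinforce the taken action
    else:
        p_action += beta * s * p_action            # inhibit the taken action
    return p_action if cooperated else 1.0 - p_action
```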

  10. Distributed Economic Dispatch in Microgrids Based on Cooperative Reinforcement Learning.

    Science.gov (United States)

    Liu, Weirong; Zhuang, Peng; Liang, Hao; Peng, Jun; Huang, Zhiwu

    2018-06-01

    Microgrids incorporated with distributed generation (DG) units and energy storage (ES) devices are expected to play more and more important roles in the future power systems. Yet, achieving efficient distributed economic dispatch in microgrids is a challenging issue due to the randomness and nonlinear characteristics of DG units and loads. This paper proposes a cooperative reinforcement learning algorithm for distributed economic dispatch in microgrids. Utilizing the learning algorithm can avoid the difficulty of stochastic modeling and high computational complexity. In the cooperative reinforcement learning algorithm, the function approximation is leveraged to deal with the large and continuous state spaces. And a diffusion strategy is incorporated to coordinate the actions of DG units and ES devices. Based on the proposed algorithm, each node in microgrids only needs to communicate with its local neighbors, without relying on any centralized controllers. Algorithm convergence is analyzed, and simulations based on real-world meteorological and load data are conducted to validate the performance of the proposed algorithm.

  11. Design of interpretable fuzzy systems

    CERN Document Server

    Cpałka, Krzysztof

    2017-01-01

    This book shows that the term “interpretability” goes far beyond the concept of readability of a fuzzy set and fuzzy rules. It focuses on novel and precise operators of aggregation, inference, and defuzzification leading to flexible Mamdani-type and logical-type systems that can achieve the required accuracy using a less complex rule base. The individual chapters describe various aspects of interpretability, including appropriate selection of the structure of a fuzzy system, focusing on improving the interpretability of fuzzy systems designed using both gradient-learning and evolutionary algorithms. It also demonstrates how to eliminate various system components, such as inputs, rules and fuzzy sets, whose reduction does not adversely affect system accuracy. It illustrates the performance of the developed algorithms and methods with commonly used benchmarks. The book provides valuable tools for possible applications in many fields including expert systems, automatic control and robotics.

  12. Emotion in reinforcement learning agents and robots : A survey

    NARCIS (Netherlands)

    Moerland, T.M.; Broekens, D.J.; Jonker, C.M.

    2018-01-01

    This article provides the first survey of computational models of emotion in reinforcement learning (RL) agents. The survey focuses on agent/robot emotions, and mostly ignores human user emotions. Emotions are recognized as functional in decision-making by influencing motivation and action

  13. Recognition of Handwritten Arabic words using a neuro-fuzzy network

    International Nuclear Information System (INIS)

    Boukharouba, Abdelhak; Bennia, Abdelhak

    2008-01-01

    We present a new method for the recognition of handwritten Arabic words based on a neuro-fuzzy hybrid network. As a first step, connected components (CCs) of black pixels are detected. Then the system determines which CCs are sub-words and which are stress marks. The stress marks are then isolated and identified separately, and the sub-words are segmented into graphemes. Each grapheme is described by topological and statistical features. Fuzzy rules are extracted from training examples by a hybrid learning scheme comprising two phases: a rule generation phase from data using fuzzy c-means clustering, and a rule parameter tuning phase using gradient descent learning. After learning, the network encodes in its topology the essential design parameters of a fuzzy inference system. The contribution of this technique is shown through the significant tests performed on a handwritten Arabic word database.
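
    The rule generation phase rests on fuzzy c-means clustering, whose textbook form is sketched below for grapheme feature vectors; this is the standard algorithm, not necessarily the paper's exact configuration.

```python
import numpy as np

def fuzzy_c_means(X, c=5, m=2.0, n_iter=100, eps=1e-9):
    """X: (n_samples, n_dims) feature matrix; m > 1 is the fuzzifier."""
    n = X.shape[0]
    U = np.random.dirichlet(np.ones(c), size=n)     # memberships, rows sum to 1
    for _ in range(n_iter):
        Um = U ** m
        # Cluster centers: membership-weighted means of the samples.
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Distances from every sample to every center (eps avoids division by 0).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + eps
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1)).
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1))
        U = 1.0 / ratio.sum(axis=2)
    return centers, U
```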

  14. Reinforcement learning account of network reciprocity.

    Directory of Open Access Journals (Sweden)

    Takahiro Ezaki

    Full Text Available Evolutionary game theory predicts that cooperation in social dilemma games is promoted when agents are connected as a network. However, when networks are fixed over time, humans do not necessarily show enhanced mutual cooperation. Here we show that reinforcement learning (specifically, the so-called Bush-Mosteller model) approximately explains the experimentally observed network reciprocity and the lack thereof in a parameter region spanned by the benefit-to-cost ratio and the node's degree. Thus, we significantly extend previously obtained numerical results.

  15. A Reinforcement-Based Learning Paradigm Increases Anatomical Learning and Retention—A Neuroeducation Study

    Science.gov (United States)

    Anderson, Sarah J.; Hecker, Kent G.; Krigolson, Olave E.; Jamniczky, Heather A.

    2018-01-01

    In anatomy education, a key hurdle to engaging in higher-level discussion in the classroom is recognizing and understanding the extensive terminology used to identify and describe anatomical structures. Given the time-limited classroom environment, seeking methods to impart this foundational knowledge to students in an efficient manner is essential. Just-in-Time Teaching (JiTT) methods incorporate pre-class exercises (typically online) meant to establish foundational knowledge in novice learners so subsequent instructor-led sessions can focus on deeper, more complex concepts. Determining how best to design and assess pre-class exercises requires a detailed examination of learning and retention in an applied educational context. Here we used electroencephalography (EEG) as a quantitative dependent variable to track learning and examine the efficacy of JiTT activities to teach anatomy. Specifically, we examined changes in the amplitude of the N250 and reward positivity event-related brain potential (ERP) components alongside behavioral performance as novice students participated in a series of computerized reinforcement-based learning modules to teach neuroanatomical structures. We found that as students learned to identify anatomical structures, the amplitude of the N250 increased and reward positivity amplitude decreased in response to positive feedback. On both retention and transfer exercises, when learners successfully remembered and translated their knowledge to novel images, the amplitude of the reward positivity remained decreased compared to early learning. Our findings suggest ERPs can be used as a tool to track learning, retention, and transfer of knowledge and that employing the reinforcement learning paradigm is an effective educational approach for developing anatomical expertise. PMID:29467638

  16. A Reinforcement-Based Learning Paradigm Increases Anatomical Learning and Retention—A Neuroeducation Study

    Directory of Open Access Journals (Sweden)

    Sarah J. Anderson

    2018-02-01

    Full Text Available In anatomy education, a key hurdle to engaging in higher-level discussion in the classroom is recognizing and understanding the extensive terminology used to identify and describe anatomical structures. Given the time-limited classroom environment, seeking methods to impart this foundational knowledge to students in an efficient manner is essential. Just-in-Time Teaching (JiTT) methods incorporate pre-class exercises (typically online) meant to establish foundational knowledge in novice learners so subsequent instructor-led sessions can focus on deeper, more complex concepts. Determining how best to design and assess pre-class exercises requires a detailed examination of learning and retention in an applied educational context. Here we used electroencephalography (EEG) as a quantitative dependent variable to track learning and examine the efficacy of JiTT activities to teach anatomy. Specifically, we examined changes in the amplitude of the N250 and reward positivity event-related brain potential (ERP) components alongside behavioral performance as novice students participated in a series of computerized reinforcement-based learning modules to teach neuroanatomical structures. We found that as students learned to identify anatomical structures, the amplitude of the N250 increased and reward positivity amplitude decreased in response to positive feedback. On both retention and transfer exercises, when learners successfully remembered and translated their knowledge to novel images, the amplitude of the reward positivity remained decreased compared to early learning. Our findings suggest ERPs can be used as a tool to track learning, retention, and transfer of knowledge and that employing the reinforcement learning paradigm is an effective educational approach for developing anatomical expertise.

  17. Optimal Control via Reinforcement Learning with Symbolic Policy Approximation

    NARCIS (Netherlands)

    Kubalìk, Jiřì; Alibekov, Eduard; Babuska, R.; Dochain, Denis; Henrion, Didier; Peaucelle, Dimitri

    2017-01-01

    Model-based reinforcement learning (RL) algorithms can be used to derive optimal control laws for nonlinear dynamic systems. With continuous-valued state and input variables, RL algorithms have to rely on function approximators to represent the value function and policy mappings. This paper

  18. Multi-label learning with fuzzy hypergraph regularization for protein subcellular location prediction.

    Science.gov (United States)

    Chen, Jing; Tang, Yuan Yan; Chen, C L Philip; Fang, Bin; Lin, Yuewei; Shang, Zhaowei

    2014-12-01

    Protein subcellular location prediction aims to predict the location where a protein resides within a cell using computational methods. Considering the main limitations of the existing methods, we propose a hierarchical multi-label learning model FHML for both single-location proteins and multi-location proteins. The latent concepts are extracted through feature space decomposition and label space decomposition under the nonnegative data factorization framework. The extracted latent concepts are used as the codebook to indirectly connect the protein features to their annotations. We construct dual fuzzy hypergraphs to capture the intrinsic high-order relations embedded in not only feature space, but also label space. Finally, the subcellular location annotation information is propagated from the labeled proteins to the unlabeled proteins by performing dual fuzzy hypergraph Laplacian regularization. The experimental results on the six protein benchmark datasets demonstrate the superiority of our proposed method by comparing it with the state-of-the-art methods, and illustrate the benefit of exploiting both feature correlations and label correlations.

  19. Learning Similar Actions by Reinforcement or Sensory-Prediction Errors Rely on Distinct Physiological Mechanisms.

    Science.gov (United States)

    Uehara, Shintaro; Mawase, Firas; Celnik, Pablo

    2017-09-14

    Humans can acquire knowledge of new motor behavior via different forms of learning. The two forms most commonly studied have been the development of internal models based on sensory-prediction errors (error-based learning) and success-based feedback (reinforcement learning). Human behavioral studies suggest these are distinct learning processes, though the neurophysiological mechanisms that are involved have not been characterized. Here, we evaluated physiological markers from the cerebellum and the primary motor cortex (M1) using noninvasive brain stimulation while healthy participants trained on finger-reaching tasks. We manipulated the extent to which subjects relied on error-based or reinforcement mechanisms by providing either vector or binary feedback about task performance. Our results demonstrated a double dissociation: learning the task mainly via error-based mechanisms led to cerebellar plasticity modifications but not long-term potentiation (LTP)-like plasticity changes in M1, while learning a similar action via reinforcement mechanisms elicited M1 LTP-like plasticity but not cerebellar plasticity changes. Our findings indicate that learning complex motor behavior is mediated by the interplay of different forms of learning, weighing distinct neural mechanisms in M1 and the cerebellum. Our study provides insights for designing effective interventions to enhance human motor learning. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  20. Optimal control in microgrid using multi-agent reinforcement learning.

    Science.gov (United States)

    Li, Fu-Dong; Wu, Min; He, Yong; Chen, Xin

    2012-11-01

    This paper presents an improved reinforcement learning method to minimize electricity costs on the premise of satisfying the power balance and generation limits of units in a microgrid with grid-connected mode. Firstly, the microgrid control requirements are analyzed and the objective function of optimal control for the microgrid is proposed. Then, a state variable "Average Electricity Price Trend", which is used to express the most probable transitions of the system, is developed so as to reduce the complexity and randomness of the microgrid, and a multi-agent architecture including agents, state variables, action variables and a reward function is formulated. Furthermore, dynamic hierarchical reinforcement learning, based on the change rate of a key state variable, is established to carry out optimal policy exploration. The analysis shows that the proposed method is beneficial for handling the problem of the "curse of dimensionality" and speeds up learning in an unknown large-scale world. Finally, the simulation results under JADE (Java Agent Development Framework) demonstrate the validity of the presented method for optimal control of a microgrid with grid-connected mode. Copyright © 2012 ISA. Published by Elsevier Ltd. All rights reserved.

  1. A Study on the Rare Factors Exploration of Learning Effectiveness by Using Fuzzy Data Mining

    Science.gov (United States)

    Chen, Chen-Tung; Chang, Kai-Yi

    2017-01-01

    The phenomenon of low fertility has negatively impacted the social structure of the educational environment in Taiwan. Increasing the learning effectiveness of students has become the most important issue for universities in Taiwan. Because the subjective judgments of evaluators and the attributes of the influencing factors are always fuzzy, it…

  2. Cardiac Concomitants of Feedback and Prediction Error Processing in Reinforcement Learning

    Science.gov (United States)

    Kastner, Lucas; Kube, Jana; Villringer, Arno; Neumann, Jane

    2017-01-01

    Successful learning hinges on the evaluation of positive and negative feedback. We assessed differential learning from reward and punishment in a monetary reinforcement learning paradigm, together with cardiac concomitants of positive and negative feedback processing. On the behavioral level, learning from reward resulted in more advantageous behavior than learning from punishment, suggesting a differential impact of reward and punishment on successful feedback-based learning. On the autonomic level, learning and feedback processing were closely mirrored by phasic cardiac responses on a trial-by-trial basis: (1) Negative feedback was accompanied by faster and prolonged heart rate deceleration compared to positive feedback. (2) Cardiac responses shifted from feedback presentation at the beginning of learning to stimulus presentation later on. (3) Most importantly, the strength of phasic cardiac responses to the presentation of feedback correlated with the strength of prediction error signals that alert the learner to the necessity for behavioral adaptation. Considering participants' weight status and gender revealed obesity-related deficits in learning to avoid negative consequences and less consistent behavioral adaptation in women compared to men. In sum, our results provide strong new evidence for the notion that during learning phasic cardiac responses reflect an internal value and feedback monitoring system that is sensitive to the violation of performance-based expectations. Moreover, inter-individual differences in weight status and gender may affect both behavioral and autonomic responses in reinforcement-based learning. PMID:29163004

  3. Cardiac Concomitants of Feedback and Prediction Error Processing in Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Lucas Kastner

    2017-10-01

    Successful learning hinges on the evaluation of positive and negative feedback. We assessed differential learning from reward and punishment in a monetary reinforcement learning paradigm, together with cardiac concomitants of positive and negative feedback processing. On the behavioral level, learning from reward resulted in more advantageous behavior than learning from punishment, suggesting a differential impact of reward and punishment on successful feedback-based learning. On the autonomic level, learning and feedback processing were closely mirrored by phasic cardiac responses on a trial-by-trial basis: (1) Negative feedback was accompanied by faster and prolonged heart rate deceleration compared to positive feedback. (2) Cardiac responses shifted from feedback presentation at the beginning of learning to stimulus presentation later on. (3) Most importantly, the strength of phasic cardiac responses to the presentation of feedback correlated with the strength of prediction error signals that alert the learner to the necessity for behavioral adaptation. Considering participants' weight status and gender revealed obesity-related deficits in learning to avoid negative consequences and less consistent behavioral adaptation in women compared to men. In sum, our results provide strong new evidence for the notion that during learning phasic cardiac responses reflect an internal value and feedback monitoring system that is sensitive to the violation of performance-based expectations. Moreover, inter-individual differences in weight status and gender may affect both behavioral and autonomic responses in reinforcement-based learning.

  4. Decomposition of fuzzy continuity and fuzzy ideal continuity via fuzzy idealization

    International Nuclear Information System (INIS)

    Zahran, A.M.; Abbas, S.E.; Abd El-baki, S.A.; Saber, Y.M.

    2009-01-01

    Recently, El-Naschie has shown that the notion of fuzzy topology may be relevant to quantum particle physics in connection with string theory and E-infinity space-time theory. In this paper, we study the concepts of r-fuzzy semi-I-open, r-fuzzy pre-I-open, r-fuzzy α-I-open and r-fuzzy β-I-open sets, which are properly placed between r-fuzzy openness and r-fuzzy α-I-openness (r-fuzzy pre-I-openness) in fuzzy ideal topological spaces in the Sostak sense. Moreover, we give a decomposition of fuzzy continuity, fuzzy ideal continuity and fuzzy ideal α-continuity, and obtain several characterizations and some properties of these functions. Also, we investigate their relationship with other types of functions.

  5. Ensemble Network Architecture for Deep Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Xi-liang Chen

    2018-01-01

    The popular deep Q-learning algorithm is known to be unstable because of Q-value oscillation and overestimation of action values under certain conditions. These issues tend to adversely affect performance. In this paper, we develop an ensemble network architecture for deep reinforcement learning which is based on value function approximation. The temporal ensemble stabilizes the training process by reducing the variance of the target approximation error, and the ensemble of target values reduces overestimation and yields better performance by estimating more accurate Q-values. Our results show that this architecture leads to statistically significantly better value evaluation and more stable and better performance on several classical control tasks in the OpenAI Gym environment.
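
    To make the abstract's second idea concrete, here is a minimal numpy sketch (our own illustration with invented numbers, not the authors' network code) of forming a TD target from an ensemble of target estimators: their next-state Q-values are averaged before taking the max, which damps the overestimation a single bootstrapped max tends to produce.

        import numpy as np

        # Hypothetical sketch: K target estimators' Q-values for the next state
        # are averaged before forming the TD target, reducing target variance
        # and overestimation relative to a single max-Q bootstrap.
        def ensemble_td_target(reward, next_q_list, gamma=0.99):
            # next_q_list: K arrays of per-action Q-values for the next state
            avg_next_q = np.mean(np.stack(next_q_list), axis=0)  # ensemble average
            return reward + gamma * np.max(avg_next_q)           # bootstrapped target

        # Usage: three estimators disagree about the next state's action values
        target = ensemble_td_target(1.0, [np.array([0.2, 0.9]),
                                          np.array([0.3, 0.5]),
                                          np.array([0.1, 0.7])])
        print(round(target, 3))  # 1.0 + 0.99 * max([0.2, 0.7]) = 1.693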

  6. Integrating distributed Bayesian inference and reinforcement learning for sensor management

    NARCIS (Netherlands)

    Grappiolo, C.; Whiteson, S.; Pavlin, G.; Bakker, B.

    2009-01-01

    This paper introduces a sensor management approach that integrates distributed Bayesian inference (DBI) and reinforcement learning (RL). DBI is implemented using distributed perception networks (DPNs), a multiagent approach to performing efficient inference, while RL is used to automatically

  7. Runoff forecasting using a Takagi-Sugeno neuro-fuzzy model with online learning

    Science.gov (United States)

    Talei, Amin; Chua, Lloyd Hock Chye; Quek, Chai; Jansson, Per-Erik

    2013-04-01

    A study using a local-learning Neuro-Fuzzy System (NFS) was undertaken for a rainfall-runoff modeling application. The local learning model was first tested on three different catchments: an outdoor experimental catchment measuring 25 m2 (Catchment 1), a small urban catchment 5.6 km2 in size (Catchment 2), and a large rural watershed with an area of 241.3 km2 (Catchment 3). The results obtained from the local learning model were comparable to or better than results obtained from physically based models, i.e. the Kinematic Wave Model (KWM), the Storm Water Management Model (SWMM), and the Hydrologiska Byråns Vattenbalansavdelning (HBV) model. The local learning algorithm also required a shorter training time compared to a global learning NFS model. The local learning model was next tested in real-time mode, where the model was continuously adapted when presented with current information in real time. The real-time implementation of the local learning model gave better results, without the need for retraining, when compared to a batch NFS model, where it was found that the batch model had to be retrained periodically in order to achieve similar results.

  8. Learning User Preferences in Ubiquitous Systems: A User Study and a Reinforcement Learning Approach

    OpenAIRE

    Zaidenberg , Sofia; Reignier , Patrick; Mandran , Nadine

    2010-01-01

    Our study concerns a virtual assistant, proposing services to the user based on the user's currently perceived activity and situation (ambient intelligence). Instead of asking the user to define his preferences, we acquire them automatically using a reinforcement learning approach. Experiments showed that our system succeeded in learning user preferences. In order to validate the relevance and usability of such a system, we have first conducted a user study. 26 non-expert s...

  9. Adaptive neuro-fuzzy and expert systems for power quality analysis and prediction of abnormal operation

    Science.gov (United States)

    Ibrahim, Wael Refaat Anis

    The present research involves the development of several fuzzy expert systems for power quality analysis and diagnosis. Intelligent systems for the prediction of abnormal system operation were also developed. The performance of all intelligent modules developed was either enhanced or completely produced through adaptive fuzzy learning techniques. Neuro-fuzzy learning is the main adaptive technique utilized. The work presents a novel approach to the interpretation of power quality from the perspective of the continuous operation of a single system. The research includes an extensive literature review pertaining to the applications of intelligent systems to power quality analysis. Basic definitions and signature events related to power quality are introduced. In addition, detailed discussions of various artificial intelligence paradigms as well as wavelet theory are included. A fuzzy-based intelligent system capable of distinguishing normal from abnormal operation for a given system was developed. Adaptive neuro-fuzzy learning was applied to enhance its performance. A group of fuzzy expert systems that could perform full operational diagnosis were also developed successfully. The developed systems were applied to the operational diagnosis of 3-phase induction motors and rectifier bridges. A novel approach for learning power quality waveforms and trends was developed. The technique, which is adaptive neuro-fuzzy-based, learned, compressed, and stored the waveform data. The new technique was successfully tested using a wide variety of power quality signature waveforms, and using real site data. The trend-learning technique was incorporated into a fuzzy expert system that was designed to predict abnormal operation of a monitored system. The intelligent system learns and stores, in compressed format, trends leading to abnormal operation. The system then compares incoming data to the retained trends continuously. If the incoming data matches any of the learned trends, an

  10. Fuzzy Expert System to Characterize Students

    Science.gov (United States)

    Van Hecke, T.

    2011-01-01

    Students wanting to succeed in higher education are required to adopt an adequate learning approach. By analyzing individual learning characteristics, teachers can give personal advice to help students identify their learning success factors. An expert system based on fuzzy logic can provide economically viable solutions to help students identify…

  11. Fuzzy controllers in nuclear material accounting

    International Nuclear Information System (INIS)

    Zardecki, A.

    1994-01-01

    Fuzzy controllers are applied to predicting and modeling a time series, with particular emphasis on anomaly detection in nuclear material inventory differences. As compared to neural networks, the fuzzy controllers can operate in real time; their learning process does not require many iterations to converge. For this reason fuzzy controllers are potentially useful in time series forecasting, where the authors want to detect and identify trends in real time. They describe an object-oriented implementation of the algorithm advanced by Wang and Mendel. Numerical results are presented both for inventory data and time series corresponding to chaotic situations, such as encountered in the context of strange attractors. In the latter case, the effects of noise on the predictive power of the fuzzy controller are explored

  12. Multiagent cooperation and competition with deep reinforcement learning.

    Directory of Open Access Journals (Sweden)

    Ardi Tampuu

    Evolution of cooperation and competition can appear when multiple adaptive agents share a biological, social, or technological niche. In the present work we study how cooperation and competition emerge between autonomous agents that learn by reinforcement while using only their raw visual input as the state representation. In particular, we extend the Deep Q-Learning framework to multiagent environments to investigate the interaction between two learning agents in the well-known video game Pong. By manipulating the classical rewarding scheme of Pong we show how competitive and collaborative behaviors emerge. We also describe the progression from competitive to collaborative behavior when the incentive to cooperate is increased. Finally we show how learning by playing against another adaptive agent, instead of against a hard-wired algorithm, results in more robust strategies. The present work shows that Deep Q-Networks can become a useful tool for studying decentralized learning of multiagent systems coping with high-dimensional environments.

  13. Multiagent cooperation and competition with deep reinforcement learning

    Science.gov (United States)

    Kodelja, Dorian; Kuzovkin, Ilya; Korjus, Kristjan; Aru, Juhan; Aru, Jaan; Vicente, Raul

    2017-01-01

    Evolution of cooperation and competition can appear when multiple adaptive agents share a biological, social, or technological niche. In the present work we study how cooperation and competition emerge between autonomous agents that learn by reinforcement while using only their raw visual input as the state representation. In particular, we extend the Deep Q-Learning framework to multiagent environments to investigate the interaction between two learning agents in the well-known video game Pong. By manipulating the classical rewarding scheme of Pong we show how competitive and collaborative behaviors emerge. We also describe the progression from competitive to collaborative behavior when the incentive to cooperate is increased. Finally we show how learning by playing against another adaptive agent, instead of against a hard-wired algorithm, results in more robust strategies. The present work shows that Deep Q-Networks can become a useful tool for studying decentralized learning of multiagent systems coping with high-dimensional environments. PMID:28380078

  14. Multiagent cooperation and competition with deep reinforcement learning.

    Science.gov (United States)

    Tampuu, Ardi; Matiisen, Tambet; Kodelja, Dorian; Kuzovkin, Ilya; Korjus, Kristjan; Aru, Juhan; Aru, Jaan; Vicente, Raul

    2017-01-01

    Evolution of cooperation and competition can appear when multiple adaptive agents share a biological, social, or technological niche. In the present work we study how cooperation and competition emerge between autonomous agents that learn by reinforcement while using only their raw visual input as the state representation. In particular, we extend the Deep Q-Learning framework to multiagent environments to investigate the interaction between two learning agents in the well-known video game Pong. By manipulating the classical rewarding scheme of Pong we show how competitive and collaborative behaviors emerge. We also describe the progression from competitive to collaborative behavior when the incentive to cooperate is increased. Finally we show how learning by playing against another adaptive agent, instead of against a hard-wired algorithm, results in more robust strategies. The present work shows that Deep Q-Networks can become a useful tool for studying decentralized learning of multiagent systems coping with high-dimensional environments.

  15. Optimal and Autonomous Control Using Reinforcement Learning: A Survey.

    Science.gov (United States)

    Kiumarsi, Bahare; Vamvoudakis, Kyriakos G; Modares, Hamidreza; Lewis, Frank L

    2018-06-01

    This paper reviews the current state of the art on reinforcement learning (RL)-based feedback control solutions to optimal regulation and tracking of single and multiagent systems. Existing RL solutions to both optimal H2 and H∞ control problems, as well as graphical games, will be reviewed. RL methods learn the solution to optimal control and game problems online, using measured data along the system trajectories. We discuss Q-learning and the integral RL algorithm as core algorithms for discrete-time (DT) and continuous-time (CT) systems, respectively. Moreover, we discuss a new direction of off-policy RL for both CT and DT systems. Finally, we review several applications.

  16. A model reference and sensitivity model-based self-learning fuzzy logic controller as a solution for control of nonlinear servo systems

    NARCIS (Netherlands)

    Kovacic, Z.; Bogdan, S.; Balenovic, M.

    1999-01-01

    In this paper, the design, simulation and experimental verification of a self-learning fuzzy logic controller (SLFLC) suitable for the control of nonlinear servo systems are described. The SLFLC contains a learning algorithm that utilizes a second-order reference model and a sensitivity model

  17. Universal effect of dynamical reinforcement learning mechanism in spatial evolutionary games

    International Nuclear Information System (INIS)

    Zhang, Hai-Feng; Wu, Zhi-Xi; Wang, Bing-Hong

    2012-01-01

    One of the prototypical mechanisms in understanding the ubiquitous cooperation in social dilemma situations is the win–stay, lose–shift rule. In this work, a generalized win–stay, lose–shift learning model—a reinforcement learning model with a dynamic aspiration level—is proposed to describe how humans adapt their social behaviors based on their social experiences. In the model, the players incorporate the information of the outcomes in previous rounds with time-dependent aspiration payoffs to regulate the probability of choosing cooperation. By investigating such a reinforcement learning rule in the spatial prisoner's dilemma game and public goods game, the most noteworthy finding is that moderate greediness (i.e. a moderate aspiration level) best favors the development and organization of collective cooperation. The generality of this observation is tested against different regulation strengths and different types of interaction network as well. We also make comparisons with two recently proposed models to highlight the importance of the mechanism of an adaptive aspiration level in supporting cooperation in structured populations
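
    A minimal version of such an aspiration-driven update is easy to state. The Bush–Mosteller-style form below is our sketch (parameter names ours, not the authors' exact model): payoffs above the aspiration level reinforce the current action probability, payoffs below it weaken it.

        # Sketch: reinforcement update of a cooperation probability with an
        # aspiration threshold; beta plays the role of the regulation strength.
        def update_cooperation_prob(p, payoff, aspiration, beta=0.2):
            stimulus = payoff - aspiration        # satisfaction signal
            if stimulus >= 0:                     # win-stay: reinforce
                return p + beta * stimulus * (1 - p)
            return p + beta * stimulus * p        # lose-shift: weaken

        p = 0.5
        for payoff in [1.0, 0.0, 1.0]:
            p = update_cooperation_prob(p, payoff, aspiration=0.6)
            print(round(p, 3))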

  18. Fuzzy linear programming based optimal fuel scheduling incorporating blending/transloading facilities

    Energy Technology Data Exchange (ETDEWEB)

    Djukanovic, M.; Babic, B.; Milosevic, B. [Electrical Engineering Inst. Nikola Tesla, Belgrade (Yugoslavia); Sobajic, D.J. [EPRI, Palo Alto, CA (United States). Power System Control; Pao, Y.H. [Case Western Reserve Univ., Cleveland, OH (United States); AI WARE, Inc., Cleveland, OH (United States)]

    1996-05-01

    In this paper the blending/transloading facilities are modeled using interactive fuzzy linear programming (FLP), in order to allow the decision-maker to solve the problem of uncertainty of input information within the fuel scheduling optimization. An interactive decision-making process is formulated in which the decision-maker can learn to recognize good solutions by considering all possibilities of fuzziness. The application of the fuzzy formulation is accompanied by a careful examination of the definition of fuzziness, the appropriateness of the membership function, and the interpretation of results. The proposed concept provides a decision support system with integration-oriented features, whereby the decision-maker can learn to recognize the relative importance of factors in the specific domain of the optimal fuel scheduling (OFS) problem. The formulation of a fuzzy linear programming problem to obtain a reasonable nonfuzzy solution under consideration of the ambiguity of parameters, represented by fuzzy numbers, is introduced. An additional advantage of the FLP formulation is its ability to deal with multi-objective problems.
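
    One simple way to turn such fuzzy coefficients into a solvable crisp problem (the paper's interactive formulation is considerably richer) is an alpha-cut: the sketch below represents costs as triangular fuzzy numbers, takes a pessimistic cut at a chosen satisfaction level, and solves the resulting LP with scipy. All problem data are invented.

        from scipy.optimize import linprog

        # Illustrative sketch: fuel costs as triangular fuzzy numbers (lo, mid, hi).
        # At satisfaction level alpha, each coefficient is replaced by its
        # alpha-cut interval; the pessimistic (upper) bound gives a crisp LP.
        def alpha_cut(tri, alpha):
            lo, mid, hi = tri
            return (lo + alpha * (mid - lo), hi - alpha * (hi - mid))

        fuzzy_costs = [(9, 10, 12), (4, 5, 7)]   # $/unit for two fuel sources
        worst_costs = [alpha_cut(t, 0.8)[1] for t in fuzzy_costs]

        # Minimize worst-case cost, meet a demand of 100 units, and respect
        # per-source supply limits of 80 units each.
        res = linprog(c=worst_costs,
                      A_ub=[[-1, -1]], b_ub=[-100],   # x1 + x2 >= 100
                      bounds=[(0, 80), (0, 80)])
        print(res.x, res.fun)                         # -> [20. 80.], 640.0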

  19. New backpropagation algorithm with type-2 fuzzy weights for neural networks

    CERN Document Server

    Gaxiola, Fernando; Valdez, Fevrier

    2016-01-01

    In this book a neural network learning method with type-2 fuzzy weight adjustment is proposed. The mathematical analysis of the proposed learning method architecture and the adaptation of type-2 fuzzy weights are presented. The proposed method is based on research of recent methods that handle weight adaptation and especially fuzzy weights. The internal operation of the neuron is changed to work with two internal calculations for the activation function to obtain two results as outputs of the proposed method. Simulation results and a comparative study among monolithic neural networks, neural networks with type-1 fuzzy weights and neural networks with type-2 fuzzy weights are presented to illustrate the advantages of the proposed method. The proposed approach is based on recent methods that handle adaptation of weights using fuzzy logic of type-1 and type-2. The proposed approach is applied to cases of prediction for the Mackey-Glass (for τ=17) and Dow-Jones time series, and recognition of person with iris bi...

  20. Neural prediction errors reveal a risk-sensitive reinforcement-learning process in the human brain.

    Science.gov (United States)

    Niv, Yael; Edlund, Jeffrey A; Dayan, Peter; O'Doherty, John P

    2012-01-11

    Humans and animals are exquisitely, though idiosyncratically, sensitive to risk or variance in the outcomes of their actions. Economic, psychological, and neural aspects of this are well studied when information about risk is provided explicitly. However, we must normally learn about outcomes from experience, through trial and error. Traditional models of such reinforcement learning focus on learning about the mean reward value of cues and ignore higher order moments such as variance. We used fMRI to test whether the neural correlates of human reinforcement learning are sensitive to experienced risk. Our analysis focused on anatomically delineated regions of a priori interest in the nucleus accumbens, where blood oxygenation level-dependent (BOLD) signals have been suggested as correlating with quantities derived from reinforcement learning. We first provide unbiased evidence that the raw BOLD signal in these regions corresponds closely to a reward prediction error. We then derive from this signal the learned values of cues that predict rewards of equal mean but different variance and show that these values are indeed modulated by experienced risk. Moreover, a close neurometric-psychometric coupling exists between the fluctuations of the experience-based evaluations of risky options that we measured neurally and the fluctuations in behavioral risk aversion. This suggests that risk sensitivity is integral to human learning, illuminating economic models of choice, neuroscientific models of affective learning, and the workings of the underlying neural mechanisms.
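
    A compact way to see how learned values can become risk sensitive is an asymmetric temporal-difference rule in the spirit of risk-sensitive RL; the parameterization below is a textbook-style sketch (names ours), not the authors' fitted model. Weighting negative prediction errors more heavily makes an equal-mean but risky cue settle at a lower learned value.

        # Risk-sensitive TD update: kappa > 0 down-weights positive and
        # up-weights negative prediction errors, yielding risk-averse values.
        def risk_sensitive_td(value, reward, alpha=0.1, kappa=0.5):
            delta = reward - value                       # prediction error
            weight = (1 - kappa) if delta > 0 else (1 + kappa)
            return value + alpha * weight * delta

        v = 0.0
        for r in [1.0, 0.0, 1.0, 0.0]:                   # equal-mean, risky cue
            v = risk_sensitive_td(v, r)
        print(round(v, 4))  # stays below the 0.5 risk-neutral mean for kappa > 0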

  1. A Fuzzy Control Course on the TED Server

    DEFF Research Database (Denmark)

    Dotoli, Mariagrazia; Jantzen, Jan

    1999-01-01

    The Training and Education Committee (TED) is a committee under ERUDIT, a Network of Excellence for fuzzy technology and uncertainty in Europe. The main objective of TED is to improve the training and educational possibilities for the nodes of ERUDIT. Since early 1999, TED has set up the TED server, an educational server that serves as a learning central for students and professionals working with fuzzy logic. Through the server, TED offers an online course on fuzzy control. The course concerns automatic control of an inverted pendulum, with a focus on rule based control by means of fuzzy logic. A ball...

  2. A Framework for Hierarchical Perception-Action Learning Utilizing Fuzzy Reasoning.

    Science.gov (United States)

    Windridge, David; Felsberg, Michael; Shaukat, Affan

    2013-02-01

    Perception-action (P-A) learning is an approach to cognitive system building that seeks to reduce the complexity associated with conventional environment-representation/action-planning approaches. Instead, actions are directly mapped onto the perceptual transitions that they bring about, eliminating the need for intermediate representation and significantly reducing training requirements. We here set out a very general learning framework for cognitive systems in which online learning of the P-A mapping may be conducted within a symbolic processing context, so that complex contextual reasoning can influence the P-A mapping. In utilizing a variational calculus approach to define a suitable objective function, the P-A mapping can be treated as an online learning problem via gradient descent using partial derivatives. Our central theoretical result is to demonstrate top-down modulation of low-level perceptual confidences via the Jacobian of the higher levels of a subsumptive P-A hierarchy. Thus, the separation of the Jacobian as a multiplying factor between levels within the objective function naturally enables the integration of abstract symbolic manipulation in the form of fuzzy deductive logic into the P-A mapping learning. We experimentally demonstrate that the resulting framework achieves significantly better accuracy than using P-A learning without top-down modulation. We also demonstrate that it permits novel forms of context-dependent multilevel P-A mapping, applying the mechanism in the context of an intelligent driver assistance system.

  3. Multiagent Reinforcement Learning with Regret Matching for Robot Soccer

    Directory of Open Access Journals (Sweden)

    Qiang Liu

    2013-01-01

    This paper proposes a novel multiagent reinforcement learning (MARL) algorithm, Nash-Q learning with regret matching, in which regret matching is used to speed up the well-known MARL algorithm Nash-Q learning. Choosing a suitable action-selection strategy to harmonize the relation between exploration and exploitation is critical to enhancing the online learning ability of Nash-Q learning. In a Markov game, the joint action of agents adopting the regret matching algorithm can converge to a set of no-regret points, which can be viewed as a coarse correlated equilibrium that includes the Nash equilibrium in essence. It can be inferred that regret matching can guide exploration of the state-action space so that the convergence rate of the Nash-Q learning algorithm can be increased. Simulation results on robot soccer validate that, compared to the original Nash-Q learning algorithm, the use of regret matching during the learning phase of Nash-Q learning has excellent online learning ability and results in significant performance in terms of scores, average reward and policy convergence.
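
    Regret matching itself is a standard rule and fits in a few lines; the sketch below (function names ours; the paper's coupling to Nash-Q learning is more involved) picks actions with probabilities proportional to positive cumulative regret, which steers exploration toward under-played actions.

        # Minimal regret matching: play each action with probability
        # proportional to its positive cumulative regret.
        def regret_matching_policy(cumulative_regret):
            positive = [max(r, 0.0) for r in cumulative_regret]
            total = sum(positive)
            n = len(cumulative_regret)
            if total <= 0:
                return [1.0 / n] * n              # no regret yet: play uniformly
            return [p / total for p in positive]

        def update_regret(cumulative_regret, payoffs, chosen):
            # regret of not having played each alternative action
            for a, u in enumerate(payoffs):
                cumulative_regret[a] += u - payoffs[chosen]

        regret = [0.0, 0.0, 0.0]
        update_regret(regret, payoffs=[1.0, 0.2, 0.5], chosen=1)
        print(regret_matching_policy(regret))     # favors the under-played action 0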

  4. A Fuzzy Knowledge Representation Model for Student Performance Assessment

    DEFF Research Database (Denmark)

    Badie, Farshad

    Knowledge representation models based on Fuzzy Description Logics (DLs) can provide a foundation for reasoning in intelligent learning environments. While basic DLs are suitable for expressing crisp concepts and binary relationships, Fuzzy DLs are capable of processing degrees of truth/completene...

  5. Multiagent Reinforcement Learning Dynamic Spectrum Access in Cognitive Radios

    Directory of Open Access Journals (Sweden)

    Wu Chun

    2014-02-01

    A multiuser independent Q-learning method which does not need information interaction is proposed for multiuser dynamic spectrum access in cognitive radios. The method adopts a self-learning paradigm, in which each CR user performs reinforcement learning only through observing individual performance reward, without spending communication resources on information interaction with others. The reward is defined suitably to represent channel quality and channel conflict status. The learning strategy of sufficient exploration, preference for good channels, and punishment for channel conflict is designed to implement multiuser dynamic spectrum access. In a two-user, two-channel scenario, a fast learning algorithm is proposed and convergence to the maximal total reward is proved. The simulation results show that, with the proposed method, the CR system can converge to a Nash equilibrium with high probability and achieve a high total reward.
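
    The core of the method is a per-user, stateless Q-update driven only by an individually observed reward. The toy loop below (reward values and exploration rate invented) shows how punishing conflict pushes a learner off a channel another user occupies, with no information exchange at all.

        import random

        # One CR user's step: epsilon-greedy channel choice, reward from
        # observed conflict only, stateless Q-learning update.
        def step(q, channels_used_by_others, eps=0.1, alpha=0.2):
            if random.random() < eps:                      # sufficient exploration
                a = random.randrange(len(q))
            else:
                a = max(range(len(q)), key=q.__getitem__)  # prefer good channels
            reward = -1.0 if a in channels_used_by_others else 1.0
            q[a] += alpha * (reward - q[a])                # punish channel conflict
            return a

        q = [0.0, 0.0]
        for _ in range(50):
            step(q, channels_used_by_others={0})           # channel 0 is occupied
        print([round(v, 2) for v in q])                    # learner settles on channel 1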

  6. A fuzzy Hopfield neural network for medical image segmentation

    International Nuclear Information System (INIS)

    Lin, J.S.; Cheng, K.S.; Mao, C.W.

    1996-01-01

    In this paper, an unsupervised parallel segmentation approach using a fuzzy Hopfield neural network (FHNN) is proposed. The main purpose is to embed fuzzy clustering into neural networks so that on-line learning and parallel implementation for medical image segmentation are feasible. The idea is to cast a clustering problem as a minimization problem where the criterion for the optimum segmentation is chosen as the minimization of the Euclidean distance between samples and class centers. In order to generate feasible results, a fuzzy c-means clustering strategy is included in the Hopfield neural network to eliminate the need of finding weighting factors in the energy function, which is formulated based on a basic concept commonly used in pattern classification, called the within-class scatter matrix principle. The suggested fuzzy c-means clustering strategy has also been proven to be convergent and to allow the network to learn more effectively than the conventional Hopfield neural network. The fuzzy Hopfield neural network based on the within-class scatter matrix shows promising results in comparison with the hard c-means method
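
    The fuzzy c-means strategy embedded in the FHNN alternates two closed-form steps, memberships from distances and centers from memberships. The plain numpy sketch below runs them on invented 1-D "intensities"; it is the clustering rule only, not the Hopfield-network formulation.

        import numpy as np

        # Membership update: inverse-distance weighting with fuzzifier m.
        def fcm_memberships(samples, centers, m=2.0):
            d = np.abs(samples[:, None] - centers[None, :]) + 1e-9
            inv = d ** (-2.0 / (m - 1.0))
            return inv / inv.sum(axis=1, keepdims=True)    # rows sum to 1

        # Center update: membership-weighted means of the samples.
        def fcm_centers(samples, u, m=2.0):
            w = u ** m
            return (w * samples[:, None]).sum(axis=0) / w.sum(axis=0)

        x = np.array([0.0, 0.1, 0.9, 1.0])   # toy 1-D pixel intensities
        c = np.array([0.2, 0.8])
        for _ in range(10):                  # alternate the two update steps
            u = fcm_memberships(x, c)
            c = fcm_centers(x, u)
        print(np.round(c, 2))                # centers near 0.05 and 0.95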

  7. A Review of the Relationship between Novelty, Intrinsic Motivation and Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Siddique Nazmul

    2017-11-01

    This paper presents a review of the tri-partite relationship between novelty, intrinsic motivation and reinforcement learning. The paper first presents a literature survey on novelty and the different computational models of novelty detection, with a specific focus on the features of stimuli that trigger a hedonic value for generating a novelty signal. It then presents an overview of intrinsic motivation and investigations into different models, with the aim of exploring deeper relationships between specific features of a novelty signal and its effect on intrinsic motivation in producing a reward function. Finally, it presents survey results on reinforcement learning, different models and their functional relationship with intrinsic motivation.

  8. Learning Control of Fixed-Wing Unmanned Aerial Vehicles Using Fuzzy Neural Networks

    Directory of Open Access Journals (Sweden)

    Erdal Kayacan

    2017-01-01

    A learning control strategy is preferred for the control and guidance of a fixed-wing unmanned aerial vehicle to deal with the lack of modeling and flight uncertainties. For learning the plant model as well as changing working conditions online, a fuzzy neural network (FNN) is used in parallel with a conventional P (proportional) controller. Among the learning algorithms in the literature, a derivative-free one, the sliding mode control (SMC) theory-based learning algorithm, is preferred as it has been proved to be computationally efficient in real-time applications. Its proven robustness and finite-time converging nature make the learning algorithm appropriate for controlling an unmanned aerial vehicle, as the computational power is always limited in unmanned aerial vehicles (UAVs). The parameter update rules and stability conditions of the learning are derived, and the proof of the stability of the learning algorithm is shown by using a candidate Lyapunov function. Intensive simulations are performed to illustrate the applicability of the proposed controller, which includes the tracking of a three-dimensional trajectory by the UAV subject to time-varying wind conditions. The simulation results show the efficiency of the proposed control algorithm, especially in real-time control systems, because of its computational efficiency.

  9. Abrasive slurry jet cutting model based on fuzzy relations

    Science.gov (United States)

    Qiang, C. H.; Guo, C. W.

    2017-12-01

    The cutting process of pre-mixed abrasive slurry or suspension jet (ASJ) is a complex process affected by many factors, and there is a highly nonlinear relationship between the cutting parameters and cutting quality. In this paper, guided by fuzzy theory, a fuzzy cutting model of ASJ was developed. In the modeling of surface roughness, the upper surface roughness prediction model and the lower surface roughness prediction model were established respectively. The adaptive fuzzy inference system combines the learning mechanism of neural networks with the linguistic reasoning ability of fuzzy systems; membership functions and fuzzy rules are obtained by adaptive adjustment. Therefore, the modeling process is fast and effective. In this paper, the ANFIS module of the MATLAB fuzzy logic toolbox was used to establish the fuzzy cutting model of ASJ, which is found to be quite instrumental to ASJ cutting applications.
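
    The kind of input-output mapping ANFIS tunes can be previewed with a hand-built zero-order Sugeno (TSK) model. The two rules, membership parameters, and consequents below are entirely invented for illustration; ANFIS would learn such parameters from cutting data.

        import numpy as np

        def gauss(x, c, s):
            return np.exp(-0.5 * ((x - c) / s) ** 2)

        # Two fuzzy rules mapping jet pressure to a roughness estimate,
        # combined by weighted-average defuzzification.
        def tsk_roughness(pressure_mpa):
            w1 = gauss(pressure_mpa, c=20.0, s=5.0)   # rule 1: 'low pressure'
            w2 = gauss(pressure_mpa, c=40.0, s=5.0)   # rule 2: 'high pressure'
            r1, r2 = 6.0, 3.2                         # rule consequents (um Ra)
            return (w1 * r1 + w2 * r2) / (w1 + w2)

        for p in (20, 30, 40):
            print(p, round(float(tsk_roughness(p)), 2))   # 6.0, 4.6, 3.2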

  10. Dopamine-Dependent Reinforcement of Motor Skill Learning: Evidence from Gilles de la Tourette Syndrome

    Science.gov (United States)

    Palminteri, Stefano; Lebreton, Mael; Worbe, Yulia; Hartmann, Andreas; Lehericy, Stephane; Vidailhet, Marie; Grabli, David; Pessiglione, Mathias

    2011-01-01

    Reinforcement learning theory has been extensively used to understand the neural underpinnings of instrumental behaviour. A central assumption surrounds dopamine signalling reward prediction errors, so as to update action values and ensure better choices in the future. However, educators may share the intuitive idea that reinforcements not only…

  11. Neural networks and statistical learning

    CERN Document Server

    Du, Ke-Lin

    2014-01-01

    Providing a broad but in-depth introduction to neural network and machine learning in a statistical framework, this book provides a single, comprehensive resource for study and further research. All the major popular neural network models and statistical learning approaches are covered with examples and exercises in every chapter to develop a practical working understanding of the content. Each of the twenty-five chapters includes state-of-the-art descriptions and important research results on the respective topics. The broad coverage includes the multilayer perceptron, the Hopfield network, associative memory models, clustering models and algorithms, the radial basis function network, recurrent neural networks, principal component analysis, nonnegative matrix factorization, independent component analysis, discriminant analysis, support vector machines, kernel methods, reinforcement learning, probabilistic and Bayesian networks, data fusion and ensemble learning, fuzzy sets and logic, neurofuzzy models, hardw...

  12. Joy, Distress, Hope, and Fear in Reinforcement Learning (Extended Abstract)

    NARCIS (Netherlands)

    Jacobs, E.J.; Broekens, J.; Jonker, C.M.

    2014-01-01

    In this paper we present a mapping between joy, distress, hope and fear, and Reinforcement Learning primitives. Joy / distress is a signal that is derived from the RL update signal, while hope/fear is derived from the utility of the current state. Agent-based simulation experiments replicate

  13. Application of ANNs approach for solving fully fuzzy polynomials system

    Directory of Open Access Journals (Sweden)

    R. Novin

    2017-11-01

    In processing indecisive or unclear information, the advantages of the fuzzy logic and neurocomputing disciplines should be taken into account and combined by fuzzy neural networks. The current research intends to present a fuzzy modeling method using multi-layer fuzzy neural networks for solving a fully fuzzy polynomials system. To clarify the point, a supervised gradient-descent-based learning law is employed. The feasibility of the method is examined using computer simulations on a numerical example. The experimental results obtained from the investigation of the proposed method are valid and deliver very good approximation results.

  14. Introduction to type-2 fuzzy logic control theory and applications

    CERN Document Server

    Mendel, Jerry M; Tan, Woei-Wan; Melek, William W; Ying, Hao

    2014-01-01

    Written by world-class leaders in type-2 fuzzy logic control, this book offers a self-contained reference for both researchers and students. The coverage provides both background and an extensive literature survey on fuzzy logic and related type-2 fuzzy control. It also includes research questions, experiment and simulation results, and downloadable computer programs on an associated website. This key resource will prove useful to students and engineers wanting to learn type-2 fuzzy control theory and its applications.
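
    For readers new to the topic, the footprint of uncertainty that distinguishes type-2 from type-1 sets can be shown in a few lines (parameters illustrative): an interval type-2 set carries a lower and an upper membership curve instead of a single one.

        import numpy as np

        def gauss(x, c, s):
            return np.exp(-0.5 * ((x - c) / s) ** 2)

        # Interval type-2 Gaussian set: fixed center, uncertain width, so each
        # input maps to a membership interval rather than a single degree.
        def it2_membership(x, c=0.0, sigma_lo=0.8, sigma_hi=1.2):
            return gauss(x, c, sigma_lo), gauss(x, c, sigma_hi)

        for x in (0.0, 1.0, 2.0):
            lo, hi = it2_membership(x)
            print(x, round(float(lo), 3), round(float(hi), 3))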

  15. Applications of Deep Learning and Reinforcement Learning to Biological Data.

    Science.gov (United States)

    Mahmud, Mufti; Kaiser, Mohammed Shamim; Hussain, Amir; Vassanelli, Stefano

    2018-06-01

    Rapid advances in hardware-based technologies during the past decades have opened up new possibilities for life scientists to gather multimodal data in various application domains, such as omics, bioimaging, medical imaging, and (brain/body)-machine interfaces. These have generated novel opportunities for development of dedicated data-intensive machine learning techniques. In particular, recent research in deep learning (DL), reinforcement learning (RL), and their combination (deep RL) promise to revolutionize the future of artificial intelligence. The growth in computational power accompanied by faster and increased data storage, and declining computing costs have already allowed scientists in various fields to apply these techniques on data sets that were previously intractable owing to their size and complexity. This paper provides a comprehensive survey on the application of DL, RL, and deep RL techniques in mining biological data. In addition, we compare the performances of DL techniques when applied to different data sets across various application domains. Finally, we outline open issues in this challenging research area and discuss future development perspectives.

  16. Gaze-contingent reinforcement learning reveals incentive value of social signals in young children and adults.

    Science.gov (United States)

    Vernetti, Angélina; Smith, Tim J; Senju, Atsushi

    2017-03-15

    While numerous studies have demonstrated that infants and adults preferentially orient to social stimuli, it remains unclear as to what drives such preferential orienting. It has been suggested that the learned association between social cues and subsequent reward delivery might shape such social orienting. Using a novel, spontaneous indication of reinforcement learning (with the use of a gaze contingent reward-learning task), we investigated whether children and adults' orienting towards social and non-social visual cues can be elicited by the association between participants' visual attention and a rewarding outcome. Critically, we assessed whether the engaging nature of the social cues influences the process of reinforcement learning. Both children and adults learned to orient more often to the visual cues associated with reward delivery, demonstrating that cue-reward association reinforced visual orienting. More importantly, when the reward-predictive cue was social and engaging, both children and adults learned the cue-reward association faster and more efficiently than when the reward-predictive cue was social but non-engaging. These new findings indicate that social engaging cues have a positive incentive value. This could possibly be because they usually coincide with positive outcomes in real life, which could partly drive the development of social orienting. © 2017 The Authors.

  17. Adaptive Load Balancing of Parallel Applications with Multi-Agent Reinforcement Learning on Heterogeneous Systems

    Directory of Open Access Journals (Sweden)

    Johan Parent

    2004-01-01

    We report on the improvements that can be achieved by applying machine learning techniques, in particular reinforcement learning, for the dynamic load balancing of parallel applications. The applications being considered in this paper are coarse grain data intensive applications. Such applications put high pressure on the interconnect of the hardware. Synchronization and load balancing in complex, heterogeneous networks need fast, flexible, adaptive load balancing algorithms. Viewing a parallel application as a one-state coordination game in the framework of multi-agent reinforcement learning, and by using a recently introduced multi-agent exploration technique, we are able to improve upon the classic job farming approach. The improvements are achieved with limited computation and communication overhead.

  18. Fuzzy control in robot-soccer, evolutionary learning in the first layer of control

    Directory of Open Access Journals (Sweden)

    Peter J Thomas

    2003-02-01

    In this paper an evolutionary algorithm is developed to learn a fuzzy knowledge base for the control of a soccer-playing micro-robot, steering it from any configuration belonging to a grid of initial configurations to hit the ball along the ball-to-goal line of sight. The knowledge base uses a relative coordinate system, including the left and right wheel velocities of the robot. Final path positions allow the robot to face the ball forward or in reverse and account for its physical dimensions.

  19. Learning alternative movement coordination patterns using reinforcement feedback.

    Science.gov (United States)

    Lin, Tzu-Hsiang; Denomme, Amber; Ranganathan, Rajiv

    2018-05-01

    One of the characteristic features of the human motor system is redundancy-i.e., the ability to achieve a given task outcome using multiple coordination patterns. However, once participants settle on using a specific coordination pattern, the process of learning to use a new alternative coordination pattern to perform the same task is still poorly understood. Here, using two experiments, we examined this process of how participants shift from one coordination pattern to another using different reinforcement schedules. Participants performed a virtual reaching task, where they moved a cursor to different targets positioned on the screen. Our goal was to make participants use a coordination pattern with greater trunk motion, and to this end, we provided reinforcement by making the cursor disappear if the trunk motion during the reach did not cross a specified threshold value. In Experiment 1, we compared two reinforcement schedules in two groups of participants-an abrupt group, where the threshold was introduced immediately at the beginning of practice; and a gradual group, where the threshold was introduced gradually with practice. Results showed that both abrupt and gradual groups were effective in shifting their coordination patterns to involve greater trunk motion, but the abrupt group showed greater retention when the reinforcement was removed. In Experiment 2, we examined the basis of this advantage in the abrupt group using two additional control groups. Results showed that the advantage of the abrupt group was because of a greater number of practice trials with the desired coordination pattern. Overall, these results show that reinforcement can be successfully used to shift coordination patterns, which has potential in the rehabilitation of movement disorders.

  20. Nonparametric bayesian reward segmentation for skill discovery using inverse reinforcement learning

    CSIR Research Space (South Africa)

    Ranchod, P

    2015-10-01

    We present a method for segmenting a set of unstructured demonstration trajectories to discover reusable skills using inverse reinforcement learning (IRL). Each skill is characterised by a latent reward function which the demonstrator is assumed...

  1. Reusable Reinforcement Learning via Shallow Trails.

    Science.gov (United States)

    Yu, Yang; Chen, Shi-Yong; Da, Qing; Zhou, Zhi-Hua

    2018-06-01

    Reinforcement learning has shown great success in helping learning agents accomplish tasks autonomously from environment interactions. Meanwhile in many real-world applications, an agent needs to accomplish not only a fixed task but also a range of tasks. For this goal, an agent can learn a metapolicy over a set of training tasks that are drawn from an underlying distribution. By maximizing the total reward summed over all the training tasks, the metapolicy can then be reused in accomplishing test tasks from the same distribution. However, in practice, we face two major obstacles to training and reusing metapolicies well. First, how to identify tasks that are unrelated or even opposed to each other, in order to avoid their mutual interference in the training. Second, how to characterize task features, according to which a metapolicy can be reused. In this paper, we propose the MetA-Policy LEarning (MAPLE) approach that overcomes the two difficulties by introducing the shallow trail. It probes a task by running a roughly trained policy. Using the rewards of the shallow trail, MAPLE automatically groups similar tasks. Moreover, when the task parameters are unknown, the rewards of the shallow trail also serve as task features. Empirical studies on several controlling tasks verify that MAPLE can train metapolicies well and receives high reward on test tasks.

  2. Explicit and implicit reinforcement learning across the psychosis spectrum.

    Science.gov (United States)

    Barch, Deanna M; Carter, Cameron S; Gold, James M; Johnson, Sheri L; Kring, Ann M; MacDonald, Angus W; Pizzagalli, Diego A; Ragland, J Daniel; Silverstein, Steven M; Strauss, Milton E

    2017-07-01

    Motivational and hedonic impairments are core features of a variety of types of psychopathology. An important aspect of motivational function is reinforcement learning (RL), including implicit (i.e., outside of conscious awareness) and explicit (i.e., including explicit representations about potential reward associations) learning, as well as both positive reinforcement (learning about actions that lead to reward) and punishment (learning to avoid actions that lead to loss). Here we present data from paradigms designed to assess both positive and negative components of both implicit and explicit RL, examine performance on each of these tasks among individuals with schizophrenia, schizoaffective disorder, and bipolar disorder with psychosis, and examine their relative relationships to specific symptom domains transdiagnostically. None of the diagnostic groups differed significantly from controls on the implicit RL tasks in either bias toward a rewarded response or bias away from a punished response. However, on the explicit RL task, both the individuals with schizophrenia and schizoaffective disorder performed significantly worse than controls, but the individuals with bipolar did not. Worse performance on the explicit RL task, but not the implicit RL task, was related to worse motivation and pleasure symptoms across all diagnostic categories. Performance on explicit RL, but not implicit RL, was related to working memory, which accounted for some of the diagnostic group differences. However, working memory did not account for the relationship of explicit RL to motivation and pleasure symptoms. These findings suggest transdiagnostic relationships across the spectrum of psychotic disorders between motivation and pleasure impairments and explicit RL. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  3. Manufacturing Scheduling Using Colored Petri Nets and Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Maria Drakaki

    2017-02-01

    Agent-based intelligent manufacturing control systems are capable of efficiently responding and adapting to environmental changes. Manufacturing system adaptation and evolution can be addressed with learning mechanisms that increase the intelligence of agents. In this paper a manufacturing scheduling method is presented based on Timed Colored Petri Nets (CTPNs) and reinforcement learning (RL). CTPNs model the manufacturing system and implement the scheduling. In the search for an optimal solution, a scheduling agent uses RL, in particular the Q-learning algorithm. A warehouse order-picking scheduling problem is presented as a case study to illustrate the method. The proposed scheduling method is compared to existing methods. Simulation and state space results are used to evaluate performance and identify system properties.

  4. Creating Clinical Fuzzy Automata with Fuzzy Arden Syntax.

    Science.gov (United States)

    de Bruin, Jeroen S; Steltzer, Heinz; Rappelsberger, Andrea; Adlassnig, Klaus-Peter

    2017-01-01

    Formal constructs for fuzzy sets and fuzzy logic are incorporated into Arden Syntax version 2.9 (Fuzzy Arden Syntax). With fuzzy sets, the relationships between measured or observed data and linguistic terms are expressed as degrees of compatibility that model the unsharpness of the boundaries of linguistic terms. Propositional uncertainty due to incomplete knowledge of relationships between clinical linguistic concepts is modeled with fuzzy logic. Fuzzy Arden Syntax also supports the construction of fuzzy state monitors. The latter are defined as monitors that employ fuzzy automata to observe gradual transitions between different stages of disease. As a use case, we re-implemented FuzzyARDS, a previously published clinical monitoring system for patients suffering from acute respiratory distress syndrome (ARDS). Using the re-implementation as an example, we show how key concepts of fuzzy automata, i.e., fuzzy states and parallel fuzzy state transitions, can be implemented in Fuzzy Arden Syntax. The results showed that fuzzy state monitors can be implemented in a straightforward manner.

  5. An Improved Reinforcement Learning System Using Affective Factors

    Directory of Open Access Journals (Sweden)

    Takashi Kuremoto

    2013-07-01

    As a powerful and intelligent machine learning method, reinforcement learning (RL) has been widely used in many fields such as game theory, adaptive control, multi-agent systems, nonlinear forecasting, and so on. The main contribution of this technique is its exploration and exploitation approaches to find the optimal solution or a semi-optimal solution of goal-directed problems. However, when RL is applied to multi-agent systems (MASs), problems such as the “curse of dimension”, the “perceptual aliasing problem”, and uncertainty of the environment constitute high hurdles to RL. Meanwhile, although RL is inspired by behavioral psychology and reward/punishment from the environment is used, higher mental factors such as affects, emotions, and motivations are rarely adopted in the learning procedure of RL. In this paper, to address these challenges of agent learning in MASs, we propose a computational motivation function, which adopts two principal affective factors, “Arousal” and “Pleasure”, of Russell’s circumplex model of affects, to improve the learning performance of a conventional RL algorithm named Q-learning (QL). Compared with the conventional QL, computer simulations of pursuit problems with static and dynamic prey were carried out, and the results showed that the proposed method results in agents having a faster and more stable learning performance.
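
    A loose sketch of this direction follows; mapping "Arousal" to the exploration rate and "Pleasure" to the learning rate is our simplification for illustration, not the authors' motivation function.

        import random

        # Q-learning agent whose exploration and learning rate are modulated
        # by two affective state variables (sketch; dynamics invented).
        class AffectiveQ:
            def __init__(self, n_actions, alpha=0.1):
                self.q = [0.0] * n_actions
                self.alpha = alpha
                self.arousal, self.pleasure = 1.0, 0.0

            def act(self):
                eps = 0.05 + 0.3 * self.arousal            # aroused: explore more
                if random.random() < eps:
                    return random.randrange(len(self.q))
                return max(range(len(self.q)), key=self.q.__getitem__)

            def learn(self, a, reward):
                self.pleasure = 0.9 * self.pleasure + 0.1 * reward
                self.arousal = max(0.0, self.arousal - 0.02)  # habituation
                lr = self.alpha * (1.0 + max(self.pleasure, 0.0))
                self.q[a] += lr * (reward - self.q[a])

        agent = AffectiveQ(2)
        for _ in range(100):
            a = agent.act()
            agent.learn(a, reward=1.0 if a == 1 else 0.0)
        print([round(v, 2) for v in agent.q])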

  6. Online human training of a myoelectric prosthesis controller via actor-critic reinforcement learning.

    Science.gov (United States)

    Pilarski, Patrick M; Dawson, Michael R; Degris, Thomas; Fahimi, Farbod; Carey, Jason P; Sutton, Richard S

    2011-01-01

    As a contribution toward the goal of adaptable, intelligent artificial limbs, this work introduces a continuous actor-critic reinforcement learning method for optimizing the control of multi-function myoelectric devices. Using a simulated upper-arm robotic prosthesis, we demonstrate how it is possible to derive successful limb controllers from myoelectric data using only a sparse human-delivered training signal, without requiring detailed knowledge about the task domain. This reinforcement-based machine learning framework is well suited for use by both patients and clinical staff, and may be easily adapted to different application domains and the needs of individual amputees. To our knowledge, this is the first myoelectric control approach that facilitates the online learning of new amputee-specific motions based only on a one-dimensional (scalar) feedback signal provided by the user of the prosthesis. © 2011 IEEE

  7. Optimizing Chemical Reactions with Deep Reinforcement Learning.

    Science.gov (United States)

    Zhou, Zhenpeng; Li, Xiaocheng; Zare, Richard N

    2017-12-27

    Deep reinforcement learning was employed to optimize chemical reactions. Our model iteratively records the results of a chemical reaction and chooses new experimental conditions to improve the reaction outcome. This model outperformed a state-of-the-art blackbox optimization algorithm by using 71% fewer steps on both simulations and real reactions. Furthermore, we introduced an efficient exploration strategy by drawing the reaction conditions from certain probability distributions, which resulted in an improvement on regret from 0.062 to 0.039 compared with a deterministic policy. Combining the efficient exploration policy with accelerated microdroplet reactions, optimal reaction conditions were determined in 30 min for the four reactions considered, and a better understanding of the factors that control microdroplet reactions was reached. Moreover, our model showed a better performance after training on reactions with similar or even dissimilar underlying mechanisms, which demonstrates its learning ability.

  8. 'Proactive' use of cue-context congruence for building reinforcement learning's reward function.

    Science.gov (United States)

    Zsuga, Judit; Biro, Klara; Tajti, Gabor; Szilasi, Magdolna Emma; Papp, Csaba; Juhasz, Bela; Gesztelyi, Rudolf

    2016-10-28

    Reinforcement learning is a fundamental form of learning that may be formalized using the Bellman equation. Accordingly, an agent determines the state value as the sum of the immediate reward and of the discounted value of future states. Thus the value of a state is determined by agent-related attributes (action set, policy, discount factor) and the agent's knowledge of the environment, embodied by the reward function and hidden environmental factors given by the transition probability. The central objective of reinforcement learning is to solve these two functions outside the agent's control, either using or not using a model. In the present paper, using the proactive model of reinforcement learning we offer insight on how the brain creates simplified representations of the environment, and how these representations are organized to support the identification of relevant stimuli and action. Furthermore, we identify neurobiological correlates of our model by suggesting that the reward and policy functions, attributes of the Bellman equation, are built by the orbitofrontal cortex (OFC) and the anterior cingulate cortex (ACC), respectively. Based on this we propose that the OFC assesses cue-context congruence to activate the most relevant context frame. Furthermore, given the bidirectional neuroanatomical link between the OFC and model-free structures, we suggest that model-based input is incorporated into the reward prediction error (RPE) signal, and conversely the RPE signal may be used to update the reward-related information of context frames and the policy underlying action selection in the OFC and ACC, respectively. Furthermore, clinical implications for cognitive behavioral interventions are discussed.
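
    The Bellman equation invoked here is easy to make concrete. The sketch below runs value iteration on an invented 2-state, 2-action MDP: each state's value is the best achievable immediate reward plus the discounted value of its successors.

        # P[s][a] = [(prob, next_state), ...]; R[s][a] = immediate reward.
        P = {
            0: {0: [(1.0, 0)], 1: [(0.8, 1), (0.2, 0)]},
            1: {0: [(1.0, 0)], 1: [(1.0, 1)]},
        }
        R = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 2.0}}
        gamma = 0.9

        V = {0: 0.0, 1: 0.0}
        for _ in range(200):   # iterate the Bellman optimality backup to a fixed point
            V = {s: max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                        for a in P[s])
                 for s in P}
        print({s: round(v, 2) for s, v in V.items()})   # {0: 18.78, 1: 20.0}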

  9. Neuro-fuzzy inverse model control structure of robotic manipulators utilized for physiotherapy applications

    Directory of Open Access Journals (Sweden)

    A.A. Fahmy

    2013-12-01

    This paper presents a new neuro-fuzzy controller for robot manipulators. First, an inductive learning technique is applied to generate the required inverse modeling rules from input/output data recorded in the off-line structure learning phase. Second, a fully differentiable fuzzy neural network is developed to construct the inverse dynamics part of the controller for the online parameter learning phase. Finally, a fuzzy-PID-like incremental controller was employed as the feedback servo controller. The proposed control system was tested using the dynamic model of a six-axis industrial robot. The control system showed good results compared to the conventional PID individual joint controller.

  10. Adaptive neuro-fuzzy controller of switched reluctance motor

    Directory of Open Access Journals (Sweden)

    Tahour Ahmed

    2007-01-01

    This paper presents an application of adaptive neuro-fuzzy (ANFIS) control for switched reluctance motor (SRM) speed. The ANFIS has the advantages of the expert knowledge of a fuzzy inference system and the learning capability of neural networks. An adaptive neuro-fuzzy controller of the motor speed is then designed and simulated. Digital simulation results show that the designed ANFIS speed controller realizes good dynamic behaviour of the motor, perfect speed tracking with no overshoot, and good rejection of impact load disturbances. The results of applying the adaptive neuro-fuzzy controller to an SRM give better performance and higher robustness than those obtained by the application of a conventional controller (PI).

  11. Building of fuzzy decision trees using ID3 algorithm

    Science.gov (United States)

    Begenova, S. B.; Avdeenko, T. V.

    2018-05-01

    Decision trees are widely used in the fields of machine learning and artificial intelligence. Such popularity is due to the fact that, with the help of decision trees, graphical models and text rules can be built that are easily understood by the end user. Because of the inaccuracy of observations and uncertainties, the data collected in the environment often take an unclear form. Therefore, fuzzy decision trees are becoming popular in the field of machine learning. This article presents a method that includes the features of the two above-mentioned approaches: a graphical representation of the rule system in the form of a tree and a fuzzy representation of the data. The approach uses such advantages as the high comprehensibility of decision trees and the ability to cope with inaccurate and uncertain information in a fuzzy representation. The resulting learning method is suitable for classification problems with both numerical and symbolic features. In the article, solution illustrations and numerical results are given.
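
    The fuzzy twist shows up already in ID3's entropy computation: class "counts" become sums of membership degrees. A minimal sketch (function name ours) follows.

        import math

        # Fuzzy entropy: class frequencies are sums of memberships, not counts.
        def fuzzy_entropy(membership_sums_per_class):
            total = sum(membership_sums_per_class)
            h = 0.0
            for m in membership_sums_per_class:
                if m > 0:
                    p = m / total
                    h -= p * math.log2(p)
            return h

        print(fuzzy_entropy([3.0, 3.0]))              # crisp 3-vs-3 split: 1.0
        print(round(fuzzy_entropy([2.7, 0.3]), 3))    # memberships skewed: 0.469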

  12. A fuzzy neural network for sensor signal estimation

    International Nuclear Information System (INIS)

    Na, Man Gyun

    2000-01-01

    In this work, a fuzzy neural network is used to estimate a relevant sensor signal from other sensor signals. Noise components in the input signals to the fuzzy neural network are removed through wavelet denoising. Principal component analysis (PCA) is used to reduce the dimension of the input space without losing a significant amount of information; a lower-dimensional input space also usually reduces the time necessary to train a fuzzy neural network, and PCA simplifies the selection of its input signals. The fuzzy neural network parameters are optimized by two learning methods: a genetic algorithm optimizes the antecedent parameters, and a least-squares algorithm solves for the consequent parameters. The proposed algorithm was verified through application to the pressurizer water level and hot-leg flowrate measurements in pressurized water reactors.
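
    The PCA step described above is easy to reproduce in outline; a minimal sketch, with synthetic signals standing in for the sensor records and a 95% retained-variance threshold chosen arbitrarily:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
signals = rng.normal(size=(1000, 12))   # synthetic multi-sensor record
pca = PCA(n_components=0.95)            # keep 95% of the variance
inputs = pca.fit_transform(signals)     # lower-dimensional FNN inputs
print(signals.shape, "->", inputs.shape)
```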

  13. Genetic Learning of Fuzzy Expert Systems for Decision Support in the Automated Process of Wooden Boards Cutting

    Directory of Open Access Journals (Sweden)

    Yaroslav MATSYSHYN

    2014-03-01

    Full Text Available Sawing solid wood (lumber, wooden boards) into blanks is an important technological operation with a significant influence on the efficiency of the woodworking industry as a whole. Selecting a rational variant of lumber cutting is a complex multicriteria problem with many stochastic factors, characterized by incomplete information and fuzzy attributes. Because of this, the automatic optimizing cross-cut saws currently in use do not always make rational use of the raw wood. And since the optimization algorithms of these saws function as a “black box”, they cannot be improved. The task of developing a new approach to optimal cross-cutting that takes into account the stochastic properties of wood as a material of biological origin is therefore topical. Here we propose a new approach to the problem of optimal lumber cutting under uncertainty in lumber quantity and fuzziness in the lengths of defect-free areas. To account for these conditions, we applied methods of fuzzy set theory and used a genetic algorithm to simulate the process of human learning in performing the technological operation. The rules for handling each successive defect-free area are thus defined in a fuzzy expert system that can be configured for specific production tasks using a genetic algorithm. The author's implementation of the genetic algorithm is used to set up the parameters of the fuzzy expert system. The working capacity of the developed system was verified on simulated and real-world data. Implementation of this approach will make it suitable for the control of automated or fully automatic optimizing cross-cutting of solid wood.
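
    To make the GA-tunes-fuzzy-sets idea concrete, the sketch below evolves the three parameters of a triangular membership function against a placeholder fitness; in the paper the fitness would reflect the yield of the cutting plan, and the encoding and operators would differ.

```python
import random

def triangular(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fitness(params):
    # Placeholder objective: how well the fuzzy set fits some target points
    # (in the paper this would be the yield of defect-free blanks).
    a, b, c = sorted(params)
    targets = [(0.3, 0.2), (0.5, 1.0), (0.7, 0.2)]
    return -sum((triangular(x, a, b, c) - t) ** 2 for x, t in targets)

pop = [[random.random() for _ in range(3)] for _ in range(20)]
for generation in range(50):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                   # truncation selection
    children = []
    for _ in range(10):
        p1, p2 = random.sample(parents, 2)
        child = [(g1 + g2) / 2 for g1, g2 in zip(p1, p2)]  # crossover
        if random.random() < 0.3:        # Gaussian mutation, clipped to [0, 1]
            i = random.randrange(3)
            child[i] = min(1.0, max(0.0, child[i] + random.gauss(0, 0.1)))
        children.append(child)
    pop = parents + children

print(sorted(pop[0]))  # best (a, b, c) found
```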

  14. Credit Scoring by Fuzzy Support Vector Machines with a Novel Membership Function

    Directory of Open Access Journals (Sweden)

    Jian Shi

    2016-11-01

    Full Text Available Due to the recent financial crisis and European debt crisis, credit risk evaluation has become an increasingly important issue for financial institutions. Reliable credit scoring models are crucial for commercial banks to evaluate the financial performance of clients and have been widely studied in the fields of statistics and machine learning. In this paper a novel fuzzy support vector machine (SVM) credit scoring model is proposed for credit risk analysis, in which fuzzy membership is adopted to indicate the different contribution of each input point to the learning of the SVM classification hyperplane. For methodological consistency, support vector data description (SVDD) is introduced to construct the fuzzy membership function and to reduce the effect of outliers and noise. The SVDD-based fuzzy SVM model is tested against the traditional fuzzy SVM on two real-world datasets, and the results confirm the effectiveness of the presented method.
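
    The core mechanism, weighting each training point by a fuzzy membership so that outliers contribute less to the SVM hyperplane, can be approximated with per-sample weights. The distance-to-centroid membership below is a deliberate simplification of the paper's SVDD construction, and the data are synthetic.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Simplified fuzzy membership: points far from their class centroid
# (candidate outliers) get lower weight. The paper instead derives the
# membership from an SVDD description of each class.
memberships = np.empty(len(X))
for c in (0, 1):
    d = np.linalg.norm(X[y == c] - X[y == c].mean(axis=0), axis=1)
    memberships[y == c] = 1.0 - d / (d.max() + 1e-9)

clf = SVC(kernel="rbf")
clf.fit(X, y, sample_weight=memberships)   # fuzzy-weighted training
print(clf.score(X, y))
```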

  15. The "proactive" model of learning: Integrative framework for model-free and model-based reinforcement learning utilizing the associative learning-based proactive brain concept.

    Science.gov (United States)

    Zsuga, Judit; Biro, Klara; Papp, Csaba; Tajti, Gabor; Gesztelyi, Rudolf

    2016-02-01

    Reinforcement learning (RL) is a powerful concept underlying forms of associative learning governed by the use of a scalar reward signal, with learning taking place if expectations are violated. RL may be assessed using model-based and model-free approaches. Model-based reinforcement learning involves the amygdala, the hippocampus, and the orbitofrontal cortex (OFC). The model-free system involves the pedunculopontine-tegmental nucleus (PPTgN), the ventral tegmental area (VTA) and the ventral striatum (VS). Based on the functional connectivity of the VS, the model-free and model-based RL systems center on the VS, which computes value by integrating model-free signals (received as reward prediction error) and model-based reward-related input. Using the concept of a reinforcement learning agent, we propose that the VS serves as the value function component of the RL agent. Regarding the model utilized for model-based computations, we turned to the proactive brain concept, which offers a ubiquitous function for the default network based on its great functional overlap with contextual associative areas. Hence, by means of the default network the brain continuously organizes its environment into context frames, enabling the formulation of analogy-based associations that are turned into predictions of what to expect. The OFC integrates reward-related information into context frames upon computing reward expectation, by compiling the stimulus-reward and context-reward information offered by the amygdala and hippocampus, respectively. Furthermore, we suggest that the integration of model-based reward expectations into the value signal is further supported by the efferents of the OFC that reach structures canonical for model-free learning (e.g., the PPTgN, VTA, and VS). (c) 2016 APA, all rights reserved.

  16. Multi-objective evolutionary algorithms for fuzzy classification in survival prediction.

    Science.gov (United States)

    Jiménez, Fernando; Sánchez, Gracia; Juárez, José M

    2014-03-01

    This paper presents a novel rule-based fuzzy classification methodology for survival/mortality prediction in severely burnt patients. Due to the ethical aspects involved in this medical scenario, physicians tend not to accept a computer-based evaluation unless they understand why and how such a recommendation is given. Therefore, any fuzzy classifier model must be both accurate and interpretable. The proposed methodology is a three-step process: (1) multi-objective constrained optimization on a patient data set, using Pareto-based elitist multi-objective evolutionary algorithms to maximize accuracy and minimize the complexity (number of rules) of classifiers, subject to interpretability constraints; this step produces a set of alternative (Pareto) classifiers; (2) linguistic labeling, which assigns a linguistic label to each fuzzy set of the classifiers; this step is essential to the interpretability of the classifiers; (3) decision making, whereby a classifier is chosen, if it is satisfactory, according to the preferences of the decision maker. If no classifier is satisfactory for the decision maker, the process starts again in step (1) with a different input parameter set. The performance of three multi-objective evolutionary algorithms, the niched pre-selection multi-objective algorithm, the elitist Pareto-based multi-objective evolutionary algorithm for diversity reinforcement (ENORA) and the non-dominated sorting genetic algorithm (NSGA-II), was tested using a patient data set from an intensive care burn unit and a data set from a standard machine learning repository. The results are compared using the hypervolume multi-objective metric. In addition, the results have been compared with other non-evolutionary techniques and validated with a multi-objective cross-validation technique. Our proposal improves the classification rate obtained by other non-evolutionary techniques (decision trees, artificial neural networks, Naive Bayes, and case

  17. Tank War Using Online Reinforcement Learning

    DEFF Research Database (Denmark)

    Toftgaard Andersen, Kresten; Zeng, Yifeng; Dahl Christensen, Dennis

    2009-01-01

    Real-Time Strategy (RTS) games provide a challenging platform for implementing online reinforcement learning (RL) techniques in a real application. A computer, as one player, monitors its opponents' (human or other computers) strategies and then updates its own policy using RL methods. In this paper, we propose...... a multi-layer framework for implementing online RL in an RTS game. The framework significantly reduces the RL computational complexity by decomposing the state space in a hierarchical manner. We implement the RTS game - Tank General, and perform a thorough test on the proposed framework. The results...... show the effectiveness of our proposed framework and shed light on relevant issues on using RL in RTS games....

  18. A Classification Model and an Open E-Learning System Based on Intuitionistic Fuzzy Sets for Instructional Design Concepts

    Science.gov (United States)

    Güyer, Tolga; Aydogdu, Seyhmus

    2016-01-01

    This study suggests a classification model and an e-learning system based on this model for all instructional theories, approaches, models, strategies, methods, and techniques used in the process of instructional design that constitute a direct or indirect resource for educational technology, based on the theory of intuitionistic fuzzy sets…

  19. A Model to Explain the Emergence of Reward Expectancy neurons using Reinforcement Learning and Neural Network

    OpenAIRE

    Shinya, Ishii; Munetaka, Shidara; Katsunari, Shibata

    2006-01-01

    In an experiment with a multi-trial task to obtain a reward, reward expectancy neurons, which responded only in the non-reward trials that are necessary to advance toward the reward, have been observed in the anterior cingulate cortex of monkeys. In this paper, to explain the emergence of the reward expectancy neuron in terms of reinforcement learning theory, a model that consists of a recurrent neural network trained based on reinforcement learning is proposed. The analysis of the hi...

  20. FMRQ-A Multiagent Reinforcement Learning Algorithm for Fully Cooperative Tasks.

    Science.gov (United States)

    Zhang, Zhen; Zhao, Dongbin; Gao, Junwei; Wang, Dongqing; Dai, Yujie

    2017-06-01

    In this paper, we propose a multiagent reinforcement learning algorithm for fully cooperative tasks, called frequency of the maximum reward Q-learning (FMRQ). FMRQ aims to achieve one of the optimal Nash equilibria so as to optimize the performance index in multiagent systems. The frequency of obtaining the highest global immediate reward, instead of the immediate reward itself, is used as the reinforcement signal. With FMRQ each agent does not need to observe the other agents' actions and only shares its state and reward at each step. We validate FMRQ through case studies of repeated games: four cases of two-player two-action games and one case of a three-player two-action game. It is demonstrated that FMRQ can converge to one of the optimal Nash equilibria in these cases. Moreover, comparison experiments on tasks with multiple states and finite steps are conducted: one is box-pushing and the other is a distributed sensor network problem. Experimental results show that the proposed algorithm outperforms the others.
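
    A compressed sketch of the FMRQ signal for a repeated two-player cooperative game: each agent keeps a value per own action and reinforces it with the observed frequency of obtaining the highest global reward, rather than the raw immediate reward. The payoff matrix and hyperparameters are illustrative, and the full algorithm's handling of multiple states is omitted.

```python
import random

# Illustrative 2x2 cooperative game: both agents receive the same reward.
payoff = {(0, 0): 11, (0, 1): -30, (1, 0): -30, (1, 1): 7}
r_max = max(payoff.values())

Q = [[0.0, 0.0], [0.0, 0.0]]                   # Q[agent][own action]
stats = [[[0, 0], [0, 0]], [[0, 0], [0, 0]]]   # [agent][action] -> [tries, hits]
alpha, eps = 0.1, 0.2

for episode in range(5000):
    acts = []
    for i in (0, 1):
        if random.random() < eps:              # epsilon-greedy exploration
            acts.append(random.randrange(2))
        else:
            acts.append(0 if Q[i][0] >= Q[i][1] else 1)
    r = payoff[tuple(acts)]
    for i in (0, 1):
        s = stats[i][acts[i]]
        s[0] += 1                              # times this action was tried
        s[1] += r == r_max                     # times it hit the max global reward
        freq = s[1] / s[0]                     # frequency of the maximum reward
        Q[i][acts[i]] += alpha * (freq - Q[i][acts[i]])

print([0 if Q[i][0] >= Q[i][1] else 1 for i in (0, 1)])  # learned joint action
```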

  1. Dissociable neural representations of reinforcement and belief prediction errors underlie strategic learning.

    Science.gov (United States)

    Zhu, Lusha; Mathewson, Kyle E; Hsu, Ming

    2012-01-31

    Decision-making in the presence of other competitive intelligent agents is fundamental for social and economic behavior. Such decisions require agents to behave strategically, where in addition to learning about the rewards and punishments available in the environment, they also need to anticipate and respond to actions of others competing for the same rewards. However, whereas we know much about strategic learning at both theoretical and behavioral levels, we know relatively little about the underlying neural mechanisms. Here, we show using a multi-strategy competitive learning paradigm that strategic choices can be characterized by extending the reinforcement learning (RL) framework to incorporate agents' beliefs about the actions of their opponents. Furthermore, using this characterization to generate putative internal values, we used model-based functional magnetic resonance imaging to investigate neural computations underlying strategic learning. We found that the distinct notions of prediction errors derived from our computational model are processed in a partially overlapping but distinct set of brain regions. Specifically, we found that the RL prediction error was correlated with activity in the ventral striatum. In contrast, activity in the ventral striatum, as well as the rostral anterior cingulate (rACC), was correlated with a previously uncharacterized belief-based prediction error. Furthermore, activity in rACC reflected individual differences in degree of engagement in belief learning. These results suggest a model of strategic behavior where learning arises from interaction of dissociable reinforcement and belief-based inputs.

  2. Reinforcement Learning with Autonomous Small Unmanned Aerial Vehicles in Cluttered Environments

    Science.gov (United States)

    Tran, Loc; Cross, Charles; Montague, Gilbert; Motter, Mark; Neilan, James; Qualls, Garry; Rothhaar, Paul; Trujillo, Anna; Allen, B. Danette

    2015-01-01

    We present ongoing work in the Autonomy Incubator at NASA Langley Research Center (LaRC) exploring the efficacy of a data set aggregation approach to reinforcement learning for small unmanned aerial vehicle (sUAV) flight in dense and cluttered environments with reactive obstacle avoidance. The goal is to learn an autonomous flight model using training experiences from a human piloting a sUAV around static obstacles. The training approach uses video data from a forward-facing camera that records the human pilot's flight. Various computer-vision-based features are extracted from the video relating to edge and gradient information. The recorded human-controlled inputs are used to train an autonomous control model that correlates the extracted feature vector to a yaw command. As part of the reinforcement learning approach, the autonomous control model is iteratively updated with feedback from a human agent who corrects undesired model output. This data-driven approach to autonomous obstacle avoidance is explored for simulated forest environments, furthering research on autonomous flight under the tree canopy. This enables flight in previously inaccessible environments which are of interest to NASA researchers in Earth and Atmospheric sciences.

  3. Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation.

    Science.gov (United States)

    Kato, Ayaka; Morita, Kenji

    2016-10-01

    It has been suggested that dopamine (DA) represents the reward-prediction-error (RPE) defined in reinforcement learning, and that DA therefore responds to unpredicted but not predicted reward. However, recent studies have found a DA response sustained towards predictable reward in tasks involving self-paced behavior, and suggested that this response represents a motivational signal. We have previously shown that RPE can be sustained if there is decay/forgetting of learned values, which can be implemented as decay of the synaptic strengths storing learned values. This account, however, did not explain the suggested link between tonic/sustained DA and motivation. In the present work, we explored the motivational effects of the value-decay in self-paced approach behavior, modeled as a series of 'Go' or 'No-Go' selections towards a goal. Through simulations, we found that the value-decay can enhance motivation, specifically, facilitate fast goal-reaching, albeit counterintuitively. Mathematical analyses revealed that the underlying potential mechanisms are twofold: (1) decay-induced sustained RPE creates a gradient of 'Go' values towards a goal, and (2) value contrasts between 'Go' and 'No-Go' are generated because, while chosen values are continually updated, unchosen values simply decay. Our model provides potential explanations for the key experimental findings that suggest DA's roles in motivation: (i) slowdown of behavior by post-training blockade of DA signaling, (ii) observations that DA blockade severely impairs effortful actions to obtain rewards while largely sparing seeking of easily obtainable rewards, and (iii) relationships between the reward amount, the level of motivation reflected in the speed of behavior, and the average level of DA. These results indicate that reinforcement learning with value-decay, or forgetting, provides a parsimonious mechanistic account for DA's roles in value-learning and motivation. Our results also suggest that when biological systems for value-learning
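
    The value-decay mechanism can be illustrated with a tabular Q-learner on a linear 'Go'/'No-Go' track: after every update all learned values decay slightly toward zero, which keeps the reward prediction error from vanishing and maintains a value gradient toward the goal. The track length, rates, and rewards below are illustrative, not the paper's settings.

```python
import random

N = 7                                # linear track: states 0..6, reward at the end
alpha, gamma, decay, eps = 0.5, 0.9, 0.01, 0.1
Q = [[0.0, 0.0] for _ in range(N)]   # actions: 0 = 'No-Go' (stay), 1 = 'Go'

for episode in range(2000):
    s = 0
    for _ in range(200):             # cap episode length
        if s == N - 1:
            break
        if random.random() < eps:
            a = random.randrange(2)
        else:                        # greedy with random tie-breaking
            m = max(Q[s])
            a = random.choice([x for x in (0, 1) if Q[s][x] == m])
        s2 = s + 1 if a == 1 else s
        r = 1.0 if s2 == N - 1 else 0.0
        rpe = r + gamma * max(Q[s2]) - Q[s][a]    # reward prediction error
        Q[s][a] += alpha * rpe
        # Forgetting: every stored value decays a little toward zero, so
        # unchosen values shrink and a value gradient toward the goal persists.
        Q = [[q * (1 - decay) for q in row] for row in Q]
        s = s2

print([round(max(row), 2) for row in Q])  # graded 'Go' values along the track
```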

  4. Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation.

    Directory of Open Access Journals (Sweden)

    Ayaka Kato

    2016-10-01

    Full Text Available It has been suggested that dopamine (DA) represents reward-prediction-error (RPE) defined in reinforcement learning and therefore DA responds to unpredicted but not predicted reward. However, recent studies have found DA response sustained towards predictable reward in tasks involving self-paced behavior, and suggested that this response represents a motivational signal. We have previously shown that RPE can sustain if there is decay/forgetting of learned-values, which can be implemented as decay of synaptic strengths storing learned-values. This account, however, did not explain the suggested link between tonic/sustained DA and motivation. In the present work, we explored the motivational effects of the value-decay in self-paced approach behavior, modeled as a series of 'Go' or 'No-Go' selections towards a goal. Through simulations, we found that the value-decay can enhance motivation, specifically, facilitate fast goal-reaching, albeit counterintuitively. Mathematical analyses revealed that underlying potential mechanisms are twofold: (1) decay-induced sustained RPE creates a gradient of 'Go' values towards a goal, and (2) value-contrasts between 'Go' and 'No-Go' are generated because while chosen values are continually updated, unchosen values simply decay. Our model provides potential explanations for the key experimental findings that suggest DA's roles in motivation: (i) slowdown of behavior by post-training blockade of DA signaling, (ii) observations that DA blockade severely impairs effortful actions to obtain rewards while largely sparing seeking of easily obtainable rewards, and (iii) relationships between the reward amount, the level of motivation reflected in the speed of behavior, and the average level of DA. These results indicate that reinforcement learning with value-decay, or forgetting, provides a parsimonious mechanistic account for the DA's roles in value-learning and motivation. Our results also suggest that when biological systems

  5. Vicarious reinforcement learning signals when instructing others.

    Science.gov (United States)

    Apps, Matthew A J; Lesage, Elise; Ramnani, Narender

    2015-02-18

    Reinforcement learning (RL) theory posits that learning is driven by discrepancies between the predicted and actual outcomes of actions (prediction errors [PEs]). In social environments, learning is often guided by similar RL mechanisms. For example, teachers monitor the actions of students and provide feedback to them. This feedback evokes PEs in students that guide their learning. We report the first study that investigates the neural mechanisms that underpin RL signals in the brain of a teacher. Neurons in the anterior cingulate cortex (ACC) signal PEs when learning from the outcomes of one's own actions but also signal information when outcomes are received by others. Does a teacher's ACC signal PEs when monitoring a student's learning? Using fMRI, we studied brain activity in human subjects (teachers) as they taught a confederate (student) action-outcome associations by providing positive or negative feedback. We examined activity time-locked to the students' responses, when teachers infer student predictions and know actual outcomes. We fitted a RL-based computational model to the behavior of the student to characterize their learning, and examined whether a teacher's ACC signals when a student's predictions are wrong. In line with our hypothesis, activity in the teacher's ACC covaried with the PE values in the model. Additionally, activity in the teacher's insula and ventromedial prefrontal cortex covaried with the predicted value according to the student. Our findings highlight that the ACC signals PEs vicariously for others' erroneous predictions, when monitoring and instructing their learning. These results suggest that RL mechanisms, processed vicariously, may underpin and facilitate teaching behaviors. Copyright © 2015 Apps et al.

  6. Neural-Network-Based Fuzzy Logic Navigation Control for Intelligent Vehicles

    Directory of Open Access Journals (Sweden)

    Ahcene Farah

    2002-06-01

    Full Text Available This paper proposes a neural-network-based fuzzy logic system for the navigation control of intelligent vehicles. First, the use of neural networks and fuzzy logic to provide intelligent vehicles with more autonomy and intelligence is discussed. Second, the system for obstacle-avoidance behavior is developed. Fuzzy logic improves the neural network (NN) obstacle-avoidance approach by handling imprecision and rule-based approximate reasoning. After supervised learning, this system must make the vehicle able to achieve two tasks: (1) to make its way towards its target by an NN, and (2) to avoid static or dynamic obstacles by a fuzzy NN capturing the behavior of a human expert. Afterwards, two association phases between each task and the appropriate actions are carried out by trial-and-error learning, and their coordination allows the appropriate action to be decided. Finally, simulation results display the generalization and adaptation abilities of the system when tested in new unexplored environments.

  7. Amygdala and ventral striatum make distinct contributions to reinforcement learning

    Science.gov (United States)

    Costa, Vincent D.; Monte, Olga Dal; Lucas, Daniel R.; Murray, Elisabeth A.; Averbeck, Bruno B.

    2016-01-01

    Reinforcement learning (RL) theories posit that dopaminergic signals are integrated within the striatum to associate choices with outcomes. Often overlooked is that the amygdala also receives dopaminergic input and is involved in Pavlovian processes that influence choice behavior. To determine the relative contributions of the ventral striatum (VS) and amygdala to appetitive RL, we tested rhesus macaques with VS or amygdala lesions on deterministic and stochastic versions of a two-arm bandit reversal learning task. When learning was characterized with an RL model, amygdala lesions, relative to controls, caused general decreases in learning from positive feedback and in choice consistency. By comparison, VS lesions only affected learning in the stochastic task. Moreover, the VS lesions hastened the monkeys' choice reaction times, which emphasized a speed-accuracy tradeoff that accounted for errors in deterministic learning. These results update standard accounts of RL by emphasizing distinct contributions of the amygdala and VS to RL. PMID:27720488

  8. Active-learning strategies: the use of a game to reinforce learning in nursing education. A case study.

    Science.gov (United States)

    Boctor, Lisa

    2013-03-01

    The majority of nursing students are kinesthetic learners, preferring a hands-on, active approach to education. Research shows that active-learning strategies can increase student learning and satisfaction. This study looks at the use of one active-learning strategy, a Jeopardy-style game, 'Nursopardy', to reinforce Fundamentals of Nursing material, aiding in students' preparation for a standardized final exam. The game was created keeping students' varied learning styles and the NCLEX blueprint in mind. The blueprint was used to create 5 categories, with 26 total questions. Student survey results, using a five-point Likert scale, showed that students did find this learning method enjoyable and beneficial to learning. More research is recommended regarding learning outcomes when using active-learning strategies, such as games. Copyright © 2012 Elsevier Ltd. All rights reserved.

  9. Reinforcement learning for dpm of embedded visual sensor nodes

    International Nuclear Information System (INIS)

    Khani, U.; Sadhayo, I. H.

    2014-01-01

    This paper proposes an RL (Reinforcement Learning) based DPM (Dynamic Power Management) technique to learn timeout policies during a visual sensor node's operation across its multiple power/performance states. As opposed to the widely used static timeout policies, our proposed DPM policy, also referred to as OLTP (Online Learning of Timeout Policies), learns to dynamically change the timeout decisions in the different node states, including the non-operational states. The selection of timeout values in different power/performance states of a visual sensing platform is based on workload estimates derived from an ML-ANN (Multi-Layer Artificial Neural Network) and an objective function given by weighted performance and power parameters. The DPM approach is also able to dynamically adjust the power-performance weights online to satisfy a given constraint on either power consumption or performance. Results show that the proposed learning algorithm explores the power-performance tradeoff with non-stationary workload and outperforms other DPM policies. It also performs online adjustment of the tradeoff parameters in order to meet a user-specified constraint. (author)
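
    A toy version of learning a timeout policy: a bandit-style learner picks one of a few candidate timeout values, and the reward trades wasted idle power against wake-up penalties under a synthetic exponential idle-time workload. The paper instead drives the decision with ANN workload estimates and multiple node states; all constants below are invented.

```python
import random

timeouts = [5, 20, 80]        # candidate timeout values in ms (invented)
Q = [0.0, 0.0, 0.0]           # one learned value per candidate timeout
alpha, eps = 0.1, 0.1
P_IDLE, E_WAKE = 1.0, 50.0    # idle power per ms and wake-up energy (invented)

for step in range(10000):
    if random.random() < eps:
        i = random.randrange(len(timeouts))
    else:
        i = max(range(len(timeouts)), key=lambda j: Q[j])
    t = timeouts[i]
    idle = random.expovariate(1 / 40.0)   # synthetic idle period, mean 40 ms
    if idle <= t:
        cost = P_IDLE * idle              # never slept: paid idle power only
    else:
        cost = P_IDLE * t + E_WAKE        # slept after t, paid wake-up energy
    Q[i] += alpha * (-cost - Q[i])        # reward is the negative energy cost

print("learned timeout:", timeouts[max(range(len(timeouts)), key=lambda j: Q[j])])
```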

  10. Identification-based chaos control via backstepping design using self-organizing fuzzy neural networks

    International Nuclear Information System (INIS)

    Peng Yafu; Hsu, C.-F.

    2009-01-01

    This paper proposes identification-based adaptive backstepping control (IABC) for chaotic systems. The IABC system comprises a neural backstepping controller and a robust compensation controller. The neural backstepping controller, containing a self-organizing fuzzy neural network (SOFNN) identifier, is the principal controller, and the robust compensation controller is designed to counteract the effect of the minimum approximation error introduced by the SOFNN identifier. The SOFNN identifier is used to estimate the chaotic dynamic function online, with structure and parameter learning phases of the fuzzy neural network. The structure learning phase consists of the growing and pruning of fuzzy rules; thus the SOFNN identifier can avoid the time-consuming trial-and-error tuning procedure for determining the neural structure of the fuzzy neural network. The parameter learning phase adjusts the interconnection weights of the neural network to achieve favorable approximation performance. Finally, simulation results verify that the proposed IABC can achieve favorable tracking performance.

  11. Adolescent-specific patterns of behavior and neural activity during social reinforcement learning

    OpenAIRE

    Jones, Rebecca M.; Somerville, Leah H.; Li, Jian; Ruberry, Erika J.; Powers, Alisa; Mehta, Natasha; Dyke, Jonathan; Casey, BJ

    2014-01-01

    Humans are sophisticated social beings. Social cues from others are exceptionally salient, particularly during adolescence. Understanding how adolescents interpret and learn from variable social signals can provide insight into the observed shift in social sensitivity during this period. The current study tested 120 participants between the ages of 8 and 25 years on a social reinforcement learning task where the probability of receiving positive social feedback was parametrically manipulated....

  12. On Intuitionistic Fuzzy Filters of Intuitionistic Fuzzy Coframes

    Directory of Open Access Journals (Sweden)

    Rajesh K. Thumbakara

    2013-01-01

    Full Text Available Frame theory is the study of topology based on its open set lattice, and it has been studied extensively by various authors. In this paper, we study quotients of intuitionistic fuzzy filters of an intuitionistic fuzzy coframe. The quotients of intuitionistic fuzzy filters are shown to be filters of the given intuitionistic fuzzy coframe. It is shown that the collection of all intuitionistic fuzzy filters of a coframe and the collection of all intuitionistic fuzzy quotient filters of an intuitionistic fuzzy filter are coframes.

  13. Now comes the time to defuzzify neuro-fuzzy models

    International Nuclear Information System (INIS)

    Bersini, H.; Bontempi, G.

    1996-01-01

    Fuzzy models present a singular Janus face: on one hand, they are knowledge-based software environments constructed from a collection of linguistic IF-THEN rules, and on the other hand, they realize nonlinear mappings which have interesting mathematical properties like low-order interpolation and universal function approximation. Neuro-fuzzy basically provides fuzzy models with the capacity, based on the available data, to compensate for missing human knowledge by an automatic self-tuning of the structure and the parameters. A first consequence of this hybridization between the architectural and representational aspects of fuzzy models and the learning mechanisms of neural networks has been to progressively increase and fuzzify the contrast between the two Janus faces: readability versus performance.

  14. High and low temperatures have unequal reinforcing properties in Drosophila spatial learning.

    Science.gov (United States)

    Zars, Melissa; Zars, Troy

    2006-07-01

    Small insects regulate their body temperature solely through behavior. Thus, sensing environmental temperature and implementing an appropriate behavioral strategy can be critical for survival. The fly Drosophila melanogaster prefers 24 degrees C, avoiding higher and lower temperatures when tested on a temperature gradient. Furthermore, temperatures above 24 degrees C have negative reinforcing properties. In contrast, we found that flies have a preference in operant learning experiments for a low-temperature-associated position rather than the 24 degrees C alternative in the heat-box. Two additional differences between high- and low-temperature reinforcement, i.e., temperatures above and below 24 degrees C, were found. Temperatures equally above and below 24 degrees C did not reinforce equally and only high temperatures supported increased memory performance with reversal conditioning. Finally, low- and high-temperature reinforced memories are similarly sensitive to two genetic mutations. Together these results indicate the qualitative meaning of temperatures below 24 degrees C depends on the dynamics of the temperatures encountered and that the reinforcing effects of these temperatures depend on at least some common genetic components. Conceptualizing these results using the Wolf-Heisenberg model of operant conditioning, we propose the maximum difference in experienced temperatures determines the magnitude of the reinforcement input to a conditioning circuit.

  15. Spared internal but impaired external reward prediction error signals in major depressive disorder during reinforcement learning.

    Science.gov (United States)

    Bakic, Jasmina; Pourtois, Gilles; Jepma, Marieke; Duprat, Romain; De Raedt, Rudi; Baeken, Chris

    2017-01-01

    Major depressive disorder (MDD) has debilitating effects on a wide range of cognitive functions, including reinforcement learning (RL). In this study, we sought to assess whether reward processing as such, or alternatively the complex interplay between motivation and reward, might account for the abnormal reward-based learning in MDD. A total of 35 treatment-resistant MDD patients and 44 age-matched healthy controls (HCs) performed a standard probabilistic learning task. RL was titrated using behavioral, computational modeling and event-related brain potential (ERP) data. MDD patients showed a learning rate comparable to that of HCs. However, they showed decreased lose-shift responses as well as blunted subjective evaluations of the reinforcers used during the task, relative to HCs. Moreover, MDD patients showed normal internal (at the level of error-related negativity, ERN) but abnormal external (at the level of feedback-related negativity, FRN) reward prediction error (RPE) signals during RL, selectively when additional efforts had to be made to establish learning. Collectively, these results lend support to the assumption that MDD does not impair reward processing per se during RL. Instead, it seems to alter the processing of the emotional value of (external) reinforcers during RL, when additional intrinsic motivational processes have to be engaged. © 2016 Wiley Periodicals, Inc.

  16. A TSK neuro-fuzzy approach for modeling highly dynamic systems

    NARCIS (Netherlands)

    Acampora, G.

    2011-01-01

    This paper introduces a new type of TSK-based neuro-fuzzy approach and its application to modeling highly dynamic systems. In detail, our proposal performs adaptive supervised learning on a collection of time series in order to create a so-called Timed Automata Based Fuzzy Controller, i.e. an

  17. Evaluation of a Multi-Variable Self-Learning Fuzzy Logic Controller ...

    African Journals Online (AJOL)

    In spite of the usefulness of fuzzy control, its main drawback comes from the lack of a systematic control design methodology. The most challenging aspect of the design of a fuzzy logic controller is the elicitation of the control rules for its rule base. In this paper, a scheme capable of eliciting acceptable rules for multivariable ...

  18. Emotion in reinforcement learning agents and robots: A survey

    OpenAIRE

    Moerland, T.M.; Broekens, D.J.; Jonker, C.M.

    2018-01-01

    This article provides the first survey of computational models of emotion in reinforcement learning (RL) agents. The survey focuses on agent/robot emotions, and mostly ignores human user emotions. Emotions are recognized as functional in decision-making by influencing motivation and action selection. Therefore, computational emotion models are usually grounded in the agent's decision making architecture, of which RL is an important subclass. Studying emotions in RL-based agents is useful for ...

  19. Risk Mapping of Cutaneous Leishmaniasis via a Fuzzy C Means-based Neuro-Fuzzy Inference System

    Science.gov (United States)

    Akhavan, P.; Karimi, M.; Pahlavani, P.

    2014-10-01

    Finding pathogenic factors and how they spread in the environment has recently become a global demand. Cutaneous Leishmaniasis (CL), caused by Leishmania, is a parasitic disease which can be passed on to humans through the phlebotomus sand fly (vector-borne). Studies show that the economic situation, cultural issues, and environmental and ecological conditions can affect the prevalence of this disease. In this study, data mining is utilized to predict the CL prevalence rate and obtain a risk map, based on environmental parameters that affect CL, using a neuro-fuzzy system. The learning capacity of neural networks on one hand and the reasoning power of fuzzy systems on the other make neuro-fuzzy systems very efficient to use. To predict the CL prevalence rate, an adaptive neuro-fuzzy inference system with a fuzzy-C-means-clustering inference structure was applied to determine the initial membership functions. Given the high incidence of CL in Ilam province, the counties of Ilam, Mehran, and Dehloran were examined and evaluated. The CL prevalence rate was predicted for 2012 by providing maps of the relevant environmental and topographic properties, including temperature, moisture, annual rainfall, vegetation and elevation. Results indicate that the model with the fuzzy C means clustering structure yields acceptable RMSE values on both training and checking data, supporting our analyses. Using the proposed data mining technology, the pattern of spatial disease distribution and vulnerable areas become identifiable, and the map can be used by experts and decision makers in public health as a useful tool for management and optimal decision-making.
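
    Fuzzy C-means, used above to seed the ANFIS membership functions, alternates between a centroid update and a membership update; a compact NumPy version with illustrative data and the usual fuzzifier m = 2 is sketched below.

```python
import numpy as np

def fcm(X, c=3, m=2.0, iters=100):
    """Fuzzy C-means clustering: returns cluster centers and memberships U."""
    rng = np.random.default_rng(0)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)        # each row of U sums to 1
    for _ in range(iters):
        W = U ** m                           # fuzzified memberships
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-9
        # Standard FCM update: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)).
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        U = 1.0 / ratio.sum(axis=2)
    return centers, U

# Illustrative 2-D data; in the paper X would hold environmental features.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(50, 2)) + off for off in ([0, 0], [4, 4], [0, 4])])
centers, U = fcm(X)
print(centers)
```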

  20. Risk Mapping of Cutaneous Leishmaniasis via a Fuzzy C Means-based Neuro-Fuzzy Inference System

    Directory of Open Access Journals (Sweden)

    P. Akhavan

    2014-10-01

    Full Text Available Finding pathogenic factors and how they spread in the environment has recently become a global demand. Cutaneous Leishmaniasis (CL), caused by Leishmania, is a parasitic disease which can be passed on to humans through the phlebotomus sand fly (vector-borne). Studies show that the economic situation, cultural issues, and environmental and ecological conditions can affect the prevalence of this disease. In this study, data mining is utilized to predict the CL prevalence rate and obtain a risk map, based on environmental parameters that affect CL, using a neuro-fuzzy system. The learning capacity of neural networks on one hand and the reasoning power of fuzzy systems on the other make neuro-fuzzy systems very efficient to use. To predict the CL prevalence rate, an adaptive neuro-fuzzy inference system with a fuzzy-C-means-clustering inference structure was applied to determine the initial membership functions. Given the high incidence of CL in Ilam province, the counties of Ilam, Mehran, and Dehloran were examined and evaluated. The CL prevalence rate was predicted for 2012 by providing maps of the relevant environmental and topographic properties, including temperature, moisture, annual rainfall, vegetation and elevation. Results indicate that the model with the fuzzy C means clustering structure yields acceptable RMSE values on both training and checking data, supporting our analyses. Using the proposed data mining technology, the pattern of spatial disease distribution and vulnerable areas become identifiable, and the map can be used by experts and decision makers in public health as a useful tool for management and optimal decision-making.

  1. Deep reinforcement learning for automated radiation adaptation in lung cancer.

    Science.gov (United States)

    Tseng, Huan-Hsin; Luo, Yi; Cui, Sunan; Chien, Jen-Tzung; Ten Haken, Randall K; Naqa, Issam El

    2017-12-01

    To investigate deep reinforcement learning (DRL) based on historical treatment plans for developing automated radiation adaptation protocols for non-small cell lung cancer (NSCLC) patients that aim to maximize tumor local control at reduced rates of radiation pneumonitis grade 2 (RP2). In a retrospective population of 114 NSCLC patients who received radiotherapy, a three-component neural network framework was developed for DRL of dose fractionation adaptation. Large-scale patient characteristics included clinical, genetic, and imaging radiomics features in addition to tumor and lung dosimetric variables. First, a generative adversarial network (GAN) was employed to learn patient population characteristics necessary for DRL training from a relatively limited sample size. Second, a radiotherapy artificial environment (RAE) was reconstructed by a deep neural network (DNN) utilizing both original and synthetic (GAN) data to estimate the transition probabilities for adaptation of personalized radiotherapy treatment courses. Third, a deep Q-network (DQN) was applied to the RAE for choosing the optimal dose in a response-adapted treatment setting. This multicomponent reinforcement learning approach was benchmarked against real clinical decisions applied in an adaptive dose escalation clinical protocol, in which 34 patients were treated based on avid PET signal in the tumor and constrained by a 17.2% normal tissue complication probability (NTCP) limit for RP2. The uncomplicated cure probability (P+) was used as a baseline reward function in the DRL. Taking our adaptive dose escalation protocol as a blueprint for the proposed DRL (GAN + RAE + DQN) architecture, we obtained an automated dose adaptation estimate for use at ∼2/3 of the way into the radiotherapy treatment course. By letting the DQN component freely control the estimated adaptive dose per fraction (ranging from 1-5 Gy), the DRL automatically favored dose

  2. Fuzziness-based active learning framework to enhance hyperspectral image classification performance for discriminative and generative classifiers.

    Directory of Open Access Journals (Sweden)

    Muhammad Ahmad

    Full Text Available Hyperspectral image classification with a limited number of training samples and without loss of accuracy is desirable, as collecting such data is often expensive and time-consuming. However, classifiers trained with limited samples usually end up with a large generalization error. To overcome this problem, we propose a fuzziness-based active learning framework (FALF), in which we implement the idea of selecting optimal training samples to enhance generalization performance for two different kinds of classifiers, discriminative and generative (e.g., SVM and KNN). The optimal samples are selected by first estimating the boundary of each class and then calculating the fuzziness-based distance between each sample and the estimated class boundaries. Those samples that are at smaller distances from the boundaries and have higher fuzziness are chosen as target candidates for the training set. Through detailed experimentation on three publicly available datasets, we show that when trained with the proposed sample selection framework, both classifiers achieved higher classification accuracy and lower processing time with a small amount of training data, as opposed to the case where the training samples were selected randomly. Our experiments demonstrate the effectiveness of the proposed method, which compares favorably with state-of-the-art methods.
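
    The selection rule, querying the unlabeled samples whose predicted class memberships are most fuzzy, can be sketched with any classifier that outputs class probabilities. The version below scores KNN posteriors with an entropy-style fuzziness measure and is only a schematic of FALF, which additionally uses the distance to the estimated class boundaries.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def fuzziness(p, eps=1e-12):
    """Per-sample fuzziness of a membership vector (maximal when uniform)."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(p * np.log(p) + (1 - p) * np.log(1 - p), axis=1)

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))                # synthetic stand-in for pixels
y = (X[:, :2].sum(axis=1) > 0).astype(int)

labeled = list(range(20))                    # small initial training set
pool = list(range(20, 500))                  # unlabeled candidate pool

for round_ in range(5):
    clf = KNeighborsClassifier(n_neighbors=5).fit(X[labeled], y[labeled])
    proba = clf.predict_proba(X[pool])
    # Query the most ambiguous (highest-fuzziness) pool samples.
    query = np.argsort(fuzziness(proba))[-10:]
    for q in sorted(query, reverse=True):    # pop from the end to keep indices valid
        labeled.append(pool.pop(q))

print(len(labeled), "labeled samples after active selection")
```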

  3. Reinforcement learning signals in the human striatum distinguish learners from nonlearners during reward-based decision making.

    Science.gov (United States)

    Schönberg, Tom; Daw, Nathaniel D; Joel, Daphna; O'Doherty, John P

    2007-11-21

    The computational framework of reinforcement learning has been used to forward our understanding of the neural mechanisms underlying reward learning and decision-making behavior. It is known that humans vary widely in their performance in decision-making tasks. Here, we used a simple four-armed bandit task in which subjects are almost evenly split into two groups on the basis of their performance: those who do learn to favor choice of the optimal action and those who do not. Using models of reinforcement learning we sought to determine the neural basis of these intrinsic differences in performance by scanning both groups with functional magnetic resonance imaging. We scanned 29 subjects while they performed the reward-based decision-making task. Our results suggest that these two groups differ markedly in the degree to which reinforcement learning signals in the striatum are engaged during task performance. While the learners showed robust prediction error signals in both the ventral and dorsal striatum during learning, the nonlearner group showed a marked absence of such signals. Moreover, the magnitude of prediction error signals in a region of dorsal striatum correlated significantly with a measure of behavioral performance across all subjects. These findings support a crucial role of prediction error signals, likely originating from dopaminergic midbrain neurons, in enabling learning of action selection preferences on the basis of obtained rewards. Thus, spontaneously observed individual differences in decision making performance demonstrate the suggested dependence of this type of learning on the functional integrity of the dopaminergic striatal system in humans.

  4. Fundamentals of computational intelligence neural networks, fuzzy systems, and evolutionary computation

    CERN Document Server

    Keller, James M; Fogel, David B

    2016-01-01

    This book covers the three fundamental topics that form the basis of computational intelligence: neural networks, fuzzy systems, and evolutionary computation. The text focuses on inspiration, design, theory, and practical aspects of implementing procedures to solve real-world problems. While other books in the three fields that comprise computational intelligence are written by specialists in one discipline, this book is co-written by a former Editor-in-Chief of IEEE Transactions on Neural Networks and Learning Systems, a former Editor-in-Chief of IEEE Transactions on Fuzzy Systems, and the founding Editor-in-Chief of IEEE Transactions on Evolutionary Computation. The coverage across the three topics is both uniform and consistent in style and notation. Discusses single-layer and multilayer neural networks, radial-basis function networks, and recurrent neural networks. Covers fuzzy set theory, fuzzy relations, fuzzy logic inference, fuzzy clustering and classification, fuzzy measures and fuzz...

  5. Reinforcement Learning in Distributed Domains: Beyond Team Games

    Science.gov (United States)

    Wolpert, David H.; Sill, Joseph; Turner, Kagan

    2000-01-01

    Distributed search algorithms are crucial in dealing with large optimization problems, particularly when a centralized approach is not only impractical but infeasible. Many machine learning concepts have been applied to search algorithms in order to improve their effectiveness. In this article we present an algorithm that blends Reinforcement Learning (RL) and hill climbing directly, by using the RL signal to guide the exploration step of a hill climbing algorithm. We apply this algorithm to the domain of a constellation of communication satellites, where the goal is to minimize the loss of importance-weighted data. We introduce the concept of 'ghost' traffic, where correctly setting this traffic induces the satellites to act so as to optimize the world utility. Our results indicate that the bi-utility search introduced in this paper outperforms both traditional hill climbing algorithms and distributed RL approaches such as team games.

  6. Cardiovascular Dysautonomias Diagnosis Using Crisp and Fuzzy Decision Tree: A Comparative Study.

    Science.gov (United States)

    Kadi, Ilham; Idri, Ali

    2016-01-01

    Decision trees (DTs) are one of the most popular techniques for learning classification systems, especially when it comes to learning from discrete examples. In the real world, much data occurs in a fuzzy form, and a DT must be able to deal with such fuzzy data. In fact, integrating fuzzy logic when dealing with imprecise and uncertain data reduces uncertainty and provides the ability to model fine knowledge details. In this paper, a fuzzy decision tree (FDT) algorithm was applied to a dataset extracted from the ANS (Autonomic Nervous System) unit of the Moroccan university hospital Avicenne. This unit specializes in performing several dynamic tests to diagnose patients with autonomic disorders and suggest the appropriate treatment. A set of fuzzy classifiers was generated using FID 3.4. The error rates of the generated FDTs were calculated to measure their performance. Moreover, a comparison between the error rates obtained using crisp DTs and FDTs was carried out and proved that the results of FDTs were better than those obtained using crisp DTs.

  7. Cerebellar and prefrontal cortex contributions to adaptation, strategies, and reinforcement learning.

    Science.gov (United States)

    Taylor, Jordan A; Ivry, Richard B

    2014-01-01

    Traditionally, motor learning has been studied as an implicit learning process, one in which movement errors are used to improve performance in a continuous, gradual manner. The cerebellum figures prominently in this literature given well-established ideas about the role of this system in error-based learning and the production of automatized skills. Recent developments have brought into focus the relevance of multiple learning mechanisms for sensorimotor learning. These include processes involving repetition, reinforcement learning, and strategy utilization. We examine these developments, considering their implications for understanding cerebellar function and how this structure interacts with other neural systems to support motor learning. Converging lines of evidence from behavioral, computational, and neuropsychological studies suggest a fundamental distinction between processes that use error information to improve action execution or action selection. While the cerebellum is clearly linked to the former, its role in the latter remains an open question. © 2014 Elsevier B.V. All rights reserved.

  8. Fuzzy Mutual Information Based min-Redundancy and Max-Relevance Heterogeneous Feature Selection

    Directory of Open Access Journals (Sweden)

    Daren Yu

    2011-08-01

    Full Text Available Feature selection is an important preprocessing step in pattern classification and machine learning, and mutual information is widely used to measure relevance between features and decision. However, it is difficult to directly calculate relevance between continuous or fuzzy features using mutual information. In this paper we introduce the fuzzy information entropy and fuzzy mutual information for computing relevance between numerical or fuzzy features and decision. The relationship between fuzzy information entropy and differential entropy is also discussed. Moreover, we combine fuzzy mutual information with "min-Redundancy-Max-Relevance", "Max-Dependency" and "min-Redundancy-Max-Dependency" algorithms. The performance and stability of the proposed algorithms are tested on benchmark data sets. Experimental results show the proposed algorithms are effective and stable.
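
    A greedy min-Redundancy-Max-Relevance loop is easy to state in code; the sketch below uses ordinary discrete mutual information as a stand-in for the fuzzy mutual information defined in the paper, so it assumes the features have been discretized.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def mrmr(X, y, k):
    """Greedy min-Redundancy-Max-Relevance selection of k features.

    Uses discrete mutual information in place of the paper's fuzzy
    mutual information; X is assumed to hold discretized features.
    """
    n_feat = X.shape[1]
    selected, candidates = [], list(range(n_feat))
    relevance = [mutual_info_score(X[:, j], y) for j in range(n_feat)]
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in candidates:
            # Redundancy: mean MI between candidate and already-chosen features.
            red = (np.mean([mutual_info_score(X[:, j], X[:, s]) for s in selected])
                   if selected else 0.0)
            score = relevance[j] - red        # relevance minus redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy discretized data: 6 features, 200 samples.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 6))
y = (X[:, 0] + X[:, 1] > 2).astype(int)
print(mrmr(X, y, 3))
```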

  9. Type-2 fuzzy logic uncertain systems’ modeling and control

    CERN Document Server

    Antão, Rómulo

    2017-01-01

    This book focuses on a particular domain of Type-2 Fuzzy Logic, related to process modeling and control applications. It deepens readers' understanding of Type-2 Fuzzy Logic with regard to the following three topics: using simpler methods to train a Type-2 Takagi-Sugeno Fuzzy Model; using the principles of Type-2 Fuzzy Logic to reduce the influence of modeling uncertainties on a locally linear n-step-ahead predictor; and developing model-based control algorithms according to the Generalized Predictive Control principles using Type-2 Fuzzy Sets. Throughout the book, theory is always complemented with practical applications and readers are invited to take their learning process one step further and implement their own applications using the algorithms' source codes (provided). As such, the book offers a valuable reference guide for all engineers and researchers in the field of computer science who are interested in intelligent systems, rule-based systems and modeling uncertainty.

  10. Fuzzy Logic Inference System for Determining The Quality Assesment of Student’s Learning ICT

    Directory of Open Access Journals (Sweden)

    Agus Pamuji

    2017-05-01

    Full Text Available Assessment held in schools is one part of the learning process in education and is carried out by teachers. One of the courses examined is Computer Application, which covers three topics: Microsoft Word, Microsoft Excel, and Microsoft PowerPoint. The assessment of polytechnic students learning the computer application has three selection criteria: the ability to operate a computer system in general, understanding of formulas in Microsoft Excel, and skill with each application. In this study, fuzzy logic is used to determine the quality assessment of students' learning of Information and Communication Technology (ICT), with the constraints analyzed using what is known as the min-max method. As a result, we found that the students performed well on the questions and case studies examined in the course.
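
    The min-max method mentioned above combines rule antecedents with min (fuzzy AND); the sketch below uses crisp consequents and a weighted average in place of full max-aggregation-plus-defuzzification, with invented fuzzy sets and rules on a 0-100 score scale.

```python
def rising(x, a, b):
    """Membership 0 below a, rising linearly to 1 at b (a 'high' score)."""
    return min(1.0, max(0.0, (x - a) / (b - a)))

def falling(x, a, b):
    """Membership 1 below a, falling linearly to 0 at b (a 'low' score)."""
    return 1.0 - rising(x, a, b)

def assess(excel, word):
    """Two-rule fuzzy assessment of 0-100 scores (invented sets and rules)."""
    # Rule firing strengths: fuzzy AND realized as min.
    w_good = min(rising(excel, 40, 80), rising(word, 40, 80))    # both high -> good
    w_poor = min(falling(excel, 20, 60), falling(word, 20, 60))  # both low  -> poor
    # With crisp consequents (good = 90, poor = 40) a weighted average
    # stands in for max aggregation plus centroid defuzzification.
    total = w_good + w_poor
    return (90 * w_good + 40 * w_poor) / total if total else None

print(assess(85, 70))   # -> 90.0 for a fairly strong student
```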

  11. Countable Fuzzy Topological Space and Countable Fuzzy Topological Vector Space

    Directory of Open Access Journals (Sweden)

    Apu Kumar Saha

    2015-06-01

    Full Text Available This paper deals with countable fuzzy topological spaces, a generalization of the notion of fuzzy topological spaces. A collection of fuzzy sets F on a universe X forms a countable fuzzy topology if in the definition of a fuzzy topology, the condition of arbitrary supremum is relaxed to countable supremum. In this generalized fuzzy structure, the continuity of fuzzy functions and some other related properties are studied. Also the class of countable fuzzy topological vector spaces as a generalization of the class of fuzzy topological vector spaces has been introduced and investigated.

  12. Identification and prediction of dynamic systems using an interactively recurrent self-evolving fuzzy neural network.

    Science.gov (United States)

    Lin, Yang-Yin; Chang, Jyh-Yeong; Lin, Chin-Teng

    2013-02-01

    This paper presents a novel recurrent fuzzy neural network, called an interactively recurrent self-evolving fuzzy neural network (IRSFNN), for the prediction and identification of dynamic systems. The recurrent structure in an IRSFNN is formed as external loops and internal feedback by feeding the rule firing strength of each rule to other rules and to itself. The consequent part in the IRSFNN is of a Takagi-Sugeno-Kang (TSK) or functional-link-based type. The proposed IRSFNN employs a functional link neural network (FLNN) in the consequent part of the fuzzy rules to promote the mapping ability. Unlike in a TSK-type fuzzy neural network, the FLNN in the consequent part is a nonlinear function of the input variables. An IRSFNN's learning starts with an empty rule base, and all of the rules are generated and learned online through simultaneous structure and parameter learning. An online clustering algorithm is effective in generating fuzzy rules. The consequent update parameters are derived by a variable-dimensional Kalman filter algorithm. The premise and recurrent parameters are learned through a gradient descent algorithm. We test the IRSFNN on the prediction and identification of dynamic plants and compare it to other well-known recurrent FNNs. The proposed model obtains enhanced performance results.

  13. Neuro-fuzzy controller to navigate an unmanned vehicle.

    Science.gov (United States)

    Selma, Boumediene; Chouraqui, Samira

    2013-12-01

    A neuro-fuzzy control method for an Unmanned Vehicle (UV) simulation is described. The objective is guiding an autonomous vehicle to a desired destination along a desired path in an environment characterized by a terrain and a set of distinct objects, such as obstacles like donkeys, traffic lights and cars circulating on the trajectory. The autonomous navigation ability and road-following precision are mainly influenced by the control strategy and real-time control performance. A Fuzzy Logic Controller can describe the desired system behavior very well with simple "if-then" relations, allowing the designer to derive "if-then" rules manually by trial and error. On the other hand, Neural Networks perform function approximation of a system but cannot interpret the solution obtained nor check whether it is plausible. The two approaches are complementary: combining them, Neural Networks provide learning capability while Fuzzy Logic brings knowledge representation (Neuro-Fuzzy). In this paper, an artificial neural network fuzzy inference system (ANFIS) controller is described and implemented to navigate the autonomous vehicle. Results show several improvements in the control system adjusted by neuro-fuzzy techniques in comparison to previous methods like the Artificial Neural Network (ANN).

  14. Neural mechanisms of reinforcement learning in unmedicated patients with major depressive disorder.

    Science.gov (United States)

    Rothkirch, Marcus; Tonn, Jonas; Köhler, Stephan; Sterzer, Philipp

    2017-04-01

    According to current concepts, major depressive disorder is strongly related to dysfunctional neural processing of motivational information, entailing impairments in reinforcement learning. While computational modelling can reveal the precise nature of neural learning signals, it has not been used to study learning-related neural dysfunctions in unmedicated patients with major depressive disorder so far. We thus aimed at comparing the neural coding of reward and punishment prediction errors, representing indicators of neural learning-related processes, between unmedicated patients with major depressive disorder and healthy participants. To this end, a group of unmedicated patients with major depressive disorder (n = 28) and a group of age- and sex-matched healthy control participants (n = 30) completed an instrumental learning task involving monetary gains and losses during functional magnetic resonance imaging. The two groups did not differ in their learning performance. Patients and control participants showed the same level of prediction error-related activity in the ventral striatum and the anterior insula. In contrast, neural coding of reward prediction errors in the medial orbitofrontal cortex was reduced in patients. Moreover, neural reward prediction error signals in the medial orbitofrontal cortex and ventral striatum showed negative correlations with anhedonia severity. Using a standard instrumental learning paradigm we found no evidence for an overall impairment of reinforcement learning in medication-free patients with major depressive disorder. Importantly, however, the attenuated neural coding of reward in the medial orbitofrontal cortex and the relation between anhedonia and reduced reward prediction error-signalling in the medial orbitofrontal cortex and ventral striatum likely reflect an impairment in experiencing pleasure from rewarding events as a key mechanism of anhedonia in major depressive disorder.
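
    The prediction errors referred to above are typically obtained from a simple incremental value model fit to behavior; a minimal Rescorla-Wagner-style sketch (learning rate and initialization are illustrative, not taken from the study):

    ```python
    def rescorla_wagner(outcomes, alpha=0.1, q0=0.0):
        """Track an expected value and the trial-by-trial prediction errors
        delta_t = r_t - Q_t that are typically regressed against BOLD signals."""
        q, deltas = q0, []
        for r in outcomes:
            delta = r - q          # reward (delta > 0) or punishment (delta < 0) PE
            q += alpha * delta     # incremental value update
            deltas.append(delta)
        return deltas

    print(rescorla_wagner([1, 0, 1, 1, 0]))
    ```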

  15. Development of neural network driven fuzzy controller for outlet sodium temperature of DHX

    International Nuclear Information System (INIS)

    Okusa, Kyoichi; Endou, Akira; Yoshikawa, Shinji; Ozawa, Kenji

    1996-01-01

    Fuzzy controllers can precisely control non-linear dynamic systems over a wide operating range, using a linguistic description to define the control law. However, the selection and definition of the fuzzy rules and sets require a tedious trial-and-error process based on experience. As a method to overcome this limitation, a neural-network-driven fuzzy control (NDF), where the learning capability of the neural network (NN) is used to build the fuzzy rules and sets, is presented in this paper. In the NDF control, the IF part of a fuzzy control is represented by a multilayer NN, while the THEN part is represented by a series of multilayer NNs which calculate the desirable control action. In this work the usual stepwise variable reduction method, used for selecting the input variables of the THEN-part NN, is replaced with a learning algorithm with a forgetting mechanism that automatically reduces the variables and tunes the whole fuzzy control law, i.e., the membership functions. The NDF has been successfully applied to control the outlet sodium temperature of a dump heat exchanger (DHX) of an FBR plant.

  16. Optimality Conditions for Fuzzy Number Quadratic Programming with Fuzzy Coefficients

    Directory of Open Access Journals (Sweden)

    Xue-Gang Zhou

    2014-01-01

    Full Text Available The purpose of the present paper is to investigate optimality conditions and duality theory in fuzzy number quadratic programming (FNQP), in which the objective function is a fuzzy quadratic function with fuzzy number coefficients and the constraints are fuzzy linear functions with fuzzy number coefficients. Firstly, the equivalent quadratic programming of FNQP is presented by utilizing a linear ranking function, and the dual of the fuzzy number quadratic programming primal problem is introduced. Secondly, we present optimality conditions for fuzzy number quadratic programming. We then prove several duality results for fuzzy number quadratic programming problems with fuzzy coefficients.

  17. Multiobjective Reinforcement Learning for Traffic Signal Control Using Vehicular Ad Hoc Network

    Directory of Open Access Journals (Sweden)

    Houli Duan

    2010-01-01

    Full Text Available We propose a new multiobjective control algorithm based on reinforcement learning for urban traffic signal control, named multi-RL. A multiagent structure is used to describe the traffic system. A vehicular ad hoc network is used for data exchange among agents. A reinforcement learning algorithm is applied to predict the overall value of the optimization objective given the vehicles' states. The policy which minimizes the cumulative value of the optimization objective is regarded as the optimal one. In order to make the method adaptive to various traffic conditions, we also introduce a multiobjective control scheme in which the optimization objective is selected adaptively according to real-time traffic states. The optimization objectives include the number of vehicle stops, the average waiting time, and the maximum queue length of the next intersection. In addition, our model accommodates priority control for buses and emergency vehicles. The simulation results indicate that our algorithm performs more efficiently than traditional traffic light control methods.
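
    A minimal sketch of the kind of per-intersection learning update described above, assuming a tabular Q-learning agent that minimizes whichever cost objective is currently selected; the state encoding, action set, and objective-selection rule are illustrative, not the paper's:

    ```python
    import random
    from collections import defaultdict

    class TrafficSignalAgent:
        """Tabular Q-learning for one intersection: the cost signal (stops,
        waiting time, or queue length) is supplied per step, mirroring the
        adaptive objective selection described above."""

        def __init__(self, actions, alpha=0.1, gamma=0.9, eps=0.1):
            self.q = defaultdict(float)
            self.actions, self.alpha, self.gamma, self.eps = actions, alpha, gamma, eps

        def act(self, state):
            if random.random() < self.eps:                       # explore
                return random.choice(self.actions)
            return max(self.actions, key=lambda a: self.q[(state, a)])

        def update(self, state, action, cost, next_state):
            # Minimizing a cost is Q-learning with reward = -cost.
            best_next = max(self.q[(next_state, a)] for a in self.actions)
            target = -cost + self.gamma * best_next
            self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
    ```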

  18. Estimation of Fuzzy Measures Using Covariance Matrices in Gaussian Mixtures

    Directory of Open Access Journals (Sweden)

    Nishchal K. Verma

    2012-01-01

    Full Text Available This paper presents a novel computational approach for estimating fuzzy measures directly from a Gaussian mixture model (GMM). The mixture components of the GMM provide the membership functions for the input-output fuzzy sets. By treating the consequent part as a function of fuzzy measures, we derive its coefficients from the covariance matrices obtained directly from the GMM; the defuzzified output, constructed from both the premise and consequent parts of the nonadditive fuzzy rules, takes the form of a Choquet integral. The computational burden involved in solving for the λ-measure is minimized using a Q-measure. The fuzzy model whose fuzzy measures were computed using the covariance matrices found in the GMM has been successfully applied to two benchmark problems and one real-world electric load dataset from an Indian utility. The performance of the resulting model in many experimental studies, including the above-mentioned application, is found to be better than or comparable to recent fuzzy models. The main contribution of this paper is the efficient estimation of fuzzy measures directly from the covariance matrices found in the GMM, avoiding the computational burden of learning them iteratively and of solving polynomial equations whose order equals the number of input-output variables.
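
    For concreteness, a small sketch of the discrete Choquet integral with respect to a Sugeno λ-measure, the aggregation form named above. Here the singleton densities and λ are simply given; in the paper the densities come from the GMM covariances, and the polynomial root-finding for λ is avoided via a Q-measure:

    ```python
    import numpy as np

    def lambda_measure(densities, subset, lam):
        """Sugeno lambda-measure built from singleton densities:
        g(A U {i}) = g(A) + g_i + lam * g(A) * g_i (fold is order-independent)."""
        g = 0.0
        for i in subset:
            g = g + densities[i] + lam * g * densities[i]
        return g

    def choquet(values, densities, lam):
        """Discrete Choquet integral of `values` w.r.t. a lambda-measure."""
        order = np.argsort(values)                 # ascending
        prev, total = 0.0, 0.0
        for k, idx in enumerate(order):
            coalition = order[k:]                  # inputs with value >= current
            total += (values[idx] - prev) * lambda_measure(densities, coalition, lam)
            prev = values[idx]
        return total

    print(choquet(np.array([0.2, 0.7, 0.5]), {0: 0.3, 1: 0.4, 2: 0.2}, lam=0.5))
    ```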

  19. Fuzzy Itô Integral Driven by a Fuzzy Brownian Motion

    Directory of Open Access Journals (Sweden)

    Didier Kumwimba Seya

    2015-11-01

    Full Text Available In this paper we take into account the fuzzy stochastic integral driven by fuzzy Brownian motion. To define the metric between two fuzzy numbers and to take the limit of a sequence of fuzzy numbers, we invoke the Hausdorff metric. This fuzzy stochastic integral is first constructed for fuzzy simple stochastic functions, and then the construction is extended to fuzzy stochastically integrable functions.

  20. Rough-fuzzy pattern recognition applications in bioinformatics and medical imaging

    CERN Document Server

    Maji, Pradipta

    2012-01-01

    Learn how to apply rough-fuzzy computing techniques to solve problems in bioinformatics and medical image processing Emphasizing applications in bioinformatics and medical image processing, this text offers a clear framework that enables readers to take advantage of the latest rough-fuzzy computing techniques to build working pattern recognition models. The authors explain step by step how to integrate rough sets with fuzzy sets in order to best manage the uncertainties in mining large data sets. Chapters are logically organized according to the major phases of pattern recognition systems dev

  1. The Study of Reinforcement Learning for Traffic Self-Adaptive Control under Multiagent Markov Game Environment

    Directory of Open Access Journals (Sweden)

    Lun-Hui Xu

    2013-01-01

    Full Text Available The urban traffic self-adaptive control problem is dynamic and uncertain, so the states of the traffic environment are hard to observe. An efficient agent which controls a single intersection can be discovered automatically via multiagent reinforcement learning. However, in the majority of previous works on this approach, each agent needed perfectly observed information when interacting with the environment and learned individually, with less efficient coordination. This study casts traffic self-adaptive control as a multiagent Markov game problem. The design employs a traffic signal control agent (TSCA) for each signalized intersection that coordinates with neighboring TSCAs. A mathematical model for the TSCAs' interaction is built based on a nonzero-sum Markov game, which is applied to let TSCAs learn how to cooperate. A multiagent Markov game reinforcement learning approach is constructed on the basis of single-agent Q-learning. This method lets each TSCA learn to update its Q-values under joint actions and imperfect information. The convergence of the proposed algorithm is analyzed theoretically. The simulation results show that the proposed method is convergent and effective in a realistic traffic self-adaptive control setting.
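
    A sketch of a joint-action Q-update in the spirit of the Markov-game extension described above. The value of the next state is simplified here to a max over own actions of the average over a neighbor's actions; the paper's actual equilibrium operator differs:

    ```python
    from collections import defaultdict

    class MarkovGameTSCA:
        """Traffic signal control agent learning Q-values over joint actions
        (its own and a neighbor's). The next-state value below is a
        simplification, not the paper's exact equilibrium computation."""

        def __init__(self, own_actions, neighbor_actions, alpha=0.1, gamma=0.95):
            self.q = defaultdict(float)   # key: (state, own_action, neighbor_action)
            self.own, self.nbr = own_actions, neighbor_actions
            self.alpha, self.gamma = alpha, gamma

        def next_state_value(self, s):
            # Max over own actions of the average over the neighbor's actions.
            return max(
                sum(self.q[(s, a, b)] for b in self.nbr) / len(self.nbr)
                for a in self.own
            )

        def update(self, s, a, b, reward, s_next):
            key = (s, a, b)
            target = reward + self.gamma * self.next_state_value(s_next)
            self.q[key] += self.alpha * (target - self.q[key])
    ```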

  2. Solving fully fuzzy transportation problem using pentagonal fuzzy numbers

    Science.gov (United States)

    Maheswari, P. Uma; Ganesan, K.

    2018-04-01

    In this paper, we propose a simple approach for the solution of the fuzzy transportation problem in a fuzzy environment, in which the transportation costs, supplies at sources, and demands at destinations are represented by pentagonal fuzzy numbers. The fuzzy transportation problem is solved without converting it to its equivalent crisp form, using a robust ranking technique and a new fuzzy arithmetic on pentagonal fuzzy numbers. A numerical example illustrates the proposed approach.
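
    A toy illustration of ranking-based comparison of pentagonal fuzzy costs. The centroid-style average used here is an assumption for brevity; the paper's robust ranking technique integrates over α-cuts:

    ```python
    def rank_pentagonal(p):
        """Crisp ranking score of a pentagonal fuzzy number (a1, ..., a5).
        Simple average of the five vertices; the paper's ranking may
        weight the points differently."""
        return sum(p) / 5.0

    # Compare two fuzzy transportation costs without converting the whole
    # problem to crisp form.
    c1 = (1, 2, 3, 4, 5)
    c2 = (2, 3, 4, 5, 6)
    print(rank_pentagonal(c1) < rank_pentagonal(c2))  # True: c1 is cheaper
    ```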

  3. TensorFlow Agents: Efficient Batched Reinforcement Learning in TensorFlow

    OpenAIRE

    Hafner, Danijar; Davidson, James; Vanhoucke, Vincent

    2017-01-01

    We introduce TensorFlow Agents, an efficient infrastructure paradigm for building parallel reinforcement learning algorithms in TensorFlow. We simulate multiple environments in parallel, and group them to perform the neural network computation on a batch rather than individual observations. This allows the TensorFlow execution engine to parallelize computation, without the need for manual synchronization. Environments are stepped in separate Python processes to progress them in parallel witho...
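
    A toy illustration of the batching idea only: several environment states stepped with one vectorized call, so the policy network sees a batch of observations. This is not the TensorFlow Agents API, and the per-environment Python processes it uses are omitted:

    ```python
    import numpy as np

    class BatchedEnvs:
        """Hold several environment states and step them all at once, so a
        policy network can act on a batch rather than single observations."""

        def __init__(self, n_envs, obs_dim):
            self.states = np.zeros((n_envs, obs_dim))

        def step(self, actions):
            # Stand-in dynamics: each environment drifts by its action.
            self.states += actions
            rewards = -np.linalg.norm(self.states, axis=1)  # stay near origin
            return self.states.copy(), rewards

    envs = BatchedEnvs(n_envs=8, obs_dim=4)
    obs, rew = envs.step(np.random.uniform(-1, 1, size=(8, 4)))
    print(obs.shape, rew.shape)  # (8, 4) (8,)
    ```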

  4. Measuring reinforcement learning and motivation constructs in experimental animals: relevance to the negative symptoms of schizophrenia

    Science.gov (United States)

    Markou, Athina; Salamone, John D.; Bussey, Timothy; Mar, Adam; Brunner, Daniela; Gilmour, Gary; Balsam, Peter

    2013-01-01

    The present review article summarizes and expands upon the discussions that were initiated during a meeting of the Cognitive Neuroscience Treatment Research to Improve Cognition in Schizophrenia (CNTRICS; http://cntrics.ucdavis.edu). A major goal of the CNTRICS meeting was to identify experimental procedures and measures that can be used in laboratory animals to assess psychological constructs that are related to the psychopathology of schizophrenia. The issues discussed in this review reflect the deliberations of the Motivation Working Group of the CNTRICS meeting, which included most of the authors of this article as well as additional participants. After receiving task nominations from the general research community, this working group was asked to identify experimental procedures in laboratory animals that can assess aspects of reinforcement learning and motivation that may be relevant for research on the negative symptoms of schizophrenia, as well as other disorders characterized by deficits in reinforcement learning and motivation. The tasks described here that assess reinforcement learning are the Autoshaping Task, Probabilistic Reward Learning Tasks, and the Response Bias Probabilistic Reward Task. The tasks described here that assess motivation are Outcome Devaluation and Contingency Degradation Tasks and Effort-Based Tasks. In addition to describing such methods and procedures, the present article provides a working vocabulary for research and theory in this field, as well as an industry perspective about how such tasks may be used in drug discovery. It is hoped that this review can aid investigators who are conducting research in this complex area, promote translational studies by highlighting shared research goals and fostering a common vocabulary across basic and clinical fields, and facilitate the development of medications for the treatment of symptoms mediated by reinforcement learning and motivational deficits. PMID:23994273

  5. Measuring reinforcement learning and motivation constructs in experimental animals: relevance to the negative symptoms of schizophrenia.

    Science.gov (United States)

    Markou, Athina; Salamone, John D; Bussey, Timothy J; Mar, Adam C; Brunner, Daniela; Gilmour, Gary; Balsam, Peter

    2013-11-01

    The present review article summarizes and expands upon the discussions that were initiated during a meeting of the Cognitive Neuroscience Treatment Research to Improve Cognition in Schizophrenia (CNTRICS; http://cntrics.ucdavis.edu). A major goal of the CNTRICS meeting was to identify experimental procedures and measures that can be used in laboratory animals to assess psychological constructs that are related to the psychopathology of schizophrenia. The issues discussed in this review reflect the deliberations of the Motivation Working Group of the CNTRICS meeting, which included most of the authors of this article as well as additional participants. After receiving task nominations from the general research community, this working group was asked to identify experimental procedures in laboratory animals that can assess aspects of reinforcement learning and motivation that may be relevant for research on the negative symptoms of schizophrenia, as well as other disorders characterized by deficits in reinforcement learning and motivation. The tasks described here that assess reinforcement learning are the Autoshaping Task, Probabilistic Reward Learning Tasks, and the Response Bias Probabilistic Reward Task. The tasks described here that assess motivation are Outcome Devaluation and Contingency Degradation Tasks and Effort-Based Tasks. In addition to describing such methods and procedures, the present article provides a working vocabulary for research and theory in this field, as well as an industry perspective about how such tasks may be used in drug discovery. It is hoped that this review can aid investigators who are conducting research in this complex area, promote translational studies by highlighting shared research goals and fostering a common vocabulary across basic and clinical fields, and facilitate the development of medications for the treatment of symptoms mediated by reinforcement learning and motivational deficits.

  6. Switching Reinforcement Learning for Continuous Action Space

    Science.gov (United States)

    Nagayoshi, Masato; Murao, Hajime; Tamaki, Hisashi

    Reinforcement Learning (RL) attracts much attention as a technique for realizing computational intelligence, such as adaptive and autonomous decentralized systems. In general, however, it is not easy to put RL into practical use. One difficulty is the problem of designing a suitable action space for an agent, i.e., satisfying two requirements in trade-off: (i) to keep the characteristics (or structure) of the original search space as much as possible in order to seek strategies that lie close to the optimum, and (ii) to reduce the search space as much as possible in order to expedite the learning process. In order to design a suitable action space adaptively, we propose a switching RL model that mimics the process of an infant's motor development, in which gross motor skills develop before fine motor skills. A method for switching controllers is then constructed by introducing and referring to an "entropy" measure. Computational experiments on robot navigation problems with one- and two-dimensional continuous action spaces confirm the validity of the proposed method.

  7. Diamond Fuzzy Number

    Directory of Open Access Journals (Sweden)

    T. Pathinathan

    2015-01-01

    Full Text Available In this paper we define the diamond fuzzy number with the help of the triangular fuzzy number. We include basic arithmetic operations, such as addition and subtraction of diamond fuzzy numbers, with examples. We define the diamond fuzzy matrix with some matrix properties. We also define the nested diamond fuzzy number and the linked diamond fuzzy number, the latter further classified into right-linked and left-linked diamond fuzzy numbers. Finally, we verify the arithmetic operations for the above-mentioned types of diamond fuzzy numbers.
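
    Diamond fuzzy numbers are built from triangular fuzzy numbers, so the underlying componentwise arithmetic looks as follows; this shows the standard triangular operations only, not the diamond-specific definitions:

    ```python
    def tri_add(a, b):
        """Componentwise addition of triangular fuzzy numbers (l, m, r)."""
        return tuple(x + y for x, y in zip(a, b))

    def tri_sub(a, b):
        """Subtraction: (l1 - r2, m1 - m2, r1 - l2), so the spreads add."""
        return (a[0] - b[2], a[1] - b[1], a[2] - b[0])

    print(tri_add((1, 2, 3), (2, 3, 4)))  # (3, 5, 7)
    print(tri_sub((1, 2, 3), (2, 3, 4)))  # (-3, -1, 1)
    ```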

  8. Neural Control of a Tracking Task via Attention-Gated Reinforcement Learning for Brain-Machine Interfaces.

    Science.gov (United States)

    Wang, Yiwen; Wang, Fang; Xu, Kai; Zhang, Qiaosheng; Zhang, Shaomin; Zheng, Xiaoxiang

    2015-05-01

    Reinforcement learning (RL)-based brain-machine interfaces (BMIs) enable the user to learn from the environment through interactions to complete the task without desired signals, which is promising for clinical applications. Previous studies exploited Q-learning techniques to discriminate neural states into simple directional actions, with the initial timing of each trial provided. However, the movements in BMI applications can be quite complicated, and the action timing explicitly shows the intention of when to move. The rich actions and the corresponding neural states form a large state-action space, imposing generalization difficulty on Q-learning. In this paper, we propose to adopt attention-gated reinforcement learning (AGREL) as a new learning scheme for BMIs to adaptively decode high-dimensional neural activities into seven distinct movements (directional moves, holdings, and resting), owing to its efficient weight updating. We apply AGREL to neural data recorded from M1 of a monkey to directly predict a seven-action set in a time sequence to reconstruct the trajectory of a center-out task. Compared to Q-learning techniques, AGREL improved the target acquisition rate to 90.16% on average, with faster convergence and more stability in following neural activity over multiple days, indicating the potential to achieve better online decoding performance for more complicated BMI tasks.

  9. Confirmation bias in human reinforcement learning: Evidence from counterfactual feedback processing

    Science.gov (United States)

    Lefebvre, Germain; Blakemore, Sarah-Jayne

    2017-01-01

    Previous studies suggest that factual learning, that is, learning from obtained outcomes, is biased, such that participants preferentially take into account positive, as compared to negative, prediction errors. However, whether or not the prediction error valence also affects counterfactual learning, that is, learning from forgone outcomes, is unknown. To address this question, we analysed the performance of two groups of participants on reinforcement learning tasks using a computational model that was adapted to test if prediction error valence influences learning. We carried out two experiments: in the factual learning experiment, participants learned from partial feedback (i.e., the outcome of the chosen option only); in the counterfactual learning experiment, participants learned from complete feedback information (i.e., the outcomes of both the chosen and unchosen option were displayed). In the factual learning experiment, we replicated previous findings of a valence-induced bias, whereby participants learned preferentially from positive, relative to negative, prediction errors. In contrast, for counterfactual learning, we found the opposite valence-induced bias: negative prediction errors were preferentially taken into account, relative to positive ones. When considering valence-induced bias in the context of both factual and counterfactual learning, it appears that people tend to preferentially take into account information that confirms their current choice. PMID:28800597

  10. Confirmation bias in human reinforcement learning: Evidence from counterfactual feedback processing.

    Science.gov (United States)

    Palminteri, Stefano; Lefebvre, Germain; Kilford, Emma J; Blakemore, Sarah-Jayne

    2017-08-01

    Previous studies suggest that factual learning, that is, learning from obtained outcomes, is biased, such that participants preferentially take into account positive, as compared to negative, prediction errors. However, whether or not the prediction error valence also affects counterfactual learning, that is, learning from forgone outcomes, is unknown. To address this question, we analysed the performance of two groups of participants on reinforcement learning tasks using a computational model that was adapted to test if prediction error valence influences learning. We carried out two experiments: in the factual learning experiment, participants learned from partial feedback (i.e., the outcome of the chosen option only); in the counterfactual learning experiment, participants learned from complete feedback information (i.e., the outcomes of both the chosen and unchosen option were displayed). In the factual learning experiment, we replicated previous findings of a valence-induced bias, whereby participants learned preferentially from positive, relative to negative, prediction errors. In contrast, for counterfactual learning, we found the opposite valence-induced bias: negative prediction errors were preferentially taken into account, relative to positive ones. When considering valence-induced bias in the context of both factual and counterfactual learning, it appears that people tend to preferentially take into account information that confirms their current choice.
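
    The reported asymmetry can be captured by a model with valence-dependent learning rates whose roles flip between the chosen and the forgone option; a minimal sketch with illustrative rates:

    ```python
    def asymmetric_update(q_chosen, q_unchosen, r_chosen, r_unchosen,
                          alpha_conf=0.3, alpha_disconf=0.1):
        """One trial with complete feedback. For the chosen option, positive
        prediction errors get the larger (confirmatory) rate; for the forgone
        option the asymmetry flips, matching the pattern reported above."""
        delta_c = r_chosen - q_chosen
        delta_u = r_unchosen - q_unchosen
        q_chosen += (alpha_conf if delta_c > 0 else alpha_disconf) * delta_c
        q_unchosen += (alpha_disconf if delta_u > 0 else alpha_conf) * delta_u
        return q_chosen, q_unchosen

    print(asymmetric_update(0.5, 0.5, 1.0, 0.0))
    ```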

  11. Within- and across-trial dynamics of human EEG reveal cooperative interplay between reinforcement learning and working memory.

    Science.gov (United States)

    Collins, Anne G E; Frank, Michael J

    2018-03-06

    Learning from rewards and punishments is essential to survival and facilitates flexible human behavior. It is widely appreciated that multiple cognitive and reinforcement learning systems contribute to decision-making, but the nature of their interactions is elusive. Here, we leverage methods for extracting trial-by-trial indices of reinforcement learning (RL) and working memory (WM) in human electro-encephalography to reveal single-trial computations beyond that afforded by behavior alone. Neural dynamics confirmed that increases in neural expectation were predictive of reduced neural surprise in the following feedback period, supporting central tenets of RL models. Within- and cross-trial dynamics revealed a cooperative interplay between systems for learning, in which WM contributes expectations to guide RL, despite competition between systems during choice. Together, these results provide a deeper understanding of how multiple neural systems interact for learning and decision-making and facilitate analysis of their disruption in clinical populations.

  12. Reinforcement learning using a continuous time actor-critic framework with spiking neurons.

    Directory of Open Access Journals (Sweden)

    Nicolas Frémaux

    2013-04-01

    Full Text Available Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only been partially elucidated. On one hand, experimental evidence shows that the neuromodulator dopamine carries information about rewards and affects synaptic plasticity. On the other hand, the theory of reinforcement learning provides a framework for reward-based learning. Recent models of reward-modulated spike-timing-dependent plasticity have made first steps towards bridging the gap between the two approaches, but faced two problems. First, reinforcement learning is typically formulated in a discrete framework, ill-adapted to the description of natural situations. Second, biologically plausible models of reward-modulated spike-timing-dependent plasticity require precise calculation of the reward prediction error, yet it remains to be shown how this can be computed by neurons. Here we propose a solution to these problems by extending the continuous temporal difference (TD) learning of Doya (2000) to the case of spiking neurons in an actor-critic network operating in continuous time, with continuous state and action representations. In our model, the critic learns to predict expected future rewards in real time. Its activity, together with actual rewards, conditions the delivery of a neuromodulatory TD signal to itself and to the actor, which is responsible for action choice. In simulations, we show that such an architecture can solve a Morris water-maze-like navigation task, in a number of trials consistent with reported animal performance. We also use our model to solve the acrobot and the cartpole problems, two complex motor control tasks. Our model provides a plausible way of computing the reward prediction error in the brain. Moreover, the analytically derived learning rule is consistent with experimental evidence for dopamine-modulated spike-timing-dependent plasticity.
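
    The continuous-time TD error at the heart of this approach, discretized with a forward difference; the constants are illustrative:

    ```python
    import numpy as np

    def continuous_td_errors(rewards, values, dt=0.01, tau=1.0):
        """Discretized continuous-time TD error after Doya (2000):
        delta(t) = r(t) - V(t)/tau + dV/dt,
        approximating dV/dt by a forward difference. In the actor-critic model
        above, this delta would modulate spike-timing-dependent plasticity of
        both critic and actor synapses."""
        values = np.asarray(values, dtype=float)
        dv_dt = np.diff(values) / dt
        return rewards[:-1] - values[:-1] / tau + dv_dt

    print(continuous_td_errors(np.zeros(5), [0.0, 0.1, 0.3, 0.35, 0.4]))
    ```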

  13. Performance Comparison of Two Reinforcement Learning Algorithms for Small Mobile Robots

    Czech Academy of Sciences Publication Activity Database

    Neruda, Roman; Slušný, Stanislav

    2009-01-01

    Roč. 2, č. 1 (2009), s. 59-68 ISSN 2005-4297 R&D Projects: GA MŠk(CZ) 1M0567 Grant - others: GA UK(CZ) 7637/2007 Institutional research plan: CEZ:AV0Z10300504 Keywords: reinforcement learning * mobile robots * intelligent agents Subject RIV: IN - Informatics, Computer Science http://www.sersc.org/journals/IJCA/vol2_no1/7.pdf

  14. IMPLEMENTATION OF MULTIAGENT REINFORCEMENT LEARNING MECHANISM FOR OPTIMAL ISLANDING OPERATION OF DISTRIBUTION NETWORK

    DEFF Research Database (Denmark)

    Saleem, Arshad; Lind, Morten

    2008-01-01

    among electric power utilities to utilize modern information and communication technologies (ICT) in order to improve the automation of the distribution system. In this paper we present our work for the implementation of a dynamic multi-agent based distributed reinforcement learning mechanism...

  15. Neuro-Fuzzy DC Motor Speed Control Using Particle Swarm Optimization

    Directory of Open Access Journals (Sweden)

    Boumediene ALLAOUA

    2009-12-01

    Full Text Available This paper presents an application of Adaptive Neuro-Fuzzy Inference System (ANFIS) control to DC motor speed, optimized with swarm collective intelligence. First, the controller is designed according to fuzzy rules such that the system is fundamentally robust. Second, an adaptive neuro-fuzzy controller of the DC motor speed is designed and simulated; the ANFIS has the advantage of the expert knowledge of a fuzzy inference system and the learning capability of neural networks. Finally, the ANFIS is optimized by swarm intelligence. Digital simulation results demonstrate that the designed ANFIS-swarm speed controller realizes good dynamic behavior of the DC motor, perfect speed tracking with no overshoot, and gives better performance and higher robustness than the ANFIS alone.
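
    A sketch of the swarm-optimization step that would tune the ANFIS parameters, assuming a standard particle swarm update over flattened membership-function parameters; fitness evaluation and best-position bookkeeping are omitted, and the coefficients are common defaults rather than the paper's:

    ```python
    import numpy as np

    def pso_step(positions, velocities, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
        """One particle-swarm update: inertia plus attraction toward each
        particle's best-known position and the swarm's global best."""
        r1 = np.random.rand(*positions.shape)
        r2 = np.random.rand(*positions.shape)
        velocities = (w * velocities
                      + c1 * r1 * (pbest - positions)
                      + c2 * r2 * (gbest - positions))
        return positions + velocities, velocities

    pos = np.random.rand(10, 8)    # 10 particles, 8 ANFIS parameters each
    vel = np.zeros_like(pos)
    pos, vel = pso_step(pos, vel, pbest=pos.copy(), gbest=pos[0])
    ```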

  16. Fuzzy forecasting based on fuzzy-trend logical relationship groups.

    Science.gov (United States)

    Chen, Shyi-Ming; Wang, Nai-Yi

    2010-10-01

    In this paper, we present a new method to predict the Taiwan Stock Exchange Capitalization Weighted Stock Index (TAIEX) based on fuzzy-trend logical relationship groups (FTLRGs). The proposed method divides fuzzy logical relationships into FTLRGs based on the trend of adjacent fuzzy sets appearing in the antecedents of the fuzzy logical relationships. First, we apply an automatic clustering algorithm to cluster the historical data into intervals of different lengths. Then we define fuzzy sets based on these intervals, fuzzify the historical data into fuzzy sets to derive fuzzy logical relationships, and divide the fuzzy logical relationships into FTLRGs for forecasting the TAIEX. Moreover, we also apply the proposed method to forecast enrollments and inventory demand, respectively. The experimental results show that the proposed method achieves higher average forecasting accuracy rates than the existing methods.
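
    A compact sketch of the fuzzification and trend-grouping steps described above, using fixed interval boundaries instead of the paper's automatic clustering; the trend rule shown is a simplified reading of the FTLRG construction:

    ```python
    import numpy as np

    def fuzzify(series, boundaries):
        """Map each observation to the index of its interval (fuzzy set A_i)."""
        return np.digitize(series, boundaries)

    def trend_groups(labels):
        """Group fuzzy logical relationships A_i -> A_j by the trend of the
        antecedent (up / down / flat relative to the previous label)."""
        groups = {"up": [], "down": [], "flat": []}
        for t in range(1, len(labels) - 1):
            trend = ("up" if labels[t] > labels[t - 1]
                     else "down" if labels[t] < labels[t - 1] else "flat")
            groups[trend].append((labels[t], labels[t + 1]))
        return groups

    labels = fuzzify([5100, 5180, 5150, 5230, 5300],
                     boundaries=[5000, 5150, 5250])
    print(trend_groups(labels))
    ```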

  17. Genetic algorithms and fuzzy multiobjective optimization

    CERN Document Server

    Sakawa, Masatoshi

    2002-01-01

    Since the introduction of genetic algorithms in the 1970s, an enormous number of articles together with several significant monographs and books have been published on this methodology. As a result, genetic algorithms have made a major contribution to optimization, adaptation, and learning in a wide variety of unexpected fields. Over the years, many excellent books in genetic algorithm optimization have been published; however, they focus mainly on single-objective discrete or other hard optimization problems under certainty. There appears to be no book that is designed to present genetic algorithms for solving not only single-objective but also fuzzy and multiobjective optimization problems in a unified way. Genetic Algorithms And Fuzzy Multiobjective Optimization introduces the latest advances in the field of genetic algorithm optimization for 0-1 programming, integer programming, nonconvex programming, and job-shop scheduling problems under multiobjectiveness and fuzziness. In addition, the book treats a w...

  18. Stochastic abstract policies: generalizing knowledge to improve reinforcement learning.

    Science.gov (United States)

    Koga, Marcelo L; Freire, Valdinei; Costa, Anna H R

    2015-01-01

    Reinforcement learning (RL) enables an agent to learn behavior by acquiring experience through trial-and-error interactions with a dynamic environment. However, knowledge is usually built from scratch, and learning to behave may take a long time. Here, we improve learning performance by leveraging prior knowledge; that is, the learner shows proper behavior from the beginning of a target task, using knowledge from a set of known, previously solved source tasks. In this paper, we argue that building stochastic abstract policies that generalize over past experiences is an effective way to provide such improvement, and that this generalization outperforms the current practice of using a library of policies. We achieve this by contributing a new algorithm, AbsProb-PI-multiple, and a framework for transferring knowledge represented as a stochastic abstract policy to new RL tasks. Stochastic abstract policies offer an effective way to encode knowledge because the abstraction they provide not only generalizes solutions but also facilitates extracting the similarities among tasks. We perform experiments in a robotic navigation environment and analyze the agent's behavior throughout the learning process, and we also assess the transfer ratio for different numbers of source tasks. We compare our method with the transfer of a library of policies, and experiments show that the use of a generalized policy produces better results by more effectively guiding the agent when learning a target task.

  19. Fuzzy Models to Deal with Sensory Data in Food Industry

    Institute of Scientific and Technical Information of China (English)

    Serge Guillaume; Brigitte Charnomordic

    2004-01-01

    Sensory data are, due to the lack of an absolute reference, imprecise and uncertain. Fuzzy logic can handle uncertainty and can be used in approximate reasoning. Automatic learning procedures allow fuzzy reasoning rules to be generated from data including numerical and symbolic or sensory variables. We briefly present an induction method that was developed to extract qualitative knowledge from data samples. The induction process is run under interpretability constraints to ensure that the fuzzy rules have a meaning for the human expert. We then study two applied problems in the food industry: sensory evaluation and process modeling.

  20. Fuzzeval: A Fuzzy Controller-Based Approach in Adaptive Learning for Backgammon Game

    DEFF Research Database (Denmark)

    Heinze, Mikael; Ortiz-Arroyo, Daniel; Larsen, Henrik Legind

    2005-01-01

    In this paper we investigate the effectiveness of applying fuzzy controllers to create strong computer player programs in the domain of backgammon. Fuzzeval, our proposed mechanism, consists of a fuzzy controller that dynamically evaluates the perceived strength of the board configurations it re-...

  1. Accelerating Multiagent Reinforcement Learning by Equilibrium Transfer.

    Science.gov (United States)

    Hu, Yujing; Gao, Yang; An, Bo

    2015-07-01

    An important approach in multiagent reinforcement learning (MARL) is equilibrium-based MARL, which adopts equilibrium solution concepts in game theory and requires agents to play equilibrium strategies at each state. However, most existing equilibrium-based MARL algorithms cannot scale due to a large number of computationally expensive equilibrium computations (e.g., computing Nash equilibria is PPAD-hard) during learning. For the first time, this paper finds that during the learning process of equilibrium-based MARL, the one-shot games corresponding to each state's successive visits often have the same or similar equilibria (for some states more than 90% of games corresponding to successive visits have similar equilibria). Inspired by this observation, this paper proposes to use equilibrium transfer to accelerate equilibrium-based MARL. The key idea of equilibrium transfer is to reuse previously computed equilibria when each agent has a small incentive to deviate. By introducing transfer loss and transfer condition, a novel framework called equilibrium transfer-based MARL is proposed. We prove that although equilibrium transfer brings transfer loss, equilibrium-based MARL algorithms can still converge to an equilibrium policy under certain assumptions. Experimental results in widely used benchmarks (e.g., grid world game, soccer game, and wall game) show that the proposed framework: 1) not only significantly accelerates equilibrium-based MARL (up to 96.7% reduction in learning time), but also achieves higher average rewards than algorithms without equilibrium transfer and 2) scales significantly better than algorithms without equilibrium transfer when the state/action space grows and the number of agents increases.
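
    The reuse test at the core of equilibrium transfer can be sketched for a two-player matrix game: reuse the previously computed equilibrium whenever no agent can gain more than a small threshold by deviating unilaterally. The thresholding below is simplified relative to the paper's transfer loss and transfer condition:

    ```python
    import numpy as np

    def deviation_incentive(payoff_a, payoff_b, eq_a, eq_b):
        """Largest gain any agent obtains by unilaterally deviating from a
        candidate pure equilibrium (eq_a, eq_b) of a two-player matrix game."""
        gain_a = payoff_a[:, eq_b].max() - payoff_a[eq_a, eq_b]
        gain_b = payoff_b[eq_a, :].max() - payoff_b[eq_a, eq_b]
        return max(gain_a, gain_b)

    A = np.array([[3.0, 0.0], [5.0, 1.0]])   # row player's payoffs
    B = np.array([[3.0, 5.0], [0.0, 1.0]])   # column player's payoffs
    # Reuse the old equilibrium (1, 1) only if no agent gains more than 0.1:
    print(deviation_incentive(A, B, 1, 1) <= 0.1)   # True for this game
    ```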

  2. On the Fuzzy Convergence

    Directory of Open Access Journals (Sweden)

    Abdul Hameed Q. A. Al-Tai

    2011-01-01

    Full Text Available The aim of this paper is to introduce and study the fuzzy neighborhood, the limit fuzzy number, the convergent fuzzy sequence, the bounded fuzzy sequence, and the Cauchy fuzzy sequence on the base adopted by Abdul Hameed (every real number r is replaced by a fuzzy number r̄, either a triangular fuzzy number or a singleton fuzzy set (fuzzy point)). We then consider some results concerning the effect of the upper sequence on the convergent fuzzy sequence, the bounded fuzzy sequence, and the Cauchy fuzzy sequence.

  3. Energy Management Strategy for a Hybrid Electric Vehicle Based on Deep Reinforcement Learning

    OpenAIRE

    Yue Hu; Weimin Li; Kun Xu; Taimoor Zahid; Feiyan Qin; Chenming Li

    2018-01-01

    An energy management strategy (EMS) is important for hybrid electric vehicles (HEVs), since it plays a decisive role in the performance of the vehicle. However, the variation of future driving conditions deeply influences the effectiveness of the EMS. Most existing EMS methods simply follow predefined rules that are not adaptive to different driving conditions online. It is therefore useful for the EMS to learn from the environment or driving cycle. In this paper, a deep reinforcement learn...

  4. Reinforcement learning produces dominant strategies for the Iterated Prisoner's Dilemma.

    Science.gov (United States)

    Harper, Marc; Knight, Vincent; Jones, Martin; Koutsovoulos, Georgios; Glynatsi, Nikoleta E; Campbell, Owen

    2017-01-01

    We present tournament results and several powerful strategies for the Iterated Prisoner's Dilemma created using reinforcement learning techniques (evolutionary and particle swarm algorithms). These strategies are trained to perform well against a corpus of over 170 distinct opponents, including many well-known and classic strategies. All the trained strategies win standard tournaments against the total collection of other opponents. The trained strategies and one particular human-designed strategy are also the top performers in noisy tournaments.

  5. Stability Analysis of Interconnected Fuzzy Systems Using the Fuzzy Lyapunov Method

    Directory of Open Access Journals (Sweden)

    Ken Yeh

    2010-01-01

    Full Text Available The fuzzy Lyapunov method is investigated for use with a class of interconnected fuzzy systems. The interconnected fuzzy systems consist of J interconnected fuzzy subsystems, and the stability analysis is based on Lyapunov functions. Building on traditional Lyapunov stability theory, we propose a fuzzy Lyapunov method for the stability analysis of interconnected fuzzy systems. The fuzzy Lyapunov function is defined as a fuzzy blending of quadratic Lyapunov functions. Stability conditions are derived through the use of fuzzy Lyapunov functions to ensure that the interconnected fuzzy systems are asymptotically stable. Common solutions can be obtained by solving a set of linear matrix inequalities (LMIs) that are numerically feasible. Finally, simulations are performed to verify the effectiveness of the proposed stability conditions.
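
    In the usual formulation of the fuzzy Lyapunov method, the blended Lyapunov function takes the following form (a sketch; the paper's exact conditions, including the treatment of the membership-function derivatives, may differ):

    ```latex
    V(x(t)) = \sum_{i=1}^{r} h_i(z(t))\, x(t)^{\top} P_i\, x(t),
    \qquad h_i(z(t)) \ge 0, \quad \sum_{i=1}^{r} h_i(z(t)) = 1, \quad P_i = P_i^{\top} \succ 0 .
    ```

    Asymptotic stability then requires \dot{V}(x(t)) < 0 along trajectories, which, after bounding the derivatives \dot{h}_i, reduces to a set of LMIs in the matrices P_i.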

  6. Stochastic Optimal Estimation with Fuzzy Random Variables and Fuzzy Kalman Filtering

    Institute of Scientific and Technical Information of China (English)

    FENG Yu-hu

    2005-01-01

    By constructing a mean-square performance index in the case of fuzzy random variables, the optimal estimation theorem for an unknown fuzzy state using fuzzy observation data is given. The state and output of a linear discrete-time dynamic fuzzy system with Gaussian noise are Gaussian fuzzy random variable sequences. An approach to fuzzy Kalman filtering is discussed; it consists of two parts: a real-valued, non-random recurrence equation and standard Kalman filtering.
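
    A sketch of the two-part structure described above, assuming a triangular fuzzy state represented by a center and a spread: the center follows the standard Kalman recursion, while the spread follows a real-valued, non-random recurrence. The spread-propagation rule and all matrices here are illustrative, not taken from the paper:

    ```python
    import numpy as np

    def fuzzy_kalman_step(c, s, P, y_c, A, H, Q, R):
        """One step: standard Kalman filtering on the centers plus a
        non-random recurrence for the fuzzy spreads."""
        # --- standard Kalman filtering on the centers ---
        c_pred = A @ c
        P_pred = A @ P @ A.T + Q
        K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
        c_new = c_pred + K @ (y_c - H @ c_pred)
        P_new = (np.eye(len(c)) - K @ H) @ P_pred
        # --- real-valued, non-random recurrence for the spreads ---
        s_new = np.abs(A) @ s
        return c_new, s_new, P_new

    n = 2
    c, s, P = np.zeros(n), np.ones(n), np.eye(n)
    A, H = np.eye(n), np.eye(n)
    Q, R = 0.01 * np.eye(n), 0.1 * np.eye(n)
    c, s, P = fuzzy_kalman_step(c, s, P, np.array([1.0, 0.5]), A, H, Q, R)
    ```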

  7. Reinforcement learning of targeted movement in a spiking neuronal model of motor cortex.

    Directory of Open Access Journals (Sweden)

    George L Chadderdon

    Full Text Available Sensorimotor control has traditionally been considered from a control theory perspective, without relation to neurobiology. In contrast, here we utilized a spiking-neuron model of motor cortex and trained it to perform a simple movement task, which consisted of rotating a single-joint "forearm" to a target. Learning was based on a reinforcement mechanism analogous to that of the dopamine system. This provided a global reward or punishment signal in response to decreasing or increasing distance from hand to target, respectively. Output was partially driven by Poisson motor babbling, creating stochastic movements that could then be shaped by learning. The virtual forearm consisted of a single segment rotated around an elbow joint, controlled by flexor and extensor muscles. The model consisted of 144 excitatory and 64 inhibitory event-based neurons, each with AMPA, NMDA, and GABA synapses. Proprioceptive cell input to this model encoded the 2 muscle lengths. Plasticity was only enabled in feedforward connections between input and output excitatory units, using spike-timing-dependent eligibility traces for synaptic credit or blame assignment. Learning resulted from a global 3-valued signal: reward (+1), no learning (0), or punishment (-1), corresponding to phasic increases, lack of change, or phasic decreases of dopaminergic cell firing, respectively. Successful learning only occurred when both reward and punishment were enabled. In this case, 5 target angles were learned successfully within 180 s of simulation time, with a median error of 8 degrees. Motor babbling allowed exploratory learning, but decreased the stability of the learned behavior, since the hand continued moving after reaching the target. Our model demonstrated that a global reinforcement signal, coupled with eligibility traces for synaptic plasticity, can train a spiking sensorimotor network to perform goal-directed motor behavior.

  8. Reinforcement learning of targeted movement in a spiking neuronal model of motor cortex.

    Science.gov (United States)

    Chadderdon, George L; Neymotin, Samuel A; Kerr, Cliff C; Lytton, William W

    2012-01-01

    Sensorimotor control has traditionally been considered from a control theory perspective, without relation to neurobiology. In contrast, here we utilized a spiking-neuron model of motor cortex and trained it to perform a simple movement task, which consisted of rotating a single-joint "forearm" to a target. Learning was based on a reinforcement mechanism analogous to that of the dopamine system. This provided a global reward or punishment signal in response to decreasing or increasing distance from hand to target, respectively. Output was partially driven by Poisson motor babbling, creating stochastic movements that could then be shaped by learning. The virtual forearm consisted of a single segment rotated around an elbow joint, controlled by flexor and extensor muscles. The model consisted of 144 excitatory and 64 inhibitory event-based neurons, each with AMPA, NMDA, and GABA synapses. Proprioceptive cell input to this model encoded the 2 muscle lengths. Plasticity was only enabled in feedforward connections between input and output excitatory units, using spike-timing-dependent eligibility traces for synaptic credit or blame assignment. Learning resulted from a global 3-valued signal: reward (+1), no learning (0), or punishment (-1), corresponding to phasic increases, lack of change, or phasic decreases of dopaminergic cell firing, respectively. Successful learning only occurred when both reward and punishment were enabled. In this case, 5 target angles were learned successfully within 180 s of simulation time, with a median error of 8 degrees. Motor babbling allowed exploratory learning, but decreased the stability of the learned behavior, since the hand continued moving after reaching the target. Our model demonstrated that a global reinforcement signal, coupled with eligibility traces for synaptic plasticity, can train a spiking sensorimotor network to perform goal-directed motor behavior.
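
    The weight update described above reduces to a global 3-valued signal applied through per-synapse eligibility traces; a minimal sketch (the spike-timing bookkeeping that builds the traces is omitted):

    ```python
    import numpy as np

    def reinforce_weights(w, eligibility, distance_change, lr=0.01):
        """Global reinforcement through eligibility traces: the hand moving
        closer yields +1, no change 0, moving away -1; every plastic synapse
        is then updated in proportion to its trace."""
        signal = -np.sign(distance_change)      # closer => +1, farther => -1
        return w + lr * signal * eligibility

    w = np.zeros((4, 3))
    elig = np.random.rand(4, 3)                 # stand-in spike-timing traces
    w = reinforce_weights(w, elig, distance_change=-0.2)  # hand moved closer
    ```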

  9. Intrinsically motivated reinforcement learning for human-robot interaction in the real-world.

    Science.gov (United States)

    Qureshi, Ahmed Hussain; Nakamura, Yutaka; Yoshikawa, Yuichiro; Ishiguro, Hiroshi

    2018-03-26

    For natural social human-robot interaction, it is essential for a robot to learn human-like social skills. However, learning such skills is notoriously hard due to the limited availability of direct instructions from people to teach a robot. In this paper, we propose an intrinsically motivated reinforcement learning framework in which an agent obtains intrinsic motivation-based rewards through an action-conditional predictive model. Using the proposed method, the robot learned social skills from human-robot interaction experiences gathered in real, uncontrolled environments. The results indicate that the robot not only acquired human-like social skills but also took more human-like decisions, on a test dataset, than a robot which received direct rewards for task achievement. Copyright © 2018 Elsevier Ltd. All rights reserved.
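
    One common reading of such intrinsic motivation is to reward the agent in proportion to the prediction error of its action-conditional model; the sketch below uses that reading, which may differ from the paper's exact formulation, and the model shown is a toy stand-in:

    ```python
    import numpy as np

    def intrinsic_reward(model, state, action, next_state):
        """Intrinsic reward as the prediction error of an action-conditional
        predictive model on the observed next state."""
        predicted = model(state, action)
        return float(np.linalg.norm(next_state - predicted))

    # Toy linear predictive model as a stand-in.
    W = np.random.rand(3, 4)
    model = lambda s, a: W @ np.concatenate([s, [a]])
    r_int = intrinsic_reward(model, np.zeros(3), 1.0, np.ones(3))
    print(r_int)
    ```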

  10. Knowledge-Based Reinforcement Learning for Data Mining

    Science.gov (United States)

    Kudenko, Daniel; Grzes, Marek

    Data Mining is the process of extracting patterns from data. Two general avenues of research in the intersecting areas of agents and data mining can be distinguished. The first approach is concerned with mining an agent’s observation data in order to extract patterns, categorize environment states, and/or make predictions of future states. In this setting, data is normally available as a batch, and the agent’s actions and goals are often independent of the data mining task. The data collection is mainly considered as a side effect of the agent’s activities. Machine learning techniques applied in such situations fall into the class of supervised learning. In contrast, the second scenario occurs where an agent is actively performing the data mining, and is responsible for the data collection itself. For example, a mobile network agent is acquiring and processing data (where the acquisition may incur a certain cost), or a mobile sensor agent is moving in a (perhaps hostile) environment, collecting and processing sensor readings. In these settings, the tasks of the agent and the data mining are highly intertwined and interdependent (or even identical). Supervised learning is not a suitable technique for these cases. Reinforcement Learning (RL) enables an agent to learn from experience (in the form of reward and punishment for explorative actions) and adapt to new situations, without a teacher. RL is an ideal learning technique for these data mining scenarios, because it fits the agent paradigm of continuous sensing and acting, and the RL agent is able to learn to make decisions on the sampling of the environment which provides the data. Nevertheless, RL still suffers from scalability problems, which have prevented its successful use in many complex real-world domains. The more complex the tasks, the longer it takes a reinforcement learning algorithm to converge to a good solution. For many real-world tasks, human expert knowledge is available. For example, human

  11. Learning to reach by reinforcement learning using a receptive field based function approximation approach with continuous actions.

    Science.gov (United States)

    Tamosiunaite, Minija; Asfour, Tamim; Wörgötter, Florentin

    2009-03-01

    Reinforcement learning methods can be used in robotics applications, especially for specific target-oriented problems, for example the reward-based recalibration of goal-directed actions. To this end, relatively large and continuous state-action spaces still need to be handled efficiently. The goal of this paper is thus to develop a novel, rather simple method which uses reinforcement learning with function approximation in conjunction with different reward strategies for solving such problems. For testing our method, we use a four-degree-of-freedom reaching problem in 3D space, simulated by a robot arm system with two joints of two DOF each. Function approximation is based on 4D overlapping kernels (receptive fields), and the state-action space contains about 10,000 of these. Different types of reward structures are compared, for example reward-on-touching-only against reward-on-approach. Furthermore, forbidden joint configurations are punished. A continuous action space is used. In spite of the rather large number of states and the continuous action space, these reward/punishment strategies allow the system to find a good solution usually within about 20 trials. The efficiency of our method demonstrated in this test scenario suggests that it might be possible to use it on a real robot for problems where mixed rewards can be defined in situations where other types of learning might be difficult.
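
    A minimal version of the receptive-field function approximation described above: the Q-estimate is a normalized, weighted sum of overlapping Gaussian kernels over the joint state-action space. The kernel count and widths are scaled down for illustration:

    ```python
    import numpy as np

    def q_value(state_action, centers, widths, weights):
        """Q-estimate from overlapping Gaussian receptive fields over the
        joint 4-D state-action space (normalized weighted sum)."""
        act = np.exp(-np.sum((state_action - centers) ** 2 / widths ** 2, axis=1))
        return act @ weights / (act.sum() + 1e-9)

    # 100 random kernels over a 4-D state-action space (the paper uses ~10,000).
    rng = np.random.default_rng(1)
    centers = rng.uniform(0, 1, size=(100, 4))
    widths = np.full((100, 4), 0.2)
    weights = np.zeros(100)
    print(q_value(np.array([0.5, 0.5, 0.5, 0.5]), centers, widths, weights))
    ```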

  12. Neuro-fuzzy controller of low head hydropower plants using adaptive-network based fuzzy inference system

    Energy Technology Data Exchange (ETDEWEB)

    Djukanovic, M.B. [Inst. Nikola Tesla, Belgrade (Yugoslavia). Dept. of Power Systems; Calovic, M.S. [Univ. of Belgrade (Yugoslavia). Dept. of Electrical Engineering; Vesovic, B.V. [Inst. Mihajlo Pupin, Belgrade (Yugoslavia). Dept. of Automatic Control; Sobajic, D.J. [Electric Power Research Inst., Palo Alto, CA (United States)

    1997-12-01

    This paper presents an approach to nonlinear, multivariable control of low-head hydropower plants using an adaptive-network-based fuzzy inference system (ANFIS). The new design technique enhances fuzzy controllers with self-learning capability for achieving prescribed control objectives in a near-optimal manner. The controller has the flexibility to accept more sensory information, with the main goal of improving the generator unit transients by adjusting the exciter input and the wicket gate and runner blade positions. The developed ANFIS controller, whose control signals are adjusted using incomplete on-line measurements, can offer better damping of generator oscillations over a wide range of operating conditions than conventional controllers. Digital simulations of a hydropower plant equipped with a low-head Kaplan turbine are performed, and comparisons of conventional excitation-governor control, state-feedback optimal control, and ANFIS-based output feedback control are presented. To demonstrate the effectiveness of the proposed control scheme and the robustness of the acquired neuro-fuzzy controller, the controller has been implemented on a complex high-order non-linear hydrogenerator model.

  13. Reducing the Complexity of Genetic Fuzzy Classifiers in Highly-Dimensional Classification Problems

    Directory of Open Access Journals (Sweden)

    DimitrisG. Stavrakoudis

    2012-04-01

    Full Text Available This paper introduces the Fast Iterative Rule-based Linguistic Classifier (FaIRLiC), a Genetic Fuzzy Rule-Based Classification System (GFRBCS) which aims at reducing the structural complexity of the resulting rule base, as well as the computational requirements of its learning algorithm, especially when dealing with high-dimensional feature spaces. The proposed methodology follows the principles of the iterative rule learning (IRL) approach, whereby a rule extraction algorithm (REA) is invoked in an iterative fashion, producing one fuzzy rule at a time. The REA is performed in two successive steps: the first selects the relevant features of the currently extracted rule, whereas the second decides the antecedent part of the fuzzy rule using the previously selected subset of features. The performance of the classifier is finally optimized through a genetic tuning post-processing stage. Comparative results in a hyperspectral remote sensing classification task, as well as in 12 real-world classification datasets, indicate the effectiveness of the proposed methodology in generating high-performing and compact fuzzy rule-based classifiers, even for very high-dimensional feature spaces.

  14. Evolving fuzzy rules for relaxed-criteria negotiation.

    Science.gov (United States)

    Sim, Kwang Mong

    2008-12-01

    In the literature on automated negotiation, very few negotiation agents are designed with the flexibility to slightly relax their negotiation criteria to reach a consensus more rapidly and with more certainty. Furthermore, these relaxed-criteria negotiation agents were not equipped with the ability to enhance their performance by learning and evolving their relaxed-criteria negotiation rules. The impetus of this work is designing market-driven negotiation agents (MDAs) that not only have the flexibility of relaxing bargaining criteria using fuzzy rules, but can also evolve their structures by learning new relaxed-criteria fuzzy rules to improve their negotiation outcomes as they participate in negotiations in more e-markets. To this end, an evolutionary algorithm for adapting and evolving relaxed-criteria fuzzy rules was developed. Implementing the idea in a testbed, two kinds of experiments for evaluating and comparing EvEMDAs (MDAs with relaxed-criteria rules that are evolved using the evolutionary algorithm) and EMDAs (MDAs with relaxed-criteria rules that are manually constructed) were carried out through stochastic simulations. Empirical results show that: 1) EvEMDAs generally outperformed EMDAs in different types of e-markets and 2) the negotiation outcomes of EvEMDAs generally improved as they negotiated in more e-markets.

  15. Complex Fuzzy Set-Valued Complex Fuzzy Measures and Their Properties

    Science.gov (United States)

    Ma, Shengquan; Li, Shenggang

    2014-01-01

    Let F*(K) be the set of all fuzzy complex numbers. In this paper some classical and measure-theoretical notions are extended to the case of complex fuzzy sets. They are the fuzzy complex number-valued distance on F*(K), the fuzzy complex number-valued measure on F*(K), and some related notions, such as null-additivity, pseudo-null-additivity, null-subtraction, pseudo-null-subtraction, autocontinuity from above, autocontinuity from below, and autocontinuity of the defined fuzzy complex number-valued measures. Properties of fuzzy complex number-valued measures are studied in detail. PMID:25093202

  16. Reinforcement learning produces dominant strategies for the Iterated Prisoner's Dilemma.

    Directory of Open Access Journals (Sweden)

    Marc Harper

    Full Text Available We present tournament results and several powerful strategies for the Iterated Prisoner's Dilemma created using reinforcement learning techniques (evolutionary and particle swarm algorithms). These strategies are trained to perform well against a corpus of over 170 distinct opponents, including many well-known and classic strategies. All the trained strategies win standard tournaments against the total collection of other opponents. The trained strategies and one particular human-designed strategy are also the top performers in noisy tournaments.

  17. Hierarchical type-2 fuzzy aggregation of fuzzy controllers

    CERN Document Server

    Cervantes, Leticia

    2016-01-01

    This book focuses on the fields of fuzzy logic and granular computing, also considering the area of control. These areas can work together to solve various control problems; the idea is that this combination would enable even more complex problem solving and better results. In this book we test the proposed method using two benchmark problems: total flight control and the problem of water level control for a 3-tank system. Fuzzy logic makes the simulations easy to perform; fuzzy systems help to model the behavior of real systems, and the fuzzy rules generated with them can describe the behavior of any variable depending on the inputs and linguistic values. For this reason, this work uses fuzzy systems in the proposed architecture to improve the handling of complex control problems.

  18. Fuzzy portfolio model with fuzzy-input return rates and fuzzy-output proportions

    Science.gov (United States)

    Tsaur, Ruey-Chyn

    2015-02-01

    In the finance market, a short-term investment strategy is usually applied in portfolio selection in order to reduce investment risk; however, the economy is uncertain and the investment period is short. Further, an investor has incomplete information for selecting a portfolio with crisp proportions for each chosen security. In this paper we present a new method of constructing a fuzzy portfolio model for the parameters of fuzzy-input return rates and fuzzy-output proportions, based on possibilistic mean-standard deviation models. Furthermore, we consider both excess and shortage of investment in different economic periods by using a fuzzy constraint for the sum of the fuzzy proportions, and we also account for the risks of securities investment and the vagueness of incomplete information during periods of economic depression. Finally, we present a numerical example of a portfolio selection problem to illustrate the proposed model, and a sensitivity analysis is performed based on the results.

  19. Memory Transformation Enhances Reinforcement Learning in Dynamic Environments.

    Science.gov (United States)

    Santoro, Adam; Frankland, Paul W; Richards, Blake A

    2016-11-30

    Over the course of systems consolidation, there is a switch from a reliance on detailed episodic memories to generalized schematic memories. This switch is sometimes referred to as "memory transformation." Here we demonstrate a previously unappreciated benefit of memory transformation, namely, its ability to enhance reinforcement learning in a dynamic environment. We developed a neural network that is trained to find rewards in a foraging task where reward locations are continuously changing. The network can use memories for specific locations (episodic memories) and statistical patterns of locations (schematic memories) to guide its search. We find that switching from an episodic to a schematic strategy over time leads to enhanced performance due to the tendency for the reward location to be highly correlated with itself in the short term, but to regress to a stable distribution in the long term. We also show that the statistics of the environment determine the optimal utilization of both types of memory. Our work recasts the theoretical question of why memory transformation occurs, shifting the focus from the avoidance of memory interference toward the enhancement of reinforcement learning across multiple timescales. As time passes, memories transform from a highly detailed state to a more gist-like state, in a process called "memory transformation." Theories of memory transformation speak to its advantages in terms of reducing memory interference, increasing memory robustness, and building models of the environment. However, the role of memory transformation from the perspective of an agent that continuously acts and receives reward in its environment is not well explored. In this work, we demonstrate a view of memory transformation that defines it as a way of optimizing behavior across multiple timescales. Copyright © 2016 the authors 0270-6474/16/3612228-15$15.00/0.
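
    A toy sketch of the episodic-versus-schematic trade-off described above (not the authors' neural network): early on, the agent searches near the last observed reward location; as the schema weight grows, it increasingly samples from the long-run distribution of past locations.

        import random

        class ForagingAgent:
            def __init__(self, schema_weight=0.0):
                self.last_reward_loc = 0.0   # episodic memory: one location
                self.location_history = []   # basis of the schematic memory
                self.schema_weight = schema_weight  # 0 episodic, 1 schematic

            def guess(self):
                if self.location_history and random.random() < self.schema_weight:
                    # Schematic draw from the long-run location distribution.
                    return random.choice(self.location_history)
                # Episodic search near the most recent reward.
                return self.last_reward_loc + random.gauss(0.0, 0.1)

            def observe(self, reward_loc):
                self.last_reward_loc = reward_loc
                self.location_history.append(reward_loc)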

  20. A Fuzzy Control System for Inductive Video Games

    OpenAIRE

    Lara-Alvarez, Carlos; Mitre-Hernandez, Hugo; Flores, Juan; Fuentes, Maria

    2017-01-01

    It has been shown that the emotional state of students has an important relationship with learning; for instance, engaged concentration is positively correlated with learning. This paper proposes the Inductive Control (IC) for educational games. Unlike conventional approaches that only modify the game level, the proposed technique also induces emotions in the player for supporting the learning process. This paper explores a fuzzy system that analyzes the players' performance and their emotion...

  1. Multi-Objective Reinforcement Learning-Based Deep Neural Networks for Cognitive Space Communications

    Science.gov (United States)

    Ferreria, Paulo Victor R.; Paffenroth, Randy; Wyglinski, Alexander M.; Hackett, Timothy M.; Bilen, Sven G.; Reinhart, Richard C.; Mortensen, Dale J.

    2017-01-01

    Future communication subsystems of space exploration missions can potentially benefit from software-defined radios (SDRs) controlled by machine learning algorithms. In this paper, we propose a novel hybrid radio resource allocation management control algorithm that integrates multi-objective reinforcement learning and deep artificial neural networks. The objective is to efficiently manage communications system resources by monitoring performance functions with common dependent variables that result in conflicting goals. The uncertainty in the performance of thousands of different possible combinations of radio parameters makes the trade-off between exploration and exploitation in reinforcement learning (RL) much more challenging for future critical space-based missions. Thus, the system should spend as little time as possible on exploring actions, and whenever it explores an action, it should perform at acceptable levels most of the time. The proposed approach enables on-line learning by interactions with the environment and restricts poor resource allocation performance through virtual environment exploration. Improvements in the multi-objective performance can be achieved via transmitter parameter adaptation on a per-packet basis, with poorly predicted performance promptly resulting in rejected decisions. Simulations presented in this work considered the DVB-S2 standard adaptive transmitter parameters and additional ones expected to be present in future adaptive radio systems. Performance results are provided by analysis of the proposed hybrid algorithm when operating across a satellite communication channel from Earth to GEO orbit during clear sky conditions. The proposed approach constitutes part of the core cognitive engine proof-of-concept to be delivered to the NASA Glenn Research Center SCaN Testbed located onboard the International Space Station.

  2. Combining fuzzy mathematics with fuzzy logic to solve business management problems

    Science.gov (United States)

    Vrba, Joseph A.

    1993-12-01

    Fuzzy logic technology has been applied to control problems with great success. Because of this, many observers feel that fuzzy logic is applicable only in the control arena. However, business management problems almost never deal with crisp values. Fuzzy systems technology--a combination of fuzzy logic, fuzzy mathematics and a graphical user interface--is a natural fit for developing software to assist in typical business activities such as planning, modeling and estimating. This presentation discusses how fuzzy logic systems can be extended through the application of fuzzy mathematics and the use of a graphical user interface to make the information contained in fuzzy numbers accessible to business managers. As demonstrated through examples from actual deployed systems, this fuzzy systems technology has been employed successfully to provide solutions to the complex real-world problems found in the business environment.

  3. Fuzzy Languages

    Science.gov (United States)

    Rahonis, George

    The theory of fuzzy recognizable languages over bounded distributive lattices is presented as a paradigm of recognizable formal power series. Due to the idempotency properties of bounded distributive lattices, the equality of fuzzy recognizable languages is decidable, the determinization of multi-valued automata is effective, and a pumping lemma exists. Fuzzy recognizable languages over finite and infinite words are expressively equivalent to sentences of the multi-valued monadic second-order logic. Fuzzy recognizability over bounded ℓ-monoids and residuated lattices is briefly reported. The chapter concludes with two applications of fuzzy recognizable languages to real world problems in medicine.

  4. Fuzzy logic and neural networks in artificial intelligence and pattern recognition

    Science.gov (United States)

    Sanchez, Elie

    1991-10-01

    With the use of fuzzy logic techniques, neural computing can be integrated in symbolic reasoning to solve complex real world problems. In fact, artificial neural networks, expert systems, and fuzzy logic systems, in the context of approximate reasoning, share common features and techniques. A model of Fuzzy Connectionist Expert System is introduced, in which an artificial neural network is designed to construct the knowledge base of an expert system from training examples (this model can also be used for specification of rules in fuzzy logic control). Two types of weights are associated with the synaptic connections in an AND-OR structure: primary linguistic weights, interpreted as labels of fuzzy sets, and secondary numerical weights. Cell activation is computed through min-max fuzzy equations of the weights. Learning consists in finding the (numerical) weights and the network topology. This feedforward network is described and first illustrated in a biomedical application (medical diagnosis assistance from inflammatory-syndromes/proteins profiles). Then, it is shown how this methodology can be utilized for handwritten pattern recognition (characters play the role of diagnoses): in a fuzzy neuron describing a number for example, the linguistic weights represent fuzzy sets on cross-detecting lines and the numerical weights reflect the importance (or weakness) of connections between cross-detecting lines and characters.
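
    The min-max activation mentioned above can be stated compactly; the sketch below shows only the numerical-weight part, with inputs taken as membership degrees in [0, 1] (the linguistic weights are omitted).

        def fuzzy_and_or_activation(inputs, weights):
            # Each input is AND-ed (min) with its weight; the results
            # are then OR-ed (max) to give the cell activation.
            return max(min(x, w) for x, w in zip(inputs, weights))

        # Three detecting lines with membership degrees and weights.
        print(fuzzy_and_or_activation([0.9, 0.2, 0.6], [0.8, 1.0, 0.5]))  # 0.8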

  5. Applying Fuzzy Possibilistic Methods on Critical Objects

    DEFF Research Database (Denmark)

    Yazdani, Hossein; Ortiz-Arroyo, Daniel; Choros, Kazimierz

    2016-01-01

    Providing a flexible environment to process data objects is a desirable goal of machine learning algorithms. In fuzzy and possibilistic methods, the relevance of data objects is evaluated and a membership degree is assigned. However, some critical objects have the potential to affect the performance of clustering algorithms, depending on whether they remain in a specific cluster or are moved into another. In this paper we analyze and compare how critical objects affect the behaviour of fuzzy possibilistic methods on several data sets. The comparison is based on the accuracy and the ability of the learning methods to provide a proper search space for data objects. The membership functions used by each method when dealing with critical objects are also evaluated. Our results show that relaxing the conditions of participation for data objects, allowing them to join as many partitions as they can, is beneficial.

  6. A Day-to-Day Route Choice Model Based on Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Fangfang Wei

    2014-01-01

    Full Text Available Day-to-day traffic dynamics are generated by individual travelers' route choice and route adjustment behaviors, which are appropriately researched using agent-based models and learning theory. In this paper, we propose a day-to-day route choice model based on reinforcement learning and multiagent simulation. Travelers' memory, learning rate, and experience cognition are taken into account. The model is then verified and analyzed. Results show that the network flow can converge to user equilibrium (UE) if travelers can remember all the travel times they have experienced, but this is not necessarily the case under limited memory; the learning rate can strengthen the flow fluctuation, whereas memory has the opposite effect; moreover, a high learning rate results in cyclical oscillation during the process of flow evolution. Finally, both the scenario of link capacity degradation and that of random link capacity are used to illustrate the model's applications. Analyses and applications of our model demonstrate that the model is reasonable and useful for studying day-to-day traffic dynamics.
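
    A minimal sketch of one day of such a model for a single traveler; the logit choice rule and the exponential-smoothing memory are common assumptions, not necessarily the paper's exact specification.

        import math
        import random

        def choose_route(memory, beta=1.0):
            # Routes with lower remembered travel time are more likely.
            scores = {r: math.exp(-beta * t) for r, t in memory.items()}
            x = random.random() * sum(scores.values())
            acc = 0.0
            for route, s in scores.items():
                acc += s
                if x <= acc:
                    return route

        def update_memory(memory, route, observed_time, learning_rate=0.3):
            # Day-to-day update: blend today's experience into memory.
            memory[route] += learning_rate * (observed_time - memory[route])

        memory = {'A': 30.0, 'B': 35.0}   # remembered travel times (min)
        route = choose_route(memory)
        update_memory(memory, route, observed_time=32.0)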

  7. Reinforcement Learning Based Data Self-Destruction Scheme for Secured Data Management

    Directory of Open Access Journals (Sweden)

    Young Ki Kim

    2018-04-01

    Full Text Available As technologies and services that leverage cloud computing have evolved, the number of businesses and individuals who use them is increasing rapidly. In the course of using cloud services, as users store and use data that include personal information, research on privacy protection models to protect sensitive information in the cloud environment is becoming more important. As a solution to this problem, a self-destructing scheme has been proposed that prevents the decryption of encrypted user data after a certain period of time using a Distributed Hash Table (DHT) network. However, the existing self-destructing scheme does not mention how to set the number of key shares and the threshold value considering the environment of the dynamic DHT network. This paper proposes a method to set the parameters to generate the key shares needed for the self-destructing scheme considering the availability and security of data. The proposed method defines the state, action, and reward of the reinforcement learning model based on the similarity of the graph, and applies the self-destructing scheme process by updating the parameters based on the reinforcement learning model. Through the proposed technique, key sharing parameters can be set in consideration of data availability and security in dynamic DHT network environments.

  8. Bio-robots automatic navigation with graded electric reward stimulation based on Reinforcement Learning.

    Science.gov (United States)

    Zhang, Chen; Sun, Chao; Gao, Liqiang; Zheng, Nenggan; Chen, Weidong; Zheng, Xiaoxiang

    2013-01-01

    Bio-robots based on brain-computer interfaces (BCI) suffer from a failure to consider the characteristics of the animals in navigation. This paper proposes a new method for bio-robots' automatic navigation that combines a reward-generating algorithm based on Reinforcement Learning (RL) with the learning intelligence of the animals themselves. Given graded electrical rewards, the animal, e.g. a rat, seeks the maximum reward while exploring an unknown environment. Since the rat has excellent spatial recognition, the rat-robot and the RL algorithm can converge to an optimal route by co-learning. This work provides significant inspiration for the practical development of bio-robots' navigation with hybrid intelligence.

  9. Relational Demonic Fuzzy Refinement

    Directory of Open Access Journals (Sweden)

    Fairouz Tchier

    2014-01-01

    Full Text Available We use relational algebra to define a refinement fuzzy order called demonic fuzzy refinement and also the associated fuzzy operators, which are fuzzy demonic join (⊔fuz), fuzzy demonic meet (⊓fuz), and fuzzy demonic composition (□fuz). Our definitions and properties are illustrated by some examples using Mathematica software (fuzzy logic).

  10. Analytical fuzzy approach to biological data analysis

    Directory of Open Access Journals (Sweden)

    Weiping Zhang

    2017-03-01

    Full Text Available The assessment of the physiological state of an individual requires an objective evaluation of biological data while taking into account both measurement noise and uncertainties arising from individual factors. We suggest representing multi-dimensional medical data by means of an optimal fuzzy membership function. A carefully designed data model is introduced in a completely deterministic framework where uncertain variables are characterized by fuzzy membership functions. The study derives the analytical expressions of fuzzy membership functions on variables of the multivariate data model by maximizing the over-uncertainties-averaged-log-membership values of data samples around an initial guess. The analytical solution lends itself to a practical modeling algorithm facilitating the data classification. The experiments performed on the heartbeat interval data of 20 subjects verified that the proposed method is a competitive alternative to typically used pattern recognition and machine learning algorithms.

  11. Hybrid Multi-objective Forecasting of Solar Photovoltaic Output Using Kalman Filter based Interval Type-2 Fuzzy Logic System

    DEFF Research Database (Denmark)

    Hassan, Saima; Ahmadieh Khanesar, Mojtaba; Hajizadeh, Amin

    2017-01-01

    Learning of fuzzy parameters for system modeling using evolutionary algorithms is an interesting topic. In this paper, two optimal designs and tunings of an interval type-2 fuzzy logic system are proposed using hybrid learning algorithms. The consequent parameters of the interval type-2 fuzzy logic system in both hybrid algorithms are tuned using a Kalman filter, whereas the antecedent parameters of the system in the first hybrid algorithm are optimized using the multi-objective particle swarm optimization (MOPSO) and using the multi-objective evolutionary algorithm based on decomposition (MOEA...

  12. A New Fuzzy Cognitive Map Learning Algorithm for Speech Emotion Recognition

    Directory of Open Access Journals (Sweden)

    Wei Zhang

    2017-01-01

    Full Text Available Selecting an appropriate recognition method is crucial in speech emotion recognition applications. However, the current methods do not consider the relationship between emotions. Thus, in this study, a speech emotion recognition system based on the fuzzy cognitive map (FCM) approach is constructed. Moreover, a new FCM learning algorithm for speech emotion recognition is proposed. This algorithm includes the use of the pleasure-arousal-dominance emotion scale to calculate the weights between emotions and certain mathematical derivations to determine the network structure. The proposed algorithm can handle a large number of concepts, whereas a typical FCM can handle only relatively simple networks (maps). Different acoustic features, including fundamental speech features and a new spectral feature, are extracted to evaluate the performance of the proposed method. Three experiments are conducted in this paper, namely, a single feature experiment, a feature combination experiment, and a comparison between the proposed algorithm and typical networks. All experiments are performed on the TYUT2.0 and EMO-DB databases. Results of the feature combination experiments show that the recognition rates of the combination features are 10%–20% better than those of single features. The proposed FCM learning algorithm generates 5%–20% performance improvement compared with traditional classification networks.
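
    For reference, the basic FCM update that such learning algorithms build on is a squashed weighted sum over concept activations; the weights below are illustrative placeholders, not the PAD-derived weights of the paper.

        import math

        def fcm_step(activations, weights):
            # One synchronous update: concept i's new activation is a
            # sigmoid of the weighted sum of all current activations.
            n = len(activations)
            return [1.0 / (1.0 + math.exp(-sum(weights[j][i] * activations[j]
                                               for j in range(n))))
                    for i in range(n)]

        # Three emotion concepts with illustrative connection weights.
        W = [[0.0, 0.4, -0.2],
             [0.1, 0.0, 0.5],
             [-0.3, 0.2, 0.0]]
        state = [0.5, 0.1, 0.8]
        for _ in range(5):
            state = fcm_step(state, W)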

  13. Fuzzy logic

    CERN Document Server

    Smets, P

    1995-01-01

    We start by describing the nature of imperfect data, and give an overview of the various models that have been proposed. Fuzzy sets theory is shown to be an extension of classical set theory, and as such has a prominent role in modelling imperfect data. The mathematics of fuzzy sets theory is detailed, in particular the role of the triangular norms. The use of fuzzy sets theory in fuzzy logic and possibility theory, the nature of the generalized modus ponens and of the implication operator for approximate reasoning are analysed. The use of fuzzy logic is detailed for applications oriented towards process control and database problems.

  14. Fuzzy Neuroidal Nets and Recurrent Fuzzy Computations

    Czech Academy of Sciences Publication Activity Database

    Wiedermann, Jiří

    2001-01-01

    Vol. 11, No. 6 (2001), pp. 675-686 ISSN 1210-0552. [SOFSEM 2001 Workshop on Soft Computing. Piešťany, 29.11.2001-30.11.2001] R&D Projects: GA ČR GA201/00/1489; GA AV ČR KSK1019101 Institutional research plan: AV0Z1030915 Keywords: fuzzy computing * fuzzy neural nets * fuzzy Turing machines * non-uniform computational complexity Subject RIV: BA - General Mathematics

  15. Curiosity driven reinforcement learning for motion planning on humanoids

    Science.gov (United States)

    Frank, Mikhail; Leitner, Jürgen; Stollenga, Marijn; Förster, Alexander; Schmidhuber, Jürgen

    2014-01-01

    Most previous work on artificial curiosity (AC) and intrinsic motivation focuses on basic concepts and theory. Experimental results are generally limited to toy scenarios, such as navigation in a simulated maze, or control of a simple mechanical system with one or two degrees of freedom. To study AC in a more realistic setting, we embody a curious agent in the complex iCub humanoid robot. Our novel reinforcement learning (RL) framework consists of a state-of-the-art, low-level, reactive control layer, which controls the iCub while respecting constraints, and a high-level curious agent, which explores the iCub's state-action space through information gain maximization, learning a world model from experience while controlling the actual iCub hardware in real time. To the best of our knowledge, this is the first ever embodied, curious agent for real-time motion planning on a humanoid. We demonstrate that it can learn compact Markov models to represent large regions of the iCub's configuration space, and that the iCub explores intelligently, showing interest in its physical constraints as well as in objects it finds in its environment. PMID:24432001

  16. A Fuzzy Collaborative Forecasting Approach for Forecasting the Productivity of a Factory

    Directory of Open Access Journals (Sweden)

    Yi-Chi Wang

    2013-01-01

    Full Text Available Productivity is always considered one of the most basic and important factors in the competitiveness of a factory. For this reason, all factories have sought to enhance productivity. To achieve this goal, we first need to estimate productivity. However, there is a considerable degree of uncertainty in productivity. For this reason, a fuzzy collaborative forecasting approach is proposed in this study to forecast the productivity of a factory. First, a learning model is established to estimate the future productivity. Subsequently, the learning model is converted into three equivalent nonlinear programming problems to be solved from various viewpoints. The fuzzy productivity forecasts by different experts may not be equal and should therefore be aggregated. To this end, the fuzzy intersection and back-propagation network approach is applied. The practical example of a dynamic random access memory (DRAM) factory is used to evaluate the effectiveness of the proposed methodology.

  17. Tunnel Ventilation Control Using Reinforcement Learning Methodology

    Science.gov (United States)

    Chu, Baeksuk; Kim, Dongnam; Hong, Daehie; Park, Jooyoung; Chung, Jin Taek; Kim, Tae-Hyung

    The main purpose of a tunnel ventilation system is to maintain CO pollutant concentration and VI (visibility index) at an adequate level to provide drivers with a comfortable and safe driving environment. Moreover, it is necessary to minimize the power consumed to operate the ventilation system. To achieve these objectives, the control algorithm used in this research is the reinforcement learning (RL) method. RL is a goal-directed learning of a mapping from situations to actions without relying on exemplary supervision or complete models of the environment. The goal of RL is to maximize a reward, which is an evaluative feedback from the environment. In constructing the reward of the tunnel ventilation system, both objectives listed above are included, that is, maintaining an adequate level of pollutants and minimizing power consumption. An RL algorithm based on an actor-critic architecture and a gradient-following algorithm is applied to the tunnel ventilation system. Simulation results obtained with real data collected from an existing tunnel ventilation system, together with real experimental verification, are provided in this paper. It is confirmed that with the suggested controller, the pollutant level inside the tunnel was well maintained under the allowable limit and energy consumption was improved compared to the conventional control scheme.
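
    The reward described above combines pollutant limits with energy cost; a hedged sketch of one plausible form (all limits and weights here are assumptions, not the paper's values):

        def ventilation_reward(co_ppm, visibility_index, power_kw,
                               co_limit=25.0, vi_limit=0.5, power_weight=0.01):
            # Penalize CO above its limit, visibility below its limit,
            # and the power drawn by the fans.
            co_penalty = max(0.0, co_ppm - co_limit)
            vi_penalty = max(0.0, vi_limit - visibility_index)
            return -(co_penalty + vi_penalty + power_weight * power_kw)

        print(ventilation_reward(co_ppm=28.0, visibility_index=0.6, power_kw=150.0))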

  18. Rats bred for helplessness exhibit positive reinforcement learning deficits which are not alleviated by an antidepressant dose of the MAO-B inhibitor deprenyl.

    Science.gov (United States)

    Schulz, Daniela; Henn, Fritz A; Petri, David; Huston, Joseph P

    2016-08-04

    Principles of negative reinforcement learning may play a critical role in the etiology and treatment of depression. We examined the integrity of positive reinforcement learning in congenitally helpless (cH) rats, an animal model of depression, using a random ratio schedule and a devaluation-extinction procedure. Furthermore, we tested whether an antidepressant dose of the monoamine oxidase (MAO)-B inhibitor deprenyl would reverse any deficits in positive reinforcement learning. We found that cH rats (n=9) were impaired in the acquisition of even simple operant contingencies, such as a fixed interval (FI) 20 schedule. cH rats exhibited no apparent deficits in appetite or reward sensitivity. They reacted to the devaluation of food in a manner consistent with a dose-response relationship. Reinforcer motivation as assessed by lever pressing across sessions with progressively decreasing reward probabilities was highest in congenitally non-helpless (cNH, n=10) rats as long as the reward probabilities remained relatively high. cNH compared to wild-type (n=10) rats were also more resistant to extinction across sessions. Compared to saline (n=5), deprenyl (n=5) reduced the duration of immobility of cH rats in the forced swimming test, indicative of antidepressant effects, but did not restore any deficits in the acquisition of a FI 20 schedule. We conclude that positive reinforcement learning was impaired in rats bred for helplessness, possibly due to motivational impairments but not deficits in reward sensitivity, and that deprenyl exerted antidepressant effects but did not reverse the deficits in positive reinforcement learning. Copyright © 2016 IBRO. Published by Elsevier Ltd. All rights reserved.

  19. Neuro-fuzzy system modeling based on automatic fuzzy clustering

    Institute of Scientific and Technical Information of China (English)

    Yuangang TANG; Fuchun SUN; Zengqi SUN

    2005-01-01

    A neuro-fuzzy system model based on automatic fuzzy clustering is proposed. A hybrid model identification algorithm is also developed to decide the model structure and model parameters. The algorithm mainly includes three parts: 1) automatic fuzzy C-means (AFCM), which is applied to generate fuzzy rules automatically and then fix the size of the neuro-fuzzy network, by which the complexity of system design is reduced greatly at the price of some fitting capability; 2) recursive least squares estimation (RLSE), which is used to update the parameters of the Takagi-Sugeno model employed to describe the behavior of the system; 3) a gradient descent algorithm, proposed for the fuzzy values according to the back-propagation algorithm of neural networks. Finally, modeling the dynamical equation of a two-link manipulator with the proposed approach is presented to validate the feasibility of the method.

  20. Vision-based Navigation and Reinforcement Learning Path Finding for Social Robots

    OpenAIRE

    Pérez Sala, Xavier

    2010-01-01

    We propose a robust system for automatic Robot Navigation in uncontrolled environments. The system is composed of three main modules: the Artificial Vision module, the Reinforcement Learning module, and the behavior control module. The aim of the system is to allow a robot to automatically find a path that arrives at a prefixed goal. Turn and straight movements in uncontrolled environments are automatically estimated and controlled using the proposed modules. The Artificial Vi...

  1. Combinational Reasoning of Quantitative Fuzzy Topological Relations for Simple Fuzzy Regions

    Science.gov (United States)

    Liu, Bo; Li, Dajun; Xia, Yuanping; Ruan, Jian; Xu, Lili; Wu, Huanyi

    2015-01-01

    In recent years, formalization and reasoning of topological relations have become a hot topic as a means to generate knowledge about the relations between spatial objects at the conceptual and geometrical levels. These mechanisms have been widely used in spatial data query, spatial data mining, evaluation of equivalence and similarity in a spatial scene, as well as for consistency assessment of the topological relations of multi-resolution spatial databases. The concept of computational fuzzy topological space is applied to simple fuzzy regions to efficiently and more accurately solve fuzzy topological relations. Thus, extending the existing research and improving upon the previous work, this paper presents a new method to describe fuzzy topological relations between simple spatial regions in Geographic Information Sciences (GIS) and Artificial Intelligence (AI). Firstly, we propose a new definition for simple fuzzy line segments and simple fuzzy regions based on computational fuzzy topology. Then, based on the new definitions, we propose a new combinational reasoning method to compute the topological relations between simple fuzzy regions; moreover, this study discovers that there are (1) 23 different topological relations between a simple crisp region and a simple fuzzy region; (2) 152 different topological relations between two simple fuzzy regions. In the end, we discuss some examples to demonstrate the validity of the new method; through comparisons with existing fuzzy models, we show that the proposed method can compute more relations than the existing models, as it is more expressive than the existing fuzzy models. PMID:25775452

  2. Geometric Fuzzy Techniques for Guidance of Visually Impaired People

    Directory of Open Access Journals (Sweden)

    Adán Landa-Hernández

    2013-01-01

    Full Text Available In this paper we present the design of a device to guide visually impaired people who normally use a cane. We propose a non-invasive device that will help blind and visually impaired people to navigate. The system uses stereoscopic vision, an RGB-D sensor and an IMU to process images, compute the distances of obstacles relative to the cameras, and search for free walking paths in the scene. This computation is done using stereo vision, vanishing points, and fuzzy rules. Vanishing points are used to obtain a main orientation in structured spaces. Since the guidance system is related to a spatial reference system, the vanishing point is used like a virtual compass that helps blind users orient themselves towards a goal. Reinforced with fuzzy decision rules, the system supports the user in avoiding obstacles; thus the blind person is able to cross structured spaces and avoid obstacles without the need for a cane.

  3. Reinforcement Learning Based Web Service Compositions for Mobile Business

    Science.gov (United States)

    Zhou, Juan; Chen, Shouming

    In this paper, we propose a new solution to Reactive Web Service Composition, via modeling with Reinforcement Learning, and introducing modified (alterable) QoS variables into the model as elements of the Markov Decision Process tuple. Moreover, we give an example of Reactive-WSC-based mobile banking, to demonstrate the intrinsic capability of the solution in obtaining the optimized service composition, characterized by (alterable) target QoS variable sets with optimized values. Consequently, we conclude that the solution has decent potential for boosting customer experience and quality of service in Web Services, and in applications across the whole electronic commerce and business sector.

  4. Global reinforcement training of CrossNets

    Science.gov (United States)

    Ma, Xiaolong

    2007-10-01

    Hybrid "CMOL" integrated circuits, incorporating advanced CMOS devices for neural cell bodies, nanowires as axons and dendrites, and latching switches as synapses, may be used for the hardware implementation of extremely dense (107 cells and 1012 synapses per cm2) neuromorphic networks, operating up to 10 6 times faster than their biological prototypes. We are exploring several "Cross- Net" architectures that accommodate the limitations imposed by CMOL hardware and should allow effective training of the networks without a direct external access to individual synapses. Our studies have show that CrossNets based on simple (two-terminal) crosspoint devices can work well in at least two modes: as Hop-field networks for associative memory and multilayer perceptrons for classification tasks. For more intelligent tasks (such as robot motion control or complex games), which do not have "examples" for supervised learning, more advanced training methods such as the global reinforcement learning are necessary. For application of global reinforcement training algorithms to CrossNets, we have extended Williams's REINFORCE learning principle to a more general framework and derived several learning rules that are more suitable for CrossNet hardware implementation. The results of numerical experiments have shown that these new learning rules can work well for both classification tasks and reinforcement tasks such as the cartpole balancing control problem. Some limitations imposed by the CMOL hardware need to be carefully addressed for the the successful application of in situ reinforcement training to CrossNets.

  5. Indirect adaptive fuzzy wavelet neural network with self- recurrent consequent part for AC servo system.

    Science.gov (United States)

    Hou, Runmin; Wang, Li; Gao, Qiang; Hou, Yuanglong; Wang, Chao

    2017-09-01

    This paper proposes a novel indirect adaptive fuzzy wavelet neural network (IAFWNN) to handle the nonlinearity, wide load variations, time variation and uncertain disturbances of an AC servo system. In the proposed approach, the self-recurrent wavelet neural network (SRWNN) is employed to construct an adaptive self-recurrent consequent part for each fuzzy rule of the TSK fuzzy model. For the IAFWNN controller, the online learning algorithm is based on the back-propagation (BP) algorithm. Moreover, an improved particle swarm optimization (IPSO) is used to adapt the learning rate. The aid of an adaptive SRWNN identifier offers real-time gradient information to the adaptive fuzzy wavelet neural controller to overcome the impact of parameter variations, load disturbances and other uncertainties effectively, and gives good dynamic performance. The asymptotic stability of the system is guaranteed by using the Lyapunov method. The results of simulation and prototype tests prove that the proposed approach is effective and suitable. Copyright © 2017. Published by Elsevier Ltd.

  6. Application and Simulation of Fuzzy Neural Network PID Controller in the Aircraft Cabin Temperature

    Directory of Open Access Journals (Sweden)

    Ding Fang

    2013-06-01

    Full Text Available Considering the complex factors affecting ambient temperature in an aircraft cabin, and shortcomings of traditional PID control such as parameters that are difficult to tune and ineffective control, this paper puts forward an intelligent PID algorithm that brings the fuzzy logic method and a neural network together, designing a fuzzy neural network PID controller. After correction by fuzzy inference and dynamic learning of the neural network, the PID parameters of the controller reach their optimal values. MATLAB simulation results for the cabin temperature control model show that the performance of the fuzzy neural network PID controller is greatly improved, with faster response, smaller overshoot and better adaptability.
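
    A minimal sketch of the controller shape, with a crude rule standing in for the fuzzy neural tuner (the actual tuner in the paper is a trained fuzzy neural network, not this one-line rule):

        class FuzzyTunedPID:
            def __init__(self, kp, ki, kd, dt):
                self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
                self.integral = 0.0
                self.prev_error = 0.0

            def step(self, setpoint, measured):
                error = setpoint - measured
                # Placeholder "fuzzy" adjustment: large errors boost the
                # proportional action.
                kp = self.kp * (1.5 if abs(error) > 1.0 else 1.0)
                self.integral += error * self.dt
                derivative = (error - self.prev_error) / self.dt
                self.prev_error = error
                return kp * error + self.ki * self.integral + self.kd * derivative

        pid = FuzzyTunedPID(kp=2.0, ki=0.5, kd=0.1, dt=0.05)
        u = pid.step(setpoint=22.0, measured=18.5)  # cabin temperature, deg C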

  7. Grounding the meanings in sensorimotor behavior using reinforcement learning

    Directory of Open Access Journals (Sweden)

    Igor eFarkaš

    2012-02-01

    Full Text Available The recent outburst of interest in cognitive developmental robotics is fueled by the ambition to propose ecologically plausible mechanisms of how, among other things, a learning agent/robot could ground linguistic meanings in its sensorimotor behaviour. Along this stream, we propose a model that allows the simulated iCub robot to learn the meanings of actions (point, touch and push) oriented towards objects in the robot's peripersonal space. In our experiments, the iCub learns to execute motor actions and comment on them. Architecturally, the model is composed of three neural-network-based modules that are trained in different ways. The first module, a two-layer perceptron, is trained by back-propagation to attend to the target position in the visual scene, given the low-level visual information and the feature-based target information. The second module, having the form of an actor-critic architecture, is the most distinguishing part of our model, and is trained by a continuous version of reinforcement learning to execute actions as sequences, based on a linguistic command. The third module, an echo-state network, is trained to provide the linguistic description of the executed actions. The trained model generalises well in the case of novel action-target combinations with randomised initial arm positions. It can also promptly adapt its behaviour if the action/target suddenly changes during motor execution.

  8. FUZZY RINGS AND ITS PROPERTIES

    Directory of Open Access Journals (Sweden)

    Karyati Karyati

    2017-01-01

    One algebraic structure that involves a binary operation is a group, defined as a nonempty (classical) set with an associative binary operation, an identity element, and an inverse for each element. Within the structure of groups one studies subgroups, normal subgroups, factor groups, group homomorphisms and their properties. Classical algebraic structures have been developed into fuzzy algebraic structures, for example fuzzy semigroups and fuzzy groups, after fuzzy sets were introduced by L. A. Zadeh in 1965. Inspired by the work on fuzzy semigroups and fuzzy groups, this research on the algebraic structure of rings reviews fuzzy rings, fuzzy ring ideals, fuzzy ring homomorphisms and fuzzy quotient rings with their properties. The results of this study are the properties of fuzzy rings, fuzzy ring ideals, fuzzy ring homomorphisms and fuzzy quotient rings, obtained by utilizing level subsets and strong level subsets as well as images and pre-images under fuzzy ring homomorphisms.   Keywords: fuzzy ring, subset level, homomorphism fuzzy ring, fuzzy quotient ring

  9. On fuzzy quasi continuity and an application of fuzzy set theory

    CERN Document Server

    Mahmoud, R A

    2003-01-01

    Whereas classical topology has been developed in close connection with classical analysis, describing topological phenomena in analysis, fuzzy topology, with its important application in quantum gravity indicated by Witten and Elnaschie, has only been introduced as an analogue of classical topology. The development of fuzzy topology without close relations to analytical problems did not give the possibility of successfully testing the applicability of the new notions and results. Until now this situation has not changed, essentially. Although many types of fuzzy sets and fuzzy functions having the quasi-property, in forms both weaker and stronger than openness and continuity, respectively, have been studied in detail, and many properties of fuzzy topological spaces, such as compactness, are discussed via fuzzy notions, others are far from being completely developed in their foundation. This paper is therefore devoted to presenting a new class of fuzzy quasi-continuous functions defined via fuzzy compactness. Some characte...

  10. Identification of Fuzzy Inference Systems by Means of a Multiobjective Opposition-Based Space Search Algorithm

    Directory of Open Access Journals (Sweden)

    Wei Huang

    2013-01-01

    Full Text Available We introduce a new category of fuzzy inference systems with the aid of a multiobjective opposition-based space search algorithm (MOSSA). The proposed MOSSA is essentially a multiobjective space search algorithm improved by using opposition-based learning, which employs a so-called opposite numbers mechanism to speed up the convergence of the optimization algorithm. In the identification of fuzzy inference systems, the MOSSA is exploited to carry out the parametric identification of the fuzzy model as well as to realize its structural identification. Experimental results demonstrate the effectiveness of the proposed fuzzy models.

  11. Continuous theta-burst stimulation (cTBS) over the lateral prefrontal cortex alters reinforcement learning bias.

    Science.gov (United States)

    Ott, Derek V M; Ullsperger, Markus; Jocham, Gerhard; Neumann, Jane; Klein, Tilmann A

    2011-07-15

    The prefrontal cortex is known to play a key role in higher-order cognitive functions. Recently, we showed that this brain region is active in reinforcement learning, during which subjects constantly have to integrate trial outcomes in order to optimize performance. To further elucidate the role of the dorsolateral prefrontal cortex (DLPFC) in reinforcement learning, we applied continuous theta-burst stimulation (cTBS) either to the left or right DLPFC, or to the vertex as a control region, respectively, prior to the performance of a probabilistic learning task in an fMRI environment. While there was no influence of cTBS on learning performance per se, we observed a stimulation-dependent modulation of reward vs. punishment sensitivity: Left-hemispherical DLPFC stimulation led to a more reward-guided performance, while right-hemispherical cTBS induced a more avoidance-guided behavior. FMRI results showed enhanced prediction error coding in the ventral striatum in subjects stimulated over the left as compared to the right DLPFC. Both behavioral and imaging results are in line with recent findings that left, but not right-hemispherical stimulation can trigger a release of dopamine in the ventral striatum, which has been suggested to increase the relative impact of rewards rather than punishment on behavior. Copyright © 2011 Elsevier Inc. All rights reserved.

  12. Improvement of Fuzzy Image Contrast Enhancement Using Simulated Ergodic Fuzzy Markov Chains

    Directory of Open Access Journals (Sweden)

    Behrouz Fathi-Vajargah

    2014-01-01

    Full Text Available This paper presents a novel fuzzy enhancement technique using simulated ergodic fuzzy Markov chains for low-contrast brain magnetic resonance imaging (MRI). Fuzzy image contrast enhancement is performed using the weighted fuzzy expected value. The membership values are then modified to enhance the image using ergodic fuzzy Markov chains. The qualitative performance of the proposed method is compared to that of another method in which ergodic fuzzy Markov chains are not considered. The proposed method produces a better quality image.

  13. Fuzzy control. Fundamentals, stability and design of fuzzy controllers

    Energy Technology Data Exchange (ETDEWEB)

    Michels, K. [Fichtner GmbH und Co. KG, Stuttgart (Germany); Klawonn, F. [Fachhochschule Braunschweig/Wolfenbuettel (Germany). Fachbereich Informatik; Kruse, R. [Magdeburg Univ. (Germany). Fakultaet Informatik, Abt. Wiss.- und Sprachverarbeitung; Nuernberger, A. (eds.) [California Univ., Berkeley, CA (United States). Computer Science Division

    2006-07-01

    The book provides a critical discussion of fuzzy controllers from the perspective of classical control theory. Special emphases are placed on topics that are of importance for industrial applications, like (self-) tuning of fuzzy controllers, optimisation and stability analysis. The book is written as a textbook for graduate students as well as a comprehensive reference book about fuzzy control for researchers and application engineers. Starting with a detailed introduction to fuzzy systems and control theory the reader is guided to up-to-date research results. (orig.)

  14. Critical factors in the empirical performance of temporal difference and evolutionary methods for reinforcement learning

    NARCIS (Netherlands)

    Whiteson, S.; Taylor, M.E.; Stone, P.

    2010-01-01

    Temporal difference and evolutionary methods are two of the most common approaches to solving reinforcement learning problems. However, there is little consensus on their relative merits and there have been few empirical studies that directly compare their performance. This article aims to address

  15. Introduction to fuzzy systems

    CERN Document Server

    Chen, Guanrong

    2005-01-01

    Introduction to Fuzzy Systems provides students with a self-contained introduction that requires no preliminary knowledge of fuzzy mathematics and fuzzy control systems theory. Simplified and readily accessible, it encourages both classroom and self-directed learners to build a solid foundation in fuzzy systems. After introducing the subject, the authors move directly into presenting real-world applications of fuzzy logic, revealing its practical flavor. This practicality is then followed by basic fuzzy systems theory. The book also offers a tutorial on fuzzy control theory, based mainly on th

  16. Multiagent-Based Simulation of Temporal-Spatial Characteristics of Activity-Travel Patterns Using Interactive Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Min Yang

    2014-01-01

    Full Text Available We propose a multiagent-based reinforcement learning algorithm, in which the interactions between travelers and the environment are considered to simulate temporal-spatial characteristics of activity-travel patterns in a city. Road congestion degree is added to the reinforcement learning algorithm as a medium that passes the influence of one traveler’s decision to others. Meanwhile, the agents used in the algorithm are initialized from typical activity patterns extracted from the travel survey diary data of Shangyu city in China. In the simulation, both macroscopic activity-travel characteristics such as traffic flow spatial-temporal distribution and microscopic characteristics such as activity-travel schedules of each agent are obtained. Comparing the simulation results with the survey data, we find that deviation of the peak-hour traffic flow is less than 5%, while the correlation of the simulated versus survey location choice distribution is over 0.9.

  17. Fuzzy Logic Based Anomaly Detection for Embedded Network Security Cyber Sensor

    Energy Technology Data Exchange (ETDEWEB)

    Ondrej Linda; Todd Vollmer; Jason Wright; Milos Manic

    2011-04-01

    Resiliency and security in critical infrastructure control systems in the modern world of cyber terrorism constitute a relevant concern. Developing a network security system specifically tailored to the requirements of such critical assets is of primary importance. This paper proposes a novel learning algorithm for an anomaly-based network security cyber sensor together with its hardware implementation. The presented learning algorithm constructs a fuzzy logic rule-based model of normal network behavior. Individual fuzzy rules are extracted directly from the stream of incoming packets using an online clustering algorithm. This learning algorithm was specifically developed to comply with the constrained computational requirements of low-cost embedded network security cyber sensors. The performance of the system was evaluated on a set of network data recorded from an experimental test-bed mimicking the environment of a critical infrastructure control system.
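
    A sketch of the inference side of such a sensor: each learned cluster acts as one fuzzy rule, membership decays with distance from the cluster center, and traffic far from every center is flagged. The distance form, radius, and threshold below are assumptions, not the paper's parameters.

        def classify_packet(features, cluster_centers, radius=1.0, threshold=0.2):
            # Membership in a cluster decays with squared distance from
            # its center; normality is the best membership over clusters.
            def membership(center):
                dist2 = sum((f - c) ** 2 for f, c in zip(features, center))
                return max(0.0, 1.0 - dist2 / (radius * radius))
            normality = max((membership(c) for c in cluster_centers), default=0.0)
            return 'anomalous' if normality < threshold else 'normal'

        # Packet features (e.g., scaled size and inter-arrival time).
        print(classify_packet([0.4, 0.9], [[0.5, 0.8], [0.1, 0.2]]))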

  18. Stability analysis of polynomial fuzzy models via polynomial fuzzy Lyapunov functions

    OpenAIRE

    Bernal Reza, Miguel Ángel; Sala, Antonio; JAADARI, ABDELHAFIDH; Guerra, Thierry-Marie

    2011-01-01

    In this paper, the stability of continuous-time polynomial fuzzy models by means of a polynomial generalization of fuzzy Lyapunov functions is studied. Fuzzy Lyapunov functions have been fruitfully used in the literature for local analysis of Takagi-Sugeno models, a particular class of the polynomial fuzzy ones. Based on a recent Taylor-series approach which allows a polynomial fuzzy model to exactly represent a nonlinear model in a compact set of the state space, it is shown that a refinemen...

  19. Nonlinear dynamic systems identification using recurrent interval type-2 TSK fuzzy neural network - A novel structure.

    Science.gov (United States)

    El-Nagar, Ahmad M

    2018-01-01

    In this study, a novel structure of a recurrent interval type-2 Takagi-Sugeno-Kang (TSK) fuzzy neural network (FNN) is introduced for nonlinear dynamic and time-varying systems identification. It combines type-2 fuzzy sets (T2FSs) and a recurrent FNN to handle data uncertainties. The fuzzy firing strengths in the proposed structure are returned to the network input as internal variables. Interval type-2 fuzzy sets (IT2FSs) are used to describe the antecedent part of each rule, while the consequent part is TSK-type, i.e., a linear function of the internal variables and the external inputs with interval weights. All the type-2 fuzzy rules for the proposed RIT2TSKFNN are learned on-line based on structure and parameter learning, which are performed using type-2 fuzzy clustering. The antecedent and consequent parameters of the proposed RIT2TSKFNN are updated based on the Lyapunov function to achieve network stability. The obtained results indicate that our proposed network has a small root mean square error (RMSE) and a small integral of square error (ISE) with a small number of rules and a small computation time compared with other type-2 FNNs. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.

  20. Relational Demonic Fuzzy Refinement

    OpenAIRE

    Tchier, Fairouz

    2014-01-01

    We use relational algebra to define a refinement fuzzy order called demonic fuzzy refinement and also the associated fuzzy operators, which are fuzzy demonic join (⊔fuz), fuzzy demonic meet (⊓fuz), and fuzzy demonic composition (□fuz). Our definitions and properties are illustrated by some examples using ma...

  1. Intuitionistic supra fuzzy topological spaces

    International Nuclear Information System (INIS)

    Abbas, S.E.

    2004-01-01

    In this paper, we introduce an intuitionistic supra fuzzy closure space and investigate the relationship between intuitionistic supra fuzzy topological spaces and intuitionistic supra fuzzy closure spaces. Moreover, we obtain the intuitionistic supra fuzzy topological space induced by an intuitionistic fuzzy bitopological space. We study the relationship between the intuitionistic supra fuzzy closure space and the intuitionistic supra fuzzy topological space induced by an intuitionistic fuzzy bitopological space.

  2. Intelligent control-I: review of fuzzy logic and fuzzy set theory

    International Nuclear Information System (INIS)

    Nagrial, M.H.

    2004-01-01

    In the past decade or so, fuzzy systems have supplanted conventional technologies in many engineering systems, in particular in control systems and pattern recognition. Fuzzy logic has found applications in a variety of consumer products e.g. washing machines, camcorders, digital cameras, air conditioners, subway trains, cement kilns and many others. The fuzzy technology is also being applied in information technology, where it provides decision-support and expert systems with powerful reasoning capabilities. Fuzzy sets, introduced by Zadeh in 1965 as a mathematical way to represent vagueness in linguistics, can be considered a generalisation of classical set theory. Fuzziness is often confused with probability. This lecture will introduce the principal concepts and mathematical notions of fuzzy set theory. (author)

  3. Approximate solutions of dual fuzzy polynomials by feed-back neural networks

    Directory of Open Access Journals (Sweden)

    Ahmad Jafarian

    2012-11-01

    Full Text Available Recently, artificial neural networks (ANNs) have been extensively studied and used in different areas such as pattern recognition, associative memory, combinatorial optimization, etc. In this paper, we investigate the ability of fuzzy neural networks to approximate the solution of a dual fuzzy polynomial of the form $a_1x + \cdots + a_nx^n = b_1x + \cdots + b_nx^n + d$, where $a_j, b_j, d \in E^1$ (for $j = 1, \ldots, n$). The operation of fuzzy neural networks is based on Zadeh's extension principle. For this purpose we train a fuzzified neural network by a back-propagation-type learning algorithm which has five layers, where the connection weights are crisp numbers. This neural network can take a crisp input signal and then calculate its corresponding fuzzy output. The presented method can give a real approximate solution for a given polynomial by using a cost function defined over the level sets of the fuzzy output and the target output. Simulation results are presented to demonstrate the efficiency and effectiveness of the proposed approach.
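
    The cost over level sets that the method uses needs interval values of the polynomial on each alpha-cut; a minimal sketch, assuming triangular fuzzy coefficients and a nonnegative crisp x so that interval endpoints do not swap:

        def poly_on_cut(coeff_cuts, x, r):
            # Interval value of a_1*x + ... + a_n*x**n at level r, where
            # coeff_cuts[j-1](r) gives the [lower, upper] bounds of a_j.
            lo = sum(cut(r)[0] * x ** (j + 1) for j, cut in enumerate(coeff_cuts))
            hi = sum(cut(r)[1] * x ** (j + 1) for j, cut in enumerate(coeff_cuts))
            return lo, hi

        def tri(center, spread):
            # Symmetric triangular fuzzy coefficient as a level-set function.
            return lambda r: (center - (1 - r) * spread,
                              center + (1 - r) * spread)

        # a_1 about 2, a_2 about 1, evaluated at x = 0.5 on the 0.5-cut.
        print(poly_on_cut([tri(2.0, 0.5), tri(1.0, 0.2)], 0.5, 0.5))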

  4. Hesitant fuzzy sets theory

    CERN Document Server

    Xu, Zeshui

    2014-01-01

    This book provides the readers with a thorough and systematic introduction to hesitant fuzzy theory. It presents the most recent research results and advanced methods in the field. These include: hesitant fuzzy aggregation techniques, hesitant fuzzy preference relations, hesitant fuzzy measures, hesitant fuzzy clustering algorithms and hesitant fuzzy multi-attribute decision making methods. Since its introduction by Torra and Narukawa in 2009, hesitant fuzzy sets have become more and more popular and have been used for a wide range of applications, from decision-making problems to cluster analysis, from medical diagnosis to personnel appraisal and information retrieval. This book offers a comprehensive report on the state-of-the-art in hesitant fuzzy sets theory and applications, aiming at becoming a reference guide for both researchers and practitioners in the area of fuzzy mathematics and other applied research fields (e.g. operations research, information science, management science and engineering) chara...

  5. Fuzzy logic in management

    CERN Document Server

    Carlsson, Christer; Fullér, Robert

    2004-01-01

    Fuzzy Logic in Management demonstrates that difficult problems and changes in the management environment can be more easily handled by bringing fuzzy logic into the practice of management. This explicit theme is developed through the book as follows: Chapter 1, "Management and Intelligent Support Technologies", is a short survey of management leadership and what can be gained from support technologies. Chapter 2, "Fuzzy Sets and Fuzzy Logic", provides a short introduction to fuzzy sets, fuzzy relations, the extension principle, fuzzy implications and linguistic variables. Chapter 3, "Group Decision Support Systems", deals with group decision making, and discusses methods for supporting the consensus reaching processes. Chapter 4, "Fuzzy Real Options for Strategic Planning", summarizes research where the fuzzy real options theory was implemented as a series of models. These models were thoroughly tested on a number of real life investments, and validated in 2001. Chapter 5, "Soft Computing Methods for Reducing...

  6. From Creatures of Habit to Goal-Directed Learners: Tracking the Developmental Emergence of Model-Based Reinforcement Learning.

    Science.gov (United States)

    Decker, Johannes H; Otto, A Ross; Daw, Nathaniel D; Hartley, Catherine A

    2016-06-01

    Theoretical models distinguish two decision-making strategies that have been formalized in reinforcement-learning theory. A model-based strategy leverages a cognitive model of potential actions and their consequences to make goal-directed choices, whereas a model-free strategy evaluates actions based solely on their reward history. Research in adults has begun to elucidate the psychological mechanisms and neural substrates underlying these learning processes and factors that influence their relative recruitment. However, the developmental trajectory of these evaluative strategies has not been well characterized. In this study, children, adolescents, and adults performed a sequential reinforcement-learning task that enabled estimation of model-based and model-free contributions to choice. Whereas a model-free strategy was apparent in choice behavior across all age groups, a model-based strategy was absent in children, became evident in adolescents, and strengthened in adults. These results suggest that recruitment of model-based valuation systems represents a critical cognitive component underlying the gradual maturation of goal-directed behavior. © The Author(s) 2016.
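
    Analyses of such sequential tasks commonly fit a single weighting parameter between the two systems; a schematic version of the choice rule (the study's actual estimation procedure is not reproduced here):

        import math
        import random

        def hybrid_choice(q_mf, q_mb, w, temperature=1.0):
            # Mix model-based and model-free action values (w = 0 is purely
            # model-free, as in the children here), then choose by softmax.
            q = {a: w * q_mb[a] + (1.0 - w) * q_mf[a] for a in q_mf}
            exps = {a: math.exp(v / temperature) for a, v in q.items()}
            x = random.random() * sum(exps.values())
            acc = 0.0
            for action, e in exps.items():
                acc += e
                if x <= acc:
                    return action

        print(hybrid_choice({'left': 0.2, 'right': 0.6},
                            {'left': 0.7, 'right': 0.1}, w=0.5))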

  7. The World of Combinatorial Fuzzy Problems and the Efficiency of Fuzzy Approximation Algorithms

    OpenAIRE

    Yamakami, Tomoyuki

    2015-01-01

    We re-examine a practical aspect of combinatorial fuzzy problems of various types, including search, counting, optimization, and decision problems. We are focused only on those fuzzy problems that take series of fuzzy input objects and produce fuzzy values. To solve such problems efficiently, we design fast fuzzy algorithms, which are modeled by polynomial-time deterministic fuzzy Turing machines equipped with read-only auxiliary tapes and write-only output tapes and also modeled by polynomia...

  8. Reinforcement Learning for Routing in Cognitive Radio Ad Hoc Networks

    Directory of Open Access Journals (Sweden)

    Hasan A. A. Al-Rawi

    2014-01-01

    Full Text Available Cognitive radio (CR) enables unlicensed users (or secondary users, SUs) to sense for and exploit underutilized licensed spectrum owned by the licensed users (or primary users, PUs). Reinforcement learning (RL) is an artificial intelligence approach that enables a node to observe, learn, and make appropriate decisions on action selection in order to maximize network performance. Routing enables a source node to search for a least-cost route to its destination node. While there have been increasing efforts to enhance the traditional RL approach for routing in wireless networks, this research area remains largely unexplored in the domain of routing in CR networks. This paper applies RL to routing and investigates the effects of various features of RL (i.e., reward function, exploitation, and exploration, as well as learning rate) through simulation. New approaches and recommendations are proposed to enhance these features in order to improve the network performance brought about by RL to routing. Simulation results show that the RL parameters of the reward function, exploitation, and exploration, as well as the learning rate, must be well regulated, and the new approaches proposed in this paper improve SUs' network performance without significantly jeopardizing PUs' network performance, specifically SUs' interference to PUs.
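
    A standard RL-routing update of the Q-routing family, shown here purely as an illustration of how learning rate and neighbor estimates interact (the paper's reward functions and exploration schemes are richer):

        def q_routing_update(q, node, dest, neighbor, link_delay,
                             neighbor_estimate, lr=0.5):
            # Move the estimated delivery time to dest via this neighbor
            # toward the observed link delay plus the neighbor's own best
            # estimate of the remaining time.
            key = (node, dest, neighbor)
            target = link_delay + neighbor_estimate
            q[key] += lr * (target - q[key])

        q = {('n1', 'dest', 'n2'): 10.0}
        q_routing_update(q, 'n1', 'dest', 'n2',
                         link_delay=2.0, neighbor_estimate=6.0)
        print(q)  # estimate moves from 10.0 toward 8.0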

  9. Space Objects Maneuvering Detection and Prediction via Inverse Reinforcement Learning

    Science.gov (United States)

    Linares, R.; Furfaro, R.

    This paper determines the behavior of Space Objects (SOs) using inverse Reinforcement Learning (RL) to estimate the reward function that each SO is using for control. The approach discussed in this work can be used to analyze maneuvering of SOs from observational data. The inverse RL problem is solved using the Feature Matching approach. This approach determines the optimal reward function that a SO is using while maneuvering by assuming that the observed trajectories are optimal with respect to the SO's own reward function. This paper uses estimated orbital elements data to determine the behavior of SOs in a data-driven fashion.
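
    A minimal sketch of the feature-matching idea, assuming a reward that is linear in a state-feature vector phi(s) (the function names and the gradient step are illustrative, not the authors' exact formulation):

        import numpy as np

        def feature_expectations(trajectories, phi, gamma=0.99):
            # Discounted feature counts averaged over a set of trajectories.
            mu = None
            for traj in trajectories:
                disc = sum(gamma**t * phi(s) for t, s in enumerate(traj))
                mu = disc if mu is None else mu + disc
            return mu / len(trajectories)

        def irl_step(w, mu_observed, mu_policy, lr=0.1):
            # Reward weights r(s) = w . phi(s): move w so that the optimal
            # policy's feature expectations match the observed trajectories'.
            return w + lr * (mu_observed - mu_policy)

    Iterating irl_step, re-solving for the optimal policy under the current w each time, recovers a reward under which the observed SO maneuvers are (near-)optimal.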

  10. A fuzzy art neural network based color image processing and ...

    African Journals Online (AJOL)

    In this paper, a new method is proposed to deal with the RGB color image pixels, which enables a Fuzzy ART neural network to process the RGB color images. To improve the learning process from the input data, a new learning rule is suggested. The application of the algorithm was implemented and tested on a set of ...

  11. Intuitionistic fuzzy calculus

    CERN Document Server

    Lei, Qian

    2017-01-01

    This book offers a comprehensive and systematic review of the latest research findings in the area of intuitionistic fuzzy calculus. After introducing the intuitionistic fuzzy numbers’ operational laws and their geometrical and algebraic properties, the book defines the concept of intuitionistic fuzzy functions and presents the research on the derivative, differential, indefinite integral and definite integral of intuitionistic fuzzy functions. It also discusses some of the methods that have been successfully used to deal with continuous intuitionistic fuzzy information or data, which are different from the previous aggregation operators focusing on discrete information or data. Mainly intended for engineers and researchers in the fields of fuzzy mathematics, operations research, information science and management science, this book is also a valuable textbook for postgraduate and advanced undergraduate students alike.

  12. Fuzzy risk matrix

    International Nuclear Information System (INIS)

    Markowski, Adam S.; Mannan, M. Sam

    2008-01-01

    A risk matrix is a mechanism to characterize and rank process risks that are typically identified through one or more multifunctional reviews (e.g., process hazard analysis, audits, or incident investigation). This paper describes a procedure for developing a fuzzy risk matrix that may be used for emerging fuzzy logic applications in different safety analyses (e.g., LOPA). The fuzzification of the frequency and severity of the consequences of the incident scenario, which are the basic inputs for the fuzzy risk matrix, is described. Subsequently, using different risk matrix designs, fuzzy rules are established, enabling the development of fuzzy risk matrices. Three types of fuzzy risk matrix have been developed (low-cost, standard, and high-cost), and using a distillation column case study, the effect of the design on the final defuzzified risk index is demonstrated.
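
    A minimal sketch of one such fuzzy risk matrix (the term sets, rule base, and scales below are illustrative placeholders, not the paper's calibrated design):

        def tri(x, a, b, c):
            # Triangular membership with peak at b; handles shoulders a == b, b == c.
            if x < a or x > c:
                return 0.0
            if x < b:
                return (x - a) / (b - a)
            if x > b:
                return (c - x) / (c - b)
            return 1.0

        FREQ = {'low': (0, 0, 5), 'medium': (0, 5, 10), 'high': (5, 10, 10)}
        SEV  = {'minor': (0, 0, 5), 'moderate': (0, 5, 10), 'major': (5, 10, 10)}
        RULES = {('low', 'minor'): 1, ('low', 'moderate'): 1, ('low', 'major'): 2,
                 ('medium', 'minor'): 1, ('medium', 'moderate'): 2, ('medium', 'major'): 3,
                 ('high', 'minor'): 2, ('high', 'moderate'): 3, ('high', 'major'): 3}

        def risk_index(freq, sev):
            # Min inference over all rules, then weighted-average defuzzification.
            num = den = 0.0
            for (f, s), level in RULES.items():
                w = min(tri(freq, *FREQ[f]), tri(sev, *SEV[s]))
                num += w * level
                den += w
            return num / den if den else 0.0

        print(risk_index(7.0, 8.0))   # about 2.8 on a 1-3 risk scale

    Changing the RULES table corresponds to the paper's low-cost, standard, and high-cost matrix designs, which shift the defuzzified risk index for the same inputs.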

  13. Continuous theta-burst stimulation (cTBS) over the lateral prefrontal cortex alters reinforcement learning bias

    NARCIS (Netherlands)

    Ott, D.V.M.; Ullsperger, M.; Jocham, G.; Neumann, J.; Klein, T.A.

    2011-01-01

    The prefrontal cortex is known to play a key role in higher-order cognitive functions. Recently, we showed that this brain region is active in reinforcement learning, during which subjects constantly have to integrate trial outcomes in order to optimize performance. To further elucidate the role of

  14. Hybrid Neuro-Fuzzy Classifier Based On Nefclass Model

    Directory of Open Access Journals (Sweden)

    Bogdan Gliwa

    2011-01-01

    Full Text Available The paper presents a hybrid neuro-fuzzy classifier, based on the NEFCLASS model, which was modified. The presented classifier was compared to popular classifiers – neural networks and k-nearest neighbours. Efficiency of the modifications in the classifier was compared with the methods used in the original NEFCLASS model (learning methods). Accuracy of the classifier was tested using 3 datasets from the UCI Machine Learning Repository: iris, wine and breast cancer Wisconsin. Moreover, the influence of ensemble classification methods on classification accuracy was presented.

  15. Global sensitivity analysis for fuzzy inputs based on the decomposition of fuzzy output entropy

    Science.gov (United States)

    Shi, Yan; Lu, Zhenzhou; Zhou, Yicheng

    2018-06-01

    To analyse the component of fuzzy output entropy, a decomposition method of fuzzy output entropy is first presented. After the decomposition of fuzzy output entropy, the total fuzzy output entropy can be expressed as the sum of the component fuzzy entropy contributed by fuzzy inputs. Based on the decomposition of fuzzy output entropy, a new global sensitivity analysis model is established for measuring the effects of uncertainties of fuzzy inputs on the output. The global sensitivity analysis model can not only tell the importance of fuzzy inputs but also simultaneously reflect the structural composition of the response function to a certain degree. Several examples illustrate the validity of the proposed global sensitivity analysis, which is a significant reference in engineering design and optimization of structural systems.

  16. Intuitionistic Fuzzy Subbialgebras and Duality

    Directory of Open Access Journals (Sweden)

    Wenjuan Chen

    2014-01-01

    Full Text Available We investigate connections between bialgebras and Atanassov’s intuitionistic fuzzy sets. Firstly we define an intuitionistic fuzzy subbialgebra of a bialgebra with an intuitionistic fuzzy subalgebra structure and also with an intuitionistic fuzzy subcoalgebra structure. Secondly we investigate the related properties of intuitionistic fuzzy subbialgebras. Finally we prove that the dual of an intuitionistic fuzzy strong subbialgebra is an intuitionistic fuzzy strong subbialgebra.

  17. Fuzzy topological digital space and digital fuzzy spline of electroencephalography during epileptic seizures

    Science.gov (United States)

    Shah, Mazlina Muzafar; Wahab, Abdul Fatah

    2017-08-01

    Epilepsy occurs when there is a temporary electrical disturbance in a group of brain cells (neurons). The recording of the electrical signals of the human brain, collected from the scalp of the head, is called electroencephalography (EEG). EEG data considered in digital format and in fuzzy form constitute fuzzy digital space data. The purpose of this research is to identify the area (curve and surface) in fuzzy digital space affected by an epileptic seizure inside the epileptic patient's brain. The main focus of this research is to generalize the fuzzy topological digital space, including its definition, basic operations, and properties, by using digital fuzzy sets and their operations. By using the fuzzy digital space, the theory of the digital fuzzy spline can be introduced to replace the grid data that were used previously, in order to get better results. As a result, the flat of the EEG can be a fuzzy topological digital space, and this type of data can be used to interpolate the digital fuzzy spline.

  18. Intuitionistic Fuzzy Time Series Forecasting Model Based on Intuitionistic Fuzzy Reasoning

    Directory of Open Access Journals (Sweden)

    Ya’nan Wang

    2016-01-01

    Full Text Available Fuzzy set theory cannot describe data comprehensively, which has greatly limited the objectivity of fuzzy time series in forecasting uncertain data. In this regard, an intuitionistic fuzzy time series forecasting model is built. In the new model, a fuzzy clustering algorithm is used to divide the universe of discourse into unequal intervals, and a more objective technique for ascertaining the membership function and nonmembership function of the intuitionistic fuzzy set is proposed. On these bases, forecast rules based on intuitionistic fuzzy approximate reasoning are established. Finally, comparative experiments on the enrollments of the University of Alabama and the Taiwan Stock Exchange Capitalization Weighted Stock Index are carried out. The results show that the new model has a clear advantage in improving forecast accuracy.

  19. Fuzzy forecasting based on two-factors second-order fuzzy-trend logical relationship groups and the probabilities of trends of fuzzy logical relationships.

    Science.gov (United States)

    Chen, Shyi-Ming; Chen, Shen-Wen

    2015-03-01

    In this paper, we present a new method for fuzzy forecasting based on two-factors second-order fuzzy-trend logical relationship groups and the probabilities of trends of fuzzy-trend logical relationships. Firstly, the proposed method fuzzifies the historical training data of the main factor and the secondary factor into fuzzy sets, respectively, to form two-factors second-order fuzzy logical relationships. Then, it groups the obtained two-factors second-order fuzzy logical relationships into two-factors second-order fuzzy-trend logical relationship groups. Then, it calculates the probability of the "down-trend," the probability of the "equal-trend" and the probability of the "up-trend" of the two-factors second-order fuzzy-trend logical relationships in each two-factors second-order fuzzy-trend logical relationship group, respectively. Finally, it performs the forecasting based on the probabilities of the down-trend, the equal-trend, and the up-trend of the two-factors second-order fuzzy-trend logical relationships in each two-factors second-order fuzzy-trend logical relationship group. We also apply the proposed method to forecast the Taiwan Stock Exchange Capitalization Weighted Stock Index (TAIEX) and the NTD/USD exchange rates. The experimental results show that the proposed method outperforms the existing methods.
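
    A minimal sketch of the trend-probability step (the fuzzified series is represented here simply as indices of fuzzy sets; the grouping and two-factor structure are omitted):

        from collections import Counter

        def trend_probabilities(fuzzified):
            counts = Counter()
            for prev, curr in zip(fuzzified, fuzzified[1:]):
                key = 'down' if curr < prev else 'equal' if curr == prev else 'up'
                counts[key] += 1
            total = sum(counts.values()) or 1
            return {k: counts[k] / total for k in ('down', 'equal', 'up')}

        print(trend_probabilities([3, 3, 4, 5, 4, 4, 6]))
        # {'down': 0.1667, 'equal': 0.3333, 'up': 0.5}

    In the full method these probabilities are computed per fuzzy-trend logical relationship group and then weight the forecast of the next value.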

  20. Intelligent neural network and fuzzy logic control of industrial and power systems

    Science.gov (United States)

    Kuljaca, Ognjen

    The main role played by neural network and fuzzy logic intelligent control algorithms today is to identify and compensate for unknown nonlinear system dynamics. A number of methods have been developed, but the stability analysis of neural network and fuzzy control systems was often not provided. This work addresses those problems for several algorithms. More complicated control algorithms, including backstepping and adaptive critics, are designed. Nonlinear fuzzy control with nonadaptive fuzzy controllers is also analyzed. An experimental method for determining the describing function of a SISO fuzzy controller is given. An adaptive neural network tracking controller for an autonomous underwater vehicle is analyzed, and a novel stability proof is provided. The implementation of the backstepping neural network controller for coupled motor drives is described. Analysis and synthesis of adaptive critic neural network control is also provided, with novel tuning laws for the system with an action-generating neural network and an adaptive fuzzy critic. Stability proofs are derived for all these control methods, and it is shown how the algorithms can be used in practical engineering control. Adaptive fuzzy logic control is analyzed, and a simulation study is conducted to analyze the behavior of the adaptive fuzzy system under different environment changes. A novel stability proof for adaptive fuzzy logic systems is given. Also, an adaptive elastic fuzzy logic control architecture is described and analyzed; a novel membership function is used for the elastic fuzzy logic system, and the corresponding stability proof is proffered. Adaptive elastic fuzzy logic control is compared with adaptive nonelastic fuzzy logic control. The work described in this dissertation serves as a foundation on which analysis of particular representative industrial systems will be conducted. Also, it gives a good starting point for analysis of learning abilities of

  1. Uncovering transcriptional interactions via an adaptive fuzzy logic approach

    Directory of Open Access Journals (Sweden)

    Chen Chung-Ming

    2009-12-01

    Full Text Available Abstract Background To date, only a limited number of transcriptional regulatory interactions have been uncovered. In a pilot study integrating sequence data with microarray data, a position weight matrix (PWM) performed poorly in inferring transcriptional interactions (TIs), which represent physical interactions between transcription factors (TF) and upstream sequences of target genes. Inferring a TI means that the promoter sequence of a target is inferred to match the consensus sequence motifs of a potential TF, and their interaction type such as AT or RT is also predicted. Thus, a robust PWM (rPWM) was developed to search for consensus sequence motifs. In addition to rPWM, one feature extracted from ChIP-chip data was incorporated to identify potential TIs under specific conditions. An interaction type classifier was assembled to predict activation/repression of potential TIs using microarray data. This approach, combining an adaptive (learning) fuzzy inference system and an interaction type classifier to predict transcriptional regulatory networks, was named AdaFuzzy. Results AdaFuzzy was applied to predict TIs using real genomics data from Saccharomyces cerevisiae. Following one of the latest advances in predicting TIs, constrained probabilistic sparse matrix factorization (cPSMF), and using 19 transcription factors (TFs), we compared AdaFuzzy to four well-known approaches using over-representation analysis and gene set enrichment analysis. AdaFuzzy outperformed these four algorithms. Furthermore, AdaFuzzy was shown to perform comparably to 'ChIP-experimental method' in inferring TIs identified by two sets of large scale ChIP-chip data, respectively. AdaFuzzy was also able to classify all predicted TIs into one or more of the four promoter architectures. The results coincided with known promoter architectures in yeast and provided insights into transcriptional regulatory mechanisms. Conclusion AdaFuzzy successfully integrates multiple types of

  2. Probabilistic fuzzy systems as additive fuzzy systems

    NARCIS (Netherlands)

    Almeida, R.J.; Verbeek, N.; Kaymak, U.; Costa Sousa, da J.M.; Laurent, A.; Strauss, O.; Bouchon-Meunier, B.; Yager, R.

    2014-01-01

    Probabilistic fuzzy systems combine a linguistic description of the system behaviour with statistical properties of data. It was originally derived based on Zadeh’s concept of probability of a fuzzy event. Two possible and equivalent additive reasoning schemes were proposed, that lead to the

  3. Models for cooperative games with fuzzy relations among the agents fuzzy communication, proximity relation and fuzzy permission

    CERN Document Server

    Jiménez-Losada, Andrés

    2017-01-01

    This book offers a comprehensive introduction to cooperative game theory and a practice-oriented reference guide to new models and tools for studying bilateral fuzzy relations among several agents or players. It introduces the reader to several fuzzy models, each of which is first analyzed in the context of classical games (crisp games) and subsequently in the context of fuzzy games. Special emphasis is given to the Shapley value, which is presented for the first time in the context of fuzzy games. Students and researchers will find here a self-contained reference guide to cooperative fuzzy games, characterized by a wealth of examples, descriptions of a wide range of possible situations, step-by-step explanations of the basic mathematical concepts involved, and easy-to-follow information on axioms and properties.

  4. Phase inductance estimation for switched reluctance motor using adaptive neuro-fuzzy inference system

    International Nuclear Information System (INIS)

    Daldaban, Ferhat; Ustkoyuncu, Nurettin; Guney, Kerim

    2006-01-01

    A new method based on an adaptive neuro-fuzzy inference system (ANFIS) for estimating the phase inductance of switched reluctance motors (SRMs) is presented. The ANFIS has the advantages of expert knowledge of the fuzzy inference system and the learning capability of neural networks. A hybrid learning algorithm, which combines the least square method and the back propagation algorithm, is used to identify the parameters of the ANFIS. The rotor position and the phase current of the 6/4 pole SRM are used to predict the phase inductance. The phase inductance results predicted by the ANFIS are in excellent agreement with the results of the finite element method.
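
    A simplified sketch of the least-squares half of ANFIS hybrid learning for a first-order Sugeno model (in full ANFIS the premise parameters below, centers and sigma, would also be tuned by backpropagation; all names are illustrative):

        import numpy as np

        def gauss(x, c, s):
            # Gaussian membership of input vector x under rule centre c.
            return np.exp(-0.5 * ((x - c) / s) ** 2)

        def design_matrix(X, centers, sigma):
            n = X.shape[0]
            w = np.array([[np.prod(gauss(x, c, sigma)) for c in centers] for x in X])
            w /= w.sum(axis=1, keepdims=True)         # normalized firing strengths
            Xb = np.hstack([X, np.ones((n, 1))])      # inputs plus bias term
            return np.hstack([w[:, [r]] * Xb for r in range(len(centers))])

        def fit(X, y, centers, sigma=1.0):
            # One least-squares step identifies all linear consequent parameters.
            A = design_matrix(X, centers, sigma)
            theta, *_ = np.linalg.lstsq(A, y, rcond=None)
            return theta

        def predict(X, centers, theta, sigma=1.0):
            return design_matrix(X, centers, sigma) @ theta

    For the SRM application, X would hold (rotor position, phase current) pairs and y the phase inductance values used for training.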

  5. Optimal medication dosing from suboptimal clinical examples: a deep reinforcement learning approach.

    Science.gov (United States)

    Nemati, Shamim; Ghassemi, Mohammad M; Clifford, Gari D

    2016-08-01

    Misdosing medications with sensitive therapeutic windows, such as heparin, can place patients at unnecessary risk, increase length of hospital stay, and lead to wasted hospital resources. In this work, we present a clinician-in-the-loop sequential decision making framework, which provides an individualized dosing policy adapted to each patient's evolving clinical phenotype. We employed retrospective data from the publicly available MIMIC II intensive care unit database, and developed a deep reinforcement learning algorithm that learns an optimal heparin dosing policy from sample dosing trials and their associated outcomes in large electronic medical records. Using separate training and testing datasets, our model was observed to be effective in proposing heparin doses that resulted in better expected outcomes than the clinical guidelines. Our results demonstrate that a sequential modeling approach, learned from retrospective data, could potentially be used at the bedside to derive individualized patient dosing policies.

  6. Off-policy integral reinforcement learning optimal tracking control for continuous-time chaotic systems

    International Nuclear Information System (INIS)

    Wei Qing-Lai; Song Rui-Zhuo; Xiao Wen-Dong; Sun Qiu-Ye

    2015-01-01

    This paper presents an off-policy integral reinforcement learning (IRL) algorithm to obtain the optimal tracking control of unknown chaotic systems. Off-policy IRL can learn the solution of the Hamilton–Jacobi–Bellman (HJB) equation from the system data generated by an arbitrary control. Moreover, off-policy IRL can be regarded as a direct learning method, which avoids the identification of system dynamics. In this paper, the performance index function is first given based on the system tracking error and control error. For solving the HJB equation, an off-policy IRL algorithm is proposed. It is proven that the iterative control makes the tracking error system asymptotically stable, and the iterative performance index function is convergent. A simulation study demonstrates the effectiveness of the developed tracking control method. (paper)

  7. Multicriteria optimization in a fuzzy environment: The fuzzy analytic hierarchy process

    Directory of Open Access Journals (Sweden)

    Gardašević-Filipović Milanka

    2010-01-01

    Full Text Available In this paper, the fuzzy extension of the Analytic Hierarchy Process (AHP) based on fuzzy numbers, and its application in solving a practical problem, are considered. The paper advocates the use of a contradiction test to check fuzzy user preferences during the fuzzy AHP decision-making process. We also propose including a consistency check and the derivation of priorities from inconsistent fuzzy judgment matrices in the process, in order to check whether the fuzzy approach can be applied in the AHP for the problem considered. The aggregation of local priorities obtained at different levels into composite global priorities for the alternatives, based on the weighted-sum method, is also discussed. The contradictory fuzzy judgment matrix is analyzed. Our theoretical consideration has been verified by an application of the commercially available Super Decisions program (developed for solving multi-criteria optimization problems using the AHP approach) to a problem previously treated in the literature. The obtained results are compared with those from the literature. Conclusions are given and possibilities for further work in the field are pointed out.
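
    A minimal sketch of the fuzzy-AHP weighting step with triangular fuzzy judgements (geometric-mean weights and centroid defuzzification are one common variant, not necessarily the exact procedure of this paper):

        import numpy as np

        def fuzzy_ahp_weights(M):
            # M: (n, n, 3) matrix of triangular fuzzy comparisons (l, m, u).
            M = np.asarray(M, dtype=float)
            g = np.prod(M, axis=1) ** (1.0 / M.shape[0])   # fuzzy row geometric means
            crisp = g.mean(axis=1)                         # centroid (l + m + u) / 3
            return crisp / crisp.sum()

        one = (1, 1, 1)
        # Hypothetical judgement: criterion 0 is "about 3 times" criterion 1.
        M = [[one, (2, 3, 4)],
             [(1/4, 1/3, 1/2), one]]
        print(fuzzy_ahp_weights(M))   # approximately [0.74, 0.26]

    The consistency and contradiction checks discussed in the paper would be applied to M before such weights are derived.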

  8. Correction of Visual Perception Based on Neuro-Fuzzy Learning for the Humanoid Robot TEO

    Directory of Open Access Journals (Sweden)

    Juan Hernandez-Vicen

    2018-03-01

    Full Text Available New applications related to robotic manipulation or transportation tasks, with or without physical grasping, are continuously being developed. To perform these activities, the robot takes advantage of different kinds of perceptions. One of the key perceptions in robotics is vision. However, some problems related to image processing make the application of visual information within robot control algorithms difficult. Camera-based systems have inherent errors that affect the quality and reliability of the information obtained. The need to correct image distortion slows down image parameter computing, which decreases the performance of control algorithms. In this paper, a new approach to correcting several sources of visual distortions on images in only one computing step is proposed. The goal of this system/algorithm is the computation of the tilt angle of an object transported by a robot, minimizing image inherent errors and increasing computing speed. After capturing the image, the computer system extracts the angle using a Fuzzy filter that simultaneously corrects all possible distortions, obtaining the real angle in only one processing step. This filter has been developed by means of Neuro-Fuzzy learning techniques, using datasets with information obtained from real experiments. In this way, the computing time has been decreased and the performance of the application has been improved. The resulting algorithm has been tried out experimentally in robot transportation tasks in the humanoid robot TEO (Task Environment Operator) from the University Carlos III of Madrid.

  9. Fuzzy weakly preopen (preclosed) function in Kubiak-Sostak fuzzy topological spaces

    International Nuclear Information System (INIS)

    Zahran, A.M.; Abd-Allah, M. Azab.; Abd El-Rahman, Abd El-Nasser G.

    2009-01-01

    In this paper, we introduce and characterize fuzzy weakly preopen and fuzzy weakly preclosed functions between L-fuzzy topological spaces in Kubiak-Sostak sense and also study these functions in relation to some other types of already known functions.

  10. On Fuzzy β-I-open sets and Fuzzy β-I-continuous functions

    International Nuclear Information System (INIS)

    Keskin, Aynur

    2009-01-01

    In this paper, first of all we obtain some properties and characterizations of fuzzy β-I-open sets. After that, we also define the notion of β-I-closed sets and obtain some properties. Lastly, we introduce the notions of fuzzy β-I-continuity with the help of fuzzy β-I-open sets to obtain decomposition of fuzzy continuity.

  11. On Fuzzy β-I-open sets and Fuzzy β-I-continuous functions

    Energy Technology Data Exchange (ETDEWEB)

    Keskin, Aynur [Department of Mathematics, Faculty of Science and Arts, Selcuk University, Campus, 42075 Konya (Turkey)], E-mail: akeskin@selcuk.edu.tr

    2009-11-15

    In this paper, first of all we obtain some properties and characterizations of fuzzy β-I-open sets. After that, we also define the notion of β-I-closed sets and obtain some properties. Lastly, we introduce the notions of fuzzy β-I-continuity with the help of fuzzy β-I-open sets to obtain decomposition of fuzzy continuity.

  12. A reinforcement learning model of joy, distress, hope and fear

    Science.gov (United States)

    Broekens, Joost; Jacobs, Elmer; Jonker, Catholijn M.

    2015-07-01

    In this paper we computationally study the relation between adaptive behaviour and emotion. Using the reinforcement learning framework, we propose that learned state utility, V(s), models fear (negative) and hope (positive) based on the fact that both signals are about anticipation of loss or gain. Further, we propose that joy/distress is a signal similar to the temporal-difference (TD) error signal. We present agent-based simulation experiments that show that this model replicates psychological and behavioural dynamics of emotion. This work distinguishes itself by assessing the dynamics of emotion in an adaptive agent framework - coupling it to the literature on habituation, development, extinction and hope theory. Our results support the idea that the function of emotion is to provide a complex feedback signal for an organism to adapt its behaviour. Our work is relevant for understanding the relation between emotion and adaptation in animals, as well as for human-robot interaction, in particular how emotional signals can be used to communicate between adaptive agents and humans.
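
    A minimal sketch of this mapping in TD learning (the emotion read-outs are illustrative simplifications of the paper's model):

        import numpy as np

        V = np.zeros(5)              # state utilities for a 5-state toy world
        alpha, gamma = 0.1, 0.9

        def td_step(s, r, s_next, terminal=False):
            target = r + (0.0 if terminal else gamma * V[s_next])
            td_error = target - V[s]         # joy if positive, distress if negative
            V[s] += alpha * td_error
            hope = max(V[s], 0.0)            # anticipated gain
            fear = max(-V[s], 0.0)           # anticipated loss
            return td_error, hope, fear

    Habituation then falls out naturally: as V[s] converges, the TD error (joy) shrinks even though the anticipated reward (hope) stays high.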

  13. Fuzzy stochastic damage mechanics (FSDM) based on fuzzy auto-adaptive control theory

    Directory of Open Access Journals (Sweden)

    Ya-jun Wang

    2012-06-01

    Full Text Available In order to fully interpret and describe damage mechanics, the origin and development of fuzzy stochastic damage mechanics were introduced based on the analysis of the harmony of damage, probability, and fuzzy membership in the interval of [0,1]. In a complete normed linear space, it was proven that a generalized damage field can be simulated through β probability distribution. Three kinds of fuzzy behaviors of damage variables were formulated and explained through analysis of the generalized uncertainty of damage variables and the establishment of a fuzzy functional expression. Corresponding fuzzy mapping distributions, namely, the half-depressed distribution, swing distribution, and combined swing distribution, which can simulate varying fuzzy evolution in diverse stochastic damage situations, were set up. Furthermore, through demonstration of the generalized probabilistic characteristics of damage variables, the cumulative distribution function and probability density function of fuzzy stochastic damage variables, which show β probability distribution, were modified according to the expansion principle. The three-dimensional fuzzy stochastic damage mechanical behaviors of the Longtan rolled-concrete dam were examined with the self-developed fuzzy stochastic damage finite element program. The statistical correlation and non-normality of random field parameters were considered comprehensively in the fuzzy stochastic damage model described in this paper. The results show that an initial damage field based on the comprehensive statistical evaluation helps to avoid many difficulties in the establishment of experiments and numerical algorithms for damage mechanics analysis.

  14. ANALYSIS OF FUZZY QUEUES: PARAMETRIC PROGRAMMING APPROACH BASED ON RANDOMNESS - FUZZINESS CONSISTENCY PRINCIPLE

    OpenAIRE

    Dhruba Das; Hemanta K. Baruah

    2015-01-01

    In this article, based on Zadeh’s extension principle we have applied the parametric programming approach to construct the membership functions of the performance measures when the interarrival time and the service time are fuzzy numbers, based on Baruah’s Randomness-Fuzziness Consistency Principle. The Randomness-Fuzziness Consistency Principle leads to defining a normal law of fuzziness using two different laws of randomness. In this article, two fuzzy queues FM...

  15. ANALYSIS OF FUZZY QUEUES: PARAMETRIC PROGRAMMING APPROACH BASED ON RANDOMNESS - FUZZINESS CONSISTENCY PRINCIPLE

    Directory of Open Access Journals (Sweden)

    Dhruba Das

    2015-04-01

    Full Text Available In this article, based on Zadeh’s extension principle, we apply the parametric programming approach to construct the membership functions of the performance measures when the interarrival time and the service time are fuzzy numbers, following Baruah’s Randomness-Fuzziness Consistency Principle. The Randomness-Fuzziness Consistency Principle leads to defining a normal law of fuzziness using two different laws of randomness. Two fuzzy queues, FM/M/1 and M/FM/1, are studied, and the membership functions of their system characteristics are constructed based on the aforesaid principle. The former represents a queue with fuzzy exponential arrivals and a crisp exponential service rate, while the latter represents a queue with a crisp exponential arrival rate and a fuzzy exponential service rate.
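
    A minimal sketch of the alpha-cut/parametric-programming computation for a queue with a triangular fuzzy arrival rate (l, m, u) and crisp service rate mu (a simplification: the mean queue length L = rho/(1 - rho) is increasing in lambda, so each alpha-cut maps directly onto an interval for L):

        def queue_length(lam, mu):
            rho = lam / mu
            return rho / (1.0 - rho)

        def fuzzy_L_alpha_cuts(l, m, u, mu, levels=(0.0, 0.5, 1.0)):
            cuts = {}
            for a in levels:
                lam_lo = l + a * (m - l)      # lower endpoint of the alpha-cut
                lam_hi = u - a * (u - m)      # upper endpoint of the alpha-cut
                cuts[a] = (queue_length(lam_lo, mu), queue_length(lam_hi, mu))
            return cuts

        print(fuzzy_L_alpha_cuts(2, 3, 4, mu=5))
        # alpha = 1 recovers the crisp case lambda = 3: L = 0.6 / 0.4 = 1.5

    Stacking these intervals over all alpha levels yields the membership function of the performance measure.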

  16. Uncertainty analysis of flexible rotors considering fuzzy parameters and fuzzy-random parameters

    Directory of Open Access Journals (Sweden)

    Fabian Andres Lara-Molina

    Full Text Available The components of flexible rotors are subjected to uncertainties. The main sources of uncertainties include the variation of mechanical properties. This contribution aims at analyzing the dynamics of flexible rotors under uncertain parameters modeled as fuzzy and fuzzy random variables. The uncertainty analysis encompasses the modeling of uncertain parameters and the numerical simulation of the corresponding flexible rotor model by using an approach based on fuzzy dynamic analysis. The numerical simulation is accomplished by mapping the fuzzy parameters of the deterministic flexible rotor model. Thereby, the flexible rotor is modeled by using both the Fuzzy Finite Element Method and the Fuzzy Stochastic Finite Element Method. Numerical simulations illustrate the methodology conveyed in terms of orbits and frequency response functions subject to uncertain parameters.

  17. Depression, Activity, and Evaluation of Reinforcement

    Science.gov (United States)

    Hammen, Constance L.; Glass, David R., Jr.

    1975-01-01

    This research attempted to find the causal relation between mood and level of reinforcement. An effort was made to learn what mood change might occur if depressed subjects increased their levels of participation in reinforcing activities. (Author/RK)

  18. Evaluation of a multi-variable self-learning fuzzy logic controller

    African Journals Online (AJOL)

    Dr Obe

    2003-03-01

    Mar 1, 2003 ... The most challenging aspect of the design of a fuzzy logic controller is ... inaccuracy (or structured uncertainty) and unmodelled ... mathematical analysis on paper is impossible ... output (SISO) system that can self-construct ...

  19. Multitask TSK fuzzy system modeling by mining intertask common hidden structure.

    Science.gov (United States)

    Jiang, Yizhang; Chung, Fu-Lai; Ishibuchi, Hisao; Deng, Zhaohong; Wang, Shitong

    2015-03-01

    The classical fuzzy system modeling methods implicitly assume data generated from a single task, which is essentially not in accordance with many practical scenarios where data can be acquired from the perspective of multiple tasks. Although one can build an individual fuzzy system model for each task, this individual modeling approach yields poor generalization ability because it ignores the hidden intertask correlations. In order to circumvent this shortcoming, we consider a general framework for preserving the independent information among different tasks and mining the hidden correlation information among all tasks in multitask fuzzy modeling. In this framework, a low-dimensional subspace (structure) is assumed to be shared among all tasks and hence to be the hidden correlation information among all tasks. Under this framework, a multitask Takagi-Sugeno-Kang (TSK) fuzzy system model called MTCS-TSK-FS (TSK-FS for multiple tasks with common hidden structure), based on the classical L2-norm TSK fuzzy system, is proposed in this paper. The proposed model can not only take advantage of independent sample information from the original space for each task, but also effectively use the intertask common hidden structure among multiple tasks to enhance the generalization performance of the built fuzzy systems. Experiments on synthetic and real-world datasets demonstrate the applicability and distinctive performance of the proposed multitask fuzzy system model in multitask regression learning scenarios.

  20. Enhanced fuzzy-connective-based hierarchical aggregation network using particle swarm optimization

    Science.gov (United States)

    Wang, Fang-Fang; Su, Chao-Ton

    2014-11-01

    The fuzzy-connective-based aggregation network is similar to the human decision-making process. It is capable of aggregating and propagating degrees of satisfaction of a set of criteria in a hierarchical manner. Its interpreting ability and transparency make it especially desirable. To enhance its effectiveness and further applicability, a learning approach is successfully developed based on particle swarm optimization to determine the weights and parameters of the connectives in the network. By experimenting on eight datasets with different characteristics and conducting further statistical tests, it has been found to outperform the gradient- and genetic algorithm-based learning approaches proposed in the literature; furthermore, it is capable of generating more accurate estimates. The present approach retains the original benefits of fuzzy-connective-based aggregation networks and is widely applicable. The characteristics of the learning approaches are also discussed and summarized, providing better understanding of the similarities and differences among these three approaches.
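
    A generic PSO sketch of the kind used for such parameter learning (here minimizing a stand-in quadratic loss; in the paper the loss would measure the aggregation network's fit to training data):

        import numpy as np

        def pso(loss, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
            rng = np.random.default_rng(seed)
            x = rng.uniform(-1, 1, (n_particles, dim))   # candidate parameter vectors
            v = np.zeros_like(x)                         # velocities
            pbest, pbest_val = x.copy(), np.array([loss(p) for p in x])
            gbest = pbest[pbest_val.argmin()].copy()
            for _ in range(iters):
                r1, r2 = rng.random((2, n_particles, dim))
                v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
                x = x + v
                vals = np.array([loss(p) for p in x])
                better = vals < pbest_val
                pbest[better], pbest_val[better] = x[better], vals[better]
                gbest = pbest[pbest_val.argmin()].copy()
            return gbest

        print(pso(lambda p: ((p - 0.3) ** 2).sum(), dim=3))   # converges near [0.3, 0.3, 0.3]

    Unlike gradient-based learning, this search needs no derivatives of the connectives, which is one reason it suits non-differentiable aggregation operators.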

  1. A Car-Steering Model Based on an Adaptive Neuro-Fuzzy Controller

    Science.gov (United States)

    Amor, Mohamed Anis Ben; Oda, Takeshi; Watanabe, Shigeyoshi

    This paper is concerned with the development of a car-steering model for traffic simulation. Our focus in this paper is to propose a model of the steering behavior of a human driver for different driving scenarios. These scenarios are modeled in a unified framework using the idea of target position. The proposed approach deals with the driver’s approximation and decision-making mechanisms in tracking a target position by means of fuzzy set theory. The main novelty in this paper lies in the development of a learning algorithm that has the intention to imitate the driver’s self-learning from his driving experience and to mimic his maneuvers on the steering wheel, using linear networks as local approximators in the corresponding fuzzy areas. Results obtained from the simulation of an obstacle avoidance scenario show the capability of the model to carry out a human-like behavior with emphasis on learned skills.

  2. Adaptive fuzzy trajectory control for biaxial motion stage system

    Directory of Open Access Journals (Sweden)

    Wei-Lung Mao

    2016-04-01

    Full Text Available Motion control is an essential part of industrial machinery and manufacturing systems. In this article, an adaptive fuzzy controller is proposed for precision trajectory tracking control in a biaxial X-Y motion stage system. Theoretical analyses of direct fuzzy control, which is insensitive to parameter uncertainties and external load disturbances, are derived to demonstrate the feasibility of tracking the reference trajectories. The Lyapunov stability theorem is used to establish the asymptotic stability of the whole system, with all signals bounded in the closed-loop system. The intelligent position controller combines the merits of adaptive fuzzy control with robust characteristics and learning ability for periodic command tracking of a servo drive mechanism. Simulation and experimental results on square, triangle, star, and circle reference contours are presented to show that the proposed controller achieves better tracking performance with regard to model uncertainties. It is observed that parameter convergence is faster and tracking errors are smaller compared with conventional adaptive fuzzy control, in terms of average tracking error and tracking error standard deviation.

  3. Statistical Methods for Fuzzy Data

    CERN Document Server

    Viertl, Reinhard

    2011-01-01

    Statistical data are not always precise numbers, or vectors, or categories. Real data are frequently what is called fuzzy. Examples where this fuzziness is obvious are quality of life data, environmental, biological, medical, sociological and economics data. Also the results of measurements can be best described by using fuzzy numbers and fuzzy vectors respectively. Statistical analysis methods have to be adapted for the analysis of fuzzy data. In this book, the foundations of the description of fuzzy data are explained, including methods on how to obtain the characterizing function of fuzzy m

  4. Fuzzy support vector machine for microarray imbalanced data classification

    Science.gov (United States)

    Ladayya, Faroh; Purnami, Santi Wulan; Irhamah

    2017-11-01

    DNA microarrays are data containing gene expression with small sample sizes and a high number of features. Furthermore, imbalanced classes are a common problem in microarray data. This occurs when a dataset is dominated by a class which has significantly more instances than the other minority classes. Therefore, a classification method is needed that solves the problems of high-dimensional and imbalanced data. Support Vector Machine (SVM) is one of the classification methods that is capable of handling large or small samples, nonlinearity, high dimensionality, overlearning, and local minimum issues. SVM has been widely applied to DNA microarray data classification, and it has been shown that SVM provides the best performance among other machine learning methods. However, imbalanced data are a problem because SVM treats all samples with the same importance, so the results are biased against the minority class. To overcome the imbalanced data, Fuzzy SVM (FSVM) is proposed. This method applies a fuzzy membership to each input point and reformulates the SVM such that different input points provide different contributions to the classifier. The minority classes have large fuzzy memberships, so FSVM can pay more attention to the samples with larger fuzzy membership. Because DNA microarray data are high dimensional with a very large number of features, feature selection is first performed using the Fast Correlation-Based Filter (FCBF). In this study, SVM, FSVM, and both methods combined with FCBF are analyzed and their classification performance compared. Based on the overall results, FSVM on selected features has the best classification performance compared to SVM.
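
    A minimal sketch of the FSVM idea using per-sample weights (scikit-learn's sample_weight scales each sample's misclassification penalty; the inverse-class-frequency membership below is one simple choice, not necessarily the paper's):

        import numpy as np
        from sklearn.svm import SVC

        rng = np.random.default_rng(0)
        X = np.vstack([rng.normal(0, 1, (95, 2)), rng.normal(2, 1, (5, 2))])
        y = np.array([0] * 95 + [1] * 5)           # heavily imbalanced toy data

        freq = np.bincount(y) / len(y)
        membership = 1.0 / freq[y]                 # minority samples weighted higher
        membership /= membership.max()

        clf = SVC(kernel='rbf', C=10.0)
        clf.fit(X, y, sample_weight=membership)    # fuzzy-membership-weighted training
        print(clf.score(X, y))

    In the study's pipeline, FCBF feature selection would be applied to the microarray data before this training step.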

  5. Paired fuzzy sets

    DEFF Research Database (Denmark)

    Rodríguez, J. Tinguaro; Franco de los Ríos, Camilo; Gómez, Daniel

    2015-01-01

    In this paper we want to stress the relevance of paired fuzzy sets, as already proposed in previous works of the authors, as a family of fuzzy sets that offers a unifying view for different models based upon the opposition of two fuzzy sets, simply allowing the existence of different types...

  6. A Plant Control Technology Using Reinforcement Learning Method with Automatic Reward Adjustment

    Science.gov (United States)

    Eguchi, Toru; Sekiai, Takaaki; Yamada, Akihiro; Shimizu, Satoru; Fukai, Masayuki

    A control technology using Reinforcement Learning (RL) and Radial Basis Function (RBF) Networks has been developed to reduce environmental load substances exhausted from power and industrial plants. This technology consists of a statistical model using an RBF network, which estimates plant characteristics with respect to environmental load substances, and an RL agent, which learns the control logic for the plant using the statistical model. To control plants flexibly, an appropriate reward function must be designed and given to the agent promptly according to the operating conditions and control goals. Therefore, we propose an automatic reward-adjusting method for RL in plant control. This method adjusts the reward function automatically using information from the statistical model obtained during its learning process. Simulations confirm that the proposed method can adjust the reward function adaptively for several test functions and executes robust control of the thermal power plant under changing operating conditions and control goals.

  7. HYBRID SYSTEM BASED FUZZY-PID CONTROL SCHEMES FOR UNPREDICTABLE PROCESS

    Directory of Open Access Journals (Sweden)

    M.K. Tan

    2011-07-01

    Full Text Available In general, the primary aim of the polymerization industry is to enhance process operation in order to obtain a product of high quality and purity. However, a sudden, large amount of heat is released rapidly during the mixing of the two reactants, phenol and formalin, due to the reaction's exothermic behavior. This unpredictable heat causes deviations in the process temperature and hence affects the quality of the product. Therefore, it is vital to control the process temperature during polymerization. In modern industry, fuzzy logic is commonly used to auto-tune PID controllers to control the process temperature. However, this method needs an experienced operator to fine-tune the fuzzy membership functions and universe of discourse via a trial-and-error approach. Hence, the settings of the fuzzy inference system might not be accurate due to human error. Besides that, control of the process can be challenging due to rapid changes in the plant parameters, which increase the process complexity. This paper proposes an optimization scheme using a hybrid of Q-learning (QL) and a genetic algorithm (GA) to optimize the fuzzy membership functions in order to allow the conventional fuzzy-PID controller to control the process temperature more effectively. The performance of the proposed optimization scheme is compared with the existing fuzzy-PID scheme. The results show that the proposed optimization scheme is able to control the process temperature more effectively even when disturbances are introduced.

  8. FUZZY CLUSTERING BASED BAYESIAN FRAMEWORK TO PREDICT MENTAL HEALTH PROBLEMS AMONG CHILDREN

    Directory of Open Access Journals (Sweden)

    M R Sumathi

    2017-04-01

    Full Text Available According to the World Health Organization, 10-20% of children and adolescents all over the world are experiencing mental disorders. Correct diagnosis of mental disorders at an early stage improves the quality of life of children and avoids complicated problems. Various expert systems using artificial intelligence techniques have been developed for diagnosing mental disorders like schizophrenia, depression, dementia, etc. This study focuses on predicting basic mental health problems of children, like attention problems, anxiety problems, developmental delay, Attention Deficit Hyperactivity Disorder (ADHD), Pervasive Developmental Disorder (PDD), etc., using the machine learning techniques of Bayesian networks and fuzzy clustering. The focus of the article is on learning the Bayesian network structure using a novel fuzzy-clustering-based Bayesian network structure learning framework. The performance of the proposed framework was compared with other existing algorithms, and the experimental results have shown that the proposed framework performs better than the earlier algorithms.

  9. Fuzzy Arden Syntax: A fuzzy programming language for medicine.

    Science.gov (United States)

    Vetterlein, Thomas; Mandl, Harald; Adlassnig, Klaus-Peter

    2010-05-01

    The programming language Arden Syntax has been optimised for use in clinical decision support systems. We describe an extension of this language named Fuzzy Arden Syntax, whose original version was introduced in S. Tiffe's dissertation on "Fuzzy Arden Syntax: Representation and Interpretation of Vague Medical Knowledge by Fuzzified Arden Syntax" (Vienna University of Technology, 2003). The primary aim is to provide an easy means of processing vague or uncertain data, which frequently appears in medicine. For both propositional and number data types, fuzzy equivalents have been added to Arden Syntax. The Boolean data type was generalised to represent any truth degree between the two extremes 0 (falsity) and 1 (truth); fuzzy data types were introduced to represent fuzzy sets. The operations on truth values and real numbers were generalised accordingly. As the conditions that decide whether a certain programme unit is executed or not may be indeterminate, a Fuzzy Arden Syntax programme may split. The data in the different branches may optionally be aggregated subsequently. Fuzzy Arden Syntax offers the possibility to conveniently formulate Medical Logic Modules (MLMs) based on the principle of a continuously graded applicability of statements. Furthermore, ad hoc decisions about sharp value boundaries can be avoided. As an illustrative example shows, an MLM making use of the features of Fuzzy Arden Syntax is not significantly more complex than its Arden Syntax equivalent; in the ideal case, a programme handling crisp data remains practically unchanged when compared to its fuzzified version. In the latter case, the output data, which can be a set of weighted alternatives, typically depends continuously on the input data. In typical applications an Arden Syntax MLM can produce a different output after only slight changes of the input; discontinuities are in fact unavoidable when the input varies continuously but the output is taken from a discrete set of possibilities.
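
    An illustrative sketch of graded truth values in this spirit (the min/max/complement operators and the clinical thresholds below are placeholders, not the language's normative semantics):

        def f_and(a, b): return min(a, b)
        def f_or(a, b):  return max(a, b)
        def f_not(a):    return 1.0 - a

        def high_fever(temp_c):
            # Fuzzy predicate: 38.0 C not at all "high fever", 39.5 C fully so.
            return min(max((temp_c - 38.0) / 1.5, 0.0), 1.0)

        def tachycardia(hr):
            return min(max((hr - 90.0) / 30.0, 0.0), 1.0)

        # Applicability of a hypothetical alert rule for one patient state:
        alert = f_and(high_fever(38.9), tachycardia(105))
        print(alert)   # 0.5 -- the rule applies to degree 0.5

    An MLM whose condition evaluates to such an intermediate degree may execute both branches and aggregate the weighted results, as the abstract describes.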

  10. Brain Circuits of Methamphetamine Place Reinforcement Learning: The Role of the Hippocampus-VTA Loop.

    Science.gov (United States)

    Keleta, Yonas B; Martinez, Joe L

    2012-03-01

    The reinforcing effects of addictive drugs including methamphetamine (METH) involve the midbrain ventral tegmental area (VTA). The VTA is the primary source of dopamine (DA) to the nucleus accumbens (NAc) and the ventral hippocampus (VHC). These three brain regions are functionally connected through the hippocampal-VTA loop, which includes two main neural pathways: the bottom-up pathway and the top-down pathway. In this paper, we take the view that addiction is a learning process. Therefore, we tested the involvement of the hippocampus in reinforcement learning by studying conditioned place preference (CPP) learning, sequentially conditioning each of the three nuclei in either the bottom-up order of conditioning (VTA, then VHC, finally NAc) or the top-down order (VHC, then VTA, finally NAc). Following habituation, the rats underwent experimental modules consisting of two conditioning trials, each followed by immediate testing (test 1 and test 2), and two additional tests 24 h (test 3) and/or 1 week following conditioning (test 4). The module was repeated three times for each nucleus. The results showed that METH, but not Ringer's, produced positive CPP following conditioning of each brain area in the bottom-up order. In the top-down order, METH, but not Ringer's, produced either an aversive CPP or no learning effect following conditioning of each nucleus of interest. In addition, METH place aversion was antagonized by coadministration of the N-methyl-d-aspartate (NMDA) receptor antagonist MK801, suggesting that the aversion learning was an NMDA receptor activation-dependent process. We conclude that the hippocampus is a critical structure in the reward circuit and hence suggest that the development of target-specific therapeutics for the control of addiction should emphasize the hippocampus-VTA top-down connection.

  11. Reinforcement learning of self-regulated β-oscillations for motor restoration in chronic stroke

    Directory of Open Access Journals (Sweden)

    Georgios eNaros

    2015-07-01

    Full Text Available Neurofeedback training of motor imagery-related brain states with brain-machine interfaces (BMI) is currently being explored prior to standard physiotherapy to improve the motor outcome of stroke rehabilitation. Pilot studies suggest that such a priming intervention before physiotherapy might increase the responsiveness of the brain to the subsequent physiotherapy, thereby improving the clinical outcome. However, there is little evidence up to now that these BMI-based interventions have achieved operant conditioning of specific brain states that facilitate task-specific functional gains beyond the practice of primed physiotherapy. In this context, we argue that BMI technology needs to aim at physiological features relevant for the targeted behavioral gain. Moreover, this therapeutic intervention has to be informed by concepts of reinforcement learning to develop its full potential. Such a refined neurofeedback approach would need to address the following issues: (1) defining a physiological feedback target specific to the intended behavioral gain, e.g., β-band oscillations for cortico-muscular communication; this targeted brain state could well be different from the brain state optimal for the neurofeedback task; (2) selecting a BMI classification and thresholding approach on the basis of learning principles, i.e., balancing challenge and reward of the neurofeedback task instead of maximizing the classification accuracy of the feedback device; and (3) adjusting the feedback in the course of the training period to account for the cognitive load and the learning experience of the participant. The proposed neurofeedback strategy provides evidence for the feasibility of the suggested approach by demonstrating that dynamic threshold adaptation based on reinforcement learning may lead to frequency-specific operant conditioning of β-band oscillations paralleled by task-specific motor improvement; a proposal that requires investigation in a larger cohort of stroke

  12. Intuitionistic fuzzy logics

    CERN Document Server

    T Atanassov, Krassimir

    2017-01-01

    The book offers a comprehensive survey of intuitionistic fuzzy logics. By reporting on both the author’s research and others’ findings, it provides readers with a complete overview of the field and highlights key issues and open problems, thus suggesting new research directions. Starting with an introduction to the basic elements of intuitionistic fuzzy propositional calculus, it then provides a guide to the use of intuitionistic fuzzy operators and quantifiers, and lastly presents state-of-the-art applications of intuitionistic fuzzy sets. The book is a valuable reference resource for graduate students and researchers alike.

  13. Identifikasi Gangguan Neurologis Menggunakan Metode Adaptive Neuro Fuzzy Inference System (ANFIS

    Directory of Open Access Journals (Sweden)

    Jani Kusanti

    2015-07-01

    The Adaptive Neuro-Fuzzy Inference System (ANFIS) method is used to identify a neurological disorder of the head, known in medical terms as ischemic stroke, from head CT scan images, in order to locate the ischemic stroke. The identification process begins with feature extraction from the head CT scan image using a histogram. The intensity histogram of the image is enhanced using the Otsu threshold, yielding pixels valued 1 for the object and pixels valued 0 for the background. This result is used for the image clustering process: fuzzy c-means (FCM) clustering produces a set of cluster centers, which are used to construct a fuzzy inference system (FIS) of the Takagi-Sugeno-Kang type. In this study, ANFIS is used to optimize the determination of the location of the ischemic stroke blockage, with a recursive least squares estimator (RLSE) used for learning. The RMSE obtained in the training process was 0.0432053, while the testing process yielded an accuracy rate of 98.66%. Keywords— Ischemic stroke, Global threshold, Sugeno-model fuzzy inference system, ANFIS, RMSE

  14. (L,M)-Fuzzy σ-Algebras

    Directory of Open Access Journals (Sweden)

    Fu-Gui Shi

    2010-01-01

    Full Text Available The notion of (L,M)-fuzzy σ-algebras is introduced in lattice-valued fuzzy set theory. It is a generalization of Klement's fuzzy σ-algebras. In our definition of (L,M)-fuzzy σ-algebras, each L-fuzzy subset can be regarded as an L-measurable set to some degree.

  15. Hybrid ellipsoidal fuzzy systems in forecasting regional electricity loads

    Energy Technology Data Exchange (ETDEWEB)

    Pai, Ping-Feng [Department of Information Management, National Chi Nan University, 1 University Road, Puli, Nantou 545, Taiwan (China)

    2006-09-15

    Because of the privatization of electricity in many countries, load forecasting has become one of the most crucial issues in the planning and operations of electric utilities. In addition, accurate regional load forecasting can provide the transmission and distribution operators with more information. The hybrid ellipsoidal fuzzy system was originally designed to solve control and pattern recognition problems. The main objective of this investigation is to develop a hybrid ellipsoidal fuzzy system for time series forecasting (HEFST) and apply the proposed model to forecast regional electricity loads in Taiwan. Additionally, a scaled conjugate gradient learning method is employed in the supervised learning phase of the HEFST model. Subsequently, numerical data taken from the existing literature is used to demonstrate the forecasting performance of the HEFST model. Simulation results reveal that the proposed model has better forecasting performance than the artificial neural network model and the regression model. Thus, the HEFST model is a valid and promising alternative for forecasting regional electricity loads. (author)

  16. Recurrent fuzzy ranking methods

    Science.gov (United States)

    Hajjari, Tayebeh

    2012-11-01

    With the increasing development of fuzzy set theory in various scientific fields, there is a growing need to compare fuzzy numbers in different areas. Ranking of fuzzy numbers therefore plays a very important role in linguistic decision-making, engineering, business and some other fuzzy application systems. Several strategies have been proposed for the ranking of fuzzy numbers. Each of these techniques has been shown to produce non-intuitive results in certain cases. In this paper, we review some recent ranking methods, which will be useful for the researchers who are interested in this area.

  17. Fuzzy production planning models for an unreliable production system with fuzzy production rate and stochastic/fuzzy demand rate

    Directory of Open Access Journals (Sweden)

    K. A. Halim

    2011-01-01

    Full Text Available In this article, we consider a single-unit unreliable production system which produces a single item. During a production run, the production process may shift from the in-control state to the out-of-control state at any random time when it produces some defective items. The defective item production rate is assumed to be imprecise and is characterized by a trapezoidal fuzzy number. The production rate is proportional to the demand rate where the proportionality constant is taken to be a fuzzy number. Two production planning models are developed on the basis of fuzzy and stochastic demand patterns. The expected cost per unit time in the fuzzy sense is derived in each model and defuzzified by using the graded mean integration representation method. Numerical examples are provided to illustrate the optimal results of the proposed fuzzy models.

  18. Construction of Fuzzy Sets and Applying Aggregation Operators for Fuzzy Queries

    DEFF Research Database (Denmark)

    Hudec, Miroslav; Sudzina, Frantisek

    Flexible query conditions could use linguistic terms described by fuzzy sets. The question is how to properly construct fuzzy sets for each linguistic term and apply an adequate aggregation function. For the construction of fuzzy sets, the lowest value and the highest value of the attribute and the distribution of data inside its domain are used. The logarithmic transformation of domains appears to be suitable. This way leads to a balanced distribution of tuples over fuzzy sets. In addition, users' opinions about linguistic terms as well as the current content of the database are merged. The second investigated
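
    A minimal sketch of building three linguistic terms from an attribute's observed range, with the logarithmic transformation mentioned above (the piecewise-linear shapes are an assumption for illustration):

        import math

        def make_terms(lo, hi, use_log=False):
            f = math.log if use_log else (lambda v: v)
            a, b = f(lo), f(hi)
            mid = (a + b) / 2
            def small(x):  return max(min((mid - f(x)) / (mid - a), 1.0), 0.0)
            def large(x):  return max(min((f(x) - mid) / (b - mid), 1.0), 0.0)
            def medium(x): return max(1.0 - abs(f(x) - mid) / ((b - a) / 2), 0.0)
            return small, medium, large

        # Skewed domain (e.g., values from 1 to 10,000): the log transform
        # spreads tuples more evenly across the three fuzzy sets.
        small, medium, large = make_terms(1, 10_000, use_log=True)
        print(small(10), medium(100), large(5_000))   # 0.5, 1.0, 0.85 (approximately)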

  19. Fuzzy social choice theory

    CERN Document Server

    B Gibilisco, Michael; E Albert, Karen; N Mordeson, John; J Wierman, Mark; D Clark, Terry

    2014-01-01

    This book offers a comprehensive analysis of the social choice literature and shows, by applying fuzzy sets, how the use of fuzzy preferences, rather than that of strict ones, may affect the social choice theorems. To do this, the book explores the presupposition of rationality within the fuzzy framework and shows that the two conditions for rationality, completeness and transitivity, do exist with fuzzy preferences. Specifically, this book examines: the conditions under which a maximal set exists; Arrow's theorem; the Gibbard-Satterthwaite theorem; and the median voter theorem. After showing that a non-empty maximal set does exist for fuzzy preference relations, this book goes on to demonstrate the existence of a fuzzy aggregation rule satisfying all five Arrowian conditions, including non-dictatorship. While the Gibbard-Satterthwaite theorem only considers individual fuzzy preferences, this work shows that both individuals and groups can choose alternatives to various degrees, resulting in a so...

  20. Relations Among Some Fuzzy Entropy Formulae

    Institute of Scientific and Technical Information of China (English)

    卿铭

    2004-01-01

    Fuzzy entropy has been widely used to analyze and design fuzzy systems, and many fuzzy entropy formulae have been proposed. For further in-depth analysis of fuzzy entropy, the axioms and some important formulae of fuzzy entropy are introduced. Some equivalence results among these fuzzy entropy formulae are proved, and it is shown that fuzzy entropy is a special distance measurement.
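
    One of the classical formulae covered by such comparisons is the De Luca-Termini entropy, H(A) = -k * sum over elements of [mu*ln(mu) + (1 - mu)*ln(1 - mu)]. A minimal Python sketch with illustrative membership values:

        import math

        def fuzzy_entropy(memberships, k=1.0):
            h = 0.0
            for mu in memberships:
                if 0.0 < mu < 1.0:  # terms vanish at mu = 0 or 1
                    h -= mu * math.log(mu) + (1.0 - mu) * math.log(1.0 - mu)
            return k * h

        print(fuzzy_entropy([0.5, 0.5]))  # maximal per-element fuzziness
        print(fuzzy_entropy([0.0, 1.0]))  # 0.0: a crisp set has zero entropy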

  1. Decision Making in Reinforcement Learning Using a Modified Learning Space Based on the Importance of Sensors

    Directory of Open Access Journals (Sweden)

    Yasutaka Kishima

    2013-01-01

    Full Text Available Many studies have been conducted on the application of reinforcement learning (RL) to robots. A general-purpose robot has redundant sensors or actuators because it is difficult to anticipate every environment the robot will face and every task it must execute. In this case, the learning space in RL contains redundancy, so the robot needs much time to learn a given task. In this study, we focus on the importance of sensors with regard to a robot's performance of a particular task, since the sensors that are applicable to a task differ from task to task. Using the importance of the sensors, we adjust the number of states assigned to each sensor and thereby reduce the size of the learning space. We define the measure of importance of a sensor for a task as the correlation between the value of that sensor and the reward (see the sketch below). The robot calculates the importance of its sensors and shrinks the learning space accordingly. We propose this learning-space reduction method and construct a learning system by embedding it in RL. We confirm the effectiveness of the proposed system with an experimental robot.
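
    A minimal Python sketch of the importance measure described above, scoring each sensor by the absolute correlation between its readings and the reward. The data and names are illustrative, not the paper's experimental setup:

        import numpy as np

        def sensor_importance(sensor_log, reward_log):
            # sensor_log: (T, n_sensors) array; reward_log: (T,) array.
            return np.array([abs(np.corrcoef(sensor_log[:, i], reward_log)[0, 1])
                             for i in range(sensor_log.shape[1])])

        rng = np.random.default_rng(0)
        rewards = rng.normal(size=200)
        sensors = np.column_stack([rewards + 0.1 * rng.normal(size=200),  # relevant
                                   rng.normal(size=200)])                 # irrelevant
        print(sensor_importance(sensors, rewards))  # first score near 1, second near 0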

  2. New fuzzy EWMA control charts for monitoring phase II fuzzy profiles

    Directory of Open Access Journals (Sweden)

    Ghazale Moghadam

    2016-01-01

    Full Text Available In many quality control applications, the quality of a process or product is explained by the relationship between a response variable and one or more explanatory variables, called a profile. In this paper, a new fuzzy EWMA control chart for phase II fuzzy profile monitoring is proposed. To this end, we extend the EWMA control chart to its fuzzy equivalent and then apply fuzzy ranking methods to determine whether the process fuzzy profile is in or out of control. The proposed method is capable of identifying small changes in the process under vagueness, roughness and uncertainty in the parameters explaining the process profile. By determining the source of changes, this method also makes it possible to recognize the causes of the process leaving its stable mode, remove these causes and restore the process to its stable mode.
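
    The crisp recursion that the fuzzy chart generalizes is z_i = lambda*x_i + (1 - lambda)*z_(i-1). A minimal Python sketch of this base statistic; in the fuzzy version each observation would be a fuzzy number and the out-of-control decision would use a fuzzy ranking method. The values below are illustrative:

        def ewma(samples, lam=0.2, z0=0.0):
            z, track = z0, []
            for x in samples:
                z = lam * x + (1.0 - lam) * z
                track.append(z)
            return track

        print(ewma([0.1, 0.0, 0.3, 1.2, 1.1]))  # statistic drifts upward after the shift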

  3. Off-Policy Reinforcement Learning for Synchronization in Multiagent Graphical Games.

    Science.gov (United States)

    Li, Jinna; Modares, Hamidreza; Chai, Tianyou; Lewis, Frank L; Xie, Lihua

    2017-10-01

    This paper develops an off-policy reinforcement learning (RL) algorithm to solve the optimal synchronization of multiagent systems. This is accomplished by using the framework of graphical games. In contrast to traditional control protocols, which require complete knowledge of agent dynamics, the proposed off-policy RL algorithm is a model-free approach, in that it solves the optimal synchronization problem without requiring any knowledge of the agent dynamics. A prescribed control policy, called the behavior policy, is applied to each agent to generate and collect data for learning. An off-policy Bellman equation is derived for each agent to learn the value function for the policy under evaluation, called the target policy, and to find an improved policy simultaneously. Actor and critic neural networks, along with a least-squares approach, are employed to approximate target control policies and value functions using the data generated by applying the prescribed behavior policies. Finally, an off-policy RL algorithm is presented that is implemented in real time and gives the approximate optimal control policy for each agent using only measured data. It is shown that the optimal distributed policies found by the proposed algorithm satisfy the global Nash equilibrium and synchronize all agents to the leader. Simulation results illustrate the effectiveness of the proposed method.

  4. Application of adaptive fuzzy control technology to pressure control of a pressurizer

    Institute of Scientific and Technical Information of China (English)

    YANG Ben-kun; BIAN Xin-qian; GUO Wei-lai

    2005-01-01

    A pressurizer is an important piece of equipment in a pressurized water reactor plant. It is used to maintain the pressure of the primary coolant within an allowed range, because sharp changes in coolant pressure affect the safety of the reactor; the study of pressure control methods for the pressurizer is therefore very important. In this paper, an adaptive fuzzy controller is presented for pressure control of a pressurizer in a nuclear power plant. The controller can tune fuzzy control rules and parameters online by self-learning during the actual control process, mimicking the way a human reasons to reach a decision. The simulation results for a pressurized water reactor plant show that the adaptive fuzzy controller has optimal and intelligent characteristics, which proves that the controller is effective.

  5. Fuzzy inference game approach to uncertainty in business decisions and market competitions.

    Science.gov (United States)

    Oderanti, Festus Oluseyi

    2013-01-01

    The increasing challenges and complexity of business environments are making it more difficult for entrepreneurs to predict the outcomes of business decisions and operations. Therefore, we developed a decision support scheme that can be used and adapted for various business decision processes. These involve decisions made under uncertain situations, such as business competition in the market or wage negotiation within a firm. The scheme uses game strategies and fuzzy inference concepts to effectively grasp the variables in these uncertain situations. The games are played between human and fuzzy players. The accuracy of the fuzzy rule base and the game strategies help to mitigate the adverse effects that a business may suffer from these uncertain factors. We also introduced learning, which enables the fuzzy player to adapt over time. We tested this scheme in different scenarios and found that it could be an invaluable tool in the hands of entrepreneurs operating in uncertain and competitive business environments.

  6. A Hybrid Neuro-Fuzzy Model For Integrating Large Earth-Science Datasets

    Science.gov (United States)

    Porwal, A.; Carranza, J.; Hale, M.

    2004-12-01

    A GIS-based hybrid neuro-fuzzy approach to integration of large earth-science datasets for mineral prospectivity mapping is described. It implements a Takagi-Sugeno type fuzzy inference system in the framework of a four-layered feed-forward adaptive neural network. Each unique combination of the datasets is considered a feature vector whose components are derived by knowledge-based ordinal encoding of the constituent datasets. A subset of feature vectors with a known output target vector (i.e., unique conditions known to be associated with either a mineralized or a barren location) is used for the training of an adaptive neuro-fuzzy inference system. Training involves iterative adjustment of parameters of the adaptive neuro-fuzzy inference system using a hybrid learning procedure for mapping each training vector to its output target vector with minimum sum of squared error. The trained adaptive neuro-fuzzy inference system is used to process all feature vectors. The output for each feature vector is a value that indicates the extent to which a feature vector belongs to the mineralized class or the barren class. These values are used to generate a prospectivity map. The procedure is demonstrated by an application to regional-scale base metal prospectivity mapping in a study area located in the Aravalli metallogenic province (western India). A comparison of the hybrid neuro-fuzzy approach with pure knowledge-driven fuzzy and pure data-driven neural network approaches indicates that the former offers a superior method for integrating large earth-science datasets for predictive spatial mathematical modelling.

  7. Finding intrinsic rewards by embodied evolution and constrained reinforcement learning.

    Science.gov (United States)

    Uchibe, Eiji; Doya, Kenji

    2008-12-01

    Understanding the design principle of reward functions is a substantial challenge both in artificial intelligence and neuroscience. Successful acquisition of a task usually requires rewards not only for goals but also for intermediate states, to promote effective exploration. This paper proposes a method for designing 'intrinsic' rewards of autonomous agents by combining constrained policy gradient reinforcement learning and embodied evolution. To validate the method, we use Cyber Rodent robots, in which collision avoidance, recharging from battery packs, and 'mating' by software reproduction are three major 'extrinsic' rewards. We show in hardware experiments that the robots can find appropriate 'intrinsic' rewards for the vision of battery packs and other robots to promote approach behaviors.

  8. Inference of RMR value using fuzzy set theory and neuro-fuzzy techniques

    Energy Technology Data Exchange (ETDEWEB)

    Bae, Gyu-Jin; Cho, Mahn-Sup [Korea Institute of Construction Technology, Koyang(Korea)

    2001-12-31

    Tunnel design involves inaccuracy of data, fuzziness of evaluation, observer error and so on. Face observation during tunnel excavation therefore plays an important role in raising stability and reducing support costs. This study was carried out to minimize the subjectivity of the observer and to exactly evaluate the natural properties of the ground during face observation. For this purpose, fuzzy set theory and neuro-fuzzy techniques from artificial intelligence are applied to the inference of the RMR (Rock Mass Rating) value from the observation data. The correlation between the original RMR value and the RMR_FU and RMR_NF values inferred from fuzzy set theory and neuro-fuzzy techniques is investigated using 46 data. The results show good correlation between the original RMR value and the inferred RMR_FU and RMR_NF values, with correlation coefficients of |R|=0.96 and |R|=0.95, respectively. From these results, the applicability of fuzzy set theory and neuro-fuzzy techniques to rock mass classification is proved to be sufficiently high. (author). 17 refs., 5 tabs., 9 figs.

  9. Outdoor altitude stabilization of QuadRotor based on type-2 fuzzy and fuzzy PID

    Science.gov (United States)

    Wicaksono, H.; Yusuf, Y. G.; Kristanto, C.; Haryanto, L.

    2017-11-01

    This paper presents a design for altitude stabilization of a QuadRotor based on type-2 fuzzy and fuzzy PID control. This practical design is implemented outdoors. Barometric and sonar sensors were used in this experiment as inputs for the YoHe controller. The controller provided the throttle signal as the control input to keep the QuadRotor level at a particular altitude, a task well known as altitude stabilization. The parameters of the type-2 fuzzy and fuzzy PID controllers were tuned at several heights to obtain the best control parameters for any height. The type-2 fuzzy controller produced better results than the fuzzy PID controller but had a slow response at the beginning.

  10. ps-ro Fuzzy Open (Closed) Functions and ps-ro Fuzzy Semi-Homeomorphism

    Directory of Open Access Journals (Sweden)

    Pankaj Chettri

    2015-11-01

    Full Text Available The aim of this paper is to introduce and characterize some new classes of functions in a fuzzy topological space, termed ps-ro fuzzy open (closed) functions, ps-ro fuzzy pre-semiopen functions and ps-ro fuzzy semi-homeomorphisms. The interrelations among these concepts, and their relations with parallel existing concepts, are established. It is also shown, with the help of examples, that these newly introduced concepts are independent of the well-known existing allied concepts.

  11. Determination of interrill soil erodibility coefficient based on Fuzzy and Fuzzy-Genetic Systems

    Directory of Open Access Journals (Sweden)

    Habib Palizvan Zand

    2017-02-01

    Full Text Available Introduction: Although fuzzy logic has been used successfully in various studies of hydrology and soil erosion, no article was found in the literature on its performance for estimating interrill erodibility. On the other hand, studies indicate that genetic algorithm techniques can be used in fuzzy models to find appropriate membership functions for linguistic variables and fuzzy rules. This study was therefore conducted to develop fuzzy and fuzzy-genetic models and to investigate their performance in the estimation of the soil interrill erodibility factor (Ki). Materials and Methods: For this purpose, 36 soil samples with different physical and chemical properties were collected from the west of Azerbaijan province. Soil samples were taken from the Ap or A horizon of each soil profile. The samples were air-dried and sieved, and some soil characteristics such as soil texture, organic matter (OM), cation exchange capacity (CEC), sodium adsorption ratio (SAR), EC and pH were determined by standard laboratory methods. Aggregate size distributions (ASD) were determined by the wet-sieving method, and the fractal dimension of soil aggregates (Dn) was also calculated. To determine soil interrill erodibility, a flume experiment was performed by packing soil to a depth of 0.09 m in a 0.5 × 1.0 m flume. The soil was saturated from the base, adjusted to a 9% slope, and subjected to at least 90 min of rainfall. Rainfall intensity treatments were 20, 37 and 47 mm h-1. During each rainfall event, runoff was collected manually at different time intervals, less than 60 s at the beginning and up to 15 min near the end of the test. At the end of the experiment, the volumes of runoff samples and the mass of sediment load at each time interval were measured. Finally, interrill erodibility values were calculated using the Kinnell (11) equation. Then, by statistical analyses, Dn and the sand percentage of the soils were selected as input variables and Ki as

  12. Foundations Of Fuzzy Control

    DEFF Research Database (Denmark)

    Jantzen, Jan

    The objective of this textbook is to acquire an understanding of the behaviour of fuzzy logic controllers. Under certain conditions a fuzzy controller is equivalent to a proportional-integral-derivative (PID) controller. Using that equivalence as a link, the book applies analysis methods from...... linear and nonlinear control theory. In the linear domain, PID tuning methods and stability analyses are transferred to linear fuzzy controllers. The Nyquist plot shows the robustness of different settings of the fuzzy gain parameters. As a result, a fuzzy controller is guaranteed to perform as well...... as any PID controller. In the nonlinear domain, the stability of four standard control surfaces is analysed by means of describing functions and Nyquist plots. The self-organizing controller (SOC) is shown to be a model reference adaptive controller. There is a possibility that a nonlinear fuzzy PID...

  13. Recurrent fuzzy neural network by using feedback error learning approaches for LFC in interconnected power system

    International Nuclear Information System (INIS)

    Sabahi, Kamel; Teshnehlab, Mohammad; Shoorhedeli, Mahdi Aliyari

    2009-01-01

    In this study, a new adaptive controller based on modified feedback error learning (FEL) approaches is proposed for the load frequency control (LFC) problem. The FEL strategy consists of intelligent and conventional controllers in the feedforward and feedback paths, respectively. In this strategy, a conventional feedback controller (CFC), i.e. a proportional-integral-derivative (PID) controller, is essential to guarantee global asymptotic stability of the overall system, and an intelligent feedforward controller (INFC) is adopted to learn the inverse of the controlled system. Therefore, once the INFC learns the inverse of the controlled system, the tracking of the reference signal is done properly. Generally, the CFC is designed at nominal operating conditions of the system and therefore fails to provide the best control performance, as well as global stability, over a wide range of changes in the operating conditions of the system. So, in this study a supervised controller (SC), a lookup-table-based controller, is introduced for tuning the CFC. During abrupt changes of the power system parameters, the SC adjusts the PID parameters according to the operating conditions. Moreover, to improve the performance of the overall system, a recurrent fuzzy neural network (RFNN) is adopted in the INFC instead of the conventional neural network used in past studies. The proposed FEL controller has been compared with the conventional feedback error learning controller (CFEL) and the PID controller through some performance indices
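
    A minimal Python sketch of the feedback error learning idea described above: the total command is u = u_ff + u_fb, and the feedback component doubles as the training signal for the feedforward inverse model. The linear model below is an illustrative stand-in for the RFNN, and the gains are hypothetical:

        import numpy as np

        class FELController:
            def __init__(self, n_features, lr=0.01, kp=2.0):
                self.w = np.zeros(n_features)  # crude linear inverse model (stand-in for RFNN)
                self.lr, self.kp = lr, kp

            def step(self, features, tracking_error):
                u_ff = float(self.w @ features)      # feedforward command (INFC)
                u_fb = self.kp * tracking_error      # conventional feedback (CFC)
                self.w += self.lr * u_fb * features  # feedback error trains the INFC
                return u_ff + u_fb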

  14. Dopaminergic control of motivation and reinforcement learning: a closed-circuit account for reward-oriented behavior.

    Science.gov (United States)

    Morita, Kenji; Morishima, Mieko; Sakai, Katsuyuki; Kawaguchi, Yasuo

    2013-05-15

    Humans and animals take actions quickly when they expect that the actions lead to reward, reflecting their motivation. Injection of dopamine receptor antagonists into the striatum has been shown to slow such reward-seeking behavior, suggesting that dopamine is involved in the control of motivational processes. Meanwhile, neurophysiological studies have revealed that phasic response of dopamine neurons appears to represent reward prediction error, indicating that dopamine plays central roles in reinforcement learning. However, previous attempts to elucidate the mechanisms of these dopaminergic controls have not fully explained how the motivational and learning aspects are related and whether they can be understood by the way the activity of dopamine neurons itself is controlled by their upstream circuitries. To address this issue, we constructed a closed-circuit model of the corticobasal ganglia system based on recent findings regarding intracortical and corticostriatal circuit architectures. Simulations show that the model could reproduce the observed distinct motivational effects of D1- and D2-type dopamine receptor antagonists. Simultaneously, our model successfully explains the dopaminergic representation of reward prediction error as observed in behaving animals during learning tasks and could also explain distinct choice biases induced by optogenetic stimulation of the D1 and D2 receptor-expressing striatal neurons. These results indicate that the suggested roles of dopamine in motivational control and reinforcement learning can be understood in a unified manner through a notion that the indirect pathway of the basal ganglia represents the value of states/actions at a previous time point, an empirically driven key assumption of our model.

  15. A fuzzy-ontology-oriented case-based reasoning framework for semantic diabetes diagnosis.

    Science.gov (United States)

    El-Sappagh, Shaker; Elmogy, Mohammed; Riad, A M

    2015-11-01

    Case-based reasoning (CBR) is a problem-solving paradigm that uses past knowledge to interpret or solve new problems. It is suitable for experience-based and theory-less problems. Building a semantically intelligent CBR that mimics expert thinking can solve many problems, especially medical ones. Knowledge-intensive CBR using formal ontologies is an evolution of this paradigm. Ontologies can be used for case representation and storage, and they can serve as background knowledge. Using standard medical ontologies, such as SNOMED CT, enhances interoperability and integration with health care systems. Moreover, utilizing vague or imprecise knowledge further improves the semantic effectiveness of the CBR. This paper proposes a fuzzy-ontology-based CBR framework. It proposes a fuzzy case-base OWL2 ontology and a fuzzy semantic retrieval algorithm that handles many feature types. This framework is implemented and tested on the diabetes diagnosis problem. The fuzzy ontology is populated with 60 real diabetic cases. The effectiveness of the proposed approach is illustrated with a set of experiments and case studies. The resulting system can answer complex medical queries related to semantic understanding of medical concepts and handling of vague terms. The resulting fuzzy case-base ontology has 63 concepts, 54 (fuzzy) object properties, 138 (fuzzy) datatype properties, 105 fuzzy datatypes, and 2640 instances. The system achieves an accuracy of 97.67%. We compare our framework with existing CBR systems and a set of five machine-learning classifiers; our system outperforms all of them. Building an integrated CBR system can improve its performance. Representing CBR knowledge using the fuzzy ontology, and building a case retrieval algorithm that treats different features differently, improves the accuracy of the resulting systems. Copyright © 2015 Elsevier B.V. All rights reserved.

  16. Fuzzy Privacy Decision for Context-Aware Access Personal Information

    Institute of Scientific and Technical Information of China (English)

    ZHANG Qingsheng; QI Yong; ZHAO Jizhong; HOU Di; NIU Yujie

    2007-01-01

    A context-aware privacy protection framework was designed for context-aware services and for privacy control methods governing access to personal information in pervasive environments. In the user's privacy decision process, it can produce fuzzy privacy decisions as personal information sensitivity and personal information receiver trust change. An uncertain privacy decision model for personal information disclosure was proposed, based on changes in receiver trust and information sensitivity, and a fuzzy privacy decision information system was designed according to this model. Personal privacy control policies can be extracted from this information system by using rough set theory, which also solves the problem of learning privacy control policies for personal information disclosure.

  17. Energy Management Strategy for a Hybrid Electric Vehicle Based on Deep Reinforcement Learning

    Directory of Open Access Journals (Sweden)

    Yue Hu

    2018-01-01

    Full Text Available An energy management strategy (EMS) is important for hybrid electric vehicles (HEVs) since it plays a decisive role in the performance of the vehicle. However, the variation of future driving conditions deeply influences the effectiveness of the EMS. Most existing EMS methods simply follow predefined rules that are not adaptive to different driving conditions online. It is therefore useful for the EMS to learn from the environment or driving cycle. In this paper, a deep reinforcement learning (DRL)-based EMS is designed such that it can learn to select actions directly from the states without any prediction or predefined rules, as sketched below. Furthermore, a DRL-based online learning architecture is presented, which is significant for applying the DRL algorithm to HEV energy management under different driving conditions. Simulation experiments have been conducted using MATLAB and Advanced Vehicle Simulator (ADVISOR) co-simulation. Experimental results validate the effectiveness of the DRL-based EMS compared with a rule-based EMS in terms of fuel economy. The online learning architecture is also shown to be effective. The proposed method ensures optimality as well as real-time applicability in HEVs.
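
    At the core of such a DRL-based EMS is value-based action selection directly from the state. A minimal Python sketch of epsilon-greedy selection over discretized power-split actions; the action set and the q_network placeholder are illustrative assumptions, not the paper's implementation:

        import random

        ACTIONS = [0.0, 0.25, 0.5, 0.75, 1.0]  # hypothetical engine power fractions

        def select_action(q_network, state, epsilon=0.1):
            if random.random() < epsilon:  # explore
                return random.choice(ACTIONS)
            q_values = [q_network(state, a) for a in ACTIONS]  # exploit learned values
            return ACTIONS[q_values.index(max(q_values))]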

  18. Improved Fuzzy Modelling to Predict the Academic Performance of Distance Education Students

    Directory of Open Access Journals (Sweden)

    Osman Yildiz

    2013-12-01

    Full Text Available It is essential to predict distance education students' year-end academic performance early during the course of the semester and to take precautions using such prediction-based information. This will, in particular, help enhance their academic performance and, therefore, improve the overall educational quality. The present study concerned the development of a mathematical model intended to predict distance education students' year-end academic performance using the first eight weeks of data from the learning management system. First, two fuzzy models were constructed, namely the classical fuzzy model and the expert fuzzy model, the latter being based on expert opinion. Afterwards, a gene-fuzzy model was developed by optimizing membership functions through a genetic algorithm. The data on distance education were collected through Moodle, an open-source learning management system. The data covered a total of 218 students who enrolled in Basic Computer Sciences in 2012. The input data consisted of the following variables: when a student logged on to the system for the last time after the content of a lesson was uploaded, how often he/she logged on to the system, how long he/she stayed online in the last login, what score he/she got in the quiz taken in Week 4, and what score he/she got in the midterm exam taken in Week 8. A comparison was made among the predictions of the three models concerning the students' year-end academic performance.

  19. An analysis of intergroup rivalry using Ising model and reinforcement learning

    Science.gov (United States)

    Zhao, Feng-Fei; Qin, Zheng; Shao, Zhuo

    2014-01-01

    Modeling of intergroup rivalry can help us better understand economic competitions, political elections and other similar activities. The result of intergroup rivalry depends on the co-evolution of individual behavior within one group and the impact from the rival group. In this paper, we model the rivalry behavior using the Ising model. Unlike other simulation studies using the Ising model, the evolution rules of each individual in our model are not static but have the ability to learn from historical experience using reinforcement learning techniques, which makes the simulation closer to real human behavior. We studied the phase transition in intergroup rivalry and focused on the impact of the degree of social freedom, the personality of group members and the social experience of individuals. The results of computer simulation show that a society with a low degree of social freedom and highly educated, experienced individuals is more likely to be one-sided in intergroup rivalry.
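
    The Ising backbone of such a simulation can be sketched in a few lines of Python: a Metropolis flip on a 2D lattice, where the temperature T plays a role analogous to the degree of social freedom. The RL layer that adapts individual rules is omitted, and sizes and parameters are illustrative:

        import math, random

        def metropolis_step(spins, T=2.0):
            n = len(spins)
            i, j = random.randrange(n), random.randrange(n)
            neighbours = (spins[(i + 1) % n][j] + spins[(i - 1) % n][j]
                          + spins[i][(j + 1) % n] + spins[i][(j - 1) % n])
            dE = 2.0 * spins[i][j] * neighbours  # energy cost of flipping spin (i, j)
            if dE <= 0 or random.random() < math.exp(-dE / T):
                spins[i][j] *= -1

        spins = [[random.choice([-1, 1]) for _ in range(20)] for _ in range(20)]
        for _ in range(10000):
            metropolis_step(spins)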

  20. Comparison of adaptive neuro-fuzzy inference system (ANFIS) and Gaussian processes for machine learning (GPML) algorithms for the prediction of skin temperature in lower limb prostheses.

    Science.gov (United States)

    Mathur, Neha; Glesk, Ivan; Buis, Arjan

    2016-10-01

    Monitoring of the interface temperature at skin level in lower-limb prostheses is notoriously complicated. This is due to the flexible nature of the interface liners used, which impedes the required consistent positioning of the temperature sensors during donning and doffing. Predicting the in-socket residual limb temperature by monitoring the temperature between socket and liner, rather than between skin and liner, could be an important step in alleviating complaints about increased temperature and perspiration in prosthetic sockets. In this work, we propose to implement an adaptive neuro-fuzzy inference system (ANFIS) to predict the in-socket residual limb temperature. ANFIS belongs to the family of fused neuro-fuzzy systems, in which the fuzzy system is incorporated in a framework that is adaptive in nature. The proposed method is compared to our earlier work using Gaussian processes for machine learning. By comparing the predicted and actual data, the results indicate that both modeling techniques have comparable performance metrics and can be efficiently used for non-invasive temperature monitoring. Copyright © 2016 The Author(s). Published by Elsevier Ltd. All rights reserved.

  1. Properties of Bipolar Fuzzy Hypergraphs

    OpenAIRE

    Akram, M.; Dudek, W. A.; Sarwar, S.

    2013-01-01

    In this article, we apply the concept of bipolar fuzzy sets to hypergraphs and investigate some properties of bipolar fuzzy hypergraphs. We introduce the notion of $A$-tempered bipolar fuzzy hypergraphs and present some of their properties. We also present application examples of bipolar fuzzy hypergraphs.

  2. The Motion Path Study of Measuring Robot Based on Variable Universe Fuzzy Control

    Directory of Open Access Journals (Sweden)

    Ma Guoqing

    2017-01-01

    Full Text Available Since the measuring robot requires high positioning accuracy, we first study the error profile of the system and analyze the influence of attitude, speed and other factors on the systematic errors. We then collect and analyze the systematic error curve along the track to complete the planning process. Finally, fuzzy control is added in both cases; comparison with the original system shows that the method based on fuzzy control can significantly reduce the error during the motion.

  3. Neural systems underlying aversive conditioning in humans with primary and secondary reinforcers

    Directory of Open Access Journals (Sweden)

    Mauricio R Delgado

    2011-05-01

    Full Text Available Money is a secondary reinforcer commonly used across a range of disciplines in experimental paradigms investigating reward learning and decision-making. The effectiveness of monetary reinforcers during aversive learning and its neural basis, however, remains a topic of debate. Specifically, it is unclear if the initial acquisition of aversive representations of monetary losses depends on similar neural systems as more traditional aversive conditioning that involves primary reinforcers. This study contrasts the efficacy of a biologically defined primary reinforcer (shock) and a socially defined secondary reinforcer (money) during aversive learning and its associated neural circuitry. During a two-part experiment, participants first played a gambling game where wins and losses were based on performance to gain an experimental bank. Participants were then exposed to two separate aversive conditioning sessions. In one session, a primary reinforcer (mild shock) served as an unconditioned stimulus (US) and was paired with one of two colored squares, the conditioned stimuli (CS+ and CS-, respectively). In another session, a secondary reinforcer (loss of money) served as the US and was paired with one of two different CS. Skin conductance responses were greater for CS+ compared to CS- trials irrespective of the type of reinforcer. Neuroimaging results revealed that the striatum, a region typically linked with reward-related processing, was involved in the acquisition of aversive conditioned responses irrespective of reinforcer type. In contrast, the amygdala was involved during aversive conditioning with primary reinforcers, as suggested by both an exploratory fMRI analysis and a follow-up case study with a patient with bilateral amygdala damage. Taken together, these results suggest that learning about potential monetary losses may depend on reinforcement learning related systems, rather than on typical structures involved in more biologically based

  4. Word Similarity from Dictionaries: Inferring Fuzzy Measures from Fuzzy Graphs

    Directory of Open Access Journals (Sweden)

    Vicenc Torra

    2008-01-01

    Full Text Available The computation of similarities between words is a basic element of information retrieval systems when retrieval is not solely based on word matching. In this work we consider a measure between words based on dictionaries. This is achieved by assuming that a dictionary can be formalized as a fuzzy graph. We show that the approach permits computing measures not only for pairs of words but for sets of them.

  5. Comparison of fuzzy AHP and fuzzy TODIM methods for landfill location selection.

    Science.gov (United States)

    Hanine, Mohamed; Boutkhoum, Omar; Tikniouine, Abdessadek; Agouti, Tarik

    2016-01-01

    Landfill location selection is a multi-criteria decision problem and has a strategic importance for many regions. The conventional methods for landfill location selection are insufficient in dealing with the vague or imprecise nature of linguistic assessment. To resolve this problem, fuzzy multi-criteria decision-making methods are proposed. The aim of this paper is to use fuzzy TODIM (the acronym for Interactive and Multi-criteria Decision Making in Portuguese) and the fuzzy analytic hierarchy process (AHP) methods for the selection of landfill location. The proposed methods have been applied to a landfill location selection problem in the region of Casablanca, Morocco. After determining the criteria affecting the landfill location decisions, fuzzy TODIM and fuzzy AHP methods are applied to the problem and results are presented. The comparisons of these two methods are also discussed.

  6. Fast Conflict Resolution Based on Reinforcement Learning in Multi-agent System

    Institute of Scientific and Technical Information of China (English)

    PIAO Songhao; HONG Bingrong; CHU Haitao

    2004-01-01

    In a multi-agent system where each agent has a different goal (even when the team of agents has the same goal), agents must be able to resolve conflicts arising in the process of achieving their goals. Many researchers have presented methods for conflict resolution, e.g., reinforcement learning (RL), but conventional RL requires a large computation cost because every agent must learn; at the same time, the overlap of actions selected by each agent results in local conflict. Therefore, in this paper we propose a novel method to solve these problems. To deal with conflict within the multi-agent system, the concept of a potential-field-function-based Action selection priority level (ASPL) is brought forward. In this method, all environmental factors that may influence the priority are effectively computed with the potential field function, so the priority to access a local resource can be decided rapidly. By avoiding the complex coordination mechanisms used in general multi-agent systems, conflict in the multi-agent system is settled more efficiently. Our system consists of an RL with ASPL module and a generalized rules module. Using ASPL, the RL module chooses a proper cooperative behavior, and the generalized rule module can accelerate the learning process. By applying the proposed method to robot soccer, the learning process is accelerated. The results of simulation and real experiments indicate the effectiveness of the method.

  7. (Fuzzy) Ideals of BN-Algebras

    Science.gov (United States)

    Walendziak, Andrzej

    2015-01-01

    The notions of an ideal and a fuzzy ideal in BN-algebras are introduced. The properties and characterizations of them are investigated. The concepts of normal ideals and normal congruences of a BN-algebra are also studied, the properties of them are displayed, and a one-to-one correspondence between them is presented. Conditions for a fuzzy set to be a fuzzy ideal are given. The relationships between ideals and fuzzy ideals of a BN-algebra are established. The homomorphic properties of fuzzy ideals of a BN-algebra are provided. Finally, characterizations of Noetherian BN-algebras and Artinian BN-algebras via fuzzy ideals are obtained. PMID:26125050

  8. Classification of underground pipe scanned images using feature extraction and neuro-fuzzy algorithm.

    Science.gov (United States)

    Sinha, S K; Karray, F

    2002-01-01

    Pipeline surface defects such as holes and cracks cause major problems for utility managers, particularly when the pipeline is buried under the ground. Manual inspection for surface defects in the pipeline has a number of drawbacks, including subjectivity, varying standards, and high costs. An automatic inspection system using image processing and artificial intelligence techniques can overcome many of these disadvantages and offer utility managers an opportunity to significantly improve quality and reduce costs. A recognition and classification method for pipe cracks using image analysis and a neuro-fuzzy algorithm is proposed. In the preprocessing step, the scanned pipe images are analyzed and crack features are extracted. In the classification step, a neuro-fuzzy algorithm is developed that employs a fuzzy membership function and the error backpropagation algorithm. The idea behind the proposed approach is that the fuzzy membership function will absorb the variation of feature values, while the backpropagation network, with its learning ability, will deliver good classification efficiency.

  9. An Application of Fuzzy Inference System by Clustering Subtractive Fuzzy Method for Estimating of Product Requirement

    Directory of Open Access Journals (Sweden)

    Fajar Ibnu Tufeil

    2009-06-01

    Full Text Available Fuzzy models have the ability to describe an overly complex system linguistically. The rules in a fuzzy model are generally built from human expertise and heuristic knowledge of the system being modeled. This technique has subsequently been developed into one that can identify rules from a database that has been grouped according to structural similarity. Here, the fuzzy clustering method serves to find groups of data. The information produced by this clustering method, namely the cluster centers, is used to form the rules of the fuzzy inference system. This paper discusses the application of a fuzzy inference system with the fuzzy subtractive clustering method, namely to build a fuzzy inference system using a first-order Takagi-Sugeno fuzzy model. The fuzzy subtractive clustering method is then applied to model a marketing problem: predicting market demand for a milk product. The application was built using Borland Delphi 6.0. Testing yielded the smallest prediction error, with an average error of 0.08%.
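
    The core of subtractive clustering is a density-like potential: each data point receives P_i = sum_j exp(-alpha * ||x_i - x_j||^2) with alpha = 4/r_a^2, and the highest-potential point becomes the first cluster center, which then seeds a fuzzy rule. A minimal Python sketch with illustrative data:

        import numpy as np

        def first_center(X, ra=0.5):
            alpha = 4.0 / ra ** 2
            d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
            potential = np.exp(-alpha * d2).sum(axis=1)
            return X[np.argmax(potential)], potential

        X = np.vstack([np.random.default_rng(1).normal(0, 0.1, (30, 2)),
                       np.random.default_rng(2).normal(2, 0.1, (10, 2))])
        center, _ = first_center(X)
        print(center)  # lands inside the denser cluster near the origin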

  10. Neuro-Fuzzy Wavelet Based Adaptive MPPT Algorithm for Photovoltaic Systems

    Directory of Open Access Journals (Sweden)

    Syed Zulqadar Hassan

    2017-03-01

    Full Text Available Intelligent control of photovoltaics is necessary to ensure fast response and high efficiency under different weather conditions. This is often arduous to accomplish using traditional linear controllers, as photovoltaic systems are nonlinear and contain several uncertainties. Based on an analysis of the existing literature on Maximum Power Point Tracking (MPPT) techniques, a high-performance neuro-fuzzy indirect wavelet-based adaptive MPPT control is developed in this work. The proposed controller combines the reasoning capability of fuzzy logic, the learning capability of neural networks and the localization properties of wavelets. In the proposed system, the Hermite Wavelet-embedded Neural Fuzzy (HWNF)-based gradient estimator is adopted to estimate the gradient term and makes the controller indirect. The performance of the proposed controller is compared with different conventional and intelligent MPPT control techniques. MATLAB results show its superiority over other existing techniques in terms of fast response, power quality and efficiency.

  11. A hybrid fuzzy logic and extreme learning machine for improving efficiency of circulating water systems in power generation plant

    Science.gov (United States)

    Aziz, Nur Liyana Afiqah Abdul; Siah Yap, Keem; Afif Bunyamin, Muhammad

    2013-06-01

    This paper presents a new fault detection approach for improving the efficiency of the circulating water system (CWS) in a power generation plant, using a hybrid Fuzzy Logic System (FLS) and Extreme Learning Machine (ELM) neural network. The FLS is a mathematical tool for handling the uncertainties that arise when precision and significance are applied in the real world. It is based on natural language and has the ability of "computing with words". The ELM is an extremely fast learning algorithm for neural networks that can complete the training cycle in a very short time. By combining the FLS and ELM, a new hybrid model, i.e., FLS-ELM, is developed. The applicability of this proposed hybrid model is validated for fault detection in the CWS, which may help to improve the overall efficiency of the power generation plant, hence consuming fewer natural resources and producing less pollution.

  12. A hybrid fuzzy logic and extreme learning machine for improving efficiency of circulating water systems in power generation plant

    International Nuclear Information System (INIS)

    Aziz, Nur Liyana Afiqah Abdul; Yap, Keem Siah; Bunyamin, Muhammad Afif

    2013-01-01

    This paper presents a new fault detection approach for improving the efficiency of the circulating water system (CWS) in a power generation plant, using a hybrid Fuzzy Logic System (FLS) and Extreme Learning Machine (ELM) neural network. The FLS is a mathematical tool for handling the uncertainties that arise when precision and significance are applied in the real world. It is based on natural language and has the ability of "computing with words". The ELM is an extremely fast learning algorithm for neural networks that can complete the training cycle in a very short time. By combining the FLS and ELM, a new hybrid model, i.e., FLS-ELM, is developed. The applicability of this proposed hybrid model is validated for fault detection in the CWS, which may help to improve the overall efficiency of the power generation plant, hence consuming fewer natural resources and producing less pollution.

  13. Localized and Energy-Efficient Topology Control in Wireless Sensor Networks Using Fuzzy-Logic Control Approaches

    Directory of Open Access Journals (Sweden)

    Yuanjiang Huang

    2014-01-01

    Full Text Available The sensor nodes in Wireless Sensor Networks (WSNs) are prone to failure for many reasons, for example, running out of battery or deployment in harsh environments; therefore, WSNs are expected to be able to maintain network connectivity and tolerate a certain amount of node failures. By applying a fuzzy-logic approach to control the network topology, this paper aims at improving the network connectivity and fault-tolerant capability in response to node failures, while taking into account that the control approach has to be localized and energy efficient. Two fuzzy controllers are proposed in this paper: one is Learning-based Fuzzy-logic Topology Control (LFTC), whose fuzzy controller is learnt from a training data set; the other is Rules-based Fuzzy-logic Topology Control (RFTC), whose fuzzy controller is obtained by designing if-then rules and membership functions. Both LFTC and RFTC do not rely on location information, and they are localized. Comparing them with three other representative algorithms (LTRT, List-based, and NONE) through extensive simulations, our two proposed fuzzy controllers proved to be very energy efficient in achieving the desired node degree and improving network connectivity when sensor nodes run out of battery or are subject to random attacks.
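
    A minimal Python sketch in the spirit of the rules-based controller (RFTC): triangular membership functions over residual energy and node degree, and a single if-then rule whose firing strength suggests a transmit-power increase. All ranges and the rule itself are illustrative assumptions, not the paper's rule base:

        def tri(x, a, b, c):
            # Triangular membership function with peak at b.
            if x <= a or x >= c:
                return 0.0
            return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

        def power_increase(energy, degree):
            high_energy = tri(energy, 0.5, 1.0, 1.5)  # energy normalized to [0, 1]
            low_degree = tri(degree, -3.0, 0.0, 3.0)  # few live neighbours
            # Rule: IF degree is low AND energy is high THEN raise transmit power.
            return min(high_energy, low_degree)

        print(power_increase(0.9, 1.0))  # a well-charged but isolated node gets a boost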

  14. Fuzzy control and identification

    CERN Document Server

    Lilly, John H

    2010-01-01

    This book gives an introduction to basic fuzzy logic and Mamdani and Takagi-Sugeno fuzzy systems. The text shows how these can be used to control complex nonlinear engineering systems, while also suggesting several approaches to modeling complex engineering systems with unknown models. Finally, fuzzy modeling and control methods are combined in the book to create adaptive fuzzy controllers, ending with an example of an obstacle-avoidance controller for an autonomous vehicle using modus ponendo tollens logic.

  15. Fuzzy pharmacology: theory and applications.

    Science.gov (United States)

    Sproule, Beth A; Naranjo, Claudio A; Türksen, I Burhan

    2002-09-01

    Fuzzy pharmacology is a term coined to represent the application of fuzzy logic and fuzzy set theory to pharmacological problems. Fuzzy logic is the science of reasoning, thinking and inference that recognizes and uses the real world phenomenon that everything is a matter of degree. It is an extension of binary logic that is able to deal with complex systems because it does not require crisp definitions and distinctions for the system components. In pharmacology, fuzzy modeling has been used for the mechanical control of drug delivery in surgical settings, and work has begun evaluating its use in other pharmacokinetic and pharmacodynamic applications. Fuzzy pharmacology is an emerging field that, based on these initial explorations, warrants further investigation.

  16. Fuzzy data analysis

    CERN Document Server

    Bandemer, Hans

    1992-01-01

    Fuzzy data such as marks, scores, verbal evaluations, imprecise observations, experts' opinions and grey tone pictures, are quite common. In Fuzzy Data Analysis the authors collect their recent results providing the reader with ideas, approaches and methods for processing such data when looking for sub-structures in knowledge bases for an evaluation of functional relationship, e.g. in order to specify diagnostic or control systems. The modelling presented uses ideas from fuzzy set theory and the suggested methods solve problems usually tackled by data analysis if the data are real numbers. Fuzzy Data Analysis is self-contained and is addressed to mathematicians oriented towards applications and to practitioners in any field of application who have some background in mathematics and statistics.

  17. Implementation of a fuzzy logic/neural network multivariable controller

    International Nuclear Information System (INIS)

    Cordes, G.A.; Clark, D.E.; Johnson, J.A.; Smartt, H.B.; Wickham, K.L.; Larson, T.K.

    1992-01-01

    This paper describes a multivariable controller developed at the Idaho National Engineering Laboratory (INEL) that incorporates both fuzzy logic rules and a neural network. The controller was implemented in a laboratory demonstration and was robust, producing smooth temperature and water level response curves with short time constants. In the future, intelligent control systems will be a necessity for optimal operation of autonomous reactor systems located on earth or in space. Even today, there is a need for control systems that adapt to the changing environment and process. Hybrid intelligent control systems promise to provide this adaptive capability. Fuzzy logic implements our imprecise, qualitative human reasoning. The values of system variables (controller inputs) and control variables (controller outputs) are described in linguistic terms and subdivided into fully overlapping value ranges. The fuzzy rule base describes how combinations of input parameter ranges determine the output control values. Neural networks implement our human learning. In this controller, neural networks were embedded in the software to explore their potential for adding adaptability

  18. A novel Neuro-fuzzy classification technique for data mining

    Directory of Open Access Journals (Sweden)

    Soumadip Ghosh

    2014-11-01

    Full Text Available In our study, we proposed a novel neuro-fuzzy classification technique for data mining. The inputs to the neuro-fuzzy classification system were fuzzified by applying a generalized bell-shaped membership function. The proposed method utilized a fuzzification matrix in which the input patterns were associated with a degree of membership in different classes. Based on the degree of membership, a pattern is attributed to a specific category or class. We applied our method to ten benchmark data sets from the UCI machine learning repository for classification. Our objective was to analyze the proposed method and compare its performance with two powerful supervised classification algorithms: the Radial Basis Function Neural Network (RBFNN) and the Adaptive Neuro-fuzzy Inference System (ANFIS). We assessed the performance of these classification methods in terms of different performance measures such as accuracy, root-mean-square error, kappa statistic, true positive rate, false positive rate, precision, recall, and f-measure. In every aspect, the proposed method proved to be superior to the RBFNN and ANFIS algorithms.
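
    The generalized bell-shaped membership function used for fuzzification has the standard form mu(x) = 1 / (1 + |(x - c)/a|^(2b)), where c is the center, a the width and b the slope. A minimal Python sketch with illustrative parameters:

        def gbell(x, a, b, c):
            return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))

        for x in (0.0, 0.5, 1.0, 2.0):
            print(x, round(gbell(x, a=1.0, b=2.0, c=1.0), 3))  # peak value 1.0 at x = c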

  19. Neural and Fuzzy Adaptive Control of Induction Motor Drives

    International Nuclear Information System (INIS)

    Bensalem, Y.; Sbita, L.; Abdelkrim, M. N.

    2008-01-01

    This paper proposes an adaptive neural network speed control scheme for an induction motor (IM) drive. The proposed scheme consists of an adaptive neural network identifier (ANNI) and an adaptive neural network controller (ANNC). To train these neural networks, a backpropagation algorithm was used to automatically adjust the weights of the ANNI and ANNC in order to minimize the performance functions. Here, the ANNI can quickly estimate the plant parameters, and the ANNC is used to provide on-line identification of the command and to produce a control force such that the motor speed accurately tracks the reference command. By combining artificial neural network techniques with the fuzzy logic concept, a neural and fuzzy adaptive control scheme is developed. Fuzzy logic was used to adapt the neural controller and improve the robustness of the generated command. The developed method is robust to load torque disturbances and speed target variations, while ensuring precise trajectory tracking with the prescribed dynamics. The algorithm was verified by simulation, and the results obtained demonstrate the effectiveness of the designed IM controller

  20. Compound Option Pricing under Fuzzy Environment

    Directory of Open Access Journals (Sweden)

    Xiandong Wang

    2014-01-01

    Full Text Available The uncertainty of a financial market includes two aspects, risk and vagueness; in this paper, fuzzy set theory is applied to model the imprecise input parameters (interest rate and volatility). We present the fuzzy price of a compound option by fuzzifying the interest rate and volatility in Geske's compound option pricing formula. For each α, the α-level set of fuzzy prices is obtained according to fuzzy arithmetic and the definition of a fuzzy-valued function. We apply a defuzzification method based on crisp possibilistic mean values of the fuzzy interest rate and fuzzy volatility to obtain the crisp possibilistic mean value of the compound option price. Finally, we present a numerical analysis to illustrate compound option pricing under a fuzzy environment.
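
    The α-level sets mentioned above have a simple closed form for a triangular fuzzy parameter (a, b, c): the α-cut is the interval [a + α(b - a), c - α(c - b)], and evaluating a monotone pricing formula at the interval endpoints yields the fuzzy price band. A minimal Python sketch with a hypothetical fuzzy interest rate:

        def alpha_cut(tfn, alpha):
            a, b, c = tfn
            return (a + alpha * (b - a), c - alpha * (c - b))

        fuzzy_rate = (0.02, 0.03, 0.04)  # hypothetical triangular fuzzy interest rate
        for alpha in (0.0, 0.5, 1.0):
            print(alpha, alpha_cut(fuzzy_rate, alpha))  # shrinks to the core 0.03 at alpha = 1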

  1. Fuzzy AutoEncode Based Cloud Detection for Remote Sensing Imagery

    Directory of Open Access Journals (Sweden)

    Zhenfeng Shao

    2017-03-01

    Full Text Available Cloud detection in remote sensing imagery is quite challenging due to the influence of complicated underlying surfaces and the variety of cloud types. Currently, most methods rely mainly on prior knowledge to extract features artificially for cloud detection. However, these features may not accurately represent cloud characteristics in complex environments. In this paper, we adopt an innovative model named the Fuzzy Autoencode Model (FAEM) to integrate the feature learning ability of stacked autoencode networks and the detection ability of a fuzzy function for highly accurate cloud detection in remote sensing imagery. Our proposed method begins by selecting and fusing spectral, texture, and structure information. Thereafter, the proposed technique establishes a FAEM to learn deep discriminative features from a great deal of the selected information. Finally, the learned features are mapped to the corresponding cloud density map with a fuzzy function. To demonstrate the effectiveness of the proposed method, 172 Landsat ETM+ images and 25 GF-1 images with different spatial resolutions are used in this paper. For the convenience of accuracy assessment, ground truth data were manually outlined. Results show that the average RER (ratio of right rate to error rate) on Landsat images is greater than 29, while the average RER of the Support Vector Machine (SVM) is 21.8 and that of the Random Forest (RF) is 23. The results on GF-1 images exhibit performance similar to that on Landsat images, with an average RER of 25.9, which is much higher than the results of SVM and RF. Compared to traditional methods, our technique attains higher average cloud detection accuracy for different spatial resolutions and various land surfaces.

  2. Fuzzy measures and integrals

    Czech Academy of Sciences Publication Activity Database

    Mesiar, Radko

    2005-01-01

    Roč. 28, č. 156 (2005), s. 365-370 ISSN 0165-0114 R&D Projects: GA ČR(CZ) GA402/04/1026 Institutional research plan: CEZ:AV0Z10750506 Keywords : fuzzy measures * fuzzy integral * regular fuzzy integral Subject RIV: BA - General Mathematics Impact factor: 1.039, year: 2005

  3. Study on application of adaptive fuzzy control and neural network in the automatic leveling system

    Science.gov (United States)

    Xu, Xiping; Zhao, Zizhao; Lan, Weiyong; Sha, Lei; Qian, Cheng

    2015-04-01

    This paper discusses the application of adaptive fuzzy control and the neural network BP algorithm in a large flat automatic leveling control system. The purpose is to develop a measurement system with fast flat leveling, so that a tablet mounted on the leveling measurement system can be brought to level quickly in precision measurement work, improving the efficiency of precision measurement. The paper focuses on the analysis of the automatic leveling system based on a fuzzy controller, combining the fuzzy controller with a BP neural network and using the BP algorithm to refine the experience-based rules, thereby constructing an adaptive fuzzy control system. Meanwhile, the learning rate of the BP algorithm is also adjusted at run time to accelerate convergence. The simulation results show that the proposed control method can effectively improve the leveling precision of the automatic leveling system and shorten the leveling time.

  4. Fuzzy System Modeling Using Matlab (Pemodelan Sistem Fuzzy Dengan Menggunakan Matlab)

    Directory of Open Access Journals (Sweden)

    Afan Galih Salman

    2010-12-01

    Full Text Available Fuzzy logic is a method in the soft computing category, one that can process uncertain and inaccurate data at low implementation cost. Other methods in the soft computing category besides fuzzy logic are artificial neural networks, probabilistic reasoning, and evolutionary computing. Fuzzy logic has the ability to develop fuzzy systems, that is, intelligent systems for uncertain environments. The stages in building a fuzzy system are: analyzing the inputs and outputs, determining the input and output variables, defining the membership function of each fuzzy set, determining the rules based on the experience or knowledge of an expert in the field, and implementing the fuzzy system. Overall, fuzzy logic uses simple, understandable mathematical concepts and can handle uncertain and inaccurate data. A fuzzy system can capture and apply expert experience directly, without a training process or the effort of encoding the knowledge into a computer, resulting in a model that can be relied upon for decision making.

  5. Integrating Fuzzy AHP and Fuzzy ARAS for evaluating financial performance

    Directory of Open Access Journals (Sweden)

    Abdolhamid Safaei Ghadikolaei

    2014-09-01

    Full Text Available Multi Criteria Decision Making (MCDM) is an advanced field of Operations Research; recently, MCDM methods have become efficient and common tools for performance evaluation in many areas such as finance and economics. The aim of this study is to show one of the applications of mathematics in the real world. Considering value-based measures and accounting-based measures simultaneously, this study provides a hybrid approach of MCDM methods in a fuzzy environment for the financial performance evaluation of the automotive and parts manufacturing industry of the Tehran stock exchange (TSE). For this purpose, the fuzzy analytic hierarchy process (FAHP) is applied to determine the relative importance of each criterion; then, the companies are ranked according to their financial performance by using the fuzzy additive ratio assessment (Fuzzy ARAS) method. The findings of this study show the effectiveness of this approach in evaluating financial performance.

  6. Predictive Modeling of Mechanical Properties of Welded Joints Based on Dynamic Fuzzy RBF Neural Network

    Directory of Open Access Journals (Sweden)

    ZHANG Yongzhi

    2016-10-01

    Full Text Available A dynamic fuzzy RBF neural network model was built to predict the mechanical properties of welded joints, with the aim of overcoming the shortcomings of static neural networks in structure identification, dynamic sample training, and the learning algorithm. The structure and parameters of the model are no longer fixed in advance but are adjusted adaptively during training, making the model suitable for learning from dynamic sample data; the learning algorithm introduces a hierarchical learning and fuzzy rule pruning strategy to accelerate training and make the model more compact. The model was evaluated by simulation using TIG welding test data for TC4 titanium alloy of three thicknesses and different process conditions. The results show that the model has high prediction accuracy, is suitable for predicting the mechanical properties of welded joints, and opens up a new way for on-line control of the welding process.
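
    The dynamic rule growing and pruning described above is the paper's contribution and is not reproduced here; the sketch below shows only the static RBF regression core such a model is built on: Gaussian basis activations followed by least-squares output weights. The synthetic data stand in for (welding parameter, joint property) pairs.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, (100, 3))          # e.g. current, speed, thickness (scaled)
y = X @ np.array([0.5, -0.3, 0.8]) + 0.1 * rng.normal(size=100)

centers = X[rng.choice(len(X), 10, replace=False)]   # 10 Gaussian centers
sigma = 0.5

def phi(X):
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))            # Gaussian basis activations

# Output weights by regularized least squares; in the paper these, and the
# centers themselves, would be adapted dynamically during training.
Phi = phi(X)
w = np.linalg.solve(Phi.T @ Phi + 1e-6 * np.eye(10), Phi.T @ y)
pred = phi(X) @ w
print(f"training RMSE: {np.sqrt(np.mean((pred - y) ** 2)):.4f}")
```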

  7. Reinforcement learning for a biped robot based on a CPG-actor-critic method.

    Science.gov (United States)

    Nakamura, Yutaka; Mori, Takeshi; Sato, Masa-aki; Ishii, Shin

    2007-08-01

    Animals' rhythmic movements, such as locomotion, are considered to be controlled by neural circuits called central pattern generators (CPGs), which generate oscillatory signals. Motivated by this biological mechanism, studies have been conducted on the rhythmic movements controlled by CPG. As an autonomous learning framework for a CPG controller, we propose in this article a reinforcement learning method we call the "CPG-actor-critic" method. This method introduces a new architecture to the actor, and its training is roughly based on a stochastic policy gradient algorithm presented recently. We apply this method to an automatic acquisition problem of control for a biped robot. Computer simulations show that training of the CPG can be successfully performed by our method, thus allowing the biped robot to not only walk stably but also adapt to environmental changes.
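
    The CPG dynamics are the specific part of the method; the learning signal underneath is a standard actor-critic update in which the critic's temporal-difference error trains both the value estimate and a stochastic policy. A minimal single-state sketch of that update, with a toy reward and step sizes of our own choosing, is given below.

```python
import numpy as np

rng = np.random.default_rng(2)
theta, v = 0.0, 0.0                 # policy mean parameter; value of the single state
alpha_a, alpha_c, sigma, gamma = 0.05, 0.1, 0.3, 0.95

def reward(a):
    return -(a - 1.0) ** 2          # unknown to the agent; optimum at a = 1

for step in range(2000):
    a = theta + sigma * rng.normal()            # stochastic policy N(theta, sigma^2)
    r = reward(a)
    td = r + gamma * v - v                      # TD error (single recurring state)
    v += alpha_c * td                           # critic update
    theta += alpha_a * td * (a - theta) / sigma ** 2   # policy-gradient actor update

print(f"learned action mean: {theta:.3f}")      # should approach 1.0
```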

  8. Landslide susceptibility mapping using a neuro-fuzzy

    Science.gov (United States)

    Lee, S.; Choi, J.; Oh, H.

    2009-12-01

    This paper develops and applies an adaptive neuro-fuzzy inference system (ANFIS) in a geographic information system (GIS) environment, using landslide-related factors and landslide locations, for landslide susceptibility mapping. A neuro-fuzzy system is a fuzzy system trained by a learning algorithm derived from neural network theory. The learning procedure operates on local information and causes only local modifications in the underlying fuzzy system. The study area, Boun, located in the central part of Korea, suffered much damage following heavy rain in 1998 and was selected as a suitable site for evaluating the frequency and distribution of landslides. Landslide-related factors such as slope, soil texture, wood type, lithology, and density of lineaments were extracted from topographic, soil, forest, and lineament maps. Landslide locations were identified from interpretation of aerial photographs and field surveys. Landslide-susceptible areas were analyzed by the ANFIS method and mapped using the occurrence factors. In particular, various membership functions (MFs) were applied and the analysis results were verified using the landslide location data. The predictive maps using triangular, trapezoidal, and polynomial MFs were the best individual MFs for modeling landslide susceptibility (84.96% accuracy), showing that ANFIS can be very effective for landslide susceptibility mapping. After verification, the difference in accuracy among the MFs used was small, between 84.81% and 84.96%; the difference was just 0.15%, so the choice of MF was not important in this study. Compared with the likelihood ratio model, which showed 84.94%, the accuracy was also similar. Thus, ANFIS could be applied to other study areas with different data and to other study methods such as cross-validation. The developed ANFIS learns the if-then rules between landslide-related factors and landslide
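
    For readers unfamiliar with ANFIS internals, the sketch below shows the forward pass of a first-order Sugeno system of the kind ANFIS trains: Gaussian memberships, product rule firing, and a normalized weighted average of linear consequents. The membership parameters and consequents are illustrative; in the study they would be fitted to the landslide factor maps.

```python
import numpy as np

def gauss(x, c, s):
    return np.exp(-0.5 * ((x - c) / s) ** 2)

def anfis_forward(x1, x2):
    # Two fuzzy sets per input ("low"/"high"), four rules in total.
    m1 = [gauss(x1, 0.2, 0.15), gauss(x1, 0.8, 0.15)]
    m2 = [gauss(x2, 0.3, 0.2),  gauss(x2, 0.7, 0.2)]
    # Hypothetical linear consequents f = p*x1 + q*x2 + r, one per rule.
    coeffs = [(0.1, 0.2, 0.0), (0.5, -0.1, 0.2), (-0.2, 0.4, 0.1), (0.3, 0.3, 0.3)]
    w, f = [], []
    for i in range(2):
        for j in range(2):
            w.append(m1[i] * m2[j])                 # rule firing strength
            p, q, r = coeffs[2 * i + j]
            f.append(p * x1 + q * x2 + r)
    w = np.array(w); f = np.array(f)
    return float((w * f).sum() / w.sum())           # normalized weighted output

print(anfis_forward(0.6, 0.4))   # e.g. slope and lineament density (scaled)
```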

  9. Fuzzy Graph Language Recognizability

    OpenAIRE

    Kalampakas, Antonios; Spartalis, Stefanos; Iliadis, Lazaros

    2012-01-01

    Fuzzy graph language recognizability is introduced along the lines of the established theory of syntactic graph language recognizability by virtue of the algebraic structure of magmoids. The main closure properties of the corresponding class are investigated and several interesting examples of fuzzy graph languages are examined.

  10. Visual reinforcement shapes eye movements in visual search.

    Science.gov (United States)

    Paeye, Céline; Schütz, Alexander C; Gegenfurtner, Karl R

    2016-08-01

    We use eye movements to gain information about our visual environment; this information can indirectly be used to affect the environment. Whereas eye movements are affected by explicit rewards such as points or money, it is not clear whether the information gained by finding a hidden target has a similar reward value. Here we tested whether finding a visual target can reinforce eye movements in visual search performed in a noise background, which conforms to natural scene statistics and contains a large number of possible target locations. First we tested whether presenting the target more often in one specific quadrant would modify eye movement search behavior. Surprisingly, participants did not learn to search for the target more often in high probability areas. Presumably, participants could not learn the reward structure of the environment. In two subsequent experiments we used a gaze-contingent display to gain full control over the reinforcement schedule. The target was presented more often after saccades into a specific quadrant or a specific direction. The proportions of saccades meeting the reinforcement criteria increased considerably, and participants matched their search behavior to the relative reinforcement rates of targets. Reinforcement learning seems to serve as the mechanism to optimize search behavior with respect to the statistics of the task.

  11. Reinforcement learning solution for HJB equation arising in constrained optimal control problem.

    Science.gov (United States)

    Luo, Biao; Wu, Huai-Ning; Huang, Tingwen; Liu, Derong

    2015-11-01

    The constrained optimal control problem depends on the solution of the complicated Hamilton-Jacobi-Bellman equation (HJBE). In this paper, a data-based off-policy reinforcement learning (RL) method is proposed, which learns the solution of the HJBE and the optimal control policy from real system data. One important feature of the off-policy RL is that its policy evaluation can be realized with data generated by other behavior policies, not necessarily the target policy, which solves the insufficient exploration problem. The convergence of the off-policy RL is proved by demonstrating its equivalence to the successive approximation approach. Its implementation procedure is based on the actor-critic neural networks structure, where the function approximation is conducted with linearly independent basis functions. Subsequently, the convergence of the implementation procedure with function approximation is also proved. Finally, its effectiveness is verified through computer simulations. Copyright © 2015 Elsevier Ltd. All rights reserved.
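
    The key idea, evaluating a target policy from data generated by a different behavior policy, can be sketched compactly in discrete time with a quadratic basis (an LSTD-Q-style estimator). The scalar system, policies, and basis below are toy choices and do not reproduce the paper's continuous-time HJB formulation.

```python
import numpy as np

rng = np.random.default_rng(3)
a, b, gamma, k = 0.9, 0.5, 0.95, 0.6

def feat(x, u):
    return np.array([x * x, x * u, u * u])   # linearly independent basis

A = np.zeros((3, 3)); c = np.zeros(3)
x = 1.0
for _ in range(5000):
    u = -0.4 * x + 0.5 * rng.normal()        # behavior policy (exploratory)
    r = -(x ** 2 + u ** 2)
    x_next = a * x + b * u
    u_next = -k * x_next                     # action the *target* policy would take
    phi, phi_next = feat(x, u), feat(x_next, u_next)
    A += np.outer(phi, phi - gamma * phi_next)
    c += phi * r
    x = float(np.clip(x_next, -5, 5))

w = np.linalg.solve(A, c)                    # Q(x,u) ~ w . [x^2, xu, u^2]
print("estimated Q-function weights:", w)
```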

  12. Safety critical application of fuzzy control

    International Nuclear Information System (INIS)

    Schildt, G.H.

    1995-01-01

    After an introduction to safety terminology, a short description of fuzzy logic is given. A possible controller structure for safety-critical applications of fuzzy controllers is then described. The following items are discussed: configuration of fuzzy controllers and design aspects such as fuzzification, inference strategies, defuzzification, and types of membership functions. As an example, a typical fuzzy rule set is presented, and the real-time behaviour of fuzzy controllers is addressed in particular. An example of fuzzy control for temperature regulation within a nuclear reactor is presented, together with the membership functions and inference strategy of such a fuzzy controller. (author). 4 refs, 17 figs
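
    A typical fuzzy rule set of the kind the paper presents can be written down directly as data plus max-min inference. The miniature PD-type controller below uses three labels per input instead of the usual five or seven to stay short; all numeric ranges and rule consequents are invented and are not the paper's reactor rules.

```python
def tri(x, a, b, c):
    """Triangular membership with corners a < b < c (scalar version)."""
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def fuzzify(x, spread):
    """Membership of x in Negative / Zero / Positive sets."""
    return {"N": tri(x, -2 * spread, -spread, 0),
            "Z": tri(x, -spread, 0, spread),
            "P": tri(x, 0, spread, 2 * spread)}

# Rule table: (error label, d_error label) -> crisp control action.
RULES = {("N", "N"): +1.0, ("N", "Z"): +0.5, ("N", "P"): 0.0,
         ("Z", "N"): +0.5, ("Z", "Z"): 0.0,  ("Z", "P"): -0.5,
         ("P", "N"): 0.0,  ("P", "Z"): -0.5, ("P", "P"): -1.0}

def control(e, de):
    me, mde = fuzzify(e, 5.0), fuzzify(de, 1.0)
    num = den = 0.0
    for (le, lde), action in RULES.items():
        w = min(me[le], mde[lde])       # min for the AND of the two premises
        num += w * action; den += w
    return num / den if den else 0.0    # weighted-average defuzzification

print(control(-3.0, 0.2))   # temperature below setpoint -> positive heating action
```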

  13. Radiation protection and fuzzy set theory

    International Nuclear Information System (INIS)

    Nishiwaki, Y.

    1993-01-01

    In radiation protection we encounter a variety of sources of uncertainty which are due to fuzziness in our cognition or perception of objects. For systematic treatment of this type of uncertainty, the concepts of fuzzy sets or fuzzy measures can be applied to construct system models that take into consideration both subjective (intrinsic) and objective (extrinsic) fuzziness. The theory of fuzzy sets and fuzzy measures is still at a developing stage, but its concepts may be applied to various problems of subjective perception of risk, nuclear safety, and radiation protection, as well as to problems of the man-machine interface and human factors engineering or ergonomics

  14. A new approach to self-organizing fuzzy polynomial neural networks guided by genetic optimization

    International Nuclear Information System (INIS)

    Oh, Sung-Kwun; Pedrycz, Witold

    2005-01-01

    In this study, we introduce a new topology of Fuzzy Polynomial Neural Networks (FPNN) that is based on a genetically optimized multilayer perceptron with fuzzy polynomial neurons (FPNs) and discuss its comprehensive design methodology. The underlying methodology involves mechanisms of genetic optimization, especially genetic algorithms (GAs). Let us recall that the design of 'conventional' FPNNs uses an extended Group Method of Data Handling (GMDH), exploits a fixed fuzzy inference type at each FPN of the FPNN, and considers a fixed number of input nodes at the FPNs (or nodes) in each layer. The proposed FPNN gives rise to a structurally optimized network and comes with a substantial level of flexibility compared to conventional FPNNs. The structural optimization is realized via GAs, whereas the parametric optimization proceeds with standard least-squares-based learning. Through this consecutive process of structural and parametric optimization, an optimized and flexible fuzzy neural network is generated dynamically. The performance of the proposed gFPNN is quantified through experiments on standard data already used in fuzzy modeling. The results reveal the superiority of the proposed networks over existing fuzzy and neural models
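
    The structural search itself reduces to a standard GA loop: encode candidate structures as bitstrings, select by fitness, recombine, and mutate. The toy below uses a faked fitness (reward for selecting 'useful' inputs minus a complexity penalty) where the paper would train and score an FPN sub-network; everything here is illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
N_BITS, POP, GENS = 8, 20, 40
USEFUL = np.array([1, 1, 0, 1, 0, 0, 0, 0])    # pretend only inputs 0, 1, 3 matter

def fitness(bits):
    hits = np.sum(bits & USEFUL)               # reward selecting useful inputs
    cost = 0.1 * bits.sum()                    # penalize structural complexity
    return hits - cost

pop = rng.integers(0, 2, (POP, N_BITS))
for gen in range(GENS):
    fit = np.array([fitness(ind) for ind in pop])
    # Tournament selection of parents.
    idx = np.array([max(rng.choice(POP, 2, replace=False), key=lambda i: fit[i])
                    for _ in range(POP)])
    parents = pop[idx]
    # One-point crossover on consecutive pairs.
    children = parents.copy()
    cuts = rng.integers(1, N_BITS, POP // 2)
    for k, c in enumerate(cuts):
        children[2 * k, c:], children[2 * k + 1, c:] = parents[2 * k + 1, c:], parents[2 * k, c:]
    # Bit-flip mutation.
    flips = rng.random(children.shape) < 0.02
    pop = children ^ flips

print("best structure:", pop[np.argmax([fitness(ind) for ind in pop])])
```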

  15. Fuzziness and fuzzy modelling in Bulgaria's energy policy decision-making dilemma

    International Nuclear Information System (INIS)

    Wang Xingquan

    2006-01-01

    The decision complexity resulting from imprecision in decision variables and parameters, a major difficulty for conventional decision analysis methods, can be relevantly analysed and modelled by fuzzy logic. Bulgaria's nuclear policy decision-making process involves exactly this kind of complexity, with imprecision in its stakeholders, criteria, measurement, and so on. Given the suitability of fuzzy logic in this case, this article offers a concrete fuzzy paradigm including delimitation of the decision space, quantification of imprecise variables, and, of course, parameterisation. (author)

  16. Vicarious Reinforcement In Rhesus Macaques (Macaca mulatta)

    Directory of Open Access Journals (Sweden)

    Steve W. C. Chang

    2011-03-01

    Full Text Available What happens to others profoundly influences our own behavior. Such other-regarding outcomes can drive observational learning, as well as motivate cooperation, charity, empathy, and even spite. Vicarious reinforcement may serve as one of the critical mechanisms mediating the influence of other-regarding outcomes on behavior and decision-making in groups. Here we show that rhesus macaques spontaneously derive vicarious reinforcement from observing rewards given to another monkey, and that this reinforcement can motivate them to subsequently deliver or withhold rewards from the other animal. We exploited Pavlovian and instrumental conditioning to associate rewards to self (M1) and/or rewards to another monkey (M2) with visual cues. M1s made more errors in the instrumental trials when cues predicted reward to M2 compared to when cues predicted reward to M1, but made even more errors when cues predicted reward to no one. In subsequent preference tests between pairs of conditioned cues, M1s preferred cues paired with reward to M2 over cues paired with reward to no one. By contrast, M1s preferred cues paired with reward to self over cues paired with reward to both monkeys simultaneously. Rates of attention to M2 strongly predicted the strength and valence of vicarious reinforcement. These patterns of behavior, which were absent in nonsocial control trials, are consistent with vicarious reinforcement based upon sensitivity to observed, or counterfactual, outcomes with respect to another individual. Vicarious reward may play a critical role in shaping cooperation and competition, as well as motivating observational learning and group coordination in rhesus macaques, much as it does in humans. We propose that vicarious reinforcement signals mediate these behaviors via homologous neural circuits involved in reinforcement learning and decision-making.

  17. Vicarious reinforcement in rhesus macaques (macaca mulatta).

    Science.gov (United States)

    Chang, Steve W C; Winecoff, Amy A; Platt, Michael L

    2011-01-01

    What happens to others profoundly influences our own behavior. Such other-regarding outcomes can drive observational learning, as well as motivate cooperation, charity, empathy, and even spite. Vicarious reinforcement may serve as one of the critical mechanisms mediating the influence of other-regarding outcomes on behavior and decision-making in groups. Here we show that rhesus macaques spontaneously derive vicarious reinforcement from observing rewards given to another monkey, and that this reinforcement can motivate them to subsequently deliver or withhold rewards from the other animal. We exploited Pavlovian and instrumental conditioning to associate rewards to self (M1) and/or rewards to another monkey (M2) with visual cues. M1s made more errors in the instrumental trials when cues predicted reward to M2 compared to when cues predicted reward to M1, but made even more errors when cues predicted reward to no one. In subsequent preference tests between pairs of conditioned cues, M1s preferred cues paired with reward to M2 over cues paired with reward to no one. By contrast, M1s preferred cues paired with reward to self over cues paired with reward to both monkeys simultaneously. Rates of attention to M2 strongly predicted the strength and valence of vicarious reinforcement. These patterns of behavior, which were absent in non-social control trials, are consistent with vicarious reinforcement based upon sensitivity to observed, or counterfactual, outcomes with respect to another individual. Vicarious reward may play a critical role in shaping cooperation and competition, as well as motivating observational learning and group coordination in rhesus macaques, much as it does in humans. We propose that vicarious reinforcement signals mediate these behaviors via homologous neural circuits involved in reinforcement learning and decision-making.

  18. A Reinforcement Learning Framework for Spiking Networks with Dynamic Synapses

    Directory of Open Access Journals (Sweden)

    Karim El-Laithy

    2011-01-01

    Full Text Available An integration of both Hebbian-based and reinforcement learning (RL) rules is presented for dynamic synapses. The proposed framework permits the Hebbian rule to update the hidden synaptic model parameters regulating the synaptic response, rather than the synaptic weights. This is performed using both the value and the sign of the temporal difference in the reward signal after each trial. Applying this framework, a spiking network with spike-timing-dependent synapses is tested on learning the exclusive-OR computation on a temporally coded basis. Reward values are calculated from the distance between the network's output spike train and a reference target train. Results show that the network is able to capture the required dynamics and that the proposed framework indeed realizes an integrated version of Hebbian and reinforcement learning. The framework is tractable and comparatively inexpensive computationally; it is applicable to a wide class of synaptic models and is not restricted to the neural representation used here. This generality, along with the reported results, supports adopting the introduced approach to benefit from biologically plausible synaptic models in a wide range of intuitive signal processing.
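
    Schematically, the coupling amounts to a Hebbian eligibility trace per synapse whose conversion into a parameter update is gated by the temporal difference in the reward. The toy loop below illustrates that gating on a crude threshold "network"; it does not reproduce the paper's dynamic-synapse model, and all constants and the task are invented.

```python
import numpy as np

rng = np.random.default_rng(5)
n_in, n_out = 10, 1
u = rng.uniform(0.1, 0.9, (n_in, n_out))   # hidden synaptic parameters (e.g. release prob.)
prev_reward = 0.0

def run_trial(u):
    pre = (rng.random(n_in) < 0.3).astype(float)     # presynaptic spikes this trial
    post = (pre @ u > 1.0).astype(float)             # crude postsynaptic response
    reward = float(post[0] == 1.0)                   # toy task: the output should fire
    return pre, post, reward

for trial in range(500):
    pre, post, reward = run_trial(u)
    elig = np.outer(pre, post)                # Hebbian coincidence (eligibility) trace
    delta_r = reward - prev_reward            # temporal difference in the reward
    u += 0.05 * delta_r * elig                # reward-gated update of hidden parameters
    u = np.clip(u, 0.0, 1.0)
    prev_reward = reward

print("mean synaptic parameter:", float(u.mean()))
```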

  19. Neural Fuzzy Inference System-Based Weather Prediction Model and Its Precipitation Predicting Experiment

    Directory of Open Access Journals (Sweden)

    Jing Lu

    2014-11-01

    Full Text Available We propose a weather prediction model based on a neural network and fuzzy inference system (NFIS-WPM), and then apply it to predict daily fuzzy precipitation given meteorological premises for testing. The model consists of two parts: the first is the "fuzzy rule-based neural network", which simulates sequential relations among fuzzy sets using an artificial neural network; the second is the "neural fuzzy inference system", which is based on the first part but can learn new fuzzy rules from previous ones according to the algorithm we propose. NFIS-WPM (High Pro) and NFIS-WPM (Ave) are improved versions of this model. The need for accurate weather prediction is apparent when considering the benefits; however, the excessive pursuit of accuracy makes some "accurate" prediction results meaningless, and numerical prediction models are often complex and time-consuming. By adapting this novel model to a precipitation prediction problem, we make the predicted precipitation more accurate and the prediction method simpler than a complex numerical forecasting model that occupies large computation resources, is time-consuming, and has a low predictive accuracy rate. Accordingly, we achieve more accurate precipitation predictions than with traditional artificial neural networks, which have low predictive accuracy.

  20. Fuzzy combination of fuzzy and switching state-feedback controllers for nonlinear systems subject to parameter uncertainties.

    Science.gov (United States)

    Lam, H K; Leung, Frank H F

    2005-04-01

    This paper presents a fuzzy controller, which involves a fuzzy combination of local fuzzy and global switching state-feedback controllers, for nonlinear systems subject to parameter uncertainties with known bounds. The nonlinear system is represented by a fuzzy combined Takagi-Sugeno-Kang model, which is a fuzzy combination of the global and local fuzzy plant models. By combining the local fuzzy and global switching state-feedback controllers using fuzzy logic techniques, the advantages of both controllers can be retained and the undesirable chattering effect introduced by the global switching state-feedback controller can be eliminated. The steady-state error introduced by the global switching state-feedback controller when a saturation function is used can also be removed. Stability conditions, which are related to the system matrices of the local and global closed-loop systems, are derived to guarantee the closed-loop system stability. An application example will be given to demonstrate the merits of the proposed approach.
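
    The blending idea is easy to see in a scalar toy: weight the local law and the switching law by a smooth membership function of the state, and the hard switch, along with its chattering near the boundary, disappears. The plant, gains, and membership below are invented; the paper works with TSK fuzzy models and derives formal stability conditions.

```python
import numpy as np

def membership_local(x, width=1.0):
    """Degree to which the state lies in the local region around the origin."""
    return np.exp(-(x / width) ** 2)

def u_local(x):
    return -2.0 * x                       # local state-feedback law

def u_global(x):
    return -3.0 * np.sign(x)              # global switching law (source of chattering)

def u_combined(x):
    m = membership_local(x)
    return m * u_local(x) + (1 - m) * u_global(x)   # fuzzy blend of the two laws

# Simulate the unstable scalar plant x' = x + u under each controller.
for name, ctrl in [("switching only", u_global), ("fuzzy combination", u_combined)]:
    x, dt = 2.0, 0.05
    for _ in range(200):
        x += dt * (x + ctrl(x))
    print(f"{name}: final |x| = {abs(x):.4f}")   # blend settles; pure switch chatters
```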