WorldWideScience

Sample records for fault tolerant systems

  1. Fault tolerant computing systems

    Fault tolerance involves the provision of strategies for error detection damage assessment, fault treatment and error recovery. A survey is given of the different sorts of strategies used in highly reliable computing systems, together with an outline of recent research on the problems of providing fault tolerance in parallel and distributed computing systems. (orig.)

  2. Fault Tolerant Control Systems

    Bøgh, S.A.

    from this study highlights requirements for a dedicated software environment for fault tolerant control systems design. The second detailed study addressed the detection of a fault event and determination of the failed component. A variety of algorithms were compared, based on two fault scenarios in...... faults, but also that the research field still misses a systematic approach to handle realistic problems such as low sampling rate and nonlinear characteristics of the system. The thesis contributed with methods to detect both faults and specifically with a novel algorithm for the actuator fault...... detection that is superior in terms of performance and complexity to the other algorithms in the comparative study....

  3. Soft Computing Approaches To Fault Tolerant Systems

    Neeraj Prakash Srivastava

    2014-05-01

    Full Text Available We present in this paper as an introduction to soft computing techniques for fault tolerant systems and the terminology with different ways of achieving fault tolerance. The paper focuses on the problem of fault tolerance using soft computing techniques. The fundamentals of soft computing approaches and its type with introduction of fault tolerance are discussed. The main objective is to show how to implement soft computing approaches for fault detection, isolation and identification. The paper contains details about soft computing application with an application of wireless sensor network as fault tolerant system.

  4. Synthesis of Fault-Tolerant Embedded Systems

    Eles, Petru; Izosimov, Viacheslav; Pop, Paul; Peng, Zebo

    This work addresses the issue of design optimization for fault- tolerant hard real-time systems. In particular, our focus is on the handling of transient faults using both checkpointing with rollback recovery and active replication. Fault tolerant schedules are generated based on a conditional...... process graph representation. The formulated system synthesis approaches decide the assignment of fault-tolerance policies to processes, the optimal placement of checkpoints and the mapping of processes to processors, such that multiple transient faults are tolerated, transparency requirements are...

  5. Software fault tolerance in computer operating systems

    Iyer, Ravishankar K.; Lee, Inhwan

    1994-01-01

    This chapter provides data and analysis of the dependability and fault tolerance for three operating systems: the Tandem/GUARDIAN fault-tolerant system, the VAX/VMS distributed system, and the IBM/MVS system. Based on measurements from these systems, basic software error characteristics are investigated. Fault tolerance in operating systems resulting from the use of process pairs and recovery routines is evaluated. Two levels of models are developed to analyze error and recovery processes inside an operating system and interactions among multiple instances of an operating system running in a distributed environment. The measurements show that the use of process pairs in Tandem systems, which was originally intended for tolerating hardware faults, allows the system to tolerate about 70% of defects in system software that result in processor failures. The loose coupling between processors which results in the backup execution (the processor state and the sequence of events occurring) being different from the original execution is a major reason for the measured software fault tolerance. The IBM/MVS system fault tolerance almost doubles when recovery routines are provided, in comparison to the case in which no recovery routines are available. However, even when recovery routines are provided, there is almost a 50% chance of system failure when critical system jobs are involved.

  6. Energy-efficient fault-tolerant systems

    Mathew, Jimson; Pradhan, Dhiraj K

    2013-01-01

    This book describes the state-of-the-art in energy efficient, fault-tolerant embedded systems. It covers the entire product lifecycle of electronic systems design, analysis and testing and includes discussion of both circuit and system-level approaches. Readers will be enabled to meet the conflicting design objectives of energy efficiency and fault-tolerance for reliability, given the up-to-date techniques presented.

  7. Fault Tolerant Quantum Filtering and Fault Detection for Quantum Systems

    Gao, Qing; Dong, Daoyi; Petersen, Ian R.

    2015-01-01

    This paper aims to determine the fault tolerant quantum filter and fault detection equation for a class of open quantum systems coupled to a laser field that is subject to stochastic faults. In order to analyze this class of open quantum systems, we propose a quantum-classical Bayesian inference method based on the definition of a so-called quantum-classical conditional expectation. It is shown that the proposed Bayesian inference approach provides a convenient tool to simultaneously derive t...

  8. Middleware Fault Tolerance Support for the BOSS Embedded Operating System

    Afonso, Francisco; Carlos A. Silva; Montenegro, Sérgio; Tavares, Adriano

    2006-01-01

    Critical embedded systems need a dependable operating system and application. Despite all efforts to prevent and remove faults in system development, residual software faults usually persist. Therefore, critical systems need some sort of fault tolerance to deal with these faults and also with hardware faults at operation time. This work proposes fault-tolerant support mechanisms for the BOSS embedded operating system, based on the application of proven fault tolerance strategies by middlew...

  9. Fault-tolerant parallel processing system

    Harper, R.E.; Lala, J.H.

    1990-03-06

    This patent describes a fault tolerant processing system for providing processing operations, while tolerating f failures in the execution thereof. It comprises: at least (3f + 1) fault containment regions. Each of the regions includes a plurality of processors; network means connected to the processors and to the network means of the others of the fault containment regions; groups of one or more processors being configured to form redundant processing sites at least one of the groups having (2f + 1) processors, each of the processors of a group being included in a different one of the fault containment regions. Each network means of a fault containment region includes means for providing communication operations between the network means and the network means of the others of the fault containment regions, each of the network means being connected to each other network means by at lest (2f + 1) disjoint communication paths, a minimum of (f + 1) rounds of communication being provided among the network means of the fault containment regions in the execution of a the processing operation; and means for synchronizing the communication operations of the network means with the communications operations of the network means of the other fault containment regions.

  10. Software engineering of fault tolerant systems

    Pelliccione, P; Muccini, Henry

    2007-01-01

    In architecting dependable systems, what is required to improve the overall system robustness is fault tolerance. Many methods have been proposed to this end, the solutions are usually considered late during the design and implementation phases of the software life-cycle (e.g., Java and Windows NT exception handling), thus reducing the effectiveness error and fault handling. Since the system design typically models only normal behaviour of the system while ignoring exceptional ones, the implementation of the system is unable to handle abnormal events. Consequently, the system may fail in unexp

  11. Fault tolerant architecture for artificial olfactory system

    In this paper, to cover and mask the faults that occur in the sensing unit of an artificial olfactory system, a novel architecture is offered. The proposed architecture is able to tolerate failures in the sensors of the array and the faults that occur are masked. The proposed architecture for extracting the correct results from the output of the sensors can provide the quality of service for generated data from the sensor array. The results of various evaluations and analysis proved that the proposed architecture has acceptable performance in comparison with the classic form of the sensor array in gas identification. According to the results, achieving a high odor discrimination based on the suggested architecture is possible. (paper)

  12. Method and system for environmentally adaptive fault tolerant computing

    Copenhaver, Jason L. (Inventor); Jeremy, Ramos (Inventor); Wolfe, Jeffrey M. (Inventor); Brenner, Dean (Inventor)

    2010-01-01

    A method and system for adapting fault tolerant computing. The method includes the steps of measuring an environmental condition representative of an environment. An on-board processing system's sensitivity to the measured environmental condition is measured. It is determined whether to reconfigure a fault tolerance of the on-board processing system based in part on the measured environmental condition. The fault tolerance of the on-board processing system may be reconfigured based in part on the measured environmental condition.

  13. Fault tolerant aggregation for power system services

    Kosek, Anna Magdalena; Gehrke, Oliver; Kullmann, Daniel

    2013-01-01

    Exploiting the flexibility in distributed energy resources (DER) is seen as an important contribution to allow high penetrations of renewable generation in electrical power systems. However, the present control infrastructure in power systems is not well suited for the integration of a very large...... number of small units. A common approach is to aggregate a portfolio of such units together and expose them to the power system as a single large virtual unit. In order to realize the vision of a Smart Grid, concepts for flexible, resilient and reliable aggregation infrastructures are required. This...... paper presents such a concept while focusing on the aspect of resilience and fault tolerance. The proposed concept makes use of a multi-level election algorithm to transparently manage the addition, removal, failure and reorganization of units. It has been implemented and tested as a proof-of-concept on...

  14. Fault-tolerant Actuator System for Electrical Steering of Vehicles

    Sørensen, Jesper Sandberg; Blanke, Mogens

    2006-01-01

    Being critical to the safety of vehicles, the steering system is required to maintain the vehicles ability to steer until it is brought to halt, should a fault occur. With electrical steering becoming a cost-effective candidate for electrical powered vehicles, a fault-tolerant architecture is...... needed that meets this requirement. This paper studies the fault-tolerance properties of an electrical steering system. It presents a fault-tolerant architecture where a dedicated AC motor design used in conjunction with cheap voltage measurements can ensure detection of all relevant faults in the...

  15. Fault detection and fault-tolerant control for nonlinear systems

    Li, Linlin

    2016-01-01

    Linlin Li addresses the analysis and design issues of observer-based FD and FTC for nonlinear systems. The author analyses the existence conditions for the nonlinear observer-based FD systems to gain a deeper insight into the construction of FD systems. Aided by the T-S fuzzy technique, she recommends different design schemes, among them the L_inf/L_2 type of FD systems. The derived FD and FTC approaches are verified by two benchmark processes. Contents Overview of FD and FTC Technology Configuration of Nonlinear Observer-Based FD Systems Design of L2 nonlinear Observer-Based FD Systems Design of Weighted Fuzzy Observer-Based FD Systems FTC Configurations for Nonlinear Systems< Application to Benchmark Processes Target Groups Researchers and students in the field of engineering with a focus on fault diagnosis and fault-tolerant control fields The Author Dr. Linlin Li completed her dissertation under the supervision of Prof. Steven X. Ding at the Faculty of Engineering, University of Duisburg-Essen, Germany...

  16. Abstractions for Fault-Tolerant Distributed System Verification

    Pike, Lee S.; Maddalon, Jeffrey M.; Miner, Paul S.; Geser, Alfons

    2004-01-01

    Four kinds of abstraction for the design and analysis of fault tolerant distributed systems are discussed. These abstractions concern system messages, faults, fault masking voting, and communication. The abstractions are formalized in higher order logic, and are intended to facilitate specifying and verifying such systems in higher order theorem provers.

  17. Fault tolerant control of systems with saturations

    Niemann, Hans Henrik

    2013-01-01

    This paper presents framework for fault tolerant controllers (FTC) that includes input saturation. The controller architecture known from FTC is based on the Youla-Jabr-Bongiorno-Kucera (YJBK) parameterization is extended to handle input saturation. Applying this controller architecture in connec...

  18. From fault classification to fault tolerance for multi-agent systems

    Potiron, Katia; Taillibert, Patrick

    2013-01-01

    Faults are a concern for Multi-Agent Systems (MAS) designers, especially if the MAS are built for industrial or military use because there must be some guarantee of dependability. Some fault classification exists for classical systems, and is used to define faults. When dependability is at stake, such fault classification may be used from the beginning of the system's conception to define fault classes and specify which types of faults are expected. Thus, one may want to use fault classification for MAS; however, From Fault Classification to Fault Tolerance for Multi-Agent Systems argues that

  19. Software fault tolerance

    Kazinov, Tofik Hasanaga; Mostafa, Jalilian Shahrukh

    2009-01-01

    Because of our present inability to produce errorfree software, software fault tolerance is and will contiune to be an important consideration in software system. The root cause of software design errors in the complexity of the systems. This paper surveys various software fault tolerance techniquest and methodologies. They are two gpoups: Single version and Multi version software fault tolerance techniques. It is expected that software fault tolerance research will benefit from this research...

  20. Fault Tolerance in Distributed Systems using Fused State Machines

    Balasubramanian, Bharath; Garg, Vijay K

    2013-01-01

    Replication is a standard technique for fault tolerance in distributed systems modeled as deterministic finite state machines (DFSMs or machines). To correct f crash or f/2 Byzantine faults among n different machines, replication requires nf additional backup machines. We present a solution called fusion that requires just f additional backup machines. First, we build a framework for fault tolerance in DFSMs based on the notion of Hamming distances. We introduce the concept of an (f,m)-fusion...

  1. Sensor Fault Tolerant Generic Model Control for Nonlinear Systems

    2000-01-01

    A modified Strong Tracking Filter (STF) is used to develop a new approach to sensor fault tolerant control. Generic Model Control (GMC) is used to control the nonlinear process while the process runs normally because of its robust control performance. If a fault occurs in the sensor, a sensor bias vector is then introduced to the output equation of the process model. The sensor bias vector is estimated on-line during every control period using the STF. The estimated sensor bias vector is used to develop a fault detection mechanism to supervise the sensors. When a sensor fault occurs, the conventional GMC is switched to a fault tolerant control scheme, which is, in essence, a state estimation and output prediction based GMC. The laboratory experimental results on a three-tank system demonstrate the effectiveness of the proposed Sensor Fault Tolerant Generic Model Control (SFTGMC) approach.

  2. Validated Fault Tolerant Architectures for Space Station

    Lala, Jaynarayan H.

    1990-01-01

    Viewgraphs on validated fault tolerant architectures for space station are presented. Topics covered include: fault tolerance approach; advanced information processing system (AIPS); and fault tolerant parallel processor (FTPP).

  3. Guaranteed Cost Fault-Tolerant Control for Networked Control Systems with Sensor Faults

    Qixin Zhu; Kaihong Lu; Guangming Xie; Yonghong Zhu

    2015-01-01

    For the large scale and complicated structure of networked control systems, time-varying sensor faults could inevitably occur when the system works in a poor environment. Guaranteed cost fault-tolerant controller for the new networked control systems with time-varying sensor faults is designed in this paper. Based on time delay of the network transmission environment, the networked control systems with sensor faults are modeled as a discrete-time system with uncertain parameters. And the mode...

  4. Fault tolerant hypercube computer system architecture

    Madan, Herb S. (Inventor); Chow, Edward (Inventor)

    1989-01-01

    A fault-tolerant multiprocessor computer system of the hypercube type comprising a hierarchy of computers of like kind which can be functionally substituted for one another as necessary is disclosed. Communication between the working nodes is via one communications network while communications between the working nodes and watch dog nodes and load balancing nodes higher in the structure is via another communications network separate from the first. A typical branch of the hierarchy reporting to a master node or host computer comprises, a plurality of first computing nodes; a first network of message conducting paths for interconnecting the first computing nodes as a hypercube. The first network provides a path for message transfer between the first computing nodes; a first watch dog node; and a second network of message connecting paths for connecting the first computing nodes to the first watch dog node independent from the first network, the second network provides an independent path for test message and reconfiguration affecting transfers between the first computing nodes and the first switch watch dog node. There is additionally, a plurality of second computing nodes; a third network of message conducting paths for interconnecting the second computing nodes as a hypercube. The third network provides a path for message transfer between the second computing nodes; a fourth network of message conducting paths for connecting the second computing nodes to the first watch dog node independent from the third network. The fourth network provides an independent path for test message and reconfiguration affecting transfers between the second computing nodes and the first watch dog node; and a first multiplexer disposed between the first watch dog node and the second and fourth networks for allowing the first watch dog node to selectively communicate with individual ones of the computing nodes through the second and fourth networks; as well as, a second watch dog node

  5. Fault tolerant filtering and fault detection for quantum systems driven by fields in single photon states

    Gao, Qing; Dong, Daoyi; Petersen, Ian R.; Rabitz, Herschel

    2016-06-01

    The purpose of this paper is to solve the fault tolerant filtering and fault detection problem for a class of open quantum systems driven by a continuous-mode bosonic input field in single photon states when the systems are subject to stochastic faults. Optimal estimates of both the system observables and the fault process are simultaneously calculated and characterized by a set of coupled recursive quantum stochastic differential equations.

  6. Measurement and analysis of operating system fault tolerance

    Lee, I.; Tang, D.; Iyer, R. K.

    1992-01-01

    This paper demonstrates a methodology to model and evaluate the fault tolerance characteristics of operational software. The methodology is illustrated through case studies on three different operating systems: the Tandem GUARDIAN fault-tolerant system, the VAX/VMS distributed system, and the IBM/MVS system. Measurements are made on these systems for substantial periods to collect software error and recovery data. In addition to investigating basic dependability characteristics such as major software problems and error distributions, we develop two levels of models to describe error and recovery processes inside an operating system and on multiple instances of an operating system running in a distributed environment. Based on the models, reward analysis is conducted to evaluate the loss of service due to software errors and the effect of the fault-tolerance techniques implemented in the systems. Software error correlation in multicomputer systems is also investigated.

  7. Fault Tolerant Services for Safe In-Car Embedded Systems

    Navet, Nicolas; Simonot-Lion, Françoise

    2005-01-01

    Due to the increasing criticality of the functions in terms of safety, embedded automotive systems must now respect stringent dependability constraints despite the faults that may occur in a very harsh environment. In a context where critical functions are distributed over the network, the communication system plays a major role. First, we discuss the main services and functionalities that a communication system should offer for easying the design of fault-tolerant applications in the automot...

  8. Fault-tolerant Control Systems-An Introductory Overview

    Jin Jiang

    2005-01-01

    This paper presents an introductory overview on the development of fault-tolerant control systems. For this reason, the paper is written in a tutorial fashion to summarize some of the important results in this subject area deliberately without going into details in any of them. However, key references are provided from which interested readers can obtain more detailed information on a particular subject. It is necessary to mention that, throughout this paper, no efforts were made to provide an exhaustive coverage on the subject matter. In fact, it is far from it. The paper merely represents the view and experience of its author. It can very well be that some important issues or topics were left out unintentionally. If that is the case, the author sincerely apologizes in advance.After a brief account of fault-tolerant control systems, particularly on the original motivations, and the concept of redundancies, the paper reviews the development of fault-tolerant control systems with highlights to several important issues from a historical perspective. The general approaches to fault-tolerant control has been divided into passive, active, and hybrid approaches. The analysis techniques for active fault-tolerant control systems are also discussed. Practical applications of faulttolerant control are highlighted from a practical and industrial perspective. Finally, some critical issues in this area are discussed as open problems for future research/development in this emerging field.

  9. Data-driven design of fault diagnosis and fault-tolerant control systems

    Ding, Steven X

    2014-01-01

    Data-driven Design of Fault Diagnosis and Fault-tolerant Control Systems presents basic statistical process monitoring, fault diagnosis, and control methods, and introduces advanced data-driven schemes for the design of fault diagnosis and fault-tolerant control systems catering to the needs of dynamic industrial processes. With ever increasing demands for reliability, availability and safety in technical processes and assets, process monitoring and fault-tolerance have become important issues surrounding the design of automatic control systems. This text shows the reader how, thanks to the rapid development of information technology, key techniques of data-driven and statistical process monitoring and control can now become widely used in industrial practice to address these issues. To allow for self-contained study and facilitate implementation in real applications, important mathematical and control theoretical knowledge and tools are included in this book. Major schemes are presented in algorithm form and...

  10. Guaranteed Cost Fault-Tolerant Control for Networked Control Systems with Sensor Faults

    Qixin Zhu

    2015-01-01

    Full Text Available For the large scale and complicated structure of networked control systems, time-varying sensor faults could inevitably occur when the system works in a poor environment. Guaranteed cost fault-tolerant controller for the new networked control systems with time-varying sensor faults is designed in this paper. Based on time delay of the network transmission environment, the networked control systems with sensor faults are modeled as a discrete-time system with uncertain parameters. And the model of networked control systems is related to the boundary values of the sensor faults. Moreover, using Lyapunov stability theory and linear matrix inequalities (LMI approach, the guaranteed cost fault-tolerant controller is verified to render such networked control systems asymptotically stable. Finally, simulations are included to demonstrate the theoretical results.

  11. Design of fault tolerant control system for steam generator using

    Kim, Myung Ki; Seo, Mi Ro [Korea Electric Power Research Institute, Taejon (Korea, Republic of)

    1998-12-31

    A controller and sensor fault tolerant system for a steam generator is designed with fuzzy logic. A structure of the proposed fault tolerant redundant system is composed of a supervisor and two fuzzy weighting modulators. A supervisor alternatively checks a controller and a sensor induced performances to identify which part, a controller or a sensor, is faulty. In order to analyze controller induced performance both an error and a change in error of the system output are chosen as fuzzy variables. The fuzzy logic for a sensor induced performance uses two variables : a deviation between two sensor outputs and its frequency. Fuzzy weighting modulator generates an output signal compensated for faulty input signal. Simulations show that the proposed fault tolerant control scheme for a steam generator regulates well water level by suppressing fault effect of either controllers or sensors. Therefore through duplicating sensors and controllers with the proposed fault tolerant scheme, both a reliability of a steam generator control and sensor system and that of a power plant increase even more. 2 refs., 9 figs., 1 tab. (Author)

  12. A Fault-tolerant Development Methodology for Industrial Control Systems

    Izadi-Zamanabadi, Roozbeh; Thybo, C.

    2004-01-01

    Developing advanced detection schemes is not the lone factor for obtaining a successful fault diagnosis performance. Acquiring significant achievements in applying Fault-tolerance in industrial development requires that fault diagnosis and recovery schemes are developed in a consistent and...

  13. Fault-tolerant design of picture archiving and communication systems

    Reliability is perhaps the most important attribute of a PACS. Any downtime of the system may seriously affect patient care. This paper describes fault-tolerant measures employed in the design of a hospital-wide PACS. Six fault-tolerant measures have been implemented: hardware redundance (networks and archives), data-base backups, monitoring routines for local host processes and network status; uninterruptible power supplied, structured software design techniques, and in-service training of all radiology technologists. A PACS consisting of 13 acquisition nodes, two optical archiving nodes, two data-base server nodes, and five workstation nodes has been developed

  14. Development and application of diagnostic systems to achieve fault tolerance

    Much work is currently being done to develop and apply diagnostic systems that are tolerant to faulted conditions in the process being monitored and in the sensors that measure the critical parameters associated with the process. A fault-tolerant diagnostic system based on state-determination, pattern-recognition techniques is currently undergoing testing and evaluation in certain applications at the EBR-II reactor. Testing and operational experience with the system to date has shown a high degree of tolerance to sensor failures, while being sensitive to very slight changes in the plant operational state. This paper briefly mentions related work being done by others, and describes in more detail the pattern-recognition system and the results of the testing and operational experience with the system at EBR-II. 9 refs., 10 figs

  15. Active Fault Tolerant Control of Livestock Stable Ventilation System

    Gholami, Mehdi

    2011-01-01

    of the hybrid model are estimated by a recursive estimation algorithm, the Extended Kalman Filter (EKF), using experimental data which was provided by an equipped laboratory. Two methods for active fault diagnosis are proposed. The AFD methods excite the system by injecting a so-called excitation...... degraded performance even in the faulty case. In this thesis, we have designed such controllers for climate control systems for livestock buildings in three steps: Deriving a model for the climate control system of a pig-stable. Designing a active fault diagnosis (AFD) algorithm for different kinds of...... fault. Designing a fault tolerant control scheme for the climate control system. In the first step, a conceptual multi-zone model for climate control of a live-stock building is derived. The model is a nonlinear hybrid model. Hybrid systems contain both discrete and continuous components. The parameters...

  16. Fault tolerance control for proton exchange membrane fuel cell systems

    Wu, Xiaojuan; Zhou, Boyang

    2016-08-01

    Fault diagnosis and controller design are two important aspects to improve proton exchange membrane fuel cell (PEMFC) system durability. However, the two tasks are often separately performed. For example, many pressure and voltage controllers have been successfully built. However, these controllers are designed based on the normal operation of PEMFC. When PEMFC faces problems such as flooding or membrane drying, a controller with a specific design must be used. This paper proposes a unique scheme that simultaneously performs fault diagnosis and tolerance control for the PEMFC system. The proposed control strategy consists of a fault diagnosis, a reconfiguration mechanism and adjustable controllers. Using a back-propagation neural network, a model-based fault detection method is employed to detect the PEMFC current fault type (flooding, membrane drying or normal). According to the diagnosis results, the reconfiguration mechanism determines which backup controllers to be selected. Three nonlinear controllers based on feedback linearization approaches are respectively built to adjust the voltage and pressure difference in the case of normal, membrane drying and flooding conditions. The simulation results illustrate that the proposed fault tolerance control strategy can track the voltage and keep the pressure difference at desired levels in faulty conditions.

  17. Fault-tolerant clock synchronization validation methodology. [in computer systems

    Butler, Ricky W.; Palumbo, Daniel L.; Johnson, Sally C.

    1987-01-01

    A validation method for the synchronization subsystem of a fault-tolerant computer system is presented. The high reliability requirement of flight-crucial systems precludes the use of most traditional validation methods. The method presented utilizes formal design proof to uncover design and coding errors and experimentation to validate the assumptions of the design proof. The experimental method is described and illustrated by validating the clock synchronization system of the Software Implemented Fault Tolerance computer. The design proof of the algorithm includes a theorem that defines the maximum skew between any two nonfaulty clocks in the system in terms of specific system parameters. Most of these parameters are deterministic. One crucial parameter is the upper bound on the clock read error, which is stochastic. The probability that this upper bound is exceeded is calculated from data obtained by the measurement of system parameters. This probability is then included in a detailed reliability analysis of the system.

  18. Reliable, fault tolerant control systems for nuclear generating stations

    Two operational features of CANDU Nuclear Power Stations provide for high plant availability. First, the plant re-fuels on-line, thereby eliminating the need for periodic and lengthy refuelling 'outages'. Second, the all plants are controlled by real-time computer systems. Later plants are also protected using real-time computer systems. In the past twenty years, the control systems now operating in 21 plants have achieved an availability of 99.8%, making significant contributions to high CANDU plant capacity factors. This paper describes some of the features that ensure the high degree of system fault tolerance and hence high plant availability. The emphasis will be placed on the fault tolerant features of the computer systems included in the latest reactor design - the CANDU 3 (450MWe). (author)

  19. Testing Virtual Reconfigurable Circuit Designed For A Fault Tolerant System

    P. N. Kumar

    2007-01-01

    Full Text Available This research describes about the testing of virtual reconfigurable circuit (VRC designed and implemented for a fault tolerant system which averages the (three sensor inputs. The circuits that are to be tested are those which are successfully evolved in this system under different situations such as (i all the three sensors are faultless (ii one of the input sensor fails as open (iii sensors fails as short circuit. The objective of this research is to test the desired optimal circuits evolved by decoding the configuration bit streams. The logic simulation tool used to perform fault simulation is AUSIM (Auburn University Simulator.

  20. Testing Virtual Reconfigurable Circuit Designed For A Fault Tolerant System

    P. N. Kumar; S. Anandhi; M. Elancheralathan; J. R.P. Perinbam

    2007-01-01

    This research describes about the testing of virtual reconfigurable circuit (VRC) designed and implemented for a fault tolerant system which averages the (three) sensor inputs. The circuits that are to be tested are those which are successfully evolved in this system under different situations such as (i) all the three sensors are faultless (ii) one of the input sensor fails as open (iii) sensors fails as short circuit. The objective of this research is to test the desired optimal circuits ev...

  1. Performance-Oriented Fault Tolerance in Computing Systems

    Borodin, D

    2010-01-01

    In this dissertation we address the overhead reduction of fault tolerance (FT) techniques. Due to technology trends such as decreasing feature sizes and lowering voltage levels, FT is becoming increasingly important in modern computing systems. FT techniques are based on some form of redundancy. It can be space redundancy (additional hardware), time redundancy (multiple executions), and/or information redundancy (additional verification information). This redundancy significantly increases th...

  2. Development and Evaluation of Fault-Tolerant Flight Control Systems

    Song, Yong D.; Gupta, Kajal (Technical Monitor)

    2004-01-01

    The research is concerned with developing a new approach to enhancing fault tolerance of flight control systems. The original motivation for fault-tolerant control comes from the need for safe operation of control elements (e.g. actuators) in the event of hardware failures in high reliability systems. One such example is modem space vehicle subjected to actuator/sensor impairments. A major task in flight control is to revise the control policy to balance impairment detectability and to achieve sufficient robustness. This involves careful selection of types and parameters of the controllers and the impairment detecting filters used. It also involves a decision, upon the identification of some failures, on whether and how a control reconfiguration should take place in order to maintain a certain system performance level. In this project new flight dynamic model under uncertain flight conditions is considered, in which the effects of both ramp and jump faults are reflected. Stabilization algorithms based on neural network and adaptive method are derived. The control algorithms are shown to be effective in dealing with uncertain dynamics due to external disturbances and unpredictable faults. The overall strategy is easy to set up and the computation involved is much less as compared with other strategies. Computer simulation software is developed. A serious of simulation studies have been conducted with varying flight conditions.

  3. Diagnosis and Fault-Tolerant Control for Thruster-Assisted Position Mooring System

    Nguyen, Trong Dong; Blanke, Mogens; Sørensen, Asgeir

    2007-01-01

    Development of fault-tolerant control systems is crucial to maintain safe operation of o®shore installations. The objective of this paper is to develop a fault- tolerant control for thruster-assisted position mooring (PM) system with faults occurring in the mooring lines. Faults in line's pretens......Development of fault-tolerant control systems is crucial to maintain safe operation of o®shore installations. The objective of this paper is to develop a fault- tolerant control for thruster-assisted position mooring (PM) system with faults occurring in the mooring lines. Faults in line......'s pretension or line breaks will degrade the performance of the positioning of the vessel. Faults will be detected and isolated through a fault diagnosis procedure. When faults are detected, they can be accommodated through the control action in which only parameter of the controlled plant has to be updated to...

  4. Sliding mode based fault detection, reconstruction and fault tolerant control scheme for motor systems.

    Mekki, Hemza; Benzineb, Omar; Boukhetala, Djamel; Tadjine, Mohamed; Benbouzid, Mohamed

    2015-07-01

    The fault-tolerant control problem belongs to the domain of complex control systems in which inter-control-disciplinary information and expertise are required. This paper proposes an improved faults detection, reconstruction and fault-tolerant control (FTC) scheme for motor systems (MS) with typical faults. For this purpose, a sliding mode controller (SMC) with an integral sliding surface is adopted. This controller can make the output of system to track the desired position reference signal in finite-time and obtain a better dynamic response and anti-disturbance performance. But this controller cannot deal directly with total system failures. However an appropriate combination of the adopted SMC and sliding mode observer (SMO), later it is designed to on-line detect and reconstruct the faults and also to give a sensorless control strategy which can achieve tolerance to a wide class of total additive failures. The closed-loop stability is proved, using the Lyapunov stability theory. Simulation results in healthy and faulty conditions confirm the reliability of the suggested framework. PMID:25747198

  5. Fault Tolerant Software: a Multi Agent System Solution

    Caponetti, Fabio; Bergantino, Nicola; Longhi, Sauro

    2009-01-01

    Development of high dependable systems remains a labour intensive task. This paper explores recent advances on the adaptation of the software agent architecture for control application while looking to dependability issues. Multiple agent systems theory will be reviewed giving methods to supervise...... it. Software ageing is shown to be the most common problem and rejuvenation its counteract. The paper will show how an agent population can be monitored, faulty agents isolated and reloaded in a healthy state, hence rejuvenated. The aim is to propose an architecture as basis for the design of control...... software able to tolerate faults and residual bugs without the need of maintenance stops....

  6. Fault-Tolerant Relative Navigation System (RNS) for Docking Project

    National Aeronautics and Space Administration — A method is propsed to develop a sensor fusion process for blending GPS/IMU/EO data for fault tolerant rendezvous and docking of spacecraft. The methodology takes...

  7. Diagnostic software and fault tolerant microprocessor based system architectures

    In numerous industrial applications including power generation, the availability of electronic systems to perform the tasks assigned has become a major issue. At the same time, the functional complexity of these systems has increased enormously. Fortunately, the arrival of cost effective microprocessor based hardware has given the system designer a cadre of techniques to ensure the desired degree of system integrity and availability. These include: dynamic redundancy, isolation, functional diversity, built-in self-tests, embedded test subsystems, communications, error checking and error correcting codes, etc. The choice among the available techniques is generally heuristic and depends greatly on the structure of major components and systems external to the electronic system itself as well as the postulated faults and their relative frequency. Indiscriminate use of these techniques will inevitably increase cost and reduce maintainability while actually reducing system availability and reliability. The issues and the application of these techniques are discussed by describing recent examples of fault tolerant microprocessor based system architectures which include the Plant Safety Monitoring System, the EAGLE-21 Process Protection System and the Advanced Rod Position Indication System for pressurized water reactors. Each of these systems utilize unique internal architectures that address the reliability, availability, and the communications issues while improving maintainability and man-machine interfaces

  8. Fault Tolerant Feedback Control

    Stoustrup, Jakob; Niemann, H.

    2001-01-01

    An architecture for fault tolerant feedback controllers based on the Youla parameterization is suggested. It is shown that the Youla parameterization will give a residual vector directly in connection with the fault diagnosis part of the fault tolerant feedback controller. It turns out that there...... is a separation be-tween the feedback controller and the fault tolerant part. The closed loop feedback properties are handled by the nominal feedback controller and the fault tolerant part is handled by the design of the Youla parameter. The design of the fault tolerant part will not affect the...... design of the nominal feedback con-troller....

  9. Distributed Evaluation Functions for Fault Tolerant Multi-Rover Systems

    Agogino, Adrian; Turner, Kagan

    2005-01-01

    The ability to evolve fault tolerant control strategies for large collections of agents is critical to the successful application of evolutionary strategies to domains where failures are common. Furthermore, while evolutionary algorithms have been highly successful in discovering single-agent control strategies, extending such algorithms to multiagent domains has proven to be difficult. In this paper we present a method for shaping evaluation functions for agents that provide control strategies that both are tolerant to different types of failures and lead to coordinated behavior in a multi-agent setting. This method neither relies of a centralized strategy (susceptible to single point of failures) nor a distributed strategy where each agent uses a system wide evaluation function (severe credit assignment problem). In a multi-rover problem, we show that agents using our agent-specific evaluation perform up to 500% better than agents using the system evaluation. In addition we show that agents are still able to maintain a high level of performance when up to 60% of the agents fail due to actuator, communication or controller faults.

  10. Disturbance observer based fault estimation and dynamic output feedback fault tolerant control for fuzzy systems with local nonlinear models.

    Han, Jian; Zhang, Huaguang; Wang, Yingchun; Liu, Yang

    2015-11-01

    This paper addresses the problems of fault estimation (FE) and fault tolerant control (FTC) for fuzzy systems with local nonlinear models, external disturbances, sensor and actuator faults, simultaneously. Disturbance observer (DO) and FE observer are designed, simultaneously. Compared with the existing results, the proposed observer is with a wider application range. Using the estimation information, a novel fuzzy dynamic output feedback fault tolerant controller (DOFFTC) is designed. The controller can be used for the fuzzy systems with unmeasurable local nonlinear models, mismatched input disturbances, and measurement output affecting by sensor faults and disturbances. At last, the simulation shows the effectiveness of the proposed methods. PMID:26456728

  11. Adaptive fault-tolerant control of linear systems with actuator saturation and L2-disturbances

    Wei GUAN; Guanghong YANG

    2009-01-01

    This paper studies the problem of designing adaptive fault-tolerant H-infinity controllers for linear timeinvariant systems with actuator saturation. The disturbance tolerance ability of the closed-loop system is measured by an optimal index. The notion of an adaptive H-infinity performance index is proposed to describe the disturbance attenuation performances of closed-loop systems. New methods for designing indirect adaptive fault-tolerant controllers via state feedback are presented for actuator fault compensations. Based on the on-line estimation of eventual faults, the adaptive fault-tolerant controller parameters are updated automatically to compensate for the fault effects on systems. The designs are developed in the framework of the linear matrix inequality (LMI) approach, which can guarantee the disturbance tolerance ability and adaptive H-infinity performances of closed-loop systems in the cases of actuator saturation and actuator failures. An example is given to illustrate the efficiency of the design method.

  12. Modeling the Fault Tolerant Capability of a Flight Control System: An Exercise in SCR Specification

    Alexander, Chris; Cortellessa, Vittorio; DelGobbo, Diego; Mili, Ali; Napolitano, Marcello

    2000-01-01

    In life-critical and mission-critical applications, it is important to make provisions for a wide range of contingencies, by providing means for fault tolerance. In this paper, we discuss the specification of a flight control system that is fault tolerant with respect to sensor faults. Redundancy is provided by analytical relations that hold between sensor readings; depending on the conditions, this redundancy can be used to detect, identify and accommodate sensor faults.

  13. Validation Methods for Fault-Tolerant avionics and control systems, working group meeting 1

    1979-01-01

    The proceedings of the first working group meeting on validation methods for fault tolerant computer design are presented. The state of the art in fault tolerant computer validation was examined in order to provide a framework for future discussions concerning research issues for the validation of fault tolerant avionics and flight control systems. The development of positions concerning critical aspects of the validation process are given.

  14. Fault-tolerant for Electric Vehicles Drive System Sensor Failure

    Zhang Liwei

    2013-10-01

    Full Text Available When EV failure happens, it needs to take some fault-tolerant method to ensure people’s safety. When the current sensor and speed sensor are out of work, the software fault-tolerant control algorithm switching strategy can be used. This paper has done theoretical analysis of the rotor field-oriented vectoe control algorithm into the open loop constant V/F control algorithm, and the phase angle compensation method is used to reduce the shock of current and torque, and simulation is done in MATLAB/Simulink.    

  15. Reliability modeling of digital component in plant protection system with various fault-tolerant techniques

    Kim, Bo Gyung, E-mail: bogyungkim@kaist.ac.kr [Department of Nuclear and Quantum Engineering, Korea Advanced Institute of Science and Technology, 291 Daehak-ro, Yuseong-gu, Daejeon 305-701 (Korea, Republic of); Kang, Hyun Gook [Department of Nuclear and Quantum Engineering, Korea Advanced Institute of Science and Technology, 291 Daehak-ro, Yuseong-gu, Daejeon 305-701 (Korea, Republic of); Department of Nuclear Engineering, Khalifa University of Science, Technology and Research, Abu Dhabi (United Arab Emirates); Kim, Hee Eun [Department of Nuclear and Quantum Engineering, Korea Advanced Institute of Science and Technology, 291 Daehak-ro, Yuseong-gu, Daejeon 305-701 (Korea, Republic of); Lee, Seung Jun [Integrated Safety Assessment Team, Korea Atomic Energy Research Institute, 1045, Daedeok-daero, Daejeon 305-353 (Korea, Republic of); Seong, Poong Hyun [Department of Nuclear and Quantum Engineering, Korea Advanced Institute of Science and Technology, 291 Daehak-ro, Yuseong-gu, Daejeon 305-701 (Korea, Republic of)

    2013-12-15

    Highlights: • Integrated fault coverage is introduced for reflecting characteristics of fault-tolerant techniques in the reliability model of digital protection system in NPPs. • The integrated fault coverage considers the process of fault-tolerant techniques from detection to fail-safe generation process. • With integrated fault coverage, the unavailability of repairable component of DPS can be estimated. • The new developed reliability model can reveal the effects of fault-tolerant techniques explicitly for risk analysis. • The reliability model makes it possible to confirm changes of unavailability according to variation of diverse factors. - Abstract: With the improvement of digital technologies, digital protection system (DPS) has more multiple sophisticated fault-tolerant techniques (FTTs), in order to increase fault detection and to help the system safely perform the required functions in spite of the possible presence of faults. Fault detection coverage is vital factor of FTT in reliability. However, the fault detection coverage is insufficient to reflect the effects of various FTTs in reliability model. To reflect characteristics of FTTs in the reliability model, integrated fault coverage is introduced. The integrated fault coverage considers the process of FTT from detection to fail-safe generation process. A model has been developed to estimate the unavailability of repairable component of DPS using the integrated fault coverage. The new developed model can quantify unavailability according to a diversity of conditions. Sensitivity studies are performed to ascertain important variables which affect the integrated fault coverage and unavailability.

  16. Reliability modeling of digital component in plant protection system with various fault-tolerant techniques

    Highlights: • Integrated fault coverage is introduced for reflecting characteristics of fault-tolerant techniques in the reliability model of digital protection system in NPPs. • The integrated fault coverage considers the process of fault-tolerant techniques from detection to fail-safe generation process. • With integrated fault coverage, the unavailability of repairable component of DPS can be estimated. • The new developed reliability model can reveal the effects of fault-tolerant techniques explicitly for risk analysis. • The reliability model makes it possible to confirm changes of unavailability according to variation of diverse factors. - Abstract: With the improvement of digital technologies, digital protection system (DPS) has more multiple sophisticated fault-tolerant techniques (FTTs), in order to increase fault detection and to help the system safely perform the required functions in spite of the possible presence of faults. Fault detection coverage is vital factor of FTT in reliability. However, the fault detection coverage is insufficient to reflect the effects of various FTTs in reliability model. To reflect characteristics of FTTs in the reliability model, integrated fault coverage is introduced. The integrated fault coverage considers the process of FTT from detection to fail-safe generation process. A model has been developed to estimate the unavailability of repairable component of DPS using the integrated fault coverage. The new developed model can quantify unavailability according to a diversity of conditions. Sensitivity studies are performed to ascertain important variables which affect the integrated fault coverage and unavailability

  17. Fault diagnosis and fault-tolerant control strategies for non-linear systems analytical and soft computing approaches

    Witczak, Marcin

    2014-01-01

      This book presents selected fault diagnosis and fault-tolerant control strategies for non-linear systems in a unified framework. In particular, starting from advanced state estimation strategies up to modern soft computing, the discrete-time description of the system is employed Part I of the book presents original research results regarding state estimation and neural networks for robust fault diagnosis. Part II is devoted to the presentation of integrated fault diagnosis and fault-tolerant systems. It starts with a general fault-tolerant control framework, which is then extended by introducing robustness with respect to various uncertainties. Finally, it is shown how to implement the proposed framework for fuzzy systems described by the well-known Takagi–Sugeno models. This research monograph is intended for researchers, engineers, and advanced postgraduate students in control and electrical engineering, computer science,as well as mechanical and chemical engineering.

  18. Evaluation of digital fault-tolerant architectures for nuclear power plant control systems

    Four fault tolerant architectures were evaluated for their potential reliability in service as control systems of nuclear power plants. The reliability analyses showed that human- and software-related common cause failures and single points of failure in the output modules are dominant contributors to system unreliability. The four architectures are triple-modular-redundant (TMR), both synchronous and asynchronous, and also dual synchronous and asynchronous. The evaluation includes a review of design features, an analysis of the importance of coverage, and reliability analyses of fault tolerant systems. An advantage of fault-tolerant controllers over those not fault tolerant, is that fault-tolerant controllers continue to function after the occurrence of most single hardware faults. However, most fault-tolerant controllers have single hardware components that will cause system failure, almost all controllers have single points of failure in software, and all are subject to common cause failures. Reliability analyses based on data from several industries that have fault-tolerant controllers were used to estimate the mean-time-between-failures of fault-tolerant controllers and to predict those failures modes that may be important in nuclear power plants. 7 refs., 4 tabs

  19. Ship Propulsion System as a Benchmark for Fault-Tolerant Control

    Izadi-Zamanabadi, Roozbeh; Blanke, M.

    1998-01-01

    -tolerant control is a fairly new area. The paper presents a ship propulsion system as a benchmark that should be useful as a platform for development of new ideas and comparison of methods. The benchmark has two main elements. One is development of efficient FDI algorithms, the other is analysis and implementation......Fault-tolerant control combines fault detection and isolation techniques with supervisory control to achieve autonomous accommodation of faults before they develop into failures. While fault detection and isolation (FDI) methods have matured during the past decade the extension to fault...... of autonomous fault accommodation. A benchmark kit can be obtained from the authors....

  20. Boolean Logic with Fault Tolerant Coding

    Alagoz, B. Baykant

    2009-01-01

    Error detectable and error correctable coding in Hamming space was researched to discover possible fault tolerant coding constellations, which can implement Boolean logic with fault tolerant property. Basic logic operators of the Boolean algebra were developed to apply fault tolerant coding in the logic circuits. It was shown that application of three-bit fault tolerant codes have provided the digital system skill of auto-recovery without need for designing additional-fault tolerance mechanisms.

  1. Industrial Cost-Benefit Assessment for Fault-tolerant Control Systems

    Thybo, C.; Blanke, M.

    1998-01-01

    Economic aspects are decisive for industrial acceptance of research concepts including the promising ideas in fault tolerant control. Fault tolerance is the ability of a system to detect, isolate and accommodate a fault, such that simple faults in a sub-system do not develop into failures at a...... system level. In a design phase for an industrial system, possibilities span from fail safe design where any single point failure is accommodated by hardware, over fault-tolerant design where selected faults are handled without extra hardware, to fault-ignorant design where no extra precaution is taken...... against failure. The paper describes the assessments needed to find the right path for new industrial designs. The economic decisions in the design phase are discussed: cost of different failures, profits associated with available benefits, investments needed for development and life-time support. The...

  2. Software fault tolerance

    Strigini, Lorenzo

    1990-01-01

    Software design faults are a cause of major concern, and their relative importance is growing as techniques for tolerating hardware faults gain wider acceptance. The application of fault tolerance to design faults is both increasing, in particular in some life-critical applications, and controversial, due to the imperfect state of knowledge about it. This paper surveys the existing applications and research results, to help the reader form an initial picture of the existing possibilities, and...

  3. Passive Fault Tolerant Control of Piecewise Affine Systems Based on H Infinity Synthesis

    Gholami, Mehdi; Cocquempot, vincent; Schiøler, Henrik; Bak, Thomas

    2011-01-01

    In this paper we design a passive fault tolerant controller against actuator faults for discretetime piecewise affine (PWA) systems. By using dissipativity theory and H analysis, fault tolerant state feedback controller design is expressed as a set of Linear Matrix Inequalities (LMIs). In the cur...... current paper, the PWA system switches not only due to the state but also due to the control input. The method is applied on a large scale livestock ventilation model....

  4. Synthesis of Fault-Tolerant Embedded Systems with Checkpointing and Replication

    Izosimov, Viacheslav; Pop, Paul; Eles, Petru; Peng, Zebo

    We present an approach to the synthesis of fault-tolerant hard real-time systems for safety-critical applications. We use checkpointing with rollback recovery and active replication for tolerating transient faults. Processes are statically scheduled and communications are performed using the time...

  5. Energy-Aware Synthesis of Fault-Tolerant Schedules for Real-Time Distributed Embedded Systems

    Poulsen, Kåre Harbo; Pop, Paul; Izosimov, Viacheslav

    This paper presents a design optimisation tool for distributed embedded real-time systems that 1) decides mapping, fault-tolerance policy and generates a fault-tolerant schedule, 2) is targeted for hard real-time, 3) has hard reliability goal, 4) generates static schedule for processes and messages...

  6. High-Intensity Radiated Field Fault-Injection Experiment for a Fault-Tolerant Distributed Communication System

    Yates, Amy M.; Torres-Pomales, Wilfredo; Malekpour, Mahyar R.; Gonzalez, Oscar R.; Gray, W. Steven

    2010-01-01

    Safety-critical distributed flight control systems require robustness in the presence of faults. In general, these systems consist of a number of input/output (I/O) and computation nodes interacting through a fault-tolerant data communication system. The communication system transfers sensor data and control commands and can handle most faults under typical operating conditions. However, the performance of the closed-loop system can be adversely affected as a result of operating in harsh environments. In particular, High-Intensity Radiated Field (HIRF) environments have the potential to cause random fault manifestations in individual avionic components and to generate simultaneous system-wide communication faults that overwhelm existing fault management mechanisms. This paper presents the design of an experiment conducted at the NASA Langley Research Center's HIRF Laboratory to statistically characterize the faults that a HIRF environment can trigger on a single node of a distributed flight control system.

  7. Towards fault-tolerant decision support systems for ship operator guidance

    Nielsen, Ulrik Dam; Lajic, Zoran; Jensen, Jørgen Juncher

    2012-01-01

    Fault detection and isolation are very important elements in the design of fault-tolerant decision support systems for ship operator guidance. This study outlines remedies that can be applied for fault diagnosis, when the ship responses are assumed to be linear in the wave excitation. A novel num...

  8. Fault-Tolerant Control of the Road Wheel Subsystem in a Steer-By-Wire System

    Bing Zheng

    2008-01-01

    Full Text Available This paper describes a fault-tolerant steer-by-wire road wheel control system. With dual motor and dual microcontroller architecture, this system has the capability to tolerate single-point failures without degrading the control system performance. The arbitration bus, mechanical arrangement of motors, and the developed control algorithm allow the system to reconfigure itself automatically in the event of a single-point fault, and assure a smooth reconfiguration process. Both simulation and experimental results illustrate the effectiveness of the proposed fault-tolerant control system.

  9. Advanced information processing system - Status report. [for fault tolerant and damage tolerant data processing for aerospace vehicles

    Brock, L. D.; Lala, J.

    1986-01-01

    The Advanced Information Processing System (AIPS) is designed to provide a fault tolerant and damage tolerant data processing architecture for a broad range of aerospace vehicles. The AIPS architecture also has attributes to enhance system effectiveness such as graceful degradation, growth and change tolerance, integrability, etc. Two key building blocks being developed by the AIPS program are a fault and damage tolerant processor and communication network. A proof-of-concept system is now being built and will be tested to demonstrate the validity and performance of the AIPS concepts.

  10. Efficient Fault Tree Analysis of Complex Fault Tolerant Multiple-Phased Systems

    MO Yuchang; LIU Hongwei; YANG Xiaozong

    2007-01-01

    Fault tolerant multiple phased systems (FTMPS), i.e., systems whose critical components are independently replicated and whose operational life can be partitioned in a set of disjoint periods, are called "phases". Because of their deployment in critical applications, their reliability analysis is a task of primary relevance to validate the designs. Fault tree analysis based on binary decision diagram (BDD) is one of the most commonly used techniques for FTMPS reliability analysis. To utilize the technique the fault tree structure of FTMPS needs to be converted into the corresponding BDD format. Our research work shows that the system BDD generation algorithms presented in the literature are too inefficient to be used for industrial complex FTPMS because of the problems, such as variable ordering and combination of large BDDs. This paper presents a more efficient approach consisting of a flatting pre-processing technique, a proved efficient ordering heuristic and a bottom-up generation algorithm. The approach tries to combine share-variable BDDs by complex combination operation firstly and then combine no-share-variable BDDs using simple combination operation, thus to alvoid the intensive computations caused by large BDD combination operations. An example FTMPS is analyzed to illustrate the advantages of our approach.

  11. Fault-Tolerant Control using Adaptive Time-Frequency Method in Bearing Fault Detection for DFIG Wind Energy System

    Suratsavadee Koonlaboon KORKUA

    2015-02-01

    Full Text Available With the advances in power electronic technology, doubly-fed induction generators (DFIG have increasingly drawn the interest of the wind turbine industry. To ensure the reliable operation and power quality of wind power systems, the fault-tolerant control for DFIG is studied in this paper. The fault-tolerant controller is designed to maintain an acceptable level of performance during bearing fault conditions. Based on measured motor current data, an adaptive statistical time-frequency method is then used to detect the fault occurrence in the system; the controller then compensates for faulty conditions. The feature vectors, including frequency components located in the neighborhood of the characteristic fault frequencies, are first extracted and then used to estimate the next sampling stator side current, in order to better perform the current control. Early fault detection, isolation and successful reconfiguration would be very beneficial in a wind energy conversion system. The feasibility of this fault-tolerant controller has been proven by means of mathematical modeling and digital simulation based on Matlab/Simulink. The simulation results of the generator output show the effectiveness of the proposed fault-tolerant controller.

  12. Design and Assessment of a Multiple Sensor Fault Tolerant Robust Control System

    J. Chen

    2008-03-01

    Full Text Available This paper presents an enhanced robust control design structure to realise fault tolerance towards sensor faults suitable for multi-input-multi-output (MIMO systems implementation. The proposed design permits fault detection and controller elements to be designed with considerations to stability and robustness towards uncertainties besides multiple faults environment on a common mathematical platform. This framework can also cater to systems requiring fast responses. A design example is illustrated with a fast, multivariable and unstable system, that is, the double inverted pendulum system. Results indicate the potential of this design framework to handle fast systems with multiple sensor faults.

  13. Fault-Tolerant Design of Spaceborne Mass Memory System

    张宇宁; 常亮; 杨根庆; 李华旺

    2010-01-01

    A fault-tolerant spaceborne mass memory architecture is presented based on entirely commercial-off-theshelf components.The highly modularized and scalable memory kernel supports the hierarchical design and is well suited to redundancy structure.Error correcting code(ECC) and periodical scrubbing are used to deal with bit errors induced by single event upset.For 8-bit wide devices, the parallel Reed Solomon(10, 8) can perform coder/decoder calculations in one clock cycle, achieving a data rate of several Gb/...

  14. A Reliable Fault-Tolerant Scheduling Algorithm for Real Time Embedded Systems

    Arar, Chafik; Kalla, Hamoudi; Kalla, Salim; Riadh, Hocine

    2013-01-01

    In this paper, we propose a fault-tolerant scheduling for realtime embedded systems. Our scheduling algorithm is dedicated to multibus heterogeneous architectures, which take as input a given system description and a given fault hypothesis. It is based on a data fragmentation and passive redundancy, which allow fast fault detection/retransmission and efficient use of buses. Our scheduling approach consist of a list scheduling heuristic based on a Global System Failure Rate (GSFR). In order to...

  15. Fault recovery characteristics of the fault tolerant multi-processor

    Padilla, Peter A.

    1990-01-01

    The fault handling performance of the fault tolerant multiprocessor (FTMP) was investigated. Fault handling errors detected during fault injection experiments were characterized. In these fault injection experiments, the FTMP disabled a working unit instead of the faulted unit once every 500 faults, on the average. System design weaknesses allow active faults to exercise a part of the fault management software that handles byzantine or lying faults. It is pointed out that these weak areas in the FTMP's design increase the probability that, for any hardware fault, a good LRU (line replaceable unit) is mistakenly disabled by the fault management software. It is concluded that fault injection can help detect and analyze the behavior of a system in the ultra-reliable regime. Although fault injection testing cannot be exhaustive, it has been demonstrated that it provides a unique capability to unmask problems and to characterize the behavior of a fault-tolerant system.

  16. Award ER25750: Coordinated Infrastructure for Fault Tolerance Systems Indiana University Final Report

    Lumsdaine, Andrew

    2013-03-08

    The main purpose of the Coordinated Infrastructure for Fault Tolerance in Systems initiative has been to conduct research with a goal of providing end-to-end fault tolerance on a systemwide basis for applications and other system software. While fault tolerance has been an integral part of most high-performance computing (HPC) system software developed over the past decade, it has been treated mostly as a collection of isolated stovepipes. Visibility and response to faults has typically been limited to the particular hardware and software subsystems in which they are initially observed. Little fault information is shared across subsystems, allowing little flexibility or control on a system-wide basis, making it practically impossible to provide cohesive end-to-end fault tolerance in support of scientific applications. As an example, consider faults such as communication link failures that can be seen by a network library but are not directly visible to the job scheduler, or consider faults related to node failures that can be detected by system monitoring software but are not inherently visible to the resource manager. If information about such faults could be shared by the network libraries or monitoring software, then other system software, such as a resource manager or job scheduler, could ensure that failed nodes or failed network links were excluded from further job allocations and that further diagnosis could be performed. As a founding member and one of the lead developers of the Open MPI project, our efforts over the course of this project have been focused on making Open MPI more robust to failures by supporting various fault tolerance techniques, and using fault information exchange and coordination between MPI and the HPC system software stack from the application, numeric libraries, and programming language runtime to other common system components such as jobs schedulers, resource managers, and monitoring tools.

  17. Computationally efficient and numerically stable reliability bounds for repairable fault-tolerant systems

    Carrasco, Juan A.

    2002-01-01

    The transient analysis of large continuous time Markov reliability models of repairable fault-tolerant systems is computationally expensive due to model stiffness. In this paper, we develop and analyze a method to compute bounds for a measure defined on a particular, but quite wide, class of continuous time Markov models, encompassing both exact and bounding continuous time Markov unreliability models of fault-tolerant systems. The method is numerically stable and computes the bounds with wel...

  18. Evaluation of digital fault-tolerant architectures for nuclear power plant control systems

    This paper reports on four fault-tolerant architectures that were evaluated for their potential reliability in service as control systems of nuclear power plants. The reliability analyses showed that human- and software-related common cause failures and single points of failure in the output modules are dominant contributors to system unreliability. The four architectures are triple-modular-redundant, both synchronous and asynchronous, and also dual synchronous and asynchronous. The evaluation includes a review of design features, an analysis of the importance of coverage, and reliability analyses of fault-tolerant systems. Reliability analyses based on data from several industries that have fault-tolerant controllers were used to estimate the mean-time-between-failures of fault-tolerant controllers and to predict those failure modes that may be important in nuclear power plants

  19. Fault-tolerant flight control system combining expert system and analytical redundancy concepts

    Handelman, Dave

    1987-01-01

    This research involves the development of a knowledge-based fault-tolerant flight control system. A software architecture is presented that integrates quantitative analytical redundancy techniques and heuristic expert system problem solving concepts for the purpose of in-flight, real-time failure accommodation.

  20. Fault Tolerant Software Architectures

    Saridakis, Titos; Issarny, Valérie

    1998-01-01

    Coping explicitly with failures during the conception and the design of software development complicates significantly the designer's job. The design complexity leads to software descriptions difficult to understand, which have to undergo many simplifications until their first functioning version. To support the systematic development of complex, fault tolerant software, this paper proposes a layered framework for the analysis of the fault tolerance software properties, where the top-most lay...

  1. Active fault tolerant control of piecewise affine systems with reference tracking and input constraints

    Gholami, M.; Cocquempot, V.; Schiøler, H.; Bak, Thomas

    2014-01-01

    An active fault tolerant control (AFTC) method is proposed for discrete-time piecewise affine (PWA) systems. Only actuator faults are considered. The AFTC framework contains a supervisory scheme, which selects a suitable controller in a set of controllers such that the stability and an acceptable...

  2. Diagnosis and fault-tolerant control

    Blanke, Mogens; Lunze, Jan; Staroswiecki, Marcel

    2016-01-01

    Fault-tolerant control aims at a gradual shutdown response in automated systems when faults occur. It satisfies the industrial demand for enhanced availability and safety, in contrast to traditional reactions to faults, which bring about sudden shutdowns and loss of availability. The book presents effective model-based analysis and design methods for fault diagnosis and fault-tolerant control. Architectural and structural models are used to analyse the propagation of the fault through the process, to test the fault detectability and to find the redundancies in the process that can be used to ensure fault tolerance. It also introduces design methods suitable for diagnostic systems and fault-tolerant controllers for continuous processes that are described by analytical models of discrete-event systems represented by automata. The book is suitable for engineering students, engineers in industry and researchers who wish to get an overview of the variety of approaches to process diagnosis and fault-tolerant contro...

  3. Fault-Tolerant Consensus of Multi-Agent System With Distributed Adaptive Protocol.

    Chen, Shun; Ho, Daniel W C; Li, Lulu; Liu, Ming

    2015-10-01

    In this paper, fault-tolerant consensus in multi-agent system using distributed adaptive protocol is investigated. Firstly, distributed adaptive online updating strategies for some parameters are proposed based on local information of the network structure. Then, under the online updating parameters, a distributed adaptive protocol is developed to compensate the fault effects and the uncertainty effects in the leaderless multi-agent system. Based on the local state information of neighboring agents, a distributed updating protocol gain is developed which leads to a fully distributed continuous adaptive fault-tolerant consensus protocol design for the leaderless multi-agent system. Furthermore, a distributed fault-tolerant leader-follower consensus protocol for multi-agent system is constructed by the proposed adaptive method. Finally, a simulation example is given to illustrate the effectiveness of the theoretical analysis. PMID:25415998

  4. Fault-tolerant interconnection network and image-processing applications for the PASM parallel processing system

    The demand for very high speed data processing coupled with falling hardware costs has made large-scale parallel and distributed computer systems both desirable and feasible. Two modes of parallel processing are single instruction stream-multiple data stream (SIMD) and multiple instruction stream-multiple data stream (MIMD). PASM, a partitionable SIMD/MIMD system, is a reconfigurable multimicroprocessor system being designed for image processing and pattern recognition. An important component of these systems is the interconnection network, the mechanism for communication among the computation nodes and memories. Assuring high reliability for such complex systems is a significant task. Thus, a crucial practical aspect of an interconnection network is fault tolerance. In answer to this need, the Extra Stage Cube (ESC), a fault-tolerant, multistage cube-type interconnection network, is define. The fault tolerance of the ESC is explored for both single and multiple faults, routing tags are defined, and consideration is given to permuting data and partitioning the ESC in the presence of faults. The ESC is compared with other fault-tolerant multistage networks. Finally, reliability of the ESC and an enhanced version of it are investigated

  5. Validation Methods Research for Fault-Tolerant Avionics and Control Systems: Working Group Meeting, 2

    Gault, J. W. (Editor); Trivedi, K. S. (Editor); Clary, J. B. (Editor)

    1980-01-01

    The validation process comprises the activities required to insure the agreement of system realization with system specification. A preliminary validation methodology for fault tolerant systems documented. A general framework for a validation methodology is presented along with a set of specific tasks intended for the validation of two specimen system, SIFT and FTMP. Two major areas of research are identified. First, are those activities required to support the ongoing development of the validation process itself, and second, are those activities required to support the design, development, and understanding of fault tolerant systems.

  6. Pivotal decomposition for reliability analysis of fault tolerant control systems on unmanned aerial vehicles

    In this paper, we describe a framework to efficiently assess the reliability of fault tolerant control systems on low-cost unmanned aerial vehicles. The analysis is developed for a system consisting of a fixed number of actuators. In addition, the system includes a scheme to detect failures in individual actuators and, as a consequence, switch between different control algorithms for automatic operation of the actuators. Existing dynamic reliability analysis methods are insufficient for this class of systems because the coverage parameters for different actuator failures can be time-varying, correlated, and difficult to obtain in practice. We address these issues by combining new fault detection performance metrics with pivotal decomposition. These new metrics capture the interactions in different fault detection channels, and can be computed from stochastic models of fault detection algorithms. Our approach also decouples the high dimensional analysis problem into low dimensional sub-problems, yielding a computationally efficient analysis. Finally, we demonstrate the proposed method on a numerical example. The analysis results are also verified by Monte Carlo simulations. - Highlights: • We study fault tolerant control (FTC) systems on low-cost unmanned aerial vehicles. • We build a reliability structure model for FTC systems. • New fault detection performance metrics are integrated via pivotal decomposition. • The fault detection metrics capture the interactions in fault detection channels. • Numerical results show that FTC techniques can improve system reliability

  7. The Impact of a Fault Tolerant MPI on Scalable Systems Services and Applications

    Graham, Richard L [ORNL; Hursey, Joshua J [ORNL; Vallee, Geoffroy R [ORNL; Naughton, III, Thomas J [ORNL; Boehm, Swen [ORNL

    2012-01-01

    Exascale targeted scientific applications must be prepared for a highly concurrent computing environment where failure will be a regular event during execution. Natural and algorithm-based fault tolerance (ABFT) techniques can often manage failures more efficiently than traditional checkpoint/restart techniques alone. Central to many petascale applications is an MPI standard that lacks support for ABFT. The Run-Through Stabilization (RTS) proposal, under consideration for MPI 3, allows an application to continue execution when processes fail. The requirements of scalable, fault tolerant MPI implementations and applications will stress the capabilities of many system services. System services must evolve to efficiently support such applications and libraries in the presence of system component failures. This paper discusses how the RTS proposal impacts system services, highlighting specific requirements. Early experimentation results from Cray systems at ORNL using prototype MPI and runtime implementations are presented. Additionally, this paper outlines fault tolerance techniques targeted at leadership class applications.

  8. Application of Joint Parameter Identification and State Estimation to a Fault-Tolerant Robot System

    Sun, Zhen; Yang, Zhenyu

    The joint parameter identification and state estimation technique is applied to develop a fault-tolerant space robot system. The potential faults in the considered system are abrupt parametric faults, which indicate that some system parameters will immediately deviate from their nominal values if a......, it would further simplify the reconfigurable design task and possibly speed up the system recovery, if the system state information under the new operating circumstance can be available along with faulty parameter information. The joint parameter identification and state estimation using the combined...

  9. Energy-efficient fault tolerance in multiprocessor real-time systems

    Guo, Yifeng

    The recent progress in the multiprocessor/multicore systems has important implications for real-time system design and operation. From vehicle navigation to space applications as well as industrial control systems, the trend is to deploy multiple processors in real-time systems: systems with 4 -- 8 processors are common, and it is expected that many-core systems with dozens of processing cores will be available in near future. For such systems, in addition to general temporal requirement common for all real-time systems, two additional operational objectives are seen as critical: energy efficiency and fault tolerance. An intriguing dimension of the problem is that energy efficiency and fault tolerance are typically conflicting objectives, due to the fact that tolerating faults (e.g., permanent/transient) often requires extra resources with high energy consumption potential. In this dissertation, various techniques for energy-efficient fault tolerance in multiprocessor real-time systems have been investigated. First, the Reliability-Aware Power Management (RAPM) framework, which can preserve the system reliability with respect to transient faults when Dynamic Voltage Scaling (DVS) is applied for energy savings, is extended to support parallel real-time applications with precedence constraints. Next, the traditional Standby-Sparing (SS) technique for dual processor systems, which takes both transient and permanent faults into consideration while saving energy, is generalized to support multiprocessor systems with arbitrary number of identical processors. Observing the inefficient usage of slack time in the SS technique, a Preference-Oriented Scheduling Framework is designed to address the problem where tasks are given preferences for being executed as soon as possible (ASAP) or as late as possible (ALAP). A preference-oriented earliest deadline (POED) scheduler is proposed and its application in multiprocessor systems for energy-efficient fault tolerance is

  10. Preface of the special issue on Advances in Control and Fault-Tolerant Systems

    Korbicz, Jozef; Maquin, Didier; THEILLIOL, DIDIER

    2012-01-01

    Today's automatic control systems are of high degrees of integration, complexity, embedding and networking of heterogeneous entities. This trend is driven by the industrial needs for achieving new technical performance and meeting additional performance demands. A most critical and important issue surrounding the design and operation of complex automatic systems is the application of Fault Detection and Isolation and Fault-Tolerant Control (FDI/FTC) technology, aiming at guaranteeing high sys...

  11. Science Letters: Low-cost fault tolerance in evolvable multiprocessor systems: a graceful degradation approach

    Shervin VAKILI; Sied Mehdi FAKHRAIE; Siamak MOHAMMADI; Ali AHMADI

    2009-01-01

    The evolvable multiprocessor (EvoMP), as a novel multiprocessor system-on-chip (MPSoC) machine with evolvable task decomposition and scheduling, claims a major feature of low-cost and efficient fault tolerance. Non-centralized control and adaptive distribution of the program among the available processors are two major capabilities of this platform, which remarkably help to achieve an efficient fault tolerance scheme. This letter presents the operational as well as architectural details of this fault tolerance scheme. In this method, when a processor becomes faulty, it will be eliminated of contribution in program execution in remaining run-time. This method also utilizes dynamic rescheduling capability of the system to achieve the maximum possible efficiency after processor reduction. The results confirm the efficiency and remarkable advantages of the proposed approach over common redundancy based techniques in similar systems.

  12. Novel fault tolerant modular system architecture for I and C applications

    Novel fault tolerant 3U modular system architecture has been developed for safety related and safety critical I and C systems of the reactor. Design innovatively utilizes simplest multi-drop serial bus called Inter-Integrated Circuits (I2C) Bus for system operation with simplicity, fault tolerance and online maintainability (hot swap). I2C bus failure modes analysis was done and system design was hardened for possible failure modes. System backplane uses only passive components, dual redundant I2C buses, data consistency checks and geographical addressing scheme to tackle bus lock ups/stuck buses and bit flips in data transactions. Dual CPU active/standby redundancy architecture with hot swap implements tolerance for CPU software stuck up conditions and hardware faults. System cards implement hot swap for online maintainability, power supply fault containment, communication buses fault containment and I/O channel to channel isolation and independency. Typical applications for pure hardwired (without real time software) Core Temperature Monitoring System for FBRs, as a Universal Signal Conditioning System for safety related I and C systems and as a complete control system for non nuclear safety systems have also been discussed. (author)

  13. Design of fault tolerant control system for steam generator using fuzzy logic

    A controller and sensor fault tolerant system for a steam generator is designed with fuzzy logic. A structure of the proposed fault tolerant redundant system is composed of a supervisor and two fuzzy weighting modulators. A supervisor alternatively checks a controller and a sensor induced performances to identify which part, a controller or a sensor, is faulty. In order to analyze controller induced performance both an error and a change in error of the system output are chosen as fuzzy variables. The fuzzy logic for a sensor induced performance uses two variables : a deviation between two sensor outputs and its frequency. Fuzzy weighting modulator generates an output signal compensated for faulty input signal. Simulations show that the proposed fault tolerant control scheme for a stem generator regulates well water level by suppressing fault effect of either controllers or sensors. Therefore through duplicating sensors and controllers with the proposed fault tolerant scheme, both a reliability of a steam generator control and sensor system and that of a power plant increase even more

  14. A real-time fault-tolerant scheduling algorithm with low dependability cost in on-board computer system

    WANG Pei-dong; WEI Zhen-hua

    2008-01-01

    To make the on-board computer system more dependable and real-time in a satellite, an algorithm of the fault-tolerant scheduling in the on-board computer system with high priority recovery is proposed in this paper. This algorithm can schedule the on-board fault-tolerant tasks in real time. Due to the use of dependability cost, the overhead of scheduling the fault-tolerant tasks can be reduced. The mechanism of the high priority recovery will improve the response to recovery tasks. The fault-tolerant scheduling model is presented simulation results validate the correctness and feasibility of the proposed algorithm.

  15. Fault tolerance control of phase current in permanent magnet synchronous motor control system

    Chen, Kele; Chen, Ke; Chen, Xinglong; Li, Jinying

    2014-08-01

    As the Photoelectric tracking system develops from earth based platform to all kinds of moving platform such as plane based, ship based, car based, satellite based and missile based, the fault tolerance control system of phase current sensor is studied in order to detect and control of failure of phase current sensor on a moving platform. By using a DC-link current sensor and the switching state of the corresponding SVPWM inverter, the failure detection and fault control of three phase current sensor is achieved. Under such conditions as one failure, two failures and three failures, fault tolerance is able to be controlled. The reason why under the method, there exists error between fault tolerance control and actual phase current, is analyzed, and solution to weaken the error is provided. The experiment based on permanent magnet synchronous motor system is conducted, and the method is proven to be capable of detecting the failure of phase current sensor effectively and precisely, and controlling the fault tolerance simultaneously. With this method, even though all the three phase current sensors malfunction, the moving platform can still work by reconstructing the phase current of the motor.

  16. Scheduling and Optimization of Fault-Tolerant Embedded Systems with Transparency/Performance Trade-Offs

    Izosimov, Viacheslav; Pop, Paul; Eles, Petru; Peng, Zebo

    2012-01-01

    In this article, we propose a strategy for the synthesis of fault-tolerant schedules and for the mapping of fault-tolerant applications. Our techniques handle transparency/performance trade-offs and use the faultoccurrence information to reduce the overhead due to fault tolerance. Processes and...

  17. The fault-tolerant multiprocessor computer

    Smith, T.B. III; Lala, J.H.; Goldberg, J.; Kautz, W.H.; Melliar-Smith, P.M.; Green, M.W.; Levitt, K.N.; Schwartz, R.L.; Weinstock, C.B.; Palumbo, D.; Butler, R.W.

    1986-01-01

    This book presents studies of two fault-tolerant computer systems designed to meet the extreme reliability requirements for safety- critical functions in advanced NASA vehicles , plus a study of potential architectures for future flight control fault-tolerant systems, which might succeed the current generation of computers. While it is understood that these studies were done for NASA, they also have practical commercial applicability. The fault-tolerant multiprocessor (FTMP) architecture is a high reliability computer concept. The basic organization of the FTMP is that of a general purpose homogeneous multiprocessor. Three processors operate on a shared system (memory and l/O) bus. Replication and tight synchronization of all elements and hardware voting are employed to detect and correct any single fault. Reconfiguration is then employed to ''repair'' a fault. Multiple faults may be tolerated as a sequence of single faults with repair between fault occurrences.

  18. Fault tolerant distributed real time computer systems for I and C of prototype fast breeder reactor

    Highlights: • Architecture of distributed real time computer system (DRTCS) used in I and C of PFBR is explained. • Fault tolerant (hot standby) architecture, fault detection and switch over are detailed. • Scaled down model was used to study functional and performance requirements of DRTCS. • Quality of service parameters for scaled down model was critically studied. - Abstract: Prototype fast breeder reactor (PFBR) is in the advanced stage of construction at Kalpakkam, India. Three-tier architecture is adopted for instrumentation and control (I and C) of PFBR wherein bottom tier consists of real time computer (RTC) systems, middle tier consists of process computers and top tier constitutes of display stations. These RTC systems are geographically distributed and networked together with process computers and display stations. Hot standby architecture comprising of dual redundant RTC systems with switch over logic system is deployed in order to achieve fault tolerance. Fault tolerant dual redundant network connectivity is provided in each RTC system and TCP/IP protocol is selected for network communication. In order to assess the performance of distributed RTC systems, scaled down model was developed with 9 representative systems and nearly 15% of I and C signals of PFBR were connected and monitored. Functional and performance testing were carried out for each RTC system and the fault tolerant characteristics were studied by creating various faults into the system and observed the performance. Various quality of service parameters like connection establishment delay, priority parameter, transit delay, throughput, residual error ratio, etc., are critically studied for the network

  19. Fault Injection and Monitoring Capability for a Fault-Tolerant Distributed Computation System

    Torres-Pomales, Wilfredo; Yates, Amy M.; Malekpour, Mahyar R.

    2010-01-01

    The Configurable Fault-Injection and Monitoring System (CFIMS) is intended for the experimental characterization of effects caused by a variety of adverse conditions on a distributed computation system running flight control applications. A product of research collaboration between NASA Langley Research Center and Old Dominion University, the CFIMS is the main research tool for generating actual fault response data with which to develop and validate analytical performance models and design methodologies for the mitigation of fault effects in distributed flight control systems. Rather than a fixed design solution, the CFIMS is a flexible system that enables the systematic exploration of the problem space and can be adapted to meet the evolving needs of the research. The CFIMS has the capabilities of system-under-test (SUT) functional stimulus generation, fault injection and state monitoring, all of which are supported by a configuration capability for setting up the system as desired for a particular experiment. This report summarizes the work accomplished so far in the development of the CFIMS concept and documents the first design realization.

  20. Fault Tolerant Computer Architecture

    Sorin, Daniel

    2009-01-01

    For many years, most computer architects have pursued one primary goal: performance. Architects have translated the ever-increasing abundance of ever-faster transistors provided by Moore's law into remarkable increases in performance. Recently, however, the bounty provided by Moore's law has been accompanied by several challenges that have arisen as devices have become smaller, including a decrease in dependability due to physical faults. In this book, we focus on the dependability challenge and the fault tolerance solutions that architects are developing to overcome it. The two main purposes

  1. Cost and benefits design optimization model for fault tolerant flight control systems

    Rose, J.

    1982-01-01

    Requirements and specifications for a method of optimizing the design of fault-tolerant flight control systems are provided. Algorithms that could be used for developing new and modifying existing computer programs are also provided, with recommendations for follow-on work.

  2. Safety verification of a fault tolerant reconfigurable autonomous goal-based robotic control system

    Braman, Julia M. B.; Murray, Richard M.; Wagner, David A.

    2007-01-01

    Fault tolerance and safety verification of control systems are essential for the success of autonomous robotic systems. A control architecture called Mission Data System (MDS), developed at the Jet Propulsion Laboratory, takes a goal-based control approach. In this paper, a method for converting goal network control programs into linear hybrid systems is developed. The linear hybrid system can then be verified for safety in the presence of failures using existing symbo...

  3. Diagnosis and Tolerant Strategy of an Open-Switch Fault for T-type Three-Level Inverter Systems

    Choi, Uimin; Lee, Kyo Beum; Blaabjerg, Frede

    2014-01-01

    -tolerant strategy is explained by dividing into two cases: the faulty condition of half-bridge switches and the neutral-point switches. The performance of the T-type inverter system improves considerably by the proposed fault tolerant algorithm when a switch fails. The roposed method does not require additional......This paper proposes a new diagnosis method of an open-switch fault and fault-tolerant control strategy for T-type three-level inverter systems. The location of faulty switch can be identified by the average of normalized phase current and the change of the neutral-point voltage. The proposed fault...

  4. Adaptive Fault-Tolerant Control of Uncertain Nonlinear Large-Scale Systems With Unknown Dead Zone.

    Chen, Mou; Tao, Gang

    2016-08-01

    In this paper, an adaptive neural fault-tolerant control scheme is proposed and analyzed for a class of uncertain nonlinear large-scale systems with unknown dead zone and external disturbances. To tackle the unknown nonlinear interaction functions in the large-scale system, the radial basis function neural network (RBFNN) is employed to approximate them. To further handle the unknown approximation errors and the effects of the unknown dead zone and external disturbances, integrated as the compounded disturbances, the corresponding disturbance observers are developed for their estimations. Based on the outputs of the RBFNN and the disturbance observer, the adaptive neural fault-tolerant control scheme is designed for uncertain nonlinear large-scale systems by using a decentralized backstepping technique. The closed-loop stability of the adaptive control system is rigorously proved via Lyapunov analysis and the satisfactory tracking performance is achieved under the integrated effects of unknown dead zone, actuator fault, and unknown external disturbances. Simulation results of a mass-spring-damper system are given to illustrate the effectiveness of the proposed adaptive neural fault-tolerant control scheme for uncertain nonlinear large-scale systems. PMID:26340792

  5. Fault-Tolerant Heat Exchanger

    Izenson, Michael G.; Crowley, Christopher J.

    2005-01-01

    A compact, lightweight heat exchanger has been designed to be fault-tolerant in the sense that a single-point leak would not cause mixing of heat-transfer fluids. This particular heat exchanger is intended to be part of the temperature-regulation system for habitable modules of the International Space Station and to function with water and ammonia as the heat-transfer fluids. The basic fault-tolerant design is adaptable to other heat-transfer fluids and heat exchangers for applications in which mixing of heat-transfer fluids would pose toxic, explosive, or other hazards: Examples could include fuel/air heat exchangers for thermal management on aircraft, process heat exchangers in the cryogenic industry, and heat exchangers used in chemical processing. The reason this heat exchanger can tolerate a single-point leak is that the heat-transfer fluids are everywhere separated by a vented volume and at least two seals. The combination of fault tolerance, compactness, and light weight is implemented in a unique heat-exchanger core configuration: Each fluid passage is entirely surrounded by a vented region bridged by solid structures through which heat is conducted between the fluids. Precise, proprietary fabrication techniques make it possible to manufacture the vented regions and heat-conducting structures with very small dimensions to obtain a very large coefficient of heat transfer between the two fluids. A large heat-transfer coefficient favors compact design by making it possible to use a relatively small core for a given heat-transfer rate. Calculations and experiments have shown that in most respects, the fault-tolerant heat exchanger can be expected to equal or exceed the performance of the non-fault-tolerant heat exchanger that it is intended to supplant (see table). The only significant disadvantages are a slight weight penalty and a small decrease in the mass-specific heat transfer.

  6. Robust fault-tolerant H∞ control of active suspension systems with finite-frequency constraint

    Wang, Rongrong; Jing, Hui; Karimi, Hamid Reza; Chen, Nan

    2015-10-01

    In this paper, the robust fault-tolerant (FT) H∞ control problem of active suspension systems with finite-frequency constraint is investigated. A full-car model is employed in the controller design such that the heave, pitch and roll motions can be simultaneously controlled. Both the actuator faults and external disturbances are considered in the controller synthesis. As the human body is more sensitive to the vertical vibration in 4-8 Hz, robust H∞ control with this finite-frequency constraint is designed. Other performances such as suspension deflection and actuator saturation are also considered. As some of the states such as the sprung mass pitch and roll angles are hard to measure, a robust H∞ dynamic output-feedback controller with fault tolerant ability is proposed. Simulation results show the performance of the proposed controller.

  7. Fault Tolerance Mobile Agent System Using Witness Agent in 2-Dimensional Mesh Network

    Ahmad Rostami

    2010-09-01

    Full Text Available Mobile agents are computer programs that act autonomously on behalf of a user or its owner and travel through a network of heterogeneous machines. Fault tolerance is important in their itinerary. In this paper, existent methods of fault tolerance in mobile agents are described which they are considered in linear network topology. In the methods three agents are used to fault tolerance by cooperating to each others for detecting and recovering server and agent failure. Three types of agents are: actual agent which performs programs for its owner, witness agent which monitors the actual agent and the witness agent after itself, probe which is sent for recovery the actual agent or the witness agent on the side of the witness agent. Communication mechanism in the methods is message passing between these agents. The methods are considered in linear network. We introduce our witness agent approach for fault tolerance mobile agent systems in Two Dimensional Mesh (2D-Mesh Network. Indeed Our approach minimizes Witness-Dependency in this network and then represents its algorithm.

  8. A Systematic Approach to Sensitivity Analysis of Fault Tolerant Systems in NMR Architecture

    Kourosh Aslansefat

    2015-01-01

    Full Text Available A fault tree illustrates the ways through which a system fails. It states different ways in which combination of faulty components result in an undesired event in the system. Being used in phases such as designing and exploiting industrial systems, and the designers able to evaluate the dependability attributes such as reliability, MTTF and sensitivity. In addition, in the mentioned ability, the fault tree is a systematic method for finding systems bottlenecks and weakness point. In spite of its extensive use in evaluating the reliability of systems, fault tree is rarely used in calculating sensitivity. In the last decade, few researches has been conducted in this field, however these methods are not applicable to large scale systems and are not systematic. This paper provides a systematic method for evaluating system sensitivity through fault tree. Then, it introduces sensitivity of NMR architecture as one of the common structures of fault tolerance which is used for enhancing systems’ reliability, safety and availability in industry. This article presents a comprehensive and parameterized formula for NMR structure's sensitivity. The presented method can be a great help for designing and exploiting reliable systems engineers in systematic and instant calculation of sensitivity by means of fault tree.

  9. Modeling and Design of Fault-Tolerant and Self-Adaptive Reconfigurable Networked Embedded Systems

    Streichert Thilo

    2006-01-01

    Full Text Available Automotive, avionic, or body-area networks are systems that consist of several communicating control units specialized for certain purposes. Typically, different constraints regarding fault tolerance, availability and also flexibility are imposed on these systems. In this article, we will present a novel framework for increasing fault tolerance and flexibility by solving the problem of hardware/software codesign online. Based on field-programmable gate arrays (FPGAs in combination with CPUs, we allow migrating tasks implemented in hardware or software from one node to another. Moreover, if not enough hardware/software resources are available, the migration of functionality from hardware to software or vice versa is provided. Supporting such flexibility through services integrated in a distributed operating system for networked embedded systems is a substantial step towards self-adaptive systems. Beside the formal definition of methods and concepts, we describe in detail a first implementation of a reconfigurable networked embedded system running automotive applications.

  10. Problems related to the integration of fault tolerant aircraft electronic systems

    Bannister, J. A.; Adlakha, V.; Triyedi, K.; Alspaugh, T. A., Jr.

    1982-01-01

    Problems related to the design of the hardware for an integrated aircraft electronic system are considered. Taxonomies of concurrent systems are reviewed and a new taxonomy is proposed. An informal methodology intended to identify feasible regions of the taxonomic design space is described. Specific tools are recommended for use in the methodology. Based on the methodology, a preliminary strawman integrated fault tolerant aircraft electronic system is proposed. Next, problems related to the programming and control of inegrated aircraft electronic systems are discussed. Issues of system resource management, including the scheduling and allocation of real time periodic tasks in a multiprocessor environment, are treated in detail. The role of software design in integrated fault tolerant aircraft electronic systems is discussed. Conclusions and recommendations for further work are included.

  11. Modeling and Design of Fault-Tolerant and Self-Adaptive Reconfigurable Networked Embedded Systems

    Jürgen Teich

    2006-06-01

    Full Text Available Automotive, avionic, or body-area networks are systems that consist of several communicating control units specialized for certain purposes. Typically, different constraints regarding fault tolerance, availability and also flexibility are imposed on these systems. In this article, we will present a novel framework for increasing fault tolerance and flexibility by solving the problem of hardware/software codesign online. Based on field-programmable gate arrays (FPGAs in combination with CPUs, we allow migrating tasks implemented in hardware or software from one node to another. Moreover, if not enough hardware/software resources are available, the migration of functionality from hardware to software or vice versa is provided. Supporting such flexibility through services integrated in a distributed operating system for networked embedded systems is a substantial step towards self-adaptive systems. Beside the formal definition of methods and concepts, we describe in detail a first implementation of a reconfigurable networked embedded system running automotive applications.

  12. To err is robotic, to tolerate immunological: fault detection in multirobot systems.

    Tarapore, Danesh; Lima, Pedro U; Carneiro, Jorge; Christensen, Anders Lyhne

    2015-01-01

    Fault detection and fault tolerance represent two of the most important and largely unsolved issues in the field of multirobot systems (MRS). Efficient, long-term operation requires an accurate, timely detection, and accommodation of abnormally behaving robots. Most existing approaches to fault-tolerance prescribe a characterization of normal robot behaviours, and train a model to recognize these behaviours. Behaviours unrecognized by the model are consequently labelled abnormal or faulty. MRS employing these models do not transition well to scenarios involving temporal variations in behaviour (e.g., online learning of new behaviours, or in response to environment perturbations). The vertebrate immune system is a complex distributed system capable of learning to tolerate the organism's tissues even when they change during puberty or metamorphosis, and to mount specific responses to invading pathogens, all without the need of a genetically hardwired characterization of normality. We present a generic abnormality detection approach based on a model of the adaptive immune system, and evaluate the approach in a swarm of robots. Our results reveal the robust detection of abnormal robots simulating common electro-mechanical and software faults, irrespective of temporal changes in swarm behaviour. Abnormality detection is shown to be scalable in terms of the number of robots in the swarm, and in terms of the size of the behaviour classification space. PMID:25642825

  13. Observer-based fault-tolerant control for a class of nonlinear networked control systems

    Mahmoud, M. S.; Memon, A. M.; Shi, Peng

    2014-08-01

    This paper presents a fault-tolerant control (FTC) scheme for nonlinear systems which are connected in a networked control system. The nonlinear system is first transformed into two subsystems such that the unobservable part is affected by a fault and the observable part is unaffected. An observer is then designed which gives state estimates using a Luenberger observer and also estimates unknown parameter of the system; this helps in fault estimation. The FTC is applied in the presence of sampling due to the presence of a network in the loop. The controller gain is obtained using linear-quadratic regulator technique. The methodology is applied on a mechatronic system and the results show satisfactory performance.

  14. Fault detection and fault-tolerant control using sliding modes

    Alwi, Halim; Tan, Chee Pin

    2011-01-01

    ""Fault Detection and Fault-tolerant Control Using Sliding Modes"" is the first text dedicated to showing the latest developments in the use of sliding-mode concepts for fault detection and isolation (FDI) and fault-tolerant control in dynamical engineering systems. It begins with an introduction to the basic concepts of sliding modes to provide a background to the field. This is followed by chapters that describe the use and design of sliding-mode observers for FDI using robust fault reconstruction. The development of a class of sliding-mode observers is described from first principles throug

  15. Fault-tolerant multiprocessor computer

    Smith, T.B. III; Lala, J.H.; Goldberg, J.; Kautz, W.H.; Melliar-Smith, P.M.; Green, M.W.; Levitt, K.N.; Schwartz, R.L.; Weinstock, C.B.; Palumbo, D.L.

    1986-01-01

    The development and evaluation of fault-tolerant computer architectures and software-implemented fault tolerance (SIFT) for use in advanced NASA vehicles and potentially in flight-control systms are described in a collection of previously published reports prepared for NASA. Topics addressed include the principles of fault-tolerant multiprocessor (FTMP) operation; processor and slave regional designs; FTMP executive, facilities, aceptance-test/diagnostic, applications, and support software; FTM reliability and availability models; SIFT hardware design; and SIFT validation and verification.

  16. A method for the computation of reliability bounds for non-repairable fault-tolerant systems

    Suñé, Víctor; Carrasco, Juan A.

    1997-01-01

    A realistic modeling of fault-tolerant systems requires to take into account phenomena such as the dependence of component failure rates and coverage parameters on the operational configuration of the system, which cannot be properly captured using combinatorial techniques. Such dependencies can be modeled with detail using continuous-time Markov chains (CTMC’s). However, the use of CTMC models is limited by the well-known state space explosion problem. In this paper we develop a method for t...

  17. Architectures for fault-tolerant spacecraft computers

    Rennels, D. A.

    1978-01-01

    This paper summarizes the results of a long-term research program in fault-tolerant computing for spacecraft on-board processing. In response to changing device technology this program has progressed from the design of a fault-tolerant uniprocessor to the development of fault-tolerant distributed computer systems. The unusual requirements of spacecraft computing are described along with the resulting real-time computer architectures. The following aspects of these designs are discussed: (1) architectural features to minimize complexity in the distributed computer system, (2) fault-detection and recovery, (3) techniques to enhance reliability and testability, and (4) design approaches for LSI implementation.

  18. An architecture for fault tolerant controllers

    Niemann, Hans Henrik; Stoustrup, Jakob

    2005-01-01

    A general architecture for fault tolerant control is proposed. The architecture is based on the (primary) YJBK parameterization of all stabilizing compensators and uses the dual YJBK parameterization to quantify the performance of the fault tolerant system. The approach suggested can be applied f...

  19. Feasibility analysis and design of a fault tolerant computing system: a TMR microprocessor system design of 64-Bit COTS microprocessors

    Eken, Huseyin Baha

    2001-01-01

    The purpose of this thesis is to analyze and determine the feasibility of implementing a fault tolerant computing system that is able to function in the presence of radiation induced Single Event Upsets (SEU) by using the Triple Modular Redundancy (TMR) technique with 64-bit Commercial-Off-The- Shelf (COTS) microprocessors. Due to the radiation environment in space, electronic devices must be designed to tolerate the radiation effects. While there are radiation-hardened devices that can toler...

  20. Fault-tolerant Supervisory Control

    Izadi-Zamanabadi, Roozbeh

    systematic analysis of reconfiguration and design of supervisor logic. In addition, useful experience is obtained through implementation of a fault-tolerant control scheme against a simulated ship and its propulsion system. A development methodology, which was suggested in the Control Engineering Department...... realized on the basis of an extension to the traditional state-event machines. Due to parallelity (inherent modularity) the supervisor logic is more easily modified, updated, maintained, and tested. A salient feature is that a change in one task only necessitates redesign of essentially one corresponding...... state-event machine (SEM). A heuristic guideline is provided for designing the logic in form of SEMs. A ship propulsion system benchmark has been designed and used as a case study. This includes experimentation with the above methodologies and implementation of a fault-tolerant control against the...

  1. Economic modeling of fault tolerant flight control systems in commercial applications

    Finelli, G. B.

    1982-01-01

    This paper describes the current development of a comprehensive model which will supply the assessment and analysis capability to investigate the economic viability of Fault Tolerant Flight Control Systems (FTFCS) for commercial aircraft of the 1990's and beyond. An introduction to the unique attributes of fault tolerance and how they will influence aircraft operations and consequent airline costs and benefits is presented. Specific modeling issues and elements necessary for accurate assessment of all costs affected by ownership and operation of FTFCS are delineated. Trade-off factors are presented, aimed at exposing economically optimal realizations of system implementations, resource allocation, and operating policies. A trade-off example is furnished to graphically display some of the analysis capabilities of the comprehensive simulation model now being developed.

  2. A Decoding Approach to Fault Tolerant Control of Linear Systems with Quantized Disturbance Input

    Fosson, Sophie M

    2010-01-01

    The aim of this paper is to propose an alternative method to solve a Fault Tolerant Control problem. The model is a linear system affected by a disturbance term: this represents a large class of technological faulty processes. The goal is to make the system able to tolerate the undesired perturbation, i.e., to remove or at least reduce its negative effects; such a task is performed in three steps: the detection of the fault, its identification and the consequent process recovery. When the disturbance function is known to be \\emph{quantized} over a finite number of levels, the detection can be successfully executed by a recursive \\emph{decoding} algorithm, arising from Information and Coding Theory and suitably adapted to the control framework. This technique is analyzed and tested in a flight control issue; both theoretical considerations and simulations are reported.

  3. Fault-tolerant supervisory control of VAV air-conditioning systems

    Liu, X.-F.; Dexter, A. [Department of Engineering Science, University of Oxford, Oxford (United Kingdom)

    2001-07-01

    The paper describes a supervisory control scheme that adapts to the presence of degradation faults and minimises any resulting increase in energy consumption or deterioration in occupant comfort. Since there is a high degree of uncertainty associated with the results of any fault identification scheme in information-poor systems of this type, the supervisory control scheme uses fuzzy models to predict the control performance and a computationally undemanding optimisation scheme to determine the most appropriate set-points. The fault-tolerant control scheme is developed and evaluated using a detailed computer simulation of a multi-zone, variable-air-volume (VAV), air-conditioning system. The fuzzy models relate the performance of the terminal-boxes, the air-handling unit and the chiller to fuzzy descriptions of the cooling load, the supply air and chilled water temperature set-points, and the amount of air-side and water-side fouling. Results are presented that demonstrate the ability of the fuzzy models to predict the performance and show how the power consumption of the air-conditioning system varies with set-point changes and the presence of both water-side and air-side fouling. The main factors that determine the suitability of a particular air-conditioning system for fault-tolerant control are also discussed. (author)

  4. Fuxi: A fault-tolerant resource management and job scheduling system at internet scale

    Zhang, Z; C. Li; Tao, Y; Yang, R; H. Tang; Xu, J.

    2014-01-01

    Scalability and fault-tolerance are two fundamental challenges for all distributed computing at Internet scale. Despite many recent advances from both academia and industry, these two problems are still far from settled. In this paper, we present Fuxi, a resource management and job scheduling system that is capable of handling the kind of workload at Alibaba where hundreds of terabytes of data are generated and analyzed everyday to help optimize the company's business operations and user expe...

  5. A metaobject architecture for fault-tolerant distributed systems : the FRIENDS approach

    Fabre, Jean-Charles; Pérennou, Tanguy

    1998-01-01

    The FRIENDS system developed at LAAS-CNRS is a metalevel architecture providing libraries of metaobjects for fault tolerance, secure communication, and group-based distributed applications. The use of metaobjects provides a nice separation of concerns between mechanisms and applications. Metaobjects can be used transparently by applications and can be composed according to the needs of a given application, a given architecture, and its underlying properties. In FRIENDS, metaobjects are use...

  6. Energy/Reliability Trade-offs in Fault-Tolerant Event-Triggered Distributed Embedded Systems

    Gan, Junhe; Gruian, Flavius; Pop, Paul;

    2011-01-01

    This paper presents an approach to the synthesis of low-power fault-tolerant hard real-time applications mapped on distributed heterogeneous embedded systems. Our synthesis approach decides the mapping of tasks to processing elements, as well as the voltage and frequency levels for executing each...... with limited energy and hardware resources. We evaluated the algorithm proposed using several synthetic and reallife benchmarks....

  7. Advanced information processing system: The Army Fault-Tolerant Architecture detailed design overview

    Harper, Richard E.; Babikyan, Carol A.; Butler, Bryan P.; Clasen, Robert J.; Harris, Chris H.; Lala, Jaynarayan H.; Masotto, Thomas K.; Nagle, Gail A.; Prizant, Mark J.; Treadwell, Steven

    1994-01-01

    The Army Avionics Research and Development Activity (AVRADA) is pursuing programs that would enable effective and efficient management of large amounts of situational data that occurs during tactical rotorcraft missions. The Computer Aided Low Altitude Night Helicopter Flight Program has identified automated Terrain Following/Terrain Avoidance, Nap of the Earth (TF/TA, NOE) operation as key enabling technology for advanced tactical rotorcraft to enhance mission survivability and mission effectiveness. The processing of critical information at low altitudes with short reaction times is life-critical and mission-critical necessitating an ultra-reliable/high throughput computing platform for dependable service for flight control, fusion of sensor data, route planning, near-field/far-field navigation, and obstacle avoidance operations. To address these needs the Army Fault Tolerant Architecture (AFTA) is being designed and developed. This computer system is based upon the Fault Tolerant Parallel Processor (FTPP) developed by Charles Stark Draper Labs (CSDL). AFTA is hard real-time, Byzantine, fault-tolerant parallel processor which is programmed in the ADA language. This document describes the results of the Detailed Design (Phase 2 and 3 of a 3-year project) of the AFTA development. This document contains detailed descriptions of the program objectives, the TF/TA NOE application requirements, architecture, hardware design, operating systems design, systems performance measurements and analytical models.

  8. Fault-Tolerant Onboard Monitoring and Decision Support Systems

    Lajic, Zoran

    The purpose of this research project is to improve current onboard decision support systems. Special focus is on the onboard prediction of the instantaneous sea state. In this project a new approach to increasing the overall reliability of a monitoring and decision support system has been...

  9. Fault-Tolerant Scheduling for Real-Time Embedded Control Systems

    Chun-Hua Yang; Geert Deconinck; Wei-Hua Gui

    2004-01-01

    With the increasing complexity of industrial application, an embedded control system (ECS) requires processing a number of hard real-time tasks and needs fault-tolerance to assure high reliability. Considering the characteristics of real-time tasks in ECS, an integrated algorithm is proposed to schedule real-time tasks and to guarantee that all real-time tasks are completed before their deadlines even in the presence of faults. Based on the nonpreemptive critical-section protocol (NCSP), this paper analyzes the blocking time introduced by resource conflicts of relevancy tasks in fault-tolerant multiprocessor systems. An extended schedulability condition is presented to check the assignment feasibility of a given task to a processor. A primary/backup approach and on-line replacement of failed processors are used to tolerate processor failures. The analysis reveals that the integrated algorithm bounds the blocking time, requires limited overhead on the number of processors, and still assures good processor utilization. This is also demonstrated by simulation results. Both analysis and simulation show the effectiveness of the proposed algorithm in ECS.

  10. A Fault-Tolerant Modulation Method to Counteract the Double Open-Switch Fault in Matrix Converter Drive Systems without Redundant Power Devices

    Chen, Der-Fa; Nguyen-Duy, Khiem; Liu, Tian-Hua; Andersen, Michael A. E.

    This paper studies the double open-switch fault issue occurring within the conventional matrix converter driving a three-phase permanent-magnet synchronous motor system and proposes a fault-tolerant solution by introducing a revised modulation strategy. In this switching strategy, the rectifier...

  11. A Fault-Tolerant Modulation Method to Counteract the Double Open-Switch Fault in Matrix Converter Drive Systems without Redundant Power Devices

    Chen, Der-Fa; Nguyen-Duy, Khiem; Liu, Tian-Hua;

    2012-01-01

    This paper studies the double open-switch fault issue occurring within the conventional matrix converter driving a three-phase permanent-magnet synchronous motor system and proposes a fault-tolerant solution by introducing a revised modulation strategy. In this switching strategy, the rectifier-s...

  12. Optimal maintenance center inventories for fault-tolerant repairable systems

    Lawrence, S. H.; Schaefer, M. K.

    1984-01-01

    A probabilistic approach is taken to determine the optimal repairable parts inventory for a maintenance center, servicing machines which contain several m-out-of-n systems of different parts, with a constraint on the total inventory investment. A model, based on the discrete Markov process, accounts for a typical ultrareliable avionics system, such as one presently being developed by NASA. The dynamic programming algorithm for minimizing the stockout and holding costs is applied to an exemplary maintenance center, and solutions for single-item and multi-item cases are given. The computational burden is noted to be reasonable and a computer program is used to generate optimal solutions.

  13. Fault-tolerant Agreement in Synchronous Message-passing Systems

    Raynal, Michel

    2010-01-01

    The present book focuses on the way to cope with the uncertainty created by process failures (crash, omission failures and Byzantine behavior) in synchronous message-passing systems (i.e., systems whose progress is governed by the passage of time). To that end, the book considers fundamental problems that distributed synchronous processes have to solve. These fundamental problems concern agreement among processes (if processes are unable to agree in one way or another in presence of failures, no non-trivial problem can be solved). They are consensus, interactive consistency, k-set agreement an

  14. A Ship Propulsion System Model for Fault-tolerant Control

    Izadi-Zamanabadi, Roozbeh; Blanke, M.

    . The propulsion system model is presented in two versions: the first one consists of one engine and one propeller, and the othe one consists of two engines and their corresponding propellers placed in parallel in the ship. The corresponding programs are developed and are available....

  15. Fault tolerant PLC system using CPU and I/O redundancy with switch over logic system for nuclear instrumentation

    Nuclear instrumentation in power plants and fuel reprocessing plants demand very high reliable fault tolerant programmable logic controllers (PLC) since it is directly related to hazardous operation involving safety of plant, operator and in turn public at large. Components of control systems can fail depending on the circumstances and level of preparedness in plants, leading to a minor or major disaster. Utilizing existing technology and configuring system architecture, a fault tolerant PLC can provide superior solution to meet some of the challenges like high reliability, integrity and availability. This paper presents the background, concepts and implementation of fault tolerant PLC architecture using CPU and I/O redundancy with switch over logic system for nuclear instrumentation. (author)

  16. Task Migration for Fault-Tolerance in Mixed-Criticality Embedded Systems

    Saraswat, Prabhat Kumar; Pop, Paul; Madsen, Jan

    2009-01-01

    In this paper we are interested in mixed-criticality embedded applications implemented on distributed architectures. Depending on their time-criticality, tasks can be hard or soft real-time and regarding safety-criticality, tasks can be fault-tolerant to transient faults, permanent faults, or have...

  17. Fault tolerant computer control for a Maglev transportation system

    Lala, Jaynarayan H.; Nagle, Gail A.; Anagnostopoulos, George

    1994-05-01

    Magnetically levitated (Maglev) vehicles operating on dedicated guideways at speeds of 500 km/hr are an emerging transportation alternative to short-haul air and high-speed rail. They have the potential to offer a service significantly more dependable than air and with less operating cost than both air and high-speed rail. Maglev transportation derives these benefits by using magnetic forces to suspend a vehicle 8 to 200 mm above the guideway. Magnetic forces are also used for propulsion and guidance. The combination of high speed, short headways, stringent ride quality requirements, and a distributed offboard propulsion system necessitates high levels of automation for the Maglev control and operation. Very high levels of safety and availability will be required for the Maglev control system. This paper describes the mission scenario, functional requirements, and dependability and performance requirements of the Maglev command, control, and communications system. A distributed hierarchical architecture consisting of vehicle on-board computers, wayside zone computers, a central computer facility, and communication links between these entities was synthesized to meet the functional and dependability requirements on the maglev. Two variations of the basic architecture are described: the Smart Vehicle Architecture (SVA) and the Zone Control Architecture (ZCA). Preliminary dependability modeling results are also presented.

  18. Plan for the Characterization of HIRF Effects on a Fault-Tolerant Computer Communication System

    Torres-Pomales, Wilfredo; Malekpour, Mahyar R.; Miner, Paul S.; Koppen, Sandra V.

    2008-01-01

    This report presents the plan for the characterization of the effects of high intensity radiated fields on a prototype implementation of a fault-tolerant data communication system. Various configurations of the communication system will be tested. The prototype system is implemented using off-the-shelf devices. The system will be tested in a closed-loop configuration with extensive real-time monitoring. This test is intended to generate data suitable for the design of avionics health management systems, as well as redundancy management mechanisms and policies for robust distributed processing architectures.

  19. Fault Tolerant Neural Network for ECG Signal Classification Systems

    MERAH, M.

    2011-08-01

    Full Text Available The aim of this paper is to apply a new robust hardware Artificial Neural Network (ANN for ECG classification systems. This ANN includes a penalization criterion which makes the performances in terms of robustness. Specifically, in this method, the ANN weights are normalized using the auto-prune method. Simulations performed on the MIT ? BIH ECG signals, have shown that significant robustness improvements are obtained regarding potential hardware artificial neuron failures. Moreover, we show that the proposed design achieves better generalization performances, compared to the standard back-propagation algorithm.

  20. Synchronization of fault-tolerant parallel processing systems

    Harper, R.E.; Lala, J.H.

    1990-06-26

    This patent describes a system for synchronizing the operation of redundant processors forming a group thereof wherein a frame of operation is defined for each processor as a time period during which a selected number of specified processing events occurs. It comprises: means for performing a final one of the specified processing events for identifying the end of a current frame of operation; means responsive to the performing of the final one of the specified processing events by the last-to-perform of the processors for synchronizing the operations of the processors so that all of the processors can subsequently start their next frame of operation at substantially the same time; and each the processor further includes means responsive to the synchronizing means for performing a first one of the specified processing events to identify the start of the next frame of operation.

  1. Reliability model of fault-tolerant data processing system with primary and backup nodes

    Rahman, P. A.; Bobkova, E. Yu

    2016-04-01

    This paper deals with the fault-tolerant data processing systems, which are widely used in modern world of information technologies and have acceptable overhead expenses in hardware implementation. A simplified reliability model for duplex systems and the offered by authors advanced model for data processing systems with primary and backup nodes based on a three-state model of recoverable elements, which takes into consideration different failure rates of passive and active nodes and finite time of node activation, are also given. A calculation formula for the availability factor of the dual-node data processing system with primary and backup nodes and calculation examples are also provided.

  2. Rectifier Fault Diagnosis and Fault Tolerance of a Doubly Fed Brushless Starter Generator

    Liwei Shi; Zhou Bo

    2015-01-01

    This paper presents a rectifier fault diagnosis method with wavelet packet analysis to improve the fault tolerant four-phase doubly fed brushless starter generator (DFBLSG) system reliability. The system components and fault tolerant principle of the high reliable DFBLSG are given. And the common fault of the rectifier is analyzed. The process of wavelet packet transforms fault detection/identification algorithm is introduced in detail. The fault tolerant performance and output voltage experi...

  3. Fault-tolerant control of delta operator systems with actuator saturation and effectiveness loss

    Yang, Hongjiu; Zhang, Luyang; Zhao, Ling; Yuan, Yuan

    2016-07-01

    This paper studies the problem of robust fault-tolerant control against the actuator effectiveness loss for delta operator systems with actuator saturation. Ellipsoids are used to estimate the domain of attraction for the delta operator systems with actuator saturation and effectiveness loss. Some invariance set conditions used for enlarging the domain of attraction are expressed by linear matrix inequalities. Discussions on system performance optimisation are presented in this paper, including reduction on computational complexity, expansion of the domain of attraction and disturbance rejection. Two numerical examples are given to illustrate the effectiveness of the developed techniques.

  4. (m,n-Semirings and a Generalized Fault-Tolerance Algebra of Systems

    Syed Eqbal Alam

    2013-01-01

    Full Text Available We propose a new class of mathematical structures called (m,n-semirings (which generalize the usual semirings and describe their basic properties. We define partial ordering and generalize the concepts of congruence, homomorphism, and so forth, for (m,n-semirings. Following earlier work by Rao (2008, we consider systems made up of several components whose failures may cause them to fail and represent the set of such systems algebraically as an (m,n-semiring. Based on the characteristics of these components, we present a formalism to compare the fault-tolerance behavior of two systems using our framework of a partially ordered (m,n-semiring.

  5. MICROTHREAD BASED (MTB) COARSE GRAINED FAULT TOLERANCE SUPERSCALAR PROCESSOR ARCHITECTURE

    2006-01-01

    Fault tolerance in microprocessor systems has become a popular topic of architecture research.Much work has been done at different levels to accomplish reliability against soft errors, and some fault tolerance architectures have been proposed. But little attention is paid to the thread level superscalar fault tolerance.This letter introduces microthread concept into superscalar processor fault tolerance domain, and puts forward a novel fault tolerance architecture, namely, MicroThread Based (MTB) coarse grained transient fault tolerance superscalar processor architecture, then discusses some detailed implementations.

  6. Model prototype utilization in the analysis of fault tolerant control and data processing systems

    Kovalev, I. V.; Tsarev, R. Yu; Gruzenkin, D. V.; Prokopenko, A. V.; Knyazkov, A. N.; Laptenok, V. D.

    2016-04-01

    The procedure assessing the profit of control and data processing system implementation is presented in the paper. The reasonability of model prototype creation and analysis results from the implementing of the approach of fault tolerance provision through the inclusion of structural and software assessment redundancy. The developed procedure allows finding the best ratio between the development cost and the analysis of model prototype and earnings from the results of this utilization and information produced. The suggested approach has been illustrated by the model example of profit assessment and analysis of control and data processing system.

  7. Investigation of the applicability of a functional programming model to fault-tolerant parallel processing for knowledge-based systems

    Harper, Richard

    1989-01-01

    In a fault-tolerant parallel computer, a functional programming model can facilitate distributed checkpointing, error recovery, load balancing, and graceful degradation. Such a model has been implemented on the Draper Fault-Tolerant Parallel Processor (FTPP). When used in conjunction with the FTPP's fault detection and masking capabilities, this implementation results in a graceful degradation of system performance after faults. Three graceful degradation algorithms have been implemented and are presented. A user interface has been implemented which requires minimal cognitive overhead by the application programmer, masking such complexities as the system's redundancy, distributed nature, variable complement of processing resources, load balancing, fault occurrence and recovery. This user interface is described and its use demonstrated. The applicability of the functional programming style to the Activation Framework, a paradigm for intelligent systems, is then briefly described.

  8. Fault-tolerant parallel processor

    Harper, R.E.; Lala, J.H. (Charles Stark Draper Laboratory, Inc., Cambridge, MA (USA))

    1991-06-01

    This paper addresses issues central to the design and operation of an ultrareliable, Byzantine resilient parallel computer. Interprocessor connectivity requirements are met by treating connectivity as a resource that is shared among many processing elements, allowing flexibility in their configuration and reducing complexity. Redundant groups are synchronized solely by message transmissions and receptions, which aslo provide input data consistency and output voting. Reliability analysis results are presented that demonstrate the reduced failure probability of such a system. Performance analysis results are presented that quantify the temporal overhead involved in executing such fault-tolerance-specific operations. Empirical performance measurements of prototypes of the architecture are presented. 30 refs.

  9. Fault Tolerant Approach for Data Encryption and Digital Signature Based on ECC System

    ZENG Yong; MA Jian-feng

    2005-01-01

    An integrated fault tolerant approach for data encryption and digital signature based on elliptic curve cryptography is proposed. This approach allows the receiver to verify the sender's identity and can simultaneously deal with error detection and data correction. Up to three errors in our approach can be detected and corrected. This approach has at least the same security as that based on RSA system, but smaller keys to achieve the same level of security. Our approach is more efficient than the known ones and more suited for limited environments like personal digital assistants (PDAs), mobile phones and smart cards without RSA co- processors.

  10. Fault Tolerant Wind Farm Control

    Odgaard, Peter Fogh; Stoustrup, Jakob

    2013-01-01

    with best at a wind turbine control level. However, some faults are better dealt with at the wind farm control level, if the wind turbine is located in a wind farm. In this paper a benchmark model for fault detection and isolation, and fault tolerant control of wind turbines implemented at the wind...... farm control level is presented. The benchmark model includes a small wind farm of nine wind turbines, based on simple models of the wind turbines as well as the wind and interactions between wind turbines in the wind farm. The model includes wind and power references scenarios as well as three...... relevant fault scenarios. This benchmark model is used in an international competition dealing with Wind Farm fault detection and isolation and fault tolerant control....

  11. A Concept for fault tolerant controllers

    Niemann, Hans Henrik; Poulsen, Niels Kjølstad

    2009-01-01

    This paper describe a concept for fault tolerant controllers (FTC) based on the YJBK (after Youla, Jabr, Bongiorno and Kucera) parameterization. This controller architecture will allow to change the controller on-line in the case of faults in the system. In the described FTC concept, a safe mode...

  12. Reliability and coverage analysis of non-repairable fault-tolerant memory systems

    Cox, G. W.; Carroll, B. D.

    1976-01-01

    A method was developed for the construction of probabilistic state-space models for nonrepairable systems. Models were developed for several systems which achieved reliability improvement by means of error-coding, modularized sparing, massive replication and other fault-tolerant techniques. From the models developed, sets of reliability and coverage equations for the systems were developed. Comparative analyses of the systems were performed using these equation sets. In addition, the effects of varying subunit reliabilities on system reliability and coverage were described. The results of these analyses indicated that a significant gain in system reliability may be achieved by use of combinations of modularized sparing, error coding, and software error control. For sufficiently reliable system subunits, this gain may far exceed the reliability gain achieved by use of massive replication techniques, yet result in a considerable saving in system cost.

  13. State of the art on fault-tolerant real time distributed systems

    The integration of new computerized functions in power plant, and especially nuclear power plant, control and instrumentation systems implies more and more stringent requirements as to communication system reliability. For if an item of equipment, or even a computer program, can be validated and qualified, no formal qualification procedure is presently imposed on communication networks. This is certainly due to the relative immaturity of these networks, but also to their complexity. It is for this reason that, in the context of preparation for the future PWR 2000 standardized nuclear plants, it would seem appropriate to take a look at fault-tolerant communication systems. Since C and I type applications (in the control room) are divided between several computers and are required to contend with extremely severe time constraints, EDF has undertaken investigation of fault-tolerant, real time distributed systems. This paper summarized the state of the art in the field as it appears from discussion with computer manufacturers, academics and research workers on related projects. The results obtained were then used to determine trends as to ''promising'' solutions. The paper concludes with recommended study programs for the PCC department of EDF/R and DD for the next few years. (author), 9 figs., 10 refs., 2 annexes

  14. Model Checking a Byzantine-Fault-Tolerant Self-Stabilizing Protocol for Distributed Clock Synchronization Systems

    Malekpour, Mahyar R.

    2007-01-01

    This report presents the mechanical verification of a simplified model of a rapid Byzantine-fault-tolerant self-stabilizing protocol for distributed clock synchronization systems. This protocol does not rely on any assumptions about the initial state of the system. This protocol tolerates bursts of transient failures, and deterministically converges within a time bound that is a linear function of the self-stabilization period. A simplified model of the protocol is verified using the Symbolic Model Verifier (SMV) [SMV]. The system under study consists of 4 nodes, where at most one of the nodes is assumed to be Byzantine faulty. The model checking effort is focused on verifying correctness of the simplified model of the protocol in the presence of a permanent Byzantine fault as well as confirmation of claims of determinism and linear convergence with respect to the self-stabilization period. Although model checking results of the simplified model of the protocol confirm the theoretical predictions, these results do not necessarily confirm that the protocol solves the general case of this problem. Modeling challenges of the protocol and the system are addressed. A number of abstractions are utilized in order to reduce the state space. Also, additional innovative state space reduction techniques are introduced that can be used in future verification efforts applied to this and other protocols.

  15. Nonlinear, Adaptive and Fault-tolerant Control for Electro-hydraulic Servo Systems

    Choux, Martin

    Fluid power systems have been in use since 1795 with the rst hydraulic press patented by Joseph Bramah and today form the basis of many industries. Electro hydraulic servo systems are uid power systems controlled in closed-loop. They transform reference input signals into a set of movements in...... detected early and handled. Moreover, the task of controlling electro hydraulic systems for high performance operations is challenging due to the highly nonlinear behaviour of such systems and the large amount of uncertainties present in their models. This thesis focuses on nonlinear adaptive fault......-tolerant control for a representative electro hydraulic servo controlled motion system. The thesis extends existing models of hydraulic systems by considering more detailed dynamics in the servo valve and in the friction inside the hydraulic cylinder. It identies the model parameters using experimental data from a...

  16. Design Approach for Fault Recoverable ALU with Improved Fault Tolerance

    Ankit K V

    2015-08-01

    Full Text Available A new design for fault tolerant and fault recoverable ALU System has been proposed in this paper. Reliability is one of the most critical factors that have to be considered during the designing phase of any IC. In critical applications like Medical equipment & Military applications this reliability factor plays a very critical role in determining the acceptance of product. Insertion of special modules in the main design for reliability enhancement will give considerable amount of area & power penalty. So, a novel approach to this problem is to find ways for reusing the already available components in digital system in efficient way to implement recoverable methodologies. Triple Modular Redundancy (TMR has traditionally used for protecting digital logic from the SEUs (single event upset by triplicating the critical components of the system to give fault tolerance to system. ScTMR- Scan chain-based error recovery TMR technique provides recovery for all internal faults. ScTMR uses a roll-forward approach and employs the scan chain implemented in the circuits for testability purposes to recover the system to fault-free state. The proposed design will incorporate a ScTMR controller over TMR system of ALU and will make the system fault tolerant and fault recoverable. Hence, proposed design will be more efficient & reliable to use in critical applications, than any other design present till today.

  17. Multi-agent Platform and Toolbox for Fault Tolerant Networked Control Systems

    Mário J. G. C. Mendes

    2009-04-01

    Full Text Available Industrial distributed networked control systems use different communication networks to exchange different critical levels of information. Real-time control, fault diagnosis (FDI and Fault Tolerant Networked Control (FTNC systems demand one of the more stringent data exchange in the communication networks of these networked control systems (NCS. When dealing with large-scale complex NCS, designing FTNC systems is a very difficult task due to the large number of sensors and actuators spatially distributed and network connected. To solve this issue, a FTNC platform and toolbox are presented in this paper using simple and verifiable principles coming mainly from a decentralized design based on causal modelling partitioning of the NCS and distributed computing using multi-agent systems paradigm, allowing the use of agents with well established FTC methodologies or new ones developed taking into account the NCS specificities. The multi-agent platform and toolbox for FTNC systems have been built in Matlab/Simulink environment, which is in our days the scientific benchmark for this kind of research. Although the tests have been performed with a simple case, the results are promising and this approach is expected to succeed with more complex processes.

  18. Fault Tolerant External Memory Algorithms

    Jørgensen, Allan Grønlund; Brodal, Gerth Stølting; Mølhave, Thomas

    2009-01-01

    Algorithms dealing with massive data sets are usually designed for I/O-efficiency, often captured by the I/O model by Aggarwal and Vitter. Another aspect of dealing with massive data is how to deal with memory faults, e.g. captured by the adversary based faulty memory RAM by Finocchi and Italiano....... However, current fault tolerant algorithms do not scale beyond the internal memory. In this paper we investigate for the first time the connection between I/O-efficiency in the I/O model and fault tolerance in the faulty memory RAM, and we assume that both memory and disk are unreliable. We show a lower...... bound on the number of I/Os required for any deterministic dictionary that is resilient to memory faults. We design a static and a dynamic deterministic dictionary with optimal query performance as well as an optimal sorting algorithm and an optimal priority queue. Finally, we consider scenarios where...

  19. A Fault-Tolerant Emergency-Aware Access Control Scheme for Cyber-Physical Systems

    Wu, Guowei; Xia, Feng; Yao, Lin

    2012-01-01

    Access control is an issue of paramount importance in cyber-physical systems (CPS). In this paper, an access control scheme, namely FEAC, is presented for CPS. FEAC can not only provide the ability to control access to data in normal situations, but also adaptively assign emergency-role and permissions to specific subjects and inform subjects without explicit access requests to handle emergency situations in a proactive manner. In FEAC, emergency-group and emergency-dependency are introduced. Emergencies are processed in sequence within the group and in parallel among groups. A priority and dependency model called PD-AGM is used to select optimal response-action execution path aiming to eliminate all emergencies that occurred within the system. Fault-tolerant access control polices are used to address failure in emergency management. A case study of the hospital medical care application shows the effectiveness of FEAC.

  20. GRID COMPUTING AND FAULT TOLERANCE APPROACH

    Pankaj Gupta,

    2011-10-01

    Full Text Available Grid computing is a means of allocating the computational power of alarge number of computers to complex difficult computation orproblem. Grid computing is a distributed computing paradigm thatdiffers from traditional distributed computing in that it is aimed toward large scale systems that even span organizational boundaries. This paper proposes a method to achieve maximum fault tolerance in the Grid environment system by using Reliability consideration by using Replication approach and Check-point approach. Fault tolerance is an important property for large scale computational grid systems, where geographically distributed nodes co-operate to execute a task. In order to achieve high level of reliability and availability, the grid infrastructure should be a foolproof fault tolerant. Since the failure of resources affects job execution fatally, fault tolerance service is essential to satisfy QOS requirement in grid computing. Commonly utilized techniques for providing fault tolerance are job check pointing and replication. Both techniques mitigate the amount of work lost due to changing system availability but can introduce significant runtime overhead. The latter largely depends on the length of check pointing interval and the chosen number of replicas, respectively. In case of complex scientific workflows where tasks can execute in well defined order reliability is another biggest challenge because of the unreliable nature of the grid resources.

  1. Design Approach for Fault Recoverable ALU with Improved Fault Tolerance

    Ankit K V; S Murali Narasimham; Viajya Prakash A M

    2015-01-01

    A new design for fault tolerant and fault recoverable ALU System has been proposed in this paper. Reliability is one of the most critical factors that have to be considered during the designing phase of any IC. In critical applications like Medical equipment & Military applications this reliability factor plays a very critical role in determining the acceptance of product. Insertion of special modules in the main design for reliability enhancement will give considerable amount of ...

  2. Methods and apparatuses for self-generating fault-tolerant keys in spread-spectrum systems

    Moradi, Hussein; Farhang, Behrouz; Subramanian, Vijayarangam

    2015-12-15

    Self-generating fault-tolerant keys for use in spread-spectrum systems are disclosed. At a communication device, beacon signals are received from another communication device and impulse responses are determined from the beacon signals. The impulse responses are circularly shifted to place a largest sample at a predefined position. The impulse responses are converted to a set of frequency responses in a frequency domain. The frequency responses are shuffled with a predetermined shuffle scheme to develop a set of shuffled frequency responses. A set of phase differences is determined as a difference between an angle of the frequency response and an angle of the shuffled frequency response at each element of the corresponding sets. Each phase difference is quantized to develop a set of secret-key quantized phases and a set of spreading codes is developed wherein each spreading code includes a corresponding phase of the set of secret-key quantized phases.

  3. Fault tolerant operation of switched reluctance machine

    Wang, Wei

    The energy crisis and environmental challenges have driven industry towards more energy efficient solutions. With nearly 60% of electricity consumed by various electric machines in industry sector, advancement in the efficiency of the electric drive system is of vital importance. Adjustable speed drive system (ASDS) provides excellent speed regulation and dynamic performance as well as dramatically improved system efficiency compared with conventional motors without electronics drives. Industry has witnessed tremendous grow in ASDS applications not only as a driving force but also as an electric auxiliary system for replacing bulky and low efficiency auxiliary hydraulic and mechanical systems. With the vast penetration of ASDS, its fault tolerant operation capability is more widely recognized as an important feature of drive performance especially for aerospace, automotive applications and other industrial drive applications demanding high reliability. The Switched Reluctance Machine (SRM), a low cost, highly reliable electric machine with fault tolerant operation capability, has drawn substantial attention in the past three decades. Nevertheless, SRM is not free of fault. Certain faults such as converter faults, sensor faults, winding shorts, eccentricity and position sensor faults are commonly shared among all ASDS. In this dissertation, a thorough understanding of various faults and their influence on transient and steady state performance of SRM is developed via simulation and experimental study, providing necessary knowledge for fault detection and post fault management. Lumped parameter models are established for fast real time simulation and drive control. Based on the behavior of the faults, a fault detection scheme is developed for the purpose of fast and reliable fault diagnosis. In order to improve the SRM power and torque capacity under faults, the maximum torque per ampere excitation are conceptualized and validated through theoretical analysis and

  4. A Byzantine-Fault Tolerant Self-Stabilizing Protocol for Distributed Clock Synchronization Systems

    Malekpour, Mahyar R.

    2006-01-01

    Embedded distributed systems have become an integral part of safety-critical computing applications, necessitating system designs that incorporate fault tolerant clock synchronization in order to achieve ultra-reliable assurance levels. Many efficient clock synchronization protocols do not, however, address Byzantine failures, and most protocols that do tolerate Byzantine failures do not self-stabilize. Of the Byzantine self-stabilizing clock synchronization algorithms that exist in the literature, they are based on either unjustifiably strong assumptions about initial synchrony of the nodes or on the existence of a common pulse at the nodes. The Byzantine self-stabilizing clock synchronization protocol presented here does not rely on any assumptions about the initial state of the clocks. Furthermore, there is neither a central clock nor an externally generated pulse system. The proposed protocol converges deterministically, is scalable, and self-stabilizes in a short amount of time. The convergence time is linear with respect to the self-stabilization period. Proofs of the correctness of the protocol as well as the results of formal verification efforts are reported.

  5. Design of fault tolerant control system for individual blade control helicopters

    Tamayo, Sergio

    This dissertation presents the development of a fault tolerant control scheme for helicopters fitted with individually controlled blades. This novel approach attempts to improve fault tolerant capabilities of helicopter control system by increasing control redundancy using additional actuators for individual blade input and software re-mixing to obtain nominal or close to nominal conditions under failure. An advanced interactive simulation environment has been developed including modeling of sensor failure, swashplate actuator failure, individual blade actuator failure, and blade delamination to support the design, testing, and evaluation of the control laws. This simulation environment is based on the blade element theory for the calculation of forces and moments generated by the main rotor. This discretized model allows for individual blade analysis, which in turn allows measuring the consequences of a stuck blade, or loss of the surface area of the blade itself, with respect to the dynamics of the whole helicopter. The control laws are based on non-linear dynamic inversion and artificial neural network augmentation, which is a mix of linear and nonlinear methods that compensates for model inaccuracies due to linearization or failure. A stability analysis based on the Lyapunov function approach has shown that bounded tracking error is guaranteed, and under specific circumstances, global stability is guaranteed as well. An analysis over the degrees of freedom of the mechanical system and its impact over the helicopter handling qualities is also performed to measure the degree of redundancy achieved with the addition of individual blade actuators as compared to a classic swashplate helicopter configuration. Mathematical analysis and numerical simulation, using reconfiguration of the individual blade control under failure have shown that this control architecture can potentially improve the survivability of the aircraft and reduce pilot workload under failure

  6. Diagnosis and Fault-tolerant Control, 2nd edition

    Blanke, Mogens; Kinnaert, Michel; Lunze, Jan; Starosweicki, Marcel

    Fault-tolerant control aims at a graceful degradation of the behaviour of automated systems in case of faults. It satisfies the industrial demand for enhanced availability and safety, in contrast to traditional reactions to faults that bring about sudden shutdowns and loss of availability. The book...... presents effective model-based analysis and design methods for fault diagnosis and fault-tolerant control. Architectural and structural models are used to analyse the propagation of the fault throught the process, to test the fault detectability and to find the redundancies in the process that can be used...... to ensure fault tolerance. Design methods for diagnostic systems and fault-tolerant controllers are presented for processes that are described by analytical models, by discrete-event models or that can be dealt with as quantised systems. Five case studies on pilot processes show the applicability of...

  7. A Fault-tolerance Estimating Method for Ionosphere Corrections in Satellite Navigation System

    GAO Shuliang; LI Rui; HUANG Zhigang

    2011-01-01

    Aiming to the reliable estimates of the ionosphere differential corrections for the satellite navigation system in the presence of the ionosphere anomaly,a fault-tolerance estimating method,which is based on the distributed Kalman filtering,is proposed.The method utilizes the parallel sub-filters for estimating the ionosphere differential corrections.Meanwhile,an infinite norm (IN) method is proposed for the detection of the ionosphere irregularity in the filter processing.Once the anomaly is detected,the sub-filter contaminated by the anomaly measurements will be excluded to ensure the reliability of the estimates.The simulation is conducted to validate the method and the results indicate that the anomaly can be found timely due to the novel fault detection method based on the infinite norm.Because of the parallel sub-filter architecture,the measurements are classified by the spatial distribution so that the ionosphere anomaly can be positioned and excluded more easily.Thus,the method can provide the robust and accurate ionosphere differential corrections.

  8. A novel N-input voting algorithm for X-by-wire fault-tolerant systems.

    Karimi, Abbas; Zarafshan, Faraneh; Al-Haddad, S A R; Ramli, Abdul Rahman

    2014-01-01

    Voting is an important operation in multichannel computation paradigm and realization of ultrareliable and real-time control systems that arbitrates among the results of N redundant variants. These systems include N-modular redundant (NMR) hardware systems and diversely designed software systems based on N-version programming (NVP). Depending on the characteristics of the application and the type of selected voter, the voting algorithms can be implemented for either hardware or software systems. In this paper, a novel voting algorithm is introduced for real-time fault-tolerant control systems, appropriate for applications in which N is large. Then, its behavior has been software implemented in different scenarios of error-injection on the system inputs. The results of analyzed evaluations through plots and statistical computations have demonstrated that this novel algorithm does not have the limitations of some popular voting algorithms such as median and weighted; moreover, it is able to significantly increase the reliability and availability of the system in the best case to 2489.7% and 626.74%, respectively, and in the worst case to 3.84% and 1.55%, respectively. PMID:25386613

  9. A Novel N-Input Voting Algorithm for X-by-Wire Fault-Tolerant Systems

    Abbas Karimi

    2014-01-01

    Full Text Available Voting is an important operation in multichannel computation paradigm and realization of ultrareliable and real-time control systems that arbitrates among the results of N redundant variants. These systems include N-modular redundant (NMR hardware systems and diversely designed software systems based on N-version programming (NVP. Depending on the characteristics of the application and the type of selected voter, the voting algorithms can be implemented for either hardware or software systems. In this paper, a novel voting algorithm is introduced for real-time fault-tolerant control systems, appropriate for applications in which N is large. Then, its behavior has been software implemented in different scenarios of error-injection on the system inputs. The results of analyzed evaluations through plots and statistical computations have demonstrated that this novel algorithm does not have the limitations of some popular voting algorithms such as median and weighted; moreover, it is able to significantly increase the reliability and availability of the system in the best case to 2489.7% and 626.74%, respectively, and in the worst case to 3.84% and 1.55%, respectively.

  10. Coordinated Fault Tolerance for High-Performance Computing

    Dongarra, Jack; Bosilca, George; et al.

    2013-04-08

    Our work to meet our goal of end-to-end fault tolerance has focused on two areas: (1) improving fault tolerance in various software currently available and widely used throughout the HEC domain and (2) using fault information exchange and coordination to achieve holistic, systemwide fault tolerance and understanding how to design and implement interfaces for integrating fault tolerance features for multiple layers of the software stack—from the application, math libraries, and programming language runtime to other common system software such as jobs schedulers, resource managers, and monitoring tools.

  11. Modular, Fault-Tolerant Electronics Supporting Space Exploration Project

    National Aeronautics and Space Administration — Modern electronic systems tolerate only as many point failures as there are redundant system copies, using mere macro-scale redundancy. Fault Tolerant Electronics...

  12. Fault Tolerant Control: A Simultaneous Stabilization Result

    Stoustrup, Jakob; Blondel, V.D.

    2004-01-01

    This paper discusses the problem of designing fault tolerant compensators that stabilize a given system both in the nominal situation, as well as in the situation where one of the sensors or one of the actuators has failed. It is shown that such compensators always exist, provided that the system...

  13. Task Mapping and Bandwidth Reservation for Mixed Hard/Soft Fault-Tolerant Embedded Systems

    Saraswat, Prabhat Kumar; Pop, Paul; Madsen, Jan

    reserved for the servers determines the quality of service (QoS) for soft tasks. CBS enforces temporal isolation, such that soft task overruns do not affect the timing guarantees of hard tasks. Transient faults in hard tasks are tolerated using checkpointing with rollback recovery. We have proposed a Tabu...... Search-based approach for task mapping and CBS bandwidth reservation, such that the deadlines for the hard tasks are satisfied, even in the case of transient faults, and the QoS for the soft tasks is maximized. Researchers have used fixed execution time models, such as the worst-case execution times for...

  14. A design fix to supervisory control for fault-tolerant scheduling of real-time multiprocessor systems with aperiodic tasks

    Devaraj, Rajesh; Sarkar, Arnab; Biswas, Santosh

    2015-11-01

    In the article 'Supervisory control for fault-tolerant scheduling of real-time multiprocessor systems with aperiodic tasks', Park and Cho presented a systematic way of computing a largest fault-tolerant and schedulable language that provides information on whether the scheduler (i.e., supervisor) should accept or reject a newly arrived aperiodic task. The computation of such a language is mainly dependent on the task execution model presented in their paper. However, the task execution model is unable to capture the situation when the fault of a processor occurs even before the task has arrived. Consequently, a task execution model that does not capture this fact may possibly be assigned for execution on a faulty processor. This problem has been illustrated with an appropriate example. Then, the task execution model of Park and Cho has been modified to strengthen the requirement that none of the tasks are assigned for execution on a faulty processor.

  15. Towards fault-tolerant optimal control

    Chizeck, H. J.; Willsky, A. S.

    1979-01-01

    The paper considers the design of fault-tolerant controllers that may endow systems with dynamic reliability. Results for jump linear quadratic Gaussian control problems are extended to include random jump costs, trajectory discontinuities, and a simple case of non-Markovian mode transitions.

  16. Parallel fault-tolerant robot control

    Hamilton, D. L.; Bennett, J. K.; Walker, I. D.

    1992-01-01

    A shared memory multiprocessor architecture is used to develop a parallel fault-tolerant robot controller. Several versions of the robot controller are developed and compared. A robot simulation is also developed for control observation. Comparison of a serial version of the controller and a parallel version without fault tolerance showed the speedup possible with the coarse-grained parallelism currently employed. The performance degradation due to the addition of processor fault tolerance was demonstrated by comparison of these controllers with their fault-tolerant versions. Comparison of the more fault-tolerant controller with the lower-level fault-tolerant controller showed how varying the amount of redundant data affects performance. The results demonstrate the trade-off between speed performance and processor fault tolerance.

  17. Fault tolerance issues in nanoelectronics

    Spagocci, S.

    2008-01-01

    The astonishing success story of microelectronics cannot go on indefinitely. In fact, once devices reach the few-atom scale (nanoelectronics), transient quantum effects are expected to impair their behaviour. Fault tolerant techniques will then be required. The aim of this thesis is to investigate the problem of transient errors in nanoelectronic devices. Transient error rates for a selection of nanoelectronic gates, based upon quantum cellular automata and single electron devi...

  18. Architecture for Intrusion Detection System with Fault Tolerance Using Mobile Agent

    Chintan Bhatt; Asha Koshti; Hemant Agrawal; Zakiya Malek; Bhushan Trivedi

    2011-01-01

    This paper is a survey of the work, done for making an IDS fault tolerant.Architecture of IDS that usesmobile Agent provides higher scalability. Mobile Agent uses Platform for detecting Intrusions using filterAgent, co-relater agent, Interpreter agent and rule database. When server (IDS Monitor) goes down,other hosts based on priority takes Ownership. This architecture uses decentralized collection andanalysis for identifying Intrusion. Rule sets are fed based on user-behaviour or application...

  19. Low-Cost Fault Tolerant Methodology for Real Time MPSoC Based Embedded System

    2014-01-01

    We are proposing a design methodology for a fault tolerant homogeneous MPSoC having additional design objectives that include low hardware overhead and performance. We have implemented three different FT methodologies on MPSoCs and compared them against the defined constraints. The comparison of these FT methodologies is carried out by modelling their architectures in VHDL-RTL, on Spartan 3 FPGA. The results obtained through simulations helped us to identify the most relevant scheme in terms ...

  20. Fault tolerant control schemes using integral sliding modes

    Hamayun, Mirza Tariq; Alwi, Halim

    2016-01-01

    The key attribute of a Fault Tolerant Control (FTC) system is its ability to maintain overall system stability and acceptable performance in the face of faults and failures within the feedback system. In this book Integral Sliding Mode (ISM) Control Allocation (CA) schemes for FTC are described, which have the potential to maintain close to nominal fault-free performance (for the entire system response), in the face of actuator faults and even complete failures of certain actuators. Broadly an ISM controller based around a model of the plant with the aim of creating a nonlinear fault tolerant feedback controller whose closed-loop performance is established during the design process. The second approach involves retro-fitting an ISM scheme to an existing feedback controller to introduce fault tolerance. This may be advantageous from an industrial perspective, because fault tolerance can be introduced without changing the existing control loops. A high fidelity benchmark model of a large transport aircraft is u...

  1. Mission reliability analysis of fault-tolerant multiple-phased systems

    Fault-tolerant multiple-phased systems (FTMPS) are defined as systems whose critical components are independently replicated and whose operational life can be partitioned into a set of disjoint periods, called 'phases'. Because of their deployment in critical applications, their mission reliability analysis is a task of primary relevance to validate the designs. This paper is focused on the reliability analysis of FTMPS with random phase durations, non-exponentially distributed repair activities and different repair policies. For self-repairable FTMPS with a component-level reconfiguration architecture, we derive several efficient formulations from the underlying structure characteristics for their intraphase behavior analysis. We also present a uniform solution framework of the mission reliability for FTMPS with generally distributed phase durations. Compared with existing methods based on deterministic and stochastic Petri nets or Markov regenerative stochastic Petri nets, our approach is more simple in concept and powerful in computation. Two examples of FTMPS are analyzed to illustrate the advantages of our approach

  2. Mission reliability analysis of fault-tolerant multiple-phased systems

    Mo Yuchang [Harbin Institute of Technology, Harbin, Heilongjiang 150001 (China)], E-mail: myc@ftcl.hit.edu.cn; Siewiorek, Daniel [Department of Computer Science and Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213 (United States); Yang Xiaozong [Harbin Institute of Technology, Harbin, Heilongjiang 150001 (China)

    2008-07-15

    Fault-tolerant multiple-phased systems (FTMPS) are defined as systems whose critical components are independently replicated and whose operational life can be partitioned into a set of disjoint periods, called 'phases'. Because of their deployment in critical applications, their mission reliability analysis is a task of primary relevance to validate the designs. This paper is focused on the reliability analysis of FTMPS with random phase durations, non-exponentially distributed repair activities and different repair policies. For self-repairable FTMPS with a component-level reconfiguration architecture, we derive several efficient formulations from the underlying structure characteristics for their intraphase behavior analysis. We also present a uniform solution framework of the mission reliability for FTMPS with generally distributed phase durations. Compared with existing methods based on deterministic and stochastic Petri nets or Markov regenerative stochastic Petri nets, our approach is more simple in concept and powerful in computation. Two examples of FTMPS are analyzed to illustrate the advantages of our approach.

  3. Fault-Tolerant, Real-Time, Multi-Core Computer System

    Gostelow, Kim P.

    2012-01-01

    A document discusses a fault-tolerant, self-aware, low-power, multi-core computer for space missions with thousands of simple cores, achieving speed through concurrency. The proposed machine decides how to achieve concurrency in real time, rather than depending on programmers. The driving features of the system are simple hardware that is modular in the extreme, with no shared memory, and software with significant runtime reorganizing capability. The document describes a mechanism for moving ongoing computations and data that is based on a functional model of execution. Because there is no shared memory, the processor connects to its neighbors through a high-speed data link. Messages are sent to a neighbor switch, which in turn forwards that message on to its neighbor until reaching the intended destination. Except for the neighbor connections, processors are isolated and independent of each other. The processors on the periphery also connect chip-to-chip, thus building up a large processor net. There is no particular topology to the larger net, as a function at each processor allows it to forward a message in the correct direction. Some chip-to-chip connections are not necessarily nearest neighbors, providing short cuts for some of the longer physical distances. The peripheral processors also provide the connections to sensors, actuators, radios, science instruments, and other devices with which the computer system interacts.

  4. Synthesis of Fault-Tolerant Schedules with Transparency/Performance Trade-offs for Distributed Embedded Systems

    Izosimov, Viacheslav; Pop, Paul; Eles, Petru; Peng, Zebo

    application. We propose a novel algorithm for the synthesis of fault-tolerant schedules that can handle the transparency/performance trade-offs imposed by the designer, and makes use of the fault-occurrence information to reduce the overhead due to fault tolerance. We model the application as a conditional...... process graph, where the fault occurrence information is represented as conditional edges and the transparent recovery is captured using synchronization nodes....

  5. Validation Methods Research for Fault-Tolerant Avionics and Control Systems Sub-Working Group Meeting. CARE 3 peer review

    Trivedi, K. S. (Editor); Clary, J. B. (Editor)

    1980-01-01

    A computer aided reliability estimation procedure (CARE 3), developed to model the behavior of ultrareliable systems required by flight-critical avionics and control systems, is evaluated. The mathematical models, numerical method, and fault-tolerant architecture modeling requirements are examined, and the testing and characterization procedures are discussed. Recommendations aimed at enhancing CARE 3 are presented; in particular, the need for a better exposition of the method and the user interface is emphasized.

  6. Abnormal fault-recovery characteristics of the fault-tolerant multiprocessor uncovered using a new fault-injection methodology

    Padilla, Peter A.

    1991-03-01

    An investigation was made in AIRLAB of the fault handling performance of the Fault Tolerant MultiProcessor (FTMP). Fault handling errors detected during fault injection experiments were characterized. In these fault injection experiments, the FTMP disabled a working unit instead of the faulted unit once in every 500 faults, on the average. System design weaknesses allow active faults to exercise a part of the fault management software that handles Byzantine or lying faults. Byzantine faults behave such that the faulted unit points to a working unit as the source of errors. The design's problems involve: (1) the design and interface between the simplex error detection hardware and the error processing software, (2) the functional capabilities of the FTMP system bus, and (3) the communication requirements of a multiprocessor architecture. These weak areas in the FTMP's design increase the probability that, for any hardware fault, a good line replacement unit (LRU) is mistakenly disabled by the fault management software.

  7. Focused fault injection testing of software implemented fault tolerance mechanisms of Voltan TMR nodes

    Tao, S.; Ezhilchelvan, P. D.; Shrivastava, S. K.

    1995-03-01

    One way of gaining confidence in the adequacy of fault tolerance mechanisms of a system is to test the system by injecting faults and see how the system performs under faulty conditions. This paper presents an application of the focused fault injection method that has been developed for testing software implemented fault tolerance mechanisms of distributed systems. The method exploits the object oriented approach of software implementation to support the injection of specific classes of faults. With the focused fault injection method, the system tester is able to inject specific classes of faults (including malicious ones) such that the fault tolerance mechanisms of a target system can be tested adequately. The method has been applied to test the design and implementation of voting, clock synchronization, and ordering modules of the Voltan TMR (triple modular redundant) node. The tests performed uncovered three flaws in the system software.

  8. System Wide Joint Position Sensor Fault Tolerance in Robot Systems Using Cartesian Accelerometers

    Aldridge, Hal A.; Juang, Jer-Nan

    1997-01-01

    Joint position sensors are necessary for most robot control systems. A single position sensor failure in a normal robot system can greatly degrade performance. This paper presents a method to obtain position information from Cartesian accelerometers without integration. Depending on the number and location of the accelerometers. the proposed system can tolerate the loss of multiple position sensors. A solution technique suitable for real-time implementation is presented. Simulations were conducted using 5 triaxial accelerometers to recover from the loss of up to 4 joint position sensors on a 7 degree of freedom robot moving in general three dimensional space. The simulations show good estimation performance using non-ideal accelerometer measurements.

  9. Microcontroller-Based Fault Tolerant Data Acquisition System For Air Quality Monitoring And Control Of Environmental Pollution

    Tochukwu Chiagunye; Eze Aru Okereke; Ilo Somtoochukwu

    2015-01-01

    ABSTRACT The design applied Passive fault tolerance to a microcontroller based data acquisition system to achieve the stated considerations where redundant sensors and microcontrollers with associated circuitry were designed and implemented to enable measurement of pollutant concentration information from chimney vents in two industry. Microsoft visual basic was used to develop a data mining tool which implemented an underlying artificial neural network model for forecasting pollutant concent...

  10. Thermoelectric-Driven Sustainable Sensing and Actuation Systems for Fault-Tolerant Nuclear Incidents

    Longtin, Jon [Stony Brook Univ., NY (United States)

    2016-02-08

    safety systems, etc. Such an approach is intrinsically fault tolerant: in the event that system temperatures increase, the amount of available energy will increase, which will make more power available for applications. The system can also be used during normal conditions to provide enhanced monitoring of key system components.

  11. Thermoelectric-Driven Sustainable Sensing and Actuation Systems for Fault-Tolerant Nuclear Incidents

    safety systems, etc. Such an approach is intrinsically fault tolerant: in the event that system temperatures increase, the amount of available energy will increase, which will make more power available for applications. The system can also be used during normal conditions to provide enhanced monitoring of key system components.

  12. Robust Fault-Tolerant Control for Uncertain Networked Control Systems with State-Delay and Random Data Packet Dropout

    Xiaomei Qi

    2012-01-01

    Full Text Available A robust fault-tolerant controller design problem for networked control system (NCS with random packet dropout in both sensor-to-controller link and controller-to-actuator link is investigated. A novel stochastic NCS model with state-delay, model uncertainty, disturbance, probabilistic sensor failure, and actuator failure is proposed. The random packet dropout, sensor failures, and actuator failures are characterized by a binary random variable. The sufficient condition for asymptotical mean-square stability of NCS is derived and the closed-loop NCS satisfies H∞ performance constraints caused by the random packet dropout and disturbance. The fault-tolerant controller is designed by solving a linear matrix inequality. A numerical example is presented to illustrate the effectiveness of the proposed method.

  13. Low-Cost Fault Tolerant Methodology for Real Time MPSoC Based Embedded System

    Mohsin Amin

    2014-01-01

    Full Text Available We are proposing a design methodology for a fault tolerant homogeneous MPSoC having additional design objectives that include low hardware overhead and performance. We have implemented three different FT methodologies on MPSoCs and compared them against the defined constraints. The comparison of these FT methodologies is carried out by modelling their architectures in VHDL-RTL, on Spartan 3 FPGA. The results obtained through simulations helped us to identify the most relevant scheme in terms of the given design constraints.

  14. Improving Information Security by Implementing Fault Tolerance Concepts

    Aurel Serb

    2014-01-01

    Security issues are complex, and the risks of cyber crime are often difficult to stipulate, even for experts. The issues presented in this article try to be formed in a contribution to the consolidation of problems in the field of computer arhitecture cyber security. Fault tolerance is the best guarantee that high-confidence systems will not succumb to physical, design, or human-machine interaction faults. A fault tolerant system is one that can continue to operate reliably by producing accep...

  15. Evaluation of Simple Causal Message Logging for Large-Scale Fault Tolerant HPC Systems

    Bronevetsky, G; Meneses, E; Kale, L V

    2011-02-25

    The era of petascale computing brought machines with hundreds of thousands of processors. The next generation of exascale supercomputers will make available clusters with millions of processors. In those machines, mean time between failures will range from a few minutes to few tens of minutes, making the crash of a processor the common case, instead of a rarity. Parallel applications running on those large machines will need to simultaneously survive crashes and maintain high productivity. To achieve that, fault tolerance techniques will have to go beyond checkpoint/restart, which requires all processors to roll back in case of a failure. Incorporating some form of message logging will provide a framework where only a subset of processors are rolled back after a crash. In this paper, we discuss why a simple causal message logging protocol seems a promising alternative to provide fault tolerance in large supercomputers. As opposed to pessimistic message logging, it has low latency overhead, especially in collective communication operations. Besides, it saves messages when more than one thread is running per processor. Finally, we demonstrate that a simple causal message logging protocol has a faster recovery and a low performance penalty when compared to checkpoint/restart. Running NAS Parallel Benchmarks (CG, MG and BT) on 1024 processors, simple causal message logging has a latency overhead below 5%.

  16. Fault Tolerant Ethernet Based Network for Time Sensitive Applications in Electrical Power Distribution Systems

    Leos Bohac

    2013-01-01

    Full Text Available The paper analyses and experimentally verifies deployment of Ethernet based network technology to enable fault tolerant and timely exchange of data among a number of high voltage protective relays that use proprietary serial communication line to exchange data in real time on a state of its high voltage circuitry facilitating a fast protection switching in case of critical failures. The digital serial signal is first fetched into PCM multiplexer where it is mapped to the corresponding E1 (2 Mbit/s time division multiplexed signal. Subsequently, the resulting E1 frames are then packetized and sent through Ethernet control LAN to the opposite PCM demultiplexer where the same but reverse processing is done finally sending a signal into the opposite protective relay. The challenge of this setup is to assure very timely delivery of the control information between protective relays even in the cases of potential failures of Ethernet network itself. The tolerance of Ethernet network to faults is assured using widespread per VLAN Rapid Spanning Tree Protocol potentially extended by 1+1 PCM protection as a valuable option.

  17. A Piecewise Affine Hybrid Systems Approach to Fault Tolerant Satellite Formation Control

    Grunnet, Jacob Deleuran; Larsen, Jesper Abildgaard; Bak, Thomas;

    2008-01-01

    In this paper a procedure for modelling satellite formations   including failure dynamics as a piecewise-affine hybrid system is   shown. The formulation enables recently developed methods and tools   for control and analysis of piecewise-affine systems to be applied   leading to synthesis of fault...

  18. Measures of Fault Tolerance in Distributed Simulated Annealing

    Prakash, Aaditya

    2012-01-01

    In this paper, we examine the different measures of Fault Tolerance in a Distributed Simulated Annealing process. Optimization by Simulated Annealing on a distributed system is prone to various sources of failure. We analyse simulated annealing algorithm, its architecture in distributed platform and potential sources of failures. We examine the behaviour of tolerant distributed system for optimization task. We present possible methods to overcome the failures and achieve fault tolerance for t...

  19. Fault-Tolerant Process Control Methods and Applications

    Mhaskar, Prashant; Christofides, Panagiotis D

    2013-01-01

    Fault-Tolerant Process Control focuses on the development of general, yet practical, methods for the design of advanced fault-tolerant control systems; these ensure an efficient fault detection and a timely response to enhance fault recovery, prevent faults from propagating or developing into total failures, and reduce the risk of safety hazards. To this end, methods are presented for the design of advanced fault-tolerant control systems for chemical processes which explicitly deal with actuator/controller failures and sensor faults and data losses. Specifically, the book puts forward: ·         a framework for  detection, isolation and diagnosis of actuator and sensor faults for nonlinear systems; ·         controller reconfiguration and safe-parking-based fault-handling methodologies; ·         integrated-data- and model-based fault-detection and isolation and fault-tolerant control methods; ·         methods for handling sensor faults and data losses; and ·      ...

  20. Enhancement of Fault Tolerance in Cloud Computing

    Pushpanjali Gupta; Rasmi Ranjan Patra

    2014-01-01

    In recent years researchers are trying to work out scientific applications in cloud so that it decreases the infrastructure cost and increases the span of team and finally innovative ideas towards applications is increased. But the cloud is still not as much reliable, controllable as grid. So in the evolving Cloud computing environment there is a great need of fault tolerance mechanism for the system to work effectively even in the presence of failure. Moreover Big Organizations ar...

  1. Designing fault-tolerant real-time computer systems with diversified bus architecture for nuclear power plants

    Fault-tolerant real-time computer (FT-RTC) systems are widely used to perform safe operation of nuclear power plants (NPP) and safe shutdown in the event of any untoward situation. Design requirements for such systems need high reliability, availability, computational ability for measurement via sensors, control action via actuators, data communication and human interface via keyboard or display. All these attributes of FT-RTC systems are required to be implemented using best known methods such as redundant system design using diversified bus architecture to avoid common cause failure, fail-safe design to avoid unsafe failure and diagnostic features to validate system operation. In this context, the system designer must select efficient as well as highly reliable diversified bus architecture in order to realize fault-tolerant system design. This paper presents a comparative study between CompactPCI bus and Versa Module Eurocard (VME) bus architecture for designing FT-RTC systems with switch over logic system (SOLS) for NPP. (author)

  2. A Dynamic Slack Management Technique for Real-Time Distributed Embedded System with Enhanced Fault Tolerance and Resource Constraints

    Santhi Baskaran,

    2011-01-01

    Full Text Available This project work aims to develop a dynamic slack management technique, for real-time distributed embedded systems to reduce the total energy consumption in addition to timing, precedence and resource constraints. The Slack Distribution Technique proposed considers a modified Feedback Control Scheduling (FCS algorithm. This algorithm schedules dependent tasks effectively with precedence and resource constraints. It further minimizes the schedule length and utilizes the available slack to increase the energy efficiency. A fault tolerant mechanism uses a deferred-active-backup scheme increases the schedulability and provides reliability to the system.

  3. Fault-tolerant architectures for superconducting qubits

    In this short review, I draw attention to new developments in the theory of fault tolerance in quantum computation that may give concrete direction to future work in the development of superconducting qubit systems. The basics of quantum error-correction codes, which I will briefly review, have not significantly changed since their introduction 15 years ago. But an interesting picture has emerged of an efficient use of these codes that may put fault-tolerant operation within reach. It is now understood that two-dimensional surface codes, close relatives of the original toric code of Kitaev, can be adapted as shown by Raussendorf and Harrington to effectively perform logical gate operations in a very simple planar architecture, with error thresholds for fault-tolerant operation simulated to be 0.75%. This architecture uses topological ideas in its functioning, but it is not 'topological quantum computation'-there are no non-abelian anyons in sight. I offer some speculations on the crucial pieces of superconducting hardware that could be demonstrated in the next couple of years that would be clear stepping stones towards this surface-code architecture.

  4. Massive Sensor Array Fault Tolerance: Tolerance Mechanism and Fault Injection for Validation

    Dugan Um

    2010-01-01

    Full Text Available As today's machines become increasingly complex in order to handle intricate tasks, the number of sensors must increase for intelligent operations. Given the large number of sensors, detecting, isolating, and then tolerating faulty sensors is especially important. In this paper, we propose fault tolerance architecture suitable for a massive sensor array often found in highly advanced systems such as autonomous robots. One example is the sensitive skin, a type of massive sensor array. The objective of the sensitive skin is autonomous guidance of machines in unknown environments, requiring elongated operations in a remote site. The entirety of such a system needs to be able to work remotely without human attendance for an extended period of time. To that end, we propose a fault-tolerant architecture whereby component and analytical redundancies are integrated cohesively for effective failure tolerance of a massive array type sensor or sensor system. In addition, we discuss the evaluation results of the proposed tolerance scheme by means of fault injection and validation analysis as a measure of system reliability and performance.

  5. Multi-Fault Tolerance ofAircraft Control Surface Based on Fault-Tolerant Multi-Agent System%基于容错多智能体系统的飞机舵面多故障容错

    袁侃; 胡寿松

    2011-01-01

    To solve the problem of multi-fault diagnosis of complex systems, the basic concept of MultiAgent Systems (MAS) is expanded to establish the concept of Fault-Tolerant Multi-Agent System (FATMAS). Based on which the definitions of various functions, rules and flows for realizing multi-fault tolerance in FATMAS are given. Then the multiple faults can be detected by calling different functions of the system according to some definite flows, and the system can recover from the faults by copying the tasks of non-critical agents. The results of simulation in the F-16 aircraft multi-fault diagnosis and self-repairing illustrated that this method can effectively reduce the number of replicated agents through clear inter-agent message passing mechanism, and it is relatively close to the process of dealing with failures in actual system,therefore it can be used in complex system for multi-fault diagnosis and tolerance.%为解决复杂系统的多故障容错问题,首先将多智能体系统(MAS)的基本概念进行了扩充,定义了容错多智能体系统(FATMAS)的相关概念,并在此基础上提出了在FATMAS中实现多故障容错的各类函数定义、规则与流程.然后按照一定的流程调用不同函数就可对系统中的多故障进行检测,并可通过复制非关键智能体的任务使系统从多故障中恢复运行.F-16飞机多故障诊断与自修复的仿真结果说明,该方法能够通过清晰的智能体间消息传递机制有效地减少系统中复制智能体的数目,而且比较接近于实际系统中的故障处理过程,可用于复杂系统的多故障诊断与容错.

  6. Guaranteed Cost Active Fault-tolerant Control of Networked Control System with Packet Dropout and Transmission Delay

    Xiao-Yuan Luo; Mei-Jie Shang; Cai-Lian Chen; Xin-Ping Guan

    2010-01-01

    The problem of guaranteed cost active fault-tolerant controller (AFTC) design for networked control systems (NCSs)with both packet dropout and transmission delay is studied in this paper.Considering the packet dropout and transmission delay,a piecewise constant controller is adopted.With a guaranteed cost function,optimal controllers whose number is equal to the number of actuators are designed,and the design process is formulated as a convex optimal problem that can be solved by existing software.The control strategy is proposed as follows:when actuator failures appear,the fault detection and isolation unit sends out the information to the controller choosing strategy,and then the optimal stabilizing controller with the smallest guaranteed cost value is chosen.Two illustrative examples are given to demonstrate the effectiveness of the proposed approach.By comparing with the existing methods,it can be seen that our method has a better performance.

  7. Microcontroller-Based Fault Tolerant Data Acquisition System For Air Quality Monitoring And Control Of Environmental Pollution

    Tochukwu Chiagunye

    2015-08-01

    Full Text Available ABSTRACT The design applied Passive fault tolerance to a microcontroller based data acquisition system to achieve the stated considerations where redundant sensors and microcontrollers with associated circuitry were designed and implemented to enable measurement of pollutant concentration information from chimney vents in two industry. Microsoft visual basic was used to develop a data mining tool which implemented an underlying artificial neural network model for forecasting pollutant concentrations for future time periods. The feed forward back propagation method was used to train the ANN model with a training data set while a decision tree algorithm was used to select an optimal output result for the model from its two output neurons.

  8. Fault Tolerance in Control Architectures for Mobile Robots: Fantasy or Reality?

    Crestani, Didier; Godary-Dejean, Karen

    2012-01-01

    Due to the future development of robotic autonomous systems in human environment, the fault tolerance paradigm will be a central issue in robotics. This article presents a survey of fault tolerance concepts, means and implementations in robotic architectures.

  9. Fault-Tolerant Precision Formation Guidance for Interferometry Project

    National Aeronautics and Space Administration — A methodology is to be developed that will allow the development and implementation of fault-tolerant control system for distributed collaborative spacecraft. The...

  10. Fault tolerance and reliability in integrated ship control

    Nielsen, Jens Frederik Dalsgaard; Izadi-Zamanabadi, Roozbeh; Schiøler, Henrik

    2002-01-01

    Various strategies for achieving fault tolerance in large scale control systems are discussed. The positive and negative impacts of distribution through network communication are presented. The ATOMOS framework for standardized reliable marine automation is presented along with the corresponding...

  11. Fault-tolerant logics for FPGA linux

    The increasing use of SRAM-based reconfigurable architectures at important areas of research and development (like particle accelerators and space applications) brings new, currently partially unattended effects on top. An already well known, but nevertheless important problem of such systems is its susceptibility to radiation which increases in conjunction with particle flux and energy. Regarding to current knowledge, errors induced by Single Event Upsets (SEU) and Single Event Transients (SET) are handled exclusively in hardware by the use of spacial and temporal redundancy features. Our field of research is to extend conventional fault tolerance to multiple layers of embedded computer systems, starting with the FPGA bit layer and ending up in the software application layer to get a maximum of radiation tolerance in systems running FPGA Linux in radiation susceptible environments. Only a collaboration of all these layers is able to create an adequate amount of data security and process integrity.

  12. Extensions to the Parallel Real-Time Artificial Intelligence System (PRAIS) for fault-tolerant heterogeneous cycle-stealing reasoning

    Goldstein, David

    1991-01-01

    Extensions to an architecture for real-time, distributed (parallel) knowledge-based systems called the Parallel Real-time Artificial Intelligence System (PRAIS) are discussed. PRAIS strives for transparently parallelizing production (rule-based) systems, even under real-time constraints. PRAIS accomplished these goals (presented at the first annual C Language Integrated Production System (CLIPS) conference) by incorporating a dynamic task scheduler, operating system extensions for fact handling, and message-passing among multiple copies of CLIPS executing on a virtual blackboard. This distributed knowledge-based system tool uses the portability of CLIPS and common message-passing protocols to operate over a heterogeneous network of processors. Results using the original PRAIS architecture over a network of Sun 3's, Sun 4's and VAX's are presented. Mechanisms using the producer-consumer model to extend the architecture for fault-tolerance and distributed truth maintenance initiation are also discussed.

  13. A SAFE approach towards early design space exploration of Fault-tolerant multimedia MPSoCs

    P. van Stralen; A. Pimentel

    2012-01-01

    With the reduction in feature size, transient errors start to play an important role in modern embedded systems. It is therefore important to make fault-tolerance a first-class citizen in embedded system design. Fault-tolerance patterns are techniques to make an application fault-tolerant. Not only

  14. Diagnosis and Fault-tolerant Control

    Blanke, Mogens; Kinnaert, Michel; Lunze, Jan; Staroswiecki, Marcel

    The book presents effective model-based analysis and design methods for fault diagnosis and fault-tolerant control. Architectural and structural models are used to analyse the propagation of the fault through the process, to test the fault detectability and to find the redundancies in the process...... applicability of the presented methods. The theoretical results are illustrated by two running examples which are used throughout the book. The book addresses engineering students, engineers in industry and researchers who wish to get a survey over the variety of approaches to process diagnosis and fault...

  15. Software reliability through fault-avoidance and fault-tolerance

    Vouk, Mladen A.; Mcallister, David F.

    1993-01-01

    Strategies and tools for the testing, risk assessment and risk control of dependable software-based systems were developed. Part of this project consists of studies to enable the transfer of technology to industry, for example the risk management techniques for safety-concious systems. Theoretical investigations of Boolean and Relational Operator (BRO) testing strategy were conducted for condition-based testing. The Basic Graph Generation and Analysis tool (BGG) was extended to fully incorporate several variants of the BRO metric. Single- and multi-phase risk, coverage and time-based models are being developed to provide additional theoretical and empirical basis for estimation of the reliability and availability of large, highly dependable software. A model for software process and risk management was developed. The use of cause-effect graphing for software specification and validation was investigated. Lastly, advanced software fault-tolerance models were studied to provide alternatives and improvements in situations where simple software fault-tolerance strategies break down.

  16. Electrical Steering of Vehicles - Fault-tolerant Analysis and Design

    Blanke, Mogens; Thomsen, Jesper Sandberg

    2006-01-01

    The topic of this paper is systems that need be designed such that no single fault can cause failure at the overall level. A methodology is presented for analysis and design of fault-tolerant architectures, where diagnosis and autonomous reconfiguration can replace high cost triple redundancy...

  17. A Proactive Fault Tolerant Strategy for Desktop Grid

    Geeta Arora; Dr. Shaveta Rani; Dr. Paramjit Singh

    2015-01-01

    Desktop Grid resources are volatile, heterogeneous and geographically distributed in nature, so scheduling and fault tolerance become the important challenges for Desktop grid systems. In this paper a proactive fault tolerant strategy is developed which also considers the unwanted delays in association with speed and load of resources in the Grid while scheduling jobs on Grid resources. Simulation experiments are conducted by using GridSim toolkit 5.2. The experimental results obtained from a...

  18. Incorporating Fault Tolerance Tactics in Software Architecture Patterns

    Harrison, Neil B.; Avgeriou, Paris

    2008-01-01

    One important way that an architecture impacts fault tolerance is by making it easy or hard to implement measures that improve fault tolerance. Many such measures are described as fault tolerance tactics. We studied how various fault tolerance tactics can be implemented in the best-known architectur

  19. Object Replication and CORBA Fault-Tolerant Object Service

    2001-01-01

    CORBA (Common Object Request Broker Arc hitecture) provides 16Common Object Services for distributed application develo pment, but none of them are fault-tolerance related services. In this paper, we propose a replicated object based Fault-Tolerant Object Service (FTOS) for COR BA environment. Two fault-tolerant mechanisms are provided in FTOS including dy namic voting mechanism and object replication mechanism. The dynamic voting mech anism uses majority-voting strategy to ensure object state consistency in failu re situations. The object replication mechanism can help system administrators t o replicate and start-up objects easily. Our implementation provides a library according to the style of COSS. With this library, programmers can develop distr ibuted applications with fault-tolerance capability very easily.

  20. Byzantine-fault tolerant self-stabilizing protocol for distributed clock synchronization systems

    Malekpour, Mahyar R. (Inventor)

    2010-01-01

    A rapid Byzantine self-stabilizing clock synchronization protocol that self-stabilizes from any state, tolerates bursts of transient failures, and deterministically converges within a linear convergence time with respect to the self-stabilization period. Upon self-stabilization, all good clocks proceed synchronously. The Byzantine self-stabilizing clock synchronization protocol does not rely on any assumptions about the initial state of the clocks. Furthermore, there is neither a central clock nor an externally generated pulse system. The protocol converges deterministically, is scalable, and self-stabilizes in a short amount of time. The convergence time is linear with respect to the self-stabilization period.

  1. The New Fault Tolerant Onboard Computer for Microsatellite Missions

    2006-01-01

    This paper describes an onboard computer with dual processing modules. Each processing module is composed of 32 bit ARM reduced instruction set computer processor and other commercial-off-the-shelf devices. A set of fault handling mechanisms is implemented in the computer system, which enables the system to tolerate a single fault. The onboard software is organized around a set of processes that communicate among each other through a routing process. Meeting an extremely tight set of constraints that include mass, volume, power consumption and space environmental conditions, the fault-tolerant onboard computer has excellent data processing capability that can meet the erquirements of micro-satellite missions.

  2. A failure-distance dased method to bound the reliability of non-repairable fault-tolerant systems without the knowledge of minimal cuts

    Suñé, Víctor; Carrasco, Juan A.

    2001-01-01

    CTMC (continuous-time Markov chains) are a commonly used formalism for modeling fault-tolerant systems. One of the major drawbacks of CTMC is the well-known state-space explosion problem. This work develops and analyzes a method (SC-BM) to compute bounds for the reliability of non-repairable fault-tolerant systems in which only a portion of the state space of the CTMC is generated. SC-BM uses the failure distance concept as the method described in [1] but, unlike that method, w...

  3. Incorporating Fault Tolerance Tactics in Software Architecture Patterns

    Harrison, Neil B.; Avgeriou, Paris

    2008-01-01

    One important way that an architecture impacts fault tolerance is by making it easy or hard to implement measures that improve fault tolerance. Many such measures are described as fault tolerance tactics. We studied how various fault tolerance tactics can be implemented in the best-known architecture patterns. This shows that certain patterns are better suited to implementing fault tolerance tactics than others, and that certain alternate tactics are better matches than others for a given pat...

  4. USAGE OF STANDARD PERSONAL COMPUTER PORTS FOR DESIGNING OF THE DOUBLE REDUNDANT FAULT-TOLERANT COMPUTER CONTROL SYSTEMS

    Rafig SAMEDOV

    2005-01-01

    Full Text Available In this study, for designing of the fault-tolerant control systems by using standard personal computers, the ports have been investigated, different structure versions have been designed and the method for choosing of an optimal structure has been suggested. In this scope, first of all, the ÇİFTYAK system has been defined and its work principle has been determined. Then, data transmission ports of the standard personal computers have been classified and analyzed. After that, the structure versions have been designed and evaluated according to the used data transmission methods, the numbers of ports and the criterions of reliability, performance, truth, control and cost. Finally, the method for choosing of the most optimal structure version has been suggested.

  5. Enhancement of Fault Tolerance in Cloud Computing

    Pushpanjali Gupta

    2014-08-01

    Full Text Available In recent years researchers are trying to work out scientific applications in cloud so that it decreases the infrastructure cost and increases the span of team and finally innovative ideas towards applications is increased. But the cloud is still not as much reliable, controllable as grid. So in the evolving Cloud computing environment there is a great need of fault tolerance mechanism for the system to work effectively even in the presence of failure. Moreover Big Organizations are also opting for using Hybrid Cloud instead of private Cloud. Thus, in this paper we propose an approach of using a new framework in Cloud so as to use Cloud for scientific applications as well makes the public Cloud trustworthy platform. There is a progressive approach introduced to provide an effective way to achieve high fault tolerance in Clouds by enabling a new workflow planning method to balance performance, reliability and cost for critical scientific applications and focus mainly on use of distributed resources for workflow execution mainly in serial and concurrent manner.

  6. ENHANCEMENT OF FAULT TOLERANCE IN CLOUD COMPUTING

    Pushpanjali Gupta

    2015-10-01

    Full Text Available In recent years researchers are trying to work out scientific applications in cloud so that it decreases the infrastructure cost and increases the span of team and finally innovative ideas towards applications is increased. But the cloud is still not as much reliable, controllable as grid. So in the evolving Cloud computing environment there is a great need of fault tolerance mechanism for the system to work effectively even in the presence of failure. Moreover Big Organizations are also opting for using Hybrid Cloud instead of private Cloud. Thus, in this paper we propose an approach of using a new framework in Cloud so as to use Cloud for scientific applications as well makes the public Cloud trustworthy platform. There is a progressive approach introduced to provide an effective way to achieve high fault tolerance in Clouds by enabling a new workflow planning method to balance performance, reliability and cost for critical scientific applications and focus mainly on use of distributed resources for workflow execution mainly in serial and concurrent manner.

  7. Simulation modeling based method for choosing an effective set of fault tolerance mechanisms for real-time avionics systems

    Bakhmurov, A. G.; Balashov, V. V.; Glonina, A. B.; Pashkov, V. N.; Smeliansky, R. L.; Volkanov, D. Yu.

    2013-12-01

    In this paper, the reliability allocation problem (RAP) for real-time avionics systems (RTAS) is considered. The proposed method for solving this problem consists of two steps: (i) creation of an RTAS simulation model at the necessary level of abstraction and (ii) application of metaheuristic algorithm to find an optimal solution (i. e., to choose an optimal set of fault tolerance techniques). When during the algorithm execution it is necessary to measure the execution time of some software components, the simulation modeling is applied. The procedure of simulation modeling also consists of the following steps: automatic construction of simulation model of the RTAS configuration and running this model in a simulation environment to measure the required time. This method was implemented as an experimental software tool. The tool works in cooperation with DYANA simulation environment. The results of experiments with the implemented method are presented. Finally, future plans for development of the presented method and tool are briefly described.

  8. DESIGN AND IMPLEMENTATION OF PROCESS FAULT-TOLERANT SYSTEM FOR HIGH-PERFORMANCE FAULT-TOLERANT COMPUTER%面向高端容错计算机的进程容错系统设计与实现

    吴楠; 张东; 刘璧怡

    2013-01-01

    High-end fault-tolerant computers are mainly used in key sectors such as banking and telecommunications, and are extremely sensitive to failure, so it is extremely important to guarantee the availability of their key processes. Common mechanism of fault-tolerant is mainly realised based on static structural redundancy principle, but the redundancy in hardware layer costs high and is complex in execution, while the redundancy in application layer is of low versatile. This paper proposes a fault-tolerant mechanism and policy based on process redundancy, which constructs dual-modular redundancy or multi-modular redundancy on key application processes. The method employs the means of interprocess synchronisation to ensure the operation of redundancy processes based on the same execution logic, supervises the system and makes corresponding error handing on different faults. Compared with traditional fault-tolerant way, the process fault-tolerant management system has the characteristics of high versatility and low cost, can effectively ensure high reliability of the system with less performance lost and avoid the complexity in hardware customisation at the same time, while it keeps the transparent to applications and users as well.%高端容错计算机主要应用于银行、电信等关键领域中,对于系统失效极其敏感,保证系统关键进程的可靠性至关重要.常见的容错机制主要依据静态结构冗余原理实现,然而硬件层的冗余成本很高且实现复杂,应用软件层的冗余则不具有通用性.提出一种基于进程冗余的容错机制和策略,对关键进程构造双模冗余或多模冗余,采用进程间同步等手段确保冗余进程按照同样的执行逻辑运行,监控系统并对不同的错误进行相应的错误处理.与传统的容错方式相比,进程容错管理系统具有通用性高、成本低等特点,能在较小的性能损耗下有效地保证系统的高可靠性,同时避

  9. Fault-tolerant building-block computer study

    Rennels, D. A.

    1978-01-01

    Ultra-reliable core computers are required for improving the reliability of complex military systems. Such computers can provide reliable fault diagnosis, failure circumvention, and, in some cases serve as an automated repairman for their host systems. A small set of building-block circuits which can be implemented as single very large integration devices, and which can be used with off-the-shelf microprocessors and memories to build self checking computer modules (SCCM) is described. Each SCCM is a microcomputer which is capable of detecting its own faults during normal operation and is described to communicate with other identical modules over one or more Mil Standard 1553A buses. Several SCCMs can be connected into a network with backup spares to provide fault-tolerant operation, i.e. automated recovery from faults. Alternative fault-tolerant SCCM configurations are discussed along with the cost and reliability associated with their implementation.

  10. Design and Verification of Fault-Tolerant Components

    Zhang, Miaomiao; Liu, Zhiming; Ravn, Anders Peter; Morisset, Charles

    2009-01-01

    We present a systematic approach to design and verification of fault-tolerant components with real-time properties as found in embedded systems. A state machine model of the correct component is augmented with internal transitions that represent hypothesized faults. Also, constraints on the...... occurrence or timing of faults are included in this model. This model of a faulty component is then extended with fault detection and recovery mechanisms, again in the form of state machines. Desired properties of the component are model checked for each of the successive models. The models can be made...... relatively detailed such that they can serve directly as blueprints for engineering, and yet be amenable to exhaustive verication. The approach is illustrated with a design of a triple modular fault-tolerant system that is a real case we received from our collaborators in the aerospace field. We use UPPAAL...

  11. Scheduling and Voltage Scaling for Energy/Reliability Trade-offs in Fault-Tolerant Time-Triggered Embedded Systems

    Pop, Paul; Poulsen, Kåre Harbo; Izosimov, Viacheslav; Eles, Petru

    -execution and dynamic voltage scaling-based low-power techniques are competing for the slack in the schedules. Our approach decides the voltage levels and start times of processes and the transmission times of messages, such that the transient faults are tolerated, the timing constraints of the application are...

  12. Computer aided reliability, availability, and safety modeling for fault-tolerant computer systems with commentary on the HARP program

    Shooman, Martin L.

    1991-01-01

    Many of the most challenging reliability problems of our present decade involve complex distributed systems such as interconnected telephone switching computers, air traffic control centers, aircraft and space vehicles, and local area and wide area computer networks. In addition to the challenge of complexity, modern fault-tolerant computer systems require very high levels of reliability, e.g., avionic computers with MTTF goals of one billion hours. Most analysts find that it is too difficult to model such complex systems without computer aided design programs. In response to this need, NASA has developed a suite of computer aided reliability modeling programs beginning with CARE 3 and including a group of new programs such as: HARP, HARP-PC, Reliability Analysts Workbench (Combination of model solvers SURE, STEM, PAWS, and common front-end model ASSIST), and the Fault Tree Compiler. The HARP program is studied and how well the user can model systems using this program is investigated. One of the important objectives will be to study how user friendly this program is, e.g., how easy it is to model the system, provide the input information, and interpret the results. The experiences of the author and his graduate students who used HARP in two graduate courses are described. Some brief comparisons were made with the ARIES program which the students also used. Theoretical studies of the modeling techniques used in HARP are also included. Of course no answer can be any more accurate than the fidelity of the model, thus an Appendix is included which discusses modeling accuracy. A broad viewpoint is taken and all problems which occurred in the use of HARP are discussed. Such problems include: computer system problems, installation manual problems, user manual problems, program inconsistencies, program limitations, confusing notation, long run times, accuracy problems, etc.

  13. FTMP (Fault Tolerant Multiprocessor) programmer's manual

    Feather, F. E.; Liceaga, C. A.; Padilla, P. A.

    1986-01-01

    The Fault Tolerant Multiprocessor (FTMP) computer system was constructed using the Rockwell/Collins CAPS-6 processor. It is installed in the Avionics Integration Research Laboratory (AIRLAB) of NASA Langley Research Center. It is hosted by AIRLAB's System 10, a VAX 11/750, for the loading of programs and experimentation. The FTMP support software includes a cross compiler for a high level language called Automated Engineering Design (AED) System, an assembler for the CAPS-6 processor assembly language, and a linker. Access to this support software is through an automated remote access facility on the VAX which relieves the user of the burden of learning how to use the IBM 4381. This manual is a compilation of information about the FTMP support environment. It explains the FTMP software and support environment along many of the finer points of running programs on FTMP. This will be helpful to the researcher trying to run an experiment on FTMP and even to the person probing FTMP with fault injections. Much of the information in this manual can be found in other sources; we are only attempting to bring together the basic points in a single source. If the reader should need points clarified, there is a list of support documentation in the back of this manual.

  14. Fault-Tolerant Mechanism of the Distributed Cluster Computers"

    SHANG Yizi; JIN Yang; WU Baosheng

    2007-01-01

    The distributed system with high performance and stability is commonly adopted in large scale scientific and engineering computing. In this paper, we discuss a fault-tolerant mechanism under Linux circumstance to improve the fault-tolerant ability of the system, namely a scheme and frame to form the stable computing platform. In terms of the structure and function of the distributed system, active list and file invocation strategies are employed in the task management. System multilevel fault-tolerance can be achieved by repeated processes in a single node and task migration on multi-nodes. Manager node agent introduced in this paper administrates the nodes using the list, disposes of the tasks according to the nodes'performance, and hence, to be able to make full use of the cluster resources. An evaluation method is proposed to appraise the performance. The analyzed results show the usefulness of the scheme proposed except for some additional overhead of memory consumption.

  15. Fault Tolerant Parallel Filters Based On Bch Codes

    K.Mohana Krishna

    2015-04-01

    Full Text Available Digital filters are used in signal processing and communication systems. In some cases, the reliability of those systems is critical, and fault tolerant filter implementations are needed. Over the years, many techniques that exploit the filters’ structure and properties to achieve fault tolerance have been proposed. As technology scales, it enables more complex systems that incorporate many filters. In those complex systems, it is common that some of the filters operate in parallel, for example, by applying the same filter to different input signals. Recently, a simple technique that exploits the presence of parallel filters to achieve multiple fault tolerance has been presented. In this brief, that idea is generalized to show that parallel filters can be protected using Bose– Chaudhuri–Hocquenghem codes (BCH in which each filter is the equivalent of a bit in a traditional ECC. This new scheme allows more efficient protection when the number of parallel filters is large.

  16. Energy/Reliability Trade-offs in Fault-Tolerant Event-Triggered Distributed Embedded Systems

    Gan, Junhe; Gruian, Flavius; Pop, Paul; Madsen, Jan

    2011-01-01

    reliability simultaneously is especially challenging, since lowering the voltage to reduce the energy consumption has been shown to increase the transient fault rate. We presented a Tabu Search-based approach which uses an energy/reliability trade-off model to find reliable and schedulable implementations...

  17. A continuous-time semi-markov bayesian belief network model for availability measure estimation of fault tolerant systems

    Márcio das Chagas Moura

    2008-08-01

    Full Text Available In this work it is proposed a model for the assessment of availability measure of fault tolerant systems based on the integration of continuous time semi-Markov processes and Bayesian belief networks. This integration results in a hybrid stochastic model that is able to represent the dynamic characteristics of a system as well as to deal with cause-effect relationships among external factors such as environmental and operational conditions. The hybrid model also allows for uncertainty propagation on the system availability. It is also proposed a numerical procedure for the solution of the state probability equations of semi-Markov processes described in terms of transition rates. The numerical procedure is based on the application of Laplace transforms that are inverted by the Gauss quadrature method known as Gauss Legendre. The hybrid model and numerical procedure are illustrated by means of an example of application in the context of fault tolerant systems.Neste trabalho, é proposto um modelo baseado na integração entre processos semi-Markovianos e redes Bayesianas para avaliação da disponibilidade de sistemas tolerantes à falha. Esta integração resulta em um modelo estocástico híbrido o qual é capaz de representar as características dinâmicas de um sistema assim como tratar as relações de causa e efeito entre fatores externos tais como condições ambientais e operacionais. Além disso, o modelo híbrido permite avaliar a propagação de incerteza sobre a disponibilidade do sistema. É também proposto um procedimento numérico para a solução das equações de probabilidade de estado de processos semi-Markovianos descritos por taxas de transição. Tal procedimento numérico é baseado na aplicação de transformadas de Laplace que são invertidas pelo método de quadratura Gaussiana conhecido como Gauss Legendre. O modelo híbrido e procedimento numérico são ilustrados por meio de um exemplo de aplicação no contexto de

  18. Electronic Power Switch for Fault-Tolerant Networks

    Volp, J.

    1987-01-01

    Power field-effect transistors reduce energy waste and simplify interconnections. Current switch containing power field-effect transistor (PFET) placed in series with each load in fault-tolerant power-distribution system. If system includes several loads and supplies, switches placed in series with adjacent loads and supplies. System of switches protects against overloads and losses of individual power sources.

  19. SIFT - Multiprocessor architecture for Software Implemented Fault Tolerance flight control and avionics computers

    Forman, P.; Moses, K.

    1979-01-01

    A brief description of a SIFT (Software Implemented Fault Tolerance) Flight Control Computer with emphasis on implementation is presented. A multiprocessor system that relies on software-implemented fault detection and reconfiguration algorithms is described. A high level reliability and fault tolerance is achieved by the replication of computing tasks among processing units.

  20. Learning Fault-tolerant Speech Parsing with SCREEN

    Wermter, S; Wermter, Stefan; Weber, Volker

    1994-01-01

    This paper describes a new approach and a system SCREEN for fault-tolerant speech parsing. SCREEEN stands for Symbolic Connectionist Robust EnterprisE for Natural language. Speech parsing describes the syntactic and semantic analysis of spontaneous spoken language. The general approach is based on incremental immediate flat analysis, learning of syntactic and semantic speech parsing, parallel integration of current hypotheses, and the consideration of various forms of speech related errors. The goal for this approach is to explore the parallel interactions between various knowledge sources for learning incremental fault-tolerant speech parsing. This approach is examined in a system SCREEN using various hybrid connectionist techniques. Hybrid connectionist techniques are examined because of their promising properties of inherent fault tolerance, learning, gradedness and parallel constraint integration. The input for SCREEN is hypotheses about recognized words of a spoken utterance potentially analyzed by a spe...

  1. Fault Tolerant Heterogeneous Limited Duplication Scheduling algorithm for Decentralized Grid

    DR. NITIN

    2013-04-01

    Full Text Available Fault tolerance is one of the most desirable property in decentralized grid computing systems, where computational resources are geographically distributed. These resources collaborate in order to execute workflow applications as fast as possible. In workflow applications, tasks are dependent on each other, so it becomes extremely vital that scheduling techniques should also have some decentralized fault tolerant mechanism. In this paper, we have proposed a decentralized fault tolerant mechanism which utilize the checkpoint concept; for Heterogeneous Limited Duplication (HLD algorithm. HLD is based on task duplication scheduling in heterogeneous environment. There are two fold benefits firstly; if node failure occurs then rest of grid nodes sustain the execution of application. Secondly, less makespan of application is obtained using checkpoint concept. Therefore, application scheduled over decentralized grid systems (which are known for their unreliable behavior will yield results fast utilizing algorithm proposed in this paper.

  2. Adding Fault Tolerance to NPB Benchmarks Using ULFM

    Parchman, Zachary W [Tennessee Technological University (TTU); Vallee, Geoffroy R [ORNL; Naughton III, Thomas J [ORNL; Engelmann, Christian [ORNL; Bernholdt, David E [ORNL; Scott, Stephen L [Tennessee Technological University (TTU)

    2016-01-01

    In the world of high-performance computing, fault tolerance and application resilience are becoming some of the primary concerns because of increasing hardware failures and memory corruptions. While the research community has been investigating various options, from system-level solutions to application-level solutions, standards such as the Message Passing Interface (MPI) are also starting to include such capabilities. The current proposal for MPI fault tolerant is centered around the User-Level Failure Mitigation (ULFM) concept, which provides means for fault detection and recovery of the MPI layer. This approach does not address application-level recovery, which is currently left to application developers. In this work, we present a mod- ification of some of the benchmarks of the NAS parallel benchmark (NPB) to include support of the ULFM capabilities as well as application-level strategies and mechanisms for application-level failure recovery. As such, we present: (i) an application-level library to checkpoint and restore data, (ii) extensions of NPB benchmarks for fault tolerance based on different strategies, (iii) a fault injection tool, and (iv) some preliminary results that show the impact of such fault tolerant strategies on the application execution.

  3. Analysis of a hardware and software fault tolerant processor for critical applications

    Dugan, Joanne B.

    1993-01-01

    Computer systems for critical applications must be designed to tolerate software faults as well as hardware faults. A unified approach to tolerating hardware and software faults is characterized by classifying faults in terms of duration (transient or permanent) rather than source (hardware or software). Errors arising from transient faults can be handled through masking or voting, but errors arising from permanent faults require system reconfiguration to bypass the failed component. Most errors which are caused by software faults can be considered transient, in that they are input-dependent. Software faults are triggered by a particular set of inputs. Quantitative dependability analysis of systems which exhibit a unified approach to fault tolerance can be performed by a hierarchical combination of fault tree and Markov models. A methodology for analyzing hardware and software fault tolerant systems is applied to the analysis of a hypothetical system, loosely based on the Fault Tolerant Parallel Processor. The models consider both transient and permanent faults, hardware and software faults, independent and related software faults, automatic recovery, and reconfiguration.

  4. Design Approach for Fault Tolerance in FPGA Architecture

    Ms. Shweta S. Meshram

    2011-03-01

    Full Text Available Failures of nano-metric technologies owing to defects and shrinking process tolerances give rise tosignificant challenges for IC testing. In recent years the application space of reconfigurable devices hasgrown to include many platforms with a strong need for fault tolerance. While these systems frequentlycontain hardware redundancy to allow for continued operation in the presence of operational faults, theneed to recover faulty hardware and return it to full functionality quickly and efficiently is great. Inaddition to providing functional density, FPGAs provide a level of fault tolerance generally not found inmask-programmable devices by including the capability to reconfigure around operational faults in thefield. Reliability and process variability are serious issues for FPGAs in the future. With advancement inprocess technology, the feature size is decreasing which leads to higher defect densities, moresophisticated techniques at increased costs are required to avoid defects. If nano-technology fabricationare applied the yield may go down to zero as avoiding defect during fabrication will not be a feasibleoption Hence, feature architecture have to be defect tolerant. In regular structure like FPGA, redundancyis commonly used for fault tolerance. In this work we present a solution in which configuration bit-streamof FPGA is modified by a hardware controller that is present on the chip itself. The technique usesredundant device for replacing faulty device and increases the yield.

  5. Design Approach for Fault Tolerance in FPGA Architecture

    Ms. Shweta S. Meshram

    2011-03-01

    Full Text Available Failures of nano-metric technologies owing to defects and shrinking process tolerances give rise to significant challenges for IC testing. In recent years the application space of reconfigurable devices has grown to include many platforms with a strong need for fault tolerance. While these systems frequently contain hardware redundancy to allow for continued operation in the presence of operational faults, the need to recover faulty hardware and return it to full functionality quickly and efficiently is great. In addition to providing functional density, FPGAs provide a level of fault tolerance generally not found in mask-programmable devices by including the capability to reconfigure around operational faults in the field. Reliability and process variability are serious issues for FPGAs in the future. With advancement in process technology, the feature size is decreasing which leads to higher defect densities, more sophisticated techniques at increased costs are required to avoid defects. If nano-technology fabrication are applied the yield may go down to zero as avoiding defect during fabrication will not be a feasible option Hence, feature architecture have to be defect tolerant. In regular structure like FPGA, redundancy is commonly used for fault tolerance. In this work we present a solution in which configuration bit-stream of FPGA is modified by a hardware controller that is present on the chip itself. The technique uses redundant device for replacing faulty device and increases the yield.

  6. Fault Tolerant Control for Civil Structures Based on LMI Approach

    Chunxu Qu

    2013-01-01

    Full Text Available The control system may lose the performance to suppress the structural vibration due to the faults in sensors or actuators. This paper designs the filter to perform the fault detection and isolation (FDI and then reforms the control strategy to achieve the fault tolerant control (FTC. The dynamic equation of the structure with active mass damper (AMD is first formulated. Then, an estimated system is built to transform the FDI filter design problem to the static gain optimization problem. The gain is designed to minimize the gap between the estimated system and the practical system, which can be calculated by linear matrix inequality (LMI approach. The FDI filter is finally used to isolate the sensor faults and reform the FTC strategy. The efficiency of FDI and FTC is validated by the numerical simulation of a three-story structure with AMD system with the consideration of sensor faults. The results show that the proposed FDI filter can detect the sensor faults and FTC controller can effectively tolerate the faults and suppress the structural vibration.

  7. False alarms in fault-tolerant dominating sets in graphs

    Mateusz Nikodem

    2012-01-01

    Full Text Available We develop the problem of fault-tolerant dominating sets (liar's dominating sets in graphs. Namely, we consider a new kind of fault - a false alarm. Characterization of such fault-tolerant dominating sets in three different cases (dependent on the classification of the types of the faults are presented.

  8. Fault Tolerant System for Sparse Traffic Grooming in Optical WDM Mesh Networks Using Combiner Queue

    Sandip R. Shinde

    2016-03-01

    Full Text Available Queuing theory is an important concept in current internet technology. As the requirement of bandwidth goes on increasing it is necessary to use optical communication for transfer of data. Optical communication at backbone network requires various devices for traffic grooming. The cost of these devices is very high which leads to increase in the cost of network. One of the solutions to this problem is to have sparse traffic grooming in optical WDM mesh network. Sparse traffic grooming allows only few nodes in the network as grooming node (G-node. These G-nodes has the grooming capability and other nodes are simple nodes where traffic grooming is not possible. The grooming nodes are the special nodes which has high cost. The possibility of faults at such a node, or link failure is high. Resolving such faults and providing efficient network is very important. So we have importance of such survivable sparse traffic grooming network. Queuing theory helps to improve the result of network and groom the traffic in the network. The paper focuses on the improvement in performance of the backbone network and reduction in blocking probability. To achieve the goals of the work we have simulated the model. The main contribution is to use survivability on the sparse grooming network and use of combiner queues at each node. It has observed that Combiner queuing alone does the job of minimizing blocking probability and balancing the load over the network. The model is not only cost effective but also increases the performance of network and minimizes the call blocking probability

  9. Fault-tolerance for MPI Codes on Computational Clusters

    Hagen, Knut Imar

    2007-01-01

    This thesis focuses on fault-tolerance for MPI codes on computational clusters. When an application runs on a very large cluster with thousands of processors, there is likely that a process crashes due to a hardware or software failure. Fault-tolerance is the ability of a system to respond gracefully to an unexpected hardware or software failure. A test application which is meant to run for several weeks on several nodes is used in this thesis. The application is a seismic MPI application, w...

  10. Rule-based fault-tolerant flight control

    Handelman, Dave

    1988-01-01

    Fault tolerance has always been a desirable characteristic of aircraft. The ability to withstand unexpected changes in aircraft configuration has a direct impact on the ability to complete a mission effectively and safely. The possible synergistic effects of combining techniques of modern control theory, statistical hypothesis testing, and artificial intelligence in the attempt to provide failure accommodation for aircraft are investigated. This effort has resulted in the definition of a theory for rule based control and a system for development of such a rule based controller. Although presented here in response to the goal of aircraft fault tolerance, the rule based control technique is applicable to a wide range of complex control problems.